During the lectures I will point out some obvious things that I think you should work on at home. Some of these things include
- The sentences problem.
- Find a MapReduce system compatible with your programming knowledge.
- Try Apache Pig on top of Hadoop.
Students are expected to complete a "big data" project before the end of the semester. You may work with another student in the class, but you are expected to be able to clearly delineate who did what work.
Jacob: I will continue my previous work attempting to expand images using convolutional, recurrent neural networks for series prediction. The goal is to loop such a network's output back into itself so that the expansion can be any size desired, but state-of-the-art prediction methods fail to remain stable for more than a few steps beyond the start of the looped output. I will attempt to use a combination of series prediction and adversarial training methods to produce results better than either method can achieve individually. In particular, I wish to improve the quality of multi-step predictions through looped output.
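The looped-output idea can be illustrated with a minimal sketch. This is not Jacob's network: the `rollout` helper and the `linear_extrapolation` stand-in predictor are hypothetical names chosen for illustration, and a trained model would take the predictor's place. The sketch only shows how each prediction is fed back in as input for the next step, which is also why errors can compound over long rollouts.

```python
from collections import deque
from typing import Callable, List, Sequence


def rollout(predict: Callable[[List[float]], float],
            seed: Sequence[float],
            steps: int) -> List[float]:
    """Generate `steps` values by feeding each prediction back as input.

    `predict` maps a fixed-length window of past values to the next value.
    """
    # The window always holds the most recent len(seed) values.
    window = deque(seed, maxlen=len(seed))
    outputs: List[float] = []
    for _ in range(steps):
        next_value = predict(list(window))
        outputs.append(next_value)
        # Loop the output back: the oldest value drops out automatically.
        window.append(next_value)
    return outputs


def linear_extrapolation(window: List[float]) -> float:
    """Stand-in predictor (assumption): continue the last observed slope."""
    return 2 * window[-1] - window[-2]


if __name__ == "__main__":
    print(rollout(linear_extrapolation, [1.0, 2.0], 3))
```

After the first few steps the window contains only predicted values, so any error in `predict` is fed back into every later step.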
Clay and David: We will examine roadway data obtained from WYDoT. Using this data, we would like to build a predictive model that can determine optimal travel times based on current road data. We would also like to develop a more accurate system for assessing when major roadways such as Interstate 80 need to be closed, in order both to maximize the safety of travelers and to minimize the economic loss from closures.
Dane and Ziqiang: We propose a sentence de-duplication and word recognition project using Sandia National Laboratories' MR-MPI. We plan to implement a weighting system based on fractional distances between words. Entirely different words would normally have a weight of 1; here we take the distance between two words to be a weight between zero and one, based on the differences in characters between them. We also propose custom functions that implement these weights according to user preferences, which will require no change to the original program but will allow users to designate the weights according to their needs.
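One way to read the fractional weight is as a normalized edit distance. The sketch below is an assumption about how such a weight could be computed, not the project's actual method, and it does not use MR-MPI. The names `edit_distance` and `word_weight` are hypothetical, and the `distance` parameter stands in for the user-supplied custom function.

```python
from typing import Callable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def word_weight(a: str, b: str,
                distance: Callable[[str, str], int] = edit_distance) -> float:
    """Weight in [0, 1]: 0 for identical words, 1 for entirely different ones.

    `distance` can be swapped for a user-defined function without changing
    the caller (assumed interface).
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    # Clamp in case a custom distance exceeds the longer word's length.
    return min(1.0, distance(a, b) / longest)


if __name__ == "__main__":
    print(word_weight("cat", "cat"))
    print(word_weight("cat", "dog"))
    print(word_weight("color", "colour"))
    # Custom distance (assumption): ignore letter case.
    print(word_weight("Cat", "cat",
                      distance=lambda x, y: edit_distance(x.lower(), y.lower())))
```

Dividing by the length of the longer word keeps the default weight between zero and one, because the Levenshtein distance never exceeds that length.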
Jonathan: Look for Ponzi schemes in fracking data from an Eastern U.S. state.