Big Data and Mining
Course Descriptions
Graduate version: An advanced topics course. Dynamic Big Data-Driven Application Systems (DBDDAS) is a paradigm in which applications and measurements form a symbiotic feedback control system: executing applications dynamically incorporate additional Big Data and, in turn, dynamically steer the measurement process. This feedback provides more accurate analysis and prediction, more precise controls, and more reliable outcomes. Data mining is a paradigm for finding hidden data and anomalies in data sets or databases. The data can be static or dynamic, and can come from streams that are never saved. This course also provides an overview of MapReduce-like systems, hash table methods, finding data in files and data streams, market-basket methods, clustering, machine learning, and dimension reduction methods.
Undergraduate version: The course is similar to the graduate-level course, but with more emphasis on data mining itself.
Prerequisites
An eclectic group of students who are not afraid to program or use a computer and manipulate data in new ways.
Office
227 Ross Hall
Office Hours
- Monday, 10:00-11:00
- Tuesday, 1:00-2:00
- Thursday, 1:00-2:00
- By appointment; contact me first (I have no office telephone thanks to math budget cutbacks in Spring 2015).
Suggested Reading
- Anand Rajaraman, Jure Leskovec, and Jeffrey D. Ullman, Mining of Massive Datasets, 2nd ed. (version 2.1), Stanford University, 2014. The most up-to-date version is online at http://www.mmds.org. I will also lecture from the 3rd edition draft.
- Wooyoung Kim, Parallel Clustering Algorithms: Survey, http://grid.cs.gsu.edu/~wkim/index_files/SurveyParallelClustering.html, 2009.
Note for Computer Science Graduate Students
Computer Science graduate students may use the course to satisfy either the Artificial Intelligence breadth area or the Systems: Networking, Distributed Computing, and Data Management breadth area, but not both.