Big Data and Mining
Graduate version: Data mining is a paradigm for finding hidden data and anomalies in data sets, databases, or data streams. The data can be either static or dynamic and can come from streams that are not saved. This course also provides an overview of MapReduce-like systems, hash table methods, and techniques for finding data in files and data streams. Machine learning is a paradigm in which computers learn without being explicitly programmed. The learning can be unsupervised, supervised, or reinforcement-based. The course provides an overview of clustering, perceptrons, support vector machines, neural networks, and dimension reduction methods.
Undergraduate version: The course is similar to the graduate-level course, but more background material is provided and fewer assumptions are made about a student's background knowledge.
An eclectic group of students who are not afraid to program or use a computer and manipulate data in new ways.
227 Ross Hall
- Tuesday/Thursday 1:00-2:00 and Friday 3:00-4:00.
- By appointment; contact me first (I have no office telephone thanks to math budget cutbacks in Spring 2015).
- Anand Rajaraman, Jure Leskovec, and Jeffrey D. Ullman, Mining of Massive Datasets, 2nd ed. (version 2.1), Stanford University, 2014. The most up-to-date version is online at http://www.mmds.org. I will also lecture from the 3rd edition draft.
- Andriy Burkov, The Hundred-Page Machine Learning Book, http://themlbook.com/wiki/doku.php, 2019.
- Wooyoung Kim, Parallel Clustering Algorithms: Survey, http://grid.cs.gsu.edu/~wkim/index_files/SurveyParallelClustering.html, 2009.
Note for Computer Science Graduate Students
Computer Science graduate students may use the course to satisfy either the Artificial Intelligence or Systems: Networking, Distributed Computing, and Data Management breadth areas, but not both.