Course Overview
We now live in a world of information, where data can be leveraged to rapidly answer questions that were previously unanswerable. This course will teach students how to make sense of the large amounts of data now commonly available, covering everything from hypothesis formation and data collection to methods of analysis and visualization. We begin by discussing how to set up Internet-scale experiments and formulate testable hypotheses. We then learn ways to automatically gather, store and query large datasets. Next, we introduce two important classes of analysis: statistical methods (descriptive and predictive) and information visualization. Students will learn to use the Python and R programming languages to carry out data collection, analysis and visualization, culminating in a final project using real data of the students' choosing.