P0: Project topic proposal

The purpose of P0 is twofold. First, it should identify the intended project topic and explain why it interests you and why it should interest others. Second, it should lay out the blueprint for the data-collection phase (P1) and the data-analysis phase (P2).

Selecting a topic

The first task is to identify a topic worthy of a semester-long investigation. See the resources page for links to blogs on data analysis, websites where data is available, and exemplary data projects across a variety of applications. If you are stumped, a list of topic ideas appears at the bottom of this page.

Requirements for topic

If you are working with a "traditional" data source (e.g., indicators of literacy rates and economic development collected from the World Bank), then you must supplement the study with data collected from a "new" source, to serve as an additional explanatory variable.

If you are collecting data from a "new" source, then collecting data from another source that can be linked to it is still strongly encouraged (see me if you think this would be infeasible or would not make sense).

Describing the topic

First, succinctly describe the topic of interest. Then list several questions you would like to answer about it, including those you anticipate will be hard to answer.

What will the response variable(s) be? What about the explanatory variables?

Do you have any hypotheses about the relationship between the response and explanatory variables? For example, do you expect the response variable to be positively correlated with a particular explanatory variable?

Data collection plan

From where will you collect the data? For each data resource, explain the following:

  1. Name and URL of resource
  2. Category of resource: web scraping, API access, manually-downloaded file, etc.
  3. Anticipated variables of interest (identifying which are response and which are explanatory variables, and whether each is numerical or categorical).
  4. Data collection frequency: one-off or repeated over a specific time period.
  5. Brief explanation of compliance with terms of service.
  6. Method of data storage: flat files, pickle files, or a MongoDB database.
  7. If the data is to be joined up with other resources, list which ones and explain any anticipated issues in connecting the datasets.
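For item 7, a common issue is that two sources identify the same entity slightly differently. The sketch below, assuming two hypothetical sources keyed by country name (the field names `literacy_rate` and `tweet_count` and all values are made up), performs an inner join after light normalization and reports the keys that still fail to match. Those unmatched keys are exactly the kind of anticipated issue worth describing in your plan.

```python
# Hypothetical records from a "traditional" source and a "new" source (made-up values).
world_bank = [
    {"country": "United States", "literacy_rate": 99.0},
    {"country": "Cote d'Ivoire", "literacy_rate": 47.2},
]
new_source = [
    {"country": "united states ", "tweet_count": 1520},
    {"country": "Ivory Coast", "tweet_count": 87},
]


def normalize(name):
    """Lowercase and trim whitespace so trivially different spellings match."""
    return name.strip().lower()


def join_on_country(left, right):
    """Inner-join two lists of dicts on a normalized country name.

    Returns the joined rows plus the countries from `left` with no match.
    """
    right_index = {normalize(row["country"]): row for row in right}
    joined, unmatched = [], []
    for row in left:
        match = right_index.get(normalize(row["country"]))
        if match is None:
            unmatched.append(row["country"])
        else:
            # Keep the left-hand country name; copy the other fields from the right.
            extra = {k: v for k, v in match.items() if k != "country"}
            joined.append({**row, **extra})
    return joined, unmatched


joined, unmatched = join_on_country(world_bank, new_source)
print(joined)     # [{'country': 'United States', 'literacy_rate': 99.0, 'tweet_count': 1520}]
print(unmatched)  # ["Cote d'Ivoire"]
```

Normalization fixes case and whitespace, but "Cote d'Ivoire" versus "Ivory Coast" still fails; such cases usually need a manual lookup table or a shared code (e.g., an ISO country code) in both sources.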

Creating derivative data for data analysis

After collecting the raw data sources, you will likely need to create derivative measures for the response and explanatory variables. Explain what these derivative measures will be. If you are joining multiple sources, indicate which values are to be included. In P1 you will eventually create a CSV file with a field for each variable of interest. Here you should list the field names you anticipate including, along with a representative example of what the data might look like. You can use made-up figures for the example.
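As a rough illustration of the kind of example requested, the sketch below builds one hypothetical derivative measure (tweets per 1,000 residents, computed from a raw count and a population) and writes the result to a CSV using Python's standard `csv.DictWriter`. The field names, the file name `derived_data.csv`, and every value are invented for illustration; substitute your own variables.

```python
import csv

# Hypothetical field names for the P1 CSV; replace with your own variables.
FIELDS = ["country", "year", "literacy_rate", "gdp_per_capita", "tweets_per_1000"]

# Made-up raw values, one dict per country-year.
raw_rows = [
    {"country": "Country A", "year": 2012, "literacy_rate": 72.5,
     "gdp_per_capita": 1.5, "tweet_count": 50_000, "population": 2_000_000},
    {"country": "Country B", "year": 2012, "literacy_rate": 90.0,
     "gdp_per_capita": 8.0, "tweet_count": 12_000, "population": 4_000_000},
]


def derive(row):
    """Build one output record, including a derivative measure from raw counts."""
    return {
        "country": row["country"],
        "year": row["year"],
        "literacy_rate": row["literacy_rate"],
        "gdp_per_capita": row["gdp_per_capita"],
        # Derivative measure: raw tweet count normalized by population.
        "tweets_per_1000": round(row["tweet_count"] * 1000 / row["population"], 2),
    }


def write_csv(path, rows):
    """Write derived records to `path` with a header row of FIELDS."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(derive(r) for r in rows)


write_csv("derived_data.csv", raw_rows)
```

The resulting file would begin with the header `country,year,literacy_rate,gdp_per_capita,tweets_per_1000` followed by one row per country-year, for example `Country A,2012,72.5,1.5,25.0`.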

Dividing responsibility

This project is intended to be a team effort. I strongly recommend that you appoint one person to lead each of the major tasks in the data collection plan and in the creation of derivative data. Everyone should contribute to the tasks, but giving each team member primary responsibility for a different aspect will help keep the project organized.

What to turn in

Turn in a single PDF document per team that addresses each of the items requested above. At the top of the document, include the names of the team members along with a team user name (one word, up to 8 characters). Include four sections:

  1. Topic description
  2. Data collection plan
  3. Derivative data for analysis
  4. Assignment of leaders to tasks

Please keep the document as concise as possible while still conveying the key points requested. Email the completed document to qtw@cs.wellesley.edu with the subject line "P0".

Also, after you have turned in P0, please sign up for a time for your team to meet with me to discuss your proposal. This will be a great opportunity for me to give feedback and to answer any questions you have before you begin work on P1. Here is the Google Doc where you can add the time your team can meet with me.

You will have the opportunity to revise your proposal following our meeting.

Ideas for potential topics

You are encouraged to come up with your own ideas for topics, but here are some ideas of mine that may be of interest.