Set up your computer

Installation

If you already have R and RStudio installed on your laptop, check their versions and upgrade to the latest releases if they are out of date.
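
One way to check the installed versions is from the R console. The snippet below is a minimal sketch: the threshold `"4.0.0"` is only an illustrative assumption, not a course requirement, and `RStudio.Version()` is available only when the code runs inside RStudio, so that call is guarded.

```r
# Human-readable description of the running R version.
r_version_string <- R.version.string
print(r_version_string)

# getRversion() returns a version object that can be compared to a string.
# NOTE: "4.0.0" is a hypothetical threshold chosen for illustration only.
minimum_version <- "4.0.0"
r_is_recent <- getRversion() >= minimum_version

if (!r_is_recent) {
  message("R is older than ", minimum_version, "; consider upgrading.")
}

# RStudio.Version() exists only inside an RStudio session, so check first.
if (exists("RStudio.Version")) {
  print(RStudio.Version()$version)
}
```

Inside RStudio, the same information is also shown under Help > About RStudio.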

Installation Verification

  1. Launch RStudio and you should see a program window like this:
  2. Click the File menu, select New Project…, then Version Control and Git;
  3. Copy & paste this URL: https://github.com/cities/datascience2018.git into the Repository URL textbox;
  4. Click Create Project.

If a popup box titled “Clone Repository” appears with a progress bar and RStudio then refreshes, your installation is working.
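
If the Git option is missing or the clone fails, it can help to check whether R can find a Git executable at all. The following is a rough sketch that only looks at the system `PATH`; RStudio may be configured with its own path to Git, so a negative result here is a hint rather than a diagnosis.

```r
# Sys.which() returns the full path of an executable, or "" if none is found.
git_path <- unname(Sys.which("git"))
git_found <- nzchar(git_path)

if (git_found) {
  message("Git found at: ", git_path)
} else {
  message("Git was not found on the PATH; cloning from RStudio may not work.")
}
```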

What is Data Science?

According to Wikipedia:

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured, similar to data mining.

Data science is a “concept to unify statistics, data analysis, machine learning and their related methods” in order to “understand and analyze actual phenomena” with data. It employs techniques and theories drawn from many fields within the context of mathematics, statistics, information science, and computer science.

Data Science Venn Diagram by Drew Conway

Why R

Class project

For the class project, you are expected to create a re-usable R script with the following requirements and commit it to GitHub:

  1. Part I
    1. Contains at least one self-contained function;
    2. Follows a consistent style guide;
    3. Includes the necessary documentation;
    4. Has at least one test that passes;
    5. [Advanced] Organizes the function(s) into a package that passes the checks.
  2. Part II
    1. Utilizes tidyverse functions as much as possible;
    2. [Advanced] Includes a vignette that demonstrates the usage of the package.
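
To make the Part I requirements concrete, here is a minimal sketch of what a self-contained, documented, and tested function could look like. The function `rescale01()` is a hypothetical example, not part of the assignment; the documentation uses roxygen2-style comments, and the test uses base R's `stopifnot()` so the snippet runs without extra packages (in a package you would more commonly put such checks in testthat tests).

```r
#' Rescale a numeric vector to the range [0, 1]
#'
#' @param x A numeric vector with at least two distinct non-missing values.
#' @return A numeric vector of the same length as `x`; missing values stay NA.
#' @examples
#' rescale01(c(0, 5, 10))
rescale01 <- function(x) {
  stopifnot(is.numeric(x))
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    stop("`x` must contain at least two distinct values.")
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

# A simple passing test: known input, known output.
stopifnot(isTRUE(all.equal(rescale01(c(0, 5, 10)), c(0, 0.5, 1))))

# Missing values are preserved rather than dropped.
stopifnot(is.na(rescale01(c(NA, 2, 4))[1]))
```

The function depends only on its argument, which is what makes it self-contained and easy to test.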

You can take and/or re-organize code from your current work or start from scratch. When selecting a project idea, consider whether you can realistically complete it within a week.

If you don’t have a feasible project idea at the moment, consider writing an R package that reads and visualizes the bike counts on Steel Bridge, Hawthorne Bridge, and Tilikum Crossing. Daily traffic counts data for these bridges can be found here. At a minimum, your package should be able to: