Syllabus

Have you ever felt that you are “drinking from a hose” given the amount of data you are trying to analyze? Have you been frustrated by tedious steps in your data processing and analysis, and thought there has to be a better way to do things? Are you curious what the buzz about data science is about? If you answered yes to any of these questions, this course is for you.

Although computing is now an integral part of every aspect of science and engineering, transportation research included, most students of science, engineering, and planning are never taught how to build, use, validate, and share software well. As a result, many spend hours or days doing things badly that could be done well in just a few minutes. The goal of this course is to start changing that, so that students can spend less time wrestling with software and more time doing useful research.

Building on successful data science training programs such as Software Carpentry (http://www.software-carpentry.org/) and Data Carpentry, and on recent developments in related software and research, this course exposes students in transportation research and practice to best practices in scientific computing through hands-on lab sessions. It aims to help students tackle the challenge of “drinking from a hose” when dealing with the overwhelming amounts of data that are increasingly common in transportation research and practice.

Topics

The table below lists, by date, the lecture topics, computer labs, and readings, as well as the dates on which assignments will be handed out and due. Supplementary readings will be posted on the course website. Topics are subject to adjustment according to the needs of students.

Part I: Data Science Best Practices

Part II: tidyverse workflow

Location and Time

Room: Portland State University Engineering Building 315
Time:
- Part I: 8/6 - 8/7, 2018, 9am - 5pm
- Part II: 8/8 - 8/10, 2018, 9am - 5pm

Prerequisite

Basic knowledge of and experience in conducting scientific research with quantitative information; skill in using (or keenness to learn) a programming language and/or data processing and statistical software (such as Python, R, SPSS, or Stata).

Format

All classes will be hands-on sessions combining lectures, discussions, and labs. Readings drawn from books, articles, and online resources will be assigned. Students are expected to read them before class and to participate in class discussions. A major component of the class is the class project, in which students go through the process of retrieving and processing data, conducting analysis, and developing a report or article while learning the best practices of data science.

Software and Hardware

This course will use R, the free statistical software, with RStudio (https://www.rstudio.com/) as our main interface to R. Lecture and lab instructions will be provided in R. Students must bring their own laptops. The instructor and TA will help students set up their laptops to run all examples and exercises, so that students can review and re-run the examples from lectures and labs on their own.

Textbook and Readings

The course will use the following textbook:

Wickham, H., Grolemund, G., 2017. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data, 1st ed. O’Reilly Media.

An electronic version is available on Hadley Wickham’s website.

Journal articles and online resources are used as supplements to the textbook.

Acknowledgements

This course is developed with support from the National Institute of Transportation and Communities project #854.

Parts of the course materials have been adapted from the following sources:

R for Data Science by Hadley Wickham
Software Carpentry workshop lessons
UBC Stat 545 by Professor Jenny Bryan at UBC
NEU 5110 Introduction to Data Science by Professor Jan Vitek

I would also like to acknowledge the Oregon Modeling User Group for providing feedback on the course development and for helping to promote it. I am grateful to Taylor Sutton (PBOT) for kindly sharing the bike count data used in the course demonstration and project.

The write-up and website are powered by the blogdown package and GitHub.

Introduction to Data Science