R Coding Basics

This section assumes students know little about R and gets them up to speed with the basics.

  1. Data Structures
    • How can I read data in R?
    • What are the basic data types in R?
    • How do I represent categorical information in R?
  2. Exploring Data Frames
    • How can I manipulate a data frame?
  3. Subsetting Data
    • How can I work with subsets of data in R?
  4. Control Flow
    • How can I make data-dependent choices in R?
    • How can I repeat operations in R?
  5. Visualization with ggplot2
    • How can I create publication-quality graphics in R?
  6. Vectorization
    • How can I operate on all the elements of a vector at once?
  7. Functions Explained
    • How can I write a new function in R?
  8. Writing Good Software
    • How can I write software that other people can use?

Code Style Guide

In programming, as in writing, it is generally a good idea to stick to a consistent style. There are two style guides that you can adopt or customize to create your own:

R Command-Line Program

RStudio is good for writing and testing your R code, but for work that must be repeated or takes a long time to finish, it may be easier to run your script from the command line instead.

For example, for the gapminder script we created on day 1, we can run it in a command line shell (you can open one in RStudio’s Tools/Shell… menu):

Rscript code/load_gapminder.R

Note that your script may not print its output to the screen when called from the command line unless you explicitly call the print function.

But what if we have many files for which we would like to show the same basic information (rows, data types, etc.)? We can refactor our script to accept the file name as a command-line argument, so that the script works with any acceptable file.

In an R script, you can use the commandArgs function to get the command-line arguments:

args <- commandArgs()
print(args)

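By default, commandArgs() also returns the path to the R executable and R's own options, so the first element is not your file name. Pass trailingOnly = TRUE to keep only the arguments supplied after the script name. If your script takes a single argument for the file name, you can then get its value with:

args <- commandArgs(trailingOnly = TRUE)
file_name <- args[1]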
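
As a minimal sketch of the step above, the helper below extracts the file name and stops with a usage message when the script is not called with exactly one argument. It relies on commandArgs(trailingOnly = TRUE), which returns only the arguments that follow the script name. The function name get_file_name and the wording of the message are hypothetical and not part of the course script.

```r
# Hypothetical helper: return the single file-name argument, or stop with a
# usage message if the script was not called with exactly one argument.
get_file_name <- function(args = commandArgs(trailingOnly = TRUE)) {
  if (length(args) != 1) {
    stop("Usage: Rscript code/load_gapminder.R <file_name>", call. = FALSE)
  }
  args[1]
}
```

Inside a script, get_file_name() reads the real command line. Passing a character vector explicitly, as in get_file_name("data/gapminder-FiveYearData.csv"), makes it easy to try the same logic in the RStudio Console.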

Now modify your code so that it can be invoked in the command line with:

Rscript code/load_gapminder.R data/gapminder-FiveYearData.csv 
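
One possible shape for the refactored script is sketched below. The contents of the day 1 script are not reproduced here, so this version assumes the input is a CSV file and that "basic information" means the dimensions and column types. The function name describe_file is hypothetical.

```r
# Hypothetical summary function: read a CSV file and show its basic information.
describe_file <- function(file_name) {
  if (!file.exists(file_name)) {
    stop("File not found: ", file_name, call. = FALSE)
  }
  data <- read.csv(file_name)
  print(dim(data))  # number of rows and columns
  str(data)         # column names and data types
  invisible(data)   # return the data frame without printing it again
}

# Run the summary only when exactly one file name was given on the command line.
args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 1) {
  describe_file(args[1])
}
```

Keeping the work in a function and the argument handling at the bottom means the same file can be sourced in RStudio for testing and run with Rscript for batch use.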

Debugging with RStudio

This section is adapted from Visual Debugging with RStudio.

  1. Download foo.R from https://raw.githubusercontent.com/cities/datascience2017/master/code/foo.R and save it to the code (or src) subdirectory of your project folder;
  2. Open foo.R and source it;
  3. In the RStudio Console pane, type foo("-1") and press Enter.

Why does the foo function claim “-1 is larger than 0”? Let’s debug the foo function and find out.
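
Before stepping through the function, it can help to inspect the argument itself in the Console. The sketch below only examines the value passed in step 3. The body of foo.R is not reproduced here, so whether this explains the message depends on how foo compares its argument, which is an assumption. The variable names x and x_num are illustrative.

```r
x <- "-1"               # the argument exactly as typed in step 3
class(x)                # "character": the quotes make it a string
is.numeric(x)           # FALSE
x_num <- as.numeric(x)  # convert the string to a number
x_num > 0               # FALSE
```

If a function receives a string where it expects a number, a comparison such as x > 0 is carried out on text rather than on numeric values, so its result need not match the numeric one.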

Resources: