
Slides 02

Working in R

Arvind Venkatadri

R Basics

  • R is an interpreter (>)

  • Name objects in R (i_like_snake_case <-)

  • Know your object types (typeof())

  • Case matters (my_names != My_names)

  • Use comments! (# use the hashtag symbol)

  • Functions (fun!)

  • Use packages ("install once per machine, load once per R session")

  • Use the %>% ("dataframe first, dataframe once")

R is an interpreter >

You enter commands line by line, and R runs each one as soon as you enter it (unlike compiled languages, where the whole program is built before it runs).

  • The > means R is ready for a command

  • The + means your last command isn't complete

    • If you get stuck at a +, press the Escape key!

🐍

Name Objects in R

i_like_snake_case <-

RStudio Keyboard Shortcuts:

OSX: Option + -

Else: Alt + -

(the + means and, not the + key)

Name your own objects

us <- c("Pratyush", "Anand", "Arvind") # combine strings
us
[1] "Pratyush" "Anand" "Arvind"
num_labs <- c(1:10) # combine numbers
num_labs
[1] 1 2 3 4 5 6 7 8 9 10
mood <- rep("yippee", length(num_labs)) # replicate 10 times
mood
[1] "yippee" "yippee" "yippee" "yippee" "yippee" "yippee" "yippee" "yippee" "yippee" "yippee"
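
A small sketch of two details behind the calls above, using only base R. It assumes nothing beyond the slide's own objects; `same_vector`, `cheers`, `whole`, and `by_element` are illustrative names, not part of the original deck.

```r
# 1:10 already builds a vector, so wrapping it in c() changes nothing
num_labs <- 1:10
same_vector <- identical(c(1:10), num_labs)
same_vector

# rep() repeats the whole vector with `times`, or each element with `each`
cheers <- c("yay", "yippee")
whole <- rep(cheers, times = 2)      # yay yippee yay yippee
by_element <- rep(cheers, each = 2)  # yay yay yippee yippee
whole
by_element
```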

Re-name others' objects

my_alpha <- letters # built-in, no package needed
my_alpha
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
my_names <- babynames # from the babynames package
my_names
# A tibble: 1,924,665 × 5
year sex name n prop
<dbl> <chr> <chr> <int> <dbl>
1 1880 F Mary 7065 0.0724
2 1880 F Anna 2604 0.0267
3 1880 F Emma 2003 0.0205
4 1880 F Elizabeth 1939 0.0199
5 1880 F Minnie 1746 0.0179
6 1880 F Margaret 1578 0.0162
7 1880 F Ida 1472 0.0151
8 1880 F Alice 1414 0.0145
9 1880 F Bertha 1320 0.0135
10 1880 F Sarah 1288 0.0132
# … with 1,924,655 more rows
# ℹ Use `print(n = ...)` to see more rows

What to name objects?

?make.names

Object names cannot:

Object names must:

  • Start with a letter (or a dot that is not followed by a number)
  • Contain only letters, numbers, _ and .
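
One way to check a candidate name against these rules is to see whether `make.names()` leaves it unchanged. This is only a sketch: the candidate names are made up for illustration, and comparing against `make.names()` is a convenient shortcut rather than an official validity test.

```r
# Hypothetical candidate names, some valid and some not
candidates <- c("num_labs", "2nd_try", "my names", "_temp", ".hidden", "if")

# make.names() rewrites anything that is not a syntactically valid name,
# so a name that comes back unchanged was valid to begin with
repaired <- make.names(candidates)
is_valid <- candidates == repaired

data.frame(candidates, repaired, is_valid)
```

Here `num_labs` and `.hidden` pass. The others fail because they start with a number or underscore, contain a space, or are a reserved word.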

Adopt a consistent naming style

i_use_snake_case # recommended
otherPeopleUseCamelCase
some.people.use.periods
And_aFew.People_RENOUNCEconvention

From: http://r4ds.had.co.nz/workflow-basics.html#whats-in-a-name

Read more: http://style.tidyverse.org/syntax.html#object-names


🔦

Know Your Data Types

typeof()

Know your data types

  • Numeric (2 subtypes)
    • Integers (1L, 50L; the L suffix marks an integer)
    • Double (1.5, 50.25, ?double)
  • Character ("hello")
  • Factor (grade = "A" | grade = "B")
  • Logical (TRUE | FALSE)
typeof(num_labs) # 1:10 is stored as integer, a kind of numeric
[1] "integer"
typeof(mood) # "yippee" is a character
[1] "character"
typeof(mood == "yippee") # is mood equal to "yippee"? TRUE or FALSE
[1] "logical"
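
A short sketch of how the types in the list show up in practice, using base R only. The `grade` vector is a made-up example. Note that `typeof()` reports how a value is stored, so a factor shows up as "integer"; `class()` is what reports "factor".

```r
# A plain number is a double unless you add the L suffix
type_plain <- typeof(1)      # "double"
type_suffix <- typeof(1L)    # "integer"

# A factor stores integer codes plus a set of labels (levels)
grade <- factor(c("A", "B", "A"))
type_factor <- typeof(grade)   # "integer"
class_factor <- class(grade)   # "factor"
levels(grade)                  # "A" "B"
```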

Characters can be deceiving

my_things <- c(num_labs, mood)
my_things
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "yippee" "yippee" "yippee" "yippee"
[15] "yippee" "yippee" "yippee" "yippee" "yippee" "yippee"
typeof(my_things)
[1] "character"
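
Why this matters, as a minimal base R sketch: numbers stored as characters compare and convert differently from real numbers. The object names below are illustrative.

```r
labs_as_text <- c("1", "2", "10")

# Character comparison goes character by character, so "10" sorts before "9"
text_compare <- "10" < "9"     # TRUE
number_compare <- 10 < 9       # FALSE

# Convert back to numbers before doing arithmetic
labs_as_numbers <- as.numeric(labs_as_text)
total <- sum(labs_as_numbers)  # 13

# Text that is not a number becomes NA (R also issues a warning)
not_a_number <- suppressWarnings(as.numeric("yippee"))
```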

NA is special

num_labs <- c(num_labs, NA)
typeof(num_labs)
[1] "integer"
num_labs*3
[1] 3 6 9 12 15 18 21 24 27 30 NA
max(num_labs)
[1] NA
max(num_labs, na.rm = TRUE)
[1] 10
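
A few more base R tools for working with NA, sketched under the same setup as above (the vector is rebuilt here so the block runs on its own; the result names are illustrative).

```r
num_labs <- c(1:10, NA)

# NA means "unknown", so comparing with == also gives NA; use is.na() instead
unknown <- NA == NA                 # NA, not TRUE
n_missing <- sum(is.na(num_labs))   # 1

# Many summary functions accept na.rm = TRUE to skip missing values
avg_labs <- mean(num_labs, na.rm = TRUE)   # 5.5
```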

Case matters

my_names != My_names

This works:

glimpse(babynames)
Rows: 1,924,665
Columns: 5
$ year <dbl> 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1…
$ sex <chr> "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", …
$ name <chr> "Mary", "Anna", "Emma", "Elizabeth", "Minnie", "Margaret", "Ida", "Alice", "Bertha", "Sarah", "Annie", "Clara", "El…
$ n <int> 7065, 2604, 2003, 1939, 1746, 1578, 1472, 1414, 1320, 1288, 1258, 1226, 1156, 1063, 1045, 1040, 1012, 995, 982, 949…
$ prop <dbl> 0.07238359, 0.02667896, 0.02052149, 0.01986579, 0.01788843, 0.01616720, 0.01508119, 0.01448696, 0.01352390, 0.01319…

These do not:

Glimpse(babynames) # no function
Error in Glimpse(babynames): could not find function "Glimpse"
glimpse(Babynames) # no data
Error in glimpse(Babynames): object 'Babynames' not found
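
The same point can be checked without any packages. In this sketch `my_names` is a small stand-in vector rather than the babynames data, and `tryCatch()` is used only to capture the error message instead of stopping the script.

```r
my_names <- c("Mary", "Anna")   # stand-in data for illustration

lower_exists <- exists("my_names")   # TRUE
upper_exists <- exists("My_names")   # FALSE: different case, different name

# Asking for the wrong case raises an "object not found" error
msg <- tryCatch(My_names, error = function(e) conditionMessage(e))
msg
```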

📢

Comments

# go here

Text after a # is a comment

num_labs + 2 # add 2 here
[1] 3 4 5 6 7 8 9 10 11 12 NA
num_weeks <- num_labs + 2 # save as new object
# I can say anything I want here...
num_weeks
[1] 3 4 5 6 7 8 9 10 11 12 NA
but not here
Error: <text>:1:5: unexpected symbol
1: but not
^

🍰

Functions


Functions

Sometimes abbreviated funs in documentation, which is a little ironic 😉.

Functions can come from:

  • base R (these functions are "built in")
  • packages
  • you

Base R Functions

seq(1, 12, 1) # base R
[1] 1 2 3 4 5 6 7 8 9 10 11 12

Functions from Packages

babynames %>% count(sex) # count is from dplyr
# A tibble: 2 × 2
sex n
<chr> <int>
1 F 1138293
2 M 786372

Roll Your Own Functions

greet <- function(name) {
  glue::glue("Welcome to SMI, {name}!")
}
greet("Kanishka")
Welcome to SMI, Kanishka!
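
If the glue package is not installed, a similar function can be sketched in base R with `paste0()`. The name `greet_base` and the `course` argument with its default are additions for illustration, not part of the original slide.

```r
# `course` has a default value, so callers only need to supply `name`
greet_base <- function(name, course = "SMI") {
  paste0("Welcome to ", course, ", ", name, "!")
}

default_greeting <- greet_base("Kanishka")
custom_greeting <- greet_base("Kanishka", course = "R Basics")
default_greeting   # "Welcome to SMI, Kanishka!"
custom_greeting    # "Welcome to R Basics, Kanishka!"
```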

Function help

?seq
?count

Pay attention to:

  • Usage (recipe)

  • Arguments (ingredients)

  • Examples
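
To connect the help page to a real call, here is a hedged sketch using `seq()`: the Usage section lists the argument names, and you can spell them out in the call. `seq.default` is the method that handles plain numbers; the result names are illustrative.

```r
# Peek at the argument names without opening the help page
arg_names <- names(formals(seq.default))
arg_names

# Positional and named arguments give the same result
by_position <- seq(1, 12, 1)
by_name <- seq(from = 1, to = 12, by = 1)

# Naming arguments lets you pick a different "ingredient", such as length.out
five_steps <- seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
```

Running `example(seq)` in the console executes the Examples section of the help page.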


📦

Packages

"install once per machine, load once per R session"

Packages!

Install once per machine

install.packages("dplyr")

Load once per R work session

library(dplyr)

Also: quotes matter, sorry. install.packages() needs the package name in quotes; library() works with or without them.
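
A small sketch of the quoting rules, using the stats package because it ships with every R installation (so nothing needs to be installed). The variable names are illustrative.

```r
# Check whether a package is installed without loading it
pkg <- "stats"
is_installed <- requireNamespace(pkg, quietly = TRUE)

# library() accepts the name with or without quotes
library(stats)
library("stats")

# When the name is stored in a variable, say so with character.only = TRUE
library(pkg, character.only = TRUE)
```

Loading a package that is already loaded is harmless, which is why the repeated `library()` calls above are fine.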


The tidyverse package ecosystem

https://www.tidyverse.org


"The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures."

install.packages("tidyverse")
library(tidyverse)

See packages included here: https://www.tidyverse.org/packages/

%>%

The pipe

"dataframe first, dataframe once"

library(dplyr)

RStudio Keyboard Shortcuts:

OSX: CMD + SHIFT + M

Else: CTRL + SHIFT + M

Nesting a dataframe inside a function is hard to read.

slice(babynames, 1)
# A tibble: 1 × 5
year sex name n prop
<dbl> <chr> <chr> <int> <dbl>
1 1880 F Mary 7065 0.0724

Here, the "sentence" starts with a verb.


Piping a dataframe into a function lets you read left to right

babynames %>% slice(1)
# A tibble: 1 × 5
year sex name n prop
<dbl> <chr> <chr> <int> <dbl>
1 1880 F Mary 7065 0.0724

Now, the "sentence" starts with a noun.

Nested function calls make you read from the inside out

slice(filter(babynames, sex == "M"), 1)
# A tibble: 1 × 5
year sex name n prop
<dbl> <chr> <chr> <int> <dbl>
1 1880 M John 9655 0.0815

Chaining functions together lets you read left to right

babynames %>% filter(sex == "M") %>% slice(1)
# A tibble: 1 × 5
year sex name n prop
<dbl> <chr> <chr> <int> <dbl>
1 1880 M John 9655 0.0815
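
The same inside-out versus left-to-right contrast can be tried without any packages. This sketch swaps in base R's native pipe `|>` (an assumption: it needs R 4.1 or later) and a small made-up numeric vector in place of babynames; it is not the `%>%` from the slides, though it reads the same way here.

```r
# Nested: read from the inside out (vector, then sqrt, then sum)
nested <- sum(sqrt(c(1, 4, 9)))

# Piped: read left to right (vector, then sqrt, then sum)
piped <- c(1, 4, 9) |> sqrt() |> sum()

nested   # 6
piped    # 6
```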

"dataframe first, dataframe once"

babynames %>% filter(sex == "M") %>% slice(1)
# A tibble: 1 × 5
year sex name n prop
<dbl> <chr> <chr> <int> <dbl>
1 1880 M John 9655 0.0815

This does the same thing:

babynames %>% filter(.data = ., sex == "M") %>% slice(.data = ., 1)
# A tibble: 1 × 5
year sex name n prop
<dbl> <chr> <chr> <int> <dbl>
1 1880 M John 9655 0.0815

So does this:

babynames %>% filter(., sex == "M") %>% slice(., 1)
# A tibble: 1 × 5
year sex name n prop
<dbl> <chr> <chr> <int> <dbl>
1 1880 M John 9655 0.0815

I know...


I promise, it gets better.

Install & load multiple R packages

Calling install.packages() and library() one package at a time gets long when a work session needs many packages. Instead, we can write a function that takes a vector of package names, installs any that are not already installed, and loads them all (more on functions later).

pkgs <- c("readr", "dplyr", "tidyr") # list packages needed
ipak <- function(pkg) {
  # which of the requested packages are not installed yet?
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg)) {
    install.packages(new.pkg, dependencies = TRUE)
  }
  # load every package; returns TRUE or FALSE for each one
  sapply(pkg, require, character.only = TRUE)
}
ipak(pkgs) # call the function on that vector of names

Function from: https://gist.github.com/stevenworthington/3178163
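
Before installing anything, it can help to see which packages are actually missing. The helper below is a hypothetical companion to ipak (the name `missing_pkgs` is made up here); it only reports and never installs. The demo uses packages that ship with R plus a deliberately fake name.

```r
# Return the subset of `pkg` that is not installed on this machine
missing_pkgs <- function(pkg) {
  installed <- vapply(pkg, requireNamespace, logical(1), quietly = TRUE)
  pkg[!installed]
}

# stats and utils ship with R, so nothing should be reported as missing
none_missing <- missing_pkgs(c("stats", "utils"))

# A made-up package name should be reported back
one_missing <- missing_pkgs(c("stats", "notARealPackage12345"))
none_missing
one_missing
```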
