ggplot2
Before we begin, start a new file with File → New File → R Script. As you work through this sheet in the R console, also add (copy/paste) the commands that work into this new file. At the end, save it, and run it to execute all of your commands at once.
The `gapminder` package uses a small snippet of the Gapminder data for exploratory analysis. Install and load the package `gapminder`. Type `?gapminder` and hit enter to see a description of the data.

Take a quick look at `gapminder` to see what we’re dealing with. Get the `str`ucture of the `gapminder` data.

## Classes 'tbl_df', 'tbl' and 'data.frame': 1704 obs. of 6 variables:
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ pop : int 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
## $ gdpPercap: num 779 821 853 836 740 ...
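
A minimal sketch of the steps above. The `install.packages()` line is commented out so that it only runs when you need it. On newer versions of the package, the first line of the `str()` output may look slightly different from what is shown here.

```r
# install.packages("gapminder")  # uncomment and run once if the package is not installed
library(gapminder)

# ?gapminder                     # opens the help page describing the data (interactive use)
str(gapminder)                   # structure: variable names, types, and the first few values
```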
Look at the `head` of the dataset to get an idea of what the data looks like.

country <fctr> | continent <fctr> | year <int> | lifeExp <dbl> | pop <int> | gdpPercap <dbl> |
---|---|---|---|---|---|
Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.4453 |
Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.8530 |
Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.1007 |
Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.1971 |
Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.9811 |
Afghanistan | Asia | 1977 | 38.438 | 14880372 | 786.1134 |
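
One way to produce a preview like the table above. The object name `first_rows` is only illustrative.

```r
library(gapminder)

first_rows <- head(gapminder)  # head() returns the first 6 rows by default
first_rows
```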
Get `summary` statistics of all variables.

## country continent year lifeExp
## Afghanistan: 12 Africa :624 Min. :1952 Min. :23.60
## Albania : 12 Americas:300 1st Qu.:1966 1st Qu.:48.20
## Algeria : 12 Asia :396 Median :1980 Median :60.71
## Angola : 12 Europe :360 Mean :1980 Mean :59.47
## Argentina : 12 Oceania : 24 3rd Qu.:1993 3rd Qu.:70.85
## Australia : 12 Max. :2007 Max. :82.60
## (Other) :1632
## pop gdpPercap
## Min. :6.001e+04 Min. : 241.2
## 1st Qu.:2.794e+06 1st Qu.: 1202.1
## Median :7.024e+06 Median : 3531.8
## Mean :2.960e+07 Mean : 7215.3
## 3rd Qu.:1.959e+07 3rd Qu.: 9325.5
## Max. :1.319e+09 Max. :113523.1
##
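
The summary above can be reproduced with a single call. The object name `gap_summary` is only illustrative.

```r
library(gapminder)

# Numeric variables get quartiles and the mean; factors get counts per level
gap_summary <- summary(gapminder)
gap_summary
```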
ggplot2
Use a `histogram` to visualize the distribution of a variable. Make a `histogram` of `gdpPercap`. Your only `aes`thetic here is to map `gdpPercap` to `x`.

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
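
A minimal sketch of this histogram. The object name `p_hist` is only illustrative, and the default binning is left alone, which is why a `stat_bin()` message appears when the plot is printed.

```r
library(ggplot2)
library(gapminder)

# Map gdpPercap to x; geom_histogram() bins the values and counts them
p_hist <- ggplot(data = gapminder, aes(x = gdpPercap)) +
  geom_histogram()

p_hist  # printing the object draws the plot
```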
Now add an `aes`thetic that maps `continent` to `fill`.

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
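
The same histogram with the extra mapping might look like the sketch below. The object name `p_hist_fill` is only illustrative.

```r
library(ggplot2)
library(gapminder)

# fill = continent colors each bar segment by continent and adds a legend
p_hist_fill <- ggplot(data = gapminder, aes(x = gdpPercap, fill = continent)) +
  geom_histogram()

p_hist_fill
```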
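Setting a `global` `aes`thetic that maps `continent` to `color` applies the mapping to every layer, so `geom_smooth` draws one regression line per continent. If we want just one regression line, we need to instead move the `color = continent` inside the `aes` of `geom_point`. This will only map `continent` to `color` for points, not for anything else.

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

Now set the `color` of the regression line to `"black"`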
. Try first by putting this inside an `aes()` in your `geom_smooth`, and try a second time by just putting it inside `geom_smooth` without an `aes()`. What’s the difference, and why?

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
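
The two attempts can be sketched as follows. The surviving text does not say which variables the scatterplot uses, so `x = gdpPercap` and `y = lifeExp` are assumptions here, as are the object names. In both versions `continent` is mapped to `color` inside `geom_point` only, so there is a single regression line.

```r
library(ggplot2)
library(gapminder)

# Assumed base plot: the x and y variables are illustrative choices
base <- ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(color = continent))  # color mapping applies to the points only

# Attempt 1: inside aes(), "black" is treated as data (a constant, one-level variable)
# and is mapped through the color scale, so it gets a legend entry instead of literal black
p_mapped <- base + geom_smooth(aes(color = "black"))

# Attempt 2: outside aes(), color is set directly, so the line is drawn in black
p_set <- base + geom_smooth(color = "black")
```

Printing `p_mapped` and then `p_set` shows the two results side by side.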
Now try `facet`ing. Add `+ facet_wrap(~continent)` to create subplots by `continent`.

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
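
A sketch of the faceted version, assuming the scatterplot maps `gdpPercap` to `x` and `lifeExp` to `y` (the surviving text does not name them). The object name `p_facet` is only illustrative.

```r
library(ggplot2)
library(gapminder)

p_facet <- ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +  # assumed variables
  geom_point(aes(color = continent)) +
  geom_smooth(color = "black") +
  facet_wrap(~continent)  # one panel per continent, each with its own smooth

p_facet
```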
Remove the `facet` layer. The `scale` is quite annoying for the `x`-axis: a lot of points are clustered at the low end. Let’s try changing the scale by adding a layer: `+ scale_x_log10()`.

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
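
A sketch of the rescaled plot, again assuming `gdpPercap` on `x` and `lifeExp` on `y`. The object name `p_log` is only illustrative.

```r
library(ggplot2)
library(gapminder)

p_log <- ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +  # assumed variables
  geom_point(aes(color = continent)) +
  geom_smooth(color = "black") +
  scale_x_log10()  # log10 x-axis spreads out the points clustered at low values

p_log
```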
Now add labels with `+ labs()`. Inside `labs`, make proper axis titles for `x` and `y`, and a `title` for the plot. If you want to change the names of the legends (continent color), add one for `color` and `size`.

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
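
A sketch of the labeled plot. Several things here are assumptions: the `x` and `y` variables, the mapping of `pop` to `size` (the text mentions a `size` legend but not what it shows), and all of the label wording.

```r
library(ggplot2)
library(gapminder)

p_labeled <- ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +  # assumed variables
  geom_point(aes(color = continent, size = pop)) +  # size = pop is an assumption
  geom_smooth(color = "black") +
  scale_x_log10() +
  labs(x = "GDP per Capita",                        # label text is illustrative
       y = "Life Expectancy (Years)",
       title = "Life Expectancy and GDP per Capita",
       color = "Continent",                         # renames the color legend
       size = "Population")                         # renames the size legend

p_labeled
```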
Now take the `gapminder` dataframe and subset it to only look at the Americas (`continent == "Americas"`). Assign this to a new dataframe object (call it something like `america`). Now, use this as your `data`, and redo the graph from question 17. (You might want to take a look at your new dataframe to make sure it worked first!)

country <fctr> | continent <fctr> | year <int> | lifeExp <dbl> | pop <int> | gdpPercap <dbl> |
---|---|---|---|---|---|
Argentina | Americas | 1952 | 62.485 | 17876956 | 5911.315 |
Argentina | Americas | 1957 | 64.399 | 19610538 | 6856.856 |
Argentina | Americas | 1962 | 65.142 | 21283783 | 7133.166 |
Argentina | Americas | 1967 | 65.634 | 22934225 | 8052.953 |
Argentina | Americas | 1972 | 67.065 | 24779799 | 9443.039 |
Argentina | Americas | 1977 | 68.481 | 26983828 | 10079.027 |
Argentina | Americas | 1982 | 69.942 | 29341374 | 8997.897 |
Argentina | Americas | 1987 | 70.774 | 31620918 | 9139.671 |
Argentina | Americas | 1992 | 71.868 | 33958947 | 9308.419 |
Argentina | Americas | 1997 | 73.275 | 36203463 | 10967.282 |
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
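
A sketch of the subsetting step using base R's `subset()`. The exact graph from question 17 is not recoverable from the text, so the plot below is an assumed stand-in (scatterplot of `lifeExp` against `gdpPercap` with a smooth), and `p_america` is an illustrative name.

```r
library(ggplot2)
library(gapminder)

# Keep only the rows where continent is "Americas"
america <- subset(gapminder, continent == "Americas")
head(america)  # check that the subset worked before plotting

# Assumed stand-in for the earlier graph, now with data = america
p_america <- ggplot(data = america, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_smooth(color = "black") +
  scale_x_log10()

p_america
```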
country <fctr> | continent <fctr> | year <int> | lifeExp <dbl> | pop <int> | gdpPercap <dbl> |
---|---|---|---|---|---|
Afghanistan | Asia | 2002 | 42.129 | 25268405 | 726.7341 |
Albania | Europe | 2002 | 75.651 | 3508512 | 4604.2117 |
Algeria | Africa | 2002 | 70.994 | 31287142 | 5288.0404 |
Angola | Africa | 2002 | 41.003 | 10866106 | 2773.2873 |
Argentina | Americas | 2002 | 74.340 | 38331121 | 8797.6407 |
Australia | Oceania | 2002 | 80.370 | 19546792 | 30687.7547 |
Austria | Europe | 2002 | 78.980 | 8148312 | 32417.6077 |
Bahrain | Asia | 2002 | 74.795 | 656397 | 23403.5593 |
Bangladesh | Asia | 2002 | 62.013 | 135656790 | 1136.3904 |
Belgium | Europe | 2002 | 78.320 | 10311970 | 30485.8838 |
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
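
The rows shown above all have `year` equal to 2002, which suggests the data were subset by year before plotting. The prompt for this step is missing, so the following is only a guess at that subsetting step, and the object name `gap_2002` is hypothetical.

```r
library(gapminder)

# Assumed step: keep only the observations from 2002 (one row per country)
gap_2002 <- subset(gapminder, year == 2002)
head(gap_2002, 10)
```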