Getting Set Up

Warm Up to dplyr With gapminder

Example I: Majors

Wrangling Data (with dplyr)

19. Look at the data with glimpse(). This is a suped-up version of str() in tidyverse.
20.
21.

Which major has the lowest unemployment rate?

Getting Set Up

Before we begin, start a new file with File New File R Script. As you work through this sheet in the console in R, also add (copy/paste) your commands that work into this new file. At the end, save it, and run to execute all of your commands at once.

First things first, load tidyverse.

library("tidyverse")

## ── Attaching packages ───────────────────────────────────────────── tidyverse 1.2.1 ──

## ✔ ggplot2 3.2.0     ✔ purrr   0.3.2
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   0.8.3     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0

## ── Conflicts ──────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

Warm Up to `dplyr` With `gapminder`

1. Load `gapminder`. `select()` the variables `year`, `lifeExp`, `country`.

library(gapminder)
# select(gapminder, c(year, lifeExp, country))

# using the pipe

gapminder %>%
  select(c(year, lifeExp, country))

ABCDEFGHIJ0123456789

year <int>	lifeExp <dbl>	country <fctr>
1952	28.80100	Afghanistan
1957	30.33200	Afghanistan
1962	31.99700	Afghanistan
1967	34.02000	Afghanistan
1972	36.08800	Afghanistan
1977	38.43800	Afghanistan
1982	39.85400	Afghanistan
1987	40.82200	Afghanistan
1992	41.67400	Afghanistan
1997	41.76300	Afghanistan

2. `select()` all variables except `pop`.

# select(gapminder, -pop)

# using the pipe

gapminder %>%
  select(-pop)

ABCDEFGHIJ0123456789

country <fctr>	continent <fctr>	year <int>	lifeExp <dbl>	gdpPercap <dbl>
Afghanistan	Asia	1952	28.80100	779.4453
Afghanistan	Asia	1957	30.33200	820.8530
Afghanistan	Asia	1962	31.99700	853.1007
Afghanistan	Asia	1967	34.02000	836.1971
Afghanistan	Asia	1972	36.08800	739.9811
Afghanistan	Asia	1977	38.43800	786.1134
Afghanistan	Asia	1982	39.85400	978.0114
Afghanistan	Asia	1987	40.82200	852.3959
Afghanistan	Asia	1992	41.67400	649.3414
Afghanistan	Asia	1997	41.76300	635.3414

3. `rename()` `continent` to `cont`.

# rename(gapminder, cont=continent)

# using the pipe

gapminder %>%
  rename(cont=continent)

ABCDEFGHIJ0123456789

country <fctr>	cont <fctr>	year <int>	lifeExp <dbl>	pop <int>	gdpPercap <dbl>
Afghanistan	Asia	1952	28.80100	8425333	779.4453
Afghanistan	Asia	1957	30.33200	9240934	820.8530
Afghanistan	Asia	1962	31.99700	10267083	853.1007
Afghanistan	Asia	1967	34.02000	11537966	836.1971
Afghanistan	Asia	1972	36.08800	13079460	739.9811
Afghanistan	Asia	1977	38.43800	14880372	786.1134
Afghanistan	Asia	1982	39.85400	12881816	978.0114
Afghanistan	Asia	1987	40.82200	13867957	852.3959
Afghanistan	Asia	1992	41.67400	16317921	649.3414
Afghanistan	Asia	1997	41.76300	22227415	635.3414

4.

`arrange()` by year.

# arrange(gapminder, year)

# with the pipe
gapminder %>%
  arrange(year)

ABCDEFGHIJ0123456789

country <fctr>	continent <fctr>	year <int>	lifeExp <dbl>	pop <int>	gdpPercap <dbl>
Afghanistan	Asia	1952	28.80100	8425333	779.4453
Albania	Europe	1952	55.23000	1282697	1601.0561
Algeria	Africa	1952	43.07700	9279525	2449.0082
Angola	Africa	1952	30.01500	4232095	3520.6103
Argentina	Americas	1952	62.48500	17876956	5911.3151
Australia	Oceania	1952	69.12000	8691212	10039.5956
Austria	Europe	1952	66.80000	6927772	6137.0765
Bahrain	Asia	1952	50.93900	120447	9867.0848
Bangladesh	Asia	1952	37.48400	46886859	684.2442
Belgium	Europe	1952	68.00000	8730405	8343.1051

5.

`arrange()` by year, but in descending order.

# arrange(gapminder, desc(year))

# with the pipe
gapminder %>%
  arrange(desc(year))

ABCDEFGHIJ0123456789

country <fctr>	continent <fctr>	year <int>	lifeExp <dbl>	pop <int>	gdpPercap <dbl>
Afghanistan	Asia	2007	43.82800	31889923	974.5803
Albania	Europe	2007	76.42300	3600523	5937.0295
Algeria	Africa	2007	72.30100	33333216	6223.3675
Angola	Africa	2007	42.73100	12420476	4797.2313
Argentina	Americas	2007	75.32000	40301927	12779.3796
Australia	Oceania	2007	81.23500	20434176	34435.3674
Austria	Europe	2007	79.82900	8199783	36126.4927
Bahrain	Asia	2007	75.63500	708573	29796.0483
Bangladesh	Asia	2007	64.06200	150448339	1391.2538
Belgium	Europe	2007	79.44100	10392226	33692.6051

6.

`arrange()` by year, then by life expectancy.

# arrange(gapminder, year, lifeExp)

# with the pipe

gapminder %>%
  arrange(year, lifeExp)

ABCDEFGHIJ0123456789

country <fctr>	continent <fctr>	year <int>	lifeExp <dbl>	pop <int>	gdpPercap <dbl>
Afghanistan	Asia	1952	28.80100	8425333	779.4453
Gambia	Africa	1952	30.00000	284320	485.2307
Angola	Africa	1952	30.01500	4232095	3520.6103
Sierra Leone	Africa	1952	30.33100	2143249	879.7877
Mozambique	Africa	1952	31.28600	6446316	468.5260
Burkina Faso	Africa	1952	31.97500	4469979	543.2552
Guinea-Bissau	Africa	1952	32.50000	580653	299.8503
Yemen, Rep.	Asia	1952	32.54800	4963829	781.7176
Somalia	Africa	1952	32.97800	2526994	1135.7498
Guinea	Africa	1952	33.60900	2664249	510.1965

7.

`filter()` observations with `pop` greater than 1 billion.

# arrange(filter, pop>1000000000)

# with the pipe

gapminder %>%
  filter(pop>1000000000)

ABCDEFGHIJ0123456789

country <fctr>	continent <fctr>	year <int>	lifeExp <dbl>	pop <int>	gdpPercap <dbl>
China	Asia	1982	65.525	1000281000	962.4214
China	Asia	1987	67.274	1084035000	1378.9040
China	Asia	1992	68.690	1164970000	1655.7842
China	Asia	1997	70.426	1230075000	2289.2341
China	Asia	2002	72.028	1280400000	3119.2809
China	Asia	2007	72.961	1318683096	4959.1149
India	Asia	2002	62.879	1034172547	1746.7695
India	Asia	2007	64.698	1110396331	2452.2104

8.

Of those, look only at `India`.

gapminder %>%
  filter(pop>1000000000) %>%
  filter(country=="India")

ABCDEFGHIJ0123456789

country <fctr>	continent <fctr>	year <int>	lifeExp <dbl>	pop <int>	gdpPercap <dbl>
India	Asia	2002	62.879	1034172547	1746.769
India	Asia	2007	64.698	1110396331	2452.210

9.

Try out the pipe (`%`) if you haven’t already, by chaining commands: `select()` your data to look only at `year`, `gdpPercap`, and `country` in the year `1997`, for countries that have a `gdpPercap` greater than 20,000, and `arrange()` them alphabetically.

gapminder %>%
  select(year, gdpPercap, country) %>%
  filter(year==1997,
         gdpPercap>20000) %>%
  arrange(country)

ABCDEFGHIJ0123456789

year <int>	gdpPercap <dbl>	country <fctr>
1997	26997.94	Australia
1997	29095.92	Austria
1997	20292.02	Bahrain
1997	27561.20	Belgium
1997	28954.93	Canada
1997	29804.35	Denmark
1997	23723.95	Finland
1997	25889.78	France
1997	27788.88	Germany
1997	28377.63	Hong Kong, China

10.

`mutate()` a new variable called `GDP` that is equal to `gdpPercap * pop`.

gapminder %>%
  mutate(GDP = gdpPercap * pop)

ABCDEFGHIJ0123456789

country <fctr>	continent <fctr>	year <int>	lifeExp <dbl>	pop <int>	gdpPercap <dbl>	GDP <dbl>
Afghanistan	Asia	1952	28.80100	8425333	779.4453	6.567086e+09
Afghanistan	Asia	1957	30.33200	9240934	820.8530	7.585449e+09
Afghanistan	Asia	1962	31.99700	10267083	853.1007	8.758856e+09
Afghanistan	Asia	1967	34.02000	11537966	836.1971	9.648014e+09
Afghanistan	Asia	1972	36.08800	13079460	739.9811	9.678553e+09
Afghanistan	Asia	1977	38.43800	14880372	786.1134	1.169766e+10
Afghanistan	Asia	1982	39.85400	12881816	978.0114	1.259856e+10
Afghanistan	Asia	1987	40.82200	13867957	852.3959	1.182099e+10
Afghanistan	Asia	1992	41.67400	16317921	649.3414	1.059590e+10
Afghanistan	Asia	1997	41.76300	22227415	635.3414	1.412200e+10

11.

`mutate()` a new population variable that is the `pop` in millions.

gapminder %>%
  mutate(popm = pop/1000000)

ABCDEFGHIJ0123456789

country <fctr>	continent <fctr>	year <int>	lifeExp <dbl>	pop <int>	gdpPercap <dbl>	popm <dbl>
Afghanistan	Asia	1952	28.80100	8425333	779.4453	8.425333
Afghanistan	Asia	1957	30.33200	9240934	820.8530	9.240934
Afghanistan	Asia	1962	31.99700	10267083	853.1007	10.267083
Afghanistan	Asia	1967	34.02000	11537966	836.1971	11.537966
Afghanistan	Asia	1972	36.08800	13079460	739.9811	13.079460
Afghanistan	Asia	1977	38.43800	14880372	786.1134	14.880372
Afghanistan	Asia	1982	39.85400	12881816	978.0114	12.881816
Afghanistan	Asia	1987	40.82200	13867957	852.3959	13.867957
Afghanistan	Asia	1992	41.67400	16317921	649.3414	16.317921
Afghanistan	Asia	1997	41.76300	22227415	635.3414	22.227415

12.

`summarize()` to get the average GDP per capita.

gapminder %>%
  summarize(mean(gdpPercap))

ABCDEFGHIJ0123456789

mean(gdpPercap) <dbl>
7215.327

13.

Get the number of observations, average, minimum, maximum, and standard deviation for GDP per capita.

gapminder %>%
  summarize(Obs = n(),
            Average = mean(gdpPercap),
            Minimum = min(gdpPercap),
            Maximum = max(gdpPercap),
            SD = sd(gdpPercap))

ABCDEFGHIJ0123456789

Obs <int>	Average <dbl>	Minimum <dbl>	Maximum <dbl>	SD <dbl>
1704	7215.327	241.1659	113523.1	9857.455

14.

Get the average GDP per capita over time. Hint, first `group_by()` year.

gapminder %>%
  group_by(year) %>%
  summarize(Average_GDP = mean(gdpPercap))

ABCDEFGHIJ0123456789

year <int>	Average_GDP <dbl>
1952	3725.276
1957	4299.408
1962	4725.812
1967	5483.653
1972	6770.083
1977	7313.166
1982	7518.902
1987	7900.920
1992	8158.609
1997	9090.175

15.

Get the average GDP per capita by continent.

gapminder %>%
  group_by(continent) %>%
  summarize(Average_GDP = mean(gdpPercap))

ABCDEFGHIJ0123456789

continent <fctr>	Average_GDP <dbl>
Africa	2193.755
Americas	7136.110
Asia	7902.150
Europe	14469.476
Oceania	18621.609

16.

Get the average GDP per capita by year and by continent.¹ Then save this as another `tibble` called `gdp`. Create a `ggplot` of a `line` graph of average continent GDP over time using the `gdp` data.

gdp <- gapminder %>%
  group_by(year, continent) %>%
  summarize(Average_GDP = mean(gdpPercap))

ggplot(data = gdp)+
  aes(x = year,
      y = Average_GDP,
      color = continent)+
  geom_line()

17.

Try it again all in one command with the pipe `%>%`. Instead of saving the data as `gdp`, pipe it right into `ggplot`!²

gapminder %>%
  group_by(year, continent) %>%
  summarize(Average_GDP = mean(gdpPercap)) %>%
ggplot(data = .)+
  aes(x = year,
      y = Average_GDP,
      color = continent)+
  geom_line()

Example I: Majors

Now let’s step it up to work with some data “in the wild” to answer some research questions. This will have you combine your dplyr skills and add some new things such as importing with readr.

Let’s look at fivethirtyeight’s article " The Economic Guide To Picking A College Major ". fivethirtyeight is great about making the data behind their articles public, we can download all of their data here. Search for college majors and click download (the blue arrow button).³ We will look at the recent-grads.csv file.

The description in the readme file for the data is as follows:

Header	Description
`Rank`	Rank by median earnings
`Major_code`	Major code, FO1DP in ACS PUMS
`Major`	Major description
`Major_category`	Category of major from Carnevale et al
`Total`	Total number of people with major
`Sample_size`	Sample size (unweighted) of full-time, year-round ONLY (used for earnings)
`Men`	Male graduates
`Women`	Female graduates
`ShareWomen`	Women as share of total
`Employed`	Number employed (ESR == 1 or 2)
`Full_time`	Employed 35 hours or more
`Part_time`	Employed less than 35 hours
`Full_time_year_round`	Employed at least 50 weeks (WKW == 1) and at least 35 hours (WKHP >= 35)
`Unemployed`	Number unemployed (ESR == 3)
`Unemployment_rate`	Unemployed / (Unemployed + Employed)
`Median`	Median earnings of full-time, year-round workers
`P25th`	25th percentile of earnigns
`P75th`	75th percentile of earnings
`College_jobs`	Number with job requiring a college degree
`Non_college_jobs`	Number with job not requiring a college degree
`Low_wage_jobs`	Number in low-wage service jobs

18.

Import the data with `read_csv()` and assign it to `tibble` called `majors`.⁴ The first argument of this command the name of the original file, in quotes.⁵

majors<-read_csv("../data/recent-grads.csv") # note my location on my computer is different than yours!

## Parsed with column specification:
## cols(
##   .default = col_double(),
##   Major = col_character(),
##   Major_category = col_character()
## )

## See spec(...) for full column specifications.

Wrangling Data (with `dplyr`)

19. Look at the data with `glimpse()`. This is a suped-up version of `str()` in `tidyverse`.

glimpse(majors)

## Observations: 173
## Variables: 21
## $ Rank                 <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, …
## $ Major_code           <dbl> 2419, 2416, 2415, 2417, 2405, 2418, 6202, 5…
## $ Major                <chr> "PETROLEUM ENGINEERING", "MINING AND MINERA…
## $ Total                <dbl> 2339, 756, 856, 1258, 32260, 2573, 3777, 17…
## $ Men                  <dbl> 2057, 679, 725, 1123, 21239, 2200, 2110, 83…
## $ Women                <dbl> 282, 77, 131, 135, 11021, 373, 1667, 960, 1…
## $ Major_category       <chr> "Engineering", "Engineering", "Engineering"…
## $ ShareWomen           <dbl> 0.1205643, 0.1018519, 0.1530374, 0.1073132,…
## $ Sample_size          <dbl> 36, 7, 3, 16, 289, 17, 51, 10, 1029, 631, 3…
## $ Employed             <dbl> 1976, 640, 648, 758, 25694, 1857, 2912, 152…
## $ Full_time            <dbl> 1849, 556, 558, 1069, 23170, 2038, 2924, 10…
## $ Part_time            <dbl> 270, 170, 133, 150, 5180, 264, 296, 553, 13…
## $ Full_time_year_round <dbl> 1207, 388, 340, 692, 16697, 1449, 2482, 827…
## $ Unemployed           <dbl> 37, 85, 16, 40, 1672, 400, 308, 33, 4650, 3…
## $ Unemployment_rate    <dbl> 0.018380527, 0.117241379, 0.024096386, 0.05…
## $ Median               <dbl> 110000, 75000, 73000, 70000, 65000, 65000, …
## $ P25th                <dbl> 95000, 55000, 50000, 43000, 50000, 50000, 5…
## $ P75th                <dbl> 125000, 90000, 105000, 80000, 75000, 102000…
## $ College_jobs         <dbl> 1534, 350, 456, 529, 18314, 1142, 1768, 972…
## $ Non_college_jobs     <dbl> 364, 257, 176, 102, 4440, 657, 314, 500, 16…
## $ Low_wage_jobs        <dbl> 193, 50, 0, 0, 972, 244, 259, 220, 3253, 31…

20.

What are all of the unique values of Major? How many are there?

majors %>%
  distinct(Major)

ABCDEFGHIJ0123456789

Major <chr>
PETROLEUM ENGINEERING
MINING AND MINERAL ENGINEERING
METALLURGICAL ENGINEERING
NAVAL ARCHITECTURE AND MARINE ENGINEERING
CHEMICAL ENGINEERING
NUCLEAR ENGINEERING
ACTUARIAL SCIENCE
ASTRONOMY AND ASTROPHYSICS
MECHANICAL ENGINEERING
ELECTRICAL ENGINEERING

majors %>%
  summarize(n_distinct(Major))

ABCDEFGHIJ0123456789

n_distinct(Major) <int>
173

21.

Which major has the lowest unemployment rate?

majors %>%
  arrange(Unemployment_rate) %>%
  select(Major, Unemployment_rate)

ABCDEFGHIJ0123456789

Major <chr>	Unemployment_rate <dbl>
MATHEMATICS AND COMPUTER SCIENCE	0.000000000
MILITARY TECHNOLOGIES	0.000000000
BOTANY	0.000000000
SOIL SCIENCE	0.000000000
EDUCATIONAL ADMINISTRATION AND SUPERVISION	0.000000000
ENGINEERING MECHANICS PHYSICS AND SCIENCE	0.006334343
COURT REPORTING	0.011689692
MATHEMATICS TEACHER EDUCATION	0.016202835
PETROLEUM ENGINEERING	0.018380527
GENERAL AGRICULTURE	0.019642463

22.

What are the top 3 majors that have the highest percentage of women?

majors %>%
  arrange(desc(ShareWomen)) %>%
  select(Major, ShareWomen) %>%
  slice(1:3) # head(., n=3) also works

ABCDEFGHIJ0123456789

Major <chr>	ShareWomen <dbl>
EARLY CHILDHOOD EDUCATION	0.9689537
COMMUNICATION DISORDERS SCIENCES AND SERVICES	0.9679981
MEDICAL ASSISTING SERVICES	0.9278072

23.

Make a boxplot of `Median` wage by `Major_Category`.⁶

ggplot(data = majors)+
  aes(x = Major_category,
      y = Median,
      fill = Major_category)+
  geom_boxplot()+
  theme(axis.text.x=element_text(angle=45, hjust=1), legend.position = "")

24.

Which major category is the least popular in this sample?⁷

majors %>%
  group_by(Major_category) %>%
  count() %>%
  arrange(n)

ABCDEFGHIJ0123456789

Major_category <chr>	n <int>
Interdisciplinary	1
Communications & Journalism	4
Law & Public Policy	5
Industrial Arts & Consumer Services	7
Arts	8
Psychology & Social Work	9
Social Science	9
Agriculture & Natural Resources	10
Physical Sciences	10
Computers & Mathematics	11

25.

Is there a systematic difference between STEM majors and non-STEM majors? First define:

stem_categories <- c("Biology & Life Science",
                     "Computers & Mathematics",
                     "Engineering",
                     "Physical Sciences")

Next, make a variable stem, for whether or not a Major_category is "stem" or "not_stem".⁸

majors <- majors %>%
  mutate(stem = ifelse(Major_category %in% stem_categories,
                       "stem",
                       "not_stem")) %>%
  select(stem, everything()) # just so we can see it up front to verify

Then summarize() median for stem and not stem groups.

majors %>%
  group_by(stem) %>%
  summarize(mean(Median))

ABCDEFGHIJ0123456789

stem <chr>	mean(Median) <dbl>
not_stem	35624.77
stem	47860.94

Hint: do year first, if you do continent first, there are no years to group by!↩︎
Hint: You can use . as a placeholder.↩︎
This will download a .zip file that contains many spreadsheets. Unzip it with a program that unzips files (such as WinZip, 7-zip, the Unarchiver, etc).↩︎
One way to avoid error messages is to move it to the same file as R’s working directory, which again you can determine with getwd().↩︎
If the file is in a different folder, the argument is the full path in quotes.↩︎
You won’t be able to read the labels easily, so add theme(axis.text.x=element_text(angle=45, hjust=1) to angle x-axis labels (and move them down by 1)↩︎
Hint: use group_by first.↩︎
Hint: try out the ifelse() function which has three inputs: condition(s) for a variable(s), what to do if TRUE (the if), and what to if FALSE (the else).↩︎

1.5 Data Wrangling with tidyverse - R Practice (Answers)

Ryan Safner

ECON 480 - Fall 2019

Getting Set Up

Warm Up to dplyr With gapminder

1. Load gapminder. select() the variables year, lifeExp, country.

2. select() all variables except pop.

3. rename() continent to cont.

4.

arrange() by year.

5.

arrange() by year, but in descending order.

6.

arrange() by year, then by life expectancy.

7.

filter() observations with pop greater than 1 billion.

8.

Of those, look only at India.

9.

Try out the pipe (%) if you haven’t already, by chaining commands: select() your data to look only at year, gdpPercap, and country in the year 1997, for countries that have a gdpPercap greater than 20,000, and arrange() them alphabetically.

10.

mutate() a new variable called GDP that is equal to gdpPercap * pop.

11.

mutate() a new population variable that is the pop in millions.

12.

summarize() to get the average GDP per capita.

13.

Get the number of observations, average, minimum, maximum, and standard deviation for GDP per capita.

14.

Get the average GDP per capita over time. Hint, first group_by() year.

15.

Get the average GDP per capita by continent.

16.

Get the average GDP per capita by year and by continent.1 Then save this as another tibble called gdp. Create a ggplot of a line graph of average continent GDP over time using the gdp data.

17.

Try it again all in one command with the pipe %>%. Instead of saving the data as gdp, pipe it right into ggplot!2

Example I: Majors

18.

Import the data with read_csv() and assign it to tibble called majors.4 The first argument of this command the name of the original file, in quotes.5

Wrangling Data (with dplyr)

19. Look at the data with glimpse(). This is a suped-up version of str() in tidyverse.

20.

21.

Which major has the lowest unemployment rate?

22.

What are the top 3 majors that have the highest percentage of women?

23.

Make a boxplot of Median wage by Major_Category.6

24.

Which major category is the least popular in this sample?7

25.

Is there a systematic difference between STEM majors and non-STEM majors? First define:

Warm Up to `dplyr` With `gapminder`

1. Load `gapminder`. `select()` the variables `year`, `lifeExp`, `country`.

2. `select()` all variables except `pop`.

3. `rename()` `continent` to `cont`.

`arrange()` by year.

`arrange()` by year, but in descending order.

`arrange()` by year, then by life expectancy.

`filter()` observations with `pop` greater than 1 billion.

Of those, look only at `India`.

Try out the pipe (`%`) if you haven’t already, by chaining commands: `select()` your data to look only at `year`, `gdpPercap`, and `country` in the year `1997`, for countries that have a `gdpPercap` greater than 20,000, and `arrange()` them alphabetically.

`mutate()` a new variable called `GDP` that is equal to `gdpPercap * pop`.

`mutate()` a new population variable that is the `pop` in millions.

`summarize()` to get the average GDP per capita.

Get the average GDP per capita over time. Hint, first `group_by()` year.

Get the average GDP per capita by year and by continent.¹ Then save this as another `tibble` called `gdp`. Create a `ggplot` of a `line` graph of average continent GDP over time using the `gdp` data.

Try it again all in one command with the pipe `%>%`. Instead of saving the data as `gdp`, pipe it right into `ggplot`!²

Import the data with `read_csv()` and assign it to `tibble` called `majors`.⁴ The first argument of this command the name of the original file, in quotes.⁵

Wrangling Data (with `dplyr`)

19. Look at the data with `glimpse()`. This is a suped-up version of `str()` in `tidyverse`.

Make a boxplot of `Median` wage by `Major_Category`.⁶

Which major category is the least popular in this sample?⁷