Fuel economy data
1.0.1 Exercises
- List five functions that you could use to get more information about the mpg dataset.
summary, ncol, and is.na work on the whole data frame; table, mean, sd, var, quantile, max, min, and range are applied to a single column, for example mean(mpg$hwy).
…
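As a minimal sketch of how a few of these might be called (assuming ggplot2 is installed so that `mpg` is available; the variable names are only illustrative), functions that describe the whole dataset take `mpg` directly, while summary statistics take one column at a time:

```r
library(ggplot2)  # provides the mpg dataset

# Functions that accept the whole data frame
dims <- dim(mpg)          # number of rows and columns
col_names <- names(mpg)   # column names
str(mpg)                  # compact structure: type and first values per column
summary(mpg)              # per-column summaries

# Functions that need a single column
hwy_range <- range(mpg$hwy)          # smallest and largest highway mpg
hwy_quartiles <- quantile(mpg$hwy)   # 0%, 25%, 50%, 75%, 100% quantiles
class_counts <- table(mpg$class)     # number of rows per vehicle class
```

`?mpg` opens the dataset's help page, which documents what each column means.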
- How can you find out what other datasets are included with ggplot2?
data(package = "ggplot2")
Data sets in package ‘ggplot2’:
diamonds Prices of over 50,000 round cut diamonds
economics US economic time series
economics_long US economic time series
faithfuld 2d density estimate of Old Faithful data
luv_colours 'colors()' in Luv space
midwest Midwest demographics
mpg Fuel economy data from 1999 to 2008 for 38 popular models of cars
msleep An updated and expanded version of the mammals sleep dataset
presidential Terms of 12 presidents from Eisenhower to Trump
seals Vector field of seal movements
txhousing Housing sales in TX
- Apart from the US, most countries use fuel consumption (fuel consumed over fixed distance) rather than fuel economy (distance travelled with fixed amount of fuel). How could you convert cty and hwy into the European standard of l/100km?
Divide 235.215 by the miles-per-gallon value (1 US gallon ≈ 3.785 l, 1 mile ≈ 1.609 km):
mpg %>% mutate(cty_l100km = 235.215 / cty, hwy_l100km = 235.215 / hwy)
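The constant comes from the unit definitions: 100 km of driving at x miles per US gallon uses 100 × 3.785411784 / (1.609344 × x) litres, which is about 235.215 / x. A minimal sketch that spells this out, where `mpg_to_l100km` and the new column names are hypothetical and chosen only for illustration:

```r
library(ggplot2)
library(dplyr)

# Hypothetical helper (not part of ggplot2 or dplyr):
# converts US miles per gallon to litres per 100 km.
mpg_to_l100km <- function(x) {
  litres_per_gallon <- 3.785411784  # US liquid gallon
  km_per_mile <- 1.609344
  100 * litres_per_gallon / (km_per_mile * x)
}

# Add metric fuel-consumption columns next to cty and hwy
mpg_metric <- mpg %>%
  mutate(
    cty_l100km = mpg_to_l100km(cty),
    hwy_l100km = mpg_to_l100km(hwy)
  )
```

Because consumption is the reciprocal of economy, a higher mpg value maps to a lower l/100km value.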
- Which manufacturer has the most models in this dataset?
## # A tibble: 15 × 2
## manufacturer n
## <chr> <int>
## 1 dodge 37
## 2 toyota 34
## 3 volkswagen 27
## 4 ford 25
## 5 chevrolet 19
## 6 audi 18
## 7 hyundai 14
## 8 subaru 14
## 9 nissan 13
## 10 honda 9
## 11 jeep 8
## 12 pontiac 5
## 13 land rover 4
## 14 mercury 4
## 15 lincoln 3
## # A tibble: 1 × 2
## manufacturer n
## <chr> <int>
## 1 dodge 37
## dodge
## 37
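The code behind these three outputs is not shown. The following is a sketch of calls that would produce results of this shape, assuming the answer counts rows per manufacturer; it is not necessarily what the author ran.

```r
library(ggplot2)
library(dplyr)

# Rows per manufacturer, largest first
by_manufacturer <- mpg %>%
  count(manufacturer, sort = TRUE)

# Keep only the first (largest) row of the sorted counts
top_manufacturer <- by_manufacturer %>%
  slice(1)

# Base R equivalent: a named count for the most frequent manufacturer
manufacturer_table <- table(mpg$manufacturer)
top_base <- manufacturer_table[which.max(manufacturer_table)]

# Alternative reading of "models": distinct model names per manufacturer
distinct_models <- mpg %>%
  group_by(manufacturer) %>%
  summarise(n_models = n_distinct(model)) %>%
  arrange(desc(n_models))
```

Counting rows measures how many entries each manufacturer has in the dataset. If the question is read as asking for distinct model names, `distinct_models` is the more direct measure, and its ranking need not match the row counts.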
- Which model has the most variations?
## # A tibble: 38 × 2
## model n
## <chr> <int>
## 1 caravan 2wd 11
## 2 ram 1500 pickup 4wd 10
## 3 civic 9
## 4 dakota pickup 4wd 9
## 5 jetta 9
## 6 mustang 9
## 7 a4 quattro 8
## 8 grand cherokee 4wd 8
## 9 impreza awd 8
## 10 a4 7
## # ℹ 28 more rows
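The table above has the shape of a per-model row count. A sketch that would produce it, assuming a "variation" means one row of `mpg` for that model:

```r
library(ggplot2)
library(dplyr)

# Rows per model name, largest first
model_counts <- mpg %>%
  count(model, sort = TRUE)

# Show the ten most frequent model names
head(model_counts, 10)
```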
- Does your answer change if you remove the redundant specification of drive train (e.g. “pathfinder 4wd”, “a4 quattro”) from the model name?
mpg %>%
  mutate(
    model = sub(" 4wd", "", model),
    model = sub(" awd", "", model),
    model = sub(" 2wd", "", model),
    model = sub(" quattro", "", model)
  ) %>%
  count(model) %>%
  arrange(desc(n))
## # A tibble: 37 × 2
## model n
## <chr> <int>
## 1 a4 15
## 2 caravan 11
## 3 ram 1500 pickup 10
## 4 civic 9
## 5 dakota pickup 9
## 6 jetta 9
## 7 mustang 9
## 8 grand cherokee 8
## 9 impreza 8
## 10 camry 7
## # ℹ 27 more rows
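The four `sub()` calls can also be collapsed into a single regular expression with alternation. This is an equivalent sketch rather than the author's code, and `drive_train_pattern` is a name introduced here for illustration:

```r
library(ggplot2)
library(dplyr)

# One pattern covering every drive-train suffix handled above
drive_train_pattern <- " (4wd|awd|2wd|quattro)"

models_clean <- mpg %>%
  mutate(model = sub(drive_train_pattern, "", model)) %>%
  count(model, sort = TRUE)
```

Either way the answer changes: once "a4" and "a4 quattro" are merged, a4 moves to the top with 15 rows, ahead of caravan with 11.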