Useful R Packages
Packages are bundles of specialized code that you can add in to go beyond the basic R functions. When you install a package you are adding in extra coding options that can help you analyze or visualize your data more easily. Anyone can write packages and they can be general or very specific, so depending on your task, you may find someone has written a package to make it easier for you.
The terms "package" and "library" are often used interchangeably. In R, you run install.packages() the first time you need a package, then run library() to load it. install.packages() is like buying a book (you only need to do it once), and library() is like taking it off the shelf (you need to do it every time you want to use that book).
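As a minimal sketch of the install-once, load-every-session pattern described above: the helper name `needs_install` is hypothetical, and the example uses the `stats` package only because it ships with R, so the sketch runs without a network connection.

```r
# Hypothetical helper: TRUE when a package still has to be installed.
needs_install <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

pkg <- "stats"  # assumption: swap in the package you actually need

if (needs_install(pkg)) {
  install.packages(pkg)  # one-time step: "buying the book"
}

# Every session: "taking the book off the shelf".
# character.only = TRUE lets library() accept the name stored in a variable.
library(pkg, character.only = TRUE)
```

In an interactive session you would normally just type `install.packages("janitor")` once and `library(janitor)` at the top of each script.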
The one package to rule them all
The first step in all of your scripts will likely be this line of code:
library(tidyverse)
tidyverse is a meta-package that loads many other packages in a single step. Running the line above automatically loads the following packages:
ggplot2
dplyr
tidyr
readr
purrr
tibble
stringr
forcats
lubridate (tidyverse 2.0.0 and later)
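To illustrate what loading the meta-package gives you, here is a short sketch that uses functions from two of the core packages after a single library() call. It assumes tidyverse is already installed and uses the built-in `mtcars` dataset; the object names are only for illustration.

```r
library(tidyverse)  # attaches dplyr, ggplot2, and the other core packages

# group_by() and summarise() come from dplyr; no separate library(dplyr) is needed.
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

# ggplot() and geom_col() come from ggplot2.
p <- ggplot(mpg_by_cyl, aes(x = factor(cyl), y = mean_mpg)) +
  geom_col()
```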
Below are categories of other useful packages. If you are trying to load data from an online database (for example, the US Census), be sure to check the Direct Data Access packages. There may be a package that loads the data for you, so you do not need to download it from the website.
Loading Data
Package Name | What It Does | Learn More |
readr | for loading in .csv, .txt, and more file types | |
readxl | for loading .xlsx file types and other Excel extensions | |
haven | for loading Stata, SAS, and SPSS files | |
jsonlite | for importing JSON objects and converting to R data types | |
googlesheets4 | for loading data from Google Sheets | |
rvest | for web-scraping | |
duckdb | for querying datasets that are too large to load comfortably into R's memory | |
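As a small sketch of the most common case in the table above, the following reads a .csv file with readr. The file is a temporary one created on the spot so the example is self-contained; the column names are made up for illustration.

```r
library(readr)

# Write a tiny example file so the sketch does not depend on any real data.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("site,count", "A,3", "B,5"), csv_path)

# read_csv() returns a tibble and guesses the column types.
# show_col_types = FALSE only silences the column-specification message.
counts <- read_csv(csv_path, show_col_types = FALSE)
```

For an Excel file the analogous call would be `readxl::read_excel()`, with the path to your own workbook.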
Formatting Data
Package Name | What It Does | Learn More |
dplyr | contains the most commonly used tools for data manipulation | |
tidyr | tools for pivoting tables from wide to long format and vice versa | |
janitor | for cleaning up and standardizing column names | |
stringr | helpful functions for manipulating strings | |
scales | for overriding default settings for significant digits, plot axes, and more | |
lubridate | a must-have package for formatting any data that is a date or time | |
data.table | good functions for speeding up analysis when you have large data sets | |
broom | for converting statistical model output into tidy data frames that work well with the tidyverse | |
purrr | tools for working with functions and vectors, helpful for converting from lists of lists to data frames | |
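The following sketch shows two of the tasks named in the table: pivoting from wide to long with tidyr, then summarising with dplyr. The data frame and its column names are invented for the example.

```r
library(dplyr)
library(tidyr)

# Made-up wide table: one column per year.
wide <- data.frame(
  site  = c("A", "B"),
  y2022 = c(3, 5),
  y2023 = c(4, 6)
)

# pivot_longer() stacks the year columns into name/value pairs.
long <- wide %>%
  pivot_longer(cols = c(y2022, y2023), names_to = "year", values_to = "count") %>%
  mutate(year = as.integer(sub("y", "", year)))

# One row per site with the total across years.
site_totals <- long %>%
  group_by(site) %>%
  summarise(total = sum(count))
```

`pivot_wider()` performs the reverse operation if you need to go from long back to wide.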
Creating Nice Plots & Tables
Package Name | What It Does | Learn More |
ggplot2 | the best package for making your graphs look nice | |
gt | stands for "great tables" and follows through on its promise | |
gtsummary | works with gt to display publication-ready summaries of regressions and more | |
viridis | has pretty color palettes | |
RColorBrewer | has pretty color palettes | |
ggpubr | customization for ggplot2 that helps make publication-ready documents | |
patchwork | works well with ggplot2 to help align multiple plots or tables in one figure or page | |
gridExtra | helps align multiple plots or tables in one figure or page | |
wesanderson | has color palettes inspired by Wes Anderson movies | |
plotly | for making your graphs interactive, works well with the shiny package | |
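As a brief sketch of how these pieces fit together, the plot below uses ggplot2 with one of the RColorBrewer palettes (reached through ggplot2's own `scale_colour_brewer()`). The dataset is the built-in `mtcars`; the palette and theme are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  scale_colour_brewer(palette = "Dark2") +  # a qualitative ColorBrewer palette
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders") +
  theme_minimal()

# Type `p` or call print(p) to draw the plot.
```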
Useful Stats Packages
Package Name | What It Does | Learn More |
stats | ships with R and is loaded by default; the main source of core statistical functions | |
lme4 | for fitting linear and generalized linear mixed-effects models | |
lmerTest | statistical tests for analyzing linear mixed-effects models | |
MASS | functions and datasets from "Modern Applied Statistics with S", including robust and negative binomial regression | |
Hmisc | a lot of miscellaneous additional functions for statistical analysis | |
FactoMineR | for multivariate exploratory data analysis | |
outliers | many specific tests for detecting outliers | |
vegan | for ordination analyses and diversity stats, particularly good for ecology | |
car | extra tools for regression analysis | |
cluster | tools for performing cluster analysis | |
forcats | tools for working with categorical variables | |
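As one example from the table, here is a minimal mixed-effects model fitted with lme4. It assumes lme4 is installed and uses the `sleepstudy` dataset that comes with the package; the model formula is a common textbook choice, not a recommendation for your own data.

```r
library(lme4)

# Reaction time as a function of days of sleep deprivation,
# with a random intercept for each subject.
fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Fixed-effect estimates: the intercept and the slope for Days.
fixef(fit)
```

Loading lmerTest before fitting adds p-values to the `summary()` output of models like this one.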
Direct Data Access
Package Name | Database Accessed | Learn More |
tidycensus | US Census | |
rnoaa* | National Oceanic and Atmospheric Administration | |
COVID19 | COVID-19 Data Hub (daily COVID-19 data) | |
wbstats | World Bank data | |
tidyquant | Stock market data | |
crimedata | Crime Open Database | |
eurostat | Eurostat Open Data | eurostat documentation |
WDI | World Bank and World Development Indicators | |
imf.data | International Monetary Fund | |
fredr | Federal Reserve Economic Data (FRED) | |
googleAnalyticsR | Google Analytics | |
* A replacement for rnoaa is in development, but the package is still usable.
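To show what direct data access looks like in practice, here is a hedged sketch using the WDI package. The wrapper function `fetch_gdp_per_capita` is hypothetical (it is not part of WDI), and the indicator code `NY.GDP.PCAP.KD` is the World Bank's series for GDP per capita in constant US dollars. The actual download needs an internet connection, so the call is left commented out.

```r
library(WDI)

# Hypothetical helper (not part of WDI): GDP per capita for a set of countries.
fetch_gdp_per_capita <- function(countries, start_year, end_year) {
  WDI(
    country   = countries,         # ISO-2 country codes, e.g. "US"
    indicator = "NY.GDP.PCAP.KD",  # GDP per capita, constant US$
    start     = start_year,
    end       = end_year
  )
}

# Requires an internet connection:
# gdp <- fetch_gdp_per_capita(c("US", "CA"), 2015, 2020)
```

Some of the other packages in the table, such as tidycensus and fredr, need a free API key before they will return data.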