Usage
The get_education_data function will return a data.frame from a call to the Education Data API.
library(educationdata)
get_education_data(level, source, topic, by, filters, add_labels, csv)
where:
- level (required) - API data level to query.
- source (required) - API data source to query.
- topic (required) - API data topic to query.
- by (optional) - Optional list of grouping parameters for an API call. Deprecated in favor of subtopic.
- filters (optional) - Optional list query to filter the results from an API call.
- add_labels - Add variable labels as factors (when applicable)? Defaults to FALSE.
- csv - Download the full csv file? Defaults to FALSE.
This simple example will obtain ‘college-university’ level data from the ‘ipeds’ source for the ‘student-faculty-ratio’ topic:
library(educationdata)
df <- get_education_data(
  level = 'college-university',
  source = 'ipeds',
  topic = 'student-faculty-ratio'
)
head(df)
#> unitid year fips student_faculty_ratio
#> 1 100654 2009 1 14
#> 2 100663 2009 1 17
#> 3 100690 2009 1 10
#> 4 100706 2009 1 17
#> 5 100724 2009 1 17
#> 6 100751 2009 1 20
A somewhat more complex example will obtain ‘schools’ level data from the ‘ccd’ source for the ‘enrollment’ topic, broken out by ‘race’ and ‘sex’. The API query is subset with filters for the ‘year’ 2008, ‘grade’ 9 through 12, and an ‘ncessch’ code of 340606000122. Finally, the add_labels flag will map integer codes to their factor labels (‘race’ and ‘sex’ in this instance).
library(educationdata)
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         by = list('race', 'sex'),
                         filters = list(year = 2008,
                                        grade = 9:12,
                                        ncessch = '340606000122'),
                         add_labels = TRUE)
#> Warning in get_education_data(level = "schools", source = "ccd", topic = "enrollment", : The `by` argument has been deprecated in favor of `subtopic`.
#> Please update your script to use `subtopic` instead.
head(df)
#> year ncessch ncessch_num grade race sex
#> 1 2008 340606000122 3.40606e+11 9 Black Male
#> 2 2008 340606000122 3.40606e+11 9 Hispanic Male
#> 3 2008 340606000122 3.40606e+11 9 American Indian or Alaska Native Female
#> 4 2008 340606000122 3.40606e+11 9 American Indian or Alaska Native Male
#> 5 2008 340606000122 3.40606e+11 9 Black Female
#> 6 2008 340606000122 3.40606e+11 9 Asian Female
#> enrollment fips leaid
#> 1 41 New Jersey 3406060
#> 2 39 New Jersey 3406060
#> 3 0 New Jersey 3406060
#> 4 0 New Jersey 3406060
#> 5 46 New Jersey 3406060
#> 6 32 New Jersey 3406060
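To make the effect of add_labels more concrete, here is a minimal base-R sketch of the kind of code-to-label mapping it performs. It does not call the API or the package. The two-entry lookup is an assumption built only from the state FIPS codes visible in the outputs above (1 for Alabama, 34 for New Jersey), and the variable names are illustrative.

```r
# Illustrative only: mimic how integer codes become factor labels.
fips_codes <- c(1, 34, 34)

# Hypothetical two-entry lookup; the API itself supplies the full label set.
fips_labels <- c("1" = "Alabama", "34" = "New Jersey")

fips_factor <- factor(
  fips_codes,
  levels = as.integer(names(fips_labels)),
  labels = unname(fips_labels)
)

print(fips_factor)
```

With add_labels = FALSE the column would stay as the raw codes, which is usually easier to join on; the labeled form is easier to read and plot.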
Main Filters
Due to the way the API is set up, the variables listed within ‘main filters’ are often the fastest way to subset an API call.
In addition to year, the other main filters for certain endpoints accept the following values:
Grade
| Filter argument | Grade |
| --- | --- |
| grade = 'grade-pk' | Pre-K |
| grade = 'grade-k' | Kindergarten |
| grade = 'grade-1' | Grade 1 |
| grade = 'grade-2' | Grade 2 |
| grade = 'grade-3' | Grade 3 |
| grade = 'grade-4' | Grade 4 |
| grade = 'grade-5' | Grade 5 |
| grade = 'grade-6' | Grade 6 |
| grade = 'grade-7' | Grade 7 |
| grade = 'grade-8' | Grade 8 |
| grade = 'grade-9' | Grade 9 |
| grade = 'grade-10' | Grade 10 |
| grade = 'grade-11' | Grade 11 |
| grade = 'grade-12' | Grade 12 |
| grade = 'grade-13' | Grade 13 |
| grade = 'grade-14' | Adult Education |
| grade = 'grade-15' | Ungraded |
| grade = 'grade-16' | K-12 |
| grade = 'grade-20' | Grades 7 and 8 |
| grade = 'grade-21' | Grades 9 and 10 |
| grade = 'grade-22' | Grades 11 and 12 |
| grade = 'grade-99' | Total |
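The string values in the grade table follow a regular 'grade-' prefix pattern, so they can be generated rather than typed out. The sketch below assumes a small helper named grade_filter, which is hypothetical and not part of educationdata.

```r
# Hypothetical helper (not part of educationdata): build the string form
# of the grade filter values listed in the table above.
grade_filter <- function(x) {
  paste0("grade-", tolower(as.character(x)))
}

early_grades <- grade_filter(c("PK", "K", 1:3))
print(early_grades)
```

The result could then be passed along as filters = list(grade = early_grades). The examples in this document also pass plain integers such as grade = 9:12, so the string form is mainly useful for the non-numeric entries like Pre-K and Kindergarten.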
Level of Study
| Filter argument | Level of study |
| --- | --- |
| level_of_study = 'undergraduate' | Undergraduate |
| level_of_study = 'graduate' | Graduate |
| level_of_study = 'first-professional' | First Professional |
| level_of_study = 'post-baccalaureate' | Post-baccalaureate |
| level_of_study = '99' | Total |
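Because level_of_study accepts only a handful of values, it can be worth checking a value locally before sending a request. The following is a rough sketch; check_level_of_study is a hypothetical helper, not a function exported by educationdata, and the accepted values are copied from the list above.

```r
# Accepted values, copied from the level_of_study list above.
valid_levels_of_study <- c(
  "undergraduate", "graduate", "first-professional",
  "post-baccalaureate", "99"
)

# Hypothetical helper: stop early on values the list above does not include.
check_level_of_study <- function(x) {
  bad <- setdiff(x, valid_levels_of_study)
  if (length(bad) > 0) {
    stop("Unknown level_of_study value(s): ", paste(bad, collapse = ", "))
  }
  x
}

check_level_of_study(c("undergraduate", "graduate"))
```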
Examples
Let’s build up some examples from the following set of endpoints.
| Level | Source | Topic | By | Main Filters | Years Available |
| --- | --- | --- | --- | --- | --- |
| schools | ccd | enrollment | NA | year, grade | 1986–2018 |
| schools | ccd | enrollment | race | year, grade | 1986–2018 |
| schools | ccd | enrollment | race, sex | year, grade | 1986–2018 |
| schools | ccd | enrollment | sex | year, grade | 1986–2018 |
| schools | crdc | enrollment | disability, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | enrollment | lep, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | enrollment | race, sex | year | 2011, 2013, 2015, 2017 |
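When deciding which groupings a source supports, it can help to hold the endpoint list in a data.frame and query it locally. This sketch simply transcribes the level, source, topic, and grouping values from the endpoint list above; the object names are illustrative and nothing here contacts the API.

```r
# Endpoint list transcribed by hand from the table above.
endpoints <- data.frame(
  level  = "schools",
  source = c("ccd", "ccd", "ccd", "ccd", "crdc", "crdc", "crdc"),
  topic  = "enrollment",
  by     = c(NA, "race", "race, sex", "sex",
             "disability, sex", "lep, sex", "race, sex"),
  stringsAsFactors = FALSE
)

# Which groupings are listed for the 'crdc' source?
crdc_by <- endpoints$by[endpoints$source == "crdc"]
print(crdc_by)
```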
The following will return a data.frame across all years and grades:
library(educationdata)
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment')
Note that this endpoint is also callable by certain variables: ‘race’, ‘sex’, or both, as listed in the endpoint table above. These variables can be added to the by argument:
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         by = list('race', 'sex'))
You may also filter the results of an API call. In this case year and grade will provide the most time-efficient subsets, and can be vectorized:
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         by = list('race', 'sex'),
                         filters = list(year = 1988:1990,
                                        grade = 6:8))
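A vectorized filter covers every combination of the values supplied. The base-R sketch below only enumerates those year and grade pairs locally to show the size of the request; it makes no claim about how the package batches its calls internally.

```r
# The same filter list used in the call above.
filters <- list(year = 1988:1990, grade = 6:8)

# Enumerate every year/grade pair the filter covers.
combos <- expand.grid(filters)

print(combos)
print(nrow(combos))
```

Three years crossed with three grades gives nine pairs, so widening either range grows the amount of data requested multiplicatively.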
Additional variables can also be passed to filters to subset further:
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         by = list('race', 'sex'),
                         filters = list(year = 1988:1990,
                                        grade = 6:8,
                                        ncessch = '010000200277'))
The add_labels flag will map integer-coded variables to factors using their labels in the API:
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         by = list('race', 'sex'),
                         filters = list(year = 1988:1990,
                                        grade = 6:8,
                                        ncessch = '010000200277'),
                         add_labels = TRUE)
Finally, the csv flag can be set to download the full .csv data frame. In general, the csv functionality is much faster when retrieving the full data frame (or a large subset) and much slower when retrieving a small subset of a data frame (especially one with many filters added). In this example, the full csv for each requested year (1988 through 1990) must be downloaded and then subset to the matching observations.
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         by = list('race', 'sex'),
                         filters = list(year = 1988:1990,
                                        grade = 6:8,
                                        ncessch = '010000200277'),
                         add_labels = TRUE,
                         csv = TRUE)
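The examples above repeat the same level, source, and topic while adding one argument at a time. One way to avoid retyping them, sketched here under the assumption that the arguments behave as described in this document, is to keep a base argument list and layer changes on with utils::modifyList. The actual API call is left commented out because it needs the educationdata package and network access.

```r
# Arguments shared by every example in this section.
base_args <- list(level = "schools", source = "ccd", topic = "enrollment")

# Layer the year and grade filters on top of the base arguments.
filtered_args <- utils::modifyList(
  base_args,
  list(filters = list(year = 1988:1990, grade = 6:8))
)

# Layer the add_labels and csv flags on top of the filtered arguments.
csv_args <- utils::modifyList(
  filtered_args,
  list(add_labels = TRUE, csv = TRUE)
)

# Requires the educationdata package and network access:
# df <- do.call(educationdata::get_education_data, csv_args)

print(names(csv_args))
```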