Section 8 User Documentation
8.1 Purpose of the Outdoor Equity App
The Outdoor Equity App provides users with tools to explore trends at different overnight reservable sites to analyze access to these sites. The intended audience is federal public land managers and researchers as well as nonprofit organizations and recreation users.
The Recreation Information Database (RIDB) data are comprehensive when it comes to information regarding the site and reservation, but do not include information about the visitor outside of their home ZIP code. US Census American Community Survey (ACS) data is used to approximate socioeconomic demographics by joining information about the visitors’ home ZIP code to the RIDB data.
8.2 How to Use the Outdoor Equity App
8.2.1 About the App
The About tab of the Outdoor Equity App includes background information about what the App is, why outdoor recreation is important, the App creators, and what data is used in the App. These sections are similar to the About Section of this technical documentation. The About tab also includes example questions that a user might explore through the different parts of the Analysis tab.
The Analysis tab of the App consists of three parts Data Summary, Data Relationship and Visitorshed Maps. Each of these pages includes a brief explanatory section of how to interpret the plot or graph, a section to subset the data to the desired campsite, and the plot or map outputs.
The Metadata tab of the App includes metadata for all variables in the combined RIDB and ACS dataset. This section mirrors the Metadata section in the Products and Deliverables Section of this document.
The Data Download section of the App allows a user to download a subset of data include as many or as few campsites and variables as they require for further analysis or use.
8.3 How to Maintain the Outdoor Equity App
8.3.1 Repository Directory Structure
Our code is version controlled in a GitHub repository. This includes the data cleaning, wrangling, and analysis, creation of plots and maps, and structure of the shiny app. An overview of the directory structure can be found in the Appendix Section.
8.3.2 Data Preparation Methods
8.3.2.1 RIDB Data
RIDB data data are available through direct download as CSV files or via the API as JSON files via Recreation.gov. API access requires creating a Recreation.gov account and requesting second tier API access via the Recreation.gov website’s Contact Us page. CSV files are readily available for download via the Recreation.gov website. Data are collected each time a visitor makes a reservation through Recreation.gov. Data packages are posted annually in the spring by Recreation One Stop (R1S) and contain the previous fiscal year’s reservations (ex: the 2018 package includes 2018-10-01 through 2018-09-30). Data packages are available for download from 2006 to present. Each annual data package file contains a range between 2 million and 5 million observations (or reservations) and includes variables in character, numeric, and date/time formats about each reservation. A shift in the data collection and storage processes occurred in 2019, changing what variables are available and how they are labeled. Currently, the Outdoor Equity App contains only data for reservable sites in California and reservations for fiscal year 2018 due to time limitations of this project. To see more about expanding the App temporally and spatially see the How to Expand the Outdoor Equity App Section.
We created a reproducible workflow for cleaning and wrangling data, which is employed in the data_wrangle_and_clean.Rmd
document that sources custom functions to prepare RIDB data for joining with ACS data. All custom functions are listed in the Access, Clean, and Wrangle Data Section of this document. These functions rely heavily on functions from the tidyverse
collection of packages (Wickham 2021).
First, we use the function_ridb_subset-pre2018.R
to subset the RIDB data, filtering only reservations within the selected state that are listed as “Overnight” reservations within the use_type
variable. For the 2018 California dataset this results in a starting “raw” data frame with 521,682 reservations
# filter for state
filter(facility_state == state_abbrev) %>%
# filter for use type
filter(use_type == "Overnight")
We then select only the necessary variables, including information about the site (agency, park or forest name, site name, site type, and site location) and information about the reservation (home ZIP come, total paid, visit start and end dates, visit order date, and number of people in party).
# select variables
select(c("agency", "parent_location", "region_description",
"park", "site_type", "facility_id", "facility_state",
"facility_longitude", "facility_latitude",
"customer_zip", "total_paid", "start_date",
"end_date", "order_date", "number_of_people")) %>%
mutate(site_type = tolower(site_type)) %>%
filter(!site_type %in% c("historic tour", "hiking zone",
"group picnic area", "cave tour",
"management", "picnic",
"entry point", "trailhead"))
Next we normalize the customer ZIP code values. This includes filtering for only US ZIP codes and shortening all 9 digit ZIP codes to include only the first 5 digits.
# filter out invalid ZIP codes
filter(str_detect(string = customer_zip,
pattern = "^[:digit:]{5}(?!.)") |
str_detect(string = customer_zip,
pattern = "^[:digit:]{5}(?=-)")) %>%
filter(!customer_zip %in% c("00000", "99999")) %>%
mutate(customer_zip = str_extract(string = customer_zip,
pattern = "[:digit:]{5}"))
This function results in the removal of 24,866 reservations (or 4.77%) from the original “raw” dataset that included all reservations for reservable overnight campsites in California.
We created a second custom function function_ridb_variable_calculate-pre2018.R
to calculate and manipulate variables of interest. We use start, end, and order dates to calculate the lengths of stay and booking windows (number of days from order to start date) of each reservation. The booking window calculations return a number of results that are negative. This is a known issue that others working with the RIDB data have encountered. This resulted in 4,530 reservations (or 0.87%) without a valid booking window, which are removed for plots that visualize the booking window variable.
mutate(start_date = as.Date(start_date),
end_date = as.Date(end_date),
order_date = as.Date(order_date),
# calculate new variables
length_of_stay = as.numeric(difftime(end_date, start_date),
units = "days"),
booking_window = as.numeric(difftime(start_date, order_date),
units = "days"))
We calculate daily cost per reservation by dividing total costs by lengths of stay. We then calculated daily cost per visitor by dividing the daily cost by the number of visitors.
# calculate new variables
mutate(daily_cost = total_paid / length_of_stay,
daily_cost_per_visitor = daily_cost / number_of_people)
We aggregate the site_type
variable to create 7 broader site categories.
# aggregate site type
mutate(aggregated_site_type =
case_when(site_type %in% c("walk to", "hike to",
"group hike to", "group walk to"
~ "remote",
) %in% c("cabin nonelectric", "cabin electric",
site_type "yurt","shelter nonelectric"
~ "shelter",
) %in% c("boat in", "anchorage") ~ "water",
site_type %in% c("group equestrian",
site_type "equestrian nonelectric"
~ "equestrian",
) %in% c("rv nonelectric", "rv electric",
site_type "group rv area nonelectric"
~ "rv only",
) %in% c("group standard nonelectric",
site_type "standard nonelectric",
"standard electric",
"group standard area nonelectric",
"group standard electric"
~ "rv or tent",
) %in% c("tent only nonelectric",
site_type "group tent only area nonelectric",
"tent only electric"
~ "tent only")) )
We create an administrative unit variable by combining the parent_location
and region_description
variables as different federal agencies track the administrative unit information within different variables. Then we update the agency
, admin_uni
and park
variable character strings using multiple functions from the stringr
package (Wickham 2019). These updates expand acronyms, remove unnecessary characters (such as “–” or location codes), and normalize any discrepencies in characater strings.
mutate(admin_unit =
case_when(agency == "USFS" ~ parent_location,
%in% c(
agency "NPS", "BOR", "USACE"
~ region_description)) %>%
) # edit values
mutate(
# agency abbreviations to names
agency = str_replace(string = agency,
pattern = "NPS",
replacement = "National Park Service"),
agency = str_replace(string = agency,
pattern = "USFS",
replacement = "US Forest Service"),
agency = str_replace(string = agency,
pattern = "USACE",
replacement = "US Army Corps of Engineers"),
agency = str_replace(string = agency,
pattern = "BOR",
replacement = "Bureau of Reclamation"),
# update admin_unit values (generic)
admin_unit = str_replace(string = admin_unit,
pattern = paste(c("NF - FS", "NF -FS",
"NF- FS", "NF-FS",
"-FS", " - FS"),
collapse = "|"),
replacement = "National Forest"),
admin_unit = str_to_title(admin_unit),
admin_unit = str_replace(string = admin_unit,
pattern = "And",
replacement = "&"),
# update park values (generic)
park = str_remove(string = park,
pattern = paste(c("\\(.*", " \\(.*",
"---.*", " ---.*",
",.*"),
collapse = "|")),
park = str_to_title(park),
park = str_replace(string = park,
pattern = "Cg",
replacement = "Campground"),
park = str_replace(string = park,
pattern = "Nhp",
replacement = "National Historic Park"),
park = str_replace(string = park,
pattern = "@",
replacement = "At"),
park = str_replace(string = park,
pattern = "&",
replacement = "And"),
park = str_replace(string = park,
pattern = paste(c("/", " / "), collapse = "|"),
replacement = " "),
park = str_remove_all(string = park,
pattern = " \\d.*"),
# update park values (CA specific)
park = str_remove(string = park,
pattern = paste(c(" - Angeles Nf", " -Hwy"),
collapse = "|")),
park = str_replace(string = park,
pattern = "Tunnel Mills Il",
replacement = "Tunnel Mills"))
We calculate distance traveled by measuring the distance from the latitude and longitude coordinate facility locations to the centroid of the home ZIP code. Here we utilize both the tidycensus
package (Walker and Herman 2021) and the sf
package (Pebesma 2022).
# bootstrap geometries and reproject to NAD 83
<- df %>%
df_geometries st_as_sf(coords = c("facility_longitude", "facility_latitude"),
crs = 4326) %>%
st_transform(crs = 4269) # using NAD83 because measured in meters
# get centroid of geometries for all US ZIP codes
<-
df_zip_centroids_us get_acs(geography = "zcta", year = 2018,
geometry = TRUE,
summary_var = "B01001_001",
survey = "acs5",
variables = c(male = "B01001_002")) %>%
select(NAME, geometry) %>%
mutate(zip_code = str_sub(NAME, start = -5, end = -1)) %>%
select(zip_code, geometry) %>%
st_centroid()
# join data and calculate `distance_traveled` variable
<-
df_joined_geometries left_join(x = df_geometries %>% as.data.frame(),
y = df_zip_centroids_us %>% as.data.frame(),
by = c("customer_zip" = "zip_code")) %>%
st_sf(sf_column_name = 'geometry.x') %>%
mutate(distance_traveled_m = st_distance(x = geometry.x,
y = geometry.y,
by_element = TRUE),
distance_traveled_m = as.numeric(distance_traveled_m))
And finally, we add a variable indicating in which state or territory each customer zip code is located. This portion of the code utilizes the zipcodeR
package (Rozzi 2021).
# create df of fips and full state names
<- c(
fips_list "01", "02", "04", "05", "06", "08", "09", "10", "11", "12",
"13", "15", "16", "17", "18", "19", "20", "21", "22", "23",
"24", "25", "26", "27", "28", "29", "30", "31", "32", "33",
"34", "35", "36", "37", "38", "39", "40", "41", "42", "44",
"45", "46", "47", "48", "49", "50", "51", "53", "54", "55",
"56", "72")
<- c(
state_list "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "DC", "FL",
"GA", "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME",
"MD", "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH",
"NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI",
"SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI",
"WY", "PR")
<- c(
states_names_list "Alabama", "Alaska", "Arizona", "Arkansas", "California",
"Colorado", "Connecticut", "Delaware",
"District of Columbia", "Florida","Georgia", "Hawaii",
"Idaho", "Illinois", "Indiana", "Iowa", "Kansas",
"Kentucky", "Louisiana", "Maine","Maryland",
"Massachusetts", "Michigan", "Minnesota", "Mississippi",
"Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire",
"New Jersey", "New Mexico", "New York", "North Carolina",
"North Dakota", "Ohio", "Oklahoma", "Oregon", "Pennsylvania",
"Rhode Island","South Carolina", "South Dakota", "Tennessee",
"Texas", "Utah", "Vermont", "Virginia", "Washington",
"West Virginia", "Wisconsin","Wyoming", "Puerto Rico")
<- as.data.frame(list(fips = fips_list,
df_states_fips state = state_list,
state_full = states_names_list))
# loop through state df to get all ZIP codes w/in state
<- data.frame()
df_states_zip_codes
for (i in seq_along(fips_list)){
<- zipcodeR::search_fips(state_fips = fips_list[[i]]) %>%
state select(zipcode, state)
<- rbind(df_states_zip_codes, state)
df_states_zip_codes
}
# add full state name and fips code to list of all ZIP codes for each state
<-
df_states_fips_zip_codes left_join(x = df_states_zip_codes,
y = df_states_fips,
by = "state") %>%
select(-fips) %>%
rename(customer_zip_state = state,
customer_zip_state_full = state_full,
zip_code = zipcode)
8.3.2.2 U.S. Census Data
US Census data is publicly accessible in many ways. Our product utilizes the R package tidycensus
(Walker and Herman 2021) to access the necessary variables from the 2018 American Community Survey (ACS) via API. API access requires an api key, which can then be saved into your RStudio environment. Learn more about API keys and working with tidycensus
here.
# API set up
# Only have to run this the first time using on a new machine
<- source("private/census-api.R")
census_api census_api_key(key = census_api[[1]], install = TRUE, overwrite = TRUE)
# run in console:
readRenviron("~/.Renviron")
Sample data are collected for the ACS each year and includes many variables that cover social, economic, housing, and demographic characteristics. All variable options can be easily viewed using the code below:
# look at variable options
View(load_variables(2018, "acs5", cache = TRUE))
The Outdoor Equity App utilizes the ACS 5-year data, which is an estimate representing data collected over the designated 5 year period. We used the 5-year ACS data over the 1-year ACS data because it increases “statistical reliability of the data for less populated areas and small population subgroups” (Bureau 2022).
The variables included in the App include median-income, race, language(s) spoken at home, and highest level of education attained. All variables are represented as estimates in numeric format for a census tract. Data are called by geographic region, in our case the ZIP code tabulation area, and include an estimated number of people that fall into each category within each ZIP code, a margin of error, and an estimated number of total people in the area. Within our custom functions function_acs_education.R
, function_acs_language.R
, function_acs_median_income.R
, and function_acs_race.R
we first imported just the necessary columns for each ACS variable. See Table 8.1 for an example of the race variable.
# import variables for race
<-
race_df get_acs(
geography = "zcta", year = 2018,
state = "CA",
summary_var = "B03002_001", #Estimate!!Total:
variables = c(
white = "B03002_003", #White alone
black = "B03002_004", #Black or African American alone
native_american = "B03002_005", #American Indian and Alaska Native alone
asian = "B03002_006", #Asian alone
pacific_islander = "B03002_007", #Native Hawaiian/Other Pacific Islander alone
other = "B03002_008", #Some other race alone
multiracial = "B03002_009", #Two or more races
hispanic_latinx = "B03002_012" #Hispanic or Latino
))
GEOID | NAME | variable | estimate | moe | summary_est | summary_moe |
---|---|---|---|---|---|---|
90001 | ZCTA5 90001 | white | 413 | 165 | 58975 | 1725 |
90001 | ZCTA5 90001 | black | 5138 | 646 | 58975 | 1725 |
90001 | ZCTA5 90001 | native_american | 37 | 42 | 58975 | 1725 |
90001 | ZCTA5 90001 | asian | 67 | 38 | 58975 | 1725 |
90001 | ZCTA5 90001 | pacific_islander | 0 | 29 | 58975 | 1725 |
90001 | ZCTA5 90001 | other | 168 | 169 | 58975 | 1725 |
90001 | ZCTA5 90001 | multiracial | 66 | 44 | 58975 | 1725 |
90001 | ZCTA5 90001 | hispanic_latinx | 53086 | 1591 | 58975 | 1725 |
90002 | ZCTA5 90002 | white | 223 | 81 | 53111 | 2031 |
90002 | ZCTA5 90002 | black | 10110 | 876 | 53111 | 2031 |
We calculated percentages for the categories within each ACS variable for race, language(s) spoken in the home, and highest level of education by dividing the estimate for a category by the estimated total population for that ZIP code.
<-
race_df_percent %>%
race_df group_by(zip_code, race) %>%
summarise(estimate = sum(estimate),
moe = sum(moe),
summary_est = unique(summary_est),
summary_moe = unique(summary_moe),
percent = estimate / summary_est)
For median-income, we use the estimated median-income as is, without further adjustments.
<-
median_income_df get_acs(geography = "zcta", year = 2018,
state = "CA",
variables = c(
median_income = "B19013_001"
%>%
)) clean_names() %>%
rename(median_income = estimate) %>%
mutate(zip_code = str_sub(name, start = -5, end = -1)) %>%
select(median_income, zip_code)
We transform teh ACS dataframe by pivoting wider, which moves the different categories and percentages (ex: racial groups) to their own columns to create a single row for each ZIP code. This is necessary for joining the ACS data sets to the RIDS data set.
select(zip_code, summary_est, race, percent) %>%
rename(zip_code_population = summary_est) %>%
# create column for each percentage for each group (pivot_wider)
# necessary to be able to left_join() with RIDB data
pivot_wider(names_from = "race",
values_from = "percent")
Finally, we use the function_join_ridb_acs.R
custom functions to join the RIDB and ACS data to create the dataset that is available for download on the App and utilized for all plots and maps.
<-
joined_df left_join(x = ridb_df,
y = acs_df_race,
by = c("customer_zip" = "zip_code")) %>%
left_join(y = acs_df_education,
by = c("customer_zip" = "zip_code")) %>%
left_join(y = acs_df_median_income,
by = c("customer_zip" = "zip_code")) %>%
left_join(y = acs_df_language,
by = c("customer_zip" = "zip_code"))
8.3.3 Statistical Analysis and Data Wrangling for Plots
Each type of plot and map require unique data wrangling which is explained in this section.
8.3.3.1 Data Summary
We created custom functions for data wrangling and plotting for use within the Outdoor Equity App code. Wrangling begins with filtering based on the user’s choice campsite input. We then further subset the dataset to include only the necessary columns for plotting.
# example for booking window data summary
<- reactive({
booking_window_rdf validate(
need(siteInput != "",
"Please select a reservable site to visualize.")
# EO validate
)
%>%
ridb_df filter(park %in% siteInput,
> 0,
booking_window != "Inf") %>%
booking_window select(park, booking_window) %>%
filter(!is.na(booking_window))
})
We create histograms, bar charts, or density plots to show the distribution of the data for each variable. For ACS variables, data from reservations are plotted against data for the full California population. This allows a user to compare the distribution within the reservation data to that of the California census in order to see where reservations either under- or over-represent that variable as compared to California residents. We used the plotly
package (Sievert et al. 2021) to allow a user to hover over the plot to view helper text. We also included additional helper text outside of the plots for more complicated plots.
## -- example for booking window data summary -- ##
# parameters
<- c("#64863C", "#466C04")
hist_colors <- c("#FACE00")
quant_80_color <- c("#ac8d00")
caption_color
# plot for shiny app
<- ggplot(
plotly data = booking_window_rdf()) +
geom_histogram(
aes(x = booking_window,
text = paste0(percent(..count.. / nrow(booking_window_rdf()),
accuracy = 0.1),
" of all visits are reserved between ", xmin,
" and ", xmax,
" days before the start of the visit",
"<br>",
"(Total reservations to site: ",
comma(nrow(booking_window_rdf()), accuracy = 1),
")")),
binwidth = 7,
center = 7 / 2,
fill = hist_colors[[1]],
col = hist_colors[[2]], size = 0.05) +
labs(x = "Days in advance before visit (each bar = 1 week)",
y = "") +
scale_x_continuous(limits = c(0, x_max)) +
geom_vline(xintercept = quant_80,
linetype = "dashed", alpha = 0.5, color = quant_80_color) +
theme_minimal() +
theme(plot.background = element_rect("white"),
panel.grid.major.y = element_blank())
# add 6 month / 1 year annotation if data allows
if (x_max <= 180) {
# don't include 6 month or 1 year annotation
plotly
else if (x_max > 180 & x_max <= 360){
} # include 6 month annotation
<- plotly +
plotly geom_vline(xintercept = 180,
linetype = "dashed", size = .3, alpha = .5) +
annotate("text", label = "6 months", size = 3,
x = 180, y = 5)
else if (x_max >= 360) {
}
# include 6 month and 1 year annotation
<- plotly +
plotly geom_vline(xintercept = 180,
linetype = "dashed", size = .3, alpha = .5) +
annotate("text", label = "6 months", size = 3,
x = 180, y = 5) +
geom_vline(xintercept = 360,
linetype = "dashed", size = .3, alpha = .5) +
annotate("text", label = "1 year", size = 3,
x = 360, y = 5)
# EO else if for plotly
}
ggplotly(plotly,
tooltip = list("text"),
dynamicTicks = TRUE) %>%
layout(title =
list(text =
paste0('<b>', siteInput, '<br>', admin_unitInput, '</b>',
'<br>',
'Number of days from reservation to start of visit'),
font = list(size = 15)),
xaxis = list(separatethousands = TRUE),
yaxis = list(separatethousands = TRUE),
margin = list(b = 130, t = 100),
annotations =
list(x = x_max/2, y = -0.6,
text =
paste0("80% of reservations reserve their visit less than ",
'<b>', quant_80, '</b>',
" days before the start date",
"<br>",
"(shown on plot with yellow dotted line)."),
showarrow = F,
xre = 'paper', yref = 'paper',
xanchor = 'middle', yanchor = 'auto',
xshift = 0, yshift = 0,
font = list(size = 12, color = caption_color))) %>%
config(modeBarButtonsToRemove = list("pan", "select", "lasso2d", "autoScale2d",
"hoverClosestCartesian",
"hoverCompareCartesian"))
8.3.3.2 Data Relationships
Visualizing the relationship between RIDB and ACS variables is challenging because the socioeconomic data available are an estimate of the ZIP code population, rather than data of the individual making the reservation. In order to create simple, easy to interpret plots, we determined whether reservations were from a location with “high” percentages of a given ACS variable category (ex: one of the eight racial groups). We determined “high” as anything above the weighted 3rd quartile (weighted based on the California census). This method allows for reservations to be included in the “high” category for multiple values within one ACS variable (ex: a reservation might be from a ZIP code that falls into the “high” category for both Black and Asian folks). By allowing a reservation to appear in multiple categories within a single ACS variable, the uncertainty of the reserver’s actual socioeconomic status is retained.
Example code for calculating the “high” threshold for language can be seen below:
# `function_acs_top_quartile_language.R` code
<- acs_df %>%
threshold_df # filter variables of interest
select(zip_code, english_only, not_english_only, mean_zip_code_population) %>%
pivot_longer(cols = 2:3,
names_to = "language",
values_to = "language_percentage") %>%
# filter category of interest
filter(language == language_group) %>%
drop_na(language_percentage)
# weighted median value (weighted based on ZIP code populations)
<- weighted.mean(x = threshold_df$language_percentage,
weighted_half w = threshold_df$mean_zip_code_population)
# drop rows below weighted median
<- threshold_df %>% filter(language_percentage >= weighted_half)
df_half
# weighted 3rd quartile
## weighted median value of top half (weighted based on ZIP code populations)
<- weighted.mean(x = df_half$language_percentage,
weighted_quartile w = df_half$mean_zip_code_population)
source("r/function_acs_top_quartile_language.R")
# language groups
<- c("english_only", "not_english_only")
language_group
# calculate value of 3rd quartile for each language group
<-
language_quants_df %>%
language_group map_dbl(language_top_quartile, acs_df = data_acs_ca_all) %>%
cbind("language_group" = language_group,
"weighted_quartile" = .) %>%
as.data.frame() %>%
mutate(language_group = str_replace_all(string = language_group,
pattern = "_",
replacement = " "),
language_group = str_to_title(language_group),
weighted_quartile = percent(as.numeric(weighted_quartile),
accuracy = 0.01)) %>%
rename("Language Group" = language_group,
"Threshold" = weighted_quartile)
The thresholds used within the Outdoor Equity App can be seen in Table 8.2 and Table 8.3.
|
|
Race Group | Threshold |
---|---|
Other | 0.52% |
Pacific Islander | 0.87% |
Multiracial | 4.38% |
Asian | 30.29% |
Black | 13.16% |
White | 59.08% |
Native American | 0.93% |
Hispanic Latinx | 62.07% |
We create lollipop and bar plots to show the relationships of the data for each set of variables. We used the plotly
package (Sievert et al. 2021) to allow a user to hover over the plot to view helper text. An example of the code used to create the education compared to booking window plot is show below:
## -- data wrangle -- ##s
# create reactive dataframe and further subset
<- reactive ({
rdf
validate(
need(siteInput != "",
"Please select a reservable site to visualize.")
# EO validate
)
%>%
education_top_quartile_df # filter to user site of choice
filter(park == siteInput) %>%
# select to variables of interest
select(park, customer_zip,
education, education_percentage, education_y_lab, %>%
booking_window) drop_na(booking_window, education_percentage) %>%
filter(booking_window >= 0) %>%
# summarize to inner quartile range, median, and total reservations
group_by(education, education_y_lab) %>%
summarize(median_booking_window = median(booking_window),
quartile_lower = quantile(booking_window)[[2]],
quartile_upper = quantile(booking_window)[[4]],
count = n())
#EO reactive df
})
validate(need(
nrow(rdf()) > 0,
paste0("There are no reservations to ", siteInput, ", ", admin_unitInput,
" that come from communities in the high range for any educational",
"categories.")
# EO validate
))
## -- create plot -- ##
# parameters
<-
education_group_colors c(
"HS, GED,\nor Below" = "#a6cee3",
"Some College or\nTrade School" = "#1f78b4",
"Associates or\nBachelors Degree" = "#b2df8a",
"Masters Degree\nor Above" = "#33a02c"
)
# create plot
<- ggplot(data = rdf(),
plotly aes(x = median_booking_window,
y = education_y_lab)) +
geom_segment(aes(xend = 0, yend = education_y_lab)) +
geom_point(
aes(
color = education_y_lab,
fill = education_y_lab,
text = paste0(
comma(count, accuracy = 1),
" unique visits were made by people who live in ZIP codes with high ",
" rates of",
"<br>", education,
" as the maximum level of education. ",
"Typically these visitors reserved their visit between",
"<br>", comma(quartile_lower, accuracy = 1), " and ",
comma(quartile_upper, accuracy = 1),
" days before the start of their trip, with a median booking window of ",
comma(median_booking_window, accuracy = 1), " days."
)
),size = 3.5, shape = 21, stroke = 2 ) +
scale_y_discrete(expand = c(0.45, 0)) +
scale_fill_manual(values = education_group_colors) +
scale_color_manual(values = education_group_colors) +
labs(x = paste("Estimated Number of Days in Advance Site is Reserved"),
y = "") +
theme_minimal() +
theme(
plot.background = element_rect("white"),
panel.grid.major.y = element_blank(),
legend.position = "none")
# create plotly
ggplotly(plotly,
tooltip = list("text")) %>%
config(
modeBarButtonsToRemove =
list("zoom", "pan", "select", "lasso2d", "autoScale2d",
"hoverClosestCartesian", "hoverCompareCartesian")) %>%
layout(title = list(
text = paste0( '<b>', siteInput, '<br>', admin_unitInput, '</b>', '<br>',
'Number of Days in Advance Site is Reserved by Visitors with",
"Different Levels of Education'),
font = list(size = 15) )) %>%
add_annotations(
text = "Reservations from ZIP codes<br>with high proportions of:",
x = -0.15, y = 0.9,
font = list(size = 11),
xref = 'paper', yref = 'paper',
showarrow = FALSE)
8.3.3.3 Spatial analysis
We created a state-level visitorshed map to show the number of visitors to overnight reservable sites from each state. We use the package zipcodeR
(Rozzi 2021) to determine the state in which each ZIP code, and thus visitor, is located, using the RIDB data. We then create a dataframe with the necessary geometries using the tigris
(Walker 2022) package. And finally we simplify the geometries to lower the load time within the App using the rmapshaper
(Teucher and Russell 2021) package.
# ZIP codes and respective states
<-
fips_list c("01", "02", "04", "05", "06", "08", "09", "10", "11", "12",
"13", "15", "16", "17", "18", "19", "20", "21", "22", "23",
"24", "25", "26", "27", "28", "29", "30", "31", "32", "33",
"34", "35", "36", "37", "38", "39", "40", "41", "42", "44",
"45", "46", "47", "48", "49", "50", "51", "53", "54", "55",
"56", "72")
<-
state_list c("AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "DC", "FL",
"GA", "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME",
"MD", "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH",
"NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI",
"SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI",
"WY", "PR")
<-
states_names_list c("Alabama", "Alaska", "Arizona", "Arkansas", "California",
"Colorado", "Connecticut", "Delaware", "District of Columbia",
"Florida","Georgia", "Hawaii", "Idaho", "Illinois", "Indiana",
"Iowa", "Kansas", "Kentucky", "Louisiana", "Maine","Maryland",
"Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri",
"Montana", "Nebraska", "Nevada", "New Hampshire","New Jersey",
"New Mexico", "New York", "North Carolina", "North Dakota", "Ohio",
"Oklahoma", "Oregon", "Pennsylvania", "Rhode Island","South Carolina",
"South Dakota", "Tennessee", "Texas", "Utah", "Vermont", "Virginia",
"Washington", "West Virginia", "Wisconsin","Wyoming", "Puerto Rico")
# create dataframe of states
<- as.data.frame(list(fips = fips_list,
df_states_fips state = state_list,
state_full = states_names_list))
# loop through state df to get all ZIP codes w/in state
<- data.frame()
df_states_zip_codes
for (i in seq_along(fips_list)){
<- zipcodeR::search_fips(state_fips = fips_list[[i]]) %>%
state select(zipcode, state)
<- rbind(df_states_zip_codes, state)
df_states_zip_codes
}
# add full state name, fips code, etc. to list of all ZIP codes for each state
<-
df_states_fips_zip_codes left_join(x = df_states_zip_codes,
y = df_states_fips,
by = "state") %>%
select(fips, zipcode) %>%
rename(zip_code = zipcode)
# get state geometries for full US
<- tigris::states(year = 2018) %>%
df_state_geometries_us select(GEOID, STUSPS, NAME, geometry) %>%
rename(fips = GEOID,
state_abbrev = STUSPS,
state = NAME) %>%
::ms_simplify(keep = 0.005, keep_shapes = TRUE) rmapshaper
We then calculate the number of reservations per state for the site (as chosen by the app user) and use tmaps
package (Tennekes 2022) to create an interactive map that colors states based on the total number of reservations for that state.
## -- data wrangle -- ##
# reactive data frame for siteInput
<- reactive ({
rdf
validate(
need(siteInput != "",
"Please select a reservable site to visualize.")
# EO validate
)
%>%
ridb_df filter(park %in% siteInput) %>%
select(agency, admin_unit, park, customer_zip_state_full,
customer_zip_state)
})
# value of total reservations for this park
<- nrow(rdf())
total_reservations
# number of reservations and % per state
<- rdf() %>%
map_data group_by(customer_zip_state_full, customer_zip_state) %>%
summarize(number_reservations = n(),
percentage_reservations =
percent((number_reservations / total_reservations),
accuracy = 0.01)) %>%
filter(!is.na(customer_zip_state_full))
# add state geometries
<-
map_data_geometries %>%
state_geometries_df left_join(y = map_data,
by = c("state_abbrev" = "customer_zip_state")) %>%
mutate_at(vars(number_reservations),
~replace(number_reservations, is.na(number_reservations), 0)) %>%
mutate_at(vars(percentage_reservations),
~replace(percentage_reservations, is.na(percentage_reservations),
0)) %>%
select(customer_zip_state_full, number_reservations,
percentage_reservations, geometry)
## -- create map -- ##
tmap_mode("view")
tm_shape(map_data_geometries) +
tm_borders(col = "grey", alpha = 0.5) +
tm_fill(col = "number_reservations",
title = "Number of Visits",
palette = "YlGn",
n = 10,
style = "jenks",
id = "customer_zip_state_full",
popup.vars = c("Total Visits" = "number_reservations",
"Percentage of All Visits" = "percentage_reservations")
+
) tm_view(set.view = c(-101.834335, 40.022356, 2)) +
tmap_options(basemaps = 'https://server.arcgisonline.com/ArcGIS/rest/services/Canvas/World_Light_Gray_Base/MapServer/tile/{z}/{y}/{x}')
A ZIP-code level visitorshed map shows the number of visitors to sites for each ZIP code within California only due to computational load of calculating for all ZIP codes in the United States. We then create a dataframe with the necessary geometries using the tigris
(Walker 2022) package. And finally we simplify the geometries to lower the load time within the App using the rmapshaper
(Teucher and Russell 2021) package.
## -- ZIP codes and respective states -- ##
<-
fips_list c("01", "02", "04", "05", "06", "08", "09", "10", "11", "12",
"13", "15", "16", "17", "18", "19", "20", "21", "22", "23",
"24", "25", "26", "27", "28", "29", "30", "31", "32", "33",
"34", "35", "36", "37", "38", "39", "40", "41", "42", "44",
"45", "46", "47", "48", "49", "50", "51", "53", "54", "55",
"56", "72")
<-
state_list c("AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "DC", "FL",
"GA", "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME",
"MD", "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH",
"NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI",
"SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI",
"WY", "PR")
<-
states_names_list c("Alabama", "Alaska", "Arizona", "Arkansas", "California",
"Colorado", "Connecticut", "Delaware", "District of Columbia",
"Florida","Georgia", "Hawaii", "Idaho", "Illinois", "Indiana",
"Iowa", "Kansas", "Kentucky", "Louisiana", "Maine","Maryland",
"Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri",
"Montana", "Nebraska", "Nevada", "New Hampshire","New Jersey",
"New Mexico", "New York", "North Carolina", "North Dakota", "Ohio",
"Oklahoma", "Oregon", "Pennsylvania", "Rhode Island","South Carolina",
"South Dakota", "Tennessee", "Texas", "Utah", "Vermont", "Virginia",
"Washington", "West Virginia", "Wisconsin","Wyoming", "Puerto Rico")
# create dataframe of states
<- as.data.frame(list(fips = fips_list,
df_states_fips state = state_list,
state_full = states_names_list))
# loop through state df to get all ZIP codes w/in state
<- data.frame()
df_states_zip_codes
for (i in seq_along(fips_list)){
<- zipcodeR::search_fips(state_fips = fips_list[[i]]) %>%
state select(zipcode, state)
<- rbind(df_states_zip_codes, state)
df_states_zip_codes
}# add full state name, fips code, etc. to list of all ZIP codes for each state
<-
df_states_fips_zip_codes left_join(x = df_states_zip_codes,
y = df_states_fips,
by = "state") %>%
rename(state_abbrev = state,
zip_code = zipcode) %>%
relocate(fips, .before = 2)
## -- CA ZIP geometries -- ##
<- get_acs(geography = "zcta", year = 2018,
df_zip_geometries_ca geometry = TRUE,
state = "California",
summary_var = "B01001_001",
variables = c(male = "B01001_002")) %>%
select(NAME, geometry) %>%
mutate(zip_code = str_sub(NAME, start = -5, end = -1)) %>%
select(zip_code, geometry) %>%
::ms_simplify(keep = 0.005, keep_shapes = TRUE) %>%
rmapshaperleft_join(y = df_states_fips_zip_codes,
by = "zip_code") %>%
relocate(zip_code, .before = 1)
We calculate the number of reservations per ZIP code for the site (as chosen by the app user) and use tmaps
to create an interactive map that colors ZIP codes based on the total number of reservations for that ZIP code.
## -- data wrangle -- ##
# reactive data frame for siteInput
<- reactive ({
rdf
validate(
need(siteInput != "",
"Please select a reservable site to visualize.")
# EO validate
)
%>%
ridb_df select(agency, admin_unit, park, customer_zip, facility_latitude,
%>%
facility_longitude) filter(park %in% siteInput)
})
# number of reservations per ZIP code
<- rdf() %>%
map_data group_by(customer_zip) %>%
summarize(number_reservations = n())
# join with ZIP geometries
<-
map_data_geometries %>%
zip_geometries_df left_join(map_data,
by = c("zip_code" = "customer_zip")) %>%
mutate_at(vars(number_reservations),
~replace(number_reservations, is.na(number_reservations), 0)) %>%
select(zip_code, number_reservations, geometry)
# value of total CA reservations for this park
<- sum(map_data_geometries$number_reservations)
total_reservations
# add percentage of all CA reservations for each ZIP
<- map_data_geometries %>%
map_data_geometries mutate(percentage_reservations =
percent((number_reservations / total_reservations),
accuracy = 0.01)) %>%
mutate_at(vars(percentage_reservations),
~replace(percentage_reservations, is.na(percentage_reservations),
0))
# calculate location of park for point on map
<- rdf() %>%
park_location_geom group_by(park) %>%
summarise(facility_latitude = median(facility_latitude),
facility_longitude = median(facility_longitude)) %>%
st_as_sf(coords = c("facility_latitude", "facility_longitude"),
crs = 4326) %>%
st_transform(crs = 4269) # using NAD83 because measured in meters
print(paste("The site's location has been calculated and it is:",
$geometry))
park_location_geom
## -- create map -- ##
tmap_mode("view")
tm_shape(map_data_geometries) +
tm_fill(
col = "number_reservations",
title = "Number of Visits",
palette = "PuRd",
style = "jenks",
n = 10,
popup.vars = c("Total Visits" = "number_reservations",
"Percentage of All CA Visits" = "percentage_reservations")) +
tm_shape(park_location_geom) +
tm_dots(col = "darkgreen", size = 0.1) +
tm_shape(park_location_geom) +
tm_symbols(shape = map_site_icon,
id = "park") +
tm_shape(ca_cities_df) +
tm_text(col = "black",
text = "city") +
tm_view(set.view = c(-119.559917, 37.061753, 6)) +
tmap_options(basemaps = 'https://server.arcgisonline.com/ArcGIS/rest/services/Canvas/World_Light_Gray_Base/MapServer/tile/{z}/{y}/{x}')
8.3.4 Data updates
The data_wrangle_and_clean.Rmd
document is set up to create the joined dataset necessary for the Outdoor Equity App. To access the ACS data we use the tidycensus
package (Walker and Herman 2021) to access the necessary data via API, meaning data updates are not necessary outside of the .Rmd
document. The RIDB data is accesses via direct download from the Recreation.gov website. See the How to Expand the Outdoor Equity App Section on expanding the time periods included in the App.
8.3.5 Server Hosting
The Outdoor Equity App is hosted on server owned by the Bren School at the University of California, Santa Barbara until the end of 2022. At that time the App will be moved to the shiny.io server for the duration of 2023.
8.4 How to Expand the Outdoor Equity App
The Outdoor Equity App currently only includes reservation data from California for 2018 due to server hosting capabilities. Expanding the App to include additional years and states will require a larger server and likely more computing power.
8.4.1 Temporal Expansions
To expand the years of data included in the Outdoor Equity App, users must first update the data_wrangle_and_clean.Rmd
where all data cleaning and wrangling occurs outside of the shiny app code.
For RIDB data from 2018 and early, users can simply include new code chunks in the .Rmd
document that use the following functions to create a joined dataset for each new year: function_ridb_subset-pre2018.R
, function_ridb_variable_calculate-pre2018.R
, function_acs_race.R
, function_acs_median_income.R
, function_acs_education.R
, function_acs_language.R
, and function_join_ridb_acs.R
.
For RIDB data from 2019 and later, users must update the function_ridb_subset-post2019.R
function. We did not continue to update this function once we decided to include only 2018 data in the App. We recommend updating this function to first select only the necessary variables, then rename the variables so that they match the 2018 and earlier naming conventions, and finally copy and paste the remaining code used in the function_ridb_subset-pre2018.R
function. Users can then add new code chunks to the .Rmd
document that use the following functions to create a joined dataset for each new year: function_ridb_subset-post2019.R
, function_ridb_variable_calculate-pre2018.R
, function_acs_race.R
, function_acs_median_income.R
, function_acs_education.R
, function_acs_language.R
, and function_join_ridb_acs.R
.
RIDB data from 2019 and later may have additional variables available, so we also recommend exploring all available variables to decide if adding in additional variables is appropriate. If users decide to add additional variables, edits will have to be made to all custom functions within the data_preparation
and outdoor-equity-app
directories.
Data standards for the RIDB data is in the process of being updated as of June 2022. Users should check to see if this impacts any decisions in adding additional years to the Outdoor Equity App.
For the Outdoor Equity App consider these updates to these files:
global.R
:- Add any new functions, packages, or data
ui.R
:- Create input widgets in Analysis tabs (data summary, data relationships, and visitorsheds) that allow for a user to select a specific year or mulitple years (consider making a function for this if the input is the same across all tabs)
server.R
:- Update plot or map functions that use a reactive data frame that includes the temporal input from
ui.R
- Update plot or map functions that use a reactive data frame that includes the temporal input from
Additionally, for the entire App consider how load time will be affected by adding more years of data. It might be helpful to consider creating separated datasets or creating functions that create datasets that contain only the necessary data needed to create the visual(s).
8.4.2 Spatial Expansions
To expand the states included in the Outdoor Equity App, users must update the data_wrangle_and_clean.Rmd
where all data cleaning and wrangling occurs outside of the shiny app code.
The function_ridb_variable_calculate-pre2018.R
function has a section that manipulates the character strings of the agency
, admin_unit
, and park
variables. This code would likely need to be changed for additional states as the specific irregularities in character strings may differ state to state. We recommend creating a new function for each additional state. See the RIDB Data Section for a more detailed description of this function code. Once a new custom function is created for each state, code chunks can be added to the .Rmd
document to clean and wrangle data for additional states.
For the Outdoor Equity App consider these updates to these files:
global.R
:- Add any new functions, packages or data
- Update or create new objects that contain information about sites and agencies
- Update agency to administrative unit dictionary, and administrative unit to park dictionary so that these key value pairs appear for states beyond California
ui.R
:- Create an input widget that allows for someone to select a year or multiple years
- Build out an informational box that contains basic information about the site chosen
server. R
:- Update
state_visitorshed_map()
function if the time input or other data needs to be added to this function - Update
ca_zip_code_visitorshed_map()
function to allow for a specific state to be called and the data associated with it
- Update
Additionally, for the entire App consider load time especially because spatial data is much larger and therefore takes much longer to load. It might be helpful to consider the structure of the data and which data gets loaded into the App to make the visuals, so that only the necessary data is being used in the back end of the App. See Technical Challenges Section for more insight and suggestions.
8.4.3 Statistical Analysis
Currently the Outdoor Equity App only visualizes patterns in the data, but does not allow a user to explore why these patterns are occurring. We recommend considering the following additions to expand the statistical analysis abilities of the App:
- Simple statistical analysis (such as linear regressions, ANOVA, etc.) to accompany all plots
- Ability to combine multiple campsites in a single plot or map to explore patterns and trends across geographic areas and years
For the Outdoor Equity App consider these updates to these files:
global.R
:- Add any new functions, packages, or data
ui.R
:- Create and add a new tab to the
navbarPage()
layout using the functiontabPanel()
for the statistical analysis - Build out layout in the new tab. To match the layout of the other tabs use
fluidRow()
andbox()
withintabPanel()
- Create necessary inputs to create visuals or run statistical tests
- Create and add a new tab to the
server.R
:- Create necessary objects to render the visuals desired (i.e. reactive data frames, render plots or tables)
- Create necessary conditional outputs based on an input using
observeEvents()
Because statistical analysis was beyond the scope of our project, please note that these updates are not an exhaustive list of what needs to be done to successfully add statistical analysis to the App.