Section 8 User Documentation

8.1 Purpose of the Outdoor Equity App

The Outdoor Equity App provides users with tools to explore trends at different overnight reservable sites to analyze access to these sites. The intended audience is federal public land managers and researchers as well as nonprofit organizations and recreation users.

The Recreation Information Database (RIDB) data are comprehensive when it comes to information regarding the site and reservation, but do not include information about the visitor outside of their home ZIP code. US Census American Community Survey (ACS) data is used to approximate socioeconomic demographics by joining information about the visitors’ home ZIP code to the RIDB data.

8.2 How to Use the Outdoor Equity App

8.2.1 About the App

The About tab of the Outdoor Equity App includes background information about what the App is, why outdoor recreation is important, the App creators, and what data is used in the App. These sections are similar to the About Section of this technical documentation. The About tab also includes example questions that a user might explore through the different parts of the Analysis tab.

The Analysis tab of the App consists of three parts Data Summary, Data Relationship and Visitorshed Maps. Each of these pages includes a brief explanatory section of how to interpret the plot or graph, a section to subset the data to the desired campsite, and the plot or map outputs.

The Metadata tab of the App includes metadata for all variables in the combined RIDB and ACS dataset. This section mirrors the Metadata section in the Products and Deliverables Section of this document.

The Data Download section of the App allows a user to download a subset of data include as many or as few campsites and variables as they require for further analysis or use.

8.3 How to Maintain the Outdoor Equity App

8.3.1 Repository Directory Structure

Our code is version controlled in a GitHub repository. This includes the data cleaning, wrangling, and analysis, creation of plots and maps, and structure of the shiny app. An overview of the directory structure can be found in the Appendix Section.

8.3.2 Data Preparation Methods

8.3.2.1 RIDB Data

RIDB data data are available through direct download as CSV files or via the API as JSON files via Recreation.gov. API access requires creating a Recreation.gov account and requesting second tier API access via the Recreation.gov website’s Contact Us page. CSV files are readily available for download via the Recreation.gov website. Data are collected each time a visitor makes a reservation through Recreation.gov. Data packages are posted annually in the spring by Recreation One Stop (R1S) and contain the previous fiscal year’s reservations (ex: the 2018 package includes 2018-10-01 through 2018-09-30). Data packages are available for download from 2006 to present. Each annual data package file contains a range between 2 million and 5 million observations (or reservations) and includes variables in character, numeric, and date/time formats about each reservation. A shift in the data collection and storage processes occurred in 2019, changing what variables are available and how they are labeled. Currently, the Outdoor Equity App contains only data for reservable sites in California and reservations for fiscal year 2018 due to time limitations of this project. To see more about expanding the App temporally and spatially see the How to Expand the Outdoor Equity App Section.

We created a reproducible workflow for cleaning and wrangling data, which is employed in the data_wrangle_and_clean.Rmd document that sources custom functions to prepare RIDB data for joining with ACS data. All custom functions are listed in the Access, Clean, and Wrangle Data Section of this document. These functions rely heavily on functions from the tidyverse collection of packages (Wickham 2021).

First, we use the function_ridb_subset-pre2018.R to subset the RIDB data, filtering only reservations within the selected state that are listed as “Overnight” reservations within the use_type variable. For the 2018 California dataset this results in a starting “raw” data frame with 521,682 reservations

# filter for state
filter(facility_state == state_abbrev) %>%
  # filter for use type
  filter(use_type == "Overnight")

We then select only the necessary variables, including information about the site (agency, park or forest name, site name, site type, and site location) and information about the reservation (home ZIP come, total paid, visit start and end dates, visit order date, and number of people in party).

# select variables
select(c("agency", "parent_location", "region_description", 
         "park", "site_type", "facility_id", "facility_state", 
         "facility_longitude", "facility_latitude", 
         "customer_zip", "total_paid", "start_date",
         "end_date", "order_date", "number_of_people")) %>%
  mutate(site_type = tolower(site_type)) %>%
  filter(!site_type %in% c("historic tour", "hiking zone", 
                           "group picnic area", "cave tour",
                           "management", "picnic", 
                           "entry point", "trailhead"))

Next we normalize the customer ZIP code values. This includes filtering for only US ZIP codes and shortening all 9 digit ZIP codes to include only the first 5 digits.

# filter out invalid ZIP codes
filter(str_detect(string = customer_zip,
                  pattern = "^[:digit:]{5}(?!.)") |
         str_detect(string = customer_zip,
                    pattern = "^[:digit:]{5}(?=-)")) %>%
  filter(!customer_zip %in% c("00000", "99999")) %>%
  mutate(customer_zip = str_extract(string = customer_zip,
                                    pattern = "[:digit:]{5}"))

This function results in the removal of 24,866 reservations (or 4.77%) from the original “raw” dataset that included all reservations for reservable overnight campsites in California.

We created a second custom function function_ridb_variable_calculate-pre2018.R to calculate and manipulate variables of interest. We use start, end, and order dates to calculate the lengths of stay and booking windows (number of days from order to start date) of each reservation. The booking window calculations return a number of results that are negative. This is a known issue that others working with the RIDB data have encountered. This resulted in 4,530 reservations (or 0.87%) without a valid booking window, which are removed for plots that visualize the booking window variable.

mutate(start_date = as.Date(start_date),
       end_date = as.Date(end_date),
       order_date = as.Date(order_date),
       # calculate new variables
       length_of_stay = as.numeric(difftime(end_date, start_date), 
                                   units = "days"),
       booking_window = as.numeric(difftime(start_date, order_date), 
                                   units = "days"))

We calculate daily cost per reservation by dividing total costs by lengths of stay. We then calculated daily cost per visitor by dividing the daily cost by the number of visitors.

# calculate new variables
mutate(daily_cost = total_paid / length_of_stay,
       daily_cost_per_visitor = daily_cost / number_of_people)

We aggregate the site_type variable to create 7 broader site categories.

# aggregate site type
mutate(aggregated_site_type = 
         case_when(site_type %in% c("walk to", "hike to", 
                                    "group hike to", "group walk to"
         ) ~ "remote",
         site_type %in% c("cabin nonelectric", "cabin electric", 
                          "yurt","shelter nonelectric"
         ) ~ "shelter",
         site_type %in% c("boat in", "anchorage") ~ "water",
         site_type %in% c("group equestrian", 
                          "equestrian nonelectric"
         ) ~ "equestrian",
         site_type %in% c("rv nonelectric", "rv electric", 
                          "group rv area nonelectric"
         ) ~ "rv only",
         site_type %in% c("group standard nonelectric", 
                          "standard nonelectric",
                          "standard electric", 
                          "group standard area nonelectric",
                          "group standard electric"
         ) ~ "rv or tent",
         site_type %in% c("tent only nonelectric", 
                          "group tent only area nonelectric",
                          "tent only electric"
         ) ~ "tent only"))

We create an administrative unit variable by combining the parent_location and region_description variables as different federal agencies track the administrative unit information within different variables. Then we update the agency, admin_uni and park variable character strings using multiple functions from the stringr package (Wickham 2019). These updates expand acronyms, remove unnecessary characters (such as “–” or location codes), and normalize any discrepencies in characater strings.

mutate(admin_unit = 
         case_when(agency == "USFS" ~ parent_location,
                   agency %in% c(
                     "NPS", "BOR", "USACE"
                   ) ~ region_description)) %>% 
  # edit values
  mutate(
    # agency abbreviations to names
    agency = str_replace(string = agency,
                         pattern = "NPS",
                         replacement = "National Park Service"),
    agency = str_replace(string = agency,
                         pattern = "USFS", 
                         replacement = "US Forest Service"),
    agency = str_replace(string = agency,
                         pattern = "USACE",
                         replacement = "US Army Corps of Engineers"),
    agency = str_replace(string = agency,
                         pattern = "BOR",
                         replacement = "Bureau of Reclamation"),
    # update admin_unit values (generic)
    admin_unit = str_replace(string = admin_unit,
                             pattern = paste(c("NF - FS", "NF -FS", 
                                               "NF- FS", "NF-FS", 
                                               "-FS", " - FS"), 
                                             collapse = "|"),
                             replacement = "National Forest"),
    admin_unit = str_to_title(admin_unit),
    admin_unit = str_replace(string = admin_unit,
                             pattern = "And",
                             replacement = "&"),
    # update park values (generic)
    park = str_remove(string = park,
                      pattern = paste(c("\\(.*", " \\(.*",
                                        "---.*", " ---.*",
                                        ",.*"), 
                                      collapse = "|")),
    park = str_to_title(park),
    park = str_replace(string = park,
                       pattern = "Cg",
                       replacement = "Campground"),
    park = str_replace(string = park,
                       pattern = "Nhp",
                       replacement = "National Historic Park"),
    park = str_replace(string = park,
                       pattern = "@",
                       replacement = "At"),
    park = str_replace(string = park,
                       pattern = "&",
                       replacement = "And"),
    park = str_replace(string = park,
                       pattern = paste(c("/", " / "), collapse = "|"),
                       replacement = " "),
    park = str_remove_all(string = park,
                          pattern = " \\d.*"),
    # update park values (CA specific)
    park = str_remove(string = park,
                      pattern = paste(c(" - Angeles Nf", " -Hwy"), 
                                      collapse = "|")),
    park = str_replace(string = park,
                       pattern = "Tunnel Mills Il",
                       replacement = "Tunnel Mills"))

We calculate distance traveled by measuring the distance from the latitude and longitude coordinate facility locations to the centroid of the home ZIP code. Here we utilize both the tidycensus package (Walker and Herman 2021) and the sf package (Pebesma 2022).

# bootstrap geometries and reproject to NAD 83
df_geometries <- df %>% 
  st_as_sf(coords = c("facility_longitude", "facility_latitude"),
           crs = 4326) %>% 
  st_transform(crs = 4269) # using NAD83 because measured in meters

# get centroid of geometries for all US ZIP codes 
df_zip_centroids_us <- 
  get_acs(geography = "zcta", year = 2018, 
          geometry = TRUE, 
          summary_var = "B01001_001",
          survey = "acs5",
          variables = c(male = "B01001_002")) %>% 
  select(NAME, geometry) %>% 
  mutate(zip_code = str_sub(NAME, start = -5, end = -1)) %>% 
  select(zip_code, geometry) %>% 
  st_centroid()

# join data and calculate `distance_traveled` variable
df_joined_geometries <- 
  left_join(x = df_geometries %>% as.data.frame(),
            y = df_zip_centroids_us %>% as.data.frame(), 
            by = c("customer_zip" = "zip_code")) %>%
  st_sf(sf_column_name = 'geometry.x') %>% 
  mutate(distance_traveled_m = st_distance(x = geometry.x, 
                                           y = geometry.y,
                                           by_element = TRUE),
         distance_traveled_m = as.numeric(distance_traveled_m)) 

And finally, we add a variable indicating in which state or territory each customer zip code is located. This portion of the code utilizes the zipcodeR package (Rozzi 2021).

# create df of fips and full state names
fips_list <- c(
  "01", "02", "04", "05", "06", "08", "09", "10", "11", "12", 
  "13", "15", "16", "17", "18", "19", "20", "21", "22", "23", 
  "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", 
  "34", "35", "36", "37", "38", "39", "40", "41", "42", "44", 
  "45", "46", "47", "48", "49", "50", "51", "53", "54", "55", 
  "56", "72")
state_list <- c(
  "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "DC", "FL",
  "GA", "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME",
  "MD", "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH",
  "NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI",
  "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI",
  "WY", "PR")
states_names_list <- c(
  "Alabama", "Alaska", "Arizona", "Arkansas", "California", 
  "Colorado", "Connecticut", "Delaware", 
  "District of Columbia", "Florida","Georgia", "Hawaii", 
  "Idaho", "Illinois", "Indiana", "Iowa", "Kansas", 
  "Kentucky", "Louisiana", "Maine","Maryland", 
  "Massachusetts", "Michigan", "Minnesota", "Mississippi", 
  "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire",
  "New Jersey", "New Mexico", "New York", "North Carolina", 
  "North Dakota", "Ohio", "Oklahoma", "Oregon", "Pennsylvania", 
  "Rhode Island","South Carolina", "South Dakota", "Tennessee", 
  "Texas", "Utah", "Vermont", "Virginia", "Washington", 
  "West Virginia", "Wisconsin","Wyoming", "Puerto Rico")
df_states_fips <- as.data.frame(list(fips = fips_list,
                                     state = state_list,
                                     state_full = states_names_list))

# loop through state df to get all ZIP codes w/in state
df_states_zip_codes <- data.frame()

for (i in seq_along(fips_list)){
  state <- zipcodeR::search_fips(state_fips = fips_list[[i]]) %>% 
    select(zipcode, state)
  df_states_zip_codes <- rbind(df_states_zip_codes, state)
}

# add full state name and fips code to list of all ZIP codes for each state
df_states_fips_zip_codes <- 
  left_join(x = df_states_zip_codes,
            y = df_states_fips,
            by = "state") %>% 
  select(-fips) %>% 
  rename(customer_zip_state = state,
         customer_zip_state_full = state_full,
         zip_code = zipcode)

8.3.2.2 U.S. Census Data

US Census data is publicly accessible in many ways. Our product utilizes the R package tidycensus (Walker and Herman 2021) to access the necessary variables from the 2018 American Community Survey (ACS) via API. API access requires an api key, which can then be saved into your RStudio environment. Learn more about API keys and working with tidycensus here.

# API set up
# Only have to run this the first time using on a new machine
census_api <- source("private/census-api.R")
census_api_key(key = census_api[[1]], install = TRUE, overwrite = TRUE)
# run in console:
readRenviron("~/.Renviron")

Sample data are collected for the ACS each year and includes many variables that cover social, economic, housing, and demographic characteristics. All variable options can be easily viewed using the code below:

# look at variable options
View(load_variables(2018, "acs5", cache = TRUE))

The Outdoor Equity App utilizes the ACS 5-year data, which is an estimate representing data collected over the designated 5 year period. We used the 5-year ACS data over the 1-year ACS data because it increases “statistical reliability of the data for less populated areas and small population subgroups” (Bureau 2022).

The variables included in the App include median-income, race, language(s) spoken at home, and highest level of education attained. All variables are represented as estimates in numeric format for a census tract. Data are called by geographic region, in our case the ZIP code tabulation area, and include an estimated number of people that fall into each category within each ZIP code, a margin of error, and an estimated number of total people in the area. Within our custom functions function_acs_education.R, function_acs_language.R, function_acs_median_income.R, and function_acs_race.R we first imported just the necessary columns for each ACS variable. See Table 8.1 for an example of the race variable.

# import variables for race
race_df <- 
  get_acs(
    geography = "zcta", year = 2018,
    state = "CA",
    summary_var = "B03002_001", #Estimate!!Total: 
    variables = c(
      white = "B03002_003", #White alone
      black = "B03002_004", #Black or African American alone
      native_american = "B03002_005", #American Indian and Alaska Native alone
      asian = "B03002_006", #Asian alone
      pacific_islander = "B03002_007", #Native Hawaiian/Other Pacific Islander alone
      other = "B03002_008", #Some other race alone
      multiracial = "B03002_009", #Two or more races
      hispanic_latinx = "B03002_012" #Hispanic or Latino
    ))
Table 8.1: Example of raw ACS Race Variable.
GEOID NAME variable estimate moe summary_est summary_moe
90001 ZCTA5 90001 white 413 165 58975 1725
90001 ZCTA5 90001 black 5138 646 58975 1725
90001 ZCTA5 90001 native_american 37 42 58975 1725
90001 ZCTA5 90001 asian 67 38 58975 1725
90001 ZCTA5 90001 pacific_islander 0 29 58975 1725
90001 ZCTA5 90001 other 168 169 58975 1725
90001 ZCTA5 90001 multiracial 66 44 58975 1725
90001 ZCTA5 90001 hispanic_latinx 53086 1591 58975 1725
90002 ZCTA5 90002 white 223 81 53111 2031
90002 ZCTA5 90002 black 10110 876 53111 2031

We calculated percentages for the categories within each ACS variable for race, language(s) spoken in the home, and highest level of education by dividing the estimate for a category by the estimated total population for that ZIP code.

race_df_percent <- 
  race_df %>% 
  group_by(zip_code, race) %>% 
  summarise(estimate = sum(estimate),
            moe = sum(moe),
            summary_est = unique(summary_est),
            summary_moe = unique(summary_moe),
            percent = estimate / summary_est)

For median-income, we use the estimated median-income as is, without further adjustments.

median_income_df <- 
  get_acs(geography = "zcta", year = 2018,
          state = "CA",
          variables = c(
            median_income = "B19013_001" 
          )) %>% 
  clean_names() %>% 
  rename(median_income = estimate) %>% 
  mutate(zip_code = str_sub(name, start = -5, end = -1)) %>% 
  select(median_income, zip_code)

We transform teh ACS dataframe by pivoting wider, which moves the different categories and percentages (ex: racial groups) to their own columns to create a single row for each ZIP code. This is necessary for joining the ACS data sets to the RIDS data set.

select(zip_code, summary_est, race, percent) %>% 
  rename(zip_code_population = summary_est) %>% 
  # create column for each percentage for each group (pivot_wider)
  # necessary to be able to left_join() with RIDB data
  pivot_wider(names_from = "race",
              values_from = "percent")

Finally, we use the function_join_ridb_acs.R custom functions to join the RIDB and ACS data to create the dataset that is available for download on the App and utilized for all plots and maps.

joined_df <- 
  left_join(x = ridb_df,
            y = acs_df_race,
            by = c("customer_zip" = "zip_code")) %>% 
  left_join(y = acs_df_education,
            by = c("customer_zip" = "zip_code")) %>% 
  left_join(y = acs_df_median_income,
            by = c("customer_zip" = "zip_code")) %>% 
  left_join(y = acs_df_language,
            by = c("customer_zip" = "zip_code"))

8.3.3 Statistical Analysis and Data Wrangling for Plots

Each type of plot and map require unique data wrangling which is explained in this section.

8.3.3.1 Data Summary

We created custom functions for data wrangling and plotting for use within the Outdoor Equity App code. Wrangling begins with filtering based on the user’s choice campsite input. We then further subset the dataset to include only the necessary columns for plotting.

# example for booking window data summary
booking_window_rdf <- reactive({
  validate(
    need(siteInput != "",
         "Please select a reservable site to visualize.")
  ) # EO validate
  
  ridb_df %>%
    filter(park %in% siteInput,
           booking_window > 0,
           booking_window != "Inf") %>% 
    select(park, booking_window) %>% 
    filter(!is.na(booking_window))
})

We create histograms, bar charts, or density plots to show the distribution of the data for each variable. For ACS variables, data from reservations are plotted against data for the full California population. This allows a user to compare the distribution within the reservation data to that of the California census in order to see where reservations either under- or over-represent that variable as compared to California residents. We used the plotly package (Sievert et al. 2021) to allow a user to hover over the plot to view helper text. We also included additional helper text outside of the plots for more complicated plots.

## -- example for booking window data summary -- ##
# parameters
hist_colors <- c("#64863C", "#466C04")
quant_80_color <- c("#FACE00")
caption_color <- c("#ac8d00")

# plot for shiny app
plotly <- ggplot(
  data = booking_window_rdf()) +
  geom_histogram(
    aes(x = booking_window,
        text = paste0(percent(..count.. / nrow(booking_window_rdf()), 
                              accuracy = 0.1), 
                      " of all visits are reserved between ", xmin, 
                      " and ", xmax, 
                      " days before the start of the visit", 
                      "<br>",
                      "(Total reservations to site: ",
                      comma(nrow(booking_window_rdf()), accuracy = 1), 
                      ")")),
    binwidth = 7,
    center = 7 / 2,
    fill = hist_colors[[1]], 
    col = hist_colors[[2]], size = 0.05) +
  labs(x = "Days in advance before visit (each bar = 1 week)",
       y = "") +
  scale_x_continuous(limits = c(0, x_max)) +
  geom_vline(xintercept = quant_80,
             linetype = "dashed", alpha = 0.5, color = quant_80_color) +
  theme_minimal() +
  theme(plot.background = element_rect("white"),
        panel.grid.major.y = element_blank())

# add 6 month / 1 year annotation if data allows
if (x_max <= 180) {
  # don't include 6 month or 1 year annotation
  plotly
  
} else if (x_max > 180 & x_max <= 360){
  # include 6 month annotation
  plotly <- plotly +
    geom_vline(xintercept = 180, 
               linetype = "dashed", size = .3, alpha = .5) +
    annotate("text", label = "6 months",  size = 3,
             x = 180, y = 5)
  
} else if (x_max >= 360) {
  
  # include 6 month and 1 year annotation
  plotly <- plotly +
    geom_vline(xintercept = 180, 
               linetype = "dashed", size = .3, alpha = .5) +
    annotate("text", label = "6 months", size = 3,
             x = 180, y = 5) +
    geom_vline(xintercept = 360,
               linetype = "dashed", size = .3, alpha = .5) +
    annotate("text", label = "1 year", size = 3,
             x = 360, y = 5)
} # EO else if for plotly

ggplotly(plotly,
         tooltip = list("text"),
         dynamicTicks = TRUE) %>% 
  layout(title = 
           list(text = 
                  paste0('<b>', siteInput, '<br>', admin_unitInput, '</b>',
                         '<br>',
                         'Number of days from reservation to start of visit'),
                font = list(size = 15)),
         xaxis = list(separatethousands = TRUE),
         yaxis = list(separatethousands = TRUE),
         margin = list(b = 130, t = 100), 
         annotations =  
           list(x = x_max/2, y = -0.6, 
                text = 
                  paste0("80% of reservations reserve their visit less than ", 
                         '<b>', quant_80, '</b>', 
                         " days before the start date", 
                         "<br>",
                         "(shown on plot with yellow dotted line)."), 
                showarrow = F, 
                xre = 'paper', yref = 'paper', 
                             xanchor = 'middle', yanchor = 'auto', 
                             xshift = 0, yshift = 0,
                             font = list(size = 12, color = caption_color))) %>%
  config(modeBarButtonsToRemove = list("pan", "select", "lasso2d", "autoScale2d", 
                                       "hoverClosestCartesian", 
                                       "hoverCompareCartesian"))

8.3.3.2 Data Relationships

Visualizing the relationship between RIDB and ACS variables is challenging because the socioeconomic data available are an estimate of the ZIP code population, rather than data of the individual making the reservation. In order to create simple, easy to interpret plots, we determined whether reservations were from a location with “high” percentages of a given ACS variable category (ex: one of the eight racial groups). We determined “high” as anything above the weighted 3rd quartile (weighted based on the California census). This method allows for reservations to be included in the “high” category for multiple values within one ACS variable (ex: a reservation might be from a ZIP code that falls into the “high” category for both Black and Asian folks). By allowing a reservation to appear in multiple categories within a single ACS variable, the uncertainty of the reserver’s actual socioeconomic status is retained.

Example code for calculating the “high” threshold for language can be seen below:

# `function_acs_top_quartile_language.R` code
threshold_df <- acs_df %>%
  # filter variables of interest
  select(zip_code, english_only, not_english_only, mean_zip_code_population) %>% 
  pivot_longer(cols = 2:3,
               names_to = "language",
               values_to = "language_percentage") %>% 
  # filter category of interest
  filter(language == language_group) %>% 
  drop_na(language_percentage)

# weighted median value (weighted based on ZIP code populations)
weighted_half <- weighted.mean(x = threshold_df$language_percentage, 
                               w = threshold_df$mean_zip_code_population)

# drop rows below weighted median
df_half <- threshold_df %>% filter(language_percentage >= weighted_half)

# weighted 3rd quartile
## weighted median value of top half (weighted based on ZIP code populations)
weighted_quartile <- weighted.mean(x = df_half$language_percentage, 
                                   w = df_half$mean_zip_code_population)
source("r/function_acs_top_quartile_language.R")

# language groups
language_group <- c("english_only", "not_english_only")

# calculate value of 3rd quartile for each language group
language_quants_df <-
  language_group %>%
  map_dbl(language_top_quartile, acs_df = data_acs_ca_all) %>%
  cbind("language_group" = language_group,
        "weighted_quartile" = .) %>%
  as.data.frame() %>% 
  mutate(language_group = str_replace_all(string = language_group,
                                          pattern = "_",
                                          replacement = " "),
         language_group = str_to_title(language_group),
         weighted_quartile = percent(as.numeric(weighted_quartile), 
                                     accuracy = 0.01)) %>% 
  rename("Language Group" = language_group,
         "Threshold" = weighted_quartile)

The thresholds used within the Outdoor Equity App can be seen in Table 8.2 and Table 8.3.

Table 8.2: Threshold values for ACS language and education variables.
Language Group Threshold
English Only 72.98%
Not English Only 62.75%
Education Group Threshold
Hs Ged Or Below 54.21%
Some College 25.62%
College 36.85%
Master Or Above 22.02%
Table 8.3: Threshold values for ACS race variable.
Race Group Threshold
Other 0.52%
Pacific Islander 0.87%
Multiracial 4.38%
Asian 30.29%
Black 13.16%
White 59.08%
Native American 0.93%
Hispanic Latinx 62.07%

We create lollipop and bar plots to show the relationships of the data for each set of variables. We used the plotly package (Sievert et al. 2021) to allow a user to hover over the plot to view helper text. An example of the code used to create the education compared to booking window plot is show below:

## -- data wrangle -- ##s
# create reactive dataframe and further subset
rdf <- reactive ({
  
  validate(
    need(siteInput != "",
         "Please select a reservable site to visualize.")
  ) # EO validate
  
  education_top_quartile_df %>%
    # filter to user site of choice
    filter(park == siteInput) %>%
    # select to variables of interest
    select(park, customer_zip, 
           education, education_percentage, education_y_lab, 
           booking_window) %>% 
    drop_na(booking_window, education_percentage) %>% 
    filter(booking_window >= 0) %>% 
    # summarize to inner quartile range, median, and total reservations
    group_by(education, education_y_lab) %>% 
    summarize(median_booking_window = median(booking_window),
              quartile_lower = quantile(booking_window)[[2]],
              quartile_upper = quantile(booking_window)[[4]],
              count = n())
  
}) #EO reactive df

validate(need(
  nrow(rdf()) > 0,
  paste0("There are no reservations to ", siteInput, ", ", admin_unitInput, 
         " that come from communities in the high range for any educational", 
         "categories.")
)) # EO validate

## -- create plot -- ##
# parameters
education_group_colors <-
  c(
    "HS, GED,\nor Below" = "#a6cee3",
    "Some College or\nTrade School"  = "#1f78b4",
    "Associates or\nBachelors Degree" = "#b2df8a",
    "Masters Degree\nor Above" = "#33a02c"
  )

# create plot
plotly <- ggplot(data = rdf(),
                 aes(x = median_booking_window,
                     y = education_y_lab)) +
  geom_segment(aes(xend = 0, yend = education_y_lab)) +
  geom_point(
    aes(
      color = education_y_lab,
      fill = education_y_lab,
      text = paste0(
        comma(count, accuracy = 1),
        " unique visits were made by people who live in ZIP codes with high ", 
        " rates of",
        "<br>", education, 
        " as the maximum level of education. ", 
        "Typically these visitors reserved their visit between",
        "<br>", comma(quartile_lower, accuracy = 1), " and ", 
        comma(quartile_upper, accuracy = 1),
        " days before the start of their trip, with a median booking window of ", 
        comma(median_booking_window, accuracy = 1),  " days."
      )
    ),
    size = 3.5, shape = 21, stroke = 2 ) +
  scale_y_discrete(expand = c(0.45, 0)) +
  scale_fill_manual(values = education_group_colors) +
  scale_color_manual(values = education_group_colors) +
  labs(x = paste("Estimated Number of Days in Advance Site is Reserved"),
       y = "") +
  theme_minimal() +
  theme(
    plot.background = element_rect("white"),
    panel.grid.major.y = element_blank(),
    legend.position = "none")

# create plotly
ggplotly(plotly,
         tooltip = list("text")) %>%
  config(
    modeBarButtonsToRemove = 
      list("zoom", "pan", "select", "lasso2d", "autoScale2d",
           "hoverClosestCartesian", "hoverCompareCartesian")) %>%
  layout(title = list(
    text = paste0( '<b>', siteInput, '<br>', admin_unitInput, '</b>', '<br>',
                   'Number of Days in Advance Site is Reserved by Visitors with", 
                   "Different Levels of Education'),
    font = list(size = 15) )) %>%
  add_annotations(
    text = "Reservations from ZIP codes<br>with high proportions of:",
    x = -0.15, y = 0.9, 
    font = list(size = 11),
    xref = 'paper', yref = 'paper', 
    showarrow = FALSE)

8.3.3.3 Spatial analysis

We created a state-level visitorshed map to show the number of visitors to overnight reservable sites from each state. We use the package zipcodeR (Rozzi 2021) to determine the state in which each ZIP code, and thus visitor, is located, using the RIDB data. We then create a dataframe with the necessary geometries using the tigris (Walker 2022) package. And finally we simplify the geometries to lower the load time within the App using the rmapshaper (Teucher and Russell 2021) package.

# ZIP codes and respective states
fips_list <- 
  c("01", "02", "04", "05", "06", "08", "09", "10", "11", "12", 
    "13", "15", "16", "17", "18", "19", "20", "21", "22", "23", 
    "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", 
    "34", "35", "36", "37", "38", "39", "40", "41", "42", "44", 
    "45", "46", "47", "48", "49", "50", "51", "53", "54", "55", 
    "56", "72")
state_list <- 
  c("AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "DC", "FL",
    "GA", "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME",
    "MD", "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH",
    "NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI",
    "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI",
    "WY", "PR")
states_names_list <- 
  c("Alabama", "Alaska", "Arizona", "Arkansas", "California", 
    "Colorado", "Connecticut", "Delaware", "District of Columbia", 
    "Florida","Georgia", "Hawaii", "Idaho", "Illinois", "Indiana", 
    "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine","Maryland", 
    "Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri", 
    "Montana", "Nebraska", "Nevada", "New Hampshire","New Jersey", 
    "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio", 
    "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island","South Carolina", 
    "South Dakota", "Tennessee", "Texas", "Utah", "Vermont", "Virginia", 
    "Washington", "West Virginia", "Wisconsin","Wyoming", "Puerto Rico")

# create dataframe of states
df_states_fips <- as.data.frame(list(fips = fips_list,
                                     state = state_list,
                                     state_full = states_names_list))

# loop through state df to get all ZIP codes w/in state
df_states_zip_codes <- data.frame()

for (i in seq_along(fips_list)){
  state <- zipcodeR::search_fips(state_fips = fips_list[[i]]) %>% 
    select(zipcode, state)
  df_states_zip_codes <- rbind(df_states_zip_codes, state)
}

# add full state name, fips code, etc. to list of all ZIP codes for each state
df_states_fips_zip_codes <- 
  left_join(x = df_states_zip_codes,
            y = df_states_fips,
            by = "state") %>% 
  select(fips, zipcode) %>% 
  rename(zip_code = zipcode)

# get state geometries for full US
df_state_geometries_us <- tigris::states(year = 2018) %>%
  select(GEOID, STUSPS, NAME, geometry) %>% 
  rename(fips = GEOID,
         state_abbrev = STUSPS,
         state = NAME) %>% 
  rmapshaper::ms_simplify(keep = 0.005, keep_shapes = TRUE)

We then calculate the number of reservations per state for the site (as chosen by the app user) and use tmaps package (Tennekes 2022) to create an interactive map that colors states based on the total number of reservations for that state.

## -- data wrangle -- ##
# reactive data frame for siteInput
rdf <- reactive ({
  
  validate(
    need(siteInput != "",
         "Please select a reservable site to visualize.")
  ) # EO validate
  
  ridb_df %>% 
    filter(park %in% siteInput) %>%
    select(agency, admin_unit, park, customer_zip_state_full, 
           customer_zip_state)
})

# value of total reservations for this park
total_reservations <- nrow(rdf())

# number of reservations and % per state
map_data <- rdf() %>% 
  group_by(customer_zip_state_full, customer_zip_state) %>%
  summarize(number_reservations = n(),
            percentage_reservations = 
              percent((number_reservations / total_reservations), 
                                              accuracy = 0.01)) %>%
  filter(!is.na(customer_zip_state_full))

# add state geometries
map_data_geometries <-
  state_geometries_df %>%
  left_join(y = map_data,
            by = c("state_abbrev" = "customer_zip_state")) %>%
  mutate_at(vars(number_reservations), 
            ~replace(number_reservations, is.na(number_reservations), 0)) %>%
  mutate_at(vars(percentage_reservations), 
            ~replace(percentage_reservations, is.na(percentage_reservations), 
                     0)) %>% 
  select(customer_zip_state_full, number_reservations, 
         percentage_reservations, geometry)

## -- create map -- ##
tmap_mode("view")

tm_shape(map_data_geometries) +
  tm_borders(col = "grey", alpha = 0.5) +
  tm_fill(col = "number_reservations",
          title = "Number of Visits",
          palette = "YlGn",
          n = 10,
          style = "jenks",
          id = "customer_zip_state_full",
          popup.vars = c("Total Visits" = "number_reservations",
                         "Percentage of All Visits" = "percentage_reservations")
          ) +
  tm_view(set.view = c(-101.834335, 40.022356, 2)) +
  tmap_options(basemaps = 'https://server.arcgisonline.com/ArcGIS/rest/services/Canvas/World_Light_Gray_Base/MapServer/tile/{z}/{y}/{x}')

A ZIP-code level visitorshed map shows the number of visitors to sites for each ZIP code within California only due to computational load of calculating for all ZIP codes in the United States. We then create a dataframe with the necessary geometries using the tigris (Walker 2022) package. And finally we simplify the geometries to lower the load time within the App using the rmapshaper (Teucher and Russell 2021) package.

## -- ZIP codes and respective states -- ##
fips_list <- 
  c("01", "02", "04", "05", "06", "08", "09", "10", "11", "12", 
    "13", "15", "16", "17", "18", "19", "20", "21", "22", "23", 
    "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", 
    "34", "35", "36", "37", "38", "39", "40", "41", "42", "44", 
    "45", "46", "47", "48", "49", "50", "51", "53", "54", "55", 
    "56", "72")
state_list <- 
  c("AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "DC", "FL",
    "GA", "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME",
    "MD", "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH",
    "NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI",
    "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI",
    "WY", "PR")
states_names_list <- 
  c("Alabama", "Alaska", "Arizona", "Arkansas", "California", 
    "Colorado", "Connecticut", "Delaware", "District of Columbia", 
    "Florida","Georgia", "Hawaii", "Idaho", "Illinois", "Indiana", 
    "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine","Maryland", 
    "Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri", 
    "Montana", "Nebraska", "Nevada", "New Hampshire","New Jersey", 
    "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio", 
    "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island","South Carolina", 
    "South Dakota", "Tennessee", "Texas", "Utah", "Vermont", "Virginia", 
    "Washington", "West Virginia", "Wisconsin","Wyoming", "Puerto Rico")

# create dataframe of states
df_states_fips <- as.data.frame(list(fips = fips_list,
                                     state = state_list,
                                     state_full = states_names_list))

# loop through state df to get all ZIP codes w/in state
df_states_zip_codes <- data.frame()

for (i in seq_along(fips_list)){
  state <- zipcodeR::search_fips(state_fips = fips_list[[i]]) %>% 
    select(zipcode, state)
  df_states_zip_codes <- rbind(df_states_zip_codes, state)
}
# add full state name, fips code, etc. to list of all ZIP codes for each state
df_states_fips_zip_codes <- 
  left_join(x = df_states_zip_codes,
            y = df_states_fips,
            by = "state") %>% 
  rename(state_abbrev = state,
         zip_code = zipcode) %>% 
  relocate(fips, .before = 2)


## -- CA ZIP geometries -- ##
df_zip_geometries_ca <- get_acs(geography = "zcta", year = 2018, 
                                geometry = TRUE,
                                state = "California",
                                summary_var = "B01001_001",
                                variables = c(male = "B01001_002")) %>%
  select(NAME, geometry) %>%
  mutate(zip_code = str_sub(NAME, start = -5, end = -1)) %>%
  select(zip_code, geometry) %>% 
  rmapshaper::ms_simplify(keep = 0.005, keep_shapes = TRUE) %>% 
  left_join(y = df_states_fips_zip_codes,
            by = "zip_code") %>% 
  relocate(zip_code, .before = 1)

We calculate the number of reservations per ZIP code for the site (as chosen by the app user) and use tmaps to create an interactive map that colors ZIP codes based on the total number of reservations for that ZIP code.

## -- data wrangle -- ##
# reactive data frame for siteInput
rdf <- reactive ({
  
  validate(
    need(siteInput != "",
         "Please select a reservable site to visualize.")
  ) # EO validate
  
  ridb_df %>% 
    select(agency, admin_unit, park, customer_zip, facility_latitude, 
           facility_longitude) %>% 
    filter(park %in% siteInput)
})

# number of reservations per ZIP code
map_data <- rdf() %>% 
  group_by(customer_zip) %>%
  summarize(number_reservations = n())

# join with ZIP geometries
map_data_geometries <-
  zip_geometries_df %>% 
  left_join(map_data,
            by = c("zip_code" = "customer_zip")) %>%
  mutate_at(vars(number_reservations), 
            ~replace(number_reservations, is.na(number_reservations), 0))  %>% 
  select(zip_code, number_reservations, geometry)

# value of total CA reservations for this park
total_reservations <- sum(map_data_geometries$number_reservations)

# add percentage of all CA reservations for each ZIP
map_data_geometries <- map_data_geometries %>% 
  mutate(percentage_reservations = 
           percent((number_reservations / total_reservations), 
                   accuracy = 0.01)) %>%
  mutate_at(vars(percentage_reservations), 
            ~replace(percentage_reservations, is.na(percentage_reservations), 
                     0))

# calculate location of park for point on map
park_location_geom <- rdf() %>%
  group_by(park) %>%
  summarise(facility_latitude = median(facility_latitude),
            facility_longitude = median(facility_longitude)) %>%
  st_as_sf(coords = c("facility_latitude", "facility_longitude"),
           crs = 4326) %>%
  st_transform(crs = 4269) # using NAD83 because measured in meters

print(paste("The site's location has been calculated and it is:", 
            park_location_geom$geometry))

## -- create map -- ##
tmap_mode("view")

tm_shape(map_data_geometries) +
  tm_fill(
    col = "number_reservations",
    title = "Number of Visits",
    palette = "PuRd",
    style = "jenks",
    n = 10,
    popup.vars = c("Total Visits" = "number_reservations",
                   "Percentage of All CA Visits" = "percentage_reservations")) +
  tm_shape(park_location_geom) +
  tm_dots(col = "darkgreen", size = 0.1) +
  tm_shape(park_location_geom) +
  tm_symbols(shape = map_site_icon,
             id = "park") +
  tm_shape(ca_cities_df) +
  tm_text(col = "black",
          text = "city") +
  tm_view(set.view = c(-119.559917, 37.061753, 6)) +
  tmap_options(basemaps = 'https://server.arcgisonline.com/ArcGIS/rest/services/Canvas/World_Light_Gray_Base/MapServer/tile/{z}/{y}/{x}')

8.3.4 Data updates

The data_wrangle_and_clean.Rmd document is set up to create the joined dataset necessary for the Outdoor Equity App. To access the ACS data we use the tidycensus package (Walker and Herman 2021) to access the necessary data via API, meaning data updates are not necessary outside of the .Rmd document. The RIDB data is accesses via direct download from the Recreation.gov website. See the How to Expand the Outdoor Equity App Section on expanding the time periods included in the App.

8.3.5 Server Hosting

The Outdoor Equity App is hosted on server owned by the Bren School at the University of California, Santa Barbara until the end of 2022. At that time the App will be moved to the shiny.io server for the duration of 2023.

8.4 How to Expand the Outdoor Equity App

The Outdoor Equity App currently only includes reservation data from California for 2018 due to server hosting capabilities. Expanding the App to include additional years and states will require a larger server and likely more computing power.

8.4.1 Temporal Expansions

To expand the years of data included in the Outdoor Equity App, users must first update the data_wrangle_and_clean.Rmd where all data cleaning and wrangling occurs outside of the shiny app code.

For RIDB data from 2018 and early, users can simply include new code chunks in the .Rmd document that use the following functions to create a joined dataset for each new year: function_ridb_subset-pre2018.R, function_ridb_variable_calculate-pre2018.R, function_acs_race.R, function_acs_median_income.R, function_acs_education.R, function_acs_language.R, and function_join_ridb_acs.R.

For RIDB data from 2019 and later, users must update the function_ridb_subset-post2019.R function. We did not continue to update this function once we decided to include only 2018 data in the App. We recommend updating this function to first select only the necessary variables, then rename the variables so that they match the 2018 and earlier naming conventions, and finally copy and paste the remaining code used in the function_ridb_subset-pre2018.R function. Users can then add new code chunks to the .Rmd document that use the following functions to create a joined dataset for each new year: function_ridb_subset-post2019.R, function_ridb_variable_calculate-pre2018.R, function_acs_race.R, function_acs_median_income.R, function_acs_education.R, function_acs_language.R, and function_join_ridb_acs.R.

RIDB data from 2019 and later may have additional variables available, so we also recommend exploring all available variables to decide if adding in additional variables is appropriate. If users decide to add additional variables, edits will have to be made to all custom functions within the data_preparation and outdoor-equity-app directories.

Data standards for the RIDB data is in the process of being updated as of June 2022. Users should check to see if this impacts any decisions in adding additional years to the Outdoor Equity App.

For the Outdoor Equity App consider these updates to these files:

  • global.R:

    • Add any new functions, packages, or data
  • ui.R:

    • Create input widgets in Analysis tabs (data summary, data relationships, and visitorsheds) that allow for a user to select a specific year or mulitple years (consider making a function for this if the input is the same across all tabs)
  • server.R:

    • Update plot or map functions that use a reactive data frame that includes the temporal input from ui.R

Additionally, for the entire App consider how load time will be affected by adding more years of data. It might be helpful to consider creating separated datasets or creating functions that create datasets that contain only the necessary data needed to create the visual(s).

8.4.2 Spatial Expansions

To expand the states included in the Outdoor Equity App, users must update the data_wrangle_and_clean.Rmd where all data cleaning and wrangling occurs outside of the shiny app code.

The function_ridb_variable_calculate-pre2018.R function has a section that manipulates the character strings of the agency, admin_unit, and park variables. This code would likely need to be changed for additional states as the specific irregularities in character strings may differ state to state. We recommend creating a new function for each additional state. See the RIDB Data Section for a more detailed description of this function code. Once a new custom function is created for each state, code chunks can be added to the .Rmd document to clean and wrangle data for additional states.

For the Outdoor Equity App consider these updates to these files:

  • global.R:

    • Add any new functions, packages or data
    • Update or create new objects that contain information about sites and agencies
    • Update agency to administrative unit dictionary, and administrative unit to park dictionary so that these key value pairs appear for states beyond California
  • ui.R:

    • Create an input widget that allows for someone to select a year or multiple years
    • Build out an informational box that contains basic information about the site chosen
  • server. R:

    • Update state_visitorshed_map() function if the time input or other data needs to be added to this function
    • Update ca_zip_code_visitorshed_map() function to allow for a specific state to be called and the data associated with it

Additionally, for the entire App consider load time especially because spatial data is much larger and therefore takes much longer to load. It might be helpful to consider the structure of the data and which data gets loaded into the App to make the visuals, so that only the necessary data is being used in the back end of the App. See Technical Challenges Section for more insight and suggestions.

8.4.3 Statistical Analysis

Currently the Outdoor Equity App only visualizes patterns in the data, but does not allow a user to explore why these patterns are occurring. We recommend considering the following additions to expand the statistical analysis abilities of the App:

  • Simple statistical analysis (such as linear regressions, ANOVA, etc.) to accompany all plots
  • Ability to combine multiple campsites in a single plot or map to explore patterns and trends across geographic areas and years

For the Outdoor Equity App consider these updates to these files:

  • global.R:

    • Add any new functions, packages, or data
  • ui.R:

    • Create and add a new tab to the navbarPage() layout using the function tabPanel() for the statistical analysis
    • Build out layout in the new tab. To match the layout of the other tabs use fluidRow() and box() within tabPanel()
    • Create necessary inputs to create visuals or run statistical tests
  • server.R:

    • Create necessary objects to render the visuals desired (i.e. reactive data frames, render plots or tables)
    • Create necessary conditional outputs based on an input using observeEvents()

Because statistical analysis was beyond the scope of our project, please note that these updates are not an exhaustive list of what needs to be done to successfully add statistical analysis to the App.

References

Bureau, United States Census. 2022. “American Community Survey 5-Year Data (2009-2020).” website.
Pebesma, Edzer. 2022. Sf: Simple Features for r. https://CRAN.R-project.org/package=sf.
Rozzi, Gavin. 2021. zipcodeR: Data & Functions for Working with US ZIP Codes. https://CRAN.R-project.org/package=zipcodeR.
Sievert, Carson, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, and Pedro Despouy. 2021. Plotly: Create Interactive Web Graphics via Plotly.js. https://CRAN.R-project.org/package=plotly.
Tennekes, Martijn. 2022. Tmap: Thematic Maps. https://github.com/r-tmap/tmap.
Teucher, Andy, and Kenton Russell. 2021. Rmapshaper: Client for Mapshaper for Geospatial Operations. https://github.com/ateucher/rmapshaper.
Walker, Kyle. 2022. Tigris: Load Census TIGER/Line Shapefiles. https://github.com/walkerke/tigris.
Walker, Kyle, and Matt Herman. 2021. Tidycensus: Load US Census Boundary and Attribute Data as Tidyverse and Sf-Ready Data Frames. https://walker-data.com/tidycensus/.
Wickham, Hadley. 2019. Stringr: Simple, Consistent Wrappers for Common String Operations. https://CRAN.R-project.org/package=stringr.
———. 2021. Tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.