NYCSquirrelAnalysis

Author

Ainsley Owens

Abstract

This is an exploratory analysis of the NYC Squirrel Census dataset, which records squirrel fur color, behavior, and geographical location. The analysis identifies commonalities among the squirrels, such as the most common fur color and the most common behavior, and compares activity between the AM and PM observation shifts and between juveniles and adults. Does the most commonly seen behavior differ between juveniles and adults? Does it differ between the AM and PM shifts? The intention behind this analysis is to develop a further understanding of the information in this dataset.

Introduction

This dataset contains observations from the NYC Squirrel Census, which was conducted in Central Park. The file loaded below has 3,023 rows, one per squirrel sighting. Each sighting records the squirrel's age group (juvenile or adult), fur color, behavior, and geographical location. Behaviors observed include running, climbing, chasing, eating, foraging, interactions with humans, and communicative behaviors such as kuks, quaas, moans, tail flags, and tail twitches. This exploratory analysis seeks to answer the following questions:

  • What is the most common fur color?

  • How many squirrels of each fur color are there?

  • How many juveniles and adults are there?

  • Are squirrels more likely to be above ground or on the ground plane?

  • Are squirrels seen more in the AM or the PM shift?

  • What is the most commonly seen behavior?

  • Does the most commonly seen behavior differ between juveniles and adults?

  • Does commonly seen behavior differ between the AM and PM shifts?

Loading Packages and Datasets

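Here I read the squirrels data from the TidyTuesday repository, loaded the tidymodels, tidyverse, and kableExtra packages, and split the data in half into an exploratory set and a test set.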

nyc_squirrels <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-10-29/nyc_squirrels.csv")
Rows: 3023 Columns: 36
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (14): unique_squirrel_id, hectare, shift, age, primary_fur_color, highli...
dbl  (9): long, lat, date, hectare_squirrel_number, zip_codes, community_dis...
lgl (13): running, chasing, climbing, eating, foraging, kuks, quaas, moans, ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#install.packages("tidymodels")
library(tidymodels)
── Attaching packages ────────────────────────────────────── tidymodels 1.0.0 ──
✔ broom        1.0.2      ✔ recipes      1.0.4 
✔ dials        1.1.0      ✔ rsample      1.1.1 
✔ dplyr        1.0.10     ✔ tibble       3.1.8 
✔ ggplot2      3.4.0      ✔ tidyr        1.2.1 
✔ infer        1.0.4      ✔ tune         1.0.1 
✔ modeldata    1.1.0      ✔ workflows    1.1.2 
✔ parsnip      1.0.3      ✔ workflowsets 1.0.0 
✔ purrr        1.0.0      ✔ yardstick    1.1.0 
── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ purrr::discard() masks scales::discard()
✖ dplyr::filter()  masks stats::filter()
✖ dplyr::lag()     masks stats::lag()
✖ recipes::step()  masks stats::step()
• Dig deeper into tidy modeling with R at https://www.tmwr.org
#Load the tidyverse
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ readr   2.1.3     ✔ forcats 0.5.2
✔ stringr 1.5.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ readr::col_factor() masks scales::col_factor()
✖ purrr::discard()    masks scales::discard()
✖ dplyr::filter()     masks stats::filter()
✖ stringr::fixed()    masks recipes::fixed()
✖ dplyr::lag()        masks stats::lag()
✖ readr::spec()       masks yardstick::spec()
library (kableExtra)

Attaching package: 'kableExtra'

The following object is masked from 'package:dplyr':

    group_rows
nyc_squirrels_splits <- initial_split(nyc_squirrels, prop = 0.5)

exploratory_data <- training(nyc_squirrels_splits)
test_data <- testing(nyc_squirrels_splits)
exploratory_data %>% 
  head(6)%>% 
  kable() %>% 
  kable_styling (c("striped", "hover"))
long lat unique_squirrel_id hectare shift date hectare_squirrel_number age primary_fur_color highlight_fur_color combination_of_primary_and_highlight_color color_notes location above_ground_sighter_measurement specific_location running chasing climbing eating foraging other_activities kuks quaas moans tail_flags tail_twitches approaches indifferent runs_from other_interactions lat_long zip_codes community_districts borough_boundaries city_council_districts police_precincts
-73.96960 40.78023 18C-PM-1018-08 18C PM 10182018 8 Juvenile Black NA Black+ NA Ground Plane FALSE NA FALSE FALSE FALSE FALSE TRUE NA FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE NA POINT (-73.9695964983454 40.7802302192799) NA 19 4 19 13
-73.95894 40.78965 31F-PM-1007-04 31F PM 10072018 4 Adult Gray NA Gray+ NA Ground Plane FALSE NA FALSE FALSE FALSE FALSE TRUE NA FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE NA POINT (-73.9589367709165 40.7896465303483) NA 19 4 19 13
-73.95642 40.79575 38E-AM-1010-01 38E AM 10102018 1 Adult Gray Cinnamon Gray+Cinnamon NA Ground Plane FALSE NA FALSE FALSE FALSE FALSE TRUE NA FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE NA POINT (-73.9564215689843 40.7957539444068) NA 19 4 19 13
-73.95965 40.79404 35C-PM-1013-06 35C PM 10132018 6 Adult Gray Cinnamon Gray+Cinnamon NA Ground Plane FALSE NA FALSE FALSE FALSE FALSE TRUE NA FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE NA POINT (-73.9596505138955 40.7940367734988) NA 19 4 19 13
-73.95887 40.79924 40A-PM-1014-03 40A PM 10142018 3 Adult Gray NA Gray+ NA Ground Plane FALSE NA TRUE FALSE FALSE FALSE FALSE NA FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE runs from (ran from base of tree into bushes) POINT (-73.9588689820908 40.799238413797205) NA 19 4 19 13
-73.97110 40.77875 15C-PM-1017-01 15C PM 10172018 1 Adult Gray Cinnamon Gray+Cinnamon NA Ground Plane FALSE NA TRUE FALSE FALSE TRUE TRUE NA FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE NA POINT (-73.9711001278389 40.7787466095841) NA 19 4 19 13

This table shows the first six rows of the exploratory data, including the geographical, physical, and behavioral features recorded for each sighting.
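Because initial_split() assigns rows at random, the counts reported in this analysis change each time the document is rendered unless a seed is set first. The following is a minimal sketch of a reproducible split, assuming a small made-up data frame (toy) in place of nyc_squirrels and an arbitrary seed value (2018) that is not part of the original analysis.

```r
library(rsample)

# Toy stand-in for nyc_squirrels (hypothetical values, for illustration only)
toy <- data.frame(id = 1:10, shift = rep(c("AM", "PM"), 5))

# Setting the same seed before each split makes the random assignment repeatable.
# The seed value is arbitrary and is an assumption of this sketch.
set.seed(2018)
split_a <- initial_split(toy, prop = 0.5)

set.seed(2018)
split_b <- initial_split(toy, prop = 0.5)

# Both splits place the same rows in the training (exploratory) half
same_rows <- identical(training(split_a)$id, training(split_b)$id)
same_rows
```

With a seed in place, the numbers quoted in the text should stay in step with the printed output on every render.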

Exploratory Analysis

What is the most common fur color? How many squirrels of each fur color are there?

Here I use the select() tool to isolate the primary fur color column and then the count() tool to count the number of squirrels per fur color. Then, using the ggplot() tool, I visualized the counts as a bar graph.

exploratory_data %>%
  select(primary_fur_color)
# A tibble: 1,511 × 1
   primary_fur_color
   <chr>            
 1 Black            
 2 Gray             
 3 Gray             
 4 Gray             
 5 Gray             
 6 Gray             
 7 Gray             
 8 Cinnamon         
 9 Gray             
10 Gray             
# … with 1,501 more rows
exploratory_data %>%
  count(primary_fur_color)
# A tibble: 4 × 2
  primary_fur_color     n
  <chr>             <int>
1 Black                50
2 Cinnamon            172
3 Gray               1262
4 <NA>                 27
exploratory_data %>%
  ggplot() +
  geom_bar(mapping = aes(x = primary_fur_color), fill = "pink", color= "black") +
  labs(title = "Counts of Primary Fur Color" , x = "Primary Fur Color", y = "Count")

The most common fur color is gray, with 1,262 squirrels, followed by cinnamon with 172 squirrels. Black, with 50 squirrels, is the least common fur color. Another 27 squirrels had no fur color recorded (NA) and are not assigned to any color. One possible explanation is natural selection: gray squirrels may blend in better with the environment, whereas cinnamon and black squirrels may be more visible to predators. This dataset alone cannot confirm that explanation.
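To put the fur color counts on a common scale, they can be converted to percentages of the exploratory sample. The sketch below is illustrative only: it re-enters the counts from the count(primary_fur_color) output above by hand rather than recomputing them from the data, and the names fur_counts and fur_share are hypothetical.

```r
library(dplyr)

# Counts copied by hand from the count(primary_fur_color) output above
fur_counts <- tibble(
  primary_fur_color = c("Black", "Cinnamon", "Gray", NA),
  n = c(50, 172, 1262, 27)
)

# Sort from most to least common and add each color's share of all sightings
fur_share <- fur_counts %>%
  arrange(desc(n)) %>%
  mutate(pct = round(100 * n / sum(n), 1))

fur_share
```

In the full analysis the same mutate() step could be appended directly to exploratory_data %>% count(primary_fur_color).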

How many juveniles and adults are there?

Here, I again used the count() tool to count the number of squirrels per age group (age) and ggplot() to visualize the data.

exploratory_data %>% 
  count (age)
# A tibble: 4 × 2
  age          n
  <chr>    <int>
1 ?            2
2 Adult     1295
3 Juvenile   154
4 <NA>        60
exploratory_data %>%
  ggplot() +
  geom_bar(mapping = aes(x = age), fill = "pink", color= "black") +
  labs(title = "Squirrel Age Count" , x = "Age", y = "Count")

There are more adult squirrels than juveniles. In this sample there were 1,295 adults compared to 154 juveniles; 2 sightings were recorded as "?" and 60 had no age recorded. This may reflect a genuinely larger share of adults in the population, but it may also be because adults venture into public sight more often than juveniles. This could potentially be explained by the fact that squirrels care for their offspring for up to 3 months, meaning the juveniles may not leave the nest often until they are older.

Are squirrels more likely to be above ground or on the ground plane?

Here, I again used the count() tool to count the number of squirrels per location observed (location) and ggplot() to visualize the data.

exploratory_data %>% 
  count (location)
# A tibble: 3 × 2
  location         n
  <chr>        <int>
1 Above Ground   414
2 Ground Plane  1062
3 <NA>            35
exploratory_data %>%
  ggplot() +
  geom_bar(mapping = aes(x = location), fill = "pink", color= "black") +
  labs(title = "Squirrel Location Counts" , x = "Location", y = "Count")

Squirrels were seen more on the ground plane than above ground in this dataset. There were 1,062 squirrels seen on the ground plane compared to only 414 above ground, and 35 sightings had no location recorded. Squirrels may spend more time on the ground plane than above ground, but the gap may also be explained by the fact that it is more difficult to spot squirrels in trees. Because this analysis uses only the NYC Central Park Squirrel Census data, this exploratory hypothesis cannot be generalized to the species as a whole, only to this sample.

Are squirrels seen more in the AM or the PM shift?

Here, I again used the count() tool to count the number of squirrels per observation shift (shift) and ggplot() to visualize the data.

exploratory_data %>%
  count (shift)
# A tibble: 2 × 2
  shift     n
  <chr> <int>
1 AM      689
2 PM      822
exploratory_data %>%
  ggplot() +
  geom_bar(mapping = aes(x = shift), fill = "pink", color= "black") +
  labs(title = "Squirrel Time Frequencies" , x = "Time of Day", y = "Count")

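Squirrels were seen more often in the PM shift than in the AM shift: 822 sightings compared to 689. The shift column records only AM or PM, so this compares the two observation shifts rather than literal night and day. No significance test was run here, so the difference is descriptive only. It suggests that the squirrels in this sample may be more active, or simply easier to spot, later in the day.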
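Whether the AM and PM counts differ by more than chance could be checked with a chi-squared goodness-of-fit test. The sketch below is not part of the original analysis. It re-enters the two counts from the count(shift) output above and assumes that equal numbers of sightings would be expected in each shift, which holds only if observer effort was the same in both shifts. The names shift_counts and shift_test are hypothetical.

```r
# Counts copied by hand from the count(shift) output above
shift_counts <- c(AM = 689, PM = 822)

# Goodness-of-fit test against equal expected counts in each shift.
# Assumption: observer effort was equal across shifts (not established by the data).
shift_test <- chisq.test(shift_counts)

shift_test$statistic  # chi-squared statistic on 1 degree of freedom
shift_test$p.value    # probability of a gap at least this large under equal rates
```

If observer effort differed between shifts, the expected proportions passed to chisq.test() through its p argument would need to reflect that.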

What is the most commonly seen behavior?

Here, I loaded the skimr package. The skim() tool produces summary statistics for every column in the data. This was necessary to analyze behavior because each behavior in this dataset is recorded as a false/true value per sighting. For these logical columns, skim() reports the mean, which is the proportion of sightings where the behavior was observed, and this allowed me to determine which behaviors were the most and least commonly observed.

#install.packages("skimr")
library(skimr)
exploratory_data %>%
  skim()
Data summary
Name Piped data
Number of rows 1511
Number of columns 36
_______________________
Column type frequency:
character 14
logical 13
numeric 9
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
unique_squirrel_id 0 1.00 13 14 0 1511 0
hectare 0 1.00 3 3 0 323 0
shift 0 1.00 2 2 0 2 0
age 60 0.96 1 8 0 3 0
primary_fur_color 27 0.98 4 8 0 3 0
highlight_fur_color 546 0.64 4 22 0 10 0
combination_of_primary_and_highlight_color 0 1.00 1 27 0 20 0
color_notes 1422 0.06 3 120 0 70 0
location 35 0.98 12 12 0 2 0
above_ground_sighter_measurement 59 0.96 1 5 0 34 0
specific_location 1270 0.16 4 102 0 173 0
other_activities 1297 0.14 6 132 0 159 0
other_interactions 1375 0.09 2 106 0 120 0
lat_long 0 1.00 38 45 0 1511 0

Variable type: logical

skim_variable n_missing complete_rate mean count
running 0 1 0.24 FAL: 1143, TRU: 368
chasing 0 1 0.09 FAL: 1373, TRU: 138
climbing 0 1 0.23 FAL: 1164, TRU: 347
eating 0 1 0.25 FAL: 1140, TRU: 371
foraging 0 1 0.49 FAL: 773, TRU: 738
kuks 0 1 0.03 FAL: 1467, TRU: 44
quaas 0 1 0.02 FAL: 1484, TRU: 27
moans 0 1 0.00 FAL: 1510, TRU: 1
tail_flags 0 1 0.04 FAL: 1445, TRU: 66
tail_twitches 0 1 0.15 FAL: 1288, TRU: 223
approaches 0 1 0.06 FAL: 1414, TRU: 97
indifferent 0 1 0.48 FAL: 789, TRU: 722
runs_from 0 1 0.24 FAL: 1151, TRU: 360

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
long 0 1.00 -73.97 0.01 -73.98 -73.97 -73.97 -73.96 -73.95 ▅▇▆▆▃
lat 0 1.00 40.78 0.01 40.76 40.77 40.78 40.79 40.80 ▇▇▃▅▆
date 0 1.00 10119218.53 41854.83 10062018.00 10082018.00 10122018.00 10142018.00 10202018.00 ▇▂▇▂▃
hectare_squirrel_number 0 1.00 4.04 3.04 1.00 2.00 3.00 6.00 21.00 ▇▂▁▁▁
zip_codes 1503 0.01 12045.50 805.05 10090.00 12081.00 12421.50 12423.00 12423.00 ▁▁▁▁▇
community_districts 0 1.00 18.99 0.32 11.00 19.00 19.00 19.00 23.00 ▁▁▁▇▁
borough_boundaries 0 1.00 4.00 0.00 4.00 4.00 4.00 4.00 4.00 ▁▁▇▁▁
city_council_districts 0 1.00 19.12 1.73 19.00 19.00 19.00 19.00 51.00 ▇▁▁▁▁
police_precincts 0 1.00 13.01 0.31 10.00 13.00 13.00 13.00 18.00 ▁▇▁▁▁

Here, I used the count() tool to extract the counts for the most and least commonly observed behaviors.

exploratory_data %>%
  count (foraging)
# A tibble: 2 × 2
  foraging     n
  <lgl>    <int>
1 FALSE      773
2 TRUE       738
exploratory_data %>%
  count (moans)
# A tibble: 2 × 2
  moans     n
  <lgl> <int>
1 FALSE  1510
2 TRUE      1

The most commonly seen behavior is foraging, and the least commonly seen behavior is moans. In this sample, 738 squirrels were observed foraging. Foraging is necessary for survival, which may explain why it is the most commonly observed behavior. Moans are a behavior used to communicate with other squirrels to identify the presence or lack of predators. These moans are very quiet, which may account for why only one squirrel was observed displaying this behavior.
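The ranking of behaviors read from the skim() output can also be computed directly, since the mean of a false/true column is the proportion of TRUE values. The following is a minimal sketch on a small made-up data frame (toy), not the squirrel data; the column values and the name behavior_rank are hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy stand-in with three logical behavior columns and one non-logical column
toy <- tibble(
  foraging = c(TRUE, TRUE, FALSE, TRUE),
  eating   = c(TRUE, FALSE, FALSE, FALSE),
  moans    = c(FALSE, FALSE, FALSE, FALSE),
  shift    = c("AM", "PM", "PM", "AM")
)

# mean() of a logical column is the share of TRUE values.
# across(where(is.logical), ...) applies it to every behavior column at once.
behavior_rank <- toy %>%
  summarize(across(where(is.logical), mean)) %>%
  pivot_longer(everything(), names_to = "behavior", values_to = "prop_true") %>%
  arrange(desc(prop_true))

behavior_rank
```

Applied to exploratory_data, the first and last rows of such a table would correspond to the most and least commonly observed behaviors.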

Does behavior differ between juveniles and adults?

This data frame shows how commonly each behavior is observed among adults and among juveniles.

Here, I use the filter() tool to remove squirrels whose age is missing or recorded as "?", because they cannot be assigned to an age group. I then used the group_by() tool to group the behavioral data by age (juveniles and adults). Again, because this dataset records behaviors as false/true values, I used the summarize() tool with mean() to compute the proportion of sightings in which each behavior was observed per age group.

exploratory_data %>%
  filter(!is.na(age)) %>%
  filter(age!="?") %>%
  group_by(age) %>%
  # Mean of a TRUE/FALSE column = proportion of sightings with that behavior
  summarize(pct_moans = mean(moans), pct_foraging = mean(foraging), pct_running = mean(running),
            pct_chasing = mean(chasing), pct_climbing = mean(climbing), pct_eating = mean(eating),
            pct_kuks = mean(kuks), pct_tail_flags = mean(tail_flags), pct_quaas = mean(quaas),
            pct_tail_twitches = mean(tail_twitches), pct_approaches = mean(approaches),
            pct_indifferent = mean(indifferent), pct_runs_from = mean(runs_from))
# A tibble: 2 × 14
  age    pct_m…¹ pct_f…² pct_r…³ pct_c…⁴ pct_c…⁵ pct_e…⁶ pct_k…⁷ pct_t…⁸ pct_q…⁹
  <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 Adult  7.72e-4   0.503   0.248  0.0903   0.232   0.242  0.0255  0.0417  0.0162
2 Juven… 0         0.416   0.201  0.0974   0.247   0.286  0.0390  0.0649  0.0195
# … with 4 more variables: pct_tail_twitches <dbl>, pct_approaches <dbl>,
#   pct_indifferent <dbl>, pct_runs_from <dbl>, and abbreviated variable names
#   ¹​pct_moans, ²​pct_foraging, ³​pct_running, ⁴​pct_chasing, ⁵​pct_climbing,
#   ⁶​pct_eating, ⁷​pct_kuks, ⁸​pct_tail_flags, ⁹​pct_quaas

Because this dataset records behaviors as “false/true” observations, they are difficult to compare directly. This data frame gives the proportion of sightings in which each behavior was observed (“true”) so that we can compare behavior between groups, here adults and juveniles. In the columns printed above, adults were observed foraging (0.503 vs. 0.416) and running (0.248 vs. 0.201) more often than juveniles, while juveniles were observed climbing, chasing, eating, and making kuks, tail flags, and quaas slightly more often than adults. The slightly higher chasing rate among juveniles is consistent with more “playful” behavior, and the higher foraging rate among adults with a focus on survival, but several of these gaps are small. It is also important to remember that there are far fewer juveniles (154) than adults (1,295) in this sample, so the juvenile proportions are less precise and may not accurately represent behavior in that age group.
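The grouped summary above names every behavior column by hand. As an alternative, across() can apply mean() to all false/true columns in one step. The sketch below runs on a small made-up data frame (toy), not the squirrel data; the values and the name by_age are hypothetical, while the pct_ prefix mirrors the column names used above.

```r
library(dplyr)

# Toy stand-in: two behaviors, with one "?" age and one missing age to filter out
toy <- tibble(
  age      = c("Adult", "Adult", "Juvenile", "Juvenile", "?", NA),
  foraging = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE),
  running  = c(FALSE, FALSE, TRUE, FALSE, TRUE, FALSE)
)

by_age <- toy %>%
  filter(!is.na(age), age != "?") %>%   # keep only rows with a usable age group
  group_by(age) %>%
  # Proportion of TRUE values for every logical column, named pct_<column>
  summarize(across(where(is.logical), mean, .names = "pct_{.col}"))

by_age
```

This form avoids accidentally listing a column twice and picks up any additional logical columns automatically.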

Here, still using the filter() tool to remove the squirrels whose ages were not observed, I used ggplot() to visualize the data, representing the differences in most and least commonly observed behaviors per age group.

exploratory_data %>%
  filter(!is.na(age)) %>%
  filter(age!="?") %>%
  ggplot() +
  geom_bar(aes(x=age,
               fill=foraging),
           position="fill")

exploratory_data %>%
  filter(!is.na(age)) %>%
  filter(age!="?") %>%
  ggplot() +
  geom_bar(aes(x=age,
               fill=moans),
           position="fill")

These graphs show the proportion of adults and juveniles observed displaying the most and least common behaviors, foraging and moans. Moans were almost never observed in either group (a single adult sighting), so there is virtually no difference between adults and juveniles, while a larger proportion of adults (about 50%) than juveniles (about 42%) were observed foraging.

Does behavior differ depending on time of day?

Here, I use the filter() tool to remove any rows with a missing shift from the dataset. Using the group_by() tool, I group the data into AM and PM observations. The summarize() tool, combined with mean(), gives the proportion of sightings in which each behavior was observed. I included all behaviors to determine which occurred more or less commonly at different times of day.

This data frame shows how behavior differs between the AM and PM shifts.

exploratory_data %>%
  filter(!is.na(shift)) %>%
  filter(shift!="?") %>%
  group_by(shift) %>%
  # Mean of a TRUE/FALSE column = proportion of sightings with that behavior
  summarize(pct_moans = mean(moans), pct_foraging = mean(foraging), pct_running = mean(running),
            pct_chasing = mean(chasing), pct_climbing = mean(climbing), pct_eating = mean(eating),
            pct_kuks = mean(kuks), pct_tail_flags = mean(tail_flags), pct_quaas = mean(quaas),
            pct_tail_twitches = mean(tail_twitches), pct_approaches = mean(approaches),
            pct_indifferent = mean(indifferent), pct_runs_from = mean(runs_from))
# A tibble: 2 × 14
  shift pct_mo…¹ pct_f…² pct_r…³ pct_c…⁴ pct_c…⁵ pct_e…⁶ pct_k…⁷ pct_t…⁸ pct_q…⁹
  <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 AM     0         0.441   0.244  0.0958   0.258   0.196  0.0348  0.0450  0.0160
2 PM     0.00122   0.528   0.243  0.0876   0.206   0.287  0.0243  0.0426  0.0195
# … with 4 more variables: pct_tail_twitches <dbl>, pct_approaches <dbl>,
#   pct_indifferent <dbl>, pct_runs_from <dbl>, and abbreviated variable names
#   ¹​pct_moans, ²​pct_foraging, ³​pct_running, ⁴​pct_chasing, ⁵​pct_climbing,
#   ⁶​pct_eating, ⁷​pct_kuks, ⁸​pct_tail_flags, ⁹​pct_quaas

In the columns printed above, chasing, climbing, kuks, and tail flags were observed slightly more often in the AM shift, while foraging, eating, quaas, and the single recorded moan were observed more often in the PM shift. Running was essentially identical across shifts (0.244 AM vs. 0.243 PM). The four remaining columns (tail twitches, approaches, indifferent, and runs from) are truncated from the printed output, so they are not compared here. The earlier count showed more sightings in the PM shift overall, which is why proportions rather than raw counts are compared. These data describe only behaviors observed in Central Park during the census shifts, in the squirrels' natural habitat and not under continuous observation, so this analysis may not accurately describe general squirrel behavior, although it does allow us to draw conclusions about this particular sample.
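Reading the direction of each difference from a wide table is error-prone, so the comparison can be made explicit by subtracting the AM proportion from the PM proportion for each behavior. The sketch below re-enters by hand the nine columns visible in the printed output above; the four truncated columns are left out because their values are not shown. The names shift_props and shift_diff are hypothetical.

```r
library(dplyr)

# Proportions copied by hand from the printed AM/PM summary above
shift_props <- tibble(
  behavior = c("moans", "foraging", "running", "chasing", "climbing",
               "eating", "kuks", "tail_flags", "quaas"),
  AM = c(0,       0.441, 0.244, 0.0958, 0.258, 0.196, 0.0348, 0.0450, 0.0160),
  PM = c(0.00122, 0.528, 0.243, 0.0876, 0.206, 0.287, 0.0243, 0.0426, 0.0195)
)

# Positive diff = more common in the PM shift; sort by the size of the gap
shift_diff <- shift_props %>%
  mutate(diff = PM - AM,
         more_common_in = if_else(diff > 0, "PM", "AM")) %>%
  arrange(desc(abs(diff)))

shift_diff
```

Sorting by the absolute difference puts the behaviors with the largest AM/PM gap at the top, which makes small gaps such as running easy to tell apart from large ones.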

Here, still using the filter() tool to drop any rows with a missing shift, I used ggplot() to visualize the data, showing the differences in the most and least commonly observed behaviors per shift (AM and PM).

exploratory_data %>%
  filter(!is.na(shift)) %>%
  filter(shift!="?") %>%
   ggplot() +
  geom_bar(aes(x=shift,
               fill=foraging),
           position="fill")

exploratory_data %>%
  filter(!is.na(shift)) %>%
  filter(shift!="?") %>%
   ggplot() +
  geom_bar(aes(x=shift,
               fill=moans
),
           position="fill")

These bar graphs show the proportion of sightings with foraging and with moans, the most and least commonly observed behaviors, in the AM and PM shifts. Foraging was somewhat more common in the PM shift (about 53% of sightings) than in the AM shift (about 44%). Because the bars show proportions within each shift, this gap is not simply a result of there being more PM sightings overall. Moans were recorded only once in this sample, so the FALSE portion fills almost the entire bar in both shifts and no distinction can be drawn between them.

Conclusions

The exploratory analysis of this dataset allowed us to develop several tentative conclusions to analyze further using the test data. The most commonly observed fur color is gray, and more adult squirrels than juveniles were observed in this sample. Squirrels in Central Park were more likely to be observed on the ground plane than above ground, and more likely to be observed in the PM shift than in the AM shift. The most commonly observed behavior across the sample is foraging, while the least commonly observed behavior is moaning. Further analysis suggests behavioral differences between juveniles and adults, as well as differences in the frequency of specific behaviors between shifts. In the printed output, chasing, climbing, kuks, and tail flags were slightly more common in the AM shift, while foraging, eating, and quaas were more common in the PM shift, and running was essentially equal. Adults were observed foraging and running more often than juveniles, while juveniles were observed climbing, chasing, and eating slightly more often than adults. Because this exploratory analysis is based on a single sample (the NYC Squirrel Census) from one geographical area (Central Park, NYC), its conclusions cannot be generalized to squirrel species as a whole. However, the exploratory analysis did allow us to draw preliminary conclusions from the exploratory data and to generate hypotheses for the test data.