3 Tidyverse

The goal of this Tidyverse section is to learn

  • Subsetting Datasets
  • Merging Datasets
  • Summarizing
  • Plotting

In this section we will continue working with the BLS data we downloaded in the Introduction.

Tidyverse provides us with some useful tools for data manipulation and cleaning. I include examples and descriptions of the commands I have most frequently used.

3.1 Filtering with Pipes

indiana<-                                    # 1
    unemp_data %>% filter(State_fips=="18")  # 2

What is going on in this line of code? Let’s start with the second line. The operator %>% is called a pipe. It comes from a package called magrittr, which is included within tidyverse. Pipes are very handy for neatly writing longer sequences of code. Here we begin with a pretty simple example. We can think of a pipe as saying “then”: it takes the argument before it and uses it as the input to the following command. Here we are using a filter command, so we are saying take our data.frame unemp_data, then filter or “give us” the rows where State_fips is equal to “18” (this is the FIPS code for Indiana. Go Hoosiers!). So in summary we are just keeping the Indiana data and saving it as a new data frame (this is line 1). Note: this operation does NOT impact unemp_data; the only way we ‘overwrite’ a data frame is to explicitly tell R to. For example, the following WOULD overwrite unemp_data.

unemp_data<-unemp_data %>% filter(State_fips=="18")  # 1
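
To make the pipe more concrete, here is a minimal sketch (the object names are just for illustration) showing that piping a data frame into a function is the same as passing it as the function’s first argument:

library(dplyr)  # filter() and %>% are loaded as part of tidyverse

# Without a pipe: the data frame is the first argument of filter()
indiana_no_pipe <- filter(unemp_data, State_fips=="18")

# With a pipe: unemp_data is sent into filter() as its first argument
indiana_pipe <- unemp_data %>% filter(State_fips=="18")

identical(indiana_no_pipe, indiana_pipe)  # TRUE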

3.2 Plotting Data

In this lesson we will be plotting the data gathered in previous lessons. ggplot2 is a very popular package for plotting and is now included as part of the tidyverse package. It allows us to make many different kinds of plots and customize them. As it is so popular there are many guides for it. Tidyverse’s website summarizes some very useful resources.

As ggplot2 has so many options, we will just barely skim the surface in this introduction. Let’s start with a minimal command. First let’s load in ggplot2 and filter our data down to just Indiana.

library(ggplot2)                                    # 1
                                                    # 2
indiana <- unemp_data %>% filter(State_fips=="18")  # 3

ggplot(data=indiana, aes(x=Year, y=Unemp_rate,   # 1
    colour=County_fips, group=County_fips))+     # 2
  geom_line()                                    # 3

  • Line #1: Tells ggplot we will be using the data.frame indiana. aes is short for aesthetics. This is where we tell ggplot how to map our data into the plot. Here I want ggplot to plot Year on the x-axis and Unemp_rate on the y-axis.
  • Line #2: A continuation of the aesthetic mapping. This line says to plot each County_fips as its own series, giving each one a different colour.
  • Line #3: Tells ggplot to plot the data as lines.

First thing after this plot: the legend is excessive. Let’s just suppress it for now. To do this we can set legend.position to "none" inside theme. The theme command allows for a lot of customization of the plot. If you want to see all options click here. Let’s also change the thickness of the lines, which we can do with the size argument of geom_line.

ggplot(data=indiana, aes(x=Year, y=Unemp_rate,   # 1
    colour=County_fips, group=County_fips))+     # 2
  geom_line(size=0.3)+                           # 3
  theme(legend.position="none")                  # 4

This is already looking a lot better. But can we do something about the labels on the x-axis? They are overlapping each other. We could make the text smaller, but then it would be difficult to read, so let’s just rotate the labels. Let’s also make the y-axis name a bit more meaningful.

ggplot(data=indiana, aes(x=Year, y=Unemp_rate,   # 1
    colour=County_fips, group=County_fips))+     # 2
  geom_line(size=0.3)+                           # 3
  theme(legend.position="none",                  # 4
    axis.text.x=element_text(angle=45)) +        # 5
  ylab('Indiana Unemployment Rate by County')    # 6

Now we’re getting somewhere! But what if we want to emphasize the county Bloomington is in, Monroe County?

monroecty<-indiana %>% filter(County_fips=="105")       # 1
                                                        # 2
ggplot(data=indiana, aes(x=Year, y=Unemp_rate,          # 3
    colour=County_fips, group=County_fips))+            # 4
  geom_line(size=0.3)+                                  # 5
  geom_line(data=monroecty, aes(x=Year, y=Unemp_rate,   # 6
    colour=County_fips, group=1),                       # 7
    size=0.6, colour='black') +                         # 8
  theme(legend.position="none",                         # 9
    axis.text.x=element_text(angle=45)) +              # 10
  ylab('Indiana Unemployment Rate by County')          # 11

We can make the black line for Monroe County ‘pop’ a bit more too. Let’s change the opacity of the other counties’ lines. To do this we can set a value, alpha, in geom_line. This takes values from 0 to 1, where smaller values are more transparent.

ggplot(data=indiana, aes(x=Year, y=Unemp_rate,          # 1
    colour=County_fips, group=County_fips))+            # 2
  geom_line(size=0.3, alpha=0.5)+                       # 3
  geom_line(data=monroecty, aes(x=Year, y=Unemp_rate,   # 4
    colour=County_fips, group=1),                       # 5
    size=0.6, colour='black') +                         # 6
  theme(legend.position="none",                         # 7
    axis.text.x=element_text(angle=45)) +               # 8
  ylab('Indiana Unemployment Rate by County')           # 9

3.2.1 Scatter plot (geom_point)

What if we want points instead of lines for each year?

ggplot(data=indiana, aes(x=Year, y=Unemp_rate,   # 1
  colour=County_fips, group=County_fips))+       # 2
  geom_point()                                   # 3

3.3 Manipulating within Groups

Our next chunk of code gets a bit more complex; let’s take it piece by piece though.

indiana %>%                             # 1
  group_by(Year) %>%                    # 2
  summarise(Average=mean(Labor_force))  # 3

Let’s again take it from the end and work our way backwards through it. The 3rd line above is calculating a new variable within a group, called ‘Average.’ The group (line 2) that we are calculating within is each Year of our Indiana data.frame. That is, this snippet of code will give us one value back for each year: the average labor force level across Indiana’s counties.

Year Average
1990 30593.65
1991 30357.00
1992 31058.60
1993 32046.84
1994 33308.01
1995 34078.00
1996 33800.80
1997 33919.72
1998 33947.47
1999 33965.91
2000 33982.50
2001 34140.32
2002 34469.34
2003 34597.76
2004 34432.72
2005 34841.75
2006 35164.35
2007 34866.21
2008 35131.49
2009 34717.45
2010 34512.97
2011 34586.91
2012 34454.88
2013 34656.59
2014 35052.00
2015 35508.30
2016 36178.11
2017 36243.96
2018 36756.59
2019 36819.25

This group_by command paired with summarise is quite useful. For example, going back to our original data.frame, unemp_data: suppose we wanted to do what we just did but for all states. That is, calculate within each state and year combination. With the group_by command this is quite simple! (I print the first 100 lines of output below.)

unemp_data %>%                          # 1
  group_by(Year, State_fips) %>%        # 2
  summarise(Average=mean(Labor_force))  # 3
## `summarise()` has grouped output by 'Year'. You can override using the `.groups` argument.
Year State_fips Average
1990 01 28468.343
1990 02 10323.923
1990 04 119079.067
1990 05 15091.147
1990 06 261007.414
1990 08 27557.641
1990 09 227471.000
1990 10 120074.667
1990 11 328924.000
1990 12 96480.269
1990 13 20653.579
1990 15 137730.500
1990 16 11193.045
1990 17 58304.049
1990 18 30593.652
1990 19 14664.212
1990 20 12119.305
1990 21 14619.667
1990 22 29216.547
1990 23 39563.625
1990 24 107919.000
1990 25 229027.357
1990 26 55509.072
1990 27 27573.690
1990 28 14438.854
1990 29 22737.852
1990 30 7198.929
1990 31 8871.215
1990 32 39407.176
1990 33 62009.100
1990 34 193117.000
1990 35 21630.061
1990 36 142149.565
1990 37 34765.130
1990 38 5985.226
1990 39 61586.670
1990 40 19786.338
1990 41 41408.833
1990 42 86992.045
1990 44 105072.600
1990 45 37903.326
1990 46 5260.303
1990 47 25207.768
1990 48 33932.858
1990 49 28352.138
1990 50 21768.214
1990 51 24452.541
1990 53 64751.974
1990 54 13857.600
1990 55 35970.542
1990 56 10259.348
1990 72 14517.090
1991 01 28665.134
1991 02 10626.000
1991 04 120334.200
1991 05 15060.213
1991 06 260801.552
1991 08 28035.766
1991 09 230137.750
1991 10 120571.333
1991 11 321433.000
1991 12 97499.104
1991 13 20619.044
1991 15 142897.250
1991 16 11547.432
1991 17 58277.431
1991 18 30357.000
1991 19 14896.960
1991 20 12190.105
1991 21 14689.508
1991 22 29804.094
1991 23 40318.625
1991 24 109023.333
1991 25 228211.357
1991 26 55369.482
1991 27 27903.586
1991 28 14439.512
1991 29 23000.835
1991 30 7247.179
1991 31 8992.527
1991 32 41338.118
1991 33 61429.900
1991 34 193610.571
1991 35 21974.485
1991 36 140985.613
1991 37 35380.200
1991 38 5947.509
1991 39 61726.409
1991 40 19575.922
1991 41 42135.556
1991 42 87390.030
1991 44 103603.400
1991 45 38234.348
1991 46 5303.182
1991 47 25407.611
1991 48 34512.854
1991 49 29390.414
1991 50 21683.857
1991 51 25107.677
1991 53 65281.667
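
The message printed after the code above tells us that summarise returned a result that is still grouped by Year. If we would rather get an ungrouped result back (and silence the message), a minimal sketch is to pass the .groups argument to summarise, keeping the same state-by-year averages:

unemp_data %>%
  group_by(Year, State_fips) %>%
  summarise(Average=mean(Labor_force), .groups="drop")  # "drop" returns an ungrouped tibble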

Average is just one statistic we may want to calculate; how about percentiles, the minimum, the maximum, or the standard deviation? Our summarise command will accept more than one argument, we just have to separate them with commas. Let’s also save this as a new data.frame for use in future lessons.

indiana_laborforce<-                       # 1
  indiana %>% group_by(Year) %>%           # 2
  summarise(Min=min(Labor_force),          # 3
    p10th=quantile(Labor_force, c(0.1)),   # 4
    p25th=quantile(Labor_force, c(0.25)),  # 5
    p50th=quantile(Labor_force, c(0.5)),   # 6
    p75th=quantile(Labor_force, c(0.75)),  # 7
    p90th=quantile(Labor_force, c(0.9)),   # 8
    Max=max(Labor_force),                  # 9
    Average=mean(Labor_force),            # 10
    StDev=sd(Labor_force))                # 11
Year Min p10th p25th p50th p75th p90th Max Average StDev
1990 2629 6639.3 9617.50 14858.5 30089.50 60784.5 424053 30593.65 52989.19
1991 2541 6380.5 9480.00 14713.0 30131.00 62309.9 421789 30357.00 52670.08
1992 2550 6284.2 9615.50 15169.5 31117.00 63623.2 429239 31058.60 53551.02
1993 2607 6200.6 10037.75 15652.5 32034.75 64549.9 438377 32046.84 54683.97
1994 2681 6453.1 10480.50 16472.0 33808.25 66506.3 453457 33308.01 56427.35
1995 2714 6543.2 10994.75 16648.5 34588.00 67364.5 459603 34078.00 57219.27
1996 2724 6522.6 10827.50 16551.0 34597.25 66487.5 453728 33800.80 56670.37
1997 2698 6585.3 10966.00 16671.5 34786.50 66195.3 453928 33919.72 56870.28
1998 2701 6629.7 11008.75 16660.0 34150.50 65887.6 452412 33947.47 56827.82
1999 2691 6496.7 10986.75 16958.5 33968.50 65196.0 448732 33965.91 56434.89
2000 2985 7053.5 10816.50 16984.5 34996.50 63718.6 455135 33982.50 56818.02
2001 2982 7085.7 10709.00 17127.5 34674.75 64832.0 459505 34140.32 57318.59
2002 3051 7040.7 10802.75 17244.5 34974.75 66572.3 462330 34469.34 57807.67
2003 3092 6941.0 10709.50 17199.0 34514.75 67751.5 464331 34597.76 58056.22
2004 3109 6815.2 10547.00 17083.5 33431.50 68476.2 459140 34432.72 57583.09
2005 3152 7075.8 10702.75 17357.5 33606.00 69985.6 459492 34841.75 57925.42
2006 3107 6883.1 10620.50 17326.5 34554.50 72792.0 461468 35164.35 58484.53
2007 3026 6816.7 10447.50 16881.0 34304.50 73128.1 460046 34866.21 58342.88
2008 3044 6816.2 10488.00 16673.5 34065.50 75072.0 463200 35131.49 58819.13
2009 3088 6758.4 10434.25 16452.0 33700.25 74434.1 457524 34717.45 58083.80
2010 3263 6440.5 10308.00 16622.0 33921.75 74406.0 452836 34512.97 57806.16
2011 3210 6459.9 10184.25 16630.5 33804.00 75620.5 456011 34586.91 58199.53
2012 3181 6336.2 10058.50 16453.0 32924.00 76458.0 458377 34454.88 58388.80
2013 3165 6325.9 9900.50 16289.5 33053.00 78149.5 463346 34656.59 58983.65
2014 3216 6401.2 9851.50 16524.0 33215.50 80113.3 467169 35052.00 59579.47
2015 3178 6341.8 9924.75 16908.0 33960.00 82327.7 472330 35508.30 60275.44
2016 3226 6444.8 9930.50 17093.5 35149.00 85223.0 481434 36178.11 61517.07
2017 3153 6512.5 9783.75 17119.0 35516.50 85487.6 483607 36243.96 61840.36
2018 3189 6561.9 9943.50 17334.5 36211.00 86905.8 488698 36756.59 62659.34
2019 3201 6615.6 9812.75 17243.0 36470.50 87146.9 492967 36819.25 63114.77

3.4 Defining New Variables

How about if we don’t want to calculate summary statistics within a group, but just want to calculate a new variable from each observation? Suppose we had unemployed and labor force levels but did not have the unemployment rate. How could we go about calculating it with mutate?

indiana<-                                                                # 1
  indiana %>%                                                            # 2
  mutate(unemp_rate_calc=round((Unemployed/Labor_force)*100, digits=1))  # 3

Note that since we are assigning the result back to indiana, the data frame we are manipulating, we are in this case overwriting the indiana data.frame. Since the unemployment rate in our file was listed as a percent and rounded to the nearest tenth, I did the same for our calculated rate (digits=1 means one decimal place).

Now let’s compare the first rows of the given unemployment rate with the one we just calculated.
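
The output below can be produced with a short snippet along these lines (a sketch; we simply select the two rate columns and look at the first rows):

indiana %>%
  select(Unemp_rate, unemp_rate_calc) %>%  # keep only the two rate columns
  head()                                   # print the first 6 rows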

## # A tibble: 6 x 2
##   Unemp_rate unemp_rate_calc
##        <dbl>           <dbl>
## 1        6.5             6.5
## 2        5.1             5.1
## 3        4.8             4.8
## 4        3.4             3.4
## 5        9.2             9.2
## 6        2.4             2.4
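
As a quick sanity check, we could also count the rows where the given rate and our calculated rate disagree; if the rounding matches everywhere, this should return 0:

indiana %>%
  filter(Unemp_rate != unemp_rate_calc) %>%  # rows where the two rates differ
  nrow()                                     # how many such rows there are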

3.5 Dropping Variables

How can we go about dropping or keeping certain variables?

Say we wanted to drop unemp_rate_calc, Labor_force, and Employed.

temp<-                                                 # 1
  indiana %>%                                          # 2
  select(-c(unemp_rate_calc, Labor_force,  Employed))  # 3
head(temp)                                             # 4
## # A tibble: 6 x 7
##   LAUS_code       State_fips County_fips County_name            Year  Unemployed Unemp_rate
##   <chr>           <chr>      <chr>       <chr>                  <chr>      <dbl>      <dbl>
## 1 CN1800100000000 18         001         Adams County, IN       1990         998        6.5
## 2 CN1800300000000 18         003         Allen County, IN       1990        8317        5.1
## 3 CN1800500000000 18         005         Bartholomew County, IN 1990        1632        4.8
## 4 CN1800700000000 18         007         Benton County, IN      1990         156        3.4
## 5 CN1800900000000 18         009         Blackford County, IN   1990         643        9.2
## 6 CN1801100000000 18         011         Boone County, IN       1990         483        2.4

Now what if we wanted to keep just State_fips, County_fips, Year, and Unemp_rate?

temp<-                                                  # 1
  indiana %>%                                           # 2
  select(c(State_fips, County_fips, Year, Unemp_rate))  # 3
head(temp)                                              # 4
## # A tibble: 6 x 4
##   State_fips County_fips Year  Unemp_rate
##   <chr>      <chr>       <chr>      <dbl>
## 1 18         001         1990         6.5
## 2 18         003         1990         5.1
## 3 18         005         1990         4.8
## 4 18         007         1990         3.4
## 5 18         009         1990         9.2
## 6 18         011         1990         2.4

That is, if we include the “-” before our variables, we are telling dplyr through our select call to drop the variables we list, while if we leave out the “-” we are telling dplyr to keep only the variables we list.

3.6 Merging Datasets

What if we have two datasets, both at the county level? How could we combine them?

3.6.1 Poverty Estimates

We will be merging in poverty estimates for US counties in 2019. A table prepared by the USDA can be found on their website here. Let’s download this and put it into our original_data folder. Now let’s open it and see what the data looks like. One of the first things we notice is the first two lines: United States and Alabama. That is, there are national and state level observations in addition to the county level ones. We need to keep this in mind before we merge with our unemployment data.

Try to figure out how to read this data into R by yourself. If you need to review, go back to Downloading/Reading Data. Remember, using RStudio to assist with this is probably the simplest way until we get very comfortable with the commands.

Here is the command I use for it:

PovertyEstimates <-                                             # 1
  read_excel(paste0(folder_raw_data, "/PovertyEstimates.xls"),  # 2
  skip = 4)                                                     # 3

Make sure we also have our county unemployment data read in. We can now inspect both of these data.frames. The way I would clean this data is the following (this of course depends on the question/results you are trying to obtain):

  • unemployment data:
    1. restrict to 2019 data (as our poverty data covers just 2019)
    2. drop the Year variable (as we only have one year now)
    3. drop Puerto Rico (State FIPS 72)
    4. combine State_fips and County_fips into a FIPStxt variable (this is a variable in our poverty data which we will later use as an ID to merge on)
  • poverty data:
    1. filter out the national and state level observations, leaving just the county level ones.

Here is the code I use for this:

unem_2019<-unemp_data %>%                                                             # 1
  filter(Year==2019, State_fips!=72) %>% #restrict sample to only year 2019, drop PR  # 2
  select(-Year) %>% #drop the variable Year                                           # 3
  mutate(FIPStxt=paste0(State_fips, County_fips)) #Create variable to merge on        # 4
                                                                                      # 5
pov_2019 <- PovertyEstimates %>%                                                      # 6
  filter(substr(FIPStxt, str_length(FIPStxt)-2, str_length(FIPStxt))!="000")          # 7

Now if we observe both data frames, we are left with what we want to merge. We have defined a variable FIPStxt which we would like to join observations on. That is, for a given FIPStxt ID in our unemployment data, we would like to find the row with that FIPStxt ID in our poverty data, then add the variables from the poverty data to our unemployment data, and do this for every FIPStxt ID.

There are join commands for this in the dplyr package. The clearest join command is full_join. This matches data frames based on common variable names (or on variable names you explicitly tell it to match on). I print the first 3 rows below.

clean_full<-                         # 1
  unem_2019 %>% full_join(pov_2019)  # 2
## Joining, by = "FIPStxt"
LAUS_code State_fips County_fips County_name Labor_force Employed Unemployed Unemp_rate FIPStxt Stabr Area_name Rural-urban_Continuum_Code_2003 Urban_Influence_Code_2003 Rural-urban_Continuum_Code_2013 Urban_Influence_Code_2013 POVALL_2019 CI90LBALL_2019 CI90UBALL_2019 PCTPOVALL_2019 CI90LBALLP_2019 CI90UBALLP_2019 POV017_2019 CI90LB017_2019 CI90UB017_2019 PCTPOV017_2019 CI90LB017P_2019 CI90UB017P_2019 POV517_2019 CI90LB517_2019 CI90UB517_2019 PCTPOV517_2019 CI90LB517P_2019 CI90UB517P_2019 MEDHHINC_2019 CI90LBINC_2019 CI90UBINC_2019 POV04_2019 CI90LB04_2019 CI90UB04_2019 PCTPOV04_2019 CI90LB04P_2019 CI90UB04P_2019
CN0100100000000 01 001 Autauga County, AL 26172 25458 714 2.7 01001 AL Autauga County 2 2 2 2 6723 5517 7929 12.1 9.9 14.3 2040 1472 2608 15.9 11.5 20.3 1376 902 1850 14.4 9.4 19.4 58233 52517 63949 NA NA NA NA NA NA
CN0100300000000 01 003 Baldwin County, AL 97328 94675 2653 2.7 01003 AL Baldwin County 4 5 3 2 22360 18541 26179 10.1 8.4 11.8 6323 4521 8125 13.5 9.6 17.4 4641 3295 5987 13.3 9.4 17.2 59871 54593 65149 NA NA NA NA NA NA
CN0100500000000 01 005 Barbour County, AL 8537 8213 324 3.8 01005 AL Barbour County 6 6 6 6 5909 4787 7031 27.1 22.0 32.2 2050 1560 2540 41.0 31.2 50.8 1468 1114 1822 39.5 30.0 49.0 35972 31822 40122 NA NA NA NA NA NA

3.6.2 left_join

While the above full_join command is probably the ‘safest’ route, since it keeps all of the variables and rows of both data frames, knowing more about the other joins can save us some time.

For example, consider above when we first loaded in our poverty data. It had observations at the state and national levels in addition to the county level data we wanted. Ideally we would want a command that takes the county unemployment data, matches the poverty data based on FIPStxt, and then simply excludes any extras from the poverty data. We can do this with a left (or right) join.

unclean_left<-                               # 1
  unem_2019 %>% left_join(PovertyEstimates)  # 2
## Joining, by = "FIPStxt"
LAUS_code State_fips County_fips County_name Labor_force Employed Unemployed Unemp_rate FIPStxt Stabr Area_name Rural-urban_Continuum_Code_2003 Urban_Influence_Code_2003 Rural-urban_Continuum_Code_2013 Urban_Influence_Code_2013 POVALL_2019 CI90LBALL_2019 CI90UBALL_2019 PCTPOVALL_2019 CI90LBALLP_2019 CI90UBALLP_2019 POV017_2019 CI90LB017_2019 CI90UB017_2019 PCTPOV017_2019 CI90LB017P_2019 CI90UB017P_2019 POV517_2019 CI90LB517_2019 CI90UB517_2019 PCTPOV517_2019 CI90LB517P_2019 CI90UB517P_2019 MEDHHINC_2019 CI90LBINC_2019 CI90UBINC_2019 POV04_2019 CI90LB04_2019 CI90UB04_2019 PCTPOV04_2019 CI90LB04P_2019 CI90UB04P_2019
CN0100100000000 01 001 Autauga County, AL 26172 25458 714 2.7 01001 AL Autauga County 2 2 2 2 6723 5517 7929 12.1 9.9 14.3 2040 1472 2608 15.9 11.5 20.3 1376 902 1850 14.4 9.4 19.4 58233 52517 63949 NA NA NA NA NA NA
CN0100300000000 01 003 Baldwin County, AL 97328 94675 2653 2.7 01003 AL Baldwin County 4 5 3 2 22360 18541 26179 10.1 8.4 11.8 6323 4521 8125 13.5 9.6 17.4 4641 3295 5987 13.3 9.4 17.2 59871 54593 65149 NA NA NA NA NA NA
CN0100500000000 01 005 Barbour County, AL 8537 8213 324 3.8 01005 AL Barbour County 6 6 6 6 5909 4787 7031 27.1 22.0 32.2 2050 1560 2540 41.0 31.2 50.8 1468 1114 1822 39.5 30.0 49.0 35972 31822 40122 NA NA NA NA NA NA

In this case we are starting with our unemployment data and telling R to match all possible rows by our common variable (FIPStxt here); any poverty estimate observations that can’t be matched are simply dropped. However, what about unemployment observations that can’t be matched? For those rows the poverty variables are filled with NAs. For example, if we DON’T exclude Puerto Rico, it will be in our unemployment data but not in our poverty data. Let’s see what happens:

unem_2019_pr<-unemp_data %>%                       # 1
  filter(Year==2019) %>%                           # 2
  select(-Year) %>%                                # 3
  mutate(FIPStxt=paste0(State_fips, County_fips))  # 4
pr<-                                               # 5
  unem_2019_pr %>%                                 # 6
  left_join(PovertyEstimates) %>%                  # 7
  filter(State_fips==72)                           # 8
## Joining, by = "FIPStxt"
LAUS_code State_fips County_fips County_name Labor_force Employed Unemployed Unemp_rate FIPStxt Stabr Area_name Rural-urban_Continuum_Code_2003 Urban_Influence_Code_2003 Rural-urban_Continuum_Code_2013 Urban_Influence_Code_2013 POVALL_2019 CI90LBALL_2019 CI90UBALL_2019 PCTPOVALL_2019 CI90LBALLP_2019 CI90UBALLP_2019 POV017_2019 CI90LB017_2019 CI90UB017_2019 PCTPOV017_2019 CI90LB017P_2019 CI90UB017P_2019 POV517_2019 CI90LB517_2019 CI90UB517_2019 PCTPOV517_2019 CI90LB517P_2019 CI90UB517P_2019 MEDHHINC_2019 CI90LBINC_2019 CI90UBINC_2019 POV04_2019 CI90LB04_2019 CI90UB04_2019 PCTPOV04_2019 CI90LB04P_2019 CI90UB04P_2019
CN7200100000000 72 001 Adjuntas Municipio, PR 4115 3485 630 15.3 72001 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
CN7200300000000 72 003 Aguada Municipio, PR 11743 10553 1190 10.1 72003 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
CN7200500000000 72 005 Aguadilla Municipio, PR 14350 12881 1469 10.2 72005 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

3.6.3 Additional Resources

In addition to the full and left joins shown here, there are also inner_join and right_join. Details of these can be found here.
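
As a rough sketch of how these behave with the same data frames as above (unem_2019 and pov_2019): an inner join keeps only the rows that match in both data frames, while a right join keeps every row of the right-hand data frame and fills unmatched unemployment variables with NAs.

# Keep only counties present in BOTH the unemployment and poverty data
clean_inner <- unem_2019 %>% inner_join(pov_2019, by="FIPStxt")

# Keep every row of pov_2019, filling unmatched unemployment columns with NAs
clean_right <- unem_2019 %>% right_join(pov_2019, by="FIPStxt")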