Collapse/Gen Question

Davia Downey

Join Date: Jul 2017
Posts: 135

Collapse/Gen Question

25 May 2022, 13:11

Hi Statalisters,

I have a collapse/gen question. What you’ll observe in the dataset is a variable called DeclarationRequestDate which corresponds to a county, state, and fips code in each state from 2014-2019. What I need to do is create a county(sum) variable that counts the number of times a county-issued a Declaration Request during this time period. I need to retain the fips and the newly created count variable so that I can then merge with a master dataset (I know how to do this step). I think a collapse gen code is necessary but I’m not sure the correct sequence to generate a count variable and then identify the number of times a county requested a disaster declaration. Any assistance or tips are appreciated. A data example is provided so you can see the shape of the data.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str30 sname str52 cname float(year fips) double DeclarationRequestDate
"West Virginia" "Cabell"       2014 54011 19732.833333333336
"West Virginia" "Jackson"      2014 54035 19732.833333333336
"West Virginia" "Putnam"       2014 54079 19732.833333333336
"West Virginia" "Kanawha"      2014 54039 19732.833333333336
"West Virginia" "Roane"        2014 54087 19732.833333333336
"West Virginia" "Boone"        2014 54005 19732.833333333336
"West Virginia" "Clay"         2014 54015 19732.833333333336
"West Virginia" "Lincoln"      2014 54043 19732.833333333336
"West Virginia" "Logan"        2014 54045 19732.833333333336
"Oklahoma"      "Le Flore"     2014 40079              19740
"Vermont"       "Franklin"     2014 50011              19740
"Oklahoma"      "Pushmataha"   2014 40127              19740
"Vermont"       "Lamoille"     2014 50015              19740
"Vermont"       "Grand Isle"   2014 50013              19740
"Vermont"       "Chittenden"   2014 50007              19740
"Oklahoma"      "Choctaw"      2014 40023              19740
"Vermont"       "Orleans"      2014 50019              19740
"Vermont"       "Caledonia"    2014 50005              19740
"Vermont"       "Essex"        2014 50009              19740
"Pennsylvania"  "Delaware"     2014 42045 19759.865972222222
"Pennsylvania"  "Chester"      2014 42029 19759.865972222222
"Pennsylvania"  "Lancaster"    2014 42071 19759.865972222222
"Pennsylvania"  "Montgomery"   2014 42091 19759.865972222222
"Pennsylvania"  "Philadelphia" 2014 42101 19759.865972222222
"Pennsylvania"  "York"         2014 42133 19759.865972222222
"Pennsylvania"  "Bucks"        2014 42017 19759.865972222222
"Georgia"       "Henry"        2014 13151              19764
"Georgia"       "Clayton"      2014 13063              19764
"Georgia"       "Rabun"        2014 13241              19764
"Georgia"       "Paulding"     2014 13223              19764
"Georgia"       "Carroll"      2014 13045              19764
"Georgia"       "Twiggs"       2014 13289              19764
"Georgia"       "Candler"      2014 13043              19764
"Georgia"       "Screven"      2014 13251              19764
"Georgia"       "Fayette"      2014 13113              19764
"Georgia"       "Polk"         2014 13233              19764
"Georgia"       "Coweta"       2014 13077              19764
"Georgia"       "Gordon"       2014 13129              19764
"Georgia"       "Douglas"      2014 13097              19764
"Georgia"       "Burke"        2014 13033              19764
"Georgia"       "Forsyth"      2014 13117              19764
"Georgia"       "Clarke"       2014 13059              19764
"Georgia"       "Johnson"      2014 13167              19764
"Georgia"       "Baldwin"      2014 13009              19764
"Georgia"       "Oconee"       2014 13219              19764
"Georgia"       "Emanuel"      2014 13107              19764
"Georgia"       "Upson"        2014 13293              19764
"Georgia"       "Butts"        2014 13035              19764
"Georgia"       "Wilkinson"    2014 13319              19764
"Georgia"       "Laurens"      2014 13175              19764
end
format %td DeclarationRequestDate

Last edited by Davia Downey; 25 May 2022, 13:24.

Tags: collapse, data

Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#2

25 May 2022, 13:21

In your data example, each county/state combination appears only once. So the answer is in each case "once" and there is nothing more to compute.

What is less clear, however, is how to approach this in your real data set. I do not work in this field and so I'm not familiar with what the data might look like. But might a county file multiple disaster requests at the same time (for example, one to the state governor and another to the federal government, or something like that)? If so, do you want to count those as separate requests, or to lump all requests on the same date as a single "time?" In addition to answering this question, it would probably be helpful to post a different data example that includes at least one county that, by whatever your criteria are, did file requests more than one time, and in your post point out that county and indicate what your hand-calculation of the correct answer would be.
Comment

Davia Downey

Join Date: Jul 2017
Posts: 135

25 May 2022, 13:31

These are good questions. This is a FEMA dataset so the data I provided represents federal disaster declarations. Counties can only ask once per disaster in any given year (there's also a variable with a unique declaration number that I didn't include). So, in essence, even if more than one state/area declares a disaster, it would be unique to the county that requested aid. In the case of this data, every year there are new disasters that occur so it is possible that the same area requests new assistance during a new disaster. So something like:

County 123 in 2014 declared 5 disasters during different dates and in 2015 the same county declared a disaster 7 times and so on...so what I need is a variable that summarizes the number of times each county declared a disaster over the 5 year period. I've attached a different data example so you can see what I mean.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str30 sname str52 cname float(year fips) double DeclarationRequestDate
"Alabama" "Autauga"  2016 1001          20467.625
"Alabama" "Autauga"  2017 1001              21099
"Alabama" "Autauga"  2017 1001              21122
"Alabama" "Autauga"  2017 1001              21072
"Alabama" "Baldwin"  2014 1003              19844
"Alabama" "Baldwin"  2017 1003              21122
"Alabama" "Baldwin"  2017 1003              21072
"Alabama" "Baldwin"  2017 1003              21099
"Alabama" "Baldwin"  2018 1003              21467
"Alabama" "Barbour"  2016 1005          20467.625
"Alabama" "Barbour"  2017 1005              21072
"Alabama" "Barbour"  2017 1005              21099
"Alabama" "Barbour"  2018 1005              21467
"Alabama" "Bibb"     2017 1007              21072
"Alabama" "Bibb"     2017 1007              21099
"Alabama" "Blount"   2014 1009              19844
"Alabama" "Blount"   2016 1009          20467.625
"Alabama" "Blount"   2017 1009              21072
"Alabama" "Blount"   2019 1009              21634
"Alabama" "Bullock"  2014 1011              19844
"Alabama" "Bullock"  2016 1011          20467.625
"Alabama" "Bullock"  2017 1011              21072
"Alabama" "Bullock"  2017 1011              21099
"Alabama" "Bullock"  2018 1011              21467
"Alabama" "Butler"   2014 1013              19844
"Alabama" "Butler"   2016 1013          20467.625
"Alabama" "Butler"   2017 1013              21099
"Alabama" "Butler"   2017 1013              21072
"Alabama" "Butler"   2018 1013              21467
"Alabama" "Calhoun"  2017 1015              21072
"Alabama" "Calhoun"  2018 1015 21286.140277777777
"Alabama" "Chambers" 2016 1017          20467.625
"Alabama" "Chambers" 2017 1017              21072
"Alabama" "Cherokee" 2016 1019          20467.625
"Alabama" "Cherokee" 2017 1019              21072
"Alabama" "Cherokee" 2019 1019              21634
"Alabama" "Chilton"  2017 1021              21072
"Alabama" "Chilton"  2017 1021              21099
"Alabama" "Choctaw"  2017 1023              21122
"Alabama" "Choctaw"  2017 1023              21099
"Alabama" "Choctaw"  2017 1023              21072
"Alabama" "Clarke"   2017 1025              21122
"Alabama" "Clarke"   2017 1025              21099
"Alabama" "Clarke"   2017 1025              21072
"Alabama" "Clay"     2016 1027          20467.625
"Alabama" "Clay"     2017 1027              21072
"Alabama" "Clay"     2017 1027              21122
"Alabama" "Cleburne" 2016 1029          20467.625
"Alabama" "Cleburne" 2017 1029              21072
"Alabama" "Coffee"   2016 1031          20467.625
end
format %td DeclarationRequestDate

Comment

Davia Downey

Join Date: Jul 2017

Posts: 135
#4

25 May 2022, 13:53

Clyde Schechter I got this to work using the following code:
egen count = count(DeclarationRequestDate) by (fips)
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#5

25 May 2022, 14:02

Good! That looks right to me.
Comment

Davia Downey

Join Date: Jul 2017
Posts: 135

25 May 2022, 14:39

Clyde Schechter But I still need your help with joinby (if you can guide me!). Now, I have the count variable in the FEMA dataset but need to bring it into my master for analysis. In the FEMA dataset the data looks like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str52 cname float(countofdisasterrequests fips id) str2 sab str30 sname float year
"Autauga" 4 1001  1 "AL" "Alabama" 2016
"Autauga" 4 1001  2 "AL" "Alabama" 2017
"Autauga" 4 1001  3 "AL" "Alabama" 2017
"Autauga" 4 1001  4 "AL" "Alabama" 2017
"Baldwin" 5 1003  5 "AL" "Alabama" 2014
"Baldwin" 5 1003  6 "AL" "Alabama" 2017
"Baldwin" 5 1003  7 "AL" "Alabama" 2017
"Baldwin" 5 1003  8 "AL" "Alabama" 2017
"Baldwin" 5 1003  9 "AL" "Alabama" 2018
"Barbour" 4 1005 10 "AL" "Alabama" 2016
"Barbour" 4 1005 11 "AL" "Alabama" 2017
"Barbour" 4 1005 12 "AL" "Alabama" 2017
"Barbour" 4 1005 13 "AL" "Alabama" 2018
"Bibb"    2 1007 14 "AL" "Alabama" 2017
"Bibb"    2 1007 15 "AL" "Alabama" 2017
"Blount"  4 1009 16 "AL" "Alabama" 2014
"Blount"  4 1009 17 "AL" "Alabama" 2016
"Blount"  4 1009 18 "AL" "Alabama" 2017
"Blount"  4 1009 19 "AL" "Alabama" 2019
"Bullock" 5 1011 20 "AL" "Alabama" 2014
"Bullock" 5 1011 21 "AL" "Alabama" 2016
"Bullock" 5 1011 22 "AL" "Alabama" 2017
"Bullock" 5 1011 23 "AL" "Alabama" 2017
"Bullock" 5 1011 24 "AL" "Alabama" 2018
"Butler"  5 1013 25 "AL" "Alabama" 2014
end

------------------ copy up to and including the previous line ------------------

Listed 25 out of 6387 observations

and in the master, the data looks like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str28 cname long fips str4 sab int year str19 RISK_RATNG
"Acadia"   22001 "LA" 2019 "Relatively Moderate"
"Acadia"   22001 "LA" 2014 "Relatively Moderate"
"Acadia"   22001 "LA" 2015 "Relatively Moderate"
"Acadia"   22001 "LA" 2020 "Relatively Moderate"
"Acadia"   22001 "LA" 2018 "Relatively Moderate"
"Acadia"   22001 "LA" 2016 "Relatively Moderate"
"Acadia"   22001 "LA" 2017 "Relatively Moderate"
"Accomack" 51001 "VA" 2016 "Relatively Moderate"
"Accomack" 51001 "VA" 2020 "Relatively Moderate"
"Accomack" 51001 "VA" 2017 "Relatively Moderate"
"Accomack" 51001 "VA" 2019 "Relatively Moderate"
"Accomack" 51001 "VA" 2014 "Relatively Moderate"
"Accomack" 51001 "VA" 2018 "Relatively Moderate"
"Accomack" 51001 "VA" 2015 "Relatively Moderate"
"Ada"      16001 "ID" 2018 "Relatively Low"     
"Ada"      16001 "ID" 2015 "Relatively Low"     
"Ada"      16001 "ID" 2020 "Relatively Low"     
"Ada"      16001 "ID" 2016 "Relatively Low"     
"Ada"      16001 "ID" 2017 "Relatively Low"     
"Ada"      16001 "ID" 2014 "Relatively Low"     
"Ada"      16001 "ID" 2019 "Relatively Low"     
"Adams"    38001 "ND" 2018 "Very Low"           
"Adams"    18001 "IN" 2018 "Very Low"           
"Adams"    31001 "NE" 2019 "Relatively Moderate"
"Adams"    53001 "WA" 2019 "Relatively Low"     
end

So what I need to do next is use joinby to add the countofdisasters request to my master without dropping any of the years where a county did not declare a disaster so that when I run a panel regression I can estimate a model that includes the countofdisasterrequests along with other variables from the master. You'll note that Acadia does not show up in the FEMA data, this is an example of a county that did not issue a disaster declaration during the time period I'm interested in but has other data connected to it that I still need for my models. I hope this makes sense.

Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30357

25 May 2022, 14:46

At least based on your example data, you do not need -joinby- for this.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str52 cname float(countofdisasterrequests fips id) str2 sab str30 sname float year
"Autauga" 4 1001  1 "AL" "Alabama" 2016
"Autauga" 4 1001  2 "AL" "Alabama" 2017
"Autauga" 4 1001  3 "AL" "Alabama" 2017
"Autauga" 4 1001  4 "AL" "Alabama" 2017
"Baldwin" 5 1003  5 "AL" "Alabama" 2014
"Baldwin" 5 1003  6 "AL" "Alabama" 2017
"Baldwin" 5 1003  7 "AL" "Alabama" 2017
"Baldwin" 5 1003  8 "AL" "Alabama" 2017
"Baldwin" 5 1003  9 "AL" "Alabama" 2018
"Barbour" 4 1005 10 "AL" "Alabama" 2016
"Barbour" 4 1005 11 "AL" "Alabama" 2017
"Barbour" 4 1005 12 "AL" "Alabama" 2017
"Barbour" 4 1005 13 "AL" "Alabama" 2018
"Bibb"    2 1007 14 "AL" "Alabama" 2017
"Bibb"    2 1007 15 "AL" "Alabama" 2017
"Blount"  4 1009 16 "AL" "Alabama" 2014
"Blount"  4 1009 17 "AL" "Alabama" 2016
"Blount"  4 1009 18 "AL" "Alabama" 2017
"Blount"  4 1009 19 "AL" "Alabama" 2019
"Bullock" 5 1011 20 "AL" "Alabama" 2014
"Bullock" 5 1011 21 "AL" "Alabama" 2016
"Bullock" 5 1011 22 "AL" "Alabama" 2017
"Bullock" 5 1011 23 "AL" "Alabama" 2017
"Bullock" 5 1011 24 "AL" "Alabama" 2018
"Butler"  5 1013 25 "AL" "Alabama" 2014
end
tempfile dataset1
save `dataset1'

* Example generated by -dataex-. For more info, type help dataex
clear
input str28 cname long fips str4 sab int year str19 RISK_RATNG
"Acadia"   22001 "LA" 2019 "Relatively Moderate"
"Acadia"   22001 "LA" 2014 "Relatively Moderate"
"Acadia"   22001 "LA" 2015 "Relatively Moderate"
"Acadia"   22001 "LA" 2020 "Relatively Moderate"
"Acadia"   22001 "LA" 2018 "Relatively Moderate"
"Acadia"   22001 "LA" 2016 "Relatively Moderate"
"Acadia"   22001 "LA" 2017 "Relatively Moderate"
"Accomack" 51001 "VA" 2016 "Relatively Moderate"
"Accomack" 51001 "VA" 2020 "Relatively Moderate"
"Accomack" 51001 "VA" 2017 "Relatively Moderate"
"Accomack" 51001 "VA" 2019 "Relatively Moderate"
"Accomack" 51001 "VA" 2014 "Relatively Moderate"
"Accomack" 51001 "VA" 2018 "Relatively Moderate"
"Accomack" 51001 "VA" 2015 "Relatively Moderate"
"Ada"      16001 "ID" 2018 "Relatively Low"     
"Ada"      16001 "ID" 2015 "Relatively Low"     
"Ada"      16001 "ID" 2020 "Relatively Low"     
"Ada"      16001 "ID" 2016 "Relatively Low"     
"Ada"      16001 "ID" 2017 "Relatively Low"     
"Ada"      16001 "ID" 2014 "Relatively Low"     
"Ada"      16001 "ID" 2019 "Relatively Low"     
"Adams"    38001 "ND" 2018 "Very Low"           
"Adams"    18001 "IN" 2018 "Very Low"           
"Adams"    31001 "NE" 2019 "Relatively Moderate"
"Adams"    53001 "WA" 2019 "Relatively Low"     
end
tempfile dataset2
save `dataset2'

use `dataset1', clear
merge m:1 fips year using `dataset2'
replace countofdisasterrequests = 0 if missing(countofdisasterrequests)

will give you what you need. The final command, changing the count of disaster requests to 0 for those counties that never had one will be necessary if you are going to include those counties in a model that uses that count as a variable. And 0 is the proper choice here because those counties really did submit 0 requests.

Comment

Davia Downey

Join Date: Jul 2017

Posts: 135
#8

25 May 2022, 14:54

I am getting a 'variables fips year do not uniquely identify observations in the using data' error here.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#9

25 May 2022, 15:27

OK, so the question is why the second data set has multiple observations with the same fips and year. In your example data it does not. So you need to do some detective work.

Broadly there are two cases: they are supposed to be there, or they are not.

If they are supposed to be there, then we need to understand what the relationships between the observations in the first and second data sets are and how to specify which observations with a given fips and year in the first should be paired with which observations of the at same fips and year in the second. If they are not supposed to be there, we need to repair the broken data set.

Probably the first step in either case is to look at the duplicate observations:

Code:

use `dataset2', clear duplicates tag fips year, gen(flag) browse if flag

will show them to you. Does it look like these should really be there? If so, can you figure out which one is the right one to go with which observations from data set 1?

If they shouldn't be there, can you eliminate the surplus observations? (Better still, can you fix the data management that created the data set so that they aren't there in the first place, also fixing any other mistakes made along the way?)

While we're at it, let's look more carefully at dataset 1. Why does fips 1011 appear twice in year 2017? And fips 1001 and 1003 each appear three times in 2017 for what reason? What's going on in 2017? What purpose do these repeated observations serve in your data? How did they get there?

In short, we need a deeper understanding of both data sets to resolve this.
Comment
Davia Downey

Join Date: Jul 2017

Posts: 135
#10

25 May 2022, 16:04

It is because there's a variable called DisasterDeclarationType. Basically, FEMA allows for different types of declarations (MAJOR or EMERGENCY) so when I created the count variable initially it counted both as unique types in the count. They should technically be counted, but don't necessarily need them to be included in the count variable (or I could create a count_emergency and a count_major variable to separate the two different types). I think what I need to do here is go back a few steps, eliminate one of the categories or code based on the difference in declaration type and then proceed with the merge you suggested above. I'll be sure to ping you if I run into issues. Thanks for your patience. The difference between an emergency declaration is that it is short-term whereas a major declaration unlocks more money just in case you need to know this.

Last edited by Davia Downey; 25 May 2022, 16:38.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#11

25 May 2022, 16:24

Thanks for the explanation. Good luck with implementing that--it shouldn't be too hard. I'll be here the rest of the day on and off if you need help with it. But I'm going to be off list for a while starting tomorrow. Hopefully somebody else will pick up the thread in my absence if you need it.
2 likes
Comment

Announcement

Collapse/Gen Question

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment