Creating an indicator for when different observations in a group have the same value on a variable

Tom Scott

Join Date: Apr 2019

Posts: 266
#1

16 Sep 2020, 22:21

Could someone please take a look at my code and tell me where I am going wrong? Of my 3,000 US counties, 50 have a county police agency (agencysamptype == 700) and a county sheriff's agency (agencysamptype == 1) with the same value of POPESTIMATE2010, which is the population the agency served/policed in 2010. Other counties have a county police department and a county sheriff's office but with different population-served values. I am trying to generate an indicator equal to 1 when a county has a county police department and a county sheriff's office with the same value of POPESTIMATE2010. I tried two different pieces of code based on things I read online, but neither is working. Once I create the indicator, I would like to apply the result to every observation in the county, even agencies that are not county police departments or sheriff's offices. Any help with this code would be much appreciated.

Code 1:

by fstate fcounty, sort: gen samepop = cond(POPESTIMATE2010/(agencysamptype == 1) == POPESTIMATE2010/(agencysamptype == 700), 1, 0)

Based on: https://www.stata.com/support/faqs/d...ng-properties/

Code 2:

sort fstate fcounty agencysamptype POPESTIMATE2010
by fstate fcounty, sort: gen samepop = cond(POPESTIMATE2010[1] == POPESTIMATE2010[_N], 1, 0)

Based on: https://www.stata.com/support/faqs/d...ions-in-group/

Thank you for your time!

Tom

Last edited by Tom Scott; 16 Sep 2020, 22:24.

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17704

17 Sep 2020, 04:03

Tom:
probably too basic a hint (that you can hopefully tweak according to your research goals):

Code:

. set obs 3
number of observations (_N) was 0, now 3

. g police=700 in 1/2
(1 missing value generated)

. replace police=600 if police==.
(1 real change made)

. g sheriff=700

. egen equal=group( police sheriff)

. list

     +--------------------------+
     | police   sheriff   equal |
     |--------------------------|
  1. |    700       700       2 |
  2. |    700       700       2 |
  3. |    600       700       1 |
     +--------------------------+


. g indicator=1 if equal==2
(1 missing value generated)

. replace indicator=0 if equal!=2
(1 real change made)

. list

     +-------------------------------------+
     | police   sheriff   equal   indica~r |
     |-------------------------------------|
  1. |    700       700       2          1 |
  2. |    700       700       2          1 |
  3. |    600       700       1          0 |
     +-------------------------------------+

.

Kind regards,
Carlo
(Stata 19.0)


Tom Scott

Join Date: Apr 2019

Posts: 266
#3

17 Sep 2020, 05:49

Carlo Lazzaro, thank you for your response, but that is not what I am looking for. I am trying to indicate whether different observations in the same group (county-state) have the same value of POPESTIMATE2010. Each observation can only be a sheriff's office or a county police department, not both. I don't think your code applies to linking different observations based on their values on a single variable.
Mike Lacy

Join Date: Apr 2014

Posts: 2411
#4

17 Sep 2020, 07:10

If the data is sorted, a comparison of the 1st and the last observation detects whether all observations are the same:

Code:

bysort fstate fcounty (POPESTIMATE2010): gen byte samepop = (POPESTIMATE2010[1] == POPESTIMATE2010[_N])

You can add onto that a comparison to see if the county in question has both a police and sheriff's office, which, if I recall correctly from your other post, is an available variable for each observation in your dataset.
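
A minimal sketch of how that extra comparison might be added, assuming the agency codes from #1 (1 = sheriff, 700 = county police). The data are made up for illustration, and has_sheriff and has_cpd are hypothetical variable names, not variables from the original dataset.

```stata
* Toy data; variable names follow #1, values are made up
clear
input fstate fcounty agencysamptype POPESTIMATE2010
1 1   1 1000
1 1 700 1000
2 4   1 3000
2 4 700 3000
2 4   3  500
3 2   1  800
3 2   3  800
end

* County-level flags (hypothetical names): is each agency type present?
bysort fstate fcounty: egen byte has_sheriff = max(agencysamptype == 1)
bysort fstate fcounty: egen byte has_cpd     = max(agencysamptype == 700)

* 1 only if every value in the county is equal AND both agency types exist
bysort fstate fcounty (POPESTIMATE2010): gen byte samepop = ///
    (POPESTIMATE2010[1] == POPESTIMATE2010[_N]) & has_sheriff & has_cpd

list, sepby(fstate fcounty)
```

Note that in this toy data county (2, 4) gets samepop = 0 even though its sheriff and county police values match, because the first-versus-last comparison requires every agency in the county to share the value. County (3, 2) also gets 0 because it has no county police agency.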
Tom Scott

Join Date: Apr 2019

Posts: 266
#5

17 Sep 2020, 12:00

Mike Lacy, thank you for your response. I also need to sort by agencysamptype and POPESTIMATE, since I want to look at the first case (always county sheriff because agencysamptype==1) and last case (county pd with the greatest population), by county-state. When I don't add that to the code it doesn't sort the cases that way by county-state, but when I do add that to the code it's not working right because it seems to also be looking within agencytype within county-state instead of just within county-state. Am I missing something? Thank you again for helping me through this.
Andrew Musau

Join Date: Oct 2014

Posts: 10168
#6

17 Sep 2020, 12:11

Create a data example with about 10 observations illustrating the issue and your expectation. Your previous post had a data example but was long with too many details. My attention span is short, so I passed it over.
Mike Lacy

Join Date: Apr 2014

Posts: 2411
#7

17 Sep 2020, 12:54

I'm with Andrew here. "I also need to sort by agencysamptype" describes a means, not an end, and I don't understand quite what your desired end is. The best I can say here without an example would be that perhaps you are not using parentheses quite correctly on the -bysort- command.
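
To illustrate the point about parentheses, here is a small sketch on made-up data (the values and the names n_county and n_cell are hypothetical). Variables inside the parentheses only control the sort order within each group, while variables outside them define the groups.

```stata
* Toy data, made up for illustration
clear
input fstate fcounty agencysamptype popestimate10
1 1 700  250
1 1   3  300
1 1   1 1000
2 4 700 3000
2 4   3  500
2 4 700  150
2 4   1 3000
end

* Parenthesized variables only order observations within each county;
* the groups are still fstate-fcounty, so _N counts agencies per county
bysort fstate fcounty (agencysamptype popestimate10): gen n_county = _N

* Without parentheses every listed variable defines the groups,
* so _N counts observations per county-type-population combination
bysort fstate fcounty agencysamptype popestimate10: gen n_cell = _N

list, sepby(fstate fcounty)
```

With the parentheses, subscripts such as [1] and [_N] refer to the first and last agency in the whole county. Without them, each group here holds a single observation, so [1] and [_N] are always the same row.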
Tom Scott

Join Date: Apr 2019

Posts: 266
#8

20 Sep 2020, 19:45

Andrew Musau, Mike Lacy: thank you for your time. I provided a short data example below. Here, fstate and fcounty are numeric indicators, agencysamptype is nominal (1 = sheriff, 3 = local pd, and 4 = county pd), and popestimate10 is the 2010 population of the agency's jurisdiction (either city or county). I have two variables I generated that I am hoping to populate--primary population and secondary population. I am doing everything within county-state. In my data example there are two county-states, with different circumstances that I am trying to contend with, described in separate paragraphs below.

First, for county-states with a sheriff's office (agencysamptype==1) and county pd (agencysamptype==4) that have different values of popestimate10, I would like to replace the sheriff's office's missing primarypop with its value of popestimate10 minus the sum of the county pd and local pd popestimate10 values in that county-state. For the sheriff's office's secondarypop value, I would like to replace the missing value with the summed local pd and county pd value (i.e., the value subtracted from popestimate10 to get primarypop). Mike Lacy gave me the code for this in my other post. Thank you again!

Second, when the county pd and sheriff's office within the same county-state have the same value of popestimate10, I would like to do the same thing stated above but for the county pd, while replacing the sheriff's office's primarypop and secondarypop values with zeroes. Some county-states have multiple county pds, with only one having the same population as the county sheriff's office. Here, I want to treat the county pds whose population differs from the sheriff's office's the same way as county pds and local pds in the above paragraph--sum them and subtract the sum from the popestimate10 value of the county pd with the same population as the sheriff's office to get primarypop, and use the summed value as that county pd's secondarypop.

I hope this is clear and the data example below is sufficient. Please let me know if not. Thank you again for your time and assistance.

inp fstate fcounty agencysamptype popestimate10 primarypop secondarypop
1 1 4 250 . .
1 1 3 300 . .
1 1 1 1000 . .
2 4 4 3000 . .
2 4 3 500 . .
2 4 4 150 . .
2 4 1 3000 . .
end

Andrew Musau

Join Date: Oct 2014
Posts: 10168

21 Sep 2020, 16:07

Reading #1 and #2, you are repeating most of the same things, so if you understand how to do the first, you can do the second.

Code:

*CREATE INDICATORS
bys fstate fcounty: egen sheriff = max(agencysamptype == 1)
bys fstate fcounty: egen countypd = max(agencysamptype == 4)

*SUM OF COUNTY AND LOCAL PD
bys fstate fcounty: egen sumpop10cl = ///
    total(inlist(agencysamptype, 3, 4) * popestimate10)

*WANTED
bys fstate fcounty (popestimate10): replace primarypop = ///
    popestimate10 - sumpop10cl if popestimate10[1] != popestimate10[_N] ///
    & sheriff & countypd & agencysamptype == 1

bys fstate fcounty (popestimate10): replace secondarypop = ///
    sumpop10cl if popestimate10[1] != popestimate10[_N] & sheriff & ///
    countypd & agencysamptype == 1

For #2, just tag the county pd with the same population as the sheriff's office and proceed as above.
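
One possible way to do that tagging, sketched on the data example from #8. The names sheriffpop, cpdmatch and samepop are hypothetical, and the sketch assumes at most one sheriff's office per county-state. It compares each county pd with the sheriff's office directly, so local pds and other county pds do not affect the result.

```stata
* Data example from #8 (without the two empty variables)
clear
input fstate fcounty agencysamptype popestimate10
1 1 4  250
1 1 3  300
1 1 1 1000
2 4 4 3000
2 4 3  500
2 4 4  150
2 4 1 3000
end

* Sheriff's population, spread to every agency in the county-state
* (assumption: at most one sheriff's office per county-state)
bysort fstate fcounty: egen sheriffpop = ///
    max(cond(agencysamptype == 1, popestimate10, .))

* Tag each county pd whose population equals the sheriff's
gen byte cpdmatch = agencysamptype == 4 & ///
    popestimate10 == sheriffpop & !missing(sheriffpop)

* County-level indicator, applied to every agency in the county-state
bysort fstate fcounty: egen byte samepop = max(cpdmatch)

list, sepby(fstate fcounty)
```

Here cpdmatch marks the one county pd to work with in the second case, and samepop carries the county-level result to all agencies, including local pds.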


Tom Scott

Join Date: Apr 2019

Posts: 266
#10

21 Sep 2020, 17:10

Andrew Musau Thank you very much for your help. It is much appreciated.