Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Checking if all counties are observed for all months of a year.

    I have monthly data over all US counties. However, for some counties, I only have one observation (likely the annual sum). How can I list which counties are not observed over all months?

  • #2
    It depends on how your data are organized. This can't be answered without example data being shown. Please fire up the -dataex- command and post back.

    If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Hi Clyde, Here's a sample of my panel data.
      1. What I need to check is which counties which have reporting gaps in their crime data. My sample period is 1987-2018 and I want to check if all counties are observed across the entire sample period.
      2. Secondly, since I would like to drop the counties that have large reporting gaps, how do I then apply a condition over all observations of a group and then drop the group if it satisfies the condition?
      clear
      input int year long fips str33 county str20 Total_Crime
      1987 1001 "AUTAUGA" "ALABAMA" 1599
      1988 1001 "AUTAUGA" "ALABAMA" 1488
      1989 1001 "AUTAUGA" "ALABAMA" 1784
      1990 1001 "AUTAUGA" "ALABAMA" 1442
      1991 1001 "AUTAUGA" "ALABAMA" 1804
      1992 1001 "AUTAUGA" "ALABAMA" 2291
      1993 1001 "AUTAUGA" "ALABAMA" 1954
      1994 1001 "AUTAUGA" "ALABAMA" 2361
      1995 1001 "AUTAUGA" "ALABAMA" 2511
      1996 1001 "AUTAUGA" "ALABAMA" 3053
      1997 1001 "AUTAUGA" "ALABAMA" 3078
      1998 1001 "AUTAUGA" "ALABAMA" 3093
      1999 1001 "AUTAUGA" "ALABAMA" 2827
      2000 1001 "AUTAUGA" "ALABAMA" 2876
      2001 1001 "AUTAUGA" "ALABAMA" 2868
      2002 1001 "AUTAUGA" "ALABAMA" 2786
      2003 1001 "AUTAUGA" "ALABAMA" 3190
      2004 1001 "AUTAUGA" "ALABAMA" 3342
      2005 1001 "AUTAUGA" "ALABAMA" 3295
      2006 1001 "AUTAUGA" "ALABAMA" 2938
      2007 1001 "AUTAUGA" "ALABAMA" 3212
      2008 1001 "AUTAUGA" "ALABAMA" 2799
      2009 1001 "AUTAUGA" "ALABAMA" 1674
      2010 1001 "AUTAUGA" "ALABAMA" 2939
      2015 1001 "AUTAUGA" "ALABAMA" 2759
      2016 1001 "AUTAUGA" "ALABAMA" 3363
      2017 1001 "AUTAUGA" "ALABAMA" 3640
      2018 1001 "AUTAUGA" "ALABAMA" 2351
      1987 1003 "BALDWIN" "ALABAMA" 3918
      1988 1003 "BALDWIN" "ALABAMA" 3820
      1989 1003 "BALDWIN" "ALABAMA" 4295
      1990 1003 "BALDWIN" "ALABAMA" 4628
      1991 1003 "BALDWIN" "ALABAMA" 5145
      1992 1003 "BALDWIN" "ALABAMA" 5289
      1993 1003 "BALDWIN" "ALABAMA" 5247
      1994 1003 "BALDWIN" "ALABAMA" 5208
      1995 1003 "BALDWIN" "ALABAMA" 7060
      1996 1003 "BALDWIN" "ALABAMA" 6921
      1997 1003 "BALDWIN" "ALABAMA" 6724
      1998 1003 "BALDWIN" "ALABAMA" 7004
      1999 1003 "BALDWIN" "ALABAMA" 8214
      2002 1003 "BALDWIN" "ALABAMA" 8425
      2003 1003 "BALDWIN" "ALABAMA" 8302
      2004 1003 "BALDWIN" "ALABAMA" 5646
      2009 1003 "BALDWIN" "ALABAMA" 8208
      2010 1003 "BALDWIN" "ALABAMA" 9165
      2015 1003 "BALDWIN" "ALABAMA" 14805
      2016 1003 "BALDWIN" "ALABAMA" 15737
      2017 1003 "BALDWIN" "ALABAMA" 16736
      2018 1003 "BALDWIN" "ALABAMA" 18413
      1987 1005 "BARBOUR" "ALABAMA" 947
      1988 1005 "BARBOUR" "ALABAMA" 837
      1989 1005 "BARBOUR" "ALABAMA" 1188
      1990 1005 "BARBOUR" "ALABAMA" 976
      1991 1005 "BARBOUR" "ALABAMA" 1023
      1992 1005 "BARBOUR" "ALABAMA" 781
      1993 1005 "BARBOUR" "ALABAMA" 663
      1994 1005 "BARBOUR" "ALABAMA" 785
      1995 1005 "BARBOUR" "ALABAMA" 773
      1996 1005 "BARBOUR" "ALABAMA" 896
      1997 1005 "BARBOUR" "ALABAMA" 746
      1998 1005 "BARBOUR" "ALABAMA" 970
      1999 1005 "BARBOUR" "ALABAMA" 1094
      2001 1005 "BARBOUR" "ALABAMA" 1128
      2002 1005 "BARBOUR" "ALABAMA" 1158
      2003 1005 "BARBOUR" "ALABAMA" 1260
      2004 1005 "BARBOUR" "ALABAMA" 1391
      2005 1005 "BARBOUR" "ALABAMA" 56
      2009 1005 "BARBOUR" "ALABAMA" 1226
      2010 1005 "BARBOUR" "ALABAMA" 810
      2015 1005 "BARBOUR" "ALABAMA" 936
      2016 1005 "BARBOUR" "ALABAMA" 864
      2017 1005 "BARBOUR" "ALABAMA" 977
      2018 1005 "BARBOUR" "ALABAMA" 1019
      1991 1007 "BIBB" "ALABAMA" 1
      2002 1007 "BIBB" "ALABAMA" 521
      2004 1007 "BIBB" "ALABAMA" 884
      2006 1007 "BIBB" "ALABAMA" 720
      2007 1007 "BIBB" "ALABAMA" 1116
      2008 1007 "BIBB" "ALABAMA" 1214
      2009 1007 "BIBB" "ALABAMA" 1188
      2010 1007 "BIBB" "ALABAMA" 817
      2015 1007 "BIBB" "ALABAMA" 839
      2016 1007 "BIBB" "ALABAMA" 790
      2017 1007 "BIBB" "ALABAMA" 991
      2018 1007 "BIBB" "ALABAMA" 1163
      1987 1009 "BLOUNT" "ALABAMA" 1031
      1988 1009 "BLOUNT" "ALABAMA" 1159
      1989 1009 "BLOUNT" "ALABAMA" 1135
      1990 1009 "BLOUNT" "ALABAMA" 1248
      1991 1009 "BLOUNT" "ALABAMA" 1385
      1992 1009 "BLOUNT" "ALABAMA" 1524
      1993 1009 "BLOUNT" "ALABAMA" 1196
      1994 1009 "BLOUNT" "ALABAMA" 1218
      1995 1009 "BLOUNT" "ALABAMA" 1298
      1996 1009 "BLOUNT" "ALABAMA" 834
      1997 1009 "BLOUNT" "ALABAMA" 1373
      1998 1009 "BLOUNT" "ALABAMA" 1526
      1999 1009 "BLOUNT" "ALABAMA" 1267
      2000 1009 "BLOUNT" "ALABAMA" 1523
      end

      Comment


      • #4
        OK. It turns out that all of the counties in your example have some kind of gap.
        I should also mention that something is wrong with your -dataex- output: it doesn't mention state as a variable, so what's actually under Total_Crime is the state, and Total_Crime is lost when we run it. I think you did something to modify your -dataex- output. Please don't do that! Although this particular modification did not result in anything that prevented solving this problem. But in the future, it might. So always be sure to post the -dataex- output exactly as it was, and don't make any changes to it.

        Here is my solution, along with a working -dataex-:
        Code:
        clear
        input int year long fips str33 county str20 state float Total_Crime
        1987 1001 "AUTAUGA" "ALABAMA" 1599
        1988 1001 "AUTAUGA" "ALABAMA" 1488
        1989 1001 "AUTAUGA" "ALABAMA" 1784
        1990 1001 "AUTAUGA" "ALABAMA" 1442
        1991 1001 "AUTAUGA" "ALABAMA" 1804
        1992 1001 "AUTAUGA" "ALABAMA" 2291
        1993 1001 "AUTAUGA" "ALABAMA" 1954
        1994 1001 "AUTAUGA" "ALABAMA" 2361
        1995 1001 "AUTAUGA" "ALABAMA" 2511
        1996 1001 "AUTAUGA" "ALABAMA" 3053
        1997 1001 "AUTAUGA" "ALABAMA" 3078
        1998 1001 "AUTAUGA" "ALABAMA" 3093
        1999 1001 "AUTAUGA" "ALABAMA" 2827
        2000 1001 "AUTAUGA" "ALABAMA" 2876
        2001 1001 "AUTAUGA" "ALABAMA" 2868
        2002 1001 "AUTAUGA" "ALABAMA" 2786
        2003 1001 "AUTAUGA" "ALABAMA" 3190
        2004 1001 "AUTAUGA" "ALABAMA" 3342
        2005 1001 "AUTAUGA" "ALABAMA" 3295
        2006 1001 "AUTAUGA" "ALABAMA" 2938
        2007 1001 "AUTAUGA" "ALABAMA" 3212
        2008 1001 "AUTAUGA" "ALABAMA" 2799
        2009 1001 "AUTAUGA" "ALABAMA" 1674
        2010 1001 "AUTAUGA" "ALABAMA" 2939
        2015 1001 "AUTAUGA" "ALABAMA" 2759
        2016 1001 "AUTAUGA" "ALABAMA" 3363
        2017 1001 "AUTAUGA" "ALABAMA" 3640
        2018 1001 "AUTAUGA" "ALABAMA" 2351
        1987 1003 "BALDWIN" "ALABAMA" 3918
        1988 1003 "BALDWIN" "ALABAMA" 3820
        1989 1003 "BALDWIN" "ALABAMA" 4295
        1990 1003 "BALDWIN" "ALABAMA" 4628
        1991 1003 "BALDWIN" "ALABAMA" 5145
        1992 1003 "BALDWIN" "ALABAMA" 5289
        1993 1003 "BALDWIN" "ALABAMA" 5247
        1994 1003 "BALDWIN" "ALABAMA" 5208
        1995 1003 "BALDWIN" "ALABAMA" 7060
        1996 1003 "BALDWIN" "ALABAMA" 6921
        1997 1003 "BALDWIN" "ALABAMA" 6724
        1998 1003 "BALDWIN" "ALABAMA" 7004
        1999 1003 "BALDWIN" "ALABAMA" 8214
        2002 1003 "BALDWIN" "ALABAMA" 8425
        2003 1003 "BALDWIN" "ALABAMA" 8302
        2004 1003 "BALDWIN" "ALABAMA" 5646
        2009 1003 "BALDWIN" "ALABAMA" 8208
        2010 1003 "BALDWIN" "ALABAMA" 9165
        2015 1003 "BALDWIN" "ALABAMA" 14805
        2016 1003 "BALDWIN" "ALABAMA" 15737
        2017 1003 "BALDWIN" "ALABAMA" 16736
        2018 1003 "BALDWIN" "ALABAMA" 18413
        1987 1005 "BARBOUR" "ALABAMA" 947
        1988 1005 "BARBOUR" "ALABAMA" 837
        1989 1005 "BARBOUR" "ALABAMA" 1188
        1990 1005 "BARBOUR" "ALABAMA" 976
        1991 1005 "BARBOUR" "ALABAMA" 1023
        1992 1005 "BARBOUR" "ALABAMA" 781
        1993 1005 "BARBOUR" "ALABAMA" 663
        1994 1005 "BARBOUR" "ALABAMA" 785
        1995 1005 "BARBOUR" "ALABAMA" 773
        1996 1005 "BARBOUR" "ALABAMA" 896
        1997 1005 "BARBOUR" "ALABAMA" 746
        1998 1005 "BARBOUR" "ALABAMA" 970
        1999 1005 "BARBOUR" "ALABAMA" 1094
        2001 1005 "BARBOUR" "ALABAMA" 1128
        2002 1005 "BARBOUR" "ALABAMA" 1158
        2003 1005 "BARBOUR" "ALABAMA" 1260
        2004 1005 "BARBOUR" "ALABAMA" 1391
        2005 1005 "BARBOUR" "ALABAMA" 56
        2009 1005 "BARBOUR" "ALABAMA" 1226
        2010 1005 "BARBOUR" "ALABAMA" 810
        2015 1005 "BARBOUR" "ALABAMA" 936
        2016 1005 "BARBOUR" "ALABAMA" 864
        2017 1005 "BARBOUR" "ALABAMA" 977
        2018 1005 "BARBOUR" "ALABAMA" 1019
        1991 1007 "BIBB" "ALABAMA" 1
        2002 1007 "BIBB" "ALABAMA" 521
        2004 1007 "BIBB" "ALABAMA" 884
        2006 1007 "BIBB" "ALABAMA" 720
        2007 1007 "BIBB" "ALABAMA" 1116
        2008 1007 "BIBB" "ALABAMA" 1214
        2009 1007 "BIBB" "ALABAMA" 1188
        2010 1007 "BIBB" "ALABAMA" 817
        2015 1007 "BIBB" "ALABAMA" 839
        2016 1007 "BIBB" "ALABAMA" 790
        2017 1007 "BIBB" "ALABAMA" 991
        2018 1007 "BIBB" "ALABAMA" 1163
        1987 1009 "BLOUNT" "ALABAMA" 1031
        1988 1009 "BLOUNT" "ALABAMA" 1159
        1989 1009 "BLOUNT" "ALABAMA" 1135
        1990 1009 "BLOUNT" "ALABAMA" 1248
        1991 1009 "BLOUNT" "ALABAMA" 1385
        1992 1009 "BLOUNT" "ALABAMA" 1524
        1993 1009 "BLOUNT" "ALABAMA" 1196
        1994 1009 "BLOUNT" "ALABAMA" 1218
        1995 1009 "BLOUNT" "ALABAMA" 1298
        1996 1009 "BLOUNT" "ALABAMA" 834
        1997 1009 "BLOUNT" "ALABAMA" 1373
        1998 1009 "BLOUNT" "ALABAMA" 1526
        1999 1009 "BLOUNT" "ALABAMA" 1267
        2000 1009 "BLOUNT" "ALABAMA" 1523
        end
        
        by fips (year), sort: egen byte has_gap = max(!(year[1] == 1987 & year[_N] == 2018 ///
            & (year == year[_n-1]+1 | _n == 1)))
        Note: I'm assuming that the fips variable contains the county-level FIPS codes, so I have used fips to identify counties in the code. If that's not the case, replace -fips- by -county state- in the code.

        Comment


        • #5
          Hi Clyde,

          Here's my original sample. I had dropped counties which report zero total crimes and have less than 50% coverage to generate the sample which I had shared in my earlier, post. I see that the code you shared does calculate if the terminal values are 1987 and 2018, however some counties have years missing in between as well. Is there a way in which I can check for that?

          clear
          input long fips int year str20 state double Total_Crime
          1001 1987 "ALABAMA" 1599
          1001 1988 "ALABAMA" 1488
          1001 1989 "ALABAMA" 1784
          1001 1990 "ALABAMA" 1442
          1001 1991 "ALABAMA" 1804
          1001 1992 "ALABAMA" 2291
          1001 1993 "ALABAMA" 1954
          1001 1994 "ALABAMA" 2361
          1001 1995 "ALABAMA" 2511
          1001 1996 "ALABAMA" 3053
          1001 1997 "ALABAMA" 3078
          1001 1998 "ALABAMA" 3093
          1001 1999 "ALABAMA" 2827
          1001 2000 "ALABAMA" 2876
          1001 2001 "ALABAMA" 2868
          1001 2002 "ALABAMA" 2786
          1001 2003 "ALABAMA" 3190
          1001 2004 "ALABAMA" 3342
          1001 2005 "ALABAMA" 3295
          1001 2006 "ALABAMA" 2938
          1001 2007 "ALABAMA" 3212
          1001 2008 "ALABAMA" 2799
          1001 2009 "ALABAMA" 1674
          1001 2010 "ALABAMA" 2939
          1001 2011 "ALABAMA" 0
          1001 2012 "ALABAMA" 0
          1001 2013 "ALABAMA" 0
          1001 2014 "ALABAMA" 0
          1001 2015 "ALABAMA" 2759
          1001 2016 "ALABAMA" 3363
          1001 2017 "ALABAMA" 3640
          1001 2018 "ALABAMA" 2351
          1003 1987 "ALABAMA" 3918
          1003 1988 "ALABAMA" 3820
          1003 1989 "ALABAMA" 4295
          1003 1990 "ALABAMA" 4628
          1003 1991 "ALABAMA" 5145
          1003 1992 "ALABAMA" 5289
          1003 1993 "ALABAMA" 5247
          1003 1994 "ALABAMA" 5208
          1003 1995 "ALABAMA" 7060
          1003 1996 "ALABAMA" 6921
          1003 1997 "ALABAMA" 6724
          1003 1998 "ALABAMA" 7004
          1003 1999 "ALABAMA" 8214
          1003 2000 "ALABAMA" 6308
          1003 2001 "ALABAMA" 6259
          1003 2002 "ALABAMA" 8425
          1003 2003 "ALABAMA" 8302
          1003 2004 "ALABAMA" 5646
          1003 2005 "ALABAMA" 6998
          1003 2006 "ALABAMA" 4965
          1003 2007 "ALABAMA" 11636
          1003 2008 "ALABAMA" 7057
          1003 2009 "ALABAMA" 8208
          1003 2010 "ALABAMA" 9165
          1003 2011 "ALABAMA" 0
          1003 2012 "ALABAMA" 0
          1003 2013 "ALABAMA" 0
          1003 2014 "ALABAMA" 1935
          1003 2015 "ALABAMA" 14805
          1003 2016 "ALABAMA" 15737
          1003 2017 "ALABAMA" 16736
          1003 2018 "ALABAMA" 18413
          1005 1987 "ALABAMA" 947
          1005 1988 "ALABAMA" 837
          1005 1989 "ALABAMA" 1188
          1005 1990 "ALABAMA" 976
          1005 1991 "ALABAMA" 1023
          1005 1992 "ALABAMA" 781
          1005 1993 "ALABAMA" 663
          1005 1994 "ALABAMA" 785
          1005 1995 "ALABAMA" 773
          1005 1996 "ALABAMA" 896
          1005 1997 "ALABAMA" 746
          1005 1998 "ALABAMA" 970
          1005 1999 "ALABAMA" 1094
          1005 2000 "ALABAMA" 87
          1005 2001 "ALABAMA" 1128
          1005 2002 "ALABAMA" 1158
          1005 2003 "ALABAMA" 1260
          1005 2004 "ALABAMA" 1391
          1005 2005 "ALABAMA" 56
          1005 2006 "ALABAMA" 101
          1005 2007 "ALABAMA" 106
          1005 2008 "ALABAMA" 135
          1005 2009 "ALABAMA" 1226
          1005 2010 "ALABAMA" 810
          1005 2011 "ALABAMA" 0
          1005 2012 "ALABAMA" 0
          1005 2013 "ALABAMA" 0
          1005 2014 "ALABAMA" 76
          1005 2015 "ALABAMA" 936
          1005 2016 "ALABAMA" 864
          1005 2017 "ALABAMA" 977
          1005 2018 "ALABAMA" 1019
          1007 1987 "ALABAMA" 0
          1007 1988 "ALABAMA" 2
          1007 1989 "ALABAMA" 0
          1007 1990 "ALABAMA" 0
          end
          [/CODE]

          Comment


          • #6
            I see that the code you shared does calculate if the terminal values are 1987 and 2018, however some counties have years missing in between as well. Is there a way in which I can check for that?
            The code also checks for missing years in between. That's what the -(year == year[_n-1]+1 | _n == 1)- part does.

            Comment


            • #7
              Hi Clyde,
              The code worked out great! Thank you very much.

              Comment

              Working...
              X