Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Lags and Leads for Data that Cannot Be xtset or tsset

    Just checking here to see if someone might have an idea for how to easily generate lags and leads for hundreds of variables outside of the context of xtset and tsset. I'm working with data that are not really suitable for xtset or tsset because there are repeated time values even when I've panelized the data. In the example data, I've created a panel ID that includes the department code and the topic, but there are still many repeat values of the year for each panel. Adding year into the panelid doesn't solve the problem, because there are still repeats (i.e., a department could have issued many rules on a particular topic in a particular year). I understand that it would be possible to manually generate lags and leads for any given variable, but since I'm going to be doing so much of it (there are far more than just bills and eos in the data), I wonder if there's a way to get around the repeated time values problem in xtset or tsset.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year byte indep int bills byte eos long(topic deptcode2) float panelid
    2008 0   19 0  1  5   5
    2005 0    4 0  8 10 115
    2006 0   24 0 19 14 284
    1996 0   49 0 41 11 611
    2007 0   68 0 19 14 284
    2007 0  172 0 17 11 251
    2008 0  348 0 21 12 312
    1996 0   56 0 21  9 309
    2010 0   11 0 13  7 187
    2016 0    6 0 36  4 529
    1999 0    8 0 32 11 476
    2001 0   32 0 14  4 199
    2000 0    3 0 22  8 323
    2009 0   77 0 25  2 362
    2010 0   13 0 16 14 239
    2011 0   34 0 24 10 355
    2013 0   12 0 10  5 140
    2009 0  903 0 21 11 311
    2006 0   12 0 10 14 149
    2018 0    . . 22 14 329
    2005 0   49 1 31  9 459
    2007 0   83 0 14  3 198
    2010 0   26 0 18  6 261
    2018 0    . . 30 14 449
    2008 0   23 0 40 11 596
    2002 0   17 2 25 14 374
    2005 0   58 1  6 11  86
    2006 0    0 0 30  2 437
    1999 0   66 2  6 14  89
    1998 0   10 0 13 14 194
    2006 0   12 0 18 14 269
    2007 0   33 0 33 12 492
    2018 0    . . 31 14 464
    2007 0   68 0 19 14 284
    2013 0   71 1 37 14 554
    2008 0   37 0 19 14 284
    2000 0   14 0 10  9 144
    2009 0   10 0 13  9 189
    2005 0  134 0 17  5 245
    2009 0   77 0 25  6 366
    2012 0   31 0  7  8  98
    2002 0    1 0  8  9 114
    2015 0    4 0 23 14 344
    2002 0   57 2 20 11 296
    2007 0   79 1 11 14 164
    2008 0   37 0 19 14 284
    2012 0   12 0 27  2 392
    2018 0    . . 34  5 500
    2016 0    4 0 27  3 393
    2011 0  138 0 20 14 299
    2005 0   22 0 40 11 596
    2014 0    1 0 32 13 478
    2008 0   37 0 19 14 284
    2005 0   44 0 37  1 541
    2008 0   37 0 19 14 284
    2013 0   23 1  5  5  65
    2001 0   43 0  7  9  99
    2002 0   20 0 24 11 356
    2009 0   68 0 19 14 284
    2009 0   48 0  3  8  38
    1998 0   34 0 31  2 452
    2008 0   27 0 36 13 538
    2003 0   71 0 41 11 611
    2012 0 2090 0 21  8 308
    2009 0   81 0 38  8 563
    2013 0    6 0 28 11 416
    2008 0   32 0 27 11 401
    1998 0    9 0 28 11 416
    2014 0   70 0 20  3 288
    2004 0   22 0 31 12 462
    2018 0    . . 24 14 359
    2010 0   79 0 17  1 241
    2009 0   68 0 19 14 284
    2009 0   68 0 19 14 284
    2009 0   68 0 19 14 284
    2013 0   54 1  7  5  95
    2018 0    . . 25 14 374
    2009 0   54 0 33 11 491
    2017 0    . 0 13 14 194
    1997 0    4 0 23  8 338
    2007 0   79 1 11 12 162
    2004 0   15 0 24  5 350
    2010 0   15 0 19 14 284
    2008 0   13 0  5  9  69
    2010 0   15 0 19 14 284
    2006 0    3 0  8  8 113
    2011 0   20 0 15  4 214
    2002 0   10 0 32 11 476
    2007 0   25 0 25  3 363
    2018 0    . . 12  5 170
    2018 0    . . 34  5 500
    2006 0    6 0  9 13 133
    2003 0   77 0 27  9 399
    1996 0   33 0 20  6 291
    2013 0   33 1  2  1  16
    2010 0   15 0 19 14 284
    2010 0   15 0 19 14 284
    2010 0   26 0 41 14 614
    2002 0   39 0 23  6 336
    2003 0   71 0 41 12 612
    end
    label values topic topic
    label def topic 1 "Agriculture", modify
    label def topic 2 "Air Pollution", modify
    label def topic 3 "Aviation", modify
    label def topic 5 "Chemicals", modify
    label def topic 6 "Civil Service", modify
    label def topic 7 "Consumer Finance", modify
    label def topic 8 "Contracting", modify
    label def topic 9 "Disability", modify
    label def topic 10 "Education", modify
    label def topic 11 "Energy Efficiency", modify
    label def topic 12 "Exports", modify
    label def topic 13 "Fisheries", modify
    label def topic 14 "Food & Drug", modify
    label def topic 15 "Food Safety", modify
    label def topic 16 "Hazardous Waste", modify
    label def topic 17 "Health Insurance", modify
    label def topic 18 "Highway Safety", modify
    label def topic 19 "Hospitals", modify
    label def topic 20 "Immigration", modify
    label def topic 21 "Imports", modify
    label def topic 22 "Intergovernmental", modify
    label def topic 23 "Land Management", modify
    label def topic 24 "Long-Term Care", modify
    label def topic 25 "Maritime", modify
    label def topic 27 "Medicare", modify
    label def topic 28 "Planning", modify
    label def topic 30 "Public Housing", modify
    label def topic 31 "Records", modify
    label def topic 32 "Rural Development", modify
    label def topic 33 "Securities", modify
    label def topic 34 "Small Business 1", modify
    label def topic 36 "Species 1", modify
    label def topic 37 "Species 2", modify
    label def topic 38 "Species 3", modify
    label def topic 40 "Training", modify
    label def topic 41 "Water Resources", modify
    label values deptcode2 deptcode2
    label def deptcode2 1 "1000", modify
    label def deptcode2 2 "1100", modify
    label def deptcode2 3 "1200", modify
    label def deptcode2 4 "1500", modify
    label def deptcode2 5 "1600", modify
    label def deptcode2 6 "1800", modify
    label def deptcode2 7 "1900", modify
    label def deptcode2 8 "2000", modify
    label def deptcode2 9 "2100", modify
    label def deptcode2 10 "2500", modify
    label def deptcode2 11 "500", modify
    label def deptcode2 12 "600", modify
    label def deptcode2 13 "700", modify
    label def deptcode2 14 "900", modify
    label values panelid panelid
    label def panelid 5 "Agriculture 1600", modify
    label def panelid 16 "Air Pollution 1000", modify
    label def panelid 38 "Aviation 2000", modify
    label def panelid 65 "Chemicals 1600", modify
    label def panelid 69 "Chemicals 2100", modify
    label def panelid 86 "Civil Service 500", modify
    label def panelid 89 "Civil Service 900", modify
    label def panelid 95 "Consumer Finance 1600", modify
    label def panelid 98 "Consumer Finance 2000", modify
    label def panelid 99 "Consumer Finance 2100", modify
    label def panelid 113 "Contracting 2000", modify
    label def panelid 114 "Contracting 2100", modify
    label def panelid 115 "Contracting 2500", modify
    label def panelid 133 "Disability 700", modify
    label def panelid 140 "Education 1600", modify
    label def panelid 144 "Education 2100", modify
    label def panelid 149 "Education 900", modify
    label def panelid 162 "Energy Efficiency 600", modify
    label def panelid 164 "Energy Efficiency 900", modify
    label def panelid 170 "Exports 1600", modify
    label def panelid 187 "Fisheries 1900", modify
    label def panelid 189 "Fisheries 2100", modify
    label def panelid 194 "Fisheries 900", modify
    label def panelid 198 "Food & Drug 1200", modify
    label def panelid 199 "Food & Drug 1500", modify
    label def panelid 214 "Food Safety 1500", modify
    label def panelid 239 "Hazardous Waste 900", modify
    label def panelid 241 "Health Insurance 1000", modify
    label def panelid 245 "Health Insurance 1600", modify
    label def panelid 251 "Health Insurance 500", modify
    label def panelid 261 "Highway Safety 1800", modify
    label def panelid 269 "Highway Safety 900", modify
    label def panelid 284 "Hospitals 900", modify
    label def panelid 288 "Immigration 1200", modify
    label def panelid 291 "Immigration 1800", modify
    label def panelid 296 "Immigration 500", modify
    label def panelid 299 "Immigration 900", modify
    label def panelid 308 "Imports 2000", modify
    label def panelid 309 "Imports 2100", modify
    label def panelid 311 "Imports 500", modify
    label def panelid 312 "Imports 600", modify
    label def panelid 323 "Intergovernmental 2000", modify
    label def panelid 329 "Intergovernmental 900", modify
    label def panelid 336 "Land Management 1800", modify
    label def panelid 338 "Land Management 2000", modify
    label def panelid 344 "Land Management 900", modify
    label def panelid 350 "Long-Term Care 1600", modify
    label def panelid 355 "Long-Term Care 2500", modify
    label def panelid 356 "Long-Term Care 500", modify
    label def panelid 359 "Long-Term Care 900", modify
    label def panelid 362 "Maritime 1100", modify
    label def panelid 363 "Maritime 1200", modify
    label def panelid 366 "Maritime 1800", modify
    label def panelid 374 "Maritime 900", modify
    label def panelid 392 "Medicare 1100", modify
    label def panelid 393 "Medicare 1200", modify
    label def panelid 399 "Medicare 2100", modify
    label def panelid 401 "Medicare 500", modify
    label def panelid 416 "Planning 500", modify
    label def panelid 437 "Public Housing 1100", modify
    label def panelid 449 "Public Housing 900", modify
    label def panelid 452 "Records 1100", modify
    label def panelid 459 "Records 2100", modify
    label def panelid 462 "Records 600", modify
    label def panelid 464 "Records 900", modify
    label def panelid 476 "Rural Development 500", modify
    label def panelid 478 "Rural Development 700", modify
    label def panelid 491 "Securities 500", modify
    label def panelid 492 "Securities 600", modify
    label def panelid 500 "Small Business 1 1600", modify
    label def panelid 529 "Species 1 1500", modify
    label def panelid 538 "Species 1 700", modify
    label def panelid 541 "Species 2 1000", modify
    label def panelid 554 "Species 2 900", modify
    label def panelid 563 "Species 3 2000", modify
    label def panelid 596 "Training 500", modify
    label def panelid 611 "Water Resources 500", modify
    label def panelid 612 "Water Resources 600", modify
    label def panelid 614 "Water Resources 900", modify

  • #2
    I'm working with data that are not really suitable for xtset or tsset because there are repeated time values even when I've panelized the data.
    It's not just some quirk of Stata that it won't do lags and leads under this cirumstance. It's because lags and leads have no meaning in this setting.

    If, for example, a given panel has two observations in year 2007, then L1.x is undefined in year 2008 because there is no way to determine which 2007 observation's value of x is to be used.

    Now, in your example data, I notice that the values of variables indep, bills, eos, topic, and deptcode2 are all actually consistent within any combination of panelid and year. If this is true of your data as a whole, then you have a way out because you you would get the same result no matter which observation from a given year you would choose. This raises the question: why do you have all these identical observations? They convey no additional information.

    Code:
    duplicates drop
    xtset panelid year
    works fine in your example, and, coincidentally, it verifies that any two observations that agree on panelid and year also agree on everything else. And with the redundant observations removed, having -xtset- the data you can work with lags and leads as you wish. So why not just eliminate the duplicate observations and work from the simplified data set?

    Now, it may be that in your real data this consistency does not hold. In that case, lags and leads are not defined and there is no solution to your problem, neither in Stata, nor in any other software, nor can you do it "manually." You can manually create some variables that you "think" are lags or leads, but they will just be wrong.



    Comment


    • #3
      Thanks. The pattern you observed doesn't hold for the rest of the data. What I had hoped to do is to regress the x's from 2007 on y's values from 2008. Is that not a lag of a sort?

      I suppose the other option here would be to collapse my data into average values across relevant groups and years and then I will have unique observations for each panel and year. I just wanted to avoid that because I lose a lot of observations that way. My thinking was if I could find a way to generate new variables that take on the value from the previous year (or subsequent, or any other combination), then I could retain all of my observations and operate outside of the tsset universe.

      Comment


      • #4
        What I had hoped to do is to regress the x's from 2007 on y's values from 2008. Is that not a lag of a sort?
        Well, there is no way to do that without making some arbitrary choice of which x's to associate with which y. So your results would be completely dependent on some arbitrary choice you made. Alternatively, you might match every 2007 observations x's with every 2008's y's, which would greatly expand your data set, but introduce even more dependence among observations than you had just from it being panel data. And it wouldn't be a dependence that you could deal with by fixed or random effects modeling.

        Averaging all the values for each panel-year combination is one way to go (and -collapse- will take you there). Other options are the median, or the max, or the min, or the sum, or lots of other possibilities. Which of these would be the best approach really depends on which aggregated statistic is most meaningful in the context of your research goals.

        Comment


        • #5
          Thanks! I guess collapse it is.

          Comment


          • #6
            So I thought more about this. If my theory is that the values of x in 2007 (say they are all the same within my panel and each year) influence the values of y in 2008, then wouldn't it be acceptable to create a new variable, x2, that takes on the value of x from 2007 but is "matched" to the y's from 2008? To make it a little more concrete, I have data on executive orders (organized by topic) which I have matched to data on agency rules addressing these same topics. The theory I want to test is whether agencies emphasize the topic more a year or two after presidents issue executive orders touching on the subject. Within each topic-year pair, the x's (the executive orders) are all the same, so there will be no arbitrary matching to the y values. All I want to do is shift the executive order data back by one, two, or three years so that when I am correlating executive orders with agency rules, there's a delay effect. Is that statistically problematic?

            Assuming its not, is there an easy way to accomplish this shift? Here's a sample of my "covariates" dataset, which I will merge with my main data once I accomplish the shift:

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input int year str18 topic int bills byte(eos hearings publaw sotu)
            1995 "Small Business 1"    32 0 63  1  3
            1995 "Rural Development"    7 0  0  4  0
            1995 "Training"            22 0  7  0  9
            1995 "Land Management"      5 0  8  0  0
            1995 "Energy Efficiency"    4 0  1  0  0
            1995 "Contracting"          5 0  1  0  0
            1995 "Medical Payments"    25 0  3  0  1
            1995 "Species 1"           47 0 31  1  0
            1995 "Food Safety"         15 0  4  1  0
            1995 "Health Insurance"    68 0 23  2  7
            1995 "Disability"          19 0  1  0  0
            1995 "Immigration"         76 0 20  1  8
            1995 "Public Housing"       9 0  3  0  0
            1995 "Species 2"           47 0 31  1  0
            1995 "Species 3"           47 0 31  1  0
            1995 "Banking"             52 0 16  0  0
            1995 "Intergovernmental"   23 0 10  1  0
            1995 "Prescription Drugs"   6 0  0  0  0
            1995 "Planning"             5 0  8  0  0
            1995 "Fisheries"           20 1  9  1  0
            1995 "Maritime"           138 0  6  0  0
            1995 "Chemicals"           51 1 10  0  1
            1995 "Air Pollution"       41 0 15  1  0
            1995 "Long-Term Care"      11 0  3  0  0
            1995 "Civil Service"       76 1 19  1  0
            1995 "Education"            6 0  8  0  5
            1995 "Medicare"            52 0  3  0  0
            1995 "Imports"             37 0  1  0  0
            1995 "Aviation"            43 0 26  0  0
            1995 "Agriculture"         58 0  7  0  0
            1995 "Highway Safety"      43 0  6  1  0
            1995 "Species 4"           47 0 31  1  0
            1995 "Hospitals"           25 0  3  0  1
            1995 "Consumer Finance"    17 0  1  0  0
            1995 "Food & Drug"         29 0  9  0  0
            1995 "Records"             21 1  4  0  0
            1995 "Securities"          45 0  7  3  0
            1995 "Hazardous Waste"     51 1 10  0  1
            1995 "Small Business 2"    32 0 63  1  3
            1995 "Exports"             12 1  6  0  0
            1995 "Water Resources"     53 0 14  2  0
            1996 "Agriculture"         29 0  2  1  0
            1996 "Disability"           9 0  3  1  0
            1996 "Chemicals"           34 0  2  1  2
            1996 "Training"             9 0  1  0  6
            1996 "Hospitals"           16 0  3  2  0
            1996 "Hazardous Waste"     34 0  2  1  2
            1996 "Exports"              3 3  3  2  0
            1996 "Food & Drug"         15 0  5  1  0
            1996 "Records"             27 0  6  2  0
            1996 "Contracting"          6 0  0  0  0
            1996 "Water Resources"     49 0 15 10  0
            1996 "Fisheries"            1 0  1  1  0
            1996 "Securities"          14 0  2  1  0
            1996 "Imports"             56 0  2  1  0
            1996 "Air Pollution"       10 0  4  1  1
            1996 "Medicare"            34 0  2  0  0
            1996 "Highway Safety"      27 0  9  1  0
            1996 "Intergovernmental"   11 0  5  1  0
            1996 "Long-Term Care"       9 0  2  0  0
            1996 "Small Business 2"    24 0 50  1  3
            1996 "Food Safety"          6 0  9  2  0
            1996 "Education"            4 0  1  0  1
            1996 "Immigration"         33 0 11  1 10
            1996 "Energy Efficiency"    1 0  1  0  0
            1996 "Land Management"      3 0  1  0  0
            1996 "Banking"             22 0 13  0  0
            1996 "Prescription Drugs"   9 0  1  0  0
            1996 "Small Business 1"    24 0 50  1  3
            1996 "Species 1"           21 1 15  2  0
            1996 "Rural Development"    1 0  0  0  0
            1996 "Species 3"           21 1 15  2  0
            1996 "Medical Payments"    16 0  3  2  0
            1996 "Species 2"           21 1 15  2  0
            1996 "Consumer Finance"     7 0  2  0  0
            1996 "Civil Service"       38 2 15  2  9
            1996 "Maritime"            51 0  6  3  0
            1996 "Species 4"           21 1 15  2  0
            1996 "Public Housing"       2 0  2  0  2
            1996 "Planning"             3 0  1  0  0
            1996 "Aviation"            37 1 26  0  1
            1996 "Health Insurance"    38 0 17  6  6
            1997 "Aviation"            53 1 28  0  4
            1997 "Public Housing"       7 0  1  0  4
            1997 "Contracting"          6 0  1  0  0
            1997 "Agriculture"         44 0  6  0  0
            1997 "Rural Development"    4 0  1  3  0
            1997 "Securities"          21 0  6  2  0
            1997 "Banking"             39 0 20  1  0
            1997 "Small Business 1"    48 0 34  1  0
            1997 "Fisheries"            9 0  1  0  0
            1997 "Education"           14 0  8  1  7
            1997 "Maritime"            56 0 10  0  0
            1997 "Disability"          12 0  2  0  0
            1997 "Hazardous Waste"     48 0 13  0  0
            1997 "Air Pollution"       16 0 20  0  0
            1997 "Food Safety"         22 0  2  1  1
            1997 "Medical Payments"    21 0  5  1  0
            1997 "Species 4"           44 0 20  2  0
            1997 "Energy Efficiency"    4 0  1  0  0
            end

            Is there a simple set of commands that could basically take every executive order (eos) observation for each topic and shift it back one, two, or three years so I get the 1997 value of eos for the topic energy efficiency for 1996 and 1995, for instance?
            Last edited by Daniel Walters; 29 Aug 2018, 20:29.

            Comment


            • #7
              Or, I guess maybe if this isn't a problem statistically speaking, an easy way to accomplish it would be to create a lagyear variable that is year-1, then merge by lagyear. That would essentially be the same thing, I believe. The only question is whether there's some dependency that I'm creating that is problematic.

              Comment


              • #8
                Well, in the data set you show in #6, there is actually only one observation for each topic in any given year, so if your real data looks like this, you can just -xtset topic year- and use the regular lag operators.

                But let's go back to the data in #1. That data set matches what you are describing now in #6: within panelid topic and yer, the values of all the other variables (indep bills eos deptcode2) are consistent, so it is indeed possible to do what you want, namely to assign to each observation the unique value of indep, bills, eos, and deptcode2 from the same panelid and topic the year before. The code below will get you the first, second and third "lags":

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input int year byte indep int bills byte eos long(topic deptcode2) float panelid
                2008 0   19 0  1  5   5
                2005 0    4 0  8 10 115
                2006 0   24 0 19 14 284
                1996 0   49 0 41 11 611
                2007 0   68 0 19 14 284
                2007 0  172 0 17 11 251
                2008 0  348 0 21 12 312
                1996 0   56 0 21  9 309
                2010 0   11 0 13  7 187
                2016 0    6 0 36  4 529
                1999 0    8 0 32 11 476
                2001 0   32 0 14  4 199
                2000 0    3 0 22  8 323
                2009 0   77 0 25  2 362
                2010 0   13 0 16 14 239
                2011 0   34 0 24 10 355
                2013 0   12 0 10  5 140
                2009 0  903 0 21 11 311
                2006 0   12 0 10 14 149
                2018 0    . . 22 14 329
                2005 0   49 1 31  9 459
                2007 0   83 0 14  3 198
                2010 0   26 0 18  6 261
                2018 0    . . 30 14 449
                2008 0   23 0 40 11 596
                2002 0   17 2 25 14 374
                2005 0   58 1  6 11  86
                2006 0    0 0 30  2 437
                1999 0   66 2  6 14  89
                1998 0   10 0 13 14 194
                2006 0   12 0 18 14 269
                2007 0   33 0 33 12 492
                2018 0    . . 31 14 464
                2007 0   68 0 19 14 284
                2013 0   71 1 37 14 554
                2008 0   37 0 19 14 284
                2000 0   14 0 10  9 144
                2009 0   10 0 13  9 189
                2005 0  134 0 17  5 245
                2009 0   77 0 25  6 366
                2012 0   31 0  7  8  98
                2002 0    1 0  8  9 114
                2015 0    4 0 23 14 344
                2002 0   57 2 20 11 296
                2007 0   79 1 11 14 164
                2008 0   37 0 19 14 284
                2012 0   12 0 27  2 392
                2018 0    . . 34  5 500
                2016 0    4 0 27  3 393
                2011 0  138 0 20 14 299
                2005 0   22 0 40 11 596
                2014 0    1 0 32 13 478
                2008 0   37 0 19 14 284
                2005 0   44 0 37  1 541
                2008 0   37 0 19 14 284
                2013 0   23 1  5  5  65
                2001 0   43 0  7  9  99
                2002 0   20 0 24 11 356
                2009 0   68 0 19 14 284
                2009 0   48 0  3  8  38
                1998 0   34 0 31  2 452
                2008 0   27 0 36 13 538
                2003 0   71 0 41 11 611
                2012 0 2090 0 21  8 308
                2009 0   81 0 38  8 563
                2013 0    6 0 28 11 416
                2008 0   32 0 27 11 401
                1998 0    9 0 28 11 416
                2014 0   70 0 20  3 288
                2004 0   22 0 31 12 462
                2018 0    . . 24 14 359
                2010 0   79 0 17  1 241
                2009 0   68 0 19 14 284
                2009 0   68 0 19 14 284
                2009 0   68 0 19 14 284
                2013 0   54 1  7  5  95
                2018 0    . . 25 14 374
                2009 0   54 0 33 11 491
                2017 0    . 0 13 14 194
                1997 0    4 0 23  8 338
                2007 0   79 1 11 12 162
                2004 0   15 0 24  5 350
                2010 0   15 0 19 14 284
                2008 0   13 0  5  9  69
                2010 0   15 0 19 14 284
                2006 0    3 0  8  8 113
                2011 0   20 0 15  4 214
                2002 0   10 0 32 11 476
                2007 0   25 0 25  3 363
                2018 0    . . 12  5 170
                2018 0    . . 34  5 500
                2006 0    6 0  9 13 133
                2003 0   77 0 27  9 399
                1996 0   33 0 20  6 291
                2013 0   33 1  2  1  16
                2010 0   15 0 19 14 284
                2010 0   15 0 19 14 284
                2010 0   26 0 41 14 614
                2002 0   39 0 23  6 336
                2003 0   71 0 41 12 612
                end
                label values topic topic
                label def topic 1 "Agriculture", modify
                label def topic 2 "Air Pollution", modify
                label def topic 3 "Aviation", modify
                label def topic 5 "Chemicals", modify
                label def topic 6 "Civil Service", modify
                label def topic 7 "Consumer Finance", modify
                label def topic 8 "Contracting", modify
                label def topic 9 "Disability", modify
                label def topic 10 "Education", modify
                label def topic 11 "Energy Efficiency", modify
                label def topic 12 "Exports", modify
                label def topic 13 "Fisheries", modify
                label def topic 14 "Food & Drug", modify
                label def topic 15 "Food Safety", modify
                label def topic 16 "Hazardous Waste", modify
                label def topic 17 "Health Insurance", modify
                label def topic 18 "Highway Safety", modify
                label def topic 19 "Hospitals", modify
                label def topic 20 "Immigration", modify
                label def topic 21 "Imports", modify
                label def topic 22 "Intergovernmental", modify
                label def topic 23 "Land Management", modify
                label def topic 24 "Long-Term Care", modify
                label def topic 25 "Maritime", modify
                label def topic 27 "Medicare", modify
                label def topic 28 "Planning", modify
                label def topic 30 "Public Housing", modify
                label def topic 31 "Records", modify
                label def topic 32 "Rural Development", modify
                label def topic 33 "Securities", modify
                label def topic 34 "Small Business 1", modify
                label def topic 36 "Species 1", modify
                label def topic 37 "Species 2", modify
                label def topic 38 "Species 3", modify
                label def topic 40 "Training", modify
                label def topic 41 "Water Resources", modify
                label values deptcode2 deptcode2
                label def deptcode2 1 "1000", modify
                label def deptcode2 2 "1100", modify
                label def deptcode2 3 "1200", modify
                label def deptcode2 4 "1500", modify
                label def deptcode2 5 "1600", modify
                label def deptcode2 6 "1800", modify
                label def deptcode2 7 "1900", modify
                label def deptcode2 8 "2000", modify
                label def deptcode2 9 "2100", modify
                label def deptcode2 10 "2500", modify
                label def deptcode2 11 "500", modify
                label def deptcode2 12 "600", modify
                label def deptcode2 13 "700", modify
                label def deptcode2 14 "900", modify
                label values panelid panelid
                label def panelid 5 "Agriculture 1600", modify
                label def panelid 16 "Air Pollution 1000", modify
                label def panelid 38 "Aviation 2000", modify
                label def panelid 65 "Chemicals 1600", modify
                label def panelid 69 "Chemicals 2100", modify
                label def panelid 86 "Civil Service 500", modify
                label def panelid 89 "Civil Service 900", modify
                label def panelid 95 "Consumer Finance 1600", modify
                label def panelid 98 "Consumer Finance 2000", modify
                label def panelid 99 "Consumer Finance 2100", modify
                label def panelid 113 "Contracting 2000", modify
                label def panelid 114 "Contracting 2100", modify
                label def panelid 115 "Contracting 2500", modify
                label def panelid 133 "Disability 700", modify
                label def panelid 140 "Education 1600", modify
                label def panelid 144 "Education 2100", modify
                label def panelid 149 "Education 900", modify
                label def panelid 162 "Energy Efficiency 600", modify
                label def panelid 164 "Energy Efficiency 900", modify
                label def panelid 170 "Exports 1600", modify
                label def panelid 187 "Fisheries 1900", modify
                label def panelid 189 "Fisheries 2100", modify
                label def panelid 194 "Fisheries 900", modify
                label def panelid 198 "Food & Drug 1200", modify
                label def panelid 199 "Food & Drug 1500", modify
                label def panelid 214 "Food Safety 1500", modify
                label def panelid 239 "Hazardous Waste 900", modify
                label def panelid 241 "Health Insurance 1000", modify
                label def panelid 245 "Health Insurance 1600", modify
                label def panelid 251 "Health Insurance 500", modify
                label def panelid 261 "Highway Safety 1800", modify
                label def panelid 269 "Highway Safety 900", modify
                label def panelid 284 "Hospitals 900", modify
                label def panelid 288 "Immigration 1200", modify
                label def panelid 291 "Immigration 1800", modify
                label def panelid 296 "Immigration 500", modify
                label def panelid 299 "Immigration 900", modify
                label def panelid 308 "Imports 2000", modify
                label def panelid 309 "Imports 2100", modify
                label def panelid 311 "Imports 500", modify
                label def panelid 312 "Imports 600", modify
                label def panelid 323 "Intergovernmental 2000", modify
                label def panelid 329 "Intergovernmental 900", modify
                label def panelid 336 "Land Management 1800", modify
                label def panelid 338 "Land Management 2000", modify
                label def panelid 344 "Land Management 900", modify
                label def panelid 350 "Long-Term Care 1600", modify
                label def panelid 355 "Long-Term Care 2500", modify
                label def panelid 356 "Long-Term Care 500", modify
                label def panelid 359 "Long-Term Care 900", modify
                label def panelid 362 "Maritime 1100", modify
                label def panelid 363 "Maritime 1200", modify
                label def panelid 366 "Maritime 1800", modify
                label def panelid 374 "Maritime 900", modify
                label def panelid 392 "Medicare 1100", modify
                label def panelid 393 "Medicare 1200", modify
                label def panelid 399 "Medicare 2100", modify
                label def panelid 401 "Medicare 500", modify
                label def panelid 416 "Planning 500", modify
                label def panelid 437 "Public Housing 1100", modify
                label def panelid 449 "Public Housing 900", modify
                label def panelid 452 "Records 1100", modify
                label def panelid 459 "Records 2100", modify
                label def panelid 462 "Records 600", modify
                label def panelid 464 "Records 900", modify
                label def panelid 476 "Rural Development 500", modify
                label def panelid 478 "Rural Development 700", modify
                label def panelid 491 "Securities 500", modify
                label def panelid 492 "Securities 600", modify
                label def panelid 500 "Small Business 1 1600", modify
                label def panelid 529 "Species 1 1500", modify
                label def panelid 538 "Species 1 700", modify
                label def panelid 541 "Species 2 1000", modify
                label def panelid 554 "Species 2 900", modify
                label def panelid 563 "Species 3 2000", modify
                label def panelid 596 "Training 500", modify
                label def panelid 611 "Water Resources 500", modify
                label def panelid 612 "Water Resources 600", modify
                label def panelid 614 "Water Resources 900", modify
                
                //    VERIFY CONSISTENCY OF OTHER VARIABLES
                //    WITHIN PANELID YEAR TOPIC
                ds panelid topic year, not
                local vbles `r(varlist)'
                foreach v of local vbles {
                    by panelid topic year (`v'), sort: assert `v'[1] == `v'[_N]
                }
                
                //    CALCULATE THE "LAGS"
                forvalues l = 1/3 {
                    rangestat (first) `vbles', by(panelid topic) interval(year -`l' -`l')
                    rename *_first lag`l'_*
                }
                To use this code you need the -rangestat- command, written by Robert Picard, Nick Cox, and Roberto Ferrer, available from SSC.

                That said, if this data set contains only variables that are consistent within panelid topic and year, I don't understand what the point of the multiple observations is. -duplicates drop- will reduce you to one observation per panelid topic year combination, and then you can

                Code:
                duplicates drop
                egen panel_topic = group(panelid topic)
                xtset panel_topic year
                and work with the usual lag operators from there. Why do you want to carry the duplicate observations along? In fact, they will just get in the way of any analysis you do because you will be inappropriately counting panelid-year combinations in proportion to the number of times they are duplicated instead of weighting all equally. So you will then need the additional complication of either getting rid of the duplicates after the fact, or creating a new variable to select just one observation for each panelid-topic-year combination to use in analysis.

                Perhaps there are other variables in the data set that are not consistent within panelid topic and year, and for which you do not want to (and, in any case, cannot) calculate "lags?" Note: If this is the case, do not use the underlined lines of code shown above. Instead, replace them with this line:

                Code:
                local vbles eos bills // AND ANY OTHER VARIABLES YOU WANT TO "LAG"
                Added: Crossed with #7. Yes, that is another way to do it, but it is more complicated to code than the approach shown here and, in addition, if the real data set is large the approach in #7 will be much slower.

                Comment


                • #9
                  Thanks! This code did exactly what I wanted. It is the case that there are other variables that are not consistent within panel and year, which means that dropping the duplicates doesn't actually eliminate many observations in my data (~1,300 of 160,000). This is just a relic of the fact that my observations are of individual rules--some covariates are rule-level, whereas others are year-level. As long as I run your last bit of code from #8, I can confine the operation to just those variables I want to lag, which are just the year-level variables. Problem solved!

                  Comment


                  • #10
                    Circling back to this, how would you change the code if you wanted to create one, two, and three-year "leads" instead of lags? I'm not sure I entirely understand the code syntax (even though it worked).

                    Comment


                    • #11
                      It's basically the same code, with the minus signs removed from the -interval()- option of the -rangestat- command. So:

                      Code:
                      forvalues l = 1/3 {
                          rangestat (first) `vbles', by(panelid topic) interval(year `l' `l')
                          rename *_first lead`l'_*
                      }
                      Changes shown in bold face.

                      Comment


                      • #12
                        Okay, thanks so much! I'll have to read more about the rangestat command.

                        Comment


                        • #13
                          -rangestat- takes a little getting used to. It's not, at first, always clear what is supposed to be within the range of what. But after you use it several times, it becomes intuitive.

                          Comment


                          • #14
                            I just have one more question on this thread. It may be too tangential, in which case I can post it as a separate thread.

                            My project involves testing the effect of executive orders (eos) on rule production (prevalence), and now thanks to this code I have lags and leads on the eos variable that allow me to account for delay effects. But if I want to be precise, then I probably want to ignore eos that came from a previous president, since those aren't measuring responsiveness to the current president. They're sort of "dead hand" executive orders.

                            So I want to have an indicator for when the lag or lead eos observation comes from a previous or subsequent president. If I have another variable that indicates the president (i.e., Clinton=1, Bush=2, Obama=3, etc.), is there any easy way to amend or add to this code to ensure that when I lag or lead a variable, an indicator is created that shows "0" if the lagged value actually comes from a different president than the president that is sitting at the time of the observation for prevalence and "1" if it's the observation is still within the same presidency?

                            Comment


                            • #15
                              Code:
                              forvalues l = 1/3 {
                                  rangestat (first) `vbles' president, by(panelid topic) interval(year `l' `l')
                                  rename *_first lead`l'_*
                                  gen byte same_president`l' = (president == lead`l'_president)
                              }
                              should do the trick if I understand the question correctly. Works the same way for the lags, too.

                              Comment

                              Working...
                              X