  • Create Variable "lagged"

    Hello everyone!

    I have a variable "red_corrupt" (for the years 2009, 2011, 2013, 2015, and 2017).

    I want to create another one, "lagged", because the question is "in the past 2 years, have you seen any reduction of corruption..."

    So I want to create a variable that contains:
    at 2009, the 2011 value of "red_corrupt"
    at 2011, the 2013 value of "red_corrupt"
    at 2013, the 2015 value of "red_corrupt"
    at 2015, the 2017 value of "red_corrupt"


    Any ideas?

    Thanks!
    Last edited by Jose Ignacio Torrealba; 12 Dec 2022, 13:54.

  • #2
    Well, it makes a difference whether you have a simple time series or you have panel data. If you have panel data, you have to -xtset- your data; for a simple single time series you need to -tsset- it. I'll assume you have only a simple single time series.

    Code:
    tsset year, delta(2)
    gen lagged_red_corrupt = L1.red_corrupt
    That said, there may not be any need to create this variable. If you need it only to include it as an independent variable in a regression, you don't need the variable: you can just write L1.red_corrupt in the regression command itself and Stata will calculate the lagged version "on the fly." (You do have to first -tsset-, or -xtset- as the case may be, your data, however.)
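A minimal sketch of the single-time-series case, on made-up data: the values of red_corrupt and the outcome variable y are hypothetical. Note that the list in the question attaches the 2011 answer to the 2009 row. In Stata's time-series notation that is a lead, F1., rather than the lag L1. used in the code above, so both are created here for comparison. The last line shows the operator used directly inside an estimation command, with no new variable.

```stata
* Made-up data: one observation per survey year (single time series)
* y is a HYPOTHETICAL outcome, included only to illustrate on-the-fly operators
clear
input int year byte red_corrupt float y
2009 1 2.1
2011 0 3.4
2013 1 2.7
2015 0 3.9
2017 1 2.5
end

* delta(2): consecutive surveys are 2 years apart, so L1./F1. step one survey
tsset year, delta(2)

* Lag: the 2011 row receives the 2009 value; 2009 has no earlier survey -> missing
generate lag_red_corrupt = L1.red_corrupt

* Lead: the 2009 row receives the 2011 value (the mapping listed in the question);
* 2017 has no later survey -> missing
generate lead_red_corrupt = F1.red_corrupt

list year red_corrupt lag_red_corrupt lead_red_corrupt

* The operator can be written directly in the command; 2017 drops out (missing lead)
regress y F1.red_corrupt
```

If the direction in your analysis is really "previous survey", replace F1. with L1.; nothing else changes.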

    Added: In composing this answer I have had to make guesses about your data: is it panel or time series? What is the name of the time variable? Are the years 2009, 2011, 2013, 2015, and 2017 the only values of that variable? If I have guessed wrong, you will have to waste some time modifying the code I'm showing you. And it will have been a partial waste of my time as well. In the future, when asking for help with code, please show example data and meta-data using the -dataex- command. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
    Last edited by Clyde Schechter; 12 Dec 2022, 13:40.



    • #3
      Thank you, Clyde. When I tried it, I got:

      . tsset year, delta(2)
      repeated time values in sample

      And yes, the unit variable is "comuna" (similar to a county), so there are 52 different comunas per year.

      Is there some other way?

      Added:


      . dataex year comuna

      ----------------------- copy starting from the next line -----------------------
      Code:
      input float year double comuna
      2009 13101
      2009 13101
      2009 13101
      2009 13101
      2009 13101
      2009 13101
      .
      .
      .
      
      end
      label values comuna comuna
      label def comuna 13101 "Santiago", modify
      ------------------ copy up to and including the previous line ------------------

      Listed 100 out of 188309 observations
      Use the count() option to list more

      . tabul comuna, nol

           Comuna |      Freq.     Percent        Cum.
      ------------+-----------------------------------
            13101 |      5,713        3.03        3.03
            13102 |      2,209        1.17        4.21
            13103 |      3,886        2.06        6.27
            13104 |      7,639        4.06       10.33
            13105 |      6,828        3.63       13.95
            13106 |      3,319        1.76       15.72
            13107 |      2,548        1.35       17.07
            13108 |      2,173        1.15       18.22
            13109 |      2,268        1.20       19.43
            13110 |      7,459        3.96       23.39
            13111 |      3,495        1.86       25.24
            13112 |      4,454        2.37       27.61
            13113 |      2,370        1.26       28.87
            13114 |      7,929        4.21       33.08
            13115 |      2,759        1.47       34.54
            13116 |      2,844        1.51       36.05
            13117 |      2,853        1.52       37.57
            13118 |      2,708        1.44       39.01
            13119 |      8,130        4.32       43.32
            13120 |      6,300        3.35       46.67
            13121 |      3,506        1.86       48.53
            13122 |      4,946        2.63       51.16
            13123 |      5,996        3.18       54.34
            13124 |      5,234        2.78       57.12
            13125 |      4,702        2.50       59.62
            13126 |      3,146        1.67       61.29
            13127 |      4,064        2.16       63.45
            13128 |      3,661        1.94       65.39
            13129 |      2,519        1.34       66.73
            13130 |      3,404        1.81       68.54
            13131 |      2,677        1.42       69.96
            13132 |      2,423        1.29       71.25
            13201 |     11,217        5.96       77.20
            13202 |      1,649        0.88       78.08
            13203 |      1,666        0.88       78.96
            13301 |      3,250        1.73       80.69
            13302 |      2,925        1.55       82.24
            13303 |      1,923        1.02       83.26
            13401 |      5,580        2.96       86.23
            13402 |      2,324        1.23       87.46
            13403 |      1,917        1.02       88.48
            13404 |      2,377        1.26       89.74
            13501 |      2,802        1.49       91.23
            13502 |      1,539        0.82       92.05
            13503 |      1,826        0.97       93.02
            13504 |      1,482        0.79       93.80
            13505 |      1,199        0.64       94.44
            13601 |      2,464        1.31       95.75
            13602 |      1,816        0.96       96.71
            13603 |      1,885        1.00       97.71
            13604 |      1,934        1.03       98.74
            13605 |      2,371        1.26      100.00
      ------------+-----------------------------------
            Total |    188,308      100.00

      . tabul year

             year |      Freq.     Percent        Cum.
      ------------+-----------------------------------
             2009 |     49,706       26.40       26.40
             2011 |     39,513       20.98       47.38
             2015 |     57,548       30.56       77.94
             2017 |     41,541       22.06      100.00
      ------------+-----------------------------------
            Total |    188,308      100.00

      .
      Last edited by Jose Ignacio Torrealba; 12 Dec 2022, 14:07.



      • #4
        .
        Last edited by Jose Ignacio Torrealba; 12 Dec 2022, 14:02.



        • #5
          It looks like this data, at least as far as you have shown it, cannot be analyzed in this way. You have multiple observations with the same comuna in 2009. If they have different values of red_corrupt, there is no way to know which of them would be the lagged value for any observation of the same comuna in 2011.

          Now, it may be that comuna, in combination with some other variable(s) in your data set, identifies entities that are distinct and carry forward from one year to the next. If that is the case, you first need to create a new variable that reflects that combination, using the -egen, group()- function, and then use -xtset-, not -tsset-, with that new variable and year. But so far you have not said anything that suggests there are any such variables. You just seem to have multiple observations of the same comunas in each year, across multiple years. No lag operator can be defined in such a situation.
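A minimal sketch of the -egen, group()- plus -xtset- route described above, on made-up data. It assumes a HYPOTHETICAL variable, household, that together with comuna identifies the same unit in every wave; nothing in this thread establishes that such a variable exists, and without one the sketch does not apply. The years mirror the tabulation in #3, where 2013 does not appear: with delta(2), F1. for a 2011 observation looks for 2013 and returns missing rather than borrowing the 2015 value.

```stata
* Made-up panel: comuna plus a HYPOTHETICAL household identifier,
* observed in the years that appear in the tabulation (no 2013 wave)
clear
input int year long comuna int household byte red_corrupt
2009 13101 1 1
2011 13101 1 0
2015 13101 1 1
2017 13101 1 0
2009 13101 2 0
2011 13101 2 1
2015 13101 2 1
2017 13101 2 1
2009 13102 1 1
2011 13102 1 1
2015 13102 1 0
2017 13102 1 0
end

* comuna alone repeats within a year; this is what produces
* "repeated time values in sample" when comuna is used as the panel variable
duplicates report comuna year

* One panel identifier per comuna-household combination
egen panel_id = group(comuna household)

* Declare the panel; delta(2) makes F1. mean "the next survey, 2 years later"
xtset panel_id year, delta(2)

* Lead within each panel: 2009 gets the 2011 value, 2015 gets the 2017 value;
* 2011 (no 2013 wave) and 2017 (no later wave) are missing
generate lead_red_corrupt = F1.red_corrupt

list panel_id year red_corrupt lead_red_corrupt, sepby(panel_id)
```

If household (or whatever the real variable is) does not make panel_id and year unique, -xtset- will stop with the same "repeated time values" error, which is a quick check of whether such a combination exists.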



          • #6
            Perfect! Thank you for all your time, Clyde!
            Good luck,
            José

