
  • Using Lagged and Forward Commands in Panel Data. L. automatically fills in values, but F. does not?

    Hi,

    I am using the lag and forward operators with panel data. My data set has some missing values, and I want to create new variables that impute values both forwards and backwards. In the example below, I impute values very simply: I either add 1 to the previous value or subtract 1 from the succeeding value.

    As you will see, adding 1 to the previous value will automatically fill in, but subtracting 1 from the succeeding value does not.

    Code:
    .
    clear
    
    . input str8 country year value
    
           country       year      value
      1. "Cambodia" 2007 5
      2. "Cambodia"      2008 .
      3. "Cambodia"      2009 .
      4. "Cambodia" 2010 .
      5. "Cambodia" 2011 10
      6. "China" 2007 100
      7. "China" 2008 120
      8. "China" 2009 .
      9. "China" 2010 .
     10. "China" 2011 100
     11. end
    
    . 
    . list, sepby(country)
    
         +-------------------------+
         |  country   year   value |
         |-------------------------|
      1. | Cambodia   2007       5 |
      2. | Cambodia   2008       . |
      3. | Cambodia   2009       . |
      4. | Cambodia   2010       . |
      5. | Cambodia   2011      10 |
         |-------------------------|
      6. |    China   2007     100 |
      7. |    China   2008     120 |
      8. |    China   2009       . |
      9. |    China   2010       . |
     10. |    China   2011     100 |
         +-------------------------+
    
    . 
    . encode country, generate(country_id)
    
    . tsset country_id year
           panel variable:  country_id (strongly balanced)
            time variable:  year, 2007 to 2011
                    delta:  1 unit
    
    . 
    . /*Forward Impute Values*/
    . gen forward_value = value
    (5 missing values generated)
    
    . replace forward_value = L.forward_value + 1 if mi(value)
    (5 real changes made)
    
    . 
    . list, sepby(country)
    
         +------------------------------------------------------+
         |  country   year   value   country_id   forward_value |
         |------------------------------------------------------|
      1. | Cambodia   2007       5     Cambodia               5 |
      2. | Cambodia   2008       .     Cambodia               6 |
      3. | Cambodia   2009       .     Cambodia               7 |
      4. | Cambodia   2010       .     Cambodia               8 |
      5. | Cambodia   2011      10     Cambodia              10 |
         |------------------------------------------------------|
      6. |    China   2007     100        China             100 |
      7. |    China   2008     120        China             120 |
      8. |    China   2009       .        China             121 |
      9. |    China   2010       .        China             122 |
     10. |    China   2011     100        China             100 |
         +------------------------------------------------------+
    Notice that the variable forward_value automatically fills in the values. However, if I impute the values backward:

    Code:
    . 
    . 
    . /*Backward Impute Values*/
    . *For some reason this doesn't fill automatically
    . gen backward_value = value
    (5 missing values generated)
    
    . replace backward_value = F.backward_value - 1 if mi(value)
    (2 real changes made)
    
    . 
    . list, sepby(country)
    
         +-----------------------------------------------------------------------+
         |  country   year   value   country_id   forward_value   backward_value |
         |-----------------------------------------------------------------------|
      1. | Cambodia   2007       5     Cambodia               5                5 |
      2. | Cambodia   2008       .     Cambodia               6                . |
      3. | Cambodia   2009       .     Cambodia               7                . |
      4. | Cambodia   2010       .     Cambodia               8                9 |
      5. | Cambodia   2011      10     Cambodia              10               10 |
         |-----------------------------------------------------------------------|
      6. |    China   2007     100        China             100              100 |
      7. |    China   2008     120        China             120              120 |
      8. |    China   2009       .        China             121                . |
      9. |    China   2010       .        China             122               99 |
     10. |    China   2011     100        China             100              100 |
         +-----------------------------------------------------------------------+
    Notice here that backward_value does not automatically fill in. There are still some missing values. How do I change this so that I can automatically fill in backwards as well?

    Thank you for your help in advance!

    Vincent


  • #2
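    Whether or not this imputation suits your analysis, the programming issue is this: the "backwards imputation" doesn't work because Stata carries out -replace- sequentially, from observation 1 to observation _N. When it reaches Cambodia in 2008, F.backward_value (the 2009 value) is still missing, so the result is missing too; only the observation immediately before a nonmissing value gets filled. With L., each newly filled value is already in place when Stata moves on to the next observation, which is why the forward version cascades. If you carefully carry out the operations (e.g. in your head or on paper), you will see that Stata is doing exactly what you asked it to do. Nothing unexpected there.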
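
    The observation order can be seen at work in a minimal sketch, assuming the example data from post #1. Each pass of -replace- fills only the observations whose following year is already nonmissing, so repeating the command until the number of missing values stops changing eventually fills every gap. The loop and the local macro prev are illustrative additions, not part of the original posts.

    ```stata
    clear
    input str8 country year value
    "Cambodia" 2007 5
    "Cambodia" 2008 .
    "Cambodia" 2009 .
    "Cambodia" 2010 .
    "Cambodia" 2011 10
    "China" 2007 100
    "China" 2008 120
    "China" 2009 .
    "China" 2010 .
    "China" 2011 100
    end

    encode country, generate(country_id)
    tsset country_id year

    gen backward_value = value

    * prev holds the missing count before the latest pass (illustrative name).
    * Stop when a pass no longer reduces the number of missing values,
    * which also guards against gaps at the end of a panel that can never fill.
    local prev = -1
    count if mi(backward_value)
    while r(N) != `prev' {
        local prev = r(N)
        replace backward_value = F.backward_value - 1 if mi(backward_value)
        count if mi(backward_value)
    }

    list, sepby(country_id)
    ```

    This is only meant to show why a single pass is not enough; reversing the time order, as described next, does the job in one pass.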

    To achieve the result you want using time series operators, I believe you need to reverse the direction of the time variable, so that later years come first. You can do this with an auxiliary time variable:

    Code:
    clear
    
    *----- example data set -----
    
    input ///
    str8 country year value
      "Cambodia" 2007 5
      "Cambodia"      2008 .
      "Cambodia"      2009 .
      "Cambodia" 2010 .
      "Cambodia" 2011 10
      "China" 2007 100
      "China" 2008 120
      "China" 2009 .
      "China" 2010 .
      "China" 2011 100
    end
    
    encode country, gen(country_id)
    drop country
    order country_id
    
    list, sepby(country_id)
    
    *----- what you want -----
    
    tsset country_id year
     
    // forward impute
    gen forward_value = value
    replace forward_value = L.forward_value + 1 if mi(value)
    
    // backwards impute: year2 runs in reverse, so L. now refers to the following year
    gen year2 = year[_N] - year
    tsset country_id year2
    
    gen backward_value = value
    replace backward_value = L.backward_value - 1 if mi(value)
    
    // restore the original time variable
    tsset country_id year
    
    // check
    drop year2
    list, sepby(country_id)
    I'm assuming there are no missing values in the variable -year-.

    If you can dispense with the time series operators, an alternative solution should be possible.
    Last edited by Roberto Ferrer; 22 Feb 2015, 00:03.



    • #3
      Hi Roberto. Thanks for the response! Of course, I can dispense with the time series operator. Is there an easier way without the time series operators? I just thought that using the time series operators would be the easiest.



      • #4
        This is an FAQ:

        FAQ: Replacing missing values (N. J. Cox, 1/12)
        How can I replace missing values with previous or following nonmissing values or within sequences?
        http://www.stata.com/support/faqs/data-management/replacing-missing-values/

        That FAQ covers the points validly made by Roberto, with more besides.
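
        For the question in #3, a minimal sketch without time series operators, assuming the example data from post #1: sort each country from its latest year to its earliest and refer to the preceding observation with a subscript. The helper variable negyear is a hypothetical name introduced here for illustration.

        ```stata
        clear
        input str8 country year value
        "Cambodia" 2007 5
        "Cambodia" 2008 .
        "Cambodia" 2009 .
        "Cambodia" 2010 .
        "Cambodia" 2011 10
        "China" 2007 100
        "China" 2008 120
        "China" 2009 .
        "China" 2010 .
        "China" 2011 100
        end

        * negyear (illustrative helper): sorting on it puts later years first
        gen negyear = -year
        gen backward_value = value

        * Within each country, walk from the latest year to the earliest.
        * backward_value[_n-1] is then the following year's value, already filled
        * because -replace- works through the observations in order.
        bysort country (negyear): replace backward_value = backward_value[_n-1] - 1 if mi(backward_value)

        * restore the usual order
        sort country year
        drop negyear
        list, sepby(country)
        ```

        Note that [_n-1] means the previous observation in the sort order, not necessarily the adjacent year, so unlike F. this would impute across gaps in -year-. A missing value in the last year of a country stays missing, since there is nothing after it to impute from.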

