Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Adding rows with criteria

    Dear all,

    I am working on a panel dataset, in which the years (time dimension) available are 2001 2003 2005 2007 2009 and so on, every time skipping one year.
    To make an example of the dataset:
    ID Year VarA VarB
    1 2001 10 5
    1 2003 20 15
    1 2005 30 25
    2 2001 15 10
    2 2003 25 20
    2 2005 35 30
    3 2001 60 65
    3 2003 70 75
    3 2005 80 85
    I would like to insert for each ID a row for the missing years (2002 and 2004 in the example above) and for some selected variables (VarA and VarB in example) fill the gaps with the mean of the previous value with the following value. To give an example of the final result:
    ID Year VarA VarB
    1 2001 10 5
    1 2002 15 10
    1 2003 20 15
    1 2004 25 20
    1 2005 30 25
    2 2001 15 10
    2 2002 20 15
    2 2003 25 20
    2 2004 30 25
    2 2005 35 30
    3 2001 60 65
    3 2002 65 70
    3 2003 70 75
    3 2004 75 80
    3 2005 80 85
    Do you know how to do this?
    Thanks in advance for the help!

    Luca.

  • #2
    It can be done, but it's usually a bad idea. Just declare your data as

    Code:
    tsset ID Year, delta(2)
    Otherwise anything you do has model figures of merit and degrees of freedom that are spurious.

    I'll explain how to do it nevertheless, because sometimes the motive is wanting to merge with other data that are yearly

    Note that while your data example is manageable, using dataex (SSC) is the recommended way to do it. This is documented (FAQ Advice #12).

    Code:
    clear 
    input ID    Year    VarA    VarB
    1    2001    10    5
    1    2003    20    15
    1    2005    30    25
    2    2001    15    10
    2    2003    25    20
    2    2005    35    30
    3    2001    60    65
    3    2003    70    75
    3    2005    80    85
    end 
    local Np1 = _N + 1
    expand 2 
    bysort ID (Year) : replace Year = Year + 1 if Year == Year[_n-1] 
    
    foreach v in VarA VarB { 
        replace `v' = . if mod(Year, 2) == 0 
        bysort ID: ipolate `v' Year, gen(`v'2) 
    } 
    
    list, sepby(ID) 
    
         +-----------------------------------------+
         | ID   Year   VarA   VarB   VarA2   VarB2 |
         |-----------------------------------------|
      1. |  1   2001     10      5      10       5 |
      2. |  1   2002      .      .      15      10 |
      3. |  1   2003     20     15      20      15 |
      4. |  1   2004      .      .      25      20 |
      5. |  1   2005     30     25      30      25 |
      6. |  1   2006      .      .       .       . |
         |-----------------------------------------|
      7. |  2   2001     15     10      15      10 |
      8. |  2   2002      .      .      20      15 |
      9. |  2   2003     25     20      25      20 |
     10. |  2   2004      .      .      30      25 |
     11. |  2   2005     35     30      35      30 |
     12. |  2   2006      .      .       .       . |
         |-----------------------------------------|
     13. |  3   2001     60     65      60      65 |
     14. |  3   2002      .      .      65      70 |
     15. |  3   2003     70     75      70      75 |
     16. |  3   2004      .      .      75      80 |
     17. |  3   2005     80     85      80      85 |
     18. |  3   2006      .      .       .       . |
         +-----------------------------------------+

    Comment


    • #3
      Originally posted by Nick Cox View Post
      It can be done, but it's usually a bad idea.
      It is not for estimation, it should work fine!

      Thanks, the code is perfect!

      Luca.

      Comment

      Working...
      X