Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • building a continuous year variable

    Dear all,

    Following is my dataset:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str1 country float(year var3)
    "A" 2001 2.4
    "A" 2003 3.5
    "A" 2009 6.9
    "A" 2010 3.1
    "B" 2002 6.4
    "B" 2003 5.7
    "B" 2004 8.3
    "C" 2007 5.3
    "C" 2008   8
    "C" 2009 9.3
    "C" 2010 7.3
    "C" 2011 6.8
    end
    I would like to create a variable year which runs from 2001 to 2011, leaving the observation for var3 empty where there is no data available. The resulting data should look like this:
    country year var3
    A 2001 2.4
    A 2002
    A 2003 3.5
    A 2004
    A 2005
    A 2006
    A 2007
    A 2008
    A 2009 6.9
    A 2010 3.1
    A 2011
    B 2001
    B 2002 6.4
    B 2003 5.7
    B 2004 8.3
    B 2005
    B 2006
    B 2007
    B 2008
    B 2009
    B 2010
    B 2011
    C 2001
    C 2002
    C 2003
    C 2004
    C 2005
    C 2006
    C 2007 5.3
    C 2008 8
    C 2009 9.3
    C 2010 7.3
    C 2011 6.8
    I would really appreciate your help. Thanks

    Kusum

  • #2
    You may try this:

    Code:
    encode country, gen(mycountry)
    
    tsset mycountry year
    
    tsfill, full
    
    . list
    
         +----------------------------------+
         | country   year   var3   mycoun~y |
         |----------------------------------|
      1. |       A   2001    2.4          A |
      2. |           2002      .          A |
      3. |       A   2003    3.5          A |
      4. |           2004      .          A |
      5. |           2005      .          A |
         |----------------------------------|
      6. |           2006      .          A |
      7. |           2007      .          A |
      8. |           2008      .          A |
      9. |       A   2009    6.9          A |
     10. |       A   2010    3.1          A |
         |----------------------------------|
     11. |           2011      .          A |
     12. |           2001      .          B |
     13. |       B   2002    6.4          B |
     14. |       B   2003    5.7          B |
     15. |       B   2004    8.3          B |
         |----------------------------------|
     16. |           2005      .          B |
     17. |           2006      .          B |
     18. |           2007      .          B |
     19. |           2008      .          B |
     20. |           2009      .          B |
         |----------------------------------|
     21. |           2010      .          B |
     22. |           2011      .          B |
     23. |           2001      .          C |
     24. |           2002      .          C |
     25. |           2003      .          C |
         |----------------------------------|
     26. |           2004      .          C |
     27. |           2005      .          C |
     28. |           2006      .          C |
     29. |       C   2007    5.3          C |
     30. |       C   2008      8          C |
         |----------------------------------|
     31. |       C   2009    9.3          C |
     32. |       C   2010    7.3          C |
     33. |       C   2011    6.8          C |
         +----------------------------------+
    Last edited by Marcos Almeida; 19 Mar 2019, 04:19.
    Best regards,

    Marcos

    Comment


    • #3
      You can do this, but there is usually no point to it. Exceptions: you intend to carry interpolation; you intend to merge with other datasets for which there are values for the years not present in the dataset.

      Here is some technique:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str1 country float(year var3)
      "A" 2001 2.4
      "A" 2003 3.5
      "A" 2009 6.9
      "A" 2010 3.1
      "B" 2002 6.4
      "B" 2003 5.7
      "B" 2004 8.3
      "C" 2007 5.3
      "C" 2008   8
      "C" 2009 9.3
      "C" 2010 7.3
      "C" 2011 6.8
      end
      
      encode country, gen(id) 
      tsset id year 
      
      fillin id year 
      
      bysort id (country) : replace country = country[_N] 
      
      tsset  
      list, sepby(country) 
      
           +--------------------------------------+
           | country   year   var3   id   _fillin |
           |--------------------------------------|
        1. |       A   2001    2.4    A         0 |
        2. |       A   2002      .    A         1 |
        3. |       A   2003    3.5    A         0 |
        4. |       A   2004      .    A         1 |
        5. |       A   2007      .    A         1 |
        6. |       A   2008      .    A         1 |
        7. |       A   2009    6.9    A         0 |
        8. |       A   2010    3.1    A         0 |
        9. |       A   2011      .    A         1 |
           |--------------------------------------|
       10. |       B   2001      .    B         1 |
       11. |       B   2002    6.4    B         0 |
       12. |       B   2003    5.7    B         0 |
       13. |       B   2004    8.3    B         0 |
       14. |       B   2007      .    B         1 |
       15. |       B   2008      .    B         1 |
       16. |       B   2009      .    B         1 |
       17. |       B   2010      .    B         1 |
       18. |       B   2011      .    B         1 |
           |--------------------------------------|
       19. |       C   2001      .    C         1 |
       20. |       C   2002      .    C         1 |
       21. |       C   2003      .    C         1 |
       22. |       C   2004      .    C         1 |
       23. |       C   2007    5.3    C         0 |
       24. |       C   2008      8    C         0 |
       25. |       C   2009    9.3    C         0 |
       26. |       C   2010    7.3    C         0 |
       27. |       C   2011    6.8    C         0 |
           +--------------------------------------+

      It is usually better to stop after

      Code:
      encode country, gen(id) 
      tsset id year
      and use time series operators thereafter.

      Comment


      • #4
        Thank you so much all. Dear Nick, I intend to interpolate, so!

        Also if I needed to create till 2012, because I will extrapolate, how do I do that? same dataset- I need 2012 for A,B,C.
        Help?

        Comment


        • #5
          Just add one extra observation with year 2012 and then tsfill again.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str1 country float(year var3)
          "A" 2001 2.4
          "A" 2003 3.5
          "A" 2009 6.9
          "A" 2010 3.1
          "B" 2002 6.4
          "B" 2003 5.7
          "B" 2004 8.3
          "C" 2007 5.3
          "C" 2008   8
          "C" 2009 9.3
          "C" 2010 7.3
          "C" 2011 6.8
          end
          
          encode country, gen(id)
          tsset id year
          
          fillin id year
          
          bysort id (country) : replace country = country[_N]
          
          tsset  
          
          set obs `=_N + 1'
          replace id = 1 in L
          replace year = 2012 in L
          tsfill, full
          
          tsset
          Last edited by Nick Cox; 19 Mar 2019, 05:47.

          Comment


          • #6
            Thank you so much Nick!!

            Comment

            Working...
            X