Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Converting 5-year interval to yearly panel data

    Dear Statalisters,
    I would like to insert the missing years in a large multi-country five-year interval panel data so that the time-series component becomes yearly. The objective is to interpolate, at a later stage, yearly data for relevant variables. Here are, heuristically, samples of how my starting and target data look like:

    Sample of the data

    Target data


    I am using Stata 14.2. Any feedback would be much appreciated.

    Thanks.

    Imed.


  • #2
    Imed:
    unfortunately, your post do not show any excerpt/example of your data.
    A better option is to use -dataex- to share it with interested listers.
    As an aside -ipolate- 80% of your data (ie, for 4 out of 5 years) sounds like a risky approach.
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Thanks Carlo, here is the original sample

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float id str46 country double year float lu
      1 "Algeria" 2000 30.901855
      1 "Algeria" 2005  25.70785
      1 "Algeria" 2010 22.017984
      2 "Bahrain" 2000  9.807788
      2 "Bahrain" 2005  7.523893
      2 "Bahrain" 2010  5.780531
      end
      format %ty year
      the interpolation will partly use data from other sources.

      Imed


      Last edited by Imed Limam; 22 Dec 2018, 07:56.

      Comment


      • #4
        Imed:
        you may want to try:
        Code:
        expand 5 if year==2000 | year==2005
        
        
        bysort country: replace year=.
        
        
        bysort country: replace year=2000 if _n==1
        
        
        bysort country: replace year=year[_n-1]+1 if year>2000
        
        
        bysort country: replace lu=. if _n>1 & _n<6
        
        
        bysort country: replace lu=. if _n>6 & _n<11
        Kind regards,
        Carlo
        (Stata 19.0)

        Comment


        • #5
          Thanks Carlo. Very judicious combination of the sorting and expand commands. However, the values of the "lu" variable corresponding to the years 2000, 2005 and 2010 are different from the original data. eg. lu(2000) for Algeria was 30.9 became 25.7, which is the value corresponding to the year 2005. The same thing happens for Bahrain.
          I will try to play around with the code you kindly provided. Of course any further ideas would be welcome.

          Kind regards,

          Imed.

          Comment


          • #6
            Imed:
            sorry for the mistake.
            I've clearly omitted one line of code in my previous reply..
            The revised version sounds like this:
            Code:
            . expand 5 if year==2000 | year==2005
            (16 observations created)
            
            . sort id year
            
            . bysort country: replace year=.
            
            . bysort country: replace year=2000 if _n==1
            
            
            . bysort country: replace year=year[_n-1]+1 if year>2000
            
            
            . bysort country: replace lu=. if _n>1 & _n<6
            
            
            . bysort country: replace lu=. if _n>6 & _n<11
            
            . list
            
                 +--------------------------------+
                 | id   country   year         lu |
                 |--------------------------------|
              1. |  1   Algeria   2000   30.90186 |
              2. |  1   Algeria   2001          . |
              3. |  1   Algeria   2002          . |
              4. |  1   Algeria   2003          . |
              5. |  1   Algeria   2004          . |
                 |--------------------------------|
              6. |  1   Algeria   2005   25.70785 |
              7. |  1   Algeria   2006          . |
              8. |  1   Algeria   2007          . |
              9. |  1   Algeria   2008          . |
             10. |  1   Algeria   2009          . |
                 |--------------------------------|
             11. |  1   Algeria   2010   22.01798 |
             12. |  2   Bahrain   2000   9.807788 |
             13. |  2   Bahrain   2001          . |
             14. |  2   Bahrain   2002          . |
             15. |  2   Bahrain   2003          . |
                 |--------------------------------|
             16. |  2   Bahrain   2004          . |
             17. |  2   Bahrain   2005   7.523893 |
             18. |  2   Bahrain   2006          . |
             19. |  2   Bahrain   2007          . |
             20. |  2   Bahrain   2008          . |
                 |--------------------------------|
             21. |  2   Bahrain   2009          . |
             22. |  2   Bahrain   2010   5.780531 |
                 +--------------------------------+
            
            .
            Then, you have to -ipolate- (but I would not be confident that such a relevant volume of imputed data can give you reliable estimates of the data generating process).
            Kind regards,
            Carlo
            (Stata 19.0)

            Comment


            • #7
              Thank you again Carlo. It works fine.

              All the best.

              Comment


              • #8
                Dear Imed, You may also want to try
                Code:
                xtset id year
                tsfill
                bys id (year): replace country = country[_n-1] if country == ""
                Ho-Chuan (River) Huang
                Stata 19.0, MP(4)

                Comment


                • #9
                  Dear Ho-Chuan, many thanks for your suggestion. I just saw it. Kind regards, Imed.

                  Comment

                  Working...
                  X