Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Average annual growth rate from panel with gaps in the time series

    Hello everyone,

    I have a panel with occupation shares (already in percentages) across 9 occupation different (ISCO-88) categories for a set of countries over time, for the period 1993-2010.
    For a cross-sectional regression I want to calculate the average annual percentage change for each country for the 9 occupation categories.
    The problem is now that the data series often show gaps. The countries vary significantly in the number of observations.
    So my idea was to use the observations I have, calculate the change rate and divide this by the number of periods between the two observations.
    If the time series is complete, there is no problem, as the change rates can be easily calculated, as in the case of for example Canada (see tabe below).
    I do that with the following code

    Code:
    tsset c_id year 
    foreach x in occ1 occ2 occ3 occ4 occ5 occ6 occ7 occ8 occ9 occ_total occ_x_not_elsewhere { 
        gen `x'_gr = D.`x' 
    }
    As my data is already in percentages, the simple difference between two values yields changes in percentage points.

    The challenge is now to deal with gaps in the data. As Stata skips gaps in the time series, so for example in the case of Cambodia, I only get changes for 2000-2001, but not for 2001-2004, due to missing observations. I would like to generate a code that also includes the latter change, and divide it (in this case) by 3, or whatever the number of respective periods, to later calculate average annual changes, taking several change rates together.

    c_id country year occ1 occ2 occ3 occ4 occ5 occ6 occ7 occ8 occ9 occ1_gr occ2_gr
    13 Cambodia 1993
    13 Cambodia 1994
    13 Cambodia 1995
    13 Cambodia 1996
    13 Cambodia 1997
    13 Cambodia 1998
    13 Cambodia 1999
    13 Cambodia 2000 .5 1.3 2 .6 8.9 72.1 6.6 2.6 4.4
    13 Cambodia 2001 .7 1.2 2 .4 10.5 65.6 8.2 3.5 7.3 .2 -.0999999
    13 Cambodia 2002
    13 Cambodia 2003
    13 Cambodia 2004 .3 .1 2.1 .8 6.7 70.4 5.5 1.2 5.4
    13 Cambodia 2005
    13 Cambodia 2006
    13 Cambodia 2007
    13 Cambodia 2008
    13 Cambodia 2009
    13 Cambodia 2010
    14 Canada 1993 10.6 14.8 12.5 14.8 14.2 3.7 9.3 10.2 9.8
    14 Canada 1994 10.4 15.4 12.4 14.2 14.1 3.5 9.4 10.6 10 -.2000008 .5999994
    14 Canada 1995 11 15.2 12.5 13.7 14.1 3.5 9.4 10.9 9.7 .6000004 -.1999998
    14 Canada 1996 11 14.8 12.7 13.4 14.4 3.5 9.4 10.3 9.8 0 -.3999996
    14 Canada 1997 10.1 15.8 13.1 13.8 14 3.4 10.4 10.5 8.9 -.8999996 1
    14 Canada 1998 9.8 15.8 13.6 13.6 14.1 3.3 10.3 10.5 8.9 -.3000002 0
    14 Canada 1999 9.7 16 13.6 13.8 14.2 3.2 10.5 10.8 8.2 -.1000004 .1999998
    14 Canada 2000 9.8 15.8 13.9 13.8 14.2 2.9 10.3 11 8.2 .1000004 -.1999998
    14 Canada 2001 9.1 16.2 14.2 14.1 14.5 2.6 10.3 10.8 8 -.6999998 .4000006
    14 Canada 2002 9 16.2 14.2 13.8 14.7 2.5 10.7 10.8 8.1 -.1000004 0
    14 Canada 2003 8.9 15.7 14.6 13.9 14.7 2.5 10.6 10.8 8.2 -.1000004 -.500001
    14 Canada 2004 9.2 15.8 14.3 14.1 14.7 2.4 10.5 10.6 8.3 .3000002 .1000004
    14 Canada 2005 9.2 16.9 14.8 13.8 14.1 2.4 10.3 10 8.3 0 1.099999
    14 Canada 2006 9.3 17.1 15 13.8 14.2 2.4 10.1 9.7 8.4 .1000004 .2000008
    14 Canada 2007 9.1 17.4 15.2 13.5 14.6 2.3 10.2 9.3 8.3 -.1999998 .2999992
    14 Canada 2008 9.3 17.6 15.5 13.4 14.5 2.1 10.5 8.9 8 .1999998 .2000008
    14 Canada 2009 9.3 18 16 13.2 15.1 2.2 10.2 8.4 7.6 0 .3999996
    14 Canada 2010 9.2 18.2 16.4 13 15 2.1 10.1 8.4 7.6 -.1000004 .2000008

    I hope I explained the problem well. Any help is appreciated! :-)

    Regards,
    Jonas

  • #2
    Your problem at its simplest can be resolved by first interpolating linearly within the data. Note here the use of dataex as earnestly reguested in FAQ Advice #12.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte c_id str8 country float(year occ1)
    13 "Cambodia" 1993    .
    13 "Cambodia" 1994    .
    13 "Cambodia" 1995    .
    13 "Cambodia" 1996    .
    13 "Cambodia" 1997    .
    13 "Cambodia" 1998    .
    13 "Cambodia" 1999    .
    13 "Cambodia" 2000   .5
    13 "Cambodia" 2001   .7
    13 "Cambodia" 2002    .
    13 "Cambodia" 2003    .
    13 "Cambodia" 2004   .3
    13 "Cambodia" 2005    .
    13 "Cambodia" 2006    .
    13 "Cambodia" 2007    .
    13 "Cambodia" 2008    .
    13 "Cambodia" 2009    .
    13 "Cambodia" 2010    .
    14 "Canada"   1993 10.6
    14 "Canada"   1994 10.4
    14 "Canada"   1995   11
    14 "Canada"   1996   11
    14 "Canada"   1997 10.1
    14 "Canada"   1998  9.8
    14 "Canada"   1999  9.7
    14 "Canada"   2000  9.8
    14 "Canada"   2001  9.1
    14 "Canada"   2002    9
    14 "Canada"   2003  8.9
    14 "Canada"   2004  9.2
    14 "Canada"   2005  9.2
    14 "Canada"   2006  9.3
    14 "Canada"   2007  9.1
    14 "Canada"   2008  9.3
    14 "Canada"   2009  9.3
    14 "Canada"   2010  9.2
    end
    
    . ipolate occ1 year, by(c_id) gen(occ1_2)
    (13 missing values generated)
    
    . format occ1_2 %8.3f
    
    . list, sepby(country)
    
         +----------------------------------------+
         | c_id    country   year   occ1   occ1_2 |
         |----------------------------------------|
      1. |   13   Cambodia   1993      .        . |
      2. |   13   Cambodia   1994      .        . |
      3. |   13   Cambodia   1995      .        . |
      4. |   13   Cambodia   1996      .        . |
      5. |   13   Cambodia   1997      .        . |
      6. |   13   Cambodia   1998      .        . |
      7. |   13   Cambodia   1999      .        . |
      8. |   13   Cambodia   2000     .5    0.500 |
      9. |   13   Cambodia   2001     .7    0.700 |
     10. |   13   Cambodia   2002      .    0.567 |
     11. |   13   Cambodia   2003      .    0.433 |
     12. |   13   Cambodia   2004     .3    0.300 |
     13. |   13   Cambodia   2005      .        . |
     14. |   13   Cambodia   2006      .        . |
     15. |   13   Cambodia   2007      .        . |
     16. |   13   Cambodia   2008      .        . |
     17. |   13   Cambodia   2009      .        . |
     18. |   13   Cambodia   2010      .        . |
         |----------------------------------------|
     19. |   14     Canada   1993   10.6   10.600 |
     20. |   14     Canada   1994   10.4   10.400 |
     21. |   14     Canada   1995     11   11.000 |
     22. |   14     Canada   1996     11   11.000 |
     23. |   14     Canada   1997   10.1   10.100 |
     24. |   14     Canada   1998    9.8    9.800 |
     25. |   14     Canada   1999    9.7    9.700 |
     26. |   14     Canada   2000    9.8    9.800 |
     27. |   14     Canada   2001    9.1    9.100 |
     28. |   14     Canada   2002      9    9.000 |
     29. |   14     Canada   2003    8.9    8.900 |
     30. |   14     Canada   2004    9.2    9.200 |
     31. |   14     Canada   2005    9.2    9.200 |
     32. |   14     Canada   2006    9.3    9.300 |
     33. |   14     Canada   2007    9.1    9.100 |
     34. |   14     Canada   2008    9.3    9.300 |
     35. |   14     Canada   2009    9.3    9.300 |
     36. |   14     Canada   2010    9.2    9.200 |
         +----------------------------------------+
    However, various questions arise:

    1. Growth rates in percent measures might be better assessed on some other scale, e.g. logit.

    2. Linear interpolation is not the only possible interpolation method. See e.g. https://www.statalist.org/forums/for...-interpolation

    3. You have here compositional data and constraints across the components are ignored in treating occupations separately.

    Comment


    • #3
      Thank's a lot Nick, this solved my problem! Linear interpolation is sufficient in my case. Will use dataex next time!

      Comment

      Working...
      X