Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • how to create 5-years non-overlapping panel with one observation per period

    Hi to everyone, I need to create a 5-years non-overlapping panel database starting from an usual panel database. I have data from 1970 to 2015 and I need just one observation for 5 years to reduce the effect of missing data. I want to subtract from the last one observation the first observation and divide by 5 and repeat this procedure on all the data by id. But there is a problem of missing data, thus sometimes the 5-years period is reduced for examples to 3-years period. For example:

    year id GDP
    1970 1 -
    1971 1 -
    1972 1 1500
    1973 1 1700
    1974 1 2000
    1975 1 1800
    1976 1 1900
    1977 1 2000
    1978 1 2100
    1979 1 2000
    1970 2 -
    1971 2 -
    1972 2 -
    1973 2 1700
    1974 2 2000
    1975 2 1800
    1976 2 1900
    1977 2 2000
    1978 2 2100
    1979 2 2000

    for id 1
    first period: 1970-1974 I need to calculate (GDP74-GDP72)/3 since I have missing data
    second period: 1975-1979 I need to calculate (GDP79-GDP75)/5 since no missing.

    for id 2
    first period: 1970-1974 I need to calculate (GDP74-GDP73)/2 since I have missing data
    second period: 1975-1979 I need to calculate (GDP79-GDP75)/5 since no missing.

    and so on.
    How can I say this to STATA? Thank you!


  • #2
    In the example you show, the years with missing GDP are all at the beginning of the period. Is that always the case in your data? If not, what would you do if an id had, say data for 1972 and 1973 only in the 1970-1974 period?

    Also, when posting back, please use the -dataex- command to show your example data. You can install -dataex- by running -ssc install dataex-. The simple instructions for using it are in -help dataex-. The problem with the data you posted is that it is difficult to bring it into Stata to try out a solution. By using -dataex- you will provide those who want to help you with the ability to create a faithful Stata replica of your data with just a simple copy/paste operation.

    Comment


    • #3
      I usually have missing at the beginning but not always, so I need a general formula to tell to Stata to bring the last one observation minus the first one and divide by the numbers of the years observed. And I have to repeat this by id and every 5 years in order to obtain 9 observations. In that case I would do (GDP73-GDP72)/2. Sorry for the inconvenience but did not know the command. Thanks a lot.

      Comment


      • #4
        Since you did not use -dataex- to repost your example data, what I show you here is untested code. If it isn't right, it should anyhow point you in the right direction.

        Code:
        //    IDENTIFY FIVE YEAR PERIODS BY THEIR START YEAR
        gen period = 5*ceil((year +1)/5)
        
        //    FIND FIRST AND LAST YEAR WITH GDP IN EACH PERIOD
        gen byte has_gdp = !missing(GDP)
        by id period has_gdp (year), sort: gen weight = -1 if _n == 1
        by id period has_gdp (year): replace weight = 1 if _n == _N
        
        //    CALCULATE DIFFRENCE BETWEEN FIRST & LAST GDP
        //    AND DIFFERENCE BETWEEN FIRST AND LAST YEAR
        by id period has_gdp (year): egen numerator = total(weight*GDP)
        by id period has_gdp (year): egen denominator = total(weight*year)
        
        //    CALCULATE THE DESIRED RATIO
        by id period: gen desired = numerator[_N]/(denominator[_N] + 1)
        sort id year
        Added: There is a problem with the above code. If there is some five-year period where you have only a single year's data, it will give you the GDP for that year divided by two. Probably that isn't what you want. It seems like you are trying to calculate the average annual change in GDP over the period. If there is only one non-missing value of GDP in the period, then that average really is just undefined. So probably you should add the following to the end of the above code:

        Code:
        by id period, sort: egen n_years = total(has_gdp)
        by id period: replace desired = . if n_years < 2
        sort id year
        Added:
        Sorry for the inconvenience but did not know the command.
        Fair enough, but please re-read the second paragraph of my response in #2, which tells you how to install the -dataex- command and how to learn to use it. It will only take you a couple of minutes, if that much. It will save you and others on this Forum an enormous amount of time if you use it going forward.

        Last edited by Clyde Schechter; 28 Jan 2017, 18:40.

        Comment

        Working...
        X