
  • How to Limit the Number of Years Extrapolated?

    Dear Listserv Readers—

    I’m extrapolating using a cross-national time-series dataset, and want to limit the number of years I extrapolate out on a country-by-country basis (so that the extrapolated data doesn’t go on indefinitely for like 40 years).

    My cases/observations are country-years, and I have about 75 years per country in my dataset. One of my independent variables (hoji) is for average attitudes. For hoji, I have one to eight observations per country. (These observations are randomly spaced out over the course of 35 years.) To try and “fill in” the data for missing years, I have tried interpolating AND extrapolating the data. For countries with only one data point, I have also made up a way to fill in that single data point across every year. (See my Stata code BELOW, where cno is the name of the country and hojie is the new variable generated.) I’m using Stata 13.

    My problem is, I'm not super-confident about the resulting data. (1) For the extrapolated data, I want to limit things, so that the data are only extrapolated five years out from the last existing actual data point. (2) Similarly, for countries with only one data point, I want to fill in that data point only five years before and after the year in which the actual observation occurred. …Does anyone know how I might do this?

    Thanks!
    Louisa

    Here’s the code I’m using now:

    by cno: ipolate hoji year, generate(hojie1) epolate
    by cno: egen hojie1plus = min(hojie1)
    gen hojie = hojie1
    replace hojie= hojie1plus if hojie1==.
    label var hojie "hojust interp, extrap & means for 1-obs cases"

    (P.S. It looks like the user-written mipolate command might be a good one here. But again, I can’t figure out how to limit the number of years extrapolated or filled out.)

  • #2
    Interpolation (in the wide sense, including extrapolation), as implemented in ipolate (and in mipolate (SSC), which you allude to), is a deterministic process. You want time series forecasting, which is a different art.


    • #3
      I think rather than trying to limit the application of the extrapolation, it is easier to do the extrapolation and then overwrite the unwanted values with missing. So something like this:

      Code:
      by cno (year), sort: egen last_actual_data_year = max(cond(!missing(hoji), year, .))
      by cno: egen n_actual_values = total(!missing(hoji))
      // NOW INSERT YOUR INTERPOLATION/EXTRAPOLATION CODE HERE
      
      replace hojie = . if year > last_actual_data_year + 5
      replace hojie = . if year < last_actual_data_year - 5 & n_actual_values == 1
      Crossed with Nick's post. He makes a good point here. My response has focused solely on how to do what was asked; I offered no opinion on whether it was a good idea to do that or not. I agree with Nick's opinion that it probably is not appropriate in this context.
      Last edited by Clyde Schechter; 15 Dec 2016, 15:05.


      • #4
        Hi Nick,

        Thanks for your suggestion. I did some super-quick research on time series forecasting. Unfortunately, the Stata manual suggests it's not possible in this case. According to the manual, the Stata command "forecast" must be used with a "strongly balanced" panel dataset, which is not the case with my data.

        Louisa


        • #5
          Thanks a bunch for that code, Clyde! I'll try it out and report back on how it works.

          Louisa


          • #6
            I believe it, but your interpolation is doing nothing to exploit the panel structure. Each panel is being extrapolated separately. In these circumstances, forecasting individual panels can be no worse, and may be better insofar as you would be using more information from the whole of a series and not just the last two known data points (literally!).
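
            The point about "the last two known data points" can be checked on a toy series. This is only a sketch on made-up data: the variable names y and y_ep and all the values are hypothetical. If ipolate's epolate option extends the straight line through the final two known points, as described above, the first observation (year 2000) should have no influence on the values filled in for 2004 and 2005.

            ```stata
            * Hypothetical toy series: known values in 2000, 2002 and 2003 only
            clear
            input year y
            2000 1
            2001 .
            2002 5
            2003 6
            2004 .
            2005 .
            end

            * Linear interpolation plus extrapolation, as in the original post
            ipolate y year, generate(y_ep) epolate

            * Compare the filled-in values with the known ones
            list year y y_ep, noobs
            ```

            Changing the 2000 value and rerunning is a quick way to see whether the extrapolated years move at all.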

            I should underline that I am interested in interpolation and think it unduly neglected in data analysis. But your problem is one of time series forecasting, so far as I can understand it.

            Disclosure: I am the author of mipolate (SSC) and advising not to use it here!


            • #7
              Dear Nick and Clyde,

              Thanks again for your feedback last week!

              Nick: I can see your point now about the value of using the forecast command, so as to take advantage of data from all countries, not just from each individual country. Are you aware of any study that has used the forecast command (or something similar) to create an independent or dependent variable?

              Clyde: Your code was great, and included some important tricks I didn't know about, thanks!! Here's what I ended up doing, precisely -- see below. (Note that hojie is the variable that has no limit on the number of years extrapolated or otherwise "filled in".) I made a small addition, Clyde, to your code, to prevent unlimited extrapolation backwards in time, as well as forward. And it works.

              by cno (year), sort: egen last_actual_data_year = max(cond(!missing(hojust), year, .))
              by cno (year), sort: egen first_actual_data_year = min(cond(!missing(hojust), year, .))
              by cno: egen n_actual_values = total(!missing(hojust))
              gen hojie5y = hojie
              replace hojie5y = . if year > last_actual_data_year + 5
              replace hojie5y = . if year < first_actual_data_year - 5
              label var hojie5y "hojie + or - 5 years"
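
              The trimming logic above can be tried end to end on a small made-up panel. This is a minimal sketch, not the real data: the two countries, the years 1990 to 2010 and the hoji values are all hypothetical, cno is assumed to be numeric here, and hoji is used as the source variable name as in the first post.

              ```stata
              * Hypothetical toy panel: 2 countries x 21 years (1990-2010)
              clear
              set obs 42
              gen cno = cond(_n <= 21, 1, 2)
              bysort cno: gen year = 1989 + _n
              gen hoji = .
              * Country 1: three actual observations
              replace hoji = 2.0 if cno == 1 & year == 1998
              replace hoji = 2.4 if cno == 1 & year == 2000
              replace hoji = 3.0 if cno == 1 & year == 2003
              * Country 2: a single actual observation
              replace hoji = 1.5 if cno == 2 & year == 2000

              * Unrestricted fill, following the code in the first post
              bysort cno (year): ipolate hoji year, generate(hojie1) epolate
              bysort cno (year): egen hojie1plus = min(hojie1)
              gen hojie = hojie1
              replace hojie = hojie1plus if missing(hojie1)

              * First and last years with an actual observation, per country
              bysort cno (year): egen first_actual_data_year = min(cond(!missing(hoji), year, .))
              bysort cno (year): egen last_actual_data_year = max(cond(!missing(hoji), year, .))

              * Blank out anything more than 5 years outside that range
              gen hojie5y = hojie
              replace hojie5y = . if year > last_actual_data_year + 5
              replace hojie5y = . if year < first_actual_data_year - 5

              * Inspect which country-years survive the trimming
              list cno year hoji hojie5y if !missing(hojie5y), sepby(cno) noobs
              ```

              With these made-up values, country 1 should keep filled-in values for 1993 through 2008 and country 2 for 1995 through 2005. Because a one-observation country has the same first and last year, the two replace lines also cover the "five years before and after" case without a separate condition.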

              Louisa


              • #8
                Sorry, but I don't know anything worth repeating on time series forecasting, in Stata or elsewhere.
