
  • Non-typical periodicity in panel/time series

    When using tsset or xtset, several standard frequencies can be set easily (see -help xtset-):
    daily, weekly, monthly, quarterly, halfyearly and yearly.

    I have a dataset with yearly time periods, but observations are available only every 5 years, in years ending in 2 or 7 (e.g. 2002, 2007, 2012, 2017 and so on).
    How do I set the period?
    Code:
    xtset panelvar year, delta(5 years)
    Also, do I need to get rid of the rows without the observations, or is it irrelevant?
    Thank you for your help!

    Stata SE/17.0, Windows 10 Enterprise
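A minimal sketch of one way to do this, assuming the panel identifier is `panelvar` and the time variable is `year` as in the post, and using made-up toy data (two panels, 2002 to 2017) purely for illustration. The idea is to keep only the years in which observations exist and declare a 5-year spacing with `delta(5)`, so that the lag operator `L.` refers to the observation five years earlier. The variable names `x` and `x_lag` are hypothetical.

```stata
* Hypothetical toy data: 2 panels, one row per year 2002-2017,
* with x observed only in years ending in 2 or 7
clear
set obs 32
gen panelvar = ceil(_n/16)
bysort panelvar: gen year = 2001 + _n
gen x = runiform() if mod(year, 5) == 2

* Years ending in 2 or 7 are exactly those with mod(year, 5) == 2
keep if mod(year, 5) == 2

* Declare a 5-year period: L.x is now x five years earlier
xtset panelvar year, delta(5)
gen x_lag = L.x
```

Here the rows for the other years are dropped because, in this toy data, they carry nothing but missing values of x.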

  • #2
    Additionally, what if y is available for every year, but x only every 5 years? On the one hand, the yearly y values may be handy for looking at lagged effects on y; on the other hand, I am not sure how to set up such a model, i.e. how to xtset the data and specify the model. After all, if I xtset with a 1-year period, then 80% of the x values will be missing; but if I xtset with a 5-year period (which is what the OP is asking how to do), I am not sure how to refer to the y values in the intervening years, e.g. 2003, 2004, 2005 and 2006.
    Thank you for your help!
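A minimal sketch of one possible order of operations, again on hypothetical toy data with the variable names used in the thread (`panelvar`, `year`, `y`, `x`): compute any yearly lags of y while the data are still annual, then keep only the years in which x is observed and switch to a 5-year spacing. The new names `y_lag1` and `x_lag` are illustrative only, and whether a one-year lag of y belongs in the model at all is a substantive question this sketch does not settle.

```stata
* Hypothetical toy data: y observed every year 2002-2017,
* x observed only in years ending in 2 or 7
clear
set obs 32
gen panelvar = ceil(_n/16)
bysort panelvar: gen year = 2001 + _n
gen y = runiform()
gen x = runiform() if mod(year, 5) == 2

* Step 1: with annual spacing, L1.y is y one year earlier
xtset panelvar year
gen y_lag1 = L1.y

* Step 2: keep the 5-yearly rows and declare a 5-year period
keep if mod(year, 5) == 2
xtset panelvar year, delta(5)

* L.x is x five years earlier; y_lag1 still holds y one year earlier
gen x_lag = L.x
```

After the `keep`, the intervening years (2003 to 2006 and so on) are no longer rows in the data; their information survives only through variables such as `y_lag1` created beforehand.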




    • #3
      Typically, observations with missing values on some variables cannot be included in a model.

      That is the end of the story, unless you are willing to interpolate 4/5 of your data.

      Whether interpolation is a good idea for your project, or acceptable in your field, is really difficult to say, but I think you would have a fight on your hands explaining it in most presentations, papers or theses. It's almost predictable that results from yearly data with heavy use of interpolation would be similar to those from the 5-yearly data (how could they be very different?), but the sample size would be spuriously inflated, with serious consequences for CIs and P-values. Also, interpolated data can't help being smoother (in most circumstances) than the real unknown data, so goodness-of-fit statistics would usually be in doubt. But smoothness might be real for some variables, e.g. population sizes.

      Oddly enough, I have written various commands for interpolation in Stata, but that is consistent with advising against it here.
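To make the sample-size point concrete, here is a minimal sketch on hypothetical toy data (two panels, x observed only in years ending in 2 or 7 between 2002 and 2017) using the official `ipolate` command. It is shown only to illustrate how interpolation multiplies the apparent number of observations, not as a recommendation; `x_ipol` is an illustrative name.

```stata
* Hypothetical toy data: x observed only in years ending in 2 or 7
clear
set obs 32
gen panelvar = ceil(_n/16)
bysort panelvar: gen year = 2001 + _n
gen x = runiform() if mod(year, 5) == 2

* Linear interpolation of x on year, separately within each panel
bysort panelvar (year): ipolate x year, generate(x_ipol)

* Real observations of x versus apparent observations after interpolation
count if !missing(x)
count if !missing(x_ipol)
```

In this toy data the 8 real values of x become 32 apparently complete observations, four times as many, although no new information has been added.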
