  • ipolate many variables

    Hi,

    I'm a graduate student working on my master's thesis with a large dataset on taxation in Sub-Saharan states. Unfortunately, this means that I have a lot of missing data on many of my variables. I've tried to find general advice, without luck, so I hope some of you can help me.

    I am considering using mipolate/ipolate, but I am afraid that if I use it on too many variables I might end up with misleading data. Is there a general limit or rule of thumb for how many variables you can interpolate without introducing too much "noise" into the data?

    If I can use it on several variables, which of the following ways to interpolate seems most appropriate?
    1) ipolate nrtax_ex_sc year, gen(Total_tax_rev) by(nation)

    2) gen logt = log(nrtax_ex_sc)
    mipolate logt year, by(nation) gen(loglinear)
    replace loglinear = exp(loglinear)

    Hope someone can help!

    Best regards,
    Matilde


  • #2
    I know of no general rule of thumb. The more you interpolate, the more you gain spurious degrees of freedom: interpolated values enter later analyses as if they were observed, although they add no new information. Why do you want to interpolate anyway? I've written commands in this territory, but like anybody else I don't think interpolation is always the right answer. That depends on the question.
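    One practical safeguard against spurious degrees of freedom is to flag every interpolated value, so you can report how much of each variable is observed and how much is filled in. A minimal sketch on hypothetical toy data: `gdp` is an invented second variable, and with real data you would loop over your own variable list. It uses official ipolate with the by prefix.

```stata
* Hypothetical toy panel: two nations, three years, with gaps
clear
input str1 nation year nrtax_ex_sc gdp
"A" 2000 100 10
"A" 2001   .  .
"A" 2002 400 14
"B" 2000  50  .
"B" 2001   .  8
"B" 2002 200  9
end

* For each variable: interpolate within nation, then flag the filled-in values
foreach v of varlist nrtax_ex_sc gdp {
    bysort nation (year): ipolate `v' year, generate(`v'_ip)
    * 1 if the original value is missing but an interpolated value now exists
    generate byte `v'_imp = missing(`v') & !missing(`v'_ip)
    quietly count if `v'_imp
    display "`v': " r(N) " of " _N " values interpolated"
}
```

    Note that gdp for nation B in 2000 stays missing: ipolate only fills gaps between known values and does not extrapolate unless you ask for it with the epolate option.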

    The choice between 1) and 2) is between methods, not commands. You could rewrite 2) as

    Code:
    gen logt = log(nrtax_ex_sc)
    ipolate logt year, by(nation) gen(loglinear)
    replace loglinear = exp(loglinear)
    and then the choice is simpler and clearer: it is between linear and exponential change as the expected pattern between known values.
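    The difference between the two methods shows up even in a tiny example. A minimal sketch on hypothetical toy data, in which tax revenue rises from 100 to 400 over two years with the middle year missing; both methods use official ipolate.

```stata
* Hypothetical toy data: one nation, middle year missing
clear
input str1 nation year nrtax_ex_sc
"A" 2000 100
"A" 2001   .
"A" 2002 400
end

* Linear interpolation: constant absolute change per year
bysort nation (year): ipolate nrtax_ex_sc year, generate(linear)

* Log-linear interpolation: constant percentage change per year.
* log() returns missing for zero or negative values, so those become gaps too.
generate double logt = log(nrtax_ex_sc)
bysort nation (year): ipolate logt year, generate(loglinear)
replace loglinear = exp(loglinear)

list year nrtax_ex_sc linear loglinear, noobs
```

    For 2001 the linear method gives 250, the arithmetic mean of the neighbouring values, while the log-linear method gives about 200, their geometric mean. Which is more plausible depends on whether you expect revenue to change by similar amounts or by similar proportions from year to year.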

    Note that mipolate is community-contributed, from SSC; you are asked to explain where such commands come from when you post.
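    If mipolate is not yet installed, it can be fetched from SSC. A minimal sketch, assuming an internet connection; which then confirms that Stata can find the command.

```stata
* Install (or update) the community-contributed mipolate command from SSC
ssc install mipolate, replace
* Confirm that Stata can now find it
which mipolate
```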
