Hi,
I'm a graduate student working on my master's thesis with a large dataset on tax in Sub-Saharan states. Unfortunately, this means that I have a lot of missing data on many of my variables. I've tried to find general advice, but without luck, so I hope some of you can help me.
I am considering using mipolate/ipolate, but I am afraid that if I use it on too many variables I might get misleading data. Is there a general "limit"/rule of thumb for how many variables you can interpolate without introducing too much "noise" into your data?
If I can use it on several variables, which of the following ways to interpolate seems most appropriate?
1) ipolate nrtax_ex_sc year, gen(Total_tax_rev) by(nation)
2) gen logt = log(nrtax_ex_sc)
mipolate logt year, by(nation) gen(loglinear)
replace loglinear = exp(loglinear)
Hope someone can help!
Best regards,
Matilde