Dear all,
Stata beginner here. I’m working on a multiply imputed dataset (5 imputations).
I have household observations for three years (2010, 2014, 2017) and several countries. The panel is perfectly balanced.
I need to do the following steps:
- Create two new variables as the sum of other variables
- Compute the “log-differences” of the new variables. With 3 years, I will have results for 2014 (2014 minus 2010) and 2017 (2017 minus 2014)
- Compute the mean of the log-differences by year and by country
- For each observation, create a new variable as the log-difference minus its mean calculated in point 3
Point 1 - so far so good…
mi xtset hh_id year
// generate variables
mi passive: egen labour = rowtotal(di1100 di1200 di1500)
mi passive: egen financial = rowtotal(di1400 di1600)
Please note that I used rowtotal because some of the component variables have missing values.
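One thing worth flagging about rowtotal(): by default it treats missing components as 0, so a household whose components are all missing gets a total of 0, and log(0) then evaluates to missing in Point 2. If an all-missing row should stay missing from the start, egen's rowtotal() has a missing option. A minimal sketch, using the hypothetical names labour_m and financial_m so as not to clash with the variables above:

```stata
* Sketch only: labour_m and financial_m are hypothetical names.
* With the missing option, a row whose components are all missing
* gets a missing total instead of 0.
mi passive: egen labour_m = rowtotal(di1100 di1200 di1500), missing
mi passive: egen financial_m = rowtotal(di1400 di1600), missing
```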
Point 2 – the code below works, but I’m not able to use D.log()
// take the log of the variables
mi passive: gen lg_labour=log(labour)
mi passive: gen lg_financial=log(financial)
// compute the differences of the variables (log-differences)
mi passive: by hh_id (year): gen dlg_labour=lg_labour[_n] - lg_labour[_n-1]
mi passive: by hh_id (year): gen dlg_financial=lg_financial[_n] - lg_financial[_n-1]
As mentioned above, I would like to use D.log(), but mi does not allow gen newvar=log(D.oldvar). Even if I first take the logs of the variables and then run mi passive: gen dlg_labour=D.lg_labour or mi passive: by hh_id (year): gen dlg_labour=D.lg_labour, the resulting variable dlg_labour is missing for all observations.
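One possible cause worth checking: time-series operators work on the values of the time variable, not on the order of the observations. With years 2010, 2014 and 2017, D.lg_labour in 2014 looks for the 2013 value, which does not exist, so the result is missing everywhere. The sketch below builds a consecutive period index and uses it as the time variable. The name wave_idx and the new variable name dlg_labour_d are hypothetical, and whether mi passive then accepts D. may still depend on the mi style of the data, so this is a guess rather than a confirmed fix:

```stata
* Sketch only: wave_idx is a hypothetical consecutive period index.
* egen's group() maps 2010, 2014, 2017 to 1, 2, 3.
egen wave_idx = group(year)
mi register regular wave_idx
mi xtset hh_id wave_idx
* With consecutive periods, D. has a previous period to refer to.
* dlg_labour_d is a hypothetical name, to avoid clashing with dlg_labour.
mi passive: generate dlg_labour_d = D.lg_labour
```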
Point 3 - Here, I believe that the following code would do the job, but it is unacceptably inefficient (I used BE and DE as examples):
mi passive: egen mean_dlg_labour_BE2014 = mean(dlg_labour) if year==2014 & country=="BE"
mi passive: egen mean_dlg_labour_BE2017 = mean(dlg_labour) if year==2017 & country=="BE"
mi passive: gen mean_dlg_labour = max(mean_dlg_labour_BE2014, mean_dlg_labour_BE2017)
mi passive: egen mean_dlg_labour_DE2014 = mean(dlg_labour) if year==2014 & country=="DE"
mi passive: egen mean_dlg_labour_DE2017 = mean(dlg_labour) if year==2017 & country=="DE"
mi passive: replace mean_dlg_labour = max(mean_dlg_labour_DE2014, mean_dlg_labour_DE2017) if country=="DE"
.... and so on, for all countries, as well as for "financial" variable
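A shorter route for Point 3 may be egen's by() option, which computes the mean within each country-year cell in one call. This is a sketch meant to replace the country-by-country lines above rather than run after them, and it assumes by() behaves under mi passive as it does in ordinary Stata:

```stata
* Sketch only: one call per variable covers every country and year.
* Cells where dlg_* is missing for all households (here 2010)
* get a missing mean.
mi passive: egen mean_dlg_labour = mean(dlg_labour), by(country year)
mi passive: egen mean_dlg_financial = mean(dlg_financial), by(country year)
```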
Point 4 – would be simply
mi passive: gen i_dlg_labour = dlg_labour - mean_dlg_labour
mi passive: gen i_dlg_financial = dlg_financial - mean_dlg_financial
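Points 2 to 4 could in principle be wrapped in a single loop over the two variable stems. The following is an untested sketch, to be run instead of the separate lines for Points 2 to 4. It assumes labour and financial already exist from Point 1 and that the data are mi xtset on hh_id and year:

```stata
* Sketch only: loops over the two stems created in Point 1.
foreach v in labour financial {
    * Point 2: log, then first difference within household
    mi passive: generate lg_`v' = log(`v')
    mi passive: by hh_id (year): generate dlg_`v' = lg_`v' - lg_`v'[_n-1]
    * Point 3: mean log-difference within each country-year cell
    mi passive: egen mean_dlg_`v' = mean(dlg_`v'), by(country year)
    * Point 4: deviation from the country-year mean
    mi passive: generate i_dlg_`v' = dlg_`v' - mean_dlg_`v'
}
```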
Could you please help me create a foreach loop for point 3? Ideally, the solution would also cover points 2 and 4. (I’d rather spare you the torture of seeing my attempts.)
Could you also please explain to me why D.log is not working?
Thank you very much for your assistance.
Best,
Nicola