Multiple imputations: log-differences and foreach loops

Nicola Di Renzo

Join Date: Sep 2022

Posts: 8
#1

02 Dec 2022, 10:24

Dear all,

Stata beginner here. I’m working on a multiply imputed dataset (5 imputations).
I have household observations for three years (2010, 2014, 2017) and several countries. The panel is perfectly balanced.

I need to do the following steps:

1. Create two new variables as sums of other variables.
2. Compute the “log-differences” of the new variables. With 3 years, I will have results for 2014 (2014 minus 2010) and 2017 (2017 minus 2014).
3. Compute the mean of the log-differences by year and by country.
4. Create a new variable as the log-difference minus its mean calculated in point 3 (for each observation).

I wrote some code that should work but is highly inefficient.

Point 1 - so far so good…

mi xtset hh_id year

// generate variables
mi passive: egen labour = rowtotal(di1100 di1200 di1500)
mi passive: egen financial = rowtotal(di1400 di1600)

Please note that I used rowtotal because some of the component variables have missing values.

Point 2 – the code below works, but I’m not able to use D.log()

// take the log of the variables
mi passive: gen lg_labour=log(labour)
mi passive: gen lg_financial=log(financial)

// compute the differences of the variables (log-differences)
mi passive: by hh_id (year): gen dlg_labour=lg_labour[_n] - lg_labour[_n-1]
mi passive: by hh_id (year): gen dlg_financial=lg_financial[_n] - lg_financial[_n-1]

As mentioned above, I would like to use D.log(), but mi does not allow gen newvar=log(D.oldvar). Even if I first take the logs of the variables and then run mi passive: gen dlg_labour=D.lg_labour or mi passive: by hh_id (year): gen dlg_labour=D.lg_labour, the resulting variable dlg_labour is missing for all observations.

Point 3 - Here, I believe the following code would do the job, but it is unacceptably inefficient (I use BE and DE as examples):

mi passive: egen mean_dlg_labour_BE2014 = mean(dlg_labour) if year==2014 & country=="BE"
mi passive: egen mean_dlg_labour_BE2017 = mean(dlg_labour) if year==2017 & country=="BE"
mi passive: gen mean_dlg_labour = max(mean_dlg_labour_BE2014, mean_dlg_labour_BE2017)

mi passive: egen mean_dlg_labour_DE2014 = mean(dlg_labour) if year==2014 & country=="DE"
mi passive: egen mean_dlg_labour_DE2017 = mean(dlg_labour) if year==2017 & country=="DE"
mi passive: replace mean_dlg_labour = max(mean_dlg_labour_DE2014, mean_dlg_labour_DE2017) if country=="DE"

... and so on for all countries, and likewise for the "financial" variable.

Point 4 – would be simply
mi passive: gen i_dlg_labour = dlg_labour - mean_dlg_labour
mi passive: gen i_dlg_financial = dlg_financial - mean_dlg_financial

Could you please help me create a foreach loop for point 3? Ideally, the solution would also cover points 2 and 4. (I’d rather spare you the torture of seeing my attempts.)

Could you also please explain to me why D.log is not working?

Thank you very much for your assistance.

Best,
Nicola

Last edited by Nicola Di Renzo; 02 Dec 2022, 10:36.
Andrew Musau

Join Date: Oct 2014

Posts: 10187
#2

04 Dec 2022, 07:29

Could you please help me create a foreach loop for point 3?

mi passive: egen mean_dlg_labour_BE2014 = mean(dlg_labour) if year==2014 & country=="BE"
mi passive: egen mean_dlg_labour_BE2017 = mean(dlg_labour) if year==2017 & country=="BE"
mi passive: gen mean_dlg_labour = max(mean_dlg_labour_BE2014, mean_dlg_labour_BE2017)

mi passive: egen mean_dlg_labour_DE2014 = mean(dlg_labour) if year==2014 & country=="DE"
mi passive: egen mean_dlg_labour_DE2017 = mean(dlg_labour) if year==2017 & country=="DE"
mi passive: replace mean_dlg_labour = max(mean_dlg_labour_DE2014, mean_dlg_labour_DE2017) if country=="DE"

could be generalized to

Code:

g mean_dlg_labour=.
quietly levelsof country, local(countries)
foreach country of local countries{
    mi passive: egen mean_dlg_labour_`country'2014 = mean(dlg_labour) if year==2014 & country=="`country'"
    mi passive: egen mean_dlg_labour_`country'2017 = mean(dlg_labour) if year==2017 & country=="`country'"
    mi passive: replace mean_dlg_labour = max(mean_dlg_labour_`country'2014, mean_dlg_labour_`country'2017) if country=="`country'"
}

Could you also please explain to me why D.log is not working?

You cannot combine time-series operators and the log function in Stata at present. You need to do this in two steps. First generate the logged variables and then apply the difference operator.
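
A related point may explain the all-missing result: D. works in units of the time variable, not in observations. With years 2010, 2014 and 2017 and the default one-unit spacing, the "previous period" (2013 or 2016) is never observed, so the difference is missing everywhere. The toy example below is a minimal sketch on plain (non-mi) data; the data values and the names income, wave_idx, d_by_year and d_by_wave are made up for illustration. On the mi data the analogous declaration would presumably be mi xtset hh_id wave_idx, which has not been tested here.

```stata
* Toy data, invented for illustration only
clear
input hh_id year income
1 2010 100
1 2014 110
1 2017 121
2 2010 200
2 2014 220
2 2017 242
end

* Step 1: take logs first
generate lg_income = log(income)

* Declared on year: D. looks for year-1, which is never observed
xtset hh_id year
generate d_by_year = D.lg_income    // missing for every observation

* Declared on a consecutive wave index (2010 -> 1, 2014 -> 2, 2017 -> 3)
egen wave_idx = group(year)
xtset hh_id wave_idx

* Step 2: difference the logs; D. now means "previous wave"
generate d_by_wave = D.lg_income    // missing in wave 1, log-difference afterwards

list hh_id year d_by_year d_by_wave, sepby(hh_id)
```

The explicit subscript version from #1, by hh_id (year): gen ... = lg_labour[_n] - lg_labour[_n-1], sidesteps the issue because it differences adjacent observations regardless of how far apart the years are.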
Nicola Di Renzo

Join Date: Sep 2022

Posts: 8
#3

05 Dec 2022, 04:03

Thank you very much, Andrew! I really appreciate your support.
The code works. I only changed the first line, so that the variable is generated as a passive variable:

mi passive: gen mean_dlg_labour=.

Otherwise Stata gives me the error: "existing variable mean_dlg_labour not passive"

Andrew Musau

Join Date: Oct 2014
Posts: 10187

05 Dec 2022, 04:27

Try

Code:

quietly levelsof country, local(countries)
foreach country of local countries {
    * country-specific means of the log-difference for each year
    mi passive: egen mean_dlg_labour_`country'2014 = mean(dlg_labour) if year==2014 & country=="`country'"
    mi passive: egen mean_dlg_labour_`country'2017 = mean(dlg_labour) if year==2017 & country=="`country'"
    * create the combined variable on the first pass; capture suppresses the "already defined" error on later passes
    capture mi passive: gen mean_dlg_labour = max(mean_dlg_labour_`country'2014, mean_dlg_labour_`country'2017) if country=="`country'"
    * fill in the rows of the current country
    mi passive: replace mean_dlg_labour = max(mean_dlg_labour_`country'2014, mean_dlg_labour_`country'2017) if country=="`country'"
}
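
Since point 3 asks for the mean within each country-year cell, the by prefix may make the loop over countries unnecessary. The following is a minimal sketch rather than tested code: it assumes dlg_labour and dlg_financial already exist as passive variables, that mean_dlg_labour has not yet been created (so it would be run instead of the loop above, not after it), and that mi passive handles the sorting that by needs, as the by hh_id (year): call in #1 suggests. The loop variable v is just an illustrative name.

```stata
* Sketch only: covers points 3 and 4 for both variables
foreach v in labour financial {
    * Point 3: mean of the log-difference within each country-year cell
    mi passive: by country year: egen mean_dlg_`v' = mean(dlg_`v')

    * Point 4: deviation of each observation from its cell mean
    mi passive: generate i_dlg_`v' = dlg_`v' - mean_dlg_`v'
}
```

For 2010 the log-differences are missing, so both the cell mean and the deviation stay missing in that year, which matches the per-year variables built in the loop above. This version also avoids leaving one helper variable per country and year in the dataset.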
