  • Multiple imputations: log-differences and foreach loops

    Dear all,

    Stata beginner here. I’m working with a multiply imputed dataset (5 imputations).
    I have household observations for three years (2010, 2014, 2017) and several countries. The panel is perfectly balanced.


    I need to do the following steps:
    1. Create two new variables as sum of other variables
    2. Compute the “log-differences” of the new variables. Clearly, with 3 years, I will have the results for 2014 (2014 minus 2010) and 2017 (2017 minus 2014)
    3. Compute the mean of the log-differences by year and by country
    4. Create a new variable as the log-difference minus its mean calculated in point 3 (for each observation)
    I wrote some code that should work but is highly inefficient.

    Point 1 - so far so good…

    mi xtset hh_id year

    // generate variables
    mi passive: egen labour = rowtotal(di1100 di1200 di1500)
    mi passive: egen financial = rowtotal(di1400 di1600)

    Please note that I used rowtotal due to missing observations


    Point 2 – the code below works, but I’m not able to use D.log()

    // take the log of the variables
    mi passive: gen lg_labour=log(labour)
    mi passive: gen lg_financial=log(financial)

    // compute the differences of the variables (log-differences)
    mi passive: by hh_id (year): gen dlg_labour=lg_labour[_n] - lg_labour[_n-1]
    mi passive: by hh_id (year): gen dlg_financial=lg_financial[_n] - lg_financial[_n-1]

    As mentioned above, I would like to use D.log(), but mi does not allow gen newvar=log(D.oldvar). Even if I take the logs of the variables first and then run mi passive: gen dlg_labour=D.lg_labour or mi passive: by hh_id (year): gen dlg_labour=D.lg_labour, the resulting variable dlg_labour is missing for all observations.


    Point 3 - Here, I believe that the following code would do the job, but is unacceptably inefficient (I used BE and DE as an example):

    mi passive: egen mean_dlg_labour_BE2014 = mean(dlg_labour) if year==2014 & country=="BE"
    mi passive: egen mean_dlg_labour_BE2017 = mean(dlg_labour) if year==2017 & country=="BE"
    mi passive: gen mean_dlg_labour = max(mean_dlg_labour_BE2014, mean_dlg_labour_BE2017)

    mi passive: egen mean_dlg_labour_DE2014 = mean(dlg_labour) if year==2014 & country=="DE"
    mi passive: egen mean_dlg_labour_DE2017 = mean(dlg_labour) if year==2017 & country=="DE"
    mi passive: replace mean_dlg_labour = max(mean_dlg_labour_DE2014, mean_dlg_labour_DE2017) if country=="DE"

    ... and so on for all countries, as well as for the "financial" variable.


    Point 4 – would be simply
    mi passive: gen i_dlg_labour = dlg_labour - mean_dlg_labour
    mi passive: gen i_dlg_financial = dlg_financial - mean_dlg_financial


    Could you please help me create a foreach loop for point 3? The best solution would probably also cover points 2 and 4. (I’d rather spare you the torture of seeing my attempts.)

    Could you also please explain to me why D.log is not working?


    Thank you very much for your assistance.

    Best,
    Nicola
    Last edited by Nicola Di Renzo; 02 Dec 2022, 10:36.

  • #2
    Could you please help me in creating a foreach loop for point 3?
    mi passive: egen mean_dlg_labour_BE2014 = mean(dlg_labour) if year==2014 & country=="BE"
    mi passive: egen mean_dlg_labour_BE2017 = mean(dlg_labour) if year==2017 & country=="BE"
    mi passive: gen mean_dlg_labour = max(mean_dlg_labour_BE2014, mean_dlg_labour_BE2017)

    mi passive: egen mean_dlg_labour_DE2014 = mean(dlg_labour) if year==2014 & country=="DE"
    mi passive: egen mean_dlg_labour_DE2017 = mean(dlg_labour) if year==2017 & country=="DE"
    mi passive: replace mean_dlg_labour = max(mean_dlg_labour_DE2014, mean_dlg_labour_DE2017) if country=="DE"
    could be generalized to

    Code:
    g mean_dlg_labour=.
    quietly levelsof country, local(countries)
    foreach country of local countries{
        mi passive: egen mean_dlg_labour_`country'2014 = mean(dlg_labour) if year==2014 & country=="`country'"
        mi passive: egen mean_dlg_labour_`country'2017 = mean(dlg_labour) if year==2017 & country=="`country'"
        mi passive: replace mean_dlg_labour = max(mean_dlg_labour_`country'2014, mean_dlg_labour_`country'2017) if country=="`country'"
    }
    Could you also please explain to me why D.log is not working?
    Time-series operators such as D. apply to variables, not to expressions, so they cannot be combined with the log() function in a single step (D.log(x) is not valid syntax). You need to do this in two steps: first generate the logged variables, then apply the difference operator to them.
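
    A possible reason why the two-step version still returned only missing values, offered as a guess because the thread does not confirm it: after mi xtset hh_id year the time unit is one year, so D.x is x in year t minus x in year t-1, not x minus the previous observation. With waves in 2010, 2014 and 2017 there is never an observation for year t-1. The sketch below uses made-up toy data (plain data, not mi data; income and wave_idx are hypothetical names) to show the effect, and that a consecutive wave index makes D. agree with the [_n-1] approach.

    ```stata
    clear
    input hh_id year income
    1 2010 100
    1 2014 120
    1 2017 150
    2 2010 200
    2 2014 210
    2 2017 190
    end

    generate double lg_income = log(income)

    * Declared on year: D. subtracts the value at year-1, which never exists here,
    * so every value of d_year is missing
    xtset hh_id year
    generate double d_year = D.lg_income
    count if missing(d_year)

    * Declared on a consecutive index (1, 2, 3): D. now subtracts the previous wave
    egen wave_idx = group(year)
    xtset hh_id wave_idx
    generate double d_wave = D.lg_income

    * Explicit subscripting, as in the original post
    bysort hh_id (year): generate double d_sub = lg_income - lg_income[_n-1]

    list hh_id year d_year d_wave d_sub, sepby(hh_id)
    ```

    Under mi, the analogous step would presumably be mi xtset hh_id wave_idx before mi passive: generate with D.; that combination is untested here, and the explicit [_n-1] code from the original post avoids the question altogether.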



    • #3
      Thank you very much, Andrew! I really appreciate your support.
      The code works, except that for the first line I generate a passive variable instead:

      mi passive: gen mean_dlg_labour=.

      Otherwise Stata gives me the error: "existing variable mean_dlg_labour not passive"



      • #4
        Try

        Code:
        quietly levelsof country, local(countries)
        foreach country of local countries{
            mi passive: egen mean_dlg_labour_`country'2014 = mean(dlg_labour) if year==2014 & country=="`country'"
            mi passive: egen mean_dlg_labour_`country'2017 = mean(dlg_labour) if year==2017 & country=="`country'"
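            * capture goes in front of the whole command so that the "already defined" error is suppressed from the second country onward
            capture mi passive: gen mean_dlg_labour = max(mean_dlg_labour_`country'2014, mean_dlg_labour_`country'2017) if country=="`country'"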
            mi passive: replace mean_dlg_labour = max(mean_dlg_labour_`country'2014, mean_dlg_labour_`country'2017) if country=="`country'"
        }
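
        An alternative sketch, not taken from the replies above: egen's by() option computes the mean within each country-year cell in one pass, so no per-country variables are needed, and a foreach over the two stems covers points 3 and 4 for both labour and financial. The toy data below are made up and are plain data, not mi data.

        ```stata
        clear
        input hh_id str2 country year dlg_labour dlg_financial
        1 "BE" 2014  0.10  0.02
        2 "BE" 2014  0.30  0.04
        1 "BE" 2017 -0.20  0.06
        2 "BE" 2017  0.00  0.00
        3 "DE" 2014  0.50 -0.10
        4 "DE" 2014  0.10  0.30
        3 "DE" 2017  0.20  0.20
        4 "DE" 2017  0.40  0.00
        end

        foreach v in labour financial {
            * point 3: mean of the log-difference within each country-year cell
            egen double mean_dlg_`v' = mean(dlg_`v'), by(country year)
            * point 4: deviation of each observation from its cell mean
            generate double i_dlg_`v' = dlg_`v' - mean_dlg_`v'
        }

        list, sepby(country year)
        ```

        With the mi data from this thread, prefixing the egen and generate commands with mi passive: should give the per-imputation equivalent, on the assumption that mi passive accepts egen's by() option; that has not been verified here.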
