
  • Regress data with multiple imputation using micombine

    Dear Stata community,

    We are working on a regression model using SCF data (https://www.federalreserve.gov/econres/scfindex.htm). We are struggling to run the regression because the SCF data are multiply imputed (five implicates). This is what the SCF codebook recommends:
    For Stata users, the easiest way to correct your coefficients and standard errors for various estimation models is to use the Stata micombine ado file. This ado file was created by Patrick Royston and can be downloaded at http://ideas.repec.org/c/boc/bocode/s446602.html. When using micombine with the SCF data, users need to create a variable that denotes the implicate of the data. The implicate variable is used in the impid() option and the Y1 case id variable is used in the obsid() option in micombine. An example is below.

    micombine regress 'insert model here', obsid(y1) impid(imp) detail

    Below you will find our do-file. We don't know how to apply the sample code to it. What would the implicate variable and the Y1 case ID variable be in our case?

    We appreciate all help! Many thanks in advance,

    Marleen & Joleen


    --------------
    do-file
    --------------

    clear
    clear matrix
    set more off
    cap log close

    cd "/Users/Marleen/Documents/research project/data"

    * 1. Raise the maximum number of variables from 5,000 to 10,000:
    set maxvar 10000

    * 2. Add the variable year to each data set:
    foreach y in 1989 1992 1995 1998 2001 2004 2007 2010 2013 2016 2019 {
        display "`y'"
        use "rscfp`y'.dta", clear
        generate int year = `y'
        save "rscfp`y'.dta", replace
    }

    * 3. Append the individual data sets into one master data set
    *    (/// continues the command across lines):
    use "./rscfp1989.dta", clear
    append using "rscfp1992.dta" "rscfp1995.dta" "rscfp1998.dta" "rscfp2001.dta" ///
        "rscfp2004.dta" "rscfp2007.dta" "rscfp2010.dta" "rscfp2013.dta" ///
        "rscfp2016.dta" "rscfp2019.dta"

    * 4. Combine the variable xx1 (1989) with yy1 (1992 onwards); this has to run
    *    after the append, when both variables are in memory:
    replace yy1 = xx1 if missing(yy1)
    save "combined", replace

    /* 5. Select the required variables plus control variables
    Year, observation, sex, married, age, business assets, trust assets
    Control variables: education level, ethnic group, employment status
    */

    keep year yy1 hhsex married age bus trusts asset edcl racecl4 occat1

    * 6. Reorder the variables for readability
    order year yy1
    order edcl occat1 racecl4, last

    * 7. Install the ICE package:
    ssc install ice

    * 8. Run the regression
    * 8.1 Regression for business assets:

    micombine regress bus year, obsid(y1) impid(imp) detail

  • #2
    impid(varname) specifies that varname is the variable identifying the imputations. The number of imputations is determined as the number of unique values of varname. All observations for which varname takes the value zero are ignored in the analysis. Default varname: _mj.

    obsid(varname) is provided to allow micombine to analyse datasets created by programs other than ice. varname specifies the name of a variable holding the "observation ID", i.e. the sequence number of each observation in a given imputation. The number of observations should be identical between imputations, as should the order of the observations. varname should run 1,...,N for imputation 1, 1,...,N for imputation 2, and so on. ice automatically stores the information with the data, so this option is not required. Default varname: _mi.

    Can anyone please help us clarify what the implicate variable and the Y1 case ID variable are?



    • #3
      micombine is very old now (the last update, prior to the release of its replacement -mim-, was in vol. 7 of the SJ), so you should not be using it, and SCF (whatever that is) should update its instructions. However, here is my guess: impid is just the variable that counts the imputed data sets; the default, IIRC, is _mj. obsid is basically a counter within imputed data sets; the default name in -ice- is _mi. I don't know whether SCF uses these defaults but, if not, they should at least tell you what variable names replace them.
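
      To make that guess concrete, here is a minimal sketch of how the two variables might be built. It is untested and rests on assumptions not confirmed in this thread: that the appended file still contains both y1 and yy1 (note that the keep line in the do-file above drops y1), that y1 equals 10*yy1 plus the implicate number as the SCF codebook appears to describe, and that micombine is installed. The names imp and obs are made up for illustration.

      ```stata
      * Sketch only. Assumes y1 = 10*yy1 + implicate number (1 to 5).
      use "combined", clear

      * Work on one survey year so every implicate has the same N.
      keep if year == 2019

      * Implicate number: the last digit of y1 under the assumption above.
      generate byte imp = y1 - 10*yy1
      assert inrange(imp, 1, 5)

      * Observation counter running 1,...,N within each implicate,
      * in the same household order for every implicate.
      bysort imp (yy1): generate long obs = _n

      * hhsex and age are placeholder covariates taken from the keep list.
      micombine regress bus hhsex age, obsid(obs) impid(imp) detail
      ```

      The SCF instructions quoted above pass y1 itself to obsid(); the counter obs is used here only because the help text says the observation ID should run 1,...,N within each imputation. If the assert fails, the assumption about how y1 is coded does not hold for that file.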
