  • Correct sequence of commands to 1) transform a data set into long form and 2) multiply impute item-level missing values for logistic regression?

    I have a data set in which each row is one observation of a participant. I would like to 1) expand it into a long-form panel data set in which each participant has 48 participant-month rows and 2) multiply impute item-level missing values and run a logistic regression on the imputed data.
    I know the code for expanding the data set and the code for multiply imputing the data, but I cannot get the two to run together, and I am wondering what the correct sequence is. So far I have tried the following.

    When I first create the panel data set and then run the multiple-imputation (mvn) step on it, I get the following error:





    *restructuring data to long format:
    *duplicate n times according to value of variable months_to_emigrating
    expand months_to_emigrating

    *generate a variable indicating time-varying months
    sort participantid
    qui by participantid: g month_=_n

    ***generate a time-varying variable indicating event occurrence
    drop emigrated2
    g emigrated2=0
    replace emigrated2=1 if months_to_emigrating==month_ & emigrated==1


    mi estimate: logistic emigrated2 politics, vce(robust)

    (system variable _mi_id updated because of changed number of obs)

    an error occurred when mi estimate executed logistic on m=1





    Alternatively, when I run mi impute mvn first and create the panel data set afterwards, I get the following:

    mi impute mvn geotravex_nonasia age numsiblings female married numchild numfamilyhk numfamilyoutside otherpaytrip educyear selectiveuni rentpublichousing rentprivatehousing ownpublichousing ownprivatehousing numparentshkproperty numparentforeignkproperty numhkproperty ocpr publicwork lsal satsal paroldestage pargeotravexoutsideasia friendsupport familysupport jobyears socialscience profession stem business healthrating extralanguage taxesmore spendvalues bno foreignpassport years_in_hk generation_hk hk_emigrants_known hk_nationality, add(40)

    (system variable _mi_id updated because of changed number of obs)

    note: age omitted because of collinearity.

    note: numsiblings omitted because of collinearity.

    (and so on for the other variables)


    I asked ChatGPT (GPT-3), and it recommended that I instead run chained-equations imputation (MICE) in the following way:

    use "panel_data.dta", clear

    * Set up multiple imputation
    mi set mlong
    mi register imputed y x1 x2 x3
    mi impute chained (10) y x1 x2 x3 || person_id, add(10) seed(123)

    * Save imputed data
    mi xeq 1: save "imputed_panel_data.dta", replace





    I got my hopes up and tried this on the panel data set but Stata says:

    mi impute chained (10) geotravex_nonasia age numsiblings female married numchild numfamilyhk numfamilyoutside otherpaytrip educyear selectiveuni rentpublichousing rentprivatehousing ownpublichousing ownprivatehousing numparentshkproperty numparentforeignkproperty numhkproperty ocpr publicwork lsal satsal paroldestage pargeotravexoutsideasia friendsupport familysupport jobyears socialscience profession stem business healthrating extralanguage taxesmore spendvalues bno foreignpassport years_in_hk generation_hk hk_emigrants_known hk_nationality || participantid, add(10)



    | invalid name

    -- above applies to specification (10 ) geotravex_nonasia age numsiblings female married numchild numfamilyhk

    numfamilyoutside otherpaytrip educyear selectiveuni rentpublichousing rentprivatehousing ownpublichousing

    ownprivatehousing numparentshkproperty numparentforeignkproperty numhkproperty ocpr publicwork lsal satsal

    paroldestage pargeotravexoutsideasia friendsupport familysupport jobyears socialscience profession stem business

    healthrating extralanguage taxesmore spendvalues bno foreignpassport years_in_hk generation_hk hk_emigrants_known

    hk_nationality || participantid



    mi impute chained: invalid specification;

    see above error messages
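
    For reference, as far as I can tell from the mi impute chained documentation, each group of variables takes an imputation method in parentheses (for example regress or logit) rather than a number, there is no || grouping syntax, and the seed option is rseed(). A minimal sketch of that form, meant for data with one row per participant and using a shortened variable list; the choice of methods and the use of politics as a fully observed predictor are my assumptions:

    ```stata
    * Sketch only, not run on my data.
    * Assumes the data are already mi set (otherwise run: mi set mlong)
    * and still hold one row per participant.
    mi register imputed age numsiblings female married

    * (regress) for continuous items, (logit) for 0/1 items: assumed methods.
    * politics is assumed complete and is used only as a predictor.
    mi impute chained (regress) age numsiblings (logit) female married = politics, add(10) rseed(123)
    ```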


    Any suggestions on the correct sequence of commands for these two operations?
    If you do not see a problem with any of the code above, please let me know which sequence of operations is correct, and I will keep trying on a simpler data set in case the issue lies with the data.
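
    One ordering I am considering is to impute while the data still have one row per participant and only then expand, using the mi-aware commands so that the imputations stay consistent with the original rows. A rough sketch, assuming the data are already mi set and imputed; emig_event is a hypothetical name used in place of emigrated2, and clustering on participantid is my assumption:

    ```stata
    * Sketch only, not verified on these data.
    * 1) Expand each participant into participant-month rows (mi version of expand).
    mi expand months_to_emigrating

    * 2) Build the month counter and the event indicator in every imputation.
    mi passive: by participantid: generate month_ = _n
    mi passive: generate emig_event = (months_to_emigrating == month_) & (emigrated == 1)

    * 3) Fit the model on the imputed long data; standard errors are clustered
    *    because each participant now contributes several rows.
    mi estimate: logistic emig_event politics, vce(cluster participantid)
    ```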


