I have a data set in which each row is one participant's observation. I would like to (1) expand it into a long-form panel data set where each participant has up to 48 participant-month rows, and (2) multiply impute item-level missing values and run a logistic regression on the imputed data.
I know the code for expanding the data set and the code for multiply imputing the data, but I cannot get the two to run together, and I am wondering what the correct sequence is. So far I have tried the following.
When I first create the panel data set and then run multiple imputation (mvn) on it, I get the following error:
*restructuring data to long format:
*duplicate n times according to value of variable months_to_emigrating
expand months_to_emigrating
*generate a variable indicating time-varying months
sort participantid
qui by participantid: g month_=_n
***generate a time-varying variable indicating event occurrence
drop emigrated2
g emigrated2=0
replace emigrated2=1 if months_to_emigrating==month_ & emigrated==1
mi estimate: logistic emigrated2 politics, vce(robust)
(system variable _mi_id updated because of changed number of obs)
an error occurred when mi estimate executed logistic on m=1
Alternatively, when I run multiple imputation (mvn) first and create the panel data set afterwards, I get the following:
mi impute mvn geotravex_nonasia age numsiblings female married numchild numfamilyhk numfamilyoutside otherpaytrip educyea
> r selectiveuni rentpublichousing rentprivatehousing ownpublichousing ownprivatehousing numparentshkproperty numparentfore
> ignkproperty numhkproperty ocpr publicwork lsal satsal paroldestage pargeotravexoutsideasia friendsupport familysupport j
> obyears socialscience profession stem business healthrating extralanguage taxesmore spendvalues bno foreignpassport years
> _in_hk generation_hk hk_emigrants_known hk_nationality, add(40)
(system variable _mi_id updated because of changed number of obs)
note: age omitted because of collinearity.
note: numsiblings omitted because of collinearity.
(and so on for the other variables)
I asked ChatGPT (GPT-3), and it recommended that I instead run chained-equations imputation (MICE) in the following way:
use "panel_data.dta", clear
* Set up multiple imputation
mi set mlong
mi register imputed y x1 x2 x3
mi impute chained (10) y x1 x2 x3 || person_id, add(10) seed(123)
* Save imputed data
mi xeq 1: save "imputed_panel_data.dta", replace
I got my hopes up and tried this on the panel data set but Stata says:
mi impute chained (10) geotravex_nonasia age numsiblings female married numchild numfamilyhk numfamilyoutside otherpaytrip educyear selectiveuni rentpublichousing rentprivatehousing ownpublichousing ownprivatehousing numparentshkproperty numparentforeignkproperty numhkproperty ocpr publicwork lsal satsal paroldestage pargeotravexoutsideasia friendsupport familysupport jobyears socialscience profession stem business healthrating extralanguage taxesmore spendvalues bno foreignpassport years_in_hk generation_hk hk_emigrants_known hk_nationality || participantid, add(10)
| invalid name
-- above applies to specification (10 ) geotravex_nonasia age numsiblings female married numchild numfamilyhk
numfamilyoutside otherpaytrip educyear selectiveuni rentpublichousing rentprivatehousing ownpublichousing
ownprivatehousing numparentshkproperty numparentforeignkproperty numhkproperty ocpr publicwork lsal satsal
paroldestage pargeotravexoutsideasia friendsupport familysupport jobyears socialscience profession stem business
healthrating extralanguage taxesmore spendvalues bno foreignpassport years_in_hk generation_hk hk_emigrants_known
hk_nationality || participantid
mi impute chained: invalid specification;
see above error messages
Any suggestions on what the correct sequence of commands is to proceed with these two operations?
If you do not see any problem with the code above, please let me know which sequence of operations is correct, and I will keep trying to run it on a simpler data set in case the issue is with the data.
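For comparison with the two orderings above, here is a minimal sketch of one sequence that may avoid the "_mi_id updated" messages: impute while the data are still one row per participant, then expand with mi expand instead of plain expand, so that Stata's mi bookkeeping stays consistent. The shortened variable list is purely illustrative. The sketch assumes that emigrated and months_to_emigrating have no missing values, and the pmm method with knn(10) and the clustered standard errors are assumptions, not something the original code used. Note that mi impute chained expects an imputation method in the parentheses (such as pmm or regress) and takes rseed() rather than seed().

```stata
* Hypothetical sketch: assumes one row per participant and data not yet mi set.
* The variable list is shortened for illustration; it is not the full model.
mi set wide
mi register imputed age numsiblings educyear lsal

* Impute before expanding; the method and knn(10) are assumptions.
mi impute chained (pmm, knn(10)) age numsiblings educyear lsal ///
    = emigrated months_to_emigrating, add(10) rseed(123)

* Expand with mi expand (not plain expand) so the mi system variables stay valid.
mi expand months_to_emigrating

* Build the person-month index and the event indicator as before.
bysort participantid: generate month_ = _n
generate emigrated2 = (months_to_emigrating == month_ & emigrated == 1)

* Cluster on participant because each person now contributes several rows (an assumption).
mi estimate: logistic emigrated2 politics, vce(cluster participantid)
```

Imputing before expansion also means the participant-level covariates are imputed once per person rather than once per person-month, which may be why the collinearity notes appear when mvn is run on already-expanded rows.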