Dear Statalist community,
I have an unbalanced panel dataset at the month-day-hour level that spans around 10 years. Among my 20 independent variables, some are missing for an entire year (e.g., in 2011, an independent variable is missing for every month-day-hour), and some are missing for only a couple of month-day-hours within a year (e.g., in 2011, an independent variable is missing only on Jan 1st, 10am).
I want to run a separate regression for each year, using only the independent variables that are not missing for the entire year. That is, I will exclude an independent variable if it is missing for the entire year. However, if it is missing for only a couple of month-day-hours within the year, I will replace the missing values with 0 and include it.
My questions are:
1. How can I find, for each year, the variables that are not missing for the entire year?
2. How can I save these variables to a list so that the regression uses only these variables?
What I have in mind:
Thank you very much for your help!
Code:
levelsof year, local(myyear)
* loop through each time period and run one regression
foreach i in `myyear' {
    * select variables that are not missing for the entire year `i'
    * I DO NOT KNOW WHAT TO DO HERE
    * for the variables that have only some missing values, replace missing with 0
    foreach var of local varlist {
        replace `var' = 0 if missing(`var') & year == `i'
    }
    * run regression
    reg dep independents if year == `i'
}
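One possible way to fill in the missing step is sketched below. It is only a sketch under assumptions: the names dep and x1-x20 are hypothetical placeholders for the dependent variable and the 20 independent variables, and year is assumed to be a numeric variable. The idea is to count the non-missing observations of each variable within the year and to collect the variables with at least one such observation in a local macro.

```stata
* Hypothetical names: dep is the outcome, x1-x20 are the candidate regressors
local candidates x1-x20

levelsof year, local(myyear)
foreach i of local myyear {
    * start each year with an empty list of usable regressors
    local keep
    foreach var of varlist `candidates' {
        * count non-missing observations of this variable in year `i'
        quietly count if !missing(`var') & year == `i'
        * keep the variable only if it is observed at least once in the year
        if r(N) > 0 {
            local keep `keep' `var'
        }
    }
    * for the kept variables, replace the scattered missing values with 0
    foreach var of local keep {
        quietly replace `var' = 0 if missing(`var') & year == `i'
    }
    * run the regression only if at least one regressor is usable
    if "`keep'" != "" {
        regress dep `keep' if year == `i'
    }
}
```

Note that replace changes the data in memory, so it may be safer to wrap the loop in preserve and restore, or to work on a copy of the dataset. Variables that are missing for the whole year are never added to the local macro keep, so their values are left untouched.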