I'm having an issue with a foreach loop, which I suspect is a syntax problem. I'm new to Stata and can't find the answer in the existing documentation.
My dataset has 10 continuous variables, organized by the categorical variables state and year (e.g., AK 2017 v1 v2 v3 v4...; AK 2018 v1 v2 v3 v4...). I would like to find the coefficient of variation (cv) by state, across years, for several of the continuous variables (e.g., the cv of v1 across years for AK). The goal is a data quality check for each variable, as we suspect data quality issues for some years in some states. I attempted to run a foreach loop, but I keep getting the error "varlist required", even though I define the varlist.
do file:
Code:
set trace on
foreach var of varlist indx_per1k v2 v3 v4 v5 {
    by:state_cd: egen m_`var' = mean(`var')
    by:state_cd egen s_`var' = sd(`var')
    gen cv_`var' = s_`var' / m_`var'
}
set trace off
The result:
Code:
. do "C:\ xxxx.tmp"
. set trace on
. foreach var of varlist indx_per1k v2 v3 v4 {
2. by:state_cd: egen m_`var' = mean(`var')
3. by:state_cd egen s_`var' = sd(`var')
4. gen cv_`var' = s_`var' / m_`var'
5. }
- foreach var of varlist indx_per1k v2 v3 v4 {
- by:state_cd: egen m_`var' = mean(`var')
= by:state_cd: egen m_indx_per1k = mean(indx_per1k)
varlist required
by:state_cd egen s_`var' = sd(`var')
gen cv_`var' = s_`var' / m_`var'
}
r(100);
end of do-file
The data is already sorted by state_cd, and the same series of commands worked fine when I ran them one variable at a time without a loop. Am I not supposed to use the "by" prefix within a foreach loop? If not, how do I get the cv of values by state across years, rather than the cv of all observations of v1 in the dataset?
(Ideally, I'll then look at the cv to see which states have the highest cv for each variable and track down state-years with implausible data.)
Thanks in advance for the help!
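For reference, here is a minimal sketch of what the loop appears to be aiming for. It assumes the stray colon directly after "by" is what triggers "varlist required": with "by:state_cd:", Stata finds no variable list between "by" and the first colon. Note also that the second line in the loop above has no colon after state_cd. The toy dataset is made up purely for illustration; only the variable names state_cd, indx_per1k, and v2 come from the post.

```stata
* Toy data standing in for the real file (values are hypothetical)
clear
input str2 state_cd year indx_per1k v2
"AK" 2017 10 1
"AK" 2018 14 3
"AL" 2017 20 5
"AL" 2018 20 5
end

* by requires the data to be sorted on the grouping variable
sort state_cd

foreach var of varlist indx_per1k v2 {
    * the by-varlist follows "by" directly; the only colon comes after it
    by state_cd: egen m_`var' = mean(`var')
    by state_cd: egen s_`var' = sd(`var')
    * coefficient of variation within each state, across years
    gen cv_`var' = s_`var' / m_`var'
}

list state_cd year indx_per1k m_indx_per1k s_indx_per1k cv_indx_per1k
```

Using "bysort state_cd:" in place of "by state_cd:" would sort and group in one step. Nothing here suggests that the by prefix is disallowed inside foreach; the sketch only changes the placement of the colons.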