
  • "varlist required" error for foreach loop (even though varlist is defined)

    I'm having an issue with a foreach loop, which I suspect is syntactical. I'm new to Stata, but I can't find the answer to this in the existing documentation.

    My dataset has 10 continuous variables, recorded for each combination of the categorical variables state and year (e.g., AK 2017 v1 v2 v3 v4...; AK 2018 v1 v2 v3 v4...). I would like to find the coefficient of variation (cv) within each state, across years, for several of the continuous variables (e.g., the cv of v1 across years for AK). The goal is a data quality check for each variable, as we suspect data quality issues for some years in some states. I tried to run a foreach loop, but I keep getting the error "varlist required," even though I define the varlist.


    do file:
    Code:
    set trace on
    foreach var of varlist indx_per1k v2 v3 v4 v5 {
        by:state_cd: egen m_`var' = mean(`var')
        by:state_cd egen s_`var' = sd(`var')
        gen cv_`var' = s_`var' / m_`var'
    }
    set trace off

    The result:
    Code:
    . do "C:\ xxxx.tmp"
    . set trace on
    
    . foreach var of varlist indx_per1k v2 v3 v4 {
      2.         by:state_cd: egen m_`var' = mean(`var')
      3.         by:state_cd egen s_`var' = sd(`var')
      4.         gen cv_`var' = s_`var' / m_`var'
      5. }
    - foreach var of varlist indx_per1k v2 v3 v4 {
    - by:state_cd: egen m_`var' = mean(`var')
    = by:state_cd: egen m_indx_per1k = mean(indx_per1k)
    varlist required
      by:state_cd egen s_`var' = sd(`var')
      gen cv_`var' = s_`var' / m_`var'
      }
    r(100);
    
    end of do-file

    The data is already sorted by state_cd, and the series of commands worked fine when I was going variable by variable without a loop. Am I not supposed to use the "by" command within a foreach loop? If not, how do I get the cv of values by state across years, rather than the cv of all observations of v1 in the dataset?

    (Ideally, I'll then look at the cv to see which states have the highest cv for each variable and track down state-years with implausible data.)

    Thanks in advance for the help!
    Last edited by Ky Neal; 29 Aug 2024, 11:11.

  • #2
    Remove the colon after your by keyword. The loop itself is fine.

    Compare your lines carefully against the syntax diagram for the by prefix: help by
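
For readers who want to see the prefix syntax in isolation, here is a minimal sketch. It uses the auto dataset that ships with Stata rather than the poster's data, so the variable names (foreign, price) are stand-ins. The "varlist required" message presumably arises because, in by:state_cd:, nothing precedes the first colon, so by finds no variable list.

```stata
* Minimal sketch on Stata's bundled auto data (not the poster's dataset)
sysuse auto, clear
sort foreign

* The by-variable list comes first, followed by exactly one colon
by foreign: egen m_price = mean(price)

* bysort sorts the data and applies the by prefix in one step
bysort foreign: egen s_price = sd(price)
```

Using bysort avoids relying on the data already being sorted, which matters if the sort order changes earlier in a do-file.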



    • #3
      As Leonardo Guizzetti signals, the colons are misplaced (or absent).

      Code:
      by state_cd: egen m_`var' = mean(`var')
      by state_cd: egen s_`var' = sd(`var')
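
Put back into the original loop, the corrected lines might look like the sketch below. The input block is made-up toy data standing in for the real dataset, with only two of the variables, and the tag variable used for listing is a hypothetical helper that was not part of the original question.

```stata
* Toy data standing in for the poster's dataset; the values are invented
clear
input str2 state_cd year indx_per1k v2
"AK" 2017 10 1
"AK" 2018 12 3
"AL" 2017 20 5
"AL" 2018 20 7
end

foreach var of varlist indx_per1k v2 {
    * bysort sorts by state_cd, then egen computes within each state
    bysort state_cd: egen m_`var' = mean(`var')
    bysort state_cd: egen s_`var' = sd(`var')
    * Coefficient of variation: within-state sd divided by within-state mean
    gen cv_`var' = s_`var' / m_`var'
}

* Flag one observation per state so each cv is listed once (hypothetical helper)
egen byte tag = tag(state_cd)
list state_cd cv_* if tag
```

Each cv is constant within a state, so listing one tagged row per state is one way to compare states and spot those with unusually high variation.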



      • #4
        Ah, totally missed that. Thank you both!
