  • Looping across sets of variables to generate new variable

    Hi all,

    This feels like a relatively simple question, so apologies if I have missed a similar answer elsewhere.

    I have a time series dataset consisting of weekly counts of cases and population for different units of analysis (I'm using towns in the example below). For each unit (town), I want to create a rate variable defined as cases/population. Here's an example of the dataset, cut down to 4 towns and 4 time points for simplicity.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int date float weekyear byte(aville_cases aville_popn btown_case) int btown_popn byte cplace_cases int cplace_popn byte dcity_case int dcity_popn
    21550 3068 5 100 2 1000 10 2000 50 10000
    21557 3069 4 100 2 1000 12 2000 55 10000
    21564 3070 6 100 3 1000 10 2000 60 10000
    21571 3071 9 100 2 1000 12 2000 70 10000
    end
    format %tdnn/dd/CCYY date

    So what I'd like to do is gen townname_rate = townname_cases/townname_popn for each town. I have 15 towns, so this seems like an ideal use for a loop.

    I wrote the following but get the error code r(198) "invalid syntax".

    Code:
    local townlist aville btown cplace dcity
    foreach town of "`townlist'"{
    gen `town'_rate=`town'_cases/`town'_popn
    }

    (Ideally I'd also extract the town names from the variable names with a loop over substr(), but I struggled because the variable names differ in length, so fixed n1/n2 arguments don't work. So I just wrote out all the town names manually and put them in a local. Ideas on this are also welcome!)

    Would welcome opinions on where I am going wrong!

    Many thanks,

    Emily

  • #2
    I'm going to fix your example so that cases is consistently a suffix (not capriciously case or cases; that can be fixed if it is true of your real data). There are various ways to do what you want. Here is one.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int date float weekyear byte(aville_cases aville_popn btown_cases) int btown_popn byte cplace_cases int cplace_popn byte dcity_cases int dcity_popn
    21550 3068 5 100 2 1000 10 2000 50 10000
    21557 3069 4 100 2 1000 12 2000 55 10000
    21564 3070 6 100 3 1000 10 2000 60 10000
    21571 3071 9 100 2 1000 12 2000 70 10000
    end
    format %tdnn/dd/CCYY date
    
    foreach t in aville btown cplace dcity {
        gen `t'_rate = `t'_cases/`t'_popn
    }

    And here is another:

    Code:
    local townlist aville btown cplace dcity
    
    foreach town of local townlist {
        gen `town'_rate = `town'_cases/`town'_popn
    }
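
    The syntax for foreach is strict. After of, only certain prescribed list types are allowed (local, global, varlist, newlist or numlist) and nothing else. A quoted string such as "`townlist'" is not one of them, which is why your loop failed with r(198).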
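
    As a rough sketch of the forms most often needed: in takes a literal list, of local takes the bare macro name, and of numlist takes a number list. The loops below only display each element, so they need no data in memory.

    ```stata
    local townlist aville btown cplace dcity

    * "in": a literal list follows, so the macro contents are substituted in
    foreach town in `townlist' {
        display "`town'"
    }

    * "of local": the bare macro name follows, with no quotation marks
    foreach town of local townlist {
        display "`town'"
    }

    * "of numlist": a number list such as 1/4 is expanded to 1 2 3 4
    foreach i of numlist 1/4 {
        display "`i'"
    }
    ```

    The of local form is usually preferred when the list is already in a macro, and of varlist is the natural choice when looping over existing variables.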


    However, for most Stata purposes you would be much better off with a long data layout (structure, or format, some say).

    Code:
    rename (*_cases) (cases_*)
    rename (*_popn) (popn_*)
    
    reshape long cases_ popn_, i(date) j(place) string
    
    gen rate = cases_/popn_
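
    To illustrate why the long layout pays off, here is a minimal sketch of per-town work done without any loop. It builds a small long dataset directly from two towns and two dates in the example (an assumption made only to keep the sketch runnable on its own), and rate_change is a hypothetical variable name.

    ```stata
    * Small long-layout dataset: one row per date and place
    clear
    input int date str6 place byte cases_ int popn_
    21550 "aville" 5 100
    21557 "aville" 4 100
    21550 "btown"  2 1000
    21557 "btown"  2 1000
    end
    gen rate = cases_/popn_

    * Mean, minimum and maximum rate for each town, with no loop
    tabstat rate, by(place) statistics(mean min max)

    * Week-on-week change in rate within each town
    * (missing for the first date of each town)
    bysort place (date): gen rate_change = rate - rate[_n-1]
    ```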

    Last edited by Nick Cox; 03 Jul 2020, 05:47.


    • #3
      Thanks Nick, that's worked a treat. Should have known it was something simple in the foreach syntax! Sorry also for the error in variable names - that was my mistake on creating the dummy dataset (as the real thing is restricted access/sharing).

      I would also welcome ideas on how to extract the town names from the case or population variable names, to avoid writing out the list manually. I have been looking at a loop over substr() but struggled because town name length, and therefore variable name length, varies, so fixed n1/n2 arguments don't work.

      Thanks again

      Emily



      • #4
        Code:
        unab stubs : *cases 
        local stubs : subinstr local stubs "_cases" "", all
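
        A sketch of how those two lines might feed the loop from #2, assuming every town has variables named with the _cases and _popn suffixes. The two-town dataset is a cut-down copy of the example, included only so the sketch runs on its own.

        ```stata
        * Cut-down example data: two towns, two dates
        clear
        input byte aville_cases int aville_popn byte btown_cases int btown_popn
        5 100 2 1000
        4 100 2 1000
        end

        * unab expands the wildcard into the matching variable names:
        * aville_cases btown_cases
        unab stubs : *_cases

        * Remove the suffix from every name, leaving only the town names
        local stubs : subinstr local stubs "_cases" "", all

        * Loop over the recovered town names
        foreach t of local stubs {
            gen `t'_rate = `t'_cases/`t'_popn
        }
        ```

        Because the town names come from the variable names themselves, adding a sixteenth town needs no change to the code.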


        • #5
          Thanks very much - off to read up about unab!
