Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Collapsing multiple variables (more than one statistic) [Stata 18.0]

    Hey everyone,

    I need mean & standard error for 16 variables and would like to use the collapse command.
    I understand I can create a variable list and use the collapse command on the variable list, but only if just one statistic is required, however I do need two.
    Is there a way to avoid typing out all variables?

    Here is what I used when it was only one variable:

    Code:
    collapse (mean) y = agecat (semean) se_y = agecat, by(treatment region)
    Thanks!

  • #2
    Anna:
    you may want to consider the following work-around and tweak it to your needs:
    Code:
     use "C:\Program Files\Stata18\ado\base\a\auto.dta"
    (1978 automobile data)
    
    . collapse (mean) price (sd) price
    error:
           price = (mean) price
           price = (sd) price
    name conflict
    r(198);
    
    . gen price_2=price
    
    . collapse (mean) price (sd) price_2
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      You can do with 16 variables the same thing you did with 2. Now, writing all that out is probably tedious and error-prone, so it makes sense to write the arguments for the -collapse- command using a loop. Here's an example of how it's done using the auto.dta.

      Code:
      sysuse auto, clear
      
      unab for_collapse: price mpg headroom-turn
      
      local means
      local sds
      foreach v of varlist `for_collapse' {
          local means `means' mean_`v' = `v'
          local sds `sds' sd_`v' = `v'
      }
      
      collapse `means' `sds', by(foreign rep78)

      Comment


      • #4
        Hi Carlo,

        thank you so much for your reply! This workaround will definitely help streamlining the code!
        Maybe I didn't get this right, but this doesn't help dealing with a variable list, is that correct?


        My idea was creating a variable list for the 16 variables (the example is for two variables only):
        Code:
        vl create var_1 = (agecat gender)
        and then applying collapse to the variable list:
        Code:
        collapse var_1, by(treatment region)
        However, I need the collapse command both for mean and standard error.

        Maybe I didn't understand your suggestion properly. Thank you!

        Comment


        • #5
          A small modification to the last command in #3. You'll want to do this instead:

          Code:
          collapse (mean) `means' (semean) `sds', by(foreign rep78)
          otherwise you'll only get a set of variables containing means, despite their nomenclature.
          Last edited by Hemanshu Kumar; 05 Aug 2024, 10:10.

          Comment


          • #6
            Anna:
            you're right.
            My previous reply does not help with a variable list.
            Kind regards,
            Carlo
            (Stata 19.0)

            Comment


            • #7
              Hemanshu Kumar has correctly spotted an error in my code in #3. And his change will fix it. However, I actually intended something slightly different:

              Code:
              sysuse auto, clear
              
              unab for_collapse: price mpg headroom-turn
              
              local means (mean)
              local sds (sd)
              foreach v of varlist `for_collapse' {
                  local means `means' mean_`v' = `v'
                  local sds `sds' sd_`v' = `v'
              }
              
              collapse `means' `sds', by(foreign rep78)
              Either the fix in #5 or this will work--in fact, in the end, the both produce the same -collapse- command.

              Comment


              • #8
                Hey Clyde and Hemanshu,

                thank you so much! That's an amazing workaround.

                Just have one more question. I'm a beginner in Stata, and sometimes need help even for basics, so sorry!

                The code
                Code:
                unab for_collapse: price mpg headroom-turn
                seems to expand an already existing variable list, is that correct?

                How can I create a variable list for only the 16 variables needed for this collapse command? (My entire number of variables is 766)
                Tried
                Code:
                vl create
                before but that did not work.


                Thanks!

                Comment


                • #9
                  with collapse, you can also do this:

                  collapse (mean) price (sd) price_sd = price, by(whatever)

                  Comment


                  • #10
                    Originally posted by Anna Binger View Post
                    The code
                    Code:
                    unab for_collapse: price mpg headroom-turn
                    seems to expand an already existing variable list, is that correct?

                    How can I create a variable list for only the 16 variables needed for this collapse command? (My entire number of variables is 766)
                    Yes, unab is used to expand a variable list constructed with wildcards etc into an explicit list of variables. In Clyde's code in #3, this is somewhat superfluous, since foreach ... of varlist ... will already do that job for you. Which is to say, we could have simply done

                    Code:
                    foreach v of varlist price mpg headroom-turn {
                    If you want to save typing out all your 16 variables in that varlist, you will need to exploit regularities in your variables' names so you can use wildcards like * and ?, or the fact that some of the variables appear consecutively in the dataset, so that you can use the hyphen (as above), and so on.
                    Last edited by Hemanshu Kumar; 05 Aug 2024, 11:50.

                    Comment


                    • #11
                      Thank you guys!
                      For me, defining a variable list and using it like suggested in #7 didn't work. I will have to work on my non-existent knowledge of variable lists in stata in the future!
                      But I decided to go with Hemanshu's code in #10 and it worked.
                      You are all amazing, thank you so much for your help!

                      Comment


                      • #12
                        Unfortunately, I have a follow-up question.

                        What I did is:

                        Code:
                        preserve
                        local means (mean)
                        local se (semean)
                        foreach v of varlist var1 var2 var3 etc. {
                            local means `means' mean_`v' = `v'
                            local se `se' se_`v' = `v'
                        }
                        
                        collapse `means' `se', by(treatment region)
                        save mean_and_se, replace
                        restore
                        Now I have my original dataset in memory again and the values from the collapse command stored in mean_and_se.dta.
                        Since I used by(treatment region), for each variable in mean_and_se.dta I have 4 observations (and each variable takes on 4 different values).
                        observation mean_var1
                        1 (treatment=0, region=0) 3.51
                        2 (treatment=0, region=1) 0.89
                        3 (treatment=1, region=0) 3.04
                        4 (treatment=1, region=1) 0.93

                        In the original dataset I have 140 observations (each participant is one observation).
                        Is there a way to assign the 4 different values of e.g. mean_var1 to all the observations in the original dataset, based on whether a participant belongs to treatment group 0 or 1 AND region group 0 or1?
                        So a participant with treatment=0 and region=0 would get assigned the mean_var1 value 3.51, which was calculated for the group treatment=0 & region=0.

                        I used
                        Code:
                        append using "~/mean_and_se.dta"
                        but of course that just adds 4 observations to the existing140 observations. I'm at a loss here.

                        Thank you,

                        Anna

                        Comment


                        • #13
                          You can do

                          Code:
                          merge m:1 treatment region using "~/mean_and_se.dta", assert(3) nogen
                          But if you need the original dataset, there is also no real need to collapse and then merge back. You can generate the requisite variables using the egen command. Something like this:

                          Code:
                          sysuse auto, clear
                          
                          foreach v of varlist price mpg headroom-turn {
                              egen m_`v' = mean(`v'), by(foreign trunk)
                              egen se_`v' = semean(`v'), by(foreign trunk)
                          }
                          For the second egen command above, the one that uses semean, you may first need to install the egenmore package if you haven't already:

                          Code:
                          net install egenmore.pkg

                          Comment


                          • #14
                            Thank you so much, this solved my problem! Very grateful for the support.

                            Comment

                            Working...
                            X