  • Any collapse tricks for multiple stats from multiple vars?

    Picking up on the question that was not asked here: http://www.statalist.org/forums/foru...=1421354574272
    If I have a great long varlist and I want to collapse to (say) means and counts and SDs, what would you recommend, in order to avoid the tedious typing (and attendant errors)? In the dim distance, I anticipate some kind of messing around parsing the varlist and making up a long macro to feed into the command, but maybe there's something smarter.

  • #2
    Seriously, do read -help collapse-. It can compute many statistics for many variables at a time and accepts varlists (var* or var23-var54) and such. From the limited information you provide, there seems no reason to look further than -collapse-.


    • #3
      Well, I always found collapse the one command that couldn't get succinct. I suppose I want to write something like:
      collapse (mean) myvar* (sd) myvar* (count) myvar*
      and have it name the resulting variables according to some automatic scheme. Or even:
      collapse (mean) mean_myvar*=myvar*..... though that is not very Stataish.
      What I really don't like doing is:
      collapse (mean) mean_myvar1=myvar1 mean_myvar2=myvar2 mean_myvar3=myvar3 mean_myvar4=myvar4..... and so on for many lines, then the sds, then finally you get to leave one of them with the same name as a reward for making it that far.


      • #4
        Interesting. I hadn't realized the prepwork involved in -collapse-. See if this gets you close to what you want:

        Code:
        sysuse nlsw88.dta, clear
        
        foreach var of varlist idcode-tenure { 
            gen sd`var'=`var'
            }
            
        foreach var of varlist idcode-tenure { 
            gen c`var'=`var'
            }
        
        foreach var of varlist idcode-tenure { 
            gen sem`var'=`var'
            }
        
        sum
        collapse (mean) (idcode-tenure) (sd) (sdidcode-sdtenure) (count) (cidcode-ctenure) (sem) (semidcode-semtenure)  
        sum


        • #5
          Thanks Ben, hadn't even thought of duplicating the variables!


          • #6
            And actually, if you use unique stubs to start each series of variables, you should be able to create them in a single loop and reference them as sem_*, sd_*, etc. In my example, "c" is potentially not unique, but:
            Code:
            sysuse nlsw88.dta, clear
            
            foreach var of varlist idcode-tenure { 
                gen sd_`var'=`var'
                gen count_`var'=`var'
                gen sem_`var'=`var'
            }
            
            sum
            collapse (mean) (idcode-tenure) (sd) (sd_*) (count) (count_*) (sem) (sem_*)  
            sum
            is more efficient.


            • #7
              Thanks ben earnhart. Your code in #6 was the perfect solution to my issue (similar to that in #1).
              Last edited by Chris Boulis; 19 Apr 2021, 02:44.


              • #8
                ben earnhart's helpful code can be simplified as one loop suffices.

                Code:
                 
                 foreach var of varlist idcode-tenure {       gen sd`var'=`var'       gen c`var'=`var'        gen sem`var'=`var' }

                Here is another way to do it: write code that writes code.


                Code:
                sysuse nlsw88.dta, clear
                
                local call 
                local wild 
                
                foreach v of varlist idcode-tenure { 
                    local call `call' (sd) sd`v'=`v' (count) c`v'=`v' (sem) sem`v'=`v'
                    local wild `wild' *`v'
                }
                    
                
                collapse (mean) (idcode-tenure) `call'  
                
                order `wild'
                
                su


                • #9
                  Sorry; the first block was mangled.


                  Code:
                   
                   foreach var of varlist idcode-tenure {     
                      gen sd`var'=`var' 
                        
                      gen c`var'=`var'  
                        
                      gen sem`var'=`var' 
                  }


                  • #10
                    Hi Nick Cox. Thanks for the alternative approach. Two questions. (1) Can we place all the statistics, including the mean, in the local `call'? (2) If not, can I add a prefix for the mean, say 'x_' (to be consistent with 'sd_' for sd and 'n_' for count; I'm not using (sem)), after -collapse-? I attempted the following (separately):

                    Code:
                    collapse (mean) (x_`v') `call', by()
                    collapse (mean) x_`v' `call', by()
                    but received the following response
                    Code:
                    variable x_ not found
                    r(111);
                    Can the following be amended to address either (1) or (2) above?
                    Code:
                    local varlist totasset totfin totbank totequity totsuper totnonfin totprop totbus totveh
                    
                    local call
                    local wild
                    
                    foreach v of local varlist {
                        local call `call' (sd) sd_`v'=`v' (count) n_`v'=`v'
                        local wild `wild' *`v'
                    }
                        
                    collapse (mean) (totasset-totveh) `call', by(intra agegrp wave)  
                    
                    order `wild'
                    
                    su
                    I want to call the variables from the local varlist for (mean) in collapse, but couldn't, so I used totasset-totveh (which I want to avoid, as this range includes additional variables).

                    Stata v.15.1. I'm using panel data.
                    Last edited by Chris Boulis; 20 Apr 2021, 00:30.


                    • #11
                      I don't quite understand what (1) means, but this may answer (2):

                      Code:
                      local varlist totasset totfin totbank totequity totsuper totnonfin totprop totbus totveh
                      
                      local call
                      local wild
                      
                      foreach v of local varlist {
                          local call `call' (mean) mean_`v'=`v'  (sd) sd_`v'=`v' (count) n_`v'=`v'
                          local wild `wild' *`v'
                      }
                          
                      collapse `call', by(intra agegrp wave)  
                      
                      order `wild'
                      Note: the manipulations with the wildcards don't change the variable order in this case. The code there is more of a reminder to myself that the variables might be wanted in a different order.
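
                      A possible variation on the code above, offered as a sketch rather than anything posted in this thread: if the list of statistics is itself long, a second loop over a local macro of statistic names can build the same kind of `call'. The dataset (nlsw88), the three variables, the by() variable race, and the stat_varname naming scheme are all assumptions chosen for illustration.

                      ```stata
                      sysuse nlsw88.dta, clear

                      * Statistics to compute; each name doubles as the new-variable prefix (assumed scheme)
                      local stats mean sd count
                      local call

                      foreach v of varlist wage hours tenure {
                          foreach s of local stats {
                              * Append one "(stat) newname=oldname" term per statistic and variable
                              local call `call' (`s') `s'_`v'=`v'
                          }
                      }

                      * Inspect the generated clist before running it
                      display "`call'"

                      collapse `call', by(race)
                      describe
                      ```

                      Because every statistic is named explicitly in `call', no result has to keep the original variable name, and adding another statistic only means adding its name to the local stats.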
                      Last edited by Nick Cox; 20 Apr 2021, 02:34.


                      • #12
                        Hi Nick Cox. Yes, that worked nicely, thank you. You answered both of my questions (adding (mean) to the local call line answered q1).


                        • #13
                          I just had to jump in here and say that the development of answers to this problem is light years ahead of anything I'd imagined when I started it. Kudos to you Stata wizards.


                          • #14
                            It is a very good example, Robert Grant, of the value of this forum and those like it. Someone may have the same problem years later and can find solutions without having to make a new post (provided they search first). The cool thing is that Nick Cox (in particular) is always happy to jump in and show us a better way of doing something (if it's possible), which is a win-win for those learning.


                            • #15
                              I propose the Double Cox Conjecture: whatever statistical innovation you think you have made, David Cox probably did it in the 70s and Nick Cox probably coded it better.
