Any collapse tricks for multiple stats from multiple vars?

Robert Grant

Join Date: Apr 2014

Posts: 58
#1

15 Jan 2015, 13:47

Picking up on the question that was not asked here: http://www.statalist.org/forums/foru...=1421354574272
If I have a great long varlist and I want to collapse to (say) means and counts and SDs, what would you recommend, in order to avoid the tedious typing (and attendant errors)? In the dim distance, I anticipate some kind of messing around parsing the varlist and making up a long macro to feed into the command, but maybe there's something smarter.
ben earnhart

Join Date: May 2014

Posts: 1027
#2

15 Jan 2015, 14:02

Seriously, do read -help collapse-. It can compute many statistics for many variables at a time and accepts varlists (var* or var23-var54) and the like. From the limited information you provide, there seems no reason to look further than -collapse-.
Robert Grant

Join Date: Apr 2014

Posts: 58
#3

15 Jan 2015, 14:11

Well, I always found collapse the one command that couldn't get succinct. I suppose I want to write something like:
collapse (mean) myvar* (sd) myvar* (count) myvar*
and have it name the resulting variables according to some automatic scheme. Or even:
collapse (mean) mean_myvar*=myvar*..... though that is not very Stataish.
What I really don't like doing is:
collapse (mean) mean_myvar1=myvar1 mean_myvar2=myvar2 mean_myvar3=myvar3 mean_myvar4=myvar4..... and so on for many lines, then the sds, then finally you get to leave one of them with the same name as a reward for making it that far.

ben earnhart

Join Date: May 2014
Posts: 1027
#4

15 Jan 2015, 14:51

Interesting. I hadn't realized the prepwork involved in -collapse-. See if this gets you close to what you want:

Code:

sysuse nlsw88.dta, clear

foreach var of varlist idcode-tenure { 
    gen sd`var'=`var'
    }
    
foreach var of varlist idcode-tenure { 
    gen c`var'=`var'
    }

foreach var of varlist idcode-tenure { 
    gen sem`var'=`var'
    }

sum
collapse (mean) (idcode-tenure) (sd) (sdidcode-sdtenure) (count) (cidcode-ctenure) (sem) (semidcode-semtenure)  
sum


Robert Grant

Join Date: Apr 2014

Posts: 58
#5

15 Jan 2015, 14:59

Thanks Ben, hadn't even thought of duplicating the variables!
ben earnhart

Join Date: May 2014

Posts: 1027
#6

15 Jan 2015, 15:05

And actually, if you use unique stubs to start each series of variables, you should be able to create them in a single loop and reference them as sem_*, sd_*, etc. In my example, "c" is potentially not unique, but:

Code:

sysuse nlsw88.dta, clear

foreach var of varlist idcode-tenure {
    gen sd_`var'=`var'
    gen count_`var'=`var'
    gen sem_`var'=`var'
}

sum
collapse (mean) (idcode-tenure) (sd) (sd_*) (count) (count_*) (sem) (sem_*)
sum

is more efficient.
Chris Boulis

Join Date: Feb 2019

Posts: 368
#7

19 Apr 2021, 02:38

Thanks ben earnhart. Your code in #6 was the perfect solution to my issue (similar to that in #1).

Last edited by Chris Boulis; 19 Apr 2021, 02:44.

Nick Cox

Join Date: Mar 2014
Posts: 35724
#8

19 Apr 2021, 04:30

ben earnhart's helpful code can be simplified as one loop suffices.

Code:

 
 foreach var of varlist idcode-tenure {       gen sd`var'=`var'       gen c`var'=`var'        gen sem`var'=`var' }

Here is another way to do it: write code that writes code.

Code:

sysuse nlsw88.dta, clear

local call 
local wild 

foreach v of varlist idcode-tenure { 
    local call `call' (sd) sd`v'=`v' (count) c`v'=`v' (sem) sem`v'=`v'
    local wild `wild' *`v'
}
    

collapse (mean) (idcode-tenure) `call'  

order `wild'

su


Nick Cox

Join Date: Mar 2014
Posts: 35724
#9

19 Apr 2021, 08:00

Sorry; the first block was mangled.

Code:

foreach var of varlist idcode-tenure {
    gen sd`var'=`var'
    gen c`var'=`var'
    gen sem`var'=`var'
}


Chris Boulis

Join Date: Feb 2019

Posts: 368
#10

20 Apr 2021, 00:09

Hi Nick Cox. Thanks for the alternative approach. Two questions. (1) Can we place all options, including the mean, in the local `call'? (2) If not, can I add a prefix for the mean, say 'x_' (to be consistent with 'sd_' for sd and 'n_' for count; I'm not using (sem)), after -collapse-? I attempted the following (separately):

Code:

collapse (mean) (x_`v') `call', by()
collapse (mean) x_`v' `call', by()

but received the following response

Code:

variable x_ not found
r(111);

Can the following be amended to address either (1) or (2) above?

Code:

local varlist totasset totfin totbank totequity totsuper totnonfin totprop totbus totveh
local call
local wild

foreach v of local varlist {
    local call `call' (sd) sd_`v'=`v' (count) n_`v'=`v'
    local wild `wild' *`v'
}

collapse (mean) (totasset-totveh) `call', by(intra agegrp wave)
order `wild'
su

I want to call the variables from the local varlist for (mean) in collapse, but couldn't, so I used totasset-totveh (which I want to avoid, as this range includes additional variables).

Stata v.15.1. I'm using panel data.
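
The "variable x_ not found" message is consistent with `v' being empty at the point where -collapse- is called. A minimal sketch of that behaviour, assuming (as is commonly understood) that the macro set by -foreach- is only defined inside the loop body; the list elements a and b are placeholders, not variables from any dataset in this thread:

```stata
* a and b are hypothetical placeholders, not real variable names
foreach v in a b {
    display "inside the loop:  x_`v'"
}

* Assumption: the loop macro v is no longer defined after the closing brace,
* so the reference below expands to nothing and only the prefix x_ remains
display "outside the loop: x_`v'"
```

If that is what happened, the prefixed names need to be built inside the loop, while the macro still holds a variable name.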

Last edited by Chris Boulis; 20 Apr 2021, 00:30.
Nick Cox

Join Date: Mar 2014

Posts: 35724
#11

20 Apr 2021, 01:58

I don't quite understand what (1) means, but this may answer (2)

Code:

local varlist totasset totfin totbank totequity totsuper totnonfin totprop totbus totveh
local call
local wild

foreach v of local varlist {
    local call `call' (mean) mean_`v'=`v' (sd) sd_`v'=`v' (count) n_`v'=`v'
    local wild `wild' *`v'
}

collapse `call', by(intra agegrp wave)
order `wild'

Note: the manipulations with the wildcards don't change the variable order in this case. The code there is more of a reminder to myself that the variables might be wanted in a different order.
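
For anyone without that dataset, the same pattern can be tried on the nlsw88 data used earlier in the thread. This is only a sketch: the choice of wage, hours and tenure, the macro name vars, and the by() variables race and married are assumptions made for illustration, not part of the original question.

```stata
sysuse nlsw88.dta, clear

* Hypothetical selection of variables to summarise
local vars wage hours tenure
local call

* Build one long clist: three statistics per variable, each with its own prefix
foreach v of local vars {
    local call `call' (mean) mean_`v'=`v' (sd) sd_`v'=`v' (count) n_`v'=`v'
}

* The by() variables here are illustrative
collapse `call', by(race married)

describe
```

Because every statistic is named inside the loop, no range such as idcode-tenure is needed, so only the listed variables are collapsed.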

Last edited by Nick Cox; 20 Apr 2021, 02:34.
Chris Boulis

Join Date: Feb 2019

Posts: 368
#12

20 Apr 2021, 05:34

Hi Nick Cox. Yes, that worked nicely, thank you. You answered both of my questions (adding (mean) to the local call line answered q1).
Robert Grant

Join Date: Apr 2014

Posts: 58
#13

20 Apr 2021, 07:17

I just had to jump in here and say that the development of answers to this problem is light years ahead of anything I'd imagined when I started it. Kudos to you Stata wizards.
Chris Boulis

Join Date: Feb 2019

Posts: 368
#14

20 Apr 2021, 18:28

It is a very good example, Robert Grant, of the value of this forum and those like it. Someone may have the same problem years later and can find solutions without having to make a new post (provided they search first), and the cool thing is that Nick Cox (in particular) is always happy to jump in and show us a better way of doing something (if it's possible), which is a win-win for those learning.
Robert Grant

Join Date: Apr 2014

Posts: 58
#15

15 Jul 2021, 05:27

I propose the Double Cox Conjecture: whatever statistical innovation you think you have made, David Cox probably did it in the 70s and Nick Cox probably coded it better.
