Storing results for tabulate

Elena Quercioli

Join Date: Oct 2014

Posts: 16
#1


13 Mar 2015, 12:52

Dear Statalist members:

Suppose I have generated a dummy variable and then tabulated it using sample weights. How do I then store the results for the tabulated dummy variable in Stata?

So suppose I ran this set of commands:

gen ALFL=0
replace ALFL=1 if origin==01 & destination==12
tab ALFL [iw=CORE09]

and wanted to retrieve the number produced by the tab command as a new variable in the same file. Is there a quick way of doing that? I am asking because I am generating a very large number of dummy variables, and I need to be very efficient at retrieving the results of my commands.

Any help would be greatly appreciated. Thank you very much.
Nick Cox

Join Date: Mar 2014

Posts: 35668
#2

14 Mar 2015, 03:21

"Store" for what purpose? One way of storing the results is as a matrix.

Code:

sysuse auto
tab foreign [iw=mpg], matcell(foo)
mat li foo

Putting the results into a new variable is easy too, and you don't even need tabulate for that, but it's very wasteful.

Code:

egen foo = total(mpg), by(foreign)
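A sketch on the auto data of why tabulate is not needed here: the weighted counts from tab foreign [iw=mpg] are just sums of mpg within each level of foreign, so egen's total() reproduces them as a variable. The variable names totmpg and first are illustrative only; totmpg is used instead of foo so that it cannot be confused with the matrix foo.

```stata
* Sketch: egen total() reproduces the weighted counts reported by tabulate
sysuse auto, clear
tab foreign [iw=mpg], matcell(foo)
* sum of mpg within each level of foreign, repeated on every observation
egen double totmpg = total(mpg), by(foreign)
* flag one observation per level of foreign so each total is listed once
egen byte first = tag(foreign)
list foreign totmpg if first
```

Every observation in a group carries the same total, which is why this is wasteful when only two numbers are needed.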

Nick Cox

Join Date: Mar 2014
Posts: 35668

14 Mar 2015, 05:02

Note that if you want a combined table for several indicator variables, then that is easily programmable. The code here would be no longer with 100 or 1000 indicator variables. See also Mata.

Code:

* set up sandbox: in a real problem the data would be there already

clear
set obs 1000
set seed 2803
gen y = exp(rnormal())

forval i = 1/10 {
    scalar split = runiform()
    gen x`i' = runiform() >= split
}

* initialise results matrix
mat results = J(10, 2, .)

forval i = 1/10 {
    forval j = 0/1 {
        su y if x`i' == `j', meanonly
        mat results[`i', `=`j' + 1'] = r(sum)
    }
}
 
unab x : x*
mat rownames results = `x'
mat colnames results = 0 1  

. mat li results

results[10,2]
             0          1
 x1  148.14933  1422.3619
 x2  864.30821  706.20301
 x3  772.38069  798.13052
 x4  405.13759  1165.3736
 x5  163.07943  1407.4318
 x6  1279.5998  290.91142
 x7  1159.0054  411.50579
 x8  661.66904  908.84218
 x9  1554.9737  15.537477
x10  523.79422   1046.717

Given your indicator variable names in a local macro varlist and your weights in a variable w:

Code:

 
local nvars : word count `varlist' 
mat results = J(`nvars', 2, .)
tokenize "`varlist'"

forval i = 1/`nvars' {
    forval j = 0/1 {
        su w if ``i'' == `j', meanonly
        mat results[`i', `=`j' + 1'] = r(sum)
    }
}
 
mat rownames results = `varlist'
mat colnames results = 0 1
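If a small dataset of results is wanted rather than a matrix, svmat can turn the results matrix into variables. A sketch on the auto data, assuming foreign and a made-up indicator heavy stand in for the indicator variables and mpg stands in for the weights; the names heavy, n1, n2 and indicator are hypothetical.

```stata
* Sketch: the loop above, then the results matrix saved as a small dataset
sysuse auto, clear
gen byte heavy = weight > 3000      // hypothetical second indicator
gen w = mpg                         // mpg stands in for the sample weights
local varlist foreign heavy
local nvars : word count `varlist'
mat results = J(`nvars', 2, .)
tokenize "`varlist'"
forval i = 1/`nvars' {
    forval j = 0/1 {
        su w if ``i'' == `j', meanonly
        mat results[`i', `=`j' + 1'] = r(sum)
    }
}
* clear drops the data but keeps matrices and local macros
clear
* one observation per row of results: n1 holds column 0, n2 holds column 1
svmat double results, names(n)
gen indicator = ""
forval i = 1/`nvars' {
    replace indicator = "``i''" in `i'
}
list indicator n1 n2
```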

Last edited by Nick Cox; 14 Mar 2015, 05:11.


Elena Quercioli

Join Date: Oct 2014

Posts: 16
#4

14 Mar 2015, 09:00

Nick: Thank you for your answers. When I say I want to store the results of my tab command, I mean store them for the purpose of creating a new variable. When you suggest this,

sysuse auto
tab foreign [iw=mpg], matcell(foo)
mat li foo

how would your suggestion change if I wanted to create a new variable, and not necessarily a matrix? I like tabulate because it allows me to use sample weights to calculate nationally representative numbers for my variables. Thanks for all your help here.
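A minimal sketch of that adaptation, still on the auto data: the matrix saved by matcell() can be read with the el() function inside generate, and summarize with meanonly gives the same weighted count in r(sum) without tabulate. In the question's setting, foreign and mpg would be replaced by ALFL and CORE09; the new variable names are hypothetical.

```stata
* Sketch: copy tabulate's weighted counts into new variables
sysuse auto, clear
tab foreign [iw=mpg], matcell(foo)
* rows of foo follow the sorted values of foreign: row 1 is 0, row 2 is 1
gen double n_domestic = el(foo, 1, 1)
gen double n_foreign = el(foo, 2, 1)
* the same number without tabulate: sum the weights where the indicator is 1
su mpg if foreign == 1, meanonly
gen double n_foreign2 = r(sum)
```

Each new variable holds one constant repeated on every observation.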
Nick Cox

Join Date: Mar 2014

Posts: 35668
#5

14 Mar 2015, 09:31

My previous email can be adapted for this purpose. But putting two numbers in a variable is hardly efficient for any purpose. I think you need to sketch out exactly what you imagine doing with these weights, as almost certainly there is a better way.
Elena Quercioli

Join Date: Oct 2014

Posts: 16
#6

18 Mar 2015, 10:33

I am trying to use the PSID year-to-year current state of residence variable to estimate the number of people in the USA who move from state to state for work-related reasons every year, starting in 1997 (the year in which the PSID makes individual weights available).

For each pair of states, I construct a dummy variable that identifies people whose state of residence was, say, AL in year 09 and FL in year 11, and then I use the tab command with individual weights to compute the actual number of people moving from AL to FL in the 09-11 period. I do this for all pairs and for all years.

Once I have these numbers, I use them as the LHS variable in a separate regression that has the GDP growth rates of the same pairs of states in the same years, plus other controls, as explanatory variables. Since I would like to do this quickly, I need an efficient way of generating my LHS variable. As mentioned, that variable is simply the result of the tab command, as in "tab ALFL [iw=COREWEIGHTS11]", for each pair. I was hoping there is an easy, fast way of storing the results of the tab command for each state pair and then retrieving those stored values for my general migration regression. (I should perhaps mention that I intend to set up the regression as a panel in which the state pairs are the units of observation and the time dimension is the span of years for which PSID information is available, e.g., 1997-2011 at the moment.)

Any ideas on how to proceed would be greatly appreciated. I am relatively new to Stata and very new to the PSID world. Thank you so much for all the help you have already provided here.
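One way to get the counts for every state pair in a single step, sketched under assumptions: suppose each person's state of residence in the two years is held in variables state09 and state11, and the weight in CORE11. These names and the toy values below are made up for illustration. Summing the weights within each origin-destination pair with collapse gives the same numbers as tabulating one dummy per pair with [iw=CORE11], without creating any dummies.

```stata
* Sketch with made-up toy data: weighted mover counts for all state pairs at once
clear
input id state09 state11 CORE11
1 1 12 1500
2 1 12 800
3 1 1 1200
4 5 12 950
5 5 1 1100
end
* keep people whose state of residence changed between the two years
keep if state09 != state11
* sum the weights within each origin-destination pair
collapse (sum) movers = CORE11, by(state09 state11)
* record the period so results for different periods can be appended
gen int year = 2011
list
```

Repeating this for each pair of years and appending the results would give one observation per state pair and period; egen pair = group(state09 state11) followed by xtset pair year would then declare the panel.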