  • Storing results for tabulate

    Dear Statalist members:

    Suppose I have generated a dummy variable and then tabulated it using some sample weights. How do I then store the results for the tabulated dummy variable in Stata?

    So suppose I ran this set of commands:

    gen ALFL=0
    replace ALFL=1 if origin==01 & destination==12
    tab ALFL [iw=CORE09]

    and wanted to retrieve the numbers produced by the tab command as a new variable in the same file. Is there a quick way of doing that? I ask because I am generating a very large number of dummy variables and need to be very efficient at retrieving the results of my commands.

    Any help would be greatly appreciated. Thank you very much.
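A minimal sketch of the single-pair case, using simulated data in place of the real file. The names origin, destination, ALFL and CORE09 follow the post; the simulated values (50 state codes, lognormal weights) are assumptions for illustration only. With importance weights, the frequency tabulate reports for ALFL == 1 is the sum of CORE09 over those observations, so summarize can return the same number directly in r(sum), where it can be kept as a scalar.

```stata
* simulated stand-in for the real file (hypothetical values, not PSID data)
clear
set obs 50000
set seed 2015
gen byte origin = floor(50 * runiform()) + 1
gen byte destination = floor(50 * runiform()) + 1
gen double CORE09 = exp(rnormal())

* indicator for movers from state 01 to state 12 (AL to FL in the post), in one line
gen byte ALFL = origin == 1 & destination == 12

* weighted frequencies as tabulate reports them, kept in a matrix
tab ALFL [iw=CORE09], matcell(freq)

* the ALFL == 1 cell is the sum of CORE09 over those observations
summarize CORE09 if ALFL == 1, meanonly
scalar n_ALFL = r(sum)
display n_ALFL
```

This avoids parsing tabulate output altogether, although it still handles one pair per call.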

  • #2
    "Store" for what purpose? One way of storing the results is as a matrix.

    Code:
     
    sysuse auto
    tab foreign [iw=mpg], matcell(foo) 
    mat li foo

    Putting the results into a new variable is easy too, and you don't even need the tabulate for that. But that's very wasteful.

    Code:
     
    egen foo = total(mpg), by(foreign)
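A small check, using the auto data shipped with Stata, that the variable route gives the same numbers as tabulate: with iweights, each frequency is the sum of the weight variable (here mpg) within the category, so egen's total() by group reproduces the matcell() results, repeated on every observation of the group. The names freq and wtotal are arbitrary.

```stata
sysuse auto, clear

* weighted frequencies of foreign, stored in a 2 x 1 matrix
* (row 1 = Domestic, foreign == 0; row 2 = Foreign, foreign == 1)
tabulate foreign [iw=mpg], matcell(freq)
matrix list freq

* the same numbers as a variable: the total of mpg within each group
egen double wtotal = total(mpg), by(foreign)

* one line per group is enough to see them
tabstat wtotal, by(foreign) statistics(mean)
```

The duplication is the waste: 74 observations now carry two distinct numbers.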

    • #3
      Note that if you want a combined table for several indicator variables, then that is easily programmable. The code here would be no longer with 100 or 1000 indicator variables. See also Mata.

      Code:
      * set up sandbox: in a real problem the data would be there already
      
      clear
      set obs 1000
      set seed 2803
      gen y = exp(rnormal())
      
      forval i = 1/10 {
           scalar split = runiform()  
           gen x`i' = runiform() >= split
       }
      
      * initialise results matrix
      mat results = J(10, 2, .)
      
      forval i = 1/10 {
           forval j = 0/1 {
                su y if x`i' == `j', meanonly
                mat results[`i', `=`j' + 1'] = r(sum)
           }
       }
       
      unab x : x*
      mat rownames results = `x'
      mat colnames results = 0 1  
      
      . mat li results
      
      results[10,2]
                   0          1
       x1  148.14933  1422.3619
       x2  864.30821  706.20301
       x3  772.38069  798.13052
       x4  405.13759  1165.3736
       x5  163.07943  1407.4318
       x6  1279.5998  290.91142
       x7  1159.0054  411.50579
       x8  661.66904  908.84218
       x9  1554.9737  15.537477
      x10  523.79422   1046.717
      Given your indicator variable names in a local macro varlist and your weights in a variable w:

      Code:
       
      local nvars : word count `varlist' 
      mat results = J(`nvars', 2, .)
      tokenize "`varlist'"
      
      forval i = 1/`nvars' {
           forval j = 0/1 {
                su w if ``i'' == `j', meanonly
                mat results[`i', `=`j' + 1'] = r(sum)
           }
       }
       
      mat rownames results = `varlist'
      mat colnames results = 0 1
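A usage sketch for the generic loop, on simulated data: the weight variable w and the indicators x1-x3 are hypothetical stand-ins. Each cell results[i, j+1] is the sum of w where the i-th indicator equals j, which is the iweighted frequency tabulate reports, so one row can be cross-checked against tabulate, matcell().

```stata
* hypothetical data: three indicators and a weight variable w
clear
set obs 1000
set seed 2803
gen double w = exp(rnormal())
forval i = 1/3 {
    gen byte x`i' = runiform() < 0.3 * `i'
}

* the loop from above, with the names supplied
unab varlist : x*
local nvars : word count `varlist'
mat results = J(`nvars', 2, .)
tokenize "`varlist'"
forval i = 1/`nvars' {
    forval j = 0/1 {
        su w if ``i'' == `j', meanonly
        mat results[`i', `=`j' + 1'] = r(sum)
    }
}
mat rownames results = `varlist'
mat colnames results = 0 1
mat li results

* cross-check the first indicator against tabulate
tab x1 [iw=w], matcell(check)
mat li check
```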
      Last edited by Nick Cox; 14 Mar 2015, 05:11.

      • #4
        Nick: Thank you for your answers. When I say I want to store the results of my tab command, I mean store them for the purpose of creating a new variable. When you suggest this,
        sysuse auto
        tab foreign [iw=mpg], matcell(foo)
        mat li foo
        how would your suggestion change if I wanted to create a new variable rather than a matrix? I like tabulate because it lets me use sample weights to calculate nationally representative numbers for my variables. Thanks for all your help here.

        • #5
          My previous email can be adapted for this purpose. But putting two numbers in a variable is hardly efficient for any purpose. I think you need to sketch out exactly what you imagine doing with these weights, as almost certainly there is a better way.
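If the goal is a set of results rather than a variable in the original data, one option is to turn the results matrix into its own small dataset, one row per indicator. A minimal sketch, assuming the auto data with two hypothetical indicators (heavy, longcar) and mpg as the weight, as in the earlier example. svmat writes the matrix columns to new variables n1 and n2, and the indicator names are copied into a string variable. Plain clear drops the data but keeps matrices and local macros.

```stata
sysuse auto, clear

* two hypothetical indicators
gen byte heavy = weight > 3000
gen byte longcar = length > 190

* iweighted frequencies (sums of mpg) for each indicator, as in #3
local varlist heavy longcar
local nvars : word count `varlist'
mat results = J(`nvars', 2, .)
tokenize "`varlist'"
forval i = 1/`nvars' {
    forval j = 0/1 {
        su mpg if ``i'' == `j', meanonly
        mat results[`i', `=`j' + 1'] = r(sum)
    }
}

* one row per indicator: n1 = total when 0, n2 = total when 1
clear
svmat double results, names(n)
gen str32 indicator = ""
forval i = 1/`nvars' {
    replace indicator = "``i''" in `i'
}
list indicator n1 n2
```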

          • #6
            I am trying to use the PSID year-to-year current state of residence variable to estimate the number of people in the USA who move from state to state for work-related reasons every year, starting in 1997, the year from which the PSID makes individual weights available.

            For each pair of states, I construct a dummy variable that identifies those whose state of residence was, say, AL in year 09 and FL in year 11. I then use the tab command with individual weights to compute the actual number of people moving from AL to FL in the 09-11 period. I do this for all pairs and for all years.

            Once I have the numbers, I use them as the LHS variable in a separate regression whose explanatory variables are the GDP growth rates of those same pairs of states in the same years, plus other controls. Since I would like to do this rather quickly, I need an efficient way of generating my LHS variable. As mentioned, that variable is simply the result of the tab command, as in "tab ALFL [iw=COREWEIGHTS11]", for each pair. I was hoping there is an easy, fast way of storing the results of the tab command for each state pair and then retrieving those stored values for my general migration regression. (I should perhaps mention that I intend to set up the regression as a panel in which the state pairs are the unit of observation and time is the span of years for which PSID information is available, e.g., 1997-2011 at the moment.)

            Any ideas on how to proceed would be greatly appreciated. I am relatively new to Stata and very new to the PSID world. Thank you so much for all the help you have already provided here.
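One common way to get every pair at once, sketched here under assumptions rather than as the thread's answer: since the iweighted frequency for an origin-destination indicator is just the sum of the weights in that cell, collapse (sum) over origin and destination produces all pair counts in one pass, already shaped as a pair-level dataset. The data below are simulated; the variables year (identifying the wave) and wt (holding that wave's weight, e.g. CORE09 or COREWEIGHTS11 in the post) are hypothetical, as is the stacked long layout.

```stata
* hypothetical stand-in data: one row per person and wave, with the
* state lived in at the start (origin) and end (destination) of the wave
clear
set obs 20000
set seed 1997
gen int year = cond(_n <= 10000, 2009, 2011)
gen byte origin = floor(50 * runiform()) + 1
gen byte destination = floor(50 * runiform()) + 1
gen double wt = exp(rnormal())

* movers only: the state of residence changed
keep if origin != destination

* weighted number of movers for every pair and year in one pass;
* each cell equals what tab PAIR [iw=wt] reports for PAIR == 1
collapse (sum) movers = wt, by(origin destination year)

* pairs with no observed movers are absent after collapse: add them as zeros
fillin origin destination year
replace movers = 0 if _fillin
drop if origin == destination
drop _fillin

* one panel unit per ordered state pair, observed by year
egen pair = group(origin destination)
xtset pair year
```

The result has one row per ordered pair and year, ready to be merged with pair-level GDP growth and other controls.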
