Quietly Tabulate group, generate (dummy)

Lazaros Antonios Chatzilazarou

Join Date: Oct 2018

Posts: 11
#1


06 Nov 2018, 06:32

Hello friends,
I am trying to quietly tabulate a grouping variable and, at the same time, generate dummies from it.
Let me explain how this goes. I have data on exporters, importers, commodity codes and years. To create a panel, I first created a group variable named "i" with the command

Code:

egen i = group(exporter importer commodity_code)

which combines exporter, importer and commodity code. This is my cross-sectional unit, and with year as the time variable I create a panel of i and year. Before creating the panel, I want to tabulate i and generate a dummy for each of its levels. My data have over 2 million observations, so tabulating is a major problem. Through this forum I found commands such as -collapse-, -table- and -bigtab-.
The command -bigtab- works for me just fine; however, it does not let me run

Code:

bigtab i

and generate my dummies in the same line.
So, to wrap it up, I am looking for something like this:

Code:

quietly tabulate i, gen(Country_Pair_i)

but, since -tabulate- cannot handle that many values, something more like this:

Code:

quietly bigtab i, gen(Country_Pair_i)

That is, I want to -bigtab- my group "i" and generate my dummies "Country_Pair_i" at the same time.
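Before asking any command to create one dummy per level, it may help to know how many dummies that would mean. Because -egen, group()- numbers its groups 1, 2, ..., k, -summarize, meanonly- returns k as r(max) without building a frequency table. A minimal sketch, assuming a handful of made-up rows (the country and commodity codes are hypothetical stand-ins for the real trade data); on the real 2 million observations only the -egen- line and the lines after it are needed:

```stata
* Hypothetical toy data standing in for the trade data
clear
input str3 exporter str3 importer int commodity_code int year
"AUT" "BEL" 101 2000
"AUT" "BEL" 101 2001
"AUT" "BEL" 205 2000
"BEL" "AUT" 101 2000
"BEL" "AUT" 101 2001
end

egen i = group(exporter importer commodity_code)

* egen, group() numbers the groups 1, 2, ..., k, so the maximum is the number of levels
summarize i, meanonly
local k = r(max)
display "i has `k' distinct levels, so one dummy per level means `k' new variables"

* Compare with the variable limit of the current Stata session
display "maxvar = " c(maxvar) ", variables in use = " c(k)
```

If `k` plus the variables already in use exceeds c(maxvar), no tabulating command can create the dummies, whatever its speed.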

Sorry if the post is complicated, and thank you in advance!
Andrew Musau

Join Date: Oct 2014

Posts: 10254
#2

06 Nov 2018, 07:05

With factor variables, you don't need to manually generate dummies. In addition, each dummy is a variable, and Stata variable limits are likely to bite if you have too many dummies. That said, here is a workaround

Code:

levelsof i, local(n)
foreach j in `n' {
    gen dummy`j' = `j'==i
}
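The first point above, that factor-variable notation makes hand-made dummies unnecessary, can be checked on the auto data that ships with Stata (used here only as a stand-in for the trade data). This sketch fits the same regression twice: once with -i.rep78- and once with indicators from -tabulate, generate()-, leaving out r1 so that both versions use level 1 as the base. The coefficients should agree.

```stata
* auto.dta ships with Stata; it is only a stand-in for the trade data
sysuse auto, clear
keep if !missing(rep78)

* Factor-variable notation: Stata builds the rep78 indicators on the fly
regress price weight i.rep78
scalar b_fv = _b[3.rep78]

* Explicit indicators r1-r5; r1 is left out so level 1 is the base, as with i.rep78
quietly tabulate rep78, generate(r)
regress price weight r2-r5
scalar b_dummy = _b[r3]

display "i.rep78: " b_fv "   explicit r3: " b_dummy
```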
Lazaros Antonios Chatzilazarou

Join Date: Oct 2018

Posts: 11
#3

13 Nov 2018, 16:54

Originally posted by Andrew Musau

With factor variables, you don't need to manually generate dummies. In addition, each dummy is a variable, and Stata variable limits are likely to bite if you have too many dummies. That said, here is a workaround

Code:

levelsof i, local(n)
foreach j in `n' {
    gen dummy`j' = `j'==i
}

Thanks for the help. I tried your command, and what I get is "macro substitution results in line that is too long".
Lazaros Antonios Chatzilazarou

Join Date: Oct 2018

Posts: 11
#4

13 Nov 2018, 17:25

Originally posted by Andrew Musau

With factor variables, you don't need to manually generate dummies. In addition, each dummy is a variable, and Stata variable limits are likely to bite if you have too many dummies. That said, here is a workaround

Code:

levelsof i, local(n)
foreach j in `n' {
    gen dummy`j' = `j'==i
}

OK, I solved it with the commands

Code:

set maxvar 32700
set matsize 11000

My only remaining question, if it is not a bother to you: could you please explain what this command actually does and what it generates? Thank you so much!

Andrew Musau

Join Date: Oct 2014
Posts: 10254

14 Nov 2018, 02:49

My only question will now be if it is not a bother to you to please explain me what this command actually does

It does exactly what you asked for, i.e., it generates a dummy for each level of your variable \(i\). -tab i, gen(dummy)- has a limit of 10,000 levels, so, as explained in #2, the code is a workaround for this limit. See the following example:

Code:

*GENERATE DATA SET
set obs 10
set seed 2018
gen i= runiformint(1,3)
l, clean

So this is our data:

Code:

       i
  1.   1
  2.   2
  3.   3
  4.   2
  5.   2
  6.   3
  7.   2
  8.   1
  9.   2
 10.   1

The variable \(i\) takes 3 values, i.e., 1, 2 and 3. Let us generate the dummies for this variable using the two methods:

Code:

*METHOD 1
. tab i, gen(dummy)

          i |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          3       30.00       30.00
          2 |          5       50.00       80.00
          3 |          2       20.00      100.00
------------+-----------------------------------
      Total |         10      100.00

. l

     +------------------------------+
     | i   dummy1   dummy2   dummy3 |
     |------------------------------|
  1. | 1        1        0        0 |
  2. | 2        0        1        0 |
  3. | 3        0        0        1 |
  4. | 2        0        1        0 |
  5. | 2        0        1        0 |
     |------------------------------|
  6. | 3        0        0        1 |
  7. | 2        0        1        0 |
  8. | 1        1        0        0 |
  9. | 2        0        1        0 |
 10. | 1        1        0        0 |
     +------------------------------+

Code:

*METHOD 2
*FIRST STORE THE LEVELS OF VARIABLE "i" IN A LOCAL MACRO NAMED "n"
*3 LEVELS IN THIS CASE (1, 2 & 3)
levelsof i, local(n)
*GENERATE DUMMIES: dj EQUALS 1 IF VAR "i" EQUALS LEVEL j, AND 0 OTHERWISE
 foreach j in `n'{
 gen d`j'=i==`j'
 }
list, clean

Result:

Code:

.
. list, clean

       i   dummy1   dummy2   dummy3   d1   d2   d3  
  1.   1        1        0        0    1    0    0  
  2.   2        0        1        0    0    1    0  
  3.   3        0        0        1    0    0    1  
  4.   2        0        1        0    0    1    0  
  5.   2        0        1        0    0    1    0  
  6.   3        0        0        1    0    0    1  
  7.   2        0        1        0    0    1    0  
  8.   1        1        0        0    1    0    0  
  9.   2        0        1        0    0    1    0  
 10.   1        1        0        0    1    0    0

So here you see that -tab, gen()- generated the dummies dummy1-dummy3, whereas the second method generated d1-d3: the same set of dummies, apart from the names.
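The claim that both methods give the same dummies can be checked with -assert- rather than by eye. A sketch that rebuilds the toy data above and compares the two sets observation by observation; note that it pairs dummy`j' with d`j', which is only valid because \(i\) takes exactly the values 1, 2, 3 (-tab, gen()- numbers its dummies by sorted level, not by value):

```stata
* Rebuild the toy data and both sets of dummies
clear
set obs 10
set seed 2018
generate i = runiformint(1, 3)

* Method 1: tabulate with generate()
quietly tabulate i, generate(dummy)

* Method 2: loop over the levels stored by levelsof
quietly levelsof i, local(n)
foreach j in `n' {
    generate d`j' = i == `j'
}

* The two sets should agree in every observation (fails with an error otherwise)
foreach j in `n' {
    assert dummy`j' == d`j'
}
```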


Nick Cox

Join Date: Mar 2014

Posts: 35754
#6

14 Nov 2018, 03:33

The most important advice here in the helpful posts of Andrew Musau is that you don't need so many dummies (I say indicators whenever possible) because you can use factor variable notation. (Even so, my mind still boggles at using more than about seven predictors in a model.)

But the answers so far miss a work-around that is available given the initial use of -egen, group()-, which always generates levels 1 up.

Code:

egen i = group(exporter importer commodity_code)
su i, meanonly
forval j = 1/`r(max)' {
    gen d`j' = i == `j'
}
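Because -egen, group()- numbers levels consecutively from 1, the loop only needs the largest level, so no list of levels has to fit in a macro; that is what avoids the "macro substitution results in line that is too long" error from #3. A sketch on a few hypothetical rows (made-up codes, not the poster's data), with a check that every observation falls in exactly one group:

```stata
* Hypothetical toy data; in practice these are the trade variables
clear
input str3 exporter str3 importer int commodity_code
"AUT" "BEL" 101
"AUT" "BEL" 205
"BEL" "AUT" 101
"AUT" "BEL" 101
end

egen i = group(exporter importer commodity_code)

* Levels are 1, 2, ..., r(max), so loop over that range directly
summarize i, meanonly
forvalues j = 1/`r(max)' {
    generate byte d`j' = i == `j'
}

* Each observation should have exactly one indicator equal to 1
egen nhits = rowtotal(d*)
assert nhits == 1
```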
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#7

14 Nov 2018, 04:08

Hi Lazaros, it depends on where you are going with this whole thing.

If you want those dummies to estimate a fixed-effects/dummy-variable regression, you can just do

Code:

egen i = group(exporter importer commodity_code)
areg y x, absorb(i)

-areg- is the right tool for estimating a regression with a huge set of dummies: you do not need to generate the dummies explicitly, nor do you need to use factor-variable expansions.
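One way to see that -absorb()- gives the same slope as entering the dummies is to fit both versions on the auto data shipped with Stata, with rep78 standing in for the group variable i. This is a sketch of that comparison, not of the poster's model; the coefficient on weight should agree up to rounding:

```stata
* auto.dta ships with Stata; rep78 plays the role of the group variable
sysuse auto, clear
keep if !missing(rep78)

* Absorb the rep78 indicators instead of estimating them
areg price weight, absorb(rep78)
scalar b_areg = _b[weight]

* Same model with the indicators entered explicitly via factor variables
regress price weight i.rep78
scalar b_lsdv = _b[weight]

display "areg: " b_areg "   regress with i.rep78: " b_lsdv
```

With hundreds of thousands of levels, only the -areg- version stays practical, because it never creates the indicator columns.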
Lazaros Antonios Chatzilazarou

Join Date: Oct 2018

Posts: 11
#8

14 Nov 2018, 05:06

Originally posted by Joro Kolev

Hi Lazaros, it depends on where you are going with this whole thing.

If you want those dummies to estimate a fixed-effects/dummy-variable regression, you can just do

Code:

egen i = group(exporter importer commodity_code)
areg y x, absorb(i)

-areg- is the right tool for estimating a regression with a huge set of dummies: you do not need to generate the dummies explicitly, nor do you need to use factor-variable expansions.

Hello Joro! Thanks for replying.
My purpose in generating those dummies is to use them in an -xtreg- regression. I have several other commands with -egen-.
This is the first step:

Code:

egen ye = group(year)
egen exp = group(exporter)
egen imp = group(importer)
egen exp_t = group(exporter year)
egen imp_t = group(importer year)
egen i = group(exporter importer commodity_code)

Then I create the dummies

Code:

quietly tabulate ye, gen(Year_Fe)
quietly tabulate exp, gen(Exporter_Fe)
quietly tabulate imp, gen(Importer_Fe)

So I can use them in a regression like this

Code:

xtreg lnx Year_Fe* Exporter_Fe* Importer_Fe* /* plus my other variables */

Last edited by Lazaros Antonios Chatzilazarou; 14 Nov 2018, 05:31.
Lazaros Antonios Chatzilazarou

Join Date: Oct 2018

Posts: 11
#9

14 Nov 2018, 11:15

So would it also be correct if, instead of tabulating and generating the dummies, I did it once and for all inside my regression command? Something like this:

Code:

xtreg lnx lnigdp lnjgdp lndistcap lnigdpcap lnjgdpcap contig comlang_ethno i.year i.i , re

where -i.year- stands for the year dummies and -i.i- for the dummies for i that I initially wanted to create? Does this sound right?
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#10

15 Nov 2018, 05:10

-xtreg y x, re- gives you random effects at some level, and which level that is depends on how you have -xtset panelvar timevar- your data: the random effects are at the level of the panel variable.
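To see that the panel variable declared in -xtset- fixes the level of the random effects, here is a sketch on simulated data (all names and values are hypothetical): 20 units observed for 10 periods, with a unit-level effect u. After -xtset id t-, -xtreg, re- reports the panel variable in e(ivar) and the number of groups in e(N_g).

```stata
* Hypothetical simulated panel: 20 units observed for 10 periods each
clear
set seed 12345
set obs 200
generate id = ceil(_n / 10)
bysort id: generate t = _n

* A unit-level effect shared by all periods of the same id
bysort id (t): generate u = rnormal() if _n == 1
by id: replace u = u[1]

generate x = rnormal()
generate y = 1 + 2*x + u + rnormal()

* The panel variable declared here is the level of the random effects
xtset id t
xtreg y x, re
display "random effects at the level of `e(ivar)' with " e(N_g) " groups"
```

Declaring a different panel variable (say, an exporter identifier instead of id) would change the level of the random effects without any change to the -xtreg- command itself.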

There is nothing wrong with using factor expansions such as -i.varname- in your code, as long as Stata can process your request in a time that works for you. -areg- is good in that it can handle very, very many fixed effects that would otherwise cause problems for Stata.

You have many fixed effects at many different levels. Whether including them makes sense is an economic question, and I cannot comment on that.
Cooper Felix

Join Date: Sep 2015

Posts: 84
#11

07 Aug 2019, 08:35

Originally posted by Nick Cox

The most important advice here in the helpful posts of Andrew Musau is that you don't need so many dummies (I say indicators whenever possible) because you can use factor variable notation. (Even so, my mind still boggles at using more than about seven predictors in a model.)

But the answers so far miss a work-around that is available given the initial use of -egen, group()-, which always generates levels 1 up.

Code:

egen i = group(exporter importer commodity_code)
su i, meanonly
forval j = 1/`r(max)' {
    gen d`j' = i == `j'
}

Thank you so much; I had been looking for this for a while.