  • Quietly Tabulate group, generate (dummy)

    Hello friends,
    I am trying to quietly tabulate a group variable and generate dummies from it at the same time.
    Let me explain. I have data on exporters, importers, commodity codes, and years. To set up a panel, I created a group variable named "i" with the command
    Code:
    egen i = group(exporter importer commodity_code)
    which combines exporter, importer, and commodity code. This is my cross-section identifier; with year as the time variable, i and year define the panel. Before I declare the panel, I want to tabulate i and generate a dummy for each of its values. My data set has over 2 million observations, so tabulating is a major problem. Through this forum I found commands such as
    Code:
    collapse, table, bigtab
    The command
    Code:
    bigtab
    works just fine for me; however, it does not let me run
    Code:
    bigtab i
    and generate my dummies in the same line.
    To wrap up, I am looking for something like this:
    Code:
    quietly tabulate i, gen(Country_Pair_i)
    but, since tabulate cannot handle that many values, something more like this:
    Code:
    quietly bigtab i, gen(Country_Pair_i)
    I want to bigtab my group "i" and generate my dummy "Country_Pair_i" at the same time.

    Sorry if this post is complicated, and thank you in advance!

  • #2
    With factor variables, you don't need to manually generate dummies. In addition, each dummy is a variable, and Stata variable limits are likely to bite if you have too many dummies. That said, here is a workaround:

    Code:
    levelsof i, local(n)
    foreach j in `n' {
        gen dummy`j' = `j' == i
    }


    • #3
      Thanks for the help. I tried your code, and what I get is "macro substitution results in line that is too long".
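
      One possible way around this error, offered as a sketch rather than a tested fix for the full data set: foreach ... in pastes the whole contents of the local macro into the command line, whereas foreach ... of local refers to the macro by name. The toy data below are an assumption for illustration only, and with very many levels Stata's limit on the number of variables would still apply.

      ```stata
      * Toy data (hypothetical): 10 observations, i takes values between 1 and 3
      clear
      set obs 10
      set seed 2018
      gen i = runiformint(1, 3)

      * Store the distinct values of i in the local macro n
      quietly levelsof i, local(n)

      * "of local n" names the macro instead of pasting its contents into the line
      foreach j of local n {
          gen byte dummy`j' = (i == `j')
      }
      ```

      Storing the indicators as byte keeps memory use down when there are many of them.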


      • #4
        OK, I solved it by running
        Code:
        set maxvar 32700
        set matsize 11000
        My only question now, if it is not a bother to you, is whether you could please explain what this command actually does and what it generates. Thank you so much!


        • #5
          My only question will now be if it is not a bother to you to please explain me what this command actually does
          It does exactly what you asked for, i.e., it generates a dummy for each level of your variable \(i\). -tab i, gen(dummy)- has a limit of 10,000 levels, so, as explained in #2, the code is a workaround for this limit. See the following example:

          Code:
          *GENERATE DATA SET
          set obs 10
          set seed 2018
          gen i= runiformint(1,3)
          l, clean
          So this is our data

          Code:
          . l, clean
          
                 i  
            1.   1  
            2.   2  
            3.   3  
            4.   2  
            5.   2  
            6.   3  
            7.   2  
            8.   1  
            9.   2  
           10.   1
          The variable \(i\) can take 3 values, i.e., 1, 2 and 3. Let us generate the dummies for this variable using the two methods

          Code:
          *METHOD 1
          . tab i, gen(dummy)
          
                    i |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    1 |          3       30.00       30.00
                    2 |          5       50.00       80.00
                    3 |          2       20.00      100.00
          ------------+-----------------------------------
                Total |         10      100.00
          
          . l
          
               +------------------------------+
               | i   dummy1   dummy2   dummy3 |
               |------------------------------|
            1. | 1        1        0        0 |
            2. | 2        0        1        0 |
            3. | 3        0        0        1 |
            4. | 2        0        1        0 |
            5. | 2        0        1        0 |
               |------------------------------|
            6. | 3        0        0        1 |
            7. | 2        0        1        0 |
            8. | 1        1        0        0 |
            9. | 2        0        1        0 |
           10. | 1        1        0        0 |
               +------------------------------+
          Code:
          *METHOD 2
          *FIRST STORE THE LEVELS OF VARIABLE "i" IN A LOCAL MACRO NAMED "n"
          *3 LEVELS IN THIS CASE (1, 2 & 3)
          levelsof i, local(n)
          *GENERATE DUMMIES (EQUAL TO 1 IF VAR "i" EQUALS THE PARTICULAR LEVEL j)
          foreach j in `n' {
              gen d`j' = i == `j'
          }
          list, clean
          Result:

          Code:
          .
          . list, clean
          
                 i   dummy1   dummy2   dummy3   d1   d2   d3  
            1.   1        1        0        0    1    0    0  
            2.   2        0        1        0    0    1    0  
            3.   3        0        0        1    0    0    1  
            4.   2        0        1        0    0    1    0  
            5.   2        0        1        0    0    1    0  
            6.   3        0        0        1    0    0    1  
            7.   2        0        1        0    0    1    0  
            8.   1        1        0        0    1    0    0  
            9.   2        0        1        0    0    1    0  
           10.   1        1        0        0    1    0    0
          So here you see that tab,gen() generated dummies dummy1-dummy3 whereas the second method generated dummies d1-d3 (the same set of dummies, save the name).
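
          The agreement between the two methods can also be checked with assert rather than by eye. The sketch below re-enters the ten values listed above so that it runs on its own. Note one assumption it relies on: tab, gen() numbers its dummies by the rank of each level, while the loop names them by the level's value, so dummy`j' and d`j' line up here only because the levels happen to be 1, 2 and 3.

          ```stata
          * Re-enter the example data shown in the listing above
          clear
          input byte i
          1
          2
          3
          2
          2
          3
          2
          1
          2
          1
          end

          * Method 1: dummies numbered by rank of level
          quietly tabulate i, gen(dummy)

          * Method 2: dummies named by value of level
          quietly levelsof i, local(n)
          foreach j of local n {
              gen byte d`j' = (i == `j')
              * assert stops with an error if any observation differs
              assert d`j' == dummy`j'
          }
          ```

          If the levels were, say, 10, 20 and 30, method 1 would still create dummy1-dummy3, whereas the loop would create d10, d20 and d30.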


          • #6
            The most important advice here in the helpful posts of Andrew Musau is that you don't need so many dummies (I say indicators whenever possible) because you can use factor variable notation. (Even so, my mind still boggles at using more than about seven predictors in a model.)

            But answers are missing a work-around available given the initial use of egen, group() (which always generates levels 1 up).


            Code:
              
            egen i = group(exporter importer commodity_code)  
            
            su i, meanonly  
            
            forval j = 1/`r(max)' {      
                gen d`j' = i == `j'  
            }
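
            Because every indicator is a separate variable, it may be worth checking the count against Stata's variable limit before running such a loop. A minimal sketch, using the auto data shipped with Stata as a stand-in (foreign and rep78 here are placeholders for exporter, importer and commodity_code, and the local name needed is made up for illustration):

            ```stata
            * Stand-in data: auto.dta ships with Stata
            sysuse auto, clear
            egen i = group(foreign rep78)

            * r(max) is the number of groups, because egen, group() numbers them 1 up
            su i, meanonly

            * c(k) is the current number of variables, c(maxvar) the current limit
            local needed = r(max) + c(k)
            display "indicators would bring the variable count to `needed'; current limit is `c(maxvar)'"
            ```

            If the first number exceeds the second, the loop will stop partway through, which is one more reason to prefer factor-variable notation.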


            • #7
              Hi Lazaros, it depends on where you are going with all this.

              If you want those dummies in order to estimate a fixed effects/dummy variable regression, you can just run

              Code:
              egen i = group(exporter importer commodity_code)
              areg y x, absorb(i)
              areg is the right tool for estimating a regression with a huge set of dummies; with it you do not need to generate the dummies explicitly, and you do not need to use factor expansions.
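
              To illustrate that absorbing the group variable gives the same slope as including the dummies explicitly, here is a small sketch on the auto data shipped with Stata. The variables price, weight and rep78 are only stand-ins for y, x and i, and the scalar names are made up for the example.

              ```stata
              sysuse auto, clear

              * Dummy-variable regression with explicit factor-variable expansion
              regress price weight i.rep78
              scalar b_reg = _b[weight]

              * Same model with the rep78 effects absorbed instead of reported
              areg price weight, absorb(rep78)
              scalar b_areg = _b[weight]

              * The two slope estimates should agree up to rounding
              display b_reg, b_areg
              ```

              The difference is that areg never creates or reports the individual dummy coefficients, which is what makes it practical when the group variable has a very large number of levels.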


              • #8
                Hello Joro! Thanks for replying.
                My purpose in generating those dummies is to use them in an xtreg regression. I have several other commands with "egen".
                This is the first step:

                Code:
                egen ye = group(year)
                egen exp = group(exporter)
                egen imp = group(importer)
                egen exp_t = group(exporter year)
                egen imp_t = group(importer year)
                egen i = group(exporter importer commodity_code)
                Then I create the dummies
                Code:
                quietly tabulate ye, gen(Year_Fe)
                quietly tabulate exp, gen(Exporter_Fe)
                quietly tabulate imp, gen(Importer_Fe)
                so that I can use them in a regression like this:
                Code:
                xtreg lnx Year_Fe* Exporter_Fe* Importer_Fe* /* plus my other variables */
                Last edited by Lazaros Antonios Chatzilazarou; 14 Nov 2018, 05:31.


                • #9
                  So would it also be correct if, instead of tabulating and generating the dummies, I did it all at once inside my regression command? Something like this:
                  Code:
                  xtreg lnx lnigdp lnjgdp lndistcap lnigdpcap lnjgdpcap contig comlang_ethno i.year i.i, re
                  where i.year stands for the year dummies and i.i for the dummies for i that I initially wanted to create. Does this sound right?


                  • #10
                    The command -xtreg y x, re- gives you random effects at some level, and which level that is depends on how you have -xtset panelvar timevar- your data. The random effects are at the panel variable level.
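
                    A minimal sketch of that dependence on simulated data; the names id, year, x, y and u and all the numbers are invented for illustration:

                    ```stata
                    * Simulated panel (hypothetical): 50 units observed for 5 years each
                    clear
                    set seed 12345
                    set obs 50
                    gen id = _n
                    * Unit-level effect, drawn once per id so it is constant within a unit
                    gen u = rnormal()
                    expand 5
                    bysort id: gen year = 2000 + _n
                    gen x = rnormal()
                    gen y = 1 + 0.5*x + u + rnormal()

                    * The panel variable declared here is the level of the random effects
                    xtset id year
                    xtreg y x, re

                    * xtreg stores the group variable it used in e(ivar)
                    display "random effects are at the level of: `e(ivar)'"
                    ```

                    Declaring a different panel variable in xtset would move the random effects to that level without any change to the xtreg line.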

                    There is nothing wrong with using factor expansions such as i.varname in your code, as long as Stata can process your request in a time that works for you. areg is good in that it can process very many fixed effects that would otherwise cause problems for Stata.

                    You have many fixed effects at many different levels. Whether including them makes sense is an economic question, and I cannot comment on it.


                    • #11
                      Originally posted by Nick Cox
                      The most important advice here in the helpful posts of Andrew Musau is that you don't need so many dummies (I say indicators whenever possible) because you can use factor variable notation. (Even so, my mind still boggles at using more than about seven predictors in a model.)

                      But answers are missing a work-around available given the initial use of egen, group() (which always generates levels 1 up).


                      Code:
                      egen i = group(exporter importer commodity_code)
                      
                      su i, meanonly
                      
                      forval j = 1/`r(max)' {
                      gen d`j' = i == `j'
                      }

                      Thank you so much, I was looking for this for a while.
