  • Generating dummy variables whose names indicate the group they belong to

    Hello,

    Assume that I have a dataset with two columns: id (firm) and group. Based on this, I would like to create a dummy variable for each firm whose name indicates the group that the firm belongs to. That is, I would like all firms that belong to the same group to have variable names such as _Indic_group1_firmID, _Indic_group2_firmID, etc. Here is an example of what I mean.

    Code:
    input id group
    1    1
    2    1
    3    1
    4    2
    5    2
    6    2
    7    3
    8    3
    9    3
    10    4
    11    4
    12    4
    end
    What I would like to obtain is something like this:
    Code:
    input id     group    indic1_1    indic1_2    indic1_3    indic2_4    indic2_5    indic2_6    indic3_7    indic3_8    indic3_9    indic4_10    indic4_11    indic4_12
    1    1    1    0    0    0    0    0    0    0    0    0    0    0
    2    1    0    1    0    0    0    0    0    0    0    0    0    0
    3    1    0    0    1    0    0    0    0    0    0    0    0    0
    4    2    0    0    0    1    0    0    0    0    0    0    0    0
    5    2    0    0    0    0    1    0    0    0    0    0    0    0
    6    2    0    0    0    0    0    1    0    0    0    0    0    0
    7    3    0    0    0    0    0    0    1    0    0    0    0    0
    8    3    0    0    0    0    0    0    0    1    0    0    0    0
    9    3    0    0    0    0    0    0    0    0    1    0    0    0
    10    4    0    0    0    0    0    0    0    0    0    1    0    0
    11    4    0    0    0    0    0    0    0    0    0    0    1    0
    12    4    0    0    0    0    0    0    0    0    0    0    0    1
    end
    As can be seen above, firms with id 1, 2, and 3 belong to the same group (group 1), so their dummy variables are named indic1_1, indic1_2, and indic1_3. In the same manner, firms with id 10, 11, and 12 belong to the fourth group and have the dummy names indic4_10, indic4_11, and indic4_12.

    I wonder if it's possible to achieve this.

    Thanks in advance.

  • #2
    Code:
    levelsof group, local(groups)
    foreach g of local groups {
        levelsof id if group == `g', local(ids)
        foreach i of local ids {
            gen indic`g'_`i' = (group == `g' & id == `i')
        }
    }
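
    A minimal sketch of how the loop above could be checked against the example data from post #1. The variable name n_on is hypothetical and used only for illustration. The check relies on the fact that every observation matches exactly one (group, id) pair, so exactly one indicator should equal 1 in each row.

    ```stata
    * Rebuild the example data from post #1
    clear
    input id group
    1 1
    2 1
    3 1
    4 2
    5 2
    6 2
    7 3
    8 3
    9 3
    10 4
    11 4
    12 4
    end

    * Create one indicator per (group, id) pair that occurs in the data
    levelsof group, local(groups)
    foreach g of local groups {
        levelsof id if group == `g', local(ids)
        foreach i of local ids {
            gen indic`g'_`i' = (group == `g' & id == `i')
        }
    }

    * n_on (hypothetical name) counts the indicators switched on in each row
    egen n_on = rowtotal(indic*)
    assert n_on == 1
    drop n_on

    * List the generated indicator variables
    describe indic*
    ```

    Because the inner levelsof is restricted to the current group, only combinations that actually occur are generated, so nothing has to be dropped afterwards.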



    • #3
      This may not be the most efficient way to do this, depending on the number of groups and firm IDs you have, but a loop over the levels of each variable is one way to do this.

      Code:
      **create local macros that contain a list of all the values taken by your two variables of interest
      
      levelsof id, local(idlev)
      levelsof group, local(grplev)
      
      **loop over these lists to create the indicators
      foreach i in `idlev' {
          foreach g in `grplev' {
              gen indic`g'_`i'=(id==`i' & group==`g')
              
              **this creates all possible combinations of group and id
              **since some combinations have no observations, you can get rid of these
              **identify the indicators with no observations by noting a mean of 0
              **then drop those indicators
              
              quietly summarize indic`g'_`i'
              if r(mean)==0 drop indic`g'_`i'
          }
      }
      I'm honestly not sure why you would want to do this, though. If your real data looks like your example data you're going to end up with 1 indicator per observation. What do you plan to do with all these indicators? If you can clearly explain your end goal it's possible someone here will have a better idea how to achieve it.



      • #4
        Thanks Clyde. It works like a charm.

        Originally posted by Clyde Schechter
        Code:
        levelsof group, local(groups)
        foreach g of local groups {
            levelsof id if group == `g', local(ids)
            foreach i of local ids {
                gen indic`g'_`i' = (group == `g' & id == `i')
            }
        }



        • #5
          Imagine that the same id appears multiple times in the database. What I gave is the simplest example possible to save space.

          I want to run regressions on different subsamples and then compare the coefficients. To do this, I am using the suest command, and it appears that this command does not allow the i. prefix in front of a variable in the regression. So I have to create the dummies and enter them "manually" in the model. I am not aware of an easier way to do this.

          Originally posted by Sarah Edgington
          I'm honestly not sure why you would want to do this, though. If your real data looks like your example data you're going to end up with 1 indicator per observation. What do you plan to do with all these indicators? If you can clearly explain your end goal it's possible someone here will have a better idea how to achieve it.



          • #6
            As far as I can tell suest can handle models with factor variables. So if you're having trouble with that specifically you might want to start a new thread to get some help there. Depending on the specifics of your real data and what models you're running, being able to use factor variables might make things a lot cleaner and easier.
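
            A minimal sketch of what combining suest with factor variables could look like, assuming Stata's shipped nlsw88 example dataset rather than your data. The stored-estimate names m_union and m_nonunion are hypothetical, and the choice of wage, race, tenure, and union is only for illustration.

            ```stata
            * Assumption: the nlsw88 example data stand in for the real data
            sysuse nlsw88, clear

            * Same model on two subsamples, with race entered as a factor variable
            regress wage i.race tenure if union == 1
            estimates store m_union

            regress wage i.race tenure if union == 0
            estimates store m_nonunion

            * Combine the stored results; after regress, suest names the
            * coefficient equations <name>_mean
            suest m_union m_nonunion

            * Compare the tenure coefficient, then one race level, across subsamples
            test [m_union_mean]tenure = [m_nonunion_mean]tenure
            test [m_union_mean]2.race = [m_nonunion_mean]2.race
            ```

            With this approach no indicator variables have to be created by hand, and individual levels can still be referenced in test by their factor-variable names.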



            • #7
              Interestingly, when I ran the model with factor variables on another PC, it worked. Of course the results are the same, but why did Stata behave differently? At work I use Stata MP 15.1 (2 cores), while at home I have Stata SE 15.1.

              When I re-ran the models, I used ib#.var (where # is the base level) instead of i.var. This seems to do the trick. I did this because Stata SE gave a relevant error message, which made the solution easy to find. Out of curiosity, I will try again tomorrow in the MP version and see what I get.
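
              For readers unfamiliar with the ib#. notation, here is a rough sketch of setting the base level explicitly in each model before calling suest. It assumes Stata's shipped nlsw88 example dataset, and the stored-estimate names m1 and m0 are hypothetical. The error message mentioned above is not reproduced in this thread, so this only illustrates the syntax and is not a confirmed fix for that error.

              ```stata
              * Assumption: the nlsw88 example data stand in for the real data
              sysuse nlsw88, clear

              * ib2.race makes level 2 of race the base category in both models,
              * so the same level is omitted in each subsample
              regress wage ib2.race tenure if union == 1
              estimates store m1

              regress wage ib2.race tenure if union == 0
              estimates store m0

              suest m1 m0

              * Level 1 of race is now an estimated coefficient in both equations
              test [m1_mean]1.race = [m0_mean]1.race
              ```

              Other documented forms of the base operator include ib(first)., ib(last)., and ib(freq)., which choose the base by position or by frequency instead of by value.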

              Originally posted by Sarah Edgington
              As far as I can tell suest can handle models with factor variables. So if you're having trouble with that specifically you might want to start a new thread to get some help there. Depending on the specifics of your real data and what models you're running, being able to use factor variables might make things a lot cleaner and easier.
