Generating dummy variables whose names indicate the group they belong to

Pantelis Kazakis

Join Date: Aug 2014
Posts: 123

20 Mar 2019, 11:37

Hello,

Suppose I have a dataset with two columns: id (firm) and group. I would like to create a dummy variable for each firm whose name indicates the group the firm belongs to. That is, firms in the first group should get variable names such as _Indic_group1_firmID, firms in the second group _Indic_group2_firmID, etc. Here is an example of what I mean.

Code:

input id group
1    1
2    1
3    1
4    2
5    2
6    2
7    3
8    3
9    3
10    4
11    4
12    4
end

What I would like to obtain is something like this:

Code:

input id     group    indic1_1    indic1_2    indic1_3    indic2_4    indic2_5    indic2_6    indic3_7    indic3_8    indic3_9    indic4_10    indic4_11    indic4_12
1    1    1    0    0    0    0    0    0    0    0    0    0    0
2    1    0    1    0    0    0    0    0    0    0    0    0    0
3    1    0    0    1    0    0    0    0    0    0    0    0    0
4    2    0    0    0    1    0    0    0    0    0    0    0    0
5    2    0    0    0    0    1    0    0    0    0    0    0    0
6    2    0    0    0    0    0    1    0    0    0    0    0    0
7    3    0    0    0    0    0    0    1    0    0    0    0    0
8    3    0    0    0    0    0    0    0    1    0    0    0    0
9    3    0    0    0    0    0    0    0    0    1    0    0    0
10    4    0    0    0    0    0    0    0    0    0    1    0    0
11    4    0    0    0    0    0    0    0    0    0    0    1    0
12    4    0    0    0    0    0    0    0    0    0    0    0    1
end

As can be seen above, firms with id 1, 2, and 3 belong to the same group (group 1), so their dummy variables are named indic1_1, indic1_2, and indic1_3. In the same manner, firms with id 10, 11, and 12 belong to the fourth group and have the dummy names indic4_10, indic4_11, and indic4_12.

I wonder if it's possible to achieve this.

Thanks in advance.

Clyde Schechter

Join Date: Apr 2014
Posts: 30179

20 Mar 2019, 12:16

Code:

levelsof group, local(groups)
foreach g of local groups {
    levelsof id if group == `g', local(ids)
    foreach i of local ids {
        gen indic`g'_`i' = (group == `g' & id == `i')
    }
}
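
For anyone who wants to try the loop above on a cut-down version of the example data, here is a minimal self-contained sketch. The check at the end is an addition for illustration only: n_ones is a hypothetical helper variable, and the assert simply confirms that every observation has exactly one indicator set to 1.

```stata
clear
input id group
1 1
2 1
3 1
4 2
5 2
6 2
end

* Build one indicator per (group, id) pair that actually occurs in the data
levelsof group, local(groups)
foreach g of local groups {
    levelsof id if group == `g', local(ids)
    foreach i of local ids {
        generate byte indic`g'_`i' = (group == `g' & id == `i')
    }
}

* Sanity check (n_ones is a hypothetical helper name):
* each observation should have exactly one indicator equal to 1
egen byte n_ones = rowtotal(indic*)
assert n_ones == 1
drop n_ones

list, noobs
```

Because the inner levelsof is restricted to the current group, only combinations that exist are created, so there is nothing to drop afterwards.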

Sarah Edgington

Join Date: Apr 2014
Posts: 284

20 Mar 2019, 12:30

Depending on the number of groups and firm IDs you have, this may not be the most efficient approach, but looping over the levels of each variable is one way to do it.

Code:

**create local macros that contain a list of all the values taken by your two variables of interest

levelsof id, local(idlev)
levelsof group, local(grplev)

**loop over these lists to create the indicators
foreach i in `idlev' {
    foreach g in `grplev' {
        gen indic`g'_`i'=(id==`i' & group==`g')
        
        **this creates all possible combinations of group and id
        **since some combinations have no observations, you can get rid of these
        **identify the indicators with no observations by noting a mean of 0
        **then drop those indicators
        
        quietly sum indic`g'_`i'
        if r(mean)==0 drop indic`g'_`i'
    }
}

I'm honestly not sure why you would want to do this, though. If your real data looks like your example data, you're going to end up with one indicator per observation. What do you plan to do with all these indicators? If you can clearly explain your end goal, it's possible someone here will have a better idea how to achieve it.

Pantelis Kazakis

Join Date: Aug 2014
Posts: 123

20 Mar 2019, 12:40

Thanks Clyde. It works like a charm.


Pantelis Kazakis

Join Date: Aug 2014

Posts: 123
#5

20 Mar 2019, 13:41

Imagine that the same id appears multiple times in the dataset. I gave the simplest possible example to save space.

I want to run regressions in different subsamples and eventually compare coefficients. To do this, I am using the suest command, and it appears that this command does not allow the i. prefix in front of a variable when one runs a regression. So I have to create dummies and enter them "manually" in the model. I am not aware of an easier way to do this.

Sarah Edgington

Join Date: Apr 2014

Posts: 284
#6

20 Mar 2019, 14:25

As far as I can tell, suest can handle models with factor variables. So if you're having trouble with that specifically, you might want to start a new thread to get some help there. Depending on the specifics of your real data and what models you're running, being able to use factor variables might make things a lot cleaner and easier.
Pantelis Kazakis

Join Date: Aug 2014

Posts: 123
#7

20 Mar 2019, 16:50

Interestingly, when I ran the model with factor variables on another PC, it worked. Of course the results are the same, but then again, why did Stata behave differently? At work I use Stata MP 15.1 (2 cores), while at home I have Stata SE 15.1.

When I re-ran the models, I used ib#.var (where # is the base level) instead of i.var. This seems to do the trick. I did this because Stata SE gave an informative error message, so I was able to find the solution easily. Out of curiosity, I will try again tomorrow in the MP version and see what I get.
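
For readers following along, here is a rough sketch of what the ib#. approach might look like with suest. Everything in it is made up for illustration: the simulated data, the variable names (subsample, group, x, y), the stored-estimate names s0 and s1, and the choice of level 2 as the base. The equation names in the test line assume the name_mean pattern that suest reports after regress; check the header of your own suest output before relying on them.

```stata
clear
set seed 12345
set obs 400

* Hypothetical data: two subsamples and a 4-level categorical variable
generate byte subsample = _n > 200
generate byte group = 1 + mod(_n, 4)
generate x = rnormal()
generate y = 1 + 0.5*x + 0.3*group + rnormal()

* Same model in each subsample; ib2. makes level 2 the base category in both
regress y x ib2.group if subsample == 0
estimates store s0
regress y x ib2.group if subsample == 1
estimates store s1

* Combine the stored estimates and compare the coefficient on x
suest s0 s1
test [s0_mean]x = [s1_mean]x
```

Fixing the base level explicitly keeps the omitted category the same in every subsample, which is one plausible reason the ib#. form behaves more predictably than plain i. when the models are combined.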

