Creating group ID for observed and non-observed groups (egen group)

Cortnie Shupe

Join Date: Dec 2015

Posts: 11
#1

10 Mar 2016, 04:59

I am using Stata 13 and working with two datasets, one of which contains tax information while the other does not. For the dataset without tax information, I want to assign tax information to the observations using group averages. To do this, I first need to define the same groups in dataset 1 and in dataset 2. The characteristics I use to define the groups (simplified example) are marital status (married), gender (sex) and number of children (children). I used egen:

egen g=group(sex married children)

Say this command generates 22 different groups, numbered 1-22, one for each uniquely observed combination of these three characteristics. Two combinations of these variables are not occupied in dataset 1, as there are 24 possible combinations of sex, married, and number of children. Then I create group averages of tax rates ("taxmed" below) for these 22 observed combinations in dataset 1:

egen taxmed = median(tax) if !missing(tax), by(g)

After creating groups for dataset 2, I merge the datasets by group and assign the groups in dataset 2 the tax rate from dataset 1. Here is the problem: there might also be 22 uniquely observed combinations of the characteristics in dataset 2, but they might not be the SAME 22. For example, group 20 in dataset 1 might be married males with 3 children while group 20 in dataset 2 is married males with 1 child, and I do not want to assign these to the same tax group. The group() function simply numbers the unique combinations observed in each dataset. I am looking for a command or other method that creates a group number for every possible combination rather than every observed combination of these variables. Otherwise, the code becomes very long if I do this by hand:

gen g=.
replace g=1 if sex==0 & married==0 & children==0
replace g=2 if sex==0 & married==0 & children==1
replace g=3 if sex==0 & married==0 & children==2
replace g=4 if sex==0 & married==1 & children==0
... etc for every possible combination.

Is there a shorter version that is less prone to mistakes/accidental omission of a possible combination?

Thank you in advance for taking the time to read my question and for any advice you can give!

Best regards,
Cortnie
Carole J. Wilson

Join Date: Jan 2015

Posts: 932
#2

10 Mar 2016, 08:39

One way to do this is to create a string variable that serves as the group variable. You can merge on this variable, and it retains the unique properties of your groups.

Code:

gen str group=string(sex) + string(married) + string(children)
tab group
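The string key can then serve directly as the merge variable. The following is only a minimal sketch under stated assumptions: the toy data, the variable name key, the tempfile name taxgroups and the tax values are all made up for illustration. The underscore separator is an addition that is not in the code above; it keeps keys unambiguous if any variable can take multi-digit values.

```stata
* Toy stand-in for dataset 1 (has tax); all values are hypothetical
clear
input sex married children tax
0 0 0 0.10
0 0 1 0.08
0 1 0 0.12
1 1 3 0.20
1 1 3 0.24
end
* Build the key from the content of the characteristics
gen key = string(sex) + "_" + string(married) + "_" + string(children)
* Reduce to one row per key holding the median tax
collapse (median) taxmed = tax, by(key)
tempfile taxgroups
save `taxgroups'

* Toy stand-in for dataset 2 (no tax); all values are hypothetical
clear
input sex married children
0 0 1
1 1 3
1 0 2
end
gen key = string(sex) + "_" + string(married) + "_" + string(children)
* Match on the key; combinations absent from dataset 1 stay unmatched
merge m:1 key using `taxgroups', keep(master match)
list
```

Combinations that occur only in dataset 2 come out with _merge == 1 and a missing taxmed, so they are easy to spot rather than being matched to the wrong group.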

Stata/MP 14.1 (64-bit x86-64)
Revision 19 May 2016
Win 8.1
Nick Cox

Join Date: Mar 2014

Posts: 35724
#3

10 Mar 2016, 09:17

See also for a general discussion http://www.stata-journal.com/sjpdf.h...iclenum=dm0034

I tend to favour here

Code:

egen group=group(sex married children), label
tab group

as giving the best of all worlds, comprehensible labels, efficient storage and an underlying numeric variable, useful for many tasks and essential for some.
Cortnie Shupe

Join Date: Dec 2015

Posts: 11
#4

11 Mar 2016, 02:54

Dear Carole and Nick,

Thank you for your help! The string version worked! The regular command "egen group=group(sex married children), label" has the same problem I described above; it just adds a label for the combination of characteristics. When I merge the datasets on the group numbers, groups are still matched by their position in that dataset's list of unique combinations, not by their actual characteristics: the combination 000 is group 1 in the first dataset but does not exist in the second, so group 1 in the second dataset is 001, and the merge matches these two different groups together. The string variable, on the other hand, matches on the content of the characteristics in the group, which is exactly what I am looking for. Thank you both for your time and help!

Best,
Cortnie
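A numeric group ID can also be made consistent across both datasets if group() is run once on the pooled data rather than separately in each file. This is only a sketch, assuming the two datasets can be appended; the toy data, the indicator source, the variable g and the tempfile name d2 are hypothetical.

```stata
* Toy stand-in for dataset 2 (no tax); all values are hypothetical
clear
input sex married children
0 0 1
1 1 3
end
gen byte source = 2
tempfile d2
save `d2'

* Toy stand-in for dataset 1 (has tax); all values are hypothetical
clear
input sex married children tax
0 0 0 0.10
0 0 1 0.08
1 1 3 0.20
end
gen byte source = 1

* Pool both datasets, then number the groups once
append using `d2'
egen g = group(sex married children), label
* tax is missing for source 2, so only dataset 1 contributes to the
* median, but the result is filled in for every observation in the group
egen taxmed = median(tax), by(g)
list
```

Because the numbering is done on the combined data, a given value of g refers to the same combination of characteristics whichever dataset the observation came from.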