Help creating a large number of dummy variables

Greg Saldutte

Join Date: Dec 2017

Posts: 81
#1


29 Apr 2018, 22:03

Hello,

I have employment data on individual observations. There is a dummy variable indicating whether or not an observation was employed. There is also a variable indicating the industry in which an observation worked, taking values of 1-198.

I have created a dummy variable for each occupational industry using the following code:

Code:

tabulate occ1950, generate(docc)

This generated dummy variables docc1 through docc198.

I now want to create a dummy variable for whether or not an observation is employed in each of these 198 industries. I could repeat the following 198 times:

Code:

generate employed_occ1 = 1 if employed==1 & docc1==1

Doing this 198 times while incrementing the number would be rather cumbersome. It also seems that I can't use an asterisk in lieu of the increasing numbers at the end of the employed_occ and docc variable names. I would like to know if there is an efficient way to generate these variables. Following this, I will collapse to the count employed in each industry.

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17703

29 Apr 2018, 23:53

Greg:
are you looking for something along the following lines?:

Code:

. use "C:\Program Files (x86)\Stata15\ado\base\a\auto.dta"
(1978 Automobile Data)

. foreach var of varlist price-foreign  {
  2. g flag_`var'=1 if `var'!=.
  3.  }
(5 missing values generated)

. sum flag_*

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
  flag_price |         74           1           0          1          1
    flag_mpg |         74           1           0          1          1
  flag_rep78 |         69           1           0          1          1
flag_headr~m |         74           1           0          1          1
  flag_trunk |         74           1           0          1          1
-------------+---------------------------------------------------------
 flag_weight |         74           1           0          1          1
 flag_length |         74           1           0          1          1
   flag_turn |         74           1           0          1          1
flag_displ~t |         74           1           0          1          1
flag_gear_~o |         74           1           0          1          1
-------------+---------------------------------------------------------
flag_foreign |         74           1           0          1          1

Kind regards,
Carlo
(Stata 19.0)
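Applied to the variables in the original question, the same looping idea might look like the sketch below. It assumes that employed is coded 0/1 and that docc1 through docc198 already exist from the earlier tabulate call; it uses forvalues rather than foreach because the suffixes are consecutive integers. Like the flag_ variables above, the result is 1 or missing, not 0/1.

```stata
* Sketch only, not tested on the poster's data.
* Assumes: employed is 0/1, and docc1-docc198 were created by
*   tabulate occ1950, generate(docc)
forvalues i = 1/198 {
    * 1 if employed in industry `i', missing otherwise
    generate byte employed_occ`i' = 1 if employed == 1 & docc`i' == 1
}
```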


Greg Saldutte

Join Date: Dec 2017

Posts: 81
#3

30 Apr 2018, 00:01

Hi Carlo,
Yes, I am looking for something along those lines.
Thanks,
Greg
Nick Cox

Join Date: Mar 2014

Posts: 35651
#4

30 Apr 2018, 02:25

Just to note that indicator (dummy) variables that are 1 or missing are not very useful. Missings drop out of model fits anyway and what is left is a constant variable that will tell you nothing useful.

What is the purpose of 198 such variables? Factor variable notation usually makes such variables unnecessary.
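To illustrate the two points in this post, here is a small sketch using the auto data shipped with Stata (the same dataset as in #2). The variable good_foreign is a made-up name for illustration. A logical expression evaluates to 1 when true and 0 when false, so it yields a 0/1 indicator directly, and i.rep78 shows factor-variable notation standing in for hand-made dummies.

```stata
sysuse auto, clear

* 1-or-missing indicator: constant wherever it is not missing
generate flag_rep78 = 1 if rep78 != .

* 0/1 indicator (hypothetical name): 1 for foreign cars with rep78 of 4 or 5,
* 0 otherwise; the if qualifier is needed because missing counts as larger
* than any number, so rep78 >= 4 would be true for missing rep78
generate byte good_foreign = (foreign == 1 & rep78 >= 4) if !missing(rep78)

* factor-variable notation: no dummies for rep78 need to be created
regress price i.rep78
```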
Greg Saldutte

Join Date: Dec 2017

Posts: 81
#5

30 Apr 2018, 06:53

Thank you for the response. I want 198 such variables because I am eventually collapsing down to the year and state level. When I do that, I will want a count of employed people in all 198 industries in a given state and year.
Nick Cox

Join Date: Mar 2014

Posts: 35651
#6

30 Apr 2018, 08:43

Creating 198 such variables is in no sense needed for that -- and not even helpful. Some variant on contract or collapse, possibly with if and/or by(), is a simpler way forward.
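As a rough sketch of what that could look like: the variable names year and state are assumptions (the thread does not give them), employed is assumed to be coded 0/1, and ct_employed is the name used later in the thread. Either command reduces the data to one row per year, state and industry with a count of employed people, without any dummies.

```stata
* Sketch only; year and state are assumed variable names.
* preserve/restore keeps the individual-level data intact.
preserve
contract year state occ1950 if employed == 1, freq(ct_employed)
restore

* An equivalent count with collapse
preserve
collapse (count) ct_employed = employed if employed == 1, by(year state occ1950)
restore
```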
Greg Saldutte

Join Date: Dec 2017

Posts: 81
#7

30 Apr 2018, 09:39

Thank you for the response. I thought that in this case I could use neither contract nor collapse. There are a lot of variables that I am going to collapse to the year-state level. They include marriage status, the number of children in each household, the average number of children in each household, and whether the house is rented or owned. With the contract command, I was afraid that I would lose this information and end up only with combinations of the 198 occupation variables and employment status. I also did not want to collapse the data at this stage, because I eventually hope to collapse the aforementioned variables, plus others, to the year-state-county level.

Last edited by Greg Saldutte; 30 Apr 2018, 09:46.
Nick Cox

Join Date: Mar 2014

Posts: 35651
#8

30 Apr 2018, 09:46

There is no technical problem evident here, just your guesses and even fears about what is possible and not possible, and your intention to collapse or contract later rather than now. On the latter, fair enough: but the point about dummy variables remains. You've not explained why you think they are needed and so on the information given they're just a waste of your time.

I really would give more precise advice if I could but you're not making it possible yet.
Greg Saldutte

Join Date: Dec 2017

Posts: 81
#9

30 Apr 2018, 10:21

Thank you for your willingness to give precise advice. I think that dummy variables are needed because my intention is ultimately to collapse to count variables such as ct_married, ct_dwelling_own, ct_dwelling_rent, and ct_employed. To expand upon the ct_employed variable, I also wanted to collapse to ct_employed_occ1 through ct_employed_occ198, that is, the count employed in each of these industries at the year-state level. I thought that to do this, before collapsing, I needed to create a dummy variable indicating whether an observation was employed, and a dummy variable indicating the occupational industry of each observation (which is still given for unemployed observations). I thought that I then needed to create new dummy variables employed_occ1 through employed_occ198 in order to collapse them and end up with ct_employed_occ1 through ct_employed_occ198.
Nick Cox

Join Date: Mar 2014

Posts: 35651
#10

30 Apr 2018, 10:32

Again, not so. It may be more convenient to use collapse for some reductions, contract for others and then merge.

Both collapse and contract support if fully.

It seems that you want multiple different reduced datasets, but it may still be better just to create tabulations of subsets.
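One way the collapse, contract and merge route might be put together is sketched below. The variable names year, state and married are assumptions, married and employed are assumed to be coded 0/1, and the ct_ names follow #9. The reshape step is only needed if one column per industry is really wanted; columns appear only for industry codes that occur among the employed.

```stata
* Sketch only; year, state and married are assumed variable names.
tempfile occ

* Industry counts among the employed, one row per year-state-industry
preserve
contract year state occ1950 if employed == 1, freq(ct_employed_occ) nomiss
* One column per industry: ct_employed_occ1, ct_employed_occ2, ...
reshape wide ct_employed_occ, i(year state) j(occ1950)
save `occ'
restore

* Other year-state counts: the sum of a 0/1 variable is a count
collapse (sum) ct_married = married ct_employed = employed, by(year state)

* Combine the two reduced datasets
merge 1:1 year state using `occ', nogenerate

* A state-year with nobody employed in an industry has a count of 0
foreach v of varlist ct_employed_occ* {
    replace `v' = 0 if missing(`v')
}
```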
Greg Saldutte

Join Date: Dec 2017

Posts: 81
#11

30 Apr 2018, 11:47

Thank you for your help, Nick.