Sorting a variable by two dummy variables

Mary Brown

Join Date: Mar 2019

Posts: 5
#1


04 Mar 2019, 05:38

I am working on a paper about discrimination in the labour market. I have three variables (among many others): race (black/white), callback and advertisement. Race and callback are dummies; the ad variable assigns a number to each job advertisement, from 1 to 1200.

The dataset looks like this:

race call ad
b 0 1
w 0 1
b 0 1
w 0 1

which means that for ad 1, nobody got called back.

Then I have

race call ad
w 0 216
b 1 216
b 1 216
w 0 216

which means that for ad 216, 2 African-Americans got called back and 0 Whites,

or

race call ad
b 0 376
w 1 376
b 0 376
w 1 376

which means that for ad 376, 2 Whites got called back and no African-Americans.

I need to know the percentage of ads that favored African-Americans (like ad 216), treated the candidates equally (like ad 1), or favored Whites (like other ads where 1 White got called back and no African-Americans). So I need to break down the advertisement variable to know how many ads called back 0 Whites and 0 Blacks, or 1 Black and 1 White, or 2 Whites and 0 Blacks, and so on.

I am relatively new to Stata. I tried the summarize command, unsuccessfully:

sum ad if race=="w" & call==0 & race=="b" & call==1

says "no observations".

I then tried tabulate race call adid, row, which says "too many variables specified", so I switched to bysort ad: summarize race call, yet I got no results.
Nick Cox

Join Date: Mar 2014

Posts: 35783
#2

04 Mar 2019, 06:08

Code:

sum ad if race=="w" & call==0 & race=="b" & call==1

asks Stata to work on observations for which race is both white and black, and call is both 0 and 1, in the same observation. That is like asking for cars that are both foreign and domestic: no car can be both. That explains the message you report.

tabulate allows only one or two variables. See its help.

It seems to me that you don't want the & operator there at all, as it yields no observations.
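
To make that concrete, here is a minimal sketch that uses only the four observations for ad 216 from post #1. It shows that each condition can be satisfied on its own, but never all four in the same observation.

```stata
* sandbox data: the four observations for ad 216 from post #1
clear
input str1 race call ad
w 0 216
b 1 216
b 1 216
w 0 216
end

* impossible: no single observation is both "w" and "b"
count if race=="w" & call==0 & race=="b" & call==1

* each pair of conditions on its own is satisfiable
count if race=="w" & call==0
count if race=="b" & call==1
```

The first count reports 0 and the other two report 2 each, so the conditions have to be evaluated separately (or combined across observations within an ad) rather than joined with & in one if qualifier.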

You can count black and white calls with

Code:

egen n_calls = total(call), by(ad race)

and ensure that you only look at one observation per advertisement and race by

Code:

egen tag = tag(ad race)

after that

Code:

tab ad race [fw=n_calls] if tag

would seem a step in the right direction. Presumably if you have different numbers of black and white people you're going to adjust for that. One way to do that would be

Code:

egen mean_call = mean(call), by(ad race)

and indeed

Code:

tabulate ad race, summarize(call)

gets you there directly.
Mary Brown

Join Date: Mar 2019

Posts: 5
#3

04 Mar 2019, 12:43

Thank you very much. I've used the code you wrote but unfortunately when I reached the last step and used tabulate ad race, summarize(call)

it said

too many values

Nick Cox

Join Date: Mar 2014

Posts: 35783
#4

05 Mar 2019, 00:13

Evidently you have too many ads to tabulate.

Thinking about this again:

1. How are you going to model this? I am no expert here and others do have more authority and may have better advice, but with thousands of ads, I wonder about xtlogit.

2. For a descriptive analysis I would produce a reduced dataset.

Putting all that together, and leaving interpretation of the model an open question (your data are just a minimal sandbox in any case), that suggests

Code:

clear 
input str1 race call ad
b 0 1
w 0 1
b 0 1
w 0 1
w 0 216
b 1 216
b 1 216
w 0 216
b 0 376
w 1 376
b 0 376
w 1 376
end 

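* numeric copy of race, needed for factor-variable notation
encode race, gen(nrace) 
* declare ad as the panel (group) identifier
xtset ad 
xtlogit call i.nrace 

* reduce to one observation per race-call-ad combination, with counts in _freq
contract race call ad 
* spread the counts across columns: first by call (0/1), then by race (b/w)
reshape wide _freq, i(ad race) j(call) 
reshape wide _freq0 _freq1, i(ad) j(race) string 
* combinations that never occur come out missing; recode them to 0
mvencode *, mv(0) 

* proportion of applicants of each race called back, per ad
gen pcall_black = _freq1b / (_freq1b + _freq0b) 
gen pcall_white = _freq1w / (_freq1w + _freq0w)  

list 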

     +-------------------------------------------------------------------+
     |  ad   _freq0b   _freq1b   _freq0w   _freq1w   pcall_~k   pcall_~e |
     |-------------------------------------------------------------------|
  1. |   1         2         0         2         0          0          0 |
  2. | 216         0         2         2         0          1          0 |
  3. | 376         2         0         0         2          0          1 |
     +-------------------------------------------------------------------+
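
From a reduced dataset like this, each ad can be classified by which group received more callbacks. The following is only a sketch: it assumes the reduced dataset above is still in memory, and the variable name favored and its value labels are made up for illustration. It compares raw callback counts, which is reasonable only if each ad received the same number of black and white applications (as in the sandbox data); otherwise pcall_black and pcall_white could be compared instead.

```stata
* assumes one observation per ad, with callbacks to black and white
* applicants in _freq1b and _freq1w (no missing values after mvencode)
* favored is a hypothetical name: 0 = equal, 1 = more white, 2 = more black
gen byte favored = cond(_freq1w > _freq1b, 1, cond(_freq1b > _freq1w, 2, 0))
label define favored 0 "equal treatment" 1 "favors white" 2 "favors black"
label values favored favored

* one-way table: number and percentage of ads in each category
tabulate favored
```

Because the table has only three rows however many ads there are, it does not run into the "too many values" limit that a per-ad table hits.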


Mary Brown

Join Date: Mar 2019

Posts: 5
#5

05 Mar 2019, 13:13

I am actually working on a replication, so I'm going to stick to the model they used, a probit regression (which I have used for other data I am replicating).

The table I am working on is not supposed to be a regression, it's merely the percentage of callbacks that favored whites or blacks.

Your code works, but unfortunately I still don't have the numbers I need, and I still need to include all the 1200+ ads in the sample. I doubt the authors of the original paper went through every ad manually.
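
One possible way to get such a table for all ads at once, without inspecting any ad individually, is to count callbacks per ad for each race and then cross-tabulate the two counts using one observation per ad. This is a minimal sketch on the sandbox data from this thread, not the original authors' method; the names calls_w, calls_b and one_per_ad are made up for illustration.

```stata
* sandbox data from this thread
clear
input str1 race call ad
b 0 1
w 0 1
b 0 1
w 0 1
w 0 216
b 1 216
b 1 216
w 0 216
b 0 376
w 1 376
b 0 376
w 1 376
end

* callbacks per ad, separately for white and black applicants
egen calls_w = total(call * (race == "w")), by(ad)
egen calls_b = total(call * (race == "b")), by(ad)

* flag exactly one observation per ad so that each ad is counted once
egen one_per_ad = tag(ad)

* joint distribution of callbacks across ads, with cell percentages
tabulate calls_w calls_b if one_per_ad, cell
```

Each cell then gives the number and percentage of ads with that combination of white and black callbacks, for example 0 white and 0 black, or 2 white and 0 black. The table stays small because its dimensions depend on the number of callbacks per ad, not on the number of ads.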
