Need Help Using Share of Participants Instead of Mean Using cmogram command

Jason Browen

Join Date: Oct 2019

Posts: 82
#1

13 Oct 2020, 11:50

Dear all,

I am using Stata 16 on a Mac and have provided a sample of my dataset, produced with dataex, at the bottom of this post. I would like to create a regression discontinuity graph using the cmogram command, with the share of black respondents on the y-axis and the variable age_decimal on the x-axis; age_decimal represents a person's age in months/30. But every time I run the code, the y-axis shows the mean rather than the number of black respondents out of the total number of participants. Here is the code I used:

generate Black=1 if Race==1
cmogram Black age_decimal, scatter cut(21) lineat(21) title("Share of Black Respondents by Age")

I am not sure how to solve this issue.

Thank you in advance,
Jason Browen

input int(PUBID_1997 year) float(Race age_decimal)
1 2000 4 19.25
1 2001 4 20.25
1 2002 4 21.33
1 2003 4 22.17
4 1999 2 19
4 2000 2 19.83
4 2001 2 20.92
4 2002 2 21.83
4 2003 2 23
5 2001 2 19.5
5 2002 2 20.25
5 2003 2 21.25
5 2004 2 22.17
6 2000 2 19.08
6 2001 2 20.17
6 2002 2 20.92
6 2003 2 22.17
9 2001 4 19.33
9 2002 4 20.33
9 2003 4 21.17
9 2004 4 22.17
10 2003 4 19.75
10 2004 4 20.75
10 2005 4 21.75
10 2006 4 22.67
11 2001 2 19.58
11 2002 2 20.58
11 2003 2 21.42
11 2004 2 22.75
13 2003 2 19
13 2004 2 20
13 2005 2 21.08
13 2006 2 22.08
16 2001 2 20.08
16 2002 2 20.92
16 2003 2 21.75
16 2004 2 22.75
21 2001 2 19.42
21 2002 2 20.5
21 2003 2 21.25
21 2004 2 22.25
23 2002 2 20
23 2003 2 21
23 2004 2 21.92
23 2005 2 22.83
24 2003 2 19.5
24 2004 2 20.75
24 2005 2 21.5
24 2006 2 22.67
25 2002 2 19.67
25 2003 2 21.17
25 2004 2 21.75
25 2005 2 22.67
26 1999 1 19.25
26 2000 1 20.5
26 2001 1 21.17
26 2002 1 22.42
28 2002 1 19.08
28 2003 1 20.08
28 2004 1 21.08
28 2005 1 22.08
31 2001 4 19.5
31 2002 4 20.42
31 2003 4 21.5
31 2004 4 22.33
33 1999 4 19
33 2000 4 20
33 2001 4 21.08
33 2002 4 22.08
33 2003 4 22.92
34 2003 4 19.08
34 2004 4 19.92
34 2005 4 20.92
34 2006 4 21.92
34 2007 4 22.83
35 2002 4 19.92
35 2003 4 20.83
35 2004 4 22
35 2005 4 22.83
36 1999 4 19.5
36 2000 4 20.42
36 2001 4 21.42
36 2002 4 22.42
37 1999 4 19.25
37 2000 4 20.25
37 2001 4 21.08
37 2002 4 22.17
38 2000 4 19.42
38 2001 4 20.25
38 2002 4 21.33
38 2003 4 22.42
39 1998 2 19.08
39 1999 2 19.75
39 2000 2 20.92
39 2001 2 21.92
39 2002 2 22.92
41 2002 4 19.83
41 2003 4 20.92
41 2004 4 22.08
41 2005 4 22.83
end
label values PUBID_1997 vlR0000100
label def vlR0000100 1 "1 TO 999", modify
Nick Cox

Join Date: Mar 2014

Posts: 35612
#2

13 Oct 2020, 12:20

Thanks for the data example. cmogram is from SSC, as you are asked to explain.

Your generate command generates values that are 1 or missing, so the average can only ever be reported as 1. No part of your code has an effect or side-effect of producing 0.

You presumably want something more like this:

Code:

generate Black= Race==1 if Race < .
cmogram Black age_decimal, scatter cut(21) lineat(21) title("Share of Black Respondents by Age")

For more on how to generate indicators see

https://www.stata.com/support/faqs/d...mmy-variables/ [executive summary]

https://www.stata-journal.com/articl...article=dm0099 [discursive tutorial, with pitfalls highlighted]
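
To make the difference between the two generate statements concrete, here is a minimal sketch on four made-up values of Race. The names black_bad and black_ok are illustrative only and do not appear in the original posts.

```stata
clear
input float Race
1
2
4
.
end

* 1 or missing: the only nonmissing value is 1, so the mean must be 1
generate byte black_bad = 1 if Race == 1

* 1 if Race is 1, 0 for any other nonmissing Race, missing if Race is missing
generate byte black_ok = Race == 1 if !missing(Race)

summarize black_bad black_ok
```

Only the second variable contains zeros, so only its mean can be read as a share.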
Jason Browen

Join Date: Oct 2019

Posts: 82
#3

13 Oct 2020, 12:44

Thank you Nick! I have another question.

Using the same dataset that I provided, I ran

cmogram Bla age_decimal, scatter cut(21) lineat(21) title("Share of Black Respondents by Age") fraction

to get the graph below. Bla was generated using the code you provided:

generate Bla= Race==1 if Race < .

I am trying to figure out why there are so many observations at 0 in the graph. Do you know of a way to get rid of them? Unfortunately, the y-axis values are off as well.

With the second command you gave,

cmogram Black age_decimal, scatter cut(21) lineat(21) title("Share of Black Respondents by Age")

the graph has exactly the y-axis values that I need! The only downside is that it shows a slight increase instead of a decrease before and after the cutoff point.

Thank you so much,
Jason Browen
Nick Cox

Join Date: Mar 2014

Posts: 35612
#4

13 Oct 2020, 13:12

Your thinking is the wrong way round, I have to suggest.

I've not used cmogram hitherto, and not looked inside the code, but it's a fair guess that it's reporting zero as an average if and only if all the values in a bin are zero. How could it be otherwise? These are not necessarily mistakes to be ignored.

Even looking at your data example, which may well be a small fraction of the total dataset, it's manifest that the fractional parts are just a consequence of the raw data being years and months.

The possible months are 0 to 11, giving fractional parts as below:

Code:

. mata : strofreal((0::11) / 12, "%3.2f")
           1
     +--------+
   1 |  0.00  |
   2 |  0.08  |
   3 |  0.17  |
   4 |  0.25  |
   5 |  0.33  |
   6 |  0.42  |
   7 |  0.50  |
   8 |  0.58  |
   9 |  0.67  |
  10 |  0.75  |
  11 |  0.83  |
  12 |  0.92  |
     +--------+

so it's quite possible that a bin width that isn't a multiple of month may cause puzzling artefacts.
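
One way to check whether binning is behind the artefacts is to build the bins by hand so that every bin spans a whole number of months. The following is only a sketch, not what cmogram does internally: the 3-month width, the variable names age_months, age_bin, share_black, n_bin and first, and the handful of rows copied from the data example are all choices made for illustration.

```stata
clear
input float(Race age_decimal)
1 19.25
1 20.5
1 21.17
1 22.42
4 19.25
4 20.25
4 21.33
4 22.17
end

generate byte Black = Race == 1 if !missing(Race)

* recover whole months from the rounded decimal ages
generate int age_months = round(age_decimal * 12)

* 3-month bins, labelled by their lower limit in years; 21 is a bin edge
generate double age_bin = 3 * floor(age_months / 3) / 12

* share of black respondents and number of observations in each bin
egen double share_black = mean(Black), by(age_bin)
egen int n_bin = count(Black), by(age_bin)
egen byte first = tag(age_bin)

scatter share_black age_bin if first, xline(21) ///
    ytitle("Share of black respondents") xtitle("Age (years)")
```

Because 21 years is 252 months and 252 is a multiple of 3, no bin straddles the cutoff. Listing n_bin alongside share_black shows how many observations lie behind each plotted point.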

Easier to understand: in your sample dataset, only 8 observations out of 100 are for black respondents, so the average is close to 0 anyway, and it's hardly surprising that some bins may report all zeros. In your fuller dataset the average appears to be closer to 2%. Two effects push against each other. A bigger dataset means more in each bin, but a smaller average makes it more likely that bins are populated only by 0s.
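
A rough way to put numbers on those two effects: if each observation in a bin were independently black with probability p (an idealization, not a claim about these data), a bin of n observations would contain only zeros with probability (1 - p)^n. The bin sizes 5 and 50 below are made up for illustration.

```stata
* chance that a bin is all zeros under independence: (1 - p)^n
display %5.3f (1 - 0.08)^5     // p = 8%, 5 observations per bin
display %5.3f (1 - 0.02)^50    // p = 2%, 50 observations per bin
```

These come to about 0.66 and 0.36, so even with ten times as many observations per bin, an all-zero bin would remain quite common at a 2% share.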

In a nutshell: this display isn't smoothing enough. In fact, it's hard to know why the proportion of black people should depend on age at all within such a narrow window. But there may well be interesting or revealing context here that you are not telling us.

Code:

tab age_decimal, su(Black)

is evidence that will be diagnostic.
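
If the table is too long to scan by eye, the same information can be reduced to a small dataset and counted. This is a sketch on a few rows copied from the data example; the names share_black and n are illustrative, and collapse replaces the data in memory, so wrap it in preserve and restore when working on the real dataset.

```stata
clear
input float(Race age_decimal)
1 19.25
4 19.25
4 20.25
2 20.25
1 21.17
4 21.17
2 22.17
4 22.17
end

generate byte Black = Race == 1 if !missing(Race)

* one row per distinct age: share of black respondents and number of observations
collapse (mean) share_black = Black (count) n = Black, by(age_decimal)
list, noobs

* how many distinct ages have no black respondents at all
count if share_black == 0
```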

As for whether the curve is what you want or what you expect: sorry, I have zero insight on that.