  • Need Help Using Share of Participants Instead of Mean Using cmogram command

    Dear all,

    I am using Stata 16 on a Mac and have provided a sample of my dataset using dataex at the bottom of this post. I would like to create a regression discontinuity graph using the cmogram command, with the share of black respondents on the y-axis and the variable age_decimal on the x-axis, which represents a person's age in months/30. But every time I run the code, the y-axis shows the mean instead of how many respondents were black out of the total number of participants. Here is the code I used:

    generate Black=1 if Race==1
    cmogram Black age_decimal, scatter cut(21) lineat(21) title("Share of Black Respondents by Age")

    I am not sure how to solve this issue.

    Thank you in advance,
    Jason Browen

    input int(PUBID_1997 year) float(Race age_decimal)
    1 2000 4 19.25
    1 2001 4 20.25
    1 2002 4 21.33
    1 2003 4 22.17
    4 1999 2 19
    4 2000 2 19.83
    4 2001 2 20.92
    4 2002 2 21.83
    4 2003 2 23
    5 2001 2 19.5
    5 2002 2 20.25
    5 2003 2 21.25
    5 2004 2 22.17
    6 2000 2 19.08
    6 2001 2 20.17
    6 2002 2 20.92
    6 2003 2 22.17
    9 2001 4 19.33
    9 2002 4 20.33
    9 2003 4 21.17
    9 2004 4 22.17
    10 2003 4 19.75
    10 2004 4 20.75
    10 2005 4 21.75
    10 2006 4 22.67
    11 2001 2 19.58
    11 2002 2 20.58
    11 2003 2 21.42
    11 2004 2 22.75
    13 2003 2 19
    13 2004 2 20
    13 2005 2 21.08
    13 2006 2 22.08
    16 2001 2 20.08
    16 2002 2 20.92
    16 2003 2 21.75
    16 2004 2 22.75
    21 2001 2 19.42
    21 2002 2 20.5
    21 2003 2 21.25
    21 2004 2 22.25
    23 2002 2 20
    23 2003 2 21
    23 2004 2 21.92
    23 2005 2 22.83
    24 2003 2 19.5
    24 2004 2 20.75
    24 2005 2 21.5
    24 2006 2 22.67
    25 2002 2 19.67
    25 2003 2 21.17
    25 2004 2 21.75
    25 2005 2 22.67
    26 1999 1 19.25
    26 2000 1 20.5
    26 2001 1 21.17
    26 2002 1 22.42
    28 2002 1 19.08
    28 2003 1 20.08
    28 2004 1 21.08
    28 2005 1 22.08
    31 2001 4 19.5
    31 2002 4 20.42
    31 2003 4 21.5
    31 2004 4 22.33
    33 1999 4 19
    33 2000 4 20
    33 2001 4 21.08
    33 2002 4 22.08
    33 2003 4 22.92
    34 2003 4 19.08
    34 2004 4 19.92
    34 2005 4 20.92
    34 2006 4 21.92
    34 2007 4 22.83
    35 2002 4 19.92
    35 2003 4 20.83
    35 2004 4 22
    35 2005 4 22.83
    36 1999 4 19.5
    36 2000 4 20.42
    36 2001 4 21.42
    36 2002 4 22.42
    37 1999 4 19.25
    37 2000 4 20.25
    37 2001 4 21.08
    37 2002 4 22.17
    38 2000 4 19.42
    38 2001 4 20.25
    38 2002 4 21.33
    38 2003 4 22.42
    39 1998 2 19.08
    39 1999 2 19.75
    39 2000 2 20.92
    39 2001 2 21.92
    39 2002 2 22.92
    41 2002 4 19.83
    41 2003 4 20.92
    41 2004 4 22.08
    41 2005 4 22.83
    end
    label values PUBID_1997 vlR0000100
    label def vlR0000100 1 "1 TO 999", modify




  • #2
    Thanks for the data example. cmogram is from SSC, as you are asked to explain.

    Your generate command generates values that are 1 or missing, so the average can only ever be reported as 1. No part of your code has an effect or side-effect of producing 0.


    You presumably want something more like this:

    Code:
    generate Black= Race==1 if Race < . 
    cmogram Black age_decimal, scatter cut(21) lineat(21) title("Share of Black Respondents by Age")
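
    A minimal sketch of the difference between the two ways of creating the indicator, assuming the data example above is in memory. The variable names Black_old and Black_new are illustrative only and not part of the original code:

    Code:
    * 1 for black respondents, missing for everyone else
    generate byte Black_old = 1 if Race == 1
    * 1 for black respondents, 0 for everyone else, missing only if Race is missing
    generate byte Black_new = Race == 1 if Race < .
    * cross-tabulate the two versions, showing missing values
    tabulate Black_old Black_new, missing
    * compare the number of observations and the mean of each version
    summarize Black_old Black_new

    In the data example, Black_old is nonmissing for only 8 observations and its mean is 1, while Black_new is nonmissing for all 100 observations and its mean is 0.08, which is the share.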

    For more on how to generate indicators see

    https://www.stata.com/support/faqs/d...mmy-variables/ [executive summary]

    https://www.stata-journal.com/articl...article=dm0099 [discursive tutorial, with pitfalls highlighted]



    • #3
      Thank you Nick! I have another question.

      Using the same dataset that I provided, I ran this command to get the graph below:

      cmogram Bla age_decimal, scatter cut(21) lineat(21) title("Share of Black Respondents by Age") fraction

      Bla was generated using the code you provided:

      generate Bla= Race==1 if Race < .

      I am trying to figure out why there are a bunch of observations at 0 in the graph. Do you know, by chance, of a way to get rid of these? Unfortunately, the y-axis values are off as well.

      With the second command you used:

      cmogram Black age_decimal, scatter cut(21) lineat(21) title("Share of Black Respondents by Age")

      the graph has the exact y-axis values that I need! The only downside is that it shows a slight increase instead of a decrease before and after the cutoff point.

      Thank you so much ~ Jason Browen

      [Attached image: Screen Shot 2020-10-13 at 11.33.28 AM.png]


      • #4
        Your thinking is the wrong way round, I have to suggest.

        I've not used cmogram hitherto, and not looked inside the code, but it's a fair guess that it's reporting zero as an average if and only if all the values in a bin are zero. How could it be otherwise? These are not necessarily mistakes to be ignored.

        Even looking at your data example, which may well be a small fraction of the total dataset, it's manifest that the fractional parts are just a consequence of the raw data being years and months.

        The possible months are 0 to 11, giving fractional parts as below:

        Code:
        . mata : strofreal((0::11) / 12, "%3.2f")
                   1
             +--------+
           1 |  0.00  |
           2 |  0.08  |
           3 |  0.17  |
           4 |  0.25  |
           5 |  0.33  |
           6 |  0.42  |
           7 |  0.50  |
           8 |  0.58  |
           9 |  0.67  |
          10 |  0.75  |
          11 |  0.83  |
          12 |  0.92  |
             +--------+
        so it's quite possible that a bin width that isn't a multiple of month may cause puzzling artefacts.
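
        A sketch of how the months could be recovered, on the assumption that age_decimal really is years plus months/12 rounded to two decimal places (that is a reading of the data example, not something the documentation confirms). The names age_months and month_part are illustrative:

        Code:
        * age in whole months, e.g. 19.25 -> 231
        generate int age_months = round(12 * age_decimal)
        * month within the year of age, 0 to 11
        generate byte month_part = mod(age_months, 12)
        * check that only the values 0 to 11 occur
        tabulate month_part

        Bins defined on age_months (say, one bin per month or per quarter) would then contain whole months only.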

        Easier to understand: In your sample dataset, only 8 of the 100 observations are for black people, so the average is close to 0 anyway, and it's hardly surprising that some bins may report all zeros. In your fuller dataset the average appears to be closer to 2%. Two effects push against each other. A bigger dataset means more in each bin, but a smaller average makes it more likely that bins are populated only by 0s.
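
        To put a rough number on that trade-off: if each of n observations in a bin were independently black with probability p, the chance of the bin holding only zeros would be (1 - p)^n. Independence is an assumption here, and a doubtful one with repeated observations on the same people. The bin sizes 5, 20 and 50 are made up for illustration:

        Code:
        * chance of an all-zero bin for p = 0.08 and p = 0.02 at assumed bin sizes
        foreach n in 5 20 50 {
            display "n = `n': " %5.3f (1 - 0.08)^`n' "  vs  " %5.3f (1 - 0.02)^`n'
        }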

        In a nutshell: this display isn't smoothing enough. In fact, it's hard to know why the proportion of black people should depend on age at all within such a narrow window. But there may well be interesting or revealing context here that you are not telling us.

        Code:
        tab age_decimal, su(Black)
        is evidence that will be diagnostic.
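
        If more smoothing is wanted, one possibility is to compute the share in wider bins directly. This is only a sketch: it assumes the 0/1 indicator Black from #2 exists, the half-year bin width is arbitrary, and age_bin, share_black and first are illustrative names:

        Code:
        * half-year bins that line up with the cutoff at 21
        generate age_bin = floor(2 * age_decimal) / 2
        * share of black respondents within each bin
        egen share_black = mean(Black), by(age_bin)
        * bin sizes and means as a table
        tabulate age_bin, summarize(Black)
        * plot one point per bin, with a line at the cutoff
        egen first = tag(age_bin)
        twoway scatter share_black age_bin if first, xline(21)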

        As for whether the curve is what you want or what you expect: sorry, I have zero insight on that.
