
  • Graphing quintiles and medians by state using "twoway connected"

    Hi Statalist,

    I'm using Stata 12 and trying to make a graph that shows the variation in wage gaps within and across states. I want to show how many sub-state geographic regions fall in each quintile (not quartile) within each state, as well as the state's median wage gap. I want each state on the y-axis and the wage gap on the x-axis, with the states listed from largest to smallest median wage gap. For each state, I want a horizontal line spanning the range of wage gaps within the state, and that line should change color for each quintile. Note: some states have fewer than 5 sub-state geographic regions, and it is okay if those states show fewer than 5 colors.

    The following is a sample of my data (in long format; strank shows the labels of the ranking. NM=5, CA=6, HI=7):

    strank level femwgcoefst

    NM amin -0.247207195
    NM bp20 -0.1900343
    NM cp40 -0.153927803
    NM dmedian -0.145714894
    NM ep60 -0.145714894
    NM fp80 -0.145714894
    NM gmax -0.122379899
    CA amin -0.264407486
    CA bp20 -0.160001606
    CA cp40 -0.149955705
    CA dmedian -0.146284893
    CA ep60 -0.134702295
    CA fp80 -0.134702295
    CA gmax -0.035036702
    HI amin -0.152224705
    HI bp20 -0.147228003
    HI cp40 -0.147228003
    HI dmedian -0.147228003
    HI ep60 -0.147228003
    HI fp80 -0.147228003
    HI gmax -0.092804603

    ____________________________


    Here is the code I have been using:
    Code:
    sort strank level
    
    twoway connected strank femwgcoefst if (level=="amin" | level=="bp20"), ///
        msize(0) c(L) lcolor(purple) lwidth(thick) ///
        ylab(1(1)51,valuelabel labsize(tiny) alternate tposition(inside)) ///
        ytitle("State", size(vsmall) margin(right)) ysize(9) ///
        xlab(,labsize(tiny)) xsize(6.5) ///
        xtitle("Wage Gap (Negative numbers indicate lower than expected wages)",size(vsmall)) ///
        legend(size(tiny) order(1 "0th to 20th Percentile" ///
        2 "20th to 40th Percentile" 3 "40th to 60th Percentile" ///
        4 "60th to 80th Percentile" 5 "80th to 99th Percentile" 6 "Median" ) cols(3) rows(2)) ///    
        title("Female Wage Gap Variation by State")     ///    
        || connected strank femwgcoefst if (level=="bp20" | level=="cp40"), ///
            msize(0) c(L) lcolor(blue) lwidth(thick)  ///
        || connected strank femwgcoefst if (level=="cp40" | level=="ep60"), ///
            msize(0) c(L) lcolor(green) lwidth(thick)  ///
        || connected strank femwgcoefst if (level=="ep60" | level=="fp80"), ///
            msize(0) c(L) lcolor(orange) lwidth(thick)  ///    
        || connected strank femwgcoefst if (level=="fp80" | level=="gmax"), ///
            msize(0) c(L) lcolor(red) lwidth(thick)  ///
        || scatter strank femwgcoefst if level=="dmedian", msize(small) ///
            mcolor(black)
    _______________________

    This is very close to working, but lines are sometimes drawn diagonally between adjacent states (see attachment). I have found that this only occurs when the next observation to be plotted has a more positive value for femwgcoefst. For example, looking at the first plot [twoway connected strank femwgcoefst if (level=="amin" | level=="bp20")], the line for NM does not connect to the line for CA because the row for CA-amin has a more negative value for femwgcoefst than NM-bp20 has for femwgcoefst (-0.264407486 < -0.1900343). The diagonal line problem does occur between CA and HI: the row for HI-amin has a more positive value for femwgcoefst than CA-bp20 (-0.152224705 > -0.160001606). I can't think of an if statement to eliminate this problem. Does anyone have a solution or alternatives I could use?

    Best,

    Karen M. Brummond
    Doctoral Student and Research Assistant
    University of Massachusetts - Amherst
    [email protected]

  • #2
    You're already using the main trick, c(L) rather than c(l). You just need to work harder at the sort order to do it this way.

    There is another way to do it: use graph bar with some invisible bars and over(state). That will stop all connections between states.

    On a different note, recall that many readers have difficulty in distinguishing between red and green.
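
    A further option, not proposed in this thread, is to avoid connecting points at all: reshape the data to one row per state and draw each quintile segment with twoway rspike and its horizontal option, so nothing can be drawn between states. The sketch below is only an illustration under assumptions: it re-enters the three states from post #1, renames femwgcoefst to the hypothetical stub gap, assumes strank is numeric with value labels, and swaps green for gs8 so that red and green do not appear together.

    ```stata
    * Sketch only: the three states from post #1, in long format
    clear
    input byte strank str7 level double femwgcoefst
    5 amin    -0.247207195
    5 bp20    -0.1900343
    5 cp40    -0.153927803
    5 dmedian -0.145714894
    5 ep60    -0.145714894
    5 fp80    -0.145714894
    5 gmax    -0.122379899
    6 amin    -0.264407486
    6 bp20    -0.160001606
    6 cp40    -0.149955705
    6 dmedian -0.146284893
    6 ep60    -0.134702295
    6 fp80    -0.134702295
    6 gmax    -0.035036702
    7 amin    -0.152224705
    7 bp20    -0.147228003
    7 cp40    -0.147228003
    7 dmedian -0.147228003
    7 ep60    -0.147228003
    7 fp80    -0.147228003
    7 gmax    -0.092804603
    end
    label define stlbl 5 "NM" 6 "CA" 7 "HI"   // hypothetical label name
    label values strank stlbl

    * One row per state, one variable per cut point (gapamin, gapbp20, ...)
    rename femwgcoefst gap
    reshape wide gap, i(strank) j(level) string

    * Each rspike runs from one cut point to the next within a single state,
    * so no segment can cross to a neighbouring state
    twoway rspike gapamin gapbp20 strank, horizontal lcolor(purple) lwidth(thick) ///
        || rspike gapbp20 gapcp40 strank, horizontal lcolor(blue) lwidth(thick) ///
        || rspike gapcp40 gapep60 strank, horizontal lcolor(gs8) lwidth(thick) ///
        || rspike gapep60 gapfp80 strank, horizontal lcolor(orange) lwidth(thick) ///
        || rspike gapfp80 gapgmax strank, horizontal lcolor(red) lwidth(thick) ///
        || scatter strank gapdmedian, msize(small) mcolor(black) ///
        ylabel(5(1)7, valuelabel angle(horizontal)) ytitle("State") ///
        xtitle("Wage gap") ///
        legend(order(1 "0th to 20th" 2 "20th to 40th" 3 "40th to 60th" ///
        4 "60th to 80th" 5 "80th to max" 6 "Median") cols(3))
    ```

    Where two adjacent cut points are equal (as for HI), the spike has zero length and that color simply does not appear, which matches the requirement that small states may show fewer than 5 colors. Because this stays within twoway, the median dots can still be overlaid.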

    • #3
      I'll try the graph bar method, and if that doesn't work, I'll get back to you. Thank you!

      • #4
        I'm seeing a few issues with the graph bar method.
        1) I don't see an option for overlaying another plot so I can mark the state medians with dots. I tried "||" and addplot() and the error message "option not allowed" came back.
        2) I have to make the same style of graph for some race data, and there the range of the wage gaps crosses the zero line. I can't think of how a bar graph can represent that.
        3) I'm trying to get all of the quintiles on a single line. The only option I see that can make the bars all overlap is stack, but that of course adds up the values, which is not what I want. I might be able to do this with some newly constructed variables and the stack option. I'm yet to try that.

        Any thoughts?

        • #5
          1) I don't see an option for overlaying another plot so I can mark the state medians with dots. I tried "||" and addplot() and the error message "option not allowed" came back.

          NJC: Correct. Medians as dots are a casualty of this approach. addplot() is only an option for twoway. Conversely, graph dot could be a serious competitor for showing your quintiles directly.

          2) I have to make the same style of graph for some race data, and there the range of the wage gaps crosses the zero line. I can't think of how a bar graph can represent that.

          NJC: graph bar has no problem with negative values.

          Dopey illustration:

          Code:
          sysuse auto
          gen foo = runiform() - 0.5
          graph bar (asis) foo if foreign, over(make)
          graph hbar (asis) foo if foreign, over(make)

          3) I'm trying to get all of the quintiles on a single line. The only option I see that can make the bars all overlap is stack, but that of course adds up the values, which is not what I want. I might be able to do this with some newly constructed variables and the stack option. I'm yet to try that.

          NJC: Yes, that's right. If you choose bars, what is to be added has to be reverse engineered from what stack will produce. But see my first comment again.
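
          The graph dot suggestion under point 1 could look roughly like the following. This is a minimal sketch under assumptions, not a tested solution for the full data: it enters the three states from post #1 directly in wide format, with hypothetical variable names gapamin through gapgmax, and shows each cut point as a marker on one line per state, with the median picked out as a black diamond.

          ```stata
          * Sketch only: one row per state, one variable per cut point
          clear
          input byte strank double(gapamin gapbp20 gapcp40 gapdmedian gapep60 gapfp80 gapgmax)
          5 -0.247207195 -0.1900343   -0.153927803 -0.145714894 -0.145714894 -0.145714894 -0.122379899
          6 -0.264407486 -0.160001606 -0.149955705 -0.146284893 -0.134702295 -0.134702295 -0.035036702
          7 -0.152224705 -0.147228003 -0.147228003 -0.147228003 -0.147228003 -0.147228003 -0.092804603
          end
          label define stlbl 5 "NM" 6 "CA" 7 "HI"   // hypothetical label name
          label values strank stlbl

          * (asis) plots the values as they are; over() gives one line per state
          graph dot (asis) gapamin gapbp20 gapcp40 gapdmedian gapep60 gapfp80 gapgmax, ///
              over(strank) ///
              marker(4, msymbol(D) mcolor(black)) ///
              legend(order(1 "Min" 2 "20th" 3 "40th" 4 "Median" 5 "60th" 6 "80th" 7 "Max") rows(1)) ///
              ytitle("Wage gap")
          ```

          Negative values and ranges that cross zero need no special handling here, since each marker is placed at its own value rather than stacked.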
