Hi Statalist,
I'm using Stata 12 and trying to make a graph that shows the variation in wage gaps within and across states. I want to show how many sub-state geographic regions are in each quintile (not quartile) within each state, as well as the state's median wage gap. I want each state on the y-axis and the wage gap on the x-axis, with the states listed from largest median wage gap to smallest median wage gap. For each state, I want a horizontal line spanning the range of wage gaps within the state, and that line should change color for each quintile. Note: some states have fewer than 5 sub-state geographic regions, and it is okay if those states show fewer than 5 colors.
The following is a sample of my data (in long format; strank is the state's rank, displayed here by its value labels: NM=5, CA=6, HI=7):
strank level femwgcoefst
NM amin -0.247207195
NM bp20 -0.1900343
NM cp40 -0.153927803
NM dmedian -0.145714894
NM ep60 -0.145714894
NM fp80 -0.145714894
NM gmax -0.122379899
CA amin -0.264407486
CA bp20 -0.160001606
CA cp40 -0.149955705
CA dmedian -0.146284893
CA ep60 -0.134702295
CA fp80 -0.134702295
CA gmax -0.035036702
HI amin -0.152224705
HI bp20 -0.147228003
HI cp40 -0.147228003
HI dmedian -0.147228003
HI ep60 -0.147228003
HI fp80 -0.147228003
HI gmax -0.092804603
____________________________
Here is the code I have been using:
Code:
sort strank level
twoway connected strank femwgcoefst if (level=="amin" | level=="bp20"), ///
    msize(0) c(L) lcolor(purple) lwidth(thick) ///
    ylab(1(1)51,valuelabel labsize(tiny) alternate tposition(inside)) ///
    ytitle("State", size(vsmall) margin(right)) ysize(9) ///
    xlab(,labsize(tiny)) xsize(6.5) ///
    xtitle("Wage Gap (Negative numbers indicate lower than expected wages)",size(vsmall)) ///
    legend(size(tiny) order(1 "0th to 20th Percentile" ///
    2 "20th to 40th Percentile" 3 "40th to 60th Percentile" ///
    4 "60th to 80th Percentile" 5 "80th to 99th Percentile" 6 "Median" ) cols(3) rows(2)) ///
    title("Female Wage Gap Variation by State") ///
    || connected strank femwgcoefst if (level=="bp20" | level=="cp40"), ///
    msize(0) c(L) lcolor(blue) lwidth(thick) ///
    || connected strank femwgcoefst if (level=="cp40" | level=="ep60"), ///
    msize(0) c(L) lcolor(green) lwidth(thick) ///
    || connected strank femwgcoefst if (level=="ep60" | level=="fp80"), ///
    msize(0) c(L) lcolor(orange) lwidth(thick) ///
    || connected strank femwgcoefst if (level=="fp80" | level=="gmax"), ///
    msize(0) c(L) lcolor(red) lwidth(thick) ///
    || scatter strank femwgcoefst if level=="dmedian", msize(small) ///
    mcolor(black)
This is very close to working, but lines are sometimes drawn diagonally between adjacent states (see attachment). I have found that this only occurs when the next observation to be plotted has a more positive value for femwgcoefst. For example, looking at the first plot [twoway connected strank femwgcoefst if (level=="amin" | level=="bp20")], the line for NM does not connect to the line for CA because the row for CA-amin has a more negative value for femwgcoefst than NM-bp20 has for femwgcoefst (-0.264407486 < -0.1900343). The diagonal line problem does occur between CA and HI: the row for HI-amin has a more positive value for femwgcoefst than CA-bp20 (-0.152224705 > -0.160001606). I can't think of an if statement to eliminate this problem. Does anyone have a solution or alternatives I could use?
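The behavior described above matches how connect(L) works: it joins consecutive points only while the x variable is increasing, so whether a line crosses into the next state depends on the values rather than on the state. One possible alternative, offered only as an untested sketch, is to reshape the data to one row per state and draw each quintile segment as its own horizontal range spike, so nothing is ever connected across states. The sketch assumes the long-format variables shown above (strank, level, femwgcoefst) and that strank and level together uniquely identify a row; the wide variable names (femwgcoefstamin and so on) are simply what reshape would generate from the values of level.

```stata
* Sketch only: assumes strank, level, femwgcoefst as in the sample data,
* with exactly one row per strank-level combination.
preserve
keep strank level femwgcoefst

* One row per state; creates femwgcoefstamin, femwgcoefstbp20, ..., femwgcoefstgmax
reshape wide femwgcoefst, i(strank) j(level) string

* Each rspike draws a segment between two percentile values for every state;
* the horizontal option puts the wage gap on the x-axis and strank on the y-axis.
twoway rspike femwgcoefstamin femwgcoefstbp20 strank, horizontal ///
        lcolor(purple) lwidth(thick) ///
    || rspike femwgcoefstbp20 femwgcoefstcp40 strank, horizontal ///
        lcolor(blue) lwidth(thick) ///
    || rspike femwgcoefstcp40 femwgcoefstep60 strank, horizontal ///
        lcolor(green) lwidth(thick) ///
    || rspike femwgcoefstep60 femwgcoefstfp80 strank, horizontal ///
        lcolor(orange) lwidth(thick) ///
    || rspike femwgcoefstfp80 femwgcoefstgmax strank, horizontal ///
        lcolor(red) lwidth(thick) ///
    || scatter strank femwgcoefstdmedian, msize(small) mcolor(black)
restore
```

The axis, legend, and title options from the original command could presumably be added to the first plot unchanged. Where two adjacent percentiles are equal (as for HI), the corresponding segment has zero length and simply does not show.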
Best,
Karen M. Brummond
Doctoral Student and Research Assistant
University of Massachusetts - Amherst
[email protected]