Hi Statausers,
I'm struggling with computing confidence intervals. I have a dataset in which each observation is a talk, and for each talk it records 1) the topic of the talk (dominant_topic), 2) whether the speaker is male or female (gender_main_speaker), and 3) the number of times that speaker was interrupted (inter). I want to produce a chart with the topics of the talks on the x-axis and, on the y-axis, the difference in average interruptions between female and male speakers for each topic. I'm unsure about the way in which I constructed the confidence intervals.
Note that jel_combined just holds the string labels to use instead of the numeric dominant_topic.
This is the code that I'm running to compute the intervals:
Code:
levelsof dominant_topic, local(hh7s)
gen lowerbound = .
gen upperbound = .
foreach h of local hh7s {
    ci means inter if dominant_topic == `h' & gender_main_speaker == 1
    replace lowerbound = r(lb) if dominant_topic == `h' & gender_main_speaker == 1
    replace upperbound = r(ub) if dominant_topic == `h' & gender_main_speaker == 1
    ci means inter if dominant_topic == `h' & gender_main_speaker == 0
    replace lowerbound = r(lb) if dominant_topic == `h' & gender_main_speaker == 0
    replace upperbound = r(ub) if dominant_topic == `h' & gender_main_speaker == 0
}
collapse inter lowerbound upperbound, by(dominant_topic gender_main_speaker jel_combined)
bysort dominant_topic(gender_main_speaker): gen diff = inter[_n] - inter[_n-1]
bysort dominant_topic(gender_main_speaker): gen lb = lowerbound[_n] - lowerbound[_n-1]
bysort dominant_topic(gender_main_speaker): gen ub = upperbound[_n] - upperbound[_n-1]
And here is the code with which I create the graph:
Code:
gen newvar = _n
labmask newvar, values(jel_combined)
twoway rcap ub lb newvar, lstyle(ci) || ///
    scatter diff newvar, scale(0.5) xlabel( ,valuelabel angle(45)) xlabel(#20) ///
    yline(0,lstyle(foreground) lcolor(red)) legend(off) ///
    xtitle("Identified Seminar Topics", size(medium)) ///
    ytitle("Difference in Average Interruptions (Females - Males)", size(medium)) ///
    msymbol(S) mcolor(dknavy) graphregion(color(white)) bgcolor(white)
Here is some example data:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte(dominant_topic inter gender_main_speaker)
13 17 1
10  6 0
 6 17 1
13 30 1
13 13 1
13 20 0
 6  8 1
13  5 0
13 18 0
10  6 1
 6  7 0
13 11 0
 6 18 0
10  6 0
13  6 0
13 27 0
 1  7 0
 9 10 1
13 27 1
 0  4 1
13 29 1
 1  7 1
13 19 0
13  6 0
 2  7 1
13 21 1
12  8 0
 3 12 0
14 13 0
 3 19 0
 6 23 1
 3 12 1
 3 13 1
 3  8 1
 7 13 0
14  8 0
11 15 0
 3 11 0
13  2 1
13 32 0
13  7 0
13 23 1
14 21 1
 8 14 1
 0  6 1
 8 12 0
 5 13 0
 8  6 1
 8  8 0
 6  0 0
 4  0 0
13  0 0
11  0 0
 4  0 0
12  0 1
13 16 1
 0 10 0
10  7 1
13 10 0
10 12 0
 2 13 1
13  9 0
13 17 0
13 11 0
13  6 0
13  8 0
13 23 1
13 15 0
10  4 0
13 19 1
13 14 1
13 30 0
14 21 0
10 15 0
13 17 0
 1  6 0
13 30 0
13 11 0
13 29 1
 6  0 1
13 18 0
13 14 0
13 22 1
12  1 1
 3  1 0
 4  0 1
12  0 0
12  1 0
13 13 0
13  6 0
 5 30 0
13 12 0
13 11 1
 0 38 0
13 13 1
13 16 0
 9  9 1
12  9 1
 2  1 0
11  3 0
end
If anybody could reassure me that the approach I followed is correct, I would appreciate it a lot. Thanks!