Generate Box Plots of Weights by Different Alphas

Yawo Kokuvi

Join Date: May 2015
Posts: 137

09 Jul 2023, 17:30

Hello, I am calculating level-1 and level-2 weights for DHS survey data, following the procedures described in this report (The DHS Program - Multilevel Modeling Using DHS Surveys: A Framework to Approximate Level-Weights (English)).

Here is the formula/syntax for both weights:

Code:

gen wt2 = (A_h/a_c_h)*(f^alpha)
gen wt1 = d_HH/wt2

The weights depend on the value of alpha, the exponent applied to f, which the researcher sets (0, 0.1, 0.25, 0.50, 0.75, 0.90, or 1). According to the report referenced above, "... high dispersion in weights is undesirable and inefficient because the results depend much more on the units with high weights than the units with low weights. As described earlier, α=0 allocates all the variation to the level-1 weight and α=1 allocates all of the variation to the level-2 weight.... As a result, the value of alpha that comes closest to replicating the true design of the survey will generally be the one that simultaneously minimizes both the dispersion of the level-1 and level-2 weights." (pg. 15).
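
To make the role of alpha concrete, here is a minimal sketch on three made-up observations (the values of A_h, a_c_h, f, and d_HH are hypothetical, not from my data, and the variable names ending in _a0 and _a1 are only for illustration). It shows that f drops out of the level-2 weight when alpha is 0, enters it in full when alpha is 1, and that in both cases wt1 times wt2 returns d_HH.

```stata
clear
* Hypothetical design quantities, chosen so the arithmetic is easy to follow
input A_h a_c_h f d_HH
1000 25 0.5  80
1000 25 1.0 100
1000 25 2.0 120
end

* alpha = 0: f^0 = 1, so the level-2 weight does not depend on f
gen wt2_a0 = (A_h/a_c_h)*(f^0)
gen wt1_a0 = d_HH/wt2_a0

* alpha = 1: the level-2 weight carries f in full
gen wt2_a1 = (A_h/a_c_h)*(f^1)
gen wt1_a1 = d_HH/wt2_a1

list wt2_a0 wt1_a0 wt2_a1 wt1_a1

* For any alpha, the product of the two weights equals d_HH
assert reldif(wt1_a0*wt2_a0, d_HH) < 1e-6
assert reldif(wt1_a1*wt2_a1, d_HH) < 1e-6
```

Intermediate values of alpha split the contribution of f between the two levels in the same way.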

To help the researcher, the report uses this loop to generate seven pairs of level-1 and level-2 weights, one pair for each value of alpha.

Code:

* Calculating the level-weights based on different values of alpha

local alphas 0 0.1 .25 .50 .75 0.90 1
local i = 1

foreach dom of local alphas {
    gen wt2_`i' = (A_h/a_c_h)*(f^`dom')
    gen wt1_`i' = d_HH/wt2_`i'
    local ++i
}

This loop generates a series of weight variables, which the report displays as boxplots (see Figure 2, attached). For the dataset used in the report, the boxplots suggest that the survey design is best approximated by the middle value, α=0.50, since it produces level-1 and level-2 weights with the least dispersion.

I am trying to generate this type of boxplot to help me select the most efficient alpha when calculating weights for a different dataset/country. A small data example produced by dataex is shown below. I also attach the full dataset, in case it is needed.

Q1: How can I generate such a Figure 2 type of boxplot?

Q2: Is there any other way, apart from visual inspection of the boxplot to determine the alpha with the least dispersion?

Thanks in advance for your assistance.
Best, Cy

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str15 caseid int(v001 v002) byte v003 float(a_c_h A_h f d_HH wt2 wt1_1 wt1_2 wt1_3 wt1_4 wt1_5 wt1_6 wt1_7 wt2_1 wt2_2 wt2_3 wt2_4 wt2_5 wt2_6 wt2_7 wt1)
""                1 3604 1 25 46309 6.042385e-09 .00004016509 .14398909  2.16832e-08 1.438807e-07 2.459355e-06 .00027894537 .03163859 .5407989 3.588517 1852.36 279.15552 16.331553 .14398909  .001269497 .00007426992 .000011192672 .00027894537
""                1 8660 1 25 46309 6.042385e-09 .00004016509 .14398909  2.16832e-08 1.438807e-07 2.459355e-06 .00027894537 .03163859 .5407989 3.588517 1852.36 279.15552 16.331553 .14398909  .001269497 .00007426992 .000011192672 .00027894537
""                1 8136 1 25 46309 6.042385e-09 .00004016509 .14398909  2.16832e-08 1.438807e-07 2.459355e-06 .00027894537 .03163859 .5407989 3.588517 1852.36 279.15552 16.331553 .14398909  .001269497 .00007426992 .000011192672 .00027894537
""                1 4138 3 25 46309 6.042385e-09 .00004016509 .14398909  2.16832e-08 1.438807e-07 2.459355e-06 .00027894537 .03163859 .5407989 3.588517 1852.36 279.15552 16.331553 .14398909  .001269497 .00007426992 .000011192672 .00027894537
""                1 1109 1 25 46309 6.042385e-09 .00004016509 .14398909  2.16832e-08 1.438807e-07 2.459355e-06 .00027894537 .03163859 .5407989 3.588517 1852.36 279.15552 16.331553 .14398909  .001269497 .00007426992 .000011192672 .00027894537
""                1 4138 1 25 46309 6.042385e-09 .00004016509 .14398909  2.16832e-08 1.438807e-07 2.459355e-06 .00027894537 .03163859 .5407989 3.588517 1852.36 279.15552 16.331553 .14398909  .001269497 .00007426992 .000011192672 .00027894537
"       11077  2" 1 1077 2 25 46309 1.450656e-08 .00009642838 .22310415 5.205704e-08 3.164631e-07 4.743385e-06  .0004322124 .03938275 .5902981 3.588517 1852.36 304.70657 20.329023 .22310415 .0024484925 .00016335538 .000026871374  .0004322124
"       14775  2" 1 4775 2 25 46309 1.450656e-08 .00009642838 .22310415 5.205704e-08 3.164631e-07 4.743385e-06  .0004322124 .03938275 .5902981 3.588517 1852.36 304.70657 20.329023 .22310415 .0024484925 .00016335538 .000026871374  .0004322124
"       18660  4" 1 8660 4 25 46309 1.450656e-08 .00009642838 .22310415 5.205704e-08 3.164631e-07 4.743385e-06  .0004322124 .03938275 .5902981 3.588517 1852.36 304.70657 20.329023 .22310415 .0024484925 .00016335538 .000026871374  .0004322124
""                1 1985 1 25 46309 6.042385e-09 .00004016509 .14398909  2.16832e-08 1.438807e-07 2.459355e-06 .00027894537 .03163859 .5407989 3.588517 1852.36 279.15552 16.331553 .14398909  .001269497 .00007426992 .000011192672 .00027894537
"       16317  2" 1 6317 2 25 46309 1.450656e-08 .00009642838 .22310415 5.205704e-08 3.164631e-07 4.743385e-06  .0004322124 .03938275 .5902981 3.588517 1852.36 304.70657 20.329023 .22310415 .0024484925 .00016335538 .000026871374  .0004322124
end

------------------ copy up to and including the previous line ------------------

Attached Files

Burundu2.dta (2.72 MB, 1 view)

Last edited by Yawo Kokuvi; 09 Jul 2023, 18:17. Reason: Clarify content and change title


Yawo Kokuvi

Join Date: May 2015

Posts: 137
#2

10 Jul 2023, 13:32

Hello: I got this code to produce the boxplots desired.

Now, I am wondering if it is feasible to change the x-axis labels (wt1_1, wt1_2, etc.) to the seven alpha levels: 0, 0.1, 0.25, 0.50, 0.75, 0.90, 1.

Secondly, how can I tweak the code so that I have both plots side by side or on top of each other?

Thanks - cY

Code:

graph box wt1_1 wt1_2 wt1_3 wt1_4 wt1_5 wt1_6 wt1_7, showyvars legend(off) ///
    title("Boxplots for level-weights based on different values of α") ///
    ytitle("Level-1 Weight")

graph box wt2_1 wt2_2 wt2_3 wt2_4 wt2_5 wt2_6 wt2_7, showyvars legend(off) ///
    title("Boxplots for level-weights based on different values of α") ///
    ytitle("Level-2 Weight")

Andrew Musau

Join Date: Oct 2014
Posts: 10282

10 Jul 2023, 14:04

You need to reshape your data. Your data example in #1 based on your code does not yield anything useful, but here is how you would do it.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str15 caseid int(v001 v002) byte v003 float(a_c_h A_h f d_HH wt2 wt1_1 wt1_2 wt1_3 wt1_4 wt1_5 wt1_6 wt1_7 wt2_1 wt2_2 wt2_3 wt2_4 wt2_5 wt2_6 wt2_7 wt1)
""                1 3604 1 25 46309 6.042385e-09 .00004016509 .14398909  2.16832e-08 1.438807e-07 2.459355e-06 .00027894537 .03163859 .5407989 3.588517 1852.36 279.15552 16.331553 .14398909  .001269497 .00007426992 .000011192672 .00027894537
""                1 8660 1 25 46309 6.042385e-09 .00004016509 .14398909  2.16832e-08 1.438807e-07 2.459355e-06 .00027894537 .03163859 .5407989 3.588517 1852.36 279.15552 16.331553 .14398909  .001269497 .00007426992 .000011192672 .00027894537
""                1 8136 1 25 46309 6.042385e-09 .00004016509 .14398909  2.16832e-08 1.438807e-07 2.459355e-06 .00027894537 .03163859 .5407989 3.588517 1852.36 279.15552 16.331553 .14398909  .001269497 .00007426992 .000011192672 .00027894537
""                1 4138 3 25 46309 6.042385e-09 .00004016509 .14398909  2.16832e-08 1.438807e-07 2.459355e-06 .00027894537 .03163859 .5407989 3.588517 1852.36 279.15552 16.331553 .14398909  .001269497 .00007426992 .000011192672 .00027894537
""                1 1109 1 25 46309 6.042385e-09 .00004016509 .14398909  2.16832e-08 1.438807e-07 2.459355e-06 .00027894537 .03163859 .5407989 3.588517 1852.36 279.15552 16.331553 .14398909  .001269497 .00007426992 .000011192672 .00027894537
""                1 4138 1 25 46309 6.042385e-09 .00004016509 .14398909  2.16832e-08 1.438807e-07 2.459355e-06 .00027894537 .03163859 .5407989 3.588517 1852.36 279.15552 16.331553 .14398909  .001269497 .00007426992 .000011192672 .00027894537
"       11077  2" 1 1077 2 25 46309 1.450656e-08 .00009642838 .22310415 5.205704e-08 3.164631e-07 4.743385e-06  .0004322124 .03938275 .5902981 3.588517 1852.36 304.70657 20.329023 .22310415 .0024484925 .00016335538 .000026871374  .0004322124
"       14775  2" 1 4775 2 25 46309 1.450656e-08 .00009642838 .22310415 5.205704e-08 3.164631e-07 4.743385e-06  .0004322124 .03938275 .5902981 3.588517 1852.36 304.70657 20.329023 .22310415 .0024484925 .00016335538 .000026871374  .0004322124
"       18660  4" 1 8660 4 25 46309 1.450656e-08 .00009642838 .22310415 5.205704e-08 3.164631e-07 4.743385e-06  .0004322124 .03938275 .5902981 3.588517 1852.36 304.70657 20.329023 .22310415 .0024484925 .00016335538 .000026871374  .0004322124
""                1 1985 1 25 46309 6.042385e-09 .00004016509 .14398909  2.16832e-08 1.438807e-07 2.459355e-06 .00027894537 .03163859 .5407989 3.588517 1852.36 279.15552 16.331553 .14398909  .001269497 .00007426992 .000011192672 .00027894537
"       16317  2" 1 6317 2 25 46309 1.450656e-08 .00009642838 .22310415 5.205704e-08 3.164631e-07 4.743385e-06  .0004322124 .03938275 .5902981 3.588517 1852.36 304.70657 20.329023 .22310415 .0024484925 .00016335538 .000026871374  .0004322124
end

rename wt*_* wt*[2]_*[1]
gen long id=_n
reshape long wt1_ wt2_ wt3_ wt4_ wt5_ wt6_ wt7_, i(id) j(which)
rename wt*_ wt_*
reshape long wt_, i(id which) j(cat)
set scheme s1mono
graph box wt_, over(cat, relabel(1 "0" 2 ".1" 3 ".25" 4 ".5" 5 ".75" 6 ".9" 7 "1")) ///
    by(which, note("") leg(off) title("Boxplots for level-weights based on different values of α")) ///
    ytitle(Something informative)
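
If reshaping is not convenient, a possible alternative is to keep the data wide, relabel the boxes with yvaroptions(relabel()), and put the two plots side by side with graph combine. This is only a sketch on simulated data: the values of f and d_HH, the constant 40 standing in for A_h/a_c_h, and the graph names lev1 and lev2 are all assumptions for illustration.

```stata
clear
set seed 2023
set obs 100
* Simulated stand-ins for the design quantities (hypothetical values)
gen f    = runiform(0.5, 1.5)
gen d_HH = runiform(50, 150)

local alphas 0 0.1 .25 .50 .75 0.90 1
local relab
local i = 1
foreach a of local alphas {
    gen wt2_`i' = 40*(f^`a')
    gen wt1_`i' = d_HH/wt2_`i'
    * Build the relabel list: 1 "0" 2 "0.1" 3 ".25" ...
    local relab `relab' `i' "`a'"
    local ++i
}

graph box wt1_*, showyvars legend(off) yvaroptions(relabel(`relab')) ///
    ytitle("Level-1 weight") name(lev1, replace)
graph box wt2_*, showyvars legend(off) yvaroptions(relabel(`relab')) ///
    ytitle("Level-2 weight") name(lev2, replace)

* cols(2) places the plots side by side; cols(1) would stack them
graph combine lev1 lev2, cols(2) ///
    title("Level-weights by value of alpha")
```

Unlike by(), graph combine gives each panel its own y-axis scale unless ycommon is specified, which may or may not be what you want when the two levels differ greatly in magnitude.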

Last edited by Andrew Musau; 10 Jul 2023, 14:10.


Yawo Kokuvi

Join Date: May 2015

Posts: 137
#4

10 Jul 2023, 14:17

Wow, Andrew - this is so helpful. Thanks so much... Another question: beyond using the summarize command, is there any other way to meaningfully compare the dispersion of the various variables to help arrive at a reasonable alpha (the one with the least dispersion for wt1 and wt2)?

cheers, Cy
Andrew Musau

Join Date: Oct 2014

Posts: 10282
#5

10 Jul 2023, 14:28

I do not know. It appears to me from

The weights depend on the level of alpha(f) (0 0.1 .25 .50 .75 0.90 1) set by the researcher. According to the report referenced above, "... high dispersion in weights is undesirable and inefficient because the results depend much more on the units with high weights than the units with low weights. As described earlier, α=0 allocates all the variation to the level-1 weight and α=1 allocates all of the variation to the level-2 weight.... As a result, the value of alpha that comes closest to replicating the true design of the survey will generally be the one that simultaneously minimizes both the dispersion of the level-1 and level-2 weights." (pg. 15).

that you should be asking people who work with survey data. Maybe start a new thread and title the question in a way that will attract such people.
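
Purely as a mechanical matter, the dispersion of each weight variable can be tabulated side by side. The sketch below uses simulated data and reports the coefficient of variation and interquartile range; the choice of these two statistics is an assumption on my part and is not taken from the DHS report, and the values of A_h, a_c_h, f, and d_HH are made up.

```stata
clear
set seed 12345
set obs 200
* Simulated stand-ins for the design quantities (hypothetical values)
gen A_h   = 1000
gen a_c_h = 25
gen f     = runiform(0.5, 1.5)
gen d_HH  = runiform(50, 150)

local alphas 0 0.1 .25 .50 .75 0.90 1
local i = 1
foreach a of local alphas {
    gen wt2_`i' = (A_h/a_c_h)*(f^`a')
    gen wt1_`i' = d_HH/wt2_`i'
    local ++i
}

* One row per variable, one column per dispersion statistic
tabstat wt1_*, statistics(cv iqr) columns(statistics)
tabstat wt2_*, statistics(cv iqr) columns(statistics)

* The same comparison, printed next to the alpha value it belongs to
local i = 1
foreach a of local alphas {
    quietly summarize wt1_`i'
    local cv1 = r(sd)/r(mean)
    quietly summarize wt2_`i'
    local cv2 = r(sd)/r(mean)
    display "alpha = `a'" _col(16) "CV(wt1) = " %6.3f `cv1' ///
        _col(36) "CV(wt2) = " %6.3f `cv2'
    local ++i
}
```

Whether a relative measure such as the CV is the right criterion for choosing alpha is a survey-methodology question that this code does not settle.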
Yawo Kokuvi

Join Date: May 2015

Posts: 137
#6

10 Jul 2023, 14:30

Tusen takk (a thousand thanks) ... much appreciated :-)
