Bootstrapping more descriptive of more than one variable in extreme groups of a sample - how to code/program in a smart way?

Troels Kristensen

Join Date: Oct 2021

Posts: 23
#1

Bootstrapping more descriptive of more than one variable in extreme groups of a sample - how to code/program in a smart way?

28 Oct 2021, 06:44

Dear members
I am trying hard to code a program that than boostrap descriptives (mean and CV) in ekstreme groups of a sample of GPs, se below. I want to be able to include more than one descriptive variable in each of these extreme groups in the program in the same boostrap-procedure. The draft-program below is able to to include to variables (auto dataset), calculate means and CVs in each of the groups for price and weight from the data set - and be included in the boostrap command. However, the program in not very smart and when I change the order of the variables from "myprogram price weight" to "myprogram weight price" there is a problem. I am new to this kind of programming. Hope someone can help?
All the best
Troels

*nyeste version 101021
clear *
cls

set seed 17

sysuse auto
cap progam drop myprog
program define myprog, rclass
version 17
args v y
marksample touse

* tempnames of scalar are defined
tempname p10 p10_mean p10_sd p10_cv ///
p90 p90_mean p90_sd p90_cv ///
yp10_mean yp10_sd yp10_cv ///
yp90_mean yp90_sd yp90_cv ///

qui summ `v' if `touse', detail
scalar `p10' = r(p10)
scalar `p90' = r(p90)
*return scalar N = r(N)
return scalar p10 = `p10'
return scalar p90 = `p90'

qui summ `v' if `touse' & inrange(`v', ., `p10'), detail
scalar `p10_mean' = r(mean)
scalar `p10_sd' = r(sd)
scalar `p10_cv' = `p10_sd' / `p10_mean'
*return scalar p10_N = r(N)
return scalar p10_mean = `p10_mean'
return scalar p10_sd = `p10_sd'
return scalar p10_cv = `p10_cv'

qui summ `v' if `touse' & inrange(`v', `p90', .), detail
scalar `p90_mean' = r(mean)
scalar `p90_sd' = r(sd)
scalar `p90_cv' = `p90_sd' / `p90_mean'
*return scalar p90_N = r(N)
return scalar p90_mean = `p90_mean'
return scalar p90_sd = `p90_sd'
return scalar p90_cv = `p90_cv'

*****
qui summ `y' if `touse' & inrange(`y', ., `p10'), detail
scalar `yp10_mean' = r(mean)
scalar `yp10_sd' = r(sd)
scalar `yp10_cv' = `yp10_sd' / `yp10_mean'
*return scalar yp10_N = r(N)
return scalar yp10_mean = `yp10_mean'
return scalar yp10_sd = `yp10_sd'
return scalar yp10_cv = `yp10_cv'

qui summ `y' if `touse' & inrange(`y', `p90', .), detail
scalar `yp90_mean' = r(mean)
scalar `yp90_sd' = r(sd)
scalar `yp90_cv' = `yp90_sd' / `yp90_mean'
*return scalar yp90_N = r(N)
return scalar yp90_mean = `yp90_mean'
return scalar yp90_sd = `yp90_sd'
return scalar yp90_cv = `yp90_cv'

end

myprog price weight
ret list

bootstrap p10_mean=r(p10_mean) p90_mean=r(p90_mean) p10_cv=r(p10_cv) p90_cv=r(p90_cv) yp10_mean=r(yp10_mean) yp10_cv=r(yp10_cv), reps(1000):myprog price weight
Tags: None
George Ford

Join Date: Aug 2014

Posts: 3152
#2

28 Oct 2021, 11:24

You are using p10 in -inrange- for both v and y, but p10 is based on v. When you switch the order, the are no values in y that comport with p10 based on v.
Comment

George Ford

Join Date: Aug 2014
Posts: 3152

28 Oct 2021, 11:25

Code:

*nyeste version 101021
clear *
cls

set seed 17

sysuse auto
cap progam drop myprog
program define myprog, rclass
version 17
args v y
marksample touse

* tempnames of scalar are defined
tempname p10 p10_mean p10_sd p10_cv ///
p90 p90_mean p90_sd p90_cv ///
z10 z90 ///
yp10_mean yp10_sd yp10_cv ///
yp90_mean yp90_sd yp90_cv 

qui summ `v' if `touse', detail
scalar `p10' = r(p10)
scalar `p90' = r(p90)
*return scalar N = r(N)
return scalar p10 = `p10'
return scalar p90 = `p90'

qui summ `v' if `touse' & inrange(`v', ., `p10'), detail
scalar `p10_mean' = r(mean)
scalar `p10_sd' = r(sd)
scalar `p10_cv' = `p10_sd' / `p10_mean'
*return scalar p10_N = r(N)
return scalar p10_mean = `p10_mean'
return scalar p10_sd = `p10_sd'
return scalar p10_cv = `p10_cv'

qui summ `v' if `touse' & inrange(`v', `p90', .), detail
scalar `p90_mean' = r(mean)
scalar `p90_sd' = r(sd)
scalar `p90_cv' = `p90_sd' / `p90_mean'
*return scalar p90_N = r(N)
return scalar p90_mean = `p90_mean'
return scalar p90_sd = `p90_sd'
return scalar p90_cv = `p90_cv'

*****

qui summ `y' if `touse', detail
scalar `z10' = r(p10)
scalar `z90' = r(p90)
*return scalar N = r(N)
return scalar z10 = `z10'
return scalar z90 = `z90'

qui summ `y' if `touse' & inrange(`y', ., `z10'), detail
scalar `yp10_mean' = r(mean)
scalar `yp10_sd' = r(sd)
scalar `yp10_cv' = `yp10_sd' / `yp10_mean'
*return scalar yp10_N = r(N)
return scalar yp10_mean = `yp10_mean'
return scalar yp10_sd = `yp10_sd'
return scalar yp10_cv = `yp10_cv'

qui summ `y' if `touse' & inrange(`y', `z90', .), detail
scalar `yp90_mean' = r(mean)
scalar `yp90_sd' = r(sd)
scalar `yp90_cv' = `yp90_sd' / `yp90_mean'
*return scalar yp90_N = r(N)
return scalar yp90_mean = `yp90_mean'
return scalar yp90_sd = `yp90_sd'
return scalar yp90_cv = `yp90_cv'

end

myprog price weight
ret list
bootstrap p10_mean=r(p10_mean) p90_mean=r(p90_mean) p10_cv=r(p10_cv) p90_cv=r(p90_cv) yp10_mean=r(yp10_mean) yp10_cv=r(yp10_cv), reps(1000):myprog price weight

myprog weight price
ret list
bootstrap p10_mean=r(p10_mean) p90_mean=r(p90_mean) p10_cv=r(p10_cv) p90_cv=r(p90_cv) yp10_mean=r(yp10_mean) yp10_cv=r(yp10_cv), reps(1000):myprog weight price

Comment

Troels Kristensen

Join Date: Oct 2021

Posts: 23
#4

28 Oct 2021, 11:58

Thanks George. Well identified. This is helpful :-) I think the descriptives for the extreme groups makes sense now for the 2 variables within the same boostrap-procedure - as required. I is a good starting point. I wonder how many variables I can plug into the bootstrap procedure (may be there is a max)? At the moment I'm using 6 but I have far more than two variables that I want to use to describe the extreme groups. In case there is a max: Is it possible to split on two identical bootstrap procedures via similar seed numbers or in other ways?

Do you have any suggestions to make the program smarter via smart code such as loops or different code? So far, It works for two variables - but due to my beginner skills I'm sure it can be improved :-)
Best
Troels
Comment
George Ford

Join Date: Aug 2014

Posts: 3152
#5

28 Oct 2021, 12:14

Is there a reason to do them jointly? If not, just do one-by-one. You could set up a loop to go through a list, of course, and spit out the result on each turn, either within the program or enclose the program in a loop.
Comment
Troels Kristensen

Join Date: Oct 2021

Posts: 23
#6

28 Oct 2021, 13:29

From at statistical point of view I think it is far more meaningful to bootstrap the entire sample in one operation and describe the results for the group above p90 and under p10 based on one bootstrap procedure. This makes the results more coherent - the bootstrapped descriptives are all based on the same resampling. If we decide to run a bootstrap procedure per variable we will basis the decription of the extreme groups on many different set of samples rather than one common procedure. This will influence results. What do you thing reviewers would think?

I started out bootstrapping the extreme groups of the sample in the top (n=39) and bottom (n=46) but realised that it is better to use all the data and then split on on top and bottom extreme groups including the descriptives we are dealing with.

I think it is worth discussing the alternative approaches and what they mean for the picture of extreme group within descriptives and related variation/uncertainty. Theoretical point of view or similar are very velcome.
Best
troels
1 like
Comment
Troels Kristensen

Join Date: Oct 2021

Posts: 23
#7

29 Oct 2021, 03:05

Hi George
The program works now, see results below. It is possible to run the bootstrap procedure with more variables and I also incluced more descriptives such as the variation index and difference between group means , which also were bootstrapped. I still think the program can be programmed in a smarter way - hints are well come.
Best Troels

------------------------------------------------------------------------------
| Observed Bootstrap Normal-based
| coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
a10_mean | 33.3 .5282989 63.03 0.000 32.26455 34.33545
a90_mean | 46.66667 1.094345 42.64 0.000 44.52179 48.81154
a10_cv | .0318123 .0086289 3.69 0.000 .0149001 .0487245
a90_cv | .0428571 .0117904 3.63 0.000 .0197484 .0659659
aVI | 1.401401 .0384784 36.42 0.000 1.325985 1.476818
aDI | 13.36667 1.190808 11.22 0.000 11.03273 15.70061
b10_mean | 151.875 3.288068 46.19 0.000 145.4305 158.3195
b90_mean | 222.5556 2.822006 78.86 0.000 217.0245 228.0866
b10_cv | .0349189 .0101406 3.44 0.001 .0150436 .0547941
b90_cv | .0237879 .0073562 3.23 0.001 .00937 .0382058
bVI | 1.465386 .0344677 42.51 0.000 1.397831 1.532942
bDI | 70.68056 4.122467 17.15 0.000 62.60067 78.76044
p10_mean | 3665.75 117.4297 31.22 0.000 3435.592 3895.908
p90_mean | 13166.63 945.1757 13.93 0.000 11314.11 15019.14
p10_cv | .0648969 .0161023 4.03 0.000 .033337 .0964569
p90_cv | .1180804 .0442722 2.67 0.008 .0313084 .2048523
pVI | 3.591796 .2723139 13.19 0.000 3.05807 4.125521
pDI | 9500.875 942.2435 10.08 0.000 7654.112 11347.64
yp10_mean | 1888.75 58.76332 32.14 0.000 1773.576 2003.924
yp90_mean | 4313.75 143.292 30.10 0.000 4032.903 4594.597
yp10_cv | .0541741 .0134171 4.04 0.000 .027877 .0804711
yp90_cv | .071181 .017224 4.13 0.000 .0374225 .1049395
ypVI | 2.283918 .0999303 22.86 0.000 2.088058 2.479778
ypDI | 2425 152.6341 15.89 0.000 2125.843 2724.157
zp10_mean | 13.5 .6031448 22.38 0.000 12.31786 14.68214
zp90_mean | 33.125 2.091099 15.84 0.000 29.02652 37.22348
zp10_cv | .0685793 .0239219 2.87 0.004 .0216933 .1154652
zp90_cv | .1201467 .0324602 3.70 0.000 .0565259 .1837675
zpVI | 2.453704 .1769203 13.87 0.000 2.106946 2.800461
zpDI | 19.625 2.153201 9.11 0.000 15.4048 23.8452
------------------------------------------------------------------------------
Comment
Troels Kristensen

Join Date: Oct 2021

Posts: 23
#8

29 Oct 2021, 06:11

I am trying to extend this program to loop through a and next other args b c d e f. However, it seems difficult to find a foreach command or similar which works within args?

*version 291021
clear *
cls

set seed 17

sysuse auto
cap progam drop myprog
program define myprog, rclass
version 17
* b c d e f g
args a
marksample touse

* tempnames of scalar are defined
tempname a10 a10_mean a10_sd a10_cv ///
a90 a90_mean a90_sd a90_cv aVI aDI ///

*foreach x in a {
*foreach x varlist a {
*foreach x in `varlist' {
*foreach x of local varlist {
*qui summ `x' if `touse', detail
*scalar `a50' = r(p50)
*return scalar a50 = `a50'
*}

qui summ `a' if `touse', detail
scalar `a10' = r(p10)
scalar `a90' = r(p90)
*return scalar N = r(N)
return scalar a10 = `a10'
return scalar a90 = `a90'

qui summ `a' if `touse' & inrange(`a', ., `a10'), detail
scalar `a10_mean' = r(mean)
scalar `a10_sd' = r(sd)
scalar `a10_cv' = `a10_sd' / `a10_mean'
*return scalar a10_N = r(N)
return scalar a10_mean = `a10_mean'
return scalar a10_sd = `a10_sd'
return scalar a10_cv = `a10_cv'

qui summ `a' if `touse' & inrange(`a', `a90', .), detail
scalar `a90_mean' = r(mean)
scalar `a90_sd' = r(sd)
scalar `a90_cv' = `a90_sd' / `a90_mean'
*return scalar a90_N = r(N)
return scalar a90_mean = `a90_mean'
return scalar a90_sd = `a90_sd'
return scalar a90_cv = `a90_cv'

* VI DI calculated
scalar `aVI' = `a90_mean' / `a10_mean'
scalar `aDI' = `a90_mean' - `a10_mean'
return scalar aVI = `aVI'
return scalar aDI = `aDI'
end

*foreign gear_ratio displacement turn trunk headroom
myprog price
ret list

bootstrap a10_mean=r(a10_mean) a90_mean=r(a90_mean) a10_cv=r(a10_cv) a90_cv=r(a90_cv) aVI=r(aVI) aDI=r(aDI) , reps(1000): myprog price
stop
Comment

Announcement

Bootstrapping more descriptive of more than one variable in extreme groups of a sample - how to code/program in a smart way?

Comment

Comment

Comment

Comment

Comment

Comment

Comment