Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Bootstrapping more descriptive of more than one variable in extreme groups of a sample - how to code/program in a smart way?

    Dear members
    I am trying hard to code a program that than boostrap descriptives (mean and CV) in ekstreme groups of a sample of GPs, se below. I want to be able to include more than one descriptive variable in each of these extreme groups in the program in the same boostrap-procedure. The draft-program below is able to to include to variables (auto dataset), calculate means and CVs in each of the groups for price and weight from the data set - and be included in the boostrap command. However, the program in not very smart and when I change the order of the variables from "myprogram price weight" to "myprogram weight price" there is a problem. I am new to this kind of programming. Hope someone can help?
    All the best
    Troels


    *nyeste version 101021
    clear *
    cls

    set seed 17

    sysuse auto
    cap progam drop myprog
    program define myprog, rclass
    version 17
    args v y
    marksample touse

    * tempnames of scalar are defined
    tempname p10 p10_mean p10_sd p10_cv ///
    p90 p90_mean p90_sd p90_cv ///
    yp10_mean yp10_sd yp10_cv ///
    yp90_mean yp90_sd yp90_cv ///

    qui summ `v' if `touse', detail
    scalar `p10' = r(p10)
    scalar `p90' = r(p90)
    *return scalar N = r(N)
    return scalar p10 = `p10'
    return scalar p90 = `p90'

    qui summ `v' if `touse' & inrange(`v', ., `p10'), detail
    scalar `p10_mean' = r(mean)
    scalar `p10_sd' = r(sd)
    scalar `p10_cv' = `p10_sd' / `p10_mean'
    *return scalar p10_N = r(N)
    return scalar p10_mean = `p10_mean'
    return scalar p10_sd = `p10_sd'
    return scalar p10_cv = `p10_cv'

    qui summ `v' if `touse' & inrange(`v', `p90', .), detail
    scalar `p90_mean' = r(mean)
    scalar `p90_sd' = r(sd)
    scalar `p90_cv' = `p90_sd' / `p90_mean'
    *return scalar p90_N = r(N)
    return scalar p90_mean = `p90_mean'
    return scalar p90_sd = `p90_sd'
    return scalar p90_cv = `p90_cv'

    *****
    qui summ `y' if `touse' & inrange(`y', ., `p10'), detail
    scalar `yp10_mean' = r(mean)
    scalar `yp10_sd' = r(sd)
    scalar `yp10_cv' = `yp10_sd' / `yp10_mean'
    *return scalar yp10_N = r(N)
    return scalar yp10_mean = `yp10_mean'
    return scalar yp10_sd = `yp10_sd'
    return scalar yp10_cv = `yp10_cv'

    qui summ `y' if `touse' & inrange(`y', `p90', .), detail
    scalar `yp90_mean' = r(mean)
    scalar `yp90_sd' = r(sd)
    scalar `yp90_cv' = `yp90_sd' / `yp90_mean'
    *return scalar yp90_N = r(N)
    return scalar yp90_mean = `yp90_mean'
    return scalar yp90_sd = `yp90_sd'
    return scalar yp90_cv = `yp90_cv'

    end

    myprog price weight
    ret list

    bootstrap p10_mean=r(p10_mean) p90_mean=r(p90_mean) p10_cv=r(p10_cv) p90_cv=r(p90_cv) yp10_mean=r(yp10_mean) yp10_cv=r(yp10_cv), reps(1000):myprog price weight

  • #2
    You are using p10 in -inrange- for both v and y, but p10 is based on v. When you switch the order, the are no values in y that comport with p10 based on v.

    Comment


    • #3
      Code:
      *nyeste version 101021
      clear *
      cls
      
      set seed 17
      
      sysuse auto
      cap progam drop myprog
      program define myprog, rclass
      version 17
      args v y
      marksample touse
      
      * tempnames of scalar are defined
      tempname p10 p10_mean p10_sd p10_cv ///
      p90 p90_mean p90_sd p90_cv ///
      z10 z90 ///
      yp10_mean yp10_sd yp10_cv ///
      yp90_mean yp90_sd yp90_cv 
      
      qui summ `v' if `touse', detail
      scalar `p10' = r(p10)
      scalar `p90' = r(p90)
      *return scalar N = r(N)
      return scalar p10 = `p10'
      return scalar p90 = `p90'
      
      qui summ `v' if `touse' & inrange(`v', ., `p10'), detail
      scalar `p10_mean' = r(mean)
      scalar `p10_sd' = r(sd)
      scalar `p10_cv' = `p10_sd' / `p10_mean'
      *return scalar p10_N = r(N)
      return scalar p10_mean = `p10_mean'
      return scalar p10_sd = `p10_sd'
      return scalar p10_cv = `p10_cv'
      
      qui summ `v' if `touse' & inrange(`v', `p90', .), detail
      scalar `p90_mean' = r(mean)
      scalar `p90_sd' = r(sd)
      scalar `p90_cv' = `p90_sd' / `p90_mean'
      *return scalar p90_N = r(N)
      return scalar p90_mean = `p90_mean'
      return scalar p90_sd = `p90_sd'
      return scalar p90_cv = `p90_cv'
      
      *****
      
      qui summ `y' if `touse', detail
      scalar `z10' = r(p10)
      scalar `z90' = r(p90)
      *return scalar N = r(N)
      return scalar z10 = `z10'
      return scalar z90 = `z90'
      
      qui summ `y' if `touse' & inrange(`y', ., `z10'), detail
      scalar `yp10_mean' = r(mean)
      scalar `yp10_sd' = r(sd)
      scalar `yp10_cv' = `yp10_sd' / `yp10_mean'
      *return scalar yp10_N = r(N)
      return scalar yp10_mean = `yp10_mean'
      return scalar yp10_sd = `yp10_sd'
      return scalar yp10_cv = `yp10_cv'
      
      qui summ `y' if `touse' & inrange(`y', `z90', .), detail
      scalar `yp90_mean' = r(mean)
      scalar `yp90_sd' = r(sd)
      scalar `yp90_cv' = `yp90_sd' / `yp90_mean'
      *return scalar yp90_N = r(N)
      return scalar yp90_mean = `yp90_mean'
      return scalar yp90_sd = `yp90_sd'
      return scalar yp90_cv = `yp90_cv'
      
      end
      
      myprog price weight
      ret list
      bootstrap p10_mean=r(p10_mean) p90_mean=r(p90_mean) p10_cv=r(p10_cv) p90_cv=r(p90_cv) yp10_mean=r(yp10_mean) yp10_cv=r(yp10_cv), reps(1000):myprog price weight
      
      myprog weight price
      ret list
      bootstrap p10_mean=r(p10_mean) p90_mean=r(p90_mean) p10_cv=r(p10_cv) p90_cv=r(p90_cv) yp10_mean=r(yp10_mean) yp10_cv=r(yp10_cv), reps(1000):myprog weight price

      Comment


      • #4
        Thanks George. Well identified. This is helpful :-) I think the descriptives for the extreme groups makes sense now for the 2 variables within the same boostrap-procedure - as required. I is a good starting point. I wonder how many variables I can plug into the bootstrap procedure (may be there is a max)? At the moment I'm using 6 but I have far more than two variables that I want to use to describe the extreme groups. In case there is a max: Is it possible to split on two identical bootstrap procedures via similar seed numbers or in other ways?

        Do you have any suggestions to make the program smarter via smart code such as loops or different code? So far, It works for two variables - but due to my beginner skills I'm sure it can be improved :-)
        Best
        Troels

        Comment


        • #5
          Is there a reason to do them jointly? If not, just do one-by-one. You could set up a loop to go through a list, of course, and spit out the result on each turn, either within the program or enclose the program in a loop.

          Comment


          • #6
            From at statistical point of view I think it is far more meaningful to bootstrap the entire sample in one operation and describe the results for the group above p90 and under p10 based on one bootstrap procedure. This makes the results more coherent - the bootstrapped descriptives are all based on the same resampling. If we decide to run a bootstrap procedure per variable we will basis the decription of the extreme groups on many different set of samples rather than one common procedure. This will influence results. What do you thing reviewers would think?

            I started out bootstrapping the extreme groups of the sample in the top (n=39) and bottom (n=46) but realised that it is better to use all the data and then split on on top and bottom extreme groups including the descriptives we are dealing with.

            I think it is worth discussing the alternative approaches and what they mean for the picture of extreme group within descriptives and related variation/uncertainty. Theoretical point of view or similar are very velcome.
            Best
            troels

            Comment


            • #7
              Hi George
              The program works now, see results below. It is possible to run the bootstrap procedure with more variables and I also incluced more descriptives such as the variation index and difference between group means , which also were bootstrapped. I still think the program can be programmed in a smarter way - hints are well come.
              Best Troels

              ------------------------------------------------------------------------------
              | Observed Bootstrap Normal-based
              | coefficient std. err. z P>|z| [95% conf. interval]
              -------------+----------------------------------------------------------------
              a10_mean | 33.3 .5282989 63.03 0.000 32.26455 34.33545
              a90_mean | 46.66667 1.094345 42.64 0.000 44.52179 48.81154
              a10_cv | .0318123 .0086289 3.69 0.000 .0149001 .0487245
              a90_cv | .0428571 .0117904 3.63 0.000 .0197484 .0659659
              aVI | 1.401401 .0384784 36.42 0.000 1.325985 1.476818
              aDI | 13.36667 1.190808 11.22 0.000 11.03273 15.70061
              b10_mean | 151.875 3.288068 46.19 0.000 145.4305 158.3195
              b90_mean | 222.5556 2.822006 78.86 0.000 217.0245 228.0866
              b10_cv | .0349189 .0101406 3.44 0.001 .0150436 .0547941
              b90_cv | .0237879 .0073562 3.23 0.001 .00937 .0382058
              bVI | 1.465386 .0344677 42.51 0.000 1.397831 1.532942
              bDI | 70.68056 4.122467 17.15 0.000 62.60067 78.76044
              p10_mean | 3665.75 117.4297 31.22 0.000 3435.592 3895.908
              p90_mean | 13166.63 945.1757 13.93 0.000 11314.11 15019.14
              p10_cv | .0648969 .0161023 4.03 0.000 .033337 .0964569
              p90_cv | .1180804 .0442722 2.67 0.008 .0313084 .2048523
              pVI | 3.591796 .2723139 13.19 0.000 3.05807 4.125521
              pDI | 9500.875 942.2435 10.08 0.000 7654.112 11347.64
              yp10_mean | 1888.75 58.76332 32.14 0.000 1773.576 2003.924
              yp90_mean | 4313.75 143.292 30.10 0.000 4032.903 4594.597
              yp10_cv | .0541741 .0134171 4.04 0.000 .027877 .0804711
              yp90_cv | .071181 .017224 4.13 0.000 .0374225 .1049395
              ypVI | 2.283918 .0999303 22.86 0.000 2.088058 2.479778
              ypDI | 2425 152.6341 15.89 0.000 2125.843 2724.157
              zp10_mean | 13.5 .6031448 22.38 0.000 12.31786 14.68214
              zp90_mean | 33.125 2.091099 15.84 0.000 29.02652 37.22348
              zp10_cv | .0685793 .0239219 2.87 0.004 .0216933 .1154652
              zp90_cv | .1201467 .0324602 3.70 0.000 .0565259 .1837675
              zpVI | 2.453704 .1769203 13.87 0.000 2.106946 2.800461
              zpDI | 19.625 2.153201 9.11 0.000 15.4048 23.8452
              ------------------------------------------------------------------------------

              Comment


              • #8
                I am trying to extend this program to loop through a and next other args b c d e f. However, it seems difficult to find a foreach command or similar which works within args?

                *version 291021
                clear *
                cls

                set seed 17

                sysuse auto
                cap progam drop myprog
                program define myprog, rclass
                version 17
                * b c d e f g
                args a
                marksample touse

                * tempnames of scalar are defined
                tempname a10 a10_mean a10_sd a10_cv ///
                a90 a90_mean a90_sd a90_cv aVI aDI ///

                *foreach x in a {
                *foreach x varlist a {
                *foreach x in `varlist' {
                *foreach x of local varlist {
                *qui summ `x' if `touse', detail
                *scalar `a50' = r(p50)
                *return scalar a50 = `a50'
                *}

                qui summ `a' if `touse', detail
                scalar `a10' = r(p10)
                scalar `a90' = r(p90)
                *return scalar N = r(N)
                return scalar a10 = `a10'
                return scalar a90 = `a90'

                qui summ `a' if `touse' & inrange(`a', ., `a10'), detail
                scalar `a10_mean' = r(mean)
                scalar `a10_sd' = r(sd)
                scalar `a10_cv' = `a10_sd' / `a10_mean'
                *return scalar a10_N = r(N)
                return scalar a10_mean = `a10_mean'
                return scalar a10_sd = `a10_sd'
                return scalar a10_cv = `a10_cv'

                qui summ `a' if `touse' & inrange(`a', `a90', .), detail
                scalar `a90_mean' = r(mean)
                scalar `a90_sd' = r(sd)
                scalar `a90_cv' = `a90_sd' / `a90_mean'
                *return scalar a90_N = r(N)
                return scalar a90_mean = `a90_mean'
                return scalar a90_sd = `a90_sd'
                return scalar a90_cv = `a90_cv'

                * VI DI calculated
                scalar `aVI' = `a90_mean' / `a10_mean'
                scalar `aDI' = `a90_mean' - `a10_mean'
                return scalar aVI = `aVI'
                return scalar aDI = `aDI'
                end

                *foreign gear_ratio displacement turn trunk headroom
                myprog price
                ret list

                bootstrap a10_mean=r(a10_mean) a90_mean=r(a90_mean) a10_cv=r(a10_cv) a90_cv=r(a90_cv) aVI=r(aVI) aDI=r(aDI) , reps(1000): myprog price
                stop

                Comment

                Working...
                X