  • Help running a bootstrap within a forvalues loop

    Dear all,

    Apologies if this is very basic, but I have spent the better part of a day going through the manuals and online forums with little progress.

    I am trying to run a very simple bootstrap on a dataset. The dataset has three variables: orgcode, group and perf.

    My ideal aim is:
    1. For each value of group - calculate the bootstrapped SD and IQR values
    2. Save the estimated values for that group to a unique file - my ultimate aim being to append all the files
    3. Move onto the next value of group
    I am trying to execute the following code:

    Code:
    use dataset
    drop if total<100
    keep if standard=="Test"
    keep if year=="2018"
    drop year standard proportion total
    rename withinstd value1
    rename without value0
    reshape long value, i(orgcode type) j(perf)
    drop if value==0
    expand value
    drop value
    egen group = group(type)
    drop type
    
    program boot, rclass
            preserve
            collapse perf, by(orgcode group)
            sum perf if group==`x', d
            local sd = r(sd)
            local p25 = r(p25)
            local p75 = r(p75)
            return scalar sd = `sd'
            return scalar iqr = `p75'-`p25'
            restore
    end
    
    forvalues x = 1/14 {
    bootstrap r(sd) r(iqr), reps(10) saving(data_`x', replace): boot
    }
    Running this code yields:
    Code:
    invalid syntax
    an error occurred when bootstrap executed boot
    r(198);

    There are only 14 groups, which are defined and don't change, so I could conceivably just type the bootstrap command out manually for each group and go from there, but for the sake of elegance I wanted to know how to use a loop to accomplish the same thing.

    Many thanks for your time,
    Last edited by Youssof Oskrochi; 07 Aug 2019, 03:54.

  • #2
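    x is a local macro within your main session. There is no way that the program boot can tell what it is unless it's passed as an argument. That's what local means! Local means visible only within the place where it is defined, which in this case is your main session and not the program. Inside boot, `x' is empty, so the command becomes sum perf if group==, d and that is the invalid syntax being reported.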
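
    To see the scoping rule in action, here is a minimal sketch. The program name showx is hypothetical and exists only for this illustration; it receives the loop value as an argument through args rather than relying on the caller's local macro.

    ```stata
    * Hypothetical demo program: showx is not part of the original code
    capture program drop showx
    program showx
        * copy the first argument into this program's own local macro x
        args x
        display "inside showx, x is `x'"
    end

    forvalues x = 1/3 {
        * the caller's `x' is expanded here, before showx is called
        showx `x'
    }
    ```

    Without the args line, `x' inside showx would expand to nothing, because the program has its own, separate set of local macros.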

    Better to abandon that given other problems. Here is a rewriting of your code.

    The first block of code I can't test without example data. I can simplify it a bit. You had

    Code:
    use dataset
    drop if total<100
    keep if standard=="Test"
    keep if year=="2018"
    drop year standard proportion total
    rename withinstd value1
    rename without value0
    reshape long value, i(orgcode type) j(perf)
    drop if value==0
    expand value
    drop value
    egen group = group(type)
    drop type
    I get

    Code:
    use dataset
    keep if total >= 100 & standard=="Test" & year=="2018"
    drop year standard proportion total
    rename (withinstd without) (value1 value0)
    reshape long value, i(orgcode type) j(perf)
    drop if value==0
    expand value
    drop value
    egen group = group(type)
    drop type
    and if I had sight of an example dataset I could probably simplify it further, as the reshape looks unnecessary. Perhaps you should just drop the variable without as well.

    The second block of code seems backwards to me. You have

    Code:
    program boot, rclass
            preserve
            collapse perf, by(orgcode group)
            sum perf if group==`x', d
            local sd = r(sd)
            local p25 = r(p25)
            local p75 = r(p75)
            return scalar sd = `sd'
            return scalar iqr = `p75'-`p25'
            restore
    end
    
    forvalues x = 1/14 {
    bootstrap r(sd) r(iqr), reps(10) saving(data_`x', replace): boot
    }
    That wastes a lot of time and effort doing the same stuff again and again: the preserve, collapse and restore are repeated on every replication for every group. I suggest (and warn again that nothing is tested)

    Code:
    preserve
    collapse perf, by(orgcode group)
    
    program sd_iqr, rclass
        syntax varname [if] [in]
        marksample touse
        sum `varlist' if `touse', d
        return scalar sd = r(sd)
        return scalar iqr = r(p75)- r(p25)
    end
    
    forvalues x = 1/14 {
       bootstrap r(sd) r(iqr), reps(10) saving(data_`x', replace): sd_iqr perf if group == `x'
    }
    
    restore
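
    Since the stated aim was to append the saved files, a sketch of that last step might look like this. It is untested and assumes the loop above has written data_1.dta to data_14.dta in the working directory. The variable group and the file name boot_all are my own choices for illustration, not anything from the original code.

    ```stata
    * Assumes data_1.dta ... data_14.dta exist from the bootstrap loop
    clear
    tempfile all
    forvalues x = 1/14 {
        use data_`x', clear
        * tag each replicate with the group it came from
        generate byte group = `x'
        if `x' > 1 {
            append using "`all'"
        }
        save "`all'", replace
    }
    * boot_all is a hypothetical file name
    save boot_all, replace
    ```

    Tagging each file before appending keeps the replicates for different groups distinguishable in the combined dataset.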
    Further comments:

    1. 10 reps strikes me as far too few.

    2. Look again at this

    Code:
    local sd = r(sd)
    local p25 = r(p25)
    local p75 = r(p75)
    return scalar sd = `sd'
    return scalar iqr = `p75'-`p25'
    I don't understand why you are using local macros here. It's not only not needed; it degrades the result slightly, although you'd have to look hard to see that.

    To think yourself out of doing this, ponder this childish tale.

    I have a pen.
    I put it in a box.
    I want my pen.
    I take it out of the box.
    I use my pen.

    With no rationale, the boxing is just extra faffing around, to use a favourite word of a charismatic mathematics teacher of mine in 1965-66.

    The boxing is putting in a local macro only to take it out again.

    We can just cut that down to

    I have a pen.
    I want my pen.
    I use my pen.
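
    The slight loss of precision from the round trip through a local macro can be checked directly. This is a sketch on the auto dataset shipped with Stata, not the poster's data. As far as I know, local name = exp stores the result as text with a limited number of digits, so the last line may show a tiny nonzero difference, or exactly zero, depending on the value.

    ```stata
    sysuse auto, clear
    summarize price, detail
    * round trip through a local macro, as in the original program
    local sd = r(sd)
    display %21.0g r(sd)
    display %21.0g `sd'
    * any difference is precision lost in the macro's text representation
    display %21.0g r(sd) - `sd'
    ```

    Returning r(sd) directly, as in sd_iqr above, avoids the question altogether.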





    Last edited by Nick Cox; 07 Aug 2019, 05:15.



    • #3
      Dear Dr Cox, Thank you so much for the help. Worked like a charm!

      Particularly helpful was your guidance about cleaning up and making the code more efficient. Thank you.
