
  • Internal validation program

    Hi Statalist,

    The following program is supposed to give me 100 different MSEs, but it produced the same MSE 100 times. Could anyone please help me find the problem?

    Regards,

    Code:
    clear
    clear matrix
    capture log close
    pause on
    
    log using "C:\internal validation CT.log", text replace
    
    capture program drop nfoldmseCT
    program define nfoldmseCT
        local N `2'
        local i=1
        postfile mysim mse using nfoldmseCT-estimates, replace
        set seed 34561
     while `i' <= `1' {
      drop _all
      
      use "C:\internal validation CT.dta", clear
        
      quietly nbreg CTexamsnumber CTunitsdensitypermillionp  un2016poulation65years undp2017humandevelopmentinde if sample==0 , exposure(un2015populationtotal1000) irr vce(robust)
                 
       drop if sample==0
                sample `N', count
                predict predict_rate_CT if sample==1, ir
                gen diff =.
                replace diff = (predict_rate_CT - CTexamsdensityperthousand)^2  if sample==1
              
       quietly  summarize diff, detail
               local mse = (r(mean))
    
        post mysim  (`mse')
          local i=`i'+1
          }
     postclose mysim
     use nfoldmseCT-estimates, clear
    end
    nfoldmseCT 100 77
     /* 100 samples of size 77 */
    
    list
    
    summarize  mse

    [Attached image: MSE.PNG]



  • #2
    This is just a guess, but the code, although it could be tightened up a great deal, looks like it should be looping 100 times. Still, I can see one way it might be going wrong: -sample- does sampling without replacement. So if internal validation CT.dta contains 77 or fewer observations with sample != 0, then there is only one such sample, and you are getting it back every time through the loop.

    If you want sampling with replacement, which avoids this problem, you need -bsample-, not -sample-. Or perhaps your data set should contain more than 77 observations with sample != 0 but it is incorrect and needs fixing.
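
    A quick way to see the difference between the two commands is a small experiment on a data set where the requested sample size equals the number of observations. The sketch below uses Stata's shipped auto.dta (74 observations) purely as a stand-in; it is not the poster's data, and the variable price is just a convenient thing to average. On the real file, the relevant check would be -count if sample != 0- right after -use-.

```stata
* Stand-in data: auto.dta has 74 observations (assumption: not the poster's file)
sysuse auto, clear
count

set seed 34561

* -sample- draws WITHOUT replacement: asking for 74 of 74 keeps every
* observation, so the mean is identical on every pass
forvalues i = 1/3 {
    preserve
    sample 74, count
    quietly summarize price
    display "sample  draw `i': mean price = " r(mean)
    restore
}

* -bsample- draws WITH replacement: each pass is a different resample,
* so the mean changes from draw to draw
forvalues i = 1/3 {
    preserve
    bsample 74
    quietly summarize price
    display "bsample draw `i': mean price = " r(mean)
    restore
}
```

    If the first loop's behavior matches what you see with your own data, the pool being sampled from is probably no larger than the sample size requested.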

    Here are some other ways you can improve this code, though they have no bearing on the question you asked.

    1. It is never necessary to do
    Code:
    gen diff = .
    replace diff = whatever if sample == 1
    because it can be done in a single line as
    Code:
    gen diff = whatever if sample == 1
    2. There is no reason to rerun the negative binomial regression each time through the loop: it's the same regression carried out on the same data. So take that out of the program and run it before you call the program. The estimates will remain active because you will not be doing anything in the program to overwrite them, so that -predict- will still work despite the changes to the data. Repeating the regression each time generates a lot of repetitive output and wastes a lot of time.
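
    Putting both suggestions together, the overall shape might look like the sketch below. It is only an illustration under stated assumptions: auto.dta, -regress-, and the variables holdout, yhat, and sqerr are hypothetical stand-ins for the poster's file, the -nbreg- model, and the sample indicator, and -bsample- is used on the guess that sampling with replacement is what is wanted.

```stata
* Stand-in data and model (assumptions): auto.dta and -regress- replace
* the poster's file and -nbreg-; holdout is a hypothetical split variable
sysuse auto, clear
gen byte holdout = foreign          // 0 = estimation set, 1 = validation set

* Fit the model once, outside the loop
quietly regress price mpg weight if holdout == 0

* Predictions and squared errors, generated in a single line each
predict double yhat
gen double sqerr = (yhat - price)^2 if holdout == 1
keep if holdout == 1

* Collect one MSE per resample
tempname sim
tempfile results
postfile `sim' mse using `results', replace
set seed 34561
forvalues i = 1/100 {
    preserve
    bsample                         // with replacement, same size as the data
    quietly summarize sqerr
    post `sim' (r(mean))
    restore
}
postclose `sim'

use `results', clear
summarize mse
```

    To adapt this to the original problem, the -regress- line would be swapped for the -nbreg- command from post #1 and -predict- would take the ir option. Note that -bsample 77- would fail if fewer than 77 observations remain after the -keep-, which is why the sketch resamples at the size of the data instead.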
