Internal validation program

Masoumeh Sanagou



07 Jan 2020, 14:13

Hi STATALIST,

The following program is supposed to give me 100 different MSEs, but it produces the same MSE 100 times. Could anyone please help me find the problem?

Regards,

Code:

clear
clear matrix
capture log close
pause on

log using "C:\internal validation CT.log", text replace

capture program drop nfoldmseCT
program define nfoldmseCT
    // first argument = number of repetitions, second = observations per sample
    local N `2'
    local i = 1
    postfile mysim mse using nfoldmseCT-estimates, replace
    set seed 34561
    while `i' <= `1' {
        drop _all
        use "C:\internal validation CT.dta", clear

        // fit the model on the observations with sample == 0
        quietly nbreg CTexamsnumber CTunitsdensitypermillionp un2016poulation65years undp2017humandevelopmentinde if sample==0, exposure(un2015populationtotal1000) irr vce(robust)

        // drop those observations and draw N of the remaining ones
        drop if sample==0
        sample `N', count

        // squared difference between predicted and observed rates
        predict predict_rate_CT if sample==1, ir
        gen diff = .
        replace diff = (predict_rate_CT - CTexamsdensityperthousand)^2 if sample==1

        // the MSE for this draw is the mean of the squared differences
        quietly summarize diff, detail
        local mse = r(mean)

        post mysim (`mse')
        local i = `i' + 1
    }
    postclose mysim
    use nfoldmseCT-estimates, clear
end
nfoldmseCT 100 77
/* 100 samples of size 77 */

list

summarize mse

[Attached image: MSE.PNG]


Clyde Schechter


07 Jan 2020, 19:29

This is just a guess, but the code, although it could be tightened up a great deal, looks like it should loop 100 times. Still, I can see one way it might be going wrong: -sample- samples without replacement. So if internal validation CT.dta contains only 77 or fewer observations with sample != 0, then there is only one possible sample (all of those observations), and you get it back every time through the loop.
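A quick way to see the effect, sketched on made-up data (77 observations of a hypothetical variable x, not the poster's file): when the requested count equals the number of observations, -sample 77, count- keeps every observation, so any statistic computed on the "sample" comes out the same on every draw. On the real file, -count if sample != 0- after loading it would show whether this is what is happening.

```stata
* Hypothetical toy data: exactly 77 observations, standing in for the
* observations with sample != 0 in the real file
clear
set obs 77
set seed 34561
generate double x = rnormal()

* Draw a "random" sample of 77 three times; without replacement this
* keeps all 77 observations, so the mean of x never changes
forvalues r = 1/3 {
    preserve
    sample 77, count
    quietly summarize x
    local mean`r' = r(mean)
    display "draw `r': N = " r(N) ", mean of x = " `mean`r''
    restore
}
```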

If you want sampling with replacement, which avoids this problem, you need -bsample-, not -sample-. Alternatively, perhaps your data set should contain more than 77 observations with sample != 0, in which case it is incorrect and needs fixing.
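The same kind of toy check with -bsample- (again 77 made-up observations of a hypothetical variable x): each draw is taken with replacement, so some observations repeat and others are left out, and the mean differs from draw to draw. In the original program the corresponding change would be replacing -sample `N', count- with -bsample `N'-.

```stata
* Hypothetical toy data: 77 observations
clear
set obs 77
set seed 34561
generate double x = rnormal()

* bsample draws 77 observations WITH replacement, so each draw
* contains a different mix of observations and the mean changes
forvalues r = 1/3 {
    preserve
    bsample 77
    quietly summarize x
    local bmean`r' = r(mean)
    display "draw `r': N = " r(N) ", mean of x = " `bmean`r''
    restore
}
```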

Here are some other ways you can improve this code, though they have no bearing on the question you asked.

1. It is never necessary to do

Code:

gen diff = .
replace diff = whatever if sample == 1

because it can be done in a single line as

Code:

gen diff = whatever if sample == 1

2. There is no reason to rerun the negative binomial regression each time through the loop: it's the same regression carried out on the same data. So take that out of the program and run it before you call the program. The estimates will remain active because you will not be doing anything in the program to overwrite them, so that -predict- will still work despite the changes to the data. Repeating the regression each time generates a lot of repetitive output and wastes a lot of time.
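A minimal sketch of the program with these suggestions applied, run on made-up data so that it is self-contained: the variables y, x, expo and sample, the tempfiles, and the program name nfoldmse_sketch are all hypothetical stand-ins for the poster's file and variables. The model is fitted once before the program is called, the squared difference is created with a single -generate- (point 1), and -bsample- replaces -sample- as suggested above; -forvalues- replaces the while counter. Like the original, it compares the predicted incidence rate (-predict, ir-) with the observed rate, here y/expo.

```stata
* Hypothetical toy data: y is an overdispersed count, expo the exposure,
* sample == 0 marks the estimation set and sample == 1 the validation set
clear
set seed 34561
set obs 300
generate double x    = rnormal()
generate double expo = 50 + 100*runiform()
generate double y    = rpoisson(expo * exp(-3 + 0.3*x) * rgamma(2, 0.5))
generate byte sample = _n > 150
tempfile toy results
save `toy'

capture program drop nfoldmse_sketch
program define nfoldmse_sketch
    // reps = repetitions, N = size of each draw,
    // datafile = data to reload, resultsfile = where the MSEs are posted
    args reps N datafile resultsfile
    tempname h
    postfile `h' mse using "`resultsfile'", replace
    forvalues i = 1/`reps' {
        use "`datafile'", clear        // nbreg results stay active
        drop if sample == 0
        bsample `N'                    // draw with replacement
        predict double rate, ir        // uses the model fitted outside
        generate double diff = (rate - y/expo)^2
        quietly summarize diff
        post `h' (r(mean))
    }
    postclose `h'
    use "`resultsfile'", clear
end

* Fit the model once, before calling the program
quietly nbreg y x if sample == 0, exposure(expo) irr vce(robust)
nfoldmse_sketch 100 77 "`toy'" "`results'"
summarize mse
```

On the real data, the toy section would be dropped, the poster's own -nbreg- command run once after loading the file, and that file's path passed as the third argument.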