  • split sample and run logit models and capture numbers to make a new dataset

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double(trans_bef_pregnancy maternalage) byte parity_cat float event
    0 37 2 1
    0 24 2 0
    0 27 2 1
    0 35 1 0
    0 23 1 0
    0 33 1 0
    1 35 2 1
    0 20 1 0
    0 35 3 0
    0 28 2 0
    0 42 3 0
    0 33 1 0
    0 28 2 0
    0 21 1 0
    0 31 1 1
    0 27 2 0
    0 23 2 0
    0 31 1 0
    0 40 2 0
    0 28 1 0
    0 34 2 0
    0 28 1 .
    0 30 3 0
    1 34 1 0
    0 24 2 0
    0 39 4 0
    0 30 4 0
    0 28 1 0
    0 18 1 0
    0 29 2 0
    0 30 1 0
    1 30 1 1
    0 40 2 0
    0 43 2 0
    0 37 2 0
    1 20 1 0
    0 29 1 0
    0 35 4 0
    0 34 2 0
    0 25 1 0
    0 31 3 0
    0 39 2 0
    0 37 2 0
    0 22 1 1
    1 25 1 1
    0 23 1 0
    0 26 1 0
    0 43 1 0
    0 29 2 0
    1 31 2 0
    0 31 2 0
    0 27 1 0
    0 33 1 0
    0 32 1 1
    0 32 2 0
    0 28 2 0
    0 22 1 0
    1 35 1 1
    0 36 3 0
    0 29 2 0
    0 27 1 0
    1 39 3 .
    0 32 1 0
    0 33 1 0
    0 33 2 0
    0 32 1 0
    1 40 2 0
    0 41 4 0
    0 36 3 0
    0 42 2 1
    0 25 1 0
    1 27 1 1
    0 26 1 0
    0 35 2 0
    0 23 1 1
    0 29 3 0
    0 27 1 0
    0 33 2 0
    0 27 1 0
    0 32 2 0
    1 23 1 0
    0 29 1 1
    0 37 2 0
    0 32 4 0
    0 31 4 0
    0 37 3 0
    1 36 2 0
    0 24 2 0
    0 30 2 0
    0 33 1 0
    0 38 4 0
    0 28 1 1
    0 30 3 0
    0 25 1 0
    1 28 3 1
    0 36 2 0
    0 29 2 0
    1 32 3 0
    0 41 1 0
    0 28 3 0
    end
    label values parity_cat parity_cat
    label def parity_cat 1 "1", modify
    label def parity_cat 2 "2", modify
    label def parity_cat 3 "3", modify
    label def parity_cat 4 "4 or above", modify
    
    For the data above, I have the following commands:
    
    set seed 2200
    
    // splitting randomly the data into training 50%, validation 25% and test set 25%
    splitsample, generate(sample) split(0.5 0.25 0.25)
    lab var sample "training, validation or test set"
    lab define sample 1 "training set" 2 "validation set" 3 "test set"
    lab val sample sample
    
    // let's run a logistic model on the training set
    logit event b1.momorigin_cat i.parity_cat trans_bef_pregnancy maternalage b4.abo if n_test>=2 & sample==1, or
    
    //predicted values for validation and test samples
    
    predict pred_val if sample==2 & (event==0 | event==1)
    sum pred_val, de
    local sum_pred_val=r(sum)
    
    predict pred_test if sample==3 & (event==0 | event==1)
    sum pred_test, de
    local sum_pred_test=r(sum)
    
    
    // to see what the sums of predicted probabilities look like
    tabstat pred_val pred_test, stat(sum mean N)
    tab event sample
    
    
    // Now I want to run simulations with the same 50/25/25 split as above, but for different seeds, for example seeds 1 to 1000.
    // I would like to see how the sums of pred_val and pred_test differ from the actual observed events given by tab event sample.
    // What would be the best approach to use?
    Last edited by Nishan Lamichhane; 22 Mar 2023, 02:25.

  • #2
    I figured this out the following way:

    Code:
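    gen sum_p2=.
    gen sum_p3=.
    // note: "replace ... in `i'" below needs at least 1000 observations in memory
    qui forv i=1/1000 {
    set seed `i'
    
    // splitting randomly the data into training 50%, validation 25% and test set 25%
    splitsample, generate(sample) split(0.5 0.25 0.25)
    
    // doing logit (add "if sample==1" to fit on the training set only, as in #1)
    logit event  var2 var2......
    predict p2 if sample==2
    sum p2
    replace sum_p2=r(sum) in `i'
    
    predict p3 if sample==3
    sum p3
    replace sum_p3=r(sum) in `i'
    
    // drop per-iteration variables so they can be recreated on the next pass
    drop sample p2 p3
    }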
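
    A possible variation, offered only as a sketch: the per-seed sums could be written to a separate results dataset with -postfile-, together with the observed number of events in each split, so that predicted and observed totals can be compared directly. The predictor list below is an assumption (it uses only the variables in the -dataex- example), and the names p, obs2, obs3, diff2 and diff3 are hypothetical.

    Code:
    * Sketch only: collect per-seed results in a new dataset via -postfile-
    tempname memhold
    tempfile results
    postfile `memhold' seed sum_p2 obs2 sum_p3 obs3 using `results'
    
    quietly forvalues i = 1/1000 {
        set seed `i'
        splitsample, generate(sample) split(0.5 0.25 0.25)
    
        * assumed predictor list: only variables present in the -dataex- example
        logit event i.parity_cat trans_bef_pregnancy maternalage if sample == 1
        predict double p if !missing(event)
    
        * validation set: sum of predicted probabilities and observed events
        summarize p if sample == 2, meanonly
        local s2 = r(sum)
        count if sample == 2 & event == 1
        local o2 = r(N)
    
        * test set: same two quantities
        summarize p if sample == 3, meanonly
        local s3 = r(sum)
        count if sample == 3 & event == 1
        local o3 = r(N)
    
        post `memhold' (`i') (`s2') (`o2') (`s3') (`o3')
        drop sample p
    }
    postclose `memhold'
    
    * load the results (this replaces the data in memory)
    use `results', clear
    generate diff2 = sum_p2 - obs2
    generate diff3 = sum_p3 - obs3
    summarize diff2 diff3

    Because the results go to their own file, this does not depend on the original data having at least 1000 observations. Wrapping the whole thing in preserve / restore would keep the original data in memory afterwards.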
