  • Crossfold: RMSE and root MSE differ?

    Dear Stata list members,

    As a fairly novice user, I have been working on a k-fold cross-validation using the 'crossfold' command. I am a little confused by the output.

    Below are the details of the last iteration, followed by the RMSEs for all models.
    I expected the RMSE and the Root MSE to be identical, but they are not (highlighted in red in the original post: Root MSE = 1.0821 versus est10 = 1.036067).

    I feel really stupid and must be missing something trivial, but I cannot find it through other sources either. So I hope someone can explain what's happening to me.



          Source |       SS           df       MS      Number of obs   =       753
    -------------+----------------------------------   F(6, 746)       =     26.90
           Model |  188.969091         6  31.4948485   Prob > F        =    0.0000
        Residual |  873.508997       746  1.17092359   R-squared       =    0.1779
    -------------+----------------------------------   Adj R-squared   =    0.1712
           Total |  1062.47809       752   1.4128698   Root MSE        =    1.0821



                 |      RMSE
    -------------+-----------
            est1 |  1.167672
            est2 |  1.113042
            est3 |  1.218319
            est4 |   .960334
            est5 |  .9428346
            est6 |  1.053918
            est7 |  1.210286
            est8 |  1.021159
            est9 |  1.155968
           est10 |  1.036067

  • #2
    It would have been helpful to see the Stata code you typed (see Statalist FAQ 3.3).

    crossfold seems to be working correctly here. The RMSE from the original model is similar to the crossfold values. As random numbers are involved, nothing more can be expected.

    You seem to be confused about what crossfold does. It carries out the same regression on several different randomly selected parts of your data set, and checks how they perform. Naturally, they all give different estimates (in your case ten different estimates). There is no summary value. The last estimate is not special, despite your red ink.

    That said, if crossfold just replicates the original values, plus random noise, what is the point? See http://www.nrcresearchpress.com/doi/...2#.WnGuiqhl-Uk



    • #3
      Dear Paul,

      I really appreciate your reply! Thank you.

      The Stata code I typed: crossfold stepwise, pr(0.10): depvar indepvars loud k(10)

      The reason I highlighted the 10th is that this (last) model and the list of RMSEs are displayed next to each other, so it was easy to copy and paste here. But the issue of different numbers is the same for all 10 iterations.
      Below I pasted the Root MSE from each of the models (displayed through the 'loud' option) next to the list of RMSEs that crossfold produces at the end. None of them match, and even the order from smallest to greatest is different. Therefore I am wondering whether I picked the right value from the model output.

      "crossfold seems to be working correctly here. The RMSE from the original model is similar to the crossfold values." --> Do you mean that the two values that I highlighted in red are the same in your case?
      I'm really probably just missing something trivial, but I'm struggling to see what as I'm still a novice at this topic.

      I am not sure what you mean with your remark about the summary value. I realise the 10th is not a summary value, or is that not what you're implying?


                   |      RMSE   Root MSE
      -------------+---------------------
              est1 |  1.167672     1.0661
              est2 |  1.113042     1.0748
              est3 |  1.218319     1.0585
              est4 |   .960334     1.0882
              est5 |  .9428346     1.0913
              est6 |  1.053918     1.0784
              est7 |  1.210286     1.0623
              est8 |  1.021159     1.0833
              est9 |  1.155968     1.0684
             est10 |  1.036067     1.0821





      • #4
        I think that you have a point... the randomization is what is supposed to pick out the model but the estimates of the RMSE should be the root mean squared errors from these models. Why don't you send an email to the author of the program and post back when you get a reply?



        • #5
          Dear Andrew,

          I will do that. I was afraid I was missing something trivial (say, RMSE and root MSE were not the same thing) so I thought I'd ask here first.
          I will certainly post back.



          • #6
            I received a very quick reply from Benjamin Daniels:

            "The RMSEs reported by the command are those from the out-of-sample predictions, *not* the regressions themselves."

            So: the results from the training set are what the model outputs show when 'loud' is specified. The RMSEs that are listed are those from the test sets on which each model is validated.

            Perhaps that's what Paul Seed meant as well with "Naturally, they all give different estimates (in your case ten different estimates)."
            Anyway, solved. Thank you guys!
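A small sketch of the distinction described in #6, using the auto data that ships with Stata rather than the original poster's data. The 80/20 split and the names train, yhat, e2, rmse_in and rmse_out are illustrative assumptions; crossfold itself is not used here.

```stata
* In-sample Root MSE versus out-of-sample RMSE for one train/test split
sysuse auto, clear
set seed 1234
generate byte train = runiform() < 0.8      // hypothetical 80/20 split

* In-sample: e(rmse) is the "Root MSE" shown in the regression header
quietly regress price mpg weight if train
scalar rmse_in = e(rmse)

* Out-of-sample: predict the held-out observations only
quietly predict double yhat if !train, xb
quietly generate double e2 = (price - yhat)^2

* Root of the mean squared prediction error over the held-out observations
quietly summarize e2, meanonly
scalar rmse_out = sqrt(r(mean))

display "In-sample Root MSE = " rmse_in
display "Out-of-sample RMSE = " rmse_out
```

The first number divides the residual sum of squares by the residual degrees of freedom of the training fit, while the second averages squared prediction errors over the held-out observations only, so there is no reason for the two to agree.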



            • #7
              Thanks for closing the thread.



              • #8
                The following example clarifies what "out-of-sample" predictions in #6 means.


                Code:
                . sysuse nlsw88
                (NLSW, 1988 extract)
                
                . set seed 1234
                
                . *TO INSTALL TYPE findit crossfold AND FOLLOW INSTRUCTIONS.
                
                . * I CONSIDER THE SIMPLEST CASE, k=2 FOLD CROSS-VALIDATION
                . * (DEFAULT IS k=5: EACH MODEL IS FIT ON 4 GROUPS AND TESTED ON THE REMAINING ONE)
                
                . crossfold reg wage union, k(2)  loud
                
                      Source |       SS           df       MS      Number of obs   =       937
                -------------+----------------------------------   F(1, 935)       =     25.02
                       Model |  453.778532         1  453.778532   Prob > F        =    0.0000
                    Residual |   16956.488       935  18.1352813   R-squared       =    0.0261
                -------------+----------------------------------   Adj R-squared   =    0.0250
                       Total |  17410.2666       936  18.6007121   Root MSE        =    4.2586
                
                ------------------------------------------------------------------------------
                        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                       union |   1.612327   .3223246     5.00   0.000      .979764    2.244891
                       _cons |   7.343919   .1603864    45.79   0.000      7.02916    7.658678
                ------------------------------------------------------------------------------
                
                      Source |       SS           df       MS      Number of obs   =       941
                -------------+----------------------------------   F(1, 939)       =     19.16
                       Model |  302.759027         1  302.759027   Prob > F        =    0.0000
                    Residual |  14841.3628       939  15.8054982   R-squared       =    0.0200
                -------------+----------------------------------   Adj R-squared   =    0.0189
                       Total |  15144.1219       940  16.1107679   Root MSE        =    3.9756
                
                ------------------------------------------------------------------------------
                        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                       union |    1.32186   .3020238     4.38   0.000     .7291407     1.91458
                       _cons |   7.066788   .1489924    47.43   0.000     6.774391    7.359184
                ------------------------------------------------------------------------------
                
                             |      RMSE
                -------------+-----------
                        est1 |  3.988536
                        est2 |  4.270144 
                In the data, the variables _est_est* define the estimation samples. So for the last model (in-sample Root MSE = 3.9756), we can compute the out-of-sample RMSE as follows:

                Code:
                *PREDICT FITTED VALUES USING OUT OF SAMPLE OBSERVATIONS
                . predict yhat if _est_est2!=1, xb
                (1,309 missing values generated)
                
                *GEN SQUARED RESIDUAL
                
                . gen e = (yhat-wage)*(yhat-wage)
                (1,309 missing values generated)
                
                *OBTAIN SUM OF SQUARED RESIDUAL
                
                . egen et = total(e)
                
                . *USE IN-SAMPLE DEGREES OF FREEDOM
                
                . egen n= total(1 ) if _est_est2==1
                (1305 missing values generated)
                
                . replace n= n-2
                (941 real changes made)
                
                *OBTAIN MEAN SUM OF SQUARES RESIDUAL
                
                . gen mss= et/n
                (1,305 missing values generated)
                
                *FINALLY GET ESTIMATED RMSE; IT IS CLOSE TO -est2- ABOVE (4.270144) BUT NOT IDENTICAL, BECAUSE OF THE DEGREES-OF-FREEDOM ERROR CORRECTED IN #10
                
                
                . gen rmse= mss^0.5
                (1,305 missing values generated)
                
                . sum rmse
                
                    Variable |        Obs        Mean    Std. Dev.       Min        Max
                -------------+---------------------------------------------------------
                        rmse |        941    4.265594           0   4.265594   4.265594
                Last edited by Andrew Musau; 31 Jan 2018, 09:51.
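A quick arithmetic check on the value above, offered as a sketch rather than as part of the original post. The listing divides the sum of squared prediction errors by the in-sample residual degrees of freedom (941 - 2 = 939), but predictions exist for only 2,246 - 1,309 = 937 held-out observations. Both counts are read off the log above (1,305 + 941 = 2,246 observations in total). Rescaling to the held-out count:

```stata
* 4.265594 = sqrt(SSE/939); convert it to sqrt(SSE/937)
display %9.6f 4.265594 * sqrt(939/937)
```

This comes to about 4.2701, in line with est2 = 4.270144 as reported by crossfold.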



                • #9
                  Thank you for this interesting post.

                  I have tried to replicate Andrew's code using k=5 (the default), but the results are different. Any idea why?

                  This is the code only:

                  Code:
                  clear all
                  sysuse nlsw88
                  set seed 1234
                  crossfold reg wage union, k(5)  loud
                  
                  forvalues i=1/5 {
                      preserve
                      qui reg wage union if _est_est`i'==1
                      qui predict yhat`i' if _est_est`i'!=1, xb
                  
                      *GEN SQUARED RESIDUAL
                      qui gen e`i' = (yhat`i'-wage)*(yhat`i'-wage)
                  
                      *OBTAIN SUM OF SQUARED RESIDUAL
                      qui egen et`i' = total(e`i')
                  
                      *USE IN-SAMPLE DEGREES OF FREEDOM
                      qui egen n`i'= total(1) if _est_est`i'==1
                      qui replace n`i'=n`i'-2
                  
                      *OBTAIN MEAN SUM OF SQUARES RESIDUAL
                      qui gen mss`i'= et`i'/n`i'
                  
                      *FINALLY GET ESTIMATED RMSE AND NOTE THAT IT MATCHES  -est*- ABOVE
                      qui gen rmse`i'= mss`i'^0.5
                      sum rmse`i'
                      restore
                      }
                  *
                  This is the code together with the output:

                  Code:
                  . clear all
                  
                  . sysuse nlsw88
                  (NLSW, 1988 extract)
                  
                  . set seed 1234
                  
                  . crossfold reg wage union, k(5)
                  
                               |      RMSE
                  -------------+-----------
                          est1 |  4.189929
                          est2 |  3.802928
                          est3 |  4.239404
                          est4 |  4.179155
                          est5 |  4.199521
                  
                  .
                  . forvalues i=1/5 {
                    2.         preserve
                    3.         qui reg wage union if _est_est`i'==1
                    4.         qui predict yhat`i' if _est_est`i'!=1, xb
                    5.
                  .         *GEN SQUARED RESIDUAL
                  .         qui gen e`i' = (yhat`i'-wage)*(yhat`i'-wage)
                    6.
                  .         *OBTAIN SUM OF SQUARED RESIDUAL
                  .         qui egen et`i' = total(e`i')
                    7.
                  .         *USE IN-SAMPLE DEGREES OF FREEDOM
                  .         qui egen n`i'= total(1) if _est_est`i'==1
                    8.         qui replace n`i'=n`i'-2
                    9.
                  .         *OBTAIN MEAN SUM OF SQUARES RESIDUAL
                  .         qui gen mss`i'= et`i'/n`i'
                   10.
                  .         *FINALLY GET ESTIMATED RMSE AND NOTE THAT IT MATCHES  -est*- ABOVE
                  .         qui gen rmse`i'= mss`i'^0.5
                   11.         sum rmse`i'
                   12.         restore
                   13.         }
                  
                      Variable |        Obs        Mean    Std. Dev.       Min        Max
                  -------------+---------------------------------------------------------
                         rmse1 |      1,510    2.069807           0   2.069807   2.069807
                  
                      Variable |        Obs        Mean    Std. Dev.       Min        Max
                  -------------+---------------------------------------------------------
                         rmse2 |      1,494    1.929298           0   1.929298   1.929298
                  
                      Variable |        Obs        Mean    Std. Dev.       Min        Max
                  -------------+---------------------------------------------------------
                         rmse3 |      1,500    2.129584           0   2.129584   2.129584
                  
                      Variable |        Obs        Mean    Std. Dev.       Min        Max
                  -------------+---------------------------------------------------------
                         rmse4 |      1,502    2.092361           0   2.092361   2.092361
                  
                      Variable |        Obs        Mean    Std. Dev.       Min        Max
                  -------------+---------------------------------------------------------
                         rmse5 |      1,506    2.088562           0   2.088562   2.088562
                  ------
                  I use Stata 17



                  • #10
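                    Hi Lukas, in the simple k=2 fold case in #8, it was not necessary to specify the in-sample regression, because the target was the last model fitted and -predict- uses the estimates currently in memory. This changes with k>2 whenever the target is not the last model. The following example makes use of the Grunfeld data set, where my target RMSE pertains to the 3rd group.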

                    Code:
                    webuse grunfeld
                    set seed 1234
                    crossfold reg invest mvalue kstock, loud
                    
                    *WITH K>2, NECESSARY TO SPECIFY IN SAMPLE REGRESSION
                    qui regress invest mvalue kstock if _est_est3==1
                    predict yhat if _est_est3!=1, xb
                    
                    *GEN SQUARED RESIDUAL
                    gen e = (yhat-invest)*(yhat-invest) if  _est_est3!=1
                    
                    *OBTAIN SUM OF SQUARED RESIDUAL
                    egen et = total(e)
                    
                    *USE OUT-OF-SAMPLE NUMBER OF OBSERVATIONS
                    egen n= total(1 ) if _est_est3!=1
                    
                    *OBTAIN MEAN SUM OF SQUARES RESIDUAL
                    gen mss= et/n
                    
                    *FINALLY GET ESTIMATED RMSE AND NOTE THAT IT MATCHES  -est3- ABOVE
                    gen rmse= mss^0.5
                    sum rmse
                    Code:
                    . crossfold reg invest mvalue kstock, loud
                    
                          Source |       SS           df       MS      Number of obs   =       160
                    -------------+----------------------------------   F(2, 157)       =    296.68
                           Model |  5791871.18         2  2895935.59   Prob > F        =    0.0000
                        Residual |  1532498.43       157  9761.13652   R-squared       =    0.7908
                    -------------+----------------------------------   Adj R-squared   =    0.7881
                           Total |  7324369.61       159  46065.2177   Root MSE        =    98.798
                    
                    ------------------------------------------------------------------------------
                          invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                          mvalue |   .1140144   .0070088    16.27   0.000     .1001707     .127858
                          kstock |   .2574191   .0300034     8.58   0.000     .1981567    .3166815
                           _cons |  -46.07651   11.09453    -4.15   0.000    -67.99031   -24.16271
                    ------------------------------------------------------------------------------
                    
                          Source |       SS           df       MS      Number of obs   =       160
                    -------------+----------------------------------   F(2, 157)       =    366.54
                           Model |   6034945.2         2   3017472.6   Prob > F        =    0.0000
                        Residual |  1292466.58       157  8232.27124   R-squared       =    0.8236
                    -------------+----------------------------------   Adj R-squared   =    0.8214
                           Total |  7327411.79       159  46084.3509   Root MSE        =    90.732
                    
                    ------------------------------------------------------------------------------
                          invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                          mvalue |   .1137032   .0061716    18.42   0.000     .1015131    .1258933
                          kstock |   .2394641   .0275968     8.68   0.000     .1849552    .2939731
                           _cons |  -39.81952   10.23855    -3.89   0.000     -60.0426   -19.59645
                    ------------------------------------------------------------------------------
                    
                          Source |       SS           df       MS      Number of obs   =       160
                    -------------+----------------------------------   F(2, 157)       =    351.00
                           Model |  6801912.55         2  3400956.27   Prob > F        =    0.0000
                        Residual |  1521210.88       157  9689.24129   R-squared       =    0.8172
                    -------------+----------------------------------   Adj R-squared   =    0.8149
                           Total |  8323123.43       159  52346.6882   Root MSE        =    98.434
                    
                    ------------------------------------------------------------------------------
                          invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                          mvalue |   .1141046   .0063526    17.96   0.000     .1015571    .1266521
                          kstock |   .2326315   .0276624     8.41   0.000     .1779931    .2872699
                           _cons |  -47.92494    11.1398    -4.30   0.000    -69.92815   -25.92174
                    ------------------------------------------------------------------------------
                    
                          Source |       SS           df       MS      Number of obs   =       160
                    -------------+----------------------------------   F(2, 157)       =    325.19
                           Model |  5067174.94         2  2533587.47   Prob > F        =    0.0000
                        Residual |  1223215.38       157  7791.18079   R-squared       =    0.8055
                    -------------+----------------------------------   Adj R-squared   =    0.8031
                           Total |  6290390.32       159  39562.2033   Root MSE        =    88.268
                    
                    ------------------------------------------------------------------------------
                          invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                          mvalue |    .112054    .005865    19.11   0.000     .1004695    .1236385
                          kstock |   .1877942   .0281228     6.68   0.000     .1322463     .243342
                           _cons |  -32.94767   10.20441    -3.23   0.002    -53.10331   -12.79203
                    ------------------------------------------------------------------------------
                    
                          Source |       SS           df       MS      Number of obs   =       160
                    -------------+----------------------------------   F(2, 157)       =    391.83
                           Model |  6780638.42         2  3390319.21   Prob > F        =    0.0000
                        Residual |  1358450.07       157  8652.54822   R-squared       =    0.8331
                    -------------+----------------------------------   Adj R-squared   =    0.8310
                           Total |  8139088.49       159  51189.2358   Root MSE        =    93.019
                    
                    ------------------------------------------------------------------------------
                          invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                          mvalue |   .1284541   .0073341    17.51   0.000     .1139679    .1429404
                          kstock |   .2165697   .0290646     7.45   0.000     .1591617    .2739778
                           _cons |  -44.72366   10.26777    -4.36   0.000    -65.00445   -24.44288
                    ------------------------------------------------------------------------------
                    
                                 |      RMSE
                    -------------+-----------
                            est1 |  76.64691
                            est2 |   108.039
                            est3 |   77.9467
                            est4 |  121.2959
                            est5 |  106.8959

                    Code:
                     sum rmse
                    
                        Variable |        Obs        Mean    Std. Dev.       Min        Max
                    -------------+---------------------------------------------------------
                            rmse |         40     77.9467           0    77.9467    77.9467
                    
                    .
                    You can generalize the code to the other groups.
                    Last edited by Andrew Musau; 13 Sep 2018, 13:40. Reason: the denominator should be the out-of-sample number of observations, not the in-sample degrees of freedom. The error applies to #8 and is corrected here.
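One way to generalize the listing in #10 to all groups is sketched below. It assumes crossfold is installed and that it leaves the indicators _est_est1 to _est_est5 in the data, as the listings in this thread suggest. The names cv_yhat, cv_e2 and cv_rmse1 to cv_rmse5 are made up for illustration.

```stata
* Sketch: recompute every out-of-sample RMSE reported by -crossfold-
webuse grunfeld, clear
set seed 1234
crossfold regress invest mvalue kstock, k(5)

forvalues i = 1/5 {
    * Refit the model on the estimation sample of fold `i'
    quietly regress invest mvalue kstock if _est_est`i' == 1

    * Predict only for the held-out observations of fold `i'
    quietly predict double cv_yhat if _est_est`i' != 1, xb
    quietly generate double cv_e2 = (invest - cv_yhat)^2

    * Mean squared prediction error over the held-out observations
    quietly summarize cv_e2, meanonly
    scalar cv_rmse`i' = sqrt(r(mean))
    display "est`i': out-of-sample RMSE = " cv_rmse`i'

    drop cv_yhat cv_e2
}
```

Taking the mean with -summarize- divides by the number of held-out observations that actually have a prediction, which is the denominator used in #10. The same denominator accounts for the numbers in #9: there the squared errors of the held-out observations were divided by the in-sample degrees of freedom. Assuming 1,878 usable observations (937 + 941 in #8), fold 1 has 1,878 - 1,510 = 368 held-out observations, and 2.069807 * sqrt(1508/368) is about 4.1899, which is est1 in #9.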



                    • #11
                      Hi Lukas,
                      Even though it may be late for this, I would also direct you towards two other user-written commands for cross-validation: -loocv- and -cv_regress-.
                      Code:
                       webuse grunfeld
                      
                      . set seed 1234
                      
                      . crossfold reg invest mvalue kstock, loud
                      
                      *** Output Omitted
                      
                      --------------------------------
                            invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                            mvalue |   .1141046   .0063526    17.96   0.000     .1015571    .1266521
                            kstock |   .2326315   .0276624     8.41   0.000     .1779931    .2872699
                             _cons |  -47.92494    11.1398    -4.30   0.000    -69.92815   -25.92174
                      ------------------------------------------------------------------------------
                      
                            Source |       SS           df       MS      Number of obs   =       160
                      -------------+----------------------------------   F(2, 157)       =    325.19
                             Model |  5067174.94         2  2533587.47   Prob > F        =    0.0000
                          Residual |  1223215.38       157  7791.18079   R-squared       =    0.8055
                      -------------+----------------------------------   Adj R-squared   =    0.8031
                             Total |  6290390.32       159  39562.2033   Root MSE        =    88.268
                      
                      ------------------------------------------------------------------------------
                            invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                            mvalue |    .112054    .005865    19.11   0.000     .1004695    .1236385
                            kstock |   .1877942   .0281228     6.68   0.000     .1322463     .243342
                             _cons |  -32.94767   10.20441    -3.23   0.002    -53.10331   -12.79203
                      ------------------------------------------------------------------------------
                      
                            Source |       SS           df       MS      Number of obs   =       160
                      -------------+----------------------------------   F(2, 157)       =    391.83
                             Model |  6780638.42         2  3390319.21   Prob > F        =    0.0000
                          Residual |  1358450.07       157  8652.54822   R-squared       =    0.8331
                      -------------+----------------------------------   Adj R-squared   =    0.8310
                             Total |  8139088.49       159  51189.2358   Root MSE        =    93.019
                      
                      ------------------------------------------------------------------------------
                            invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                            mvalue |   .1284541   .0073341    17.51   0.000     .1139679    .1429404
                            kstock |   .2165697   .0290646     7.45   0.000     .1591617    .2739778
                             _cons |  -44.72366   10.26777    -4.36   0.000    -65.00445   -24.44288
                      ------------------------------------------------------------------------------
                      
                                   |      RMSE 
                      -------------+-----------
                              est1 |  76.64691 
                              est2 |   108.039 
                              est3 |   77.9467 
                              est4 |  121.2959 
                              est5 |  106.8959 
                      
                      . loocv reg invest mvalue kstock
                      
                      
                       Leave-One-Out Cross-Validation Results 
                      -----------------------------------------
                               Method          |    Value
                      -------------------------+---------------
                      Root Mean Squared Errors |   97.782913
                      Mean Absolute Errors     |   61.287695
                      Pseudo-R2                |   .7957062
                      -----------------------------------------
                      
                      . reg invest mvalue kstock
                      
                            Source |       SS           df       MS      Number of obs   =       200
                      -------------+----------------------------------   F(2, 197)       =    426.58
                             Model |  7604093.48         2  3802046.74   Prob > F        =    0.0000
                          Residual |  1755850.43       197  8912.94636   R-squared       =    0.8124
                      -------------+----------------------------------   Adj R-squared   =    0.8105
                             Total |  9359943.92       199  47034.8941   Root MSE        =    94.408
                      
                      ------------------------------------------------------------------------------
                            invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                            mvalue |   .1155622   .0058357    19.80   0.000     .1040537    .1270706
                            kstock |   .2306785   .0254758     9.05   0.000     .1804382    .2809188
                             _cons |  -42.71437   9.511676    -4.49   0.000    -61.47215   -23.95659
                      ------------------------------------------------------------------------------
                      
                      . cv_regress
                      
                      
                      Leave-One-Out Cross-Validation Results 
                      -----------------------------------------
                               Method          |    Value
                      -------------------------+---------------
                      Root Mean Squared Errors |   97.782912
                      Mean Absolute Errors     |   61.287695
                      Pseudo-R2                |      0.79571
                      -----------------------------------------
                      While -loocv- and -cv_regress- provide the same results, you should know that -loocv- may be very computationally intensive, although in theory it can be used with any model. -cv_regress- is faster, but can only be used after -regress-.

                      Fernando
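For readers wondering why a command that only works after -regress- can be so much faster: for ordinary least squares, the leave-one-out prediction error has a closed form, e_i / (1 - h_ii), where e_i is the residual and h_ii the leverage from the single full-sample fit, so no refitting is needed. The sketch below uses that identity with built-in commands only. That -cv_regress- works this way internally is an assumption, and the variable and scalar names are illustrative.

```stata
* Sketch: leave-one-out errors for OLS from one fit, via residuals and leverage
webuse grunfeld, clear
quietly regress invest mvalue kstock

predict double res, residuals          // e_i from the full-sample fit
predict double lev, hat                // h_ii, diagonal of the hat matrix

* Leave-one-out prediction error: y_i minus the prediction from a fit without i
generate double loo_err = res / (1 - lev)
generate double loo_sq  = loo_err^2
generate double loo_abs = abs(loo_err)

quietly summarize loo_sq, meanonly
scalar loo_rmse = sqrt(r(mean))
quietly summarize loo_abs, meanonly
scalar loo_mae = r(mean)

display "LOOCV root mean squared error = " loo_rmse
display "LOOCV mean absolute error     = " loo_mae
```

If -loocv- computes its summary as the root of the mean squared leave-one-out error, these figures should be directly comparable with the tables above.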



                      • #12
                        FernandoRios and Andrew Musau, do either of you have any idea how to do cross-validation for hurdle negative binomial models (the 1st part is probit or logit and the 2nd part is NB2) and finite mixture models? It seems that crossfold and loocv do not work with these two models.



                        • #13
                          Hi Dung
                          I do not think LOOCV will be the best option for you, and the command I proposed, cv_regress, is only valid for linear regressions.
                          I think your best alternative is to do a k-fold cross-validation.
                          For that, what you need is a program that does the following.
                          Step 1. Randomly assign each observation to 1 of N groups (N can be, for example, 5).
                          Step 2. Obtain predictions from your model, in order to compare the fit of the model, or the likelihood of the model.
                          In linear models, the predicted value is enough. But in nonlinear models, cross-validation criteria are usually computed using the log likelihood of the model.
                          Step 3. Now that you have, say, N=5 groups, call them s1 s2 s3 s4 s5, use the information from s2-s5 to predict the log likelihood (for each observation) for s1.
                          For s2, repeat Step 2 using samples s1 s3 s4 s5. Do the same for s3, s4 and s5.
                          Step 4. Sum all the log likelihoods that you predicted in Step 3.

                          Hope this is helpful
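The steps above can be sketched as follows. This is only an illustration under stated assumptions: it uses the auto data that ships with Stata and a Poisson model as a stand-in for the hurdle and finite mixture models asked about in #12. For those models, the per-observation log likelihood would have to be built from that model's own predicted quantities. The names u, fold, mu, ll_oos and cv_ll are illustrative.

```stata
* Sketch: 5-fold cross-validated log likelihood for a Poisson model
sysuse auto, clear
keep if !missing(rep78, mpg, weight)      // complete cases only
set seed 1234

* Step 1: randomly assign each observation to one of 5 groups
generate double u = runiform()
sort u
generate byte fold = mod(_n - 1, 5) + 1

* Steps 2-3: fit on four groups, score the held-out group
generate double ll_oos = .
forvalues s = 1/5 {
    quietly poisson rep78 mpg weight if fold != `s'
    quietly predict double mu if fold == `s', n
    * Poisson log-likelihood contribution: y*ln(mu) - mu - ln(y!)
    quietly replace ll_oos = rep78*ln(mu) - mu - lnfactorial(rep78) if fold == `s'
    drop mu
}

* Step 4: sum the held-out log likelihoods
quietly summarize ll_oos
scalar cv_ll = r(sum)
display "Cross-validated log likelihood = " cv_ll
```

Competing specifications can then be compared on this sum using the same fold variable, with a larger (less negative) value indicating better out-of-sample fit.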



                          • #14
                            Thanks, FernandoRios, for your suggestions. Unfortunately, I am not familiar with creating programs in Stata. Could you please write an example program for me? I would greatly appreciate your help.

                            Best regards,

                            DL



                            • #15
                              Hi FernandoRios and Andrew Musau,

                              I created a thread regarding out-of-sample cross-validation for count data. In the thread, I provided my code, which gave me unexpected results. I am wondering whether you both may have time to look at the thread? Any support is much appreciated.

                              https://www.statalist.org/forums/for...a-am-i-correct

                              Thank you.

                              DL

