Dear all,
I performed a nonparametric kernel regression analysis to develop a diagnostic model for predicting HRQoL, a highly skewed continuous variable. This approach demonstrated superior predictive performance compared to other distributional analysis techniques.
My current challenge is presenting the model equation for external validation. Although I understand the theory, I'm having difficulty programming it in Stata for this purpose. Any assistance would be greatly appreciated.
. npregress kernel qol i.sex age i.work c.comorb i.diabetes c.newimd c.newtrigs c.newurea c.neweGFRMDRD c.newwaist newbmi
Computing mean function
Minimizing cross-validation function:
Iteration 0: Cross-validation criterion = 30.474236
Iteration 1: Cross-validation criterion = 30.351633
Iteration 2: Cross-validation criterion = 30.351633
Iteration 3: Cross-validation criterion = 30.351633
Iteration 4: Cross-validation criterion = 30.351633
warning: 213 observations were not used to compute the mean function because they violated the model
identification assumptions. These observations are marked as 1 in the system variable
_unident_sample. You may use the unidentsample() option to use a different variable name.
Computing optimal derivative bandwidth
Iteration 0: Cross-validation criterion = 1.0019674
Iteration 1: Cross-validation criterion = 1.0019674
Iteration 2: Cross-validation criterion = 1.0019386
Iteration 3: Cross-validation criterion = 1.0018571
Iteration 4: Cross-validation criterion = 1.0018571
Bandwidth
------------------------------------
             |       Mean     Effect
-------------+----------------------
         sex |         .5         .5
         age |   5.397418   5.919251
        work |         .5         .5
      comorb |      .5638   .6183092
    diabetes |         .5         .5
      newimd |   6.627235   7.267969
    newtrigs |   .4576943   .5019451
     newurea |   .7612126    .834808
 neweGFRMDRD |   6.707786   7.356307
    newwaist |   7.322567   8.030526
      newbmi |   2.701951    2.96318
------------------------------------
Local-linear regression                    Number of obs  =  3,940
Continuous kernel : epanechnikov           E(Kernel obs)  =  3,940
Discrete kernel   : liracine               R-squared      = 0.8250
Bandwidth         : cross-validation
-------------------------------------------------------------------------------------------------
                            qol |   Estimate
--------------------------------+----------------------------------------------------------------
Mean                            |
                            qol |   .8561769
--------------------------------+----------------------------------------------------------------
Effect                          |
                            age |  -.0003027
                         comorb |  -.0171372
                         newimd |  -.0020548
                       newtrigs |  -.0168828
                        newurea |  -.0030568
                    neweGFRMDRD |  -.0004259
                       newwaist |  -.0021735
                         newbmi |   .0003762
                                |
                            sex |
               (Female vs Male) |  -.0551113
                                |
                           work |
 (Retired vs Currently working) |  -.0555846
   (Other vs Currently working) |  -.1171366
                                |
                       diabetes |
                    (Yes vs No) |  -.0015465
-------------------------------------------------------------------------------------------------
Note: Effect estimates are averages of derivatives for continuous covariates and averages of contrasts for
factor covariates.
Note: You may compute standard errors using vce(bootstrap) or reps().
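One reason this model is hard to present as a single equation: a local-linear kernel fit has no global coefficient vector. The Effect estimates above are averages, as the note says, and a prediction for a new patient comes from a separate kernel-weighted least-squares fit on the development data, using the bandwidths in the table. The Python sketch below illustrates that idea with one continuous and one binary covariate. It is not Stata's npregress implementation: the data are made up, the function names are hypothetical, the textbook Epanechnikov kernel on |u| < 1 is assumed (Stata's scaling may differ), and the discrete covariate is assumed to enter only through the Li-Racine weight.

```python
def epanechnikov(u):
    """Textbook Epanechnikov kernel on |u| < 1 (assumed scaling, not Stata's)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def li_racine(z, z0, lam):
    """Li-Racine kernel for an unordered discrete covariate:
    weight 1 when the categories match, lam otherwise."""
    return 1.0 if z == z0 else lam


def local_linear_predict(x0, z0, xs, zs, ys, h, lam):
    """Local-linear estimate of E[y | x = x0, z = z0].

    Solves the kernel-weighted least-squares problem of y on (1, x - x0)
    in closed form and returns the intercept, which is the fitted mean at x0.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, z, y in zip(xs, zs, ys):
        d = x - x0
        w = epanechnikov(d / h) * li_racine(z, z0, lam)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    det = s0 * s2 - s1 * s1
    if det <= 1e-12:
        # Too few development observations near x0 to identify the local fit.
        raise ValueError("prediction point is not identified by the data")
    return (s2 * t0 - s1 * t1) / det


# Made-up development data: an "age"-like covariate, a binary covariate,
# and an outcome that is exactly linear in age (illustration only).
dev_x = [float(a) for a in range(40, 81, 2)]
dev_z = [i % 2 for i in range(len(dev_x))]
dev_y = [0.9 - 0.001 * x for x in dev_x]

# Predict for a hypothetical new patient; h and lam are illustrative values.
pred = local_linear_predict(61.0, 1, dev_x, dev_z, dev_y, h=5.4, lam=0.5)
print(round(pred, 4))
```

If this reading is right, external validation would need the development data (or fitted values on a grid of covariate patterns) together with the bandwidths, rather than a table of coefficients.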