Negative partial R2: How can I address this?

Huy Nguyen

Join Date: Jun 2021

Posts: 15
#1

14 Aug 2021, 07:41

Dear All,
I am assessing the contribution of 6 independent variables (IVs). For each IV I estimated a partial R2, defined as the difference between the full-model R² and the R² of the model that leaves that IV out. However, some models gave a negative partial R2. What should I do to address this, or is there another approach?
Tags: fixed effects, regression
Huy Nguyen

Join Date: Jun 2021

Posts: 15
#2

14 Aug 2021, 07:46

Dear All,
I am estimating partial R2, defined as the difference between the R² of the full linear regression model and the R² of the model that leaves one independent variable (IV) out, in order to assess the contribution of each IV. However, in some models the partial R2 is negative (the full-model R2 is smaller than the leave-one-out R2). Can a partial R2 be negative? What should I do to address this, or is there another approach? Could I use standardized regression coefficients instead of estimating partial R2? Any responses are much appreciated.
Richard Williams

Join Date: Apr 2014

Posts: 4987
#3

14 Aug 2021, 08:42

Welcome to Statalist.

It would help to see your code and output. Then we could see exactly what is going on, and also check whether you are making some sort of mistake. Use code tags so your output is legible.

See the Statalist FAQ, especially the section on asking questions effectively.

One thing I would check: are the same cases being analyzed throughout? If missing data causes some models to have more cases than others, that could distort your comparisons.
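
One way to check this before fitting anything is to flag the observations that are complete on every variable in the largest model. The following is only a sketch, run on Stata's bundled auto dataset rather than the data in this thread; the variable list and the marker name complete are illustrative assumptions.

```stata
* Illustrative sketch on the bundled auto data (not the poster's data).
sysuse auto, clear

* Start with a marker equal to 1 for every observation ...
mark complete
* ... then set it to 0 wherever any listed variable is missing.
markout complete price mpg weight rep78

* Number of observations usable by every model built from these variables.
count if complete
```

Any model fitted with `if complete` then uses the same cases, whichever of the listed variables it includes.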

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Huy Nguyen

Join Date: Jun 2021

Posts: 15
#4

15 Aug 2021, 00:36

Thank you very much, Richard Williams. As you advised, here are the code and steps that produced a partial R2 of -1.54 percentage points:

1. Check missing data:
global COVARIATES_KTL sex1 bmi diabetes1 run_orderk1 id_datek1
mdesc ktl_4dp1 $COVARIATES_KTL
/*
    Variable    |     Missing          Total     Percent Missing
----------------+-----------------------------------------------
       ktl_4dp1 |          11            200           5.50
           sex1 |           0            200           0.00
            bmi |           0            200           0.00
      diabetes1 |           0            200           0.00
    run_orderk1 |           0            200           0.00
      id_datek1 |           0            200           0.00
----------------+-----------------------------------------------
*/

2. Estimate full model R2:
regress lnac_ratio_mg_g_crea1 ktl_4dp1 $COVARIATES_KTL // R2 = 0.0578 (5.78%)
/*
      Source |       SS           df       MS      Number of obs   =        80
-------------+----------------------------------   F(6, 73)        =      0.75
       Model |  3.27439032         6   .54573172   Prob > F        =    0.6138
    Residual |   53.335627        73  .730625028   R-squared       =    0.0578
-------------+----------------------------------   Adj R-squared   =   -0.0196
       Total |  56.6100173        79  .716582498   Root MSE        =    .85477

------------------------------------------------------------------------------
lnac_ratio~1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    ktl_4dp1 |  -.0168167   .1679316    -0.10   0.9205    -.3515038    .3178704
        sex1 |  -.3039834   .2073607    -1.47   0.1470    -.7172526    .1092858
         bmi |  -.0248137   .0239596    -1.04   0.3038    -.0725651    .0229376
   diabetes1 |  -.1333981   .2831267    -0.47   0.6389    -.6976689    .4308727
 run_orderk1 |  -.1177673   .1936225    -0.61   0.5449    -.5036563    .2681218
   id_datek1 |   .3019704   .4012062     0.75   0.4541    -.4976324    1.101573
       _cons |   2.200982   .9765674     2.25   0.0272     .2546862    4.147278
------------------------------------------------------------------------------
*/

3. Estimate partial R2 of kidney telomere length (ktl_4dp1) from the leave-one-out model:
regress lnac_ratio_mg_g_crea1 $COVARIATES_KTL // leave ktl_4dp1 out => R2 = 0.0732 (7.32%)

/*
      Source |       SS           df       MS      Number of obs   =        86
-------------+----------------------------------   F(5, 80)        =      1.26
       Model |  4.43416534         5  .886833068   Prob > F        =    0.2876
    Residual |   56.123305        80  .701541312   R-squared       =    0.0732
-------------+----------------------------------   Adj R-squared   =    0.0153
       Total |  60.5574703        85  .712440827   Root MSE        =    .83758

------------------------------------------------------------------------------
lnac_ratio~1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        sex1 |  -.3038415   .1949196    -1.56   0.1230    -.6917438    .0840608
         bmi |  -.0266283   .0210182    -1.27   0.2089    -.0684559    .0151992
   diabetes1 |  -.1148249   .2683111    -0.43   0.6698    -.6487811    .4191313
 run_orderk1 |  -.1406207   .1854659    -0.76   0.4506    -.5097096    .2284681
   id_datek1 |   .4038203    .375833     1.07   0.2858    -.3441111    1.151752
       _cons |   1.976752   .7064951     2.80   0.0064     .5707823    3.382722
------------------------------------------------------------------------------
*/
=> Partial R2 of ktl = 5.78% - 7.32% = -1.54 percentage points

Your continued advice is greatly appreciated.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

15 Aug 2021, 05:48

Your output confirms that the problem arises because your regressions are being run on different samples, as Richard Williams suggested in post #3.

Your first output tells us that you have 200 observations, and that ktl_4dp1 has a missing value in 11 observations.

Your second output tells us that your regression that includes ktl_4dp1 has 80 observations after observations with missing values in lnac_ratio_mg_g_crea1, ktl_4dp1, or the variables in $COVARIATES_KTL are excluded. (So it appears that lnac_ratio_mg_g_crea1 must have numerous missing values as well.)

Your third output tells us that your regression that excludes ktl_4dp1 has 86 observations. Thus it was run on 6 additional observations. Its R² cannot be compared to the value from the regression with fewer observations.
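
Why a common sample matters can be sketched from the definition of R². This is the standard OLS result rather than anything specific to this thread; the labels "full" and "reduced" are introduced here only for illustration, and the sums of squares are copied from the two outputs posted above.

```latex
\[
  R^2 = 1 - \frac{SS_{\text{Residual}}}{SS_{\text{Total}}}
\]

% Same observations in both models: SS_Total is identical, and the reduced
% model is the full model with the coefficient on ktl_4dp1 fixed at zero, so
\[
  SS_{\text{Residual}}^{\text{full}} \le SS_{\text{Residual}}^{\text{reduced}}
  \quad\Longrightarrow\quad
  R^2_{\text{full}} \ge R^2_{\text{reduced}}
\]

% Posted output: the samples differ, so SS_Total differs as well.
\[
  n = 80:\; R^2 = 1 - \frac{53.335627}{56.6100173} \approx 0.0578
  \qquad
  n = 86:\; R^2 = 1 - \frac{56.123305}{60.5574703} \approx 0.0732
\]
```

With different totals in the denominators, the inequality no longer has to hold, which is how the difference came out negative.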

You need to run both regressions on the same 80 observations. Here is one way to do that, taking advantage of the fact that e(sample) tells us which observations were used in the most recent estimation.

Code:

regress lnac_ratio_mg_g_crea1 ktl_4dp1 $COVARIATES_KTL
generate byte to_use = e(sample)
regress lnac_ratio_mg_g_crea1 $COVARIATES_KTL if to_use==1
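
To get the R² difference itself, the stored result e(r2) can be saved after each fit. The following is a minimal sketch on Stata's bundled auto dataset, not the data in this thread; the variables (price, mpg, weight, rep78) and the scalar names are illustrative assumptions. In that dataset rep78 has missing values, so it reproduces the same situation on a small scale.

```stata
* Illustrative sketch on the bundled auto data (not the poster's data).
sysuse auto, clear

* Full model: observations with missing rep78 are dropped automatically.
regress price mpg weight rep78
scalar r2_full = e(r2)
generate byte to_use = e(sample)

* Reduced model, restricted to the observations used by the full model.
regress price mpg weight if to_use == 1
scalar r2_reduced = e(r2)

* Change in R-squared attributable to rep78 on the common sample.
scalar delta_r2 = r2_full - r2_reduced
display "Delta R2 = " delta_r2
```

Dropping the `if to_use == 1` qualifier lets the reduced model pick up the extra observations again, and the difference is then no longer guaranteed to be non-negative.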

Last edited by William Lisowski; 15 Aug 2021, 06:45.
Richard Williams

Join Date: Apr 2014

Posts: 4987
#6

15 Aug 2021, 06:17

Thanks for providing the output. Again, though, I'll stress that you should use code tags, which William uses and are explained in the Statalist FAQ. As is, your output is very hard to read because any spaces after the first get deleted, so things don't line up correctly. William and others may be willing to plod through the output anyway but I usually am not!

Huy Nguyen

Join Date: Jun 2021

Posts: 15
#7

15 Aug 2021, 23:59

Thank you, William Lisowski and Richard Williams. I appreciate your valuable advice and guidance. As I am a new member of Statalist, please forgive my poorly organized code. I will read the Statalist FAQ to learn how to use code tags. Regarding missing data, I thought 11 missing values would not affect the result much; yet, as you both explained, the outcome also has many missing values, which affect the estimation as well, and I had not checked this before.