  • Negative partial R2: How can I address this?

    Dear All,
    I am assessing the contribution of 6 independent variables (IVs). For each IV, I estimated the partial R2 as the difference between the full-model R2 and the R2 of the model that leaves that IV out. However, some of these partial R2 values were negative. What should I do to address this, or is there another way to do it?

  • #2
    Dear All,
    I am estimating partial R2, defined as the difference between the R2 of the full linear regression model and the R2 of the model that leaves one independent variable (IV) out, in order to assess how much each IV contributes. However, for some IVs the partial R2 is negative (the full-model R2 is smaller than the leave-one-out R2). Can partial R2 be negative? What should I do to address this, or is there another way? Can I use standardized regression models instead of estimating partial R2? Any responses are much appreciated.



    • #3
      Welcome to Statalist.

      It would help to see your code and output. Then we could see exactly what is going on and check whether you are making some sort of mistake. Use code tags so your output is legible.

      See the Statalist FAQ, especially the section on asking questions effectively.

      One thing I would check: are the same cases being analyzed throughout? If missing data causes some models to have more cases than others, that could distort your comparisons.
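
      A minimal sketch of that check, using Stata's bundled auto dataset as a stand-in (price, mpg, weight, and rep78 are illustrative and are not the variables in this thread). It compares e(N), the number of observations each regression actually used:

      ```stata
      sysuse auto, clear
      * Full model: rep78 has missing values, so listwise deletion drops those cars
      quietly regress price mpg weight rep78
      scalar n_full = e(N)
      * Reduced model: rep78 is left out, so its missing values no longer matter
      quietly regress price mpg weight
      scalar n_reduced = e(N)
      display "Full model N = " n_full ", reduced model N = " n_reduced
      ```

      If the two counts differ, the R2 values come from different samples and should not be subtracted from one another.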
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 19.5 MP (2 processor)

      EMAIL: [email protected]
      WWW: https://www3.nd.edu/~rwilliam



      • #4
        Thank you very much, Richard Williams. As you advised, I am pasting the code and steps that give a partial R2 of -1.54 (in percentage points):

        1. Check missing data:
        global COVARIATES_KTL sex1 bmi diabetes1 run_orderk1 id_datek1
        mdesc ktl_4dp1 $COVARIATES_KTL

        /*
            Variable    |     Missing          Total     Percent Missing
        ----------------+-----------------------------------------------
               ktl_4dp1 |          11            200           5.50
                   sex1 |           0            200           0.00
                    bmi |           0            200           0.00
              diabetes1 |           0            200           0.00
            run_orderk1 |           0            200           0.00
              id_datek1 |           0            200           0.00
        ----------------+-----------------------------------------------
        */

        2. Estimate full model R2:
        regress lnac_ratio_mg_g_crea1 ktl_4dp1 $COVARIATES_KTL // R2 = 0.0578 (5.78%)
        /*
              Source |       SS           df       MS      Number of obs   =        80
        -------------+----------------------------------   F(6, 73)        =      0.75
               Model |  3.27439032         6   .54573172   Prob > F        =    0.6138
            Residual |   53.335627        73  .730625028   R-squared       =    0.0578
        -------------+----------------------------------   Adj R-squared   =   -0.0196
               Total |  56.6100173        79  .716582498   Root MSE        =    .85477

        ------------------------------------------------------------------------------
        lnac_ratio~1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
            ktl_4dp1 |  -.0168167   .1679316    -0.10   0.9205   -.3515038    .3178704
                sex1 |  -.3039834   .2073607    -1.47   0.1470   -.7172526    .1092858
                 bmi |  -.0248137   .0239596    -1.04   0.3038   -.0725651    .0229376
           diabetes1 |  -.1333981   .2831267    -0.47   0.6389   -.6976689    .4308727
         run_orderk1 |  -.1177673   .1936225    -0.61   0.5449   -.5036563    .2681218
           id_datek1 |   .3019704   .4012062     0.75   0.4541   -.4976324    1.101573
               _cons |   2.200982   .9765674     2.25   0.0272    .2546862    4.147278
        ------------------------------------------------------------------------------
        */

        3. Estimate partial R2 of kidney telomere length (ktl_4dp1) from the leave-one-out model:
        regress lnac_ratio_mg_g_crea1 $COVARIATES_KTL // leaves ktl_4dp1 out => R2 = 0.0732 (7.32%)

        /*
              Source |       SS           df       MS      Number of obs   =        86
        -------------+----------------------------------   F(5, 80)        =      1.26
               Model |  4.43416534         5  .886833068   Prob > F        =    0.2876
            Residual |   56.123305        80  .701541312   R-squared       =    0.0732
        -------------+----------------------------------   Adj R-squared   =    0.0153
               Total |  60.5574703        85  .712440827   Root MSE        =    .83758

        ------------------------------------------------------------------------------
        lnac_ratio~1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                sex1 |  -.3038415   .1949196    -1.56   0.1230   -.6917438    .0840608
                 bmi |  -.0266283   .0210182    -1.27   0.2089   -.0684559    .0151992
           diabetes1 |  -.1148249   .2683111    -0.43   0.6698   -.6487811    .4191313
         run_orderk1 |  -.1406207   .1854659    -0.76   0.4506   -.5097096    .2284681
           id_datek1 |   .4038203    .375833     1.07   0.2858   -.3441111    1.151752
               _cons |   1.976752   .7064951     2.80   0.0064    .5707823    3.382722
        ------------------------------------------------------------------------------
        */
        => Partial R2 of ktl_4dp1 = 5.78% - 7.32% = -1.54 percentage points

        Your continued advice is greatly appreciated.



        • #5
          Your output confirms that the problem arises because your regressions are being run on different samples, as Richard Williams suggested in post #3.

          Your first output tells us that you have 200 observations, and that ktl_4dp1 has a missing value in 11 observations.

          Your second output tells us that your regression that includes ktl_4dp1 has 80 observations after observations with missing values in lnac_ratio_mg_g_crea1, ktl_4dp1, or the variables in $COVARIATES_KTL are excluded. (So it appears that lnac_ratio_mg_g_crea1 must have numerous missing values as well.)

          Your third output tells us that your regression that excludes ktl_4dp1 has 86 observations. Thus it was run on 6 additional observations. Its R2 cannot be compared to the value from the regression with fewer observations.

          You need to run both regressions on the same 80 observations. Here is one way to do that, taking advantage of the fact that e(sample) tells us which observations were used in the most recent estimation.
          Code:
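          * Fit the full model, then flag the observations it actually used
          regress lnac_ratio_mg_g_crea1 ktl_4dp1 $COVARIATES_KTL
          generate byte to_use = e(sample)
          * Refit without ktl_4dp1 on exactly those observations
          regress lnac_ratio_mg_g_crea1          $COVARIATES_KTL if to_use==1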
          Last edited by William Lisowski; 15 Aug 2021, 06:45.
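
          The same approach can be extended to compute the change in R2 directly. The following is only a sketch on Stata's bundled auto dataset, so price, mpg, weight, and rep78 are stand-ins rather than the variables in this thread; it stores e(r2) from each fit and subtracts:

          ```stata
          sysuse auto, clear
          * Full model; e(sample) flags the observations it actually used
          quietly regress price mpg weight rep78
          scalar r2_full = e(r2)
          generate byte to_use = e(sample)
          * Leave rep78 out, but keep the estimation sample fixed
          quietly regress price mpg weight if to_use == 1
          scalar r2_reduced = e(r2)
          display "Change in R2 from adding rep78 = " %6.4f r2_full - r2_reduced
          ```

          With both models fit by OLS on identical observations, the reduced model is nested in the full one, so this difference cannot be negative. Stata's built-in pcorr command also reports squared semipartial correlations using casewise deletion across the listed variables, which may serve as a cross-check, although it is not what the posts above used.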



          • #6
            Thanks for providing the output. Again, though, I'll stress that you should use code tags, which William uses and which are explained in the Statalist FAQ. As is, your output is very hard to read because any spaces after the first get deleted, so things don't line up correctly. William and others may be willing to plod through the output anyway, but I usually am not!
            -------------------------------------------
            Richard Williams, Notre Dame Dept of Sociology
            StataNow Version: 19.5 MP (2 processor)

            EMAIL: [email protected]
            WWW: https://www3.nd.edu/~rwilliam



            • #7
              Thanks, William Lisowski and Richard Williams. I appreciate your valuable advice and guidance. As I am a new member of Statalist, please forgive my poorly organized code. I will read the Statalist FAQ to learn how to use code tags. As for the missing data, I thought 11 missing values would not affect the results much; yet, as your explanations show, there are also many missing values in the outcome, which affect the estimation too and which I had not checked before.
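
              One possible way to run that check with the outcome included, sketched on Stata's bundled auto dataset (price stands in for the outcome and mpg, weight, and rep78 for the predictors; none of these are the variables in this thread):

              ```stata
              sysuse auto, clear
              * Built-in missing-value report for every model variable, outcome included
              misstable summarize price mpg weight rep78
              * Count the complete cases, i.e. the observations regress would keep
              count if !missing(price, mpg, weight, rep78)
              ```

              The count should match the "Number of obs" that regress reports for the full model.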
