  • Should I use the Jackknife option with linear regression?

    Dear Stata users,

    I am using linear regression in Stata 13.1 to see whether two continuous, non-normally distributed variables are significantly correlated. When I plot the variables in a two-way scatter graph, there are some significant outlying values. I read that jackknife regression is more robust than linear regression with non-normal data.

    My output with this code for linear regression

    Code:
     reg rpfdefpop stordur
    is

    Code:
     
          Source |       SS       df       MS              Number of obs =      18
    -------------+------------------------------           F(  1,    16) =    1.17
           Model |  2315.69913     1  2315.69913           Prob > F      =  0.2948
        Residual |  31578.8436    16  1973.67773           R-squared     =  0.0683
    -------------+------------------------------           Adj R-squared =  0.0101
           Total |  33894.5428    17  1993.79663           Root MSE      =  44.426
    
    ------------------------------------------------------------------------------
       rpfdefpop |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         stordur |   .6813429   .6290176     1.08   0.295    -.6521148    2.014801
           _cons |   48.69781   13.05285     3.73   0.002       21.027    76.36862
    ------------------------------------------------------------------------------
    but when I use the jackknife regression option

    Code:
      reg rpfdefpop stordur, vce(jackknife)
    the output is
    Code:
      
    Linear regression                               Number of obs      =        18
                                                    Replications       =        18
                                                    F(   1,     17)    =      5.78
                                                    Prob > F           =    0.0279
                                                    R-squared          =    0.0683
                                                    Adj R-squared      =    0.0101
                                                    Root MSE           =   44.4261
    
    ------------------------------------------------------------------------------
                 |              Jackknife
       rpfdefpop |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         stordur |   .6813429   .2833695     2.40   0.028     .0834856      1.2792
           _cons |   48.69781   12.82814     3.80   0.001     21.63279    75.76283
    ------------------------------------------------------------------------------
    Using the jackknife option appears to show that the correlation is now statistically significant (p=0.028) whereas it wasn't using the standard linear regression (p=0.295). I can't find an answer on the forums as to which is the correct approach.

    Many thanks for your help

  • #2
    Andrew:
    the (possibly oversold) issue of normality in OLS refers to residuals only.
    You do not report if you have performed OLS postestimation tests, such as -estat hettest-; hence, we cannot say anything about the dispersion of your residuals.
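    For illustration, a minimal sketch of such postestimation checks, using the same model as in #1 (the residual variable name is just an example):

    Code:
     * refit the model from #1
     regress rpfdefpop stordur

     * Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
     estat hettest

     * inspect the residuals directly
     predict double resid, residuals
     qnorm resid          // normal quantile plot of the residuals
     swilk resid          // Shapiro-Wilk normality test on the residuals
     rvfplot, yline(0)    // residuals versus fitted values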
    As a closing remark: significant or not, I would not put much trust in the outcome of an OLS regression performed on such a limited sample size.
    Last edited by Carlo Lazzaro; 25 Feb 2016, 04:20.
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Why not post the data? Or minimally show a scatter plot?

      It may be that the apparent outliers suggest e.g. a suitable transformation which could be much more sensible for your data.
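      For instance, something along these lines would show the pattern (a sketch only; the log transformation is just one possible re-expression):

      Code:
       * look at the raw relationship
       twoway scatter rpfdefpop stordur

       * one possible re-expression of the skewed predictor
       generate log_stordur = ln(stordur)
       twoway scatter rpfdefpop log_stordur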

      Note that writing of "significant outlying values" may suggest to some that you carried out a significance test, but you do not say what that was.

      If you mean something like notable, prominent, striking, etc. when you say "significant", as I guess, then there are lots of good informal words that will serve your purpose better.

      Note that jackknifing gives you the same regression, but just different standard errors and P-values. If the original regression was a bad idea, jackknifing won't fix it.
      Last edited by Nick Cox; 25 Feb 2016, 04:19.

      Comment


      • #4
        Thank you very much for your help. The data are from a pilot study, hence the numbers are very small. I wasn't going to attribute too much significance to any result I obtained. Sorry for the imprecise language regarding outliers; I haven't done a significance test. I meant striking/notable rather than significant for the values for patients 16 & 18.



        [Attached scatter plot of rpfdefpop against stordur: rpf dep pop.png]
        Patient   Rpfdefpos (%)   Stordur (days)
        1              98.9              5
        2               0                7
        3              68.3              6
        4              87.4              4
        5              99.4              2
        6              33.5              8
        7               0                6
        8               0                6
        9              81.1              1
        10              0                5
        11             97.4              9
        12             99.9              7
        13              0               13
        14             98.4              7
        15             68.4             15
        16             98.4             42
        17              0                9
        18             97.4             71

        Comment


        • #5
          Thanks for posting the data, but it now seems to me that your data don't suit linear regression at all, and not just because of possible outliers. Evidently your response is bounded and there also seem to be some groups, e.g. some patients with zero whatever it is and some with almost 100%.

          http://www.stata-journal.com/sjpdf.h...iclenum=st0147

          gives some hints, but some trials with your data and a logit model for continuous proportions were, as the scatter plot suggests, not especially promising either.
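          For reference, one common way to fit a logit model for a continuous proportion in Stata is a fractional (quasi-likelihood) GLM along these lines (a sketch; not necessarily the exact specification tried here):

          Code:
           * rescale the bounded response from percent to a 0-1 proportion
           generate prop = rpfdefpop/100

           * fractional logit: binomial family, logit link, robust standard errors
           glm prop stordur, family(binomial) link(logit) vce(robust)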

          Comment


          • #6
            Setting aside the fundamental problems that Nick points out, here is the answer to your original question: the first regress model that you fit assumes a constant error standard deviation (homoskedasticity) and uses that assumption to estimate standard errors. The jackknife does not make this assumption; it estimates standard errors non-parametrically. Therefore, if, as here, the SD is not constant, one would expect a difference.
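            A quick way to see this is to fit the same model with the three variance estimators side by side (a sketch): the coefficients are identical; only the standard errors change.

            Code:
             quietly regress rpfdefpop stordur
             estimates store ols

             quietly regress rpfdefpop stordur, vce(robust)
             estimates store robust

             quietly regress rpfdefpop stordur, vce(jackknife)
             estimates store jack

             * same coefficients, different standard errors
             estimates table ols robust jack, b(%9.4f) se(%9.4f)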
            Steve Samuels
            Statistical Consulting
            [email protected]

            Stata 14.2

            Comment


            • #7
              Thank you very much for the advice.

              Comment
