Formal tests of normality and are they important?

Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#16

27 Dec 2021, 09:13

Please show the exact command and output. This is good advice in general when seeking help here, but is particularly important in this case because people use the term "robust regression" to mean different things.
Cassie Wright

Join Date: Dec 2021

Posts: 44
#17

27 Dec 2021, 09:32

Originally posted by Clyde Schechter

Please show the exact command and output. This is good advice in general when seeking help here, but is particularly important in this case because people use the term "robust regression" to mean different things.

Initially I used:

Code:

reg dependentvar i.catvar, vce(robust)

This gave me statistically significant p-values, but confidence intervals that included 0.

But then I used this code:

Code:

reg dependentvar i.catvar if !missing(dependentvar, catvar), vce(robust)

With this, both the p-values and the confidence intervals were statistically significant.

Is there any reason why they give different results?
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#18

27 Dec 2021, 09:36

No, they should not give different results. Please show example data that reproduces this problem, and show the output that Stata is giving you. Without the fine-grained information, it is impossible to troubleshoot this.

If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.
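As a rough sketch of what such a post might contain, assuming the variable names dependentvar and catvar from #17 (placeholders for whatever is in the actual dataset), the following produces a copyable data example and then checks whether the -if- condition changes the estimation sample. -regress- already drops observations with missing values in the model variables, so the two counts should agree:

```stata
* Placeholder variable names taken from #17; substitute your own.
* Post the output of -dataex- between code delimiters.
dataex dependentvar catvar, count(50)

* Estimation sample size without the -if- qualifier
regress dependentvar i.catvar, vce(robust)
display "N used by regress: " e(N)

* Number of observations the -if- qualifier would keep
count if !missing(dependentvar, catvar)
```

If the two numbers agree, as they should, the explanation for differing results likely lies elsewhere, for example the data in memory having changed between the two runs.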

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17730

#19

27 Dec 2021, 10:20

Cassie:
because a robust regression (-rreg-) differs from OLS with robust standard errors, you should share what you typed and what Stata gave you back (as per the FAQ) to increase your chances of getting helpful replies, as Clyde wisely recommended.
In addition, as Clyde pointed out, both of your commands should give the same results:

Code:

. use "C:\Program Files\Stata17\ado\base\a\auto.dta"
. regress price mpg, robust

Linear regression                               Number of obs     =         74
                                                F(1, 72)          =      17.28
                                                Prob > F          =     0.0001
                                                R-squared         =     0.2196
                                                Root MSE          =     2623.7

------------------------------------------------------------------------------
             |               Robust
       price | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         mpg |  -238.8943   57.47701    -4.16   0.000    -353.4727    -124.316
       _cons |   11253.06   1376.393     8.18   0.000     8509.272    13996.85
------------------------------------------------------------------------------

. regress price mpg if !missing(price, mpg), robust

Linear regression                               Number of obs     =         74
                                                F(1, 72)          =      17.28
                                                Prob > F          =     0.0001
                                                R-squared         =     0.2196
                                                Root MSE          =     2623.7

------------------------------------------------------------------------------
             |               Robust
       price | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         mpg |  -238.8943   57.47701    -4.16   0.000    -353.4727    -124.316
       _cons |   11253.06   1376.393     8.18   0.000     8509.272    13996.85
------------------------------------------------------------------------------

.

Last edited by Carlo Lazzaro; 27 Dec 2021, 10:27.

Kind regards,
Carlo
(Stata 19.0)


Helen Juliano

Join Date: Aug 2023

Posts: 1
#20

17 Aug 2023, 06:12

Hello!

I have a problem similar to Cassie's.

I am comparing life satisfaction between people with different working hours. I have a dependent variable for life satisfaction and an independent variable for working hours per week. The working hours per week are categorized into 3 groups: Full-Time, 4-Day-Week, and Part-Time.

As I cannot do a t-test because there are 3 groups (the maximum for ttest is 2 groups), I decided to do an ANOVA. However, my dependent variable is not normally distributed...

After some research I found some alternative tests: the Mann–Whitney U test and the Wilcoxon signed-rank test. But these tests are only applicable when there are at most 2 groups, and my independent variable has 3 groups...
Is there any other alternative that fits the 3 groups? Or is there a way I can transform the dependent variable?

Thank you in advance.

Kind regards
Helen
Nick Cox

Join Date: Mar 2014

Posts: 35778
#21

17 Aug 2023, 06:59

I don't read #20 as a similar problem because I don't sense that with your data there would be any expectation that anything whatsoever would or even should be normally distributed. The nub of the matter is how life satisfaction is measured, as I am guessing that it is some kind of grade or score that is ordinal or just possibly a rough or crude interval scale.

There would be a better answer if you show us the results of

Code:

tab lifesatisfaction workinghours

where naturally you need to type the variable names you're using.
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#22

17 Aug 2023, 09:10

First, to be clear, I agree with what Nick says in #21: nothing you have described suggests any real reason to be concerned about normality.

But I will point out that there is a non-parametric version of ANOVA called Kruskal-Wallis ANOVA, and Stata implements it in the command -kwallis-.
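A minimal sketch for the setup in #20, assuming the outcome is named lifesatisfaction and the three-level grouping variable is named workinghours (the placeholder names used in #21, not necessarily those in the actual data):

```stata
* Kruskal-Wallis equality-of-populations rank test across the 3 groups
kwallis lifesatisfaction, by(workinghours)
```

With three groups, the reported chi-squared statistic has 2 degrees of freedom, and the output shows it both with and without a correction for ties.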
Mike Lacy

Join Date: Apr 2014

Posts: 2421
#23

17 Aug 2023, 09:33

My view here is that one can (almost) always use -permute- or -bootstrap- and avoid the traditional concerns about normality and other ideal conditions. Among other things, such randomization tests allow the user to define their own test statistic, using the original metric, rather than the ranks used by Kruskal-Wallis and the like. I'd say that such "non-parametric" procedures, while clever and meaningful intellectual achievements in their own time, have much less (no?) relevance now when even a cheap personal computer can use Stata (etc.) to do some kind of randomization procedure in minutes or even seconds.
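A minimal sketch of a permutation test along these lines, again assuming the placeholder names lifesatisfaction and workinghours from #21 and that workinghours is stored as a numeric variable. The choice of the overall F statistic from -regress- as the test statistic, the number of replications, and the seed are all illustrative assumptions:

```stata
* Shuffle the group labels 1,000 times and compare the observed F
* with its permutation distribution (seed set for reproducibility)
permute workinghours F = e(F), reps(1000) seed(12345) nodots: regress lifesatisfaction i.workinghours
```

Any other statistic returned by the command could be used in place of e(F), which is the flexibility referred to above.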