dataset normality issue (probit analysis)

Rabab Al hasni

Join Date: May 2019

Posts: 70
#1

21 May 2019, 09:33

Hi all,

I intend to use simple probit, multinomial, and Heckman selection models to identify the determinants of whether firms applied or did not apply for bank credit. I have a sample of 300 participants out of a population of about 18,000. I have been asked to check the normality of the data (i.e., whether it follows a normal distribution). I tried the skewness and kurtosis test and the Shapiro-Wilk test, and the Shapiro-Wilk results look much better than those of the former test. However, many variables are still non-normally distributed (e.g., business sector, age, gender, location, and formal education, all with p < 0.05). From your knowledge and experience, how could I solve the non-normality of these variables? If you think normality does not matter for my analysis, how could I justify this, please?

Please, I am seeking your kind advice.

Many thanks in advance for your attention and time.

Rabab
Nick Cox

Join Date: Mar 2014

Posts: 35754
#2

21 May 2019, 09:43

Where do you think normal distributions are assumed (better expression: ideal conditions) for what you want to do? Your response is binary, so you would not expect it to be normal. Non-normality of predictors sometimes goes hand in hand with outliers or nonlinearity, but marginal normality of predictors is -- while nice if you have it -- not essential for what you want to do. (Note that if it were an assumption, then using indicator or dummy predictors would have to be banned!)

I would always take the evidence of normal quantile plots more seriously than Shapiro-Wilk or Jarque-Bera tests (if that is what you are using).
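To make the quantile-plot idea concrete, here is a minimal sketch in Python (standard library only) rather than Stata; in Stata itself, `qnorm varname` draws the plot directly. The data are simulated, and the helper names `normal_quantile_pairs` and `pearson_r`, the variable names, and the plotting positions (i - 0.5)/n are illustrative assumptions, not anything from this thread. It pairs sorted observations with the quantiles of a fitted normal and compares a continuous variable with a 0/1 indicator.

```python
import random
from statistics import NormalDist, mean, stdev


def normal_quantile_pairs(values):
    """Return (theoretical normal quantile, observed value) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i - 0.5) / n: one common convention, assumed here.
    return [(fitted.inv_cdf((i - 0.5) / n), v)
            for i, v in enumerate(ordered, start=1)]


def pearson_r(pairs):
    """Correlation of the plot coordinates; near 1 means a straight plot."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


random.seed(1)
# Simulated (hypothetical) data: one continuous predictor, one 0/1 indicator.
firm_age = [random.gauss(40, 10) for _ in range(300)]
female = [1 if random.random() < 0.3 else 0 for _ in range(300)]

r_continuous = pearson_r(normal_quantile_pairs(firm_age))
r_indicator = pearson_r(normal_quantile_pairs(female))
print(f"continuous: r = {r_continuous:.3f}")
print(f"indicator:  r = {r_indicator:.3f}")
```

A correlation close to 1 corresponds to points lying near a straight line. The indicator produces a two-level step pattern that cannot do this, which is why demanding normal predictors would rule out dummy variables.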
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17724
#3

21 May 2019, 09:45

Rabab:
As an aside to Nick's helpful advice, you may be interested in the following thread: https://www.statalist.org/forums/for...-probit-models

Kind regards,
Carlo
(Stata 19.0)
Rabab Al hasni

Join Date: May 2019

Posts: 70
#4

22 May 2019, 06:35

Many thanks, Nick and Carlo, for your prompt replies; your clarifications will surely help me.

Sorry, I am not an econometrician, but I am trying my best to learn the terms and methods required for my research analysis. I would therefore like to ask you about the RESET test. I searched for it in order to use it as a test of normality, but Google returned results under a different name, the 'Ramsey RESET test', which is concerned with 'omitted structure by including powers of the predicted values'.

My question: do you know how to perform the RESET test with squares and cubes in Stata for the purpose of examining normality? And is it the same as the Ramsey test?

I will follow Nick Cox's advice, but I would also like to see what the RESET test reveals about normality.

Kind regards,
Rabab
Nick Cox

Join Date: Mar 2014

Posts: 35754
#5

22 May 2019, 06:57

I am not an econometrician either and I have never used the RESET test. My idea is that you choose transformations bearing in mind patterns on quantile and scatter plots and substantive knowledge.
FernandoRios

Join Date: Apr 2014

Posts: 2479
#6

22 May 2019, 07:15

Hi Rabab,
A couple of points, in addition to the comments you have already received.
1. Regarding the RESET test: this is not a test of normality, but rather a test of whether your model is well specified.

Say your true model is:

Code:

y=a0+a1*x+a2*x^2+e

where E(e) = E(e|x) = 0 and Var(e) = Var(e|x) = constant, so that e is homoskedastic (though not necessarily normal) and x is exogenous.
Let's further assume that e ~ Normal(0, sigma^2).
But you estimate this:

Code:

y=b0+b1*x+u

The consequence is that the model is misspecified, so b0 and b1 will generally be biased and inconsistent.
Also, your error u is a combination of the true error e and the omitted x^2 term (which also violates the linear model assumptions), and it may no longer be normal.
The Ramsey RESET test aims to detect whether your model is misspecified. As you correctly state, it adds squares and cubes of the predicted values y_hat to the regression and checks whether these terms are significant. If they are, the model is likely misspecified. The Ramsey RESET test does not tell you how to correct the misspecification, though.

2. Regarding the normality of your data: the tests you and Nick mentioned (Shapiro-Wilk or Jarque-Bera) will tell you whether the data come from a normal distribution, but they are only appropriate for continuous data. Discrete data are, by construction, not normally distributed.

3. Normality assumptions in a regression context usually refer to the normality of the error. For OLS, normality of the errors is not essential for estimation or, in large samples, for statistical inference. I'm not aware of a formal test of this assumption in the context of selection models. Perhaps you can search for "testing normality assumptions in heckman selection models" to find the right answer to your question.
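The RESET logic in point 1 can be sketched as follows. This is a minimal illustration in Python (standard library only) on simulated data, not Stata's implementation; in Stata, `estat ovtest` after `regress` runs the Ramsey RESET test. The helper names `ols` and `reset_f`, the coefficient values, and the sample size are assumptions made for the example.

```python
import random


def ols(X, y):
    """Least squares via the normal equations; returns (beta, residual SS)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(X, y))
    return beta, rss


def reset_f(x, y):
    """F statistic for adding y_hat^2 and y_hat^3 to the model y = b0 + b1*x + u."""
    beta, rss_restricted = ols([[1.0, xi] for xi in x], y)
    y_hat = [beta[0] + beta[1] * xi for xi in x]
    augmented = [[1.0, xi, h ** 2, h ** 3] for xi, h in zip(x, y_hat)]
    _, rss_full = ols(augmented, y)
    q, df = 2, len(y) - 4  # two added terms; four parameters in the full model
    return ((rss_restricted - rss_full) / q) / (rss_full / df)


random.seed(42)
x = [random.uniform(-2, 2) for _ in range(300)]
# True model contains x^2 (a0=1, a1=2, a2=1.5 are made-up values).
y_quadratic = [1 + 2 * xi + 1.5 * xi ** 2 + random.gauss(0, 1) for xi in x]
# True model is linear, so y = b0 + b1*x + u is correctly specified.
y_linear = [1 + 2 * xi + random.gauss(0, 1) for xi in x]

f_misspecified = reset_f(x, y_quadratic)
f_correct = reset_f(x, y_linear)
print(f"RESET F, omitted x^2:     {f_misspecified:.2f}")
print(f"RESET F, correctly linear: {f_correct:.2f}")
```

The statistic is compared with an F distribution with 2 and n - 4 degrees of freedom. A large value, as expected for the quadratic data here, points to misspecification but says nothing about normality.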
HTH
Fernando
Rabab Al hasni

Join Date: May 2019

Posts: 70
#7

23 May 2019, 03:09

Thank you very much Nick
Rabab Al hasni

Join Date: May 2019

Posts: 70
#8

23 May 2019, 03:36

Dear Fernando,

Many thanks for your clarifications.

Regarding the RESET test, I think I now understand that, instead of worrying about normality tests, people who run regressions tend to check the specification of the model using RESET. I found files through Google that explain how to apply RESET in Stata, but the examples used OLS. Can I apply RESET after running a probit model, given that all my data are nominal/discrete rather than continuous, including the dependent variables? To my knowledge, OLS regression is used only with a continuous dependent variable, so I am wondering whether I can run it on my dataset to perform the Ramsey RESET test.

I appreciate your great efforts in providing this explanation.

Kind regards,
Rabab
FernandoRios

Join Date: Apr 2014

Posts: 2479
#9

23 May 2019, 04:08

I think you need to look into -linktest-, which, based on my cursory look at the Stata manual, is very similar to the Ramsey test but can be used for other single-equation models.
Look at help linktest and at the manual entry; it is very informative, with good examples.
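To illustrate the idea behind -linktest- after a probit, here is a rough sketch in Python (standard library only) on simulated data. It is not Stata's code: the Fisher-scoring routine `fit_probit`, the helpers `invert` and `linktest_z`, and all data-generating values are assumptions for the example. As in -linktest-, the model is refitted on the linear prediction (_hat) and its square (_hatsq), and a significant squared term signals misspecification.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def invert(m):
    """Invert a small square matrix (Gauss-Jordan, partial pivoting)."""
    k = len(m)
    a = [row[:] + [float(i == j) for j in range(k)] for i, row in enumerate(m)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(k):
            if r != c:
                f = a[r][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    return [row[k:] for row in a]


def fit_probit(X, y, iterations=25):
    """Probit by Fisher scoring; returns (coefficients, standard errors)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for row, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, row))
            p = min(max(STD_NORMAL.cdf(eta), 1e-10), 1 - 1e-10)
            d = STD_NORMAL.pdf(eta)
            g = d * (yi - p) / (p * (1 - p))   # score contribution
            w = d * d / (p * (1 - p))          # expected information weight
            for i in range(k):
                score[i] += g * row[i]
                for j in range(k):
                    info[i][j] += w * row[i] * row[j]
        cov = invert(info)
        beta = [b + sum(cov[i][j] * score[j] for j in range(k))
                for i, b in enumerate(beta)]
    return beta, [cov[i][i] ** 0.5 for i in range(k)]


def linktest_z(x, y):
    """z statistic on the squared linear prediction after probit y on x."""
    beta, _ = fit_probit([[1.0, xi] for xi in x], y)
    hat = [beta[0] + beta[1] * xi for xi in x]
    beta2, se2 = fit_probit([[1.0, h, h * h] for h in hat], y)
    return beta2[2] / se2[2]


random.seed(2019)
x = [random.gauss(0, 1) for _ in range(1000)]
# Latent index with an omitted x^2 term (made-up coefficients).
y_quadratic = [int(-0.5 + xi + 0.8 * xi * xi + random.gauss(0, 1) > 0) for xi in x]
# Latent index that really is linear in x.
y_linear = [int(-0.5 + xi + random.gauss(0, 1) > 0) for xi in x]

z_misspecified = linktest_z(x, y_quadratic)
z_correct = linktest_z(x, y_linear)
print(f"z on _hatsq, omitted x^2:      {z_misspecified:.2f}")
print(f"z on _hatsq, correctly linear: {z_correct:.2f}")
```

In Stata the equivalent is simply to run `linktest` immediately after `probit`; the hand-written fitting routine above exists only to keep the sketch self-contained.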
Best
Rabab Al hasni

Join Date: May 2019

Posts: 70
#10

23 May 2019, 04:25

Thank you very much Fernando

Kind regards,
Rabab