Hello everyone,
I find it difficult to check the normality of my variables by plotting histograms. Histograms seem very subjective as a method for checking normality: slightly right- or left-skewed data (the less obvious cases) can still look normally distributed, even when some outliers are visible at the right or left side of the histogram.
Anyway, I was hoping to examine the association between my continuous dependent variable (DV) and the other continuous variables of interest using linear regression analysis. I have just realised, however, that both my DV and the continuous IVs previously included in my model all failed two tests: the Skewness/Kurtosis tests for normality and the Breusch-Pagan / Cook-Weisberg test for heteroskedasticity. (I had failed to spot the skewness by eyeballing the data.) Yet when I used the command pnorm to check, the result seems fine, I would say.
Image of histogram of z_MI (dependent variable)
Image of pnorm of z_MI (dependent variable)
Here are my Stata outputs of Skewness/Kurtosis tests and Breusch-Pagan / Cook-Weisberg tests:
*Results of Breusch-Pagan / Cook-Weisberg test for heteroskedasticity are attached to this post below.
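For reference, a minimal sketch of how these two tests are typically run in Stata. The exact commands behind the attached output are not shown in this post, so this assumes the univariate model of z_MI on ogtt_2hour described further down:

```stata
* Skewness/kurtosis test for normality, applied to each variable
* (assumption: these are the two variables tested)
sktest z_MI ogtt_2hour

* Fit the linear model first; estat hettest is a postestimation command
regress z_MI ogtt_2hour

* Breusch-Pagan / Cook-Weisberg test for heteroskedasticity (the default of estat hettest)
estat hettest
```

Note that sktest takes variables directly, whereas estat hettest only works after a model has been fitted with regress.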
z_MI is my outcome. We decided to convert myo-inositol (MI) into a Z-score for the purpose of interpretation, as we have no information about the unit in which MI was measured in the lab. I tried applying transformations to z_MI, but neither log z_MI nor log10 z_MI passed the tests. So, after spending the whole of yesterday googling, I came across some potential methods that might still enable me to examine the association between my DV and continuous IVs without using linear regression.
As mentioned earlier, neither the DV nor any of the continuous IVs included in my model meets the assumptions (normality and homoskedasticity) of linear regression. Some other variables of interest, which are categorical, were included in the model as well.
Here are the commands that I came across yesterday. I simply ran a univariate regression between these two variables:
(1) reg z_MI ogtt_2hour, robust (used Stata version 14.1)
(2) rreg z_MI ogtt_2hour (used Stata version 14.1)
May I know what the difference is between ", robust" and "rreg"? To me, they are both about robust regression. Thanks.
(3) npregress kernel z_MI ogtt_2hour, vce(bootstrap, reps(100) seed(123)) (Stata version 15)
With regard to (3), I have no idea what standard values I should use for reps() and seed(), as I just followed the syntax from the Stata website blindly.
Other than the three commands I used above, are there any better suggestions for analysing data that happen to be non-normal? Personally, I don't like to transform data, as it makes interpretation harder.
Thank you for putting up with this long-winded post. I truly appreciate your time and effort in contributing to this discussion.
Many thanks,
Emerald
