Regression

Alex Hartley

Join Date: Nov 2022

Posts: 8
#1

16 Mar 2023, 03:01

Hi guys,

I am not too experienced with statistics, but am conducting some quantitative analysis for my undergrad Psychology dissertation and would like a bit of help please.

I ran a linear regression to see whether social support level (a binary variable: either low or high) could predict Total Difficulties Score (a continuous variable). However, when I tested the model's assumptions, the assumption of homoscedasticity was violated. I did some research and found that one way of overcoming this problem is to log transform the dependent variable. So I log transformed Total Difficulties Score, creating a new variable called log_totaldifficultiesscore. I then reran the regression and the assumption was no longer violated. BUT I am now unsure HOW to interpret the coefficients, as the regression is no longer about RAW scores but about LOG-TRANSFORMED scores. So what would a coefficient of -.370 actually mean? Or, how can I 'un-log transform' the coefficients?
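A rough sketch of the usual back-transformation, under stated assumptions: the natural log was used, no constant was added before logging, and low support is the base category. Under those assumptions exp(b) is the ratio of the geometric-mean scores (high versus low support, other predictors held fixed), and 100*(exp(b) - 1) is the percent difference. The helper name log_coef_to_ratio is hypothetical.

```python
import math


def log_coef_to_ratio(coef):
    """Back-transform a coefficient from a regression of ln(y) on a 0/1 predictor.

    Assumes natural log and no constant added before logging.
    Returns the multiplicative ratio exp(coef) and the implied percent change.
    """
    ratio = math.exp(coef)
    return ratio, 100 * (ratio - 1)


ratio, pct = log_coef_to_ratio(-0.370)
print(f"ratio = {ratio:.3f}, change = {pct:.1f}%")
```

With b = -0.370 the ratio is about 0.69, i.e. scores roughly 31% lower in the high-support group on the geometric-mean scale. If a constant was added before logging (for example because some scores are zero), this reading is only approximate.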

I hope my explanation makes sense and that someone can help; I have been struggling with this for a few days now!

Thank you
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#2

16 Mar 2023, 03:37

Alex:
1) a simple OLS regression with a single predictor is hardly enough to give a fair and true view of the data-generating process you're investigating. In all likelihood your predictor is statistically significant (that is simply a matter of fact), but its coefficient also picks up the effect of other independent variables that, as per your description, you omitted from the right-hand side of your regression equation;
2) if heteroskedasticity is the only problem you detected (but I guess the main issue with your model is its misspecification; see -linktest-), you can simply invoke -robust- standard errors without logging the regressand (unless your implicit goal is a log-linear regression);
3) as per the FAQ, please post what you typed and what Stata gave you back, and share an excerpt/example of your dataset via -dataex-. Thanks.

Last edited by Carlo Lazzaro; 16 Mar 2023, 04:13.

Kind regards,
Carlo
(Stata 19.0)
Alex Hartley

Join Date: Nov 2022

Posts: 8
#3

16 Mar 2023, 03:58

Hello,
Thank you for your quick reply - that is very helpful!
Unfortunately I am very limited in what information I can share from my STATA, as all analysis takes place within a secure hub because it is part of a national UK dataset. It is therefore not possible to copy and paste or take screenshots from STATA, which makes things challenging!

I have tried using the robust command:

Code:

regress sebdtot i.suppsc2 i.ChldSx i.ltillness i.Bullyexp i.tenure, vce(robust)

When I run this regression and then run estat imtest, the test for heteroskedasticity gives p = 0.0009.

So, adding the robust command has not overcome the issue.
Have I done this correctly?
Any more guidance would be incredibly helpful!
Thanks so much
Alex
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#4

16 Mar 2023, 05:34

Alex:
1) whenever you deal with a confidential dataset, you can still tackle the issue by renaming the variables (Spiderman, Dare Devil, Alfa, Beta and so on) and then following suggestion 3) in my previous (typo-ridden) post;
2) this is a common stumbling block that I tripped over many times years ago (and Statalist put me on the right track then): the -robust- option affects the standard errors, not the residuals; that's why -estat imtest- (or -estat hettest-) will keep complaining about heteroskedasticity afterwards. The -robust- option did fix the issue; it is the test that should not be repeated;
3) unsolicited advice: check the correctness of the functional form of the regressand via -linktest- (a misspecified functional form bites way harder than heteroskedasticity).
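The idea behind -linktest- can be sketched outside Stata as follows. This is a rough re-implementation of the principle, not Stata's own code, and all function names are hypothetical: refit the outcome on the fitted values and their square, then look at the t statistic of the squared term. A clearly non-zero squared term suggests the functional form is wrong. The toy data are made up: one outcome is truly linear in x, the other is curved.

```python
from math import sqrt


def ols(X, y):
    """Least squares of y on the columns of X (each row starts with 1.0).

    Solves the normal equations (X'X) b = X'y by Gaussian elimination with
    partial pivoting; returns (coefficients, residuals).
    """
    p = len(X[0])
    a = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda row: abs(a[row][col]))
        a[col], a[piv] = a[piv], a[col]
        for row in range(col + 1, p):
            f = a[row][col] / a[col][col]
            for c in range(col, p + 1):
                a[row][c] -= f * a[col][c]
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (a[i][p] - sum(a[i][j] * b[j] for j in range(i + 1, p))) / a[i][i]
    resid = [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(X, y)]
    return b, resid


def link_test(x, y):
    """Hypothetical link-test sketch for y = a + b*x: returns (coef, t) of hat^2."""
    b, _ = ols([[1.0, xi] for xi in x], y)
    hat = [b[0] + b[1] * xi for xi in x]
    coef, resid = ols([[1.0, h, h * h] for h in hat], y)
    s2 = sum(e * e for e in resid) / (len(y) - 3)
    # Frisch-Waugh-Lovell: Var(coef on hat^2) = s2 / RSS(hat^2 regressed on 1, hat)
    _, r_sq = ols([[1.0, h] for h in hat], [h * h for h in hat])
    se = sqrt(s2 / sum(e * e for e in r_sq))
    return coef[2], coef[2] / se


# Made-up data: same x, small alternating "noise", one linear and one curved outcome
xs = list(range(1, 11))
noise = [0.5 * (-1) ** xi for xi in xs]
y_linear = [3 + 2 * xi + e for xi, e in zip(xs, noise)]
y_curved = [xi ** 2 + e for xi, e in zip(xs, noise)]
_, t_linear = link_test(xs, y_linear)
_, t_curved = link_test(xs, y_curved)
print(f"t(hat^2): linear outcome {t_linear:.2f}, curved outcome {t_curved:.2f}")
```

On these toy data the squared term's t statistic is essentially zero for the linear outcome and very large for the curved one. In Stata itself you would simply type -linktest- after -regress-.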

Kind regards,
Carlo
(Stata 19.0)
Alex Hartley

Join Date: Nov 2022

Posts: 8
#5

16 Mar 2023, 05:49

Again, thank you so much for your help!
Just to clarify and make sure I have got my head around it.
The robust command has resulted in a change to the standard errors and NOT my residuals. Because the residuals have not changed, the data is still heteroskedastic. However, the change in standard errors as a result of the robust command means the linear regression can be appropriately run, despite the heteroscedasticity.
Please let me know if that is correct. I am very much a beginner with stats and also very new to STATA, so I am struggling to get my head around things.
Thanks again
Alex
Nick Cox

Join Date: Mar 2014

Posts: 36058
#6

16 Mar 2023, 07:25

The vce(robust) option (not a command) does not change the coefficient estimates, only the reported standard errors. So fitted values and residuals are unchanged, as you say. That is the intention.
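To make the point concrete, here is a minimal sketch with made-up numbers and hypothetical names (a single 0/1 predictor, plain Python rather than Stata). The coefficients and residuals are computed once; only the variance formula differs between the classical and the heteroskedasticity-robust standard error. The robust version is the White (HC0) sandwich rescaled by n/(n - k), which is assumed here to match the small-sample scaling Stata applies for vce(robust) after -regress-.

```python
from math import sqrt
from statistics import mean


def fit_with_ses(x, y):
    """Simple OLS of y on x with classical and robust (HC1) slope standard errors.

    Coefficients and residuals are computed once; only the variance formula differs.
    """
    n, k = len(x), 2  # k = intercept + slope
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Classical SE: assumes one common residual variance
    s2 = sum(e ** 2 for e in resid) / (n - k)
    se_classical = sqrt(s2 / sxx)

    # White/HC0 sandwich for the slope, rescaled by n/(n - k) (HC1)
    hc0 = sum((xi - xbar) ** 2 * e ** 2 for xi, e in zip(x, resid)) / sxx ** 2
    se_robust = sqrt(hc0 * n / (n - k))
    return slope, resid, se_classical, se_robust


# Hypothetical toy data: 0 = low support (6 obs), 1 = high support (4 obs, more spread)
support = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
score = [10, 11, 9, 10, 10, 10, 4, 12, 2, 14]
slope, resid, se_c, se_r = fit_with_ses(support, score)
print(f"slope={slope:.3f}  classical SE={se_c:.3f}  robust SE={se_r:.3f}")
```

The slope is -2 (the difference between the group means) either way; only the standard error changes, here increasing because the smaller high-support group is much more spread out.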

I would judge heteroscedasticity here graphically, not by a test, say by looking at rvfplot after a regression.

Last edited by Nick Cox; 16 Mar 2023, 07:28.
Alex Hartley

Join Date: Nov 2022

Posts: 8
#7

16 Mar 2023, 07:29

Hello Nick, thank you for your reply!
That makes a lot of sense.
So, with reference to my initial post, I will not bother creating a logged variable for Total Difficulties Score. Instead, I shall run the regression on the raw Total Difficulties Score and add the vce(robust) option to account for the heteroscedasticity. Does that sound like a good idea?

Also, if you don't mind me asking just out of curiosity, what is the difference between an option and a command in STATA?
Thanks so much
Nick Cox

Join Date: Mar 2014

Posts: 36058
#8

16 Mar 2023, 07:57

Sorry, no; without seeing the data I can't advise you whether transforming, not transforming, or indeed using a model with a log link is best for your project. Considerations include whether the score is bounded, so that predictions out of range are a risk.
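One consideration behind that choice, sketched with made-up scores and a hypothetical helper: a regression on a logged outcome models the mean of ln(y), so back-transformed predictions are geometric means, whereas a model with a log link (e.g. -glm- with link(log)) models the log of the arithmetic mean. A log link also keeps predicted means positive, though it does not enforce an upper bound, and zero scores cannot be logged at all without adding a constant.

```python
import math
from statistics import mean


def geometric_mean(values):
    """exp(mean(ln y)); requires every value to be strictly positive."""
    return math.exp(mean(math.log(v) for v in values))


# Hypothetical, right-skewed scores (no zeros, so the log is defined)
scores = [2, 4, 8, 16, 30]
arith = mean(scores)
geo = geometric_mean(scores)
print(f"arithmetic mean = {arith:.2f}, geometric mean = {geo:.2f}")
```

For these five values the geometric mean (about 7.9) is well below the arithmetic mean (12), so the two approaches summarize different quantities.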

The curiosity is expected and welcome, but you can find out yourself by reading e.g.

Code:

help language

and please look at https://www.statalist.org/forums/help#spelling