Multiple linear regression

Jana Schue

Join Date: Oct 2021

Posts: 116
#1

Multiple linear regression

14 Jan 2023, 05:00

Hello everyone,
I am currently running an analysis with four IVs (gender diversity, educational background diversity, educational diversity and tenure diversity). I analyze their influence on my DV totalplatact.

I am doing this with the xtreg command. I am wondering why I get significant results when I run a separate regression for each IV, but the results turn insignificant when I include all four IVs in one multiple linear regression. Now my question is: Am I doing something wrong, or is it necessary to run the multiple linear regression? I understand that in the multiple linear regression I am adding more variables, but why do they influence my results that strongly? Furthermore, do I need to run one multiple linear regression, or is it also possible to run four separate regressions, and how does this change the interpretation of my results?

Here you can find my code for the multiple linear regression:

Code:

xtreg ln_totalplatact educational_diversity gender_diversity educational_background_diversity tenure_diversity firm_age total_countries itraffic i.year, fe

Here are the results:

Fixed-effects (within) regression               Number of obs     =      2,516
Group variable: group_id                        Number of groups  =        462

R-squared:                                      Obs per group:
     Within  = 0.6002                                         min =          1
     Between = 0.4460                                         avg =        5.4
     Overall = 0.4402                                         max =          8

                                                F(13,2041)        =     235.74
corr(u_i, Xb) = 0.2609                          Prob > F          =     0.0000

--------------------------------------------------------------------------------------------------
                 ln_totalplatact | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------------------------+----------------------------------------------------------------
           educational_diversity |  -.0262389   .3550358    -0.07   0.941    -.7225092    .6700315
                gender_diversity |  -.0323503   .5151676    -0.06   0.950    -1.042659    .9779588
educational_background_diversity |   .3229427    .360618     0.90   0.371     -.384275    1.030161
                tenure_diversity |   .4573999   .2042954     2.24   0.025     .0567507    .8580491
                        firm_age |   .2588898   .0287609     9.00   0.000     .2024861    .3152935
                 total_countries |   .0647614   .0053486    12.11   0.000     .0542721    .0752506
                        itraffic |   .4617485   .0670973     6.88   0.000     .3301621    .5933349
                                 |
                            year |
                           2014  |   .5222751   .1348361     3.87   0.000     .2578443    .7867059
                           2015  |   .7147278    .128312     5.57   0.000     .4630916    .9663639
                           2016  |   .6753462   .1103312     6.12   0.000     .4589727    .8917197
                           2017  |   1.162105   .1288624     9.02   0.000     .9093891     1.41482
                           2018  |   1.474414   .1457064    10.12   0.000     1.188665    1.760163
                           2019  |   1.145387   .1406569     8.14   0.000     .8695413    1.421233
                           2020  |          0  (omitted)
                                 |
                           _cons |   4.089895   .7902792     5.18   0.000     2.540057    5.639732
---------------------------------+----------------------------------------------------------------
                         sigma_u |  3.1296129
                         sigma_e |  1.3644004
                             rho |  .84029016   (fraction of variance due to u_i)
--------------------------------------------------------------------------------------------------
F test that all u_i=0: F(461, 2041) = 19.63                  Prob > F = 0.0000

Here is the code for one example of the single linear regressions:

Code:

xtreg ln_totalplatact educational_diversity firm_age total_countries itraffic i.year, fe

Here is the result for the single linear regression:

Fixed-effects (within) regression               Number of obs     =      2,516
Group variable: group_id                        Number of groups  =        462

R-squared:                                      Obs per group:
     Within  = 0.5982                                         min =          1
     Between = 0.4404                                         avg =        5.4
     Overall = 0.4361                                         max =          8

                                                F(10,2044)        =     304.35
corr(u_i, Xb) = 0.2584                          Prob > F          =     0.0000

---------------------------------------------------------------------------------------
      ln_totalplatact | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
----------------------+----------------------------------------------------------------
educational_diversity |   .6277709   .2789133     2.25   0.025      .080787    1.174755
             firm_age |   .2566612    .028795     8.91   0.000     .2001906    .3131318
      total_countries |   .0656033   .0053511    12.26   0.000     .0551092    .0760974
             itraffic |   .4649335   .0671624     6.92   0.000     .3332196    .5966474
                      |
                 year |
                2014  |   .5127186   .1350401     3.80   0.000     .2478881    .7775492
                2015  |   .6997248   .1284152     5.45   0.000     .4478865    .9515632
                2016  |   .6639445   .1104493     6.01   0.000     .4473396    .8805494
                2017  |   1.156521   .1290474     8.96   0.000     .9034429    1.409599
                2018  |   1.470326   .1459171    10.08   0.000     1.184165    1.756488
                2019  |   1.133732   .1408184     8.05   0.000     .8575694    1.409895
                2020  |          0  (omitted)
                      |
                _cons |   4.100442    .791515     5.18   0.000     2.548182    5.652702
----------------------+----------------------------------------------------------------
              sigma_u |   3.139236
              sigma_e |  1.3668274
                  rho |  .84063688   (fraction of variance due to u_i)
---------------------------------------------------------------------------------------
F test that all u_i=0: F(461, 2044) = 19.72                  Prob > F = 0.0000

So now I am wondering why, for educational diversity, the p-value changes from 0.025 (significant) in the single linear regression to 0.941 (clearly insignificant) in the multiple linear regression. In addition, the coefficient changes from positive to negative.

Sorry for the bad copy of the output, but I don't know how to copy the regression table into Statalist in a nice way. The relevant p-values are marked to make it at least a bit more readable. Let me know if there is a better way and I will copy it again.

Best regards
Jana
Tags: None
Pengzhan Qian

Join Date: Jan 2023

Posts: 5
#2

14 Jan 2023, 05:14

Hi Jana,

When you add more control variables, part of the effect previously attributed to the existing variables is "absorbed" by the coefficients of the new variables, so there is nothing wrong. This is simply the nature of regression when the regressors are correlated with each other.

Besides, what you called "single linear regressions" are already multiple linear regressions. A true single linear regression would look like this:

reg ln_totalplatact educational_diversity
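
A minimal sketch of how one could check whether the four diversity measures overlap, assuming the variable names from #1. -pwcorr- shows their pairwise correlations, and -testparm- after the full model tests whether the four coefficients are jointly zero. Note that -pwcorr- looks at the overall correlation, not only at the within-firm correlation that matters for -fe-.

Code:

* pairwise correlations among the four diversity measures, with p-values
pwcorr educational_diversity gender_diversity educational_background_diversity tenure_diversity, sig
* full model from #1
xtreg ln_totalplatact educational_diversity gender_diversity educational_background_diversity tenure_diversity firm_age total_countries itraffic i.year, fe
* joint test of the four diversity coefficients
testparm educational_diversity gender_diversity educational_background_diversity tenure_diversity

High pairwise correlations would be consistent with the coefficients shifting once the measures enter the model together.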
Comment
Jana Schue

Join Date: Oct 2021

Posts: 116
#3

14 Jan 2023, 06:14

Hi Pengzhan,

thanks for your explanation!

But now the question for me is: how do I interpret my results? With only one IV my results are significant, and with multiple IVs the results are insignificant.

Best regards
Jana
Comment
Pengzhan Qian

Join Date: Jan 2023

Posts: 5
#4

14 Jan 2023, 09:10

Hi Jana,

It means the coefficient of "educational_diversity" is not robust to the model specification (what control variables you put in the model).

In this case, you cannot give it a meaningful interpretation, because the coefficient is too sensitive to the choice of control variables.
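
A minimal sketch of how the two specifications from #1 could be put side by side to see how much the coefficient moves. The names one_iv and all_ivs are hypothetical labels for the stored results.

Code:

* specification with educational_diversity as the only diversity measure
quietly xtreg ln_totalplatact educational_diversity firm_age total_countries itraffic i.year, fe
estimates store one_iv
* specification with all four diversity measures
quietly xtreg ln_totalplatact educational_diversity gender_diversity educational_background_diversity tenure_diversity firm_age total_countries itraffic i.year, fe
estimates store all_ivs
* coefficients, standard errors and p-values of the diversity measures in both models
estimates table one_iv all_ivs, keep(educational_diversity gender_diversity educational_background_diversity tenure_diversity) b(%9.4f) se p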
Comment
Jana Schue

Join Date: Oct 2021

Posts: 116
#5

14 Jan 2023, 10:05

Hi Pengzhan,

I understand the point. Thanks for explaining.
But in my case the other variables are also IVs.
So the question is what to do now? Dropping them from my model would be pretty problematic for my research.

Best regards
Jana
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17708
#6

15 Jan 2023, 04:04

Jana:
you should stick with your initial -xtreg, fe- code:
1) the within R-sq is pretty good (0.6002);
2) there's evidence of a panel-wise effect (F test that all u_i=0: F(461, 2041) = 19.63, Prob > F = 0.0000);
3) given the number of panels in your dataset, you should go with -vce(cluster panelid)- standard errors;
4) it does not make sense to perform one regression per predictor, as you're not giving a fair and true view of the data generating process this way. In addition, it is not surprising that a single predictor turns out to be statistically significant, but you should not rely on that piece of information, because a simple linear regression works well for teaching purposes but is totally uninformative in real-world research;
5) provided that your regression model is correctly specified, -fe- is the way to go, unless -hausman- (or the community-contributed module -xtoverid-, which is needed if, as you ought to, you invoke cluster-robust standard errors) points you to the -re- specification;
6) please note that the most likely reason for the lack of statistical significance of your coefficients is limited within-panel variation of the time-varying predictors.
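
A minimal sketch of points 3) and 6), assuming the data are already -xtset- and that the panel identifier is group_id, as the "Group variable" line of the output in #1 suggests. -xtsum- splits the variation of each predictor into a between and a within component, so a small within standard deviation would be consistent with point 6).

Code:

* how much of each predictor's variation is within firms?
xtsum educational_diversity gender_diversity educational_background_diversity tenure_diversity
* model from #1 with standard errors clustered on the panel identifier
xtreg ln_totalplatact educational_diversity gender_diversity educational_background_diversity tenure_diversity firm_age total_countries itraffic i.year, fe vce(cluster group_id)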

Kind regards,
Carlo
(Stata 19.0)
Comment
Jana Schue

Join Date: Oct 2021

Posts: 116
#7

16 Jan 2023, 07:51

Hi Carlo,

thanks for your advice and explanation.

Now I use slightly different data and get significant results. Only when I add -vce(cluster panelid)- do my results turn insignificant.
Do I need to include this option, or is it also possible to stay with plain -fe- (the Hausman test recommended using -fe-)? And if I do so, what are the limitations of my analysis?

Here are my results:
Fixed-effects (within) regression               Number of obs     =        685
Group variable: group_id                        Number of groups  =        171

R-squared:                                      Obs per group:
     Within  = 0.6102                                         min =          1
     Between = 0.1397                                         avg =        4.0
     Overall = 0.2264                                         max =          7

                                                F(13,501)         =      60.34
corr(u_i, Xb) = 0.0980                          Prob > F          =     0.0000

---------------------------------------------------------------------------------------
      ln_totalplatact | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
----------------------+----------------------------------------------------------------
educational_diversity |    -.21364   .5482795    -0.39   0.697     -1.29085    .8635704
     gender_diversity |   2.103213   .8283263     2.54   0.011     .4757918    3.730634
 background_diversity |   1.484516   .6620977     2.24   0.025     .1836863    2.785347
     tenure_diversity |   .9061782   .3587043     2.53   0.012     .2014281    1.610928
             firm_age |   .4321347   .0378201    11.43   0.000     .3578292    .5064402
             TMT_size |   .0946607    .082135     1.15   0.250    -.0667107    .2560322
               growth |   .1005996   .0415818     2.42   0.016     .0189034    .1822957
             itraffic |   .2876017   .1296906     2.22   0.027     .0327972    .5424062
                      |
                 year |
                2015  |   .2807606   .1663144     1.69   0.092     -.045999    .6075202
                2016  |   .4505605   .1515766     2.97   0.003     .1527565    .7483646
                2017  |   .8372377   .1908038     4.39   0.000     .4623635    1.212112
                2018  |   1.172811   .2371612     4.95   0.000     .7068583    1.638764
                2019  |    1.25151   .2381836     5.25   0.000     .7835485    1.719472
                2020  |          0  (omitted)
                      |
                _cons |   6.880359   1.598531     4.30   0.000     3.739708    10.02101
----------------------+----------------------------------------------------------------
              sigma_u |   3.123255
              sigma_e |  1.0053125
                  rho |  .90612002   (fraction of variance due to u_i)
---------------------------------------------------------------------------------------
F test that all u_i=0: F(170, 501) = 34.19                   Prob > F = 0.0000

Best regards
Jana
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17708
#8

16 Jan 2023, 08:20

Jana:
sorry to disappoint you, but with 171 panels cluster-robust standard errors are called for.
Just to cheer you up (or hopefully so), the default standard errors give you an illusory idea of statistical significance (which, at the risk of sounding like a bit of a bore, is not the most important issue in inferential statistics).
That said:
1) the most likely reason for the lack of statistical significance of your coefficients is limited within-panel variation of the time-varying predictors;
2) you may want to investigate whether (or not) a non-linear relationship between -firm_age- and the regressand exists;
3) double-check whether -fe- is actually the way to go by running -xtreg, re- and then the community-contributed module -xtoverid- (its null being that -re- is the way to go).
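
A minimal sketch of point 2), assuming the variable names shown in the output in #7 and group_id as the panel identifier. The factor-variable term c.firm_age##c.firm_age adds both the linear and the squared term of -firm_age-.

Code:

* linear and squared firm_age, standard errors clustered on the panel identifier
xtreg ln_totalplatact educational_diversity gender_diversity background_diversity tenure_diversity c.firm_age##c.firm_age TMT_size growth itraffic i.year, fe vce(cluster group_id)
* test of the squared term
testparm c.firm_age#c.firm_age

A squared term that is clearly different from zero would hint at a non-linear relationship.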

Kind regards,
Carlo
(Stata 19.0)
Comment
Jana Schue

Join Date: Oct 2021

Posts: 116
#9

19 Jan 2023, 08:27

Hello Carlo,
thanks for your help and explanations.
I understand now that I have to use the -vce(robust) / vce(cluster panelid)- for my model.

I did the hausman test and random effects seem to be the way to go.
Now I want to try your recommendation with the -xtoverid-, but an error occurs when I try to use it.

Code:

xtreg ln_totalplatact ln_educational_diversity ln_gender_diversity ln_background_diversity ln_tenure_diversity firm_age itraffic growth i.year, re vce(robust)
xtoverid

When I use this code I get the error:
xtoverid
2014b: operator invalid

Can you explain to me where the fault lies?

Best regards,
Jana
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17708
#10

19 Jan 2023, 08:49

Jana:
yes, of course.
The community-contributed module -xtoverid-, glorious as it is, is a bit old-fashioned and does not support -fvvarlist- notation.
The usual fix is to prefix your -xtoverid- code with the -xi:- prefix (pun not intended).

Kind regards,
Carlo
(Stata 19.0)
Comment
Jana Schue

Join Date: Oct 2021

Posts: 116
#11

19 Jan 2023, 09:22

Hey Carlo,
I am not sure if I get you right.

Now I tried the code:

Code:

xtreg ln_totalplatact_w ln_educational_diversity ln_gender_diversity ln_background_diversity ln_tenure_diversity firm_age itraffic ihs_growth i.year, re vce(robust)
xi: xtoverid

But I get the error code:
xi:
not allowed

Could you please clarify once more?

Thanks in advance and best regards,
Jana
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17708
#12

19 Jan 2023, 09:24

Jana:
sorry, my bad: I should have written -xtreg- instead of -xtoverid-.
Please go:

Code:

xi: xtreg ln_totalplatact_w ln_educational_diversity ln_gender_diversity ln_background_diversity ln_tenure_diversity firm_age itraffic ihs_growth i.year, re vce(robust)
xtoverid

Kind regards,
Carlo
(Stata 19.0)
Comment
Jana Schue

Join Date: Oct 2021

Posts: 116
#13

19 Jan 2023, 09:28

Hey Carlo,

thanks for your fast reply.

Code:

xi: xtreg ln_totalplatact_w ln_educational_diversity ln_gender_diversity ln_background_diversity ln_tenure_diversity firm_age itraffic ihs_growth i.year, re vce(robust)
xtoverid

I used your code, but still get an error:
o. operator not allowed

Do you have an idea how to fix this?

Best regards,
Jana
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17708
#14

19 Jan 2023, 09:54

Jana:
yes.
The variable(s) that was (were) omitted by -xtreg- should be removed manually before running -xtoverid-.
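
A minimal sketch of the whole workflow, under a few assumptions: the panel identifier is group_id, the lowest year in the sample is 2014, and 2020 is the year dummy that -xtreg- reported as omitted, so that -xi- names the dummies _Iyear_2015 to _Iyear_2020. Please adjust the list of dummies to whatever your own -xtreg- output shows. Note that -xtoverid- is a separate command run after -xtreg-, not an option of it.

Code:

* one-off installation of the community-contributed command
ssc install xtoverid
* create the year dummies by hand; -xi- leaves out the lowest year as the base
xi i.year
* list only the dummies that were not omitted (assumption: 2020 was omitted)
xtreg ln_totalplatact_w ln_educational_diversity ln_gender_diversity ln_background_diversity ln_tenure_diversity firm_age itraffic ihs_growth _Iyear_2015-_Iyear_2019, re vce(cluster group_id)
* null hypothesis: -re- is the way to go; a small p-value points to -fe-
xtoverid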

Kind regards,
Carlo
(Stata 19.0)
Comment
Jana Schue

Join Date: Oct 2021

Posts: 116
#15

19 Jan 2023, 10:50

Hey Carlo,

thanks for all your help!

Now it worked and the result is:
Sargan-Hansen statistic 32.106 Chi-sq(12) P-value = 0.0013
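Since the null of -xtoverid- is that -re- is the way to go (see #8), a p-value of 0.0013 rejects -re-. Therefore, I will stick to the fe vce(robust) model.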

Best regards
Jana
Comment
