Hello everyone,
I am currently running an analysis with four independent variables (gender diversity, educational background diversity, educational diversity and tenure diversity) to examine their influence on my dependent variable, totalplatact.
I am doing this with the xtreg command. When I run a separate regression for each IV, I get significant results, but when I include all four IVs in one multiple linear regression, the results turn insignificant. My question is: am I doing something wrong, or is it necessary to run a multiple linear regression? I understand that the multiple linear regression includes more variables, but why do they influence my results so strongly? Furthermore, do I need the multiple linear regression, or could I also run four single linear regressions, and how would that change the interpretation of my results?
Here you can find my code for the multiple linear regression:
Code:
xtreg ln_totalplatact educational_diversity gender_diversity educational_background_diversity tenure_diversity firm_age total_countries itraffic i.year, fe
Fixed-effects (within) regression               Number of obs     =      2,516
Group variable: group_id                        Number of groups  =        462
R-squared:                                      Obs per group:
     Within  = 0.6002                                         min =          1
     Between = 0.4460                                         avg =        5.4
     Overall = 0.4402                                         max =          8
                                                F(13,2041)        =     235.74
corr(u_i, Xb) = 0.2609                          Prob > F          =     0.0000
--------------------------------------------------------------------------------------------------
                 ln_totalplatact | Coefficient  Std. err.        t   P>|t|    [95% conf. interval]
---------------------------------+----------------------------------------------------------------
           educational_diversity |   -.0262389   .3550358    -0.07   0.941    -.7225092   .6700315
                gender_diversity |   -.0323503   .5151676    -0.06   0.950    -1.042659   .9779588
educational_background_diversity |    .3229427    .360618     0.90   0.371     -.384275   1.030161
                tenure_diversity |    .4573999   .2042954     2.24   0.025     .0567507   .8580491
                        firm_age |    .2588898   .0287609     9.00   0.000     .2024861   .3152935
                 total_countries |    .0647614   .0053486    12.11   0.000     .0542721   .0752506
                        itraffic |    .4617485   .0670973     6.88   0.000     .3301621   .5933349
                                 |
                            year |
                           2014  |    .5222751   .1348361     3.87   0.000     .2578443   .7867059
                           2015  |    .7147278    .128312     5.57   0.000     .4630916   .9663639
                           2016  |    .6753462   .1103312     6.12   0.000     .4589727   .8917197
                           2017  |    1.162105   .1288624     9.02   0.000     .9093891    1.41482
                           2018  |    1.474414   .1457064    10.12   0.000     1.188665   1.760163
                           2019  |    1.145387   .1406569     8.14   0.000     .8695413   1.421233
                           2020  |           0  (omitted)
                                 |
                           _cons |    4.089895   .7902792     5.18   0.000     2.540057   5.639732
---------------------------------+----------------------------------------------------------------
                         sigma_u |   3.1296129
                         sigma_e |   1.3644004
                             rho |   .84029016   (fraction of variance due to u_i)
--------------------------------------------------------------------------------------------------
F test that all u_i=0: F(461, 2041) = 19.63                    Prob > F = 0.0000
Here is the code for one of the single linear regressions, as an example:
Code:
xtreg ln_totalplatact educational_diversity firm_age total_countries itraffic i.year, fe
Fixed-effects (within) regression               Number of obs     =      2,516
Group variable: group_id                        Number of groups  =        462
R-squared:                                      Obs per group:
     Within  = 0.5982                                         min =          1
     Between = 0.4404                                         avg =        5.4
     Overall = 0.4361                                         max =          8
                                                F(10,2044)        =     304.35
corr(u_i, Xb) = 0.2584                          Prob > F          =     0.0000
---------------------------------------------------------------------------------------
      ln_totalplatact | Coefficient  Std. err.        t   P>|t|    [95% conf. interval]
----------------------+----------------------------------------------------------------
educational_diversity |    .6277709   .2789133     2.25   0.025      .080787   1.174755
             firm_age |    .2566612    .028795     8.91   0.000     .2001906   .3131318
      total_countries |    .0656033   .0053511    12.26   0.000     .0551092   .0760974
             itraffic |    .4649335   .0671624     6.92   0.000     .3332196   .5966474
                      |
                 year |
                2014  |    .5127186   .1350401     3.80   0.000     .2478881   .7775492
                2015  |    .6997248   .1284152     5.45   0.000     .4478865   .9515632
                2016  |    .6639445   .1104493     6.01   0.000     .4473396   .8805494
                2017  |    1.156521   .1290474     8.96   0.000     .9034429   1.409599
                2018  |    1.470326   .1459171    10.08   0.000     1.184165   1.756488
                2019  |    1.133732   .1408184     8.05   0.000     .8575694   1.409895
                2020  |           0  (omitted)
                      |
                _cons |    4.100442    .791515     5.18   0.000     2.548182   5.652702
----------------------+----------------------------------------------------------------
              sigma_u |    3.139236
              sigma_e |   1.3668274
                  rho |   .84063688   (fraction of variance due to u_i)
---------------------------------------------------------------------------------------
F test that all u_i=0: F(461, 2044) = 19.72                    Prob > F = 0.0000
So now I am wondering why, for educational diversity, the p-value switches from 0.025 (significant) in the single linear regression to 0.941 (far from significant) in the multiple linear regression. In addition, the coefficient changes sign, from positive (.6277709) to negative (-.0262389).
Sorry for the poorly formatted output; I could not work out how to paste the regression table into Statalist cleanly. The relevant p-values are marked to make them at least a bit easier to read. Let me know if there is a better way and I will post it again.
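A possible diagnostic, offered as a sketch rather than a definitive answer: if the four diversity measures move together within firms, each one's standard error can grow and its coefficient can shift when they enter the model jointly. The commands below rerun the full model, test the four measures jointly, and correlate their within-firm deviations. This assumes the panel has already been declared (for example with xtset group_id year) and that m1 to m4 and w1 to w4, which are hypothetical helper names introduced only for this illustration, do not already exist in the dataset.

Code:
* Full model, as in the multiple regression above
xtreg ln_totalplatact educational_diversity gender_diversity ///
    educational_background_diversity tenure_diversity ///
    firm_age total_countries itraffic i.year, fe

* Joint Wald test: are the four diversity coefficients all zero?
test educational_diversity gender_diversity ///
    educational_background_diversity tenure_diversity

* Within-firm deviations of each measure on the estimation sample.
* Short helper names are used because Stata variable names are
* limited to 32 characters.
local i = 0
foreach v of varlist educational_diversity gender_diversity ///
    educational_background_diversity tenure_diversity {
    local ++i
    bysort group_id: egen double m`i' = mean(`v') if e(sample)
    generate double w`i' = `v' - m`i'
    label variable w`i' "Within-firm deviation of `v'"
}

* Pairwise correlations of the demeaned measures
pwcorr w1 w2 w3 w4, obs sig

High correlations among w1 to w4 would point to overlap between the measures as one reason the individual coefficients become imprecise, while a small p-value on the joint test would suggest the measures matter as a group even if no single one stands out.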
Best regards
Jana
