  • Multiple linear regression

    Hello everyone,
    I am currently running an analysis with four IVs (gender diversity, educational background diversity, educational diversity and tenure diversity) and analyzing their influence on my DV, totalplatact.

    I am doing this with the xtreg command. I am wondering why I get significant results when I run a separate regression for each IV, but the results turn insignificant when I run a multiple linear regression with all four. So my questions are: Am I doing something wrong, or is the multiple linear regression necessary? I understand that the multiple linear regression adds more variables, but why do they change my results so strongly? Furthermore, do I have to run a multiple linear regression, or could I run four separate regressions instead, and how would that change the interpretation of my results?

    Here you can find my code for the multiple linear regression:
    Code:
    xtreg ln_totalplatact educational_diversity gender_diversity educational_background_diversity tenure_diversity firm_age total_countries itraffic i.year, fe
    Here are the results:

    Fixed-effects (within) regression               Number of obs     =      2,516
    Group variable: group_id                        Number of groups  =        462

    R-squared:                                      Obs per group:
         Within  = 0.6002                                         min =          1
         Between = 0.4460                                         avg =        5.4
         Overall = 0.4402                                         max =          8

                                                    F(13,2041)        =     235.74
    corr(u_i, Xb) = 0.2609                          Prob > F          =     0.0000

    --------------------------------------------------------------------------------------------------
                     ln_totalplatact | Coefficient  Std. err.       t   P>|t|     [95% conf. interval]
    ---------------------------------+----------------------------------------------------------------
               educational_diversity |   -.0262389   .3550358   -0.07   0.941    -.7225092    .6700315
                    gender_diversity |   -.0323503   .5151676   -0.06   0.950    -1.042659    .9779588
    educational_background_diversity |    .3229427    .360618    0.90   0.371     -.384275    1.030161
                    tenure_diversity |    .4573999   .2042954    2.24   0.025     .0567507    .8580491
                            firm_age |    .2588898   .0287609    9.00   0.000     .2024861    .3152935
                     total_countries |    .0647614   .0053486   12.11   0.000     .0542721    .0752506
                            itraffic |    .4617485   .0670973    6.88   0.000     .3301621    .5933349
                                     |
                                year |
                                2014 |    .5222751   .1348361    3.87   0.000     .2578443    .7867059
                                2015 |    .7147278    .128312    5.57   0.000     .4630916    .9663639
                                2016 |    .6753462   .1103312    6.12   0.000     .4589727    .8917197
                                2017 |    1.162105   .1288624    9.02   0.000     .9093891     1.41482
                                2018 |    1.474414   .1457064   10.12   0.000     1.188665    1.760163
                                2019 |    1.145387   .1406569    8.14   0.000     .8695413    1.421233
                                2020 |           0  (omitted)
                                     |
                               _cons |    4.089895   .7902792    5.18   0.000     2.540057    5.639732
    ---------------------------------+----------------------------------------------------------------
                             sigma_u |   3.1296129
                             sigma_e |   1.3644004
                                 rho |   .84029016   (fraction of variance due to u_i)
    --------------------------------------------------------------------------------------------------
    F test that all u_i=0: F(461, 2041) = 19.63                  Prob > F = 0.0000




    Here is the code for one example of the single linear regressions:
    Code:
    xtreg ln_totalplatact educational_diversity firm_age total_countries itraffic i.year, fe
    Here is the result for the single linear regression:

    Fixed-effects (within) regression               Number of obs     =      2,516
    Group variable: group_id                        Number of groups  =        462

    R-squared:                                      Obs per group:
         Within  = 0.5982                                         min =          1
         Between = 0.4404                                         avg =        5.4
         Overall = 0.4361                                         max =          8

                                                    F(10,2044)        =     304.35
    corr(u_i, Xb) = 0.2584                          Prob > F          =     0.0000

    ---------------------------------------------------------------------------------------
              ln_totalplatact | Coefficient  Std. err.       t   P>|t|     [95% conf. interval]
    ----------------------+----------------------------------------------------------------
        educational_diversity |    .6277709   .2789133    2.25   0.025      .080787    1.174755
                     firm_age |    .2566612    .028795    8.91   0.000     .2001906    .3131318
              total_countries |    .0656033   .0053511   12.26   0.000     .0551092    .0760974
                     itraffic |    .4649335   .0671624    6.92   0.000     .3332196    .5966474
                              |
                         year |
                         2014 |    .5127186   .1350401    3.80   0.000     .2478881    .7775492
                         2015 |    .6997248   .1284152    5.45   0.000     .4478865    .9515632
                         2016 |    .6639445   .1104493    6.01   0.000     .4473396    .8805494
                         2017 |    1.156521   .1290474    8.96   0.000     .9034429    1.409599
                         2018 |    1.470326   .1459171   10.08   0.000     1.184165    1.756488
                         2019 |    1.133732   .1408184    8.05   0.000     .8575694    1.409895
                         2020 |           0  (omitted)
                              |
                        _cons |    4.100442    .791515    5.18   0.000     2.548182    5.652702
    ----------------------+----------------------------------------------------------------
                      sigma_u |    3.139236
                      sigma_e |   1.3668274
                          rho |   .84063688   (fraction of variance due to u_i)
    ---------------------------------------------------------------------------------------
    F test that all u_i=0: F(461, 2044) = 19.72                  Prob > F = 0.0000


    So now I am wondering why, for educational diversity, the p-value switches from 0.025 (significant) in the single linear regression to 0.941 (clearly insignificant) in the multiple linear regression. In addition, the coefficient changes from positive to negative.



    Sorry for the poor formatting of the output; I don't know how to copy the regression table into Statalist in a nice way. The relevant p-values are marked to make it at least a bit more readable. Let me know if there is a better way and I will copy it again.


    Best regards
    Jana

  • #2
    Hi Jana,

    When you add more control variables, part of the effect previously attributed to an existing variable is "absorbed" by the coefficients of the new variables, so nothing is wrong. That is simply how regression behaves when the regressors are correlated with each other.

    Besides, what you call "single linear regressions" are already multiple linear regressions. A true single linear regression would look like this:

    reg ln_totalplatact educational_diversity
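
    One way to see how much the four diversity measures overlap is to inspect their correlations directly. The lines below are only a sketch: they reuse the variable names from #1 and assume the data are still -xtset- as they were for -xtreg-. Because -fe- only uses within-firm variation, the auxiliary -fe- regression of one IV on the others is probably the more telling check; a high within R-squared there would suggest the IVs share most of their within-firm variation.

    ```stata
    * Pairwise correlations among the four IVs, with p-values and sample sizes
    pwcorr educational_diversity gender_diversity ///
        educational_background_diversity tenure_diversity, sig obs

    * Auxiliary regression: how well do the other regressors explain
    * educational_diversity within firms? Look at the within R-squared.
    xtreg educational_diversity gender_diversity ///
        educational_background_diversity tenure_diversity ///
        firm_age total_countries itraffic i.year, fe
    ```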



    • #3
      Hi Pengzhan,

      thanks for your explanation!

      But now the question for me is: how do I interpret my results? With only one IV my results are significant, and with multiple IVs the results are insignificant.

      Best regards
      Jana



      • #4
        Hi Jana,

        It means the coefficient of "educational_diversity" is not robust to the model specification (what control variables you put in the model).

        In this case, you cannot give it a meaningful interpretation, because the coefficient is too sensitive to which control variables are included.
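
        To look at that sensitivity side by side, the two specifications from #1 can be stored and tabulated together. This is a sketch for a do-file (because of the /// line continuations); the names one_iv and all_iv are arbitrary labels chosen here, not anything from the original posts.

        ```stata
        * Specification with educational_diversity as the only diversity measure
        quietly xtreg ln_totalplatact educational_diversity ///
            firm_age total_countries itraffic i.year, fe
        estimates store one_iv

        * Specification with all four diversity measures
        quietly xtreg ln_totalplatact educational_diversity gender_diversity ///
            educational_background_diversity tenure_diversity ///
            firm_age total_countries itraffic i.year, fe
        estimates store all_iv

        * Coefficients, standard errors and p-values for the diversity terms
        estimates table one_iv all_iv, ///
            keep(educational_diversity gender_diversity ///
                 educational_background_diversity tenure_diversity) ///
            b(%9.4f) se(%9.4f) p(%9.3f)
        ```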



        • #5
          Hi Pengzhan,

          I understand the point. Thanks for explaining.
          But in my case the other variables are also IVs.
          So the question is what to do now. Dropping them from my model would be quite problematic for my research.

          Best regards
          Jana



          • #6
            Jana:
            you should stick with your initial -xtreg, fe- code:
            1) the within R-squared is pretty good (0.6002);
            2) there's evidence of a panel-wise effect (F test that all u_i=0: F(461, 2041) = 19.63, Prob > F = 0.0000);
            3) given the number of panels in your dataset, you should use -vce(cluster panelid)- standard errors;
            4) it does not make sense to run one regression per predictor, as you're not giving a fair and true view of the data-generating process this way. In addition, it is not surprising that a single predictor turns out to be statistically significant, but you should not rely on that piece of information, because a simple linear regression works well for teaching purposes but is totally uninformative in real-world research;
            5) provided that your regression model is correctly specified, and that -hausman- (or, if you use cluster-robust standard errors, as you ought to, the community-contributed module -xtoverid-) does not point you to the -re- specification, -fe- is the way to go;
            6) please note that the most likely reason for the lack of statistical significance of your coefficients is limited within-panel variation in the time-varying predictors.
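
            Points 3) and 6) can be sketched as follows. This assumes the panel identifier is group_id, as shown in the "Group variable" line of the output in #1 (panelid above is a placeholder), and that the data are already -xtset-. The -testparm- line is an extra suggestion, not part of the list: it checks whether the four diversity measures are jointly significant even if none is individually.

            ```stata
            * Fixed effects with standard errors clustered on the panel identifier
            xtreg ln_totalplatact educational_diversity gender_diversity ///
                educational_background_diversity tenure_diversity ///
                firm_age total_countries itraffic i.year, fe vce(cluster group_id)

            * Joint test of the four diversity coefficients
            testparm educational_diversity gender_diversity ///
                educational_background_diversity tenure_diversity

            * Overall, between and within standard deviations of each IV;
            * a small "within" value means -fe- has little variation to work with
            xtsum educational_diversity gender_diversity ///
                educational_background_diversity tenure_diversity
            ```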
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Hi Carlo,

              thanks for your advice and explanation.

              Now I use slightly different data and get significant results. But as soon as I add -vce(cluster panelid)-, my results turn insignificant.
              Do I need to include this option, or is it also possible to stay with plain fe (the Hausman test recommended using fe)? And if I do so, what are the limitations of my analysis?

              Here are my results:
              Fixed-effects (within) regression               Number of obs     =        685
              Group variable: group_id                        Number of groups  =        171

              R-squared:                                      Obs per group:
                   Within  = 0.6102                                         min =          1
                   Between = 0.1397                                         avg =        4.0
                   Overall = 0.2264                                         max =          7

                                                              F(13,501)         =      60.34
              corr(u_i, Xb) = 0.0980                          Prob > F          =     0.0000

              ---------------------------------------------------------------------------------------
                    ln_totalplatact | Coefficient  Std. err.       t   P>|t|     [95% conf. interval]
              ----------------------+----------------------------------------------------------------
              educational_diversity |     -.21364   .5482795   -0.39   0.697     -1.29085    .8635704
                   gender_diversity |    2.103213   .8283263    2.54   0.011     .4757918    3.730634
               background_diversity |    1.484516   .6620977    2.24   0.025     .1836863    2.785347
                   tenure_diversity |    .9061782   .3587043    2.53   0.012     .2014281    1.610928
                           firm_age |    .4321347   .0378201   11.43   0.000     .3578292    .5064402
                           TMT_size |    .0946607    .082135    1.15   0.250    -.0667107    .2560322
                             growth |    .1005996   .0415818    2.42   0.016     .0189034    .1822957
                           itraffic |    .2876017   .1296906    2.22   0.027     .0327972    .5424062
                                    |
                               year |
                               2015 |    .2807606   .1663144    1.69   0.092     -.045999    .6075202
                               2016 |    .4505605   .1515766    2.97   0.003     .1527565    .7483646
                               2017 |    .8372377   .1908038    4.39   0.000     .4623635    1.212112
                               2018 |    1.172811   .2371612    4.95   0.000     .7068583    1.638764
                               2019 |     1.25151   .2381836    5.25   0.000     .7835485    1.719472
                               2020 |           0  (omitted)
                                    |
                              _cons |    6.880359   1.598531    4.30   0.000     3.739708    10.02101
              ----------------------+----------------------------------------------------------------
                            sigma_u |    3.123255
                            sigma_e |   1.0053125
                                rho |   .90612002   (fraction of variance due to u_i)
              ---------------------------------------------------------------------------------------
              F test that all u_i=0: F(170, 501) = 34.19                   Prob > F = 0.0000




              Best regards
              Jana



              • #8
                Jana:
                sorry to disappoint you, but with 171 panels cluster-robust standard errors are called for.
                Just to cheer you up (or hopefully so), the default standard errors give you an illusory idea of statistical significance (which, at the risk of sounding like a bit of a bore, is not the most important issue in inferential statistics).
                That said:
                1) the most likely reason for the lack of statistical significance of your coefficients is limited within-panel variation in the time-varying predictors;
                2) you may want to investigate whether (or not) a non-linear relationship between -firm_age- and the regressand exists;
                3) double-check whether -fe- is actually the way to go by running -xtreg, re- and then the community-contributed module -xtoverid- (its null being that -re- is the way to go).
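
                For point 2), one common way to probe a non-linear relationship is to add a squared term with factor-variable notation. The sketch below uses the regressors from the table in #7 and assumes group_id is the panel identifier; a quadratic is only one possible functional form, chosen here for illustration.

                ```stata
                * c.firm_age##c.firm_age expands to firm_age and firm_age squared
                xtreg ln_totalplatact educational_diversity gender_diversity ///
                    background_diversity tenure_diversity ///
                    c.firm_age##c.firm_age TMT_size growth itraffic i.year, ///
                    fe vce(cluster group_id)

                * Test whether the squared term adds anything
                testparm c.firm_age#c.firm_age
                ```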
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Hello Carlo,
                  thanks for your help and explanations.
                  I understand now that I have to use the -vce(robust) / vce(cluster panelid)- for my model.

                  I did the hausman test and random effects seem to be the way to go.
                  Now I want to try your recommendation with the -xtoverid-, but an error occurs when I try to use it.


                  Code:
                   xtreg ln_totalplatact ln_educational_diversity ln_gender_diversity ln_background_diversity ln_tenure_diversity firm_age itraffic growth  i.year , re vce(robust)
                   xtoverid
                  When I use this code I get the error:
                  xtoverid
                  2014b: operator invalid

                  Can you explain to me where the fault lies?

                  Best regards,
                  Jana



                  • #10
                    Jana:
                    yes, of course.
                    The community-contributed module -xtoverid-, glorious as it is, is a bit old-fashioned and does not support -fvvarlist- notation.
                    The usual fix is to prefix your -xtoverid- code with the -xi:- prefix (pun not intended).
                    Kind regards,
                    Carlo
                    (Stata 19.0)



                    • #11
                      Hey Carlo,
                      I am not sure I understood you correctly.

                      Now I tried the code:

                      Code:
                              
                      xtreg ln_totalplatact_w ln_educational_diversity ln_gender_diversity ln_background_diversity ln_tenure_diversity firm_age itraffic ihs_growth  i.year , re vce(robust)
                      xi:  
                      xtoverid
                      But I get the error code:
                      xi:
                      not allowed

                      Could you please clarify once more?

                      Thanks in advance and best regards,
                      Jana



                      • #12
                        Jana:
                        sorry, my bad: I should have written -xtreg- instead of -xtoverid-.
                        Please go:
                        Code:
                        xi: xtreg ln_totalplatact_w ln_educational_diversity ln_gender_diversity ln_background_diversity ln_tenure_diversity firm_age itraffic ihs_growth i.year , re vce(robust)
                        xtoverid
                        Kind regards,
                        Carlo
                        (Stata 19.0)



                        • #13
                          Hey Carlo,

                          thanks for your fast reply.

                          Code:
                           xi: xtreg ln_totalplatact_w ln_educational_diversity ln_gender_diversity ln_background_diversity ln_tenure_diversity firm_age itraffic ihs_growth  i.year , re vce(robust)
                           xtoverid
                          I used your code, but still get an error:
                          o. operator not allowed

                          Do you have an idea how to fix this?

                          Best regards,
                          Jana



                          • #14
                            Jana:
                            yes.
                            The variable(s) that was (were) omitted by -xtreg- should be removed manually before running -xtoverid-.
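
                            One possible way to do that is to build the year dummies by hand and leave out the omitted one(s). This is only a sketch: the stub yr and the choice of yr1 as the omitted dummy are assumptions made for illustration, so the `omitted' list should be adjusted to match whatever -xtreg- actually flagged as omitted. It also assumes no other variable in the dataset starts with yr.

                            ```stata
                            * One indicator per year: yr1, yr2, ... (stub name is hypothetical)
                            tabulate year, generate(yr)

                            * Collect all dummies, then remove the omitted one(s) by name
                            unab yrdums : yr*
                            local omitted yr1
                            local yrdums : list yrdums - omitted

                            * Same model as before, without factor-variable notation
                            xtreg ln_totalplatact_w ln_educational_diversity ln_gender_diversity ///
                                ln_background_diversity ln_tenure_diversity ///
                                firm_age itraffic ihs_growth `yrdums', re vce(robust)
                            xtoverid
                            ```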
                            Kind regards,
                            Carlo
                            (Stata 19.0)



                            • #15
                              Hey Carlo,

                              thanks for all your help!

                              Now it worked and the result is:
                              Sargan-Hansen statistic 32.106 Chi-sq(12) P-value = 0.0013
                              Therefore, I will stick to the re vce(robust) model.
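
                              A note on reading this statistic: as stated in #8, the null hypothesis of -xtoverid- is that -re- is the way to go, so a p-value this small would usually be read as evidence against -re- and in favor of -fe-. The reported p-value can be reproduced from the statistic and its degrees of freedom with Stata's upper-tail chi-squared function:

                              ```stata
                              * Upper-tail probability of a chi-squared(12) at the reported statistic
                              display chi2tail(12, 32.106)
                              ```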


                              Best regards
                              Jana
