  • Help with interpreting / fitting a Fixed Effects-model

    Dear Statalist users,
    I'm currently working on my master's thesis and have estimated a fixed-effects model in Stata. As this is my first time working with longitudinal data, there are a few things that confuse me, and I'd appreciate some help or a shove in the right direction. A little background: I'm exploring the effect of structural positions in a company network on the success of those companies. For that, I surveyed 43 companies and how they are interlocked over a span of 15 years, giving me 180 (monthly) time points. My independent variables are (1) local network measures, (2) global network measures, and (3) the general behaviour of the German equity index, while the dependent variable is the equity price.

    My current output looks like this (there are 38 IDs instead of 43 because of missing values):
    Code:
    xtreg price dax indegree outdegree closeness constraint centralization density, fe robust
    
    Fixed-effects (within) regression               Number of obs      =      5537
    Group variable: id                              Number of groups   =        38
    
    R-sq:  within  = 0.0492                         Obs per group: min =         3
           between = 0.0581                                        avg =     145.7
           overall = 0.0431                                        max =       180
    
                                                    F(7,37)            =      4.16
    corr(u_i, Xb)  = 0.0356                         Prob > F           =    0.0018
    
                                          (Std. Err. adjusted for 38 clusters in id)
    --------------------------------------------------------------------------------
                   |               Robust
             price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
               dax |   1.372309   .7611821     1.80   0.080    -.1699926     2.91461
          indegree |   1.091147   2.588484     0.42   0.676     -4.15362    6.335913
         outdegree |   .8330976    .844596     0.99   0.330    -.8782164    2.544412
         closeness |   12689.14   7250.301     1.75   0.088    -2001.366    27379.65
        constraint |   15.38289   12.43083     1.24   0.224    -9.804358    40.57013
    centralization |   147.7793   80.39707     1.84   0.074    -15.12063    310.6792
           density |  -335.4107   445.2656    -0.75   0.456    -1237.604     566.783
             _cons |   18.54576   23.84009     0.78   0.442    -29.75884    66.85036
    ---------------+----------------------------------------------------------------
           sigma_u |  65.845741
           sigma_e |   37.70635
               rho |  .75305497   (fraction of variance due to u_i)
    --------------------------------------------------------------------------------
    I have already tested for heteroskedasticity using the xttest3 command (as suggested in this overview: https://www.princeton.edu/~otorres/Panel101.pdf), hence the robust standard errors - is this correct? I also checked whether an FE model is the right choice using the Hausman test.
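    In sketch form, the testing sequence looked roughly like this (sketch only; -xttest3- is a user-written command, installed e.g. via -ssc install xttest3-, and the ordering is simply how I understood the Princeton slides):
    Code:
    * sketch of the testing sequence described above
    xtreg price dax indegree outdegree closeness constraint centralization density, fe
    xttest3                    // modified Wald test for groupwise heteroskedasticity
    estimates store fe
    xtreg price dax indegree outdegree closeness constraint centralization density, re
    estimates store re
    hausman fe re              // FE vs RE comparison with default (non-robust) standard errors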

    First off, I am happy that at least parts of my model are significant, even though they were more so before the robust standard errors - I guess I'll have to live with that. But I do have a few concerns:
    1. The surveyed companies are, in my case, not independent of each other. They are active in the same "field", and the independent variables were computed precisely from the way the companies are connected to each other. If I'm correct, this should be a pretty big violation of my regression's assumptions. I've stumbled across the term "permutation" as a way to deal with this, but so far I have failed to understand whether it is suitable, let alone how to do it in Stata. Is this a good approach, and can someone explain it in fairly simple terms?
    2. Are there any goodness-of-fit measures I should definitely not miss? I am currently only aware of the test for heteroskedasticity and the Hausman test, following both a (beginner's) book on panel regression and the Princeton presentation linked above.
    3. I've read that areg and xtreg give different R-sq values (https://www.stata.com/support/faqs/s...sus-xtreg-fe/#) and that the one from areg is preferable. Am I correct that it would be best to report both in my thesis?
    4. What worries me the most is that my dependent variable is very skewed. This is apparent both graphically and in numeric measures (such as sktest). While this is not a formal assumption of the FE model, I've stumbled upon a lot of posts where people voice concerns about it. Looking at it, the distribution almost resembles a Poisson distribution (which, unfortunately, I am not familiar with either). Would it make sense to work with something like the xtgee command instead of xtreg, or am I on the wrong track here? Is there a sensible way to deal with the skewed distribution?

    Thanks for reading this far, and apologies if some of my questions appear too simple; it just feels like this is a little over my head right now, and since surveying and working with my data has already taken up quite some time, I want to make sure I get this final step right. I am already very excited that my research hypotheses seem to hold some ground, judging by the results. :-)

    Best regards

  • #2
    You didn't get a quick answer. You'll generally get more help by following the FAQ on asking questions - providing Stata code in code delimiters, readable Stata output (which you did provide), and sample data.
    Regarding your questions:
    1. If firms are not independent, then you have problems, but how serious they are is not readily evaluated. The main assumption is that the x's are independent of the error terms. If that is unlikely to hold, then you need some sort of instrumental-variables estimator such as ivregress. Alternatively, it may be that the x's are independent of the errors, but the errors are correlated across observations. Here there is a wide variety of fixes, depending on the form of the correlation structure. With network data, you need to look at the network literature; it has its own set of norms about estimating such models.
    2. Many observers are not terribly concerned about goodness of fit. Your theory is largely about the parameters, not the fit. Hausman is not a fit test - it compares two estimators, where one is seen as consistent and the other makes stronger assumptions that give it higher efficiency but potentially make it inconsistent. If you really care about fit, you might look at AIC and BIC.
    3. Again, I don't worry much about R-square. If I had panel data, I'd use xtreg and report all three r-squares.
    4. Skew is not necessarily a big deal. You're making assumptions about the distribution of the error term when you do tests, but this is not the same as an assumption about the distribution of y. Some folks would take a log or a square root to reduce skewness. One commentator on this listserv has recommended using Poisson regression to handle the problem. The issue I'd worry about is whether a few extreme values are strongly influencing your results. The documentation on regress discusses what can be done to assess this.
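    To illustrate the log and Poisson routes in point 4, here is a rough sketch using your variable names (illustrative only; the log transform assumes price is strictly positive):
    Code:
    * log transform to reduce right skew (only valid if price > 0)
    gen ln_price = ln(price)
    xtreg ln_price dax indegree outdegree closeness constraint centralization density, fe vce(robust)

    * Poisson-style fixed-effects alternative; works for nonnegative outcomes, not only counts
    xtpoisson price dax indegree outdegree closeness constraint centralization density, fe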



    • #3
      Dear Phil,
      thanks for your thorough reply. I did not expect a quick answer, because it's a lot of text and, in parts, not the most specific type of question but rather a broad one - I know that might put some people off. I'm all the more thankful that you took the time to respond. Also, I did not provide sample data because I did not perceive my questions to be a syntax problem and thought the output would be sufficient. If my train of thought is wrong there, I'll happily provide sample data.

      As I'm still a bit unsure about (1), I'll address the rest of my questions first.

      2. I'll look into AIC and BIC - my understanding was that these are more suitable for event data analysis. I will read up on them some more and then decide whether to include them in my analysis. Thank you for the suggestion!
      3. Thanks, will do.
      4. Also, many thanks. From a theoretical (research question) perspective, a Fixed Effects-model makes the most sense to me, so I think I will leave it at that. I will look into my distributions some more to see if there are extreme values that need to be dealt with. I am quite sure there are.

      Unfortunately, I do not really understand how to proceed with (1). Am I correct that I should check this by using commands like `predict residuals` and then `qnorm`? If I do that, my data does not seem to deviate too much from the expected outcome.
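      In sketch form, this is what I have in mind (note that after -xtreg, fe- the default -predict- gives fitted values, so I added the -e- option to get the residuals - I hope that is the right choice):
      Code:
      * residual check after the fixed-effects model above
      xtreg price dax indegree outdegree closeness constraint centralization density, fe vce(robust)
      predict res, e          // idiosyncratic residuals e_it (default would be fitted values, xb)
      qnorm res               // quantile-normal plot of the residuals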
      I will also look out for literature on network data. So far I was only able to find literature on network-internal models such as Siena or ERGM, but I'm sure the solution is right around the corner :-)

      Again, thank you for your time and best regards,
      Steffen



      • #4
        Steffen:
        as an aside to Phil's helpful advice, if your companies are in fact nested within industries, you may want to consider a -mixed- approach instead of -xtreg, fe-.
        That said, while it is good that -xtreg, fe- seems justified on theoretical grounds, how could you test the -fe- vs -re- specification with -hausman-, given that -hausman- allows default standard errors only?
        If you suspect heteroskedasticity and/or autocorrelation in your data and impose robust/clustered standard errors, you should consider the user-written command -xtoverid- (type -search xtoverid- from within Stata to install it) to compare the -fe- vs -re- specifications.
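        In sketch form (variable names taken from your post; this is only meant to show the sequence of commands):
        Code:
        * install once, e.g. via -search xtoverid- or -ssc install xtoverid-
        xtreg price dax indegree outdegree closeness constraint centralization density, re vce(cluster id)
        xtoverid        // robust Hausman-type test of fixed vs random effects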
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Hi Carlo,
          many thanks for your reply!

          The companies I'm looking at don't appear to be nested within their respective industries (neither from a theoretical perspective, nor could I find anything remarkable in the data); rather, they share a very broad field (they are all large listed companies). I am more concerned because the values of the independent variables were computed specifically from the connections between these companies.

          Also, thanks for the hint with the Hausman-test - I ran both -fe- and -re- models without robust standard errors and then ran the Hausman-test. Only afterwards did I test for heteroskedasticity. In hindsight, I understand why this is the wrong approach. -xtoverid- worked like a charm.



          • #6
            Steffen:
            happy to read that -xtoverid- supported your latest analysis.
            Unfortunately, I'm not familiar with network-internal models: that's why I have no suggestions on that topic.
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Dear all,
              apologies for resurrecting this thread, but I have two follow-up questions and I think they're best read in the context of this discussion. One is about mixed-effects models and one is about winsorizing my DV. I've highlighted the main question because this is a big chunk of text.

              Carlo, I think I was a bit quick to dismiss your suggestion of looking into mixed models. I've read up on it and stumbled upon this quote (source: West, Welch, Galecki; Linear Mixed Models):
               …outcome variables in which the residuals are normally distributed but may not be independent or have constant variance. Study designs leading to data sets that may be appropriately analyzed using LMMs include (1) studies with clustered data, such as students in classrooms, or experimental designs with random blocks, such as batches of raw material for an industrial process, and (2) longitudinal or repeated-measures studies, in which subjects are measured repeatedly over time or under different conditions.
               Now, it seems that both of these descriptions fit my data exceptionally well. The companies I am looking at are nested within a certain field (the equity index) and also within several industry sectors (which I've captured with a variable). Also, my data is measured monthly, and the conditions the companies are under are ever-changing (the global economic situation). Is my reading of this quote, applied to my situation, reasonable? And is it then reasonable to use my industry-sector variable as the grouping variable? I ran it through Stata and got very good results, but of course this is all for naught if I've misunderstood the model. This is what my model looks like:
              Code:
               xtmixed price dax indegree outdegree closeness constraint centralization density || branche:
              
              Performing EM optimization:
              
              Performing gradient-based optimization:
              
              Iteration 0:   log likelihood = -31073.811  
              Iteration 1:   log likelihood = -31073.811  (backed up)
              
              Computing standard errors:
              
              Mixed-effects ML regression                     Number of obs      =      5537
              Group variable: branche                         Number of groups   =         5
              
                                                              Obs per group: min =       180
                                                                             avg =    1107.4
                                                                             max =      3188
              
              
                                                              Wald chi2(7)       =   1518.96
              Log likelihood = -31073.811                     Prob > chi2        =    0.0000
              
              --------------------------------------------------------------------------------
                       price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
              ---------------+----------------------------------------------------------------
                         dax |   1.309262   1.810222     0.72   0.470    -2.238708    4.857232
                    indegree |   15.05383   .4207109    35.78   0.000     14.22925    15.87841
                   outdegree |   3.802555     .32859    11.57   0.000      3.15853     4.44658
                   closeness |   279.0146   34.11777     8.18   0.000      212.145    345.8842
                  constraint |   99.22682   4.180185    23.74   0.000     91.03381    107.4198
              centralization |   185.1203     38.408     4.82   0.000      109.842    260.3986
                     density |  -1265.175    106.146   -11.92   0.000    -1473.218   -1057.133
                       _cons |  -14.21867   14.63267    -0.97   0.331    -42.89816    14.46083
              --------------------------------------------------------------------------------
              
              ------------------------------------------------------------------------------
                Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
              -----------------------------+------------------------------------------------
              branche: Identity            |
                                 sd(_cons) |   22.20421   7.255493      11.70294    42.12847
              -----------------------------+------------------------------------------------
                              sd(Residual) |   66.09888   .6284093      64.87862    67.34208
              ------------------------------------------------------------------------------
              LR test vs. linear regression: chibar2(01) =   184.64 Prob >= chibar2 = 0.0000
               Now, the results are drastically different from my FE model, so I want to make sure I'm not making a big mistake here. I also looked around the web and understand that it is difficult to check for heteroskedasticity in this kind of model; it has been suggested to visually inspect the residuals. I did this and it looked alright to my eye, which I take as a good sign.
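               For completeness, the visual check I did looks roughly like this (run right after the -xtmixed- model above; I am not sure it is the most rigorous diagnostic):
               Code:
               * residual diagnostics after the mixed model above
               predict fit, fitted        // fitted values including the random intercept
               predict res, residuals
               scatter res fit, yline(0)  // a fanning pattern would suggest heteroskedasticity
               qnorm res                  // rough normality check of the residuals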


               Also, I've winsorized my dependent variable, since some values were really extreme (before that, I had used a log transformation to get a more normal distribution). The empirical data is correct, but I think these extreme values point to idiosyncrasies of the individual companies that my model will not be able to capture, and hence the estimates would be distorted. With winsorizing, those observations remain at the upper/lower end of the distribution, but the estimates might be more sound. I would report results for both the winsorized and the unchanged dependent variable. Does this sound like a reasonable solution?
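               Concretely, the winsorizing looks roughly like this (the 1st/99th percentile cutoffs are just the ones I tried; the user-written -winsor2- would do the same thing):
               Code:
               * winsorize the dependent variable at the 1st and 99th percentiles (example cutoffs)
               summarize price, detail
               local lo = r(p1)
               local hi = r(p99)
               generate price_w = price
               replace price_w = `lo' if price < `lo' & !missing(price)
               replace price_w = `hi' if price > `hi' & !missing(price)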

              Thank you all in advance, really appreciate you helping out a novice.
              Last edited by Steffen Triebel; 31 Aug 2017, 06:54.



              • #8
                Steffen:
                 - your results with -mixed- are different from an -fe- model because you have entered the -re- jurisdiction;
                - I'm not a fan of winsorizing: hence, I would consider the "unchanged" data only.
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Carlo,
                  thanks again for your quick and, as always, helpful response. I'll discuss how to deal with my outliers with my professor once he returns from his vacation, but will take your view into account and read about it some more.

                   As for the model - I think my wording might have been a bit misleading :-) I am not surprised about the different results; if anything, I am relieved - they make a lot of sense with regard to my hypotheses. My concern was more about whether my reasoning for choosing a mixed-effects model was sound or whether I had misunderstood when to use it. Judging from the results, it seems that this is just what I was looking for, but fiddling around with models until you find one that fits your theoretical assumptions generally seems like a bad idea. So I want to make sure that my argument for using this type of model is solid.

                  All best,
                  Steffen



                  • #10
                    Steffen:
                    if you're dealing with nested data (I would discuss this topic with your supervisor), -mixed- is worth exploring.
                     Besides, if -mixed- is actually the way to go, you should also investigate whether a random-intercept model conveys enough information or whether it is outperformed by a random-coefficient model (with both the intercept and a slope random).
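                     In sketch form (the random slope on -dax- is just an example choice; note that -lrtest- is conservative here because the null hypothesis lies on the boundary of the parameter space):
                     Code:
                     * random intercept only
                     mixed price dax indegree outdegree closeness constraint centralization density || branche:
                     estimates store ri
                     * random intercept plus a random slope on dax
                     mixed price dax indegree outdegree closeness constraint centralization density || branche: dax, covariance(unstructured)
                     estimates store rc
                     lrtest ri rc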
                    Kind regards,
                    Carlo
                    (Stata 19.0)

