  • Fixed & Random Effects with noisy ESG (sustainability) data

    Dear readers & contributors!

    I am completely new here, a master's (MSc Sustainable Finance) student just a few weeks away from gratefully leaving the world of academia behind. Please excuse me if I do something wrong in this post; I have read the FAQ and will try to apply the CODE tags!

    I am running my data analysis on Stata 14.0 right now, and would appreciate it immensely if you could find the time to advise me on my issue!

    A tiny little background: I am researching whether or not BGD (= board gender diversity, variable name GR) has an impact on the environmental & social (EIVA and SIVA, respectively) performance of a company. I do this in the context of neo-institutional theory, meaning that I gathered sufficient data to do a pan-continental analysis (comparing 69 countries grouped in 4 classes, depending on how well-institutionalized corporate sustainability is in each class) and see whether BGD has more/less impact on EIVA & SIVA. I have a panel data set with 18,573 firm-year observations; years range from 2015-2018.

    My variables are:
    dependent: EIVA or SIVA
    independent:
    - GR: gender ratio; the higher this number the more male-dominant the board of directors is; the MAIN variable of interest
    - NM: ratio representing the nationality mix
    - market cap, revenue and debt as control variables - variable names MC, Revenu and Debt
    - nordicEU; westernEU; thirdgroup and fourthgroup = the 4 groups of classified countries; THESE ARE ALL DUMMY VARIABLES!
    - sectornum: the sector a company is in, which I encode from string to numeric to be able to create dummy variables with i.sectornum
    - year, also a dummy variable which I use to control for macro-economic changes in each year
    - 69 country dummy variables which I do not use directly in the regression but used to create the 4 classes
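    For readers following along, the encode step and the dummy construction described in the list above might look like the sketch below. The string variables sector and country, the toy rows, and the membership of the Nordic group are all assumptions made up for illustration; only sectornum and nordicEU are names taken from the post.

```stata
* Toy data: the variable names sector and country are hypothetical
clear
input str10 sector str10 country
"Energy"  "Sweden"
"Finance" "Germany"
"Tech"    "Norway"
"Energy"  "France"
end

* encode maps the string categories to integers 1, 2, 3 (alphabetical order)
* and attaches the original strings as value labels
encode sector, gen(sectornum)

* One class dummy built from country names (group membership is assumed)
gen byte nordicEU = inlist(country, "Sweden", "Norway", "Denmark", "Finland")

* Factor-variable notation expands sectornum into indicators on the fly
list sector sectornum nordicEU, nolabel
summarize i.sectornum
```

    With real data the same i.sectornum notation can go straight into the regression command, so no separate sector dummies have to be generated.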

    I am trying to decide between the fixed effects and random effects, whereby:
    a) the Hausman test clearly points to the fixed effects
    b) the random effects model results are EXACTLY what I wanted to show for both the GR variable and the 4 country-classes; it is in line with the literature and my own rationale
    c) one of my Finance professors had a (too short) talk with me recently in which he criticised the use of FE for ESG/sustainability data because, according to him, this is very noisy data and an FE estimator would simply take away the little meaningful variation we have in the data and regress predominantly noise. When I mentioned the Hausman test, he brushed it off by saying it is often very biased. He had to rush away and I had no chance to tell him how unclear his remark was to me, yet I feel like it could help me write a convincing methodology in favor of the RE. Do any of you have an idea what he could have meant? He is abroad now and will be for a couple of months due to family circumstances, and it would be highly inappropriate to bother him.
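    One possible reading of the remark in c), offered purely as an assumption and not as what the professor actually meant, is measurement error in a regressor: when most of the true variation is between firms, the within transformation removes it and leaves mostly noise. The simulation below is a minimal sketch of that idea; every variable name and number in it is invented.

```stata
* Simulated panel: every number here is an assumption chosen for illustration
clear
set seed 2718
set obs 500
gen firm = _n
gen a = rnormal()                      // persistent firm-level part of the regressor
gen u = rnormal()                      // firm effect, uncorrelated with the regressor
expand 4
bysort firm: gen year = 2014 + _n
gen x_true = a + rnormal(0, 0.3)       // little true within-firm variation
gen x_obs  = x_true + rnormal(0, 0.5)  // what is observed: truth plus noise
gen y = 1*x_true + u + rnormal()       // true slope is 1

xtset firm year
quietly xtreg y x_obs, fe
scalar b_fe = _b[x_obs]
quietly xtreg y x_obs, re
scalar b_re = _b[x_obs]
display "FE slope: " b_fe "   RE slope: " b_re
```

    In this setup the FE slope is pulled much further toward zero than the RE slope. That does not by itself justify RE on real data, because here the firm effect was built to be uncorrelated with the regressor, which is exactly the assumption the Hausman test questions.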

    Now, my regressions are:

    Code:
     xtreg EIVA GR NM MC Revenu Debt nordicEU westernEU thirdgroup fourthgroup i.sectornum i.year, re vce(robust)
    Code:
     xtreg EIVA GR NM MC Revenu Debt  i.year, fe vce(robust)
    when it comes to the FE regression, I had to leave out all the 4 country classifications and the sector dummies since Stata just dropped them due to collinearity. This is a HUGE problem for me since I need the coefficients on the 4 classes as a significant part of what I am trying to contribute with this master thesis!!
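    The dropped dummies are what the within transformation is expected to do to anything that never changes inside a firm. The sketch below uses simulated data with hypothetical names (firm, group, x) to show that, and also shows that an interaction between a time-varying regressor and a time-invariant class remains estimable under FE. Whether such an interaction answers the research question is a judgment call, not something the post settles.

```stata
* Simulated panel; the names firm, group, x and all numbers are hypothetical
clear
set seed 4321
set obs 500
gen firm  = _n
gen group = mod(firm, 2)        // time-invariant class dummy
gen u     = rnormal()           // firm effect
expand 4
bysort firm: gen year = 2014 + _n
gen x = u + rnormal()           // regressor correlated with the firm effect
gen y = x + 0.5*x*group + 2*group + u + rnormal()
xtset firm year

* group never changes within a firm, so its within-firm sd is zero
bysort firm (year): egen sd_group = sd(group)
summarize sd_group

* Stata reports group as omitted here
xtreg y x group i.year, fe

* The interaction varies within firms (because x does), so it is estimable
gen x_group = x*group
xtreg y x x_group i.year, fe vce(robust)
```

    The coefficient on x_group estimates how much the slope of x differs between the two classes; the level difference between classes (the 2*group term) stays absorbed in the firm effects.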

    The coefficient on GR under the FE estimator is positive (which would mean that the fewer women directors there are on a board, the better the environmental performance, going against literally ALL literature), whereas the coefficient is negative under the RE estimator and, like I already said, the RE model makes sense overall. Yet the Prob>chi2 of the Hausman test is 0.000, pointing to an inconsistent random effects estimator. I ran the Hausman test before I included robust standard errors.
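    For reference, the usual Hausman workflow with the official commands is sketched below on simulated data (all names and numbers are made up). Both models are fit with conventional standard errors first, which matches the order described above; the official hausman command is designed for conventional variance estimates rather than vce(robust).

```stata
* Simulated panel in which the firm effect is correlated with the regressor
clear
set seed 99
set obs 500
gen firm = _n
gen u = rnormal()
expand 4
bysort firm: gen year = 2014 + _n
gen x = u + rnormal()
gen y = x + u + rnormal()
xtset firm year

* Store both fits, then compare the coefficients they have in common
quietly xtreg y x, fe
estimates store fe
quietly xtreg y x, re
estimates store re
hausman fe re
scalar p_h = r(p)
display "chi2 = " r(chi2) "   p = " p_h
```

    Because x was built to be correlated with the firm effect u, the test rejects here; it says the FE and RE coefficients differ, not which question the analyst should be asking.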

    So now that you know (in case you had the wonderful patience to actually read all of this) the situation, my questions specifically are:

    1. Does anyone have a clue what the professor could have meant with the noisy data not being appropriate for FE, and vice versa?
    2. Is my coding in Stata even correct?
    3. How can I make the smart choice between FE and RE? Or would you suggest another model that would allow me to estimate the 4 classes of countries?
    4. Would you know any academic articles/literature in general that might help me further/back up using the RE?

    THANK YOU so much in advance, I am so happy to have found this Stata-community & forum!

    Kind regards,

    Amira

  • #2
    I am among those who deplore the mindless use of the Hausman test to pick between fe and re. But here is another way to think about it, and it may or may not be what your professor had in mind.

    The first important point is that in panel data, the within-panel effect of a predictor on an outcome can be different from the between-panel effect. They can differ in any way you can imagine, including having opposite signs. For a simple-minded but clear illustration of this point, run this code:
    Code:
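    // Build a toy panel: 5 panels with 2 observations each
    clear
    set obs 5
    gen panel_id = _n
    expand 2
    
    // Within a panel y falls as x rises; across panels y rises with x
    set seed 1234
    by panel_id , sort: gen y = 4*panel_id - _n + 3 + rnormal(0, 0.5)
    by panel_id: gen x = panel_id + _n
    
    xtset panel_id
    
    // Within-panel (fixed-effects) slope vs. pooled OLS slope
    xtreg y x, fe
    regress y x
    
    //    GRAPH THE DATA TO SHOW WHAT'S HAPPENING
    separate y, by(panel_id)
    
    graph twoway connect y? x || lfit y x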
    So it is important for you to know whether you are interested in the within-firm or between-firm effects of GR on EIVA/SIVA. This is a decision you must make before you can write down any regression commands. (In particular, you must not make the decision based on which one gives you the results you would rather see. At best that's bad science; at worst it's fraud. You must make it based on an understanding of what question you are trying to answer with your research. The within- and between- effects are answers to different questions; knowing what your question is is the very first task in any research project.)

    If you are looking for the within-effects, then it is usually best to use the fixed effects estimator. If you feel like using the Hausman test at that point and Hausman says that random effects is OK too, then you can get slightly more efficient estimates by using random effects. But the -fe- and -re- estimates will be nearly the same in this circumstance: Hausman would reject random effects if they were not. Frankly, I think the Hausman test is a waste of time in this context because if I really want the within- effects, that is what the -fe- estimator gives me, whereas the -re- estimator approximates it with a weighted average of within- and between- effects (and assumes that the within- and between- effects are actually equal).
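    To make "purely within" concrete, the sketch below (simulated data, hypothetical names) checks that the -xtreg, fe- slope is the same number you get from plain OLS after subtracting each panel's mean from y and x. Only the point estimates are compared; the standard errors from the hand-rolled regression would need a degrees-of-freedom correction.

```stata
* Simulated panel; all names and numbers are invented for illustration
clear
set seed 2024
set obs 100
gen id = _n
gen double u = rnormal()
expand 5
bysort id: gen t = _n
gen double x = u + rnormal()
gen double y = 2 - 1.5*x + 3*u + rnormal()
xtset id t

quietly xtreg y x, fe
scalar b_fe = _b[x]

* Subtract each panel's mean from y and x, then run plain OLS
bysort id: egen double ybar = mean(y)
bysort id: egen double xbar = mean(x)
gen double y_dm = y - ybar
gen double x_dm = x - xbar
quietly regress y_dm x_dm, noconstant
scalar b_dm = _b[x_dm]

display "xtreg, fe: " b_fe "   demeaned OLS: " b_dm
```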

    If you are looking for between-effects, then you cannot use the -fe- estimator because it is a purely within-effects estimator. It doesn't matter what Hausman says about it: fixed-effects estimators only give you within-panel estimates. So if you need between-effects you have some other choices. For linear regressions, there is -xtreg, be-. This is a pure between-effects estimator. And -xtreg, re- may be useful. As already mentioned, the -re- estimator is based on the assumption that within and between effects are equal, and the results it gives are a weighted average of the two. But if in your data the within and between effects are not reasonably close to equal, then the results of -xtreg, re- will be nonsense.

    The Hausman test is one way of seeing whether the within- and between- effects are equal. If Hausman rejects -re- it is rejecting the hypothesis that within effects = between effects. If you are catering to an audience that expects to see Hausman tests, this would be a situation for using it. Personally, though, it is not my preferred way of doing it. The problem is that being a statistical hypothesis test, it is sensitive to sample size: in a large sample it will usually reject random effects even if the results would be very acceptable, and in a small sample it could very well fail to reject random effects even if the results would be far off the mark.

    So my preferred approach is to obtain separate estimates of the within- and between- effects using Francisco Perales' -xthybrid- command, available from SSC. You will get both a within and between effect estimate for each variable (or at least for each variable for which they can both be estimated). If the within and between effects are, for practical purposes, the same for all the variables (or at least for everything but the nuisance variables), then you can feel comfortable using -xtreg, re- for your final results (and those results will be similar to the results shown by -xthybrid- but will feel more familiar to most audiences). If the within- and between- effect estimates from -xthybrid- differ from each other materially, then I would use the between-effect outputs from -xthybrid- as my between-effect estimates.
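    The idea behind a hybrid model can be sketched with official commands only. The code below is not -xthybrid- itself and makes no claim about its syntax or output; it is a hand-rolled decomposition on simulated data, with hypothetical names, in which each regressor is split into its panel mean and the deviation from that mean.

```stata
* Simulated panel with opposite-signed within and between effects
clear
set seed 1234
set obs 200
gen panel_id = _n
gen double c = rnormal(0, 2)      // panel-level component
expand 4
bysort panel_id: gen t = _n
gen double w = rnormal()          // within-panel component
gen double x = c + w
gen double y = 3*c - w + rnormal(0, 0.5)
xtset panel_id t

* Split x into its panel mean and the deviation from that mean
bysort panel_id: egen double x_mean = mean(x)
gen double x_dev = x - x_mean

* Random-effects fit with both pieces: x_dev carries the within effect,
* x_mean carries the between effect
xtreg y x_dev x_mean, re
scalar b_within  = _b[x_dev]
scalar b_between = _b[x_mean]

* Wald test of within effect = between effect
test x_dev = x_mean
scalar p_equal = r(p)

* Pure between-effects estimator for comparison
xtreg y x, be
```

    Looking at the two coefficients side by side shows how far apart the within and between effects are in substantive terms, which is the comparison recommended above in place of relying on a p-value alone.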

    Added: Thank you for the use of code delimiters.



    • #3
      Jeff Wooldridge in #11 of this link provides an excellent perspective on this debate between the choice of within and between estimators. It largely rejects the notion that because my interest is in between-variation, I should go for the between or random effects estimator. Of course, this does not negate a lot of Clyde's excellent points.



      • #4
        Actually, I don't think that what Jeff Wooldridge says there contradicts what I said here. I think it's a matter of emphasis. What I say here is that when you are interested in between effects, the random-effects model may be useful, and I outline that I think that's the case only when the between effects and the fixed effects are nearly enough equal that the difference between them is ignorable for practical purposes. But, in general, I recommended using the between effects estimators from the -xthybrid- command, which are different from the random effects estimates. Wooldridge's comment is silent about that.

        And, although it is not relevant here, I certainly agree with Wooldridge that the only place where the random effects estimator really stands out as preferable is in a randomized study.



        • #5
          Replying to Clyde Schechter (#4):
          Dear Dr. Schechter! Thank you so much for your quick and very expansive reply! I will be sure to try out the proposed alternative and will most certainly check back in with the results! Thank you again!!!



          • #6
            Agreed Clyde, the point being that in the choice between the standard random effects and fixed effects estimators, the decision can only be made after a comparison of the random effects and fixed effects coefficients and not independent of this.
