
  • Comparing results obtained by FE, RE, and OLS regression models when using different time periods and dependent variables

    Dear Stata Community,

    As I am fairly new to econometrics and the theory behind it, I have a few questions about the three regression models (RE, FE, and pooled OLS) and how to compare them when analyzing panel data. I hope you can help me answer them.

    I am currently running regressions in Stata and performing both the Hausman test and the LM or F-test to determine which model is appropriate.
    My data set contains all listed companies in the Nordics for the period 2005-2017.
    In my models, the investment ratio is the dependent variable and lagged cash flow is the independent variable. The model varies in two respects:

    1) Time period: I run the same regression for three different time periods (2005-2007, 2008-2009, and 2010-2017).
    2) Dependent variable: after running and analyzing the three regressions in point 1, I rerun them for the same three time periods with two other dependent variables.

    So far I have run the two tests to determine which regression model to use for each of the three time periods.
    I have done exactly the same for the models with the other dependent variables, again for each period. The tests give different answers as to which model is the correct one to use.

    My questions are:
    1) Can I compare the results for the three time periods if they come from different models?
    2) When I later use the two other dependent variables in the three time periods, can I also use different models across time periods and dependent variables and still compare the results?

    To show what I have done, I have included my code below. The important parts are the dependent variable (which I change), the time period used, and the tests for choosing the model.


    Thank you very much in advance!

    Code:
    * Load data
    clear
    insheet using "C:\Users\dkkrz\Desktop\stata3middel.csv", delimiter(";")
    
    * Encode panel data
    encode selskabsnavn, gen(selskab)
    tsset selskab r
        
    * Create investments divided by lagged total assets variable
    gen totalassetskorrigeret_1 = l.totalassetskorrigeret
    gen capex_ratio = (capitalexpendituresadditiontofix/totalassetskorrigeret_1)
    
    * Exclude observations with negative CFLOW
    drop if cflow<0
    
    * Create lagged cflow variable
    gen cflow_1 = l.cflow
    
    ****************************************
    
    * Random effects models all years last year
    xtreg capex_ratio cflow_1, re
    estimates store random_effects
    
    * Fixed effects models all years last year
    xtreg capex_ratio cflow_1, fe
    estimates store fixed_effects
    
    * Test random effects vs. fixed effects
    hausman fixed_effects random_effects
    
    * Regular OLS all years last year
    regress capex_ratio cflow_1
    
    * Test OLS vs. random effects (Breusch-Pagan)
    estat hettest
    
    * Use random effects with categories
    xtreg capex_ratio cflow_1, re
    
    *******************************************
    * Create lagged sales and sales growth variables
    gen netsalesorrevenues_1 = l.netsalesorrevenues
    gen sales_growth = (netsalesorrevenues/netsalesorrevenues_1 - 1)*100
    gen sales_growth_1 = l.sales_growth
    gen sales_growth_sq = sales_growth^2
    
    * Encode industry
    encode industry10, gen(industries10)
    
    * regress MTB
    regress mtb sales_growth_1 sales_growth_sq i.industries10
    
    * Get mtb_f and mtb_r
    predict mtb_f
    predict mtb_r, residuals
    
    * Lag mtb_f and mtb_r
    gen mtb_f_1 = l.mtb_f
    gen mtb_r_1 = l.mtb_r
    
    ****************************************
    
    * Subset for correct period
    keep if hllcllhc == "LL-HC" | hllcllhc == "HL-LC"
    
    gen period = ""
    replace period = "pre_crisis" if r < 2008
    replace period = "crisis" if r >= 2008 & r<=2009
    replace period = "post_crisis" if r > 2009
    
    keep if period == "post_crisis"
    
    
    * Variable - model
    encode hllcllhc, gen(hllcllhc_factor)
    encode country, gen(countries)
    encode period, gen(period_factor)
    encode cap, gen(cap_factor)
    
    ****************************************
    
    * Random effects models all years last year
    xtreg capex_ratio cflow_1, re
    estimates store random_effects
    
    * Fixed effects models all years last year
    xtreg capex_ratio cflow_1, fe
    estimates store fixed_effects
    
    * Test random effects vs. fixed effects
    hausman fixed_effects random_effects
    
    * Regular OLS all years last year
    regress capex_ratio cflow_1
    
    * Test OLS vs. random effects (Breusch-Pagan)
    estat hettest
    
    * Use random effects with categories
    xtreg capex_ratio cflow_1, re
    
    ******************** MOST IMPORTANT MODEL ********************
    
    * Random effects models with categories
    xtreg capex_ratio i.hllcllhc_factor##c.cflow_1 mtb_f_1 mtb_r_1, re
    estimates store random_effects
    
    * Fixed effects models with categories
    xtreg capex_ratio i.hllcllhc_factor##c.cflow_1 mtb_f_1 mtb_r_1, fe
    estimates store fixed_effects
    
    * Test random effects vs. fixed effects
    hausman fixed_effects random_effects
    
    * Regular OLS with categories
    regress capex_ratio i.hllcllhc_factor##c.cflow_1 mtb_f_1 mtb_r_1
    margins i.hllcllhc_factor, dydx(c.cflow_1)
    marginsplot
    
    * Test OLS vs. random effects (Breusch-Pagan)
    estat hettest
    
    * Use random effects with categories
    xtreg capex_ratio i.hllcllhc_factor##c.cflow_1 mtb_f_1 mtb_r_1, re
    
    *************************************************************************************************************************************
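
    A minimal sketch of how the per-period tests described above could be run in one pass, rather than by editing the `keep if period == ...` line for each run. It assumes the `period` string variable has been created as in the code above, and that the `keep if period == "post_crisis"` line has been dropped so that all three periods stay in memory. The stored-estimate names `fe_...` and `re_...` are hypothetical. -xttest0- is Stata's Breusch-Pagan LM test for random effects.

    ```stata
    * Assumes: panel declared with tsset, capex_ratio and cflow_1 already created,
    * and no "keep if period == ..." applied, so every period is still in memory.
    foreach p in pre_crisis crisis post_crisis {
        display as text _newline "Period: `p'"

        * Fixed effects, restricted to this period
        xtreg capex_ratio cflow_1 if period == "`p'", fe
        estimates store fe_`p'

        * Random effects, restricted to this period
        xtreg capex_ratio cflow_1 if period == "`p'", re
        estimates store re_`p'

        * Breusch-Pagan LM test: random effects vs. pooled OLS
        * (uses the xtreg, re results that are still active)
        xttest0

        * Hausman test: fixed effects vs. random effects
        hausman fe_`p' re_`p'
    }
    ```

    Using an `if` qualifier instead of `keep if` leaves the data set intact, so the same loop can be rerun with another dependent variable in place of `capex_ratio`.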


  • #2
    Anyone?



    • #3
      Kara:
      welcome to this forum.
      Questions like the one you posted are often left unreplied, simply because nobody knows your data and the related data generating process.
      That said, your strategy should consider:
      1) a specification of the right-hand side of your regression equation that gives a fair and true view of the data generating process. Playing around with different predictors in search of the model with the highest number of statistically significant predictors is not the (methodological) way to go: it can easily turn into an endless try-and-retry process that any average-skilled reviewer can tear apart. The post-estimation analysis (which I do not find in your post) is relevant as well: endogeneity, non-linear specification and, last but not least, heteroskedasticity should be investigated and accounted for accordingly. By the way, -hettest- investigates the presence of heteroskedasticity; you should use -xttest0- to test for random effects.
      2) it is rare (but possible) that pooled OLS outperforms -xtreg- with panel data. When default standard errors are used, if the F-test appearing as a footnote of the -xtreg, fe- outcome table lacks statistical significance, you should switch to pooled OLS;
      3) I fail to get why you select different time periods from your dataset and try different regression models.
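
      A minimal sketch of the checks mentioned in points 1 and 2, reusing the variable names from post #1 (`capex_ratio`, `cflow_1`, `selskab`). It assumes the panel has already been declared with -tsset- and is meant as an illustration, not a recommended final specification. Clustering on the panel identifier is one common way to deal with heteroskedasticity, and is shown here as an assumption about what "accounted for" could look like.

      ```stata
      * Assumes: tsset selskab r has been run and the variables exist as in post #1.

      * Random effects vs. pooled OLS: Breusch-Pagan LM test
      xtreg capex_ratio cflow_1, re
      xttest0

      * Fixed effects vs. pooled OLS: F test that all u_i = 0
      * (shown below the coefficient table when default standard errors are used)
      xtreg capex_ratio cflow_1, fe
      display "F test that all u_i = 0: F = " e(F_f)

      * Cluster-robust standard errors at the company level
      xtreg capex_ratio cflow_1, fe vce(cluster selskab)
      ```

      With `vce(cluster selskab)` the F test footnote is no longer reported, which is why the comparison with pooled OLS is run with default standard errors first.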
      As an aside, please note that posting what you got from Stata can increase your chance of getting helpful replies.
      Kind regards,
      Carlo
      (Stata 19.0)
