Problems with interpretation of my regression

Oliver Brock

Join Date: Feb 2023
Posts: 12


28 Jun 2023, 15:39

Hello dear Stata experts,

I have a problem with the interpretation of my regression results and would like to ask you for advice.

I have two variables of interest (abs_AEM and abs_REM).
Also, I have a dummy variable that distinguishes two COVID periods (before COVID = 0; during COVID = 1). I have examined what influence this period dummy (and the control variables) has on abs_AEM and abs_REM. As I expected, the COVID period has a positive influence on abs_AEM and a negative influence on abs_REM. I had assumed that there is a substitution effect between the two variables (when one increases, the other decreases, and vice versa).

My only problem is that the two variables behave differently across the time periods, as expected, yet they are still positively correlated with each other.

How can we explain something like this? Could I say, for example:

"abs_AEM increased overall and abs_REM decreased overall in period = 1, but there is a positive correlation when viewed firm by firm, so that abs_AEM and abs_REM are still positively correlated"?
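The explanation in the quoted sentence is internally consistent, and a small simulation can illustrate it. The following is a minimal Python sketch (not Stata) under an entirely hypothetical data-generating process: each firm has a fixed propensity for earnings management that raises both AEM and REM, while the Covid dummy shifts AEM up and REM down. The function name `simulate`, the variable `propensity` and all coefficients are assumptions chosen for illustration, not estimates from this data.

```python
import numpy as np


def simulate(n_firms=2000, seed=0):
    """Hypothetical panel: one pre-COVID and one COVID observation per firm.

    Assumed data-generating process (illustration only):
      propensity_i ~ N(0, 1)   firm-level taste for earnings management
      AEM = 1.0 + 0.5*propensity + 0.3*covid + noise
      REM = 1.0 + 0.5*propensity - 0.3*covid + noise
    """
    rng = np.random.default_rng(seed)
    propensity = np.repeat(rng.normal(size=n_firms), 2)  # same value in both periods
    covid = np.tile([0.0, 1.0], n_firms)                  # period dummy: 0, 1, 0, 1, ...
    aem = 1.0 + 0.5 * propensity + 0.3 * covid + rng.normal(scale=0.3, size=2 * n_firms)
    rem = 1.0 + 0.5 * propensity - 0.3 * covid + rng.normal(scale=0.3, size=2 * n_firms)
    return covid, aem, rem


covid, aem, rem = simulate()
shift_aem = aem[covid == 1].mean() - aem[covid == 0].mean()  # change in mean AEM
shift_rem = rem[covid == 1].mean() - rem[covid == 0].mean()  # change in mean REM
corr = np.corrcoef(aem, rem)[0, 1]                           # pooled correlation
print(f"mean shift AEM: {shift_aem:+.3f}")
print(f"mean shift REM: {shift_rem:+.3f}")
print(f"pooled correlation: {corr:.3f}")
```

With these assumed values the mean of AEM rises and the mean of REM falls between the periods, yet the pooled correlation is clearly positive, because the shared firm-level propensity dominates the period shift.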

I tested the relationship between abs_AEM and abs_REM with pwcorr, and I also included each of them as a regressor in the other's regression. Here are the results (with some other control variables):

For abs_REM as the dependent variable:

Code:

. xtreg abs_REM Covid Size Growth_R Growth_TA MTB Leverage ROA Loss abs_AEM, fe robust

Fixed-effects (within) regression               Number of obs     =      4,764
Group variable: twodigit_sic                    Number of groups  =         45

R-squared:                                      Obs per group:
     Within  = 0.0570                                         min =         20
     Between = 0.2599                                         avg =      105.9
     Overall = 0.0795                                         max =        556

                                                F(9,44)           =      30.76
corr(u_i, Xb) = 0.1507                          Prob > F          =     0.0000

                          (Std. err. adjusted for 45 clusters in twodigit_sic)
------------------------------------------------------------------------------
             |               Robust
     abs_REM | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       Covid |  -.0218859   .0054147    -4.04   0.000    -.0327985   -.0109733
        Size |  -.0037742   .0025534    -1.48   0.147    -.0089203    .0013719
    Growth_R |   .0165218   .0075974     2.17   0.035     .0012103    .0318333
   Growth_TA |   .1504183   .0189844     7.92   0.000     .1121577    .1886789
         MTB |   .0012978   .0005613     2.31   0.026     .0001666    .0024289
    Leverage |  -.0901932   .0356235    -2.53   0.015    -.1619877   -.0183988
         ROA |   .1282802    .036159     3.55   0.001     .0554065    .2011538
        Loss |  -.0085868   .0162591    -0.53   0.600    -.0413549    .0241814
     abs_AEM |   .5137514    .090037     5.71   0.000     .3322939     .695209
       _cons |   .3214487   .0408188     7.88   0.000     .2391837    .4037136
-------------+----------------------------------------------------------------
     sigma_u |  .09803695
     sigma_e |  .24000368
         rho |  .14299677   (fraction of variance due to u_i)
------------------------------------------------------------------------------
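For reference, `xtreg ..., fe` reports within (fixed-effects) estimates: every variable is demeaned within its group and the slopes come from OLS on the demeaned data, which gives the same slopes as OLS with one dummy per group. In the output above the group variable is `twodigit_sic`, so the demeaning is by two-digit SIC industry rather than by firm. Below is a minimal Python sketch of that equivalence on simulated data; the function names and simulated values are hypothetical, and the sketch does not reproduce Stata's reported `_cons` or its cluster-robust standard errors.

```python
import numpy as np


def within_estimator(y, X, groups):
    """Fixed-effects (within) slopes: demean y and X by group, then run OLS."""
    y_dm = y.astype(float)
    X_dm = X.astype(float)
    for g in np.unique(groups):
        mask = groups == g
        y_dm[mask] -= y_dm[mask].mean()
        X_dm[mask] -= X_dm[mask].mean(axis=0)
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return beta


def dummy_variable_ols(y, X, groups):
    """Same slopes via OLS with one intercept dummy per group (LSDV)."""
    levels = np.unique(groups)
    D = (groups[:, None] == levels[None, :]).astype(float)
    beta, *_ = np.linalg.lstsq(np.column_stack([X, D]), y, rcond=None)
    return beta[: X.shape[1]]  # keep only the slope coefficients


rng = np.random.default_rng(1)
groups = rng.integers(0, 5, size=300)           # hypothetical industry codes
alpha = rng.normal(size=5)[groups]              # industry-specific intercepts
X = rng.normal(size=(300, 2)) + alpha[:, None]  # regressors correlated with alpha
y = X @ np.array([0.5, -0.2]) + alpha + rng.normal(scale=0.1, size=300)

b_within = within_estimator(y, X, groups)
b_lsdv = dummy_variable_ols(y, X, groups)
print(b_within, b_lsdv)
```

Both functions return the same slopes (up to floating-point error), close to the assumed values 0.5 and -0.2, even though the regressors are correlated with the group intercepts.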

For abs_AEM as the dependent variable:

Code:

. xtreg abs_AEM Covid Size Growth_R Growth_TA MTB Leverage ROA Loss abs_REM, fe robust

Fixed-effects (within) regression               Number of obs     =      4,764
Group variable: twodigit_sic                    Number of groups  =         45

R-squared:                                      Obs per group:
     Within  = 0.0948                                         min =         20
     Between = 0.4348                                         avg =      105.9
     Overall = 0.1345                                         max =        556

                                                F(9,44)           =      28.38
corr(u_i, Xb) = 0.2312                          Prob > F          =     0.0000

                          (Std. err. adjusted for 45 clusters in twodigit_sic)
------------------------------------------------------------------------------
             |               Robust
     abs_AEM | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       Covid |   .0082866   .0019511     4.25   0.000     .0043543    .0122189
        Size |  -.0020397   .0005306    -3.84   0.000    -.0031092   -.0009703
    Growth_R |  -.0016693   .0018495    -0.90   0.372    -.0053967     .002058
   Growth_TA |   .0079366   .0086527     0.92   0.364    -.0095018    .0253751
         MTB |  -.0000429   .0001124    -0.38   0.704    -.0002694    .0001836
    Leverage |  -.0117508   .0072808    -1.61   0.114    -.0264242    .0029227
         ROA |  -.0719155   .0102187    -7.04   0.000      -.09251    -.051321
        Loss |   .0157668   .0035964     4.38   0.000     .0085186    .0230149
     abs_REM |    .034126   .0075674     4.51   0.000      .018875     .049377
       _cons |   .0756555   .0079218     9.55   0.000     .0596901    .0916209
-------------+----------------------------------------------------------------
     sigma_u |  .01866835
     sigma_e |  .06185628
         rho |  .08348068   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Code:

. pwcorr abs_AEM abs_REM

             |  abs_AEM  abs_REM
-------------+------------------
     abs_AEM |   1.0000
     abs_REM |   0.1945   1.0000

Would it still be appropriate to argue for a "substitution effect" between AEM and REM? Even if they have a positive interaction? I am trying to find evidence that companies tend to make more use of AEM and less use of REM.

I'm excited to hear your opinion on this. Note: the very low R-squared is typical for my kind of research.

Thanks in advance

Oliver

Last edited by Oliver Brock; 28 Jun 2023, 15:42.


George Ford

Join Date: Aug 2014

Posts: 3187
#2

28 Jun 2023, 16:07

That's not the way to do it. You've got a system of equations (AEM = f(REM), REM = f(AEM)).

What type of variables are AEM and REM?

You might look at these, though the models are different from yours.

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3719575
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2201973
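To see why a system like AEM = f(REM), REM = f(AEM) is a problem for single-equation OLS, here is a minimal Python simulation under assumed values. It posits a hypothetical linear system A = a*R + u and R = b*A + v in which both true cross-effects are negative (substitution) and u and v share a common shock. None of this is taken from the linked papers or from Oliver's data.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
a, b = -0.3, -0.3                    # hypothetical true cross-effects (substitution)
common = rng.normal(size=n)          # shock hitting both equations
u = common + rng.normal(scale=0.5, size=n)
v = common + rng.normal(scale=0.5, size=n)

# Structural system: A = a*R + u and R = b*A + v.
# Solving the two equations jointly gives this reduced form.
A = (u + a * v) / (1 - a * b)
R = (v + b * u) / (1 - a * b)

ols_slope = np.polyfit(R, A, 1)[0]   # single-equation OLS slope of A on R
print(f"true effect of R on A: {a:+.2f}")
print(f"OLS estimate:          {ols_slope:+.2f}")
```

Under these assumptions the OLS slope of A on R comes out at roughly +0.45 even though the true effect is -0.3: the common shock and the feedback from A to R both contaminate the single-equation estimate.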
Daniel Schaefer

Join Date: Mar 2020

Posts: 822
#3

28 Jun 2023, 16:08

Quoting Oliver: "Even if they have a positive interaction?"

These two variables have a positive correlation, not necessarily a positive interaction, right? A correlation measures the linear relationship between AEM and REM, whereas an interaction estimates the second-order change in the effect of AEM on your outcome given certain levels of REM (and vice versa).

I'm not an economist, so I don't know what is or is not theoretically justified here, but I am wondering whether a regression model with an interaction term might be appropriate. If you really expect AEM and REM to be negatively correlated, then that expectation appears to be contradicted by your data; but if you expect them to have a negative interaction, then it doesn't look like you've tested that. I believe some economists post regularly on the forum, and maybe they'll jump in, but in the meantime I'm hoping you can give a little more clarification.

Edit: crossed with #2
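Daniel's distinction between correlation and interaction can be illustrated with simulated data. In this minimal Python sketch the two regressors are positively correlated by construction, while a hypothetical outcome `y` depends on them through a negative interaction; the outcome and every coefficient are assumptions. In Stata, the corresponding model would use factor-variable notation such as `c.abs_AEM##c.abs_REM`, which adds both main effects and their product.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
common = rng.normal(size=n)
aem = common + rng.normal(size=n)  # hypothetical regressors sharing a common factor
rem = common + rng.normal(size=n)
# Hypothetical outcome with a negative AEM x REM interaction.
y = 1.0 + 0.4 * aem + 0.4 * rem - 0.2 * aem * rem + rng.normal(scale=0.5, size=n)

corr = np.corrcoef(aem, rem)[0, 1]  # linear association between the regressors
X = np.column_stack([np.ones(n), aem, rem, aem * rem])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"corr(AEM, REM) = {corr:.2f}")
print(f"interaction coefficient = {coef[3]:.2f}")
```

The correlation is positive (about 0.5 by construction) while the estimated interaction coefficient is close to the assumed -0.2, so the two quantities answer different questions.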
Daniel Schaefer

Join Date: Mar 2020

Posts: 822
#4

28 Jun 2023, 16:20

George Ford, so should this be a structural equation model with two correlated outcomes, then?
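One way to read "two correlated outcomes" is a pair of equations, one for AEM and one for REM, with the same regressors and correlated errors, instead of putting each variable on the right-hand side of the other's equation. When both equations contain identical regressors, the point estimates from estimating them jointly as seemingly unrelated regressions coincide with equation-by-equation OLS, so this minimal Python sketch fits both by least squares and reports the cross-equation residual correlation. The data, the shared `propensity` factor and the function name `two_outcome_fit` are hypothetical. In Stata, `sureg` and `sem` estimate such systems; whether either is the right model here is the open question in this thread.

```python
import numpy as np


def two_outcome_fit(covid, aem, rem):
    """Fit AEM and REM on the same regressors (constant + Covid).

    Returns the Covid coefficient of each equation and the
    cross-equation residual correlation."""
    X = np.column_stack([np.ones_like(covid), covid])
    Y = np.column_stack([aem, rem])            # one column per outcome
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)  # solves both equations at once
    resid = Y - X @ B
    return B[1], np.corrcoef(resid.T)[0, 1]


rng = np.random.default_rng(4)
n = 4_000
covid = rng.integers(0, 2, size=n).astype(float)
propensity = rng.normal(size=n)                # hypothetical shared firm factor
aem = 0.5 * propensity + 0.3 * covid + rng.normal(scale=0.3, size=n)
rem = 0.5 * propensity - 0.3 * covid + rng.normal(scale=0.3, size=n)

covid_coefs, resid_corr = two_outcome_fit(covid, aem, rem)
print(f"Covid effect on AEM: {covid_coefs[0]:+.3f}")
print(f"Covid effect on REM: {covid_coefs[1]:+.3f}")
print(f"residual correlation: {resid_corr:.3f}")
```

The Covid coefficients carry the opposite period shifts, while the positive residual correlation carries the co-movement across firms, so the two findings no longer compete within one equation.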
George Ford

Join Date: Aug 2014

Posts: 3187
#5

28 Jun 2023, 16:30

Without knowing more about the problem, I can't say which model it is. But estimating the two equations separately, as you have, is not correct. It could be a structural model, but it might not be. Is REM a function of AEM? Or are these two different options a firm may choose?

What do the variables look like? Are they dummies (0/1)?
Oliver Brock

Join Date: Feb 2023

Posts: 12
#6

28 Jun 2023, 16:31

Hello George, hello Daniel.

Thank you for your quick answers and the attached papers; they have already helped me. Thank you very much!
Oliver Brock

Join Date: Feb 2023

Posts: 12
#7

28 Jun 2023, 16:36

@George

Estimating both separately is common in research in this field. AEM stands for "accrual-based earnings management", while REM stands for "real earnings management". Both are ways to "manipulate" a firm's earnings during the financial year to achieve specific targets (for example, earnings forecasts from stock market analysts). Companies can use both techniques. The variables are continuous (a higher number indicates more earnings management).

Research has not conclusively determined whether the two processes (AEM and REM) are related, and if so, how. It is often said that companies use one more and the other less at the same time. However, this has not been found to be the case everywhere.

The main point of my analysis is whether firms applied more or less earnings management (AEM/REM) during the COVID-19 crisis. In fact, they applied more AEM and less REM. Nevertheless, it seems that individual companies that apply more AEM also apply more REM.

Last edited by Oliver Brock; 28 Jun 2023, 16:39.
Oliver Brock

Join Date: Feb 2023

Posts: 12
#8

28 Jun 2023, 16:46

To give a non-economic example: in 2019, people commute to work by car 100 times and by bike 50 times on average. In 2020, they commute 120 times by car but only 40 times by bike. A dummy variable (2019 = 0; 2020 = 1) indicates that the year has a significant influence on both driving (+) and biking (-). Nevertheless, according to my analysis, people who bike more also drive significantly more. I think that can make sense. However, I now want to test whether people drive more BECAUSE they bike less.

But how do I do this in Stata?
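The car/bike example can be split exactly with the law of total covariance: the pooled covariance equals the average within-year covariance plus the covariance of the yearly means. Below is a minimal Python sketch with hypothetical individual trip counts, chosen only so that the yearly means match the numbers above. The decomposition is descriptive; it does not by itself answer the causal "BECAUSE" question.

```python
import numpy as np

# Hypothetical trips for three people per year; yearly means match the example
# (2019: car 100, bike 50; 2020: car 120, bike 40).
year = np.array([2019, 2019, 2019, 2020, 2020, 2020])
car = np.array([90, 100, 110, 110, 120, 130], dtype=float)
bike = np.array([40, 50, 60, 30, 40, 50], dtype=float)


def cov0(x, y):
    """Population covariance (divides by n, not n - 1)."""
    return np.mean((x - x.mean()) * (y - y.mean()))


years = np.unique(year)
total = cov0(car, bike)
# Both years have the same number of people, so a plain average of the
# within-year covariances is the correctly weighted average.
within = np.mean([cov0(car[year == t], bike[year == t]) for t in years])
between = cov0(np.array([car[year == t].mean() for t in years]),
               np.array([bike[year == t].mean() for t in years]))

print(f"total {total:.2f} = within {within:.2f} + between {between:.2f}")
```

With these numbers the pooled covariance is about 16.7, the sum of a positive within-year part (about 66.7) and a negative between-year part (-50): within each year people who bike more also drive more, while the year-to-year shift moves the two in opposite directions.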
George Ford

Join Date: Aug 2014

Posts: 3187
#9

29 Jun 2023, 07:11

Whether or not it's commonly done, estimating those equations separately will give you biased results.

Why not just take the ratio of the two as the dependent variable, i.e. AEM/REM or AEM/(AEM+REM)?

Then the Covid coefficient tells you how the relative intensity shifts, and you avoid the misspecified model.
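George's suggestion amounts to modelling the AEM share directly. Here is a minimal Python sketch, assuming abs_AEM and abs_REM are non-negative (as their names suggest) and never both zero, with a tiny hypothetical data set. Regressing AEM/(AEM+REM) on a constant and the Covid dummy gives an intercept equal to the mean pre-COVID share and a Covid coefficient equal to the change in the mean share. The bounded share AEM/(AEM+REM) also avoids division by zero when REM is zero, which the plain ratio AEM/REM does not.

```python
import numpy as np


def share_regression(aem, rem, covid):
    """OLS of AEM/(AEM+REM) on a constant and the Covid dummy.

    Assumes aem and rem are non-negative and not both zero.
    Returns (intercept, covid_coefficient)."""
    share = aem / (aem + rem)
    X = np.column_stack([np.ones_like(covid), covid])
    coef, *_ = np.linalg.lstsq(X, share, rcond=None)
    return coef[0], coef[1]


# Tiny hypothetical data set: two pre-COVID and two COVID firm-years.
aem = np.array([0.05, 0.07, 0.08, 0.09])
rem = np.array([0.20, 0.21, 0.16, 0.15])
covid = np.array([0.0, 0.0, 1.0, 1.0])

intercept, covid_effect = share_regression(aem, rem, covid)
print(f"pre-COVID mean share: {intercept:.3f}")
print(f"Covid shift in share: {covid_effect:+.3f}")
```

Here the shares are 0.20 and 0.25 before COVID and about 0.333 and 0.375 during it, so the Covid coefficient is the difference of those period means. With controls added, the coefficient becomes the conditional shift in the share instead.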
