  • Problems with interpretation of my regression

    Hello dear Stata experts,

    I have a problem with the interpretation of my regression results and would like to ask you for advice.

    I have two variables of interest (abs_AEM and abs_REM).
    I also have a dummy variable that distinguishes two time periods (before covid = 0; during covid = 1). I have examined what influence the period dummy (and the control variables) has on abs_AEM and abs_REM. As I expected, the covid period has a positive influence on abs_AEM and a negative influence on abs_REM. I had assumed that there is a substitution effect between the two variables (when one increases, the other decreases, and vice versa).

    My only problem is that the two variables move in opposite directions across the two time periods, as expected, but are still positively correlated with each other.

    How can we explain something like this? Could I say, for example:

    "abs_AEM increased overall and abs_REM decreased overall in period = 1, but there is a positive correlation when viewed firm by firm, so that abs_AEM and abs_REM are still positively correlated"?

    I tested the relation of abs_AEM and abs_REM with pwcorr and I also included them in the regressions. Here are the results (with some other control variables):

    For abs_REM as dependent variable:

    Code:
    . xtreg abs_REM Covid Size Growth_R Growth_TA MTB Leverage ROA Loss abs_AEM, fe robust
    
    Fixed-effects (within) regression               Number of obs     =      4,764
    Group variable: twodigit_sic                    Number of groups  =         45
    
    R-squared:                                      Obs per group:
         Within  = 0.0570                                         min =         20
         Between = 0.2599                                         avg =      105.9
         Overall = 0.0795                                         max =        556
    
                                                    F(9,44)           =      30.76
    corr(u_i, Xb) = 0.1507                          Prob > F          =     0.0000
    
                              (Std. err. adjusted for 45 clusters in twodigit_sic)
    ------------------------------------------------------------------------------
                 |               Robust
         abs_REM | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           Covid |  -.0218859   .0054147    -4.04   0.000    -.0327985   -.0109733
            Size |  -.0037742   .0025534    -1.48   0.147    -.0089203    .0013719
        Growth_R |   .0165218   .0075974     2.17   0.035     .0012103    .0318333
       Growth_TA |   .1504183   .0189844     7.92   0.000     .1121577    .1886789
             MTB |   .0012978   .0005613     2.31   0.026     .0001666    .0024289
        Leverage |  -.0901932   .0356235    -2.53   0.015    -.1619877   -.0183988
             ROA |   .1282802    .036159     3.55   0.001     .0554065    .2011538
            Loss |  -.0085868   .0162591    -0.53   0.600    -.0413549    .0241814
         abs_AEM |   .5137514    .090037     5.71   0.000     .3322939     .695209
           _cons |   .3214487   .0408188     7.88   0.000     .2391837    .4037136
    -------------+----------------------------------------------------------------
         sigma_u |  .09803695
         sigma_e |  .24000368
             rho |  .14299677   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------

    For abs_AEM as dependent variable:

    Code:
    . xtreg abs_AEM Covid Size Growth_R Growth_TA MTB Leverage ROA Loss abs_REM, fe robust
    
    Fixed-effects (within) regression               Number of obs     =      4,764
    Group variable: twodigit_sic                    Number of groups  =         45
    
    R-squared:                                      Obs per group:
         Within  = 0.0948                                         min =         20
         Between = 0.4348                                         avg =      105.9
         Overall = 0.1345                                         max =        556
    
                                                    F(9,44)           =      28.38
    corr(u_i, Xb) = 0.2312                          Prob > F          =     0.0000
    
                              (Std. err. adjusted for 45 clusters in twodigit_sic)
    ------------------------------------------------------------------------------
                 |               Robust
         abs_AEM | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           Covid |   .0082866   .0019511     4.25   0.000     .0043543    .0122189
            Size |  -.0020397   .0005306    -3.84   0.000    -.0031092   -.0009703
        Growth_R |  -.0016693   .0018495    -0.90   0.372    -.0053967     .002058
       Growth_TA |   .0079366   .0086527     0.92   0.364    -.0095018    .0253751
             MTB |  -.0000429   .0001124    -0.38   0.704    -.0002694    .0001836
        Leverage |  -.0117508   .0072808    -1.61   0.114    -.0264242    .0029227
             ROA |  -.0719155   .0102187    -7.04   0.000      -.09251    -.051321
            Loss |   .0157668   .0035964     4.38   0.000     .0085186    .0230149
         abs_REM |    .034126   .0075674     4.51   0.000      .018875     .049377
           _cons |   .0756555   .0079218     9.55   0.000     .0596901    .0916209
    -------------+----------------------------------------------------------------
         sigma_u |  .01866835
         sigma_e |  .06185628
             rho |  .08348068   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------



    Code:
    . pwcorr abs_AEM abs_REM
    
                 |  abs_AEM  abs_REM
    -------------+------------------
         abs_AEM |   1.0000
         abs_REM |   0.1945   1.0000


    Would it still be appropriate to speak of a "substitution effect" between AEM and REM? Even if they have a positive interaction? I am trying to find evidence that companies tend to make more use of AEM and less use of REM.

    I'm excited to hear your opinion about this. Note: the very low R-squared is typical for my kind of research.

    Thanks in advance

    Oliver
    Last edited by Oliver Brock; 28 Jun 2023, 15:42.

  • #2
    That's not the way to do it. You've got a system of equations (AEM = f(REM), REM = f(AEM)).

    What type of variables are AEM and REM?

    You might look at these papers, though their models are different from yours.

    HTML Code:
    https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3719575
    https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2201973
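
    A minimal sketch of one way to treat the two outcomes as a system, not taken from the papers above: drop each outcome from the other's right-hand side and estimate both reduced-form equations jointly with sureg. It assumes the dataset from #1 is in memory with the same variable names, and it uses i.twodigit_sic as a stand-in for the industry fixed effects. Because both equations share the same regressors, the coefficients match equation-by-equation OLS; what the sketch adds is the residual correlation between abs_AEM and abs_REM and a Breusch-Pagan test of independence via the corr option. The standard errors are not the cluster-robust ones shown in #1.

    ```stata
    * Sketch only: assumes the data from #1 are in memory (run from a do-file because of ///)
    * Each outcome is regressed on Covid, the controls, and industry dummies
    sureg (abs_AEM Covid Size Growth_R Growth_TA MTB Leverage ROA Loss i.twodigit_sic) ///
          (abs_REM Covid Size Growth_R Growth_TA MTB Leverage ROA Loss i.twodigit_sic), corr
    ```

    A structural version, with each outcome on the other's right-hand side, would additionally need exclusion restrictions, which nothing in this thread supplies.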


    • #3
      "Even if they have a positive interaction?"
      These two variables have a positive correlation, not necessarily a positive interaction, right? A correlation measures the linear relationship between AEM and REM, whereas an interaction estimates the second-order change in the effect of AEM on your outcome at given levels of REM (and vice versa).

      I'm not an economist, so I don't know what is or is not theoretically justified here, but I am wondering if a regression model with an interaction term might be appropriate. If you really expect that AEM and REM are negatively correlated, then that looks to be contradicted by your data; if instead you expect that they have a negative interaction, then it doesn't look like you've tested that. I believe there are some economists who post regularly on the forum, and maybe they'll jump in, but in the meantime I'm hoping you can give a little more clarification.
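
      If the interaction of interest is whether the AEM-REM relationship differs between the two periods, which is only one possible reading, a sketch could look like the following. It assumes the data and variable names from #1, with the panel already xtset on twodigit_sic, and it does nothing about any simultaneity between the two variables.

      ```stata
      * Sketch only: assumes the data from #1 are in memory and xtset on twodigit_sic
      * i.Covid##c.abs_AEM enters both main effects plus their product
      xtreg abs_REM i.Covid##c.abs_AEM Size Growth_R Growth_TA MTB Leverage ROA Loss, ///
          fe vce(cluster twodigit_sic)
      * Slope of abs_AEM separately for Covid = 0 and Covid = 1
      margins Covid, dydx(abs_AEM)
      ```

      The coefficient on the product term would then estimate how much the abs_AEM slope shifts in the covid period.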

      Edit: crossed with #2


      • #4
        George Ford, so should this be a structural equation model with two correlated outcomes then?


        • #5
          Without knowing more about the problem, I don't know what it is. But estimating the two equations separately as you have is not correct. Could be a structural model, but might not be. Is REM a function of AEM? Or are these two different options a firm may choose?

          What do the variables look like? Are they dummies (0/1)?


          • #6
            Hello George, hello Daniel.

            Thank you for your quick answers and the attached papers; they have already helped me. Thank you very much!


            • #7
              @George

              Estimating both separately is common in this field of research. AEM stands for "accrual-based earnings management", while REM is "real earnings management". Both are ways to "manipulate" earnings during the financial year to achieve specific targets (for example, earnings forecasts from stock market analysts). Companies can use both techniques. The variables are continuous (a higher number indicates more earnings management).

              Research has not conclusively determined whether the two processes (AEM and REM) are related, and if so, how. It is often said that companies use one more and the other less at the same time. However, this has not been found to be the case everywhere.

              The main point of my analysis is whether firms applied more or less earnings management (AEM/REM) during the Corona crisis. In fact, they apply more AEM and less REM. Nevertheless, it seems that individual companies that apply more AEM also apply more REM.
              Last edited by Oliver Brock; 28 Jun 2023, 16:39.


              • #8
                To give a non-economic example: in 2019, people travel to work by car 100 times and by bike 50 times on average. In 2020, they travel 120 times by car but only 40 times by bike. A dummy variable (2019 = 0; 2020 = 1) indicates that the year has a significant influence on both driving (+) and biking (-). Nevertheless, (according to my analysis) people who bike more also drive significantly more. I think that can make sense. However, I now want to test whether people drive more BECAUSE they bike less.

                But how do I do this in Stata?
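
                The commuting pattern itself can be reproduced with made-up data, which may help to see why it is not contradictory. The sketch below is purely illustrative: the variable names (id, mobility, year2020, car, bike), the seed, and all numbers other than the four averages above are assumptions. A person-level trait pushes car and bike trips in the same direction, while the year shifts their means in opposite directions. It does not test whether people drive more because they bike less.

                ```stata
                * Hypothetical simulated data: 1,000 people observed in 2019 and 2020
                clear
                set seed 12345
                set obs 1000
                generate id = _n
                generate mobility = rnormal(0, 20)      // person trait raising both modes
                expand 2                                // two rows per person
                bysort id: generate year2020 = _n - 1   // 0 = 2019, 1 = 2020
                generate car  = 100 + 20*year2020 + mobility     + rnormal(0, 5)
                generate bike =  50 - 10*year2020 + 0.5*mobility + rnormal(0, 5)

                * The year dummy has opposite signs in the two equations ...
                regress car  year2020, vce(cluster id)
                regress bike year2020, vce(cluster id)
                * ... while car and bike stay positively correlated, pooled and within each year
                pwcorr car bike
                bysort year2020: pwcorr car bike
                ```

                In this setup the positive correlation comes entirely from the shared trait, so on its own it says nothing for or against substitution.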


                • #9
                  Whether or not it's commonly done, estimating those equations separately will give you biased results.

                  Why not just take the ratio of the two as the dependent variable, for example AEM/REM or AEM/(AEM+REM)?

                  Then the covid coefficient tells you how the intensity shifts, and you avoid the mis-specified model.
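
                  A minimal sketch of the share version, assuming the data and variable names from #1 are in memory and the panel is already xtset on twodigit_sic; share_AEM is a name made up here.

                  ```stata
                  * Sketch only: assumes the data from #1 are in memory and xtset on twodigit_sic
                  * Share of total earnings management done through accruals (between 0 and 1)
                  generate share_AEM = abs_AEM / (abs_AEM + abs_REM) if (abs_AEM + abs_REM) > 0
                  * Same controls as in #1, but neither outcome on the right-hand side
                  xtreg share_AEM Covid Size Growth_R Growth_TA MTB Leverage ROA Loss, ///
                      fe vce(cluster twodigit_sic)
                  ```

                  A positive Covid coefficient would then indicate a shift toward AEM relative to REM in the covid period. The share is used rather than AEM/REM because it stays bounded when abs_REM is close to zero.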
