Earnings management using stata

Guest

23 Oct 2019, 13:37

Hello,

I'm working on the impact of some variables on earnings management. To measure earnings management, the literature generally uses proxies such as discretionary accruals. The most widely used models for estimating discretionary accruals are the Jones model, the modified Jones model and, last but not least, the Kothari model.

The general idea behind these models is to regress a dependent variable (total accruals) on a set of independent variables that depends on the model.

For example, the equation of the Jones model is: TACit/Ai,t-1 = β0 (1/Ai,t-1) + β1 (ΔREVit/Ai,t-1) + β2 (PPEit/Ai,t-1) + εit
TAC: total accruals; A: total assets; REV: revenues; PPE: property, plant and equipment

You then just need to predict the residuals to get the discretionary accruals. In Stata I typed: predict residuals, resid

My data: 38 firms over 6 years (2012-2017), i.e. 228 observations.

Given the results I obtained, my questions are:

1- I have lower R-sq. How can I explain that? (A lot of researchers had lower R-sq too.)
2- I didn't keep the constant in my models. Do you think that is the main reason why my R-sq values are low?
3- How can I choose the best model for my main analysis (the impact of firm size, board of directors....on constraining earnings management)?
4- As most of the variables in my equations are based on euro values, do you think that I should winsorize them? If so, how can I do it?
5- Any advice and suggestions will be greatly appreciated because I'm really lost.
Thank you so much
Guest.

Jones Model (1991): TACit/Ai,t-1 = β0 (1/Ai,t-1) + β1 (ΔREVit/Ai,t-1) + β2 (PPEit/Ai,t-1) + εit

Code:

reg TOTAL_ACCR PART1 PART2 PART3, noconstant

Note that PART1, PART2 and PART3 are the three terms on the right-hand side of the equation.
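For readers who want to reproduce this setup, here is a minimal sketch of how the scaled variables might be built and the Jones residuals obtained in one step. It runs on simulated placeholder data: firm_id, year, assets, rev, ppe and tac are hypothetical names (assets in millions), not the poster's actual variables, and the simulated numbers carry no meaning.

```stata
* Simulated placeholder data: 38 firms x 7 years (2011-2017), so that
* 6 usable years remain once the lag of total assets is taken.
clear
set seed 2019
set obs 266
generate firm_id = ceil(_n/7)
bysort firm_id: generate year = 2010 + _n
generate assets = exp(rnormal(6, 1))
generate rev    = assets*runiform(0.5, 1.5)
generate ppe    = assets*runiform(0.1, 0.8)
generate tac    = assets*rnormal(-0.03, 0.1)

* Declare the panel so that L. (lag) and D. (first difference) work within firm
xtset firm_id year

* Left-hand side and the three right-hand-side terms of the Jones model
generate TOTAL_ACCR = tac/L.assets
generate PART1      = 1/L.assets
generate PART2      = D.rev/L.assets
generate PART3      = ppe/L.assets

* Same specification as in the post; predict creates the residual variable directly
regress TOTAL_ACCR PART1 PART2 PART3, noconstant
predict DISC_JONES, residuals
```

Using the time-series operators after xtset keeps lags from leaking across firms, which a plain `x[_n-1]` would not guarantee.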

The results

Code:

      Source |       SS           df       MS      Number of obs   =       228
-------------+----------------------------------   F(3, 225)       =     12.83
       Model |  .384540939         3  .128180313   Prob > F        =    0.0000
    Residual |   2.2473704       225  .009988313   R-squared       =    0.1461
-------------+----------------------------------   Adj R-squared   =    0.1347
       Total |  2.63191134       228  .011543471   Root MSE        =    .09994

------------------------------------------------------------------------------
  TOTAL_ACCR |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       PART1 |    2161441    2273761     0.95   0.343     -2319149     6642030
       PART2 |   .0507552   .0416235     1.22   0.224    -.0312665    .1327769
       PART3 |  -.0610047   .0100617    -6.06   0.000    -.0808319   -.0411776

Code:

gen DISC_JONES=.
predict r, resid
replace DISC_JONES=r
drop r

Modified Jones Model: TACit/Ai,t-1 = β0 (1/Ai,t-1) + β1 ((ΔREVit-ΔRECit)/Ai,t-1) + β2 (PPEit/Ai,t-1) + εit

Code:

reg TOTAL_ACCR PART1 PART4 PART3, noconstant

The results

Code:

      Source |       SS           df       MS      Number of obs   =       228
-------------+----------------------------------   F(3, 225)       =     13.36
       Model |  .397857291         3  .132619097   Prob > F        =    0.0000
    Residual |  2.23405405       225  .009929129   R-squared       =    0.1512
-------------+----------------------------------   Adj R-squared   =    0.1398
       Total |  2.63191134       228  .011543471   Root MSE        =    .09965

------------------------------------------------------------------------------
  TOTAL_ACCR |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       PART1 |    2955591    2223533     1.33   0.185     -1426022     7337204
       PART4 |  -.0889794   .0528283    -1.68   0.094    -.1930809    .0151221
       PART3 |  -.0593695   .0100473    -5.91   0.000    -.0791683   -.0395706

Code:

gen DISC_MODIJONES=.
predict r, resid
replace DISC_MODIJONES=r
drop r

The Kothari et al. (2005) Model: TACit/Ai,t-1 = β0 (1/Ai,t-1) + β1 ((ΔREVit-ΔRECit)/Ai,t-1) + β2 (PPEit/Ai,t-1) + β3 ROAit + εit

Code:

reg TOTAL_ACCR PART1 PART4 PART3 Lag_ROA, noconstant

The results

Code:

      Source |       SS           df       MS      Number of obs   =       228
-------------+----------------------------------   F(4, 224)       =     10.57
       Model |  .417983552         4  .104495888   Prob > F        =    0.0000
    Residual |  2.21392779       224  .009883606   R-squared       =    0.1588
-------------+----------------------------------   Adj R-squared   =    0.1438
       Total |  2.63191134       228  .011543471   Root MSE        =    .09942

------------------------------------------------------------------------------
  TOTAL_ACCR |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       PART1 |    1977862    2321827     0.85   0.395     -2597556     6553280
       PART4 |  -.0907816   .0527222    -1.72   0.086    -.1946765    .0131133
       PART3 |  -.0689246   .0120549    -5.72   0.000    -.0926803    -.045169
     Lag_ROA |   .1230123   .0862035     1.43   0.155    -.0468612    .2928858
------------------------------------------------------------------------------

Code:

gen DISC_KOTHARI=.
predict r, resid
replace DISC_KOTHARI=r
drop r

Descriptive Statistics

Code:

sum DISC_JONES DISC_MODIJONES DISC_KOTHARI

Code:

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
  DISC_JONES |        228   -.0004002    .0994995  -.3131552   .3108965
DISC_MODIJ~S |        228    .0002508    .0992048  -.3286133    .326021
DISC_KOTHARI |        228   -.0010197    .0987519  -.3607893    .294788

Last edited by sladmin; 06 Aug 2020, 05:05. Reason: anonymize original poster

Tags: panel data, regression

Guest
#2

24 Oct 2019, 04:11

Could someone please help me? It's really important for my thesis.

Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#3

24 Oct 2019, 08:15

You didn't get a quick answer. You'll improve your chances of a useful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex. Your posting is extremely long and complex and hard to understand. You end with 5 questions which is likewise a bit much.

Code:

reg y z
predict residual1, residual

is the right way to do a regression and prediction.

When you say "lower r-square", the only r-squares are those in your regressions, and they make sense: adding Lag_ROA increases r-square, as it must, because r-square never decreases when a variable is added to a regression.
Do you mean low r-square? If so, and if others find the same thing, then that is just how much the model explains. Many widely used models have low r-square (try estimating beta on daily or monthly return data and you'll have very low r-squares in general).

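Normally, we include a constant automatically. When you drop the constant, you force predicted y to equal 0 when the rhs variables equal 0, which often makes no sense. R-square is also problematic without the constant: Stata then computes it from the uncentered total sum of squares, so it is not comparable with the r-square of a model that includes a constant.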
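To illustrate why the two r-squares should not be compared, here is a small sketch on simulated data. The variables x and y are placeholders invented for the illustration, not the thread's variables, and the true model deliberately has a nonzero intercept.

```stata
* Simulated data with a nonzero intercept and a regressor whose mean is not zero
clear
set seed 2019
set obs 228
generate x = 3 + rnormal()
generate y = 5 + 0.2*x + rnormal()

* Without a constant: R-squared is based on the uncentered total sum of squares
regress y x, noconstant
scalar r2_nocons = e(r2)

* With a constant: R-squared is based on the centered total sum of squares
regress y x
scalar r2_cons = e(r2)

display "R-squared, noconstant: " %6.4f scalar(r2_nocons)
display "R-squared, constant:   " %6.4f scalar(r2_cons)
```

In this setup the no-constant r-square comes out much larger even though that model is misspecified, because the uncentered total sum of squares also counts the squared mean of y as variation to be explained.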

3- How can I choose the best model for my main analysis (the impact of firm size, board of directors....on constraining earnings management)?

You need to justify your model based on theory and prior work in the area. Many don't subscribe to the idea of picking the best model as a research strategy. That said, if you really want to choose among models, use r-square or AIC or BIC (see estat ic).
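A minimal sketch of such a comparison, assuming the three specifications are estimated on the same sample with the same dependent variable. The data are simulated placeholders that merely reuse the thread's variable names, and the stored-estimate names jones, modjones and kothari are made up for the example.

```stata
* Simulated placeholder data reusing the variable names from the thread
clear
set seed 2019
set obs 228
generate PART1   = runiform()
generate PART2   = rnormal()
generate PART3   = runiform()
generate PART4   = PART2 + rnormal(0, 0.2)
generate Lag_ROA = rnormal(0.05, 0.1)
generate TOTAL_ACCR = 0.05*PART2 - 0.06*PART3 + rnormal(0, 0.1)

* Fit and store the three candidate models
quietly regress TOTAL_ACCR PART1 PART2 PART3
estimates store jones
quietly regress TOTAL_ACCR PART1 PART4 PART3
estimates store modjones
quietly regress TOTAL_ACCR PART1 PART4 PART3 Lag_ROA
estimates store kothari

* AIC and BIC side by side (smaller values indicate the preferred model)
estimates stats jones modjones kothari

* Adjusted R-squared of each stored model
foreach m in jones modjones kothari {
    quietly estimates restore `m'
    display "`m': adj. R-squared = " %6.4f e(r2_a)
}
```

After a single regression, `estat ic` reports the same AIC and BIC for that model alone.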

4- As most of my variables in my equations are based on euro values, do you think that I should winsorize them? If so, how can I do it?

There is nothing in using euros that has anything to do with outliers. You should convert things into the same currency. There is a user-written winsorize program: enter findit winsorize at the command line, and you'll find it and can install it.
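If installing a user-written command is not an option, winsorizing can also be done by hand with built-in commands. The sketch below is one possible way, assuming cut-offs at the 1st and 99th percentiles. That choice is an assumption for the illustration, as are the simulated data and the _w suffix.

```stata
* Simulated placeholder variable with two artificial outliers
clear
set seed 2019
set obs 228
generate TOTAL_ACCR = rnormal(0, 0.1)
replace TOTAL_ACCR = 5 in 1
replace TOTAL_ACCR = -5 in 2

* Store the 1st and 99th percentiles
quietly summarize TOTAL_ACCR, detail
scalar p_lo = r(p1)
scalar p_hi = r(p99)

* Winsorize: pull values beyond the cut-offs back to the cut-offs
generate TOTAL_ACCR_w = TOTAL_ACCR
replace TOTAL_ACCR_w = scalar(p_lo) if TOTAL_ACCR < scalar(p_lo)
replace TOTAL_ACCR_w = scalar(p_hi) if TOTAL_ACCR > scalar(p_hi) & !missing(TOTAL_ACCR)

* Compare the original and winsorized versions
summarize TOTAL_ACCR TOTAL_ACCR_w
```

The `!missing()` condition matters because Stata treats missing values as larger than any number, so without it missing observations would be set to the upper cut-off.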
Guest
#4

24 Oct 2019, 08:37

Phil Bromiley Thank you so much for your response. I'm sorry for the long post. I was just trying to explain a little bit about the models and what I did.
So, about the constant, you think I should not drop it? How can I justify that? Thank you so much
Guest
#5

24 Oct 2019, 08:46

Yes, I actually meant a low R-squared. So, if I have to choose one of the 3 models according to R-squared, I will choose the one with the highest R-squared (the Kothari model, R-squared = 0.1588), right? Do you think it is a good idea to work with 3 models or not? According to my descriptive statistics, there is no big difference in the means. I'm sorry for all my questions, I'm really lost...
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17685
#6

24 Oct 2019, 10:49

Guest:
if you go -regress- (which is questionable, given your panel data structure) you should compare the models using the Adj-R-squared.
Besides, descriptive statistics can tell you nothing about inference (as they were created for a different job).
Last but not least, why not discuss with your supervisor whether considering three regression models makes any sense (as you should already have reported in the Methods section of your thesis)?

Kind regards,
Carlo
(Stata 19.0)
Guest
#7

24 Oct 2019, 11:20

Carlo Lazzaro Thank you so much, Carlo, for your response. For the -reg- part, I did first begin with fe, re and hausman, i.e. panel data econometrics. But other researchers working in this field said that I can use OLS to estimate the residuals (the discretionary accruals) and then use panel regression in my main analysis. Regarding the choice of models, my supervisor actually advised me to choose just one model (especially as the 3 models are not that different), also because I have another dependent variable, so I will have a lot of things going on. So I turn to you: could you give me some advice about that (statistically speaking)? Thank you so much. I really do appreciate your help.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17685
#8

24 Oct 2019, 11:43

Guest:
but you can estimate the error term (epsilon) with -xtreg-, too.
Again, if you decide to go OLS, you should pick the regression model with the highest Adj_R_squared (however, this approach relies on the same regressand and different regressors in the compared models, which I'm not sure is your case).
Besides: have you performed any postestimation tests on your regression model (endogeneity; misspecification via omitted variable bias; heteroskedasticity)?
I'm also not clear about what you mean by another dependent variable: do you mean different regressands or something like -mvreg-?
Again, I think you should discuss with your supervisor all these issues that, at a very first glance, look relevant for your thesis.
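As a rough sketch of what two of these checks could look like after -regress-, assuming a specification that includes a constant. The data are simulated placeholders reusing the thread's variable names. An endogeneity test needs instruments and is not shown here.

```stata
* Simulated placeholder data reusing the variable names from the thread
clear
set seed 2019
set obs 228
generate PART1 = runiform()
generate PART4 = rnormal()
generate PART3 = runiform()
generate TOTAL_ACCR = 0.02 - 0.09*PART4 - 0.06*PART3 + rnormal(0, 0.1)

regress TOTAL_ACCR PART1 PART4 PART3

* Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
estat hettest
scalar p_het = r(p)

* Ramsey RESET test for omitted variables / functional-form misspecification
estat ovtest
scalar p_reset = r(p)

* If heteroskedasticity is a concern, robust standard errors are one common remedy
regress TOTAL_ACCR PART1 PART4 PART3, vce(robust)
```

Both tests have "no problem" as the null hypothesis, so small p-values are the ones that call for attention.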

Kind regards,
Carlo
(Stata 19.0)
Guest
#9

24 Oct 2019, 12:29

Carlo Lazzaro Thank you so much, Carlo. You could say I'm not lucky enough to have a supervisor who is open to discussion or even cares, so everything is up to me (with all my respect to her). By another dependent variable, I meant that I have another proxy for earnings quality besides earnings management (so I'm working with 2 main models in my research). I didn't run any of those tests. In the papers, articles and theses I have read, the authors keep the model (Jones or Kothari) as it is and just calculate the errors (= discretionary accruals), which seems questionable to me, even though I'm not good at statistics and econometrics (I'm learning). Concerning heteroskedasticity, they just point out that dividing all variables of the model by total assets (t-1) seems to eradicate the problem, but without any further statistical demonstration. This is why I'm lost, actually. I'm trying to do the right things (statistically speaking) in order to defend my work. Do you think I should run all the tests related to panel data (the choice between re and fe, the tests for heteroskedasticity and endogeneity...) in order to choose the right model and get the 'right' errors? I'm really thankful for any advice.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17685
#10

24 Oct 2019, 13:38

Guest:
Statistics and econometrics fall in between being creative and following the usual route.
Much, in your case, depends on what the rule is in your research field (i.e., OLS vs. panel data).
Anyway, you should test your models for endogeneity, misspecification and heteroskedasticity; otherwise, your results are at risk of being unreliable.

Kind regards,
Carlo
(Stata 19.0)
Guest
#11

24 Oct 2019, 14:03

Thank you very much. OK, I will run those tests and see. What I know is that in my main model (the impact of firm characteristics and audit on constraining discretionary accruals) I'm going to use panel data analysis. My problem is with the 3 models estimating those discretionary accruals (my dependent variable): do I have to go with OLS or panel data? Do you think I should use xtreg? Thank you for your insights.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17685
#12

25 Oct 2019, 00:10

Guest:
as you have panel data, the first choice should be -xtreg- (given your continuous regressand).
As already replied, different research fields have their own customary rules, though.
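For illustration only, a panel version of the accruals regression might look like the sketch below. The data are simulated, firm_id and year are hypothetical identifiers, and DISC_FE is a made-up name for the predicted idiosyncratic error.

```stata
* Simulated placeholder panel: 38 firms x 6 years, with a firm-specific effect u_i
clear
set seed 2019
set obs 228
generate firm_id = ceil(_n/6)
bysort firm_id: generate year = 2011 + _n
bysort firm_id (year): generate u_i = rnormal(0, 0.05) if _n == 1
bysort firm_id (year): replace u_i = u_i[1]
generate PART1 = runiform()
generate PART4 = rnormal()
generate PART3 = runiform()
generate TOTAL_ACCR = u_i - 0.09*PART4 - 0.06*PART3 + rnormal(0, 0.1)

xtset firm_id year

* Fixed effects: store the results and predict the idiosyncratic error e_it
xtreg TOTAL_ACCR PART1 PART4 PART3, fe
estimates store fe
predict DISC_FE, e

* Random effects, stored for the Hausman test
xtreg TOTAL_ACCR PART1 PART4 PART3, re
estimates store re

* Hausman test: consistent estimator (fe) first, efficient estimator (re) second
hausman fe re
```

Note that `predict ..., e` must be issued while the fixed-effects results are still the active estimates, which is why it comes before the random-effects fit.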

Kind regards,
Carlo
(Stata 19.0)
Guest
#13

25 Oct 2019, 06:32

Carlo Lazzaro Thank you so much. I have yet another question, about the constant (β0) in my models, if you don't mind.
Given my initial post, the structure of one of my models is:

Code:

TACit/Ai,t-1 = β0 (1/Ai,t-1) + β1 ((ΔREVit-ΔRECit)/Ai,t-1) + β2 (PPEit/Ai,t-1) + εit

By scaling the regression equation by lagged total assets (Ai,t-1), we eliminate the heteroscedasticity problem (according to researchers).
Part1 = 1/Ai,t-1
Part2 = (ΔREVit-ΔRECit)/Ai,t-1
Part3 = PPEit/Ai,t-1

So, when I do :

Code:

reg TOTAL_ACCR Part1 Part2 Part3

Without the option "noconstant", I would be including 2 constants in my regression, right? Please correct me if I'm wrong. Thank you again. Your responses have been really helpful.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17685
#14

25 Oct 2019, 06:52

Guest:
I'm not sure I got your point right.
Your constant indicates the expected value of the regressand when all the predictors are set to zero.

Kind regards,
Carlo
(Stata 19.0)
Guest
#15

25 Oct 2019, 07:19

Yes. It's just that in my model the constant is scaled by lagged total assets. So when I regress Y (total accruals) on (1/lagged total assets) and the other independent variables, I already have the constant in my regression. So do I need to add "noconstant" to my regression because the constant is already in (1/lagged total assets)? I hope I made myself clear!
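One way to look at this question empirically is to check whether 1/(lagged total assets) actually behaves like a constant in the data. A regressor can only duplicate Stata's intercept if it takes the same value in every observation. The sketch below uses simulated placeholder data, with lag_assets as a hypothetical variable measured in millions.

```stata
* Simulated placeholder data
clear
set seed 2019
set obs 228
generate lag_assets = exp(rnormal(6, 1))
generate Part1 = 1/lag_assets
generate Part2 = rnormal(0.05, 0.2)
generate Part3 = runiform(0.1, 0.8)
generate TOTAL_ACCR = -0.03 + 0.05*Part2 - 0.06*Part3 + rnormal(0, 0.1)

* Part1 varies across observations, so it is an ordinary regressor
summarize Part1
scalar sd_part1 = r(sd)

* Specification as written in the scaled equation (no separate intercept)
regress TOTAL_ACCR Part1 Part2 Part3, noconstant

* Specification with Stata's intercept added; Part1 is not dropped as collinear
regress TOTAL_ACCR Part1 Part2 Part3
```

If Part1 were truly constant, Stata would omit either it or _cons because of collinearity in the second regression. Whether to include the intercept is then a modelling choice to be justified from the literature, not something this check decides.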