Generating a new variable by comparing the mean of a variable (t-test) across two periods

Seyed Mahmoud Hosseinniakani

Join Date: Apr 2018
Posts: 59

15 Jul 2019, 05:56

Hi,

My example has three variables, described below.
1) EXP: a time dummy equal to 1 for 2013 to 2015 and 0 otherwise (2016 to 2018).
2) Abs_DACC: a measure of each company's earnings management.
3) firmid: the company's identification number.

For each firm, I want to generate a new dummy variable equal to 1 if the mean of Abs_DACC is significantly higher (for example, by a t-test) in the period EXP == 1 than in the period EXP == 0, and 0 otherwise.

Below is an example from my sample.

I would be grateful for guidance on generating this dummy variable.

Best regards,
Mahmoud

example

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str14 firmid float(Abs_DACC EXP)
"SE0000101362"  .0043683364 1
"SE0000101362"    .04054667 1
"SE0000101362"   .016035259 1
"SE0000101362"  .0021590118 0
"SE0000101362"  .0004436661 0
"SE0000101362"   .025840644 0
"SE0000103699"   .007250534 1
"SE0000103699" .00024572795 1
"SE0000103699"   .011364168 1
"SE0000103699"   .015171934 0
"SE0000103699"   .005706637 0
"SE0000103699"   .015331745 0
"SE0000103814"   .007773208 1
"SE0000103814"   .020582644 1
"SE0000103814"  .0010111552 1
"SE0000103814"    .05279258 0
"SE0000103814"   .022170475 0
"SE0000103814"   .006775594 0
"SE0000105199"   .010294355 1
"SE0000105199"   .020721633 0
"SE0000105199"    .03075206 0
"SE0000105199"     .0771719 0
"SE0000105264"    .05795591 1
"SE0000105264"    .19086185 1
"SE0000107724"     .0830186 1
"SE0000107724"    .01778689 0
"SE0000107724"   .003614869 0
"SE0000107724"   .070517756 0
"SE0000108227"    .03772058 1
"SE0000108227"    .02138011 1
"SE0000108227"   .013683626 1
"SE0000108227"   .005922483 0
"SE0000108227"    .05475985 0
"SE0000108227"    .00685365 0
"SE0000108656"   .032462087 1
"SE0000108656"    .03074585 1
end

------------------ copy up to and including the previous line ------------------

Listed 36 out of 781 observations


Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

15 Jul 2019, 12:11

Code:

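capture program drop one_firm
program define one_firm
    // Two-sample t-test of Abs_DACC by EXP within one firm
    capture ttest Abs_DACC, by(EXP)
    if c(rc) == 420 { // ONLY ONE GROUP: the firm has data for only one value of EXP
        gen byte sig_diff = .
    }
    else if c(rc) == 0 {
        // r(p) is the two-sided p-value
        gen byte sig_diff = (r(p) < 0.05)
    }
    exit
end

runby one_firm, by(firmid)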

will do this. It requires the -runby- program, written by Robert Picard and me, available from SSC.

The code recognizes that, at least in your example data, some firms have Abs_DACC data only for EXP = 0 or only for EXP = 1. It allows for those situations by returning a missing value for sig_diff instead of throwing an error.
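The program above tests with the two-sided p-value r(p), so sig_diff is 1 when the two period means differ significantly in either direction, whereas the question asks whether Abs_DACC is higher when EXP == 1. A minimal sketch of a one-sided variant, assuming the same -runby- setup, could use r(p_l), the lower one-sided p-value that -ttest- stores: with by(EXP), the difference is mean(EXP==0) minus mean(EXP==1), so r(p_l) tests whether the EXP == 1 mean is higher. The program name one_firm_onesided and the variables p_more and sig_more are hypothetical; the 0.05 cutoff is kept from the code above. It also keeps the p-value itself, not just the 0/1 flag.

```stata
* Three firms copied from the -dataex- excerpt in the first post, so this runs
* on its own; skip the -clear- / -input- lines when using the full data set.
clear
input str14 firmid float(Abs_DACC EXP)
"SE0000101362"  .0043683364 1
"SE0000101362"    .04054667 1
"SE0000101362"   .016035259 1
"SE0000101362"  .0021590118 0
"SE0000101362"  .0004436661 0
"SE0000101362"   .025840644 0
"SE0000103699"   .007250534 1
"SE0000103699" .00024572795 1
"SE0000103699"   .011364168 1
"SE0000103699"   .015171934 0
"SE0000103699"   .005706637 0
"SE0000103699"   .015331745 0
"SE0000105264"    .05795591 1
"SE0000105264"    .19086185 1
end

* Hypothetical program and variable names: one_firm_onesided, p_more, sig_more
capture program drop one_firm_onesided
program define one_firm_onesided
    // Both results start missing and stay missing if -ttest- fails
    // (e.g. rc 420 when the firm has data for only one value of EXP)
    gen double p_more = .
    gen byte sig_more = .
    capture ttest Abs_DACC, by(EXP)
    if c(rc) == 0 {
        // With by(EXP), diff = mean(EXP==0) - mean(EXP==1), so the lower
        // one-sided p-value r(p_l) tests Ha: mean(EXP==1) > mean(EXP==0)
        tempname p
        scalar `p' = r(p_l)
        replace p_more = scalar(`p')
        replace sig_more = (scalar(`p') < 0.05) if !missing(scalar(`p'))
    }
end

* -runby- is from SSC: ssc install runby
runby one_firm_onesided, by(firmid)
```

Firms with data in only one period (SE0000105264 here) get missing values for both p_more and sig_more, as in the original program.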

All of that said, you shouldn't do this at all. The American Statistical Association has recommended discarding the concept of statistical significance. See https://www.tandfonline.com/doi/full...5.2019.1583913. You should instead identify some criterion for the difference in mean Abs_DACC that is more meaningful and useful.
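Following the suggestion to use a more meaningful criterion, a minimal sketch that computes each firm's difference in mean Abs_DACC between the two periods with built-in commands only (-egen-, -generate-). The variable names mean_exp1, mean_exp0, diff_mean and more_in_exp1, and the 0.01 cutoff, are hypothetical and chosen only for illustration; the thread does not specify a criterion.

```stata
* A few firms copied from the -dataex- excerpt in the first post, so this runs
* on its own; skip the -clear- / -input- lines when using the full data set.
clear
input str14 firmid float(Abs_DACC EXP)
"SE0000105199"   .010294355 1
"SE0000105199"   .020721633 0
"SE0000105199"    .03075206 0
"SE0000105199"     .0771719 0
"SE0000105264"    .05795591 1
"SE0000105264"    .19086185 1
"SE0000107724"     .0830186 1
"SE0000107724"    .01778689 0
"SE0000107724"   .003614869 0
"SE0000107724"   .070517756 0
end

* Per-firm mean of Abs_DACC in each period; missing if the firm has
* no observations in that period
egen double mean_exp1 = mean(cond(EXP == 1, Abs_DACC, .)), by(firmid)
egen double mean_exp0 = mean(cond(EXP == 0, Abs_DACC, .)), by(firmid)

* Difference in means (EXP == 1 minus EXP == 0), kept as a continuous measure
gen double diff_mean = mean_exp1 - mean_exp0

* Hypothetical cutoff for illustration only; a meaningful value has to come
* from substantive knowledge of earnings management, not from this sketch
local threshold = 0.01
gen byte more_in_exp1 = (diff_mean > `threshold') if !missing(diff_mean)
```

Keeping diff_mean as a continuous variable, rather than only the 0/1 flag, may be more informative when it is carried into the regression model.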
Seyed Mahmoud Hosseinniakani

Join Date: Apr 2018

Posts: 59
#3

16 Jul 2019, 00:47

Clyde:
Thank you so much for the code and the reference!
In fact, I am not drawing conclusions from the t-test. I am only trying to identify companies that are suspected and not suspected of earnings management, and I also have alternative measures to control for it. Both the suspect and non-suspect groups are included in the sample and tested in a regression model. I understand your point: a value of 1 in the new variable -sig_diff- does not accurately convey a significant difference, because the p-value can differ from firm to firm. Indeed, as you pointed out, 1 or 0 is not a clear or adequate representation of the p-value here.

Thanks again for the comment.

Kind regards,
Mahmoud