Perform t-test on quantile means

Niels Meijer

#1

26 Sep 2017, 02:56

Dear Statalist users,

I have a dataset in which I have made quantiles (10) for variable Y.

I know how to summarize my x variables for the top and bottom quantile: I use -sum x1 x2 x3 if quantile_y==1- and -sum x1 x2 x3 if quantile_y==10-.

Now I'd like to test whether the mean of each x is significantly different in quantile 1 of Y versus quantile 10 of Y.

I cannot figure out how to do this. Could someone explain to me which commands to use?

Kind regards,

Niels

Joseph Coveney

#2

26 Sep 2017, 03:06

Code:

ttest x1 if inlist(quantile_y, 1, 10), by(quantile_y)
display in smcl as text "P = " as result %06.4f 1

Nick Cox

#3

26 Sep 2017, 03:09

Although it's (very) common, I first register a protest against misuse of terminology. Quantiles are points defined by whatever fraction of the data for a variable is smaller, and correspondingly whatever complementary fraction is larger. The bins, classes or intervals they define are not themselves quantiles.

So, although we know what you mean by the top and bottom quantile, it's not good terminology.
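To make the distinction concrete, here is a minimal sketch using the auto data that ship with Stata (the names Dmpg, d1 and d9 are only illustrative). -_pctile- returns the deciles of mpg themselves, nine cut points left in r(r1) to r(r9), while -xtile- returns which of the ten bins each observation falls into.

```stata
sysuse auto, clear
* the deciles proper: nine cut points, stored in r(r1) ... r(r9)
_pctile mpg, nquantiles(10)
return list
* keep the first and ninth deciles for later comparison
local d1 = r(r1)
local d9 = r(r9)
* the ten bins those cut points define, numbered 1 to 10
xtile Dmpg = mpg, nquantiles(10)
* observations in bin 1 lie at or below the first decile
summarize mpg if Dmpg == 1
display as text "first decile = " as result `d1' as text ", ninth decile = " as result `d9'
```

With ties in the data the bins need not hold equal numbers of observations, as the -tab- output below also shows.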

That said, you seem to want something like this:

Code:

. sysuse auto, clear
(1978 Automobile Data)

. xtile Dmpg=mpg, n(10)

. tab Dmpg

         10 |
  quantiles |
     of mpg |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          8       10.81       10.81
          2 |         10       13.51       24.32
          3 |          9       12.16       36.49
          4 |          8       10.81       47.30
          5 |          3        4.05       51.35
          6 |         10       13.51       64.86
          7 |          7        9.46       74.32
          8 |          5        6.76       81.08
          9 |          7        9.46       90.54
         10 |          7        9.46      100.00
------------+-----------------------------------
      Total |         74      100.00

. ttest price if inlist(Dmpg, 1, 10), by(Dmpg)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       1 |       8    11139.25    1216.829    3441.712    8261.907    14016.59
      10 |       7    4276.571    220.8372    584.2804    3736.202    4816.941
---------+--------------------------------------------------------------------
combined |      15    7936.667    1114.391     4316.02    5546.535     10326.8
---------+--------------------------------------------------------------------
    diff |            6862.679     1323.13                4004.231    9721.126
------------------------------------------------------------------------------
    diff = mean(1) - mean(10)                                     t =   5.1867
Ho: diff = 0                                     degrees of freedom =       13

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.9999         Pr(|T| > |t|) = 0.0002          Pr(T > t) = 0.0001

The trick is to use the -if- qualifier to restrict the data compared to precisely two groups, after which -ttest- is happy.
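The same trick extends to several variables with a loop. A hedged sketch on the auto data, with price, weight and length standing in for x1 x2 x3 (an assumption for illustration). It relies on results that -ttest- leaves in r(): r(mu_1) and r(mu_2) for the two group means, r(t) for the t statistic and r(p) for the two-sided p-value.

```stata
sysuse auto, clear
xtile Dmpg = mpg, nquantiles(10)
* one line per variable: mean in bin 1, mean in bin 10, t, two-sided p
foreach v of varlist price weight length {
    * restrict to the bottom and top bins, then compare their means
    quietly ttest `v' if inlist(Dmpg, 1, 10), by(Dmpg)
    display as text %-10s "`v'" as result %12.2f r(mu_1) %12.2f r(mu_2) ///
        %9.3f r(t) %9.4f r(p)
}
```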

Niels Meijer

#4

26 Sep 2017, 03:18

Thank you very much, Nick, very useful. I believe I should call them deciles.

Last edited by Niels Meijer; 26 Sep 2017, 03:24.

Niels Meijer

#5

26 Sep 2017, 03:25

In the example you gave, you test whether the "price" variable is significantly different in the bottom decile compared with the top decile, correct?

Nick Cox

#6

26 Sep 2017, 03:38

Deciles are just kinds of quantiles, so the same point arises. I know that, if frequent enough, abusage becomes accepted usage, but I will not yield the point without a fight.

#5 is correct, and the output is, I think, not ambiguous.

Niels Meijer

#7

26 Sep 2017, 07:06

Any suggestions on how to get a table exported? Y1-Y8 are my explanatory variables. For each Y, I want the mean for the lowest 10% of X (group 1), the mean for the highest 10% of X (group 2), and their difference, with its statistical significance from a t-test. I can't figure out how to get a table like this. My only solution is making it in Excel, but that is quite a lot of work since I need several of those tables.

I am looking for something like this:

      Mean Decile 1    Mean Decile 10    Difference (t-test for significance level)
Y1
Y2
Y3
Y4
Y5
Y6
Y7
Y8

Edit: I'd also like to have the standard errors of the group means below the means, in [brackets], if that is possible.

Last edited by Niels Meijer; 26 Sep 2017, 07:33.

Nick Cox

#8

26 Sep 2017, 08:14

Some elementary tricks at http://www.stata-journal.com/sjpdf.h...iclenum=pr0053

A vast amount of effort has gone into writing Stata programs for table export and/or report generation that their authors understand very well. Evidently many other Stata users prefer to spend hours trying to understand these programs when compiling tables directly might take just minutes.

There is much point to that study if you need to produce the same kind of table again and again.

For myself,

1. I do not export results to Excel, so I can't advise on how to do that. If I asked "Why do you want to do that?", I would likely be thought facetious. But I note that many users find -putexcel- a great help.

2. For any table produced just once, I often find it easiest to loop over possibilities and use my favourite text editor to copy and paste the results together. For anything produced repeatedly, I find it easier to write custom code than to master someone else's program. That's not necessarily good advice.
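Along the lines of point 2, a rough sketch of such a loop for the table requested in #7: one line per variable with the two group means, their difference and the two-sided p-value, and the standard errors of the group means in [brackets] on the line beneath, ready to copy into a text editor. The auto data stand in for X and Y1-Y8 (an assumption for illustration), and strtrim() assumes Stata 14 or later. The standard error of each group mean is computed as sd/sqrt(n) from the r(sd_1), r(N_1), r(sd_2) and r(N_2) that -ttest- stores.

```stata
sysuse auto, clear
xtile Dmpg = mpg, nquantiles(10)
* header line
display as text _skip(10) %14s "Decile 1" %14s "Decile 10" %14s "Difference" %10s "p"
foreach v of varlist price weight length turn {
    quietly ttest `v' if inlist(Dmpg, 1, 10), by(Dmpg)
    * standard error of each group mean = sd / sqrt(n), formatted in brackets
    local b1 = "[" + strtrim(string(r(sd_1) / sqrt(r(N_1)), "%9.2f")) + "]"
    local b10 = "[" + strtrim(string(r(sd_2) / sqrt(r(N_2)), "%9.2f")) + "]"
    * first line: group means, difference (bin 1 minus bin 10) and p-value
    display as text %-10s "`v'" as result %14.2f r(mu_1) %14.2f r(mu_2) ///
        %14.2f (r(mu_1) - r(mu_2)) %10.4f r(p)
    * second line: standard errors of the group means
    display as result _skip(10) %14s "`b1'" %14s "`b10'"
}
```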
Niels Meijer

#9

26 Sep 2017, 08:27

Thank you for your response, Nick. It looks like I am not the only one having difficulty understanding other people's programs for creating tables. The table will be used for my paper; that's why I need to export it. I export tables to Excel so that I can make them look the way I want before adding them to my paper in MS Word. I guess I'll have a look at -putexcel- then. Unfortunately, I am not experienced enough to write my own code yet.
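Since -putexcel- came up in #8, here is a hedged sketch of how it might fill the table from #7, assuming Stata 14 or later (where the -putexcel set- syntax applies). The file name ttest_table.xlsx and the auto variables are placeholders for the real file and for X and Y1-Y8. Each variable gets a row with its name, the two group means, the difference and the p-value, with the standard errors of the group means in brackets on the row beneath.

```stata
sysuse auto, clear
xtile Dmpg = mpg, nquantiles(10)
putexcel set "ttest_table.xlsx", replace
putexcel A1 = "Variable"
putexcel B1 = "Mean decile 1"
putexcel C1 = "Mean decile 10"
putexcel D1 = "Difference"
putexcel E1 = "p-value"
local row = 2
foreach v of varlist price weight length turn {
    quietly ttest `v' if inlist(Dmpg, 1, 10), by(Dmpg)
    * copy the stored results into locals straight after -ttest-
    local m1 = r(mu_1)
    local m10 = r(mu_2)
    local d = r(mu_1) - r(mu_2)
    local p = r(p)
    local se1 = strtrim(string(r(sd_1) / sqrt(r(N_1)), "%9.2f"))
    local se10 = strtrim(string(r(sd_2) / sqrt(r(N_2)), "%9.2f"))
    putexcel A`row' = "`v'"
    putexcel B`row' = `m1'
    putexcel C`row' = `m10'
    putexcel D`row' = `d'
    putexcel E`row' = `p'
    * standard errors of the group means, in brackets, on the next row
    local ++row
    putexcel B`row' = "[`se1']"
    putexcel C`row' = "[`se10']"
    local ++row
}
```

Writing one cell per call is slow for big tables but keeps each step easy to follow and adjust.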
Phil Bromiley

#10

27 Sep 2017, 13:12

There are many reasonable routines for making regression tables for journals. However, they do not necessarily handle what you want.

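Here's a primitive way to get your tables. Assume you have a variable Netincome:

* empty variables to hold the two group means
g x1=.
g x2=.
* bins 1 to 10 defined by the deciles of Netincome
xtile Dni=Netincome, n(10)
tab Dni
* compare the bottom and top bins
ttest Netincome if inlist(Dni, 1, 10), by(Dni)
return list
* copy the group means from r() into the first observation
replace x1=r(mu_1) in 1/1
replace x2=r(mu_2) in 1/1
list x1 x2 in 1/2

*then do the next variable and replace in 2/2 etc.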

Once you have Stata variables, you can put them into Excel easily.
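The "then do the next variable" step can be automated with a loop. A sketch under stated assumptions: the auto data stand in for the real variables, and the result variables (vname, mean1, mean10, diff, pvalue) and the file name ttest_vars.xlsx are hypothetical. Each pass fills one row, and -export excel- then sends the filled rows to a spreadsheet.

```stata
sysuse auto, clear
xtile Dmpg = mpg, nquantiles(10)
* empty variables to hold the results, one row per compared variable
generate str32 vname = ""
generate double mean1 = .
generate double mean10 = .
generate double diff = .
generate double pvalue = .
local i = 0
foreach v of varlist price weight length turn {
    local ++i
    quietly ttest `v' if inlist(Dmpg, 1, 10), by(Dmpg)
    * keep the stored results before writing them into the current row
    local m1 = r(mu_1)
    local m10 = r(mu_2)
    local pv = r(p)
    quietly replace vname = "`v'" in `i'
    quietly replace mean1 = `m1' in `i'
    quietly replace mean10 = `m10' in `i'
    quietly replace diff = `m1' - `m10' in `i'
    quietly replace pvalue = `pv' in `i'
}
list vname mean1 mean10 diff pvalue in 1/`i', noobs
export excel vname mean1 mean10 diff pvalue using "ttest_vars.xlsx" in 1/`i', firstrow(variables) replace
```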
Jae Li

#11

23 Oct 2017, 07:02

@Joseph Coveney Hi Joseph! Can I ask you a question about post #2?

What does this line of code mean? Why do you define P = 1? After -ttest-, it just shows

P = 1.0000

in the command window. I would be grateful for your help! Many thanks in advance!

Code:

display in smcl as text "P = " as result %06.4f 1
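For what it's worth: the 1 at the end of that line is a literal constant, so the line prints 1.0000 whatever the test found. Presumably a stored result was intended, though only Joseph can say. After a two-sample test, -ttest- leaves the two-sided p-value in r(p) and the one-sided ones in r(p_l) and r(p_u). A small sketch with the auto example from #3:

```stata
sysuse auto, clear
xtile Dmpg = mpg, nquantiles(10)
quietly ttest price if inlist(Dmpg, 1, 10), by(Dmpg)
* two-sided p-value, the Pr(|T| > |t|) column of the -ttest- output
display in smcl as text "P = " as result %06.4f r(p)
* one-sided p-values for Ha: diff < 0 and Ha: diff > 0
display as text "P(lower) = " as result %06.4f r(p_l) ///
    as text "   P(upper) = " as result %06.4f r(p_u)
```

With the data from #3, the first line should print P = 0.0002, matching Pr(|T| > |t|) in that output.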