
  • Testing the significance of the difference between two variances

    Dear Stata users,

    I have a dataset with two groups (0,1) and two time periods (0,1). The dummy for the groups is group_dummy and the dummy for the time periods is time_dummy. The variable of interest is X.

    I need to test the difference in the variances of X among both groups in both time periods, separately.

    I think this can be done using the command:
    robvar X if time_dummy==0, by(group_dummy)
    robvar X if time_dummy==1, by(group_dummy)

    This works perfectly and life is good so far.

    Now I construct a new variable, Ratio, which equals Variance(X | group_dummy==1) / Variance(X | group_dummy==0).

    The table below shows the numbers which I got from Stata after executing the commands above and after calculating the Ratio variable.

    time_dummy == 0:
        group_dummy        N    Variance
                  0    2,658      0.0235
                  1    2,636      0.0316
        F-stat = 35.20, df = 5292, Ratio = Var(1)/Var(0) = 1.35

    time_dummy == 1:
        group_dummy        N    Variance
                  0    3,215      0.0233
                  1    2,100      0.0365
        F-stat = 77.78, df = 5313, Ratio = Var(1)/Var(0) = 1.56

    My question is: how can I test the statistical significance of the difference in the Ratio between time 0 and time 1?

    Hint: I think that I can test whether (1.56/1.35) is significantly different from 1. But how do I do that in the context of variances?

    Thank you so much for your valuable suggestions. I truly appreciate your help.

    Mostafa

  • #2
    Stata has a built-in command, -sdtest-, for testing the equality of variances; see -help sdtest- and read the manual for relevant examples.
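
    For example, with the variable and dummy names from #1 (a minimal sketch, just restating the layout described there), the calls would look like:

    Code:
    sdtest X if time_dummy==0, by(group_dummy)
    sdtest X if time_dummy==1, by(group_dummy)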



    • #3
      Hi Oded,

      Thank you for your suggestion.

      The -sdtest- command is similar to the -robvar- command. Actually, -robvar- seems to give more detailed significance tests.

      My question is about testing the difference between the ratios above.

      Regards

      mostafa



      • #4
        Well, I don't know of any statistical tests for comparing two variance ratios, even putting aside whether somebody has written a Stata program to calculate them. And the fact that you've gone 24 hours without anybody else coming up with one suggests to me that there is no widely accepted approach to this.

        So, if I were in your situation, I would probably try to do this by bootstrapping it.



        • #5
          Hi Clyde,

          Thank you for suggesting a plausible alternative.

          May you please elaborate more on that?

          All I know about bootstrapping is that it randomly draws subsamples from the original sample, say 1,000 times. Is this correct?

          How can I apply this to my case?

          Thanks again!

          Mostafa



          • #6
            So, you have to write a little program that calculates your ratio of ratios of variances, and then run it through the -bootstrap:- prefix. Here's an example of how it's done using made-up data:

            Code:
            clear *
            
            // DEFINE PROGRAM TO CALCULATE RATIO OF RATIOS
            capture program drop ratio_of_ratios_of_variances
            program define ratio_of_ratios_of_variances, rclass
                forvalues t = 0/1 {
                    forvalues g = 0/1 {
                        quietly summ x if time == `t' & group == `g'
                        scalar var`t'`g' = r(Var)
                    }
                }
                scalar ratio0 = var00/var01
                scalar ratio1 = var10/var11
                return scalar ratio_of_ratios = ratio1/ratio0
                exit
            end
                
            
            // CREATE TOY DATA SET
            set obs 100
            gen byte group = mod(_n, 2)
            expand 2
            by group, sort: gen time = mod(_n, 2)
            set seed 1234
            gen x = rnormal(0, 0.5 + group)
            
            // SHOW THE DATA
            table group time, c(N x mean x sd x)
            
            // CALCULATE RATIO OF RATIOS FOR ORIGINAL SAMPLE
            ratio_of_ratios_of_variances
            return list
            
            // NOW BOOTSTRAP
            bootstrap r(ratio_of_ratios), nodrop nodots reps(10000): ///
                ratio_of_ratios_of_variances
            // AND GET BOOTSTRAP CONFIDENCE INTERVALS
            estat bootstrap
            Note: When customizing this to your own data, you might want to reduce the -reps()- option in the -bootstrap- command to a small number, say 100, so it will run quickly while you're testing. But you need a large number of reps to adequately sample the tails of this distribution, especially since, as a ratio of ratios of variances that are themselves fairly close to zero, it may have a very heavy right tail. So when you're ready for production runs, put it back up to 10,000 at a minimum, or even 50,000 or 100,000 if you have the patience.

            See -help bootstrap- and the corresponding manual section to understand the details of the -bootstrap- command.

            Once you have the bias-corrected confidence interval, you can base a test of the significance of the ratio of the variance ratios on whether the confidence interval excludes or includes 1.
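
            As a rough complement to reading the interval (my own addition, not part of the code above): because -bootstrap- posts its results to e(), where the single expression is named _bs_1 by default, a normal-approximation Wald test against 1 can also be requested directly. Treat it only as a quick check; the bias-corrected interval is the better guide for a heavy-tailed statistic like this one.

            Code:
            // run after the -bootstrap- command above; _bs_1 is bootstrap's
            // default name for the expression r(ratio_of_ratios)
            test _bs_1 = 1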



            • #7
              I like Clyde's bootstrapping approach, because I have a hunch that it would be more robust to nonnormality. But if you're too impatient to wait for tens of thousands of bootstrap samples, then you could try something like the following.

              Code:
              . version 14.1

              .
              . clear *

              . set more off

              . set seed `=date("2016-05-25", "YMD")'

              .
              . program define matchvar, rclass
                1.     version 14.1
                2.     args m x V
                3.
              .     tempvar tmpvar0
                4.     generate double `tmpvar0' = `x' * sqrt(`m')
                5.     summarize `tmpvar0', detail
                6.     return scalar fx = (r(Var) - `V')^2
                7. end

              .
              . program define generatem
                1.     version 14.1
                2.     syntax , g(integer) t(integer) n(integer) v(real)
                3.
              .     drop _all
                4.     quietly set obs `n'
                5.     generate byte grp = `g'
                6.     generate byte tim = `t'
                7.     generate double x = rnormal()
                8.     minbound matchvar, range(`=`v' / 2' `=`v' * 2') from(`v') arguments(x `v') // trace
                9.     quietly replace x = x * sqrt(r(x))
               10.     quietly summarize x, detail
               11.     display in smcl as text "Final variance is " as result %06.4f r(Var)
               12. end

              .
              . /* 0     2,658     0.0235     0     3,215     0.0233
              >    1     2,636     0.0316     1     2,100     0.0365 */
              .
              . generatem , g(0) t(0) n(2658) v(0.0235)
              Final variance is 0.0235

              .
              . tempfile tmpfil0

              . quietly save `tmpfil0'

              .
              . generatem , g(1) t(0) n(2636) v(0.0316)
              Final variance is 0.0316

              . append using `tmpfil0'

              . quietly save `tmpfil0', replace

              .
              . generatem , g(0) t(1) n(3215) v(0.0233)
              Final variance is 0.0233

              . append using `tmpfil0'

              . quietly save `tmpfil0', replace

              .
              . generatem , g(1) t(1) n(2100) v(0.0365)
              Final variance is 0.0365

              . append using `tmpfil0'

              .
              . *
              . * Begin here
              . *
              . egen byte cond = group(grp tim)

              .
              . mixed x i.grp##i.tim, residuals(independent, by(cond)) variance reml nolrtest nolog

              Mixed-effects REML regression                   Number of obs     =     10,609
              Group variable: _all                            Number of groups  =          1

                                                              Obs per group:
                                                                            min =     10,609
                                                                            avg =   10,609.0
                                                                            max =     10,609

                                                              Wald chi2(3)      =       4.22
              Log restricted-likelihood =  3986.7006          Prob > chi2       =     0.2384

              ------------------------------------------------------------------------------
                         x |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                     1.grp |  -.0057174   .0045636    -1.25   0.210     -.014662    .0032271
                     1.tim |  -.0053217   .0040108    -1.33   0.185    -.0131826    .0025393
                           |
                   grp#tim |
                      1 1  |   .0010779   .0067421     0.16   0.873    -.0121362    .0142921
                           |
                     _cons |   .0043756   .0029732     1.47   0.141    -.0014518    .0102029
              ------------------------------------------------------------------------------

              ------------------------------------------------------------------------------
                Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
              -----------------------------+------------------------------------------------
              _all:                (empty) |
              -----------------------------+------------------------------------------------
              Residual: Independent,       |
                  by cond                  |
                                 1: var(e) |   .0234967   .0006447      .0222666    .0247948
                                 2: var(e) |   .0232966   .0005811       .022185     .024464
                                 3: var(e) |   .0315966   .0008705      .0299357    .0333496
                                 4: var(e) |   .0365033   .0011268      .0343603    .0387799
              ------------------------------------------------------------------------------

              .
              . nlcom (ratio0: exp(_b[lnsig_e:_cons] + _b[r_lns3ose:_cons])^2 / exp(_b[lnsig_e:_cons])^2) ///
              >     (ratio1: exp(_b[lnsig_e:_cons] + _b[r_lns4ose:_cons])^2 / exp(_b[lnsig_e:_cons] + _b[r_lns2ose:_cons])^2), post

                    ratio0:  exp(_b[lnsig_e:_cons] + _b[r_lns3ose:_cons])^2 / exp(_b[lnsig_e:_cons])^2
                    ratio1:  exp(_b[lnsig_e:_cons] + _b[r_lns4ose:_cons])^2 / exp(_b[lnsig_e:_cons] + _b[r_lns2ose:_cons])^2

              ------------------------------------------------------------------------------
                         x |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                    ratio0 |   1.344724   .0522844    25.72   0.000     1.242248    1.447199
                    ratio1 |   1.566889   .0621862    25.20   0.000     1.445007    1.688772
              ------------------------------------------------------------------------------

              . test _b[ratio1] = _b[ratio0]

               ( 1)  - ratio0 + ratio1 = 0

                         chi2(  1) =    7.48
                       Prob > chi2 =    0.0062

              .
              . exit

              end of do-file


              You could do something analogous with sem, too. And it would be a lot less unseemly than that above, which strikes me as something from one of those SASaholics who have a penchant for applying PROC MIXED to problems that it wasn't intended for.



              • #8
                Hi Clyde and Joseph,

                Thank you both for the detailed and valuable suggestions.

                I need some time to apply this code to my dataset.

                I will follow up once I have tried the programs.

                My sincere gratitude for your help.

                Mostafa



                • #9
                  Hello Clyde,

                  I have been reading about bootstrapping, and I have one fundamental question about how this code works.

                  My understanding is that the bootstrapping process draws sub-samples from the initial sample and, each time, calculates the ratio_of_ratios_of_variances. At the end, it gives me a p-value indicating whether the ratio_of_ratios_of_variances is statistically significant, given the 10,000 sub-samples it created.

                  Is this correct?

                  One more thing, please: are we testing whether the ratio_of_ratios_of_variances is statistically different from zero or from one? I think I need to test whether it is statistically different from one, so that I can claim that the difference between the two ratios is statistically significant.

                  Thank you so much for your constant help.

                  Regards,
                  Mostafa



                  • #10
                    You are correct; the null value for a ratio is 1, not zero. What the bootstrap will give you is a series of estimates of the ratio each derived from a sample drawn at random (with replacement) from your original data. From that series it will then calculate a confidence interval--the output of the -estat bootstrap- command. It does not give you a p-value of any hypothesis test. However, you can convert that confidence interval into a test of the null hypothesis H0: Ratio = 1, by simply looking at whether the confidence interval contains or excludes 1. If it excludes 1, you have rejected that null hypothesis at the .05 significance level.
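
                    If you prefer to see all three flavors of interval at once, a minimal sketch (run after the -bootstrap- command from #6) is:

                    Code:
                    // normal-approximation, percentile, and bias-corrected intervals
                    estat bootstrap, all
                    // decision rule: reject H0: ratio_of_ratios = 1 at the 5% level
                    // when the bias-corrected 95% interval lies entirely above 1
                    // or entirely below 1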



                    • #11
                      Clyde,

                      I customised your code so that it fits my data.

                      The code works perfectly!

                      Thank you so much. I'm really grateful.

                      Mostafa



                      • #12
                        Hello Joseph,

                        Allow me to thank you again for providing me with the code above.

                        I haven't had the chance to try it before today.

                        The code works fine up until the command -generatem , g(0) t(0) n(2658) v(0.0235)-; there Stata returns an "invalid syntax" error message.

                        I would really appreciate it if you can tell me what might be going wrong.

                        I also have a few additional questions:

                        1) Could you briefly explain the purpose of the main parts of the code? Are there parts I can exclude when applying it to my dataset (such as -set seed-, -args-, -g(0)-, and -t(0)-)? In other words, should I plug the numbers of observations and the variances into the code you provided, OR should I apply the code to my dataset so that it retrieves the numbers of observations and the variances automatically?

                        2) Could you briefly explain the statistical concept being implemented? What I got is the "multilevel mixed-effects linear regression" and the "non-linear combinations of estimators"; yet, it was difficult for me to see how these concepts help in testing the difference between the two variance ratios.

                        Thank you so much for your valuable help.

                        Best,
                        Mostafa







                        • #13
                          P.S. Joseph: I tried to copy and paste the code into a do-file. I have attached the do-file to make it easier to inspect the source of the error I'm encountering. Joseph Coveney Stata code for testing variance ratios.do
                          Last edited by Mostafa Harakeh; 08 Jul 2016, 13:50.



                          • #14
                            Dear Joseph,

                            I am sorry for posting multiple times. I just want to tell you that the code eventually worked for me.

                            You might be wondering why someone would face problems when he can simply copy and paste: the thing is that, when I copy and paste, a lot of strange characters get pasted into the do-file. It took me time to clean them up, and more time to figure out how to work around the "unexpected end of file" error message.

                            Now that everything is okay, could you please give me some guidance on the two points I mentioned above? I will paste them below for convenience:

                            1) Could you briefly explain the purpose of the main parts of the code? Are there parts I can exclude when applying it to my dataset (such as -set seed-, -args-, -g(0)-, and -t(0)-)? In other words, should I plug the numbers of observations and the variances into the code you provided, OR should I apply the code to my dataset so that it retrieves the numbers of observations and the variances automatically?

                            2) Could you briefly explain the statistical concept being implemented? All I got is the "multilevel mixed-effects linear regression" and the "non-linear combinations of estimators"; yet, it was difficult for me to see how these concepts help in testing the difference between the two variance ratios.

                            Thank you very much for your generous help.

                            Mostafa



                            • #15
                              So, in the code Joseph posted in #7, you can ignore everything up to where he says "Begin here." Everything before that was just to generate an example data set to demonstrate the code.

                              The -mixed- command then fits a random effects model with group#time interaction. This is equivalent to just breaking the data into 4 groups based on two levels of time and two levels of group. The convenient thing is that because of the -residuals(independent, by(cond))- option, it also estimates the variance in each group#time combination separately. Those results appear in the table:
                              Code:
                              ------------------------------------------------------------------------------
                                Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
                              -----------------------------+------------------------------------------------
                              _all:                (empty) |
                              -----------------------------+------------------------------------------------
                              Residual: Independent,       |
                                  by cond                  |
                                                 1: var(e) |   .0234967   .0006447      .0222666    .0247948
                                                 2: var(e) |   .0232966   .0005811       .022185     .024464
                                                 3: var(e) |   .0315966   .0008705      .0299357    .0333496
                                                 4: var(e) |   .0365033   .0011268      .0343603    .0387799
                              ------------------------------------------------------------------------------
                              Next, he runs the -nlcom- command with the -post- option. This command calculates the two variance ratios and estimates their standard errors. The expressions involving _b[whatever] refer to the coefficients and variance-component estimates that the -mixed- command created and left behind in memory in a "virtual" matrix _b. Because of the -post- option, the statistics for ratio0 and ratio1 are posted as estimates themselves and are available to the -test- command. He then uses -test- to test whether the two ratios are equal.
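
                              As an aside (a sketch of my own, not something Joseph ran), the same machinery can estimate the ratio of ratios in a single -nlcom- expression, reusing the coefficient names visible in the #7 output; its confidence interval can then be compared with 1 directly:

                              Code:
                              // run right after the -mixed- command from #7
                              nlcom (ratio_of_ratios: (exp(_b[lnsig_e:_cons] + _b[r_lns4ose:_cons])^2 / exp(_b[lnsig_e:_cons] + _b[r_lns2ose:_cons])^2) / ///
                                  (exp(_b[lnsig_e:_cons] + _b[r_lns3ose:_cons])^2 / exp(_b[lnsig_e:_cons])^2))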

                              To really understand why this works, you have to understand how -mixed- saves its estimates of the variance components in _b, which is somewhat complicated. If you really want to know those details, they can be found in the [ME] reference manual in the -mixed- chapter. But there are relatively few circumstances where one needs to get into this level of Stata internals detail. The essence of it is as explained in the preceding paragraph: the variance ratios are estimated by -nlcom- and posted to e(), where they can then be used in the -test- command.
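
                              If you do want to peek at those internals on your own run, the parameter labels that -nlcom- needs can be listed directly (again just a quick sketch, using the model from #7):

                              Code:
                              // after -mixed-, list the coefficient vector to see how the
                              // residual variance parameters (lnsig_e, r_lns#ose, ...) are labeled
                              matrix list e(b)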
