
  • Testing the significance of the difference between two variances

    Dear Stata users,

    I have a dataset with two groups (0,1) and two time periods (0,1). The dummy for the groups is group_dummy and the dummy for the time periods is time_dummy. The variable of interest is X.

    I need to test the difference in the variances of X among both groups in both time periods, separately.

    I think this can be done using the command:
    robvar X if time_dummy==0, by(group_dummy)
    robvar X if time_dummy==1, by(group_dummy)

    This works perfectly and life is good so far.

    Now I construct a new variable, Ratio, which equals Variance(X | group_dummy==1) / Variance(X | group_dummy==0).

    The table below shows the numbers which I got from Stata after executing the commands above and after calculating the Ratio variable.

    time_dummy == 0:
        group_dummy        N    Variance
                  0    2,658      0.0235
                  1    2,636      0.0316
        F-stat = 35.20, df = 5292, Ratio = Var(1)/Var(0) = 1.35

    time_dummy == 1:
        group_dummy        N    Variance
                  0    3,215      0.0233
                  1    2,100      0.0365
        F-stat = 77.78, df = 5313, Ratio = Var(1)/Var(0) = 1.56

    My question is: how can I test the statistical significance of the difference in the Ratio between time 0 and time 1?

    Hint: I think that I can test whether (1.56/1.35) is significantly different from 1. But how do I do that in the context of variances?

    Thank you so much for your valuable suggestions. I truly appreciate your help.

    Mostafa

  • #2
    Stata has a built-in command, -sdtest-, for testing the equality of variances; see -help sdtest- and read the manual for relevant examples.
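
    For example, with the variable and dummy names from #1 (a minimal sketch, just restating the layout described there), the calls would look like:

    Code:
    sdtest X if time_dummy==0, by(group_dummy)
    sdtest X if time_dummy==1, by(group_dummy)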



    • #3
      Hi Oded,

      Thank you for your suggestion.

      The -sdtest- command is similar to the -robvar- command. Actually, -robvar- seems to give more detailed significance tests.

      My question is about testing the difference between the ratios above.

      Regards

      mostafa



      • #4
        Well, I don't know of any statistical tests for comparing two variance ratios, even putting aside whether somebody has written a Stata program to calculate them. And the fact that you've gone 24 hours without anybody else coming up with one suggests to me that there is no widely accepted approach to this.

        So, if I were in your situation, I would probably try to do this by bootstrapping it.



        • #5
          Hi Clyde,

          Thank you for suggesting a plausible alternative.

          May you please elaborate more on that?

          All I know about bootstrapping is that it randomly draws subsamples from the original sample, say 1,000 times. Is this correct?

          How can I apply this to my case?

          Thanks again!

          Mostafa



          • #6
            So, you have to write a little program that calculates your ratio of ratios of variances, and then run it through the -bootstrap:- prefix. Here's an example of how it's done using made-up data:

            Code:
            clear *
            
            // DEFINE PROGRAM TO CALCULATE RATIO OF RATIOS
            capture program drop ratio_of_ratios_of_variances
            program define ratio_of_ratios_of_variances, rclass
                forvalues t = 0/1 {
                    forvalues g = 0/1 {
                        quietly summ x if time == `t' & group == `g'
                        scalar var`t'`g' = r(Var)
                    }
                }
                scalar ratio0 = var00/var01
                scalar ratio1 = var10/var11
                return scalar ratio_of_ratios = ratio1/ratio0
                exit
            end
                
            
            // CREATE TOY DATA SET
            set obs 100
            gen byte group = mod(_n, 2)
            expand 2
            by group, sort: gen time = mod(_n, 2)
            set seed 1234
            gen x = rnormal(0, 0.5 + group)
            
            // SHOW THE DATA
            table group time, c(N x mean x sd x)
            
            // CALCULATE RATIO OF RATIOS FOR ORIGINAL SAMPLE
            ratio_of_ratios_of_variances
            return list
            
            // NOW BOOTSTRAP
            bootstrap r(ratio_of_ratios), nodrop nodots reps(10000): ///
                ratio_of_ratios_of_variances
            // AND GET BOOTSTRAP CONFIDENCE INTERVALS
            estat bootstrap
            Note: When customizing this to your own data, you might want to reduce the -reps()- option in the -bootstrap- command to a small number, say 100, so it will run quickly while you're testing. But you need a large number of reps to adequately sample the tails of this distribution, especially since, as a ratio of ratios of variances that are themselves fairly close to zero, it may have a very heavy right tail. So when you're ready for production runs, put it back up to 10,000 at a minimum, or even 50,000 or 100,000 if you have the patience.

            See -help bootstrap- and the corresponding manual section to understand the details of the -bootstrap- command.

            Once you have the bias-corrected confidence interval, you can base a test of the significance of the ratio of the variance ratios on whether the confidence interval excludes or includes 1.
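
            As a rough complement to reading the interval (my own addition, not part of the code above): because -bootstrap- posts its results to e(), where the single expression is named _bs_1 by default, a normal-approximation Wald test against 1 can also be requested directly. Treat it only as a quick check; the bias-corrected interval is the better guide for a heavy-tailed statistic like this one.

            Code:
            // run after the -bootstrap- command above; _bs_1 is bootstrap's
            // default name for the expression r(ratio_of_ratios)
            test _bs_1 = 1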



            • #7
              I like Clyde's bootstrapping approach, because I have a hunch that it would be more robust to nonnormality. But if you're too impatient to wait for tens of thousands of bootstrap samples, then you could try something like the following.

              Code:
              . version 14.1

              .
              . clear *

              . set more off

              . set seed `=date("2016-05-25", "YMD")'

              .
              . program define matchvar, rclass
                1.     version 14.1
                2.     args m x V
                3.
              .     tempvar tmpvar0
                4.     generate double `tmpvar0' = `x' * sqrt(`m')
                5.     summarize `tmpvar0', detail
                6.     return scalar fx = (r(Var) - `V')^2
                7. end

              .
              . program define generatem
                1.     version 14.1
                2.     syntax , g(integer) t(integer) n(integer) v(real)
                3.
              .     drop _all
                4.     quietly set obs `n'
                5.     generate byte grp = `g'
                6.     generate byte tim = `t'
                7.     generate double x = rnormal()
                8.     minbound matchvar, range(`=`v' / 2' `=`v' * 2') from(`v') arguments(x `v') // trace
                9.     quietly replace x = x * sqrt(r(x))
               10.     quietly summarize x, detail
               11.     display in smcl as text "Final variance is " as result %06.4f r(Var)
               12. end

              .
              . /* 0     2,658     0.0235     0     3,215     0.0233
              >    1     2,636     0.0316     1     2,100     0.0365 */
              .
              . generatem , g(0) t(0) n(2658) v(0.0235)
              Final variance is 0.0235

              .
              . tempfile tmpfil0

              . quietly save `tmpfil0'

              .
              . generatem , g(1) t(0) n(2636) v(0.0316)
              Final variance is 0.0316

              . append using `tmpfil0'

              . quietly save `tmpfil0', replace

              .
              . generatem , g(0) t(1) n(3215) v(0.0233)
              Final variance is 0.0233

              . append using `tmpfil0'

              . quietly save `tmpfil0', replace

              .
              . generatem , g(1) t(1) n(2100) v(0.0365)
              Final variance is 0.0365

              . append using `tmpfil0'

              .
              . *
              . * Begin here
              . *
              . egen byte cond = group(grp tim)

              .
              . mixed x i.grp##i.tim, residuals(independent, by(cond)) variance reml nolrtest nolog

              Mixed-effects REML regression                   Number of obs     =     10,609
              Group variable: _all                            Number of groups  =          1

                                                              Obs per group:
                                                                            min =     10,609
                                                                            avg =   10,609.0
                                                                            max =     10,609

                                                              Wald chi2(3)      =       4.22
              Log restricted-likelihood =  3986.7006          Prob > chi2       =     0.2384

              ------------------------------------------------------------------------------
                         x |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                     1.grp |  -.0057174   .0045636    -1.25   0.210     -.014662    .0032271
                     1.tim |  -.0053217   .0040108    -1.33   0.185    -.0131826    .0025393
                           |
                   grp#tim |
                      1 1  |   .0010779   .0067421     0.16   0.873    -.0121362    .0142921
                           |
                     _cons |   .0043756   .0029732     1.47   0.141    -.0014518    .0102029
              ------------------------------------------------------------------------------

              ------------------------------------------------------------------------------
                Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
              -----------------------------+------------------------------------------------
              _all:                (empty) |
              -----------------------------+------------------------------------------------
              Residual: Independent,       |
                  by cond                  |
                                 1: var(e) |   .0234967   .0006447      .0222666    .0247948
                                 2: var(e) |   .0232966   .0005811       .022185     .024464
                                 3: var(e) |   .0315966   .0008705      .0299357    .0333496
                                 4: var(e) |   .0365033   .0011268      .0343603    .0387799
              ------------------------------------------------------------------------------

              .
              . nlcom (ratio0: exp(_b[lnsig_e:_cons] + _b[r_lns3ose:_cons])^2 / exp(_b[lnsig_e:_cons])^2) ///
              >     (ratio1: exp(_b[lnsig_e:_cons] + _b[r_lns4ose:_cons])^2 / exp(_b[lnsig_e:_cons] + _b[r_lns2ose:_cons])^2), post

                    ratio0:  exp(_b[lnsig_e:_cons] + _b[r_lns3ose:_cons])^2 / exp(_b[lnsig_e:_cons])^2
                    ratio1:  exp(_b[lnsig_e:_cons] + _b[r_lns4ose:_cons])^2 / exp(_b[lnsig_e:_cons] + _b[r_lns2ose:_cons])^2

              ------------------------------------------------------------------------------
                         x |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                    ratio0 |   1.344724   .0522844    25.72   0.000     1.242248    1.447199
                    ratio1 |   1.566889   .0621862    25.20   0.000     1.445007    1.688772
              ------------------------------------------------------------------------------

              . test _b[ratio1] = _b[ratio0]

               ( 1)  - ratio0 + ratio1 = 0

                         chi2(  1) =    7.48
                       Prob > chi2 =    0.0062

              .
              . exit

              end of do-file


              You could do something analogous with sem, too. And it would be a lot less unseemly than that above, which strikes me as something from one of those SASaholics who have a penchant for applying PROC MIXED to problems that it wasn't intended for.



              • #8
                Hi Clyde and Joseph,

                Thank you both for the detailed and valuable suggestions.

                I need some time to apply this code to my dataset.

                I will follow up once I have tried the programs.

                My sincere gratitude for your help.

                Mostafa



                • #9
                  Hello Clyde,

                  I have been reading about bootstrapping, and I have one fundamental question about how this code works.

                  My understanding is that the bootstrapping process draws sub-samples from the initial sample and, each time, calculates the ratio_of_ratios_of_variances. At the end, it gives me a p-value indicating whether the ratio_of_ratios_of_variances is statistically significant, given the 10,000 sub-samples it created.

                  Is this correct?

                  One more thing, please: are we testing whether the ratio_of_ratios_of_variances is statistically different from zero or from one? I think I need to test whether it is statistically different from one, so that I can claim that the difference between the two ratios is statistically significant.

                  Thank you so much for your constant help.

                  Regards,
                  Mostafa



                  • #10
                    You are correct; the null value for a ratio is 1, not zero. What the bootstrap will give you is a series of estimates of the ratio each derived from a sample drawn at random (with replacement) from your original data. From that series it will then calculate a confidence interval--the output of the -estat bootstrap- command. It does not give you a p-value of any hypothesis test. However, you can convert that confidence interval into a test of the null hypothesis H0: Ratio = 1, by simply looking at whether the confidence interval contains or excludes 1. If it excludes 1, you have rejected that null hypothesis at the .05 significance level.
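
                    If you prefer to see all three flavors of interval at once, a minimal sketch (run after the -bootstrap- command from #6) is:

                    Code:
                    // normal-approximation, percentile, and bias-corrected intervals
                    estat bootstrap, all
                    // decision rule: reject H0: ratio_of_ratios = 1 at the 5% level
                    // when the bias-corrected 95% interval lies entirely above 1
                    // or entirely below 1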



                    • #11
                      Clyde,

                      I customised your code so that it fits my data.

                      The code works perfectly!

                      Thank you so much. I'm really grateful.

                      Mostafa



                      • #12
                        Hello Joseph,

                        Allow me to thank you again for providing me with the code above.

                        I haven't had the chance to try it before today.

                        The code works fine up until the command -generatem , g(0) t(0) n(2658) v(0.0235)-; there Stata returns an "invalid syntax" error message.

                        I would really appreciate it if you can tell me what might be going wrong.

                        I also have a few additional questions:

                        1) Could you briefly explain the purpose of the main parts of the code? Are there parts I can exclude when applying it to my dataset (such as -set seed-, -args-, -g(0)-, and -t(0)-)? In other words, should I plug the numbers of observations and the variances into the code you provided, OR should I apply the code to my dataset so that it retrieves the numbers of observations and the variances automatically?

                        2) Could you briefly explain the statistical concept being implemented? What I got is the "multilevel mixed-effects linear regression" and the "non-linear combinations of estimators"; yet, it was difficult for me to see how these concepts help in testing the difference between the two variance ratios.

                        Thank you so much for your valuable help.

                        Best,
                        Mostafa







                        • #13
                          P.S. Joseph: I tried to copy and paste the code into a do-file. I have attached the do-file to make it easier to inspect the source of the error I'm encountering. Joseph Coveney Stata code for testing variance ratios.do
                          Last edited by Mostafa Harakeh; 08 Jul 2016, 13:50.



                          • #14
                            Dear Joseph,

                            I am sorry for posting multiple times. I just want to tell you that the code eventually worked for me.

                            You might be wondering why someone would face problems when he can simply copy and paste: the thing is that, when I copy and paste, a lot of strange characters get pasted into the do-file. It took me time to clean them up, and more time to figure out how to work around the "unexpected end of file" error message.

                            Now that everything is okay, could you please give me some guidance on the two points I mentioned above? I will paste them below for convenience:

                            1) Could you briefly explain the purpose of the main parts of the code? Are there parts I can exclude when applying it to my dataset (such as -set seed-, -args-, -g(0)-, and -t(0)-)? In other words, should I plug the numbers of observations and the variances into the code you provided, OR should I apply the code to my dataset so that it retrieves the numbers of observations and the variances automatically?

                            2) Could you briefly explain the statistical concept being implemented? All I got is the "multilevel mixed-effects linear regression" and the "non-linear combinations of estimators"; yet, it was difficult for me to see how these concepts help in testing the difference between the two variance ratios.

                            Thank you very much for your generous help.

                            Mostafa



                            • #15
                              So, in the code Joseph posted in #7, you can ignore everything up to where he says "Begin here." Everything before that was just to generate an example data set to demonstrate the code.

                              The -mixed- command then fits a random effects model with group#time interaction. This is equivalent to just breaking the data into 4 groups based on two levels of time and two levels of group. The convenient thing is that because of the -residuals(independent, by(cond))- option, it also estimates the variance in each group#time combination separately. Those results appear in the table:
                              Code:
                              ------------------------------------------------------------------------------
                                Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
                              -----------------------------+------------------------------------------------
                              _all:                (empty) |
                              -----------------------------+------------------------------------------------
                              Residual: Independent,       |
                                  by cond                  |
                                                 1: var(e) |   .0234967   .0006447      .0222666    .0247948
                                                 2: var(e) |   .0232966   .0005811       .022185     .024464
                                                 3: var(e) |   .0315966   .0008705      .0299357    .0333496
                                                 4: var(e) |   .0365033   .0011268      .0343603    .0387799
                              ------------------------------------------------------------------------------
                              Next, he runs the -nlcom- command with the -post- option. This command calculates the two variance ratios and estimates their standard errors. The expressions involving _b[whatever] refer to the coefficients and variance-component estimates that the -mixed- command created and left behind in memory in a "virtual" matrix _b. Because of the -post- option, the statistics for ratio0 and ratio1 are posted as estimates themselves and are available to the -test- command. He then uses -test- to test whether the two ratios are equal.
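
                              As an aside (a sketch of my own, not something Joseph ran), the same machinery can estimate the ratio of ratios in a single -nlcom- expression, reusing the coefficient names visible in the #7 output; its confidence interval can then be compared with 1 directly:

                              Code:
                              // run right after the -mixed- command from #7
                              nlcom (ratio_of_ratios: (exp(_b[lnsig_e:_cons] + _b[r_lns4ose:_cons])^2 / exp(_b[lnsig_e:_cons] + _b[r_lns2ose:_cons])^2) / ///
                                  (exp(_b[lnsig_e:_cons] + _b[r_lns3ose:_cons])^2 / exp(_b[lnsig_e:_cons])^2))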

                              To really understand why this works, you have to understand how -mixed- saves its estimates of the variance components in _b, which is somewhat complicated. If you really want to know those details, they can be found in the [ME] reference manual in the -mixed- chapter. But there are relatively few circumstances where one needs to get into this level of Stata internals detail. The essence of it is as explained in the preceding paragraph: the variance ratios are estimated by -nlcom- and posted to e(), where they can then be used in the -test- command.
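
                              If you do want to peek at those internals on your own run, the parameter labels that -nlcom- needs can be listed directly (again just a quick sketch, using the model from #7):

                              Code:
                              // after -mixed-, list the coefficient vector to see how the
                              // residual variance parameters (lnsig_e, r_lns#ose, ...) are labeled
                              matrix list e(b)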
