  • Categorical variable, prove that one answer is significantly more indicated than others.

    Hello,

    One of the questions in my survey is a categorical variable with 4 options to choose from (0, 1, 2 and 3). I already used tabulate to see how many times answers 0, 1, 2 and 3 were chosen. It is very obvious that answer 2 was chosen the most (because it was the default option). But how can I prove that this is significantly different from the other 3 possible answers?

    Maybe to make your explanation a little bit easier: the name of this variable is "risktolerance".

    Thanks a lot!

  • #2
    Nick:
    take a look at -help svy_estimation- and -mlogit-.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Originally posted by Carlo Lazzaro
      Nick:
      take a look at -help svy_estimation- and -mlogit-.
      Hi Carlo, thanks for your response. Does it make sense to use an mlogit model with just one variable? In this case: mlogit risktolerance?

      I have 344 observations for risktolerance=0, 645 for risktolerance=1, 3868 for risktolerance=2 and 243 for risktolerance=3. The only thing I want is to prove that category 2 has significantly more answers than the others (which is very obvious, but I am looking for the statistical test).


      • #4
        Yes, you can use an mlogit model with just one variable.

        But please be more careful with your language. A statistical test, by its very nature, cannot "prove" anything. Relatedly, having a p-value < 0.05 (or whatever threshold you choose) does not mean that there is a difference, and that otherwise there isn't. A p-value gives you the probability of drawing a sample containing results at least as extreme as those you got if, at the population level, the null hypothesis holds. That's all it does. So you may conclude that very few random samples from a population in which the frequency of response 2 does not differ from that of the other responses would look like yours. Fine. But that is by no means a proof of anything. It is a probabilistic result; always keep that in mind. The phrase "statistically significant" is seductive and dangerous. If it were up to me, it would be banished from the English language. But if you must use it, be sure you understand what it really means.


        • #5
          Okay, thank you Clyde. I forgot to say that there is actually a hierarchy in this variable: 0 is the lowest risk tolerance and 3 is the highest. So I think I will need to use an ordered regression, or am I wrong? Would this be an ordered probit regression (oprobit risktolerance)?


          • #6
            Actually, because what you are focused on here is the frequency of the response, it is better to treat it as a nominal categorical variable and ignore the ordinal properties. If you were looking at, say, a trend towards increasing response frequency for higher (or for lower) levels of the variable, then an ordinal analysis would be in order. But you are not saying freq 0 < freq 1 < freq 2 < freq 3. You are saying freq 2 > (all of freq 0, freq 1, freq 3), which is clearly not an ordinal hypothesis.


            • #7
              OK, so now I ran the mlogit, but how do I interpret these results? The base outcome is set to 2 by default (because it has the most observations, I suppose), which leaves categories 0, 1 and 3. All the p-values are 0.000 (so significant). So can I now just say that answer 2 was chosen more than the others, or do I need to run another test for this?


              • #8
                Each of those p-values tests the null hypothesis that the corresponding category frequency equals that of category 2. By rejecting all of those, and given the obvious direction in which they are unequal, you have accomplished your goal.
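
                As a rough illustration of why this works: in an -mlogit- with no predictors, each constant is the log of the ratio of that category's count to the base category's count (in the auto.dta output in #11, for example, -1.098612 is ln(1/3)). The sketch below uses the counts reported in #3 and assumes the variable is named risktolerance as stated in #1; the constants reported by -mlogit- should match the three -display- results.

                Code:
                mlogit risktolerance, baseoutcome(2)
                * log count ratios versus the base category (counts taken from #3)
                display ln(344/3868)
                display ln(645/3868)
                display ln(243/3868)

                A constant far below zero with a small p-value therefore indicates a category chosen much less often than category 2.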


                • #9
                  You can also check out the user-written command -chitest-. Type

                  Code:
                  findit chitest
                  and install tab_chi.

                  For a single variable, this gives you a test that the observed frequencies/counts are equal (the default) or equal to expected frequencies that you specify explicitly.
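
                  A minimal sketch of how this might look for the counts reported in #3. It assumes that tab_chi can be installed from SSC, that -chitest- accepts a count option to tabulate a categorical variable before testing, and that the immediate form -chitesti- takes the observed counts directly. These are assumptions about the package's syntax and should be checked against -help chitest- after installing.

                  Code:
                  ssc install tab_chi
                  * test equal frequencies across the categories of risktolerance (count option assumed)
                  chitest risktolerance, count
                  * same test typed in directly from the counts in #3 (immediate form assumed)
                  chitesti 344 645 3868 243

                  Note that this tests equality of all four frequencies at once, not category 2 against each of the others.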


                  • #10
                    Originally posted by Clyde Schechter
                    Each of those p-values tests the null hypothesis that the corresponding category frequency equals that of category 2. By rejecting all of those, and given the obvious direction in which they are unequal, you have accomplished your goal.
                    Thank you so much, now I understand it! I just have another (somewhat similar) question. In my dataset I have another variable that specifies which version people filled in: version 0 or version 1. Now I want to see whether answer 2 of the variable 'risktolerance' (0, 1, 2, 3 as possible answers) was chosen significantly more often in version 0.

                    With 'tabulate version risktolerance' I can already see the absolute numbers. But now I just want to test whether answer 2 was chosen more often in version 0 than in version 1. Any idea how I can do this?


                    • #11
                      Nick:
                      -suest- may be an option:
                      Code:
                      . use "C:\Program Files\Stata17\ado\base\a\auto.dta"
                      
                      . mlogit rep78 if foreign==1 & rep78>=3
                      
                      Iteration 0:   log likelihood = -21.089092  
                      Iteration 1:   log likelihood = -21.089092  (backed up)
                      
                      Multinomial logistic regression                         Number of obs =     21
                                                                              LR chi2(0)    =   0.00
                                                                              Prob > chi2   =      .
                      Log likelihood = -21.089092                             Pseudo R2     = 0.0000
                      
                      ------------------------------------------------------------------------------
                             rep78 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                      -------------+----------------------------------------------------------------
                      3            |
                             _cons |  -1.098612   .6666667    -1.65   0.099    -2.405255    .2080304
                      -------------+----------------------------------------------------------------
                      4            |  (base outcome)
                      -------------+----------------------------------------------------------------
                      5            |
                             _cons |          0   .4714045     0.00   1.000    -.9239359    .9239359
                      ------------------------------------------------------------------------------
                      
                      . estimates store visitors
                      
                      . mlogit rep78 if foreign==0 & rep78>=3
                      
                      Iteration 0:   log likelihood = -28.079363  
                      Iteration 1:   log likelihood = -28.079363  
                      
                      Multinomial logistic regression                         Number of obs =     38
                                                                              LR chi2(0)    =   0.00
                                                                              Prob > chi2   =      .
                      Log likelihood = -28.079363                             Pseudo R2     = 0.0000
                      
                      ------------------------------------------------------------------------------
                             rep78 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                      -------------+----------------------------------------------------------------
                      3            |  (base outcome)
                      -------------+----------------------------------------------------------------
                      4            |
                             _cons |  -1.098612   .3849002    -2.85   0.004    -1.853003   -.3442218
                      -------------+----------------------------------------------------------------
                      5            |
                             _cons |   -2.60269   .7328281    -3.55   0.000    -4.039006   -1.166373
                      ------------------------------------------------------------------------------
                      
                      . estimates store home
                      
                      . mlogit rep78 if foreign==1 & rep78>=3
                      
                      Iteration 0:   log likelihood = -21.089092  
                      Iteration 1:   log likelihood = -21.089092  (backed up)
                      
                      Multinomial logistic regression                         Number of obs =     21
                                                                              LR chi2(0)    =   0.00
                                                                              Prob > chi2   =      .
                      Log likelihood = -21.089092                             Pseudo R2     = 0.0000
                      
                      ------------------------------------------------------------------------------
                             rep78 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                      -------------+----------------------------------------------------------------
                      3            |
                             _cons |  -1.098612   .6666667    -1.65   0.099    -2.405255    .2080304
                      -------------+----------------------------------------------------------------
                      4            |  (base outcome)
                      -------------+----------------------------------------------------------------
                      5            |
                             _cons |          0   .4714045     0.00   1.000    -.9239359    .9239359
                      ------------------------------------------------------------------------------
                      
                      . estimates store visitors
                      
                      . suest home visitors
                      
                      Simultaneous results for home, visitors                     Number of obs = 59
                      
                      ------------------------------------------------------------------------------
                                   |               Robust
                                   | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
                      -------------+----------------------------------------------------------------
                      home_3       |
                             _cons |          0  (omitted)
                      -------------+----------------------------------------------------------------
                      home_4       |
                             _cons |  -1.098612   .3882041    -2.83   0.005    -1.859478   -.3377462
                      -------------+----------------------------------------------------------------
                      home_5       |
                             _cons |   -2.60269   .7391186    -3.52   0.000    -4.051336   -1.154044
                      -------------+----------------------------------------------------------------
                      visitors_3   |
                             _cons |  -1.098612   .6723892    -1.63   0.102    -2.416471    .2192464
                      -------------+----------------------------------------------------------------
                      visitors_4   |
                             _cons |          0  (omitted)
                      -------------+----------------------------------------------------------------
                      visitors_5   |
                             _cons |          0    .475451     0.00   1.000    -.9318668    .9318668
                      ------------------------------------------------------------------------------
                      
                      . test [home_3]_cons=[visitors_3]_cons
                      
                       ( 1)  [home_3]o._cons - [visitors_3]_cons = 0
                      
                                 chi2(  1) =    2.67
                               Prob > chi2 =    0.1023
                      
                      .
                      Kind regards,
                      Carlo
                      (Stata 19.0)


                      • #12
                        Originally posted by Carlo Lazzaro
                        Nick:
                        -suest- may be an option:
                        Hi Carlo, you are a genius! I tried it and almost succeeded. The problem is that my reference category is automatically set to 2 (because it has the most observations, I suppose). So when I run the "suest" command, this is of course the omitted category, and I cannot use the "test" command to see whether answer 2 in version 1 differs from answer 2 in version 0, because answer 2 is the base outcome. How do I change this? In your example the base outcomes are category 4 and category 3 respectively. And in case I cannot switch this, how can I then interpret the results? How can I see that there is a real difference in answer 2 between version 0 and 1?


                        • #13
                          Nick:
                          I'm a thousand miles from being a genius; I'm simply interested in this kind of stuff.
                          That said, take a look at the -baseoutcome()- option of -mlogit-.
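
                          A sketch of how this could look with your variables, fixing the same base outcome in both models so that the constants for answer 2 are estimated in each and can be compared. The stored names v0 and v1 are hypothetical, the variable names risktolerance and version are taken from your posts, and the equation names v0_2 and v1_2 assume the outcome values are unlabeled, as in the home_3 style shown in #11.

                          Code:
                          * same base outcome (0) in both versions, so answer 2 gets a constant in each
                          mlogit risktolerance if version == 0, baseoutcome(0)
                          estimates store v0
                          mlogit risktolerance if version == 1, baseoutcome(0)
                          estimates store v1
                          suest v0 v1
                          * compare the constant for answer 2 between the two versions
                          test [v0_2]_cons = [v1_2]_cons

                          Note that this compares the log odds of answer 2 relative to answer 0 between the versions, which is not quite the same as comparing the overall share of answer 2.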
                          Last edited by Carlo Lazzaro; 30 Dec 2022, 05:27.
                          Kind regards,
                          Carlo
                          (Stata 19.0)
