  • xthybrid coefficient<-->odds ratio conversion

    Dear members,

    In xthybrid results, is there a way to convert a coefficient to an odds ratio when the dependent variable is binary?


    Code:
    . xthybrid Positive_disc01    stud_SCSTOBC    Teach_SCSTOBC   Teach_nature_1 Teach_nature_2 Teach_gender_1   c
    > ourse1_com course1_eco course1_eng course1_hin course1_his course1_mat course1_pol   sem_1 sem_2 sem_3 sem_4
    >  sem_5 attendence_percent   , clusterid ( group_teachercasteSCSTOB_paper ) se test  p star
    
    The variable 'Teach_SCSTOBC' does not vary sufficiently within clusters
    and will not be used to create additional regressors.
    [~0% of the total variance in 'Teach_SCSTOBC' is within clusters]
    The variable 'course1_com' does not vary sufficiently within clusters
    and will not be used to create additional regressors.
    [~0% of the total variance in 'course1_com' is within clusters]
    The variable 'sem_1' does not vary sufficiently within clusters
    and will not be used to create additional regressors.
    [~0% of the total variance in 'sem_1' is within clusters]
    The variable 'sem_2' does not vary sufficiently within clusters
    and will not be used to create additional regressors.
    [~0% of the total variance in 'sem_2' is within clusters]
    The variable 'sem_3' does not vary sufficiently within clusters
    and will not be used to create additional regressors.
    [~0% of the total variance in 'sem_3' is within clusters]
    The variable 'sem_4' does not vary sufficiently within clusters
    and will not be used to create additional regressors.
    [~0% of the total variance in 'sem_4' is within clusters]
    The variable 'sem_5' does not vary sufficiently within clusters
    and will not be used to create additional regressors.
    [~0% of the total variance in 'sem_5' is within clusters]
    
    Hybrid model. Family: gaussian. Link: identity.
    
    +--------------------------------------+
    |             Variable |     model     |
    |----------------------+---------------|
    | Positive_disc01      |               |
    |     R__Teach_SCSTOBC |    -0.0047    |
    |       R__course1_com |     0.0093    |
    |             R__sem_1 |    -0.0125    |
    |             R__sem_2 |    -0.0133    |
    |             R__sem_3 |    -0.0122    |
    |             R__sem_4 |    -0.0085    |
    |             R__sem_5 |  (omitted)    |
    |      W__stud_SCSTOBC |    -0.0096**  |
    |    W__Teach_nature_1 |     0.0006    |
    |    W__Teach_nature_2 |  (omitted)    |
    |    W__Teach_gender_1 |     0.0061    |
    |       W__course1_eco |     0.0064    |
    |       W__course1_eng |     0.0048    |
    |       W__course1_hin |    -0.0139    |
    |       W__course1_his |     0.0146    |
    |       W__course1_mat |     0.0155    |
    |       W__course1_pol |  (omitted)    |
    | W__attendence_perc~t |     0.0001    |
    |      B__stud_SCSTOBC |     0.0975*   |
    |    B__Teach_nature_1 |     0.0052    |
    |    B__Teach_nature_2 |  (omitted)    |
    |    B__Teach_gender_1 |     0.0052    |
    |       B__course1_eco |     0.0357*   |
    |       B__course1_eng |     0.0314*   |
    |       B__course1_hin |     0.0170    |
    |       B__course1_his |     0.0454**  |
    |       B__course1_mat |     0.0499*** |
    |       B__course1_pol |  (omitted)    |
    | B__attendence_perc~t |     0.0002    |
    |                _cons |    -0.0477    |
    |----------------------+---------------|
    |   var(_cons[g~SCS~r])|               |
    |                _cons |     0.0004*** |
    |----------------------+---------------|
    | var(e.Positive_di~01)|               |
    |                _cons |     0.0277*** |
    |----------------------+---------------|
    | Statistics           |               |
    |                   ll |  3721.2572    |
    |                 chi2 |    42.3196    |
    |                    p |     0.0119    |
    |                  aic | -7388.5143    |
    |                  bic | -7193.5905    |
    +--------------------------------------+
       Legend: * p<.05; ** p<.01; *** p<.001
    Level 1: 10091 units. Level 2: 150 units.
    
    Tests of the random effects assumption:
      _b[B__stud_SCSTOBC] = _b[W__stud_SCSTOBC]; p-value: 0.0061
      _b[B__Teach_nature_1] = _b[W__Teach_nature_1]; p-value: 0.6677
      _b[B__Teach_nature_2] = _b[W__Teach_nature_2]; p-value:      .
      _b[B__Teach_gender_1] = _b[W__Teach_gender_1]; p-value: 0.9516
      _b[B__course1_eco] = _b[W__course1_eco]; p-value: 0.2339
      _b[B__course1_eng] = _b[W__course1_eng]; p-value: 0.2815
      _b[B__course1_hin] = _b[W__course1_hin]; p-value: 0.1669
      _b[B__course1_his] = _b[W__course1_his]; p-value: 0.2736
      _b[B__course1_mat] = _b[W__course1_mat]; p-value: 0.1529
      _b[B__course1_pol] = _b[W__course1_pol]; p-value:      .
      _b[B__attendence_percent] = _b[W__attendence_percent]; p-value: 0.6808
    regards,
    ajay

  • #2
    Not with the regression you have done. You have used a linear probability model and there is no systematic relationship between regression coefficient and odds ratio in that model.

    If you want odds ratios, then you should be using a logistic regression, where exponentiating a coefficient gives an odds ratio. -xthybrid- can fit logistic regressions: you need to specify the options -link(logit)- and -family(bernoulli)-.
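
    As a minimal sketch of what that looks like, the example below uses StataCorp's -union- example dataset rather than the data in this thread. The dataset and the variable choices (union, age, south, idcode) are illustrative assumptions; only the -clusterid()-, -family(bernoulli)-, -link(logit)- and -se- options come from the discussion here.

    ```stata
    * Illustrative only: dataset and variables are assumptions, not from this thread.
    * xthybrid is community-contributed; install it once with: ssc install xthybrid
    webuse union, clear

    * Hybrid logistic model: binary outcome, observations clustered within idcode
    xthybrid union age south, clusterid(idcode) family(bernoulli) link(logit) se

    * The table reports coefficients on the log-odds scale.
    * Exponentiating a coefficient gives the corresponding odds ratio.
    display "Within-cluster OR for age:  " exp(_b[W__age])
    display "Between-cluster OR for age: " exp(_b[B__age])
    ```

    The W__ and B__ names follow the naming that -xthybrid- shows in its output table.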



    • #3
      Hello Prof Clyde,
      I have done as you suggested. Does xthybrid now report odds ratios, Prof. Clyde?
      Code:
      . xthybrid Positive_disc01    stud_SCST stud_OBC Teach_SCST Teach_OBC        Teach_nature_1 Teach_nature_2 Tea
      > ch_gender_1   course1_com course1_eco course1_eng course1_hin course1_his course1_mat course1_pol   sem_1 se
      > m_2 sem_3 sem_4 sem_5 attendence_percent   , clusterid ( group_teacherID_paper ) se test  p star link(logit)
      >  family(bernoulli)
      
      The variable 'course1_com' does not vary sufficiently within clusters
      and will not be used to create additional regressors.
      [~0% of the total variance in 'course1_com' is within clusters]
      The variable 'sem_1' does not vary sufficiently within clusters
      and will not be used to create additional regressors.
      [~0% of the total variance in 'sem_1' is within clusters]
      The variable 'sem_2' does not vary sufficiently within clusters
      and will not be used to create additional regressors.
      [~0% of the total variance in 'sem_2' is within clusters]
      The variable 'sem_3' does not vary sufficiently within clusters
      and will not be used to create additional regressors.
      [~0% of the total variance in 'sem_3' is within clusters]
      The variable 'sem_4' does not vary sufficiently within clusters
      and will not be used to create additional regressors.
      [~0% of the total variance in 'sem_4' is within clusters]
      The variable 'sem_5' does not vary sufficiently within clusters
      and will not be used to create additional regressors.
      [~0% of the total variance in 'sem_5' is within clusters]
      
      Hybrid model. Family: bernoulli. Link: logit.
      
      +--------------------------------------+
      |             Variable |     model     |
      |----------------------+---------------|
      | Positive_disc01      |               |
      |       R__course1_com |    -2.4203*** |
      |             R__sem_1 |    -0.4151    |
      |             R__sem_2 |    -0.1849    |
      |             R__sem_3 |    -0.0392    |
      |             R__sem_4 |     0.7021*   |
      |             R__sem_5 |  (omitted)    |
      |         W__stud_SCST |    -0.2015**  |
      |          W__stud_OBC |    -0.1531*   |
      |        W__Teach_SCST |    -0.0832    |
      |         W__Teach_OBC |     0.0803    |
      |    W__Teach_nature_1 |    -0.0813    |
      |    W__Teach_nature_2 |  (omitted)    |
      |    W__Teach_gender_1 |    -0.3224*   |
      |       W__course1_eco |    -0.1369    |
      |       W__course1_eng |     0.2248    |
      |       W__course1_hin |    -0.3187    |
      |       W__course1_his |    -0.2250    |
      |       W__course1_mat |    -0.3137    |
      |       W__course1_pol |  (omitted)    |
      | W__attendence_perc~t |     0.0116*** |
      |         B__stud_SCST |    -1.1326    |
      |          B__stud_OBC |     0.5806    |
      |        B__Teach_SCST |    -0.2215    |
      |         B__Teach_OBC |     0.0007    |
      |    B__Teach_nature_1 |    -0.2191    |
      |    B__Teach_nature_2 |  (omitted)    |
      |    B__Teach_gender_1 |    -0.4800    |
      |       B__course1_eco |    -1.8609**  |
      |       B__course1_eng |    -2.9818*** |
      |       B__course1_hin |    -0.8474*   |
      |       B__course1_his |    -2.3690*** |
      |       B__course1_mat |    -1.9984**  |
      |       B__course1_pol |  (omitted)    |
      | B__attendence_perc~t |     0.0205    |
      |                _cons |     1.2488    |
      |----------------------+---------------|
      |   var(_cons[g~ID_~r])|               |
      |                _cons |     1.2114*** |
      |----------------------+---------------|
      | Statistics           |               |
      |                   ll | -5298.9768    |
      |                 chi2 |   166.0434    |
      |                    p |     0.0000    |
      |                  aic | 10659.9535    |
      |                  bic | 10882.1763    |
      +--------------------------------------+
         Legend: * p<.05; ** p<.01; *** p<.001
      Level 1: 9590 units. Level 2: 132 units.
      
      Tests of the random effects assumption:
        _b[B__stud_SCST] = _b[W__stud_SCST]; p-value: 0.5615
        _b[B__stud_OBC] = _b[W__stud_OBC]; p-value: 0.7637
        _b[B__Teach_SCST] = _b[W__Teach_SCST]; p-value: 0.7347
        _b[B__Teach_OBC] = _b[W__Teach_OBC]; p-value: 0.8461
        _b[B__Teach_nature_1] = _b[W__Teach_nature_1]; p-value: 0.6621
        _b[B__Teach_nature_2] = _b[W__Teach_nature_2]; p-value:      .
        _b[B__Teach_gender_1] = _b[W__Teach_gender_1]; p-value: 0.6971
        _b[B__course1_eco] = _b[W__course1_eco]; p-value: 0.0135
        _b[B__course1_eng] = _b[W__course1_eng]; p-value: 0.0000
        _b[B__course1_hin] = _b[W__course1_hin]; p-value: 0.4710
        _b[B__course1_his] = _b[W__course1_his]; p-value: 0.0064
        _b[B__course1_mat] = _b[W__course1_mat]; p-value: 0.0503
        _b[B__course1_pol] = _b[W__course1_pol]; p-value:      .
        _b[B__attendence_percent] = _b[W__attendence_percent]; p-value: 0.5210

      Also, Prof., I have a question about the interpretation. W__stud_SCST is -0.2015** (significant), whereas B__stud_SCST is -1.1326 (not significant). How should I interpret this difference, if it makes sense at all?
      regards,
      ajay
      Last edited by ajay pasi; 12 Jan 2023, 11:32.



      • #4
        I think it does not report odds ratios yet, since some of the estimates carry a minus sign and an odds ratio cannot be negative!



        • #5
          No, these are not odds ratios. They are coefficients. To get odds ratios you have to exponentiate them, i.e., apply the -exp()- function.

          The W__ prefix designates a within-group coefficient, and the B__ prefix designates a between-group coefficient. So the within-teacher effect of stud_SCST is -0.2015 and the between-teacher effect of stud_SCST is -1.1326. It is not uncommon for within-group and between-group effects to differ. For a clear toy example of what this looks like:
          Code:
          clear
          set obs 5
          gen panel_id = _n
          expand 2
          
          set seed 1234
          by panel_id , sort: gen y = 4*panel_id - _n + 3 + rnormal(0, 0.5)
          by panel_id: gen x = panel_id + _n
          
          xtset panel_id
          
          xtreg y x, fe
          regress y x
          
          //      GRAPH THE DATA TO SHOW WHAT'S HAPPENING
          separate y, by(panel_id)
          
          graph twoway connect y? x || lfit y x

          As to the fact that one is "significant" and the other is "not significant," that means nothing at all. Even for people who believe in the usefulness of the concept of statistical significance (and I am not one of those) the difference between significant and not significant is not, itself, significant.
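
          If the question is whether the within and between effects actually differ, that difference can be examined directly rather than by comparing stars. The sketch below assumes the logit -xthybrid- model from #3 is still the active estimation result. It mirrors the comparison that the -test- option already printed there for stud_SCST (p-value 0.5615).

          ```stata
          * Assumes the xthybrid logit results from #3 are in memory.
          * Wald test that the between and within coefficients are equal
          test _b[B__stud_SCST] = _b[W__stud_SCST]

          * Estimate of the difference itself, with its standard error and 95% CI
          lincom _b[B__stud_SCST] - _b[W__stud_SCST]
          ```

          -lincom- is the more informative of the two here, because it shows how precisely the difference is estimated and not just a p-value.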

          One might have a useful discussion about the precision of these effect estimates if the standard errors were included in the output. And you did request them with the -se- option. It seems, however, that -xthybrid- has a bug whereby when you also specify the -star- output, you get only the coefficients and the dreaded significance stars--all the other output is suppressed. Actually, when I use -xthybrid-, I usually use it with the -full- option. That way you get output in the same layout as you normally get from StataCorp's regression commands, which I find more helpful.



          • #6
            Here is the regression you asked for, with standard errors:


            Code:
            . xthybrid Positive_disc01    stud_SCST stud_OBC Teach_SCST Teach_OBC        Teach_nature_1 Teach_nature_2 Tea
            > ch_gender_1   course1_com course1_eco course1_eng course1_hin course1_his course1_mat course1_pol   sem_1 se
            > m_2 sem_3 sem_4 sem_5 attendence_percent   , clusterid ( group_teacherID_paper ) se test    link(logit) fami
            > ly(bernoulli)
            
            The variable 'course1_com' does not vary sufficiently within clusters
            and will not be used to create additional regressors.
            [~0% of the total variance in 'course1_com' is within clusters]
            The variable 'sem_1' does not vary sufficiently within clusters
            and will not be used to create additional regressors.
            [~0% of the total variance in 'sem_1' is within clusters]
            The variable 'sem_2' does not vary sufficiently within clusters
            and will not be used to create additional regressors.
            [~0% of the total variance in 'sem_2' is within clusters]
            The variable 'sem_3' does not vary sufficiently within clusters
            and will not be used to create additional regressors.
            [~0% of the total variance in 'sem_3' is within clusters]
            The variable 'sem_4' does not vary sufficiently within clusters
            and will not be used to create additional regressors.
            [~0% of the total variance in 'sem_4' is within clusters]
            The variable 'sem_5' does not vary sufficiently within clusters
            and will not be used to create additional regressors.
            [~0% of the total variance in 'sem_5' is within clusters]
            
            Hybrid model. Family: bernoulli. Link: logit.
            
            +-----------------------------------+
            |             Variable |   model    |
            |----------------------+------------|
            | Positive_disc01      |            |
            |       R__course1_com |    -2.4203 |
            |                      |     0.6339 |
            |             R__sem_1 |    -0.4151 |
            |                      |     0.4984 |
            |             R__sem_2 |    -0.1849 |
            |                      |     0.4647 |
            |             R__sem_3 |    -0.0392 |
            |                      |     0.3422 |
            |             R__sem_4 |     0.7021 |
            |                      |     0.3005 |
            |             R__sem_5 |  (omitted) |
            |                      |            |
            |         W__stud_SCST |    -0.2015 |
            |                      |     0.0638 |
            |          W__stud_OBC |    -0.1531 |
            |                      |     0.0642 |
            |        W__Teach_SCST |    -0.0832 |
            |                      |     0.1206 |
            |         W__Teach_OBC |     0.0803 |
            |                      |     0.1262 |
            |    W__Teach_nature_1 |    -0.0813 |
            |                      |     0.1063 |
            |    W__Teach_nature_2 |  (omitted) |
            |                      |            |
            |    W__Teach_gender_1 |    -0.3224 |
            |                      |     0.1461 |
            |       W__course1_eco |    -0.1369 |
            |                      |     0.3388 |
            |       W__course1_eng |     0.2248 |
            |                      |     0.3542 |
            |       W__course1_hin |    -0.3187 |
            |                      |     0.5999 |
            |       W__course1_his |    -0.2250 |
            |                      |     0.3871 |
            |       W__course1_mat |    -0.3137 |
            |                      |     0.5212 |
            |       W__course1_pol |  (omitted) |
            |                      |            |
            | W__attendence_perc~t |     0.0116 |
            |                      |     0.0015 |
            |         B__stud_SCST |    -1.1326 |
            |                      |     1.6022 |
            |          B__stud_OBC |     0.5806 |
            |                      |     2.4399 |
            |        B__Teach_SCST |    -0.2215 |
            |                      |     0.3900 |
            |         B__Teach_OBC |     0.0007 |
            |                      |     0.3902 |
            |    B__Teach_nature_1 |    -0.2191 |
            |                      |     0.2969 |
            |    B__Teach_nature_2 |  (omitted) |
            |                      |            |
            |    B__Teach_gender_1 |    -0.4800 |
            |                      |     0.3778 |
            |       B__course1_eco |    -1.8609 |
            |                      |     0.6084 |
            |       B__course1_eng |    -2.9818 |
            |                      |     0.5823 |
            |       B__course1_hin |    -0.8474 |
            |                      |     0.4211 |
            |       B__course1_his |    -2.3690 |
            |                      |     0.6841 |
            |       B__course1_mat |    -1.9984 |
            |                      |     0.6834 |
            |       B__course1_pol |  (omitted) |
            |                      |            |
            | B__attendence_perc~t |     0.0205 |
            |                      |     0.0138 |
            |                _cons |     1.2488 |
            |                      |     1.1457 |
            |----------------------+------------|
            |   var(_cons[g~ID_~r])|            |
            |                _cons |     1.2114 |
            |                      |     0.1854 |
            |----------------------+------------|
            | Statistics           |            |
            |                   ll | -5298.9768 |
            |                 chi2 |   166.0434 |
            |                    p |     0.0000 |
            |                  aic | 10659.9535 |
            |                  bic | 10882.1763 |
            +-----------------------------------+
                                     Legend: b/se
            Level 1: 9590 units. Level 2: 132 units.
            
            Tests of the random effects assumption:
              _b[B__stud_SCST] = _b[W__stud_SCST]; p-value: 0.5615
              _b[B__stud_OBC] = _b[W__stud_OBC]; p-value: 0.7637
              _b[B__Teach_SCST] = _b[W__Teach_SCST]; p-value: 0.7347
              _b[B__Teach_OBC] = _b[W__Teach_OBC]; p-value: 0.8461
              _b[B__Teach_nature_1] = _b[W__Teach_nature_1]; p-value: 0.6621
              _b[B__Teach_nature_2] = _b[W__Teach_nature_2]; p-value:      .
              _b[B__Teach_gender_1] = _b[W__Teach_gender_1]; p-value: 0.6971
              _b[B__course1_eco] = _b[W__course1_eco]; p-value: 0.0135
              _b[B__course1_eng] = _b[W__course1_eng]; p-value: 0.0000
              _b[B__course1_hin] = _b[W__course1_hin]; p-value: 0.4710
              _b[B__course1_his] = _b[W__course1_his]; p-value: 0.0064
              _b[B__course1_mat] = _b[W__course1_mat]; p-value: 0.0503
              _b[B__course1_pol] = _b[W__course1_pol]; p-value:      .
              _b[B__attendence_percent] = _b[W__attendence_percent]; p-value: 0.5210


            Last edited by ajay pasi; 12 Jan 2023, 12:45.



            • #7
              And here is the graph that you asked for:
              Code:
              clear
              set obs 5
              gen collegerollno = _n
              expand 2
              set seed 1234
              by collegerollno , sort: gen y = 4* collegerollno - _n + 3 + rnormal(0, 0.5)
              by collegerollno : gen x = collegerollno + _n
              xtset collegerollno
              xtreg y x, fe
              regress y x
              separate y, by( collegerollno )
              graph twoway connect y? x || lfit y x
              [Attached graph: prof clyde.png]




              regards,
              ajay



              • #8
                So:
                Code:
                . display exp(-.2015), exp(-2.015 - 1.96*.0638), exp(-.2015 + 1.96*0.0638)
                .81750358 .1176492 .92639738
                
                . display exp(-1.1326), exp(-1.1326 - 1.96*1.6022), exp(-1.1326 + 1.96*1.6022)
                .32219446 .01394113 7.4462608
                gives us odds ratios and confidence intervals for the within-group_teacherID_paper (top) and between (bottom) effects of stud_SCST on the odds of Positive_disc01.
                Note: When you have the -xthybrid- results in memory, you can more easily arrive at the above results by using the -lincom- command with the -or- option.
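
                A minimal sketch of that -lincom- approach, assuming the logit -xthybrid- model from #6 is the active estimation result:

                ```stata
                * Assumes the xthybrid logit results from #6 are in memory.
                * The -or- option reports the exponentiated coefficient (odds ratio)
                * together with its 95% confidence interval.
                lincom W__stud_SCST, or    // within-group effect of stud_SCST
                lincom B__stud_SCST, or    // between-group effect of stud_SCST
                ```

                This avoids retyping coefficients and standard errors by hand, which is where transcription slips tend to creep in.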

                So, for the within effect we have an odds ratio (to two decimal places) of 0.82, 95% CI 0.12, 0.93. For the between effect we have 0.32, 95% CI 0.01, 7.45. The within effect means that, on average, given two randomly selected observations from the same group_teacherID_paper whose values of stud_SCST differ by 1, the odds of Positive_disc01 are 0.82 times as high (18% lower) for the one with the higher value of stud_SCST as for the one with the lower value. Notice that the confidence interval here is pretty wide: the data are consistent with an astronomically strong effect (0.12) or with a rather modest one (0.93). So this effect size is measured with poor precision.

                The situation for the between effect is even worse. While, on average, given two randomly selected observations from different group_teacherID_paper groups whose average values of stud_SCST differ by 1, the odds of Positive_disc01 are 0.32 times as high (68% lower) for the one with the higher value of stud_SCST as for the one with the lower value, this confidence interval is enormous. The data are consistent with an OR anywhere between 0.01 and 7.45--that is, with an astronomically large (and, in almost any real world situation, totally implausible) effect of 99% lower odds, or with an astronomically large (and, again, in almost any real world situation, totally implausible) effect of 645% higher odds. This one is so imprecise that we cannot even assert with confidence in which direction the effect might be. And since in the real world odds ratios of 0.01 or 7.45 almost never happen, the results are completely uninformative for this between effect.
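
                The "percent lower" and "percent higher" figures come from the odds ratio via 100*(OR - 1). A short sketch of that arithmetic, using the between-effect coefficient and standard error reported in #6 (the scalar names are just illustrative):

                ```stata
                * Coefficient and standard error of B__stud_SCST, copied from the output in #6
                scalar b_btw  = -1.1326
                scalar se_btw = 1.6022

                * Odds ratio and approximate 95% confidence limits
                scalar or_btw = exp(b_btw)
                scalar or_lo  = exp(b_btw - 1.96*se_btw)
                scalar or_hi  = exp(b_btw + 1.96*se_btw)

                * Percent change in the odds implied by each value: 100*(OR - 1)
                display "Point estimate: " %6.1f 100*(or_btw - 1) "%"
                display "Lower limit:    " %6.1f 100*(or_lo - 1) "%"
                display "Upper limit:    " %6.1f 100*(or_hi - 1) "%"
                ```

                Negative values are reductions in the odds and positive values are increases.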

                The code I showed that generated the graph was intended to show you how a within and between effect can be radically different. In this case you can see that within collegerollno, the effect of increasing x is a decrease in y, whereas between collegerollno values, y increases with increasing x. In your data the difference between the estimated effects is not as radical as this, but I just wanted to make as clear as possible that there is no necessary connection between the value of a within-effect and a between-effect, so you should never be surprised to see such differences when you find them. The existence of such a difference, itself, requires no explanation, though in particular instances it might be interesting, or even important, to understand what drives the difference.
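
                To see the two effects as numbers rather than in a graph, the same toy data can be fed to the fixed-effects and between-effects estimators. This is a sketch that reuses the data-generating code from #5. The use of -xtreg, be- for the between effect is an addition here and was not part of the original example.

                ```stata
                * Rebuild the toy data from #5: 5 panels, 2 observations each
                clear
                set obs 5
                gen panel_id = _n
                expand 2
                set seed 1234
                by panel_id, sort: gen y = 4*panel_id - _n + 3 + rnormal(0, 0.5)
                by panel_id: gen x = panel_id + _n
                xtset panel_id

                * Within effect: inside a panel, x rises by 1 while y falls by about 1,
                * so the slope is built to be near -1
                xtreg y x, fe

                * Between effect: across panels, the panel means of y rise by 4
                * for each 1-unit rise in the panel mean of x, so the slope is built to be near 4
                xtreg y x, be
                ```

                The two slopes have opposite signs by construction, which is exactly the pattern the graph shows.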



                • #9
                  I am still reading and understanding your post...

                  I found a slight error in the first -display- command: the lower limit used -2.015 where it should be -.2015. Corrected:
                  display exp(-.2015), exp(-.2015 - 1.96*.0638), exp(-.2015 + 1.96*0.0638)
                  .81750358 .72140975 .92639738



                  • #10
                    Clyde sir, I read your analytical and substantiated view of my results. I should say, it is quite informative and rich. I am keeping all your suggestions stored, to proceed with further analysis. Thanks again.

                    regards,
                    ajay



                    • #11
                      Thanks for picking up that error and posting a correction. Evidently, it changes the interpretation, as the confidence interval is now 0.72 to 0.93, which is comfortably narrow and says that the estimated OR of 0.82 is not just a wild guess but that the data really do narrow it down to something pretty close to that.



                      • #12
                        yes sir.

