  • How to test for Poisson distribution and run regression for count data?

    Hi, I have an outcome variable 'count_selectoptionsh', which is the count of high-value options chosen by participants in a study with three treatments. The independent variable 'treatment' takes the values 1 (control), 2 and 3. I have also pooled treatments 2 and 3 and generated a variable 'pooled' that equals 1 if treatment > 1 and 0 otherwise (in case it helps with distribution tests). I basically want to gauge the effect of treatment (2 and 3) on the outcome variable 'count_selectoptionsh'.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float count_selectoptionsh byte treatment float pooled
    1 2 1
    0 1 0
    5 2 1
    0 1 0
    3 1 0
    4 3 1
    2 3 1
    5 2 1
    1 3 1
    1 2 1
    2 1 0
    0 1 0
    1 2 1
    4 3 1
    0 1 0
    4 2 1
    1 1 0
    0 1 0
    4 2 1
    3 1 0
    2 1 0
    0 1 0
    1 2 1
    2 3 1
    0 1 0
    1 2 1
    3 1 0
    2 1 0
    1 2 1
    0 1 0
    1 1 0
    2 1 0
    3 1 0
    1 2 1
    2 3 1
    2 1 0
    2 3 1
    1 1 0
    5 1 0
    3 3 1
    3 2 1
    4 3 1
    7 1 0
    2 1 0
    0 1 0
    2 1 0
    4 2 1
    3 1 0
    4 2 1
    0 3 1
    2 2 1
    1 1 0
    1 3 1
    0 1 0
    1 3 1
    0 2 1
    4 2 1
    3 2 1
    3 3 1
    2 2 1
    2 3 1
    1 3 1
    1 1 0
    5 3 1
    7 2 1
    1 2 1
    2 2 1
    1 3 1
    2 2 1
    2 1 0
    4 2 1
    2 2 1
    2 1 0
    2 2 1
    3 1 0
    0 2 1
    1 3 1
    1 1 0
    1 3 1
    4 2 1
    2 3 1
    1 1 0
    1 1 0
    3 3 1
    1 3 1
    4 2 1
    2 2 1
    3 3 1
    0 3 1
    0 1 0
    1 1 0
    1 3 1
    1 3 1
    2 2 1
    2 3 1
    4 1 0
    7 2 1
    5 3 1
    1 3 1
    2 3 1
    end
    For this kind of count survey data, is it a good idea to test for a Poisson distribution and then use the command:
    Code:
    poisson count_selectoptionsh i.treatment, vce(robust)
    If yes, how do I test for a Poisson distribution?
    If no, are there better ways to achieve this?

    Thank you so much for your help!
    Last edited by anisha arya; 10 Aug 2024, 14:25.

  • #2
    It is a count. No need to test it. Use poisson with robust errors and you don't need to fuss about overdispersion.
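
    A minimal sketch of that advice, assuming the variable names from the dataex in #1. The tabstat line is only a descriptive look at mean versus variance within each arm, not a formal test; the robust standard errors are what make a formal test unnecessary.

    ```stata
    * Compare mean and variance of the count within each arm (descriptive only)
    tabstat count_selectoptionsh, by(treatment) statistics(mean variance n)

    * Poisson regression with robust standard errors; treatment 1 (control) is the base level
    poisson count_selectoptionsh i.treatment, vce(robust)

    * Predicted mean count in each arm
    margins treatment
    ```

    If the variance in an arm is well above its mean, that is overdispersion; the coefficient estimates are unchanged and vce(robust) adjusts the standard errors.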



    • #3
      Hi,

      Thank you for your reply #2!

      I ran a poisson regression:
      Code:
      poisson count_selectoptionsh pooled, exposure(n)
      where n is the share of observations in each group of pooled (the number of observations with that value of pooled, 0 or 1, divided by the total number of observations)
      with the following result:

      Code:
      --------------------------------------------------------------------------------------
      count_selectoptionsh | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
      ---------------------+----------------------------------------------------------------
                    pooled |  -.4264501   .0721805    -5.91   0.000    -.5679212   -.2849789
                     _cons |   1.492565    .059976    24.89   0.000     1.375014    1.610116
                     ln(n) |          1  (exposure)
      --------------------------------------------------------------------------------------
      Then I ran an OLS such as:
      Code:
      regress count_selectoptionsh pooled
      The result:
      Code:
            Source |       SS           df       MS      Number of obs   =       507
      -------------+----------------------------------   F(1, 505)       =      6.51
             Model |  12.0283991         1  12.0283991   Prob > F        =    0.0110
          Residual |  933.431167       505  1.84837855   R-squared       =    0.0127
      -------------+----------------------------------   Adj R-squared   =    0.0108
             Total |  945.459566       506  1.86849717   Root MSE        =    1.3596
      
      ------------------------------------------------------------------------------
      count_sele~h | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
            pooled |   .3227007   .1265003     2.55   0.011     .0741691    .5712324
             _cons |   1.561798   .1019026    15.33   0.000     1.361592    1.762003
      ------------------------------------------------------------------------------


      I am confused by the results: the Poisson regression gives a negative, significant coefficient, while OLS gives a positive, significant one. Can you please clarify? Thank you so much for your help, I am a novice at this!
      Last edited by anisha arya; 10 Aug 2024, 15:34.



      • #4
        Need robust errors.

        The two models are very different.

        With exposure(n), ln(n) enters the model with its coefficient fixed at 1, so you are essentially subtracting ln(n) from the log of the dependent variable.

        Also, poisson is akin to ln(Y) as the DV (but poisson can handle the 0s).
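
        A worked version of this point, as a sketch using only numbers already in the thread: the group means implied by the OLS output in #3 (1.561798 for pooled = 0, and 1.561798 + 0.3227007 = 1.884499 for pooled = 1) and the two values of n shown in the dataex in #7 (0.3510848 and 0.6489152). It assumes those are the values of n used in the regression in #3. With a single binary regressor, Poisson reproduces the group means, so the exposure model implies:

        ```latex
        \begin{align*}
        \ln E[y \mid \text{pooled}] &= \ln n + \beta_0 + \beta_1\,\text{pooled} \\
        \hat\beta_0 &= \ln \bar y_0 - \ln n_0 \approx 0.44584 + 1.04673 = 1.49257 \\
        \hat\beta_1 &= \ln\frac{\bar y_1}{\bar y_0} - \ln\frac{n_1}{n_0} \approx 0.18782 - 0.61427 = -0.42645
        \end{align*}
        ```

        These match the reported _cons (1.492565) and pooled (-.4264501) up to rounding. The first term of the pooled coefficient, about 0.188, is what Poisson would give without exposure, and it is positive like the OLS coefficient. The negative sign comes entirely from the second term: n is larger in the pooled group, so the model reads the higher count as a lower rate per unit of n.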

        Your dataex does not include the variables necessary to reproduce your estimates. Always include all we'd need to reproduce.

        Code:
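        * eststo and esttab come from the estout package: ssc install estout
        use https://grodri.github.io/datasets/ceb, clear // see https://grodri.github.io/glms/stata/c4s1
        gen y = round(mean * n, 1)   // cell mean times cell size, rounded to an integer count
        
        summ y
        g ly = ln(y)   // log outcome for the OLS comparison
        g ln = ln(n)   // log of the exposure variable
        
        eststo e0: poisson y i.dur i.res , robust               // Poisson, no exposure
        eststo e1: poisson y i.dur i.res , exposure(n) robust   // Poisson with exposure
        eststo e2: reg y i.dur i.res , robust                   // OLS in levels
        eststo e3: reg ly i.dur i.res, robust                   // OLS in logs
        
        constraint 1 ln=1   // fix the coefficient on ln at 1, as exposure() does
        eststo e4: cnsreg ly i.dur i.res ln , c(1) robust
        
        esttab e0 e1 e2 e3 e4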
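
        The same logic can be checked against the numbers in #3 with plain arithmetic. This is a sketch, not a re-estimation: the scalar names are made up for illustration, the means are read off the OLS output in #3, and the two values of n are taken from the dataex in #7 on the assumption that they are the ones used in #3. Run it from a do-file so the // comments are accepted.

        ```stata
        * Group means implied by the OLS output in #3
        scalar m0 = 1.561798               // _cons: mean count when pooled == 0
        scalar m1 = 1.561798 + .3227007    // _cons + coefficient: mean count when pooled == 1

        * The two values of n shown in the dataex in #7 (assumed to be those used in #3)
        scalar n0 = .3510848
        scalar n1 = .6489152

        display "pooled, no exposure: " ln(m1/m0)                // about 0.188
        display "pooled, exposure(n): " ln(m1/m0) - ln(n1/n0)    // about -0.4265
        display "_cons,  exposure(n): " ln(m0) - ln(n0)          // about 1.4926
        ```

        The last two values line up with the Poisson output in #3, which suggests the sign flip there is produced by the exposure variable rather than by the choice of Poisson over OLS.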



        Last edited by George Ford; 10 Aug 2024, 16:17.



        • #5
          Originally posted by George Ford
          Need robust errors.

          The two models are very different.

          With exposure(n), ln(n) enters the model with its coefficient fixed at 1, so you are essentially subtracting ln(n) from the log of the dependent variable.

          Also, poisson is akin to ln(Y) as the DV (but poisson can handle the 0s).

          Your dataex does not include the variables necessary to reproduce your estimates. Always include all we'd need to reproduce.
          Hi,

          Thank you so much for replying. I am including data with all the variables used in both regressions:
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input float count_selectoptionsh byte treatment float(pooled n)
          1 1 0 .3510848
          2 1 0 .3510848
          0 1 0 .3510848
          4 1 0 .3510848
          5 1 0 .3510848
          2 1 0 .3510848
          7 1 0 .3510848
          1 1 0 .3510848
          1 1 0 .3510848
          2 1 0 .3510848
          5 1 0 .3510848
          4 1 0 .3510848
          1 1 0 .3510848
          3 1 0 .3510848
          2 1 0 .3510848
          2 1 0 .3510848
          3 1 0 .3510848
          0 1 0 .3510848
          4 1 0 .3510848
          4 1 0 .3510848
          1 1 0 .3510848
          0 1 0 .3510848
          2 1 0 .3510848
          7 1 0 .3510848
          1 1 0 .3510848
          1 1 0 .3510848
          0 1 0 .3510848
          0 1 0 .3510848
          1 1 0 .3510848
          3 1 0 .3510848
          1 1 0 .3510848
          1 1 0 .3510848
          1 1 0 .3510848
          1 1 0 .3510848
          1 1 0 .3510848
          1 1 0 .3510848
          1 1 0 .3510848
          2 1 0 .3510848
          2 1 0 .3510848
          3 1 0 .3510848
          1 1 0 .3510848
          2 1 0 .3510848
          3 1 0 .3510848
          2 1 0 .3510848
          2 1 0 .3510848
          0 1 0 .3510848
          1 1 0 .3510848
          1 1 0 .3510848
          1 1 0 .3510848
          0 1 0 .3510848
          1 1 0 .3510848
          1 1 0 .3510848
          1 1 0 .3510848
          1 1 0 .3510848
          3 1 0 .3510848
          1 1 0 .3510848
          1 1 0 .3510848
          1 1 0 .3510848
          1 1 0 .3510848
          0 1 0 .3510848
          0 1 0 .3510848
          3 1 0 .3510848
          0 1 0 .3510848
          2 1 0 .3510848
          1 1 0 .3510848
          0 1 0 .3510848
          0 1 0 .3510848
          0 1 0 .3510848
          2 1 0 .3510848
          0 1 0 .3510848
          0 1 0 .3510848
          1 1 0 .3510848
          0 1 0 .3510848
          0 1 0 .3510848
          0 1 0 .3510848
          2 1 0 .3510848
          2 1 0 .3510848
          4 1 0 .3510848
          0 1 0 .3510848
          0 1 0 .3510848
          2 1 0 .3510848
          1 1 0 .3510848
          4 1 0 .3510848
          3 1 0 .3510848
          1 1 0 .3510848
          0 1 0 .3510848
          1 1 0 .3510848
          1 1 0 .3510848
          0 1 0 .3510848
          1 1 0 .3510848
          1 1 0 .3510848
          1 1 0 .3510848
          1 1 0 .3510848
          4 1 0 .3510848
          1 1 0 .3510848
          2 1 0 .3510848
          1 1 0 .3510848
          4 1 0 .3510848
          2 1 0 .3510848
          1 1 0 .3510848
          end

          With this, can you please reproduce the results and explain the right way to estimate the effect of treatments 2 and 3 (separately as well as pooled) on 'count_selectoptionsh'? Thank you so much!



          • #6
            In the dataex, all the treated are 1, all the pooled are 0, and n is a constant.

            You need to slow down and pay attention to what you are doing.



            • #7
              Originally posted by George Ford
              In the dataex, all the treated are 1, all the pooled are 0, and n is a constant.

              You need to slow down and pay attention to what you are doing.
              Hi,

              Thanks for pointing that out. The data was sorted by treatment, so the dataex example only captured one chunk of it. Here is a better representation of my data:
              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input float count_selectoptionsh byte treatment float(pooled n)
              1 2 1 .6489152
              0 1 0 .3510848
              5 2 1 .6489152
              0 1 0 .3510848
              3 1 0 .3510848
              4 3 1 .6489152
              2 3 1 .6489152
              5 2 1 .6489152
              1 3 1 .6489152
              1 2 1 .6489152
              2 1 0 .3510848
              0 1 0 .3510848
              1 2 1 .6489152
              4 3 1 .6489152
              0 1 0 .3510848
              4 2 1 .6489152
              1 1 0 .3510848
              0 1 0 .3510848
              4 2 1 .6489152
              3 1 0 .3510848
              2 1 0 .3510848
              0 1 0 .3510848
              1 2 1 .6489152
              2 3 1 .6489152
              0 1 0 .3510848
              1 2 1 .6489152
              3 1 0 .3510848
              2 1 0 .3510848
              1 2 1 .6489152
              0 1 0 .3510848
              1 1 0 .3510848
              2 1 0 .3510848
              3 1 0 .3510848
              1 2 1 .6489152
              2 3 1 .6489152
              2 1 0 .3510848
              2 3 1 .6489152
              1 1 0 .3510848
              5 1 0 .3510848
              3 3 1 .6489152
              3 2 1 .6489152
              4 3 1 .6489152
              7 1 0 .3510848
              2 1 0 .3510848
              0 1 0 .3510848
              2 1 0 .3510848
              4 2 1 .6489152
              3 1 0 .3510848
              4 2 1 .6489152
              0 3 1 .6489152
              2 2 1 .6489152
              1 1 0 .3510848
              1 3 1 .6489152
              0 1 0 .3510848
              1 3 1 .6489152
              0 2 1 .6489152
              4 2 1 .6489152
              3 2 1 .6489152
              3 3 1 .6489152
              2 2 1 .6489152
              2 3 1 .6489152
              1 3 1 .6489152
              1 1 0 .3510848
              5 3 1 .6489152
              7 2 1 .6489152
              1 2 1 .6489152
              2 2 1 .6489152
              1 3 1 .6489152
              2 2 1 .6489152
              2 1 0 .3510848
              4 2 1 .6489152
              2 2 1 .6489152
              2 1 0 .3510848
              2 2 1 .6489152
              3 1 0 .3510848
              0 2 1 .6489152
              1 3 1 .6489152
              1 1 0 .3510848
              1 3 1 .6489152
              4 2 1 .6489152
              2 3 1 .6489152
              1 1 0 .3510848
              1 1 0 .3510848
              3 3 1 .6489152
              1 3 1 .6489152
              4 2 1 .6489152
              2 2 1 .6489152
              3 3 1 .6489152
              0 3 1 .6489152
              0 1 0 .3510848
              1 1 0 .3510848
              1 3 1 .6489152
              1 3 1 .6489152
              2 2 1 .6489152
              2 3 1 .6489152
              4 1 0 .3510848
              7 2 1 .6489152
              5 3 1 .6489152
              1 3 1 .6489152
              2 3 1 .6489152
              end
              With this, could you please answer my questions in #5? Thank you for your time!



              • #8
                pooled is a dummy for treatment > 1, so it is not needed.

                n, at least in this dataex, is collinear with treatment, so it is not a sensible exposure variable. What is n? Why are you using it as an exposure? Have you studied what the exposure option does?

                Code:
                poisson count_selectoptionsh b1.treatment, robust
                
                gen ln = ln(n)  // log of n; not in the dataex, so create it first
                poisson count_selectoptionsh b1.treatment ln, robust  // note ln is omitted due to collinearity
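
                For the question in #5 (treatments 2 and 3 separately as well as pooled), one possible sketch without any exposure variable is below. It assumes the variable names from the dataex in #7. The irr option only changes how results are displayed (ratios of mean counts instead of log differences).

                ```stata
                * Separate effects of treatments 2 and 3 relative to control, shown as incidence-rate ratios
                poisson count_selectoptionsh b1.treatment, vce(robust) irr

                * Joint test that neither treatment differs from control
                test 2.treatment 3.treatment

                * Treatment 3 relative to treatment 2, as a ratio
                lincom 3.treatment - 2.treatment, irr

                * Pooled effect: any treatment versus control
                poisson count_selectoptionsh i.pooled, vce(robust) irr
                ```

                An incidence-rate ratio above 1 means a higher mean count than in the control group, which is directly comparable in sign to the OLS coefficient in #3.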

