  • GLM with proportion DV and categorical IV. Which STATA formula should I use?

    I ran an experiment to see whether a particular gamification design (2 treatment groups) would perform better in terms of quality than the control. The design was run on a reporting application where citizens can report issues in their environment to the government, and the government can act upon them.

    Quality in my data is measured as the number of successful reports made by an individual divided by the total number of reports made by that same individual. (Successful reports are those acted on by the government.)

    The data would look something like this, where each row represents 1 individual:



    Competition and Inter-Team represent the categorical IV (the two treatment groups); the baseline is no design. My question is: which of the following commands in Stata should I use?

    Approach 1

    glm Succesful_Report Competition Inter-Team, family(binomial Total_Report) link(logit) vce(robust) nolog

    Here, family(binomial Total_Report) takes into account the fact that each individual has a different denominator (total number of reports). The fact that individuals in the control group make fewer reports would then be accounted for. Since I run the formula with varying denominators for the DV, Stata does not allow me to run a mfx command.

    Approach 2

    glm Proportion Competition Inter-Team, family(binomial) link(logit) vce(robust) nolog

    In this case, the proportion is taken directly as the DV. However, the varying number of reports made by each individual is not accounted for.

    I am unsure which approach to take to test whether the treatment groups (Competition and Inter-Team) outperform the control with regard to quality. What would you advise?

    Kind regards, Michiel

  • #2
    In my opinion, Approach 1 is vastly preferable. Approach 2 ignores the differing number of reports made by individuals and is likely to lead to incorrect results. As for
    "Since I run the formula with varying denominators for the DV, Stata does not allow me to run a mfx command."
    -mfx- is by now pretty obsolete. If you are using current or relatively recent Stata, you should be using the official Stata command -margins- instead. It's a fairly complicated command, but it will, I believe, do everything -mfx- did and much more. For an excellent introduction to -margins-, see Richard Williams' https://www3.nd.edu/~rwilliam/stats/Margins01.pdf. From there, you can learn more by reading the PDF documentation that comes installed with your Stata. (You can start directly with the PDF documentation, but I think Richard Williams' document is clearer as an introduction and provides more typical worked examples.)
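
    As a rough illustration of how Approach 1 might be combined with -margins-, here is a minimal sketch. It is not taken from the original posts and rests on several assumptions: Inter_Team and treatment are hypothetical variable names (a hyphen is read as a variable range in a Stata varlist, so Inter-Team cannot be a single variable name), and Competition and Inter_Team are assumed to be mutually exclusive 0/1 indicators. Because the default prediction after -glm- with a binomial denominator is, as far as I know, the expected number of successes rather than the proportion, the sketch asks -margins- for the inverse logit of the linear prediction.

    ```stata
    * Sketch only. Inter_Team and treatment are hypothetical names;
    * Competition and Inter_Team are assumed to be mutually exclusive 0/1 indicators.

    * One categorical treatment variable: 0 = control, 1 = Competition, 2 = Inter-Team
    generate byte treatment = 0
    replace treatment = 1 if Competition == 1
    replace treatment = 2 if Inter_Team == 1
    label define treatment_lbl 0 "Control" 1 "Competition" 2 "Inter-Team"
    label values treatment treatment_lbl

    * Approach 1, written with factor-variable notation so -margins- knows
    * that treatment is categorical
    glm Succesful_Report i.treatment, family(binomial Total_Report) link(logit) vce(robust) nolog

    * Predicted probability that a report is successful, by group
    margins treatment, expression(invlogit(predict(xb)))

    * Difference in that probability for each treatment relative to control
    margins, dydx(treatment) expression(invlogit(predict(xb)))
    ```

    The last command reports discrete differences from the control group on the probability scale, which is roughly the quantity -mfx- would have been used for.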


    • #3
      Dear Clyde,

      Thank you very much for your explanation and for the reference, I highly appreciate it.


      • #4
        Cross-posted at https://stats.stackexchange.com/ques...a-should-i-use

        Please note our policy on cross-posting, which is that you should tell us about it.


        • #5
          Dear Nick,

          My apologies. Yes, I posted it on Cross Validated as well.
