  • Multilevel beta regression

    Hi,

    I am working on a difference-in-differences analysis where the outcome is a proportion (the proportion of crime incidents that were reported to police). I have been reading about which distribution is best for a proportion outcome and saw a lot of discussion of beta regression. However, because the incidents are nested in people, I need to use a multilevel analysis. I cannot find whether betareg supports multilevel models and figured I would check in here.

    Thanks in advance!

  • #2
    You can take a look at Papke and Wooldridge (2008). The method can be implemented using xtgee with a -probit- link function and -unstructured- within-group correlation structure.
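
    As a rough sketch of what such a call could look like, here is a small simulated example. The data and the names id, year, y, and x1 are hypothetical, made up only for illustration, and the model is not meant to reproduce the paper's specification.

    ```stata
    * Sketch only: simulated data with hypothetical names (id, year, y, x1)
    clear
    set seed 12345
    set obs 200
    generate id = _n
    generate u = rnormal(0, 0.5)             // unit-level heterogeneity
    expand 5                                 // 5 yearly observations per unit
    bysort id: generate year = 2015 + _n
    generate x1 = rnormal()
    generate y = normal(-0.3 + 0.5*x1 + u)   // fractional outcome in (0,1)

    xtset id year                            // corr(unstructured) needs a time variable
    xtgee y x1 i.year, family(binomial) link(probit) corr(unstructured) vce(robust)
    ```

    Here vce(robust) is added so that inference does not rely on the working correlation structure being correct.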


    Reference:

    Papke, L.E. and J.M. Wooldridge (2008). "Panel Data Methods for Fractional Response Variables with an Application to Test Pass Rates," Journal of Econometrics 145, 121-133. https://www.sciencedirect.com/scienc...0440760800050X



    • #3
      Thank you so much, Andrew!



      • #4
        Hi again,

        From my poking around, it seems that you are the expert in these analyses, Andrew, so I am back with another question!

        It seems that because the correlation structure is unstructured, a time variable has to be specified when xtset-ing the data. However, because women (level 1) can report multiple incidents (level 2) per year, I receive an error saying "repeated time values within panel". Do you have any suggestions on how to address this issue?

        Thank you!
        Kristin
        Last edited by Kristin Bevilacqua; 27 Oct 2023, 10:53.



        • #5
          In that case, use an independent correlation structure, as it appears that you do not have panel data.

          Code:
          corr(independent)
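
          In context, the full call might look something like the sketch below. The data are simulated and the names id, year, reported, and x1 are hypothetical. With corr(independent), only the panel variable needs to be declared, so repeated years within a woman do not trigger the xtset error.

          ```stata
          * Sketch only: simulated incident-level data, hypothetical names
          clear
          set seed 2023
          set obs 200
          generate id = _n
          expand 6                                    // 6 incidents per woman
          generate year = 2015 + ceil(3*runiform())   // several incidents can share a year
          generate x1 = rnormal()
          generate reported = runiform() < normal(-0.2 + 0.4*x1)   // 1 = reported to police

          xtset id                                    // panel variable only, no time variable
          xtgee reported x1 i.year, family(binomial) link(probit) corr(independent) vce(robust)
          ```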



          • #6
            Kristin: I think I can make suggestions if I'm sure about the structure of your data. It sounds like you do have panel data and also repeated outcomes for each woman in each time period. Is that correct?

            With any data structure you can always use a pooled method and cluster the standard errors. If id is the women's identifier, just use

            Code:
            glm y x1 x2 ... xk i.year, fam(bin) link(probit) vce(cluster id)
            Pooled estimation is likely to be inefficient compared with a GEE approach, but it's consistent and provides valid inference. If you think the precision of the estimates is good enough, you can stop here.

            I am unsure about one aspect. You said your outcome is a proportion, but it seems like each reported incident is binary (was it reported to the police or not?). Your setting would be the same as in Papke and Wooldridge (2008) if you are constructing a single proportion for each year and each woman. It seems that's what you'd want to do.
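
            If the data are stored one row per incident, building that woman-by-year proportion could be sketched as follows. The data are simulated and the names id, year, reported, prop_reported, and n_incidents are hypothetical.

            ```stata
            * Sketch only: collapse incident-level records to one row per woman-year
            clear
            set seed 2023
            set obs 200
            generate id = _n
            expand 6                                    // 6 incidents per woman
            generate year = 2015 + ceil(3*runiform())
            generate reported = runiform() < 0.4        // 1 = reported to police

            * mean of a 0/1 variable is the proportion reported; count gives the denominator
            collapse (mean) prop_reported = reported (count) n_incidents = reported, by(id year)

            isid id year                                // one row per woman-year
            xtset id year                               // no repeated time values now
            ```

            After collapsing, each woman has at most one observation per year, so a time variable can be declared without the "repeated time values within panel" error.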

            JW



            • #7
              Hi Jeff, I am so sorry I missed your reply. This is extremely helpful, thank you so much!



              • #8
                Hi Jeff,

                Thank you again for your help with this question. I wanted to check back in as I have read through the Papke and Wooldridge article. As you mentioned, I am constructing a single proportion per year, not per woman but per group (Latina versus non-Latina white). So, of the total number of intimate partner violence incidents, what proportion per year is reported to police for Latina women and for non-Latina white women?

                The math in the article is a bit beyond my training, but it seems they use GEE rather than glm. Given the similarity between my analysis and the exogenous explanatory variable case in Papke and Wooldridge, do you still believe the glm code you shared is the most appropriate?

                Thank you again!
                Kristin

