Multilevel beta regression

Kristin Bevilacqua

Join Date: Sep 2023

Posts: 14
#1

24 Oct 2023, 15:22

Hi,

I am working on a difference-in-differences analysis where the outcome is a proportion (the proportion of crime incidents that were reported to police). I have been reading about which distribution is best for a proportion outcome and saw a lot of discussion of beta regression. However, because the incidents are nested within people, I need a multilevel analysis. I have not been able to find out whether betareg can be used for multilevel analysis, so I figured I would check in here.

Thanks in advance!
Andrew Musau

Join Date: Oct 2014

Posts: 10214
#2

25 Oct 2023, 06:40

You can take a look at Papke and Wooldridge (2008). The method can be implemented using xtgee with a -probit- link function and -unstructured- within-group correlation structure.

Reference:

Papke, L.E. and J.M. Wooldridge (2008). "Panel Data Methods for Fractional Response Variables with an Application to Test Pass Rates," Journal of Econometrics 145, 121-133. https://www.sciencedirect.com/scienc...0440760800050X
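
A minimal sketch of how that suggestion might look in Stata, run on simulated data so that it is self-contained. The variable names (id, year, x1, y), the data-generating process, and the sample sizes are all assumptions made for illustration and do not come from Papke and Wooldridge (2008). It also assumes one fractional outcome per unit per year, which is what allows a time variable to be declared.

```stata
* Simulated panel: 200 units observed in 4 years (all names hypothetical)
clear
set seed 12345
set obs 200
generate id = _n
* Unit-level heterogeneity, constant within id after expand
generate a = rnormal(0, 0.5)
expand 4
bysort id: generate year = 2000 + _n
generate x1 = rnormal()
* Fractional outcome strictly inside (0,1)
generate y = normal(-0.3 + 0.5*x1 + a + rnormal(0, 0.3))

* One row per id-year, so a time variable can be declared
xtset id year

* GEE with a probit link and an unstructured working correlation matrix
xtgee y x1 i.year, family(binomial) link(probit) corr(unstructured) vce(robust)
```

The vce(robust) option is included so that inference does not rely on the working correlation matrix being correct.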
Kristin Bevilacqua

Join Date: Sep 2023

Posts: 14
#3

25 Oct 2023, 07:11

Thank you so much, Andrew!
Kristin Bevilacqua

Join Date: Sep 2023

Posts: 14
#4

27 Oct 2023, 10:49

Hi again,

From my poking around, it seems that you are the expert in these analyses, Andrew, so I am back with another question!

It seems that because the correlation structure is unstructured, a time variable has to be specified when xtset-ing the data. However, because women (level 1) can report multiple incidents (level 2) per year, I receive an error saying "repeated time values within panel". Do you have any suggestions on how to address this issue?

Thank you!
Kristin

Last edited by Kristin Bevilacqua; 27 Oct 2023, 10:53.
Andrew Musau

Join Date: Oct 2014

Posts: 10214
#5

28 Oct 2023, 03:04

Use an independent correlation structure then, as it appears that you do not have panel data.

Code:

corr(independent)
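
Spelled out as a full command, that option might be used as follows. This is only a sketch on simulated incident-level data: the names id, year, x1, and reported are hypothetical, and it assumes the outcome is a binary indicator recorded for each incident. With an independent working correlation, xtset is given only the panel identifier, so repeated years within a woman are not a problem.

```stata
* Simulated incident-level data: 300 women, 6 incidents each (names hypothetical)
clear
set seed 2023
set obs 300
generate id = _n
expand 6
* Three calendar years, two incidents per woman in each year
bysort id: generate year = 2015 + mod(_n, 3)
generate x1 = rnormal()
* Binary outcome: 1 if the incident was reported to police
generate byte reported = runiform() < normal(-0.2 + 0.4*x1)

* Declare only the panel identifier, with no time variable
xtset id

xtgee reported x1 i.year, family(binomial) link(probit) corr(independent) vce(robust)
```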
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2173
#6

28 Oct 2023, 09:32

Kristin: I think I can make suggestions if I'm sure about the structure of your data. It sounds like you do have panel data and also repeated outcomes for each woman in each time period. Is that correct?

With any data structure you can always use a pooled method and cluster the standard errors. If id is the women's identifier, just use

Code:

glm y x1 x2 ... xk i.year, fam(bin) link(probit) vce(cluster id)

Pooled estimation is likely to be inefficient compared with a GEE approach, but it's consistent and provides valid inference. If you think the precision of the estimates is good enough, you can stop here.

I am unsure about one aspect. You said your outcome is a proportion, but it seems like each incident is binary (was it reported to the police or not?). Your setting would be exactly the same as in Papke and Wooldridge (2008) if you are constructing a single proportion for each year and each woman. It seems that's what you'd want to do.
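
A minimal sketch of constructing one proportion per woman per year from incident-level records, again on simulated data. The names id, year, x1, reported, prop_reported, and n_incidents are hypothetical, and averaging x1 within each woman-year is just one assumption about how covariates would be aggregated.

```stata
* Simulated incident-level data: 300 women, 9 incidents each (names hypothetical)
clear
set seed 101
set obs 300
generate id = _n
expand 9
* Three calendar years, three incidents per woman in each year
bysort id: generate year = 2015 + mod(_n, 3)
generate x1 = rnormal()
generate byte reported = runiform() < normal(-0.2 + 0.4*x1)

* Collapse to one row per woman-year: share of incidents reported,
* the mean of x1, and the number of incidents behind each proportion
collapse (mean) prop_reported = reported x1 (count) n_incidents = reported, by(id year)

* Each id-year now appears once, so the time variable can be declared
xtset id year

* Pooled fractional probit with standard errors clustered by woman
glm prop_reported x1 i.year, family(binomial) link(probit) vce(cluster id)
```

After the collapse, the same data could also be passed to xtgee with a time-dependent working correlation, since there are no longer repeated time values within a panel.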

JW
Kristin Bevilacqua

Join Date: Sep 2023

Posts: 14
#7

04 Dec 2023, 08:51

Hi Jeff, I am so sorry I missed your reply. This is extremely helpful, thank you so much!
Kristin Bevilacqua

Join Date: Sep 2023

Posts: 14
#8

15 Dec 2023, 12:07

Hi Jeff,

Thank you again for your help with this question. I wanted to check back in now that I have read through the Papke and Wooldridge article. As you mentioned, I am constructing a single proportion per year, though not per woman but per group (Latina versus non-Latina white). That is, of the total number of intimate partner violence incidents, what proportion per year is reported to police for Latina women and for non-Latina white women?

The math in the article is a bit beyond my training, but it seems they use GEE rather than glm. Given the similarity between my analysis and the exogenous explanatory variable case in Papke and Wooldridge, do you still believe the glm code you shared is the most appropriate?

Thank you again!
Kristin