Hi all,
I am using the sdid package developed by Daniel PV and Damian Clarke to run a synthetic difference-in-differences analysis (as in Arkhangelsky et al., 2021). The tool is fantastic, and I would be very grateful for the community's guidance on some methods questions I have.
For context, my analysis is a balanced panel of subnational regions over roughly two decades. The outcome variable is the annual unemployment rate in each subnational region. A quarter of these subnational regions receive a treatment policy in a staggered fashion.
However, I am concerned that the unemployment rate might be an unsuitable outcome variable for sdid because it is bounded between 0% and 100%. I believe it may follow a binomial distribution, but I am unsure whether this means it should not be used with sdid.
Please may I ask for the community's guidance, if you have a few moments to spare, on the following methods questions I am grappling with:
1. Would unemployment rate (or log unemployment rate) be a statistically defensible outcome variable to use in sdid?
2. My instinct is that it may be better to run a binomial / logistic regression instead, but how can I do this in the context of a synthetic difference-in-differences analysis?
3. Alternatively, would it be better to run sdid with the log of the unemployed population count as the outcome instead, using the log of the total (economically active) population count as a covariate?
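To make the options above concrete, here is a minimal sketch of the candidate outcome variables for a single region-year. It is written in Python purely for illustration and is not part of the sdid package; the function name, the argument names `unemployed` and `active`, and the example numbers are all hypothetical. The logit of the rate is my own addition, as one possible way of mapping a bounded rate onto the real line, and I am not sure it is appropriate here.

```python
import math


def candidate_outcomes(unemployed, active):
    """Return candidate outcome variables for one region-year.

    `unemployed` and `active` are hypothetical counts of unemployed people
    and of the economically active population, respectively.
    """
    if not 0 < unemployed < active:
        raise ValueError("need 0 < unemployed < active so the logs and logit are defined")
    rate = unemployed / active
    return {
        "rate": rate,                               # question 1: rate in levels
        "log_rate": math.log(rate),                 # question 1: log rate
        "logit_rate": math.log(rate / (1 - rate)),  # assumption: logit transform of the rate
        "log_unemployed": math.log(unemployed),     # question 3: candidate outcome
        "log_active": math.log(active),             # question 3: candidate covariate
    }


# Made-up numbers, purely illustrative
example = candidate_outcomes(unemployed=5_000, active=100_000)
```

One thing this makes visible is that the log rate equals the log unemployed count minus the log active population, so option 3 resembles the log-rate version of option 1 with the coefficient on log active population left free rather than fixed at one.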
Thank you for your help, and apologies if my questions are nonsensical. I am still learning my way around TWFE/DiD/SC models and their assumptions.
Jason