  • Using commuting-zone fixed effects after concatenating PUMAs with commuting-zones with probabilistic weights

    Hi all,

    I have an econometric question.

    Some of you already know the American Community Survey (ACS) data. For those who don't, it's a household survey that doesn't identify a household's county, for privacy reasons. The smallest geographic unit it identifies is the Public Use Microdata Area (PUMA), which, depending on population, is part of a county, a single county, or a group of counties.

    For my research, I am trying to run a regression in which my outcome variable is at the household level (PUMA identified) and my treatment variable is at the county level.

    Since it's cumbersome and noisy to aggregate my treatment variable to the PUMA level, I have decided, following Autor and Dorn (American Economic Review, 2013), to link the outcome variable and treatment variable at the commuting zone (CZ) level. CZs are usually larger than PUMAs. Some CZs consist of a single county, whereas most consist of multiple counties. I can easily aggregate the treatment variable from county to CZ. I can also identify the CZ of the household in the ACS data.

    The issue is that some PUMAs are split across multiple CZs. For such cases, Autor and Dorn (2013) suggest assigning a household (PUMA identified) to each candidate CZ with a probabilistic weight equal to the share of the PUMA's population that lies in that CZ. David Dorn has the PUMA-to-CZ crosswalk files here (section E): https://www.ddorn.net/data.htm.

    Specifically, he suggests using Stata's "joinby" command on the ACS dataset, joining on PUMA, to match each household with all of its candidate CZs, each carrying its own weight. A single household-year combination thus becomes multiple observations, one per CZ, and each row carries a CZ weight based on the population distribution.
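
    A minimal sketch of that joinby step on toy data. The variable names (puma, czone, afactor, hhwt) and all values are assumptions for illustration; the actual crosswalk and ACS extracts may name things differently.

    ```stata
    * Toy crosswalk: PUMA 102 is split 70/30 between CZ 1 and CZ 2 (made-up values)
    clear
    input int puma int czone double afactor
    101 1 1.0
    102 1 0.7
    102 2 0.3
    end
    tempfile cw
    save `cw'

    * Toy household file (made-up values)
    clear
    input long hhid int year int puma double hhwt double income
    1 2015 101 100 50000
    2 2015 102  80 42000
    end

    * Form all PUMA-CZ pairs: household 2 becomes two rows, one per CZ
    joinby puma using `cw', unmatched(master)
    drop _merge

    * Scale the survey weight by the PUMA's population share in each CZ
    generate double w_cz = hhwt * afactor

    * Check that the split weights add back up to the original household weight
    bysort hhid year: egen double w_sum = total(w_cz)
    assert abs(w_sum - hhwt) < 1e-6
    list hhid year puma czone afactor hhwt w_cz, sepby(hhid)
    ```

    Because the scaled weights sum to the original household weight, the weighted sample is not inflated even though the row count grows.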

    It's easy to weight my treatment variable using these weights. The issue I am having is with CZ fixed effects: since a single household can lie in any one of multiple CZs, how do I deal with this situation?

    I thought of doing one of these three:

    1. Creating a single row for each household-year combination by collapsing the treatment variable using the given CZ weights. Then, assign that observation the fixed effect of the CZ with the largest weight.

    2. Creating a single row for each household-year combination by collapsing the treatment variable using the given CZ weights. Then, use multiple CZ fixed effects for that observation. For instance, if a household's PUMA is in two CZs, the dummy variables for both of those CZs are set to 1.

    3. Letting each household-year combination keep multiple rows and using a different CZ fixed effect for each row. But this will create a lot of noise.
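
    As an illustration only, option 1 might be set up as below. This is a sketch on made-up data, not a recommendation among the three options; the names hhid, czone, afactor, treat, treat_w and czone_main are hypothetical.

    ```stata
    * Toy expanded data: household 2's PUMA is split 70/30 between CZ 1 and CZ 2
    clear
    input long hhid int year int czone double afactor double treat
    1 2015 1 1.0 0.20
    2 2015 1 0.7 0.20
    2 2015 2 0.3 0.50
    end

    * Weighted average of the CZ-level treatment within each household-year
    bysort hhid year: egen double treat_w = total(afactor * treat)

    * Keep one row per household-year: the CZ with the largest weight
    * (ties in afactor would be broken arbitrarily)
    bysort hhid year (afactor): keep if _n == _N
    rename czone czone_main

    * Household 2: 0.7*0.20 + 0.3*0.50 = 0.29, assigned to CZ 1
    assert abs(treat_w - 0.29) < 1e-8 if hhid == 2
    assert czone_main == 1 if hhid == 2
    list
    ```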

    I know each has its limitations, and I'm not sure if there's a better way to deal with this.

    I know it's long, and I may not have explained it well. But I'd appreciate your input on this, and I'm willing to clarify more if needed. Thank you!
    Last edited by Sam Bennett; 06 Feb 2025, 22:34.

  • #2
    In AD 2013, the data is collapsed to CZ, is it not? That's straightforward. The data is aggregated up so there's no duplication for a lower level of aggregation.


    • #3
      Originally posted by George Ford
      In AD 2013, the data is collapsed to CZ, is it not? That's straightforward. The data is aggregated up so there's no duplication for a lower level of aggregation.
      I don't follow. The ACS doesn't have a CZ variable.


      • #4
        the cross walk gets you that


        • #5
          Originally posted by George Ford
          the cross walk gets you that
          That's not my question. Yes, I got the CZs for the ACS households. But one PUMA can be in multiple CZs. If I understand correctly, Autor and Dorn (American Economic Review, 2013) deal with this by including both CZs as two separate observations and weighting them by the population share of the PUMA in each CZ, which artificially increases my sample size by a lot. So I'm just asking whether that's the right method.


          • #6
            collapse (mean) income hhsize numkids [aw=afact], by(CZ)

            There is no duplicate CZ.
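
            A toy run of that kind of collapse, for illustration. It assumes a household survey weight (here called hhwt, a hypothetical name) is combined with afact; the data values are made up.

            ```stata
            * Toy expanded rows: household 2 appears once per candidate CZ
            clear
            input long hhid int CZ double afact double hhwt double income
            1 1 1.0 100 50000
            2 1 0.7  80 42000
            2 2 0.3  80 42000
            3 2 1.0 120 61000
            end

            * Combined weight: survey weight times the PUMA's population share in the CZ
            generate double w = hhwt * afact

            * Weighted mean by CZ; the result has exactly one row per CZ
            collapse (mean) income [aw = w], by(CZ)
            isid CZ
            list
            ```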


            • #7
              Your treatment, if county level, will be continuous.


              • #8
                Originally posted by George Ford
                collapse (mean) income hhsize numkids [aw=afact], by(CZ)

                There is no duplicate CZ.
                I would still like a household-level analysis, though, without collapsing to the CZ level.


                • #9
                  Good luck. That's not what AD does, so I wouldn't rely on that approach.


                  • #10
                    Can't use 5-year data?


                    • #11
                      I am also trying to look at heterogeneous effects by various sociodemographic characteristics. So maybe I can just collapse everything to CZ-year-demographic cells using the ACS 5-year data. Thank you!
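
                      A rough sketch of such a cell-level collapse on made-up data. The names czone, educ, w_cz and cell_w are hypothetical, and educ stands in for whichever demographic grouping is of interest.

                      ```stata
                      * Toy expanded rows, already carrying a combined weight w_cz
                      clear
                      input int czone int year byte educ double w_cz double income
                      1 2015 0 100 50000
                      1 2015 0  56 42000
                      1 2015 1  90 70000
                      2 2015 0  24 42000
                      2 2015 1 120 61000
                      end

                      * One row per CZ-year-education cell: weighted mean outcome
                      * plus the unweighted sum of w_cz, i.e. the total weight in the cell
                      collapse (mean) income (rawsum) cell_w = w_cz [aw = w_cz], by(czone year educ)
                      isid czone year educ
                      list, sepby(czone)
                      ```

                      Keeping cell_w allows the cells to be weighted by their size in a later regression.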


                      • #12
                        With 5-year data, you can use county. Treatment timing will matter when going from 1-year to 5-year data.

                        An interesting property of 5-year data is that if you subtract two adjacent samples, you get the change between the last year of the later sample and the first year of the earlier sample, divided by 5: Y2016 - Y2015 = (y2016 - y2011)/5, where Y is the 5-year estimate labeled by its final year and y is the single-year value. You could square up the years using the treatment years (change five years before, change five years after).
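
                        The identity can be checked under the simplifying assumption that a 5-year estimate is an equal-weighted average of the five single-year values (the actual ACS pooling and weighting are more involved, so treat this as an approximation). Here $Y_t$ denotes the 5-year estimate ending in year $t$ and $y_t$ the single-year value; the subscript notation is introduced for this sketch.

                        ```latex
                        \begin{align*}
                        Y_{2016} &= \tfrac{1}{5}\,(y_{2012} + y_{2013} + y_{2014} + y_{2015} + y_{2016}), \\
                        Y_{2015} &= \tfrac{1}{5}\,(y_{2011} + y_{2012} + y_{2013} + y_{2014} + y_{2015}), \\
                        Y_{2016} - Y_{2015} &= \tfrac{1}{5}\,(y_{2016} - y_{2011}).
                        \end{align*}
                        ```

                        The four overlapping years cancel, so only the two end years survive in the difference.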
