  • Stata dropping two dummy variables to avoid collinearity?

    Hi there,

    I'm trying to run a pooled cross-sectional regression model by combining yearly datasets from the UK LFS. I have eight datasets, one for each year from 2015 to 2022 inclusive. However, I'm having some trouble creating dummy variables for each year so that I can control for differences associated with each time period.

    As I'm appending the datasets, I thought I could use the 'generate' option of the append command to create a variable, 'year', to distinguish between the years. I would then run the regression with i.year as an independent variable and introduce my other variables along with it.

    However, when I try this, Stata drops two categories - 2022 and 2021. My understanding until now was that to avoid the dummy variable trap, one had to include n-1 dummy variables, where n is the number of categories. However, in this case Stata seems to be dropping two categories - this is a problem because the coefficients on the year dummies are important to my analysis. I've included my code below:

    Code:
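    use 2015.dta
    * append the remaining years; generate() creates a source-file marker named year
    append using 2016 2017 2018 2019 2020 2021 2022, generate(year)
    * year dummies only, restricted to observations with INCAC051==5
    probit job_found i.year if INCAC051==5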
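
A note on the generate() option used above: as documented for append, it does not store the calendar year. It creates a marker that is 0 for the dataset in memory and 1, 2, ... for each using file in order. The sketch below uses two tiny made-up files (not the LFS data) to show that coding and one possible way to turn it into a calendar year. The shift by 2015 is an assumption that only holds if the files are appended in consecutive year order, as in the post.

```stata
* Toy demonstration with made-up data, not the LFS files
clear
set obs 3
generate x = 1
tempfile y2016
save `y2016'

* Second toy file plays the role of the master (2015) dataset
clear
set obs 3
generate x = 0
append using `y2016', generate(year)

* year is 0 for the master data and 1 for the first using file
tabulate year

* Shift the marker so that it holds the calendar year
replace year = year + 2015
tabulate year
```

If the year variable has already been recoded or labelled this way, this step is unnecessary.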

  • #2
    What effect does the if qualifier have? Does it, say, cause all the 2021 cases to drop out? Try running it without the if qualifier.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam
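
One way to follow up on the question in #2 is to look at how the outcome is distributed across years within the estimation sample. probit omits a category indicator when that category predicts the outcome perfectly (every observation in the year has the same outcome), so a year with no variation in job_found is one possible explanation, alongside plain collinearity. The sketch below uses made-up toy data in which 2021 has no variation; the helper variables min_job, max_job and no_variation are hypothetical names. With the real data, the same commands would be run with the poster's condition (if INCAC051==5) added.

```stata
* Toy data (made up): in 2021 every observation has job_found == 1
clear
input year job_found
2019 0
2019 1
2020 0
2020 1
2021 1
2021 1
end

* Cross-tabulate the outcome by year to spot empty or one-sided cells
tabulate year job_found

* Flag years in which the outcome never varies
bysort year: egen min_job = min(job_found)
bysort year: egen max_job = max(job_found)
generate byte no_variation = (min_job == max_job)
tabulate year no_variation
```

Any year flagged here would be dropped by probit together with its observations, and Stata reports this in a note above the estimation output.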


    • #3
      Apologies, I should have explained this better in the original post: the ultimate goal of the analysis is to see whether the effect of focusing on informal job search methods on the probability of finding a job changes depending on the year. Specifically, I want to test whether informal job search methods were less effective during the COVID-19 lockdowns. The qualifier ensures that all observations were unemployed in the first period, and the two categories are dropped with or without it.

      However, if I add the noconstant option to the regression and specify year as ibn.year, it only drops one category, but then I've no constant. My main result will be from a regression which interacts an 'informal search' dummy with a 'lockdown' dummy (the 2020 and 2021 periods), so this isn't of major importance - but I would love to understand what's going on here!

      Code:
      probit job_found ibn.year, noconstant
      Thanks
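
The interaction model described in #3 could be set up roughly as follows. This is only a sketch on simulated data: the variable names informal and lockdown are hypothetical, the outcome is random noise, and the lockdown definition (2020 and 2021) is taken from the post.

```stata
* Simulated data for illustration only; names and values are made up
clear
set seed 12345
set obs 800
generate year = 2015 + mod(_n, 8)
generate byte informal = runiform() < 0.5
generate byte job_found = runiform() < 0.4

* Lockdown indicator covering the 2020 and 2021 periods
generate byte lockdown = inlist(year, 2020, 2021)

* Main effects plus the informal-by-lockdown interaction
probit job_found i.informal##i.lockdown
```

Note that lockdown is the sum of the 2020 and 2021 year indicators, so if it is entered alongside a full set of i.year dummies and a constant, Stata would be expected to omit one more term for collinearity. The interaction term itself remains identified.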
