  • NLSY 1979 Panel Data Set - Nonlinear Regression

    Hello All,

    I am using the NLSY 1979 data set, and I have 12,686 individuals over 26 years. I am looking at the impact of working alternate and night shifts on self-reported health outcomes over time; the survey is administered once per year.

    I am working under the assumption that the incremental effect of working an additional year of alternate or night shift hours should increase exponentially over time. For example, the effect of going from the 1st to the 2nd year should not be the same as the effect of going from the 8th to the 9th year. I have included cumulative night shift years squared and cubed in my model to see how these transformations fit my data. I am unsure whether this is the best way to address the non-linear relationship between health outcomes and years of alternate shift work, and I am unsure how to interpret my results. My equation is below:

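    $$\operatorname{logit}\Pr(Y_{it}=1) = \beta_0 + \beta_1 N_{it} + \beta_2 N_{it}^2 + \beta_3 N_{it}^3 + X_{it}'\gamma + \theta_t$$
    $Y_{it}$ – health outcome (the 0/1 indicator Health_Limitation_) for individual $i$ in year $t$
    $N_{it}$ – cumulative years of night shift work (CumulativeNightShiftYears) for individual $i$ up to year $t$
    $X_{it}$ – vector of demographic controls for individual $i$ in year $t$
    $\theta_t$ – year fixed effects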

    My Stata code is below:
    xtlogit Health_Limitation_ CumulativeNightShiftYears CumulativeNightShiftYears2 CumulativeNightShiftYears3 i.year Health_Insurance_ Age_at_Interview_ Hours_per_Wk_1 Low_Wage_Worker Some_college Nonwhite Male Ever_Smoke_1998 Aerobic_2002 Married_ weight, vce(cluster id) or
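With the `or` option, xtlogit reports odds ratios, that is, exp() of the log-odds coefficients. As a sketch of how the cubic terms map onto the hypothesis that the 1st-to-2nd-year effect differs from the 8th-to-9th-year effect, write N for CumulativeNightShiftYears and β1, β2, β3 for the log-odds coefficients on CumulativeNightShiftYears, CumulativeNightShiftYears2, and CumulativeNightShiftYears3 (notation introduced here for illustration), holding the other covariates fixed. The change in log odds from one more year, starting at k years, is:

```latex
\begin{aligned}
\Delta(k) &= \bigl[\beta_1 (k+1) + \beta_2 (k+1)^2 + \beta_3 (k+1)^3\bigr] - \bigl[\beta_1 k + \beta_2 k^2 + \beta_3 k^3\bigr] \\
          &= \beta_1 + \beta_2 (2k+1) + \beta_3 (3k^2 + 3k + 1) \\
\Delta(1) &= \beta_1 + 3\beta_2 + 7\beta_3 \qquad \text{(1st to 2nd year)} \\
\Delta(8) &= \beta_1 + 17\beta_2 + 217\beta_3 \qquad \text{(8th to 9th year)} \\
e^{\Delta(k)} &= \mathrm{OR}_1 \cdot \mathrm{OR}_2^{\,2k+1} \cdot \mathrm{OR}_3^{\,3k^2+3k+1}, \qquad \mathrm{OR}_j = e^{\beta_j}
\end{aligned}
```

In the linear model (β2 = β3 = 0), Δ(k) = β1 for every k, which is exactly the constant per-year effect. After the cubic model, a combination such as Δ(8) could be estimated with a confidence interval using lincom CumulativeNightShiftYears + 17*CumulativeNightShiftYears2 + 217*CumulativeNightShiftYears3, or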

    I first ran the model below, without the squared and cubed terms, but the results are linear: each additional year of night shift work has the same effect on worker outcomes (contrary to my hypothesis):
    xtlogit Health_Limitation_ CumulativeNightShiftYears i.year Health_Insurance_ Age_at_Interview_ Hours_per_Wk_1 Low_Wage_Worker Some_college Nonwhite Male Ever_Smoke_1998 Aerobic_2002 Married_ weight, vce(cluster id) or

    Any advice or suggested readings would be greatly appreciated.

  • #2
    Well, you don't ask any specific questions. So I'll just give you some reactions I have to your approach.

    1. As you don't show any of the output from the regressions, it is hard for me to know if you have interpreted them correctly. You say that you found no quadratic or cubic effects, but that is very implausible, as noise alone usually results in coefficients that are non-zero. Small, perhaps, but zero coefficients are really rare in real life. So I'm guessing what you really meant is that the quadratic and cubic coefficients were not statistically significant. But did you test them jointly? Testing them separately would not be adequate to support your conclusion.

    2. If your underlying belief is that the effect is actually exponential, why are you using a cubic polynomial? Why not calculate exp(CumulativeNightShiftYears) as a new variable and use that instead of CumulativeNightShiftYears and its powers?

    3. Detecting supra-linear effects in logistic regressions can be extremely difficult. Let's consider three possible data generating processes based on a logistic model. In the first, the model is log odds of outcome = x (linear); in the second, it is log odds of outcome = x^3 (cubic); and in the third, it is log odds of outcome = exp(x) (exponential). Consider what happens for values of x between 1 and 10. At x = 10, exp(10) = 22,026 is much bigger than 10^3 = 1,000, which in turn is much bigger than 10, and the predicted log odds of the outcome differ correspondingly. But when we convert these to probabilities with the inverse logit transformation, the probabilities are 0.9999546, 1, and 1 (to machine precision). In other words, the predicted probabilities of these three data generating models are nearly identical, and it would take a gargantuan data set to distinguish them. Even at more modest values of x, such as 3, the probabilities are approximately 0.95, 1, and 1, respectively, so finding these supra-linear effects is very difficult. Quadratics might be manageable, but beyond that it is not. Where you can distinguish linear from higher-order models is when the x-variable's range stays in a small neighborhood of zero; there the inverse-logit transformation preserves the non-linearity pretty well. But your variable, cumulative night shift years, is not of that nature.

    4. This problem is not specific to the logistic model; you would encounter similar difficulties with probit. What you are up against is that any such model necessarily transforms the entire real line (the domain of Xb) into the confines of the [0, 1] interval, so it has to smash down hard on Xb as Xb goes towards infinity. This means that large values of Xb, even when they differ by orders of magnitude, must be mapped into probabilities that are close to each other.

    5. If you had an outcome variable that is continuous and ranged over a very large range of values, you would have a better shot at modeling it this way.
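The arithmetic in points 3 and 4 can be checked with a short script. This is a minimal sketch in Python (Stata's own invlogit() function gives the same numbers, e.g. display invlogit(10)); the three log-odds functions are the hypothetical data generating processes from point 3, and the helper names are illustrative only.

```python
import math


def inv_logit(z: float) -> float:
    """Inverse logit: convert log odds z into a probability."""
    # Branch on the sign of z so math.exp() never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# The three data generating processes from point 3 (log odds as a function of x).
LOG_ODDS = {
    "linear": lambda x: x,
    "cubic": lambda x: x ** 3,
    "exponential": lambda x: math.exp(x),
}


def probabilities(x: float) -> dict:
    """Predicted probability of the outcome under each model at a given x."""
    return {name: inv_logit(f(x)) for name, f in LOG_ODDS.items()}


# x = 0.5 sits near zero; x = 3 and x = 10 are in the range of cumulative years.
for x in (0.5, 3, 10):
    probs = probabilities(x)
    print(f"x = {x}: " + ", ".join(f"{k} = {v:.7f}" for k, v in probs.items()))
```

Near zero (x = 0.5) the three probabilities are clearly different (roughly 0.62, 0.53, and 0.84), while at x = 3 and x = 10 the cubic and exponential models are indistinguishable from 1, which is the compression described in point 4.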

    Hope this helps you think about it.



    • #3
      Thank you, Clyde, for the response! I attached my output. I did not test the quadratic and cubic coefficients jointly and will do some research on the best way to do that. Any suggestions would be greatly appreciated.

      If I use the exp of CumulativeNightShiftYears, as I recall I should also calculate the ln of my outcome variable to best interpret my results, correct? I attached the Stata output using the exp of CumulativeNightShiftYears; any further suggestions or suggested readings would also be very helpful.

      Thanks again!

      Matthew


      • #4
        Quote: I did not test the quadratic and cubic coefficients jointly and will do some research on the best way to do that. Any suggestions would be greatly appreciated.
        Code:
        test CumulativeNightShiftYears2 CumulativeNightShiftYears3
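After xtlogit, this test command reports a Wald chi-square with 2 degrees of freedom for the hypothesis that both coefficients are zero. To show why two separate tests are not a substitute, here is a minimal Python sketch of that Wald statistic. The coefficients, standard errors, and correlation below are made-up hypothetical values, not results from this thread; the strong negative correlation is meant to mimic the collinearity that powers of the same variable often have.

```python
import math

# Hypothetical estimates for the squared and cubed terms (log-odds scale),
# chosen only to illustrate the mechanics; they are NOT from the thread.
b = (0.10, 0.10)   # coefficients on the squared and cubed terms
se = (0.08, 0.08)  # their standard errors
rho = -0.9         # correlation between the two estimates


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def joint_wald(b, se, rho):
    """Wald chi-square (2 df) and p-value for H0: both coefficients are zero."""
    v11, v22 = se[0] ** 2, se[1] ** 2
    v12 = rho * se[0] * se[1]
    det = v11 * v22 - v12 ** 2
    # W = b' V^{-1} b, using the closed-form inverse of a 2x2 covariance matrix.
    w = (b[0] ** 2 * v22 - 2 * b[0] * b[1] * v12 + b[1] ** 2 * v11) / det
    # For 2 degrees of freedom, P(chi2 > w) = exp(-w / 2).
    return w, math.exp(-w / 2.0)


separate = [two_sided_p(bi / si) for bi, si in zip(b, se)]
w, p_joint = joint_wald(b, se, rho)
print("separate p-values:", [round(p, 3) for p in separate])
print("joint chi2(2) =", round(w, 2), " p =", p_joint)
```

With these illustrative numbers, each term alone has p of about 0.21, yet the joint chi2(2) is about 31 with p far below 0.001, so judging the two terms one at a time would lead to the wrong conclusion.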
        Quote: If I use the exp of CumulativeNightShiftYears as I recall I should also calculate the ln of my outcome variable to best interpret my results correct?
        No, your outcome variable is a 0/1 dichotomy, and you cannot take the log of 0.

        Quote: I attached the stata output using the exp of cumulativenightshiftyears and any further suggestions or suggested readings would also be very helpful.
        Something is wrong in the first model. The output shows that you have only one observation per id; that shouldn't be happening with panel data. You'll need to review the code leading up to that to see where the rest of the data disappeared.



        • #5
          Clyde, thank you so much for the code and for explaining how to interpret my results. I will look into the first output to see what I did wrong with my code. I greatly appreciate your insight.

          Matt
