
  • Splitting the Sample Based on OLS Coefficients

    Hi,

    I am looking for a way to divide my sample into two groups: one in which the OLS coefficient is positive and one in which it is negative. What is the easiest way to do this in Stata?


    Thanks,
    Hossein

  • #2
    It's not entirely clear what you mean. For a given sample, an OLS coefficient on a regressor has only one sign (and only one value), interaction terms notwithstanding. It might be impossible to split your sample into two subsets that yield coefficients of opposite signs; alternatively, there might be several different splits that do. If you mean the predicted values, that is doable: run the OLS model, obtain the fitted values, and then check the sign of each fitted value.
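
A minimal sketch of the fitted-value route in plain Python (standard library only), assuming a single regressor plus an intercept and made-up toy data; the function names are illustrative and not from the thread. In Stata the analogous steps would be `regress` followed by `predict yhat, xb` and then inspecting the sign of `yhat`. Note that this splits observations by the sign of their fitted values; the slope itself still has a single sign for the whole sample.

```python
# Sketch: bivariate OLS with an intercept, then split observations by the
# sign of their fitted values. Toy data are invented for illustration.

def ols_simple(x, y):
    """Return (intercept, slope) from OLS of y on x with an intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def split_by_fitted_sign(x, y):
    """Return index lists of observations with positive and non-positive fitted values."""
    a, b = ols_simple(x, y)
    fitted = [a + b * xi for xi in x]
    pos = [i for i, f in enumerate(fitted) if f > 0]
    nonpos = [i for i, f in enumerate(fitted) if f <= 0]
    return pos, nonpos

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [-2.1, -0.9, 0.2, 0.8, 2.1]
a, b = ols_simple(x, y)
pos, nonpos = split_by_fitted_sign(x, y)
print(round(b, 3), pos, nonpos)
```

Here every observation shares the same positive slope; only the fitted values differ in sign.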



    • #3
      Thanks, Jimmy! Let me elaborate on the problem:

      I am running an OLS regression of household pollution on income and other household characteristics (with repeated cross-section data). The overall coefficient on income is positive, but when I run the regression separately in the two largest income brackets, I get different results. For the first (lowest) income bracket, low-income households making up around 11% of my data, the coefficient is negative and significant, and for the highest income bracket, again around 10% of the sample, it is positive and significant.

      I want to argue that the unobserved heterogeneity cannot be explained by income level alone, i.e., the income elasticity of pollution might be negative for some households in the high-income brackets and positive for some households in the low-income brackets too. So I am wondering whether there is a way to find a subsample for which the coefficient is negative. If those households turn up all along the income distribution, that would show that the source of the unobserved heterogeneity goes beyond income level.

      Hossein
      Last edited by Hossein Hosseini; 31 Mar 2017, 12:40.
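
To see how a pooled slope and within-bracket slopes can disagree in sign, here is a toy illustration with invented numbers. It is bivariate only, so it ignores the covariates in the actual model, and the bracket labels `"low"` and `"high"` are hypothetical. Running the regression separately per bracket corresponds to using a `by` prefix with `regress` in Stata.

```python
# Toy illustration: the pooled OLS slope can be positive even though one
# income bracket has a negative within-bracket slope. Data are invented.

def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def slopes_by_group(x, y, groups):
    """Run the bivariate regression separately within each group label."""
    data = {}
    for xi, yi, gi in zip(x, y, groups):
        xs, ys = data.setdefault(gi, ([], []))
        xs.append(xi)
        ys.append(yi)
    return {g: ols_slope(xs, ys) for g, (xs, ys) in data.items()}

income = [1, 2, 3, 10, 11, 12]
pollution = [3, 2, 1, 10, 11, 12]
bracket = ["low", "low", "low", "high", "high", "high"]  # hypothetical labels

pooled = ols_slope(income, pollution)
by_bracket = slopes_by_group(income, pollution, bracket)
print(round(pooled, 3), by_bracket)
```

The pooled slope averages over brackets with different slopes, so its sign alone says little about any subgroup.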



      • #4
        Are the repeated cross sections of the same households over time, or is this just cross-sectional data for several different years? And do you have individual household income levels, as well as assignments to certain income quantiles? Different methods would be available to you depending on these details.

        In any case, at least for exploratory purposes, a somewhat simple approach might be to include dummy variables for assignment to each income group; then, you could interact the observed income with those dummy variables, which would give you different (numerically, not necessarily statistically) coefficients on income at various income quantile levels. You might then want to test whether pooled OLS or a random effects model would be more appropriate in this case.

        I should add that this gets a little tricky if households jump income groups in your sample.
        Last edited by Jimmy Squibb; 31 Mar 2017, 12:52.
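
The interaction approach can be sketched with a small least-squares solver. The sketch below reuses toy data of the same kind, a hypothetical `high` bracket dummy, and the normal equations solved by Gaussian elimination; it only shows how the coefficients are read and is not a substitute for Stata's `regress`, where factor-variable notation such as `c.income##i.bracket` builds the same columns. With a fully interacted model and no other covariates, the slope in the reference bracket is the coefficient on income, and the slope in the other bracket is that coefficient plus the interaction coefficient.

```python
# Sketch: OLS with income, a bracket dummy, and their interaction,
# solved via the normal equations X'X b = X'y. Toy data are invented.

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def ols(X, y):
    """OLS coefficients from a design matrix X (list of rows) and outcome y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

income = [1, 2, 3, 10, 11, 12]
pollution = [3, 2, 1, 10, 11, 12]
high = [0, 0, 0, 1, 1, 1]  # hypothetical dummy for the upper bracket

# Columns: constant, income, bracket dummy, income x dummy.
X = [[1.0, xi, di, xi * di] for xi, di in zip(income, high)]
b0, b_income, d_high, c_high = ols(X, pollution)
print("low-bracket slope:", round(b_income, 6))
print("high-bracket slope:", round(b_income + c_high, 6))
```

Solving the normal equations directly is the simplest route to write out but is less numerically stable than the QR-type methods statistical packages use.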



        • #5
          The data come from the U.S. CEX (Consumer Expenditure Survey). They are purely cross-sectional with no repeated households, and I have only income levels, not quantiles.

          I did the interaction model too. If I include only income and the interactions of income with the income-bracket dummies, the interaction terms are negative for the lower-income groups and positive for the higher-income groups, and all are significant:
          [Attached image: Elasticity.png]

          The same holds if I include only income and the income-bracket dummies. However, if I include income, the bracket dummies, and their interactions, only the coefficient on income is positive and significant and all the others are insignificant. So there are either slope differences or intercept differences, but not both. In all of the above models I also include other household covariates: education, age, race, marital status, year and state dummies, etc.

          But I am not sure how these results can help me verify the claim that income alone can't explain the unobserved heterogeneity.
          Last edited by Hossein Hosseini; 31 Mar 2017, 14:38.
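
One possible reason for the pattern in the full model (only income significant once both dummies and interactions are included), offered as a hypothesis rather than something established in the thread: within a narrow income bracket, income varies little, so a bracket dummy D and its interaction income × D are nearly proportional and therefore highly collinear, which inflates their standard errors. A quick check is the correlation between the two columns; the toy numbers below are invented. In Stata, `estat vif` after `regress` serves a similar diagnostic purpose.

```python
# Toy check (invented numbers): within a narrow bracket, the bracket dummy
# and the interaction income * dummy move almost in lockstep.
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / sqrt(va * vb)

income = [10, 20, 30, 40, 100, 102, 104, 106]
top = [0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical top-bracket dummy
top_x_income = [d * x for d, x in zip(top, income)]

r = pearson(top, top_x_income)
vif = 1 / (1 - r ** 2)  # VIF when these are the only two regressors
print(round(r, 4), round(vif, 1))
```

In the real model the other regressors also enter the VIF, so this pairwise check is only a first look.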



          • #6
            When you say that you want to claim that income can't explain the unobserved heterogeneity, it sounds like you're hoping to show that your model using income is a bad fit, or (if you were in a clean panel setting) perhaps that unit-specific intercepts are jointly significant, i.e., that they explain something not explained by income. Is that what you're trying to show?
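
For reference, the usual F statistic for whether a block of q extra terms (for example, bracket dummies or unit-specific intercepts) is jointly significant compares the residual sums of squares of the restricted and unrestricted models, assuming homoskedastic errors; under the null it follows an F(q, n − k) distribution. A minimal sketch with made-up RSS values follows; in Stata, `test` or `testparm` after `regress` reports this kind of test directly.

```python
def joint_f(rss_restricted, rss_unrestricted, q, n, k):
    """F statistic for q linear restrictions.

    rss_restricted: RSS of the model without the extra terms
    rss_unrestricted: RSS of the model with them
    q: number of extra terms being tested
    n, k: sample size and number of parameters in the unrestricted model
    """
    num = (rss_restricted - rss_unrestricted) / q
    den = rss_unrestricted / (n - k)
    return num / den

# Made-up numbers purely for illustration.
f = joint_f(rss_restricted=120.0, rss_unrestricted=100.0, q=4, n=105, k=5)
print(f)  # 5.0
```

The statistic is then compared with the F(q, n − k) critical value at the chosen significance level.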

