
  • LPM measurement goodness of fit

    Hello,

    I have a question about my LPM (linear probability model).
    I have built up my regression cumulatively, block by block, using nestreg. Now I would like to show how the quality of the model changes as each new regressor is added.

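    * the local y holds the regressors (not the outcome)
    local y "ib(freq).education ib(freq).region ib(freq).civsta ib(freq).health male female Chronic_illness No_Chronic_illness Smoke not_Smoke age working_hoursPw household_size activePw food "

    * male/female, Chronic_illness/No_Chronic_illness and Smoke/not_Smoke look like complementary pairs;
    * with a constant in the model, regress will omit one variable of each pair as collinear
    regress Overweight `y' ib(freq).incomeq, cluster(idpers)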
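    For reference, a cumulative nestreg run of this kind can be sketched as follows. This is a minimal sketch, assuming Stata's bundled nlsw88 example data as a stand-in (union as the 0/1 outcome, with blocks of regressors chosen only for illustration); it is not the model above. After regress, nestreg reports R2 and the change in R2 for each block, and store() keeps each cumulative fit so that adjusted R2 can be retrieved afterwards.

```stata
* Stand-in data: nlsw88 ships with Stata; union is a 0/1 outcome,
* so regress here is a linear probability model
sysuse nlsw88, clear

* Each parenthesized block is added cumulatively; nestreg uses the sample of
* the full model and shows R2 and the change in R2 block by block.
* store(m) keeps the cumulative fits as m1, m2, m3.
nestreg, store(m): regress union (age grade) (south smsa c_city) (tenure hours)

* Adjusted R-squared for each cumulative model
forvalues b = 1/3 {
    quietly estimates restore m`b'
    display "block `b': R2 = " %6.4f e(r2) "   adj. R2 = " %6.4f e(r2_a)
}
```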

    Since R-squared is unsuitable, I wonder what options there are for showing the change in model quality. I can't find a suitable measure of accuracy for an LPM. Are adjusted R-squared or a pseudo-R-squared suitable? If so, how could I compare them?
    Can you help me?


  • #2
    I really don't follow the idea that R-square is clearly unsuitable but adjusted R-square or some pseudo R-square may be suitable.

    I am not especially fond of the linear probability model but those that live by the sword die by the sword. That is, if you regard plain regress as appropriate then whatever measures of goodness or badness of fit or other figures of merit that come with or after regress seem to be candidates here.
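    As a sketch of what regress already leaves behind, assuming the nlsw88 example data (union as a 0/1 outcome, covariates picked only for illustration): R-squared, adjusted R-squared and root MSE are stored in e(), and estat ic adds AIC and BIC computed from the log likelihood that regress stores.

```stata
* Stand-in data and covariates, for illustration only
sysuse nlsw88, clear
regress union age grade south smsa tenure

* Figures of merit that regress stores in e()
display "R-squared          = " %6.4f e(r2)
display "Adjusted R-squared = " %6.4f e(r2_a)
display "Root MSE           = " %6.4f e(rmse)

* AIC and BIC based on the stored log likelihood e(ll)
estat ic
```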

    As you've mentioned elsewhere that you're working on a first-degree dissertation, I will say what I would say to students of mine at that level. Don't go overboard on any single measure of model merit. R-square, RMSE, and any information criterion (IC) you want to use all have some advantages and some disadvantages.

    As someone who has examined at first-degree, Master's and doctorate levels, I'd say that at all those levels I would want to see an array of measures so that I can compare different models for myself, and I would also expect some judgments from any candidate on which models are better than others and why.
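    One way to lay out such an array of measures is to store several fits and tabulate them together. A minimal sketch, again assuming the nlsw88 example data and arbitrary model names m1 to m3; the estimation sample is held fixed so that the measures are comparable across models.

```stata
sysuse nlsw88, clear

* Fix the estimation sample using the largest model
quietly regress union age grade south smsa tenure hours
generate byte insample = e(sample)

* Nested LPMs on the same observations
quietly regress union age grade if insample
estimates store m1
quietly regress union age grade south smsa if insample
estimates store m2
quietly regress union age grade south smsa tenure hours if insample
estimates store m3

* One table, several figures of merit: R2, adjusted R2, RMSE, AIC, BIC
estimates table m1 m2 m3, stats(N r2 r2_a rmse aic bic) b(%9.4f) se
```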

    I'd expect subject-matter considerations to be just as important as numerical performance in any sense. Which regressions make most sense in the light of the data context and empirical and theoretical literature?



    • #3
      Well, I would say that an LPM isn't suitable (or is at least sub-optimal) to begin with, so you are out of luck when it comes to finding suitable GOF measures.

      But if you disagree about the LPM (and many economists do), then I like Nick's argument: "those that live by the sword die by the sword."

      But why not just do it the right way and estimate logit models? Using nestreg, you'll see how much model fit improves with each variable or set of variables.
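      A sketch of that nested logit sequence, assuming the nlsw88 example data (union as the outcome, blocks chosen only for illustration). The lrtable option makes nestreg report the log likelihood, a likelihood-ratio test against the previous block, and AIC and BIC for each block; the stored fits (hypothetical stub lg) also give McFadden's pseudo-R2 per block.

```stata
sysuse nlsw88, clear

* Cumulative logit models with an LR table; fits stored as lg1, lg2, lg3
nestreg, lrtable store(lg): logit union (age grade) (south smsa c_city) (tenure hours)

* McFadden pseudo-R2 that logit stores in e(r2_p), block by block
forvalues b = 1/3 {
    quietly estimates restore lg`b'
    display "block `b': pseudo-R2 = " %6.4f e(r2_p)
}
```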

      Also, I think logit + margins gives you the benefits of whatever the LPM has to offer.
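      For example, assuming the nlsw88 example data and illustrative covariates, margins after logit gives average partial effects on the probability scale, which are the quantities an LPM reports directly as its slopes:

```stata
sysuse nlsw88, clear
logit union age grade south smsa tenure

* Average partial (marginal) effects on Pr(union = 1)
margins, dydx(*)
```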

      Having said that, I will caution that you have to be careful when interpreting nested logit models. Naive comparisons of coefficients across models can be deceiving. See

      https://www.sciencedirect.com/scienc...132?via%3Dihub
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      Stata Version: 17.0 MP (2 processor)

      WWW: https://www3.nd.edu/~rwilliam



      • #4
        I think everyone agrees that the LPM is at best an approximation to the truth (unless X consists of exhaustive and mutually exclusive dummy variables). For obtaining average partial effects, it can be very good. The quality of the APEs is not directly tied to goodness-of-fit.
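        The exception in parentheses can be checked directly. A sketch, assuming the nlsw88 example data: with X made of the four mutually exclusive south-by-smsa cells, the LPM is saturated, so its fitted values are the cell proportions, which is also what a saturated logit fits.

```stata
sysuse nlsw88, clear

* Saturated LPM: exhaustive, mutually exclusive south x smsa cells
quietly regress union i.south##i.smsa
predict double p_lpm if e(sample), xb

* Saturated logit on the same cells
quietly logit union i.south##i.smsa
predict double p_logit if e(sample), pr

* The two sets of fitted probabilities coincide (up to convergence tolerance)
summarize p_lpm p_logit
```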

        The R-squared can always be interpreted as measuring the proportion of the variance of Y explained by the best linear approximation to E(Y|X). That doesn't change just because Y is binary. Of course, the usual R-squared never goes down when a new variable is added. Adjusted R-squared (crudely) penalizes adding more regressors -- better than nothing.
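        A small check of both points, assuming the nlsw88 example data and illustrative covariates: on a fixed sample, adding regressors does not lower R2, and the stored adjusted R2 matches the usual formula 1 - (1 - R2)(N - 1)/(N - k - 1), with k slope coefficients.

```stata
sysuse nlsw88, clear

* Fix one estimation sample so the two fits are nested
keep if !missing(union, age, grade, south, tenure)

quietly regress union age grade
scalar r2_small = e(r2)

quietly regress union age grade south tenure
scalar r2_big = e(r2)

* Adjusted R2 by hand; e(df_m) is the number of slope coefficients k
scalar r2a_manual = 1 - (1 - e(r2)) * (e(N) - 1) / (e(N) - e(df_m) - 1)

display "R2: " %6.4f r2_small " -> " %6.4f r2_big
display "adj. R2 stored = " %8.6f e(r2_a) "   by hand = " %8.6f r2a_manual
```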

        When economists use the LPM they tend not to worry about GOF because the focus is on partial effects. As Richard suggests, you might get a better approximation by using, say, logit.



        • #5
          Okay, thanks for your input. Unfortunately, I don't have much time for my analysis. If I estimated a logit model and compared it with my LPM results, would that work at all with the discretized income variable?



          • #6
            You can always compare coefficients in a linear model to the average partial effects in a logit.
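            A sketch of that comparison with a discretized regressor, assuming the nlsw88 example data, with wage quintiles (a hypothetical variable wageq built with xtile) standing in for income groups. For the factor levels, margins reports the average discrete change relative to the base group, which is what the LPM coefficients on those dummies estimate.

```stata
sysuse nlsw88, clear

* Hypothetical stand-in for discretized income: wage quintiles
xtile wageq = wage, nquantiles(5)

* LPM: coefficients are already partial effects on Pr(union = 1)
quietly regress union i.wageq age grade south
estimates store lpm

* Logit: average partial effects, posted so they can be tabulated
quietly logit union i.wageq age grade south
margins, dydx(*) post
estimates store ape

* Side by side; for wageq, both columns are differences from the base quintile
estimates table lpm ape, b(%9.4f) se
```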



            • #7
              Okay, perfect, thanks for your help. Conceptually, I think an LPM is appropriate. Because the research question basically involves endogeneity, more advanced approaches such as 2SLS IV or possibly a lagged dependent variable model would be better suited to estimating the effect well. Since such models are not required, I relied on a simple LPM; with clustered standard errors and multiple survey waves, it can give a good approximation, and a comparison with a logit would further support this.

