
  • "Issue" with large standard error of intercept

    Hi everyone,

    When we run a regression (e.g. ordinary least squares, or probit) with an intercept, if our estimate of the intercept has a very large standard error, does that say anything bad about the model? Does it have something to do with the skewness of the data? This is a comment from an otherwise quite helpful reviewer, but I have never paid much attention to the intercept; in particular, I've never heard of any relationship between the intercept's estimate and the skewness of the data. Perhaps - just a wild guess - the intercept isn't significant, so it should not have been included in the model in the first place?

    I'd appreciate any comments or pointers to the literature. Thanks!

  • #2
    Why not look at the data? If you have just one predictor, all the information you need goes on one scatter plot. Even if you have more than one, the usual diagnostic plots should still help anyway.

    • #3
      It helps to think about what the intercept means in both linear and nonlinear models. In a linear model, it is the predicted value of y -- in the population, the expected value -- when all covariates are set to zero. In many applications (perhaps even the vast majority), zero is not a possible value for a covariate. Even if it is a possible value, it might be very rare. So the intercept is E(y|x1 = 0, x2 = 0, ..., xk = 0), and this is often an impossible parameter to estimate well (even with a parametric model). Moreover, it is often not at all interesting, as x1 = 0, ..., xk = 0 is not an interesting population. For example, if I model college grade point average as a function of SAT score, high school GPA, family income, parents' education, and so on, I have no interest in predicting college GPA when all of the RHS variables are zero. Moreover, I will have a very hard time doing it -- there's no data there, or even close! It's a kind of multicollinearity when it comes to estimating the slopes, but it's easily explainable.

      The same is true for logit, probit, and other nonlinear models.

      By the way, this problem is very similar to the standard errors of a level term blowing up when you include an interaction between two variables, one of which cannot take on the value zero (or rarely does). Technically, the problem is one of multicollinearity. But, again, the problem has been manufactured by a poor parameterization: one cannot (and does not want to) estimate a partial effect at x = 0. Centering the variables solves the problem. Likewise, if you centered all variables about their means or medians before running the regression, the intercept would be meaningful and estimated very precisely. The slope coefficients wouldn't change.
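
      To put a rough illustration behind the centering point, here is a minimal sketch in Python with statsmodels on simulated data (the variable names and numbers are made up, not from the thread): the covariate lives far from zero, so the raw intercept extrapolates far outside the data and has a huge standard error, while centering leaves the slope untouched and makes the intercept precise.

      ```python
      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(0)
      n = 200
      x = rng.normal(loc=500, scale=20, size=n)   # covariate never gets near zero
      y = 2.0 + 0.1 * x + rng.normal(size=n)

      # Raw regression: the intercept is the prediction at x = 0, far outside the data.
      raw = sm.OLS(y, sm.add_constant(x)).fit()

      # Centered regression: the intercept is the prediction at the mean of x.
      centered = sm.OLS(y, sm.add_constant(x - x.mean())).fit()

      print(raw.params, raw.bse)            # intercept SE is large
      print(centered.params, centered.bse)  # same slope, intercept SE is small
      ```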

      I talk about the interaction problem in Chapter 6 of my book "Introductory Econometrics: A Modern Approach," 5e.

      JW

      • #4

        Thank you both very much for your replies! Centering the variables makes a lot of sense if one is interested in the value of the intercept, although in my case it really isn't all that interesting. I will be sure to check out the "Introductory Econometrics" textbook. Thanks again!

        • #5
          I tend to center my variables and look at the constant very often. To me it is the very first indicator, after estimating the model, of whether something is wrong, as you typically have a very good idea of approximately what that parameter should be when you center your independent variables. As such it gives me an extra check on whether the dependent variable has the scale I want, or a first indication that there are outliers in my data.

          I also find it a convenient "writing trick" to remind the readers what the unit of the dependent variable is within the results section without sounding too "teachy". I find this especially useful when discussing output from logistic regression: starting with one or two sentences discussing the baseline odds is a very natural way of reminding the readers about the difference between probabilities and odds.
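
          To make the baseline-odds point concrete, here is a minimal sketch in Python with statsmodels on simulated data (names and numbers are hypothetical): with the predictor centered, exp(constant) from a logit model is the odds of the outcome for an observation at the mean of the predictor.

          ```python
          import numpy as np
          import statsmodels.api as sm

          rng = np.random.default_rng(1)
          n = 1000
          age = rng.normal(40, 10, size=n)
          # True model: log-odds of -1.0 at the mean age, rising with age.
          p = 1 / (1 + np.exp(-(-1.0 + 0.05 * (age - age.mean()))))
          y = rng.binomial(1, p)

          fit = sm.Logit(y, sm.add_constant(age - age.mean())).fit(disp=0)
          print(np.exp(fit.params[0]))   # baseline odds: odds of y = 1 at the mean age
          ```
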
          Last edited by Maarten Buis; 20 Aug 2014, 02:21.
          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------

          • #6
            Thank you! The "writing trick" is a great idea.

            • #7
              My position is a little eccentric here. Despite its advantages, regressing e.g. (response - its mean) on (predictors - their means) doesn't always produce equations that are easy to compare between different studies, as observed means are bound to differ. But there is much to be said for shifting the origin to something more convenient, so long as it is fairly central within the observed range. For example, in studies of recent changes in hurricane or other major storm frequency, using the reported year as the predictor implies a time origin that is way outside the range of the data, which typically goes back a century or so at most (setting aside the detail that there was no year 0 between BCE and CE). Using (year - 2000) or (year - 1960) or whatever as the predictor gives the intercept more interest, as the rate in 2000 or 1960, and its standard error can also be taken more seriously.
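
              A minimal sketch of this origin-shifting trick, in Python with statsmodels and a made-up storm-count series (purely illustrative): the slope is identical whether the predictor is year or (year - 2000), but only in the shifted version does the intercept, the predicted level in 2000, have a standard error worth taking seriously.

              ```python
              import numpy as np
              import statsmodels.api as sm

              rng = np.random.default_rng(2)
              year = np.arange(1900, 2021).astype(float)
              counts = 10 + 0.03 * (year - 1960) + rng.normal(size=year.size)

              raw = sm.OLS(counts, sm.add_constant(year)).fit()             # origin at year 0
              shifted = sm.OLS(counts, sm.add_constant(year - 2000)).fit()  # origin at 2000

              print(raw.params[1], shifted.params[1])  # identical slope (trend per year)
              print(raw.bse[0], shifted.bse[0])        # intercept SE: huge vs. sensible
              ```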

              This seems to be something that every experienced data analyst recognises as a standard trick but which is rarely written up in texts. Counterexamples by way of specific references are warmly welcomed.

              • #8
                I don't think Nick is eccentric at all on this point. The key thing is to have zero be a meaningful value. Centering around the mean is one way of doing that but, as Nick notes, the mean will usually differ across studies. So, choosing some universally useful zero point is often good. For example, if GPA (Grade Point Average) ranges from 0 to 4, then subtracting 2 from each value makes zero correspond to a C GPA. Or, at least in the US, subtracting 12 from years of education makes 0 correspond to high school graduate.

                Also, I generally wouldn't center the dependent variable. I am not sure what that gains you and it may just make it harder to figure out what your results mean.

                And finally, I wouldn't center things like gender. Instead, keep in mind that a person with a score of 0 on everything might be a female black Catholic with 12 years of education. Or whatever it is you have set your 0 values to equal.
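
                A minimal sketch of picking a universally meaningful zero, in Python with statsmodels and invented wage/education data (names and numbers are hypothetical): subtracting 12 from years of education makes the intercept the predicted outcome for a high school graduate, without touching the slope.

                ```python
                import numpy as np
                import statsmodels.api as sm

                rng = np.random.default_rng(3)
                n = 500
                educ = rng.integers(8, 21, size=n).astype(float)   # 8 to 20 years of schooling
                wage = 8 + 1.5 * (educ - 12) + rng.normal(scale=3, size=n)

                # Shift so that zero means "high school graduate" (12 years of education).
                fit = sm.OLS(wage, sm.add_constant(educ - 12)).fit()
                print(fit.params)   # intercept: predicted wage at exactly 12 years of education
                ```
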
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
                Stata Version: 17.0 MP (2 processor)

                EMAIL: [email protected]
                WWW: https://www3.nd.edu/~rwilliam

                • #9
                  Thanks a lot everyone - I really learned a lot here.
