  • Including a squared variable in logistic regression changes the direction of the relationship. Why?

    Hello everyone,

    I have a logistic regression with an extensive set of control variables. My dependent variable is electoral turnout in an election, and one of my independent variables is a continuous variable for age. In the country of interest, age shows a curvilinear relationship to electoral turnout, and therefore I also want to include age squared. My issue is that when I run the logistic regression without age squared, the relationship between electoral turnout and age is positive and not statistically significant. However, when I include age squared as well, the coefficient on age (not squared) is negative and statistically significant, while age squared shows a positive and statistically significant relationship. Normally, the age variable "should be" positive. Is it "normal" that the original variable shows a negative relationship while the squared variable shows a positive one, or vice versa? Or is this because I've done something wrong? I can't see any problems with the age and age squared variables.

    The first picture shows the results with the squared variable excluded, and the second picture shows them with age squared included. "ålderkont" is the unsquared variable, while "ålderkont2" is the squared variable.
    [Image: stataage.png]

    [Image: stataagesquared.png]

    Thank you very much in advance.

    Best regards,
    Kajsa

  • #2
    How about you demean the age variable (subtract its mean) and try again? Then the sign of the age coefficient will be the sign of the effect evaluated at the mean of age.

    • #3
      Also, in modern Stata you do not need to form squares beforehand (most of the time, if the command accepts factor variables); you can do it on the fly. So do something like this:

      Code:
      * Center age at its sample mean, then fit a quadratic in centered age
      summarize age
      generate agedemeaned = age - r(mean)

      logit y c.agedemeaned c.agedemeaned#c.agedemeaned

      • #4
        If you add a squared term, then the main effect of age is the effect of age for a newborn baby (age 0). Obviously, in most countries newborns are not eligible to vote, but the model does not care about that; it will happily extrapolate. So the -.433 is not that meaningful. However, we can see that age will continue to have a negative effect until someone is a bit under 50. Remember when in school you were given a parabola and had to determine the value of x where the maximum or minimum happened? The general form for the parabola was \(y = a x^2 + b x + c\), and the location of the extremum was \(\frac{b}{-2a}\). So the extremum in your case is \(\frac{-.4333599}{-2 \times 0.0044896} \approx 48.3\).
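
        As a rough sketch, the same turning point can be computed from the fitted model rather than by hand. This assumes the model is refit with factor-variable notation; y is only a placeholder for the turnout variable, and the controls are omitted. nlcom also reports a delta-method standard error for the turning point.

        Code:
        * y is a placeholder for the turnout variable; add the controls as needed
        logit y c.ålderkont##c.ålderkont

        * turning point -b/(2a) of the quadratic in the linear predictor
        nlcom (turning_point: -_b[ålderkont] / (2 * _b[c.ålderkont#c.ålderkont]))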

        I can think of two explanations.

        First, in some countries the first election you can participate in could be something special, leading to a spike at the age when that cohort could first participate in the elections (that is not necessarily the minimum age, as you need to have passed the minimum age and there has to be an election in order to participate in one). If that is the case, then a parabola is probably not a good functional form. That would be a good candidate for the techniques discussed in M.L. Buis (2020) "Stata tip 135: Leaps and bounds", The Stata Journal, 20(1), pp. 244-249. (http://www.maartenbuis.nl/publications/leaps.html).

        Second, this could fit a typical life course: early in life people still have a lot of time, then they have children and a career (the rush hour of life) and have less time, and as the kids become more independent and the career becomes more established, they have more time again to participate in elections. This might work for a parabola. However, we might be missing phases this way, e.g. what happens after retirement. By definition, a parabola can only have one extremum. So it still may not be the best parameterization.
        Last edited by Maarten Buis; 24 Jul 2020, 04:58.
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------

        • #5
          Hi again,

          Thank you very much for your wise comments.

          When I demeaned the variable, the direction of the relationship changed: agedemeaned now has a positive coefficient, and the squared agedemeaned has a negative one. How do I best interpret this result? And what are the implications of using demeaned variables (if there are any)? I'm sorry for not understanding this completely.

          I took a look at the paper you recommended, and it was very helpful. When I used the command "margins, at (ålderkont=(20 30 40 50 60 70 80))" and then "marginsplot, noci plotopts(msymbol(i))", I got the result below. It seems to be as you say: the lowest point is at around 50 years old.

          [Image: demeaned age.png]

          Best regards,
          Kajsa

          • #6
            You can see the discussion here regarding how you interpret demeaned (also called centered) variables:
            https://www.statalist.org/forums/for...-in-panel-data

            In short, once the variable is centered, the coefficient on the main effect (disregarding the coefficient on the squared effect) gives you the effect of the variable evaluated at its mean.
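
            A short derivation may help here. Write \(a\) and \(b\) for the coefficients on age squared and age in the uncentered model (the same symbols as the parabola in #4), and let \(m\) be the sample mean of age and \(x_c = x - m\) the centered variable; \(m\) and \(x_c\) are notation introduced only for this sketch.

            \[ b x + a x^2 = b (x_c + m) + a (x_c + m)^2 = (b + 2 a m)\, x_c + a x_c^2 + (b m + a m^2) \]

            So the centered main effect is \(b + 2am\), which is the slope of the linear predictor at the mean, while \(b\) is the slope at age 0. The two differ in sign whenever 0 and the mean lie on opposite sides of the turning point \(-b/(2a)\). In this algebra the coefficient on the squared term is \(a\) in both versions, and the constant absorbs \(bm + am^2\).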

            Note that you do not need to do this if you go for -margins-. If you go for -margins-, margins will tell you all you need to know about the effect.
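
            For example, a minimal sketch of the -margins- route, assuming the squared term is entered with factor-variable notation; y is a placeholder for the turnout variable and the controls are omitted:

            Code:
            logit y c.ålderkont##c.ålderkont

            * average marginal effect of age on Pr(y = 1) at selected ages
            margins, dydx(ålderkont) at(ålderkont=(20(10)80))
            marginsplot

            The sign of the plotted effect at each age then shows directly where the relationship is negative and where it turns positive, without any centering.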

            • #7
              This handout discusses how to interpret main effects once interaction or squared terms are added to the model.

              https://www3.nd.edu/~rwilliam/stats2/l53.pdf

              As Maarten notes, once higher level terms are added the main effect may not be too meaningful. As Joro says, centering can be an aid to interpretation, but isn't necessary if you use margins instead.
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
              StataNow Version: 19.5 MP (2 processor)

              WWW: https://www3.nd.edu/~rwilliam

              • #8
                Thank you very much for your quick responses!!

                Best regards,
                Kajsa

                • #9
                  The probability of voting seems extremely high to me, with the lowest chance just under 97%. That suggests to me that you have a serious problem in your data.

                  Can you report the results from tab v7007?
                  ---------------------------------
                  Maarten L. Buis
                  University of Konstanz
                  Department of history and sociology
                  box 40
                  78457 Konstanz
                  Germany
                  http://www.maartenbuis.nl
                  ---------------------------------

                  • #10
                    Good catch by Maarten. Also, in the original post, ålderkont2 was computed by the user, not by factor variable notation. If factor variable notation is still not being used, then margins will get totally screwed up because it will not know that changes in ålderkont must also produce changes in ålderkont2.
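
                    A hedged sketch of what that looks like, reusing the margins and marginsplot commands from #5; y is a placeholder for the turnout variable and the controls are omitted:

                    Code:
                    * factor-variable notation tells margins that the square moves with ålderkont
                    logit y c.ålderkont##c.ålderkont

                    * predicted probabilities at selected ages
                    margins, at(ålderkont=(20(10)80))
                    marginsplot, noci plotopts(msymbol(i))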
                    -------------------------------------------
                    Richard Williams, Notre Dame Dept of Sociology
                    StataNow Version: 19.5 MP (2 processor)

                    WWW: https://www3.nd.edu/~rwilliam
