  • Including a dummy for a continuous variable and the same continuous variable in a regression

    Hi all,

    I am running a linear regression (with school fixed effects) of pupil test scores on a continuous variable which represents the % of staff the pupil is taught by that have a certain type of qualification (and various controls).

    60% of the pupils are not taught by any staff with this qualification, so the variable is 0 for them. For the other 40% of pupils, the variable is greater than 0 and at most 1.

    I am interested in knowing both:

    a) Does being taught by just one teacher with this qualification (versus zero teachers) have an impact on test scores?

    b) Given that you are being taught by at least one teacher with this qualification, does increasing the % of teachers you are taught by who have this qualification impact your test score?

    Based on this I was planning to create a dummy variable using my continuous variable, which is equal to 0 if the % of staff the pupil is taught by is 0, and 1 if the % is greater than 0.

    I was then going to run my regression, including BOTH this dummy variable and the continuous variable. Would the coefficients on the dummy and the continuous variable help me to answer a) and b) respectively?

    Many thanks,
    Martin

  • #2
    Not exactly. The coefficient of your indicator ("dummy") variable would represent the effect of being taught by one or more teachers with the qualification, not being taught by just one such teacher. The coefficient of the continuous variable would answer question b.
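
    The setup under discussion can be sketched in Stata roughly as follows. This is only an illustration: the variable names (score, pct_qual, school_id, and the controls x1 x2) are hypothetical placeholders, and using -xtreg, fe- for the school fixed effects is an assumption, since the thread does not say which command is used.

```stata
* Hypothetical names: score (test score), pct_qual (share of qualified staff, 0 to 1),
* school_id (school identifier), x1 x2 (stand-ins for the controls)

* Indicator: 1 if the pupil is taught by any staff with the qualification, 0 if none
generate byte any_qual = (pct_qual > 0) if !missing(pct_qual)

* Linear regression with school fixed effects, including both variables
xtset school_id
xtreg score i.any_qual c.pct_qual x1 x2, fe
```

    With this coding, the coefficient on 1.any_qual refers to "any versus none", and the coefficient on pct_qual refers to changes in the share among pupils who have at least one such teacher.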



    • #3
      Martin:
      if pupils are nested within classes, you may want to test whether a -mixed- model fits your data better than an OLS.
      Kind regards,
      Carlo
      (Stata 19.0)
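
      A minimal sketch of the comparison suggested here, assuming a hypothetical class identifier class_id and the same placeholder names as elsewhere (score, pct_qual, x1 x2). It fits a random intercept for class only; how the school fixed effects should be handled alongside it is left open.

```stata
* Hypothetical names throughout; skip the next line if any_qual already exists
generate byte any_qual = (pct_qual > 0) if !missing(pct_qual)

* Random-intercept model with pupils nested within classes
mixed score i.any_qual c.pct_qual x1 x2 || class_id:
```

      The footer of the -mixed- output reports a likelihood-ratio test against the ordinary linear model, which is one way to judge whether the random intercept is needed.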



      • #4
        Thanks for the replies.

        This is actually what I meant to say, Clyde, but I phrased it incorrectly. Just to check, does including both the dummy and continuous variable have any effect on the validity of the estimates?



        • #5
          Martin:
          a multicollinearity issue may arise, in which case Stata will omit one of the two variables.
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            What Carlo says in #5 is possible, but unlikely unless the non-zero part of the continuous variable's distribution is highly spiked. The validity of the estimates is not impaired by including both variables. The constant term in the regression will be the expected test score when the student has not been taught by any teacher with the special qualification. When the student has been taught by one or more such teachers, the expected test score will equal the constant term + the coefficient of the indicator variable + the number of such teachers * the coefficient of the continuous variable. The marginal effect of an additional teacher (except when going from 0 to 1 such teacher) will be the coefficient of the continuous variable. And the marginal effect of going from zero teachers to one will be the coefficient of the indicator + the coefficient of the continuous variable.
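
            Written out as equations, the interpretation above looks like this. The symbols are introduced here for illustration only: $x$ is the continuous variable, $D = \mathbf{1}[x > 0]$ is the indicator, and the controls and school fixed effects are left out for brevity.

```latex
\begin{align*}
E[y \mid x] &= \beta_0 + \beta_1 D + \beta_2 x, \qquad D = \mathbf{1}[x > 0] \\
E[y \mid x = 0] &= \beta_0 \\
E[y \mid x > 0] &= \beta_0 + \beta_1 + \beta_2 x \\
\frac{\partial E[y \mid x > 0]}{\partial x} &= \beta_2 \\
E[y \mid x = 1] - E[y \mid x = 0] &= \beta_1 + \beta_2
\end{align*}
```

            If $x$ counts teachers, the last line is the step from zero teachers to one, as described above. If $x$ is a share between 0 and 1, as in #1, the same expression compares a pupil with $x = 1$ to a pupil with $x = 0$, and the step from 0 to any other positive value $x$ is $\beta_1 + \beta_2 x$.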

            If you are looking for a single averaged estimate of the effect of any teachers vs none, then you should run the regression with only the indicator variable and use its coefficient as the estimate.
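
            Both estimates can be obtained along the following lines. This is a sketch with hypothetical variable names (score, pct_qual, school_id, x1 x2), and it assumes -xtreg, fe- for the school fixed effects, which the thread does not specify.

```stata
* Hypothetical names throughout; skip the next line if any_qual already exists
generate byte any_qual = (pct_qual > 0) if !missing(pct_qual)
xtset school_id

* Full model: indicator plus continuous variable
xtreg score i.any_qual c.pct_qual x1 x2, fe
* Sum of the indicator and continuous coefficients, with a confidence interval
lincom 1.any_qual + pct_qual

* Single averaged effect of any such teachers vs. none: indicator only
xtreg score i.any_qual x1 x2, fe
```

            -lincom- reports the point estimate, standard error, and confidence interval for the sum, so the two coefficients do not have to be added by hand.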
