
  • Does an instrumental variable address selection bias?

    Hello Everyone,

    I am using two main explanatory variables in my study: one is a continuous variable and the other is a dummy variable. Here is a simple example:

    Suppose I want to study the relationship between the degree of democracy and salary across countries. I have:

    Democracy degree = alpha + beta * salary + u

    This is a basic OLS model.

    I then measure salary on a scale from 1 to 10, where 1 is the lowest and 10 is the highest, and use this as my continuous variable. I also create a dummy variable equal to 1 if the salary level is between 5 and 10, and 0 otherwise.

    Now I would argue that a country's education level can affect people's salary level, so I will use a variable called "Education level" as an instrument for salary level. (This is only an example; please set aside the potential relationship between democracy and education level.)

    First, I ran an IV model in Stata with the salary level (continuous variable) instrumented by education level, and I found that the variable is endogenous. Then I ran the IV model with the dummy variable in place of the continuous variable, using the same instrument (education level), but this time I did not find endogeneity. Can I conclude that measuring salary level with the dummy variable does not suffer from endogeneity, and therefore use OLS with the dummy variable measure?

    Actually, my main question is this: both of my measures relate to salary level, one continuous and one a dummy. Can I use the same IV for both measures?

    Thank you very much for all your help!

    Chen




    Last edited by Chen Huang; 06 Mar 2016, 09:15.

  • #2
    Can anyone help?



    • #3
      This isn't very clear, but I would definitely recommend against categorizing salary since it creates an ordinal measure (not a continuous measure).



      • #4
        Hi Chen,

        To find evidence that there is no self-selection or selection bias, you need to run another regression and obtain non-significant coefficients.

        For example, suppose I want to study mayors who run for reelection and the reasons why they get reelected.

        In this case, the source of selection bias is the decision to run for reelection.

        Therefore, I must estimate a model for the variable "running for reelection" (1 if they run and 0 otherwise). The x variables will be the same ones I will use in my model to predict reelection. The coefficients I get should not be significant.

        In this manner, I find evidence that there is no selection bias or self-selection.

        I hope that helps.

        Maybe, you could tell us more about your hypothesis and your data so that we can provide a better answer.

        Best,

        Jorge



        • #5
          Jorge L. Guzman, the model wouldn't be identified in this example. There is still a need for an instrument (i.e., a variable that enters the model for deciding to run for reelection but is not included in the equation predicting whether or not they are reelected). However, that wouldn't necessarily be sufficient to determine that there is no selection bias. For example, if your instrument was pressure/encouragement from a spouse/significant other, we could reasonably argue that their vote would not directly affect the outcome (unless their vote was the sole winning ballot). So it sounds like a decent instrument to adjust for selection, but it doesn't adjust for other factors (e.g., competing job offers) which are also correlated with the errors in both models. Regardless of the p-value on the parameter estimates, I would say a better approach would be to consider whether or not that variable is part of the production function for your selection model. In the example, even if pressure from significant others/spouses does not pass the p-value threshold, it is still serving to refine the estimates of the other model parameters.


          Originally posted by Chen Huang
          Hello Everyone,

          I am using two main explanatory variables in my study: one is a continuous variable and the other is a dummy variable. Here is a simple example:

          Suppose I want to study the relationship between the degree of democracy and salary across countries. I have:

          Democracy degree = alpha + beta * salary + u

          This is a basic OLS model.

          The bigger issue that I see is that the model does not seem to be specified correctly. Unless I'm missing something, the degree of democracy would not be an individual trait, but the description of the covariates sounds like they are individual measures. If democracy degree is measured at the same level as salary and educational level, maybe it would help to have the additional context. Beyond that, categorizing the variable is going to place unreasonable assumptions on the data (e.g., would the distance between the first and second levels of salary really be the same as the distance between the ninth and tenth?). It also doesn't seem too likely that educational level would not directly affect the degree of democracy in a country.
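
          A minimal simulation sketch of the categorization issue may help here. It is written in plain Python rather than Stata, and everything in it is an assumption made for illustration: the data are simulated, the variable names (education, confounder, salary, democracy, high_salary) are hypothetical, the cutoff for the dummy is the middle of the simulated distribution, and education is assumed to be a valid instrument, which the paragraph above questions for real data. Salary is endogenous by construction, so the sketch shows what happens to that endogeneity when salary is dichotomized.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)


def corr(a, b):
    """Sample correlation."""
    return cov(a, b) / (cov(a, a) * cov(b, b)) ** 0.5


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Just-identified IV slope of y on x, using z as the only instrument."""
    return cov(z, y) / cov(z, x)


random.seed(12345)
n = 50_000
beta = 1.0  # assumed true effect of (continuous) salary on democracy

# Hypothetical data-generating process: the confounder drives both salary
# and democracy, which is what makes salary endogenous.
education = [random.gauss(0, 1) for _ in range(n)]
confounder = [random.gauss(0, 1) for _ in range(n)]
salary = [z + u + random.gauss(0, 1) for z, u in zip(education, confounder)]
democracy = [beta * x + u + random.gauss(0, 1) for x, u in zip(salary, confounder)]

# Dichotomized version of salary (1 = upper half of the scale).
high_salary = [1.0 if x > 0 else 0.0 for x in salary]

ols_continuous = ols_slope(salary, democracy)
iv_continuous = iv_slope(education, salary, democracy)
ols_dummy = ols_slope(high_salary, democracy)
iv_dummy = iv_slope(education, high_salary, democracy)

print("corr(salary, confounder):     ", round(corr(salary, confounder), 3))
print("corr(high_salary, confounder):", round(corr(high_salary, confounder), 3))
print("continuous salary: OLS", round(ols_continuous, 3), "IV", round(iv_continuous, 3))
print("dummy salary:      OLS", round(ols_dummy, 3), "IV", round(iv_dummy, 3))
```

          In this setup the dummy remains correlated with the unobserved confounder, so dichotomizing does not make the regressor exogenous. A test that fails to reject exogeneity for the dummy is therefore not, on its own, evidence that OLS is safe. The IV coefficient on the dummy is also on a different scale from the coefficient on continuous salary, so the two estimates are not directly comparable.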



          • #6
            Thank you, wbuchanan.

            I did not really understand Chen's question. I think he needs to tell us more about his regression, thesis, and data so that we can help him.

            Are you sure about the selection bias? I am aware there are many ways of looking at it. In the example I provided there were two steps.

            The main question was: why do mayors get reelected?

            The question of selection bias here is: why do they decide to run for reelection?

            The explanatory variables for why they get reelected have to be uncorrelated with the decision to run for reelection. In this way, there is no self-selection. Mayors who did a great job might want to run for reelection, and that same great job allows them to get reelected. But what if something else is going on? What if they feel old? What if they feel that there are other factors that prevent them from running for reelection? In that case, the same explanatory variables used for the main equation would not be significant. That is the intuition.

            I have seen difference-in-differences experiments where people test whether the same reasons that determine the success of a social program are the reasons why people join. They do this in order to test for randomness and selection bias. If the coefficients in the model of the decision to get involved in a social program are not significant, then there is no self-selection bias.

            That is the intuition.

            How would you correct for self-selection?

            Best,

            Jorge



            • #7
              Jorge L. Guzman, I agree with you. The point where I think we start to depart from our shared opinion is that there needs to be an "instrument" (i.e., a variable that would be used to model the selection process but would not be included in the modeling of the outcome) included in the model. The example as you first explained it only mentioned using the same variables that covary with the endogenous regressor in the outcome equation. So if reelection depends on job approval, political capital, campaign finances, and school success (e.g., the local schools), using only those variables to predict the probability that the candidate will seek reelection does not deal with the selection issue. If, however, the selection equation includes another variable which measures how old the candidate feels, the age they feel will not directly affect whether or not they get reelected by any means other than how it affects whether or not they'll seek reelection. So adding the additional variable to the selection equation (i.e., the instrument) provides the requisite identifying information.
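
              The role of the extra selection variable can be sketched with a small simulation. This is only an illustration under stated assumptions, not a full Heckman estimator: the data are simulated, the names (approval, feels_old, margin) are hypothetical, the outcome is a continuous vote margin rather than a binary reelection indicator, and the selection index is treated as known, whereas in practice it would be estimated with a probit of the decision to run. The correction term is the inverse Mills ratio evaluated at that index.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def inverse_mills(c):
    """E[v | v > -c] for a standard normal v, i.e. pdf(c) / cdf(c)."""
    return STD_NORMAL.pdf(c) / STD_NORMAL.cdf(c)


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)


def slope_simple(x, y):
    """Coefficient on x from OLS of y on an intercept and x."""
    return cov(x, y) / cov(x, x)


def slope_with_control(x, c, y):
    """Coefficient on x from OLS of y on an intercept, x, and a control c."""
    sxx, scc, sxc = cov(x, x), cov(c, c), cov(x, c)
    sxy, scy = cov(x, y), cov(c, y)
    return (sxy * scc - scy * sxc) / (sxx * scc - sxc ** 2)


random.seed(2016)
n = 40_000
true_slope = 1.0  # assumed effect of job approval on the vote margin

approval, mills, margin = [], [], []
for _ in range(n):
    x = random.gauss(0, 1)          # job approval: appears in both equations
    feels_old = random.gauss(0, 1)  # instrument: selection equation only
    v = random.gauss(0, 1)          # unobserved factor shared by both equations
    index = x - feels_old           # observed part of the decision to run (assumed known)
    if index + v > 0:               # the mayor runs, so the outcome is observed
        approval.append(x)
        mills.append(inverse_mills(index))
        margin.append(true_slope * x + v + random.gauss(0, 1))

naive = slope_simple(approval, margin)
corrected = slope_with_control(approval, mills, margin)

print("mayors who run:", len(approval), "of", n)
print("naive slope on the selected sample:   ", round(naive, 3))
print("slope with inverse Mills ratio control:", round(corrected, 3))
```

              Because feels_old moves the selection index without entering the outcome equation, the correction term varies independently of job approval. If it were dropped from the index, the correction term would be a function of approval alone and identification would rest only on its nonlinearity.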

