  • 2sls Regression with Categorical Endogenous Variable with Interaction Terms

    Hello everybody,

    I am trying to run an instrumental variables regression where my endogenous variable is categorical (I created two dummy variables to represent it) and where I also need interaction terms.

    The set up is as follows:
    - Primary dependent variable: Y (continuous)
    - Exogenous independent variables: X
    - Endogenous variable: D (categorical with 3 possible values; I created 2 dummy variables, D1 and D2)
    - Instrument: Z
    - Exogenous control variable: C
    - Interaction terms: D*X

    I have tried running it with the following code:

    ivreg2 Y C X (D1 D2 D1#C.X D2#C.X = Z Z#C.X), robust

    However, I am just not sure whether this is the right way to go given that I have two binary endogenous variables with interaction.


    I've also tried running it the following way, but it keeps giving me the error message: "D1_hat: factor variables may not contain noninteger values"

    probit D1 X C Z, vce(robust)
    predict D1_hat

    probit D2 X C Z, vce(robust)
    predict D2_hat

    ivreg2 Y C X (D1 D2 D1#C.X D2#C.X = D1_hat D2_hat D1_hat#C.X D2_hat#C.X)


    I have read other similar postings such as https://www.statalist.org/forums/for...enous-variable but wasn't able to figure it out.

    How should I approach this question using Stata?

    Thank you so much in advance for any advice!


  • #2
    Welcome to Statalist. You will increase your chances of a useful answer by following the FAQ on asking questions: provide Stata code in code delimiters, readable Stata output, and sample data using dataex.

    If you had just one dummy variable for the endogenous variable, then it would be relatively straightforward. Two-stage least squares would be consistent, and you could also do it in GSEM.

    Are these categories ordered or not? The user-written cmp command might handle this problem. Otherwise you may be forced to use GSEM.



    • #3
      Hi Phil,

      Thank you for your advice!

      The categories represent types of occupation and are not ordered. Would that make it simpler?

      Thank you again!



      • #4
        Danny Chung: Your general approach is fine (if D1 and D2 are endogenous, and you interact them with exogenous regressors, the resulting interaction terms are endogenous regressors) but your syntax examples do not make sense. Something must have been lost in translation when you tried to shorten your list of variables using the X C Z notation. Perhaps you could share with us the full command lines that you have executed?

        Based on the information that you have provided (plus working assumptions that I've made to fill in missing information), there are three comments I would like to share.

        (1) Your first approach seems OK. If Z, C, and X are exogenous, you can use their interaction terms as extra instruments (keeping in mind the usual caveat about weak instruments).
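
        To make (1) concrete, here is a minimal sketch on simulated data. It is an illustration under stated assumptions, not the commands from the original post: Z1 and Z2 are two hypothetical excluded instruments (four endogenous regressors need at least four excluded instruments), the product variables D1X, D2X, Z1X and Z2X are names made up here, and the official ivregress is used in place of ivreg2 so that nothing needs to be installed.

        ```stata
        * Simulated data; every name and coefficient below is an assumption
        clear
        set seed 12345
        set obs 2000
        generate X  = rnormal()
        generate C  = rnormal()
        generate Z1 = rnormal()
        generate Z2 = rnormal()
        * u is the structural error; it also enters the choice indices,
        * which is what makes D1 and D2 endogenous
        generate u  = rnormal()
        generate s1 = Z1 + 0.5*u + rnormal()
        generate s2 = Z2 + 0.5*u + rnormal()
        * Three unordered categories: base (neither), category 1, category 2
        generate byte D1 = (s1 > 0 & s1 >= s2)
        generate byte D2 = (s2 > 0 & s2 >  s1)
        generate Y = 1 + C + X + D1 + 2*D2 + 0.5*D1*X - 0.5*D2*X + u

        * Explicit products, so endogenous terms and instruments are easy to count
        generate D1X = D1 * X
        generate D2X = D2 * X
        generate Z1X = Z1 * X
        generate Z2X = Z2 * X

        * Four endogenous regressors, four excluded instruments (just identified)
        ivregress 2sls Y C X (D1 D2 D1X D2X = Z1 Z2 Z1X Z2X), vce(robust)

        * First-stage diagnostics, for the weak-instrument caveat
        estat firststage
        ```

        With real data, only the generate lines for the products and the ivregress call are needed, with your own instrument names in place of Z1 and Z2.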

        (2) Mechanically, your second approach resulted in an error message because D1_hat and D2_hat are not categorical variables anymore: they're continuous in (0,1) and should be treated as such. You can prefix D1_hat and D2_hat as c.D1_hat and c.D2_hat, and my guess is that this will solve the mechanical problem. Having said that, don't go for this second approach, unless you have more than 2 variables in Z. Alternatively, let the RHS of your probit models include what you have called Z C X, as well as what you have called Z#C.X.
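
        A minimal sketch of the mechanical fix in (2), again on simulated data and under assumptions of my own: Z1 and Z2 are hypothetical instrument names, the probit right-hand side includes the instrument-by-X interactions as suggested above, and the official ivregress stands in for ivreg2. The fitted probabilities are used as instruments, not plugged in as regressors, and they carry the c. prefix because they are continuous.

        ```stata
        * Simulated data; every name and coefficient below is an assumption
        clear
        set seed 12345
        set obs 2000
        generate X  = rnormal()
        generate C  = rnormal()
        generate Z1 = rnormal()
        generate Z2 = rnormal()
        generate u  = rnormal()
        generate s1 = Z1 + 0.5*u + rnormal()
        generate s2 = Z2 + 0.5*u + rnormal()
        generate byte D1 = (s1 > 0 & s1 >= s2)
        generate byte D2 = (s2 > 0 & s2 >  s1)
        generate Y = 1 + C + X + D1 + 2*D2 + 0.5*D1*X - 0.5*D2*X + u

        * First-stage probits, including instrument-by-X interactions
        probit D1 X C Z1 Z2 c.Z1#c.X c.Z2#c.X, vce(robust)
        predict D1_hat, pr
        probit D2 X C Z1 Z2 c.Z1#c.X c.Z2#c.X, vce(robust)
        predict D2_hat, pr

        * D1_hat and D2_hat are continuous, so interact them as c.D1_hat#c.X;
        * writing D1_hat#c.X would trigger the "noninteger values" error
        ivregress 2sls Y C X (D1 D2 c.D1#c.X c.D2#c.X = ///
            D1_hat D2_hat c.D1_hat#c.X c.D2_hat#c.X), vce(robust)
        ```

        The /// line continuation works in a do-file; when typing interactively, put the ivregress command on one line.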

        (3) If you have a copy of Wooldridge's cross sectional and panel data textbook (2nd ed), you may want to study the control function approach and consider it as an alternative to 2SLS. I'm afraid that I cannot provide the exact page reference, as I do not have a copy of the book with me at the moment.
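
        For orientation only, a control function estimate could look roughly like the sketch below. This is a simplified linear version on simulated data, not a transcription of the textbook's procedure: Z1, Z2, the product variables and the residual names v1 and v2 are all hypothetical, and the first stages are plain linear regressions. For binary regressors, a probit-based first stage with generalized residuals is an alternative. The idea is to add the first-stage residuals to the outcome equation as extra regressors.

        ```stata
        * Simulated data; every name and coefficient below is an assumption
        clear
        set seed 12345
        set obs 2000
        generate X  = rnormal()
        generate C  = rnormal()
        generate Z1 = rnormal()
        generate Z2 = rnormal()
        generate u  = rnormal()
        generate s1 = Z1 + 0.5*u + rnormal()
        generate s2 = Z2 + 0.5*u + rnormal()
        generate byte D1 = (s1 > 0 & s1 >= s2)
        generate byte D2 = (s2 > 0 & s2 >  s1)
        generate Y = 1 + C + X + D1 + 2*D2 + 0.5*D1*X - 0.5*D2*X + u
        generate D1X = D1 * X
        generate D2X = D2 * X
        generate Z1X = Z1 * X
        generate Z2X = Z2 * X

        * Step 1: linear first stages on all exogenous variables; keep residuals
        regress D1 C X Z1 Z2 Z1X Z2X
        predict double v1, residuals
        regress D2 C X Z1 Z2 Z1X Z2X
        predict double v2, residuals

        * Step 2: outcome equation with the residuals as control functions
        regress Y C X D1 D2 D1X D2X v1 v2, vce(robust)

        * Joint significance of v1 and v2 is evidence against exogeneity of D
        test v1 v2
        ```

        The standard errors in step 2 ignore that v1 and v2 are estimated, so they would need a correction (bootstrapping both steps together is one common choice) before being used for inference on the other coefficients.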
        Last edited by Hong Il Yoo; 24 Apr 2020, 05:25.



        • #5
          Hong Il Yoo

          Hi Hong Il,

          Thank you so much for your advice! I will follow your suggestions for the 2SLS approach and also look up the control function approach.

          Thank you again!



          • #6
            Hong Il Yoo
            I am working with an outcome variable (Y) that follows a compound Poisson distribution. My model has a categorical endogenous variable, X, for which I have an instrumental variable, Z.
            The control variables are P, Q and R.
            What would be the correct approach to instrumenting X, given that Y is compound Poisson?
            I would be extremely grateful if you could kindly offer some suggestions, Sir.



            • #7
              Deboshmita Brahma: I'm afraid that I don't know enough about count data models in Stata to answer this question.
