  • Questions about 2sls regression with skewed cross-sectional data

    Dear all,

    I have cross-sectional data (independent variables in period t, dependent variables in period t+1), and the variables are highly skewed. I want to run a 2SLS regression. I have read a lot of material about 2SLS, but most of it focuses on panel data and ignores the skewness issue. So I want to know how to run a 2SLS regression with highly skewed cross-sectional data in Stata. In particular, what is the exact command for testing the instrumental variables in the first stage, and what is the command for the 2SLS regression itself?

    Thanks in advance,
    David

    crosspost at:
    http://stats.stackexchange.com/quest...sectional-data
    Last edited by David Lu; 06 Jul 2016, 08:30.

  • #2
    Dear David,

    The fact that the data is skewed does not invalidate the usual 2SLS. Having said that, depending on what exactly you are doing, the skewness of the data may make the standard linear specification inappropriate. In that case, you may want to use -ivpoisson- which essentially estimates an exponential model with endogeneity.

    All the best,

    Joao
    Last edited by Joao Santos Silva; 06 Jul 2016, 12:53. Reason: Correcting typo.
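
For the commands asked about in the first post, here is a minimal sketch that uses the auto dataset shipped with Stata rather than David's data. Treating price as the outcome, mpg as the endogenous regressor, and weight and turn as instruments is purely an assumption for illustration; it is not meant as a sensible economic model.

```stata
* Illustrative only: auto.dta and these variable roles are stand-ins
sysuse auto, clear

* 2SLS with heteroskedasticity-robust standard errors
ivregress 2sls price headroom (mpg = weight turn), vce(robust)

* Postestimation checks
estat firststage    // first-stage F statistic and partial R-squared
estat endogenous    // test of whether mpg can be treated as exogenous
estat overid        // overidentification test; needs more instruments than endogenous regressors
```

Note that -estat firststage- speaks only to instrument relevance. Whether the instruments are exogenous still has to be argued from the substance of the problem.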

    • #3
      Dear Joao,

      Thank you for your reply. Yes, I agree with you. Currently I use -ivregress gmm- to estimate the model with endogeneity. The commands I used are as follows:

      Code:
      ivregress2 gmm dv1 c.iv1##c.iv2##c.iv3  (cv1 cv2 cv3 = ins1 ins2 ins3 ins4),first
      est restore first
      outreg2 using myfile, cttop(first) replace
      est restore second
      estat firststage
      local fstat `r(mineig)'
      estat endogenous
      local p_durbin `r(p_durbin)'
      estat overid
      outreg2 using myfile, cttop(second) word addtex(IV F-stat, `fstat', Durbin pval, `p_durbin')
      I use -pwcorr- to screen potential instrumental variables (ins1 ins2 ins3 ...). If a candidate is not significantly correlated with the dependent variable (dv1) but is significantly correlated with the potentially endogenous variables (iv1 iv2), I treat it as an appropriate instrument and then proceed to the regressions and postestimation tests. Since I could not find a relevant example in Stata that fits this case, I don't know whether this is acceptable. Does this procedure make sense to you?

      By the way, -ivpoisson gmm- does not run properly: it reports "initial weight matrix not positive definite". In that case, can I use -ivregress gmm- instead?

      Thanks again,
      David
      Last edited by David Lu; 07 Jul 2016, 07:12.

      • #4
        Dear David,

        The procedure you are using to choose instruments is not appropriate, because valid instruments may well be correlated with the dependent variable; indeed, a relevant instrument will usually be correlated with it through the endogenous regressor. You should use reasoning and economic theory to choose your instruments.

        About the error with -ivpoisson-, I would first estimate the model by simple Poisson regression and then use those estimates as starting values for -ivpoisson-.

        All the best,

        Joao
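
To illustrate the point about instrument choice, here is a minimal sketch of what can and cannot be checked from the data. It runs the first stage by hand on the auto dataset; the variable roles (mpg endogenous, headroom exogenous, weight and turn as excluded instruments) are assumptions made only for the example.

```stata
* Illustrative only: auto.dta and these variable roles are stand-ins
sysuse auto, clear

* First stage by hand: endogenous regressor on the included exogenous
* variable and the excluded instruments
regress mpg headroom weight turn, vce(robust)

* Joint significance of the excluded instruments (relevance)
test weight turn
```

This checks relevance only. A pairwise correlation between an instrument and the dependent variable says nothing about whether the instrument is uncorrelated with the error term, which is why that part has to come from theory.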

        • #5
          Dear Joao,

          Thank you for your reply and suggestions. You suggested estimating the model by simple Poisson regression and using those estimates as starting values for -ivpoisson-. Could you explain a bit how this works in Stata?

          Thanks again,
          David

          • #6
            Here is an example of how to do it:

            Code:
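            clear all
            sysuse auto
            * Step 1: plain Poisson regression, ignoring endogeneity
            poisson price mpg
            * Step 2: save the coefficient vector as starting values
            matrix b=e(b)
            * Step 3: GMM estimation with mpg instrumented by weight
            ivpoisson gmm price (mpg=weight), from(b)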

            • #7
              Dear Joao,

              Thank you for your explanation. I tried it and it works in the example. However, when I used it in my case, I got the error

              Code:
              initial matrix must have as many columns as parameters in model
              My commands are as follows:
              Code:
              poisson dv1 iv1t iv2t iv3
              ivpoisson gmm dv1 (iv1t iv2t iv3=cv3 cv4 cv5 cv61 cv71 cv8), from(b)
              I think it happens because I have more instrumental variables than RHS variables, but how can I use more than one instrumental variable to "replace" an endogenous variable in this case?

              Thank you,
              David
              Last edited by David Lu; 08 Jul 2016, 13:47.

              • #8
                I do not think your explanation is correct; did you create the matrix after poisson?
                Last edited by Joao Santos Silva; 08 Jul 2016, 13:53.
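
A quick way to check this is to inspect the starting-value matrix right before calling -ivpoisson-. The sketch below again uses the auto dataset as a stand-in. If b was left over from an earlier model, its number of columns will not match the number of parameters in the new model, which would be consistent with the error message quoted above.

```stata
* Illustrative only: auto.dta stands in for the real data
sysuse auto, clear

poisson price mpg
matrix b = e(b)

* Inspect the starting values: one column per parameter (mpg and _cons here)
matrix list b
display colsof(b)

* More instruments than endogenous regressors is fine; what has to match
* is the number of columns in b and the number of model parameters
ivpoisson gmm price (mpg = weight turn), from(b)
```

With several regressors it is also worth confirming that the columns of b are in the order the model expects; I believe the matrix is matched by position rather than by column name, but treat that as an assumption to verify in the manual.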

                • #9
                  Yes, here are all the commands I used:

                  Code:
                  poisson dv1 iv1t iv2t iv3
                  matrix b=e(b)
                  ivpoisson gmm dv1 (iv1t iv2t iv3=cv3 cv4 cv5), from(b)
                  It works in this case, but it fails when the number of instrumental variables is larger than the number of variables on the left-hand side of the equals sign. For example,

                  Code:
                  poisson dv1 iv1t iv2t iv3
                  matrix b=e(b)
                  ivpoisson gmm dv1 (iv1t iv2t iv3=cv3 cv4 cv5 cv61 cv71 cv8), from(b)
                  In this case, it reports error code 480. So, how can I estimate the model with -ivpoisson- if I want to use more instrumental variables than endogenous variables?

                  Thanks,
                  David

                  • #10
                    The following code works, so the problem must be elsewhere.

                    Code:
                    clear all
                    sysuse auto
                    poisson price mpg
                    matrix b=e(b)
                    ivpoisson gmm price (mpg=weight turn rep78), from(b)

                    • #11
                      Dear Joao,

                      Yes, you're right, now it works; very strange. By the way, I read the manual entry for -ivpoisson-, which says "varlist1 and varlist_iv may contain factor variables; see fvvarlist". But when I use a command that contains three-way interactions, I get error code 498: "iv1t included in both exogenous and endogenous variable lists r(498);"


                      Code:
                      poisson dv1 iv1t iv2t iv3
                      matrix b=e(b)
                      ivpoisson gmm dv1 c.iv1t##c.iv2t##c.iv3 (iv1t iv2t iv3=cv3 cv4 cv5), from(b)
                      So, could you also give me an example of how to run a regression that contains interaction terms between two endogenous variables?

                      Thanks,
                      David

                      • #12
                        As the error message tells you, you are including the same variables in both the exogenous and in the endogenous variable lists. You need to think better about what you want to estimate; maybe it is better if you do not use factor notation.
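
One common way to avoid factor notation here is to build the product terms by hand, so that each variable appears in exactly one list. The sketch below is not from this thread; it assumes the textbook approach in which an interaction involving an endogenous variable is itself treated as endogenous and instrumented with the corresponding product of an instrument. It uses -ivregress 2sls- on the auto dataset with made-up variable roles, and the names mpg_head and wt_head are hypothetical.

```stata
* Illustrative only: auto.dta and these variable roles are stand-ins
sysuse auto, clear

* Interaction of the endogenous regressor (mpg) with an exogenous one (headroom)
generate double mpg_head = mpg*headroom

* Candidate instrument for the interaction: instrument times headroom
generate double wt_head = weight*headroom

* Both mpg and its interaction go in the endogenous list only
ivregress 2sls price headroom (mpg mpg_head = weight turn wt_head), vce(robust)
estat firststage
```

Whether such product instruments are credible depends on the application, so this shows only the mechanics.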

                        • #13
                          Dear Joao,

                          I want to estimate the joint effect of iv1 and iv2, and the moderating effect of iv3 on the relationship between that joint effect and the dependent variable. All of them are continuous variables. Because iv1 and iv2 may be mutually determined, there might be an endogeneity problem. The previous relevant literature suggests using instrumental variables, and that is why I want to use -ivpoisson-.
                          In that case, how can I do the regression with these interaction terms in -ivpoisson-?

                          Thanks,
                          David

                          • #14
                            David,

                            I cannot help you because I am not familiar with your problem. I suggest you discuss it with a colleague or a supervisor.

                            All the best,

                            Joao

                            • #15
                              Dear Joao,

                              Thank you for your reply. I tried to reach out to my colleagues, but most of them focus on qualitative research, so they are not experts in this field and cannot help much with these statistical issues. To clarify the problem, I found a similar example in the previous literature. Its specification and analysis are as follows:

                              "The first-stage regressions allow for the determination of residuals for CG (i.e., CG_r) and RG (i.e., RG_r) that are absent the influence of the other governance mechanism and other characteristics of the exchange. These residuals are used in the second stage. The interaction terms in the second-stage regressions are also calculated using CG_r and RG_r rather than the original governance variables. In addition to the independent variables used in the first-stage models (absent the instrumental variables), the second stage also includes dichotomous control variables indicating the functional type of outsourcing. Separate stage-two regressions are performed for each cultural dimension. The independent variables in the stage-two regression models are sequentially entered in seven blocks to clearly demonstrate how the groups of variables contribute to the explanation of the variance in opportunism. The first block contains the control variables (OD, IT, LOG, FA, FS, LR, CE, CR, and BS) and exchange hazards (TU, SD, and PS). Subsequently, the variables representing the main effect for contractual governance (CG_r), the main effect for relational governance (RG_r), and the interaction of these governance mechanisms (CG_r × RG_r) are incorporated in the second and third blocks, respectively. The fourth block introduces the individualism construct (along with the associated two-way interactions with the governance mechanisms) to the model containing CG and RG and the fifth block includes the three-way interaction with CG and RG. Similarly, the sixth and seventh blocks include uncertainty avoidance and the three-way interaction. Our data has multiple observations from some customer firms. Using Stata 13.0 we specified the regression models to utilize robust standard errors to account explicitly for the intragroup correlation among the multiple observations from the same customer. 
                              Stata's robust clustering procedure adjusts the standard errors using the Huber-White method known as the “sandwich estimator” of variance. Finally, the largest variance inflation factor (VIF) across all models was 8.1, below the suggested threshold of 10 (Cohen et al., 2003). Thus, excessive multicollinearity is not a concern."
                              [Four image attachments omitted.]


                              Ref:
                              Handley, S. M., & Angst, C. M. (2015). The impact of culture on the relationship between governance and opportunism in outsourcing relationships. Strategic Management Journal, 36(9), 1412-1434.
                              http://onlinelibrary.wiley.com/doi/1...omisedMessage=

                              Since that study also used Stata 13.0 for the statistical analysis, I want to know exactly how this can be done in Stata in terms of commands. I hope this example helps to clarify my problem a little.

                              Thank you for your patience and attention to this matter.

                              Best,
                              David
                              Last edited by David Lu; 09 Jul 2016, 02:21.
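
The procedure described in the quoted passage can be sketched roughly as follows. This is only one reading of that description, not the authors' code. The auto dataset stands in for the real data: mpg and trunk play the roles of the two governance variables, weight and turn play the roles of their instruments, headroom is a control, and rep78 is an arbitrary cluster identifier. All of these roles, and the names mpg_r, trunk_r and mpgXtrunk_r, are assumptions.

```stata
* Illustrative only: auto.dta and these variable roles are stand-ins
sysuse auto, clear

* Stage 1: regress each "governance" variable on the other one,
* the controls, and its instrument, then keep the residuals
regress mpg trunk headroom weight
predict double mpg_r, residuals

regress trunk mpg headroom turn
predict double trunk_r, residuals

* Interaction built from the residuals, not from the original variables
generate double mpgXtrunk_r = mpg_r*trunk_r

* Stage 2: residuals and their interaction replace the original variables;
* standard errors are clustered on the stand-in group identifier
regress price mpg_r trunk_r mpgXtrunk_r headroom, vce(cluster rep78)
```

Because the second stage uses generated regressors, its standard errors do not account for first-stage estimation error, so this manual version differs from what -ivregress- or -ivpoisson- would report.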
