
  • KINKYREG: new Stata command for instrument-free inference in linear regression models with endogenous regressors

    I just released a brand-new Stata package called kinkyreg, which I developed jointly with Jan Kiviet and which I will present tomorrow at the virtual UK Stata Conference. This command implements the instrument-free inference approach named "kinky least squares" (KLS) estimation which was proposed by Jan Kiviet in a sequence of recent papers. The KLS estimator analytically corrects the bias of the inconsistent OLS estimator when one or more of the regressors are endogenous. It does so by making an assumption about the admissible degree of endogeneity. By constraining the endogeneity correlations within reasonably narrow bounds, set identification of the coefficients is achieved. By considering the union of confidence intervals over a grid of endogeneity correlations, asymptotically conservative inference can be performed.
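    To make the idea of the bias correction concrete, here is a sketch for the simplest possible case: one endogenous regressor, no further covariates, and all variables measured as deviations from their means. It follows from standard OLS algebra and is not copied from the package documentation; the symbols \rho, \sigma_x, \sigma_u and \sigma_\varepsilon are notation introduced only for this illustration.

    ```latex
    \begin{aligned}
    y &= x\beta + u, \qquad \operatorname{Corr}(x,u) = \rho, \\
    \operatorname{plim}\, \hat\beta_{OLS} &= \beta + \rho\,\frac{\sigma_u}{\sigma_x}, \\
    \sigma_\varepsilon^2 \equiv \operatorname{plim}\, \frac{1}{n}\sum_i \bigl(y_i - x_i\hat\beta_{OLS}\bigr)^2 &= \sigma_u^2\,(1-\rho^2), \\
    \hat\beta(\rho) &= \hat\beta_{OLS} - \frac{\rho}{\sqrt{1-\rho^2}}\,\frac{\hat\sigma_\varepsilon}{\hat\sigma_x}.
    \end{aligned}
    ```

    For a postulated value of \rho, the last line removes the asymptotic OLS bias using only quantities that OLS itself delivers. At \rho = 0 it reduces to OLS, and the correction grows without bound as |\rho| approaches 1, which is why the admissible range has to be restricted.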

    The main output of the kinkyreg package is graphical. The main graphs display the coefficient estimates over a grid of postulated endogeneity correlations including their confidence bands, possibly in comparison to the conventional 2SLS estimates if the user specifies instrumental variables. Often, and especially in the presence of weak (or invalid) instruments, KLS confidence intervals can be more informative than 2SLS intervals.

    A main benefit of the KLS approach is that it enables testing of the exclusion restrictions for excluded (instrumental) variables, which is not possible with the 2SLS approach. (Note that 2SLS-based overidentifying restrictions tests still maintain the untested assumption that a subset of instruments is validly excluded.) This is implemented as a kinkyreg postestimation feature (estat exclusion).

    Further postestimation features include tests for linear hypotheses (estat test), Ramsey's RESET test (estat reset), heteroskedasticity tests (estat hettest) and Durbin's alternative test for serial correlation (estat durbinalt).

    The package is available for installation from my personal website. Type the following in Stata's command window:
    net install kinkyreg, from(
    The basic command syntax is similar to ivregress, although the specification of instrumental variables is optional. (The latter are only used to compare the KLS to the 2SLS estimates.) For example,
    . use
    . kinkyreg lw s expr tenure rns smsa _I* (iq = age mrt), range(-0.7 0.7) inference(iq s)
    The range() option specifies the admissible range of correlations of the endogenous regressor iq with the error term. The inference() option produces graphical output with KLS estimates over the specified range and corresponding confidence intervals for the specified regressors iq and s as a function of the endogeneity correlation. The results are compared to the 2SLS estimates with instruments age and mrt. (To see the graphs, just run the above example in Stata.) Exclusion restriction tests for the two instruments can then simply be obtained in a postestimation step:
    . estat exclusion
    Again, the output will be primarily graphical.
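    For readers who want to see what the KLS graphs are compared against, the 2SLS benchmark can be reproduced with official Stata commands. This is only a sketch: it assumes that the data set from the example above is already in memory (including the _I* dummies), and the instrument choice is simply copied from that example.

    ```stata
    * 2SLS benchmark with the same specification (instruments: age mrt)
    ivregress 2sls lw s expr tenure rns smsa _I* (iq = age mrt)

    * Conventional overidentification test; it still presumes that
    * enough of the instruments are validly excluded
    estat overid
    ```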

    The help files explain the details of the command syntax and the available options:
    help kinkyreg
    help kinkyreg postestimation
    Examples are included in the help file. Detailed background information and a more extensive example can also be found in an accompanying manuscript (see references below). Comments are welcome.


  • #2
    With thanks to Kit Baum, the kinkyreg package is now also available on SSC:
    ssc install kinkyreg
    My presentation slides from this week's virtual UK Stata Conference are accessible as well:


    • #3
      An update for kinkyreg is now available on my personal website:
      net install kinkyreg, from( replace
      In the new version 1.0.2, a few bugs related to factor variables were fixed.


      • #4
        I want to thank ericmelse for rigorously testing the kinkyreg command and providing helpful suggestions. Not entirely surprisingly, he found another bug in the code: when using the replay syntax to display KLS results for a specific correlation value, or when using the estat test postestimation command, the small-sample variance adjustment was ignored. This has now been fixed in version 1.0.3, which is available from my website.
        ado update kinkyreg, update


        • #5
          Hello, in your command kinkyreg the `tw_combine_options´ is invalid. My version of Stata is 15.1. What's wrong with this?

          kinkyreg lw s expr tenure rns smsa _I* age mrt c.tenure#c.age (kww), range(-0.75 0.75) small lincom(1: tenure+c.tenure#c.age*18) lincom(2: t
          > enure+c.tenure#c.age*24) lincom(3: tenure+c.tenure#c.age*30) twoway(, ylabel(-0.15(0.05)0.2) 'tw_combine_options') twoway(1, title("age = 18"
          > )) twoway(2, title("age = 24")) twoway(3, title("age = 30"))
          option ' not allowed


          • #6
            In the example in our manuscript, we have defined a local macro variable tw_combine_options that contains a couple of additional graph options:
            local tw_combine_options "xtitle(, size(vlarge)) ytitle(, size(vlarge)) xlabel(, labsize(vlarge)) ylabel(, labsize(vlarge)) legend(off size(vlarge)) nodraw"
            To substitute the content of the local macro, you need to enclose its name in a left single quote ` and a right single quote ', i.e. `tw_combine_options'.
            It seems that you have used a right single quote ' on both sides, which causes the error. If you do not plan on using the additional graph options we specified in the local macro, you can safely remove the macro reference from the command line.
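            The quoting rule can be checked with a small example that is independent of kinkyreg. This is a minimal sketch using the auto data set that ships with Stata; the macro name my_opts and the graph options in it are made up for illustration. Run all lines together (e.g. from a do-file), so that the local macro is still defined when it is used.

            ```stata
            sysuse auto, clear

            * store some graph options in a local macro
            local my_opts "msymbol(oh) legend(off)"

            * left single quote before the name, right single quote after it
            display "`my_opts'"
            scatter price weight, `my_opts'
            ```

            Replacing the left single quote with a right single quote in the last line reproduces the kind of "option ' not allowed" error shown in #5.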


            • #7
              Thank you. It's okay.


              • #8
                Dear Sebastian, many thanks for sharing this helpful command with us. However, I have two questions:
                1. It seems to me that the performance of the estimator hinges on the specification of range().
                  The range() option specifies the admissible range of correlations of the endogenous regressor iq with the error term
                  However, from an empirical point of view, how can I persuade readers/referees that (a) the chosen range is suitable, and (b) a particular value of the correlation is appropriate?
                2. Is the approach applicable to panel data models?
                Ho-Chuan (River) Huang
                Stata 17.0, MP(4)


                • #9
                  1. Specifying the appropriate range is a very application-specific task. It depends on the suspected source of the endogeneity. For example, if you suspect measurement error, this would imply a negative correlation with the error term. You could then choose an upper bound of zero for the correlation range. Choosing the other bound is often not so easy. It requires a judgement about whether you can rule out that the endogeneity correlation is large, and it further requires quantifying what "large" means. But even if you choose a relatively wide range in the absence of informative prior information, the estimates can be helpful to obtain an interval of plausible values. This might be used as a plausibility check for other estimation procedures, e.g. instrumental variables estimators. If the estimates from such an alternative estimator fall outside the KLS interval, you can dismiss the alternative estimator as implausible (e.g. due to invalid or weak instruments).
                  2. You can apply kinkyreg to panel data under a pooling assumption (i.e. no unobserved group-specific effects, no autocorrelated errors, etc.).
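                  As an illustration of the first point, the measurement-error argument could be translated into a one-sided range along the following lines. This is a sketch only: it reuses the variables from the example in #1 and the instrument-free form of the syntax seen in #5, it presumes that the data set is already in memory, and the lower bound of -0.7 is an arbitrary placeholder that would need to be justified in an actual application.

                  ```stata
                  * suspected measurement error in iq (with a presumed positive effect):
                  * admit only nonpositive endogeneity correlations
                  kinkyreg lw s expr tenure rns smsa _I* (iq), range(-0.7 0) inference(iq)
                  ```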


                  • #10
                    Apparently, my previous post on the February update of kinkyreg somehow disappeared. I guess it fell victim to the recent Texan winter storm. Since the latest version 1.1.1 is available as of today from both my website and SSC (with the usual thanks to Kit Baum), let me mention again the main new features:

                    The kinkyreg package now comes with an additional command, kinkyreg2dta, that allows you to construct a new data set with the coefficient estimates, standard errors, confidence intervals, and desired postestimation results for the whole grid of endogeneity correlations. This allows you to vary the endogeneity correlations of multiple endogenous regressors jointly and provides flexibility to create your own graphical or tabular output after the estimation. From a technical point of view, kinkyreg2dta is simply a wrapper command for kinkyreg that loops over all grid points of endogeneity correlations. This is not the most efficient way of implementing it, and there are limitations if the grid size is quite large, but since researchers mostly consider just one or maybe two endogenous regressors, this wrapper command should do the job in most cases. An updated example can be found at the end of our article that was just accepted for publication in the Stata Journal:
                    In addition, the February update contains the new postestimation command estat rcr. As it turns out, our KLS procedure can be replicated with the "relative correlation restriction" approaches of Krauth (2016) and Oster (2019). The estat rcr command provides the relevant mapping of the endogeneity correlation into the respective sensitivity parameters of these two alternative approaches. A more detailed discussion about the merits of KLS over these two alternative approaches is included in our Stata Journal article.
                    • Krauth, B. (2016). Bounding a linear causal effect using relative correlation restrictions. Journal of Econometric Methods 5(1): 117-141.
                    • Oster, E. (2019). Unobservable selection and coefficient stability: Theory and evidence. Journal of Business & Economic Statistics 37(2): 187-204.

                    Moreover, some flexibility was added to the postestimation graph options and a few minor bugs were fixed.


                    • #11
                      For example, if you suspect measurement error, this would imply a negative correlation with the error term
                      Sebastian Kripfganz, could you explain this further? Does measurement error always imply a negative correlation with the error term?


                      • #12
                        I should have been more precise when I made this statement. Implicitly, I assumed that the true effect of the endogenous variable on the dependent variable is positive. When this effect is negative, then the correlation with the error term will be positive.

                        Classical measurement error - the observed variable equals the true but unobserved variable plus random noise - always implies that the correlation of the observed variable with the error term has the opposite sign to the effect of the true variable on the outcome variable, assuming that there are no other sources of endogeneity for this or another variable in the model.

                        Suppose y = xb + u, but x is measured with error by an observed variable x* = x + e. Then (by substitution) you would estimate the regression model y = x*b + (u - eb). Given that Cov(x*, e) = Var(e) > 0 by construction and Cov(x*, u) = 0 by assumption, Cov(x*, u - eb) = -b Cov(x*, e) = -b Var(e), which has the opposite sign to b.
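                        This sign result can be checked with a small simulation. The following is a minimal sketch with made-up values (true coefficient 2, unit variances for x, e and u, 100,000 observations); none of these numbers come from the discussion above.

                        ```stata
                        clear
                        set seed 12345
                        set obs 100000

                        scalar beta = 2                            // assumed true (positive) effect
                        generate double x     = rnormal()          // true but unobserved regressor
                        generate double e     = rnormal()          // classical measurement error
                        generate double u     = rnormal()          // structural error
                        generate double xstar = x + e              // observed, mismeasured regressor
                        generate double y     = scalar(beta)*x + u

                        * composite error of the regression of y on xstar
                        generate double v = u - scalar(beta)*e

                        correlate xstar v                          // negative because beta > 0
                        regress y xstar                            // slope attenuated towards zero
                        ```

                        With these values, the population correlation between xstar and v is -2/sqrt(10), roughly -0.63, and the OLS slope converges to 1 rather than 2. Setting beta to -2 flips the sign of the correlation, in line with the argument above.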


                        • #13
                          This explanation was extremely helpful. Thank you, Sebastian!