  • KINKYREG: new Stata command for instrument-free inference in linear regression models with endogenous regressors

    I just released a brand-new Stata package called kinkyreg, which I developed jointly with Jan Kiviet and which I will present tomorrow at the virtual UK Stata Conference. This command implements the instrument-free inference approach named "kinky least squares" (KLS) estimation which was proposed by Jan Kiviet in a sequence of recent papers. The KLS estimator analytically corrects the bias of the inconsistent OLS estimator when one or more of the regressors are endogenous. It does so by making an assumption about the admissible degree of endogeneity. By constraining the endogeneity correlations within reasonably narrow bounds, set identification of the coefficients is achieved. By considering the union of confidence intervals over a grid of endogeneity correlations, asymptotically conservative inference can be performed.
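    To make the bias correction concrete, here is a minimal Python sketch for the simplest case: an intercept plus a single endogenous regressor x, with sample moments (divisor n) standing in for population moments. If rho = corr(x, u) were known, the OLS slope would be shifted by rho*sigma_u/sigma_x, and the OLS residual variance would understate var(u) by the factor (1 - rho^2). The sketch undoes both and evaluates the corrected slope over a grid of postulated rho values. The function names are hypothetical and this is not the kinkyreg implementation. It also does not handle additional exogenous regressors (which change the correction term) or the KLS standard errors.

```python
import math


def _mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def kls_slope(x, y, rho):
    """KLS slope for y = a + b*x + u at a postulated rho = corr(x, u).

    Sketch only: intercept plus one endogenous regressor, sample moments
    with divisor n standing in for population moments.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    n = len(x)
    mx, my = _mean(x), _mean(y)
    var_x = sum((xi - mx) ** 2 for xi in x) / n
    cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    b_ols = cov_xy / var_x
    a_ols = my - b_ols * mx
    # OLS residual variance; it understates var(u) by the factor (1 - rho^2)
    s2 = sum((yi - a_ols - b_ols * xi) ** 2 for xi, yi in zip(x, y)) / n
    sigma_u = math.sqrt(s2 / (1.0 - rho ** 2))
    # The OLS slope is shifted by rho * sigma_u / sigma_x; subtract that shift
    return b_ols - rho * sigma_u / math.sqrt(var_x)


def kls_grid(x, y, lo, hi, steps):
    """(rho, KLS slope) pairs over an evenly spaced grid on [lo, hi]."""
    if steps < 2:
        raise ValueError("need at least two grid points")
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return [(r, kls_slope(x, y, r)) for r in grid]


# Illustrative, made-up data; the range mirrors range(-0.7 0.7)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.2, 4.9, 7.4, 8.6, 11.1, 13.5]
slopes = [b for _, b in kls_grid(x, y, -0.7, 0.7, 15)]
print(min(slopes), max(slopes))  # bounds of the KLS slopes over the range
```

    At rho = 0 the sketch reproduces OLS, and if rho happens to equal the true sample correlation between x and u, it recovers the true slope exactly. That is why constraining rho to a range yields a set of estimates rather than a single point.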

    The main output of the kinkyreg package is graphical. The main graphs display the coefficient estimates over a grid of postulated endogeneity correlations including their confidence bands, possibly in comparison to the conventional 2SLS estimates if the user specifies instrumental variables. Often, and especially in the presence of weak (or invalid) instruments, KLS confidence intervals can be more informative than 2SLS intervals.
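    The conservative inference step can be sketched in Python as follows. Given a confidence interval for a coefficient at each grid point, report the smallest interval that covers all of them, then compare its width with that of the corresponding 2SLS interval. The helper names are hypothetical. Computing the per-grid KLS intervals themselves requires the KLS variance derived in Kiviet's papers and is not attempted here, so the inputs below are made-up numbers.

```python
def conservative_interval(intervals):
    """Smallest interval covering every per-rho confidence interval on the grid."""
    if not intervals:
        raise ValueError("need at least one interval")
    if any(lo > hi for lo, hi in intervals):
        raise ValueError("each interval must satisfy lower <= upper")
    return (min(lo for lo, _ in intervals), max(hi for _, hi in intervals))


def width(interval):
    """Length of a (lower, upper) interval."""
    lo, hi = interval
    return hi - lo


def more_informative(kls_interval, tsls_interval):
    """True if the KLS interval is narrower than the 2SLS interval."""
    return width(kls_interval) < width(tsls_interval)


# Made-up per-grid intervals and a made-up 2SLS interval
kls = conservative_interval([(0.02, 0.09), (0.04, 0.11), (0.06, 0.13)])
print(kls)                                  # (0.02, 0.13)
print(more_informative(kls, (-0.10, 0.35)))  # True
```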

    A main benefit of the KLS approach is that it enables testing of the exclusion restrictions for excluded (instrumental) variables, which is not possible with the 2SLS approach. (Note that 2SLS-based overidentifying restrictions tests still maintain the untested assumption that a subset of instruments is validly excluded.) This is implemented as a kinkyreg postestimation feature (estat exclusion).
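    The grid logic also suggests how a conservative exclusion test can be read. This is a minimal sketch under an assumption: for each postulated rho, one has a p-value for the null hypothesis that the candidate instrument's coefficient is zero when it is added as a regressor. Under that assumption, the exclusion restriction is rejected only if it is rejected at every admissible rho. The input format and helper names are hypothetical. estat exclusion reports its results graphically, and its internals may differ.

```python
def compatible_rhos(pvalues_by_rho, alpha=0.05):
    """Grid values of rho at which the exclusion restriction is NOT rejected."""
    return [rho for rho, p in pvalues_by_rho if p >= alpha]


def reject_exclusion(pvalues_by_rho, alpha=0.05):
    """Reject only if the restriction is rejected at every admissible rho."""
    if not pvalues_by_rho:
        raise ValueError("need at least one (rho, p-value) pair")
    return not compatible_rhos(pvalues_by_rho, alpha)


# Made-up p-values for illustration only
example = [(-0.4, 0.001), (-0.2, 0.004), (0.0, 0.03), (0.2, 0.12), (0.4, 0.40)]
print(compatible_rhos(example))   # [0.2, 0.4]
print(reject_exclusion(example))  # False
```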

    Further postestimation features include tests for linear hypotheses (estat test), Ramsey's RESET test (estat reset), heteroskedasticity tests (estat hettest) and Durbin's alternative test for serial correlation (estat durbinalt).

    The package is available for installation from my personal website. Type the following in Stata's command window:
    net install kinkyreg, from(
    The basic command syntax is similar to ivregress, although the specification of instrumental variables is optional. (The latter are only used to compare the KLS to the 2SLS estimates.) For example,
    . use
    . kinkyreg lw s expr tenure rns smsa _I* (iq = age mrt), range(-0.7 0.7) inference(iq s)
    The range() option specifies the admissible range of correlations of the endogenous regressor iq with the error term. The inference() option produces graphical output with KLS estimates over the specified range and corresponding confidence intervals for the specified regressors iq and s as a function of the endogeneity correlation. The results are compared to the 2SLS estimates with instruments age and mrt. (To see the graphs, just run the above example in Stata.) Exclusion restriction tests for the two instruments can then simply be obtained in a postestimation step:
    . estat exclusion
    Again, the output will be primarily graphical.

    The help files explain the details of the command syntax and the available options:
    help kinkyreg
    help kinkyreg postestimation
    Examples are included in the help file. Detailed background information and a more extensive example can also be found in an accompanying manuscript (see references below). Comments are welcome.


  • #2
    With thanks to Kit Baum, the kinkyreg package is now also available on SSC:
    ssc install kinkyreg
    My presentation slides from this week's virtual UK Stata Conference are accessible as well:


    • #3
      An update for kinkyreg is now available on my personal website:
      net install kinkyreg, from( replace
      In the new version 1.0.2, a few bugs related to factor variables were fixed.


      • #4
        I want to thank ericmelse for rigorously testing the kinkyreg command and providing helpful suggestions. Not entirely surprisingly, he found another bug in the code. When using the replay syntax to display KLS results for a specific correlation value or when using the estat test postestimation command, the small-sample variance adjustment was ignored. This has now been fixed in version 1.0.3, which is available from my website.
        ado update kinkyreg, update


        • #5
          Hello, in your command kinkyreg, the `tw_combine_options´ is invalid. My version of Stata is 15.1. What's wrong with this?

          kinkyreg lw s expr tenure rns smsa _I* age mrt c.tenure#c.age (kww), range(-0.75 0.75) small lincom(1: tenure+c.tenure#c.age*18) lincom(2: tenure+c.tenure#c.age*24) lincom(3: tenure+c.tenure#c.age*30) twoway(, ylabel(-0.15(0.05)0.2) 'tw_combine_options') twoway(1, title("age = 18")) twoway(2, title("age = 24")) twoway(3, title("age = 30"))
          option ' not allowed


          • #6
            In the example in our manuscript, we have defined a local macro variable tw_combine_options that contains a couple of additional graph options:
            local tw_combine_options "xtitle(, size(vlarge)) ytitle(, size(vlarge)) xlabel(, labsize(vlarge)) ylabel(, labsize(vlarge)) legend(off size(vlarge)) nodraw"
            To substitute the content of the local macro variable, you need to enclose its name with a left single quote ` and a right single quote ':
            `tw_combine_options'
            It seems that you have used a right single quote ' on both sides, which causes the error. If you do not plan to use the additional graph options we specified in the local macro variable, you can safely remove this local macro from the command line.


            • #7
              Thank you. It's okay.


              • #8
                Dear Sebastian, many thanks for sharing this helpful command with us. I have two questions:
                1. It seems to me that the performance of the estimator hinges on the specification of range().
                  The range() option specifies the admissible range of correlations of the endogenous regressor iq with the error term
                  However, from an empirical point of view, how can I persuade readers/referees (a) that the range is suitable, and (b) which specific correlation value is appropriate?
                2. Is the approach applicable to panel data models?
                Ho-Chuan (River) Huang
                Stata 16.0, MP(4)


                • #9
                  1. Specifying the appropriate range is a very application-specific task. It depends on the suspected source of the endogeneity. For example, if you suspect measurement error, this would imply a negative correlation with the error term, so you could choose an upper bound of zero for the correlation range. Choosing the other bound is often not so easy. It requires a judgement on whether you can rule out a large endogeneity correlation, and then a decision on what "large" means quantitatively. But even if you choose a relatively wide range in the absence of informative prior information, the estimates can help to obtain an interval of plausible values. This interval can serve as a plausibility check for other estimation procedures, e.g. instrumental variables estimators: if the estimates from such an alternative estimator fall outside the KLS interval, you can dismiss the alternative estimator as implausible (e.g. due to invalid or weak instruments).
                  2. You can apply kinkyreg to panel data under a pooling assumption (i.e. no unobserved group-specific effects, no autocorrelated errors, etc.).
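                  The plausibility check in point 1 can be sketched in Python for the case of one endogenous regressor and one instrument, where 2SLS reduces to the simple IV ratio cov(z, y)/cov(z, x). The KLS bounds are taken as inputs (for instance, the smallest and largest KLS slope over the chosen range). The function names are hypothetical and not part of kinkyreg.

```python
def _cov(a, b):
    """Sample covariance with divisor n."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n


def iv_slope(x, y, z):
    """Just-identified IV slope (equal to 2SLS with a single instrument)."""
    czx = _cov(z, x)
    if czx == 0:
        raise ValueError("instrument is uncorrelated with the regressor in this sample")
    return _cov(z, y) / czx


def implausible(estimate, kls_lower, kls_upper):
    """True if an alternative estimate falls outside the KLS interval."""
    return not (kls_lower <= estimate <= kls_upper)


x = [1.0, 2.0, 3.0, 4.0]
z = [1.0, 3.0, 2.0, 5.0]
y = [1.0 + 2.0 * xi for xi in x]   # exact linear relation with slope 2
print(iv_slope(x, y, z))           # 2.0
print(implausible(2.0, 0.1, 0.3))  # True
```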