  • npregress with bootstrap standard errors omitted dummy variables

    Dear all,

    I am using the new non-parametric regression command with bootstrap standard errors on a large cross-sectional sample. The dependent variable is binary, and the independent variables include a few continuous variables and several dummy variables.

    npregress kernel infant_death gwec impwater impsan_shared unimpsanratio gwd $control, vce(bootstrap, reps(5) seed(2018) nodrop)

    I used the "nodrop" option of bootstrap here to get around the error "insufficient observations to compute bootstrap standard errors".

    Here is the output. All the dummy variables are omitted. Does anybody have any idea what is going wrong here?
    [Image: npr.png]

    Any help is greatly appreciated.

    Warm regards,

  • #2
    Hello Doris,

    Although you have a large number of observations, the E(kernel obs) statistic tells you how many observations, on average, are used in each regression (heuristically, npregress fits one regression for each observation in your sample). Your value of E(kernel obs) suggests that, to obtain the mean function, you are computing 20,503 regressions using, on average, 82 observations each. The second consideration is that by using the nodrop option of bootstrap you are not solving a problem but ignoring a potential issue.

    It seems to me that to get a more reliable estimate you may want to reduce the number of regressors. From your results, I see a couple of alternatives. For example, you do not need to create a variable for age squared (I assume this is what age2 is). If the square of age, or any other arbitrary function of age, belongs in the model, the nonparametric estimate will incorporate it. Also, you may want to introduce the dummy variables as categorical variables using factor-variable notation, i.e., i.married_union, because the command treats categorical variables differently from continuous variables. This is certainly part of the problem.
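
    As a rough sketch of what these suggestions could look like together, the call below drops age2, enters the dummies with the i. prefix, and leaves out nodrop. It rests on assumptions: the contents of $control are not shown in this thread, so the variable list is illustrative; impwater, impsan_shared, and married_union are assumed to be 0/1 dummies and age is assumed to be continuous; reps(100) is an arbitrary choice.

    ```stata
    * Sketch only: illustrative variable list, since $control is not shown.
    * Assumes impwater, impsan_shared, married_union are 0/1 dummies and age is continuous.
    * The /// line continuations work in a do-file, not at the interactive prompt.
    npregress kernel infant_death gwec unimpsanratio gwd age ///
        i.impwater i.impsan_shared i.married_union,          ///
        vce(bootstrap, reps(100) seed(2018))
    ```

    Whether the bootstrap then runs without nodrop depends on the data, so this is a starting point rather than a guaranteed fix.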

    Feel free to send me a copy of the data, if possible, and I will take a closer look.


    • #3
      Thanks so much for the suggestions. They are very helpful. I have done two estimations with "npregress": one as above, and the other with year and district dummies right afterwards. I then tried to use "margins" and "marginsplot" to graph the predicted value against the key independent variable "GWEC", but I got an error message:

      margins, at(gwec=(0(2)38)) reps(5) seed(2018)
      (running margins on estimation sample)

      data have changed since estimation
      margins after npregress will not work if your covariates or your dependent variable have changed since estimation.
      an error occurred when bootstrap executed margins4npregress

      Thanks so much again.
      Warm regards,


      • #4
        Hi Doris,

        The error message indicates that one or more of your predictor or outcome variables have changed between your last call to npregress and your call to margins. This is not allowed with npregress. If you think something else might be going on, please feel free to send us your data set and do-file.
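
        One way to rule this out, sketched under assumptions, is to re-run npregress on the data exactly as they are in memory and call margins immediately afterwards. The variable list is illustrative (the contents of $control are not shown in this thread), married_union is assumed to be a 0/1 dummy, reps(100) is an arbitrary choice, and the margins options mirror the syntax used in post #3.

        ```stata
        * Sketch only: illustrative variable list; married_union assumed to be a 0/1 dummy.
        npregress kernel infant_death gwec age i.married_union, ///
            vce(bootstrap, reps(100) seed(2018))

        * Do not generate, replace, or drop the outcome or covariates between these commands.
        margins, at(gwec=(0(2)38)) reps(100) seed(2018)
        marginsplot
        ```

        Note that margins uses the estimation results currently in memory, so after two consecutive npregress calls it refers to the second one.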



        • #5
          I had the same problem. It can be easily solved by declaring the dummy variable as a factor variable; that is, if sex is a dummy, use its factor-variable form instead.