
  • Semiparametric coefficients

    Hi all, I am running npregress kernel on my dataset with two variables and obtain an effect estimate for my explanatory variable, which is an average of derivatives (along with standard errors and p-values), as well as the derivative of the mean function for each observation. Is there a way to obtain the same output for the variable that enters nonparametrically in a semiparametric model with more control variables? When I use semipar, the output only prints average effect estimates (with standard errors and p-values) for the explanatory variables that enter the model linearly, and a scatter plot for the variable that enters nonlinearly. I want the average effect estimate, standard error, p-value, and per-observation derivatives for the variable that enters nonparametrically in the model. I can obtain all of this when I use npregress series followed by margins, but there Stata uses a basis function rather than kernels. Thank you very much for all the information from this group over the years.

  • #2
    Hi Fotis,
    Can you provide more information on the model you are trying to estimate?
    It seems that you have a particular kind of semiparametric model in mind, but I couldn't tell exactly which one from the description you provide.
    F



    • #3
      Thank you very much for your answer. I estimated npregress kernel y x, vce(bootstrap). This gave me an effect estimate for x (with bootstrap standard errors and p-values) and saved in the data the mean function and the derivative of the mean function for each observation of x. When I ran semipar y z1 z2 z3, nonpar(x), the output printed effect estimates for z1, z2, and z3 (with standard errors and p-values) but not for x, the variable that enters the model nonparametrically. Is there a way to obtain the effect of x (with standard error and p-value) and the derivative of the mean function at each observation of x in this model? I can obtain that information when I use npregress series y x, asis(z1 z2 z3) followed by margins, dydx(x) generate(a), but there Stata uses a basis function rather than kernels.
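
      For reference, the two workflows described above, written out as commands (y, x, and z1-z3 are the placeholder names used in this thread):

      Code:
      * Bivariate kernel regression: average effect of x with bootstrap SEs, plus the
      * mean function and its derivative saved for each observation
      npregress kernel y x, vce(bootstrap)

      * Series (basis-function) alternative with linear controls, followed by margins
      * for the average derivative of x and its observation-level values
      npregress series y x, asis(z1 z2 z3)
      margins, dydx(x) generate(a)
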
      Fotis



      • #4
        I see.
        OK, so there is no ready-made solution for what you want; you will have to write your own version of it.
        It isn't difficult, but it will require some additional work.



        • #5
          Here is an example
          Code:
          clear
          bcuse hprice3                        // load Wooldridge's hprice3 example dataset

          * Benchmark: semiparametric model with inst entering nonparametrically
          semipar lprice ldist larea lland rooms bath age, nonpar(inst) xtitle(inst) ci kernel(gaussian)

          ** Step 1. Partial inst out of every variable (nonparametric residuals)
          foreach i in lprice ldist larea lland rooms bath age {
              clonevar org_`i'=`i'
              qui:npregress kernel `i' inst, noderiv kernel(gaussian)
              replace `i'=`i'-_Mean_`i'        // residual with respect to inst
          }

          ** Step 2. Linear part estimated on the residualized variables
          reg lprice ldist larea lland rooms bath age, nocons

          ** Step 3. Restore the original variables and net out the linear part
          foreach i in lprice ldist larea lland rooms bath age {
              qui:replace `i'=org_`i'
          }
          predict rlprice_h                    // prediction of the linear part
          gen lprice_res=lprice-rlprice_h

          ** Step 4. Nonparametric regression of the remaining variation on inst
          npregress kernel lprice_res inst, noderiv kernel(gaussian)

          * or, alternatively, a local-linear fit with a fixed bandwidth
          lpoly lprice_res inst, degree(1) kernel(gaussian) bw(2847.882) ci
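
          * (Sketch) To also get the average effect of inst with a bootstrap standard error and
          * p-value, as in the bivariate -npregress kernel y x, vce(bootstrap)- call discussed
          * earlier in the thread, the last step could be rerun without -noderiv- and with a
          * bootstrap VCE. Note that this bootstrap ignores the first-stage residualization, so
          * the standard errors are only approximate, and any mean/derivative variables already
          * saved by the previous call may need to be dropped first.
          npregress kernel lprice_res inst, kernel(gaussian) vce(bootstrap)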



          • #6
            Many thanks for this answer, Fernando. I came back to this topic after a while and noticed that I never followed up properly. I don't understand why, in this step,

            reg lprice ldist larea lland rooms bath age, nocons

            you use the residuals from the nonparametric estimation of each regressor (ldist larea lland rooms bath age) against the variable inst, rather than simply regressing the residual of lprice from the kernel step on the original variables (ldist larea lland rooms bath age) that enter the semiparametric regression linearly, and then predicting, as you do, before moving on to the final kernel step.

            Also in this step

            npregress kernel lprice_res inst, noderiv kernel(gaussian)

            if I do not specify the noderiv option and obtain the derivatives, do you think I could solve this nonparametric regression with a restriction on the per-observation derivatives, so that those derivatives satisfy an equation? To be more specific, suppose I obtain four derivatives a1, a2, a3, and a4, and that a1 and a2 belong to a specific group that I can identify from another column. I would like a1 and a2, as produced by the nonparametric estimation, to satisfy (-a1*x1*y1)/(1-a1*y1) + (-a2*x2*y2)/(1-a2*y2) = 0, where x1, y1 and x2, y2 are the values of variables x and y in the same rows (same observations) as a1 and a2, respectively.
            Last edited by Fotis Delis; 17 May 2023, 09:18.



            • #7
              Hi Fotis

              So the first step is a nonparametric version of the Frisch-Waugh-Lovell (FWL) theorem, that is, the partialling-out approach used with multiple fixed effects (or other controls).

              This means you need to obtain the residuals from a model of every variable, dependent and independent, against the "running" variable (inst), and then estimate the model among those residuals. This is perhaps easier to see if you look at the partialling-out explanation of multivariate regression models.
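
              A quick way to see the logic in the purely linear case (a minimal sketch using the same hprice3 data; the nonparametric Step 1 above replaces the inner regressions with npregress kernel):

              Code:
              * FWL / partialling out in the linear case: the residual-on-residual regression
              * reproduces the coefficient from the full regression
              bcuse hprice3, clear
              reg lprice ldist inst                 // full regression: note the coefficient on ldist
              foreach v in lprice ldist {
                  quietly reg `v' inst
                  predict double r_`v', residuals   // residual with respect to inst
              }
              reg r_lprice r_ldist, nocons          // same ldist coefficient as in the full regression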

              Not sure I understand your second point.

              Fernando




              • #8
                Hi Fernando,

                The FWL point is clear now, many thanks. Regarding the second point, let me state it more precisely. Once we reach this step

                npregress kernel lprice_res inst, noderiv kernel(gaussian)

                and we do not include the noderiv option, we obtain the column of derivatives that I asked about in the first place, that is, a derivative/coefficient for each observation. I noticed that Stata has cnsreg (constrained linear regression), which allows constraints on the coefficients in OLS. I was wondering whether something similar can be done in the nonparametric setting, namely constraining the derivatives obtained for specific observations in the sample so that they satisfy an equation. In the example dataset you used there are two years, 1978 and 1981. For simplicity, suppose 1978 has only two observations (a group of two), for which we get derivatives a1 and a2, and that for these observations we also observe two more variables, say age and dist. Could a1 and a2, as produced by the nonparametric estimation, be restricted to satisfy (-a1*age1*dist1)/(1-a1*dist1) + (-a2*age2*dist2)/(1-a2*dist2) = 0?
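
                For comparison, a minimal sketch of what linear constraints look like with cnsreg (the constraint itself is arbitrary and only for illustration); the per-observation, nonlinear restriction described above is a different object:

                Code:
                * Linear constraint across OLS coefficients (illustrative only)
                constraint define 1 rooms + bath = 0
                cnsreg lprice ldist larea lland rooms bath age inst, constraints(1)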

                Fotis
                Last edited by Fotis Delis; 18 May 2023, 04:52.



                • #9
                  OK, I think I understand your point better now.
                  The short answer is no.
                  The whole idea of nonparametric regression is to leave the functional form unrestricted, although I have seen versions that impose minor restrictions such as nonnegativity.
                  Now, -npregress- does not produce a single coefficient.
                  It reports the average mean and/or average slope between the dependent and independent variable. In the background, it estimates those numbers at every observed value of the independent variable.

                  In my opinion, you cannot combine constrained regression with kernel regression, but it may be possible with spline regression, depending on what exactly the model is.
                  F



                  • #10
                    Many thanks Fernando,

                    I was thinking of something like "Constrained Nonparametric Kernel Regression: Estimation and Inference" (Jeffrey S. Racine, Christopher F. Parmeter, and Pang Du), pages 42-43, and I am trying to implement it in R.

                    One last question: most of the time, the average slope/effect between the dependent and independent variable in the semiparametric model is very close to the OLS coefficient and has the same sign. However, when you check the individual derivatives that the semiparametric estimation produces in the background, some of them can have a different sign. In the equations I am working on, the OLS coefficient and the average of the semiparametric derivatives are both negative and very close, but some derivatives (only 5% of the total) are positive, which cannot be interpreted under the theory behind the regression. I understand that locally you can get different responses because the functional form is not specified, but is that the whole explanation, or am I missing something?
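
                    One rough way to inspect this in the running example (a sketch, assuming the final npregress kernel step has saved the fitted mean as _Mean_lprice_res, following the naming pattern used in the code above): approximate observation-level slopes by finite differences of the saved mean function and count how many have the opposite sign.

                    Code:
                    * Finite-difference approximation to the local slopes of the saved mean function
                    sort inst
                    gen double slope_fd = (_Mean_lprice_res - _Mean_lprice_res[_n-1]) / (inst - inst[_n-1]) if inst != inst[_n-1]
                    summarize slope_fd
                    count if slope_fd > 0 & !missing(slope_fd)   // local slopes with a positive sign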



                    • #11
                      That's pretty much it.
                      The way I think about it is flexibility.
                      With standard linear regression you essentially look at average effects, disregarding divergence across small groups.
                      With more flexibility you can move from average effects for the whole sample to average effects for subpopulations.

                      Because of that, I never thought reporting average effects in a nonparametric model made much sense.

                      Regarding constrained nonparametric regression: you are right, I have read about some of the methods you mention but have not implemented them. Stata's npregress does not implement them either.
                      Best wishes



                      • #12
                        Hi Fernando,

                        Do you know if in the code above, especially in the last step

                        npregress kernel lprice_res inst, noderiv kernel(gaussian)

                        I can apply an adaptive nearest-neighbor or a generalized nearest-neighbor bandwidth? I know that npregress does not support these, but I was wondering whether another package does.

                        Best regards.
                        Last edited by Fotis Delis; 06 Jun 2023, 02:02.



                        • #13
                          None that I'm aware of.
                          But because it is the last step, you could probably try to implement it by hand. (I haven't read about either approach, so I'm not sure how easy or hard it would be.)
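
                          By way of illustration, a by-hand k-nearest-neighbor (adaptive bandwidth) local-constant fit for the last step might look like this (a rough sketch only, assuming lprice_res and inst from the code above; k = 50 and the Epanechnikov-style weights are arbitrary choices, and ties/zero distances are not handled):

                          Code:
                          gen long obs_id = _n
                          gen double knn_fit = .
                          local k = 50
                          quietly forvalues i = 1/`=_N' {
                              sort obs_id
                              gen double _dist = abs(inst - inst[`i'])   // distance from observation i
                              sort _dist
                              local h = _dist[`k']                       // adaptive bandwidth: distance to the k-th neighbor
                              gen double _w = cond(_dist < `h', 1 - (_dist/`h')^2, 0)
                              sort obs_id
                              summarize lprice_res [aw = _w], meanonly
                              replace knn_fit = r(mean) in `i'           // weighted local mean at observation i
                              drop _dist _w
                          }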
                          F



                          • #14
                            Hi Fernando,

                            In the example code suppose that the following is the main OLS regression


                            reg lprice ldist larea lland rooms bath age inst


                            and, as we discussed, we suspect that inst enters nonparametrically, so we use the code you provided. Assume that b is the coefficient on inst in OLS and that, when we run the proposed semiparametric code, we get an effect estimate (an average of derivatives) that is close to the OLS coefficient b but also provides a derivative for each observation in our sample. Now suppose there is a dummy variable that we could interact with inst in OLS and that carries very useful information: it essentially gives a coefficient b1 on inst for observations where the dummy equals zero and b2 = b1 + the interaction coefficient for observations where the dummy equals one. Is there a way to incorporate this information into the semiparametric steps in the example code above, essentially obtaining two sets of derivatives, one close to b1 and the other close to b2? Does this make sense in a nonparametric model?
                            Last edited by Fotis Delis; 11 Jul 2023, 09:56.



                            • #15
                              Yes, it's possible, but not with commands available in Stata.

                              You could use the Robinson estimator, but in this case demeaning with respect to both the continuous and the discrete variables, and then analyze the nonparametric component using split samples (dummy = 0 or 1).

                              Or use splines with interactions. This would be easier to implement, and the derivatives can be obtained manually.
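
                              A minimal split-sample sketch along those lines, assuming the Step 1 residualization in the code above is redone with the dummy included among the partialled-out controls (here the 1981 dummy, assuming it is named y81 in hprice3):

                              Code:
                              * One average effect (and one set of observation-level derivatives) per group;
                              * if the previous npregress call saved mean/derivative variables in the data,
                              * drop or rename them before rerunning
                              npregress kernel lprice_res inst if y81 == 0, kernel(gaussian)   // dummy = 0 group
                              npregress kernel lprice_res inst if y81 == 1, kernel(gaussian)   // dummy = 1 group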

