  • Is there a Stata version of the R command effect_plot?

    I am looking for a Stata version of the following R command:

    https://www.rdocumentation.org/packa...cs/effect_plot


    Basically, the command plots the relationship estimated by the regression equation: the x-axis is any independent variable, and the plotted line comes from the fitted regression. I want to produce this plot after the reghdfe command in Stata. I have attached an example; this appears so simple in R. I tried importing my data into R and running a within estimator via a linear panel model (plm), but I am rather new to R and that is giving me nightmares.

  • #2
    I think Nick Cox's regplot about does the job.



    • #3
      The slide shows just some variation on

      Code:
      sysuse auto, clear 
      scatter mpg weight || qfit mpg weight
      but the documentation describes code that sounds more like marginsplot.



      • #4
        So here's a simple HDFE model estimated by -reghdfe-:

        Code:
        sysuse auto, clear
        reghdfe price weight length, ab(foreign rep78)
        Results:

        Code:
        (MWFE estimator converged in 3 iterations)
        
        HDFE Linear regression                            Number of obs   =         69
        Absorbing 2 HDFE groups                           F(   2,     61) =      37.99
                                                          Prob > F        =     0.0000
                                                          R-squared       =     0.5611
                                                          Adj R-squared   =     0.5108
                                                          Within R-sq.    =     0.5547
                                                          Root MSE        =  2037.1129
        
        ------------------------------------------------------------------------------
               price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
              weight |    6.15521   1.041149     5.91   0.000     4.073303    8.237116
              length |  -100.9268   34.91649    -2.89   0.005    -170.7466   -31.10692
               _cons |   6486.754   3890.635     1.67   0.101    -1293.052    14266.56
        ------------------------------------------------------------------------------
        
        Absorbed degrees of freedom:
        -----------------------------------------------------+
         Absorbed FE | Categories  - Redundant  = Num. Coefs |
        -------------+---------------------------------------|
             foreign |         2           0           2     |
               rep78 |         5           1           4     |
        -----------------------------------------------------+
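        Note that the coefficient giving the estimated impact of weight on price is about 6.16: each additional pound of weight is associated with a price about $6.16 higher, holding length and the fixed effects constant.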

        As Jared Greathouse suggests, you might indeed use -regplot- as a post-estimation command. It produces the following graph.
        Code:
        regplot weight , recast(lfit) name(regplot, replace)
        [Figure: regplot.png]
        Note that the fitted values (of the dependent variable, -price-) range from about $2,500 (at weight = 2000) to $12,000 (at weight = 5000). From the -regplot- graph, the implied impact of a one-pound increase in weight on price is thus $9,500/3,000, or about $3.2 (roughly $3,200 per 1,000 pounds). This is much less than the estimated coefficient of about 6.2 in the multiple regression. This discrepancy shows that the line displayed by -regplot- is not graphing the estimated partial effect of weight on price, as I understand you want, but is instead fitting a line to the bivariate relationship between the fitted value from the regression and weight. The same graph produced by -regplot- could be generated by the code:

        Code:
        predict p_hat
        tw (scatter price weight) (lfit p_hat weight) , name(bivarplot, replace)
        [Figure: bivarplot.png]
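The slope comparison above can be checked with a few lines of arithmetic. This is only a sketch: the endpoint values are the approximate ones read off the -regplot- graph, and the scalar names are made up for illustration.

```stata
* Approximate fitted prices read off the -regplot- graph (not exact values)
scalar yhat_lo = 2500
scalar yhat_hi = 12000
* Slope of the plotted line between weight = 2000 and weight = 5000, in dollars per pound
scalar implied_slope = (yhat_hi - yhat_lo) / (5000 - 2000)
display "implied slope of plotted line: " implied_slope
* Coefficient on weight reported by -reghdfe- in the output above
display "reghdfe coefficient on weight: " 6.15521
```

The plotted line is roughly half as steep as the estimated coefficient, which is the discrepancy described above.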

        If instead you want your fitted line to represent the partial regression line showing the estimated impact of weight on price after controlling for the other variable(s) and the fixed effects, Stata offers other tools. One approach is to use the -margins- and -marginsplot- commands to generate the fitted line estimated at the mean of the other variable(s) (as suggested by Nick Cox) and then superimpose the scatter of the observed values. Here's the result of that approach:
        Code:
        sysuse auto, clear
        reghdfe price weight length, ab(foreign rep78)
        margins , at(weight=(2000(1000)5000))
        marginsplot, xdim(weight) addplot(scatter price weight, legend(order(2 "Fitted" 3 "Observed")) ) name(mrgsplot, replace)
        [Figure: mrgsplot.png]

        Note that, by default, -marginsplot- also shows confidence intervals at the selected points.
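One way to confirm that the line drawn by -marginsplot- has the regression slope is to compute it from two predicted values. The sketch below is not the code from this post: it assumes that official -regress- with indicator variables can stand in for -reghdfe- (the slope coefficients should be the same in this small example), so that it runs without community-contributed commands.

```stata
* Assumption: -regress- with indicators replaces -reghdfe ..., ab(foreign rep78)-
sysuse auto, clear
regress price weight length i.foreign i.rep78
* Average marginal effect of weight; in a linear model this equals _b[weight]
margins, dydx(weight)
* Predicted price at both ends of the plotted range, posted so they can be used as _b[]
margins, at(weight=(2000 5000)) post
* Slope of the line joining the two predictions, in dollars per pound
display (_b[2._at] - _b[1._at]) / (5000 - 2000)
```

The displayed slope should reproduce the coefficient on weight, unlike the slope read off the -regplot- graph.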

        As an alternative to overlaying a plot of the observed bivariate relationship between price and weight, as I have done in the above graphs, one might want to exploit the Frisch–Waugh–Lovell theorem and graph the "corrected" price variable against the "corrected" weight variable, where "corrected" means the values of these variables from which all the other variables in the regression have been partialed out. Stata's command for producing such a plot is -avplot weight-, but unfortunately Stata's -avplot- does not recognize Sergio Correia's -reghdfe-. The community-contributed commands -reganat- (by Valerio Filoso) and -reganat2- (by me) do not work after -reghdfe- either.

        But the same result produced by -avplot- can be produced directly like this:

        Code:
        *    "Correct" or "clean" the variable -price- by partialing out the effects of length and the fixed effects.
        reghdfe price length, ab(foreign rep78) resid
            predict p_resid, residual
            //  Then add back the mean of -price- so that it is centered on its original mean.
            sum price
            replace p_resid = p_resid + r(mean)
        
        *    "Correct" or "clean" the variable -weight- by partialing out the effects of length and the fixed effects.
        reghdfe weight length, ab(foreign rep78) resid
            predict w_resid, residual
            //  Then add back the mean of -weight- so that it is centered on its original mean.
            sum weight
            replace w_resid = w_resid + r(mean)
        
        *    Re-estimate the basic model, but this time overlay the scatter plot of the corrected variables.
        reghdfe price weight length, ab(foreign rep78)
        margins , at(weight=(2000(1000)5000))
        marginsplot, xdim(weight) addplot(scatter p_resid w_resid, legend(order(2 "Fitted" 3 "Corrected")) ) name(mrgsplot2, replace)
        [Figure: mrgsplot2.png]
        Note that, unlike the scatter plot of observed -price- on observed -weight-, the scatter of the corrected value of -price- on the corrected value of -weight- has the same slope as the fitted regression line generated by the -margins- and -marginsplot- commands. The magic of the FWL theorem is that a regression of the corrected price on the corrected weight has exactly the same coefficient as was estimated by -reghdfe- on the full model. The standard error and t-statistic are also the same up to a degrees-of-freedom adjustment, because the simple regression does not count -length- as an estimated parameter (61 versus 62 residual degrees of freedom here). So the scatter plot of the corrected price on the corrected weight conveys the "true" partial impact of weight on price that is captured by the multiple regression. For this example, here is the regression of the corrected -price- on the corrected -weight-:

        Code:
        . reghdfe p_resid w_resid , ab(foreign rep78)
        (MWFE estimator converged in 2 iterations)
        
        HDFE Linear regression                            Number of obs   =         69
        Absorbing 2 HDFE groups                           F(   1,     62) =      35.52
                                                          Prob > F        =     0.0000
                                                          R-squared       =     0.3643
                                                          Adj R-squared   =     0.3027
                                                          Within R-sq.    =     0.3643
                                                          Root MSE        =  2020.6178
        
        ------------------------------------------------------------------------------
             p_resid | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
             w_resid |    6.15521   1.032719     5.96   0.000     4.090834    8.219585
               _cons |  -12420.15   3127.727    -3.97   0.000    -18672.39   -6167.913
        ------------------------------------------------------------------------------
        
        Absorbed degrees of freedom:
        -----------------------------------------------------+
         Absorbed FE | Categories  - Redundant  = Num. Coefs |
        -------------+---------------------------------------|
             foreign |         2           0           2     |
               rep78 |         5           1           4     |
        -----------------------------------------------------+
        Compare the regression coefficient and standard error of -w_resid- in this simple regression to these statistics for -weight- in the original model.
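The comparison can also be done programmatically. The following is a minimal sketch rather than the code used above: it assumes official -regress- with indicator variables as a stand-in for -reghdfe-, skips the step of adding the means back (which does not affect the slope), and uses the hypothetical names p_tilde and w_tilde.

```stata
* Assumption: -regress- with indicators replaces -reghdfe ..., ab(foreign rep78)-
sysuse auto, clear
regress price weight length i.foreign i.rep78
scalar b_full  = _b[weight]
scalar se_full = _se[weight]
scalar df_full = e(df_r)

* Partial length and the indicators out of price, then out of weight
regress price length i.foreign i.rep78
predict double p_tilde, residuals
regress weight length i.foreign i.rep78
predict double w_tilde, residuals

* FWL: the slope matches the full model; the SE differs only by a df factor
regress p_tilde w_tilde
display "coefficient, full vs FWL: " b_full " vs " _b[w_tilde]
display "SE, full (df-rescaled) vs FWL: " se_full*sqrt(df_full/e(df_r)) " vs " _se[w_tilde]
```

Because the last regression here has no absorbed effects, its residual degrees of freedom differ from the 62 shown in the -reghdfe- output, but the same rescaling logic applies: with the figures above, 1.041149 times the square root of 61/62 is about 1.0327.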
        Last edited by Mead Over; 26 Apr 2022, 20:40.



        • #5
          Thanks Jared Greathouse, Nick Cox and Mead Over, this is very helpful. I have been trying to do this, but my graph is so ugly, perhaps because my data are kind of multimodal.

          Here is a little variable description and my code:
          lnpostings: log of job postings
          gsany: any regulations in place (binary) (x1)
          gs_avestd_0to10_2: original independent variable (x2) that captures the exact value of regulation.
          gs2avestd: standardized version of x2, the variable that captures the exact value of regulation.



          Code:
          reghdfe lnpostings gsany gs2avestd ra2avestd $covidregs $state  if (group==1 | group==2 | group==3), a(group state_fips posted_stata) vce(cl stXday)
          regplot gs_avestd_0to10_2,recast(fpfit) name(regplot, replace)

          I am trying to plot the regression line against the original, untransformed distribution of my independent variable, and then perhaps do a version with the transformed independent variable too, just to see whether the shape of the fitted relationship changes if I use the transformed variable instead of the original one.
          Last edited by Umair Ali; 18 May 2022, 17:08.



          • #6
            You've got several predictors. regplot can't reduce the fitted hyperplane to a summary in the two-dimensional space that it uses. As explained at greater length in the help, it's for regress y x and some other cases in which a two-dimensional plot will work. marginsplot is a better bet for you, I guess.

            On a different note: it seems odd to me to have two versions of a predictor both in the model: aren't they collinear?



            • #7
              Thanks, Nick (replying to #6). No, the original variable is not in the regression; only the transformed version is. The three independent variables in the equation are all different, but one of them is not in its original form.



              • #8
                #6 should perhaps be restated. regplot can't obviously produce a useful summary in situations like yours. It does nothing to set variables not shown to constant values.
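For contrast, -margins- can be told explicitly to hold the variables not shown at fixed values. The sketch below uses the auto data rather than the data discussed in #5, and assumes official -regress- with indicator variables as a stand-in for -reghdfe-; the graph name is made up.

```stata
* Assumption: -regress- with indicators replaces -reghdfe-; auto data used for illustration
sysuse auto, clear
regress price weight length i.foreign i.rep78
* atmeans fixes length and the indicators at their estimation-sample means
margins, at(weight=(2000(1000)5000)) atmeans
* Plot the resulting predictions against weight, with confidence intervals
marginsplot, name(atmeans_plot, replace)
```

In a linear model like this one, the plotted line has the slope of the coefficient on weight whether the other variables are held at their means or averaged over the sample.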

