Is there a STATA version for R command effect_plot?

Umair Ali

#1

26 Apr 2022, 15:31

I am looking for a STATA version of the following R command:

https://www.rdocumentation.org/packa...cs/effect_plot

Basically, the command plots the relationship estimated by the regression equation: the x-axis can be any independent variable, but the line drawn is the one estimated by the regression. I want to produce this plot after the reghdfe command in STATA. I have attached an example; this appears so simple in R. I tried to import my data into R and run a within estimator with a linear panel model (plm), but I am rather new to R and that is giving me nightmares.

Attached file: PAF573-slides-all-Spring-2022.pdf
Jared Greathouse

#2

26 Apr 2022, 16:14

I think Nick Cox's regplot just about does the job.
Nick Cox

#3

26 Apr 2022, 18:26

The slide shows just some variation on

Code:

sysuse auto, clear
scatter mpg weight || qfit mpg weight

but the documentation describes code that sounds more like marginsplot.

Mead Over


26 Apr 2022, 20:34

So here's a simple HDFE model estimated by -reghdfe-:

Code:

sysuse auto, clear
reghdfe price weight length, ab(foreign rep78)

Results:

Code:

(MWFE estimator converged in 3 iterations)

HDFE Linear regression                            Number of obs   =         69
Absorbing 2 HDFE groups                           F(   2,     61) =      37.99
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.5611
                                                  Adj R-squared   =     0.5108
                                                  Within R-sq.    =     0.5547
                                                  Root MSE        =  2037.1129

------------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      weight |    6.15521   1.041149     5.91   0.000     4.073303    8.237116
      length |  -100.9268   34.91649    -2.89   0.005    -170.7466   -31.10692
       _cons |   6486.754   3890.635     1.67   0.101    -1293.052    14266.56
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
     foreign |         2           0           2     |
       rep78 |         5           1           4     |
-----------------------------------------------------+

Note that the coefficient giving the estimated impact of weight on price is about 6 (6.155).

As Jared Greathouse suggests, you might indeed use -regplot- as a post-estimation command. It produces the following graph.

Code:

regplot weight , recast(lfit) name(regplot, replace)

[Graph: regplot.png]

Note that the fitted values (of the dependent variable, -price-) range from about $2,500 (at weight = 2,000) to $12,000 (at weight = 5,000). From the -regplot- graph, the implied impact of a one-pound increase in weight on price is thus $9,500/3,000, or about $3.2 (about $3,200 per 1,000 pounds). This is much less than the coefficient of about 6 estimated in the multiple regression. The discrepancy shows that the line displayed by -regplot- does not graph the estimated partial effect of weight on price, which, as I understand it, is what you want. Instead, it fits a line to the bivariate relationship between the fitted values from the regression and weight. The same graph produced by -regplot- could be generated by this code:

Code:

predict p_hat
tw (scatter price weight) (lfit p_hat weight) , name(bivarplot, replace)

[Graph: bivarplot.png]
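To put a number on the slope of that bivariate line rather than reading it off the graph, one could regress the model's fitted values on weight alone. This is a minimal sketch that assumes the community-contributed -reghdfe- is installed; the variable name xb_hat is illustrative only.

```stata
* Sketch: slope of the bivariate line drawn by -regplot- / -lfit-.
* Assumes -reghdfe- (community-contributed) is installed; xb_hat is a hypothetical name.
sysuse auto, clear
reghdfe price weight length, ab(foreign rep78)
* Linear prediction from the model.
predict double xb_hat, xb
* Regress the fitted values on weight alone and compare _b[weight]
* with the reghdfe coefficient on weight (about 6.16).
regress xb_hat weight
```

The coefficient from the last regression is the slope of the displayed line; it should come out well below the partial-effect coefficient, which is the point made above.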

If instead you want your fitted line to represent the partial regression line showing the estimated impact of weight on price after controlling for the other variable(s) and the fixed effects, Stata offers other tools. One approach is to use the -margins- and -marginsplot- commands (as suggested by Nick Cox) to generate the fitted line with the other variable(s) held at their means, and then superimpose the scatter of the observed values. (Strictly, -margins- averages the predictions over the observed values of the other variables; in a linear model like this one, that gives the same line as evaluating them at their means.) Here's the result of that approach:

Code:

sysuse auto, clear
reghdfe price weight length, ab(foreign rep78)
margins , at(weight=(2000(1000)5000))
marginsplot, xdim(weight) addplot(scatter price weight, legend(order(2 "Fitted" 3 "Observed")) ) name(mrgsplot, replace)

[Graph: mrgsplot.png]

Note that, by default, -marginsplot- also shows confidence intervals at the selected points.
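Two small variations may be useful here. In a linear model without interactions, the slope of the -marginsplot- line is just the coefficient on weight, so -margins, dydx()- should report it directly. And if the capped confidence intervals look cluttered, -marginsplot- can draw them as a shaded band over a finer grid. A sketch, assuming -reghdfe- is installed; the grid step of 250 and the graph name are arbitrary choices:

```stata
* Sketch: assumes the community-contributed -reghdfe- is installed.
sysuse auto, clear
reghdfe price weight length, ab(foreign rep78)
* In this linear model, the average marginal effect of weight equals its coefficient.
margins, dydx(weight)
* Finer grid of weight values, with the confidence interval drawn as a shaded area.
margins, at(weight=(2000(250)5000))
marginsplot, recast(line) recastci(rarea) name(mrgsband, replace)
```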

As an alternative to overlaying a plot of the observed bivariate relationship between price and weight as I have done in the above graphs, one might want to exploit the results of the Frisch–Waugh–Lovell theorem and graph the "corrected" price variable against the "corrected" weight variable, where "corrected" means the values of these variables from which all the other variables in the regression have been partialed out. Stata's command for producing such a plot is -avplot weight-, but unfortunately Stata's -avplot- does not work after Sergio Correia's -reghdfe-. The community-contributed commands -reganat- (by Valerio Filoso) and -reganat2- (by me) do not work after -reghdfe- either.

But the plot that -avplot- would produce can be constructed directly like this:

Code:

*    "Correct" or "clean" the variable -price- by partialing out the effects of length and the fixed effects.
reghdfe price length, ab(foreign rep78) resid
    predict p_resid, residual
    //  Then add back the mean of -price- so that it is centered on its original mean.
    sum price
    replace p_resid = p_resid + r(mean)

*    "Correct" or "clean" the variable -weight- by partialing out the effects of length and the fixed effects.
reghdfe weight length, ab(foreign rep78) resid
    predict w_resid, residual
    //  Then add back the mean of -weight- so that it is centered on its original mean.
    sum weight
    replace w_resid = w_resid + r(mean)

*    Re-estimate the basic model, but this time overlay the scatter plot of the corrected variables.
reghdfe price weight length, ab(foreign rep78)
margins , at(weight=(2000(1000)5000))
marginsplot, xdim(weight) addplot(scatter p_resid w_resid, legend(order(2 "Fitted" 3 "Corrected")) ) name(mrgsplot2, replace)

[Graph: mrgsplot2.png]
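If the -margins- line is not needed, the same FWL idea gives an -avplot- style graph with -twoway- alone: scatter the corrected variables and add their least-squares line. This sketch follows the residualizing steps of the code above but skips adding back the means, so both axes are centered at zero, as in -avplot-. It assumes -reghdfe- is installed; pr and wr are hypothetical variable names.

```stata
* Sketch: -avplot- style graph after -reghdfe-, using FWL.
* Assumes -reghdfe- is installed; pr and wr are hypothetical names.
sysuse auto, clear
* Partial length and the fixed effects out of price.
reghdfe price length, ab(foreign rep78) resid
predict pr, residual
* Partial length and the fixed effects out of weight.
reghdfe weight length, ab(foreign rep78) resid
predict wr, residual
* Scatter of the corrected variables with their least-squares line.
twoway (scatter pr wr) (lfit pr wr), legend(order(1 "Corrected" 2 "Linear fit")) name(avstyle, replace)
* By FWL, this slope equals the coefficient on weight in the full model.
regress pr wr
```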

Note that, unlike the scatter plot of observed -price- on observed -weight-, the scatter of the corrected value of -price- on the corrected value of -weight- has the same slope as the fitted regression line generated by the -margins- and -marginsplot- commands. The magic of the FWL theorem is that a regression of the corrected price on the corrected weight yields exactly the same coefficient as -reghdfe- estimated for the full model, and the same residuals, so the standard error and t-statistic are also the same apart from a small degrees-of-freedom difference (the second regression does not count -length- among its estimated parameters). So the scatter plot of the corrected price on the corrected weight conveys the "true" partial impact of weight on price that is captured by the multiple regression. For this example, here is the regression of the corrected -price- on the corrected -weight-:

Code:

. reghdfe p_resid w_resid , ab(foreign rep78)
(MWFE estimator converged in 2 iterations)

HDFE Linear regression                            Number of obs   =         69
Absorbing 2 HDFE groups                           F(   1,     62) =      35.52
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.3643
                                                  Adj R-squared   =     0.3027
                                                  Within R-sq.    =     0.3643
                                                  Root MSE        =  2020.6178

------------------------------------------------------------------------------
     p_resid | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     w_resid |    6.15521   1.032719     5.96   0.000     4.090834    8.219585
       _cons |  -12420.15   3127.727    -3.97   0.000    -18672.39   -6167.913
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
     foreign |         2           0           2     |
       rep78 |         5           1           4     |
-----------------------------------------------------+

Compare the regression coefficient and standard error of -w_resid- in this simple regression to these statistics for -weight- in the original model.
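The two standard errors can be reconciled by hand. Both regressions leave the same residual sum of squares, but the residual degrees of freedom are 61 in the full model, F(2, 61), and 62 in the regression on the corrected variables, F(1, 62), because the latter does not count -length- as an estimated parameter. Scaling by the square root of that ratio recovers the full-model figures from the numbers in the two outputs:

```latex
\mathrm{SE}_{\text{full}}
  = \mathrm{SE}_{\text{FWL}} \sqrt{\tfrac{62}{61}}
  \approx 1.032719 \times 1.00816
  \approx 1.04115,
\qquad
\text{Root MSE}_{\text{full}}
  \approx 2020.6178 \times 1.00816
  \approx 2037.11
```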

Last edited by Mead Over; 26 Apr 2022, 20:40.


Umair Ali

#5

18 May 2022, 16:49

Thanks Jared Greathouse, Nick Cox, and Mead Over; this was very helpful. I have been trying to do this, but my graph is very ugly, perhaps because my data are somewhat multimodal.

Here is a short description of the variables, followed by my code:
lnpostings: log of job postings
gsany: any regulations in place (binary) (x1)
gs_avestd_0to10_2: original independent variable (x2) that captures the exact value of regulation.
gs2avestd: standardized version of x2, the variable that captures the exact value of regulation.

Code:

reghdfe lnpostings gsany gs2avestd ra2avestd $covidregs $state if (group==1 | group==2 | group==3), a(group state_fips posted_stata) vce(cl stXday)
regplot gs_avestd_0to10_2, recast(fpfit) name(regplot, replace)

I am trying to plot the regression line against the original, untransformed distribution of my independent variable, and then perhaps do a version on the transformed independent variable too, just to see whether the shape of the fitted function changes if I use the transformed variable instead of the original one.
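If gs2avestd is a linear rescaling of gs_avestd_0to10_2 (for example, a z-score; that is an assumption here, based only on the description "standardized version"), the fitted line has exactly the same shape on either scale and only the axis units change. A minimal check of that property on the auto data (not the poster's data), using plain -regress-; z_weight, fit_raw, and fit_std are illustrative names:

```stata
* Illustration on the auto data: a z-score version of a regressor gives
* the same fitted values as the original variable (any linear rescaling does).
sysuse auto, clear
egen double z_weight = std(weight)
regress price weight length
predict double fit_raw, xb
regress price z_weight length
predict double fit_std, xb
* The two sets of fitted values agree up to rounding error.
assert reldif(fit_raw, fit_std) < 1e-8
```

A nonlinear transformation (logs, ranks, and so on) would not have this property, and then the two plots could genuinely differ.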

Last edited by Umair Ali; 18 May 2022, 17:08.
Nick Cox

#6

18 May 2022, 17:45

You've got several predictors. regplot can't reduce the fitted hyperplane to a summary in the two-dimensional space that it uses. As explained at greater length in the help, it's for regress y x and some other cases in which a two-dimensional display will work. marginsplot is a better bet for you, I guess.

On a different note: it seems odd to me to have two versions of a predictor both in the model: aren't they collinear?
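A hypothetical sketch of the -marginsplot- route for the model in #5. The variable names and globals are copied from that post; the grid -2(0.5)2 assumes gs2avestd is a z-score and should be adjusted to its observed range. -margins- averages the predictions over the observed values of the other regressors, so the line shows how predicted lnpostings moves with gs2avestd alone, with a shaded confidence band.

```stata
* Hypothetical sketch for the model in #5; the at() grid is an assumption.
reghdfe lnpostings gsany gs2avestd ra2avestd $covidregs $state if (group==1 | group==2 | group==3), a(group state_fips posted_stata) vce(cl stXday)
* Predicted lnpostings at each grid value of gs2avestd,
* averaged over the observed values of the other regressors.
margins, at(gs2avestd=(-2(0.5)2))
* Fitted line with the confidence interval drawn as a shaded band.
marginsplot, recast(line) recastci(rarea) name(mrgs_gs2, replace)
```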
Umair Ali

#7

19 May 2022, 01:31

Originally posted by Nick Cox (#6):

On a different note: it seems odd to me to have two versions of a predictor both in the model: aren't they collinear?

Thanks, Nick. No, the original variable is not in the regression; only the transformed version is. The three independent variables in the equation are all different; one of them is simply not in its original form.
Nick Cox

#8

19 May 2022, 01:41

#6 should perhaps be restated. regplot can't obviously produce a useful summary in situations like yours. It does nothing to set variables not shown to constant values.