Plotting results from regression using clustering by ID

Maria Giulia Trupia

Join Date: Aug 2023

Posts: 7
#1

31 Aug 2023, 13:43

Hi everyone,

I'm having difficulty plotting the relationship between my IV and DV in a regression with standard errors clustered by ID. The command I run is the following:

regress pressure n_category, cluster( mainid)

pressure is the DV and n_category is the IV.

How is it possible to plot these results?

Thank you so much,
Maria

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17730

01 Sep 2023, 02:42

Maria Giulia:
1) usually, after -regress- we're interested in plotting residuals vs. fitted values, as in the following toy example:

Code:

. sysuse auto, clear
(1978 automobile data)

. reg price i.rep78, vce(cluster foreign)

Linear regression                               Number of obs     =         69
                                                F(1, 1)           =          .
                                                Prob > F          =          .
                                                R-squared         =     0.0145
                                                Root MSE          =     2980.2

                                (Std. err. adjusted for 2 clusters in foreign)
------------------------------------------------------------------------------
             |               Robust
       price | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       rep78 |
          2  |   1403.125   .0000589  2.4e+07   0.000     1403.124    1403.126
          3  |   1864.733   329.9653     5.65   0.111    -2327.873     6057.34
          4  |       1507   195.7903     7.70   0.082     -980.751    3994.751
          5  |     1348.5   640.3933     2.11   0.282    -6788.468    9485.468
             |
       _cons |     4564.5   .0000428  1.1e+08   0.000     4564.499    4564.501
------------------------------------------------------------------------------

. predict fitted, xb

. predict epsilon, res

. twoway (scatter epsilon fitted ) (lfit epsilon fitted )

Alternatively, you can use Stata's built-in postestimation graph command:

Code:

rvfplot
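
To plot the estimated relationship itself rather than the residuals, one option is -margins- followed by -marginsplot-. The following is a minimal sketch, not a definitive recipe. It assumes the nlswork example dataset (fetched with -webuse-, so an internet connection is needed), with ln_wage, tenure and idcode standing in for pressure, n_category and mainid, and it treats the regressor as continuous. The confidence intervals drawn by -marginsplot- are based on the cluster-robust standard errors of the fitted model.

```stata
* Sketch only: nlswork stands in for the original poster's data
webuse nlswork, clear

* Linear regression with standard errors clustered on the person identifier
regress ln_wage tenure, vce(cluster idcode)

* Predicted ln_wage at selected values of tenure (grid chosen for illustration)
margins, at(tenure=(0(5)25))

* Plot the predictions together with their 95% confidence intervals
marginsplot
```

If the regressor is categorical, entering it as i.n_category and typing -margins n_category- before -marginsplot- would be the analogous approach.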


2) the R-squared (if that is your concern) is not affected by the choice between default and non-default standard errors:

Code:

. reg price i.rep78, vce(cluster foreign)

Linear regression                               Number of obs     =         69
                                                F(1, 1)           =          .
                                                Prob > F          =          .
                                                R-squared         =     0.0145
                                                Root MSE          =     2980.2

                                (Std. err. adjusted for 2 clusters in foreign)
------------------------------------------------------------------------------
             |               Robust
       price | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       rep78 |
          2  |   1403.125   .0000589  2.4e+07   0.000     1403.124    1403.126
          3  |   1864.733   329.9653     5.65   0.111    -2327.873     6057.34
          4  |       1507   195.7903     7.70   0.082     -980.751    3994.751
          5  |     1348.5   640.3933     2.11   0.282    -6788.468    9485.468
             |
       _cons |     4564.5   .0000428  1.1e+08   0.000     4564.499    4564.501
------------------------------------------------------------------------------

. reg price i.rep78

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(4, 64)        =      0.24
       Model |  8360542.63         4  2090135.66   Prob > F        =    0.9174
    Residual |   568436416        64     8881819   R-squared       =    0.0145
-------------+----------------------------------   Adj R-squared   =   -0.0471
       Total |   576796959        68  8482308.22   Root MSE        =    2980.2

------------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       rep78 |
          2  |   1403.125   2356.085     0.60   0.554    -3303.696    6109.946
          3  |   1864.733   2176.458     0.86   0.395    -2483.242    6212.708
          4  |       1507   2221.338     0.68   0.500    -2930.633    5944.633
          5  |     1348.5   2290.927     0.59   0.558    -3228.153    5925.153
             |
       _cons |     4564.5   2107.347     2.17   0.034     354.5913    8774.409
------------------------------------------------------------------------------

.
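
The same comparison can also be checked programmatically through the stored result e(r2). A minimal sketch, assuming the auto dataset shipped with Stata; the scalar names r2_default and r2_cluster are arbitrary:

```stata
* Sketch only: compare R-squared under default and clustered standard errors
sysuse auto, clear

* Default (conventional) standard errors
quietly regress price i.rep78
scalar r2_default = e(r2)

* Cluster-robust standard errors: same point estimates, different e(V)
quietly regress price i.rep78, vce(cluster foreign)
scalar r2_cluster = e(r2)

display "R-squared, default:   " r2_default
display "R-squared, clustered: " r2_cluster
```

Only the variance-covariance matrix of the estimates changes with vce(); the coefficients, and hence the fitted values and R-squared, do not.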

3) last but not least, to apply cluster-robust standard errors safely, the number of clusters should, as a rule of thumb, be at least 30 (indeed, in the toy example above, with only 2 clusters, the cluster-robust standard errors are totally misleading).
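
The number of clusters actually used in estimation can be read from the stored result e(N_clust). A minimal sketch, assuming the nlswork example dataset (fetched with -webuse-), where idcode plays the role of mainid:

```stata
* Sketch only: nlswork stands in for the original poster's data
webuse nlswork, clear

* Fit the model with standard errors clustered on the person identifier
quietly regress ln_wage tenure, vce(cluster idcode)

* Number of clusters in the estimation sample
display "Number of clusters used: " e(N_clust)
```

With the original poster's data, the same check would follow -regress pressure n_category, vce(cluster mainid)-.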

Kind regards,
Carlo
(Stata 19.0)

Maria Giulia Trupia

Join Date: Aug 2023

Posts: 7
#3

21 Sep 2023, 18:54

Thank you so much for your detailed answer! Incredibly helpful.