  • Plotting results from regression using clustering by ID

    Hi everyone,

    I'm having difficulty plotting the relationship between my IV and DV in a regression with standard errors clustered by ID. The command I run is the following:

    regress pressure n_category, cluster( mainid)

    pressure is the DV and n_category is the IV.

    How can I plot these results?

    Thank you so much,
    Maria

  • #2
    Maria Giulia:
    1) Usually, after -regress- we are interested in plotting residuals versus fitted values, as in the following toy example:
    Code:
    . use "C:\Program Files\Stata17\ado\base\a\auto.dta"
    (1978 automobile data)
    
    . reg price i.rep78, vce(cluster foreign)
    
    Linear regression                               Number of obs     =         69
                                                    F(1, 1)           =          .
                                                    Prob > F          =          .
                                                    R-squared         =     0.0145
                                                    Root MSE          =     2980.2
    
                                    (Std. err. adjusted for 2 clusters in foreign)
    ------------------------------------------------------------------------------
                 |               Robust
           price | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           rep78 |
              2  |   1403.125   .0000589  2.4e+07   0.000     1403.124    1403.126
              3  |   1864.733   329.9653     5.65   0.111    -2327.873     6057.34
              4  |       1507   195.7903     7.70   0.082     -980.751    3994.751
              5  |     1348.5   640.3933     2.11   0.282    -6788.468    9485.468
                 |
           _cons |     4564.5   .0000428  1.1e+08   0.000     4564.499    4564.501
    ------------------------------------------------------------------------------
    
    . predict fitted, xb
    
    . predict epsilon, res
    
    . twoway (scatter epsilon fitted ) (lfit epsilon fitted )
    Alternatively, you can use Stata's built-in postestimation command -rvfplot-:
    Code:
    rvfplot
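
    Adapting this to the variables in post #1 might look like the sketch below. It assumes that pressure, n_category and mainid are already in memory under those names and, although the question does not say so, that n_category is categorical (hence the i. prefix). fitted_p and resid_p are made-up variable names.
    Code:
    * Sketch only: variable names taken from post #1; n_category assumed categorical
    regress pressure i.n_category, vce(cluster mainid)

    * Residuals versus fitted values, as in the toy example
    predict double fitted_p, xb
    predict double resid_p, residuals
    twoway (scatter resid_p fitted_p), yline(0)

    * Predicted mean of pressure per category, with cluster-robust confidence intervals
    margins n_category
    marginsplot
    If n_category is continuous instead, drop the i. prefix; something like -twoway (scatter pressure n_category) (lfit pressure n_category)- would then show the raw data with the fitted line.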

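    2) The R-squared (if that is what concerns you) is not affected by the choice between default and non-default standard errors, and neither are the coefficients; only the standard errors and the statistics derived from them change: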
    Code:
    . reg price i.rep78, vce(cluster foreign)
    
    Linear regression                               Number of obs     =         69
                                                    F(1, 1)           =          .
                                                    Prob > F          =          .
                                                    R-squared         =     0.0145
                                                    Root MSE          =     2980.2
    
                                    (Std. err. adjusted for 2 clusters in foreign)
    ------------------------------------------------------------------------------
                 |               Robust
           price | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           rep78 |
              2  |   1403.125   .0000589  2.4e+07   0.000     1403.124    1403.126
              3  |   1864.733   329.9653     5.65   0.111    -2327.873     6057.34
              4  |       1507   195.7903     7.70   0.082     -980.751    3994.751
              5  |     1348.5   640.3933     2.11   0.282    -6788.468    9485.468
                 |
           _cons |     4564.5   .0000428  1.1e+08   0.000     4564.499    4564.501
    ------------------------------------------------------------------------------
    
    . reg price i.rep78
    
          Source |       SS           df       MS      Number of obs   =        69
    -------------+----------------------------------   F(4, 64)        =      0.24
           Model |  8360542.63         4  2090135.66   Prob > F        =    0.9174
        Residual |   568436416        64     8881819   R-squared       =    0.0145
    -------------+----------------------------------   Adj R-squared   =   -0.0471
           Total |   576796959        68  8482308.22   Root MSE        =    2980.2
    
    ------------------------------------------------------------------------------
           price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           rep78 |
              2  |   1403.125   2356.085     0.60   0.554    -3303.696    6109.946
              3  |   1864.733   2176.458     0.86   0.395    -2483.242    6212.708
              4  |       1507   2221.338     0.68   0.500    -2930.633    5944.633
              5  |     1348.5   2290.927     0.59   0.558    -3228.153    5925.153
                 |
           _cons |     4564.5   2107.347     2.17   0.034     354.5913    8774.409
    ------------------------------------------------------------------------------
    
    .
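
    The same comparison can be made without reading the output tables, since -regress- stores the R-squared in e(r2). A minimal sketch, assuming the auto dataset shipped with Stata is loaded via -sysuse-; the scalar names are made up:
    Code:
    sysuse auto, clear

    * Default (conventional) standard errors
    quietly regress price i.rep78
    scalar r2_default = e(r2)

    * Cluster-robust standard errors
    quietly regress price i.rep78, vce(cluster foreign)
    scalar r2_cluster = e(r2)

    display "default: " r2_default "   clustered: " r2_cluster
    * -assert- stops with an error if the two values differ
    assert reldif(r2_default, r2_cluster) < 1e-12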
    3) Last but not least, cluster-robust standard errors can be applied safely only when the number of clusters is large enough; a common rule of thumb is at least 30. In the toy example above there are only 2 clusters, so the cluster-robust standard errors are totally misleading.
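    To check how many clusters a model actually used, -regress- with vce(cluster ...) stores the count in e(N_clust). A minimal sketch on the toy example; the threshold of 30 is only the rule of thumb mentioned above, not a formal test:
    Code:
    sysuse auto, clear
    quietly regress price i.rep78, vce(cluster foreign)

    * Number of clusters used for the standard errors
    display "Number of clusters: " e(N_clust)
    if e(N_clust) < 30 {
        display "Few clusters: treat the cluster-robust standard errors with caution"
    }
    For the model in post #1, the same two lines after -regress pressure n_category, vce(cluster mainid)- would report the number of distinct mainid values in the estimation sample.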
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Thank you so much for your detailed answer! Incredibly helpful.
