fixed effect regression model

Amber Pong

Join Date: Oct 2022

Posts: 21
#1


18 Jul 2023, 23:39

Hello experts,

I have a dataset with the following variables: id, treatment, clinic, smoker (0/1), and age (1=21-25, 2=25-35, 3=36-50, 4=51+). Patients were randomly assigned to treatment or placebo in certain clinics and given an assessment (a continuous outcome). Is there a way to run a regression model with clinic fixed effects? I moved from R to Stata, so I'm still trying to figure Stata out. Some sample code is below.

Code:

clear
input int(id treatment clinic age) float assessment
1 1 1 3 3
2 0 1 2 2.7
3 1 2 1 6
4 0 2 1 4
5 1 2 3 4.5
6 0 2 2 6
7 1 3 2 2.1
8 0 3 2 3
9 1 4 1 5
10 0 4 2 4.3
11 1 4 1 4
end

Last edited by Amber Pong; 19 Jul 2023, 00:12.
Tags: fixed effects, regression

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17730

19 Jul 2023, 00:43

Amber:
you seem to have a panel dataset that was collapsed on -assessment-.
Assuming that this is your final dataset, you may want to take a look at the following toy-example (I've created a fake -depvar-):

Code:

. g depvar=runiform()*100

. reg depvar i.treatment i.clinic c.age##c.age assessment, vce(cluster id)

Linear regression                               Number of obs     =         11
                                                F(7, 10)          =       4.70
                                                Prob > F          =     0.0141
                                                R-squared         =     0.8316
                                                Root MSE          =      21.72

                                    (Std. err. adjusted for 11 clusters in id)
------------------------------------------------------------------------------
             |               Robust
      depvar | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
 1.treatment |  -9.788388   14.46818    -0.68   0.514     -42.0255    22.44873
             |
      clinic |
          2  |   26.60828   29.57069     0.90   0.389    -39.27932    92.49588
          3  |   8.570557   22.99764     0.37   0.717    -42.67138     59.8125
          4  |   65.36355   25.93744     2.52   0.030     7.571331    123.1558
             |
         age |    31.5863   82.16532     0.38   0.709    -151.4894     214.662
             |
 c.age#c.age |   1.536649   21.50884     0.07   0.944    -46.38803    49.46133
             |
  assessment |   2.057658   9.427851     0.22   0.832     -18.9489    23.06422
       _cons |  -58.41606   77.59065    -0.75   0.469    -231.2988    114.4667
------------------------------------------------------------------------------

.

Kind regards,
Carlo
(Stata 19.0)


Amber Pong

Join Date: Oct 2022

Posts: 21
#3

19 Jul 2023, 07:50

Carlo Lazzaro what does vce(cluster id) do in the example you provided? Also, I noticed errors in the code I provided so here is what the sample code should look like.

Code:

clear
input int(id treatment age clinic smoker) float assessment
1 1 1 3 1 3
2 0 1 2 1 2.7
3 1 2 1 0 6
4 0 2 1 1 4
5 1 2 3 0 4.5
6 0 2 2 0 6
7 1 3 2 0 2.1
8 0 3 2 0 13
9 1 4 1 1 5
10 0 4 2 0 4.3
11 1 4 1 0 4
end

The dependent variable is the assessment score and age is a categorical variable, so would the code be something along these lines?

Code:

reg assessment treatment i.clinic i.age smoker

Should I rather be using the xtreg command?

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17730

19 Jul 2023, 08:08

Amber:
see:

Code:

. reg assessment i.treatment i.clinic i.age i.smoker, vce(cluster clinic)

Linear regression                               Number of obs     =         11
                                                F(1, 2)           =          .
                                                Prob > F          =          .
                                                R-squared         =     0.5647
                                                Root MSE          =     3.6465

                                 (Std. err. adjusted for 3 clusters in clinic)
------------------------------------------------------------------------------
             |               Robust
  assessment | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
 1.treatment |  -5.486486   8.774002    -0.63   0.596    -43.23797      32.265
             |
      clinic |
          2  |  -5.702703   8.581503    -0.66   0.575    -42.62593    31.22052
          3  |  -.0810811   .5327053    -0.15   0.893    -2.373127    2.210965
             |
         age |
          2  |  -1.378378   4.001637    -0.34   0.763    -18.59603    15.83928
          3  |   4.567568   .7583289     6.02   0.026     1.304742    7.830393
          4  |  -.4054054   3.827673    -0.11   0.925    -16.87455    16.06374
             |
    1.smoker |  -3.243243   4.387001    -0.74   0.537    -22.11899     15.6325
       _cons |   11.37838   12.60257     0.90   0.462    -42.84611    65.60286
------------------------------------------------------------------------------

.

.

If you have at least 30 clinics, vce(cluster clinic) makes sense. This option allows errors within the same clinic to be correlated (see the -vce(cluster clusterid)- option in -regress-).
-xtreg- would be the way to go provided that you have a -timevar- variable, that includes the (theoretically equally spaced) instances in which the same sample was measured on the same variables.
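To make the effect of the option concrete, here is a minimal sketch that fits the same model twice, with and without clustering, and lines the results up side by side. It assumes the corrected sample data from #3 are in memory; the stored-estimate names -conventional- and -clustered- are arbitrary labels chosen for this illustration.

```stata
* Same model, conventional standard errors
regress assessment i.treatment i.clinic i.age i.smoker
estimates store conventional

* Same model, standard errors clustered on clinic
regress assessment i.treatment i.clinic i.age i.smoker, vce(cluster clinic)
estimates store clustered

* The coefficients are identical in both columns; only the standard errors change
estimates table conventional clustered, b(%9.3f) se(%9.3f)
```

The clustering option leaves the point estimates untouched; it only changes how the standard errors (and hence the t statistics and confidence intervals) are computed.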

Last edited by Carlo Lazzaro; 19 Jul 2023, 08:45. Reason: Reading Jeff's reply, I realized that I missed -i.smoker- in the right-hand side of the regression equation. Thanks Jeff! :)

Kind regards,
Carlo
(Stata 19.0)


Jeff Wooldridge

Join Date: Apr 2014

Posts: 2204
#5

19 Jul 2023, 08:39

It sounds like some clinics are controls and in other clinics they randomized control and treatment. Because the assignment was at the clinic level, you need some adjustment to the standard errors. Clustering at the clinic level is probably somewhat conservative but it's the easiest thing to do.

One thing to note. If you use

Code:

reg assessment i.treatment i.age i.smoker i.clinic, vce(cluster clinic)

you should ignore the standard errors on the clinic dummy variables as they are meaningless. The other standard errors are fine (if, as mentioned by Carlo, you have a sufficient number of clinics.)
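If the clinic dummies are of no interest in themselves, one option is to absorb them rather than list them. The following is only a sketch, assuming the variable names from the sample data in #3:

```stata
* Absorb the clinic fixed effects instead of reporting one dummy per clinic
areg assessment i.treatment i.age i.smoker, absorb(clinic) vce(cluster clinic)
```

The coefficients on treatment, age, and smoker should match those from the -regress- command with i.clinic; the clinic effects are simply not displayed.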

With enough data, you might want to interact i.treatment with i.age and i.smoker to determine if there is heterogeneity in the treatment effects.
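A rough sketch of what such an interacted model could look like, assuming the variable names from #3 and a full dataset (the 11-row toy sample is far too small to estimate it):

```stata
* Let the treatment effect differ by age group and by smoking status
regress assessment i.treatment##i.age i.treatment##i.smoker i.clinic, vce(cluster clinic)

* Joint test that all interaction terms are zero (no heterogeneity)
testparm i.treatment#i.age i.treatment#i.smoker

* Estimated treatment effect within each age group and each smoking status
margins age, dydx(treatment)
margins smoker, dydx(treatment)
```

A small p-value from -testparm- would suggest that the treatment effect varies across the groups; the -margins- output then shows where.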
Amber Pong

Join Date: Oct 2022

Posts: 21
#6

19 Jul 2023, 08:59

There are fewer than 30 clinics (24). Would vce(cluster clinic) still make sense?
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2204
#7

19 Jul 2023, 09:02

I think 24 clinics is okay for clustering. A bit small, but some simulations show it can work. How many patients per clinic? How many clinics have some control and treated units?
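Both questions can be answered with a few lines along the following pattern. This is a sketch only: it assumes treatment is coded 0/1 as in the sample data, and the new variable names (n_patients, share_treated, one_per_clinic) are made up for the illustration.

```stata
* Patients per clinic, split by treatment arm
tabulate clinic treatment

* Clinic size and share of treated patients, repeated on every row of the clinic
bysort clinic: generate n_patients = _N
bysort clinic: egen share_treated = mean(treatment)

* Flag exactly one observation per clinic for clinic-level summaries
egen byte one_per_clinic = tag(clinic)

* Distribution of clinic sizes
summarize n_patients if one_per_clinic

* Number of clinics that contain both treated and control patients
count if one_per_clinic & share_treated > 0 & share_treated < 1
```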
Amber Pong

Join Date: Oct 2022

Posts: 21
#8

19 Jul 2023, 09:10

The number of patients per clinic varies, but every clinic has at least one control and one treated patient.
