
  • fixed effect regression model

    Hello experts,

    I have a dataset with the following variables: id, treatment, clinic, smoker (0/1), and age (1=21-25, 2=25-35, 3=36-50, 4=51+). Patients were randomly assigned to treatment or placebo within certain clinics and given an assessment (a continuous outcome). Is there a way to run a regression model with clinic fixed effects? I moved from R to Stata, so I am still trying to figure Stata out. Some sample code is below.

    Code:
    * assessment takes non-integer values, so it is stored as float rather than int
    input int(id treatment clinic age) float assessment
    1  1 1 3 3
    2  0 1 2 2.7
    3  1 2 1 6
    4  0 2 1 4
    5  1 2 3 4.5
    6  0 2 2 6
    7  1 3 2 2.1
    8  0 3 2 3
    9  1 4 1 5
    10 0 4 2 4.3
    11 1 4 1 4
    end
    Last edited by Amber Pong; 19 Jul 2023, 00:12.

  • #2
    Amber:
    you seem to have a panel dataset that was collapsed on -assessment-.
    Assuming that this is your final dataset, you may want to take a look at the following toy-example (I've created a fake -depvar-):
    Code:
    . g depvar=runiform()*100
    
    . reg depvar i.treatment i.clinic c.age##c.age assessment, vce(cluster id)
    
    Linear regression                               Number of obs     =         11
                                                    F(7, 10)          =       4.70
                                                    Prob > F          =     0.0141
                                                    R-squared         =     0.8316
                                                    Root MSE          =      21.72
    
                                        (Std. err. adjusted for 11 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
          depvar | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
     1.treatment |  -9.788388   14.46818    -0.68   0.514     -42.0255    22.44873
                 |
          clinic |
              2  |   26.60828   29.57069     0.90   0.389    -39.27932    92.49588
              3  |   8.570557   22.99764     0.37   0.717    -42.67138     59.8125
              4  |   65.36355   25.93744     2.52   0.030     7.571331    123.1558
                 |
             age |    31.5863   82.16532     0.38   0.709    -151.4894     214.662
                 |
     c.age#c.age |   1.536649   21.50884     0.07   0.944    -46.38803    49.46133
                 |
      assessment |   2.057658   9.427851     0.22   0.832     -18.9489    23.06422
           _cons |  -58.41606   77.59065    -0.75   0.469    -231.2988    114.4667
    ------------------------------------------------------------------------------
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Carlo Lazzaro, what does vce(cluster id) do in the example you provided? Also, I noticed errors in the code I provided, so here is what the sample code should look like.

      Code:
      * assessment takes non-integer values, so it is stored as float rather than int
      input int(id treatment age clinic smoker) float assessment
      1 1 1 3 1 3
      2 0 1 2 1 2.7
      3 1 2 1 0 6
      4 0 2 1 1 4
      5 1 2 3 0 4.5
      6 0 2 2 0 6
      7 1 3 2 0 2.1
      8 0 3 2 0 13
      9 1 4 1 1 5
      10 0 4 2 0 4.3
      11 1 4 1 0 4
      end
      The dependent variable is the score on the assessment, and age is a categorical variable. Would the code be something along these lines?

      Code:
      reg assessment treatment i.clinic i.age smoker
      Or should I be using the xtreg command instead?



      • #4
        Amber:
        see:
        Code:
        . reg assessment i.treatment i.clinic i.age i.smoker, vce(cluster clinic)
        
        Linear regression                               Number of obs     =         11
                                                        F(1, 2)           =          .
                                                        Prob > F          =          .
                                                        R-squared         =     0.5647
                                                        Root MSE          =     3.6465
        
                                         (Std. err. adjusted for 3 clusters in clinic)
        ------------------------------------------------------------------------------
                     |               Robust
          assessment | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
         1.treatment |  -5.486486   8.774002    -0.63   0.596    -43.23797      32.265
                     |
              clinic |
                  2  |  -5.702703   8.581503    -0.66   0.575    -42.62593    31.22052
                  3  |  -.0810811   .5327053    -0.15   0.893    -2.373127    2.210965
                     |
                 age |
                  2  |  -1.378378   4.001637    -0.34   0.763    -18.59603    15.83928
                  3  |   4.567568   .7583289     6.02   0.026     1.304742    7.830393
                  4  |  -.4054054   3.827673    -0.11   0.925    -16.87455    16.06374
                     |
            1.smoker |  -3.243243   4.387001    -0.74   0.537    -22.11899     15.6325
               _cons |   11.37838   12.60257     0.90   0.462    -42.84611    65.60286
        ------------------------------------------------------------------------------
        
        .
        
        .
        If you have at least 30 clinics, vce(cluster clinic) makes sense. This option treats the errors of patients within the same clinic as correlated (see the -vce(cluster clusterid)- option in -regress-).
        -xtreg- would be the way to go provided that you have a -timevar- variable that records the (theoretically equally spaced) occasions on which the same sample was measured on the same variables.
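
For the clinic fixed effects asked about in the first post, here is a minimal sketch using the corrected toy data from #3. It assumes that -areg- with absorb(clinic) is an acceptable way to include the clinic effects without displaying them; the point estimates on the other regressors should match those from -regress- with i.clinic. The scalar names b_dummies and b_absorbed are only illustrative.

```stata
clear
input int(id treatment age clinic smoker) float assessment
1 1 1 3 1 3
2 0 1 2 1 2.7
3 1 2 1 0 6
4 0 2 1 1 4
5 1 2 3 0 4.5
6 0 2 2 0 6
7 1 3 2 0 2.1
8 0 3 2 0 13
9 1 4 1 1 5
10 0 4 2 0 4.3
11 1 4 1 0 4
end

* Clinic fixed effects entered as explicit dummy variables
regress assessment i.treatment i.age i.smoker i.clinic
scalar b_dummies = _b[1.treatment]

* Same model with the clinic effects absorbed rather than displayed
areg assessment i.treatment i.age i.smoker, absorb(clinic)
scalar b_absorbed = _b[1.treatment]

display "treatment coefficient: " b_dummies " (dummies) vs " b_absorbed " (absorbed)"
```

With only 11 observations this is purely a syntax illustration; on the real data, vce(cluster clinic) can be added to either command.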
        Last edited by Carlo Lazzaro; 19 Jul 2023, 08:45. Reason: Reading Jeff's reply, I realized that I missed -i.smoker- in the right-hand side of the regression equation. Thanks Jeff! :)
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          It sounds like some clinics are controls and in other clinics they randomized control and treatment. Because the assignment was at the clinic level, you need some adjustment to the standard errors. Clustering at the clinic level is probably somewhat conservative but it's the easiest thing to do.

          One thing to note. If you use

          Code:
          reg assessment i.treatment i.age i.smoker i.clinic, vce(cluster clinic)
          you should ignore the standard errors on the clinic dummy variables as they are meaningless. The other standard errors are fine (if, as mentioned by Carlo, you have a sufficient number of clinics.)

          With enough data, you might want to interact i.treatment with i.age and i.smoker to determine if there is heterogeneity in the treatment effects.
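
A rough sketch of what that interaction model could look like. The data below are simulated stand-ins (24 clinics with 20 patients each, an assumed constant treatment effect of 0.5), not the poster's data; only the variable names follow the thread.

```stata
clear
set seed 12345
set obs 480

* Simulated stand-in data (hypothetical): 24 clinics, 20 patients each
generate int clinic        = ceil(_n/20)
generate byte treatment    = runiform() < 0.5
generate byte age          = ceil(4*runiform())
generate byte smoker       = runiform() < 0.3
generate double assessment = 4 + 0.5*treatment + rnormal()

* Main effects, treatment-by-age and treatment-by-smoker interactions,
* and clinic fixed effects, with standard errors clustered on clinic
regress assessment i.treatment i.age i.smoker ///
    i.treatment#i.age i.treatment#i.smoker i.clinic, vce(cluster clinic)

* Joint test that all treatment interaction terms are zero
testparm i.treatment#i.age i.treatment#i.smoker
```

A small p-value from -testparm- would point to treatment effects that differ by age group or smoking status; since the simulated effect is the same for everyone, any such finding in this toy data would be down to chance.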



          • #6
            There are fewer than 30 clinics (24 in total). Would vce(cluster clinic) still make sense?



            • #7
              I think 24 clinics is okay for clustering. A bit small, but some simulations show it can work. How many patients per clinic? How many clinics have some control and treated units?
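
One way to answer both questions, sketched here on the corrected toy data from #3 (the helper variables n_patients, n_treated, both_arms and first are made-up names for illustration):

```stata
clear
input int(id treatment age clinic smoker) float assessment
1 1 1 3 1 3
2 0 1 2 1 2.7
3 1 2 1 0 6
4 0 2 1 1 4
5 1 2 3 0 4.5
6 0 2 2 0 6
7 1 3 2 0 2.1
8 0 3 2 0 13
9 1 4 1 1 5
10 0 4 2 0 4.3
11 1 4 1 0 4
end

* Patients and treated patients per clinic
bysort clinic: egen n_patients = count(id)
bysort clinic: egen n_treated  = total(treatment)

* Flag clinics that contain both treated and control patients
generate byte both_arms = n_treated > 0 & n_treated < n_patients

* Count each clinic once
egen byte first = tag(clinic)
count if first & both_arms

* Cross-tabulation of clinic by treatment arm for a quick visual check
tabulate clinic treatment
```

Clinics in which everyone is treated or everyone is a control contribute nothing to the treatment coefficient once clinic fixed effects are in the model, which is why the second question matters.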



              • #8
                The number of patients per clinic varies, but every clinic has at least one control and one treated patient.
