  • Fixed effect- three identifier variables and result interpretation

    I have a panel dataset where each observation is identified by three variables (server#, Motherboard_temperature, cpu_utilization). I built the fixed-effects regression as follows:
    Code:
    egen id= group( server_num, CPUAAverageCoreTemperature)
    xtset id CPUAUtilization
    
    xtreg IPMIPower i.CPUAUtilization i.CPUAAverageCoreTemperature Fan1RPM , fe vce(robust)
    
    
    Fixed-effects (within) regression               Number of obs     =      1,296
    Group variable: id                              Number of groups  =        163
    
    R-squared:                                      Obs per group:
         Within  = 0.9573                                         min =          1
         Between = 0.3868                                         avg =        8.0
         Overall = 0.8253                                         max =          9
    
                                                    F(9,162)          =     882.55
    corr(u_i, Xb) = 0.0615                          Prob > F          =     0.0000
    
                                                     (Std. err. adjusted for 163 clusters in id)
    --------------------------------------------------------------------------------------------
                               |               Robust
                     IPMIPower | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    ---------------------------+----------------------------------------------------------------
               CPUAUtilization |
                            1  |    45.0428   1.088814    41.37   0.000      42.8927     47.1929
                            2  |    65.8275   1.355795    48.55   0.000     63.15019    68.50481
                            3  |   81.43575   2.302044    35.38   0.000     76.88986    85.98163
                            4  |   99.49748   3.169925    31.39   0.000     93.23778    105.7572
                            5  |   101.0006   3.407896    29.64   0.000     94.27099    107.7302
                            6  |   104.4599   3.555344    29.38   0.000     97.43913    111.4807
                            7  |   106.5967   3.766806    28.30   0.000     99.15836    114.0351
                            8  |   107.0671   3.877293    27.61   0.000     99.41051    114.7236
                               |
    CPUAAverageCoreTemperature |
                           72  |          0  (omitted)
                           74  |          0  (omitted)
                               |
                       Fan1RPM |   .0151772   .0019755     7.68   0.000     .0112762    .0190783
                         _cons |   28.71307   4.337745     6.62   0.000     20.14726    37.27889
    ---------------------------+----------------------------------------------------------------
                       sigma_u |  18.396744
                       sigma_e |  9.5483104
                           rho |  .78778394   (fraction of variance due to u_i)
    --------------------------------------------------------------------------------------------
    My questions:
    1- Can I use the model this way to get the coefficients of CPUAUtilization, given that this variable is used in xtset? Can I say that when CPU utilization = 2, IPMIPower increases by about 65?
    2- I also want to see the coefficient of CPUAAverageCoreTemperature, which is part of the id used in xtset, but it is omitted. How can I check the coefficient for this time-invariant variable? Should I switch to random effects?

    To conclude, can I include the variables used in xtset as independent variables in the model and interpret them as we interpret other independent variables?
    Last edited by amera amery; 17 Jun 2022, 14:50.

  • #2
    1. CPUAUtilization was "used in -xtset-" as the time variable. As such, it has no special status except in analyses that use lags or leads or other time-series operators, or in models with autoregressive structure. As none of that apparatus is used in the analysis you show, you can think of CPUAUtilization as just another variable. If you left it out of the -xtset- command, nothing would change.

    2. Because CPUAAverageCoreTemperature is constant within each id, its effects are not estimable in fixed effects models. If you need to estimate those effects for your research goals, then, yes you will have to switch to some other model. Random effects is one approach. You might also consider a correlated random effects model, implemented in -xthybrid-, available from SSC.
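
    A minimal sketch of both points, reusing the variable names from #1; it has not been run on the poster's data. Note that each coefficient on i.CPUAUtilization is a contrast with the omitted base level of that factor, not an absolute power figure.

    ```stata
    * Point 1: no time-series operators are used, so the panel can be declared by id alone
    xtset id
    xtreg IPMIPower i.CPUAUtilization Fan1RPM, fe vce(robust)

    * Point 2: a random-effects fit can estimate coefficients on variables that are constant within id
    xtreg IPMIPower i.CPUAUtilization i.CPUAAverageCoreTemperature Fan1RPM, re vce(cluster id)

    * -xthybrid- is community-contributed: install it from SSC, then read its help file for the syntax
    ssc install xthybrid
    help xthybrid
    ```

    The fixed-effects results from the first -xtreg- should match those in #1 apart from the dropped temperature terms, which were omitted anyway.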

    • #3
      Thank you very much for your reply; it is really helpful. I have another question. When I built the fixed-effects regression, I got reasonable results:
      Code:
      Fixed-effects (within) regression               Number of obs     =        437
      Group variable: id                              Number of groups  =         54
      
      R-squared:                                      Obs per group:
           Within  = 0.9914                                         min =          6
           Between = 0.9589                                         avg =        8.1
           Overall = 0.9838                                         max =          9
      
                                                      F(17,366)         =    2471.05
      corr(u_i, Xb) = -0.0074                         Prob > F          =     0.0000
      
      -----------------------------------------------------------------------------------
              IPMIPower | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      ------------------+----------------------------------------------------------------
        CPUAUtilization |
                     1  |   9.276792   2.030078     4.57   0.000      5.28471    13.26887
                     2  |   5.380208   2.677865     2.01   0.045     .1142759    10.64614
                     3  |   11.73227   2.999456     3.91   0.000     5.833936     17.6306
                     4  |   7.757695   3.634875     2.13   0.033     .6098347    14.90556
                     5  |   9.677154   3.660209     2.64   0.009     2.479476    16.87483
                     6  |   9.213083    3.73256     2.47   0.014     1.873128    16.55304
                     7  |   10.30386   3.768056     2.73   0.007     2.894102    17.71361
                     8  |   9.252536   3.814135     2.43   0.016     1.752167    16.75291
                        |
      InletTemperaturec |   1.504897   1.252602     1.20   0.230     -.958302    3.968097
             CPUAPowerW |    2.40329   .0801771    29.97   0.000     2.245625    2.560956
             DRAMPowerW |  -.6603544   .3213018    -2.06   0.041    -1.292184   -.0285252
            Fan1RPM_std |   .5102339    3.18751     0.16   0.873    -5.757899    6.778366
            Fan2RPM_std |  -9.180696   6.687151    -1.37   0.171    -22.33075    3.969364
            Fan3RPM_std |   6.075889   6.596024     0.92   0.358    -6.894973    19.04675
            Fan4RPM_std |   3.043358   2.931474     1.04   0.300    -2.721288    8.808004
            Fan5RPM_std |  -1.230411   2.944436    -0.42   0.676    -7.020546    4.559724
            Fan6RPM_std |  -.9179874   2.717511    -0.34   0.736    -6.261883    4.425908
                  _cons |   13.75671   28.10006     0.49   0.625    -41.50113    69.01455
      ------------------+----------------------------------------------------------------
                sigma_u |  4.8548805
                sigma_e |   4.320039
                    rho |  .55809644   (fraction of variance due to u_i)
      -----------------------------------------------------------------------------------
      F test that all u_i=0: F(53, 366) = 3.97                     Prob > F = 0.0000
      (option xb assumed; fitted values)
      (437 missing values generated)
      (93 real changes made, 44 to missing)
      (486 real changes made)
      I have been asked to add one more variable (CPUBPowerW), which represents the power of the second CPU in the server. In the dataset, the values of CPUAPowerW and CPUBPowerW are very close to each other, but the results are now completely the opposite:

      Code:
         xtreg IPMIPower i.CPUAUtilization  InletTemperaturec   CPUAPowerW CPUBPowerW DRAMPowerW Fan1RPM_std Fan2RPM_std Fan3RPM_std Fan4RPM_std Fan5RPM_std Fan6RPM_std , fe
      Results:
      Code:
      R-squared:                                      Obs per group:
           Within  = 0.9919                                         min =          5
           Between = 0.9725                                         avg =        8.1
           Overall = 0.9842                                         max =          9

                                                      F(18,365)         =    2496.90
      corr(u_i, Xb) = -0.3278                         Prob > F          =     0.0000

      -----------------------------------------------------------------------------------
              IPMIPower | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      ------------------+----------------------------------------------------------------
        CPUAUtilization |
                     1  |  -2.474683   2.124949    -1.16   0.245    -6.653363    1.703997
                     2  |  -2.713025   2.322502    -1.17   0.244     -7.28019    1.854141
                     3  |  -1.591106   2.783247    -0.57   0.568    -7.064318    3.882107
                     4  |  -2.824604   3.060585    -0.92   0.357    -8.843197    3.193989
                     5  |  -2.182388   3.135796    -0.70   0.487    -8.348882    3.984106
                     6  |  -2.441316   3.160523    -0.77   0.440    -8.656435    3.773803
                     7  |  -2.007848   3.226662    -0.62   0.534    -8.353029    4.337333
                     8  |  -3.241825   3.296159    -0.98   0.326    -9.723671    3.240021
                        |
      InletTemperaturec |   1.591279   1.173472     1.36   0.176    -.7163365    3.898894
             CPUAPowerW |   1.533728    .129726    11.82   0.000     1.278624    1.788833
             CPUBPowerW |   1.047168   .1272852     8.23   0.000     .7968633    1.297472
             DRAMPowerW |  -.3908985   .2980769    -1.31   0.191    -.9770621    .1952651
            Fan1RPM_std |   1.957205   3.070232     0.64   0.524    -4.080359    7.994769
            Fan2RPM_std |  -6.958174    6.34777    -1.10   0.274    -19.44097    5.524619
            Fan3RPM_std |   6.809148   6.276625     1.08   0.279    -5.533739    19.15204
            Fan4RPM_std |   4.176921   2.828196     1.48   0.141    -1.384683    9.738524
            Fan5RPM_std |  -2.829339   2.823995    -1.00   0.317    -8.382682    2.724003
            Fan6RPM_std |  -1.484024    2.59487    -0.57   0.568    -6.586796    3.618747
                  _cons |    11.4536   26.00579     0.44   0.660    -39.68638    62.59358
      ------------------+----------------------------------------------------------------
                sigma_u |  5.0316019
                sigma_e |  4.1178042
                    rho |  .59888867   (fraction of variance due to u_i)
      -----------------------------------------------------------------------------------
      F test that all u_i=0: F(53, 365) = 2.92                     Prob > F = 0.0000
      (option xb assumed; fitted values)
      (437 missing values generated)
      (93 real changes made, 44 to missing)
      (486 real changes made)
      Can we say that the new variable does nothing for our model and should be removed? Or should we keep it and interpret the results accordingly?
      Last edited by amera amery; 20 Jun 2022, 20:48.

      • #4
        Amera:
        what hits my eyes is the sky-rocketing value of your within R-sq statistic (and the other R-sqs share the very same destiny):
        Code:
        Within  = 0.9919    min =          5
        Between = 0.9725    avg =        8.1
        Overall = 0.9842    max =          9
        They are dramatically (too) high: I would not rule out possible overfitting here.
        Kind regards,
        Carlo
        (Stata 19.0)

        • #5
          @carlo, yes. I noticed that, tried 10-fold cross-validation, and got the same results. I think removing some variables makes more sense and may solve the overfitting issue (do you agree?). The goal of this project is not prediction; the goal is to check which variables significantly affect the dependent variable.

          • #6
            Amera:
            I'd support your approach.
            Kind regards,
            Carlo
            (Stata 19.0)

            • #7
              When I checked the correlation between the dependent variable and CPUBPowerW, I saw it is .99, which is very high. After removing CPUBPowerW from the regression, the results are more reasonable, as follows:
              Code:
              .   xtreg IPMIPower i.CPUAUtilization   InletTemperaturec_std    DRAMPowerW Fan1
              > RPM_std Fan2RPM_std Fan3RPM_std Fan4RPM_std Fan5RPM_std Fan6RPM_std ,vce(robus
              > t) re
              
              Random-effects GLS regression                   Number of obs     =        480
              Group variable: id                              Number of groups  =         54
              
              R-squared:                                      Obs per group:
                   Within  = 0.9331                                         min =          8
                   Between = 0.2907                                         avg =        8.9
                   Overall = 0.8228                                         max =          9

                                                              Wald chi2(16)     =   3.01e+06
              corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

                                                   (Std. err. adjusted for 54 clusters in id)
              -------------------------------------------------------------------------------
                            |               Robust
                  IPMIPower | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
              --------------+----------------------------------------------------------------
              CPUAUtiliza~n |
                         1  |   41.88219   6.364259     6.58   0.000     29.40847    54.35591
                         2  |   61.27769   7.500947     8.17   0.000      46.5761    75.97927
                         3  |   77.80003   8.730384     8.91   0.000     60.68879    94.91127
                         4  |   95.52874   9.866469     9.68   0.000     76.19081    114.8667
                         5  |   97.09743   10.10272     9.61   0.000     77.29646    116.8984
                         6  |   101.0489    10.3693     9.75   0.000      80.7254    121.3723
                         7  |   102.5249   10.59199     9.68   0.000     81.76496    123.2848
                         8  |   109.8122    10.0868    10.89   0.000     90.04246     129.582
                            |
              InletTemper~d |   .0148855   .0695147     0.21   0.830    -.1213608    .1511318
                 DRAMPowerW |    .858722   1.307608     0.66   0.511    -1.704142    3.421586
                Fan1RPM_std |  -9.147688   1.705541    -5.36   0.000    -12.49049   -5.804889
                Fan2RPM_std |   -1.15027   5.968729    -0.19   0.847    -12.84876    10.54822
                Fan3RPM_std |   2.502546    5.26564     0.48   0.635    -7.817919    12.82301
                Fan4RPM_std |   3.215118   3.734025     0.86   0.389    -4.103436    10.53367
                Fan5RPM_std |  -.6965464    4.31918    -0.16   0.872    -9.161983     7.76889
                Fan6RPM_std |   12.62829    3.36559     3.75   0.000      6.03185    19.22472
                      _cons |   55.93184   18.61995     3.00   0.003     19.43741    92.42627
              --------------+----------------------------------------------------------------
                    sigma_u |  16.468604
                    sigma_e |  12.450664
                        rho |  .63630529   (fraction of variance due to u_i)
              -------------------------------------------------------------------------------
              
              
              . 
              end of do-file
              The question is: should I include CPUBPowerW in the regression because it is an important variable, even though none of the results will be significant, the results do not make sense, and R2 is very high? Or should I keep it out?

              • #8
                Amera:
                you seemingly switched from -fe- to -re- specification with cluster-robust standard errors.
                These estimators are pretty different.
                As far as your main question is concerned, the fact that -CPUBPowerW- is highly correlated with the regressand is fine; the issue is the correlation among predictors (which you have to investigate), not with the regressand.
                Put in non-technical terms: if the goal of every regression is to identify the contribution of each predictor (adjusted for the others) to variation in the conditional mean of the regressand, then when two predictors tell more or less the same story, the regression machinery has a hard time disentangling their effects.
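
                One hedged way to investigate the correlation among predictors, using the variable names from #3. The pooled -regress- fit is only a rough collinearity diagnostic (an assumption of this sketch, since it ignores the panel structure), and none of this has been run on the actual data.

                ```stata
                * Pairwise correlations among the continuous predictors
                correlate CPUAPowerW CPUBPowerW DRAMPowerW InletTemperaturec

                * Variance inflation factors from a pooled OLS fit (diagnostic only)
                regress IPMIPower CPUAPowerW CPUBPowerW DRAMPowerW InletTemperaturec
                estat vif

                * Joint test of the two CPU power terms after the fixed-effects fit
                xtreg IPMIPower i.CPUAUtilization InletTemperaturec CPUAPowerW CPUBPowerW DRAMPowerW, fe
                test CPUAPowerW CPUBPowerW
                ```

                If the two power variables are jointly significant but individually unstable, that pattern is consistent with the disentangling problem described above.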
                Kind regards,
                Carlo
                (Stata 19.0)

                • #9
                  Thank you. Yes, I switched to re because I applied the test to choose between them, and the result suggested re. When I checked the correlation between CPUBPowerW and CPUAUtilization, it is > .75, so I think using both of them in the same regression does not make sense; I will use one of them. The correlation between CPUBPowerW and CPUAPowerW is .80, so I will apply the same logic there. Do you agree with this?

                  • #10
                    Amera:
                    yes, I do.
                    That said, just out of curiosity, regarding the test of -fe- vs. -re- specification: which test did you use? -hausman- does not allow non-default standard errors, and adding non-default standard errors after the -hausman- outcome is not the way to go.
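
                    For reference, a sketch of the standard -hausman- workflow with default standard errors. The regressor list is shortened for illustration (an assumption; the full list from #7 would be used in practice), and the stored-estimate names fe_default and re_default are hypothetical.

                    ```stata
                    * Fit both models with default (non-robust) standard errors, as -hausman- requires
                    xtreg IPMIPower i.CPUAUtilization InletTemperaturec_std DRAMPowerW Fan1RPM_std, fe
                    estimates store fe_default

                    xtreg IPMIPower i.CPUAUtilization InletTemperaturec_std DRAMPowerW Fan1RPM_std, re
                    estimates store re_default

                    * Consistent estimator first, efficient estimator second
                    hausman fe_default re_default
                    ```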
                    Last edited by Carlo Lazzaro; 26 Jun 2022, 01:50.
                    Kind regards,
                    Carlo
                    (Stata 19.0)
