Box-Cox regression interpretation

Masoumeh Sanagou

Join Date: May 2017
Posts: 107

03 Sep 2019, 16:33

Hi StataList,

Could you please help me interpret the coefficients of the following model?

Code:

DAP^0.2564817 = -2.993313 + 0.9320813*Weight^0.2737106 + 0.2308375*DAFrames^0.2737106 + 0.0999711*FluoroFrames^0.2737106

I need to say something like this:
"For any a% increase/decrease in Weight, the expected ratio of DAP will be B."
or
"We expect about a c% increase/decrease in DAP when Weight increases/decreases by d%."

I did the following Box-Cox regression:

Code:

boxcox DAP Weight DAFrames FluoroFrames , model(theta) lrtest

The results were:

Code:

                                                  Number of obs   =      2,811
                                                  LR chi2(4)      =    2775.64
Log likelihood = -10360.897                       Prob > chi2     =      0.000
 
------------------------------------------------------------------------------
         DAP |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     /lambda |   .2737106   .0475931     5.75   0.000     .1804299    .3669914
      /theta |   .2564817   .0169948    15.09   0.000     .2231725    .2897908
------------------------------------------------------------------------------
 
Estimates of scale-variant parameters
-------------------------------------------------------------
             |      Coef.  chi2(df)  P>chi2(df)    df of chi2
-------------+-----------------------------------------------
Notrans      |
       _cons |  -10.64568
-------------+-----------------------------------------------
Trans        |
      Weight |   .9946927  1273.050   0.000           1
    DAFrames |   .2463437   508.567   0.000           1
FluoroFrames |   .1066865   876.712   0.000           1
-------------+-----------------------------------------------
      /sigma |   .9905126
-------------------------------------------------------------
 
---------------------------------------------------------------
   Test               Restricted    
    H0:             log likelihood       chi2       Prob > chi2
---------------------------------------------------------------
theta=lambda = -1     -13074.245      5426.70           0.000
theta=lambda =  0     -10475.138       228.48           0.000
theta=lambda =  1     -11315.795      1909.80           0.000
---------------------------------------------------------------

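Then I transformed the variables by hand, using the estimated theta and lambda, and ran an ordinary regression: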

Code:

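* Power-transform each variable using the boxcox estimates of theta (DAP) and lambda (covariates)
gen DAP1=DAP^.2564817
gen Weight1=Weight^.2737106
gen DAFrames1=DAFrames^.2737106
gen FluoroFrames1=FluoroFrames^.2737106

* OLS on the transformed variables. boxcox itself uses (y^theta - 1)/theta,
* so these coefficients are rescaled versions of the boxcox ones.
regress DAP1 Weight1 DAFrames1 FluoroFrames1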

The results were:

Code:

      Source |       SS           df       MS      Number of obs   =     2,811
-------------+----------------------------------   F(3, 2807)      =   1589.88
       Model |  308.273108         3  102.757703   Prob > F        =    0.0000
    Residual |  181.423564     2,807  .064632549   R-squared       =    0.6295
-------------+----------------------------------   Adj R-squared   =    0.6291
       Total |  489.696672     2,810  .174269278   Root MSE        =    .25423

-------------------------------------------------------------------------------
         DAP1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
      Weight1 |   .9320813   .0232091    40.16   0.000     .8865728    .9775899
    DAFrames1 |   .2308375   .0097467    23.68   0.000      .211726    .2499489
FluoroFrames1 |   .0999711     .00311    32.15   0.000      .093873    .1060692
        _cons |  -2.993313   .0857925   -34.89   0.000    -3.161536    -2.82509
-------------------------------------------------------------------------------

Regards,


Jeff Wooldridge

Join Date: Apr 2014

Posts: 2175
#2

03 Sep 2019, 20:58

There's no simple way to interpret the output. In a 1992 paper in the International Economic Review, "Some Alternatives to the Box-Cox Regression Model," I pointed out that there is no simple way to recover E(y|x) from the Box-Cox estimates, and that even if you make strong assumptions, the resulting estimates and standard errors will be nonrobust. I suggested modeling E(y|x) directly using an inverse Box-Cox form, possibly transforming some covariates, too. One then gets direct estimates of the partial effects, and the paper discusses how to compute, say, an elasticity. Estimation by nonlinear or weighted nonlinear least squares is fairly easy. The approach didn't seem to catch on, even though I still don't know how people interpret the Box-Cox estimates. Most, I suspect, ignore Jensen's inequality and just act as if the mean passes through the nonlinear function.

Currently, my view is that an exponential model with flexible functions of the covariates -- squares, interactions -- is often sufficient. The coefficients are easy to interpret; in fact, you'd get the elasticity directly by using the logs of the explanatory variables. The estimates are invariant to rescaling, and you can use the Poisson or Gamma QMLEs (available using the Stata -glm- command) along with fully robust inference. The Box-Cox model maintains lots of assumptions (homoskedasticity, normality). Those Box-Cox standard errors are likely very misleading, as they're computed under the assumption that the entire distribution is correct.
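
A minimal sketch of the exponential-mean approach described above, using the variable names from the original post. The names lnWeight, lnDAFrames and lnFluoroFrames are hypothetical, introduced here for the logged covariates, and the specification is only an illustration, not a fitted or recommended model for these data.

Code:

* Logged covariates (hypothetical names), so that coefficients are elasticities
generate lnWeight = ln(Weight)
generate lnDAFrames = ln(DAFrames)
generate lnFluoroFrames = ln(FluoroFrames)

* Poisson QMLE of E(DAP|x) = exp(xb), with robust standard errors
glm DAP lnWeight lnDAFrames lnFluoroFrames, family(poisson) link(log) vce(robust)

* Gamma QMLE of the same conditional mean, again with robust standard errors
glm DAP lnWeight lnDAFrames lnFluoroFrames, family(gamma) link(log) vce(robust)

With this kind of specification, the coefficient on lnWeight can be read as an elasticity: a 1% increase in Weight goes with roughly that many percent change in expected DAP, holding the other covariates fixed. Squares or interactions of the logged covariates could be added for flexibility, at the cost of the elasticity no longer being a single coefficient.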

Incidentally, coding my suggested conditional mean function using the -nl- command in Stata should not be too hard. And Stata will give the robust standard errors.
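
A sketch of how such a conditional mean might be coded with -nl-, assuming the inverse Box-Cox mean takes the form E(DAP|x) = (1 + lambda*xb)^(1/lambda); the paper should be checked for the exact specification. The parameter names (lambda, b0 to b3), the starting value for lambda, and the choice to enter the covariates in logs are all illustrative assumptions.

Code:

* Nonlinear least squares for E(DAP|x) = (1 + lambda*xb)^(1/lambda), robust standard errors
* lambda starts at 0.25 (illustrative); the b parameters start at nl's default of 0
nl (DAP = (1 + {lambda=0.25}*({b0} + {b1}*ln(Weight) + {b2}*ln(DAFrames) + {b3}*ln(FluoroFrames)))^(1/{lambda})), vce(robust)

The base 1 + lambda*xb has to stay positive during the iterations, so convergence may be sensitive to starting values. As lambda approaches 0 this mean function approaches exp(xb), which gives one way to sanity-check the fit against an exponential-mean model.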