Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • nl hockey - Clustered Standard Errors

    Dear Statalist community,

    I am using StataSE 16 (64-bit) on Windows 10. My question is regarding producing clustered standard errors when using nl hockey, which is a user-written command by Dr Mark Lunt for estimating the breakpoint between two intersecting straight lines (referred to as a piecewise [or "hockey stick"] regression). Please see the ado file:

    HTML Code:
    https://personalpages.manchester.ac.uk/staff/mark.lunt/nlhockey.ado

    and the help file:
    HTML Code:
    https://personalpages.manchester.ac.uk/staff/mark.lunt/nlhockey.hlp
    There is an UCLA tutorial on piecewise regressions in Stata available here:

    HTML Code:
    https://stats.oarc.ucla.edu/stata/faq/how-can-i-run-a-piecewise-regression-in-stata/

    from which I'll be using a version of the talk dataset that I have adapted to this question. Please use the following code to reproduce this dataset:
    Code:
    . use https://stats.idre.ucla.edu/stat/stata/faq/talk, clear
    . expand 2
    . sort id
    . by id: gen obs = _n
    . replace talk = talk + .8474287 if obs == 2 & age <= 11.35291
    . replace talk = talk + 3.974196 if obs == 2 & age > 11.35291
    . replace age = age + 1 if obs == 2
    The dataset contains 4 variables:
    • id: an anonymous identifier of a child in the dataset
    • talk: how much the child talks on the phone
    • age: the age of the child
    • obs: the repeated observation number
    When running a piecewise regression using nl hockey to look at the relationship between how much a child talks on the phone and the age of the child, 4 parameters are estimated:
    • breakpoint: the breakpoint between two intersecting straight lines modelling the change in talk per year of age
    • slope_l: the gradient of the straight line to the left of the breakpoint
    • slope_r: the gradient of the straight line to the right of the breakpoint
    • cons: the constant (at age = 0)
    The code and results are as follows:

    Code:
    . nl hockey talk age
    (obs = 400)
    
    Iteration 0:   residual SS =   1724947
    Iteration 1:   residual SS =  48522.02
    Iteration 2:   residual SS =  31301.01
    Iteration 3:   residual SS =  28998.47
    Iteration 4:   residual SS =  28917.09
    Iteration 5:   residual SS =  28859.84
    Iteration 6:   residual SS =  28763.72
    Iteration 7:   residual SS =  28763.47
    
          Source |       SS       df       MS            Number of obs =       400
    -------------+------------------------------         F(  3,   396) =    444.51
           Model |  96860.6041     3   32286.868         Prob > F      =    0.0000
        Residual |  28763.4668   396  72.6350173         R-squared     =    0.7710
    -------------+------------------------------         Adj R-squared =    0.7693
           Total |  125624.071   399  314.847296         Root MSE      =  8.522618
                                                         Res. dev.     =   2845.31
    (hockey)
    ------------------------------------------------------------------------------
            talk |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
      breakpoint |   12.79033   .6941923    18.42   0.000     11.42557    14.15509
         slope_l |   1.107216   .4397491     2.52   0.012     .2426814    1.971751
         slope_r |   3.999005   .1564209    25.57   0.000     3.691486    4.306525
            cons |   4.656507   4.259092     1.09   0.275    -3.716751    13.02976
    ------------------------------------------------------------------------------
    * Parameter cons taken as constant term in model & ANOVA table
     (SEs, P values, CIs, and correlations are asymptotic approximations)
    However, there are 2 repeated observations per child, so it would be appropriate to adjust standard errors for clustering on their id. However, on running nl hockey with clustering on the id variable, I retrieve the following error:

    Code:
    . nl hockey talk age, cluster(id)
    cluster() not allowed
    r(198);
    I am wondering how I may be able to produce clustered standard errors for results retrieved when using nl hockey. I'll explain what I have tried / looked into before opening the floor to suggestions.

    I looked into writing code based on the programmer’s command _robust. Please see the manual:

    HTML Code:
    https://www.stata.com/manuals/p_robust.pdf
    However, it requires variables of the estimated parameters - for which I am not sure what should be contained in them. I wondered whether the model equation would need to be solved for each of the parameters, and the estimates for each parameter entered into each of the corresponding variables. The code is as follows:

    Code:
    . marksample touse
    . markout `touse' id, strok
    . nl hockey talk age if `touse'
    . matrix D = e(V)
    . forvalues i = 1/4 {
    .         forvalues j = 1/4 {
    .                 matrix D[`i',`j'] = ((e(V)[`i',`j'])*(sqrt(abs(e(V)[`i',`j']))/e(rmse))^2)/abs(e(V)[`i',`j'])
    .         }
    . }
    . matrix b = e(b)
    . gen breakpoint = //???
    . gen slope_l =    //???
    . gen slope_r =    //???
    . gen cons =       //???
    . local n = e(N)
    . local k = colsof(D)
    . predict double e if `touse', residual
    . sort `touse' id
    . by `touse' id: gen byte count = 1 if _n == 1 & `touse'
    . summarize count, meanonly
    . local n_id = r(sum)
    . local dof = `n_id' - 1
    . local clopt "cluster(id)"
    . ereturn post b D, dof(`dof') esample(`touse')
    . _robust e if e(sample), minus(`k') `clopt'
    . ereturn display
    . drop breakpoint slope_l slope_r cons e count
    I also looked into re-writing the program as a substitutable expression or function evaluator program, but this hasn't been successful so far and I would welcome any advice on how to do this from the ado file for nl hockey posted above. Please see the nl manual:

    HTML Code:
    https://www.stata.com/manuals/rnl.pdf
    I look forward to responses.

    Best wishes,
    Simon Baldwin

  • #2
    Simon:
    as per FAQ, you're requested to post your query on the General forum, as this one is for practising purposes only. Thanks.
    Kind regards,
    Carlo
    (Stata 18.0 SE)

    Comment


    • #3
      Thanks Carlo, this was just a practice for some colleagues to preview before posting to the General forum. I will move the post over shortly after making some amendments.

      Best wishes,
      Simon

      Comment

      Working...
      X