Calculating and plotting the 95%CI of a nadir in Stata

Innocent Mboya

Join Date: Nov 2014

Posts: 13
#1

Calculating and plotting the 95%CI of a nadir in Stata

09 Feb 2023, 07:24

Hello,
I have a data problem where I am looking for the BMI-mortality association. The models are fit using Cox regression and BMI modelled with restricted cubic splines. Among other parameters, I was able to get the nadir, i.e., the BMI associated with the lowest HR. However, the issue is how to estimate the 95%CI for this minimum BMI value. Previous studies such as [1] have used bootstrap method. I have done a lot of reading and testing the solution with no success.

I will very much appreciate any assistance on how to implement this in Stata.

Reference
[1] Afzal, S., Tybjærg-Hansen, A., Jensen, G.B. and Nordestgaard, B.G., 2016. Change in body mass index associated with lowest mortality in Denmark, 1976-2013. Jama, 315(18), pp.1989-1996.
Tags: None

Innocent Mboya

Join Date: Nov 2014
Posts: 13

27 Feb 2023, 07:47

Hello again.
I have found a way to run this analysis and bootstrap the estimates. However, based on the following code using "auto.dta", I am not sure what I am doing wrong to get the 95%CI. I am not very good in stata programming, hence would appreciate your help.

Code:

 preserve
sysuse auto, clear
summ price
egen mean_price=mean(price)

capture program drop bmiprog
program define bmiprog , rclass
version 17
quietly summ mean_price, meanonly
local m = r(mean)
matrix V = J(1, 1000, .)
forvalues i=1/1000{
scalar r=runiform()
matrix b=rnormal(1,colsof(V))
matrix V[1, `i'] = `m' + b*sqrt(`m'*r*(1-r))
}
global bmiprog_N = e(N)
return scalar m = `m'
return matrix V = V
end

matrix se = J(1, 1, .)
matrix c1c2 = J(1, 2, .)
matrix colnames se = "SEM"
matrix colnames c1c2 = "Lower" "Upper"
matrix rownames c1c2 = "95%CI"

set trace on
bmiprog mean_price, r(V)
//scalar bmiprog_N = r(bmiprog_N)
set trace off

return list
matrix list r(V)
matrix CI = J(1, 2, .)
matrix list CI

restore

Specifically, this code returns missing values

Code:

matrix list CI

Last edited by Innocent Mboya; 27 Feb 2023, 07:52.

Comment

Andrew Musau

Join Date: Oct 2014

Posts: 10254
#3

27 Feb 2023, 08:00

matrix CI = J(1, 2, .)
matrix list CI

Here, you just define matrix CI as a \(1\times 2\) matrix and assign missing values as its elements. There is no other reference to a matrix CI. What are you expecting?
Comment
Innocent Mboya

Join Date: Nov 2014

Posts: 13
#4

27 Feb 2023, 08:05

Please, how do I resolve the issue? My interest is to get the 95%CI from the values stored in "r(V)".
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10254
#5

27 Feb 2023, 08:26

95% CIs are a pair. How do these relate to the entries in r(V)? I cannot follow what your program is doing and do not see that you store any CIs.
Comment
Innocent Mboya

Join Date: Nov 2014

Posts: 13
#6

28 Feb 2023, 02:59

Here is the problem I am trying to resolve; using predicted hazard ratio and body mass index (BMI), I obtained a BMI value associated with the lowest hazard ratio. Using auto.dta as an example, I then created a variable containing the mean value ("mean_price"), which in this case follows a uniform distribution as all values will be the same. Then, I want to get 95%CI of this value to use them in my graphs. My code above aims to get around this problem by obtaining 1000 bootstrap estimates of "mean_price". I took this path because the standard bootstrap function did not work as the standard error will be zero as all rows contains the same value.

This analysis was tested in R and worked perfectly but I cannot figure it out in Stata. The code above is the closest I got.
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10254
#7

28 Feb 2023, 07:46

I would post the R code which you state gives you the desired results. There are members in the forum fluent with both Stata and R who may be able to guide you. I cannot make much of the code in #2 as it defines a number of matrices which are never used and I do not understand the problem to give suggestions.
Comment

Innocent Mboya

Join Date: Nov 2014
Posts: 13

06 Mar 2023, 05:37

Below is the R-code used to obtain the desired outputs but I have not fully figured out how to write the corresponding program in Stata.

Code:

rm(list=ls())

library(tidyverse)
library(haven)
library(survival)
library(rms)
library(boot)

data<-read_dta("\\\\Data\\mydata.dta")

set.seed(384457)

#subset of 100000 observations from original dataset
sample<-sample_n(data,100000)

sample$ageentry<-as.numeric(difftime(sample$date,sample$dob,units='days'))
sample$ageend<-as.numeric(difftime(sample$deathdate,sample$dob,units='days'))

#code to calculate the BMI nadir

#Cox model with restricted cubic spline for BMI
cox<-coxph(Surv(time=ageentry,time2=ageend,event=status)~rcs(bmi)+factor(agegrp)+factor(sex),data=sample)
summary(cox)

#predicted values
termplot(cox,term=1,se=TRUE,plot=TRUE)
tt<-termplot(cox,term=1,se=TRUE,plot=FALSE)
#sort in ascending order by associated risk
tt$bmi<-arrange(tt$bmi,y)
#View(tt$bmi)
#take the first BMI value
BMInadir<-tt$bmi[1,1]
#BMI associated with the lowest risk
BMInadir

#function for bootstrapping
BMInadir <- function(ds0,i){
  ds <- ds0[i,]
  cox<-coxph(Surv(time=ageentry,time2=ageend,event=status)~rcs(bmi)+factor(agegrp)+factor(sex),data=ds)
  tt<-termplot(cox,term=1,se=TRUE,plot=FALSE)
  tt$bmi<-arrange(tt$bmi,y)
  BMInadir<-tt$bmi[1,1]
  return(BMInadir)
}

#bootstrapping with 20 iterations
bt_res<-boot(sample, BMInadir, R = 20, stype = "i")

#Minimum BMI value from the original model
bt_res$t0
#Bootstrapped values
bt_res$t

#parametric bootstrapping
bt_ci1<-boot.ci(bt_res, type = c("norm"),index=1)
print(bt_ci1)
#point estimate
bt_ci1$t0
#95% CI lower limit
bt_ci1$normal[2]
#95% CI upper limit
bt_ci1$normal[3]

#non-parametric, percentile based bootstrapping (I would prefer this option to the parametric bootstrap)
bt_ci2<-boot.ci(bt_res, type = c("perc"),index=1)
print(bt_ci2)
#point estimate
bt_ci2$t0
#95% CI lower limit
bt_ci2$percent[4]
#95% CI upper limit
bt_ci2$percent[5]

Comment

Innocent Mboya

Join Date: Nov 2014
Posts: 13

21 Mar 2023, 09:53

Dear Stata users,
I found a solution to what I need. Please note that my initial code above is wrong and should NOT be used (apologies for the confusion caused). While the following code is rather long, it still produces the desired results. I would appreciate if there is an easy and direct way to write/ organize e.g., in one loop.

Code:

*===============================================================================
 global data "\\Data"  
cd "$data"
use studydata.dta, clear

set seed 384457
sample 100000, count

* all-cause mortality; overall effect
*===============================================================================

mkspline bmispl = bmi, cubic displayknots
mat knots = r(knots) //needed for Marisa's code

stset deathdate, fail(status) enter(date) origin(dob) id(id) scale(365.25)

* performing bootstrap estimation of confidence intervals for bmi associated with lowest hr

* need results from the cox model and predictions for ploting
stcox bmispl* i.agegrp i.sex
xbrcspline bmispl, matknots(knots) values(15(0.05)40) ref(22.5) eform ///
    gen(bminew rr lb ub)
    
    *gen minimum bmi to use in the plot
    egen hrmin=min(rr)
    gen bmi_min=bminew if rr == hrmin
    summ bmi_min
    egen bmihr=mean(bmi_min)

* loop the bootstrap estimation
set seed 96321
quietly {
forvalues i=1/20 {
    preserve
    bsample

    stcox bmispl* i.agegrp i.sex
    xbrcspline bmispl, matknots(knots) values(15(0.05)40) ref(22.5) eform ///
        gen(bminewbs rrbs lbbs ubbs)

    egen minhr= min(rrbs)
    gen minbmi=bminewbs if rrbs == minhr
    drop if missing(minbmi)
    
    keep minhr minbmi
    save "bsdata`i'.dta", replace /*save data for each bootstrap sample*/
    restore
}
}

* append the bsdata files
preserve
use bsdata1.dta, clear

forvalues i=2/20 {
local flist : dir "$data" files "bsdata`i'.dta"
    foreach f in `flist' {
    append using "`f'"    
}
}
gen bsid=_n /*need this identifier for merging with the master data*/
save bsdata.dta, replace //can then use this data to calculate the 95%CI
forvalues i=1/20 {
erase "bsdata`i'.dta" //remove individual bsdata from directory  
}
restore

*merge with master data to access the bootstrap estimates
gen bsid=_n *need this identifier for merging with the bsdata data*/
merge 1:1 bsid using bsdata.dta
drop _merge bsid

list minhr minbmi in 1/10
centile minbmi
return list
gen lower = r(lb_1)
gen upper = r(ub_1)
gen y=0.7 /*for plotting the nadir with bmihr*/

* now recode "lower upper bmihr y" to missing if bminew is missing
foreach var in lower upper bmihr y {
    replace `var' = . if bminew==.
}

*desired plot
twoway (rarea lb ub bminew, sort color(gs13)) ///
    (line rr bminew, lwidth(medthick) lcolor(black)) ///
    (function y=1, range(bminew) clcolor(fg) lp(-) lwidth(thin)) ///
    (scatter y bmihr, col(black) msize(tiny)) ///
    (pccapsym y lower y upper, col(black) msize(vsmall) msymbol(|) ///
        mlabel(bmihr) mlabpos(1) mlabcolor(black) mlabsize(vsmall) lwidth(vvthin)) ///
    , yscale(log)  xlabel(15(5)40)  ///
    ytitle("HR (95%CI)") xtitle("Body mass index") legend(off) ///    
    name("aHRall_mar", replace) scheme(s2color)  ///
    title("BMI-mortality association", size(vsmall)) ///
        xline(22.5, lstyle(grid))
*===============================================================================

Last edited by Innocent Mboya; 21 Mar 2023, 09:56.

Announcement