I'm finessing an analysis of the effect of an environmental biomarker (EB) on anthropometric outcomes in children (AO). Let me preface this by saying I am an expert SPSS user who is often asked elementary syntax questions, which I readily help people with. So I understand if you think this is elementary, but I really have searched online and in the forums and can't find a response that fits my query. Reader, kindly read on and please help if you can. Thank you.
Originally I ran the analysis in my comfort zone of linear regression. But my colleagues and I started thinking that a linear relationship may not be the best way to model the association.
So I used tertiles and ran linear regression using dummy variables. Because of the size of my sample, tertile cuts are as small as I should go to maintain over n=100 in each tertile.
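For concreteness, the tertile model I am describing would look roughly like this in Stata (I originally ran it in SPSS, so treat this as a sketch). EB and AO are placeholder names for my exposure and outcome, and EB_tert is a name I made up for the tertile indicator.

```stata
* Sketch only: EB, AO and EB_tert are placeholder names; run from a do-file
xtile EB_tert = EB, nquantiles(3)   // codes 1, 2, 3 from lowest to highest tertile
tabulate EB_tert                    // check the n in each tertile
regress AO i.EB_tert                // tertile 1 is the reference category
```

The i. prefix makes Stata build the dummy variables itself, so the two coefficients are the differences in mean AO for tertiles 2 and 3 versus tertile 1.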
But we started to see something in the third tertile that we thought might be driving the relationships we're seeing. So I was advised to use spline analysis in Stata, since SPSS cannot handle this properly. As I understood it, I was told to use the spline to find the best cutpoints (knots) to use instead of the tertiles.
My interpretation of this (perhaps grossly naive and incorrect) was that I should use the knots in the same way I used the tertile cuts - make dummy variables and run linear regression. But the only way I could get Stata to choose the knots without any input from me was to use cubic splines. The problem is that cubic splines in mkspline cannot have fewer than 3 knots, which brings my N to under 100 in the segment before the 1st knot and after the 2nd knot (the N hovers around 40 for those 2 segments).
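What I ran for the cubic version was along these lines. The stub name EB_rcs is just something I picked for illustration, and the regress line is my guess at how the result is meant to be used rather than something I have been told to do.

```stata
* Sketch only: EB and AO are placeholder names, EB_rcs is an arbitrary stub
mkspline EB_rcs = EB, cubic nknots(3) displayknots   // 3 knots creates EB_rcs1 and EB_rcs2
regress AO EB_rcs1 EB_rcs2                           // spline terms entered as continuous covariates
```

If I am reading the mkspline help correctly, with nknots(3) the default knots sit at the 10th, 50th and 90th percentiles of EB, which would explain why the outer segments hold only about 40 children each.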
Now I'm thinking that what I was actually supposed to do was run a spline regression in Stata that places the knots at the tertiles. Is there a consensus on which of the 3 mkspline syntaxes is best for this? I've gotten as far as
mkspline m1sptert 3 = EB, pctile displayknots
But this gives me 375 values for each of the three new variables. I am actually not clear on what the new variables are even supposed to represent.
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+---------------------------------------------------------
   m1sptert1 |       375    .1679392    .3416748   -1.76026   .3310139
   m1sptert2 |       375     .338585    .2955811          0   .6702747
   m1sptert3 |       375    .2013302     .440995          0   2.925424
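My best guess at the next step is below, but please correct me if this is wrong. From the mkspline help, I think each new variable holds the part of EB that falls within one segment, so every child gets a value on all three variables and nothing is split into groups. AO is again a placeholder for my outcome.

```stata
* Sketch only: EB and AO are placeholder names; this is my guess at the intended use
mkspline m1sptert 3 = EB, pctile displayknots   // linear spline, knots at the tertile cutpoints
regress AO m1sptert1 m1sptert2 m1sptert3        // one slope per segment, if I understand the help file
```

If that reading is right, each coefficient would be the slope of AO on EB within that tertile range, rather than a difference between groups as with the dummy variables.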
I would greatly appreciate being pointed in the right direction, especially if sample code could be offered. I'm already behind on my deadline, mostly because I'm very muddy on spline analysis and such a novice at Stata.