stptime - SMR, "using data not sorted" error

Emily Tweed

Join Date: Jun 2015
Posts: 26


22 Mar 2021, 09:34

Hello,

I have individual-level survival data for a cohort of people with my exposure variable of interest coded 0-3. I would like to calculate SMR for exposure groups 1-3 using exposure = 0 as the reference group. I've therefore saved the age group-specific mortality rates in a separate file, sorted by age group.

I can't share the data itself due to confidentiality restrictions but here is an example of what it looks like (simplifying to 3 age groups):

Code:

clear
input byte exposure float(age_group failure persontime id)
0 1 0  15  1
0 1 0  15  2
0 1 0   7  3
0 1 1   3  4
0 1 1 2.5  5
0 2 0   5  6
0 2 0  15  7
0 2 0  15  8
0 2 0  11  9
0 2 1   5 10
0 2 1   4 11
0 2 1 2.2 12
0 3 0 3.2 13
0 3 0  15 14
0 3 0  15 15
0 3 0   7 16
0 3 1 3.2 17
0 3 1 9.4 18
0 3 1  14 19
1 1 0  15 20
1 1 0  15 21
1 1 0   8 22
1 1 1   3 23
1 2 0   9 24
1 2 0  15 25
1 2 0  15 26
1 2 0   4 27
1 2 1  12 28
1 2 1  .5 29
1 3 0   6 30
1 3 0  15 31
1 3 0  15 32
1 3 1   8 33
1 3 1   2 34
1 3 1   4 35
2 1 0  15 36
2 1 0  14 37
2 1 0   8 38
2 1 1   6 39
2 2 0  13 40
2 2 0  15 41
2 2 0   8 42
2 2 0   5 43
2 2 1   1 44
2 2 1  15 45
2 3 0   9 46
2 3 0  15 47
2 3 0  15 48
2 3 1   9 49
2 3 1   6 50
2 3 1   1 51
3 1 0  13 52
3 1 0  10 53
3 1 1   3 54
3 1 1   7 55
3 2 0  15 56
3 2 0  15 57
3 2 0   9 58
3 2 1 6.3 59
3 2 1   7 60
3 2 1   1 61
3 3 0  14 62
3 3 0  15 63
3 3 0  12 64
3 3 0   8 65
3 3 1  11 66
3 3 1  12 67
end

When I try to run stptime with the SMR option, I get the following error message:
"using data not sorted"

Code:

stptime, smr(age_group rate) using("U:\working\unexposed_rates.dta" by(exposure) per(100000)

This is despite the using data definitely being sorted. I also get the same message if I try

Code:

stptime if exposure>0, smr(age_group rate) using("U:\working\unexposed_rates.dta" by(exposure) per(100000)

I read this response that suggests that sorting the using data manually would fix the problem, but that hasn't been the case on my attempts: https://www.statalist.org/forums/for...-using-stptime

Any suggestions as to where I might be going wrong gratefully received.

Thanks


William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

22 Mar 2021, 13:23

Stata is rarely wrong about whether data is or is not sorted, and Clyde Schechter is rarely wrong in his advice.

If you run

Code:

describe using "U:\working\unexposed_rates.dta"

just before you run the stptime command, the bottom of the output from describe will have either

Code:

Sorted by: exposure

if it is sorted by exposure, which is your by-variable, or

Code:

Sorted by:

if it is not sorted.
Emily Tweed

Join Date: Jun 2015

Posts: 26
#3

23 Mar 2021, 06:16

Hi

Thanks William - you're right, it hadn't saved the sorting order correctly when I created the file using post.
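For readers hitting the same message, here is a minimal sketch of how a file built with post ends up unsorted and how to re-save it with its sort order recorded. The rates are the illustrative values from this thread, the tempfile and handle names are hypothetical, and sorting on age_group (the first variable named in smr()) is an assumption about which key stptime checks.

```stata
* Build a small reference-rate file with postfile/post (illustrative values only)
tempname memhold
tempfile rates
postfile `memhold' age_group rate using "`rates'"
post `memhold' (1) (52.9)
post `memhold' (2) (88.2)
post `memhold' (3) (125.7)
postclose `memhold'

* A posted file carries no sort order, even if the rows happen to be in order
describe using "`rates'"

* Load it, sort on the matching variable, and save again so the order is stored
use "`rates'", clear
sort age_group
save "`rates'", replace

* The "Sorted by:" line should now name age_group
describe using "`rates'"
```

The point is that the sort order is a property stored in the saved file, so the sort has to happen before the final save.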

However, I'm now getting the error "no observations merged, at() option not specified or incorrectly specified" when I run the stptime command below. I haven't specified an at() option because I'm looking for an overall SMR. When I do add at(), matching the age bands used in the individual-level and standard data, I still get the same message:

Code:

stptime if exposure>0, smr(age_group rate) using("U:\working\unexposed_rates.dta" by(exposure) per(100000)

The structure of the standard population is as follows, in case it helps:

Code:

input byte exposure float(age_group rate)
0 1 52.9
0 2 88.2
0 3 125.7
end

Also, I thought I should be sorting on age group (which is essentially the 'matching' variable between the individual-level data and the standard population), as exposure will be 0 for all rows in the standard population?

Any advice gratefully received.
Mike Lacy

Join Date: Apr 2014

Posts: 2417
#4

23 Mar 2021, 07:55

I see a missing right parenthesis at the end of the -using- option:

Code:

using("U:\working\unexposed_rates.dta") by(exposure) ...
Emily Tweed

Join Date: Jun 2015

Posts: 26
#5

23 Mar 2021, 08:14

Well-spotted Mike - transcription error* on my part I'm afraid, it still doesn't work even with the parenthesis...

(*The secure analysis environment I'm using doesn't allow copy/paste)
William Lisowski

Join Date: Dec 2014

Posts: 10150
#6

23 Mar 2021, 10:02

A casual look at stptime.ado suggests that perhaps stptime determines the levels of exposure without taking into account the effect of the if clause, and thus attempts to do whatever it does (sorry, I'm just a guy who trusts Clyde's advice and Stata's assertions about sorting) for observations with exposure==0 for which there are none after applying the if clause.

Perhaps

Code:

keep if exposure>0
stptime, smr(age_group rate) using("U:\working\unexposed_rates.dta") by(exposure) per(100000)

will succeed.
Paul Dickman

Join Date: Apr 2014

Posts: 294
#7

23 Mar 2021, 11:18

It shouldn't matter, but you don't need the variable exposure in the using file.

I can't help with your specific problem without seeing more details of what you are doing. You could also try -strate- or estimate the SMRs from first principles (i.e., do the merging and calculation of expected rates yourself).
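A rough sketch of the "first principles" route mentioned above, not code from any poster: collapse to observed deaths and person-time per cell, merge in the reference rates by age group, compute expected deaths, and divide. The cohort rows are made-up toy values, the tempfile name is hypothetical, and the division by 100000 assumes the reference rates are expressed per 100,000 units of person-time.

```stata
* Hypothetical reference rates, assumed to be per 100,000 person-time units
clear
input float(age_group rate)
1 52.9
2 88.2
3 125.7
end
sort age_group
tempfile refrates
save "`refrates'"

* Toy cohort for exposure groups 1-3 (illustrative values, not real data)
clear
input byte exposure float(age_group failure persontime)
1 1 0 15
1 1 1 3
1 2 1 12
1 3 1 8
2 1 1 6
2 2 0 13
2 3 1 9
3 1 1 3
3 2 1 7
3 3 0 14
end

* Observed deaths and total person-time in each exposure-by-age cell
collapse (sum) observed=failure persontime, by(exposure age_group)

* Attach the reference rate for each age group; every cell must find a match
merge m:1 age_group using "`refrates'", assert(match) nogenerate

* Expected deaths = person-time x reference rate (rescaled from per 100,000)
generate double expected = persontime * rate / 100000

* Sum over age groups within exposure group, then SMR = observed / expected
collapse (sum) observed expected, by(exposure)
generate double smr = observed / expected
list, noobs
```

Doing the merge by hand also makes a failed match visible immediately, which may help diagnose a "no observations merged" message.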

However, it's not clear why you are calculating SMRs. Your analytic approach seems non-standard.

"I have individual-level survival data for a cohort of people with my exposure variable of interest coded 0-3. I would like to calculate SMR for exposure groups 1-3 using exposure = 0 as the reference group. I've therefore saved the age group-specific mortality rates in a separate file, sorted by age group."

Why not just use the individual data to estimate the rate ratios for exposure groups 1-3 compared to group 0?
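One way such a direct comparison might look is a Poisson regression on the individual records, with person-time as the exposure. This is a sketch under assumptions: the rows are made-up toy values in the layout of the example data, and adjusting for age_group as a categorical covariate is one modelling choice among several.

```stata
* Toy cohort in the same layout as the example data (illustrative values only)
clear
input byte exposure float(age_group failure persontime)
0 1 0 15
0 1 1 3
0 2 1 5
0 2 0 15
0 3 1 9.4
0 3 0 15
1 1 1 3
1 1 0 15
1 2 1 12
1 3 1 8
2 1 1 6
2 2 0 13
2 3 1 9
3 1 1 7
3 2 1 6.3
3 3 0 14
end

* Poisson model for the death indicator with person-time as the exposure.
* The exposure() option names the person-time variable; it is unrelated to
* the study variable that happens to be called exposure.
* irr reports rate ratios for groups 1-3 relative to group 0, adjusted for age group.
poisson failure i.exposure i.age_group, exposure(persontime) irr
```

Here the uncertainty in the group 0 rate enters the standard errors directly, because group 0 is part of the fitted data rather than a fixed table of rates.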

SMRs are typically used when one does not have individual data on the unexposed and so uses tabulated rates instead. You have individual data, which you are then using to tabulate rates, which are then used as the denominator in the rate ratio. This seems to be a lot of extra steps for no reason. Or maybe you have a reason that's not clear from your OP?

I may be wrong, but it's possible the standard error of the SMR is calculated under the assumption that only the observed count is a random variable and the expected count is fixed and known. That is, by using your approach you are assuming that the variance of the rate among exposure category 0 is 0 and the covariance is zero.

In other words, your variance estimates may not be correct because you may be erroneously assuming that the rate among exposure group 0 is estimated without random error.
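The variance concern can be made concrete with the usual large-sample Poisson approximations. This is a sketch in notation introduced here (it does not come from stptime's documentation): $O_k$ is the observed deaths in exposure group $k$, $Y_{ka}$ its person-time in age group $a$, and $\lambda_a$ the reference rate for age group $a$.

```latex
\[
\mathrm{SMR}_k = \frac{O_k}{E_k},
\qquad
E_k = \sum_a \lambda_a \, Y_{ka}
\]
% If the reference rates \lambda_a are treated as fixed and known,
% only O_k is random:
\[
\operatorname{Var}\!\left(\log \mathrm{SMR}_k\right) \approx \frac{1}{O_k}
\]
% For a crude rate ratio against group 0, both numerator and
% denominator counts are random:
\[
\mathrm{RR}_k = \frac{O_k / Y_k}{O_0 / Y_0},
\qquad
\operatorname{Var}\!\left(\log \mathrm{RR}_k\right) \approx \frac{1}{O_k} + \frac{1}{O_0}
\]
```

When the $\lambda_a$ are themselves estimated from group 0 of the same cohort, the fixed-rate formula leaves out the contribution of the order of $1/O_0$, so its confidence intervals would tend to be too narrow.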