  • Original data source for colon.dta in strs

    Hello everyone,

    In my work I use "colon.dta" with strs to estimate relative survival, but I can't find the original source of "colon.dta", and I need to verify that source.
    Can anyone help or offer some clues?

    Here is the code I am using, taken from the work of Paul W. Dickman et al.:
    drop _all
    use "colon.dta"
    generate id=_n
    stset exit, origin(dx) failure(status==1 2) id(id) scale(365.24)
    generate long potfu = date("31/12/1995", "DMY")
    strs using "popmort.dta" if yydx < 1981, breaks(0(1)15) mergeby(_year sex _age) list(n d w cr_e1 cr_e2 cr_hak cns_pp) ederer1 potfu(potfu) pohar savgroup(fig1sex,replace)

    I found this dataset at "http://pauldickman.com/data/colon.dta", but that is obviously not the original data source. I believe colon.dta comes from the Finnish Cancer Registry ("https://cancerregistry.fi/information/") or Statistics Finland ("https://www.stat.fi/index_en.html"), but I could not find it on either site. Is anyone familiar with colon.dta or these websites?
    Thanks very much!

    Reference:
    Dickman PW, Coviello EJ. Estimating and modeling relative survival [J]. Stata J, 2015, 15(1): 186-215.

  • #2
    You could write to Professor Dickman at the email address given on the contact page of the website where you found the dataset.

    In the Stata Journal paper cited, he confirms that the data was "provided by the Finnish Cancer Registry on patients diagnosed with colon carcinoma in Finland, 1975–1994."

    If you run
    Code:
    notes list _all
    after loading the dataset into memory, you will read
    Code:
    dx:
      1.  These data are based on real data from a national cancer registry, but the original data
          have been permuted so that the day and month of diagnosis are random but the survival time
          is close to the actual survival time.
    
    exit:
      1.  These data are based on real data from a national cancer registry, but the original data
          have been permuted so that the day and month of exit for individuals who die have been
          permuted.
    This suggests to me that his data was provided by the Registry in response to a request for individual-level data rather than the summary data that is available on the web site.


    • #3
      William Lisowski, thanks so much!

      You have given me a feasible way forward, and the notes stored in the dataset also help a lot: they show that the data we used have been permuted. I may use the URL directly, or I will email Professor Dickman for more information.
