
  • How to create dictionary file from SEER that Stata can read?

    Hi

    I am very new to this, so I hope the question makes sense.
    From SEER, I was able to export a text file with the data and a dictionary file. This dictionary file is a .dic file, and Stata does not recognize it. It seems that Stata needs a .dct file.
    Does anyone have experience either converting a .dic file into a .dct file or getting the right file format directly from SEER*Stat?
    Best,
    Arnaud

  • #2
    I would guess that the subset of people on Stata who have used ".dic" files and SEER is not null, but is small. I think you'd improve your chances of getting help if you could post an example and/or description of the .dic dictionary file format. If .dic files are just text descriptions of text data layout, someone with a little programming skill in text processing, rather than specific experience, might solve your problem more quickly than waiting for an experienced person to take interest.



    • #3
      Hi Mike,
      Many thanks.
      I have copied and pasted an example of the .dic file from SEER below:
      Best,
      Arnaud


      [System]
      Output filename=C:\Test_111821_Lymphoma.txt
      Matrix filename=C:\Test_111821_Lymphoma.slm
      Database name=Incidence - SEER Research Plus Data, 18 Registries, Nov 2020 Sub (2000-2018) - Linked To County Attributes - Total U.S., 1969-2019 Counties

      [Session Options]
      Type=Survival

      [Export Options]
      GZipped=false
      Variable format=labels
      File format=DOS/Windows
      Field delimiter=tab
      Missing character=space
      Fields with delimiter in quotes=true
      Remove thousands separators=true
      Variable names included=false
      Column Variables as Stats=false

      [Variables]
      Var1Name=ICD-O-3 Hist/behav, malignant
      Var1DisplayType=Unformatted
      Var2Name=Sex
      Var2DisplayType=Unformatted
      Var3Name=Year of diagnosis
      Var3DisplayType=Unformatted
      Var4Name=Race recode (W, B, AI, API)
      Var4DisplayType=Unformatted
      Var5Name=Origin recode NHIA (Hispanic, Non-Hisp)
      Var5DisplayType=Unformatted
      Var6Name=Chemotherapy recode (yes, no/unk)
      Var6DisplayType=Unformatted
      Var7Name=Marital status at diagnosis
      Var7DisplayType=Unformatted
      Var8Name=Vital status recode (study cutoff used)
      Var8DisplayType=Unformatted

      [Format=ICD-O-3 Hist/behav, malignant]
      7996=Benign
      7997=Borderline
      7998=In situ
      3=8000/3: Neoplasm, malignant
      7=8001/3: Tumor cells, malignant
      11=8002/3: Malignant tumor, small cell type
      15=8003/3: Malignant tumor, giant cell type
      .....

      [Format=Sex]
      1=Male
      2=Female

      [Format=Year of diagnosis]
      200=2000
      201=2001
      202=2002
      .......
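
The excerpt above looks like an INI-style text file: bracketed section headers followed by `key=value` lines. As a minimal sketch, assuming the rest of the file follows the same pattern (only this excerpt is available, so that is an assumption), the variable names and the code-to-label mappings could be pulled out with a few lines of Python. The function name `parse_seer_dic` is hypothetical. Whether the exported text file holds the codes or the label text ("Variable format=labels") is not clear from the excerpt alone.

```python
import re


def parse_seer_dic(text):
    """Parse the INI-like layout seen in the posted excerpt into plain dicts.

    Assumption: sections look like [Name], and every other non-blank line
    is key=value. Nothing here comes from official SEER*Stat documentation.
    """
    options = {}    # key -> value from [Export Options]
    var_names = {}  # 1-based position -> variable name from [Variables]
    formats = {}    # variable name -> {code: label} from [Format=...] sections
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"."}:
            continue  # skip blank lines and the "....." elisions in the post
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            if section.startswith("Format="):
                formats[section[len("Format="):]] = {}
            continue
        key, sep, value = line.partition("=")  # split on the first "=" only
        if not sep:
            continue  # not a key=value line
        if section == "Export Options":
            options[key] = value
        elif section == "Variables":
            match = re.fullmatch(r"Var(\d+)Name", key)
            if match:
                var_names[int(match.group(1))] = value
        elif section and section.startswith("Format="):
            formats[section[len("Format="):]][key] = value
    return options, var_names, formats
```

Reading the file and calling `parse_seer_dic(open(path).read())` would return the export options, the column order of the variables, and one code-to-label dictionary per `[Format=...]` section.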



      • #4
        From seeing just this piece of your .dic file, I have changed my mind. I now think that translating this file into a .dct file can reliably be done only by a person who has deeper knowledge of the .dic scheme. Although the "Field delimiter=tab" item makes me think that the data is just in a tab-delimited text file, the method of listing variables and so forth shown in your example is not transparent to me. Maybe someone else here will have a more helpful idea, or perhaps some more investigation on the SEER data site (about which I know nothing) would give more details, or, better yet, give an option to obtain the data and dictionary in a more well-known format.
        Last edited by Mike Lacy; 18 Nov 2021, 13:44.
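
If the data really is a plain tab-delimited text file with no header row, as "Field delimiter=tab" and "Variable names included=false" suggest, a .dct dictionary may not be needed at all. The following is only a sketch under that assumption: it writes the Stata commands that would import such a file and attach the SEER variable names as variable labels. The function name `stata_import_lines` and the file name `seer_export.txt` are hypothetical, and nothing here has been checked against a real SEER*Stat export.

```python
def stata_import_lines(data_path, var_names, delimiter="tab"):
    """Build Stata commands to import a headerless delimited file.

    var_names maps 1-based column position -> descriptive name.
    With varnames(nonames), Stata names the columns v1, v2, ...
    """
    # Stata's import delimited expects "\t" (backslash, t) for a tab.
    delim = r"\t" if delimiter == "tab" else delimiter
    lines = [
        f'import delimited using "{data_path}", '
        f'delimiters("{delim}") varnames(nonames) clear'
    ]
    for position in sorted(var_names):
        # Compound double quotes keep the label intact if it contains quotes.
        lines.append(f'label variable v{position} `"{var_names[position]}"\'')
    return lines


if __name__ == "__main__":
    # Hypothetical file name; the names are taken from the posted excerpt.
    names = {1: "ICD-O-3 Hist/behav, malignant", 2: "Sex", 3: "Year of diagnosis"}
    print("\n".join(stata_import_lines("seer_export.txt", names)))
```

The printed lines can be saved as a do-file and run in Stata. Because the columns arrive as v1, v2, and so on, the descriptive SEER names are attached as variable labels rather than used as variable names.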



        • #5
          Hello Mike, many thanks for taking the time. I will keep digging and post any solution I find!



          • #6
            Maybe there's a workaround: export to a file format with labels that Stata can read, such as SPSS or SAS, or to one that can be converted to Stata format with a third-party program such as Stat/Transfer.
            Code:
            . help import
            David Radwin
            Senior Researcher, California Competes
            californiacompetes.org
            Pronouns: He/Him
