Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • New SSC package: -sccsdta- for generating the dataset for the self-controlled case series method (SCCS)

    Thanks to Kit Baum, there is a new package -sccsdta- at the SSC.

    The SCCS method examines the association between a time-varying exposure and an event outcome.
    The study samples only cases, and it requires that an event has occurred during the observation period.
    The method doesn't compare incidences for cases with incidences for references.
    Instead, it contrasts incidences in periods of risk with incidences in periods where the case is not at risk.
    In this approach, cases serve as their control for fixed confounders.
    It's feasible to adjust for time effects such as age.
    The intervals are marked by risk, time, and individual.
    For each interval, the number of incidences and the width of the interval are determined.

    An introduction is: https://www.medicine.mcgill.ca/epide...l/Tutorial.pdf or https://www.bmj.com/content/354/bmj.i4515

    Below is a reproduction of example 3.1 in the tutorial from the first reference:

    Hospital records indicate an association between MMR vaccination and viral meningitis.
    Specifically, using a certain live mumps vaccine, known as the Urabe strain, has been linked to an increased risk of viral meningitis.
    Instances of viral meningitis were identified in 10 children during their second year of life.

    Event day (day of meningitis) and a day for exposure (day of vaccination) for 10 children are listed in the dataset

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(eventday exday)
    398 458
    413 392
    449 429
    455 433
    472 432
    474 395
    485 470
    524 496
    700 428
    399 716
    end
    The observation period was from the 366th to the 730th day of age.
    Evidence led to the definition of the risk period as the 15th to the 35th day following the administration of the MMR vaccine.
    Age groups were 366 to 547 days and 548 to 730 days.

    Using the sccsdta command:
    Code:
    . sccsdta eventday exday, enter(365) exit(730) riskpoints(14 35) timepoints(547)
    transforms the dataset into (first 3 children):
    Code:
    . list if _id < 4, sepby(_id) noobs
    
      +---------------------------------------------------------------------------------------+
      | eventday   exday   _id   _start   _stop   _nevents      _exgr        _tmgr   _inter~l |
      |---------------------------------------------------------------------------------------|
      |      398     458     1      365     472          1         no   ]365; 547]        107 |
      |      398     458     1      472     493          0   ]14; 35]   ]365; 547]         21 |
      |      398     458     1      493     547          0         no   ]365; 547]         54 |
      |      398     458     1      547     730          0         no   ]547; 730]        183 |
      |---------------------------------------------------------------------------------------|
      |      413     392     2      365     406          0         no   ]365; 547]         41 |
      |      413     392     2      406     427          1   ]14; 35]   ]365; 547]         21 |
      |      413     392     2      427     547          0         no   ]365; 547]        120 |
      |      413     392     2      547     730          0         no   ]547; 730]        183 |
      |---------------------------------------------------------------------------------------|
      |      449     429     3      365     443          0         no   ]365; 547]         78 |
      |      449     429     3      443     464          1   ]14; 35]   ]365; 547]         21 |
      |      449     429     3      464     547          0         no   ]365; 547]         83 |
      |      449     429     3      547     730          0         no   ]547; 730]        183 |
      +---------------------------------------------------------------------------------------+
    We estimate the incidence rate ratio of the risk period versus the no-risk period using a Poisson regression and looking at the i._exgr estimate:
    Code:
    . poisson _nevents i._exgr i._tmgr i._id, exposure(_interval) irr
    We get a better estimate report using the command regmat (ssc install matrixtools - works from version 13.1)
    Code:
    . regmat, o(_nevents) e(i._exgr) a("i._tmgr i._id") eform d(3) label btext(irr) names("") base: poisson, exposure(_interval)
      
    -----------------------------------------------------------------------------------
                                      irr  se(irr)  Lower 95% CI  Upper 95% CI  P value
    -----------------------------------------------------------------------------------
    Events(#)  At risk (no)         1.000                                              
               At risk (]14; 35])  12.037    2.031         3.002        48.259    0.000
    -----------------------------------------------------------------------------------
    Enjoy
    Last edited by Niels Henrik Bruun; 07 May 2024, 23:05.
    Kind regards

    nhb

  • #2
    Thanks to Kit Baum, the package package -sccsdta- was updated at the SSC.
    The help Is extended and rewritten.
    I've also revised the options for setting the dates.
    Hopefully, it is more clear now, what you get.
    Enjoy
    Kind regards

    nhb

    Comment


    • #3
      Hello Neils,
      I used the following code to calculate IRR. I tried to generate similar results using sccsdta but to no avail. Can you please help me implement it using sccsdta?
      Code:
      clear
      input indiv str11 vaccination_dt_str str11 hospitalization_dt_str str11 birthdt_str
      1 "11feb2004" "18mar2004" "01jan2004"
      1 "22nov2004" "28nov2004" "01jan2004"
      2 "11feb2004" "22feb2004" "01jan2004"
      2 "11feb2004" "12feb2004" "01jan2004"
      2 "22nov2004" "28nov2004" "01jan2004"
      3 "11feb2004" "25mar2004" "01jan2004"
      3 "22nov2004" "26nov2004" "01jan2004"
      3 "22nov2004" "30nov2004" "01jan2004"
      3 "22nov2004" "21feb2005" "01jan2004"
      4 "11feb2004" "25may2004" "01jan2004"
      4 "22nov2004" "15dec2004" "01jan2004"
      5 "11feb2004" "26jun2004" "01jan2004"
      end
      
      * Convert date variables
      gen vaccination_dt = date(vaccination_dt, "DMY")
      gen hospitalization_dt = date(hospitalization_dt, "DMY")
      gen birthdt = date(birthdt, "DMY")
      
      gen admtday = hospitalization_dt - birthdt
      //Generate exposure group cut points
      //Risk period : 0-28 days, control period: 35-83 days
      gen cutp40 = vaccination_dt - birthdt -1 //Include vaccination day
      gen cutp41 = cutp40 + 29
      gen cutp42 = cutp40 + 35
      gen cutp43 = cutp40 + 84
      
      drop birthdt vaccination_dt hospitalization_dt
      
      reshape long cutp,i(indiv  admtday) j(type)
      
      sort indiv admtday cutp type
      
      //Number of hospitalization events within each interval
      by indiv : gen int nevents = 1 if admtday > cutp[_n-1]+0.5 & admtday <= cutp[_n]+0.5
      
      // A second ED visit\hospitalization during same risk or control window will be counted as a separate event if it occurred >7 days after the first.;
      sort indiv type nevents admtday cutp 
      
      collapse (sum) nevents, by(indiv  cutp type)
      //Generate intervals
      by indiv : gen int interval = cutp[_n] - cutp[_n-1]
      
      //Exposure groups
      gen exgr = type - 40 if type > 40
      count if exgr >= .
      local nmiss = r(N)
      local nchange = 1
      while `nchange' > 0 {
          by indiv: replace exgr = exgr[_n+1] if exgr >= . & !missing(exgr[_n-1])
          count if exgr >= .
          local nchange = `nmiss' - r(N)
          local nmiss = r(N)
      }
      replace exgr = . if type == 40
      
      replace exgr = 0 if exgr == 3 //Control period
      replace exgr = . if inlist(exgr ,2) //Washout period
      
      drop if inlist(interval ,.,0) | exgr == .
      
      sort indiv cutp exgr
      drop cutp type
      
      generate loginterval = log(interval)
      
      xi: xtpoisson nevents i.exgr, fe i(indiv ) offset(loginterval ) irr
      
      i.exgr            _Iexgr_0-1          (naturally coded; _Iexgr_0 omitted)
      note: 1 group (2 obs) dropped because of all zero outcomes
      
      Iteration 0:   log likelihood = -11.203621  
      Iteration 1:   log likelihood = -8.1165048  
      Iteration 2:   log likelihood = -8.1153088  
      Iteration 3:   log likelihood = -8.1153087  
      
      Conditional fixed-effects Poisson regression    Number of obs     =         16
      Group variable: indiv                           Number of groups  =          4
      
                                                      Obs per group:
                                                                    min =          4
                                                                    avg =        4.0
                                                                    max =          4
      
                                                      Wald chi2(1)      =       4.91
      Log likelihood  = -8.1153087                    Prob > chi2       =     0.0266
      
      ------------------------------------------------------------------------------
           nevents |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
          _Iexgr_1 |   5.913793   4.741583     2.22   0.027     1.228532    28.46726
       loginterval |          1  (offset)
      ------------------------------------------------------------------------------
      I tried using sccsdta like below:
      Code:
      gen eventday = hospitalization_dt - birthdt
      gen expday = vaccination_dt - birthdt -1
      gen enday = vaccination_dt - birthdt -1
      sccsdta eventday expday, en(enday)  r(0 28 34) t(84) 
      . xtpoisson _nevents i._exgr , fe i(_rowid) exposure(_interval) irr
      note: 3 groups (9 obs) dropped because of all zero outcomes
      
      Iteration 0:   log likelihood = -8.7278736  
      Iteration 1:   log likelihood = -4.9278887  
      Iteration 2:   log likelihood = -4.8018012  
      Iteration 3:   log likelihood = -4.7742483  
      Iteration 4:   log likelihood = -4.7689667  
      Iteration 5:   log likelihood = -4.7677265  
      Iteration 6:   log likelihood = -4.7674282  
      Iteration 7:   log likelihood = -4.7673685  
      Iteration 8:   log likelihood = -4.7673588  
      Iteration 9:   log likelihood = -4.7673564  
      Iteration 10:  log likelihood = -4.7673559  
      
      Conditional fixed-effects Poisson regression    Number of obs     =         27
      Group variable: _rowid                          Number of groups  =          9
      
                                                      Obs per group:
                                                                    min =          3
                                                                    avg =        3.0
                                                                    max =          3
      
                                                      Wald chi2(2)      =       5.22
      Log likelihood  = -4.7673559                    Prob > chi2       =     0.0734
      
      ------------------------------------------------------------------------------
          _nevents |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
             _exgr |
          ]0; 28]  |   6.248765   5.009883     2.29   0.022     1.298231    30.07712
         ]28; 34]  |   4.37e-07   .0013496    -0.00   0.996            0           .
      ln(_inter~l) |          1  (exposure)
      ------------------------------------------------------------------------------

      Comment


      • #4
        Hello Gunnu
        I think your problem is that your start and end days are the same.
        I did set the end day to 365 in the option timepoints.


        Code:
        cls
        . clear
        
        . input indiv str11 vac_dt_str str11 hosp_dt_str str11 birth_dt_str
        
                  indiv   vac_dt_str  hosp_dt_str  birth_dt_~r
          1. 1 "11feb2004" "18mar2004" "01jan2004"
          2. 1 "22nov2004" "28nov2004" "01jan2004"
          3. 2 "11feb2004" "22feb2004" "01jan2004"
          4. 2 "11feb2004" "12feb2004" "01jan2004"
          5. 2 "22nov2004" "28nov2004" "01jan2004"
          6. 3 "11feb2004" "25mar2004" "01jan2004"
          7. 3 "22nov2004" "26nov2004" "01jan2004"
          8. 3 "22nov2004" "30nov2004" "01jan2004"
          9. 3 "22nov2004" "21feb2005" "01jan2004"
         10. 4 "11feb2004" "25may2004" "01jan2004"
         11. 4 "22nov2004" "15dec2004" "01jan2004"
         12. 5 "11feb2004" "26jun2004" "01jan2004"
         13. end
        
        . 
        . * Convert date variables
        . gen vac_days = date(vac_dt_str, "DMY") - date(birth_dt_str, "DMY") - 1
        
        . gen hosp_days = date(hosp_dt_str, "DMY") - date(birth_dt_str, "DMY")
        
        . drop *_str
        
        . sccsdta hosp_days vac_days, enter(0) riskpoints(0 28 34) t(84 365)
        
        The SCCS regression summary table
        
                                         |       IRR       [95%        CI]   P(IRR=1) 
        ---------------------------------+-------------------------------------------
        At risk                          |                                           
                                    ctrl |     1.000          .          .          . 
                                 ]0; 28] |    25.099      5.657    111.363      0.000 
                                ]28; 34] |     0.000      0.000          .      0.996 
        ---------------------------------+-------------------------------------------
        Time group                       |                                           
                       ]enter; enter+84] |     1.000          .          .          . 
                   ]enter+84; enter+365] |     1.617      0.258     10.123      0.607
        Kind regards

        nhb

        Comment


        • #5
          Thanks to Kit Baum, there is an update on the sccsdta at the SSC. The update fixes a minor bug and adds a summary table in the output.
          Kind regards

          nhb

          Comment


          • #6
            Hi Niels, I am using a SCCS study design to investigate the non-specific effect of childhood influenza vaccine against RSV-hospitalizations in children aged 6 months to 5 years. For the purpose of the below code, i am looking at first RSV-hospitalisation and first influenza vaccine dose between 6 months to 5 years. I am trying to use the sccsdta package but keep getting an error at step 7. Any advice would be greatly appreciated.
            Observation period: 01jan2010-31dec2024
            Flu vac risk window: 0-180 days

            Code:
            **STEP 1. Prepare dataset:
            keep child_id bdob_final dod admdate_H fluvacc_date1
            egen child_id_num = group(child_id)

            **STEP 2. Define study entry and exit dates:
            *Study entry = 6 months
            gen entry_date = bdob_final + 180
            replace entry_date = td(01jan2010) if entry_date < td(01jan2010)

            *Study exit = 5 years
            gen exit_date = bdob_final + 1825

            *Exit at death if earlier
            replace exit_date = dod if !missing(dod) & dod < exit_date

            *Study end date cap
            replace exit_date = td(31dec2024) if exit_date > td(31dec2024)

            **STEP 3. Prepare event variable:
            gen t_event = fluvacc_date1

            **STEP 4. Prepare exposure variables:
            * Risk window = 180 days post-vaccination
            gen vac_start1 = fluvacc_date1
            gen vac_end1 = fluvacc_date1 + 180

            *Set missing for unvaccinated children
            replace vac_start1 = . if missing(fluvacc_date1)
            replace vac_end1 = . if missing(fluvacc_date1)

            *Ensure no negative exposure relative to study start
            replace vac_start1 = . if vac_start1 < entry_date
            replace vac_end1 = . if vac_end1 < entry_date

            **STEP 5. Prepare start/end of observation for SCCS:
            gen t_start = entry_date
            gen t_end = exit_date

            **STEP 6. Format all dates as Stata %td:
            format t_event %td
            format t_start %td
            format t_end %td
            format vac_start1 %td
            format vac_end1 %td

            **STEP 7. Create SCCS dataset:
            // list child_id_num vac_start1 vac_end1 t_start t_end if vac_end1 < vac_start1 | vac_start1 < t_start //
            // drop if vac_end1 < vac_start1 | vac_start1 < t_start

            sccsdta t_event t_start t_end id(child_id_num), exp(vac_start1 vac_end1) discrete

            **STEP 8. Run SCCS model:
            sccs, offset(log(_time)) cluster(child_id_num)

            **STEP 9. Add age groups
            gen age_group = .
            replace age_group = 1 if t_start <= t_event & t_event < dob + 365
            replace age_group = 2 if dob + 365 <= t_event & t_event < dob + 730
            replace age_group = 3 if dob + 730 <= t_event & t_event < dob + 1825
            sccs, offset(log(_time)) cluster(child_id_num) age(age_group)

            **STEP 10: Add seasonality
            X

            Comment


            • #7
              Guest: Your code
              sccsdta t_event t_start t_end id(child_id_num), exp(vac_start1 vac_end1) discrete
              uses options that doesn't exist. Please read the help file. I can't use your code steps to find errors. Next time, send the sccsdta code and error message from the result window or the log.
              Last edited by sladmin; 17 Mar 2026, 07:10. Reason: anonymize original poster
              Kind regards

              nhb

              Comment


              • #8
                See response below.
                Last edited by sladmin; 17 Mar 2026, 07:10. Reason: anonymize original poster

                Comment


                • #9
                  -

                  Comment


                  • #10
                    Guest, I can't tell you anything from your code. What I need is a codeline like
                    sccsdta t_event t_exposure, enter(entry_date) exit(exit_date) riskpoints(0 180) timepoints(numbers here) absolutetimepoints
                    , and what is written in the log just after the code.
                    Can you reproduce the error if you use the examples in the help file?
                    Last edited by sladmin; 17 Mar 2026, 07:10. Reason: anonymize original poster
                    Kind regards

                    nhb

                    Comment


                    • #11
                      Hi Niels Henrik Bruun , please see my correct code below:

                      ************************************************** *****************
                      *SCCS #1 - considering first influenza vaccine dose only
                      ************************************************** *****************
                      **STEP 1. Prepare dataset:
                      // ssc install sccsdta, replace
                      keep child_id bdob_final dod admdate_H fluvacc_date1
                      egen child_id_num = group(child_id)

                      **STEP 2. Define study entry and exit dates:
                      *Study entry = 6 months
                      gen entry_date = bdob_final + 180
                      replace entry_date = td(01jan2010) if entry_date < td(01jan2010)

                      *Study exit = 5 years
                      gen exit_date = bdob_final + 1825

                      *Exit at death if earlier
                      replace exit_date = dod if !missing(dod) & dod < exit_date

                      *Study end date cap
                      replace exit_date = td(31dec2023) if exit_date > td(31dec2023)

                      **STEP 3. Prepare event variable:
                      gen t_event = admdate_H

                      **STEP 4. Prepare exposure variable:
                      gen t_exposure = fluvacc_date1

                      **STEP 5:Format variables
                      format t_event t_exposure entry_date exit_date %td

                      **STEP 6: Create SCCS dataset:
                      *Risk window: [0, 180] days post exposure (inclusive end points)
                      *Enter/exit are absolute dates (per person); riskpoints are relative to exposure

                      *-------------------------------------------------------------
                      *AGE splits and RSV seasonality:
                      *-------------------------------------------------------------
                      *Note: RSV seasonality changed during the covid years in Western Australia (2020-2021 we had summer seasonal peaks), so the most reliable way to adjust for each year and season is by adding calendar-season effects:
                      *Force interval cuts at every season boundary using timepoints() with absolutetimepoints, so each SCCS interval never straddles a season change; then Flag each interval as "in-season" vs "off-season" with a binary indicator; and include that indicator in the FE Poisson model.

                      *Build a list of ALL calendar boundary dates where season status changes
                      // 2010–2019, 2022–2023: season = March–September (inclusive) → boundaries at Mar 1 and Oct 1 for each of those years.
                      // 2020: season = December → boundaries at Dec 1 2020 and Jan 1 2021.
                      // 2021: season = Jan–Feb and Dec → boundaries at Jan 1-Mar 1, Dec 1 2021-Jan 1 2022.

                      ***STEP 6A) Build a NUMERIC macro of season boundary dates
                      *(all values are evaluated to numbers before passing to timepoints())
                      local tp

                      *2010–2019 and 2022–2023: boundaries at Mar 1 and Oct 1 each year
                      foreach y in 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2022 2023 {
                      local tp `tp' `=td("01mar`y'")' `=td("01oct`y'")'
                      }

                      *2020: December only -> boundaries at Dec 1, 2020 and Jan 1, 2021
                      local tp `tp' `=td("01dec2020")' `=td("01jan2021")'

                      *2021: Jan–Feb and Dec -> boundaries at Jan 1, Mar 1, Dec 1, 2021 and Jan 1, 2022
                      local tp `tp' `=td("01jan2021")' `=td("01mar2021")' `=td("01dec2021")' `=td("01jan2022")'

                      *(optional) sort, remove duplicates, and validate as a numlist
                      numlist "`tp'", sort
                      local tp `r(numlist)'

                      * Quick sanity check (remove this once you're happy)
                      display as txt "Season endpoints (numeric daily dates): `tp'"

                      ***STEP 6B) Recreate SCCS intervals with:
                      // risk window [0,180] days post-exposure
                      // calendar splits at all season boundaries (absolute dates)
                      // will add age bands after intervalization

                      sccsdta t_event t_exposure, ///
                      enter(entry_date) exit(exit_date) ///
                      riskpoints(0 180) ///
                      timepoints(`tp') absolutetimepoints

                      ***STEP 6C)
                      *Note: Because we used absolutetimepoints for calendar boundaries, we'll construct age bands after we have the intervals (using each interval's midpoint). This avoids mixing relative (age) and absolute (calendar) endpoints in the same timepoints() call.

                      *Midpoint date and age in days
                      gen double _mid = floor((_start + _stop - 1)/2)
                      format _mid %td
                      gen double age_days = _mid - bdob_final

                      *Age bands at 180-365, 366-730, 731-1825 days

                      gen byte age_band = .
                      replace age_band = 1 if age_days <= 365
                      replace age_band = 2 if inrange(age_days, 366, 730)
                      replace age_band = 3 if inrange(age_days, 731, 1825)
                      replace age_band = 4 if age_days > 1825
                      label define ageb 1 "≤365d" 2 "366–730d" 3 "731–1825d" 4 ">1825d"
                      label values age_band ageb


                      *Influenza season indicator using the midpoint (intervals don't straddle boundaries now)
                      gen byte influenza_season = 0

                      foreach y in 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2022 2023 {
                      replace influenza_season = 1 if inrange(_mid, td(01mar`y'), td(30sep`y'))
                      }
                      replace influenza_season = 1 if inrange(_mid, td(01dec2020), td(31dec2020))
                      replace influenza_season = 1 if inrange(_mid, td(01jan2021), td(28feb2021))
                      replace influenza_season = 1 if inrange(_mid, td(01dec2021), td(31dec2021))

                      label define influ 0 "Off-season" 1 "Influenza season"
                      label values influenza_season influ

                      ***STEP D) Fit FE Poisson SCCS with exposure time, age bands, and season
                      xtset child_id_num
                      xtpoisson _nevents i._exgr i.age_band i.influenza_season, fe exposure(_interval) vce(robust) irr

                      Comment


                      • #12
                        Guest, please reread #10. If I don't send the error message, I can't help you. Please do not send a lot of code, which is of no use to me.
                        Last edited by sladmin; 17 Mar 2026, 07:10. Reason: anonymize original poster
                        Kind regards

                        nhb

                        Comment


                        • #13
                          I corrected my code and am no longer getting an error message. Apologies for taking up your time. Please disregard my enquiry.

                          Comment


                          • #14
                            Niels Henrik Bruun can the sccsdta command allow for multiple exposures? e.g. multiple vaccine doses.

                            Comment

                            Working...
                            X