
  • Reshape matched case control data from wide to long format

    Hello,

    I have a case-control study with 59 cases matched 1:3 with controls on birth year. The match was completed by adapting code from threads in this forum (e.g. "Matching cases and controls based age and gender"); the code is below. The data are now in wide format, with a total of 177 observations: three entries per case, each paired with one of its matched controls. There are a total of 300 variables in the wide format, half for the case (e.g. m_age, i_bw, etc.) and half for the control (denoted by the suffix _ctrl; e.g. m_age_ctrl, i_bw_ctrl, etc.).

    The wide format is great for mcc analysis, but I am also interested in using clogit and I cannot figure out how to reshape the data into a long format to allow this. I have found examples of how to reshape case-control studies from long to wide but not the other way around. I am familiar with the reshape command and have used it for longitudinal studies, etc., but it is less clear to me how to do this for case-control data, and I am not having success with the code I am trying.

    As an example, I have tried reshape long m_age* i_bw*, i(record_id) j(casecon) but receive the following error message:

    no xij variables found
    You typed something like reshape wide a b, i(i) j(j).
    reshape looked for existing variables named a# and b# but could not find any. Remember this picture:

         long                                wide
        +---------------+                   +------------------+
        | i   j   a   b |                   | i   a1 a2  b1 b2 |
        |---------------| <--- reshape ---> |------------------|
        | 1   1   1   2 |                   | 1   1   3   2  4 |
        | 1   2   3   4 |                   | 2   5   7   6  8 |
        | 2   1   5   6 |                   +------------------+
        | 2   2   7   8 |
        +---------------+

    long to wide: reshape wide a b, i(i) j(j) (j existing variable)
    wide to long: reshape long a b, i(i) j(j) (j new variable)
    r(111);

    I'd appreciate any advice -- should I match in a different way that puts it in long format from the beginning?

    Thank you in advance.

    Matching code:

    // READ IN DATA FILE OF COMBINED CASES & CONTROLS
    set seed 1234 // OR YOUR FAVORITE SEED

    // GENERATE AGE GROUPS (MODIFY LIMITS AS APPROPRIATE TO DATA)
    gen byte year_group = 1 if yearbirth==2008
    replace year_group = 2 if yearbirth==2009
    replace year_group = 3 if yearbirth==2010
    replace year_group = 4 if yearbirth==2011
    replace year_group = 5 if yearbirth==2012
    replace year_group = 6 if yearbirth==2013
    replace year_group = 7 if yearbirth==2014
    replace year_group = 8 if yearbirth==2015
    replace year_group = 9 if yearbirth==2016
    replace year_group = 10 if yearbirth==2017

    gen double shuffle = runiform() // TO RANDOMIZE MATCH SELECTIONS

    // FORM A FILE OF CONTROLS ONLY
    preserve
    keep if case_control == 0
    // ASSIGN A PRIORITY FOR MATCHING WITHIN EACH YEAR_GROUP COMBINATION
    // IN BATCHES OF (UP TO) THREE
    by year_group (shuffle), sort: gen int priority = floor((_n-1)/3) + 1
    drop shuffle
    // RENAME VARIABLES TO AVOID CLASH
    rename * *_ctrl
    foreach x in year_group priority {
        rename `x'_ctrl `x'
    }
    tempfile controls
    save `controls'

    // NOW MAKE A FILE OF CASES
    restore
    keep if case_control == 1
    // AGAIN PRIORITIZE FOR MATCHING
    by year_group (shuffle), sort: gen int priority = _n
    drop shuffle
    // MERGE WITH CONTROLS
    merge 1:m year_group priority using `controls', keep(master match)
    Last edited by Anne-Marie Rick; 15 Mar 2019, 11:58.

  • #2
    -reshape- doesn't want wildcards. Also, I don't know if the variable suffix can be null, as it is for your cases. I'd suggest the following:
    Code:
    rename VarListOfCaseVariables =_case
    reshape long m_age i_bw etc., i(YourMatchIDVariable) j(CC) string
    // strpos() returns the position of the match, so compare with 0 to get a 0/1 indicator
    gen byte CaseVsCtl = strpos(CC, "case") > 0
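
    As a rough illustration of that recipe, here is a minimal sketch on toy data. The identifier pair_id, the values, and the indicator name is_case are made up for the example and are not from the original data set. It assumes each wide row (one case with one of its controls) has an identifier that is unique per row, since -reshape long- requires i() to identify the wide observations uniquely; with 1:3 matching the case's own ID repeats three times, so something like gen pair_id = _n may be needed first.

    ```stata
    * Toy wide data: one row per case-control pair (all names and values are illustrative)
    clear
    input pair_id m_age m_age_ctrl i_bw i_bw_ctrl
    1 30 28 3200 3400
    2 30 31 3200 3100
    3 25 27 2900 3000
    end

    * Give each case variable an explicit suffix so every name is stem + suffix
    foreach v of varlist m_age i_bw {
        rename `v' `v'_case
    }

    * List the stems only; with the string option, CC takes the values "_case" and "_ctrl"
    reshape long m_age i_bw, i(pair_id) j(CC) string

    * strpos() returns a position rather than 0/1, so compare with 0 for an indicator
    gen byte is_case = strpos(CC, "case") > 0
    list, sepby(pair_id)
    ```

    In this layout a case that was matched to three controls appears once per pair, so the long data may need de-duplicating on the case before fitting a model that expects one record per subject within each matched set.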



    • #3
      There are two problems. One is that your variables are not well-named for -reshape long-, and the other is that you are not using the correct syntax in your -reshape- command.

      In order for data to be reshaped long, the variables you will be reshaping need to have a name that looks like stem_suffix.* Your m_age_ctrl qualifies, but m_age does not because it has only the stem and no suffix. So all your case variables need to be renamed adding _case (or some similar distinctive suffix) to them.

      As for the -reshape- syntax, what you must list after -reshape long- is not the names of the variables to be reshaped, it is the list of the stems only. In particular, the use of wildcards here is an invitation to failure. So after you rename the variables as suggested in the preceding paragraph, the syntax then looks like
      Code:
      reshape long m_age i_bw /*etc.--list all stems*/, i(record_id) j(case_control) string
      *This is not literally true. The "suffix" can be a prefix or even an infix, and there is a special @ operator which deals with that, but your life is generally easier if you use the stem-suffix naming scheme so you don't have to deal with @.
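
      For completeness, a small sketch of the @ form mentioned in the footnote, assuming hypothetical variables in which the case/control tag is a prefix (caseage, ctrlage, casebw, ctrlbw). These names, pair_id, and role are invented for illustration and do not appear in the original data.

      ```stata
      * Hypothetical wide data where the case/control tag is a prefix, not a suffix
      clear
      input pair_id caseage ctrlage casebw ctrlbw
      1 30 28 3200 3400
      2 25 27 2900 3000
      end

      * @ marks where the tag sits in each name; role takes the values "case" and "ctrl"
      reshape long @age @bw, i(pair_id) j(role) string
      list, sepby(pair_id)
      ```

      The long variables come out as age and bw. The stem-suffix scheme avoids having to think about where @ goes, which is why it is usually the easier route.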

      Edit: Crossed with #2, where Mike Lacy says the same things so much more succinctly.



      • #4
        Perhaps the first time in my life I was not verbose.



        • #5
          Thank you to you both. This did the trick. I appreciate the help.
