Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • reshaping the data based on year (CMC date)

    Dear Statlist users,

    I would like to expand (or to have a long form) of my data such that:

    v395* stands for year of arrival in CMC format.

    If the individual having CASEID " 20010001 06" arrive in year 2016 for example (_v3955 in CMC format), and if there is another year of arrival for that person for example in 2014 (_v3956 in CMC format) and if there is any in 2013 (_v3957 in CMC format), I would like to see that individual like this:


    CASEID Year
    CASEID 20010001 06 2016
    CASEID 20010001 06 2014
    CASEID 20010001 06 2013

    I have used

    gen toexpand = v395*
    expand toexpand
    bysort CASEID
    drop toexpand

    but it did not work unfortunately.


    Thank you very much in advance for any help.





    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str15 CASEID long(_v3955 _v3956 _v3957 _v3958 _v3959 _v3960 _v3961 _v3962 _v3963 _v3964)
    "    20010001 06" 581757      .      .      . . . . . . .
    "    20010001 06" 581848      .      .      . . . . . . .
    "    20010004 02" 585041      .      .      . . . . . . .
    "    20010004 02" 585158      .      .      . . . . . . .
    "    20010008 02" 567047 627477      .      . . . . . . .
    "    20010008 02" 566717 628364      .      . . . . . . .
    "    20010008 02" 566692 628048      .      . . . . . . .
    "    20010009 02" 567241      .      .      . . . . . . .
    "    20010009 02" 566846      .      .      . . . . . . .
    "    20010010 02" 606004      .      .      . . . . . . .
    "    20010010 02" 606045      .      .      . . . . . . .
    "    20010015 02" 546256 584584 627769      . . . . . . .
    "    20010017 03" 611516      .      .      . . . . . . .
    "    20010017 03" 611618      .      .      . . . . . . .
    "    20010017 06" 607456      .      .      . . . . . . .
    "    20010017 06" 607786      .      .      . . . . . . .
    "    20010018 04" 579222 601267      .      . . . . . . .
    "    20010019 04" 577158      .      .      . . . . . . .
    "    20010020 02" 560747 570717 601108      . . . . . . .
    "    20020001 03" 589576      .      .      . . . . . . .
    "    20020002 02" 605364      .      .      . . . . . . .
    "    20020005 02" 594720      .      .      . . . . . . .
    "    20020005 02" 594612      .      .      . . . . . . .
    "    20020009 02" 579058 625475      .      . . . . . . .
    "    20020013 02" 594520      .      .      . . . . . . .
    "    20020014 02" 577655      .      .      . . . . . . .
    "    20020015 02" 572092      .      .      . . . . . . .
    "    20020015 02" 572636      .      .      . . . . . . .
    "    20020017 02" 613266      .      .      . . . . . . .
    "    20020017 02" 613051      .      .      . . . . . . .
    "    20020018 02" 580737      .      .      . . . . . . .
    "    20020019 04" 607988 618427      .      . . . . . . .
    "    20020020 02" 577561      .      .      . . . . . . .
    "    20020020 02" 577905      .      .      . . . . . . .
    "    20030002 02" 545325 592810      .      . . . . . . .
    "    20030002 02" 545114 592962      .      . . . . . . .
    "    20030003 02" 582967      .      .      . . . . . . .
    "    20030003 02" 582139      .      .      . . . . . . .
    "    20030003 02" 582128      .      .      . . . . . . .
    "    20030003 06" 583470      .      .      . . . . . . .
    "    20030003 06" 583513      .      .      . . . . . . .
    "    20030004 02" 595130 606645      .      . . . . . . .
    "    20030005 02" 620806      .      .      . . . . . . .
    "    20030005 02" 620413      .      .      . . . . . . .
    "    20030005 02" 620526      .      .      . . . . . . .
    "    20030005 06" 620358      .      .      . . . . . . .
    "    20030005 09"      .      .      .      . . . . . . .
    "    20030005 09"      .      .      .      . . . . . . .
    "    20030005 14" 580543 633232      .      . . . . . . .
    "    20030006 02" 554035 620208      .      . . . . . . .
    "    20030008 02" 564450      .      .      . . . . . . .
    "    20030008 02" 563586      .      .      . . . . . . .
    "    20030008 02" 564425      .      .      . . . . . . .
    "    20030009 02" 612457 619764 612804 633019 . . . . . .
    "    20030010 12" 567632      .      .      . . . . . . .
    "    20030010 12" 568189      .      .      . . . . . . .
    "    20030010 12" 567302      .      .      . . . . . . .
    "    20030011 02" 556233 578429      .      . . . . . . .
    "    20030012 02" 585784 579831 612714      . . . . . . .
    "    20030014 02" 564987      .      .      . . . . . . .
    "    20030014 02" 565357      .      .      . . . . . . .
    "    20030016 02" 566640      . 618273      . . . . . . .
    "    20030016 02" 566729      . 618654      . . . . . . .
    "    20030017 02" 581834      .      .      . . . . . . .
    "    20030017 02" 581249      .      .      . . . . . . .
    "    20030018 02" 620581      .      .      . . . . . . .
    "    20030018 02" 620875      .      .      . . . . . . .
    "    20030019 02" 604796      .      .      . . . . . . .
    "    20030019 02" 604350      .      .      . . . . . . .
    "    20030020 02" 590104      .      .      . . . . . . .
    "    20030020 02" 589574      .      .      . . . . . . .
    "    20040001 02" 617712      .      .      . . . . . . .
    "    20040001 02" 617630      .      .      . . . . . . .
    "    20040002 02" 573268      .      .      . . . . . . .
    "    20040002 11" 608946      .      .      . . . . . . .
    "    20040004 02" 585907      .      .      . . . . . . .
    "    20040006 02" 586474      .      .      . . . . . . .
    "    20040007 02" 575351      .      .      . . . . . . .
    "    20040008 03" 568256      .      .      . . . . . . .
    "    20040008 03" 568518      .      .      . . . . . . .
    "    20040008 03" 568524      .      .      . . . . . . .
    "    20040009 10" 606263      .      .      . . . . . . .
    "    20040009 10" 606466      .      .      . . . . . . .
    "    20040011 02" 565101      .      .      . . . . . . .
    "    20040011 02" 565181      .      .      . . . . . . .
    "    20040012 02" 580828      .      .      . . . . . . .
    "    20040014 02" 558642      .      .      . . . . . . .
    "    20040015 02" 546775      .      .      . . . . . . .
    "    20040015 02" 546656      .      .      . . . . . . .
    "    20040015 02" 546028      .      .      . . . . . . .
    "    20040016 07" 578989      .      .      . . . . . . .
    "    20040016 07" 579029      .      .      . . . . . . .
    "    20040017 02" 612964 614305      .      . . . . . . .
    "    20040017 02" 613046 613678      .      . . . . . . .
    "    20050002 03" 578711 631264      .      . . . . . . .
    "    20050003 01" 609828      .      .      . . . . . . .
    "    20050004 04" 596621      .      .      . . . . . . .
    "    20050004 07" 595902      .      .      . . . . . . .
    "    20050005 02" 619847 630302      .      . . . . . . .
    "    20050007 02" 601116      .      .      . . . . . . .
    end
    format %td _v3955
    format %td _v3956
    format %td _v3957
    format %td _v3958
    format %td _v3959
    format %td _v3960
    format %td _v3961
    format %td _v3962
    format %td _v3963
    format %td _v3964

  • #2
    I thought I understood CMC dates after Googling, but I clearly don't because I got garbage when I wrote code. So, I didn't get further than

    Code:
    gen long obsno = _n 
    reshape long _v , i(CASEID obsno) j(CMC) 
    drop if _v == . 
    list
    which may help. But I can say that formatting numbers like 581757 as Stata daily dates makes no obvious sense.

    Comment


    • #3
      I think there is something wrong with your data set. I have never before heard of CMC date coding, but I found information at https://mrc-ide.github.io/naomi/reference/cmc_date.html. I also located two other references explaining CMC date coding and they confirm what is said there.

      Based on that information, I would respond to the request in #1 with:
      Code:
      gen `c(obs_t)' obs_no = _n
      reshape long _v, i(obs_no) j(cmc)
      drop if missing(_v)
      
      gen mdate = cmc + tm(1899m12)
      format mdate %tm
      
      gen year = year(dofm(mdate))
      gen month = month(dofm(mdate))

      However, this results in years that are far in the future. But I think that the numbers in your _v* variables are wrong. Within the linked page about CMC dates, it gives the following example (which is not from Stata):
      cmc_date(as.Date("1987-02-11", format = "%Y-%m-%d"))

      #> [1] 1046
      But if dates in 1987 really do have a CMC code a little over 1,000, then CMC dates over 3900 are clearly in the remote future. So I think those numbers in the names of the _v* variables are not actually CMC dates.

      So, if you have a reference that explains how the numbers in the _v* variable names were calculated, perhaps help may be forthcoming. Otherwise, you need to revisit the data management that created those variables and fix it to get correct CMC coding for the dates.

      Added: Your interpretation of those numbers is also puzzling in another way. In #1 you associated CMC 3955 with year 2016, and 3956 with 2014. But why would the CMC code go up when the year goes down, and why would the CMC change by only 1 when there has been a change of 2 years (= 24 months)? Something is wrong here. Those numbers are not correct CMC codes.
      Last edited by Clyde Schechter; 01 Feb 2023, 13:35.

      Comment

      Working...
      X