Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Wide to Long Reshaping





    Hi all,

    I have data in the following format:

    Code:
     
     * Example generated by -dataex-. For more info, type help dataex clear input long(mci cl_id) float(yearmo_call servicemonth1) 1000  992094 720 636 1000 1179866 708 707 1000 1036729 708 645 1000  854614 696 636 1000  854614 696 636 1000  864615 624 681 1000  908069 708 636 1000  871369 672 636 1000 1125496 684 636 1000  881157 600 636 1000  833150 672 636  1000  820983 588 636 1000  963879 624 636 end format %tm yearmo_call format %tm servicemonth1
    You can think of mci and cl_id as a sort of identifier in the data. Yearmo_call is the year and month that a call comes in. Servicemonth1 is the month a service occurs....in my data there are variables for servicemonth1-servicemonth95. I would like to create a long data set that has an observation for each mci at the yearmonth level. So, even if they do not get a call or service in that yearmonth, I just have an observation that shows 0s for that month.

    I would like something similar to this:

    MCI YearMonthVariable CallOccurred ServiceOccurred
    1 2014m1 0 0
    1 2014m2 1 0
    1 2014m3 0 0

    And so on and so forth for every ID and every yearmonth. Any ideas?
    --Collin

  • #2
    You can think of mci and cl_id as a sort of identifier in the data.
    No, you can't. Look at your fourth and fifth observation in the example data: both have the same values for mci and cl_id. Duplicates like that need to be fixed before you can do what you are asking for with this data.

    On top of that, the example data you show is not really helpful for developing and testing code because, although you say your real data contains 95 servicemonth* variables, your example contains only 1--which is a qualitative difference that obscures much of the reshaping problem.

    I have modified your example to include a second servicemonth variable, and also to eliminate one of the duplicate observations on mci cl_id.

    After you fix your data, I believe the following code will do what you ask:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long(mci cl_id) float(Yearmo_call servicemonth1 servicemonth2)
    1000  992094 720 636 636
    1000 1179866 708 653 707
    1000 1036729 708 641 645
    1000  854614 696 636 644
    1000  864615 624 646 681
    1000  908069 708 634 636
    1000  871369 672 636 638
    1000 1125496 684 636 656
    1000  881157 600 636 714
    1000  833150 672 636 655
    1000  820983 588 636 668
    1000  963879 624 636 640
    end
    format %tm Yearmo_call
    format %tm servicemonth1
    format %tm servicemonth2
    
    rename Yearmo_call servicemonth0
    reshape long servicemonth, i(mci cl_id) j(_j)
    rename servicemonth month
    
    by mci cl_id month, sort: egen byte call_occurred = max(_j == 0)
    by mci cl_id month: egen byte service_occurred = max(_j != 0)
    collapse (first) call_occurred service_occurred, by(mci cl_id month)
    egen long identifier = group(mci cl_id)
    xtset identifier month
    tsfill, full
    mvencode *_occurred, mv(0) override
    by identifier (mci), sort: replace mci = mci[1]
    by identifier (cl_id), sort: replace cl_id = cl_id[1]
    drop identifier

    Comment


    • #3
      [QUOTE=Collin James;n1630375]



      Hi all,

      I have

      Comment

      Working...
      X