Wide to Long Reshaping

Collin James

Join Date: Sep 2021

Posts: 107
#1

Wide to Long Reshaping

05 Oct 2021, 08:00

Hi all,

I have data in the following format:

Code:

* Example generated by -dataex-. For more info, type help dataex clear input long(mci cl_id) float(yearmo_call servicemonth1) 1000 992094 720 636 1000 1179866 708 707 1000 1036729 708 645 1000 854614 696 636 1000 854614 696 636 1000 864615 624 681 1000 908069 708 636 1000 871369 672 636 1000 1125496 684 636 1000 881157 600 636 1000 833150 672 636 1000 820983 588 636 1000 963879 624 636 end format %tm yearmo_call format %tm servicemonth1

You can think of mci and cl_id as a sort of identifier in the data. Yearmo_call is the year and month that a call comes in. Servicemonth1 is the month a service occurs....in my data there are variables for servicemonth1-servicemonth95. I would like to create a long data set that has an observation for each mci at the yearmonth level. So, even if they do not get a call or service in that yearmonth, I just have an observation that shows 0s for that month.

I would like something similar to this:

MCI YearMonthVariable CallOccurred ServiceOccurred
1 2014m1 0 0
1 2014m2 1 0
1 2014m3 0 0

And so on and so forth for every ID and every yearmonth. Any ideas?
--Collin
Tags: None

Clyde Schechter

Join Date: Apr 2014
Posts: 30100

05 Oct 2021, 09:47

You can think of mci and cl_id as a sort of identifier in the data.

No, you can't. Look at your fourth and fifth observation in the example data: both have the same values for mci and cl_id. Duplicates like that need to be fixed before you can do what you are asking for with this data.

On top of that, the example data you show is not really helpful for developing and testing code because, although you say your real data contains 95 servicemonth* variables, your example contains only 1--which is a qualitative difference that obscures much of the reshaping problem.

I have modified your example to include a second servicemonth variable, and also to eliminate one of the duplicate observations on mci cl_id.

After you fix your data, I believe the following code will do what you ask:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long(mci cl_id) float(Yearmo_call servicemonth1 servicemonth2)
1000  992094 720 636 636
1000 1179866 708 653 707
1000 1036729 708 641 645
1000  854614 696 636 644
1000  864615 624 646 681
1000  908069 708 634 636
1000  871369 672 636 638
1000 1125496 684 636 656
1000  881157 600 636 714
1000  833150 672 636 655
1000  820983 588 636 668
1000  963879 624 636 640
end
format %tm Yearmo_call
format %tm servicemonth1
format %tm servicemonth2

rename Yearmo_call servicemonth0
reshape long servicemonth, i(mci cl_id) j(_j)
rename servicemonth month

by mci cl_id month, sort: egen byte call_occurred = max(_j == 0)
by mci cl_id month: egen byte service_occurred = max(_j != 0)
collapse (first) call_occurred service_occurred, by(mci cl_id month)
egen long identifier = group(mci cl_id)
xtset identifier month
tsfill, full
mvencode *_occurred, mv(0) override
by identifier (mci), sort: replace mci = mci[1]
by identifier (cl_id), sort: replace cl_id = cl_id[1]
drop identifier

Comment

Collin James

Join Date: Sep 2021

Posts: 107
#3

06 Oct 2021, 15:57

[QUOTE=Collin James;n1630375]

Hi all,

I have
Comment

Announcement