Hi,
I am trying to convert some columns of a long dataset into wide format.
the data looks somewhat like below:
id date sale_amt discount_yn
1 01jan2013 50 y
1 02jan2013 60 n
2 01jan2013 30 y
2 02jan2013 20 n
3 01jan2013 40 y
3 02jan2013 15 n
I want to convert the discount_yn column into wide format, so that when I collapse the discount_yn column, I can retrieve the sum number of discounts each id received per month.
Ideally, I want the data to look somewhat like this:
id date sale_amt discount_y discount_n
1 01jan2013 50 1 0
1 02jan2013 60 0 1
2 01jan2013 30 1 0
2 02jan2013 20 0 1
3 01jan2013 40 1 0
3 02jan2013 15 0 1
Then,
collapse (sum) discount_y, by(id)
I thought the solution is to use 'reshape' command,
so I tried
reshape wide discount_yn, i(id) j(date)
but the result created columns discount_01jan2013 discount_02jan2013............. and so on.
As far as I can tell, the reshape command won't automatically separate the column discount_yn to discount_y and discount_n by itself, I think I am missing something here.
Could anyone provide some insights to help me get through this issue?
Thanks in advance.
I am trying to convert some columns of a long dataset into wide format.
the data looks somewhat like below:
id date sale_amt discount_yn
1 01jan2013 50 y
1 02jan2013 60 n
2 01jan2013 30 y
2 02jan2013 20 n
3 01jan2013 40 y
3 02jan2013 15 n
I want to convert the discount_yn column into wide format, so that when I collapse the discount_yn column, I can retrieve the sum number of discounts each id received per month.
Ideally, I want the data to look somewhat like this:
id date sale_amt discount_y discount_n
1 01jan2013 50 1 0
1 02jan2013 60 0 1
2 01jan2013 30 1 0
2 02jan2013 20 0 1
3 01jan2013 40 1 0
3 02jan2013 15 0 1
Then,
collapse (sum) discount_y, by(id)
I thought the solution is to use 'reshape' command,
so I tried
reshape wide discount_yn, i(id) j(date)
but the result created columns discount_01jan2013 discount_02jan2013............. and so on.
As far as I can tell, the reshape command won't automatically separate the column discount_yn to discount_y and discount_n by itself, I think I am missing something here.
Could anyone provide some insights to help me get through this issue?
Thanks in advance.
Comment