Hello everyone,
I am using Stata 13.1. I have a large dataset and I am attempting to use reshape to get it into a format in which I can run further analysis. Currently, I have a "customer" variable with roughly 3.3 million observations, as well as "unit_sold" and "age" variables. For each customer, variables are assigned based on whether a unit was sold in a particular year with a 1, and the customer's age in that year (see code below). Ultimately I would like to sum the units sold for each single year of age, by year.
Here is the original form of my dataset:
I ran the command
I now have a single row for each customer (which I can ultimately drop), and columns for individual years. However, I've been struggling to run another reshape long command combined with collapse, in order to get my data into this form:
When I run
Stata generates a "_j" variable that contains all years in the dataset for each customer, and then recreates the table with a "1" to indicate a unit sold for a particular year in EACH variable column. The result being, that if I run
I end up with astronomically high numbers.
Any guidance would be greatly appreciated. I am a new Stata user and have tried to do as much research as possible before posting. Thanks in advance.
I am using Stata 13.1. I have a large dataset and I am attempting to use reshape to get it into a format in which I can run further analysis. Currently, I have a "customer" variable with roughly 3.3 million observations, as well as "unit_sold" and "age" variables. For each customer, variables are assigned based on whether a unit was sold in a particular year with a 1, and the customer's age in that year (see code below). Ultimately I would like to sum the units sold for each single year of age, by year.
Here is the original form of my dataset:
Code:
+----------------------------------+
customer unit_s~d age year
----------------------------------
1. 2550 1 38 2001
2. 2550 1 39 2002
3. 2558 1 64 2001
4. 2558 1 66 2003
5. 2558 1 67 2004
----------------------------------
6. 3486 1 29 2002
7. 3486 1 30 2003
+----------------------------------+
Code:
reshape wide unit_sold age, i (customer) j (year)
Code:
+-------------------------------------------------------------------------+
customer u_s~2000 age2000 u_s~2001 age2001 u_s~2002 age2002
-------------------------------------------------------------------------
1. 4781 1 25 . . 1 27
2. 5221 1 33 1 34 1 35
3. 84612 1 36 1 37 1 38
4. 94675 1 48 1 49 1 50
+-------------------------------------------------------------------------+
Code:
+--------------------------------------+
| age uni~2001 uni~2002 uni~2003 |
|--------------------------------------|
1. | 16 8465 7548 7201 |
2. | 17 9682 9284 8621 |
3. | 18 3784 4551 4620 |
4. | 19 7986 7496 6633 |
5. | 20 7845 8121 8305 |
+--------------------------------------+
Code:
reshape long age, i (customer)
Code:
collapse (sum) unit_sold*, by (age)
Any guidance would be greatly appreciated. I am a new Stata user and have tried to do as much research as possible before posting. Thanks in advance.

Comment