Hello everyone,
I am using Stata 13.1. I have a large dataset and I am attempting to use reshape to get it into a format in which I can run further analysis. Currently, I have a "customer" variable with roughly 3.3 million observations, as well as "unit_sold" and "age" variables. For each customer, variables are assigned based on whether a unit was sold in a particular year with a 1, and the customer's age in that year (see code below). Ultimately I would like to sum the units sold for each single year of age, by year.
Here is the original form of my dataset:
I ran the command
I now have a single row for each customer (which I can ultimately drop), and columns for individual years. However, I've been struggling to run another reshape long command combined with collapse, in order to get my data into this form:
When I run
Stata generates a "_j" variable that contains all years in the dataset for each customer, and then recreates the table with a "1" to indicate a unit sold for a particular year in EACH variable column. The result being, that if I run
I end up with astronomically high numbers.
Any guidance would be greatly appreciated. I am a new Stata user and have tried to do as much research as possible before posting. Thanks in advance.
I am using Stata 13.1. I have a large dataset and I am attempting to use reshape to get it into a format in which I can run further analysis. Currently, I have a "customer" variable with roughly 3.3 million observations, as well as "unit_sold" and "age" variables. For each customer, variables are assigned based on whether a unit was sold in a particular year with a 1, and the customer's age in that year (see code below). Ultimately I would like to sum the units sold for each single year of age, by year.
Here is the original form of my dataset:
Code:
+----------------------------------+ customer unit_s~d age year ---------------------------------- 1. 2550 1 38 2001 2. 2550 1 39 2002 3. 2558 1 64 2001 4. 2558 1 66 2003 5. 2558 1 67 2004 ---------------------------------- 6. 3486 1 29 2002 7. 3486 1 30 2003 +----------------------------------+
Code:
reshape wide unit_sold age, i (customer) j (year)
Code:
+-------------------------------------------------------------------------+ customer u_s~2000 age2000 u_s~2001 age2001 u_s~2002 age2002 ------------------------------------------------------------------------- 1. 4781 1 25 . . 1 27 2. 5221 1 33 1 34 1 35 3. 84612 1 36 1 37 1 38 4. 94675 1 48 1 49 1 50 +-------------------------------------------------------------------------+
Code:
+--------------------------------------+ | age uni~2001 uni~2002 uni~2003 | |--------------------------------------| 1. | 16 8465 7548 7201 | 2. | 17 9682 9284 8621 | 3. | 18 3784 4551 4620 | 4. | 19 7986 7496 6633 | 5. | 20 7845 8121 8305 | +--------------------------------------+
Code:
reshape long age, i (customer)
Code:
collapse (sum) unit_sold*, by (age)
Any guidance would be greatly appreciated. I am a new Stata user and have tried to do as much research as possible before posting. Thanks in advance.
Comment