How to collapse two rows

Alberto Osella

Join Date: Apr 2014

Posts: 24
#1

How to collapse two rows

22 Aug 2014, 05:55

I have a filename.dta which contains information on food intake coming from a Food Frequency Questionnaire. Because of a technical error there are two rows for the same subject for the same food but with different food code. Now I'd want to "collapse" the two rows. I'm using Stata 12.1 on W7. Can you help me?
Tags: None
Nick Cox

Join Date: Mar 2014

Posts: 35724
#2

22 Aug 2014, 06:00

Show us what the data look like and what you want instead.
Comment
Stephen Jenkins

Join Date: Apr 2014

Posts: 1435
#3

22 Aug 2014, 06:02

Welcome to Statalist, Alberto!

Originally posted by Alberto Osella View Post

. Now I'd want to "collapse" the two rows.

Could you please clarify precisely what you would like to do (it's unclear to me -- I don't think you mean you want to use collapse). I suggest that you list, noobs a few sample lines for the relevant variables (including identifier), and post the results. When you do so, please put it within CODE delimiters for legibility and ease of copy/paste. (You get CODE delimiters via the advanced editor -- hit underlined-A and then note the "#" button.)
After you've listed the few lines of sample output, post what you'd like to see the lines changed to.
Comment
Alberto Osella

Join Date: Apr 2014

Posts: 24
#4

23 Aug 2014, 03:29

Thank you Nick and Stephen and sorry. My data look like this:

Id Food_Code Quantity Description
1959 A508 1297 Milk
1959 A278 1396 Beverages
1959 B704 2876 Beverages
1959 B347 1384 Bread

And I want this:

Id Food_Code Quantity Description
1959 A508 1297 Milk
1959 A278+B704 4272 Beverages
1959 B347 1384 Bread

Thank you
Comment

Stephen Jenkins

Join Date: Apr 2014
Posts: 1435

23 Aug 2014, 03:59

Alberto: thank you for the look at your data. But, please, please use CODE delimiters as requested. Your output is not as legible as it could be, and it's difficult to directly paste into Stata's data browser to "play". This makes if more difficult for people to help. With some roundabout use of other software in order to get your data into Stata (which I'd prefer not to have done), I produced the following. Does it give you ideas about how to solve your problem. (NB note CODE delimiting!

)

Code:

 
. list
       +----------------------------------------+
     |   id   food_c~e   quantity   descrip~n |
     |----------------------------------------|
  1. | 1959       A278       1396   Beverages |
  2. | 1959       B704       2876   Beverages |
  3. | 1959       B347       1384       Bread |
  4. | 1959       A508       1297        Milk |
     +----------------------------------------+
  . bys id description (food_code): egen newq = total(quantity)
  . bys id description (food_code): ge fcode2 = food_code[_n-1]
(3 missing values generated)
  . list
       +--------------------------------------------------------+
     |   id   food_c~e   quantity   descrip~n   newq   fcode2 |
     |--------------------------------------------------------|
  1. | 1959       A278       1396   Beverages   4272          |
  2. | 1959       B704       2876   Beverages   4272     A278 |
  3. | 1959       B347       1384       Bread   1384          |
  4. | 1959       A508       1297        Milk   1297          |
     +--------------------------------------------------------+
  . ge code = food_code + " + "  + fcode2
  . list
       +----------------------------------------------------------------------+
     |   id   food_c~e   quantity   descrip~n   newq   fcode2          code |
     |----------------------------------------------------------------------|
  1. | 1959       A278       1396   Beverages   4272                A278 +  |
  2. | 1959       B704       2876   Beverages   4272     A278   B704 + A278 |
  3. | 1959       B347       1384       Bread   1384                B347 +  |
  4. | 1959       A508       1297        Milk   1297                A508 +  |
     +----------------------------------------------------------------------+
  . bys id description (food_code): ge last = _n == _N
  . li
       +-----------------------------------------------------------------------------+
     |   id   food_c~e   quantity   descrip~n   newq   fcode2          code   last |
     |-----------------------------------------------------------------------------|
  1. | 1959       A278       1396   Beverages   4272                A278 +       0 |
  2. | 1959       B704       2876   Beverages   4272     A278   B704 + A278      1 |
  3. | 1959       B347       1384       Bread   1384                B347 +       1 |
  4. | 1959       A508       1297        Milk   1297                A508 +       1 |
     +-----------------------------------------------------------------------------+
  . keep if last
(1 observation deleted)
  . li
       +-----------------------------------------------------------------------------+
     |   id   food_c~e   quantity   descrip~n   newq   fcode2          code   last |
     |-----------------------------------------------------------------------------|
  1. | 1959       B704       2876   Beverages   4272     A278   B704 + A278      1 |
  2. | 1959       B347       1384       Bread   1384                B347 +       1 |
  3. | 1959       A508       1297        Milk   1297                A508 +       1 |
     +-----------------------------------------------------------------------------+

No doubt the information within the new code variable could be tidier (without the redundant "+") but it might do for now. Observe how Stata is intelligent about the handling of string variables in the generate command. Note too that I've tried to provide something that would work with your whole data set (with reference to "id" in the bysort commands). You would need to generalise the approach if you have more than 2 "repeat" observations.

Comment

Alberto Osella

Join Date: Apr 2014

Posts: 24
#6

23 Aug 2014, 04:04

Thank you Stephen. It's work. And sorry. I've not used CODE because I'm learning it and I needed a speed answer. Next time I'll use it
Comment

Announcement