Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to collapse two rows

    I have a filename.dta which contains information on food intake coming from a Food Frequency Questionnaire. Because of a technical error there are two rows for the same subject for the same food but with different food code. Now I'd want to "collapse" the two rows. I'm using Stata 12.1 on W7. Can you help me?

  • #2
    Show us what the data look like and what you want instead.

    Comment


    • #3
      Welcome to Statalist, Alberto!
      Originally posted by Alberto Osella View Post
      . Now I'd want to "collapse" the two rows.
      Could you please clarify precisely what you would like to do (it's unclear to me -- I don't think you mean you want to use collapse). I suggest that you list, noobs a few sample lines for the relevant variables (including identifier), and post the results. When you do so, please put it within CODE delimiters for legibility and ease of copy/paste. (You get CODE delimiters via the advanced editor -- hit underlined-A and then note the "#" button.)
      After you've listed the few lines of sample output, post what you'd like to see the lines changed to.

      Comment


      • #4
        Thank you Nick and Stephen and sorry. My data look like this:

        Id Food_Code Quantity Description
        1959 A508 1297 Milk
        1959 A278 1396 Beverages
        1959 B704 2876 Beverages
        1959 B347 1384 Bread

        And I want this:

        Id Food_Code Quantity Description
        1959 A508 1297 Milk
        1959 A278+B704 4272 Beverages
        1959 B347 1384 Bread

        Thank you

        Comment


        • #5
          Alberto: thank you for the look at your data. But, please, please use CODE delimiters as requested. Your output is not as legible as it could be, and it's difficult to directly paste into Stata's data browser to "play". This makes if more difficult for people to help. With some roundabout use of other software in order to get your data into Stata (which I'd prefer not to have done), I produced the following. Does it give you ideas about how to solve your problem. (NB note CODE delimiting! )
          Code:
           
          . list
                 +----------------------------------------+
               |   id   food_c~e   quantity   descrip~n |
               |----------------------------------------|
            1. | 1959       A278       1396   Beverages |
            2. | 1959       B704       2876   Beverages |
            3. | 1959       B347       1384       Bread |
            4. | 1959       A508       1297        Milk |
               +----------------------------------------+
            . bys id description (food_code): egen newq = total(quantity)
            . bys id description (food_code): ge fcode2 = food_code[_n-1]
          (3 missing values generated)
            . list
                 +--------------------------------------------------------+
               |   id   food_c~e   quantity   descrip~n   newq   fcode2 |
               |--------------------------------------------------------|
            1. | 1959       A278       1396   Beverages   4272          |
            2. | 1959       B704       2876   Beverages   4272     A278 |
            3. | 1959       B347       1384       Bread   1384          |
            4. | 1959       A508       1297        Milk   1297          |
               +--------------------------------------------------------+
            . ge code = food_code + " + "  + fcode2
            . list
                 +----------------------------------------------------------------------+
               |   id   food_c~e   quantity   descrip~n   newq   fcode2          code |
               |----------------------------------------------------------------------|
            1. | 1959       A278       1396   Beverages   4272                A278 +  |
            2. | 1959       B704       2876   Beverages   4272     A278   B704 + A278 |
            3. | 1959       B347       1384       Bread   1384                B347 +  |
            4. | 1959       A508       1297        Milk   1297                A508 +  |
               +----------------------------------------------------------------------+
            . bys id description (food_code): ge last = _n == _N
            . li
                 +-----------------------------------------------------------------------------+
               |   id   food_c~e   quantity   descrip~n   newq   fcode2          code   last |
               |-----------------------------------------------------------------------------|
            1. | 1959       A278       1396   Beverages   4272                A278 +       0 |
            2. | 1959       B704       2876   Beverages   4272     A278   B704 + A278      1 |
            3. | 1959       B347       1384       Bread   1384                B347 +       1 |
            4. | 1959       A508       1297        Milk   1297                A508 +       1 |
               +-----------------------------------------------------------------------------+
            . keep if last
          (1 observation deleted)
            . li
                 +-----------------------------------------------------------------------------+
               |   id   food_c~e   quantity   descrip~n   newq   fcode2          code   last |
               |-----------------------------------------------------------------------------|
            1. | 1959       B704       2876   Beverages   4272     A278   B704 + A278      1 |
            2. | 1959       B347       1384       Bread   1384                B347 +       1 |
            3. | 1959       A508       1297        Milk   1297                A508 +       1 |
               +-----------------------------------------------------------------------------+
          No doubt the information within the new code variable could be tidier (without the redundant "+") but it might do for now. Observe how Stata is intelligent about the handling of string variables in the generate command. Note too that I've tried to provide something that would work with your whole data set (with reference to "id" in the bysort commands). You would need to generalise the approach if you have more than 2 "repeat" observations.

          Comment


          • #6
            Thank you Stephen. It's work. And sorry. I've not used CODE because I'm learning it and I needed a speed answer. Next time I'll use it

            Comment

            Working...
            X