
  • How to summarize multiple observations per ID?

    Company_ID Bank_type
    1 1
    1 3
    2 5
    3 9
    3 1
    3 2
    4 3
    4 3
    5 1
    6 1
    6 1
    7 4
    7 7
    7 1
    7 2
    7 3
    Hi,

    I have a data set that looks like this one. There are different firms, identified by Company_ID. These firms are customers of different types of banks; some have just one bank, some have more. I would like a new variable (bank_code) that assigns each firm exactly one number that tells me its combination of bank types. So this new variable should be 13 for firm 1, 5 for firm 2, 129 for firm 3, and so on. Unfortunately, even after searching the forum and Google for some hours, I still have no idea how to do this in Stata. Can the collapse command help, or is there some way with egen? Any help would be greatly appreciated. Thanks a lot!

  • #2
    Something like

    Code:
    bysort Company_ID (Bank_type) : gen Types = string(Bank_type) if _n == 1
    by Company_ID : replace Types = Types[_n-1] + cond(Bank_type != Bank_type[_n-1], string(Bank_type), "") if _n > 1
    by Company_ID: replace Types = Types[_N]
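
For anyone who wants to try this, here is a minimal self-contained sketch using the example data from the question. The last steps are assumptions that go beyond the three lines above: bank_code is the variable name requested in the question, real() converts the string to a number, and tag is just a helper name. Note that gluing codes together is only unambiguous while every bank type is a single digit.

```stata
* Sketch: rebuild the example data from the question
clear
input Company_ID Bank_type
1 1
1 3
2 5
3 9
3 1
3 2
4 3
4 3
5 1
6 1
6 1
7 4
7 7
7 1
7 2
7 3
end

* Within each firm, sorted by bank type, build the string step by step;
* cond() adds a code only if it differs from the previous one, so repeats are skipped
bysort Company_ID (Bank_type) : gen Types = string(Bank_type) if _n == 1
by Company_ID : replace Types = Types[_n-1] + cond(Bank_type != Bank_type[_n-1], string(Bank_type), "") if _n > 1

* Copy the complete string (held in the last row of each firm) to every row
by Company_ID : replace Types = Types[_N]

* Assumption: a numeric version is wanted, as the question asks for a number
gen long bank_code = real(Types)

* Show one row per firm (tag is a helper variable)
egen byte tag = tag(Company_ID)
list Company_ID Types bank_code if tag, noobs
```

With these data the firms should come out as 13, 5, 129, 3, 1, 1 and 12347. If the combined code could run past nine digits, a double (or simply keeping the string) would be the safer choice than long.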
    By the way, you are asked to use a full real name here with given name and family name.
    Last edited by Nick Cox; 21 May 2015, 07:01.

    • #3
      Exactly what I was looking for, thanks a lot!

      • #4
        Hi,

        I have a similar problem. My data looks like this:

        familyid allHNames

        1 Rick
        1 Matt Jones
        1 Shivani Pandey
        2 AK
        2 Balbir
        3 Rick
        4 Rohan Merkel
        4 Blair Woldorf
        4 Rishabh Jain
        4 BP

        and I want to create a dataset which looks like this:

        familyid  allHNames       allNames

        1         Rick            Rick Matt Jones Shivani Pandey
        1         Matt Jones      Rick Matt Jones Shivani Pandey
        1         Shivani Pandey  Rick Matt Jones Shivani Pandey
        2         AK              AK Balbir
        2         Balbir          AK Balbir
        3         Rick            Rick
        4         Rohan Merkel    Rohan Merkel Blair Woldorf Rishabh Jain BP
        4         Blair Woldorf   Rohan Merkel Blair Woldorf Rishabh Jain BP
        4         Rishabh Jain    Rohan Merkel Blair Woldorf Rishabh Jain BP
        4         BP              Rohan Merkel Blair Woldorf Rishabh Jain BP



        I used the above code like this:

        sort familyid allHNames

        bysort familyid(allHNames): gen allNames = string(allHNames) if _n == 1
        by familyid : replace allNames = allNames[_n-1] + cond(allHNames != allHNames[_n-1], string(allHNames), "") if _n > 1
        by familyid: replace allNames = allNames[_N]


        But I get an error saying "type mismatch" just after the "bysort.." command line. I appreciate any help here!

        • #5
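          allHNames is already a string variable, so string() is not needed; it expects a numeric argument, hence the type mismatch. The gen line needs the same change:

          Code:
          bysort familyid (allHNames) : gen allNames = allHNames if _n == 1
          by familyid : replace allNames = allNames[_n-1] + cond(allHNames != allHNames[_n-1], allHNames, "") if _n > 1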
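
The desired output in #4 keeps the names in their original order and separates them with spaces, which sorting on allHNames and plain concatenation would not give. A sketch under those assumptions might look like the following; obs is a hypothetical helper variable, and unlike the code above this version does not skip repeated names.

```stata
* Sketch: rebuild the example data from #4
clear
input byte familyid str20 allHNames
1 "Rick"
1 "Matt Jones"
1 "Shivani Pandey"
2 "AK"
2 "Balbir"
3 "Rick"
4 "Rohan Merkel"
4 "Blair Woldorf"
4 "Rishabh Jain"
4 "BP"
end

* Remember the original row order (obs is a helper name, not from the thread)
gen long obs = _n

* allHNames is already a string, so it is used directly, without string()
bysort familyid (obs) : gen allNames = allHNames if _n == 1

* Append each further name, separated by a space
by familyid : replace allNames = allNames[_n-1] + " " + allHNames if _n > 1

* Copy the complete list (held in the last row of each family) to every row
by familyid : replace allNames = allNames[_N]

* Restore the original order and drop the helper
sort obs
drop obs
list, noobs sepby(familyid)
```

Sorting on obs rather than allHNames is what keeps "Rick Matt Jones Shivani Pandey" in the order given; with (allHNames) in the bysort, the names would be joined alphabetically instead.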
