Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Create multiple variables from a string variable

    Hello everyone! This is my first post, if I don’t respect some blog rules: sorry!
    I have a simple question about data management.

    I'm working with a dataset in which I have an ID variable that identifies the code of a project and a lot of other descriptive variables. The one that interested me is about the localization of the project at the regional level (nuts code) but a project can refer to more than one nuts code. So in my dataset, I have the duplicates of the ID variables for how many nuts codes are involved in the project. An example:

    ID nuts2 code
    2014BE16RFOP003 BE31
    2014BE16RFOP003 BE32
    2014BE16RFOP003 BE33
    2014BE16RFOP003 BE34
    2014BE16RFOP003 BE35
    2014DE16RFOP012 DED2
    2014DE16RFOP012 DED4
    2014DE16RFOP012 DED5

    My purpose is to have only one row for each ID, so the strategy that I'm trying is to "collapse" by ID and create the variables about the nuts code of the project. The result that I'm imaging to get is something like that:

    ID nuts_code_1 nuts_code_2 nuts_code_3 nuts_code_4 nuts_code_5
    2014BE16RFOP003 BE31 BE32 BE33 BE34 BE35
    2014DE16RFOP012 DED2 DED4 DED5

    I've tried to bypass the problem by running a table command and manipulating the output in Excel to create a new dataset, but I bet it's not the fastest solution (also for the time problems related to the command "collect export")

    Any advice?? Thanks a lot!

  • #2
    Statalist isn't a blog -- it is a web forum -- and the only rules are implicit (don't do anything that would make StataCorp want to ban you): everything else is a request!

    In terms of your question you can get from stated start to desired destination, but the real question is what it implies for your other variables.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str15 id str4 nuts2code
    "2014BE16RFOP003" "BE31"
    "2014BE16RFOP003" "BE32"
    "2014BE16RFOP003" "BE33"
    "2014BE16RFOP003" "BE34"
    "2014BE16RFOP003" "BE35"
    "2014DE16RFOP012" "DED2"
    "2014DE16RFOP012" "DED4"
    "2014DE16RFOP012" "DED5"
    end
    
    . bysort id (nuts2code) : gen id2 = _n
    
    . reshape wide nuts2code, i(id) j(id2)
    (j = 1 2 3 4 5)
    
    Data                               Long   ->   Wide
    -----------------------------------------------------------------------------
    Number of observations                8   ->   2           
    Number of variables                   3   ->   6           
    j variable (5 values)               id2   ->   (dropped)
    xij variables:
                                  nuts2code   ->   nuts2code1 nuts2code2 ... nuts2code5
    -----------------------------------------------------------------------------
    
    . l
    
         +------------------------------------------------------------------------+
         |              id   nuts2c~1   nuts2c~2   nuts2c~3   nuts2c~4   nuts2c~5 |
         |------------------------------------------------------------------------|
      1. | 2014BE16RFOP003       BE31       BE32       BE33       BE34       BE35 |
      2. | 2014DE16RFOP012       DED2       DED4       DED5                       |
         +------------------------------------------------------------------------+

    Comment


    • #3
      Thank you so much, I hadn’t thought about the reshape command and your suggestion worked great!
      To clarify about the "real question" that you suggest: I have no problems with the other variables because this is just one of the steps of a merging process and I will recall those variables later.
      Sorry for the bad description of Statlist, now that I've received my "welcome" I know the rules...

      Comment

      Working...
      X