  • splitting a string variable into unique dummy variables

    Dear all,

    I have a string-related problem that I was not able to solve so far. I use Stata 13 MP.

    I have a string variable "product_code" that, at each date, contains product codes separated by blank spaces:
    1 STATA SAS MATLAB
    2 R MATLAB
    3 STATA EXCEL PEN
    I would like to create dummy variables for each possible product code in my dataset so that I can know at each date which product code was available.
    In the example, the dataset I want to obtain looks like
    date  product_code       D_STATA  D_SAS  D_MATLAB  D_R  D_EXCEL  D_PEN
    1     STATA SAS MATLAB   1        1      1         0    0        0
    2     R MATLAB           0        0      1         1    0        0
    3     STATA EXCEL PEN    1        0      0         0    1        1

    Do you know how I can do that? I tried using -split-, but without success.
    Many thanks!!


  • #2
    In this case,

    Code:
     
    * strpos() is case-sensitive, so the codes are spelled exactly as in the data
    foreach s in STATA SAS MATLAB R EXCEL PEN {
         gen D_`s' = strpos(product_code, "`s'") > 0
    }
    In practice, watch for lower and upper case and spelling differences.
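
    One possible way to guard against case differences, sketched here on toy data: compare in upper case and pad the string with blanks so that only whole codes match. Plain substring matching would also find "R" inside a longer code such as "PRO" (a made-up code, not in the example). The variable name padded and the mixed-case third observation are assumptions for illustration only.

```stata
* Toy data; observation 3 is deliberately in mixed case
clear
input byte date str30 product_code
1 "STATA SAS MATLAB"
2 "R MATLAB"
3 "Stata Excel pen"
end

* Upper-case copy with single blanks, padded with a blank at each end
* (padded is a hypothetical helper variable)
gen padded = " " + upper(itrim(trim(product_code))) + " "

* Searching for " CODE " matches whole words only
foreach s in STATA SAS MATLAB R EXCEL PEN {
    gen byte D_`s' = strpos(padded, " `s' ") > 0
}

drop padded
list
```

    This handles case but not spelling differences, which would still need to be cleaned up by hand.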



    • #3
      Thank you Nick!!
      Neat implementation indeed.

      Assume however that the list of possible product codes is just too large to be typed manually in the loop.
      I was thinking about using levelsof to get the unique product codes in the dataset, but the fact that multiple product codes are contained in one single cell complicates the matter...

      Do you see any possible solution here?
      Thanks again



      • #4
        Uli Kohler and I wrote a paper on this some years ago. http://www.stata-journal.com/sjpdf.h...iclenum=pr0008 I don't want to re-read it to rediscover if we discussed this.

        But you have all the ingredients of an answer already:

        Code:
         
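        * split on blanks into product_code1, product_code2, ...
        split product_code 
        
        * accumulate the union of distinct codes across the new variables
        foreach v in `r(varlist)' {
             levelsof `v', local(this)
             local all : list all | this
        }
        
        * show the macros, including the combined list in local all
        macro list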
        If your items are not single words, it's a lot messier.
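
        A minimal sketch of how these ingredients might be combined into the indicator variables asked for in #1, using the example data. It assumes every code is a single word that is also legal as part of a variable name. The stub part and the variable padded are arbitrary names chosen for illustration.

```stata
* Example data from #1
clear
input byte date str30 product_code
1 "STATA SAS MATLAB"
2 "R MATLAB"
3 "STATA EXCEL PEN"
end

* Split on blanks into part1, part2, ... and keep the list of new variables
split product_code, generate(part)
local parts `r(varlist)'

* Union of the distinct codes found in any of the pieces
local all
foreach v of local parts {
    levelsof `v', local(this) clean
    local all : list all | this
}

* One indicator per code; blank padding gives whole-word matches
gen padded = " " + product_code + " "
foreach s of local all {
    gen byte D_`s' = strpos(padded, " `s' ") > 0
}

drop `parts' padded
list
```

        The list of split variables is copied into a local macro first because levelsof is itself r-class and replaces the results left behind by split.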




        • #5

          Of course, everything is stored in r(varlist) !!!
          Thank you very much again, Nick
