
  • Splitting multiple values in a variable - special delimiter character problem

    Hi all,
    I have an Excel data set of patents that I imported into Stata. Each row is one patent, and the ID variable (a str144 string) holds multiple values (firm IDs), because a patent can be granted to any number of applicants. When more than one firm is involved, the IDs appear to be run together with no visible delimiter. I need to split out each firm and then reshape so that each row represents one patent-firm pair; for example, a single patent with 3 firms should give 3 records. The reshape is not the problem. The problem is finding the delimiter so that I can parse and split the firmID data.
    In Excel, each firm in a patent record appears on a new line within the cell, entered with Alt+Enter. In Excel's Text to Columns, the matching delimiter is Ctrl+J. What character corresponds to this in Stata? Splitting in Excel would take longer.

    When I ran charlist on that ID variable, the result was not helpful (as I had seen in other posts): *-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ

    These are just the usual characters found in a single firmID in the patent data.
    I would highly appreciate your help. Best,
    Melike


  • #2
    You'd be much more likely to get some help here if you supply an example of your Stata data set using -dataex-, as described in the Statalist FAQ for new members. Purely verbal descriptions of data are hard to understand, particularly with complexities such as the ones you apparently have.

    Beyond that: I created a small example data file in Excel with a "Control+J" (line feed) within a cell. When I imported it into Stata, that character did not appear, as you found with charlist. I also tried saving it from Excel as a CSV, and it didn't import nicely into Stata.

    Consequently, my first suggestion is to do a find/replace in Excel to change all instances of Control+J to some other character not used within the ID string variable (comma or tab or whatever), and then import it into Stata. Then, present here a -dataex- example from that data set.
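
    As a rough illustration of the step after that find/replace, here is a minimal sketch. It assumes Control+J was replaced with a comma in Excel before importing; the variable names patent_id, firm, and firm_no and the sample IDs are hypothetical, not taken from the actual data.

```stata
* Hypothetical toy data standing in for the imported Excel sheet,
* after Control+J was replaced with a comma in Excel (an assumption).
clear
input str10 patent_id str40 firmID
"P1" "A100,B200,C300"
"P2" "D400"
end

* Split the combined string at each comma into firm1, firm2, ...
split firmID, parse(",") generate(firm)
drop firmID

* Reshape to one row per patent-firm pair.
reshape long firm, i(patent_id) j(firm_no)

* Patents with fewer firms than the maximum leave empty rows; drop them.
drop if missing(firm)

list, sepby(patent_id)
```

    Any character that never occurs in the IDs would work in place of the comma, as long as the same character is given in parse().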
