Splitting multiple values in a variable - special delimiter character problem

melike_f

Join Date: Oct 2014

Posts: 4
#1

07 Nov 2023, 06:15

Hi all,
I have an Excel data set of patents that I imported into Stata. Each patent corresponds to one row, and its ID variable holds multiple values (firm IDs) because a patent can be granted to several applicants. The variable is a str144 string. When more than one firm is involved, the IDs appear to be concatenated with no delimiter. I need to split out each firm and then reshape the data so that each row represents one patent-firm pair. For example, a patent with 3 firms should become 3 records. The reshape is not the problem. The problem is identifying the delimiter so that I can parse and split the firm ID data.
In Excel, each firm within a patent record appears on a new line, entered with Alt+Enter, and the delimiter to use in Excel's Text to Columns is Ctrl+J. Which character corresponds to this in Stata? Splitting the data in Excel would take longer.
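For reference, the character Excel inserts with Alt+Enter, and which Ctrl+J stands for in Text to Columns, is the line feed, code point 10 (in Stata, `char(10)` returns the same character). The sketch below is a hedged Python illustration, not a Stata solution. It assumes the cell text still contains those line feeds and uses made-up patent and firm IDs. It splits each cell on the line feed and builds one (patent, firm) row per applicant, which is the long, patent-firm layout described above.

```python
# Ctrl+J in Excel's Text to Columns dialog is the line feed character (code point 10).
LINE_FEED = chr(10)
assert LINE_FEED == "\n"

def patent_firm_rows(patents):
    """Turn {patent_id: multi-line firm cell} into (patent_id, firm_id) pairs.

    Assumes each firm ID sits on its own line within the cell, as produced
    by Alt+Enter in Excel. Blank lines are skipped.
    """
    rows = []
    for patent_id, cell in patents.items():
        for part in cell.split(LINE_FEED):
            firm_id = part.strip()  # also drops a stray "\r" from CR+LF line endings
            if firm_id:
                rows.append((patent_id, firm_id))
    return rows

# Hypothetical example data: P1 has three applicants, P2 has one.
patents = {
    "P1": "AB-123\nCD-456\r\nEF-789",
    "P2": "GH-001",
}
for row in patent_firm_rows(patents):
    print(row)
```

Here patent P1 yields three rows and P2 yields one, so the three-firm patent becomes three records.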

When I ran charlist on that ID variable, the result was not helpful (as I have seen in other posts): *-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
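A plain list of characters, like the one above, can hide a line feed, because printing it just produces an invisible line break. Listing each distinct character together with its code point makes control characters explicit. The following is a hedged Python sketch of such an inventory, using hypothetical cells. If the line feeds survived the import, code 10 would appear in the list. If it is absent, the delimiter was most likely lost on the way in.

```python
def char_inventory(values):
    """Return the sorted distinct characters in values with their code points.

    repr() shows control characters such as line feed (10) or
    carriage return (13) as escape sequences, so they stay visible.
    """
    chars = set()
    for v in values:
        chars.update(v)
    return [(repr(c), ord(c)) for c in sorted(chars)]

# Hypothetical firm-ID cells; the first contains an embedded line feed.
cells = ["AB-123\nCD-456", "GH-001"]
for shown, code in char_inventory(cells):
    print(shown, code)
```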

These are the usual characters in a single firm ID in the patent data.
I would greatly appreciate your help. Best,
Melike
Mike Lacy

Join Date: Apr 2014

Posts: 2421
#2

07 Nov 2023, 08:53

You'd be much more likely to get some help here if you supply an example of your Stata data set using -dataex-, as described in the Statalist FAQ for new members. Purely verbal descriptions of data are hard to understand, particularly with complexities such as you apparently have.

Beyond that: I created a small example data file in Excel with a "Control+J" (line feed) within a cell. When I imported it into Stata, that character did not appear, as you found with charlist. I also tried saving it from Excel as a CSV, and it didn't import nicely into Stata.

Consequently, my first suggestion is to do a find/replace in Excel to change all instances of Control+J to some other character not used within the ID string variable (comma or tab or whatever), and then import it into Stata. Then, present here a -dataex- example from that data set.
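The find/replace step can be mimicked as follows to show why the replacement character must not occur inside any ID. Going by the charlist output in the question, the IDs only use `*`, `-`, digits and uppercase letters, so a comma should be safe. This is a hedged Python sketch with hypothetical helper names and data, not Excel or Stata code. It swaps line feeds for a comma, refuses to proceed if the comma already appears in an ID, and then splits on the comma.

```python
SAFE_DELIMITER = ","  # assumption: never occurs inside a firm ID

def replace_line_feeds(cell, delimiter=SAFE_DELIMITER):
    """Mimic an Excel find/replace of Ctrl+J (line feed) with a visible delimiter."""
    parts = [p.strip() for p in cell.replace("\r\n", "\n").split("\n")]
    if any(delimiter in p for p in parts):
        raise ValueError(f"delimiter {delimiter!r} already occurs in an ID")
    return delimiter.join(p for p in parts if p)

def split_ids(cell, delimiter=SAFE_DELIMITER):
    """Split the delimited cell back into individual firm IDs."""
    return [p for p in cell.split(delimiter) if p]

cleaned = replace_line_feeds("AB-123\nCD-456\nEF-789")
print(cleaned)             # AB-123,CD-456,EF-789
print(split_ids(cleaned))  # ['AB-123', 'CD-456', 'EF-789']
```

Checking for the delimiter before replacing guards against silently merging or splitting IDs if an unexpected character turns up in the data.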