Hey Statalist Community!
This is my first time posting, so if I commit any faux pas, please let me know.
Anyway, I'm working with a dataset that includes (among many other things) company names. Many of these names contain non-alphanumeric characters (e.g., "-", "/", and "."). Below is the code I have been using to isolate the names that contain a given non-alphanumeric character, here the slash:
BEGIN CODE
gen slash = regexm(cname, "/")
keep if slash == 1
drop slash
duplicates drop
export excel using location/excelfile.xlsx, sheet("Slash") sheetmodify cell(B2)
END CODE
Here "cname" is the variable holding the company name, and "location/excelfile.xlsx" is a placeholder for the Excel file I've been exporting to.
This code has worked well for every character except the period. Whenever I use gen period = regexm(cname, "."), every entry is tagged, not just those that have a period in the name. I presume this occurs because of how Stata interprets "." in the regexm() pattern, but I'm not sure what to do next.
Any suggestions would be welcome.
Thank you,
Andrew
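One possible fix, sketched on made-up data: in a regular expression an unescaped "." matches any single character, which would explain why every nonempty name gets tagged. The example below shows three ways to look for a literal period instead. The company names and the variables period_strpos, period_escaped, and period_class are invented for illustration; only cname comes from the post above.

```stata
* Toy data standing in for the real company names (hypothetical values)
clear
input str20 cname
"Acme Inc."
"Smith/Jones"
"Plain Name"
end

* strpos() looks for a literal substring, so "." has no special meaning here
gen byte period_strpos = strpos(cname, ".") > 0

* In a regular expression, escape the period with a backslash ...
gen byte period_escaped = regexm(cname, "\.")

* ... or place it inside a character class, where it is also taken literally
gen byte period_class = regexm(cname, "[.]")

list
```

Any of the three indicators could then replace the slash variable in the keep/drop/export steps. strpos() is probably the simplest choice when the search is for one fixed character, since no escaping is involved.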