Formatting Strings: same words being recognized as "different"

John Singer

Join Date: Oct 2021
Posts: 46

26 Oct 2021, 14:49

Hello,

I imported a dataset from Excel, but identical string values are being treated as "different". For example, in the data below there should be only one category for "In Force". At the moment all of my variables are strings, and the problem affects all of them.

How do I get Stata to recognize that the text is "the same"? You can see in the second screenshot that the values even appear "crooked" (misaligned) in browse, but e.g.

Code:

 format Status %-14s

didn't make a difference. I could also edit the Excel sheet, but I'd rather not do that without knowing exactly what the problem is. Thank you in advance!

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str14 Model str16 EGA_In_Effect str29 Status
"Model 1"     "6/30/2014"    "In Force "                
"Model 1"        "11/30/2014"   "In Force "                
"Model 1"        "6/30/2014"    "In Force "                  
"Model 1"        "6/30/2014"    "In Force "                
"Model 2"        "6/30/2014"    "In Force "                    
"Model 1"     "6/30/2014"    "In Force "                
"Model 2"     "6/30/2014"    "In Force "                
"Model 1"     "6/30/2014"    "In Force "                
"Model 1  " "6/30/2014"    "In Force "                
"Model 1"     "6/30/2014"    "In Force "              
"Model 1 "   "6/30/2014" "In Force "                
"Model 1"     "6/30/2014"    "In Force "                
"Model 1 "   "6/30/2014"    "In Force "                
"Model 2"     "6/30/2014"    "In Force "                
"Model 1"     "6/30/2014"    "In Force "                
"Model 1"     "6/30/2014"    "In Force "                
"Model 1"     "6/30/2014"    "In Force "                
"Model 1"     "6/30/2014"    "Signed "                  
"Model 1"     "11/30/2014"   "In Force "                
"Model 1"     "6/30/2014"    "In Force "                
"Model 1"     "6/30/2014"    "In Force "                
"Model 2"     "6/30/2014"    "Signed"                    
"Model 1"     "6/30/2014"    "Agreement in Substance"    
"Model 1"     "6/30/2014"    "In Force "      
        
end

[Screenshot: example.png]

[Screenshot: example2.png]


Edit 1: Related post potentially: https://www.statalist.org/forums/for...-string-values

Edit 2: I tried

Code:

local varlist status status_date ega_in_effect model related_agreement1 related_agreement_2 correction_1 correction_2 superseding_model_1 understanding_1

foreach v of varlist `varlist' {
    replace `v' = subinstr(`v', " ", "", .)
}

and it didn't work either. I also tried importing the file as a CSV instead, and it made no difference. Even after the above code (which should have gotten rid of all spaces), the data look like this:

Attached Files

Last edited by John Singer; 26 Oct 2021, 15:12.


William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

26 Oct 2021, 15:20

Replace the replace command in your loop with

Code:

replace `v' = trim(`v')
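
For reference, a minimal sketch of what the whole loop might look like with that change. It assumes Stata 14 or later, where strtrim() is the current name for trim(). Selecting the variables with ds is an assumption made here for illustration and is not part of the original loop. This only strips leading and trailing plain blanks, so it will not help if the extra characters are something other than ordinary spaces.

```stata
* Collect every string variable in the dataset (assumption: all should be cleaned)
ds, has(type string)

* Strip leading and trailing plain blanks from each string variable
foreach v of varlist `r(varlist)' {
    replace `v' = strtrim(`v')
}

* Check whether the categories have collapsed as expected
tabulate Status
```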
Nick Cox

Join Date: Mar 2014

Posts: 35810
#3

26 Oct 2021, 15:23

I think you have several non-ASCII characters. Use chartab from SSC for a tabulation.
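
A rough sketch of how that check could go, using the Status variable from the example data. The cleanup step assumes the tabulation turns up a no-break space (U+00A0, code point 160), which is a common leftover from Excel imports but only a guess here; substitute whatever code point chartab actually reports. uchar() and usubinstr() need Stata 14 or later.

```stata
* chartab is community-contributed; install it once from SSC
ssc install chartab

* Tabulate every distinct character that occurs in the variable
chartab Status

* Assumption: the tabulation shows U+00A0 (no-break space).
* Turn each one into an ordinary blank.
replace Status = usubinstr(Status, uchar(160), " ", .)

* Then trim the ends and collapse runs of internal blanks
replace Status = stritrim(strtrim(Status))

* Confirm that the categories now agree
tabulate Status
```

If the offending character is something else, only the uchar() argument should need to change.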