Hello,
I am working on a dataset whose variables contain a mix of numbers, strings, and empty cells. In addition, some observations use a comma as the decimal separator and others use a dot. My goal is to keep only the numbers, with dots as the decimal separator, so that I can analyze them properly. I would like to treat the empty cells and the strings as missing values.
This is an example:
       | var1 | var2 | var3  |
 1494. | 13.3 | NI   | NI    |
 1495. | NI   | NI   | 888   |
 1496. | not_ |      |       |
 1497. | NI   | NI   | NI    |
 1498. |      |      |       |
 1499. | 13.7 | 9    | 664.1 |
 1500. |      |      |       |
 1501. | 15.7 | 10.2 | 909,2 |
 1502. | NI   | NI   | NI    |
 1503. |      |      |       |
 1504. | NI   | NI   | 750   |
 1505. |      |      |       |
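My approach is to loop over the variables with foreach, set the empty cells and strings to missing, use subinstr() to change decimal commas to dots, and then destring the variables to numeric: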
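foreach v of varlist var1 var2 var3 {
    * set empty cells and known non-numeric codes to "." (read as missing by destring)
    replace `v' = "." if `v' == ""
    replace `v' = "." if `v' == "NI" | `v' == "not_" | `v' == "unavail."
    * change a decimal comma to a dot
    replace `v' = subinstr(`v', ",", ".", 1)
    * convert to numeric, then set the display format
    destring `v', replace
    format %9.2f `v'
}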
The problem is that there are many variables containing many different strings, and I do not want to write a `v' == "some string" condition for each one. Is there another way to set every non-numeric string in each variable to missing?
Many thanks,
David
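One possible way to avoid listing every string, sketched here on toy data that mimics the example above: after changing decimal commas to dots, real() returns missing for any value that is not a valid number, so no per-string conditions are needed. The `_num` suffix for the new variables and the str8 storage type are assumptions made for illustration only.

```stata
* Toy data mimicking the example in the post (values copied from the listing)
clear
input str8 var1 str8 var2 str8 var3
"13.3" "NI"   "NI"
"NI"   "NI"   "888"
"not_" ""     ""
"15.7" "10.2" "909,2"
end

foreach v of varlist var1 var2 var3 {
    * change a decimal comma to a dot
    replace `v' = subinstr(`v', ",", ".", 1)
    * real() yields missing for empty cells and for any non-numeric string
    generate double `v'_num = real(`v')
    format %9.2f `v'_num
}

list
```

Creating new variables keeps the original strings available for checking what was turned into missing. If overwriting in place is acceptable, destring with the force option (destring var1 var2 var3, replace force) after the subinstr() step should give a similar result, at the cost of silently discarding every non-numeric value.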