Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Identify string observations in multiple variables

    Hello,
    I am working on a dataset with variables that contains several numbers, strings and empty spaces. Also, some observations have decimals with commas and dots. My goal is to only use the numbers with decimals in dots so I can analyze them properly. The empty spaces and strings I would like to keep them as missing observations.
    This is an example:
    var1 var2 var3
    1494. 13.3 NI NI
    1495. NI NI 888
    1496. not_
    1497. NI NI NI
    1498.
    1499. 13.7 9 664.1
    1500.
    1501. 15.7 10.2 909,2
    1502. NI NI NI
    1503.
    1504. NI NI 750
    1505.
    I thought that maybe this way I can solve the problem: to iterate in a foreach loop for every variable to replace the empty spaces and strings to missing, then use subinstr to replace the decimals with commas to dots, and then destring those variables to numeric.
    My approach is:

    foreach v or varlist var1 var2 var3 {
    replace `v' = "." if `v' == ""
    replace `v' = "." if `v' == "NI" | `v' == "not_" | `v' == "unavail."
    replace `v' = subinstr(`v', ",", ".", 1)
    format %9.2f `v'
    }


    The problem is there are too many variables with multiple strings, and I do not want to write each `v' == "some string" (because there are a lot and different). Can I use another way to replace all strings from each variable?
    Many thanks,



    David

  • #2
    I understand your problem to be this:

    You have some string variables in which some string values have a numeric interpretation while the other string values do not and might as well be missing.

    Note that to Stata an observation is an entire row, record or case in the dataset for one or more variables. The variables themselves be string or numeric. "string observations" are better called "string values".

    destring, force for once might be most of what you need. Consider this:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str4(var1 var2) str5 var3
    "13.3" "NI"   "NI"   
    "NI"   "NI"   "888"  
    "not_" ""     ""     
    "NI"   "NI"   "NI"   
    ""     ""     ""     
    "13.7" "9"    "664.1"
    ""     ""     ""     
    "15.7" "10.2" "909,2"
    "NI"   "NI"   "NI"   
    ""     ""     ""     
    "NI"   "NI"   "750"  
    ""     ""     ""     
    end
    
    . destring var*, force gen(new1 new2 new3)
    var1: contains nonnumeric characters; new1 generated as double
    (9 missing values generated)
    var2: contains nonnumeric characters; new2 generated as double
    (10 missing values generated)
    var3: contains nonnumeric characters; new3 generated as double
    (9 missing values generated)
    
    . list
    
         +-------------------------------------------+
         | var1   new1   var2   new2    var3    new3 |
         |-------------------------------------------|
      1. | 13.3   13.3     NI      .      NI       . |
      2. |   NI      .     NI      .     888     888 |
      3. | not_      .             .               . |
      4. |   NI      .     NI      .      NI       . |
      5. |           .             .               . |
         |-------------------------------------------|
      6. | 13.7   13.7      9      9   664.1   664.1 |
      7. |           .             .               . |
      8. | 15.7   15.7   10.2   10.2   909,2       . |
      9. |   NI      .     NI      .      NI       . |
     10. |           .             .               . |
         |-------------------------------------------|
     11. |   NI      .     NI      .     750     750 |
     12. |           .             .               . |
         +-------------------------------------------+
    destring can't deal well with a mix of decimal points and decimal commas. If 909,2 were 909.2 you would need to change it first.

    Comment


    • #3
      That worked perfectly fine!
      Before using destring force, I used subinstr to change the commas to dots.

      Thanks Nick!

      Comment

      Working...
      X