Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • destring a complicated variable, with several nonnumerical forms

    Hi,

    The variable grade in my dataset is badly collected and it has been saved as a string variable.
    How can I destring it while some observations have % or not, U (for fails), commas and dots?


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str22 grade
    "66,55%"
    "81%"  
    "72.5"  
    "60%"  
    "84,85%"
    "84,4%"
    "84,65%"
    "81,35%"
    "75%"  
    "81,4%"
    "79,5%"
    "U"    
    "81.7%"
    "71,05%"
    "91,45%"
    "70%"  
    "89,9%"
    "83,8%"
    "88,75%"
    "82,65%"
    "88,75"
    "86,95%"
    "90,35%"
    "79.6"  
    "74"
    "82,6%"
    "84,5%"
    "77.6%"
    "81"    
    "67,5%"
    "U"    
    
    end
    Thanks in advance
    Last edited by Neg Kha; 27 Dec 2023, 06:08.

  • #2
    ".a" identifies failures below. See

    Code:
    help missing
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str22 grade
    "66,55%"
    "81%"  
    "72.5"  
    "60%"  
    "84,85%"
    "84,4%"
    "84,65%"
    "81,35%"
    "75%"  
    "81,4%"
    "79,5%"
    "U"    
    "81.7%"
    "71,05%"
    "91,45%"
    "70%"  
    "89,9%"
    "83,8%"
    "88,75%"
    "82,65%"
    "88,75"
    "86,95%"
    "90,35%"
    "79.6"  
    "74"
    "82,6%"
    "84,5%"
    "77.6%"
    "81"    
    "67,5%"
    "U"    
    
    end
    
    gen wanted= subinstr(subinstr(subinstr(grade, "%", "", .), ",", ".",.),"U", ".a",.)
    destring wanted, replace
    Res.:

    Code:
    . l, sep(0)
    
         +-----------------+
         |  grade   wanted |
         |-----------------|
      1. | 66,55%    66.55 |
      2. |    81%       81 |
      3. |   72.5     72.5 |
      4. |    60%       60 |
      5. | 84,85%    84.85 |
      6. |  84,4%     84.4 |
      7. | 84,65%    84.65 |
      8. | 81,35%    81.35 |
      9. |    75%       75 |
     10. |  81,4%     81.4 |
     11. |  79,5%     79.5 |
     12. |      U       .a |
     13. |  81.7%     81.7 |
     14. | 71,05%    71.05 |
     15. | 91,45%    91.45 |
     16. |    70%       70 |
     17. |  89,9%     89.9 |
     18. |  83,8%     83.8 |
     19. | 88,75%    88.75 |
     20. | 82,65%    82.65 |
     21. |  88,75    88.75 |
     22. | 86,95%    86.95 |
     23. | 90,35%    90.35 |
     24. |   79.6     79.6 |
     25. |     74       74 |
     26. |  82,6%     82.6 |
     27. |  84,5%     84.5 |
     28. |  77.6%     77.6 |
     29. |     81       81 |
     30. |  67,5%     67.5 |
     31. |      U       .a |
         +-----------------+

    Comment


    • #3
      Originally posted by Andrew Musau View Post
      ".a" identifies failures below. See

      Code:
      help missing
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str22 grade
      "66,55%"
      "81%"
      "72.5"
      "60%"
      "84,85%"
      "84,4%"
      "84,65%"
      "81,35%"
      "75%"
      "81,4%"
      "79,5%"
      "U"
      "81.7%"
      "71,05%"
      "91,45%"
      "70%"
      "89,9%"
      "83,8%"
      "88,75%"
      "82,65%"
      "88,75"
      "86,95%"
      "90,35%"
      "79.6"
      "74"
      "82,6%"
      "84,5%"
      "77.6%"
      "81"
      "67,5%"
      "U"
      
      end
      
      gen wanted= subinstr(subinstr(subinstr(grade, "%", "", .), ",", ".",.),"U", ".a",.)
      destring wanted, replace
      Res.:

      Code:
      . l, sep(0)
      
      +-----------------+
      | grade wanted |
      |-----------------|
      1. | 66,55% 66.55 |
      2. | 81% 81 |
      3. | 72.5 72.5 |
      4. | 60% 60 |
      5. | 84,85% 84.85 |
      6. | 84,4% 84.4 |
      7. | 84,65% 84.65 |
      8. | 81,35% 81.35 |
      9. | 75% 75 |
      10. | 81,4% 81.4 |
      11. | 79,5% 79.5 |
      12. | U .a |
      13. | 81.7% 81.7 |
      14. | 71,05% 71.05 |
      15. | 91,45% 91.45 |
      16. | 70% 70 |
      17. | 89,9% 89.9 |
      18. | 83,8% 83.8 |
      19. | 88,75% 88.75 |
      20. | 82,65% 82.65 |
      21. | 88,75 88.75 |
      22. | 86,95% 86.95 |
      23. | 90,35% 90.35 |
      24. | 79.6 79.6 |
      25. | 74 74 |
      26. | 82,6% 82.6 |
      27. | 84,5% 84.5 |
      28. | 77.6% 77.6 |
      29. | 81 81 |
      30. | 67,5% 67.5 |
      31. | U .a |
      +-----------------+
      This is great thanks! There are some observations that have quotations around the number like "76%". How can I get rid of those?
      Edit: Actually, I figured it out. I can use `"" ' for that. Thanks anyways!!
      Last edited by Neg Kha; 27 Dec 2023, 07:32.

      Comment

      Working...
      X