
  • Selecting the first two/three digits of a numeric variable (which consists of 2-10 digits) without converting it to string

    Hello there,

    The usual way to select the first two or three digits of a numeric variable is to convert it to string, take the first two or three characters, and then convert the result back to numeric if needed, for example:

    Code:
    tostring industry_code, gen(ind_str)
    gen industry_2_digit = substr(ind_str, 1, 2)
    OR

    Code:
    gen code = substr(strofreal(industry_code), 1, 2)
    But my dataset is almost 10 GB, so when I convert industry_code to string, Stata runs for a long time and then shuts down, which is very frustrating. Is there any way, other than the code above, to select the first two or three digits of the numeric variable industry_code? A sample of my data is below.


    Code:
    input long industry_code
    10
    10
    10
    10
    10
    10
    10
    10
    102
    102
    102
    102
    1021
    1021
    1021
    1021
    1028
    1028
    1028
    1028
    4849
    end

  • #2
    Code:
    clear
    input long industry_code
      10
      10
      97
      10
      10
      10
      10
      10
     102
     102
     102
     102
    1021
    1021
    1021
    1021
    1028
    1028
    1028
    1028
    4849
    31599
    end
    
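    * Codes below 100 already have at most two digits
    gen wanted= int(industry_code) if  int(industry_code/10)<=9.9
    * For longer codes, find the number of digits i and drop the last i-1 of them
    forval i=2/10{
        local j=`i'-1
        qui replace wanted= int(industry_code/10^`j') if  int(industry_code/10^`i')<=9.9 & missing(wanted)
    }
    Result: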

    Code:
    
    . l, sep(0)
    
         +-------------------+
         | indust~e   wanted |
         |-------------------|
      1. |       10       10 |
      2. |       10       10 |
      3. |       97       97 |
      4. |       10       10 |
      5. |       10       10 |
      6. |       10       10 |
      7. |       10       10 |
      8. |       10       10 |
      9. |      102       10 |
     10. |      102       10 |
     11. |      102       10 |
     12. |      102       10 |
     13. |     1021       10 |
     14. |     1021       10 |
     15. |     1021       10 |
     16. |     1021       10 |
     17. |     1028       10 |
     18. |     1028       10 |
     19. |     1028       10 |
     20. |     1028       10 |
     21. |     4849       48 |
     22. |    31599       31 |
         +-------------------+
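
    On a small sample like the one above, where the string route is still cheap, the result can be cross-checked against the original substr() approach. This is only a sketch: the variable name check is made up for illustration, and the explicit "%12.0f" format is there because, as far as I know, strofreal() with its default format switches to scientific notation for long numbers (the manual's example is 12345678), which would corrupt the leading digits of 8-10 digit codes.

    ```stata
    * Sketch only: cross-check `wanted' against the string method on a small sample.
    * `check' is an illustrative variable name, not from the thread.
    * "%12.0f" forces plain integer formatting instead of scientific notation.
    gen long check = real(substr(strofreal(industry_code, "%12.0f"), 1, 2))
    * assert exits with an error if any observation disagrees
    assert wanted == check
    drop check
    ```

    If the assert passes silently, the arithmetic version agrees with the string version on every observation in the sample.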



    • #3
      Here's another way. It is simpler, in one sense, and more complicated than #2, in a different sense.
      Code:
      gen wanted = int(industry_code/(10^(int(log10(industry_code))-1))) if industry_code >= 10
      replace wanted = industry_code if industry_code < 10
      Last edited by Clyde Schechter; 22 Sep 2022, 16:54.
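
      The title also asks about three digits. The idea in #3 is that int(log10(x)) is the number of digits minus one, so dividing by 10 raised to (digits - k) leaves the first k digits. A minimal sketch of that generalization follows, assuming positive integer codes; the local k and the variables ndigits and first_k are hypothetical names, and the guard line is a defensive assumption in case log10() lands a hair below an integer at exact powers of 10.

      ```stata
      * Sketch only: first k digits, generalizing the log10() idea in #3.
      * `k', ndigits and first_k are illustrative names, not from the thread.
      local k = 3
      * Number of digits of each positive code
      gen byte ndigits = floor(log10(industry_code)) + 1 if industry_code >= 1 & !missing(industry_code)
      * Defensive guard: bump the count if log10() came out slightly too low
      replace ndigits = ndigits + 1 if industry_code >= 10^ndigits & !missing(ndigits)
      * Drop the trailing (ndigits - k) digits from codes longer than k digits
      gen long first_k = floor(industry_code / 10^(ndigits - `k')) if ndigits > `k' & !missing(ndigits)
      * Codes with k or fewer digits are kept as they are
      replace first_k = industry_code if ndigits <= `k'
      ```

      With k set to 3, the example data in #2 gives 102 for 1021 and 1028, 484 for 4849, and 315 for 31599, while 10, 97, and 102 are unchanged. Run the lines together in one do-file so that the local k is still defined when it is used.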



      • #4
        I had been looking online and, in the end, kept hard-coding. Now I have a solution, and heartfelt thanks to both of you for providing such elegant solutions to this problem. The size of the dataset is the main issue here, and because of your kind contributions I now know how to get around it. Much appreciated!
