  • Help please

    Hello, I am working on a university assignment and need some help. I am new to Stata and would like to know how to classify observations by the third digit of a variable. Basically, I need to create a new variable that equals, for example, 1 if the third digit of variable X is 9, 2 if the third digit of X is 5, and so on. Thank you.
    PS: the variable on which I must filter is numeric and has a total of 5 digits.

  • #2
    There are at least a few ways to do this, but I'll show you two. First start with some data.

    Code:
    clear
    input long(idnum)
    12345
    23456
    34567
    45678
    56789
    67890
    78901
    89012
    end
    The first thing you need to work out is how to extract the 3rd digit. Since you have a number, you can work with it as a number, using somewhat clever tricks of division. First I take the integer part of the ID number divided by 100, then take the remainder after dividing that number by 10. For example, 34567 / 100 = 345.67, whose integer part is 345, and 345 divided by 10 leaves a remainder of 5. (An alternative is to convert the ID to a string and -substr()- the 3rd character.)

    Code:
    gen byte digit = mod(int(idnum / 100), 10)
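
    As a rough sketch of the string alternative mentioned above, the ID can be formatted as a 5-character string and the 3rd character read back as a number. The variable name -digit_str- and the shortened dataset are only for illustration, and the sketch assumes non-negative IDs of at most 5 digits, as described in the question.

    ```stata
    clear
    input long(idnum)
    12345
    34567
    78901
    end

    * Arithmetic route, as in the post
    gen byte digit = mod(int(idnum / 100), 10)

    * String route: format as 5 characters (zero-padded), take the 3rd
    * character, and convert it back to a number.
    * digit_str is a hypothetical name used only in this sketch.
    gen byte digit_str = real(substr(string(idnum, "%05.0f"), 3, 1))

    * assert stops with an error if the two routes ever disagree
    assert digit == digit_str
    list
    ```

    Both routes should give the same digit under those assumptions; the arithmetic version avoids the round trip through a string.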
    Once I have the digit, all I need to do now is compare its value and assign a value to a new variable. The way I show here is easy for a beginner to understand. Start by making a new variable, -want1-, and set all its values to missing. If a value is missing, we take it to mean that we have not yet found the value to assign, or such a value doesn't exist.

    Code:
    gen byte want1 = .
    replace want1 = 1 if mi(want1) & digit==9   /* if -want1- hasn't been assigned and the 3rd digit is 9, then assign a value of 1 to -want1-. */
    replace want1 = 2 if mi(want1) & digit==5
    /* additional replacements are easy to add */
    The above approach works well for learning purposes, but in very large datasets it can be slow because each -replace- makes its own pass through the data. A more succinct way of doing it could be written as follows. -cond()- is like an if-else function, but vectorizes over the data. It reads: if digit equals 9, then return the value of 1, or else if digit equals 5 then return a value of 2, or else return a missing value, and assign the returned value to -want2-. This can also be extended directly to more conditions.

    Code:
    gen want2 = cond(digit==9, 1, ///
                        cond(digit==5, 2, .))
    Note that I used -///-, which tells Stata that the command continues on the next line. Because -///- is only recognized in do-files, this must be run from a do-file rather than typed into the Command window.
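
    To illustrate how the nesting extends to more conditions, here is a minimal sketch with a third branch. The rule "digit 0 maps to 3" is hypothetical and not part of the original question, and -want3- is just an illustrative name. Like the code above, it needs to be run from a do-file because of -///-.

    ```stata
    clear
    input long(idnum)
    12345
    34567
    78901
    89012
    end

    gen byte digit = mod(int(idnum / 100), 10)

    * Each additional rule is one more nested cond() in the "else" slot.
    * The 0 -> 3 rule is hypothetical, added only to show a third branch.
    gen byte want3 = cond(digit == 9, 1, ///
                     cond(digit == 5, 2, ///
                     cond(digit == 0, 3, .)))
    list
    ```

    The closing parentheses accumulate with each level, so for long lists of rules it may be easier to read if each -cond()- sits on its own line as shown.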

    The last step is then to drop -digit- which is no longer needed. Putting this together:

    Code:
    gen byte digit = mod(int(idnum / 100), 10)
    
    gen byte want1 = .
    replace want1 = 1 if mi(want1) & digit==9
    replace want1 = 2 if mi(want1) & digit==5
    
    gen want2 = cond(digit==9, 1, ///
                cond(digit==5, 2, .))
    list
    drop digit
    Result

    Code:
    . list
    
         +-------------------------------+
         | idnum   digit   want1   want2 |
         |-------------------------------|
      1. | 12345       3       .       . |
      2. | 23456       4       .       . |
      3. | 34567       5       2       2 |
      4. | 45678       6       .       . |
      5. | 56789       7       .       . |
         |-------------------------------|
      6. | 67890       8       .       . |
      7. | 78901       9       1       1 |
      8. | 89012       0       .       . |
         +-------------------------------+
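
    Another option, not used in the post, is -recode-, which states the same mapping as a list of rules. The sketch below is only an illustration under the same assumptions as above; -want_rc- is a hypothetical name, and the dataset is shortened.

    ```stata
    clear
    input long(idnum)
    12345
    34567
    78901
    end

    gen byte digit = mod(int(idnum / 100), 10)
    gen byte want2 = cond(digit == 9, 1, cond(digit == 5, 2, .))

    * recode maps 9 -> 1 and 5 -> 2; every other value becomes missing.
    * want_rc is a hypothetical name used only in this sketch.
    recode digit (9 = 1) (5 = 2) (else = .), generate(want_rc)

    * assert stops with an error if the two versions ever disagree
    assert want_rc == want2
    ```

    An -assert- like this is a cheap way to confirm that two ways of building the same variable agree before dropping the helper variable.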

    • #3
      Originally posted by Leonardo Guizzetti
      Thanks for the complete explanation! Thank you very much. It has helped me, and I have also learned a lot compared to what I knew before.
