
  • Generate new variable based off word

    Hi All,

    I am wondering how I can generate a new variable (graduate_degree), with 1 = yes and 0 = no, based on whether the word "master" appears in another variable (degree_title). An example of degree_title is below.

    I am trying to differentiate between an undergraduate degree and a graduate degree

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str64 degree_title           
    "BACHELOR OF SCIENCE"       
    ""                          
    ""                          
    ""                          
    ""                          
    ""                          
    ""                          
    ""                          
    "ASSOCIATE OF SCIENCE"      
    "ASSOCIATE OF SCIENCE"      
    ""                          
    ""                          
    "CERT OF CAREER PREPARATION"
    "BACHELOR OF SCIENCE"       
                      
    ""                          
    "BACHELOR OF ARTS"          
    ""                          
    ""                          
    ""                          
    ""                          
    ""                          
    ""                          
    "MASTER OF ARTS"            
                        
    ""                          
    "BACHELOR OF SCIENCE"                                               
    end

  • #2
    Something like
    Code:
    generate byte graduate_degree = strpos(strlower(degree_title), "master") > 0
    should work.
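
    For anyone unsure why the comparison with 0 does the job, here is a minimal sketch of how strpos() behaves, using made-up literal strings rather than the poster's data:

    ```stata
    * strpos(s, sub) returns the position of the first match, or 0 if there is none
    display strpos("master of arts", "master")        // 1
    display strpos("bachelor of science", "master")   // 0
    * strpos() is case sensitive, hence the strlower() in the line above
    display strpos("MASTER OF ARTS", "master")        // 0
    ```

    So the expression strpos(...) > 0 evaluates to 1 when "master" appears anywhere in the title and to 0 otherwise, including for empty titles.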



    • #3
      Joseph Coveney thank you so much! This worked perfectly



      • #4
        Joseph Coveney How could I modify this code if I want to include MS, MA, M.ED. and JURIS DOCTOR?



        • #5
          gen graduate_degree = (strpos(lower(degree_title), "master") > 0) | ///
          (strpos(lower(degree_title), "ms") > 0) | ///
          (strpos(lower(degree_title), "ma") > 0) | ///
          (strpos(lower(degree_title), "m.ed.") > 0) | ///
          (strpos(lower(degree_title), "juris doctor") > 0)



          • #6
            gen graduate_degree = (strpos(lower(degree_title), "master") > 0) | ///
            (strpos(lower(degree_title), "ms") > 0) | ///
            (strpos(lower(degree_title), "ma") > 0) | ///
            (strpos(lower(degree_title), "m.ed.") > 0) | ///
            (strpos(lower(degree_title), "juris doctor") > 0)

            OR

            gen graduate_degree = 0
            * Quote each keyword so that "juris doctor" stays one search term
            foreach keyword in "master" "ms" "ma" "m.ed." "juris doctor" {
                replace graduate_degree = 1 if strpos(lower(degree_title), "`keyword'") > 0
            }



            • #7
              Bader Bin Adwan This was really helpful, thank you. A follow-up: for "ma", the code also picks up degree titles that merely contain the letters "ma" next to each other, such as "maintenance", whereas I only want "ma" as in Master of Arts. Is there a way to fix this?



              • #8
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str64 degree_title
                "master of science"  
                "m.a."              
                "maintenance"        
                "BACHELOR OF SCIENCE"
                "MASTER OF ARTS"    
                "ma"                
                "BACHELOR OF SCIENCE"
                "mama"              
                end
                
                gen wanted= ustrregexm(lower(degree_title), "\b(master of arts|ma|m\.a|m\.a\.)\b")
                Result:

                Code:
                . l, sep(0)
                
                     +------------------------------+
                     |        degree_title   wanted |
                     |------------------------------|
                  1. |   master of science        0 |
                  2. |                m.a.        1 |
                  3. |         maintenance        0 |
                  4. | BACHELOR OF SCIENCE        0 |
                  5. |      MASTER OF ARTS        1 |
                  6. |                  ma        1 |
                  7. | BACHELOR OF SCIENCE        0 |
                  8. |                mama        0 |
                     +------------------------------+
                But beware of other variants, e.g., "M. Arts", that the above code does not catch. Visually inspect the results.
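
                One possible way to do that inspection, sketched here on the assumption that the wanted variable from the code above already exists, is to tabulate the titles on each side of the flag:

                ```stata
                * Titles flagged as matches: look for false positives
                tabulate degree_title if wanted == 1
                * Non-empty titles that were not flagged: look for missed variants
                tabulate degree_title if wanted == 0 & degree_title != ""
                ```

                Each distinct title appears once with its frequency, which is usually quicker to scan than listing every observation.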



                • #9
                  In that case you can use
                  Code:
                  replace graduate_degree = 1 if regexm(lower(degree_title), "^ma$")
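
                  Pulling the suggestions in this thread together, a rough sketch of a single word-boundary pattern covering all the requested keywords might look like the following. The variable name graduate_degree_wb and the exact list of alternatives are illustrative assumptions, not something posted above, so adapt them to the titles actually present in the data:

                  ```stata
                  * Sketch only: graduate_degree_wb is an illustrative variable name
                  * \b marks a word boundary, so "ma" no longer matches inside "maintenance"
                  generate byte graduate_degree_wb = ustrregexm(lower(degree_title), "\b(masters?|ms|ma|m\.a|m\.ed|juris doctor)\b")
                  ```

                  Unlike "^ma$", which matches only when the whole title is "ma", this also flags abbreviations that appear as separate words inside a longer title. Checking the flagged titles by eye is still advisable.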

