  • Recode A String Variable

    Code:
    input str3 VAR1 byte VAR2
    A1 1
    A2 0
    A3 1
    A4 1
    A5 1
    A6 1
    A7 1
    A8 1
    A9 1
    A10 1
    A15 1
    B7 0
    A1 0
    A16 1
    A17 1
    A18 1
    A19 1
    A20 0
    A21 1
    end
    Say you have data like the example shown. I have 'VAR1' and wish to create from it 'VAR2', which takes the value 1 if VAR1 begins with one of A1, A3-A10, A15-A19, or A21, and 0 otherwise. I believe for this you can use strpos(VAR1) but is it possible to say for example: strpos(VAR1, "A1, A3/A10, A15/A19, A21") ?

  • #2
    I believe for this you can use strpos(VAR1) but is it possible to say for example: strpos(VAR1, "A1, A3/A10, A15/A19, A21") ?
    No, that would not be legal syntax.

    Truthfully, I don't even see a way to come close to that in this context.



    • #3
      Clyde Schechter Thank you Clyde for your answer. So then to clarify, there is no way in Stata to do this? I would have to list out all of the value strings separately?



      • #4
        Yes, one way or another you would need to do that.



        • #5
          Clyde Schechter I am sorry if this is a separate question, and I can post it elsewhere if so, but is it possible to split a string variable such as 'VAR1' into two new variables, one containing just the letters and one containing just the numbers?



          • #6
            I'm not sure what you mean. What would the split variables look like?



            • #7
              Clyde Schechter For example if 'VAR1' contains strings: A01, A11, UT5, then 'VAR1A' would contain A, A, UT and 'VAR1B' would contain 1,11,5
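
              A possible sketch of such a split, using regular expressions. This is not an answer given in the thread; it assumes every value of VAR1 is one or more letters followed by one or more digits, and the names VAR1A and VAR1B are just the ones proposed above. Values that do not fit the pattern are left missing.

              Code:
              // letters part, kept as a string
              gen VAR1A = regexs(1) if regexm(VAR1, "^([A-Za-z]+)([0-9]+)$")

              // digits part, converted to a number (so "01" becomes 1)
              gen VAR1B = real(regexs(2)) if regexm(VAR1, "^([A-Za-z]+)([0-9]+)$")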



              • #8
                Cross-posted at https://stackoverflow.com/questions/...tring-variable
                Please note our policy on cross-posting, which is that you are asked to tell us about it.

                Clyde Schechter is not quite right in #2.


                Code:
                strpos(VAR1, "A1, A3/A10, A15/A19, A21")
                is a legal expression which yields 0 whenever VAR1 doesn't contain the literal string and its integer position whenever it does. What it doesn't do is treat the comma-separated strings as alternative arguments. It's all or nothing. Forgetting variables, note these examples.

                Code:
                . di strpos("frog A1, A2", "A1, A2")
                6
                
                . di strpos("A1, A2", "A1, A2")
                1
                In each case the second argument (including commas treated literally) is found within the first argument so the result is an integer position.
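
                As a further illustration (a sketch added for clarity, not output from the thread), the reverse case shows why the expression cannot flag a value such as A1: the long literal string does not occur inside the short one, so the result is 0.

                Code:
                // the second argument is searched for as one literal string;
                // "A1" does not contain it, so strpos() returns 0
                display strpos("A1", "A1, A3/A10, A15/A19, A21")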



                • #9
                  As usual, Nick is right. It is legal syntax, but it doesn't do what is wanted in #1.



                  • #10
                    Nick Cox Thank you, I had forgotten to mention the cross-post. I tried your recommended code but it does not produce the desired output shown in the sample data. Here is what I tried:

                    Code:
                    gen VAR2a = 1 if strpos(VAR1, "A1, A3/A10, A15/A19, A21") > 0



                    • #11
                      I cross posted at: https://stackoverflow.com/questions/...ect=1#64111028



                      • #12
                        Here is an approach using numlist, inlist(), and regular expression matching; I leave it to others to decide whether that is more convenient (and clearer) than spelling things out.

                        Code:
                        // expand the numeric list and put into local
                        numlist "1 3/10 15/19 21"
                        local numlist `r(numlist)'
                        
                        // make sure we do not hit the limit of inlist()
                        assert `: word count `numlist'' < 250
                        
                        // separate entries by comma
                        local numlist : subinstr local numlist " " ", " , all
                        
                        // create the indicator
                        generate byte wanted = inlist(real(regexs(1)), `numlist') ///
                            if regexm(VAR1, "^A([0-9]*)$")
                            
                        // ... fix missing values
                        replace wanted = 0 if mi(wanted)



                        • #13
                          Another approach, clunky but explicit:

                          Code:
                          // compare the whole string, so that B7 or A11 cannot match by accident
                          gen byte wanted = inlist(VAR1, "A1", "A3", "A4", "A5", "A6", "A7", "A8", "A9") | inlist(VAR1, "A10", "A15", "A16", "A17", "A18", "A19", "A21")



                          • #14
                            The following would do it too, I think
                            Code:
                            gen dummy = 0

                            foreach num of numlist 1 3/10 15/19 21 {
                                replace dummy = 1 if strpos(VAR1,"A`num'")
                            }



                            • #15
                              Joro's code is, in my view, the best suggestion so far. It is short and still very easy to follow. There is one potential pitfall: strpos(VAR1, "A1") will match both A1 and A10 (and A1234 for that matter).
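
                              One way to sidestep that pitfall is to test for equality instead of searching for a substring. This is a minimal sketch: it assumes VAR1 holds exactly the code, with no leading or trailing blanks and nothing after the number, as in the sample data in #1.

                              Code:
                              gen byte dummy = 0

                              // "A`num'" must equal the whole value, so A1 no longer matches A10 or A1234
                              foreach num of numlist 1 3/10 15/19 21 {
                                  replace dummy = 1 if VAR1 == "A`num'"
                              }

                              If the values can carry surrounding blanks, comparing against strtrim(VAR1) instead of VAR1 is one possible adjustment.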
