  • Syntax error when creating a long

    Hello

    I have a long list of string variables in database1, and a second database (database2) that contains the strings identifying what the values in database1 mean.

    for example, database1 has:
    variable 'animal' which has values "str1", "str2", "str3"
    variable 'person' has values "str4", "str5", "str6"


    then database2 has a list of these strings and what they mean

    uuid, value
    --------------
    str1, cat
    str2, dog
    str3, bird
    str4, bob
    str5, joe
    str6, sam

    database2 has tens of thousands of these uuid, value pairs, and I want to cut it down so that it contains only the uuid, value pairs that I am interested in.

    I have tried the following code to build a command that keeps only the strings I want, but I am getting a syntax error r(111). I am hoping you can help me figure out why.

    database1:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str36(animal person)
    "38305D31-B3BE-1770-7765-E15E29995BB9" "906da645-23d4-4621-89ac-505172688bde"
    "38305D31-B3BE-1770-7765-E15E29995BB9" "32fd9b58-e2fd-47dd-b0b4-209b5592c07f"
    "38305D31-B3BE-1770-7765-E15E29995BB9" "906da645-23d4-4621-89ac-505172688bde"
    "38305D31-B3BE-1770-7765-E15E29995BB9" "906da645-23d4-4621-89ac-505172688bde"
    "38305D31-B3BE-1770-7765-E15E29995BB9" "906da645-23d4-4621-89ac-505172688bde"
    "E2B7634F-D903-37DE-C1D7-675528EB85CD" "906da645-23d4-4621-89ac-505172688bde"
    "38305D31-B3BE-1770-7765-E15E29995BB9" "906da645-23d4-4621-89ac-505172688bde"
    "E2B7634F-D903-37DE-C1D7-675528EB85CD" "906da645-23d4-4621-89ac-505172688bde"
    "38305D31-B3BE-1770-7765-E15E29995BB9" "34eecbd7-3502-4d76-9b71-2222b030e9ae"
    "38305D31-B3BE-1770-7765-E15E29995BB9" "906da645-23d4-4621-89ac-505172688bde"
    "38305D31-B3BE-1770-7765-E15E29995BB9" "906da645-23d4-4621-89ac-505172688bde"
    end
    database2:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str36 uuid str81 uuid_value
    "38305D31-B3BE-1770-7765-E15E29995BB9" "Cat"          
    "E2B7634F-D903-37DE-C1D7-675528EB85CD" "Dog"                          
    "906da645-23d4-4621-89ac-505172688bde" "Bob"                    
    "32fd9b58-e2fd-47dd-b0b4-209b5592c07f" "Joe"                    
    "00258D76-66A7-A9E1-9552-76C5A4B8D07C" "Bird"
    "0028AB65-B437-8DC2-1234-481B092D78B6" "Lion"            
    "0032C24D-7C63-EDF1-11C7-42232EE57335" "Tom"                  
    "0053932E-AF39-E151-C6BB-0888B4CD37CA" "Jon"              
    "34eecbd7-3502-4d76-9b71-2222b030e9ae" "Sam"                          
    "0085ADA5-3C01-9831-A0AD-42263455213A" "Snake"            
    end

    I am using this code to create a long ' keep if == ' statement based on the data in database1, which I can then run after loading database2:

    Code:
    clear
    use database1
    
    local strvars animal person
    
    local slist "keep if uuid == "
    
    // get list of each possible value of the variables specified in strvars and store in slist
    foreach v in `strvars' {
        di "`v'"
        levelsof `v', clean separate(`"| uuid == "')
        local slist = "`slist'`r(levels)'| uuid == "
    }
    
    // add in quotes
    local slist: subinstr local slist "== " `"== ""', all
    local slist: subinstr local slist "|" `""|"', all
    
    // remove the uuid == from end of string
    local len = length(`"`slist'"') - 11
    local slist = substr(`"`slist'"',1,`len')
    
    di `"`slist'"'
    
    
    clear
    use database2
    
    // execute the keep if command
    di "`slist'"
    I get this error:


    Code:
    . di "`slist'"
    keep if uuid == B3BE not found
    r(111);
    When I look at the string created by di `"`slist'"', I see:
    Code:
     keep if uuid == "38305D31-B3BE-1770-7765-E15E29995BB9"| uuid == "E2B7634F-D903-37DE-C1D7-675528EB85CD"| uuid == "32fd9b58-e2fd-47dd-b0b4-209b5592c07f"| uuid == "34eecbd7-3502-4d76-9b71-2222b030e9ae"| uuid == "906da645-23d4-4621-89ac-505172688bde"
    It looks like it is formatted correctly, and if I copy/paste it into the command line it works. Any insight would be appreciated.

    Thanks!
    HP
    Last edited by HP Williams; 01 Aug 2023, 16:57. Reason: Tried to fix title but cant...

  • #2
    Sorry, the title got cut off... I don't know how to change it.





      • #4
        The reason -display "`slist'"- is throwing an error is that `slist' itself contains quotes ("), and so `slist' must be surrounded in compound double quotes (`" "'). You actually did this correctly just before your -clear- and -use database2- commands. But you forgot it afterwards.
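
        A minimal sketch of the difference, assuming a made-up macro named demo whose contents include double quotes (the name and the abc-1, abc-2 values are hypothetical, not from your data). With plain quotes, the string ends at the first inner quote and Stata appears to try to evaluate what follows as an expression, which would be consistent with the "B3BE not found" message you saw.

        ```stata
        * Hypothetical macro for illustration: its contents include double quotes
        local demo `"keep if uuid == "abc-1" | uuid == "abc-2""'

        * Compound double quotes let -display- treat the inner quotes as plain text
        display `"`demo'"'

        * Plain quotes would end the string at the first inner quote, so this
        * line is left commented out because it is expected to fail:
        * display "`demo'"
        ```

        Run this from a do-file or in one go, since local macros only exist within the session or do-file that defines them.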

        That said, your code, though a clever use of -levelsof-'s -separate()- option, is unnecessarily complicated for the task of whittling down database2 to the values that appear in database1. A shorter, more Stata-ish solution would be:
        Code:
        use database1, clear
        rename (animal person) uuid=
        gen `c(obs_t)' obs_no = _n
        reshape long uuid, i(obs_no) j(anim_pers) string
        tempfile holding
        save `holding'
        
        use database2, clear
        isid uuid, sort
        merge 1:m uuid using `holding', keep(match) nogenerate
        duplicates drop
        At the end of this, the data in memory is database2, purged of any uuid's that do not appear as animal or person in database1. Also in memory at this point are variables indicating which observation in database1 the uuid matches to, and whether it matches to the person or animal. (If you don't need that information, just drop obs_no and anim_pers.) Whether you want to save this as a file, or just work with it from here, is up to you, of course.

        Notice also that this code results in some of the uuid values appearing several times because they appear in database1 associated with several different animals or people. If you just want a bare list of the uuid's with no associations and no duplicates, follow the code shown with
        Code:
        keep uuid
        duplicates drop
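
        To try the approach without the original files, here is a self-contained sketch on toy data. The short IDs (str1 to str6) stand in for the real uuids, and keeping uuid_value alongside uuid at the end is an assumption about what you want in the trimmed file.

        ```stata
        * Toy stand-in for database1 (values are illustrative only)
        clear
        input str4 animal str4 person
        "str1" "str4"
        "str1" "str5"
        "str2" "str4"
        end
        rename (animal person) uuid=
        gen `c(obs_t)' obs_no = _n
        reshape long uuid, i(obs_no) j(anim_pers) string
        tempfile holding
        save `holding'

        * Toy stand-in for database2
        clear
        input str4 uuid str5 uuid_value
        "str1" "cat"
        "str2" "dog"
        "str3" "bird"
        "str4" "bob"
        "str5" "joe"
        "str6" "sam"
        end
        isid uuid, sort
        merge 1:m uuid using `holding', keep(match) nogenerate

        * Keep one row per uuid that is actually used, with its value
        keep uuid uuid_value
        duplicates drop
        list
        ```

        Only str1, str2, str4 and str5 appear in the toy database1, so str3 and str6 should be gone from the result.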
        Last edited by Clyde Schechter; 01 Aug 2023, 19:24.



        • #5
          Thanks, that is very helpful!

          However, I still have a question about using my more roundabout way of doing this.

          When I use

          Code:
          di `"`slist'"'
          it just prints the keep if ... command; it does not actually execute it.

          Is there something else I have to do to actually have Stata execute the command?



          • #6
            Yes, to have Stata execute the command you would write:
            Code:
            `slist'
            -display- just tells Stata to write whatever follows to the Results window (and log file if you are logging).
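
            A small self-contained sketch of the difference, using toy data and a hand-written macro in place of the one your loop builds (the str1 to str3 values are illustrative only):

            ```stata
            * Toy stand-in for database2
            clear
            input str4 uuid str5 uuid_value
            "str1" "cat"
            "str2" "dog"
            "str3" "bird"
            end

            * Hypothetical command stored in a local macro
            local slist `"keep if uuid == "str1" | uuid == "str3""'

            * -display- only prints the text of the command
            display `"`slist'"'

            * The macro on a line by itself is expanded and then run as a command
            `slist'

            list
            ```

            After the bare `slist' line runs, only the str1 and str3 observations should remain in memory.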



            • #7
              Great, thanks so much!
              HP
