Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Writing a loop to create separate string columns

    Dear Stata users,

    I need assistance in creating a separate list for translations in my dataset. Currently, my data includes variables x1, x2, and x3, x4, x5, x6. Among these variables, x2, x3,and x4 contain comments written in a foreign language. I would like to create two new columns, "name_var" and "trans," which will help me organize the translation process. In the "name_var" column, I want to display the variable name (e.g., x2), and in the "trans" column, I want to extract the specific phrase from x2 that requires translation.

    My current code looks like this:

    gen name_var = ""
    gen trans = ""

    local A x2 x3 x4
    foreach var of varlist `A' {
    replace name_var= "`var'"
    replace trans = `var' if `var' != ""
    }

    However, this is not working for me since in the results I get only values from x4 and not from x2 and x3. Any guidance on how to achieve the desired result would be greatly appreciated.

    Thank you!
    Last edited by Agnes Weis; 07 Jul 2023, 14:38.

  • #2
    I'm not at all sure what you're trying to do here, especially with name_var. Clearly, on each iteration of the loop, you overwrite the content of both name_var and trans. So of course you end up at the end of the loop with a state based almost entirely on the x4 variable. You've just overwritten everything and replaced it with content from x4. Even if overwriting the content is the goal here, this isn't safe:

    Code:
    replace trans = `var' if `var' != ""
    because it is possible that x2 x3 and x4 are missing on different observations, and so some data may be carried over from the last iteration of the loop. Moreover, name_var is clearly going to be constant on each iteration of the loop, so I'm not sure what the point of that variable is. I think the fix might be to have a separate "trans" column for each x variable. Maybe something like this:

    Code:
    foreach var in x2 x3 x4 {
        gen trans_`var' = `var'
    }
    It really seems like extracting the portion of the string that needs to be translated is the hard part, especially if there isn't some clear separator character. We may be able to help with that if you provide example data using the -dataex- command.
    Last edited by Daniel Schaefer; 07 Jul 2023, 14:52.

    Comment


    • #3
      I guess this might make sense if x2, x3, and x4 are mutually exclusive such that (e.g.) if there is a note in x2, there is definitely no note in x3 or x4... Again, a data example would be very helpful here.

      Comment


      • #4
        If you want one translation variable, reshape long your data.

        Code:
        rename (x2 x3 x4) (translate2 translate3 translate4)
        gen long obsno=_n
        reshape long translate,  i(obsno) j(name_var)
        tostring name_var, replace
        replace name_var= "x"+ name_var

        Comment


        • #5
          Originally posted by Daniel Schaefer View Post
          I guess this might make sense if x2, x3, and x4 are mutually exclusive such that (e.g.) if there is a note in x2, there is definitely no note in x3 or x4... Again, a data example would be very helpful here.
          My data looks something like this. The first column is the specific survey ID, the second column is x2, the third column is x3, the fourth column is x4.


          "e1809cd7-8a66-448b-b10b-e321e4ea74f4" "Ukaid" "Dhamaan bulshada aya ka masuul ah" "Waxan u baahanahay layr aan ku ifsado musqusha"
          "6cbd04d5-b316-4f16-90dc-b31646f8dd27" "Ukaid " "Hindiya wardher aya ka masul ah nadafada musqulaha" "Maya majiraan"
          "f8f03355-2a48-40d8-8b0f-c11ab291ac23" "Ukaid" "4 Dadaka deganka ayaa nadiifyaan" "Waxan u bahanahay saabuno an ku nadifsano musqulaha. waxan kaloo u baahanahay layr aan musqulaha ku nadifsano"
          "62557772-a2b8-431e-b0e8-9811c34228e8" "Ukaid" "Dadaka deganka ayaa nadiifiyaan" "Sabuun la aan ba nahaysa, iyo layr la aan"
          "fcb999cf-c691-428c-acba-06f7c2f406ff" "Ukiaid" "Dumarka deegaanka" "Saabuun ban u baahanahay iyo biyo si aan u nadiifino musqulaha"
          "ff5c2809-b49e-4575-b8a0-688d03df749f" "Ukaid" "Bulshada deeganka" "Maya"
          "d0575f64-ad92-439b-8f7b-0d071ae31ca3" "Ukaid" "Gabdhaha deegaanka" "Haa habeenki nal ama layr la aan ayan nahay"
          "c71843b0-fddd-4de7-984b-0540880d6eb8" "Ukaid " "Bulshada deegaanka" "Maya"
          "2ef02b10-b4c3-4319-9f5f-544aa9601f89" "Ukaid" "Dadka deegaanka" "Maya"
          "29672919-d782-4cf0-ac84-b03383ba49ed" "Ukaid" "Dadka deganka dhamantod,majiro qof u gooni ah" "Maya "
          "dc0eb7f2-0ae1-46f0-9c57-3a8382756712" "Ukaid" "Qaar ka mid ah gabdhaha deegaanka" "Maya"

          I want my data to look something like this:
          Var_name Trans ID
          x3 Dhamaan bulshada aya ka masuul ah e1809cd7-8a66-448b-b10b-e321e4ea74f4
          x3 Hindiya wardher aya ka masul ah nadafada musqulaha 6cbd04d5-b316-4f16-90dc-b31646f8dd27
          x2 Ukaid f8f03355-2a48-40d8-8b0f-c11ab291ac23

          Comment


          • #6
            Thanks. Looks to me like Andrew's code in #4 works flawlessly. Do you still need help?

            Code:
            clear
            input str80(id x2 x3 x4)
            "e1809cd7-8a66-448b-b10b-e321e4ea74f4" "Ukaid" "Dhamaan bulshada aya ka masuul ah" "Waxan u baahanahay layr aan ku ifsado musqusha"
            "6cbd04d5-b316-4f16-90dc-b31646f8dd27" "Ukaid " "Hindiya wardher aya ka masul ah nadafada musqulaha" "Maya majiraan"
            "f8f03355-2a48-40d8-8b0f-c11ab291ac23" "Ukaid" "4 Dadaka deganka ayaa nadiifyaan" "Waxan u bahanahay saabuno an ku nadifsano musqulaha. waxan kaloo u baahanahay layr aan musqulaha ku nadifsano"
            "62557772-a2b8-431e-b0e8-9811c34228e8" "Ukaid" "Dadaka deganka ayaa nadiifiyaan" "Sabuun la aan ba nahaysa, iyo layr la aan"
            "fcb999cf-c691-428c-acba-06f7c2f406ff" "Ukiaid" "Dumarka deegaanka" "Saabuun ban u baahanahay iyo biyo si aan u nadiifino musqulaha"
            "ff5c2809-b49e-4575-b8a0-688d03df749f" "Ukaid" "Bulshada deeganka" "Maya"
            "d0575f64-ad92-439b-8f7b-0d071ae31ca3" "Ukaid" "Gabdhaha deegaanka" "Haa habeenki nal ama layr la aan ayan nahay"
            "c71843b0-fddd-4de7-984b-0540880d6eb8" "Ukaid " "Bulshada deegaanka" "Maya"
            "2ef02b10-b4c3-4319-9f5f-544aa9601f89" "Ukaid" "Dadka deegaanka" "Maya"
            "29672919-d782-4cf0-ac84-b03383ba49ed" "Ukaid" "Dadka deganka dhamantod,majiro qof u gooni ah" "Maya "
            "dc0eb7f2-0ae1-46f0-9c57-3a8382756712" "Ukaid" "Qaar ka mid ah gabdhaha deegaanka" "Maya"
            end
            
            rename (x2 x3 x4) (translate2 translate3 translate4)
            gen long obsno=_n
            reshape long translate,  i(obsno) j(name_var)
            tostring name_var, replace
            replace name_var= "x"+ name_var

            Comment


            • #7
              Originally posted by Daniel Schaefer View Post
              Thanks. Looks to me like Andrew's code in #4 works flawlessly. Do you still need help?

              Code:
              clear
              input str80(id x2 x3 x4)
              "e1809cd7-8a66-448b-b10b-e321e4ea74f4" "Ukaid" "Dhamaan bulshada aya ka masuul ah" "Waxan u baahanahay layr aan ku ifsado musqusha"
              "6cbd04d5-b316-4f16-90dc-b31646f8dd27" "Ukaid " "Hindiya wardher aya ka masul ah nadafada musqulaha" "Maya majiraan"
              "f8f03355-2a48-40d8-8b0f-c11ab291ac23" "Ukaid" "4 Dadaka deganka ayaa nadiifyaan" "Waxan u bahanahay saabuno an ku nadifsano musqulaha. waxan kaloo u baahanahay layr aan musqulaha ku nadifsano"
              "62557772-a2b8-431e-b0e8-9811c34228e8" "Ukaid" "Dadaka deganka ayaa nadiifiyaan" "Sabuun la aan ba nahaysa, iyo layr la aan"
              "fcb999cf-c691-428c-acba-06f7c2f406ff" "Ukiaid" "Dumarka deegaanka" "Saabuun ban u baahanahay iyo biyo si aan u nadiifino musqulaha"
              "ff5c2809-b49e-4575-b8a0-688d03df749f" "Ukaid" "Bulshada deeganka" "Maya"
              "d0575f64-ad92-439b-8f7b-0d071ae31ca3" "Ukaid" "Gabdhaha deegaanka" "Haa habeenki nal ama layr la aan ayan nahay"
              "c71843b0-fddd-4de7-984b-0540880d6eb8" "Ukaid " "Bulshada deegaanka" "Maya"
              "2ef02b10-b4c3-4319-9f5f-544aa9601f89" "Ukaid" "Dadka deegaanka" "Maya"
              "29672919-d782-4cf0-ac84-b03383ba49ed" "Ukaid" "Dadka deganka dhamantod,majiro qof u gooni ah" "Maya "
              "dc0eb7f2-0ae1-46f0-9c57-3a8382756712" "Ukaid" "Qaar ka mid ah gabdhaha deegaanka" "Maya"
              end
              
              rename (x2 x3 x4) (translate2 translate3 translate4)
              gen long obsno=_n
              reshape long translate, i(obsno) j(name_var)
              tostring name_var, replace
              replace name_var= "x"+ name_var
              No, it's not working since data is already in the long format.

              Comment


              • #8
                Can you say a little more about why it's "not working"? For example, is there an error code, or are you not happy with the resulting format for your data? If it's the former, I can't reproduce. Here is a toy example where I start with a wide dataset with two "years" of data (2018 and 2019). I reshape long, and reshape long again. The code runs without error, and gives the expected result.

                Code:
                * data in wide format
                clear
                input int(id x22018 x22019 x32018 x32019 x42018 x42019)
                1 1 1 1 1 1 1
                2 2 2 2 2 2 2
                3 3 3 3 3 3 3
                4 4 4 4 4 4 4
                5 5 5 5 5 5 5
                6 6 6 6 6 6 6
                7 7 7 7 7 7 7
                end 
                
                * reshape to long format based on year
                reshape long x2 x3 x4, i(id) j(year)
                list, noobs clean
                
                * reshape a second time
                rename (x2 x3 x4) (translate2 translate3 translate4)
                gen long obsno=_n
                reshape long translate,  i(obsno) j(name_var)
                tostring name_var, replace
                replace name_var= "x"+ name_var
                list, noobs clean
                Code:
                . list, noobs clean
                
                    id   year   x2   x3   x4  
                     1   2018    1    1    1  
                     1   2019    1    1    1  
                     2   2018    2    2    2  
                     2   2019    2    2    2  
                     3   2018    3    3    3  
                     3   2019    3    3    3  
                     4   2018    4    4    4  
                     4   2019    4    4    4  
                     5   2018    5    5    5  
                     5   2019    5    5    5  
                     6   2018    6    6    6  
                     6   2019    6    6    6  
                     7   2018    7    7    7  
                     7   2019    7    7    7
                Code:
                . list, noobs clean
                
                    obsno   name_var   id   year   transl~e  
                        1         x2    1   2018          1  
                        1         x3    1   2018          1  
                        1         x4    1   2018          1  
                        2         x2    1   2019          1  
                        2         x3    1   2019          1  
                        2         x4    1   2019          1  
                        3         x2    2   2018          2  
                        3         x3    2   2018          2  
                        3         x4    2   2018          2  
                        4         x2    2   2019          2  
                        4         x3    2   2019          2  
                        4         x4    2   2019          2  
                        5         x2    3   2018          3  
                        5         x3    3   2018          3  
                        5         x4    3   2018          3  
                        6         x2    3   2019          3  
                        6         x3    3   2019          3  
                        6         x4    3   2019          3  
                        7         x2    4   2018          4  
                        7         x3    4   2018          4  
                        7         x4    4   2018          4  
                        8         x2    4   2019          4  
                        8         x3    4   2019          4  
                        8         x4    4   2019          4  
                        9         x2    5   2018          5  
                        9         x3    5   2018          5  
                        9         x4    5   2018          5  
                       10         x2    5   2019          5  
                       10         x3    5   2019          5  
                       10         x4    5   2019          5  
                       11         x2    6   2018          6  
                       11         x3    6   2018          6  
                       11         x4    6   2018          6  
                       12         x2    6   2019          6  
                       12         x3    6   2019          6  
                       12         x4    6   2019          6  
                       13         x2    7   2018          7  
                       13         x3    7   2018          7  
                       13         x4    7   2018          7  
                       14         x2    7   2019          7  
                       14         x3    7   2019          7  
                       14         x4    7   2019          7
                If it's the latter, then I think you have a fundamental issue with what you're asking for. You want each observation in x2, x3, and x4 to be placed into a single column. Clearly, that single column will have 3 times as many observations, and your table will have to expand to reflect that.

                Comment

                Working...
                X