
  • How to transform data using loops

    I have two or three ideas for using loops to transform this data, but I can't get any of them to work. I am asking for help to modify and improve the program so that a loop carries out the transformation.

    The raw data are as follows:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte group str3 keys str11 contents
    1 "A"   "A"          
    1 "B"   "B"          
    1 "str" "a"          
    2 "A"   "A"          
    2 "B"   "B"          
    2 "str" "a b"        
    3 "A"   "A"          
    3 "B"   "B"          
    3 "str" "a b c d"    
    4 "A"   "A"          
    4 "B"   "B"          
    4 "str" "a b c d e f"
    5 "A"   "A"          
    5 "B"   "B"          
    5 "str" "a b"        
    end




    The target data are as follows:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte group str3 keys str1 contents
    1 "A"   "A"
    1 "B"   "B"
    1 "str" "a"
    2 "A"   "A"
    2 "B"   "B"
    2 "str" "a"
    2 "str" "b"
    3 "A"   "A"
    3 "B"   "B"
    3 "str" "a"
    3 "str" "b"
    3 "str" "c"
    3 "str" "d"
    4 "A"   "A"
    4 "B"   "B"
    4 "str" "a"
    4 "str" "b"
    4 "str" "c"
    4 "str" "d"
    4 "str" "e"
    4 "str" "f"
    5 "A"   "A"
    5 "B"   "B"
    5 "str" "a"
    5 "str" "b"
    end

    I have three ideas for solving the problem with loops, but my programs are not well written. Please lend a helping hand. Thank you.

    Idea one: First use the split command to split the string at spaces into the variables _g1, _g2, and so on. Then, for each "str" row, insert one fewer new rows than the number of words (the row itself already holds one word). Replace the string with _g1, fill the next row with _g2, and so on, repeating the loop according to the number of words until every word has been placed.

    count if keys == "str"
    local tol = r(N) + _N
    split contents, gen(_g)
    forvalues n = 1(1)`tol' {
        if keys == "str" {
            local wc = wordcount(contents[`n']) - 1
            if `wc' >= 1 {
                insobs `wc', after(`n')
            }
            replace contents = _g1 if keys == "str"   // this line does not need the loop, but I don't know how to handle it
            forvalues b = 2/`wc' {
                replace contents[`=`n'+3-`b''] = _g`b' if keys == "str"   // error: weights not allowed. Goal: fill contents[_n+1], contents[_n+2], contents[_n+3], ... with _g2, _g3, _g4, ... in turn until all words are placed
            }
        }
    }
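    A minimal sketch of how Idea one could be made to work, assuming the -dataex- example above with the string variable named contents. It keeps the split + insobs plan but walks through the observations from the last one upward, so inserting rows never shifts the observations still to be processed. It subscripts keys explicitly in the if command and uses the in qualifier instead of a subscript on the left-hand side of replace. The local names row and b are illustrative choices, not part of the original code.

    ```stata
    clear
    input byte group str3 keys str11 contents
    1 "A"   "A"
    1 "B"   "B"
    1 "str" "a"
    2 "A"   "A"
    2 "B"   "B"
    2 "str" "a b"
    3 "A"   "A"
    3 "B"   "B"
    3 "str" "a b c d"
    4 "A"   "A"
    4 "B"   "B"
    4 "str" "a b c d e f"
    5 "A"   "A"
    5 "B"   "B"
    5 "str" "a b"
    end

    split contents, generate(_g)        // _g1, _g2, ... hold the individual words

    * Loop from the last observation up, so inserted rows never move
    * the observations that are still to be processed
    forvalues n = `=_N'(-1)1 {
        * subscript keys: the -if- command otherwise looks at observation 1 only
        if keys[`n'] == "str" {
            local wc = wordcount(contents[`n'])
            if `wc' > 1 {
                * one new row per extra word; this row keeps the first word
                quietly insobs `=`wc' - 1', after(`n')
                forvalues b = 2/`wc' {
                    local row = `n' + `b' - 1
                    quietly replace group    = group[`n']  in `row'
                    quietly replace keys     = "str"       in `row'
                    quietly replace contents = _g`b'[`n']  in `row'
                }
                quietly replace contents = _g1 in `n'
            }
        }
    }
    drop _g*
    list, sepby(group)
    ```

    The bottom-up order is the key design choice: a top-down loop would have to keep adjusting the observation numbers after every insertion.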


    Idea two: Use the ends() function of egen to split the string into two parts, before and after the first space, and store them in separate variables. Then replace the original string with the part before the space (the first word), insert a row below it, and fill that new row with the part after the space. Split the remainder the same way at its first space, and repeat until every word has its own row.



    count keys== "str"
    local tol= r(N)+_N
    forvalues n=1(1)`tol' {
    if keys[`n']== "str" {
    insobs 1, after(`n')
    }
    }

    local wc = wordcount(contents[`n'])

    egen contents2 = ends(contents),punct(" ")
    egen contents3 = ends(contents),punct(" ") tail // Split the string in the contents variable into two parts according to the first space, and then loop
    replace contents = contents2 if keys == "str"
    replace contents[_n+1] = contents3[_n] if keys[_n+1] == "" // error weights not allowed
    drop contents2 contents3
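    A minimal sketch of Idea two as a working loop, assuming the same -dataex- example with the variable named contents. It keeps the egen ends() head/tail split, but swaps insobs for expand, which appends the copies at the end of the data. Two helper variables (row and piece, hypothetical names) record where each word belongs, and a final sort restores the order. A while loop repeats the split until no observation contains a space.

    ```stata
    clear
    input byte group str3 keys str11 contents
    1 "A"   "A"
    1 "B"   "B"
    1 "str" "a"
    2 "A"   "A"
    2 "B"   "B"
    2 "str" "a b"
    3 "A"   "A"
    3 "B"   "B"
    3 "str" "a b c d"
    4 "A"   "A"
    4 "B"   "B"
    4 "str" "a b c d e f"
    5 "A"   "A"
    5 "B"   "B"
    5 "str" "a b"
    end

    gen long row = _n        // original observation number, to restore the order later
    gen byte piece = 1       // position of each word within its original observation

    quietly count if strpos(contents, " ") > 0
    while r(N) > 0 {
        * head = text before the first space; tail = text after it
        egen w_head = ends(contents), punct(" ")
        egen w_tail = ends(contents), punct(" ") tail
        gen byte has_space = strpos(contents, " ") > 0
        * copy every observation that still holds more than one word;
        * -expand- appends the copies and sets copy = 1 on them
        quietly expand 2 if has_space, generate(copy)
        quietly replace contents = w_head if has_space & copy == 0
        quietly replace contents = w_tail if copy == 1
        quietly replace piece = piece + 1 if copy == 1
        drop w_head w_tail has_space copy
        * refresh r(N) for the while condition
        quietly count if strpos(contents, " ") > 0
    }
    sort row piece
    drop row piece
    list, sepby(group)
    ```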




    Idea three: Similar to Idea two, but use regular expressions to match the word before the first space and the text after it, store them in local macros, and then insert rows cyclically according to the number of words. This method avoids generating and dropping variables.

    local first = ustrregexs(1) if ustrregexm(contents,("\w+") // should match the word before the first space, but I can't get the regex to work
    local tail = ustrregexs(2) if ustrregexm(contents,(?) ) // should match the text after the first space
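    A minimal sketch of Idea three, assuming the same -dataex- example. It creates no new variables: for each "str" observation, working from the last one upward, a while loop uses ustrregexm() with the pattern ^([^ ]+) +(.*) to peel off the first word and the rest. These are held in local macros (first and rest, illustrative names) and written back with replace ... in. The sketch relies on ustrregexs() returning subexpressions of the most recent ustrregexm() match, which is why the two local assignments come straight after the while condition.

    ```stata
    clear
    input byte group str3 keys str11 contents
    1 "A"   "A"
    1 "B"   "B"
    1 "str" "a"
    2 "A"   "A"
    2 "B"   "B"
    2 "str" "a b"
    3 "A"   "A"
    3 "B"   "B"
    3 "str" "a b c d"
    4 "A"   "A"
    4 "B"   "B"
    4 "str" "a b c d e f"
    5 "A"   "A"
    5 "B"   "B"
    5 "str" "a b"
    end

    forvalues n = `=_N'(-1)1 {
        if keys[`n'] == "str" {
            local rest = contents[`n']
            local row = `n'                    // observation currently being filled
            * while the remaining text still contains a space, split off its first word
            while ustrregexm("`rest'", "^([^ ]+) +(.*)") {
                local first = ustrregexs(1)    // word before the first space
                local rest  = ustrregexs(2)    // everything after it
                quietly replace contents = "`first'" in `row'
                quietly insobs 1, after(`row')
                local ++row
                quietly replace group    = group[`n'] in `row'
                quietly replace keys     = "str"      in `row'
                quietly replace contents = "`rest'"   in `row'
            }
        }
    }
    list, sepby(group)
    ```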


    Thank you. Could you help me check whether my ideas can work? Any kind of solution would be very helpful. I look forward to your help with improving any of the programs above, or with a better method altogether, and I am very grateful to you.
    Last edited by fu gang; 04 Jul 2022, 20:45.

  • #2
    clear
    input byte group str3 keys str11 contens
    1 "A" "A"
    1 "B" "B"
    1 "str" "a"
    2 "A" "A"
    2 "B" "B"
    2 "str" "a b"
    3 "A" "A"
    3 "B" "B"
    3 "str" "a b c d"
    4 "A" "A"
    4 "B" "B"
    4 "str" "a b c d e f"
    5 "A" "A"
    5 "B" "B"
    5 "str" "a b"
    end
    split contens,p(" ")
    drop contens
    gen i=_n
    reshape long contens, i(i) j(j)
    drop if contens==""
    keep group keys contens

    *You don't need a loop, just use the reshape command
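    Another loop-free route, sketched here as an alternative and not taken from this thread, is to expand each observation to as many copies as it has words and let the k-th copy keep the k-th word. It assumes the same data with the variable named contents; row and nwords are hypothetical helper names.

    ```stata
    clear
    input byte group str3 keys str11 contents
    1 "A"   "A"
    1 "B"   "B"
    1 "str" "a"
    2 "A"   "A"
    2 "B"   "B"
    2 "str" "a b"
    3 "A"   "A"
    3 "B"   "B"
    3 "str" "a b c d"
    4 "A"   "A"
    4 "B"   "B"
    4 "str" "a b c d e f"
    5 "A"   "A"
    5 "B"   "B"
    5 "str" "a b"
    end

    gen long row = _n                       // original observation number
    gen byte nwords = wordcount(contents)   // number of rows this observation should become
    quietly expand nwords                   // extra copies are appended at the end of the data
    * within each original observation, the k-th copy keeps the k-th word
    quietly bysort row: replace contents = word(contents, _n)
    drop row nwords
    list, sepby(group)
    ```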




        • #5
          Thank you very much. A teacher once taught me this method, and there are other good methods, but I want to use a loop to do this data manipulation. The logic of the loop is clear to me, but I don't know how to write the loop program.



          • #6
            This needs a cross-reference to your previous thread, in which it was already pointed out that the problem doesn't need a loop. Just posting the question again without a cross-reference is not good forum practice. Yan yucong gave a good answer here, but clearly was not aware of the previous answers in

            https://www.statalist.org/forums/for...following-data

            Wanting a loop here is, frankly, perverse. I saw the last post in that thread when I was travelling and it was not easy to reply at length.

            Your attempt in #1 is very confused, as you try to use subscripts on the left-hand side of a replace statement (where they are illegal) and you don't use subscripts on an if command (where they are needed for what you want). I think there are other errors, but I stopped there.
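            A short illustration of the two rules described above, using a hypothetical two-observation dataset: subscripts are allowed on the right-hand side of an expression but not on the left-hand side of replace (the in qualifier targets a single observation instead), and the if command looks only at the first observation unless the variable is subscripted.

            ```stata
            clear
            input str3 keys str5 contents
            "A"   "x"
            "str" "a b"
            end

            * Illegal: replace contents[2] = "new"
            * (a left-hand subscript like this is what produced "weights not allowed" in #1)

            * Legal: change one observation with the -in- qualifier
            replace contents = "new" in 2

            * Legal: subscripts on the right-hand side of an expression
            gen prev = contents[_n-1]

            * The -if- command evaluates its expression once; an unsubscripted
            * variable means the value in observation 1, so this prints nothing
            if keys == "str" display "not shown: keys[1] is A"

            * With an explicit subscript it tests the observation you mean
            if keys[2] == "str" display "observation 2 is a str row"
            ```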

            Sorry, but having worked on this problem once in a direct way, I am not tempted to rewrite your code to do it in an indirect way.
            Last edited by Nick Cox; 05 Jul 2022, 10:33.



            • #7
              OK, I understand. I won't post the same question again. Thank you.

