How to transform data using loops

fu gang

Join Date: Jan 2021

Posts: 138
#1

04 Jul 2022, 20:24

I have two or three ideas for using loops to transform my data, but I can't get any of them to work. Could someone help me modify and improve the code so that the transformation is done with loops?

The raw data are as follows:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte group str3 keys str11 contens
1 "A"   "A"
1 "B"   "B"
1 "str" "a"
2 "A"   "A"
2 "B"   "B"
2 "str" "a b"
3 "A"   "A"
3 "B"   "B"
3 "str" "a b c d"
4 "A"   "A"
4 "B"   "B"
4 "str" "a b c d e f"
5 "A"   "A"
5 "B"   "B"
5 "str" "a b"
end

The target data are as follows:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte group str3 keys str1 contents
1 "A"   "A"
1 "B"   "B"
1 "str" "a"
2 "A"   "A"
2 "B"   "B"
2 "str" "a"
2 "str" "b"
3 "A"   "A"
3 "B"   "B"
3 "str" "a"
3 "str" "b"
3 "str" "c"
3 "str" "d"
4 "A"   "A"
4 "B"   "B"
4 "str" "a"
4 "str" "b"
4 "str" "c"
4 "str" "d"
4 "str" "e"
4 "str" "f"
5 "A"   "A"
5 "B"   "B"
5 "str" "a"
5 "str" "b"
end

I have three ideas for solving the problem with loops, but my code does not work. Any help would be appreciated, thank you.

Idea one: First use the split command to split the string at spaces. Then, for each string, insert new observations numbering one fewer than its word count (because one observation already exists). Replace the string itself with _g1 and the following observations with _g2, _g3, and so on, repeating according to the word count until all words are placed.

count if keys== "str"
local tol= r(N)+_N
split contents, gen(_g)
forvalues n=1(1)`tol' {
if keys== "str" {
local wc = wordcount(contents[`n'])-1
if `wc'>= 1{
insobs `wc', after(`n')
}
replace contents = _g1 if keys== "str" // This step does not need a loop, but I don't know how to handle it
forvalues b=2/`wc' {
replace contents[`=`n'+3-`b''] = _g`b' if keys== "str" // error: weights not allowed. Intent: replace contents[_n+1], contents[_n+2], contents[_n+3] with _g2, _g3, _g4, ... in turn until all words are filled in
}
}
}
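
For reference, a minimal sketch of how idea one might be made to run. It is not from the original thread and makes several assumptions: it uses a reduced six-row version of the example data, keeps the variable name contens from the raw data, skips the split step in favour of the word() function, and relies on insobs being available in the installed Stata release. The counter names n, b, wc and row are illustrative only.

```stata
clear
input byte group str3 keys str11 contens
1 "A"   "A"
1 "B"   "B"
1 "str" "a"
2 "A"   "A"
2 "B"   "B"
2 "str" "a b c"
end

* Walk from the last row to the first, so that rows added by -insobs-
* never shift the rows that are still to be visited.
forvalues n = `=_N'(-1)1 {
    * The if command needs an explicit subscript to test row n.
    if keys[`n'] == "str" {
        local wc = wordcount(contens[`n'])
        if `wc' > 1 {
            * One row already exists, so add (word count - 1) rows after it.
            quietly insobs `=`wc' - 1', after(`n')
            forvalues b = 2/`wc' {
                local row = `n' + `b' - 1
                * Target rows are chosen with the in qualifier,
                * never with a subscript on the left-hand side.
                quietly replace group   = group[`n']              in `row'
                quietly replace keys    = keys[`n']               in `row'
                quietly replace contens = word(contens[`n'], `b') in `row'
            }
            * Overwrite the original row last, once its words are copied.
            quietly replace contens = word(contens[`n'], 1) in `n'
        }
    }
}
list, sepby(group)
```

Looping backwards is the main design choice here: a forward loop would have to recompute its upper bound and current position every time insobs lengthens the dataset.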

Idea two: Use the ends() function of the egen command to split the string at the first space into two parts stored in separate variables. Replace the original string with the part before the space (the first word), insert an observation after it, and fill the new observation with the part after the space. Then split that remainder at its first space in the same way, and repeat until every word has its own observation.

count keys== "str"
local tol= r(N)+_N
forvalues n=1(1)`tol' {
if keys[`n']== "str" {
insobs 1, after(`n')
}
}

local wc = wordcount(contents[`n'])

egen contents2 = ends(contents),punct(" ")
egen contents3 = ends(contents),punct(" ") tail // Split the string in the contents variable into two parts according to the first space, and then loop
replace contents = contents2 if keys == "str"
replace contents[_n+1] = contents3[_n] if keys[_n+1] == "" // error: weights not allowed
drop contents2 contents3

Idea three: Similar to idea two, but use regular expressions to match the word before the first space and the text after it, store them in local macros, and then insert them in a loop according to the word count. This avoids generating and dropping variables.

local first = ustrregexs(1) if ustrregexm(contents,（"\w+") // matches the word before the first space, but I don't get the regex to match
local tail = ustrregexs(2) if ustrregexm(contents,（？) ) // matches the word after the first space
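
A minimal sketch of the matching step in idea three, not from the original thread. It assumes a one-row toy dataset and one possible pattern: the first capture group takes the leading run of non-space characters and the second takes whatever follows the spaces after it. Because local does not accept an if qualifier, the match is tested with the if command on an explicitly subscripted observation.

```stata
clear
input str11 contens
"a b c"
end

* Group 1: the first word. Group 2: everything after the following spaces.
if ustrregexm(contens[1], "^(\S+)\s*(.*)") {
    local first = ustrregexs(1)
    local tail  = ustrregexs(2)
}
display "first: `first'   tail: `tail'"
```

The same pattern can be applied again to the contents of the tail macro to peel off the next word.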

Thank you. Could you tell me whether these ideas can work? Any kind of solution would be very helpful, whether it improves the code above for any of these ideas or uses a better method. I look forward to your help.

Last edited by fu gang; 04 Jul 2022, 20:45.
Yan yucong

Join Date: Jul 2022

Posts: 3
#2

04 Jul 2022, 21:13

clear
input byte group str3 keys str11 contens
1 "A" "A"
1 "B" "B"
1 "str" "a"
2 "A" "A"
2 "B" "B"
2 "str" "a b"
3 "A" "A"
3 "B" "B"
3 "str" "a b c d"
4 "A" "A"
4 "B" "B"
4 "str" "a b c d e f"
5 "A" "A"
5 "B" "B"
5 "str" "a b"
end
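split contens, p(" ")            // one new variable per word: contens1, contens2, ...
drop contens
gen i = _n                       // identifier for each original observation
reshape long contens, i(i) j(j)  // one observation per original row and word position
drop if contens == ""            // discard the empty word positions
keep group keys contens

* You don't need a loop; just use the reshape command.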
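
Another loop-free route, sketched here as an alternative and not taken from the thread, is to duplicate each row once per word with expand and then pick the matching word with word(). It assumes a reduced six-row version of the example data; id and seq are helper variables introduced only for this illustration.

```stata
clear
input byte group str3 keys str11 contens
1 "A"   "A"
1 "B"   "B"
1 "str" "a"
2 "A"   "A"
2 "B"   "B"
2 "str" "a b c"
end

gen long id = _n                          // remember the original row order
expand wordcount(contens) if keys == "str" // one copy of the row per word
bysort id: gen seq = _n                   // 1, 2, ... within each original row
replace contens = word(contens, seq) if keys == "str"
sort id seq                               // restore the original order
drop id seq
list, sepby(group)
```

Compared with split plus reshape, this avoids creating one variable per word, which may matter when some strings contain many words.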
fu gang

Join Date: Jan 2021

Posts: 138
#5

05 Jul 2022, 10:14

Thank you very much. A teacher once taught me this method, and there are other good methods, but I want to use a loop for this data manipulation. The idea behind the loop is clear to me, but I don't know how to write the loop.
Nick Cox

Join Date: Mar 2014

Posts: 35725
#6

05 Jul 2022, 10:25

This needs a cross-reference to your previous thread, in which it was already pointed out that the problem doesn't need a loop. Just posting the question again without a cross-reference is not good forum practice. Yan yucong gave a good answer here, but clearly was not aware of previous answers within

https://www.statalist.org/forums/for...following-data

Wanting a loop here is, frankly, perverse. I saw the last post in that thread when I was travelling and it was not easy to reply at length.

Your attempt in #1 is very confused, as you try to use subscripts on the left-hand side of a replace statement (where they are illegal) and you don't use subscripts on an if command (where they are needed for what you want). I think there are other errors, but I stopped there.
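
The two points can be illustrated with a minimal sketch on a two-row toy dataset (an assumption for illustration, not code from the thread). Square brackets after the variable name in replace are read as a weight specification, which is where "weights not allowed" comes from; the in qualifier selects the row instead. The if command evaluates its expression once, so an unsubscripted variable there refers to the first observation only.

```stata
clear
input str3 keys str11 contens
"A"   "A"
"str" "a b"
end

* Illegal:  replace contens[2] = "a"
* Legal: select the target row with the in qualifier.
replace contens = "a" in 2

* Without a subscript, -if keys == "str"- would test keys[1] only.
local found 0
if keys[2] == "str" {
    local found 1
}
display "row 2 is a str row: `found'"
```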

Sorry, but having worked on this problem once in a direct way, I am not tempted to rewrite your code to do it in an indirect way.

Last edited by Nick Cox; 05 Jul 2022, 10:33.
fu gang

Join Date: Jan 2021

Posts: 138
#7

06 Jul 2022, 19:27

Ok, I understand, I won't post the same question next time, thank you