looping over lines

lamya kejji

Join Date: May 2016

Posts: 32
#1

06 May 2016, 08:47

Hello,

I am very new to Stata; I've just started today and I need some help.
I have an xlsx file with some useless information in some lines. This information is text, like a header, but it appears twice in the file. The columns and rows don't have names. So I would like to loop over the lines and check whether each line contains a string, let's say "str"; if it does, I will drop that line.
The pseudocode would be:
for (i = 0; i < nb_lines; i++) {
    if line.contains("str") {
        drop line
    }
}
The data looks like this:
the header

useful info
.... useful info
..... useful info
....

the header

Thank you in advance,

Lamya
Carole J. Wilson

Join Date: Jan 2015

Posts: 932
#2

06 May 2016, 09:49

In the future, please read the FAQ, especially #12, which shows how to give a sample of your data with dataex and how to use code delimiters.

Read through help strmatch.
This will loop through all variables for each observation (row) and drop those observations not containing "string":

Code:

gen flag=.
forvalues i = 1 / `=_N' {
    local counter=0
    foreach var of varlist _all {
        local j `=`var'[`i']'
        if `=strmatch("`j'" , "*string*")' == 1 local ++counter
    }
    if `counter'==0 replace flag=1 in `i'
}
*after examining the flag variable to ensure you got the right string:
drop if flag==1
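The strmatch() call at the heart of the loop above can be tried on its own before running it over a whole dataset. This is only a small sketch; the test strings are made up and stand in for the real cell contents.

```stata
* strmatch(s, pattern) returns 1 if s matches pattern and 0 otherwise.
* In the pattern, * matches zero or more characters and ? matches one.

* Substring match: the pattern is wrapped in wildcards
display strmatch("the header", "*header*")

* No match: "header" does not occur in the string
display strmatch("useful info", "*header*")

* Without wildcards the whole string must match, so this is 0
display strmatch("the header", "header")
```

The three commands should display 1, 0 and 0, which is why the pattern in the thread is written as "*string*" rather than "string".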

Stata/MP 14.1 (64-bit x86-64)
Revision 19 May 2016
Win 8.1
lamya kejji

Join Date: May 2016

Posts: 32
#3

09 May 2016, 06:43

Thank you so much Carole.
Nick Cox

Join Date: Mar 2014

Posts: 35672
#4

09 May 2016, 06:53

I think Carole J. Wilson means keep not drop in her last line.

Further, the double loop over observations and variables isn't needed here so far as I can see.

Code:

gen flag = 0
quietly foreach var of varlist _all {
    replace flag = flag + strmatch(`var' , "*string*")
}
keep if flag

has the same consequence
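For anyone who wants to try the single-loop approach on something small, here is a minimal sketch on toy data. The variable names A and B, the cell contents and the pattern "*header*" are all made up for illustration. Restricting the loop to string variables with ds, has(type string) is an assumption on my part, so that numeric variables (including flag itself) are never passed to strmatch().

```stata
* Toy data standing in for the imported sheet (contents are invented)
clear
input str20 A str20 B
"the header"  "the header"
"useful 1"    "useful 2"
"useful 3"    "useful 4"
"the header"  "the header"
end

* flag counts how many string variables match in each observation
gen byte flag = 0

* Collect the string variables only; flag is numeric and is left out
ds, has(type string)
foreach var of varlist `r(varlist)' {
    quietly replace flag = flag + strmatch(`var', "*header*")
}

* Inspect the flag first, then drop (or keep) the matching rows
list, clean
drop if flag
```

With this toy input, the first and last observations are flagged and the two "useful" rows remain. Using keep if flag instead would retain only the header rows.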
Carole J. Wilson

Join Date: Jan 2015

Posts: 932
#5

09 May 2016, 07:01

Nick's code is much cleaner and easier to read, but the original post requested to drop if the string was present.

lamya kejji

Join Date: May 2016

Posts: 32
#6

09 May 2016, 07:04

I indeed changed drop to keep.
Thank you, Nick, for this optimized answer; it worked too (I used drop this time).
lamya kejji

Join Date: May 2016

Posts: 32
#7

09 May 2016, 07:10

I wanted to use your code, @Carole J. Wilson, to drop all the lines that come after a certain position (the line where I find a "string" that is unique in the file). I store the value of the position in flag. How can I use the value of flag?

Code:

gen flag=.
forvalues i = 1 / `=_N' {
    local counter=0
    foreach var of varlist _all {
        local j `=`var'[`i']'
        if `=strmatch("`j'" , "*string*")' == 1 local ++counter
    }
    if `counter'==1 replace flag=`i' in `i'
}
drop in `flag'/`_N'

Last edited by lamya kejji; 09 May 2016, 07:27.
Nick Cox

Join Date: Mar 2014

Posts: 35672
#8

09 May 2016, 07:16

Carole:

Thanks for the quick reply. You are quite right that the original post asked for drop. I was reacting to your comment that your code would drop those observations not containing "string" (emphasis added).

As Lamya now appears to be saying that keep is interesting too, we can perhaps all agree that the code finds observations with matches and can then be used to drop or keep depending on circumstance.
lamya kejji

Join Date: May 2016

Posts: 32
#9

09 May 2016, 07:30

Nick or Carole, could you please help with my second question?

Nick Cox

Join Date: Mar 2014
Posts: 35672

#10

09 May 2016, 07:36

If I understand the new question correctly, an answer is

Code:

gen flag = 0
quietly foreach var of varlist _all {    
     replace flag = flag + strmatch(`var' , "*string*")
}

keep if sum(flag[_n-1]) < 1


Carole J. Wilson

Join Date: Jan 2015

Posts: 932
#11

09 May 2016, 07:41

After you get the flag variable (my way or Nick's), we'll create a counter variable id that is just the number of the line:

Code:

gen id=_n
sum id if flag==1

The resulting minimum value is the id number of the first time flag==1

Code:

drop if id > r(min)
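As a quick check of the summarize and r(min) steps, here is a minimal sketch on a made-up flag variable with matches in observations 3 and 5. Saving r(min) in a local macro (named first here, a hypothetical name) is my own precaution so that the value survives any later command that overwrites the r() results.

```stata
* Toy flag: observations 3 and 5 are the ones containing the string
clear
input byte flag
0
0
1
0
1
end

* id is simply the observation number
gen id = _n

* summarize leaves the smallest flagged id in r(min)
quietly summarize id if flag == 1
local first = r(min)
display `first'

* Drop everything after the first flagged observation
drop if id > `first'
list, clean
```

With this input, `first' is 3, so observations 1 to 3 are kept and the flagged line itself survives.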

lamya kejji

Join Date: May 2016

Posts: 32
#12

09 May 2016, 07:43

Exactly! Thank you Nick.

Could you please explain the last line?
lamya kejji

Join Date: May 2016

Posts: 32
#13

09 May 2016, 07:47

Aah, I understand now. Thank you, Carole, this is really helping!

Nick Cox

Join Date: Mar 2014

Posts: 35672
#14

09 May 2016, 07:48

You should be able to work it out! If flag goes 0, 0, 0, 0, 0, ..., 1 then its cumulative sum is the same through the sequence 0, 0, 0, 0, 0, ..., 1, after which you don't care. The offset of 1 observation is needed if you want to keep the line which is flagged, which is implied by wanting to drop after that line.
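The sequence described above can be checked on a toy example. This is only a sketch: the flag values are made up, and the helper variable before is a hypothetical name added so that the running sum can be listed next to flag.

```stata
* Toy flag: the first match is in observation 3
clear
input byte flag
0
0
1
0
1
end

* Running sum of the previous observation's flag.
* flag[0] is missing, and sum() treats missing as 0,
* so before is 0 0 0 1 1 for these five observations.
gen before = sum(flag[_n-1])
list, clean

* Keep everything up to and including the first flagged line
keep if before < 1
```

Observations 1 to 3 remain. Without the one-observation offset, that is with sum(flag) < 1, the flagged line itself would be dropped as well.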

Carole's technique is essentially equivalent, although in this problem it seems that we don't need to create a new variable as flag contains precisely the information we need already.

For write-ups of Carole's technique see references yielded by

http://www.stata-journal.com/sjsearc...h+observations

Last edited by Nick Cox; 09 May 2016, 08:15.
lamya kejji

Join Date: May 2016

Posts: 32
#15

09 May 2016, 07:58

Many thanks, Nick!