My example dataset (see attached) has only one variable that contains observations in the form shown under the "base" variable in the picture above. All observations are formatted as strings.
The observations follow this pattern:
1) The string "Header" comes first.
2) A varying number of string observations come next.
3) The string "End-Header" comes next.
4) A varying number of string observations then follow.
5) The string "Header" appears again, marking the start of the next repetition of the pattern.
My objective is to write code that removes the observations between the two strings "Header" and "End-Header", including the "Header" and "End-Header" strings themselves. So, I wish to get to a final list of observations as shown under the "target" variable in the picture above. What is the most efficient way to do this?
I tried the following code without success: I get an "is not a valid command name" error. My plan was to generate a "counter" variable that starts at 1 and increases by one each time either "Header" or "End-Header" shows up while looping through the observations. I could then delete the observations that coincide with odd values of the counter, and then delete the occurrences of "End-Header".
Code:
* Code adapted from https://www.stata.com/statalist/archive/2007-03/msg00525.html
gen count = 1
local N = _N
forvalue i = 2/`N'{
    if base[`i'] == "Header" | base[`i'] == "End-Header"{
        qui replace count = count[_n-1]+1 in `i'
    }
}
* drop odd occurrences, then drop "End-Header"
drop if mod(count,2) == 1
drop if base == "End-Header"
Stata/SE 16.0
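
For comparison, the counter idea described above can be sketched without a loop by using running sums. Note that in the loop as written, count is only replaced on the marker rows, so every other row keeps its initial value of 1 and the counter never accumulates. The following is a minimal sketch, not a definitive solution. It assumes the data are already in the intended order, that "Header" and "End-Header" strictly alternate, and that the markers match exactly (no stray spaces). The sample data and the variable name inblock are hypothetical stand-ins for the attached dataset.

```stata
* Hypothetical sample data standing in for the attached dataset
clear
input str12 base
"Header"
"a"
"b"
"End-Header"
"keep1"
"keep2"
"Header"
"c"
"End-Header"
"keep3"
end

* Running count of "Header" rows minus running count of "End-Header" rows:
* equals 1 from each "Header" up to the row before its "End-Header", 0 elsewhere
gen byte inblock = sum(base == "Header") - sum(base == "End-Header")

* Drop everything inside a block, plus the closing "End-Header" rows
drop if inblock == 1 | base == "End-Header"
drop inblock

list, clean noobs
```

Because sum() in generate is a running sum over the current sort order, this only works if the observations have not been re-sorted since import.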
