Row totals for after a specific event occurs

Megan Forecki

Join Date: Nov 2021

Posts: 7
#1

Row totals for after a specific event occurs

16 Nov 2021, 17:14

Hello!

I'm certain there is a simple answer to this, so please feel free to direct me to the resource. I am hoping to count the number of events after a specific event occurred. My data are currently in wide format.

As an example of the data, I have created the below table. I am hoping to create a totals column that would count the number of events after "Event 2" occurred. I have looked at egen commands, but haven't been able to identify the right one, so wasn't sure if there was another option or if I should continue searching egen commands.

Person id Visit1 Visit2 Visit3 Visit4 Totals

01 Event 1 Event 2 Event3 Event4 2

02 Event2 Event3 Event4 Event1 3

03 Event3 Event1 Event2 Event5 1

04 Event2 Event1 Event4 Event3 3

Appreciate the help!
Tags: None

Nick Cox

Join Date: Mar 2014
Posts: 36059

16 Nov 2021, 17:56

This can be done in wide layout (you say format, which is a commonly used word, but I like Clyde Schechter's term, which is less overloaded in computing and statistics).

Nevertheless I recommend reshaping to long because most problems are easier that way. Here is code for solutions in both.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str2 personid str6(visit1 visit2 visit3 visit4) byte totals
"01" "Event1" "Event2" "Event3" "Event4" 2
"02" "Event2"  "Event3" "Event4" "Event1" 3
"03" "Event3"  "Event1" "Event2" "Event5" 1
"04" "Event2"  "Event1" "Event4" "Event3" 3
end

reshape long visit, i(personid) j(seq)

bysort personid (seq) : egen when2 = total(seq * (visit == "Event2"))
gen wanted = 4 - when2

list, sepby(personid)

clear

input str2 personid str6(visit1 visit2 visit3 visit4) byte totals
"01" "Event1" "Event2" "Event3" "Event4" 2
"02" "Event2"  "Event3" "Event4" "Event1" 3
"03" "Event3"  "Event1" "Event2" "Event5" 1
"04" "Event2"  "Event1" "Event4" "Event3" 3
end

gen when2 = .

forval j = 1/4 {
    replace when2 = `j' if visit`j' == "Event2"
}

gen wanted = 4 - when2

list

I fear that the code may need revisiting because real datasets may not have the same number of visits for each person.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30358
#3

16 Nov 2021, 18:06

This is very difficult to do with the data arranged wide. It is very simple with long layout.

Code:

* Example generated by -dataex-. For more info, type help dataex clear input str3 personid str6(visit1 visit2 visit3 visit4) "01 " "Event1" "Event2" "Event3" "Event4" "02 " "Event2" "Event3" "Event4" "Event1" "03 " "Event3" "Event1" "Event2" "Event5" "04 " "Event2" "Event1" "Event4" "Event3" end reshape long visit, i(personid) j(vnum) rename visit event by personid (vnum), sort: gen after_event_2 = min(sum(event == "Event2"), 1) by personid: egen wanted = total(after_event_2) replace wanted = wanted - 1

Now, it will prove impossible to do if you don't clean up the data first. In the example you show, "Event2" is indifferently represented as "Event2 " and "Event 2". For all I know the full data has even more typographical variants. Stata will not recognize those as the same thing. So you have to fix up the data to remove all of these inconsistencies in your data before you do anything else. I cleaned up the example data to correct this problem by stripping all blanks from those variables.

Note also that the tableau you show, though well intended, is not really adequate information for writing this kind of code. For example, it is impossible to know if the values in those cells are actually string variables or are value-labeled numeric variables that look to human eyes like strings, but to Stata are nothing of the kind. Consequently the code I have shown may not be compatible with your actual data. To avoid this kind of complication, in the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

The code leaves your data in long layout. In all probability, it is best to leave it that way. Just as this task stumped you with the data in wide layout, you will find that nearly everything in Stata is simpler (sometimes only possible) with long data. There are a handful of things where Stata actually needs the data to be wide--and for that there is a -reshape wide- command. But unless you know that you need to do one of those particular things next, you are better off staying long.

Added: Crossed with #2, which shows how to do it keeping the data wide. Nevertheless, I recommend you go the long route as it will probably serve you better over the long run. Nick also gives a different long-data solution. Note that our solutions will give different results in the event that there are cases where Event2 appears more than once. My solution will count from the first appearance of Event2. His first solution, the one for long data, can give a negative count if, for example Event2 occurred at both visit2 and visit 4. And his second solution, the one for wide data, will count from the last occurrence of Event2. (Of course, if there is never a personid who has Event2 more than once, both of his solutions and mine will all give the same, correct, counts.)

Last edited by Clyde Schechter; 16 Nov 2021, 18:14.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 36059
#4

16 Nov 2021, 18:13

As quite often happens, Clyde and I made very similar remarks, although for once I think the problem is quite easily soluble in wide.

I silently cleaned up the data too, but Clyde is bang on. We ask for data examples using dataex and we mean what we say. Sometimes people give schematic examples because (1) they think schematic examples are easier to understand, which can happen but is unusual in my experience (2) the real data are confidential -- where the motive is laudable but the better strategy is to give fake examples with the same structure, as we explain in the FAQ Advice.
1 like
Comment
Megan Forecki

Join Date: Nov 2021

Posts: 7
#5

16 Nov 2021, 19:56

I appreciate all this help! Yes - was creating my mock "data" on the fly at the end of the day with the table feature in the post after failed attempts with the egen command. Fortunately, my dataset has already been cleaned for variants. I'll use the dataex command in the future - thanks for that (and yes, privacy/confidentiality is the main issue and would be breached with a real example so gave it my best effort).

I had reshaped long to wide thinking that would be best in this scenario. I'll give your recommendations a go tomorrow in both layouts. Long might actually work better for my data anyway so that is helpful!

Thanks again for all this insight! Much appreciated.
1 like
Comment
Megan Forecki

Join Date: Nov 2021

Posts: 7
#6

22 Nov 2021, 09:26

Hello! I was able to get the solution to work with Clyde's suggestion, particularly because not all individuals participated in the same number of events.

I have a follow-up question. I am hoping to determine how to count events that may have occurred in-between a certain event. I have copied the code below per the dataex command. My question is how to count the Events in-between Event1, starting with the first instance of Event1. For example, with the below, personID #01 = 0 (because they only participated in Event1). personID#02 = 3. personID#07=13.

Appreciate any insight!

* Example generated by -dataex-. To install: ssc install dataex
clear
input str2 personID float(num_appts EventName)
"01" 1 1
"01" 2 1
"01" 3 1
"01" 4 1
"02" 1 1
"02" 2 9
"02" 3 10
"02" 4 1
"02" 5 2
"07" 1 8
"07" 2 1
"07" 3 6
"07" 4 3
"07" 5 7
"07" 6 7
"07" 7 5
"07" 8 3
"07" 9 3
"07" 10 7
"07" 11 7
"07" 12 3
"07" 13 3
"07" 14 4
"07" 15 7
end
label values EventName event
label def event 1 "Event1", modify
label def event 2 "Event2", modify
label def event 3 "Event3", modify
label def event 4 "Event4", modify
label def event 5 "Event5", modify
label def event 6 "Event6", modify
label def event 7 "Event7", modify
label def event 8 "Event8", modify
label def event 9 "Event9", modify
label def event 10 "Event10", modify
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30358
#7

22 Nov 2021, 10:27

Your example is not quite consistent with your description of what you want You say you want a count of events "between" occurrences of Event 1, but the stated results, with 3 for personID 02 and 13 for personID 07 suggest you mean number of events that occur after an Event 1, even if no other Event1 ever follows. Moreover if an Event 1 does follow another Event 1, you do not want to include it in the total.

So if I'm properly understanding what you want, the following code will do:

Code:

isid personID num_appts, sort by personID (num_appts): gen event1_count = sum(EventName== 1) by personID: egen wanted = total(event1_count > 0 & EventName != 1)
1 like
Comment

Person id	Visit1	Visit2	Visit3	Visit4	Totals
01	Event 1	Event 2	Event3	Event4	2
02	Event2	Event3	Event4	Event1	3
03	Event3	Event1	Event2	Event5	1
04	Event2	Event1	Event4	Event3	3

Announcement

Row totals for after a specific event occurs

Comment

Comment

Comment

Comment

Comment

Comment