Creating composite counts of variables in panel data

Dibyajyoti Basak

Join Date: Feb 2020
Posts: 4

Creating composite counts of variables in panel data

13 Nov 2023, 10:33

Hello,

I have the following dataset of certain incidents occurring in districts across India. The data is in the following format:

district	count_totalincidents	startingyear	titleofincident (string)
AP	8	1995	abc
AP	8	1998	def
AP	8	2000	ghi
AP	8	2002	jkl
AP	8	2004	mno
AP	8	2007	pqr
AP	8	2008	stu
AP	8	2009	vwx

I want to convert this data into a panel dataset, with a new variable year that will have values from 2001-10. I also want three new variables that measures the count of the incidents based on their starting year. If you look below it is clear the count_before2001 and count_after2010 will 0 since those years are not being included in the panel. The variable count_before2001 has 3 because in the table above, there are 3 incidents occurring before 2001.

What I am facing a problem in doing is generating the count_between200110 variable. I want this variable to dynamically change based on the starting year variable. As you can see below, this variable takes on the value of 1 when the year becomes 2002, (there is an incident in 2002 in the table above), increases to 2 in 2004 (another incident in 2004), and so on.

year	district	count_totalincidents	count_before2001	count_between200110	count_after 2010
2001	AP	8	3	0	0
2002	AP	8	3	1	0
2003	AP	8	3	1	0
2004	AP	8	3	2	0
2005	AP	8	3	2	0
2006	AP	8	3	2	0
2007	AP	8	3	3	0
2008	AP	8	3	4	0
2009	AP	8	3	5	0

I need to replicate this for all the districts in my dataset.
Hope this information is sufficient, if not, I will provide further details on asking.

Thanking the community in advance.

Tags: None

Clyde Schechter

Join Date: Apr 2014
Posts: 30119

13 Nov 2023, 11:33

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str3 district byte count_totalincidents int startingyear str3 titleofincidentstring
"AP " 8 1995 "abc"
"AP " 8 1998 "def"
"AP " 8 2000 "ghi"
"AP " 8 2002 "jkl"
"AP " 8 2004 "mno"
"AP " 8 2007 "pqr"
"AP " 8 2008 "stu"
"AP " 8 2009 "vwx"
end

encode district, gen(n_district)
tsset n_district startingyear
tsfill
by n_district (district), sort: replace district = district[_N]
by district (count_totalincidents), sort: replace count_totalincidents = ///
    count_totalincidents[1]
    
isid district startingyear, sort
by district (startingyear): egen count_before2001 = ///
    total(cond(startingyear < 2001, !missing(titleofincidentstring), .))
by district (startingyear): egen count_after_2010 = ///
    total(cond(startingyear > 2010, !missing(titleofincidentstring), .))
by district (startingyear): gen count_between200110 = ///
    sum(cond(inrange(startingyear, 2001, 2010), !missing(titleofincidentstring), .)), ///
    before(count_after_2010)
keep if inrange(startingyear, 2001, 2009)

In the future, when showing data examples, please use the -dataex- command to do so, as I have done here. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

Comment

Dibyajyoti Basak

Join Date: Feb 2020

Posts: 4
#3

13 Nov 2023, 12:41

Much thanks, Professor Schechter. I will keep in mind your advice about using -dataex-

On running the code, I'm getting the following error: "repeated time values within panel".

This maybe due to repeated values of the variable 'startingyear'. I'm putting a snippet of the data that has repeated values of the startingyear (1995 and 2004 are repeated twice).

clear
input str3 district byte count_totalincidents int startingyear str3 titleofincidentstring
"MH" 8 1995 "abc"
"MH" 8 1995 "def"
"MH" 8 2000 "ghi"
"MH" 8 2002 "jkl"
"MH" 8 2004 "mno"
"MH" 8 2004 "pqr"
"MH" 8 2008 "stu"
"MH" 8 2009 "vwx"
end

What can be the workaround in this case?

Thanks,
Dibyajyoti
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30119

13 Nov 2023, 13:08

OK. That requires a few changes to the code. They are sufficiently numerous that rather than highlighting them in the original, I'm just posting the entire revision--just discard what you were given in #2.

Code:

clear
input str3 district byte count_totalincidents int startingyear str3 titleofincidentstring
"MH" 8 1995 "abc"
"MH" 8 1995 "def"
"MH" 8 2000 "ghi"
"MH" 8 2002 "jkl"
"MH" 8 2004 "mno"
"MH" 8 2004 "pqr"
"MH" 8 2008 "stu"
"MH" 8 2009 "vwx"
end

encode district, gen(district_num)
gen n_incidents = !missing(titleofincidentstring)
collapse (sum) n_incidents (first) district_num count_totalincidents, ///
    by(district startingyear)

tsset district_num startingyear
tsfill
by district_num (district), sort: replace district = district[_N]
by district (count_totalincidents), sort: ///
    replace count_totalincidents = count_totalincidents[1]
replace n_incidents = 0 if missing(n_incidents)    
isid district startingyear, sort
by district (startingyear): egen count_before2001 = ///
    total(cond(startingyear < 2001, n_incidents, .))
by district (startingyear): egen count_after_2010 = ///
    total(cond(startingyear > 2010, n_incidents, .))
by district (startingyear): gen count_between200110 = ///
    sum(cond(inrange(startingyear, 2001, 2010), n_incidents, .)), ///
    before(count_after_2010)
keep if inrange(startingyear, 2001, 2009)

Comment

Dibyajyoti Basak

Join Date: Feb 2020

Posts: 4
#5

16 Nov 2023, 09:06

Thank you so much for your help, Professor! This code worked like a treat.
Comment

Announcement

Creating composite counts of variables in panel data

Comment

Comment

Comment

Comment