Grouping repetitive dates and graphing

Liz owu

Join Date: May 2017
Posts: 30

Grouping repetitive dates and graphing

07 Jun 2017, 03:50

Dear colleagues,
I am quite new to stata and looking for advise on how to group repeated use of bonds on the same day[Date] with different value dates [ValueDate] in a graph. The [Date] variable should be grouped on the x axis -for instance all days in 02/10/2014 should be in one group but linked to different dates [ValueDate] on the Y axis. suggestions on which types of graphs in stata that could best represent this scenario would be useful.
please see a sample of the dataset below. I tried egen but I received an error message - variable not found.
the data is confidential so I cannot provide it in dataex as previously advised. Thanks again for the help and support.

Date	ISIN	asset val	group	id	ValueDate	BookValue
02/10/2014	xx1	15	Jum	PT1	04/11/2014	145
02/10/2014	xx2	20	Pfand	ex5	11/12/2014	218
02/10/2014	xx3	25	loans	ex10	26/11/2014	16
02/10/2014	xx4	30	shares	ex15	27/10/2014	18
02/10/2014	xx4	35	securities	ex20	16/12/2014	57
02/10/2014	xx5	40	ABS	ex21	22/12/2014	58

07/10/2014	xx1	48	Jum	PT1	06/11/2014	118
07/10/2014	xx21	36	Pfand	ex5	27/10/2014	227
07/10/2014	xx33	10	loans	ex10	07/11/2014	316
07/10/2014	xx40	18	shares	ex15	04/11/2014	211
07/10/2014	xx42	12	securities	ex20	12/11/2014	108
07/10/2014	xx51	10	ABS	ex21	11/12/2014	421
07/10/2014	xx51	5	jum	f12	18/11/2014	14

14/10/2014	xx1	15	Jum	PT1	27/11/2014	145
14/10/2014	xx2	20	Pfand	ex5	19/11/2014	218
14/10/2014	xx3	25	loans	ex10	27/11/2014	16
14/10/2014	xx4	30	shares	ex15	21/11/2014	18
14/10/2014	xx1	35	securities	ex20	19/11/2014	57
14/10/2014	xx21	40	ABS	ex21	23/10/2014	58
14/10/2014	xx33	8	jum	n34	16/12/2014	60

Tags: None

William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

07 Jun 2017, 08:31

I am afraid that your sample data is difficult to understand - columns run together - and impossible to easily import into Stata to use in preparing an answer.

Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how to best pose your question. In particular, please read FAQ #12 and use dataex when posting sample data to Statalist.

To install dataex, type ssc install dataex and when that is complete type help dataex to read the simple instructions for using it. Using dataex will enable those who want to help you to quickly and easily create a 100% faithful replica of your situation to test their ideas and code on.

You can import the sample data you show above into Stata and provide it via dataex. Or you can take your confidential data in Stata and change enough of it so it is like your sample data and has no confidential information.

Last edited by William Lisowski; 07 Jun 2017, 08:37.
1 like
Comment

Liz owu

Join Date: May 2017
Posts: 30

12 Jun 2017, 05:15

Dear William,
thank you for the reply, and sorry if this was not clear. I only want to group the "Date" variable. For example i want to group in such a way that all dates 02/10/2014 = 1; [in this sample there are 6 days).
07/10/2014 = 2; [here there are 7 days].

I tried egen but got an error message "variable not sorted". Then I sorted, but got the same message. Grateful for any help. thanks in advance

Date	ISIN	asset val	group	id	ValueDate	BookValue
02/10/2014	xx1	15	Jum	PT1	04/11/2014	145
02/10/2014	xx2	20	Pfand	ex5	11/12/2014	218
02/10/2014	xx3	25	loans	ex10	26/11/2014	16
02/10/2014	xx4	30	shares	ex15	27/10/2014	18
02/10/2014	xx4	35	securities	ex20	16/12/2014	57
02/10/2014	xx5	40	ABS	ex21	22/12/2014	58

07/10/2014	xx1	48	Jum	PT1	06/11/2014	118
07/10/2014	xx21	36	Pfand	ex5	27/10/2014	227
07/10/2014	xx33	10	loans	ex10	07/11/2014	316
07/10/2014	xx40	18	shares	ex15	04/11/2014	211
07/10/2014	xx42	12	securities	ex20	12/11/2014	108
07/10/2014	xx51	10	ABS	ex21	11/12/2014	421
07/10/2014	xx51	5	jum	f12	18/11/2014	14

14/10/2014	xx1	15	Jum	PT1	27/11/2014	145
14/10/2014	xx2	20	Pfand	ex5	19/11/2014	218
14/10/2014	xx3	25	loans	ex10	27/11/2014	16
14/10/2014	xx4	30	shares	ex15	21/11/2014	18
14/10/2014	xx1	35	securities	ex20	19/11/2014	57
14/10/2014	xx21	40	ABS	ex21	23/10/2014	58
14/10/2014	xx33	8	jum	n34	16/12/2014	60

Comment

Liz owu

Join Date: May 2017

Posts: 30
#4

27 Apr 2018, 06:38

I have a dataset with two identical variables which I want to compare and display the difference as another variable.

Variable 1 replicates patient (id) a number of times because of several visits to the doctor each year.
Variable 2 replicates patient (id) a number of times but to a lesser extent - less observations on each date.
Both variables are string variables.

I have tried a number of combinations of the "foreach" and for "forevalue" loops but this does not seem to work.

I am grateful for any urgent help on this matter.
thanks in advance.
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10187
#5

28 Apr 2018, 04:23

Can you provide a data example showing the two variables and what you expect the third variable to look like?
Comment

Liz owu

Join Date: May 2017
Posts: 30

28 Apr 2018, 12:40

thanks for the reply. I want to compare Var_code 1 date 1 with Var_app date_app and list the difference if Var_code 1 date 1 cannot be found in var_app Date_app. thank you

Under two circumstances.
1. all variables are in the same database
2. if variable are in two separate databases

Var_code1	date1	date_app	Var_app
ES0305085005	28-Dec-17	11/25/2014	XS0528006090
ES0305085005	28-Sep-17	11/25/2014	ES0374273003
ES0305085005	29-Jun-17	11/25/2014	IT0004790918
ES0305085005	30-Mar-17	11/25/2014	IT0004790918
ES0305085005	24-Sep-15	11/25/2014	IT0004790918
ES0305085005	29-Sep-16	11/25/2014	IT0004790918
ES0305085005	31-Dec-15	11/26/2014	ES0374273003
ES0305085005	31-Mar-16	11/27/2014	XS1135366240
ES0305085005	30-Jun-16	11/27/2014	XS1135365515
ES0305085005	29-Dec-16	11/27/2014	XS1135365788
XS1314233732	29-Sep-16	11/28/2014	IT0004790918
XS1314233732	29-Jun-17	11/28/2014	IT0004790918
XS1314233732	29-Dec-16	11/28/2014	IT0004790918
XS1314233732	28-Dec-17	12/01/2014	XS0572338936
XS1314233732	30-Mar-17	12/01/2014	XS0572336997
XS1314233732	30-Jun-16	12/01/2014	XS0572338936

Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10187

28 Apr 2018, 14:42

Look at the FAQs on how to use dataex in the future. I will assume that all your variables are strings but had you used dataex, I would not have had to make this assumption. I have changed your example slightly so that I include some matches. Regarding 2, I suggest that you combine the datasets either by merging or appending and thereafter, you can build on the technique below.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str12 var_code1 str9 date1 str10 date_app str12 var_app
"ES0305085005" "28-Dec-17" "11/25/2014" "XS0528006090"
"ES0305085005" "28-Sep-17" "11/25/2014" "ES0374273003"
"ES0305085005" "29-Jun-17" "11/25/2014" "IT0004790918"
"ES0305085005" "30-Mar-17" "11/25/2014" "IT0004790918"
"ES0305085005" "24-Sep-15" "11/25/2014" "IT0004790918"
"ES0305085005" "29-Sep-16" "11/25/2014" "IT0004790918"
"ES0305085005" "31-Dec-15" "11/26/2014" "ES0374273003"
"ES0305085005" "31-Mar-16" "11/27/2014" "XS1135366240"
"ES0305085005" "30-Jun-16" "11/27/2014" "XS1135365515"
"ES0305085005" "29-Dec-16" "11/27/2014" "XS1135365788"
"XS1314233732" "29-Sep-16" "11/28/2014" "IT0004790918"
"XS1314233732" "29-Jun-17" "11/28/2014" "IT0004790918"
"XS1314233732" "29-Dec-16" "12/31/2015" "ES0305085005"
"XS1314233732" "28-Dec-17" "12/01/2014" "XS0572338936"
"XS1314233732" "30-Mar-17" "12/01/2014" "XS0572336997"
"XS1314233732" "30-Jun-16" "12/28/2017" "ES0305085005"
end


*MAKE DATES CONSISTENT
gen Date1 = date(date1, "DM20Y")
format Date1 %td
gen Date_app = date( date_app , "MDY")
format Date_app %td

*GENERATE COMBINATIONS OF VARIABLES AND DATES
gen V1D1= var_code1+string( Date1 )
gen VADA= var_app +string( Date_app )

*STORE ALL COMBINATIONS OF VAR_APP AND DATE_APP
levelsof VADA, local(V)
*LOOK FOR MATCHES
foreach v in `V'{
gen `v' = regexm(V1D1, "`v'")
}

*HERE YOU HAVE TO REFER TO VARIABLE COMBINATIONS FROM DATASET (SPECIFIC TO YOUR DATASET)
rename (ES030508500520453- XS113536624020054) (match#), addnumber(1)
egen found= rowtotal(match*)
drop match*
list var_code1 Date1 var_app Date_app found, clean noob

The indicator which I name "found" counts the number of matches of Var_code1 date1 in Var_app date_app and equals 0 if there is no match.

Code:

. list var_code1 Date1 var_app Date_app found, clean noob

       var_code1       Date1        var_app    Date_app   found  
    ES0305085005   28dec2017   XS0528006090   25nov2014       1  
    ES0305085005   28sep2017   ES0374273003   25nov2014       0  
    ES0305085005   29jun2017   IT0004790918   25nov2014       0  
    ES0305085005   30mar2017   IT0004790918   25nov2014       0  
    ES0305085005   24sep2015   IT0004790918   25nov2014       0  
    ES0305085005   29sep2016   IT0004790918   25nov2014       0  
    ES0305085005   31dec2015   ES0374273003   26nov2014       1  
    ES0305085005   31mar2016   XS1135366240   27nov2014       0  
    ES0305085005   30jun2016   XS1135365515   27nov2014       0  
    ES0305085005   29dec2016   XS1135365788   27nov2014       0  
    XS1314233732   29sep2016   IT0004790918   28nov2014       0  
    XS1314233732   29jun2017   IT0004790918   28nov2014       0  
    XS1314233732   29dec2016   ES0305085005   31dec2015       0  
    XS1314233732   28dec2017   XS0572338936   01dec2014       0  
    XS1314233732   30mar2017   XS0572336997   01dec2014       0  
    XS1314233732   30jun2016   ES0305085005   28dec2017       0

Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10187

28 Apr 2018, 16:03

There ia a little bit of reinventing of the wheel in #7. Thinking about it some more, a simple merge will do and will handle both your scenarios 1 and 2 in #6

Code:

*MAKE DATES CONSISTENT
gen Date1 = date(date1, "DM20Y")
format Date1 %td
gen Date_app = date( date_app , "MDY")
format Date_app %td


*GENERATE COMBINATIONS OF VARIABLES AND DATES
gen V1D1= var_code1+string( Date1 )
gen VADA= var_app +string( Date_app )


*GENERATE SEPARATE DATASET WITH ALTERNATE VARIABLE, RENAME AND MERGE BACK
preserve
keep VADA
rename VADA V1D1
contract V1D1
sort V1D1
tempfile VADA
save `VADA'
restore
merge 1:m V1D1 using `VADA'
list var_code1 Date1 _freq if _merge==3

Code:

. list var_code1 Date1 _freq if _merge==3

     +----------------------------------+
     |    var_code1       Date1   _freq |
     |----------------------------------|
  2. | ES0305085005   31dec2015       1 |
 10. | ES0305085005   28dec2017       1 |
     +----------------------------------+

Last edited by Andrew Musau; 28 Apr 2018, 16:06.

Comment

Liz owu

Join Date: May 2017

Posts: 30
#9

29 Apr 2018, 09:58

Dear Andrew,
thank you for taking the time to provide some advise. unfortunately I still have some problems. I tried the merge option and the error message was
variable bothapp does not uniquely identify observations in the master data. Variable bothapp is a combination of
Valuedate isin using the following command:
egen bothapp = concat(Valuedate isin), decode p(" ").

This works
*STORE ALL COMBINATIONS OF VAR_APP AND DATE_APP
levelsof VADA, local(V)

*LOOK FOR MATCHES
foreach v in `V'{
gen `v' = regexm(V1D1, "`v'")
}

how do l list the matches? the database is huge therefore looking manually is very tedious.
thanks again
Liz
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10187

#10

29 Apr 2018, 12:03

Search the forum as well as Stata's online documentation on merge to see how to resolve the error. #8 is the easiest way to address your problem. If you follow #7, you can just browse the dataset, constraining the displayed results to those where the variable -found- is positive to view matches. I will automate the generation of the -found- variable so that you do not need to look at the first and last elements in the local macro.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str12 var_code1 str9 date1 str10 date_app str12 var_app
"ES0305085005" "28-Dec-17" "11/25/2014" "XS0528006090"
"ES0305085005" "28-Sep-17" "11/25/2014" "ES0374273003"
"ES0305085005" "29-Jun-17" "11/25/2014" "IT0004790918"
"ES0305085005" "30-Mar-17" "11/25/2014" "IT0004790918"
"ES0305085005" "24-Sep-15" "11/25/2014" "IT0004790918"
"ES0305085005" "29-Sep-16" "11/25/2014" "IT0004790918"
"ES0305085005" "31-Dec-15" "11/26/2014" "ES0374273003"
"ES0305085005" "31-Mar-16" "11/27/2014" "XS1135366240"
"ES0305085005" "30-Jun-16" "11/27/2014" "XS1135365515"
"ES0305085005" "29-Dec-16" "11/27/2014" "XS1135365788"
"XS1314233732" "29-Sep-16" "11/28/2014" "IT0004790918"
"XS1314233732" "29-Jun-17" "11/28/2014" "IT0004790918"
"XS1314233732" "29-Dec-16" "12/31/2015" "ES0305085005"
"XS1314233732" "28-Dec-17" "12/01/2014" "XS0572338936"
"XS1314233732" "30-Mar-17" "12/01/2014" "XS0572336997"
"XS1314233732" "30-Jun-16" "12/28/2017" "ES0305085005"
end


*MAKE DATES CONSISTENT
gen Date1 = date(date1, "DM20Y")
format Date1 %td
gen Date_app = date( date_app , "MDY")
format Date_app %td

*GENERATE COMBINATIONS OF VARIABLES AND DATES
gen V1D1= var_code1+string( Date1 )
gen VADA= var_app +string( Date_app )

*STORE ALL COMBINATIONS OF VAR_APP AND DATE_APP
levelsof VADA, local(V)
local n: word count `V'
local first: word 1 of `V'
local last: word `n' of `V'
*LOOK FOR MATCHES
foreach v in `V'{
gen `v' = regexm(V1D1, "`v'")
}

rename (`first' - `last') (match#), addnumber(1)
egen found= rowtotal(match*)
drop match*
*VIEW MATCHES
browse if found>0

Comment

Liz owu

Join Date: May 2017

Posts: 30
#11

29 Apr 2018, 14:04

Dear Andrew,
Thank you very much. I am sorry I still cannot solve the problem with the merge command so I want to use the last instructions that you kindly provided. but when I run the following command:

local last: word `n' of `V'

I receive the following stata error message - invalid syntax r(198).
grateful if you can please help me. thanks a lot.
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10187

#12

30 Apr 2018, 02:16

Can you show the full commands that you enter up to and including the error. You need to maintain the full sequence

Code:

clear
input str17 VADA
"XS052800609020052"
"ES037427300320052"
"IT000479091820052"
"IT000479091820052"
"IT000479091820052"
"IT000479091820052"
"ES037427300320053"
"XS113536624020054"
"XS113536551520054"
"XS113536578820054"
"IT000479091820055"
"IT000479091820055"
"ES030508500520453"
"XS057233893620058"
"XS057233699720058"
"ES030508500521181"
end


levelsof VADA, local(V)
local n: word count `V'
local first: word 1 of `V'
local last: word `n' of `V'
di "`first'" "     " "`last'"

Code:

 local n: word count `V'

. 
. local first: word 1 of `V'

. 
. local last: word `n' of `V'

. 
. di "`first'" "     " "`last'"
ES030508500520453     XS113536624020054

.

Comment

Liz owu

Join Date: May 2017

Posts: 30
#13

30 Apr 2018, 03:04

Dear Andrew,
thanks very much again for your time. this part works:

*****
levelsof VADA, local(V)
local n: word count `V'
local first: word 1 of `V'
local last: word `n' of `V'
di "`first'" " " "`last'"

foreach v in `V'{
gen `v' = regexm(V1D1, "`v'")
}

*******
this part does not work - to list of differences between VADA and V1D1.
rename (`first' - `last') (match#), addnumber(1)
egen found= rowtotal(match*)
drop match*
*VIEW MATCHES
browse if found>0

I am grateful for the help.
Liz
Comment
Liz owu

Join Date: May 2017

Posts: 30
#14

30 Apr 2018, 03:06

sorry the error message is the following:

egen found= rowtotal(match*)
variable match* not found

thanks
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10187

#15

30 Apr 2018, 04:52

It is even unnecessary to rename the variables now that we have identified the first and last elements. Try

Code:

levelsof VADA, local(V)
local n: word count `V'
local first: word 1 of `V'
local last: word `n' of `V'
di "`first'" " " "`last'"

foreach v in `V'{
gen `v' = regexm(V1D1, "`v'")
}
egen found= rowtotal(`first' - `last')
drop `first' - `last'
browse if found>0

Announcement

Grouping repetitive dates and graphing

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment