Grouping repetitive dates and graphing

Liz owu

Join Date: May 2017

Posts: 30
#16

30 Apr 2018, 06:17

Dear Andrew,
Thanks again. I tried the following but do not understand why it now returns a syntax error.
Thanks for your support and guidance. I really need this to work.

**************
. local n: word count `V'
. local first: word 1 of `V'
. local last: word `n' of `V'
. di "`first'" " " "`last'"
18087 DE000EH094Y1 21272 IT0005283491 [
. foreach v in `V'{
2. gen `v' = regexm(V1D1, "`v'")
3. }
18087 invalid name
r(198);
end of do-file
r(198);

The "18087 invalid name" error comes from the combination of date and isin: 18087 is a date converted from numeric to string so that it can be matched with isin, which is also a string.

egen VADA = concat(Valuedate isin), decode p(" ")
Now the data looks like this:

V1D1                 VADA
AT0000136213 17897   18087 XS0250267647
AT0000136288 17897   18087 DE000EH094Y1
AT0000136312 17897   18087 ES0413770019
AT0000137088 17520   18087 XS0250267647
AT0000137088 17129   18088 ES0414840274
AT0000137088 17402   18088 XS0286031777
AT0000137088 17647   18088 ES0347858005
AT0000137088 17804   18088 ES0413790009
AT0000137088 17897   18091 ES0347859003
AT0000137088 16765   18091 XS0250267647
AT0000137203 17897   18091 FR0010770529
Andrew Musau

Join Date: Oct 2014

Posts: 10187
#17

30 Apr 2018, 06:37

You do not want spaces in the combined variable. Doesn't this work?

Code:

gen VADA= isin+string(Valuedate)

Also make sure you do the same for V1D1, where the order is string ID + date.
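As a small illustration of why the order and the missing space matter: the loop in #16 uses each value of the combined variable as a new variable name, and a Stata variable name must start with a letter or underscore and cannot contain spaces. A minimal sketch, assuming isin is a string and Valuedate is numeric (the two rows are taken from the listing in #16):

```stata
clear
input str12 isin float Valuedate
"XS0250267647" 18087
"DE000EH094Y1" 18087
end

* Date first with a space, e.g. "18087 XS0250267647", is not a legal name.
* ID first with no separator gives a value that starts with a letter.
gen VADA = isin + string(Valuedate)
list
```

The resulting values are 17 characters long, which is within Stata's 32-character limit for variable names.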
Liz owu

Join Date: May 2017

Posts: 30
#18

30 Apr 2018, 06:56

Thanks Andrew, it seems to work now, but I receive the following error message, so I think the structure of the output could be revised? maxvar is already set, so I have no room. Thanks again.
***
error message
no room to add more variables
Up to 5,000 variables are currently allowed, although you could reset the maximum using set maxvar; see help memory.
Andrew Musau

Join Date: Oct 2014

Posts: 10187
#19

30 Apr 2018, 07:04

That is the drawback of this approach: you are creating an indicator variable for each element in the macro. Using the merge command is therefore more efficient. How many elements do you have in total? That is the value of `n' in the following:

Code:

local n: word count `V'
local first: word 1 of `V'
local last: word `n' of `V'
di `n'
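For comparison, here is a rough sketch of what a merge-based check might look like; it is not necessarily the set-up meant above. The three rows are made up (the second V1D1 value is invented so that one match exists), and key, lookup and found are hypothetical names. It assumes V1D1 and VADA are strings built in the same order, ID then date.

```stata
clear
input str17 VADA str17 V1D1
"XS025026764718087" "AT000013621317897"
"DE000EH094Y118087" "XS025026764718087"
"XS025026764718087" "AT000013708817520"
end

* Save the distinct VADA values as a lookup file under a common key name
preserve
keep VADA
rename VADA key
duplicates drop
tempfile lookup
save `lookup'
restore

* Look up each V1D1 value; _merge == 3 means it also occurs in VADA
gen key = V1D1
merge m:1 key using `lookup', keep(master match)
gen byte found = (_merge == 3)
drop key _merge
list
```

This adds a single variable, found, instead of one indicator per distinct value, so it does not run into the maxvar limit.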
Liz owu

Join Date: May 2017

Posts: 30
#20

30 Apr 2018, 07:22

Thanks again, Andrew. I have about 42,000 observations, but the current match is on two variables. The code you provided returns the following error message:
******
local n: word count `V'
local first: word 1 of `V'
local last: word `n' of `V'
di `n'
invalid syntax
r(198);
Andrew Musau

Join Date: Oct 2014

Posts: 10187
#21

30 Apr 2018, 07:29

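Run the entire code in one go, adding only the one line. Local macros such as `V' exist only within the run that defines them, so they are empty if you run these lines on their own.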

Code:

levelsof VADA, local(V)
local n: word count `V'
di `n'
Liz owu

Join Date: May 2017

Posts: 30
#22

30 Apr 2018, 07:58

Thanks very much. Please see the output:
Successful

. di `n'
25831
thanks
Andrew Musau

Join Date: Oct 2014

Posts: 10187
#23

30 Apr 2018, 08:16

Thanks. Add this as the first line, before opening the dataset containing your variables:

Code:

set maxvar 27000
Liz owu

Join Date: May 2017

Posts: 30
#24

30 Apr 2018, 08:29

set maxvar 27000

error message:
no; data in memory would be lost

I saved all data on disk. Thanks again, Andrew.
Andrew Musau

Join Date: Oct 2014

Posts: 10187
#25

30 Apr 2018, 08:46

This has to run before you load any data. Save your commands, keep the original datasets, and start with:

Code:

clear
set maxvar 27000

and then the rest of the commands.
Liz owu

Join Date: May 2017

Posts: 30
#26

30 Apr 2018, 09:24

Thank you very much, Andrew. The code runs without syntax errors, but it finds zero observations, which is not correct.
There should be some V1D1 values found in VADA. Grateful if you could please advise, thanks.

****
levelsof VADA, local(V)
local n: word count `V'
local first: word 1 of `V'
local last: word `n' of `V'
di "`first'" " " "`last'"
foreach v in `V'{
gen `v' = regexm(V1D1, "`v'")
}
egen found= rowtotal(`first' - `last')
drop `first' - `last'
browse if found>0

end

Andrew Musau

Join Date: Oct 2014
Posts: 10187

#27

30 Apr 2018, 10:04

From #16, it appears that your combinations are not consistent:

Code:

V1D1                 VADA
AT0000136213 17897   18087 XS0250267647
AT0000136288 17897   18087 DE000EH094Y1
AT0000136312 17897   18087 ES0413770019
AT0000137088 17520   18087 XS0250267647
AT0000137088 17129   18088 ES0414840274
AT0000137088 17402   18088 XS0286031777
AT0000137088 17647   18088 ES0347858005
AT0000137088 17804   18088 ES0413790009
AT0000137088 17897   18091 ES0347859003
AT0000137088 16765   18091 XS0250267647
AT0000137203 17897   18091 FR0010770529

For V1D1, you have ID-DATE and for VADA you have DATE-ID. This could be the issue. Just look at how the example I provide is set up and check whether there are inconsistencies in your set-up.


Liz owu

Join Date: May 2017

Posts: 30
#28

30 Apr 2018, 11:25

Dear Andrew,
Thanks for the help again. I checked the order of the variables and they are consistent. However, it is strange that no V1D1 can be found in VADA, since V1D1 is a subset of VADA.
Thanks again for all your support and help. I am surprised that it is this complicated to match variables in Stata....
Best, Liz
******

VADA V1D1
XS025026764718087 AT000013621317897
DE000EH094Y118087 AT000013628817897
ES041377001918087 AT000013631217897
XS025026764718087 AT000013708817520
ES041484027418088 AT000013708817129
XS028603177718088 AT000013708817402
ES034785800518088 AT000013708817647
ES041379000918088 AT000013708817804
ES034785900318091 AT000013708817897

Andrew Musau

Join Date: Oct 2014
Posts: 10187

#29

30 Apr 2018, 12:36

This routine picks up matches in terms of both ID and date. In the example below:

Code:

input str12 var_code1 str9 date1 str10 date_app str12 var_app
"ES0305085005" "28-Dec-17" "11/25/2014" "XS0528006090"
"ES0305085005" "28-Sep-17" "11/25/2014" "ES0374273003"
"ES0305085005" "29-Jun-17" "11/25/2014" "IT0004790918"
"ES0305085005" "30-Mar-17" "11/25/2014" "IT0004790918"
"ES0305085005" "24-Sep-15" "11/25/2014" "IT0004790918"
"ES0305085005" "29-Sep-16" "11/25/2014" "IT0004790918"
"ES0305085005" "31-Dec-15" "11/26/2014" "ES0374273003"
"ES0305085005" "31-Mar-16" "11/27/2014" "XS1135366240"
"ES0305085005" "30-Jun-16" "11/27/2014" "XS1135365515"
"ES0305085005" "29-Dec-16" "11/27/2014" "XS1135365788"
"XS1314233732" "29-Sep-16" "11/28/2014" "IT0004790918"
"XS1314233732" "29-Jun-17" "11/28/2014" "IT0004790918"
"XS1314233732" "29-Dec-16" "12/31/2015" "ES0305085005"
"XS1314233732" "28-Dec-17" "12/01/2014" "XS0572338936"
"XS1314233732" "30-Mar-17" "12/01/2014" "XS0572336997"
"XS1314233732" "30-Jun-16" "12/28/2017" "ES0305085005"
end

Here the ID and date of one combination are in the same observation, and the matching ID and date of the other combination are likewise together in one observation. Maybe you have matches in terms of ID but not in terms of both ID and date. Can you manually pick out an observation for one combination that matches an observation for the other combination? If you need matches only in terms of ID, then you don't need to create a variable that combines ID and date.
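One way to check a single pair by hand is to bring both dates to the same numeric form before combining them with the ID. The following is only a sketch under assumptions: it uses two rows from the example above, assumes two-digit years mean 20xx, and the names d1, d2, key1 and key2 are hypothetical.

```stata
clear
input str12 var_code1 str9 date1 str10 date_app str12 var_app
"ES0305085005" "31-Dec-15" "11/26/2014" "ES0374273003"
"XS1314233732" "29-Dec-16" "12/31/2015" "ES0305085005"
end

* Convert both string dates to Stata daily dates (days since 01jan1960)
gen d1 = date(date1, "DM20Y")     // "31-Dec-15", two-digit year read as 20xx
gen d2 = date(date_app, "MDY")    // "12/31/2015"
format d1 d2 %td

* Build both keys in the same order, ID then date, with no spaces
gen key1 = var_code1 + string(d1)
gen key2 = var_app + string(d2)
list key1 key2
```

In this sketch, key1 in the first observation and key2 in the second refer to the same ID and the same day, so they should be identical strings; if a pair that you expect to match comes out different, the two date variables are probably not on the same scale.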

Last edited by Andrew Musau; 30 Apr 2018, 12:38.
