Creating a dummy if value from other dataset is found

Ali Joser

Join Date: Mar 2021

Posts: 14
#1

Creating a dummy if value from other dataset is found

03 Mar 2021, 04:26

Dear Statalist users,

The past few days I have been trying to accomplish something, which I till now haven't succeeded with. I hope that you can guide me in the right direction with your expertise.

I have the following database, which contains information about the composition of boards of firms, over several years. Let us call this dataset1.

Code:

input str12 ISIN str4 Year long DirectorID double GenderRatio "AN9225711045" "2008" 796988 .889 "AN9225711045" "2008" 797030 .889 "AN9225711045" "2008" 797004 .889 "AN9225711045" "2008" 797036 .889 "AN9225711045" "2008" 797010 .889 "AN9225711045" "2008" 797026 .889 "ANN2879J1070" "2010" 1139247 .875 "ANN4327C1220" "2002" 11445 .875 "ANN4327C1220" "2002" 11728 .875 "ANN4327C1220" "2016" 203384 1 "ANN4327C1220" "2017" 90627 1 "ANN4327C1220" "2017" 16795 1 "ANN4327C1220" "2017" 203384 1 end

I also have a database that contains firms that engaged in M&As. Here, we can find firms that have initiated a bid. Sometimes, alongside the bid, the deal value is also stated. Let us call this dataset 2.

Code:

input str12 AcquirorISINnumber str4 Year str17 DealvaluethEUR "NL0011333752" "2008" "61,900,000.00*" "AN9225711045" "2011" "32,155,000.00*" "IT0000072618" "2008" "61,900,000.00*" "IT0000072618" "2009" "" "IT0000072618" "2010" "" "ANN4327C1220" "2002" "22,277,952.64*" end

What I want to accomplish is the following. I want to create a dummy variable in dataset1 (the one with the board information), which takes the value of 1 if an M&A bid is initiated in a given firm (ISIN) and year and 0 if no bid is initiated. I think this can be accomplished if the same ISIN and year is found in dataset2 (the one with M&A information). Alongside that, I want to create a new variable that takes the deal value, if that is reported.

Although I think the latter can be achieved by a merge, I do not know how to accomplish the dummy variable between two datasets. Can anyone guide me in the right direction or knows how to do this?

Thank you for your time and efforts.
Tags: None
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

03 Mar 2021, 08:00

What if there are two bids from the same firm in the same year? If this is possible, then as in your previous Statalist topic, you will again need the joinby command, and when there are multiple bids, each director in that firm and year in database1 will now have multiple observations in the resulting database.

If there could be no more than one bid from a firm in a given year, then the following

Code:

use databset1 generate AcquirorISINnumber = ISIN merge m:1 AcquirorISINnumber Year using dataset2 drop if _merge==2 generate dummy = _merge==3 drop AcquirorISINnumber _merge

would result in the merge you need, and the _merge indicator variable would let you know which observations were matched and thus allow you to create the dummy variable you want.

The logic for joinby would be similar.
1 like
Comment
Ali Joser

Join Date: Mar 2021

Posts: 14
#3

03 Mar 2021, 13:06

William Lisowski

Thank you for sharing your expertise, again. Yes, there are multiple instances of the same firm initiating multiple bids (even more than 2) in the same year. Each director is in my dataset for calculating the gender and independence ratio for every firm-year. Now that I have those, I can also drop them and have those ratios for each firm-year (after plugging in the M&A bids). I will therefore search for a way to delete duplicates of ISIN-year in Stata I guess. Do you think that will give problems with my

Code:

joinby

approach?

Thank you again for your time and efforts.
Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

03 Mar 2021, 13:57

If you no longer need the director information in dataset1, you should immediately reduce the dataset to one observation per firm-year. Then you will not need to use joinby.

Code:

use dataset1
drop DirectorID
// make sure the GenderRatio is really the same for every director in each firm-year
by ISIN Year, sort: assert GenderRatio==GenderRatio[1]
duplicates drop 
list, abbreviate(16)
generate AcquirorISINnumber = ISIN
merge 1:m AcquirorISINnumber Year using dataset2
drop if _merge==2
generate dummy = _merge==3
drop AcquirorISINnumber _merge
sort ISIN Year
list, abbreviate(16) sepby(ISIN Year)

I ran this code on your example data, having first added a pair of bids for one firm in the same year.

Code:

. list, abbreviate(16) sepby(ISIN Year)

     +------------------------------------------------------------+
     |         ISIN   Year   GenderRatio   DealvaluethEUR   dummy |
     |------------------------------------------------------------|
  1. | AN9225711045   2008          .889                        0 |
     |------------------------------------------------------------|
  2. | ANN2879J1070   2010          .875                        0 |
     |------------------------------------------------------------|
  3. | ANN4327C1220   2002          .875   22,277,952.64*       1 |
     |------------------------------------------------------------|
  4. | ANN4327C1220   2016             1          123.45!       1 |
  5. | ANN4327C1220   2016             1          543.21!       1 |
     |------------------------------------------------------------|
  6. | ANN4327C1220   2017             1                        0 |
     +------------------------------------------------------------+

Comment

Ali Joser

Join Date: Mar 2021

Posts: 14
#5

04 Mar 2021, 06:51

William Lisowski

Thank you very much. The merging seems to go right, but I am running into two issues.

First, when running

Code:

by ISIN Year, sort: assert GenderRatio==GenderRatio[1]

I get

Code:

Assertion is false

. I am not sure, but I think it has to do with the second issue I am running into, stated below.

The second issue is that when I drop DirectorID and all other variables that are linked with directors (like name, age etc.), I still keep multiple values for firms and years. How is this possible, and how can I only have one row/entry per firm and year that states the variables I want and that we have calculated (like gender ratio)?

I hope I have made myself clear. If not please let me know. Thank you for your time and efforts.
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#6

04 Mar 2021, 07:03

I do not understand how the merging seems to go right. Your do-file should stop when the assert command fails. The comment

Code:

// make sure the GenderRatio is really the same for every director in each firm-year

tells you that the assert command is finding that for some combination of ISIN and Year, all the values of GenderRatio are not the same. The do-file then stops running. Did you then select the lines after the assert command and run them? When you get a message like

Code:

assertion is false r(9);

that is an indication of an error that you need to fix, not ignore.

Why is it that GenderRatio is not he same within some firm-year combinations, as it was in your example data? To identify some examples

Code:

by ISIN Year, sort: generate wrong = max(GenderRatio!=GenderRatio[1]) list if wrong, sepby(ISIN Year)

which will list the observations for the ISIN/Year combinations where GenderRatio varies.

Last edited by William Lisowski; 04 Mar 2021, 07:11.
1 like
Comment
Ali Joser

Join Date: Mar 2021

Posts: 14
#7

05 Mar 2021, 07:59

William Lisowski

You are absolutely right, when running the commands in a do-file the do-file stops at the

Code:

assertion is false

command. I meant that the merging seems to go correct if I skip the assertion commands and go straight to the merge command. I know this is not correct to do and I should not ignore it. It was only to test if the command does what I want, I did not save it since I know I need to fix the assertion issue first.

So, following your suggestion to see what is going on I tried to run

Code:

by ISIN Year, sort: generate wrong = max(GenderRatio!=GenderRatio[1])

. However, I am getting

Code:

invalid syntax r(198);

.

I tried to fix it myself by searching the forum for a similar command, but I did not manage to find what is wrong with the command. Did you maybe misspell anything or have an idea what the issue might be?
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#8

05 Mar 2021, 10:52

Change generate to egen, my mistake.

Code:

by ISIN Year, sort: egen wrong = max(GenderRatio!=GenderRatio[1])

This time I tested it, having first changed observation 9 in dataset 1 to produce a failure.

Code:

. input str12 ISIN str4 Year long DirectorID double GenderRatio ISIN Year DirectorID GenderRa~o 1. "AN9225711045" "2008" 796988 .889 2. "AN9225711045" "2008" 797030 .889 3. "AN9225711045" "2008" 797004 .889 4. "AN9225711045" "2008" 797036 .889 5. "AN9225711045" "2008" 797010 .889 6. "AN9225711045" "2008" 797026 .889 7. "ANN2879J1070" "2010" 1139247 .875 8. "ANN4327C1220" "2002" 11445 .875 9. "ANN4327C1220" "2002" 11728 .874 10. "ANN4327C1220" "2016" 203384 1 11. "ANN4327C1220" "2017" 90627 1 12. "ANN4327C1220" "2017" 16795 1 13. "ANN4327C1220" "2017" 203384 1 14. end . by ISIN Year, sort: egen wrong = max(GenderRatio!=GenderRatio[1]) . list if wrong +---------------------------------------------------+ | ISIN Year Direct~D Gender~o wrong | |---------------------------------------------------| 8. | ANN4327C1220 2002 11445 .875 1 | 9. | ANN4327C1220 2002 11728 .874 1 | +---------------------------------------------------+

Let me add some advice for you as a new user of Stata.

I'm sympathetic to you as a new user of Stata - there is quite a lot to absorb. And even worse if perhaps you are under pressure to produce some output quickly. Nevertheless, I'd like to encourage you to take a step back from your immediate tasks.

When I began using Stata in a serious way, I started, as have others here, by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. There are a lot of examples to copy and paste into Stata's do-file editor to run yourself, and better yet, to experiment with changing the options to see how the results change.

All of these manuals are included as PDFs in the Stata installation and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu. The objective in doing the reading was not so much to master Stata - I'm still far from that goal - as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and PDF manuals.

In this case, you apparently have never encountered the egen command, which would be the obvious place to look when the generate command fails.

Stata supplies exceptionally good documentation that amply repays the time spent studying it - there's just a lot of it. The path I followed surfaces the things you need to know to get started in a hurry and to work effectively. And it's much more effective than losing over 24 hours on your work because someone else (me) gave you incorrect advice.
1 like
Comment
Ali Joser

Join Date: Mar 2021

Posts: 14
#9

05 Mar 2021, 12:04

William Lisowski

That works perfectly now. I have found the contradictions and will investigate what the case is.

I will surely dive into the manuals before continuing my work. Will indeed save time and posting here. Thank you for all your help and for your suggestion!
Comment

Announcement

Creating a dummy if value from other dataset is found

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment