merging datasets

Anastasia Lysikov

Join Date: Jun 2023

Posts: 1
#1

13 Jun 2023, 06:01

Hi, I am currently writing my thesis on the effect of firm characteristics on IPO underpricing. My dependent variable is thus underpricing, and my independent variables are firm age, firm size, profitability, and growth potential. Firm size is measured by total assets; firm age is simply the number of years between the founding of the firm and its IPO; profitability is measured by ROA; and growth potential by R&D expenses. However, I am having trouble merging the datasets. I have one Excel sheet with the data for firm size, profitability, and R&D expenses, and another Excel sheet with the underpricing % and the age of the firm. The two sheets are linked by ticker codes. Does anyone have tips on how to merge these two sheets into one big one with all the corresponding information?
Thank you in advance!!
Ken Chui

Join Date: Aug 2014

Posts: 1060
#2

13 Jun 2023, 07:20

Welcome to Statalist.

Generally, the command is one of "merge 1:1", "merge m:1", or "merge 1:m". Type "help merge" in Stata to see the documentation.

Which one to choose depends on the identification variable, such as the "ticker codes" mentioned in the question. The question is whether the codes are unique in each data set. By unique, I mean that each code appears only ONCE in the data set.
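
One way to check uniqueness before merging is sketched below. It uses a small made-up data set, and the variable names ticker, underpricing, and dup are assumptions for illustration only; substitute the names in your own data.

```stata
* Toy data for illustration; ticker is an assumed variable name
clear
input str4 ticker double underpricing
"AAA" 12.5
"BBB" 3.1
"CCC" 7.8
end

* isid stops with an error if ticker does not uniquely identify the observations
isid ticker

* duplicates report shows how many observations share a ticker value
duplicates report ticker

* Tag repeated tickers so they can be inspected (dup counts the extra copies)
duplicates tag ticker, generate(dup)
list ticker if dup > 0
```

If isid runs without an error, the codes are unique in that data set. Run the same check on each of the two data sets before choosing the type of merge.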

Assuming the data opened in Stata is called A, and the data to be merged is called B:
If ticker codes in A are unique and ticker codes in B are also unique, then merge 1:1

If ticker codes in A are repeated and ticker codes in B are unique, then merge m:1

If ticker codes in A are unique and ticker codes in B are repeated, then merge 1:m

If ticker codes in both A and B are repeated, then you'll have to find an additional variable to combine with the ticker code so that each observation can be uniquely identified. In any case, DO NOT USE "merge m:m", as the results are nearly always wrong.
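
As a minimal sketch of the 1:1 case, the do-file below builds two small made-up data sets in place of the two Excel sheets and merges them on the ticker code. All variable names and values are hypothetical. With real files, each toy "input" block would be replaced by something like: import excel using "myfile.xlsx", firstrow clear (the file name is an assumption).

```stata
* Toy stand-in for sheet B (underpricing and age); values are made up
clear
input str4 ticker double underpricing int age
"AAA" 12.5 10
"BBB" 3.1 4
"CCC" 7.8 25
end
isid ticker              // confirm ticker is unique in B
tempfile B
save `B'

* Toy stand-in for sheet A (firm size, ROA, R&D); values are made up
clear
input str4 ticker double total_assets double roa double rd_expense
"AAA" 500 0.05 20
"BBB" 120 -0.02 35
"DDD" 900 0.10 0
end
isid ticker              // confirm ticker is unique in A

* Both sides unique, so a 1:1 merge on ticker is appropriate
merge 1:1 ticker using `B'

* _merge shows where each observation came from: 1 = A only, 2 = B only, 3 = both
tabulate _merge

* Keep only firms found in both data sets, then drop the indicator
keep if _merge == 3
drop _merge
```

Always look at the _merge tabulation before dropping anything, since unmatched tickers often point to spelling or formatting differences between the two sheets. If A instead had several rows per ticker, the same call with "merge m:1 ticker using" would apply. If both were repeated, a second key could be added, for example a hypothetical year variable: "merge 1:1 ticker year using".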