Merge panel data with same variables but different observations

Victoria Zorzi

Join Date: Mar 2020

Posts: 18
#1

Merge panel data with same variables but different observations

14 Mar 2020, 04:10

Hi

I am trying to merge two different datasets with panel data. One of them contains data about sub-saharan African countries and China's voting behaviour in UNGA from 2000-2014 (with the main variables being countryname and year and vote) and the other one contains data about China's foreign aid to countries in sub-saharan African countries from 2000-2014 (with the main variables being countryname and year and amount).

However, I am not sure how to merge these two dataset as they contain various amount of observations within the given year and countries - there are a lot more observations pr year and country in the UNGA dataset than in the one with foreign aid.

Thank you in advance!
Tags: merge, panel, panel data, Time Series
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

14 Mar 2020, 09:51

Welcome to Statalist.

There is no easy answer to your question.

The answer depends on how you intend to use the resulting data.

It seems to me that for the foreign aid data, you may well want to know aggregate amounts by country and year - the number and size of each separate instance of foreign aid in a year may not matter to you. If that is the case, the collapse command may help you reduce the foreign aid to one observation per country and year, which you can then merge with the voting behavior. That would give you a file of voting information that also has the year's foreign aid information attached to each observation.

Here is a sketch of the sort of code that would use.

Code:

use foreignaid collapse (sum) amount, by(countryname year) merge 1:m countryname year using voting
1 like
Comment
Victoria Zorzi

Join Date: Mar 2020

Posts: 18
#3

15 Mar 2020, 07:12

Originally posted by William Lisowski View Post

Welcome to Statalist.

There is no easy answer to your question.

The answer depends on how you intend to use the resulting data.

It seems to me that for the foreign aid data, you may well want to know aggregate amounts by country and year - the number and size of each separate instance of foreign aid in a year may not matter to you. If that is the case, the collapse command may help you reduce the foreign aid to one observation per country and year, which you can then merge with the voting behavior. That would give you a file of voting information that also has the year's foreign aid information attached to each observation.

Here is a sketch of the sort of code that would use.

Code:

use foreignaid collapse (sum) amount, by(countryname year) merge 1:m countryname year using voting

Hi William

Thank you very much - it worked.

I now merged the two datasets. However, there are a significant amount of 'not matched' observations. Is this something I should be worried about? Would it be better if I just gathered the number of times a country voted the same as China in each year? However, I would like to use the type of UN resolution as a control variable, which I am assuming won't be possible if I pool the numbers together.

What I am trying to investigate is whether the amount of foreign aid from China the countries have received throughout the years has impacted their voting behaviour in UNGA - so that the more they receive, the more they are likely to vote the same as China.

I am, as you can probably tell, very inexperienced with panel data, so I am sorry for all of the questions. Perhaps you may know of a website, book or something alike that explains the use of panel data in STATA well?

Thank you in advance.
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

15 Mar 2020, 09:26

I now merged the two datasets. However, there are a significant amount of 'not matched' observations. Is this something I should be worried about?

It is not necessarily something you should be worried about, but you do need to address it.

To effectively discuss how to treat the unmatched observations, we need you to present the merge command you ran and the output it gave you.

Please copy this from Stata's Results window and paste it into a code block in the Forum editor using code delimiters [CODE] and [/CODE], as explained in section 12 of the Statalist FAQ linked to at the top of the page. For example, the following:

[CODE]
. sysuse auto, clear
(1978 Automobile Data)

. describe make price

storage display value
variable name type format label variable label
-----------------------------------------------------------------
make str18 %-18s Make and Model
price int %8.0gc Price
[/CODE]

will be presented in the post as the following:

Code:

. sysuse auto, clear (1978 Automobile Data) . describe make price storage display value variable name type format label variable label ----------------------------------------------------------------- make str18 %-18s Make and Model price int %8.0gc Price

And to increase the effectiveness of your future posts on Statalist, please take a few moments to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

The better you help others understand your problem, the more likely others are to be able to help you solve your problem.
Comment
Victoria Zorzi

Join Date: Mar 2020

Posts: 18
#5

16 Mar 2020, 05:48

Originally posted by William Lisowski View Post

It is not necessarily something you should be worried about, but you do need to address it.

To effectively discuss how to treat the unmatched observations, we need you to present the merge command you ran and the output it gave you.

Please copy this from Stata's Results window and paste it into a code block in the Forum editor using code delimiters [CODE] and [/CODE], as explained in section 12 of the Statalist FAQ linked to at the top of the page. For example, the following:

[CODE]
. sysuse auto, clear
(1978 Automobile Data)

. describe make price

storage display value
variable name type format label variable label
-----------------------------------------------------------------
make str18 %-18s Make and Model
price int %8.0gc Price
[/CODE]

will be presented in the post as the following:

Code:

. sysuse auto, clear (1978 Automobile Data) . describe make price storage display value variable name type format label variable label ----------------------------------------------------------------- make str18 %-18s Make and Model price int %8.0gc Price

And to increase the effectiveness of your future posts on Statalist, please take a few moments to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

The better you help others understand your problem, the more likely others are to be able to help you solve your problem.

Thank you once again.

I am using STATA 16.0 and want to investigate whether the amount of foreign aid the countries have received from China in the period between 2000-2014 has had an impact on the countries voting behaviour in UNGA in the same period. I expect to find that the bigger amount of foreign aid the countries have received the more likely they are to have changed their voting behaviour to voting in line with China.

The results I got was the following:

Code:

. use Aiddataa . merge 1:m country year using UNGAdata.dta Result # of obs. ----------------------------------------- not matched 31,388 from master 33 (_merge==1) from using 31,355 (_merge==2) matched 28,265 (_merge==3) -----------------------------------------
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#6

16 Mar 2020, 11:52

Thank you.

There were 33 observations on your Aiddataa dataset that were not matched in your UNGAdata. Those are presumably country/year combinations which received aid but for which you have no matching voting data. So those observations can be dropped, since without voting data they don't tell you anything about voting and aid.

There were 31,335 observations in your UNGAdata that were not matched in your Aiddataa dataset. These presumably are votes for which the country did not receive foreign aid from China during the year of the vote.

Some of these will be votes made by China, unless you have already dropped them from the UNGAdata. You will want to drop any observations from China in your merged dataset.

The rest of these unmatched observations are from countries other than China. For them, you will have missing values on your variables of interest coming from the Aiddataa dataset. You will want to replace those missing values with values that are appropriate for a country that received no aid.

You will also want to think about which of these countries you will be including in your models. That is, if the Duchy of Grand Fenwick (a made-up country name) received aid from China in 13 of your 15 years, but not in two of them, they you will likely want to include the Duchy of Grand Fenwick for all 15 years, with values appropriate for having received no data in the two years. If on the other hand, Oceania (another made-up country name) never received aid from China, then perhaps it is not suitable to include it in your modeling.
Comment
Victoria Zorzi

Join Date: Mar 2020

Posts: 18
#7

25 Mar 2020, 07:37

Originally posted by William Lisowski View Post

Thank you.

There were 33 observations on your Aiddataa dataset that were not matched in your UNGAdata. Those are presumably country/year combinations which received aid but for which you have no matching voting data. So those observations can be dropped, since without voting data they don't tell you anything about voting and aid.

There were 31,335 observations in your UNGAdata that were not matched in your Aiddataa dataset. These presumably are votes for which the country did not receive foreign aid from China during the year of the vote.

Some of these will be votes made by China, unless you have already dropped them from the UNGAdata. You will want to drop any observations from China in your merged dataset.

The rest of these unmatched observations are from countries other than China. For them, you will have missing values on your variables of interest coming from the Aiddataa dataset. You will want to replace those missing values with values that are appropriate for a country that received no aid.

You will also want to think about which of these countries you will be including in your models. That is, if the Duchy of Grand Fenwick (a made-up country name) received aid from China in 13 of your 15 years, but not in two of them, they you will likely want to include the Duchy of Grand Fenwick for all 15 years, with values appropriate for having received no data in the two years. If on the other hand, Oceania (another made-up country name) never received aid from China, then perhaps it is not suitable to include it in your modeling.

Thank you.

I managed to merge my data so that all the observations now match. However, as I understand it I will need to use China's voting data as a control variable. How do I merge this into the existing data set that includes all of the countries voting behaviour as well as the amount of foreign aid they have received?

Thank you in advance.
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#8

26 Mar 2020, 14:30

I think that to answer this question correctly, we need to see examples of your original datasets. I at least have trouble understanding how one dataset would contain both (a) data about sub-saharan African countries and (b) data about China's voting record with the main variables being countryname and year and vote.

Even the best descriptions of data are no substitute for an actual example of the data. In order to get a helpful response, you need to show some example data from each of your datase

Be sure to use the dataex command to do this. If you are running version 15.1 or later, or a fully updated version 14.2, dataex is already part of your official Stata installation. If not, run ssc install dataex to get it. Either way, run help dataex and read the simple instructions for using it. dataex will save you time; it includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use dataex.
Comment

Victoria Zorzi

Join Date: Mar 2020
Posts: 18

27 Mar 2020, 02:36

Originally posted by William Lisowski View Post

I think that to answer this question correctly, we need to see examples of your original datasets. I at least have trouble understanding how one dataset would contain both (a) data about sub-saharan African countries and (b) data about China's voting record with the main variables being countryname and year and vote.

Even the best descriptions of data are no substitute for an actual example of the data. In order to get a helpful response, you need to show some example data from each of your datase

Be sure to use the dataex command to do this. If you are running version 15.1 or later, or a fully updated version 14.2, dataex is already part of your official Stata installation. If not, run ssc install dataex to get it. Either way, run help dataex and read the simple instructions for using it. dataex will save you time; it includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use dataex.

The example of my data that includes the voting data from the Sub-saharan African countries and the amount of aid they have received from China can be seen here:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str32 country int year double amount byte vote
"Angola"                   2000          0 1
"Angola"                   2001     600000 1
"Angola"                   2002  211204087 2
"Angola"                   2003   11000000 8
"Angola"                   2004          0 1
"Angola"                   2005    2000600 1
"Angola"                   2006     200000 1
"Angola"                   2007   21835330 1
"Angola"                   2008  115192000 1
"Angola"                   2009          0 1
"Angola"                   2010          0 1
"Angola"                   2011     309528 1
"Angola"                   2012          0 2
"Angola"                   2013   32280149 1
"Angola"                   2014  248956545 1
"Benin"                    2000          0 1
"Benin"                    2001          0 1
"Benin"                    2002          0 1
"Benin"                    2003          0 1
"Benin"                    2004          0 1
"Benin"                    2005          0 1
"Benin"                    2006          0 1
"Benin"                    2007          0 1
"Benin"                    2008   46052078 1
"Benin"                    2009   12706849 1
"Benin"                    2010          0 1
"Benin"                    2011          0 1
"Benin"                    2012          0 1
"Benin"                    2013   43721150 1
"Benin"                    2014     840000 1
"Botswana"                 2000          0 1
"Botswana"                 2001          0 1
"Botswana"                 2002          0 1
"Botswana"                 2003   96652948 1
"Botswana"                 2004  236968830 1
"Botswana"                 2005          0 1
"Botswana"                 2006   28510023 1
"Botswana"                 2007    2628974 1
"Botswana"                 2008     161128 1
"Botswana"                 2009      27952 1
"Botswana"                 2010  3.248e+08 1
"Botswana"                 2011     241361 1
"Botswana"                 2012          0 1
"Botswana"                 2013          0 1
"Botswana"                 2014      19496 1
"Burundi"                  2000          0 1
"Burundi"                  2001   45722577 1
"Burundi"                  2002          0 1
"Burundi"                  2003          0 1
"Burundi"                  2004   15820000 1
"Burundi"                  2005          0 1
"Burundi"                  2006   12544835 2
"Burundi"                  2007   57867899 1
"Burundi"                  2008   33372862 1
"Burundi"                  2009          0 1
"Burundi"                  2010          0 3
"Burundi"                  2011          0 8
"Burundi"                  2012    1546570 1
"Burundi"                  2013          0 1
"Burundi"                  2014   15993877 8
"Cameroon"                 2000          0 1
"Cameroon"                 2001   24525592 2
"Cameroon"                 2002    5738978 8
"Cameroon"                 2003     313146 8
"Cameroon"                 2004          0 1
"Cameroon"                 2005          0 1
"Cameroon"                 2006          0 1
"Cameroon"                 2007  140425956 1
"Cameroon"                 2008          0 1
"Cameroon"                 2009 1069542049 8
"Cameroon"                 2010   46507400 1
"Cameroon"                 2011  6.416e+08 1
"Cameroon"                 2012 2444679367 1
"Cameroon"                 2013     607238 1
"Cameroon"                 2014          0 1
"Cape Verde"               2000          0 1
"Cape Verde"               2001          0 1
"Cape Verde"               2002          0 2
"Cape Verde"               2003          0 1
"Cape Verde"               2004          0 8
"Cape Verde"               2005   31577648 1
"Cape Verde"               2006    6908967 1
"Cape Verde"               2007          0 1
"Cape Verde"               2008          0 3
"Cape Verde"               2009          0 1
"Cape Verde"               2010          0 1
"Cape Verde"               2011          0 1
"Cape Verde"               2012          0 1
"Cape Verde"               2013          0 1
"Cape Verde"               2014          0 1
"Central African Republic" 2000          0 8
"Central African Republic" 2001          0 8
"Central African Republic" 2002          0 8
"Central African Republic" 2003          0 1
"Central African Republic" 2004          0 1
"Central African Republic" 2005          0 8
"Central African Republic" 2006          0 1
"Central African Republic" 2007     418301 1
"Central African Republic" 2008          0 8
"Central African Republic" 2009          0 8
end

My dataset that includes China's voting behaviour in UNGA can be seen by the following example:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str5 country int year byte vote
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 2
"China" 2000 1
"China" 2000 2
"China" 2000 1
"China" 2000 2
"China" 2000 2
"China" 2000 1
"China" 2000 2
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 2
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 2
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 3
"China" 2000 2
"China" 2000 3
"China" 2000 2
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2000 1
"China" 2001 1
"China" 2000 1
"China" 2001 1
"China" 2001 1
"China" 2001 1
"China" 2001 1
"China" 2001 1
"China" 2001 1
"China" 2001 1
"China" 2001 1
"China" 2001 1
"China" 2001 2
"China" 2001 1
"China" 2001 1
"China" 2001 1
"China" 2001 2
"China" 2001 2
"China" 2001 1
"China" 2001 2
"China" 2001 1
"China" 2001 1
"China" 2001 2
"China" 2001 1
"China" 2001 1
"China" 2001 1
"China" 2001 1
"China" 2001 1
"China" 2001 1
"China" 2001 1
"China" 2001 1
"China" 2001 1
"China" 2001 1
"China" 2001 1
"China" 2001 1
end

I am considering using the Chinese voting behaviour as a control variable - however, as previously stated, I am unsure about how to merge these dataset.

Comment

William Lisowski

Join Date: Dec 2014

Posts: 10150
#10

28 Mar 2020, 13:10

I cannot imagine what you want to do with this. You have 66 observations of China in the year 2000, with a total of 79 votes. If you were doing this by hand, what would you expect to add onto the following observations from your voting data?

Code:

"Angola" 2000 0 1 "Benin" 2000 0 1 "Botswana" 2000 0 1 "Burundi" 2000 0 1 "Cameroon" 2000 0 1 "Cape Verde" 2000 0 1 "Central African Republic" 2000 0 8
Comment

Announcement