Merging two cross sectional data as pseudo panel

akwa amps

Join Date: Sep 2016

Posts: 9
#1

15 Sep 2016, 21:25

Hi everyone, I have an urgent request and humbly ask for your help.
I wish to merge two cross-sectional data sets so as to estimate the pre- and post-policy effect.
I guess this makes a pseudo-panel data set, and I want to know whether it would be appropriate to use the -merge- command with year of birth and sex as the one-to-one key variables.
I would be glad to hear of different ways of doing this.
Thank you for your timely response.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

16 Sep 2016, 00:03

More information is needed to answer your question. It would be helpful if you showed brief, representative examples from both data sets. You also need to describe what kind of analysis you plan to do with the data. In general, if you want to make a pseudo-panel out of two cross-sections, the -append- command would more often be appropriate than -merge-. But perhaps you are trying to form matched pairs based on birth year and sex. There are ways to do that, and they may involve -merge-, but it would be surprising to see a data set in which year of birth and sex uniquely identify the observations, unless the observations are actually aggregate data.

To show examples of your data, download and install the -dataex- command by running -ssc install dataex-. Read -help dataex- for the simple instructions, and then use it for each of the two data sets. Then describe the analysis you plan to do, specifically referring to the variables in your data by name. You will be able to get a more specific answer then.
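
As a rough sketch of the -append- approach mentioned above: since no example data have been posted yet, the code below uses two years of the -nlswork- example data set as stand-ins for the two cross-sections. The choice of data set, the years 68 and 70, the tempfile name second, and the variable name wave are all assumptions made for illustration, not part of the original question.

```stata
* Stand-in data: two years of nlswork play the role of the two surveys
webuse nlswork, clear

* Save the "second survey" (year 70) to a temporary file
preserve
keep if year == 70
tempfile second
save `second'
restore

* Keep the "first survey" (year 68) in memory and stack the second below it
keep if year == 68
append using `second', generate(wave)

* wave is 0 for the data in memory before -append-, 1 for the appended file
tabulate wave

* Check whether birth year and race identify observations uniquely within a wave
duplicates report birth_yr race wave
```

If -duplicates report- shows many surplus observations, the candidate key does not identify individuals, and a one-to-one -merge- on it would not be possible.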
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#3

16 Sep 2016, 00:33

Akwa:
as an aside to Clyde's excellent advice, it would also be interesting to know whether the two pseudo-panels refer to the same sample units (e.g., the dependent variable is the gross domestic product of the same set of nations before and after a hypothetical embargo) or not.

Kind regards,
Carlo
(Stata 19.0)
akwa amps

Join Date: Sep 2016

Posts: 9
#4

19 Sep 2016, 23:32

Thank you, Clyde and Carlo. Yes, the data come from a household survey of Uganda. Respondents were asked the same questions in both surveys, which contain demographic and economic information on the respondents. What I intend to do is pair these respondents by their month and year of birth, together with their sex and ethnic background. I am hoping there will be some possible matches. I intend to estimate the effect of a wage policy change for public sector workers in Uganda, which came after the first survey but before the second. So I want to know how to go about this. Though I can pool the data and use difference-in-differences (DID) for this, I would like to support it with a matching technique, by having the two income variables and running propensity score matching too.
Unfortunately Clyde, I don't have the data set with me now. I am not by the computer with the data. I will surely do as you suggested with the dataex. Thank you
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#5

19 Sep 2016, 23:48

Akwa:
thanks for providing more details.
You have two surveys and respondents were not (necessarily) the same.
Before any analysis, you should get familiar with Stata's -svy- prefix.

Kind regards,
Carlo
(Stata 19.0)
akwa amps

Join Date: Sep 2016

Posts: 9
#6

22 Sep 2016, 01:13

Thank you very much, Mr. Carlo Lazzaro.
Yes, I checked the -svy- command, and I also read a paper by Deaton (1985). What I gathered was that, since the data are repeated cross-sections, using the ID from each survey to make a panel would be wrong, because the person with a given ID in the first survey might have a different ID in the second one; hence the use of invariant characteristics like year of birth and sex. So the data have to be merged using variables that do not change over time.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#7

22 Sep 2016, 02:30

Akwa:
I'm not sure I follow your last statements, although I agree that a panel cannot be created from your data, as the ids from the two surveys are, in all likelihood, different.
Just an aside: as per the FAQ, please post the full reference for everything you quote, as that contribution might be useful for others on the list. Thanks.

Kind regards,
Carlo
(Stata 19.0)
akwa amps

Join Date: Sep 2016

Posts: 9
#8

22 Sep 2016, 03:41

Thank you Carlo.
Here is the paper: Deaton, Angus (1985), "Panel Data from Time Series of Cross-Sections", Journal of Econometrics, 30, 109–126.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#9

22 Sep 2016, 03:47

Akwa:
well done, thanks.

Kind regards,
Carlo
(Stata 19.0)
Diana Abdwahab

Join Date: Apr 2016

Posts: 7
#10

01 Jan 2017, 12:12

If I understand correctly, you need to create a pseudo-panel data set from two cross-sectional data sets. It is understood that the observations in the two cross-sectional data sets are different individuals, and you need to create cohorts of individuals based on birth year and sex. Say you have variables x1 x2 x3; the suggested code is:

Code:

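* Cohort means for sex == 0, one row per birth year
use data.dta, clear
collapse x1 x2 x3 if sex==0, by(birth_year)
* -collapse- drops sex, so restore it to keep the cohorts distinguishable
gen sex = 0
save col1.dta, replace
* Cohort means for sex == 1
use data.dta, clear
collapse x1 x2 x3 if sex==1, by(birth_year)
gen sex = 1
save col2.dta, replace
* Stack the two sets of cohort means and number the cohorts
append using col1.dta
gen ID=_n
save pseudo.dta, replace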

Now you have a pseudo-panel data set where each observation (ID) is a cohort sharing the same sex and birth year. The total number of observations is now N(birth_year)*N(sex).
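
The same cohort means can probably be obtained in a single step by listing both grouping variables in by(). The sketch below is only an illustration: the toy values typed in with -input- are made up, and n_cell is a hypothetical name for the number of individuals behind each cohort mean.

```stata
* Toy individual-level data (made-up values, for illustration only)
clear
input birth_year sex x1
1970 0 10
1970 0 14
1970 1 20
1971 0 8
1971 1 30
1971 1 34
end

* One row per birth_year-by-sex cohort: the mean of x1 and the cell size
collapse (mean) x1 (count) n_cell = x1, by(birth_year sex)

* Number the cohorts
egen ID = group(birth_year sex)

list, sepby(birth_year)
```

Keeping the cell size is useful because cohort means based on very few individuals are noisy.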
akwa amps

Join Date: Sep 2016

Posts: 9
#11

22 Feb 2017, 02:44

Hello Diana Abdwahab, thank you for your suggestion. I am sorry for the late reply.

akwa amps

Join Date: Sep 2016
Posts: 9

#12

22 Feb 2017, 02:47

I have been working on this and came up with the code below. I would like to know whether this is the right way of forming pseudo-panels as proposed by Deaton (1985).

Code:

clear
webuse nlswork
gen Byear= birth_yr
recode Byear (41/43=43) (54=53)
tab Byear
tab race
tab year
bysort Byear race year: egen newincome= mean(ln_wage)
bysort Byear race year: egen newgrade= mean( grade )
bysort Byear race year: egen newwks= mean( wks_work )
bysort Byear race year: egen newexp= mean(ttl_exp)
sum ln_wage grade wks_work ttl_exp newincome newgrade newwks newexp
egen Cohorts=group(Byear race)
xtset Cohorts
xtreg newincome newgrade newwks newexp,fe
estimates store FE1
xtset idcode
xtreg ln_wage grade wks_work ttl_exp,fe
estimates store FE2
esttab FE1 FE2

I look forward to hearing your take on this. Thank you.
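
For comparison, the same cohort definition can also be applied by collapsing the data to one row per cohort and year before -xtset-, so that the regression runs on cell means rather than on individual rows carrying duplicated means. This is only a sketch under assumptions, not a confirmed implementation of Deaton (1985): it reuses the variable names above, and n_cell is a hypothetical name for the cell size.

```stata
webuse nlswork, clear

* Same birth-year grouping as in the code above
gen Byear = birth_yr
recode Byear (41/43=43) (54=53)

* One row per cohort (Byear x race) and survey year, holding the cell means
collapse (mean) ln_wage grade wks_work ttl_exp (count) n_cell = idcode, ///
    by(Byear race year)

* Cohort identifier; cohort and year together identify each row
egen Cohorts = group(Byear race)
xtset Cohorts year

* Fixed-effects regression on the cohort means
xtreg ln_wage grade wks_work ttl_exp, fe
```

The n_cell variable is kept so that small cells can be inspected or dropped before estimation.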

samra khalid

Join Date: Apr 2017

Posts: 3
#13

29 Apr 2017, 00:35

Hi everyone, I am a PhD scholar in economics. My study uses primary data, and the unit of analysis is the household. Please advise: can I collect two observations on income (past income at the start of the job and current income at the time of the interview) at a single point in time through a questionnaire? I cannot use longitudinal data.
samra khalid

Join Date: Apr 2017

Posts: 3
#14

29 Apr 2017, 00:37

Please also advise me on pseudo-panel data and how I can use it. I am restricted to primary data: my topic has been approved by the advance board, so I have no other option.
samra khalid

Join Date: Apr 2017

Posts: 3
#15

29 Apr 2017, 00:40

For mobility analysis, the data need to cover a period of time so that we can examine the effect of current and past income.