Help on Thesis : Migrant wage gap Data Analysis.

mama kay

Join Date: Jul 2019

Posts: 16
#1

05 Jul 2019, 03:37

Hi,

My thesis assesses the impact of nationality on workers' earnings in the UK. I have downloaded the household survey and earnings data (7 years) and tried sorting the data, but I had great difficulty merging it into a single dataset in Stata.

Since I wasn't making any headway, I tried it in Excel using conditional formatting. The issue now is that I seem to have unbalanced panel data, in which an individual appears in at least 2 years but not in all 7 years.

My questions:

1. Can I go ahead and work with the dataset I currently have?
2. How do I run a regression on it?
3. What are the important things I need to do in Stata to analyse my data?

I would appreciate it if you could give me proper guidance, because I'm really confused.

I'm new to Stata and to research.

Sorry for the long post. I would appreciate a quick response.

Thanks
Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#2

05 Jul 2019, 03:56

For assembling your dataset, you should probably have a look at append rather than merge: https://www.stata.com/manuals13/dappend.pdf
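
As a rough sketch of what stacking yearly files with append could look like: the file names (earnings2010.dta to earnings2017.dta) are hypothetical, and the code assumes one file per survey year with identical variable names and a numeric surveyyear variable (or none at all).

```stata
* Sketch only: file names are hypothetical, one .dta file per survey year.
clear
generate int surveyyear = .          // year marker; filled in where a file lacks it

forvalues y = 2010/2017 {
    capture confirm file "earnings`y'.dta"
    if _rc == 0 {
        * stack this year's observations below those already in memory
        append using "earnings`y'.dta"
        * tag newly added observations that carry no year of their own
        replace surveyyear = `y' if missing(surveyyear)
    }
}

* each person-year should now be one observation
describe, short
```

Years for which no file exists are simply skipped, which is one way a range such as 2010 to 2017 can end up with gaps.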

Your data are panel data and are likely best analyzed with panel data methods. A good introduction to doing so in Stata is: https://www.princeton.edu/~otorres/Panel101.pdf
Many more such lecture slides can be found online.
Some more technical details and further examples can be found in Stata's manual for xtreg: https://www.stata.com/manuals13/xtxtreg.pdf
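
A minimal sketch of a panel regression along these lines, assuming the identifiers are called individualid and surveyyear; the other variable names (income, migrant, female, age, region) are hypothetical placeholders for whatever the survey actually provides.

```stata
* Sketch only: variable names other than the two identifiers are hypothetical.
xtset individualid surveyyear

* log earnings are a common choice for wage equations (an assumption here)
generate lnwage = ln(income) if income > 0

* random-effects model; unbalanced panels need no special option
xtreg lnwage i.migrant i.female c.age i.region i.surveyyear, ///
    re vce(cluster individualid)
```

Random effects is used in this sketch because a fixed-effects (within) estimator would drop regressors that never change within a person, which is typically the case for nationality and sex. Whether random effects is appropriate for a given dataset is a separate question.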
mama kay

Join Date: Jul 2019

Posts: 16
#3

05 Jul 2019, 07:37

Thank you so much for the feedback.

I need some clarification, because someone explained that my dataset must contain individuals who were interviewed in every year of the period, and therefore suggested I use the merge command.

My question is: when checking the impact of nationality, gender, age, etc. on the earnings gap, is it compulsory to have a dataset with the same individuals throughout all the years under review?

I have some individuals who were not interviewed in some years. Can I still use the append command in this case? And would it affect my results?

Thanks
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#4

06 Jul 2019, 01:56

I do share Jorrit's helpful comments.
In addition:
- yes, to have panel data you should have the same sample of units measured at (almost) equally spaced time intervals;
- units frequently skip some measurements and, in general, this does not affect the feasibility of panel data regression (and you can still use -append-).
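
One possible way to see how often units skip waves is sketched below, assuming the identifiers are named individualid and surveyyear; nwaves and firstobs are hypothetical helper variables.

```stata
* Sketch only: nwaves and firstobs are made-up helper variable names.
xtset individualid surveyyear

* participation patterns across survey years
xtdescribe

* number of waves each individual appears in
bysort individualid: generate nwaves = _N

* count individuals (not person-years) by number of waves
egen byte firstobs = tag(individualid)
tabulate nwaves if firstobs
```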

Kind regards,
Carlo
(Stata 19.0)
mama kay

Join Date: Jul 2019

Posts: 16
#5

06 Jul 2019, 02:34

Thank you so much. I have just used the append command in Stata to combine all the data, as advised by Jorrit Gosens.

However, I noticed the following:

1. Many respondents (IDs) appear in only one year.

2. Most respondents (IDs) appear in at most 4 years, not in all 7 years.

xtset individualid surveyyear
panel variable: individualid (unbalanced)
time variable: surveyyear, 2010 to 2017, but with gaps
delta: 1 unit

Brief background of my objective:

I want to analyse the effect of nationality, occupation, gender, region, etc. on the workers' earnings/wage gap. For instance, I would like to ascertain whether a migrant who works in the healthcare industry and lives in London earns more than native-born counterparts who also live in London. I also want to examine whether region has an effect on migrants' earnings, e.g. the pay gap between two individuals with the same characteristics who live in different cities.

Given this analysis, I was thinking my data should have the same IDs throughout the 7-year period, so that I can see the progression in their earnings or education over the years and draw conclusions from the results.

Please note: my data has the same 13 variables throughout the 7-year period, e.g. age, sex, occupation, employment status, income, region, nationality, etc.

My questions are:

Should I remove the IDs that were interviewed only once (appear in only one year)?

How do I ensure I have balanced panel data?

Would unbalanced panel data have a negative effect on the overall results of my thesis?

I would really appreciate it if my questions could be clarified. I'm new to Stata and confused.

Thank you so much.

Last edited by mama kay; 06 Jul 2019, 03:16.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#6

06 Jul 2019, 03:01

Point-by-point replies to your questions:
1) You should not remove IDs with only one wave of data. Otherwise, you would end up with a subsample of your original dataset (and the resulting biases are likely).
2) As per your description, you do not have a balanced panel, nor should you try to obtain one. Stata can handle both balanced and unbalanced panel datasets with no problem.
3) Dealing with an unbalanced panel dataset is pretty frequent. Barring severe imbalance, it poses no problem for your analysis.

As an aside, please note that -append- and -merge- do different jobs; hence, stating that you -merge- your datasets via -append- sounds a bit weird/incorrect (you probably meant that you combined your datasets via -append-).
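
A toy illustration of the difference, with made-up data and variable names (id, year, wage, female): -append- adds observations, whereas -merge- matches on a key and adds variables. Because it uses tempfiles, it is meant to be run as a whole from a do-file.

```stata
* Sketch only: all data and variable names here are invented for illustration.
clear
input id year wage
1 2016 10
2 2016 12
end
tempfile y2016
save `y2016'

clear
input id year wage
1 2017 11
3 2017 9
end

* -append-: stacks the 2016 observations below the 2017 ones (more rows)
append using `y2016'
tempfile stacked
save `stacked'

clear
input id female
1 0
2 1
3 1
end

* -merge-: matches on id and brings in year and wage (more columns)
merge 1:m id using `stacked'
tabulate _merge
```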
Last but not least, as per FAQ you should be aware that attachments are discouraged on this forum. As per FAQ again, this forum endorses other ways to share what you typed and what Stata gave you back (see CODE delimiters) and/or an example/excerpt of your data (see -dataex-). Thanks.
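
For instance, a data excerpt could be produced along these lines; the variable names other than individualid and surveyyear are hypothetical, and in older Stata versions -dataex- may first need to be installed with ssc install dataex.

```stata
* Sketch only: income and nationality are hypothetical variable names.
* Lists the first 20 observations in a form that can be pasted into a post.
dataex individualid surveyyear income nationality in 1/20
```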

Kind regards,
Carlo
(Stata 19.0)
mama kay

Join Date: Jul 2019

Posts: 16
#7

06 Jul 2019, 03:15

Hi Carlo,

I'm so grateful for your prompt response and for the correction on merge and append. As regards the attachment, I'm deeply sorry to have violated the rules here; I didn't know attachments are discouraged. Please pardon me. I have now removed it.

I will make use of -dataex- going forward.

Thank you so much.