Group categorical variables into a group

Julia Brunet

Join Date: May 2019

Posts: 8
#1

13 May 2019, 05:15

Hello,

For my Master's thesis, I am preparing the data to run a multinomial logit model of migrants' country choice. Right now I want to group all possible destination countries into five or six categories (e.g., low income, lower-middle income, etc.). Can anyone advise me on how to proceed?

Thanks
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#2

13 May 2019, 05:23

Julia:
welcome to this forum.
You may want to look at the -group()- function of the -egen- command.
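A minimal sketch of what -egen-'s group() function does, on made-up data. The variable names (country, country_id) and the country names are assumptions for illustration only. Note that group() numbers the distinct countries; it does not by itself assign them to income categories.

```stata
* Toy data: country names are placeholders, not real observations
clear
input str20 country
"Country A"
"Country B"
"Country A"
"Country C"
end

* One integer code per distinct value of country, labelled with the original string
egen country_id = group(country), label
tabulate country_id
```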

Kind regards,
Carlo
(Stata 19.0)
Maarten Buis

Join Date: Mar 2014

Posts: 3459
#3

13 May 2019, 05:23

The classification is your choice. There are various standard classifications, but as the joke says: the problem with standards is that there are so many to choose from... So you really have to sit down and make a choice. After that, implementing it is just a large recode command. I realize that this is not a very specific answer, but it is the best I can do given the question.
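As a rough sketch of what such a recode could look like: the numeric codes 1 to 6, the variable names country_code and inc_group, and the three category labels are all made up here, so substitute whichever classification you settle on.

```stata
* Toy data: country_code values are hypothetical, not a real coding standard
clear
input byte country_code
1
2
3
4
5
6
end

* Map country codes to income groups and attach value labels in one step
recode country_code (1 2 = 1 "Low income")          ///
                    (3 4 = 2 "Lower-middle income") ///
                    (5 6 = 3 "Upper-middle income"), generate(inc_group)

* Check the mapping
tabulate country_code inc_group
```

With a real list of countries the rules get long, but the structure stays the same.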

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Nick Cox

Join Date: Mar 2014

Posts: 35726
#4

13 May 2019, 05:24

https://www.stata.com/support/faqs/d...s-for-subsets/ explains one approach:

1. prepare a list of countries with their categories

2. merge with your main dataset
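The two steps above can be sketched as follows. This is only an illustration under stated assumptions: the variable names (id, country, inc_group), the country names, and the category codes are invented, and both datasets are typed in with -input- rather than read from your files. Run it as one do-file so that the temporary file name survives.

```stata
* Step 1: lookup list, one row per country (placeholder names and codes)
clear
input str20 country byte inc_group
"Country A" 4
"Country B" 2
"Country C" 1
end
tempfile lookup
save `lookup'

* Step 2: main dataset, possibly many observations per country
clear
input int id str20 country
1 "Country A"
2 "Country B"
3 "Country A"
4 "Country C"
end

* Many observations in the main data match one row in the lookup list
merge m:1 country using `lookup'

* Inspect unmatched countries (spelling differences are the usual culprit)
tabulate _merge
```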

Alternatively, you aren't telling us much about your data. If you had income data in the main dataset it might be easier.

However, why categorise at all? Why not use country income (GDP pc?) as a predictor?

P.S. Curious why "for my Master thesis" makes any difference to a problem? (Try "for my book" "for a research paper" "for consultancy": it makes no difference....)
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#5

13 May 2019, 09:19

If I understood correctly, you can use the golden couple, generate + replace. As already pointed out, you gave no data to work with.

Assuming "income" is a variable already included in the data set, you may do something like:

Code:

gen inc_country = .
replace inc_country = 1 if income < #
replace inc_country = 2 if income >= # & income < #
/* et cetera: I recommend using -label define- plus -label values- for full clarity */
* to know which countries are which:
tab country inc_country
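The labelling step recommended above might look like this. It is a sketch on made-up data: the label name inc_lbl, the four category names, and the toy countries are assumptions, not part of the original question.

```stata
* Toy data: inc_country already coded 1-4 (placeholder countries and codes)
clear
input str10 country byte inc_country
"A" 1
"B" 2
"C" 4
end

* Define the value labels once, then attach them to the variable
label define inc_lbl 1 "Low income" 2 "Lower-middle income" ///
                     3 "Upper-middle income" 4 "High income"
label values inc_country inc_lbl

* The missing option also shows countries that were left uncategorised
tabulate country inc_country, missing
```

Keep in mind that Stata treats numeric missing values as larger than any number, so an open-ended condition such as income >= some threshold would also pick up observations with missing income unless you add & !missing(income).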

Best regards,

Marcos