
  • Group categorical variables into a group

    Hello,

    For my master's thesis, I am preparing the data for a multinomial logit model of migrants' choice of destination country. I want to group all possible destination countries into five or six categories (e.g., low income, lower-middle income, etc.). Can anyone advise me on how to proceed?

    Thanks

  • #2
    Julia:
    welcome to this forum.
    You may want to look at the -group()- function of the -egen- command.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      The classification is your choice. There are various standard classifications, but as the joke says: the problem with standards is that there are so many to choose from... So you really have to sit down and make a choice. After that, implementing it is just one large -recode- command. I realize that this is not a very specific answer, but it is the best I can do given the question.
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------
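
A minimal sketch of the "one large -recode- command" idea in #3, assuming the destination is stored as a numeric code. The variable names (dest, dest_group), the country codes, and the mapping to groups are all hypothetical; a string country variable would need -encode- first. Run it from a do-file, because of the /// line continuation.

```stata
* Hypothetical example data: dest is a numeric country code (codes are made up)
clear
input byte dest
1
2
3
4
5
6
end

* Assumed mapping from country codes to income groups, for illustration only
recode dest (1 2 = 1 "Low income") (3 4 = 2 "Lower-middle income") ///
            (5 6 = 3 "Upper-middle income"), generate(dest_group)

* Check which codes ended up in which group
tabulate dest dest_group
```

With a real list of countries, the rules simply get longer. Keeping the original variable and writing the result to a new one with generate() makes the mapping easy to audit.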



      • #4
        https://www.stata.com/support/faqs/d...s-for-subsets/ explains one approach:

        1. prepare a list of countries with their categories

        2. merge with your main dataset
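
The two steps above could be sketched as follows. This is only an illustration under stated assumptions: the variable names (id, country, inc_group), the three countries, and the group numbers are made up and not checked against any official classification. Run it from a do-file, since it uses a tempfile.

```stata
* Step 1: hypothetical lookup table, one row per country with its assigned group
clear
input str20 country byte inc_group
"Germany"  4
"India"    2
"Ethiopia" 1
end
tempfile lookup
save `lookup'

* Hypothetical main dataset: one row per migrant with the chosen destination
clear
input int id str20 country
1 "Germany"
2 "India"
3 "Germany"
4 "Ethiopia"
end

* Step 2: many migrants map to one lookup row per country
merge m:1 country using `lookup', keep(master match)

* Countries missing from the lookup would show up here as _merge == 1
tabulate _merge
list id country inc_group
```

Checking _merge after the merge is a cheap way to spot countries whose spelling differs between the two files.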

        Alternatively, you aren't telling us much about your data. If you had income data in the main dataset it might be easier.

        However, why categorise at all? Why not use country income (GDP pc?) as a predictor?

        P.S. Curious why "for my Master thesis" makes any difference to a problem? (Try "for my book" "for a research paper" "for consultancy": it makes no difference....)



        • #5
          If I understood correctly, you can use the golden couple, -generate- plus -replace-. As already pointed out, you gave no data to work on.

          Assuming "income" is a variable already included in the data set, you may do something like:

          Code:
          gen inc_country = .
          replace inc_country = 1 if income < #
          replace inc_country = 2 if income >= # & income < #
          * et cetera - I recommend using - label define - plus - label values - for full clarity
          * to see which countries fall in which group:
          tab country inc_country
          Best regards,

          Marcos
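
A fuller sketch of the -generate- plus -replace- approach in #5, including the -label define- and -label values- step. The data, the cutoffs (1000, 4000, 12000), and the label name inc_lbl are placeholders chosen for illustration, not an official income classification.

```stata
* Hypothetical data: income is GDP per capita of the destination country
clear
input str20 country double income
"A" 800
"B" 3000
"C" 9000
"D" 40000
end

* Cutoffs are placeholders; each lower bound is inclusive so no value falls in a gap
generate byte inc_country = .
replace inc_country = 1 if income < 1000
replace inc_country = 2 if income >= 1000 & income < 4000
replace inc_country = 3 if income >= 4000 & income < 12000
* Missing values sort above every number in Stata, so exclude them explicitly
replace inc_country = 4 if income >= 12000 & !missing(income)

* Attach readable labels to the numeric groups
label define inc_lbl 1 "Low" 2 "Lower-middle" 3 "Upper-middle" 4 "High"
label values inc_country inc_lbl

* Check which countries fall in which group
tabulate country inc_country
```

The explicit !missing() condition matters only for the open-ended top category. Without it, countries with missing income would be classified as high income.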

