  • Combine two variables into one

    Hello all.

    In my wide-format dataset I have two variables for the respondent's country of birth, let's say country_surveyA and country_surveyB. Both variables are coded the same way: 1 is born in Country, 0 is born outside of Country, -1 is "not applicable". When I tabulate these two variables I find that 99.9% of the observations that are "not applicable" for country_surveyA have values 0 or 1 for country_surveyB, and vice versa. As far as I understand, this is because for part of the sample the answer to the question "Where were you born?" came from a different survey, and a "consolidated" variable was not created for the main release dataset.

    My intention is to create a single variable, Country, in which every observation is coded either 1 or 0, with all the "not applicable" values removed. For what I'm doing I don't care which survey the answer came from, since this is a time-invariant variable. The ID variable is the same in both surveys.

    How can I achieve this?

    Thank you for your help.

  • #2
    Code:
    generate newcountry = .
    replace newcountry = 1 if country_surveyA == 1  |  country_surveyB == 1
    replace newcountry = 0 if country_surveyA == 0  |  country_surveyB == 0
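    This creates a new variable, called newcountry, coded 1 or 0 whenever at least one of the two source variables is 1 or 0. Observations where neither variable is 0 or 1 stay missing. Note that the second replace runs last, so if the two surveys disagree, newcountry ends up as 0.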
    In case of conflict, i.e. surveyA says 1 but surveyB says 0, you could identify these observations separately:

    Code:
    generate errorvariable = 0
    replace errorvariable = 1 if country_surveyA == 1 & country_surveyB == 0
    replace errorvariable = 1 if country_surveyA == 0 & country_surveyB == 1
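    * "not applicable" is coded -1 in these data, so check for -1 as well as system missing
    replace errorvariable = 2 if (country_surveyA == -1 | missing(country_surveyA)) & (country_surveyB == -1 | missing(country_surveyB))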
    Now you can type, e.g., list if errorvariable == 1 to list only the observations where the two surveys conflict, or list if errorvariable == 2 to list those that have no valid answer in either survey. (keep if would instead drop all other observations from the data in memory.)
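
    As an alternative, here is a minimal sketch that fills the consolidated variable from survey A first and falls back to survey B only where A has no valid answer. The toy data, the id variable, and the names Country and conflict are assumptions made up for illustration; preferring survey A in case of disagreement is also just one possible choice, not something the original question prescribes.

    ```stata
    * Hypothetical toy data, only to illustrate the logic
    clear
    input id country_surveyA country_surveyB
    1  1 -1
    2 -1  0
    3  0 -1
    4 -1  1
    5 -1 -1
    6  1  0
    end

    * Take survey A where it holds a valid answer (0 or 1)
    generate Country = country_surveyA if inlist(country_surveyA, 0, 1)

    * Fall back to survey B where Country is still missing
    replace Country = country_surveyB if missing(Country) & inlist(country_surveyB, 0, 1)

    * Flag observations where both surveys are valid but disagree
    generate byte conflict = inlist(country_surveyA, 0, 1) & inlist(country_surveyB, 0, 1) & country_surveyA != country_surveyB

    * Inspect the result without dropping anything
    tabulate Country conflict, missing
    list id country_surveyA country_surveyB Country if conflict
    ```

    Observations that are "not applicable" in both surveys are left as missing in Country, so they can be counted with count if missing(Country) before deciding what to do with them.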

    Best,
    Max



    • #3
      Thank you!
