  • Combining two parents' education variables into one variable

    Dear Stataforumists,

    I am currently having some trouble combining two variables into one. My dataset has high non-response on the father's education variable, so using both the mother's and the father's variables in the analyses causes a large drop in observations. The aim is to increase N by combining the two variables.

    My dataset has N=208. The mom variable has n=161 recorded observations, while the dad variable has n=114.
    The two variables (momeducation, dadeducation) are distributed as follows:

    Level                    Mom (morutd)   Dad (farutd)
    Short education (1)           17             26
    Upper secondary (2)           93             56
    Long education (3)            51             32
    Missing                       47             64
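    To see where observations are lost, one can count how many cases have the mother's variable, both variables, or at least one of them recorded. Stata's estimation commands drop any observation with a missing value in a model variable, so a model using both parents keeps only the "both" count, while a single combined variable can at most recover the "at least one" count. The sketch below uses a small hypothetical toy dataset with the variable names from the table, not the real data.

```stata
* Hypothetical toy data; replace with your own dataset
clear
input float(morutd farutd)
1 1
2 .
3 2
. 3
. .
2 2
end

* Observations with the mother's education recorded
count if !missing(morutd)
* Both parents recorded: what a model using both variables keeps
count if !missing(morutd, farutd)
* At least one parent recorded: the most a combined variable can recover
count if !missing(morutd) | !missing(farutd)
```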
    To combine them, I have tried three combinations in Stata, but none gives me the desired result.

    Code:
    g edu=.
    replace edu=farutd_spes if morutd_spes==.
    replace edu=morutd_spes if farutd_spes==.
    
    replace edu=morutd_spes if morutd_spes>farutd_spes & morutd_spes<.
    replace edu=farutd_spes if farutd_spes>morutd_spes & farutd_spes <.
    Here the end result is 64% missing: category 1 has n=2, category 2 has n=39, and category 3 has n=33. This makes no sense.
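    One likely contributor to the missing values: when both parents have the same nonmissing level, neither the `>` condition for the mother nor the one for the father is true, so edu is never filled in for those observations. The sketch below reproduces this on a small hypothetical toy dataset with the variable names from the code above, then shows a variant where `>=` in one line also covers ties. It is an illustration of the logic, not a claim about what exactly happened in the real data.

```stata
* Hypothetical toy data using the variable names from the first attempt
clear
input float(morutd_spes farutd_spes)
1 1
2 3
3 .
. 2
. .
end

* Original logic: the tie in row 1 (1 and 1) is never assigned
generate edu = .
replace edu = farutd_spes if morutd_spes == .
replace edu = morutd_spes if farutd_spes == .
replace edu = morutd_spes if morutd_spes > farutd_spes & morutd_spes < .
replace edu = farutd_spes if farutd_spes > morutd_spes & farutd_spes < .

* Variant: >= for the mother also covers equal levels
generate edu_fixed = .
replace edu_fixed = farutd_spes if morutd_spes == .
replace edu_fixed = morutd_spes if farutd_spes == .
replace edu_fixed = morutd_spes if morutd_spes >= farutd_spes & morutd_spes < .
replace edu_fixed = farutd_spes if farutd_spes > morutd_spes & farutd_spes < .

list
```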

    The other code I tried is

    Code:
    g edu=.
    replace edu=farsutd if morsutd==.
    replace edu=morsutd if farsutd==.
    replace edu=morsutd if morsutd>farsutd & farsutd <. & morsutd<.
    replace edu=morsutd if morsutd>farsutd & farsutd <. & morsutd<.
    This has the same effect, but with even more missing values.
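    In the second attempt, the last two `replace` lines are identical, so an observation where the father's level is higher than the mother's is never assigned; this may explain why more values end up missing. Below is a small hypothetical check with the variable names from that code, where the last line but one is the presumably intended mirror image (father higher) and a final line handles ties.

```stata
* Hypothetical toy data: father higher in row 1, a tie in row 2
clear
input float(farsutd morsutd)
3 1
2 2
end

generate edu = .
replace edu = farsutd if morsutd == .
replace edu = morsutd if farsutd == .
replace edu = morsutd if morsutd > farsutd & farsutd < . & morsutd < .
* Presumed intended mirror image of the line above: father's level is higher
replace edu = farsutd if farsutd > morsutd & farsutd < . & morsutd < .
* Equal, nonmissing levels are not covered by either strict > condition
replace edu = morsutd if morsutd == farsutd & morsutd < .

list
```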


    Any ideas or thoughts on how to increase N by combining the two variables?

    Thanks in advance.
    Last edited by Jonas Mathisen; 13 Feb 2020, 03:32.

  • #2
    Does this give you what you want?

    Code:
    egen edu = rowmax(farsutd morsutd)
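    For comparison, Stata's built-in `max()` function also ignores missing arguments unless all of them are missing, so a plain `generate` should give the same variable as `egen ... rowmax()`. A minimal check on hypothetical toy data:

```stata
* Hypothetical toy data with the variable names from the first post
clear
input float(farsutd morsutd)
1 3
2 .
. 2
. .
end

egen edu = rowmax(farsutd morsutd)
* max() skips missing arguments unless every argument is missing
generate edu2 = max(farsutd, morsutd)
* Stops with an error if the two approaches ever disagree
assert edu == edu2
list
```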

    --
    Bruce Weaver
    Version: Stata/MP 18.5 (Windows)



    • #3
      Hey Jonas, welcome to Statalist.

      You don't provide a snippet of your dataset (type help dataex in Stata for more information on how to do that), so I'll create a toy example.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(father_ed mother_ed)
      1 1
      1 2
      1 3
      1 .
      2 1
      2 2
      2 3
      2 .
      3 1
      3 2
      3 3
      3 .
      . 1
      . 2
      . 3
      . .
      end
      The code below creates a new variable, edu, holding the maximum of father_ed and mother_ed for each observation (row), ignoring missing values.

      Code:
      egen edu = rowmax(father_ed mother_ed)
      In the example dataset above, all but one observation are assigned a maximum education level, which comes from the mother, the father, or both (when their education levels are the same). One observation stays missing because it is the only one with no information on either the father's or the mother's education level.
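      These properties can be checked with `assert`, which stops with an error if a condition fails. The sketch below rebuilds the same 4 x 4 toy grid compactly; the `set obs`/`ceil()`/`mod()` construction is only an illustrative shortcut that produces the same rows, in the same order, as the `input` block above.

```stata
clear
set obs 16
* Levels 1, 2, 3 plus a fourth slot recoded to missing
generate float father_ed = ceil(_n/4)
generate float mother_ed = mod(_n - 1, 4) + 1
replace father_ed = . if father_ed == 4
replace mother_ed = . if mother_ed == 4

egen edu = rowmax(father_ed mother_ed)

* edu is missing exactly when both parents are missing
assert missing(edu) == (missing(father_ed) & missing(mother_ed))
* edu is never below either recorded parent's level
assert edu >= father_ed if !missing(father_ed)
assert edu >= mother_ed if !missing(mother_ed)

tabulate edu, missing
```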



      • #4
        Thank you, Igor and Bruce, for this command! And thanks, Igor, for welcoming me and for the dataex suggestion.

        The egen command does seem to do the trick. The share of missing values has decreased to 21%, and the N in my regression has increased, which is excellent. To understand it better, I will look through and compare the old and new variables to make sure I interpret the result correctly. Thanks again for taking the time to explain the full command and how it works, Igor.
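        One way to make that comparison is to record which parent each value of edu came from and tabulate it. Below is a minimal sketch on hypothetical toy data with the variable names from the first post; `source` is a hypothetical helper variable, not something egen creates.

```stata
* Hypothetical toy data
clear
input float(farsutd morsutd)
1 2
. 3
2 .
3 3
. 1
2 2
. .
3 .
end

egen edu = rowmax(farsutd morsutd)

* 1 = from mother, 2 = from father, 3 = both parents at the same level
generate byte source = cond(missing(edu), .,                   ///
    cond(edu == morsutd & edu == farsutd, 3,                    ///
    cond(edu == morsutd, 1, 2)))
label define source 1 "mother" 2 "father" 3 "both equal"
label values source source

tabulate source, missing
tabulate morsutd edu, missing
```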


        Best regards,
        Jonas.
