
  • creating a categorical variable from several variables

    Hi there,

    As I am new to Stata and have already searched the internet for an answer, I hope you can help me with my (possibly easy) question:

    I have a dataset in which parents were asked how many children they have and how old each child is. A total of 9 variables (V1 to V9) hold the ages, one per possible child. If a parent has 2 children, only V1 and V2 are filled in with the age in years.
    I would now like to know how many children fall into each of the 4 age categories 0-6, 7-12, 13-17 and 18+ years, so I would like to create a new categorical variable.

    I already tried the replace command, but this won't work, as it only considers 1 child per observation even when there is more than one.

    gen V_cat=1 if (V1==0| V1==1| V1==2| V1==3| V1==4| V1==5| V1==6) | (V2==0| V2==1| V2==2| V2==3| V2==4| V2==5| V2==6) etc.
    replace V_cat=2 if (V1==7| V1==8| V1==9| V1==10| V1==11| V1==12) | (V2==7| V2==8| V2==9| V2==10| V2==11| V2==12) etc.

    What is the right way to tell Stata: if the value x is in V1 to V9, assign value x to category y?

    Could you help me find the right approach?

    Grateful for tips.
    Thanks, Luisa

  • #2
    There are several ways to do this. Let's assume that only zero or positive integers make sense.

    Here is a start at rewriting your code.

    Code:
    gen V_cat = 1 if V1 < 7 | V2 < 7
    replace V_cat = 2 if inrange(V1, 7, 12) | inrange(V2, 7, 12)
    But your code does not answer your question, as it does not count across children; it just creates a categorical variable that ends up coding the age interval of the oldest child. Nor will the information you want fit into a single categorical variable.

    So, let's go straight to what I think you want.

    Code:
    forval j = 1/4 {
        gen wanted_`j' = 0
    }

    forval k = 1/9 {
        replace wanted_1 = wanted_1 + inrange(V`k', 0, 6)
        replace wanted_2 = wanted_2 + inrange(V`k', 7, 12)
        replace wanted_3 = wanted_3 + inrange(V`k', 13, 17)
        replace wanted_4 = wanted_4 + inrange(V`k', 18, .)
    }

    Note from

    Code:
    . di inrange(., 18, .)
    0
    that missing values are not regarded as being in the range from 18 to missing.
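
    As a quick check, the counting loop can be tried on a toy dataset. This is only a sketch: the three parents and their ages are made up for illustration, and just V1 to V3 are used instead of all nine variables. The egen anycount() cross-check at the end is an alternative that is not part of the approach above, shown only to confirm the first count.

    ```stata
    clear
    * Hypothetical toy data: 3 parents, up to 3 children each (ages in years)
    input V1 V2 V3
     3  8  .
    15 20  5
     .  .  .
    end

    * One counter per age category, starting at zero
    forvalues j = 1/4 {
        generate wanted_`j' = 0
    }

    * inrange() returns 1 or 0, so adding it counts matching children;
    * missing ages add 0 to every counter
    forvalues k = 1/3 {
        replace wanted_1 = wanted_1 + inrange(V`k', 0, 6)
        replace wanted_2 = wanted_2 + inrange(V`k', 7, 12)
        replace wanted_3 = wanted_3 + inrange(V`k', 13, 17)
        replace wanted_4 = wanted_4 + inrange(V`k', 18, .)
    }

    * Cross-check of the first category with egen's anycount() (integer ages only)
    egen check_1 = anycount(V1-V3), values(0/6)
    assert check_1 == wanted_1

    list
    ```

    With these made-up ages, the first parent should have one child in 0-6 and one in 7-12, the second one child each in 0-6, 13-17 and 18+, and the third, with all ages missing, zeros throughout.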

    See also

    https://journals.sagepub.com/doi/pdf...867X0600600413

    https://journals.sagepub.com/doi/pdf...867X1101100308



    • #3
      Hi Nick,
      thanks so much for your helpful reply. I'll have a try with this approach.
