  • Selectively copy-pasting dummy variables

    Dear Statalisters,

    I would like your help with some Stata code.

    a_hidp is a household identifier and pidp is a person identifier. There are six dummy variables for social class, ordered from Professional (the highest) to Unskill (the lowest).
    I want to know the social class of those aged 16-18. However, their six dummy variables are almost always 0 because they do not have a job yet.
    I think I can get around this by looking at their parents' social classes and copying the higher of the two parents' values into the dummy variables of the children aged 16-18.
    That may be hard to follow, so here is an example.

    Take rows 8-10 (household 68141443). This is a family with a 47-year-old mum, a 50-year-old dad and a 17-year-old daughter.
    The mum has a higher social class than the dad (SkillNonManual > SkillManual).
    So I want to copy the mum's dummy values to the daughter.

    Row 10 would then look like this:

    row  a_hidp    pidp      a_sex   a_dvage  Professional  MngTechnical  SkillNonManual  SkillManual  PartlySkill  Unskill
    10   68141443  68141455  female  17       0             0             1               0            0            0


    Would there be any way to do this for all the households?

    [Attached image: 111.PNG]

    dataex a_hidp pidp a_sex a_dvage Professional MngTechnical SkillNonManual SkillManual PartlySkill Unskill in 8/20

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long(a_hidp pidp) byte a_sex int a_dvage float(Professional MngTechnical SkillNonManual SkillManual PartlySkill Unskill)
    68141443 68141447 2 47 0 0 1 0 0 0
    68141443 68141451 1 50 0 0 0 1 0 0
    68141443 68141455 2 17 0 0 0 1 0 0
    68155043 68155047 2 51 0 0 0 1 0 0
    68155043 68155051 1 57 1 0 0 0 0 0
    68155043 68155059 2 19 0 0 0 0 0 0
    68155043 68155063 1 16 0 0 0 0 0 0
    68197883 68197887 2 48 0 0 1 0 0 0
    68197883 68197891 1 18 0 0 0 1 0 0
    68293083 68293087 2 41 0 0 0 0 0 0
    68293083 68293091 1 46 0 1 0 0 0 0
    68293083 68293095 1 21 0 0 1 0 0 0
    68293083 68293099 1 18 0 0 0 0 0 0
    end
    label values a_hidp a_hidp
    label values a_sex a_sex
    label def a_sex 1 "male", modify
    label def a_sex 2 "female", modify
    label values a_dvage a_dvage
    ------------------ copy up to and including the previous line ------------------

    Listed 13 out of 1674 observations



  • #2
    There is probably a better way to do this, but this should work:
    Code:
    **Create a categorical variable 1-6 for class (0 if all are 0)
    gen cat_class=0
    local i=1
    foreach var of varlist Unskill PartlySkill SkillManual SkillNonManual MngTechnical Professional {
        replace cat_class=`i' if `var'==1
        local ++i
        }
    
    **find the highest class within a household
    bysort a_hidp: egen max_class=max(cat_class)
    **replace the class of 16-18 year olds with the highest household class
    replace cat_class=max_class if inrange(a_dvage, 16, 18)
    
    ***RE-CREATE THE DUMMY VARS
    **re-name old dummy variables
    rename  (Professional MngTechnical SkillNonManual SkillManual PartlySkill Unskill) old_=
    
    **create variables with all zeros
    foreach new_var of newlist Professional MngTechnical SkillNonManual SkillManual PartlySkill Unskill {
        gen `new_var'=0
        }
    
    **replace the zeros with ones based on the categorical variable
    local i=1
    foreach var of varlist Unskill PartlySkill SkillManual SkillNonManual MngTechnical Professional {
        replace `var'=1 if cat_class==`i'
        local ++i
        }
    list, sepby(a_hidp)
    
    *drop the old dummies when you are sure that you have what you want
    *drop old_*
    Stata/MP 14.1 (64-bit x86-64)
    Revision 19 May 2016
    Win 8.1
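
The loop above quietly assumes that each person has at most one of the six dummies set to 1; if two were set, cat_class would end up as the higher of the two. A minimal check of that assumption, meant to be run on the original data before anything is replaced (n_set is a hypothetical helper name, not a variable in the dataset):

```stata
* Count how many class dummies are switched on for each person
egen n_set = rowtotal(Professional MngTechnical SkillNonManual SkillManual PartlySkill Unskill)

* Stops with an error if anyone has more than one class set
assert inlist(n_set, 0, 1)
drop n_set
```

If the assert fails, it is probably worth listing the offending observations before deciding how to rank them.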


    • #3
      Dear Carole, thank you for your reply.
      I really appreciate it.

      Would it be possible to know why I have to use local ++i instead of local i++?


      • #4
        Try it to find out:

        Code:
        . local i++
        
        . di "`i'"
        ++
        In essence, the rules for defining locals don't include insistence on space(s) following the macro name. When StataCorp implemented incrementing of macros they didn't want to break any code that exploited that, whether deliberately or accidentally. So, for local macros only the syntax Carole used does what you're expecting.
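
For reference, a short sketch of the ways to increment a local that I am aware of, intended to be run from a do-file. The last command reproduces the behaviour shown above.

```stata
local i = 1

* Increment as a command: the ++ must come before the name
local ++i
display `i'

* Plain arithmetic also works
local i = `i' + 1
display `i'

* Inside macro expansion the postfix form exists too:
* it expands to the current value first, then increments
display `i++'
display `i'

* This does not increment: it redefines i as the string "++"
local i++
display "`i'"
```

The comments deliberately avoid macro quotes, on the cautious assumption that macro references in a comment line might still be expanded.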


        • #5
          Dear Nick,

          Thank you for your help!


          • #6
            The code below is a little simpler.
            Code:
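            egen Combine = concat(Professional-Unskill)
            
            * Within each household, sort so that the highest class comes last,
            * then copy that member's dummies to the 16-18 year olds.
            foreach v of var Professional-Unskill {
                * You could add the qualifier: & Combine == "000000"
                * if you only want to replace teenagers whose dummies are all zeros
                bys a_hidp (Combine): replace `v' = `v'[_N] if inrange(a_dvage,16,18)
            }
            
            drop Combine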
            Note that, since your example has no information on relationships within each family, this code, like Carole's solution in #2, relies on the assumption that each family contains only parents and children (no grandparents, for example).

            Under that assumption, the code simply picks the highest level held by any family member, treats that member as the mum or dad, and assigns that level to the teenager(s) (who are currently all zeros, if that is the group you want to focus on).
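
One way to narrow that assumption a little, sketched here rather than offered as a definitive fix, is to take the household maximum over members older than 18 only. The age cutoff is my own stand-in for "parent", and class_rank and adult_max are hypothetical helper names. This still counts adult siblings or grandparents, so a proper relationship variable would be better if the data have one. It assumes the original dummies have not yet been modified.

```stata
* Rank each person's own class: 0 = none, 1 = Unskill, ..., 6 = Professional
gen byte class_rank = 0
local i = 1
foreach var of varlist Unskill PartlySkill SkillManual SkillNonManual MngTechnical Professional {
    replace class_rank = `i' if `var' == 1
    local ++i
}

* Highest rank among household members older than 18 (missing if there are none)
bysort a_hidp: egen adult_max = max(cond(a_dvage > 18 & !missing(a_dvage), class_rank, .))

* Copy that rank into the dummies of the 16-18 year olds,
* leaving them untouched when no adult in the household has a class
local i = 1
foreach var of varlist Unskill PartlySkill SkillManual SkillNonManual MngTechnical Professional {
    replace `var' = (adult_max == `i') if inrange(a_dvage, 16, 18) & adult_max > 0 & !missing(adult_max)
    local ++i
}

drop class_rank adult_max
```

On the example data, this gives the 17-year-old in household 68141443 her mum's SkillNonManual, as requested in #1.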
            Last edited by Romalpa Akzo; 02 Aug 2018, 03:23.
