  • Merging two variables with labels

    Hi - I have two variables that I'd like to merge along with their labels (gender is recorded in two datasets, so the two variables never overlap), but when I use concat, I get the missing-value periods and spaces in front of or behind the labels. Any advice?

    Code:
    egen GENDER3=concat(v_q1 c_q1), decode p(" ")
    
    
    
    . tab GENDER3
    
        GENDER3 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
       . Female |         76        9.45        9.45
         . Male |        326       40.55       50.00
       Female . |          2        0.25       50.25
         Male . |        400       49.75      100.00
    ------------+-----------------------------------
          Total |        804      100.00

  • #2
    I guess that they're both string variables? (I'm not sure why egen . . . concat() yields a numeric missing value in its concatenated string.)

    You could try something like the following.

    . version 16.1

    . clear *

    . quietly set obs 2

    . generate str v_q1 = cond(_n == 1, "Female", "Male")

    . quietly set obs 4

    . generate str c_q1 = cond(_n == 3, "Female", cond(_n == 4, "Male", ""))
    (2 missing values generated)

    . quietly set obs 5

    . *

    . * Begin here

    . *

    . assert inlist(v_q1 + c_q1, "Male", "Female", "")

    . generate byte sex = strpos(v_q1 + c_q1, "F")

    . quietly replace sex = .u if missing(v_q1) & missing(c_q1)

    . label define Sexes 0 M 1 F .u Unknown

    . label values sex Sexes

    . list, noobs

      +---------------------------+
      |   v_q1     c_q1       sex |
      |---------------------------|
      | Female                  F |
      |   Male                  M |
      |          Female         F |
      |            Male         M |
      |                   Unknown |
      +---------------------------+

    .



    • #3
      They're actually numeric, with the value labels Female and Male.



      • #4
        Consider the following display of all 9 possibilities. Is c2 what you seek?
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float(a b)
        1 1
        1 2
        1 .
        2 1
        2 2
        2 .
        . 1
        . 2
        . .
        end
        label values a gender
        label values b gender
        label def gender 1 "Male", modify
        label def gender 2 "Female", modify
        
        egen c1 = concat(a b), decode p(" ")
        egen c2 = concat(a b), decode
        list, sepby(a)
        Code:
        . list, sepby(a)
        
             +------------------------------------------------+
             |      a        b              c1             c2 |
             |------------------------------------------------|
          1. |   Male     Male       Male Male       MaleMale |
          2. |   Male   Female     Male Female     MaleFemale |
          3. |   Male        .          Male .          Male. |
             |------------------------------------------------|
          4. | Female     Male     Female Male     FemaleMale |
          5. | Female   Female   Female Female   FemaleFemale |
          6. | Female        .        Female .        Female. |
             |------------------------------------------------|
          7. |      .     Male          . Male          .Male |
          8. |      .   Female        . Female        .Female |
          9. |      .        .             . .             .. |
             +------------------------------------------------+



        • #5
          No. Each observation has a value in only one of the two variables: v_q1 is always missing when c_q1 is present, and vice versa. What I want is a single variable that says whether the vendor or consumer was male.



          • #6
            It probably will be more direct just to clean up the new string variable. Something like the following.
            Code:
            replace GENDER3 = subinstr(subinstr(GENDER3, ".", "", .), " ", "", .)
            .
            . version 16.1

            .
            . clear *

            .
            . input str20 GENDER3 int count

                              GENDER3     count
              1. ". Female"  76
              2. "  . Male" 326
              3. "Female ."   2
              4. "  Male ." 400
              5. end

            .
            . replace GENDER3 = subinstr(subinstr(GENDER3, ".", "", .), " ", "", .)
            (4 real changes made)

            .
            . list, noobs

              +-----------------+
              | GENDER3   count |
              |-----------------|
              |  Female      76 |
              |    Male     326 |
              |  Female       2 |
              |    Male     400 |
              +-----------------+

            .
            . tabulate GENDER3 [fweight=count], missing

                         GENDER3 |      Freq.     Percent        Cum.
            ---------------------+-----------------------------------
                          Female |         78        9.70        9.70
                            Male |        726       90.30      100.00
            ---------------------+-----------------------------------
                           Total |        804      100.00

            .
            . exit

            end of do-file


            .
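
            If the value labels could ever contain internal spaces or periods (an assumption; the labels shown in this thread do not), stripping every space and period would mangle them. A minimal sketch of a more conservative cleanup follows. It removes only a standalone "." placeholder at either end and assumes GENDER3 is the string variable created by egen . . . concat() above.

            ```stata
            * Sketch only: GENDER3 is assumed to come from egen ... concat(), decode p(" ")
            * Remove leading and trailing blanks first
            replace GENDER3 = strtrim(GENDER3)

            * Drop a leading ". " placeholder (first variable was missing)
            replace GENDER3 = strtrim(substr(GENDER3, 3, .)) if substr(GENDER3, 1, 2) == ". "

            * Drop a trailing " ." placeholder (second variable was missing)
            replace GENDER3 = strtrim(substr(GENDER3, 1, strlen(GENDER3) - 2)) if substr(GENDER3, -2, 2) == " ."

            * Both variables missing leaves a lone "."; set it to the empty string
            replace GENDER3 = "" if GENDER3 == "."
            ```

            Unlike the nested subinstr() call, this would leave a label such as "Non binary" or "N.A." intact.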



            • #7
              Then this, perhaps, which has the advantage of being a numeric variable with a value label, just like the two original variables.
              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input float(a b)
              1 .
              2 .
              . 1
              . 2
              . .
              end
              label values a gender
              label values b gender
              label def gender 1 "Male", modify
              label def gender 2 "Female", modify
              
              generate c = a
              replace c = b if missing(c)
              label values c gender
              list
              Code:
              . list
              
                   +--------------------------+
                   |      a        b        c |
                   |--------------------------|
                1. |   Male        .     Male |
                2. | Female        .   Female |
                3. |      .     Male     Male |
                4. |      .   Female   Female |
                5. |      .        .        . |
                   +--------------------------+
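
              Post #5 states that the two variables are never both nonmissing in the same observation. A cautious sketch, reusing the toy variables a and b from the example above, checks that claim before combining. The cond() call is simply an alternative to the generate/replace pair, and c_alt is a hypothetical name.

              ```stata
              * Stop with an error if any observation has both a and b nonmissing
              assert missing(a) | missing(b)

              * Take a where it is nonmissing, otherwise b (which may itself be missing)
              generate c_alt = cond(missing(a), b, a)
              label values c_alt gender
              ```

              Like the code above, this assumes both variables use the same numeric coding.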



              • #8
                Originally posted by William Lisowski
                Then this, perhaps, which has the advantage of being a numeric variable with a value label, just like the two original variables.
                But doesn't that presuppose that the two variables from the different data sources have sex encoded identically?

                Maybe we can infer that from the results of egen . . . concat() with the decode option; I don't know.
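
                One way to look into that presupposition, sketched here with the toy variables a and b from post #7 (in the original data, v_q1 and c_q1 would take their place), is to retrieve the name of the value label attached to each variable and list the mappings. The local macro names are hypothetical.

                ```stata
                * Retrieve the name of the value label attached to each variable
                local lab_a : value label a
                local lab_b : value label b
                display "a uses: `lab_a'    b uses: `lab_b'"

                * Show the numeric-code-to-text mapping of each label
                label list `lab_a'
                label list `lab_b'
                ```

                A shared label name within one dataset guarantees a shared mapping, but labels with the same name in two separate source files can still differ, so the check is most informative when run on each file before they are combined.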



                • #9
                  Originally posted by Joseph Coveney
                  But doesn't that presuppose that the two variables from the different data sources have sex encoded identically?
                  You are correct.

                  What this entire discussion demonstrates is that, in response to post #3, instead of leaping in with code based on unknowable assumptions about unseen data, I should instead have written the following in post #4:
                  Your question really isn't clear without more detail, or at a minimum it is too difficult to guess at a good answer from what you have shared. Please help us help you. Show example data. Tell us what precisely you want the results to be for some cases in your example data. The Statalist FAQ provides advice on effectively posing your questions, posting data, and sharing Stata output.

                  Be sure to use the dataex command to present your data. If you are running version 15.1 or later, or a fully updated version 14.2, dataex is already part of your official Stata installation. If not, run ssc install dataex to get it. Either way, run help dataex and read the simple instructions for using it. dataex includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

                  When asking for help with code, always show example data.
                  Grace Heymsfield - please keep this advice in mind for future topics.
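
                  For the variables in this thread, a dataex call might look like the following sketch; the observation range is an arbitrary choice for illustration.

                  ```stata
                  * Share the two gender variables for the first 20 observations;
                  * the output includes storage types and value-label definitions
                  dataex v_q1 c_q1 in 1/20
                  ```

                  Pasting the resulting output into a post lets others recreate the example exactly, including the numeric codes behind the labels.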
