  • When is "append" more suitable than "merge" for datasets?

    Hello everyone, I have a question.

    I need to combine one dataset with another, by either appending or merging.
    My question is the following:
    • When should I use one method rather than the other to combine two datasets?
    Here are both datasets that I want to append/merge:

    dataset "female":

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long pidp int ca_age byte(ca_sex ca_couple ca_sclonely_cv ca_blwork ca_sempderived) str12 ca_netpay byte(ca_scghql ca_outcome)
       76165 37 2 1  1  1  1 "3200"          2 11
      280165 40 2 1  1  3  3 "1700"          1 11
      469205 30 2 2  2  1  1 "650"           4 11
      599765 33 2 1  1  1  1 "2591"          2 11
     1587125 54 2 2  2  2  2 "600"           2 11
    end

    dataset "male":

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long pidp int ca_age byte(ca_sex ca_couple ca_sclonely_cv ca_blwork ca_sempderived) str12 ca_netpay byte(ca_scghql ca_outcome)
      732365 34 1 2 3 4 4 "inapplicable" 3 11
     4849085 37 1 1 2 1 1 "3200"         3 11
    68035365 68 1 2 2 4 4 "inapplicable" 1 11
    68035367 39 1 1 2 1 1 "3700"         2 11
    68041491 47 1 1 1 1 1 "2400"         1 11
    end

    Thank you in advance for your answer.
    Best,

    --
    Michael Duarte Gonçalves

  • #2
    I have another question:

    I then want to merge the appended dataset above with another dataset, using pidp as the unique individual identifier in both datasets. When merging, I want to keep only the variables "i_jbsoc00_cc" and "i_nchunder16" from the using dataset (and all variables from the master dataset).
    I don't want to keep observations for individuals who appear only in the using dataset.

    The question is:
    • How should I do this? I tried
    Code:
    merge 1:1 pidp using "data1_female_male_append.dta"
    but it does not seem to do exactly what I described above.

    Thank you again for the help.
    Best,

    Michael Duarte Gonçalves


    • #3
      Re #1: These two data sets contain exactly the same variables and simply cover different populations: one is for males, the other for females. They should not be -merge-d; they should be -append-ed. In fact, there is no sensible way to apply -merge- to these data sets, because no pidp will ever occur in both. (You can run the -merge- command, but the result will be essentially the same as that of -append-, apart from the added _merge variable, and it will take longer to get there.)

      Code:
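      * Start from the male file and tag its observations
      use male_data_set, clear
      gen sex = "Male"
      * Add the female observations; sex is still missing ("") for the appended rows
      append using female_data_set
      replace sex = "Female" if missing(sex)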
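
      Since the later -merge 1:1- requires pidp to identify observations uniquely, it may be worth checking this right after appending. The following is only a sketch: male_data_set and female_data_set are the placeholder file names used above, and from_female is a hypothetical variable name chosen here for illustration. The -generate()- option of -append- marks which file each observation came from, and -isid- stops with an error if any pidp occurs more than once.

      Code:
      use male_data_set, clear
      * from_female is 0 for the male file and 1 for the appended female file
      append using female_data_set, generate(from_female)
      * stops with an error if pidp does not uniquely identify observations
      isid pidp
      * cross-check the source file against the sex variable already in the data
      tabulate from_female ca_sex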
      Re #2. After you have appended the male and female data sets, it sounds like you want to do this next:
      Code:
      merge 1:1 pidp using third_data_set, keepusing(i_jbsoc00_cc i_nchunder16) keep(master match)
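
      The attempt shown in #2 has neither -keepusing()- nor -keep()-, so it brings in every variable and every observation from both files. As a rough sketch of the full sequence, assuming the appended file was saved as data1_female_male_append (the name used in #2) and that third_data_set is a placeholder for the file containing the two extra variables, the result can be checked through the _merge variable that -merge- creates:

      Code:
      * the appended file is the master data set
      use data1_female_male_append, clear
      merge 1:1 pidp using third_data_set, keepusing(i_jbsoc00_cc i_nchunder16) keep(master match)
      * only values 1 (master only) and 3 (matched) should remain
      tabulate _merge
      assert inlist(_merge, 1, 3)
      drop _merge

      Master-only observations are kept, with missing values in i_jbsoc00_cc and i_nchunder16.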


      • #4
        Hello Clyde Schechter.

        Thanks a lot for your help. All is clear now!
        Best,

        Michael
