Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging when one data set has 2 entries per household

    Hi Statalist!

    I have a question about merging.

    I have household level data of farmers and their agricultural activities.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str14 hhid float(farms area dist_market) int crop byte(quality pest labor) int inputs long value
    "01010140020171" 1    2  4.2 14 2 2 1  70 .
    "01010140020284" 1    1   13 11 2 2 2   . .
    "01010140020297" 2  2.5    1 41 1 2 1   . .
    "01010140020297" 1    1   .5 11 2 2 2 200 .
    "01010140020409" 1    2    3 14 2 2 1   . .
    "01010140020471" 1  1.5 2.02 14 2 2 1   . .
    "01010140020471" 2    2    2 11 2 2 2   . .
    "01010140020551" 2    2   10 41 2 2 2   . .
    "01010140020551" 1    2    9 14 2 2 2 600 .
    "01010140020761" 3    3    7  . 2 . .   . .
    "01010140020761" 1    1  .25 14 2 2 2   . .
    "01010140020761" 2    4    6 41 2 2 2   . .
    "01010140020762" 2  .25   .5 14 2 2 2  60 .
    "01010140020762" 1    3   10 14 2 2 2   . .
    "01020030030004" 1  1.5    7 11 1 2 1   . .
    "01020030030004" 2    2  6.5 11 2 2 1 120 .
    "01020030030022" 1    5   18 13 1 2 2   . .
    "01020030030140" 1  1.5    3  . . . .   . .
    "01020030030161" 3  .25    0 32 1 2 2   . .
    "01020030030161" 2    3    4 11 1 2 1   . .
    "01020030030161" 1    1    1 13 1 2 2   . .
    "01020030030174" 1  3.5    6 11 2 2 1   . .
    "01020030030174" 3    4    5 11 2 2 1   . .
    "01020030030174" 4    1    5 41 2 2 2   . .
    "01020030030174" 5    1    6 60 2 2 2   . .
    "01020030030174" 2    1    6 11 1 2 2   . .
    "01020030030200" 2    4    3 11 2 2 1   . .
    "01020030030200" 1    4    7  . 2 . .   . .
    "01020030030430" 2    4   12 11 1 2 1   . .
    "01020030030430" 6  .25    9 60 1 2 2   . .
    "01020030030430" 3    3   15 11 2 2 2   . .
    "01020030030430" 7    1    9  . . . .   . .
    "01020030030430" 1    6   12 11 2 2 1   . .
    "01020030030430" 5    1    9  . . . .   . .
    "01020030030430" 4  1.5    9  . . . .   . .
    "01020030030479" 1  2.5   .6 11 2 2 1   . .
    "01020170030001" 5    8   12  . 2 . .   . .
    "01020170030001" 4   10   12  . 2 . .   . .
    "01020170030001" 2    4   12 11 2 2 2   . .
    "01020170030001" 1    8   12 13 2 2 2   . .
    "01020170030001" 3    3   10 11 2 2 2   . .
    "01020170030017" 1    3    8 11 1 2 2   . .
    "01020170030022" 1    3    7 13 2 2 2   . .
    "01020170030022" 2    2    8 11 2 2 2   . .
    "01020170030048" 1   10   10 43 1 2 1   . .
    "01020170030048" 3    6 12.5 32 1 2 2   . .
    "01020170030048" 2    7   11 43 2 2 1   . .
    "01020170030048" 4   10    8  . . . .   . .
    "01020170030100" 1    7   .5 13 2 2 1   . .
    "01020170030209" 1    4   .5 43 2 2 1   . .
    "01020170030241" 3    7    2 11 1 2 2   . .
    "01020170030241" 2    3  1.5 11 1 2 2   . .
    "01020170030241" 1   10    3 11 2 2 1   . .
    "01020170030246" 1  1.5   .2 13 2 2 2   . .
    "01020170030246" 2    4  1.5 41 1 2 2   . .
    "01030130040161" 1    3   16 11 1 2 2   . .
    "01030130040161" 2    1   19 22 2 2 2   . .
    "01030130040161" 3    2   12  . . . .   . .
    "01030130040161" 4    2   17  . . . .   . .
    "01030130040219" 2    4    6 13 2 2 2   . .
    "01030130040219" 1  1.5    6 13 2 2 2   . .
    "01030130040219" 3    5    7 14 2 2 2   . .
    "01030130040219" 4    2    6 13 2 2 2   . .
    "01030130040259" 1  .25   14 11 2 2 2   . .
    "01030130040259" 2    3   15 14 2 2 2   . .
    "01030130040346" 1    3    9 13 1 2 2   . .
    "01030130040346" 2    1    9 11 1 2 2   . .
    "01030130040468" 1   .5   12 14 1 2 2   . .
    "01030130040468" 2    2   21 13 2 2 2   . .
    "01030130040468" 3    2   20 11 1 2 2   . .
    "01030130040685" 3  4.5   10 14 2 2 2   . .
    "01030130040685" 2 4.25    9 14 1 2 1   . .
    "01030130040685" 1    3    6 14 2 2 1   . .
    "01030130040739" 5    2   26 14 2 2 1   . .
    "01030130040739" 3  2.5   22 13 2 2 2   . .
    "01030130040739" 2  1.5    6 14 2 2 2   . .
    "01030130040739" 1    1    6 11 2 2 2 100 .
    "01030130040739" 4    1   26 14 2 2 1   . .
    "01030130040745" 2   .5    5 11 2 2 2   . .
    "01030130040745" 1    3    5 11 2 2 2   . .
    "01030133010068" 2    2    4 13 1 2 2   . .
    "01030133010068" 1  .25    3 11 2 2 2   . .
    "01030133010092" 1  .25    4 11 2 2 2   . .
    "01030133010092" 2    1    5 13 1 2 1   . .
    "01030133010175" 2    6  5.5 13 2 2 2   . .
    "01030133010175" 1  1.5    4 11 2 2 2   . .
    "01030133010175" 3  2.5    6 43 2 2 2   . .
    "01030133010188" 2    5    6 43 1 2 2   . .
    "01030133010188" 1    3    4 13 2 2 2   . .
    "01030133010300" 3  1.5  4.5 43 2 2 1   . .
    "01030133010300" 2  1.5  4.5 43 2 2 1   . .
    "01030133010300" 1    2  4.5 11 1 2 1   . .
    "01030133010322" 1 2.25  4.5 11 1 2 1   . .
    "01030133010322" 2  1.5  1.5 13 2 2 2   . .
    "01030133010411" 1    2    5 43 1 2 1   . .
    "01030133010652" 1   .5  2.5 11 2 2 2   . .
    "01030133010652" 3    1  4.5 43 1 2 1   . .
    "01030133010652" 2  1.5  2.5 11 2 2 2   . .
    "01040173040004" 2    1  3.5 42 2 2 2   . .
    "01040173040004" 3    1 12.5 14 2 2 2   . .
    end
    label values s3aq5code S3AQ5CODE
    label def S3AQ5CODE 11 "Maize", modify
    label def S3AQ5CODE 13 "Sorghum", modify
    label def S3AQ5CODE 14 "Bulrush Millet", modify
    label def S3AQ5CODE 22 "Sweet Potatos", modify
    label def S3AQ5CODE 32 "Cowpeas", modify
    label def S3AQ5CODE 41 "Sunflower", modify
    label def S3AQ5CODE 42 "Simsim", modify
    label def S3AQ5CODE 43 "Groundnut", modify
    label def S3AQ5CODE 60 "Sugar Cane", modify
    label values s3aq8 S3AQ8
    label def S3AQ8 1 "GOOD", modify
    label def S3AQ8 2 "AVERAGE", modify
    label values s3aq49 S3AQ49
    label def S3AQ49 2 "NO", modify
    label values s3aq62 S3AQ62
    label def S3AQ62 1 "YES", modify
    label def S3AQ62 2 "NO", modify

    So taking household 01010140020297 as an example, they own 2 farms so there are 2 entries with descriptions of each farm.

    My problem is, I would like to merge the above data with a data set that only has one entry per household.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str14 hhid byte(participateinprogram ownlivestock)
    "01010140020171" 2 1
    "01010140020284" 2 2
    "01010140020297" 2 2
    "01010140020409" 2 1
    "01010140020471" 2 1
    "01010140020551" 2 1
    "01010140020761" 2 2
    "01010140020762" 2 1
    "01020030030004" 2 1
    "01020030030022" 2 1
    "01020030030140" 2 2
    "01020030030161" 2 1
    "01020030030174" 2 1
    "01020030030200" 2 1
    "01020030030430" 2 1
    "01020030030479" 2 1
    "01020170030001" 2 1
    "01020170030017" 2 1
    "01020170030022" 2 2
    "01020170030048" 2 1
    "01020170030100" 2 1
    "01020170030209" 2 1
    "01020170030241" 2 1
    "01020170030246" 2 2
    "01030130040161" 2 1
    "01030130040219" 2 1
    "01030130040259" 2 2
    "01030130040346" 2 1
    "01030130040468" 2 1
    "01030130040685" 2 1
    "01030130040739" 2 1
    "01030130040745" 2 2
    "01030133010068" 2 1
    "01030133010092" 2 2
    "01030133010175" 2 1
    "01030133010188" 2 1
    "01030133010300" 2 2
    "01030133010322" 2 1
    "01030133010411" 2 1
    "01030133010652" 2 1
    "01040173040004" 2 2
    "01040173040017" 2 1
    "01040173040022" 2 2
    "01040173040034" 2 1
    "01040173040041" 2 2
    "01040173040086" 2 2
    "01040173040092" 2 1
    "01040173040094" 2 1
    "01040310010030" 2 2
    "01040310010102" 2 1
    "01040310010174" 2 1
    "01040310010180" 2 1
    "01040310010462" 2 2
    "01040310010482" 2 1
    "01040310010745" 2 2
    "01040310011128" 2 1
    "01040380030347" 2 2
    "01040380030396" 2 1
    "01040380030460" 2 1
    "01040380030693" 2 2
    "01040380030768" 2 1
    "01040380030854" 2 1
    "01040380031057" 2 1
    "01040380031108" 2 2
    "01050090030114" 2 1
    "01050090030347" 2 1
    "01050090030396" 2 2
    "01050090030460" 2 1
    "01050090030599" 2 2
    "01050090030693" 2 1
    "01050090030768" 2 1
    "01050090031057" 2 1
    "01050100040151" 2 2
    "01050100040166" 2 1
    "01050100040272" 2 1
    "01050100040335" 2 2
    "01050100040403" 2 1
    "01050100040480" 2 2
    "01050100040517" 2 2
    "01050100040692" 2 2
    "01050290020161" 2 1
    "01050290020224" 2 1
    "01050290020322" 2 2
    "01050290020557" 2 1
    "01050290020649" 2 1
    "01050290020690" 2 1
    "01050290020691" 2 1
    "01050290021238" 2 1
    "02010100010148" 2 1
    "02010100010263" 2 1
    "02010100010297" 2 1
    "02010100010336" 2 1
    "02010100010511" 2 1
    "02010100010558" 2 2
    "02010100010656" 2 2
    "02010100010785" 2 1
    "02010110010048" 2 1
    "02010110010206" 2 1
    "02010110010209" 2 1
    "02010110010246" 2 1
    end
    label values s13q10 S13Q10
    label def S13Q10 2 "NO", modify
    label values s10aq1 S10AQ1
    label def S10AQ1 1 "YES", modify
    label def S10AQ1 2 "NO", modify
    How do I go about this?

    I tried reshaping the first data set into a wide format:
    Code:
    reshape wide area dist_market crop quality pest labor inputs value ...., i(hhid) j(farms)
    but since the first data set has around 70 variables I got the error code that the majority of the variables are not constant within hhid.The .... represents the other variables in the data set.

    When I tried merging without reshaping the data, I got "variable hhid does not uniquely identify observations in the master data"

    I am therefore stuck and not sure what to do. I would appreciate any help.

    Thank you in advance!
    Kevin

  • #2
    Code:
    merge m:1 hhid using dataset2
    where the second data set with one entry per household (named dataset2 in the code above) is your using data set.

    Comment


    • #3
      Hi Andrew Musau,

      Thank you so much! It worked.

      I have another question that is slightly related.
      I have a string variable that I would like to destring.
      The variable has one letter and one numeric character. Some individuals have missing values for the variable.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str8 id str3 PNT
      "0001-001" "M1"
      "0001-001" "M2"
      "0001-001" "M3"
      "0002-001" "M1"
      "0003-001" "M1"
      "0003-001" "M2"
      "0003-010" "" 
      "0005-001" "M1"
      "0005-001" "M2"
      "0006-001" "M1"
      "0007-001" "M1"
      "0007-001" "M2"
      "0008-001" "M1"
      "0008-001" "M2"
      "0008-001" "M3"
      "0009-001" "M1"
      "0009-001" "M2"
      "0010-001" "M1"
      "0010-001" "M2"
      "0010-001" "M3"
      "0011-001" "M1"
      "0011-001" "M2"
      "0011-001" "M3"
      "0011-004" "M1"
      "0011-004" "M2"
      "0012-001" "M1"
      "0012-001" "M2"
      "0013-001" "M1"
      "0013-001" "M2"
      "0014-001" "M1"
      "0014-001" "M2"
      "0014-001" "M3"
      "0014-001" "M4"
      "0014-001" "M5"
      "0014-001" "M6"
      "0014-007" "M1"
      "0014-007" "M2"
      "0015-001" "M1"
      "0016-001" "M1"
      "0017-001" "M1"
      "0017-001" "M2"
      "0017-001" "M3"
      "0017-001" "M4"
      "0017-001" "M5"
      "0017-001" "M6"
      "0018-001" "M1"
      "0018-001" "M2"
      "0019-001" "M1"
      "0019-001" "M2"
      "0019-001" "M3"
      "0019-001" "M4"
      "0019-001" "M5"
      "0019-003" "M1"
      "0019-003" "M2"
      "0020-001" "M1"
      "0021-001" "M1"
      "0021-001" "M2"
      "0022-001" "M1"
      "0023-001" "M1"
      "0023-001" "M2"
      "0024-001" "M1"
      "0025-001" "" 
      "0026-001" "M1"
      "0026-001" "M2"
      "0027-001" "M1"
      "0028-001" "M1"
      "0028-001" "M2"
      "0028-001" "M3"
      "0028-001" "M4"
      "0029-001" "M1"
      "0029-001" "M2"
      "0029-001" "M3"
      "0029-001" "M4"
      "0030-001" "M1"
      "0030-001" "M2"
      "0031-001" "M1"
      "0031-001" "M2"
      "0031-001" "M3"
      "0032-001" "M1"
      "0032-001" "M2"
      "0032-001" "M3"
      "0033-001" "M1"
      "0033-001" "M2"
      "0033-001" "M3"
      "0033-001" "M4"
      "0034-001" "M1"
      "0034-001" "M2"
      "0034-003" "" 
      "0035-001" "M1"
      "0035-001" "M2"
      "0036-001" "M1"
      "0036-001" "M2"
      "0036-001" "M3"
      "0037-001" "M1"
      "0038-001" "M1"
      "0038-001" "M2"
      "0038-001" "M3"
      "0038-001" "M4"
      "0038-001" "M5"
      "0038-002" "M1"
      end
      Essentially, what I would like to do is create a new variable that would remove the M and leave the number.
      So if an individual has M3 under PNT, the new variable should just have 3.

      I ran the following
      Code:
       destring PNT, replace
      and got "PNT contains nonnumeric characters; no replace"

      Then I ran the following
      Code:
       gen newPNT = 0
      replace newPNT = 1 if PNT == M1
      and I got "M1 not found". Same thing happened with other values e.g. M2, M3, M4.

      I also ran the following
      Code:
       encode PNT, gen (newPNT)
      which converted the string variable to a long %8.0g variable.

      Then:
      Code:
       gen NoM = 0
      replace NoM = 1 if newPNT == M1
      Again, same problem: "M1 not found"

      Any tips on how I can solve this? I don't understand what I am doing wrong

      Thank you in advance
      Kevin

      Comment


      • #4
        Code:
        gen wanted= real(substr(PNT, 2,.))
        is a direct solution.

        gen NoM = 0
        replace NoM = 1 if newPNT == M1
        Here, you are looking at the value label and not the value itself. You can run

        Code:
        lab list
        to find out the values behind the labels. Directly, you need double quotes around strings to achieve the same purpose.

        Code:
        replace NoM = 1 if PNT == "M1"
        Last edited by Andrew Musau; 06 Feb 2020, 06:15.

        Comment


        • #5
          Thank you so much Andrew! I appreciate your help.

          Comment

          Working...
          X