Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Need to keep only one HHID per row with specific values of variables

    In my dataset, there are multiple rows of HHID with the same HHID. Each row of HHID is associated with multiple observations of l_type and serial_number_l.

    I want only one row of HHID while retaining specific values of l_type and serial_number_l. For instance, i want to keep per HHID, l_type ==1 and l_type==2 and l_type==10.

    The reason i want to keep only one HHID per row is because i need to merge this data with other HH level datasets.

    I tried reshaping to wide-format but the dataset became very messy with lot of missing values. I also tried egen but then it created variables at the different rows of the HHID. So, I could not make one HHID per row with the egen variables.

    Kindly suggest a way to make the process easy for merging.


    HHID Serial_number_plot l_type L_ar L_va W_l W_o
    533352011 1 2 0.80 700000 2
    533352011 98 10 0.04 85000 2
    533352011 99 0.84 785000
    533353011 1 2 0.60 40000 2
    533353011 2 2 0.22 150000 2
    533353011 98 10 0.06 85000 2
    533353011 99 0.88 275000
    533353021 1 2 0.80 650000 2
    533353021 98 10 0.07 65000 2
    533353021 99 0.87 715000
    533353031 1 2 0.80 770000 2
    533353031 98 10 0.06 120000 2
    533353031 99 0.86 890000

  • #2
    The data example doesn't make sense to me. The header implies 7 variables, but we see variously 4 or 6 numbers in what you show.

    This is why we ask for data examples to be presented using dataex. Please read and act on https://www.statalist.org/forums/help#stata

    I am also puzzled by terminology here. Observation in Stata refers to (in other terms) a row, record or case in the dataset, so on that basis row is spreadsheet-speak for observation and one row == one observation.

    Meanwhile, I don't know how you imagine values 1, 2, 10 could be returned in the same observation for the same variable for each household. Perhaps it would help to create variables like this and then use collapse or duplicates.

    Code:
    foreach x in 1 2 10 { 
        egen l_type`x' = total(l_type == `x'), by(HHID)
    }

    Comment


    • #3
      Code:
      input str9 HHID byte(Serial_number L_type) double L_area long L_value
      "533352011"  1  2  .8 700000
      "533352011" 98 10 .04  85000
      "533352011" 99  . .84 785000
      "533353011"  1  2  .6  40000
      "533353011"  2  2 .22 150000
      "533353011" 98 10 .06  85000
      "533353011" 99  . .88 275000
      "533353021"  1  2  .8 650000
      "533353021" 98 10 .07  65000
      "533353021" 99  . .87 715000
      "533353031"  1  2  .8 770000
      "533353031" 98 10 .06 120000
      "533353031" 99  . .86 890000
      "533353041"  1  2  .4 485000
      "533353041" 98 10 .05  72000
      "533353041" 99  . .45 557000
      "533353051"  1  2  .7 620000
      "533353051" 98 10 .04 110000
      "533353051" 99  . .74 730000
      "533355011"  1  2  .7 500000
      end

      Apologise for the bad formatting. Let me rephrase my question. I want one HHID per row after creating new variables using egen.
      As you see, there are three 533352011 HHID. The other variables, Serial_number and L_type are dummies, and L_area and L_value are the values associated with those dummies.
      I want to keep L_are and L_value for specific dummies of serial_number and L_type only. After that, I want to only have one HHID per row and remove the others.
      Last edited by Sujit Chauhan; 14 Dec 2023, 04:25.

      Comment


      • #4
        Thanks for the clearer data example.

        Serial_number and L_type are not dummy variables at all, even given some elasticity in defining what is a dummy, which I don't think is anything but codes for two distinct states, 0 and 1 ideally, 1 and 2 or 1 and missing perversely; plus possibly a code for missing values.

        Terminology aside, my suggestion in #2 still looks good to me for what you originally asked, but you what you seek now sounds more like reshape wide. You can't have both distinct values of e.g. L_value and just one observation per HHID.

        Code:
        reshape wide L_type L_area L_value, i(HHID) j(Serial_number)
        works for your data example, but I fear I don't understand what you say you want, as that one-liner doesn't require any new variables.

        Comment


        • #5
          Apologies for my poor understanding of the data. Yes Serial_number and L_type are distinct states. And thank you for providing the codes.

          However, reshaping causes too many missing values, any suggestions to deal with those.

          Also, I don't want all the possible states of Serial_number and L_type, just those that equal to 1 or 2 or 10. In that case, egen seems to work better. But applying egen means creating new variables while ending up with the same number of rows for the HHID (e.g., three rows for the HHID 533352011). Is there any way to limit all my new variables from egen to only one row per HHID? If not, then I have to go for reshaping, right?


          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input byte L_type1 double L_type1 long L_type1 byte L_type2 double L_type2 long L_type2 byte L_type3 double L_type3 long L_type3
          1   .6  150000 1  .2  50000 .   .      .
          1   .2   50000 1  .2  45000 1  .2  50000
          1   .4   90000 .   .      . .   .      .
          1   .6  135000 .   .      . .   .      .
          1  1.4  280000 .   .      . .   .      .
          2  1.2  200000 .   .      . .   .      .
          1   .4   90000 .   .      . .   .      .
          .    .       . .   .      . .   .      .
          .    .       . .   .      . .   .      .
          2   .4   80000 .   .      . .   .      .
          .    .       . .   .      . .   .      .
          .    .       . .   .      . .   .      .
          .    .       . .   .      . .   .      .
          .    .       . .   .      . .   .      .
          .    .       . .   .      . .   .      .
          .    .       . .   .      . .   .      .
          .    .       . .   .      . .   .      .
          .    .       . .   .      . .   .      .
          .    .       . .   .      . .   .      .
          .    .       . .   .      . .   .      .
          .    .       . .   .      . .   .      .
          .    .       . .   .      . .   .      .
          2   .6  135000 .   .      . .   .      .
          2    2   90000 .   .      . .   .      .
          .    .       . .   .      . .   .      .
          2    1  225000 .   .      . .   .      .
          .    .       . .   .      . .   .      .
          2   .8  180000 .   .      . .   .      .
          .    .       . .   .      . .   .      .
          2    1  225000 .   .      . .   .      .
          2   .8  180000 .   .      . .   .      .
          2   .6  135000 .   .      . .   .      .
          2   .6  135000 .   .      . .   .      .
          1   .9 1500000 .   .      . .   .      .
          1   .8 1000000 .   .      . .   .      .
          .    .       . .   .      . .   .      .
          2   .5  800000 .   .      . .   .      .
          1  .55  700000 .   .      . .   .      .
          1   .5  400000 .   .      . .   .      .
          1    1  600000 .   .      . .   .      .
          1 1.25  700000 .   .      . .   .      .
          1  .75  600000 .   .      . .   .      .
          2   .6  800000 .   .      . .   .      .
          1    1 2500000 .   .      . .   .      .
          .    .       . .   .      . .   .      .
          .    .       . .   .      . .   .      .
          1    4 3500000 .   .      . .   .      .
          1 3.43 2950000 .   .      . .   .      .
          1  .57  550000 .   .      . .   .      .
          1    1  900000 .   .      . .   .      .
          1 2.29 2150000 .   .      . .   .      .
          1  .57  500000 .   .      . .   .      .
          1 6.86 6000000 .   .      . .   .      .
          1  .57  400000 .   .      . .   .      .
          .    .       . .   .      . .   .      .
          1 5.71 5000000 .   .      . .   .      .
          1 4.57 4000000 .   .      . .   .      .
          1   .5  600000 .   .      . .   .      .
          2   .3  695000 .   .      . .   .      .
          1   .6  850000 .   .      . .   .      .
          1  .35  500000 .   .      . .   .      .
          1    1 1200000 .   .      . .   .      .
          1    1 1250000 .   .      . .   .      .
          1  .45  550000 .   .      . .   .      .
          .    .       . .   .      . .   .      .
          .    .       . .   .      . .   .      .
          .    .       . .   .      . .   .      .
          1   .4  450000 .   .      . .   .      .
          1  .25  250000 .   .      . .   .      .
          1 1.25  750000 .   .      . .   .      .
          1  2.5 2750000 .   .      . .   .      .
          1  1.5 1650000 .   .      . .   .      .
          .    .       . .   .      . .   .      .
          1    2 2250000 .   .      . .   .      .
          1    2 1850000 .   .      . .   .      .
          .    .       . .   .      . .   .      .
          1    3 2950000 .   .      . .   .      .
          1    2 2250000 .   .      . .   .      .
          1  1.5 1800000 .   .      . .   .      .
          1    2 2100000 .   .      . .   .      .
          1    2 2150000 .   .      . .   .      .
          1   .5  310000 .   .      . .   .      .
          1   .5  400000 .   .      . .   .      .
          1    4 2000000 .   .      . .   .      .
          1    3 1300000 .   .      . .   .      .
          1  4.5 1500000 .   .      . .   .      .
          1    1  400000 .   .      . .   .      .
          1    1  450000 .   .      . .   .      .
          1  .55  450000 .   .      . .   .      .
          1   .5  350000 .   .      . .   .      .
          1   .7  410000 .   .      . .   .      .
          1    1  550000 .   .      . .   .      .
          1   .5  300000 .   .      . .   .      .
          1  .29  210000 .   .      . .   .      .
          1  .86  550000 .   .      . .   .      .
          1  .57  415000 1 .29 205000 .   .      .
          1  .57  400000 1 .57 400000 .   .      .
          1 1.71 1200000 .   .      . .   .      .
          1  .57  350000 .   .      . .   .      .
          1  .57  350000 1 .57 400000 1 .57 350000
          end

          Comment


          • #6
            I am not really following exactly what you want or what you perceive as a problem here, so I will stop after this post.

            You get what you ask for here. If you insist on a wide layout that keeps the information, missing values may be an inevitable side-effect.

            Conversely, drop observations first if you know you don't need the values they contain.

            Comment


            • #7
              Understood.
              Thank you.

              Comment

              Working...
              X