Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Collapsing by ID, maintaining available values for some observations, while collapsing identical values.

    Dear statalist team,

    I am trying to manage a large dataset, and I am probably too tired to figure out some simple things, but I hope you can help me.

    Below is a sample of the data I am working on:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float id long(ntNO_SDG_LINK ntTarget_12_4 ntTarget_12_5 ntTarget_12_6 ntTarget_2_4 ntTarget_3_1 ntTarget_3_2 ntTarget_3_3 ntTarget_3_4 ntTarget_3_7 ntTarget_3_8 ntTarget_3_9) str3 r_iso3 str6 hs6 str4 ntmcode str3 p_iso3 byte nbr float(count id2)
    16842 . 24  . . . . . .  . . . . "EUN" "151110" "B31" "WLD" 4 2 16036
    16843 .  . 25 . . . . .  . . . . "EUN" "151110" "B31" "WLD" 4 2 16036
    16844 .  .  . . . . . . 31 . . . "EUN" "151110" "B31" "WLD" 4 3 16036
    16845 . 24  . . . . . .  . . . . "EUN" "151110" "B33" "WLD" 1 2 16037
    16846 .  . 25 . . . . .  . . . . "EUN" "151110" "B33" "WLD" 1 2 16037
    16847 . 24  . . . . . .  . . . . "EUN" "151110" "B84" "WLD" 1 2 16038
    16848 .  . 25 . . . . .  . . . . "EUN" "151110" "B84" "WLD" 1 2 16038
    16849 1  .  . . . . . .  . . . . "EUN" "151110" "E1"  "WLD" 2 1 16039
    16850 1  .  . . . . . .  . . . . "EUN" "151110" "E32" "WLD" 1 1 16040
    end
    label values ntNO_SDG_LINK nt
    label values ntTarget_12_4 nt
    label values ntTarget_12_5 nt
    label values ntTarget_12_6 nt
    label values ntTarget_2_4 nt
    label values ntTarget_3_1 nt
    label values ntTarget_3_2 nt
    label values ntTarget_3_3 nt
    label values ntTarget_3_4 nt
    label values ntTarget_3_7 nt
    label values ntTarget_3_8 nt
    label values ntTarget_3_9 nt
    label def nt 1 "NO_SDG_LINK", modify
    label def nt 24 "Target_12_4", modify
    label def nt 25 "Target_12_5", modify
    label def nt 31 "Target_3_4", modify

    id2 identifies the individual for which I would like to collapse in a single observation, and merge available values for each ntTarget* variables in a single observation following the ID2 identifier. For example, in the dataex above, id2==16036, should be a single observation, including under ntTarget_12_4, ntTarget_12_5 and ntTarget_r_4 respectively, while leaving the rest of the ntTarget* variables as missing values. The remaining variables should be collapsed accordingly.

    So the data should finally look like below:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float id long(ntNO_SDG_LINK ntTarget_12_4 ntTarget_12_5 ntTarget_12_6 ntTarget_2_4 ntTarget_3_1 ntTarget_3_2 ntTarget_3_3 ntTarget_3_4 ntTarget_3_7 ntTarget_3_8 ntTarget_3_9) str3 r_iso3 str6 hs6 str4 ntmcode str3 p_iso3 byte nbr float(count id2)
    16842 . 24 25 . . . . . 31 . . . "EUN" "151110" "B31" "WLD" 4 2 16036
    16845 . 24 25 . . . . .  . . . . "EUN" "151110" "B33" "WLD" 1 2 16037
    16847 . 24 25 . . . . .  . . . . "EUN" "151110" "B84" "WLD" 1 2 16038
    16849 1  .  . . . . . .  . . . . "EUN" "151110" "E1"  "WLD" 2 1 16039
    16850 1  .  . . . . . .  . . . . "EUN" "151110" "E32" "WLD" 1 1 16040
    end
    label values ntNO_SDG_LINK nt
    label values ntTarget_12_4 nt
    label values ntTarget_12_5 nt
    label values ntTarget_12_6 nt
    label values ntTarget_2_4 nt
    label values ntTarget_3_1 nt
    label values ntTarget_3_2 nt
    label values ntTarget_3_3 nt
    label values ntTarget_3_4 nt
    label values ntTarget_3_7 nt
    label values ntTarget_3_8 nt
    label values ntTarget_3_9 nt
    label def nt 1 "NO_SDG_LINK", modify
    label def nt 24 "Target_12_4", modify
    label def nt 25 "Target_12_5", modify
    label def nt 31 "Target_3_4", modify
    Indeed this is just a sample of a dataset including 31,842 obs.

    I hope you can help me with this.

    Thank you in advance for your precious help,

    Kind regards,

    George


  • #2
    There's a convenient way by using community-contributed command -fillmissing-. However, if you want to elaborate a piece of code, please wait Nick Cox.
    Code:
    net install fillmissing.pkg
    fillmissing ntNO_SDG_LINK ntTarget_12_4 ntTarget_12_5 ntTarget_3_4, by(id2)
    duplicates drop id2, force

    Comment


    • #3
      I can’t comment on fillmissing which I have never used.

      Comment


      • #4
        Sorry Nick, I misused the word 'elaborate' here. I just want to tell George Mane that maybe you can provide him another solution without using third party / user-written commands.

        Comment


        • #5
          Thank you Chen Samulsion and Nick Cox , and apologies for the late response. I have just tested the fillmissing command and it works perfectly.

          Comment


          • #6
            I didn't do much in this thread. But perhaps the recent paper

            Cox, N. J. (2023). Speaking Stata: Replacing missing values: The easiest problems. The Stata Journal, 23(3), 884-896.

            https://journals.sagepub.com/doi/pdf...6867X231196519

            may be of interest.

            Comment

            Working...
            X