Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generate a new variable counting the unique times of specific portion of the name of other variables.

    Dear Stata forum members,

    I have the following data structure, where I need to create new variables that count the number of different targets being addressed by specific NTMs. The variable (Linked_Target_*) holds the information for both, SDG Target and NTM code. The portion in the middle represents the SDG target: 3_4, 12_4, 12_5; whereas the end of the name of the variables represents the NTM code: A210, A220, B310.

    I need to generate a new variable "SDG_Targets" that counts the number of specific SDG targets being addressed, so, in this case, it would be equal to 3, since there are three unique SDG Targets being addressed: 3_4 (2 times), 12_4 (1 time); 12_5 (1 time)

    Note: The data is just a simple subsample of a large dataset, with +500 variables, for 10 years, and 7 sectors.

    Thank you for your help

    George

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str4(reporter partner) str6 hs6 int year float(Linked_Target_3_2_A210 Linked_Target_3_4_A210 Linked_Target_3_4_A220 Linked_Target_3_4_B310 Linked_Target_3_7_B310 Linked_Target_12_4_B310 Linked_Target_12_5_B310 Row_Data)
    "DEU" "IDN" "151110" 2018 0 0 1 1 0 1 1 11
    "DEU" "IDN" "151110" 2019 0 0 1 1 0 1 1 11
    end

  • #2
    Not exactly elegant, but should do the job. I'm interested to see what methods others may suggest. Generally, my suggested approach is to reshape to long, extract that two-part code and perform two rounds of collapse. Then merge the total counts of unique codes back to the main data set.

    Code:
    * Give it a unique id
    generate id = _n
    
    preserve
    * Reshape to long
    reshape long Linked_Target_, i(id) j(combo, string)
    * Extract code
    generate thecode = substr(combo, 1, strrpos(combo, "_") - 1)
    * Collapse
    collapse (max) Linked_Target_, by(id thecode)
    collapse (sum) Linked_Target_, by(id)
    * Tempfile
    tempfile uniqsum
    save `uniqsum'
    restore
    
    merge m:1 id using `uniqsum', nogen

    Comment


    • #3
      Dear Ken, and STATA forum memember,

      Thank you for your assistance thus far. While I do have a version of my dataset in long format where I can accomplish my objective, I am particularly interested in achieving this in the wide format without reverting to the long one.

      I think that this could be done by generating a new variable for each unique SDG Target code present in the dataset. The code is identifiable as a part of the existing variable names, for example, the code "3_4" from "Linked_Target_3_4_B310", or the code "12_4" from "Linked_Target_12_4_B310", or the code "3_2" from "Linked_Target_3_2_A210".

      I think that for each unique SDG Target code, a corresponding new variable should be created. This new variable would have a binary value: "1" if the target is addressed at least once in an observation across multiple variables (e.g., `Linked_Target_3_4_B310==1` ; `Linked_Target_3_4_A210==1`) and "0" otherwise (e.g., `Linked_Target_3_2_A210==0). The new variables representing each target could be named Target and the code (i.e. Target_2_4; Target_3_4), for instance. In the end, I can simply perform a rowtotal for this new variables created to accomplish the objective.

      The main challenge here is that the same SDG Target can be present in multiple variables, but it should only be counted once.

      Given the considerable size of my original dataset, this approach could be much less memory-intensive.

      I appreciate the support from all STATA forum members.

      Kind regards,
      G

      Comment

      Working...
      X