Generate a new variable counting the unique times of specific portion of the name of other variables.

George Mane

Join Date: Apr 2019

Posts: 24
#1

Generate a new variable counting the unique times of specific portion of the name of other variables.

13 Oct 2023, 14:21

Dear Stata forum members,

I have the following data structure, where I need to create new variables that count the number of different targets being addressed by specific NTMs. The variable (Linked_Target_*) holds the information for both, SDG Target and NTM code. The portion in the middle represents the SDG target: 3_4, 12_4, 12_5; whereas the end of the name of the variables represents the NTM code: A210, A220, B310.

I need to generate a new variable "SDG_Targets" that counts the number of specific SDG targets being addressed, so, in this case, it would be equal to 3, since there are three unique SDG Targets being addressed: 3_4 (2 times), 12_4 (1 time); 12_5 (1 time)

Note: The data is just a simple subsample of a large dataset, with +500 variables, for 10 years, and 7 sectors.

Thank you for your help

George

Code:

* Example generated by -dataex-. For more info, type help dataex clear input str4(reporter partner) str6 hs6 int year float(Linked_Target_3_2_A210 Linked_Target_3_4_A210 Linked_Target_3_4_A220 Linked_Target_3_4_B310 Linked_Target_3_7_B310 Linked_Target_12_4_B310 Linked_Target_12_5_B310 Row_Data) "DEU" "IDN" "151110" 2018 0 0 1 1 0 1 1 11 "DEU" "IDN" "151110" 2019 0 0 1 1 0 1 1 11 end
Tags: None

Ken Chui

Join Date: Aug 2014
Posts: 1060

13 Oct 2023, 14:48

Not exactly elegant, but should do the job. I'm interested to see what methods others may suggest. Generally, my suggested approach is to reshape to long, extract that two-part code and perform two rounds of collapse. Then merge the total counts of unique codes back to the main data set.

Code:

* Give it a unique id
generate id = _n

preserve
* Reshape to long
reshape long Linked_Target_, i(id) j(combo, string)
* Extract code
generate thecode = substr(combo, 1, strrpos(combo, "_") - 1)
* Collapse
collapse (max) Linked_Target_, by(id thecode)
collapse (sum) Linked_Target_, by(id)
* Tempfile
tempfile uniqsum
save `uniqsum'
restore

merge m:1 id using `uniqsum', nogen

Comment

George Mane

Join Date: Apr 2019

Posts: 24
#3

14 Oct 2023, 04:06

Dear Ken, and STATA forum memember,

Thank you for your assistance thus far. While I do have a version of my dataset in long format where I can accomplish my objective, I am particularly interested in achieving this in the wide format without reverting to the long one.

I think that this could be done by generating a new variable for each unique SDG Target code present in the dataset. The code is identifiable as a part of the existing variable names, for example, the code "3_4" from "Linked_Target_3_4_B310", or the code "12_4" from "Linked_Target_12_4_B310", or the code "3_2" from "Linked_Target_3_2_A210".

I think that for each unique SDG Target code, a corresponding new variable should be created. This new variable would have a binary value: "1" if the target is addressed at least once in an observation across multiple variables (e.g., `Linked_Target_3_4_B310==1` ; `Linked_Target_3_4_A210==1`) and "0" otherwise (e.g., `Linked_Target_3_2_A210==0). The new variables representing each target could be named Target and the code (i.e. Target_2_4; Target_3_4), for instance. In the end, I can simply perform a rowtotal for this new variables created to accomplish the objective.

The main challenge here is that the same SDG Target can be present in multiple variables, but it should only be counted once.

Given the considerable size of my original dataset, this approach could be much less memory-intensive.

I appreciate the support from all STATA forum members.

Kind regards,
G
Comment

Announcement

Generate a new variable counting the unique times of specific portion of the name of other variables.

Comment

Comment