Extracting from a string variable without spaces

spencer keene

Join Date: Jan 2018
Posts: 33

07 Jun 2023, 07:18

Hello,

I would like to extract part of a string variable that holds the names of the files I appended in Stata (using a loop).

The files contain the results I need to make a forest plot. Each file holds the results for a different outcome, whose name appears after zcrp_ or crp_ and before _maxadj.

An example is below.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str244 filevar 
"V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_zcrp_prim_outcome_maxadj6_nocohfnf_results.dta" 
"V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_crp_all_stroke_maxadj6_nocohfnf_results.dta"   
"V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_crp_ep1_stri_maxadj6_nocohfnf_results.dta"       
"V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_zcrp_ep1_strh_nf_maxadj7_nocohfnf_results.dta"     
"V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_crp_ep1_stru_nf_maxadj7_nocohfnf_results.dta"  
end

From this, I would like to create a variable containing the outcome, like so:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str244 filevar str12 outcome
"V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_zcrp_prim_outcome_maxadj6_nocohfnf_results.dta" "prim_outcome"
"V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_crp_all_stroke_maxadj6_nocohfnf_results.dta" "all_stroke"  
"V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_crp_ep1_stri_maxadj6_nocohfnf_results.dta" "ep1_stri"    
"V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_zcrp_ep1_strh_nf_maxadj7_nocohfnf_results.dta" "ep1_strh_nf" 
"V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_crp_ep1_stru_nf_maxadj7_nocohfnf_results.dta" "ep1_stru_nf" 
end

In real life there are many more files than in the example.

Bruce Weaver

Join Date: May 2014
Posts: 1133

07 Jun 2023, 08:35

Hello spencer keene. There is probably a slicker method, but for the sample data you posted, this seems to work.

Code:

. * Keep everything prior to the first occurrence of "_max"
. generate outcome = substr(filevar, 1, strpos(filevar,"_max")-1)

. *list outcome, clean
. * Keep everything after the first occurrence of "_"
. replace outcome = substr(outcome, strpos(outcome, "_")+1, 244)
(5 real changes made)

. *list outcome, clean
. * What was originally the 2nd occurrence of "_" is now the first
. * Keep everything after the NOW first occurrence of "_"
. replace outcome = substr(outcome, strpos(outcome, "_")+1, 244)
(5 real changes made)

. compress outcome
  variable outcome was str70 now str12
  (290 bytes saved)

. list outcome, clean

            outcome  
  1.   prim_outcome  
  2.     all_stroke  
  3.       ep1_stri  
  4.    ep1_strh_nf  
  5.    ep1_stru_nf

Here's the code without the log output:

Code:

* Keep everything prior to the first occurrence of "_max"
generate outcome = substr(filevar, 1, strpos(filevar,"_max")-1)
*list outcome, clean
* Keep everything after the first occurrence of "_"
replace outcome = substr(outcome, strpos(outcome, "_")+1, 244)
*list outcome, clean
* What was originally the 2nd occurrence of "_" is now the first
* Keep everything after the NOW first occurrence of "_"
replace outcome = substr(outcome, strpos(outcome, "_")+1, 244)
compress outcome
list outcome, clean

HTH.

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 19.5 (Windows)
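
A possible one-step variant of the approach above, offered only as a sketch. It assumes every filename has "crp_" immediately before the outcome and "_max" immediately after it, and that neither string occurs earlier in the path. The names start, stop, and outcome2 are hypothetical and do not come from the thread.

```stata
* Sketch: take the text between "crp_" and "_max" in a single pass
* (start, stop, and outcome2 are illustrative names only)
* Position of the first character after "crp_"; equals 4 if "crp_" is absent
generate start = strpos(filevar, "crp_") + 4
* Position of "_max"; equals 0 if "_max" is absent
generate stop = strpos(filevar, "_max")
* Extract only where both markers were found, in the expected order
generate outcome2 = substr(filevar, start, stop - start) if start > 4 & stop > start
drop start stop
```

Rows where either marker is missing end up with an empty outcome2, so they can be checked with something like list filevar if missing(outcome2).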

spencer keene

Join Date: Jan 2018
Posts: 33

08 Jun 2023, 06:12

Thanks a lot, Bruce. This helped a lot!

Andrew Musau

Join Date: Oct 2014
Posts: 10214

08 Jun 2023, 14:00

Using regular expressions:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str244 filevar 
"V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_zcrp_prim_outcome_maxadj6_nocohfnf_results.dta" 
"V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_crp_all_stroke_maxadj6_nocohfnf_results.dta"   
"V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_crp_ep1_stri_maxadj6_nocohfnf_results.dta"       
"V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_zcrp_ep1_strh_nf_maxadj7_nocohfnf_results.dta"     
"V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_crp_ep1_stru_nf_maxadj7_nocohfnf_results.dta"  
end

gen wanted=ustrregexra(filevar, ".*crp\_(.*)\_max.*$", "$1")

Result:

Code:

. l wanted

     +--------------+
     |       wanted |
     |--------------|
  1. | prim_outcome |
  2. |   all_stroke |
  3. |     ep1_stri |
  4. |  ep1_strh_nf |
  5. |  ep1_stru_nf |
     +--------------+
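
One point to keep in mind with many files: when the pattern does not match, ustrregexra() returns the input unchanged, so a filename lacking crp_ or _max would leave the full path in wanted. A hedged alternative, assuming the same naming pattern, is to pair ustrregexm() with ustrregexs(), which leaves non-matching rows empty. The name wanted2 is hypothetical.

```stata
* Sketch: capture the outcome only where the pattern actually matches
* ustrregexs(1) returns the first parenthesised group of the latest ustrregexm() match
generate wanted2 = ustrregexs(1) if ustrregexm(filevar, "crp_(.*)_max")
* Non-matching rows are left empty, which makes them easy to inspect
list filevar if missing(wanted2)
```

On the five example filenames this should produce the same values as wanted. The difference only shows up for filenames that do not follow the pattern.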
