  • Extracting from a string variable without spaces

    Hello,

    I would like to extract part of a string variable that contains the names of the files I appended in Stata (using a loop).

    The files contain the results I need to make a forest plot. Each file holds the results for a different outcome, whose name appears after zcrp_ or crp_ and before _maxadj.

    An example is below.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str244 filevar 
    "V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_zcrp_prim_outcome_maxadj6_nocohfnf_results.dta" 
    "V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_crp_all_stroke_maxadj6_nocohfnf_results.dta"   
    "V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_crp_ep1_stri_maxadj6_nocohfnf_results.dta"       
    "V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_zcrp_ep1_strh_nf_maxadj7_nocohfnf_results.dta"     
    "V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_crp_ep1_stru_nf_maxadj7_nocohfnf_results.dta"  
    end
    From this I would like to create a variable containing the outcome, like so:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str244 filevar str12 outcome
    "V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_zcrp_prim_outcome_maxadj6_nocohfnf_results.dta" "prim_outcome"
    "V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_crp_all_stroke_maxadj6_nocohfnf_results.dta" "all_stroke"  
    "V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_crp_ep1_stri_maxadj6_nocohfnf_results.dta" "ep1_stri"    
    "V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_zcrp_ep1_strh_nf_maxadj7_nocohfnf_results.dta" "ep1_strh_nf" 
    "V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_crp_ep1_stru_nf_maxadj7_nocohfnf_results.dta" "ep1_stru_nf" 
    end

    In real life there are many more files than in this example.

  • #2
    Hello spencer keene. There is probably a slicker method, but for the sample data you posted, this seems to work.

    Code:
    . * Keep everything prior to the first occurrence of "_max"
    . generate outcome = substr(filevar, 1, strpos(filevar,"_max")-1)
    
    . *list outcome, clean
    . * Keep everything after the first occurrence of "_"
    . replace outcome = substr(outcome, strpos(outcome, "_")+1, 244)
    (5 real changes made)
    
    . *list outcome, clean
    . * What was originally the 2nd occurrence of "_" is now the first
    . * Keep everything after the NOW first occurrence of "_"
    . replace outcome = substr(outcome, strpos(outcome, "_")+1, 244)
    (5 real changes made)
    
    . compress outcome
      variable outcome was str70 now str12
      (290 bytes saved)
    
    . list outcome, clean
    
                outcome  
      1.   prim_outcome  
      2.     all_stroke  
      3.       ep1_stri  
      4.    ep1_strh_nf  
      5.    ep1_stru_nf

    Here's the code without the log output:

    Code:
    * Keep everything prior to the first occurrence of "_max"
    generate outcome = substr(filevar, 1, strpos(filevar,"_max")-1)
    *list outcome, clean
    * Keep everything after the first occurrence of "_"
    replace outcome = substr(outcome, strpos(outcome, "_")+1, 244)
    *list outcome, clean
    * What was originally the 2nd occurrence of "_" is now the first
    * Keep everything after the NOW first occurrence of "_"
    replace outcome = substr(outcome, strpos(outcome, "_")+1, 244)
    compress outcome
    list outcome, clean
    HTH.
    --
    Bruce Weaver
    Email: [email protected]
    Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
    Version: Stata/MP 18.0 (Windows)


    • #3
      Thanks a lot, Bruce. This helped a lot!


      • #4
        Using regular expressions:

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str244 filevar 
        "V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_zcrp_prim_outcome_maxadj6_nocohfnf_results.dta" 
        "V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_crp_all_stroke_maxadj6_nocohfnf_results.dta"   
        "V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_crp_ep1_stri_maxadj6_nocohfnf_results.dta"       
        "V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_zcrp_ep1_strh_nf_maxadj7_nocohfnf_results.dta"     
        "V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_crp_ep1_stru_nf_maxadj7_nocohfnf_results.dta"  
        end
        
        * Replace the whole string with the text captured between "crp_" and "_max"
        gen wanted=ustrregexra(filevar, ".*crp\_(.*)\_max.*$", "$1")
        Result:

        Code:
        . l wanted
        
             +--------------+
             |       wanted |
             |--------------|
          1. | prim_outcome |
          2. |   all_stroke |
          3. |     ep1_stri |
          4. |  ep1_strh_nf |
          5. |  ep1_stru_nf |
             +--------------+
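
        A possible variant on the regular-expression approach, sketched here under assumptions rather than taken from the posts above: ustrregexra() returns its input unchanged when the pattern finds no match, so with many more files, a name that does not follow the expected pattern would keep its full path in wanted. Pairing ustrregexm() with ustrregexs(1) leaves the new variable empty in that case, which makes such files easy to list. The variable name outcome2, the tighter pattern, and the third filename are hypothetical, added only for illustration.

        ```stata
        clear
        input str244 filevar
        "V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_zcrp_prim_outcome_maxadj6_nocohfnf_results.dta"
        "V:\Analysis\Spencer\STATA\output\mvmetaipd\mvmetaipd_crp_all_stroke_maxadj6_nocohfnf_results.dta"
        "V:\Analysis\Spencer\STATA\output\mvmetaipd\some_other_file.dta"
        end
        * The third filename above is hypothetical: it does not follow the naming pattern

        * Capture the text between "_crp_" or "_zcrp_" and "_maxadj";
        * observations without a match are left empty
        gen outcome2 = ustrregexs(1) if ustrregexm(filevar, "_z?crp_(.*)_maxadj")

        * Show the filenames for which no outcome could be extracted
        list filevar if missing(outcome2)
        ```

        Any filename shown by the last command would need a different pattern or manual handling.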
