Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Extracting a specific part of a string if the string contains an exact syntax

    OK, so I'm not explaining this very well but here is what I want to do. I have data as shown below.

    The incentive has a maximum of $6000.00, The incentive has a maximum of 5.00 Years
    The incentive is 0.48 $/kWh
    The incentive is 0.60 $/W, The system size has a minimum of 150.00 kW, The system size has a maximum of 200.00 kW
    The incentive is 100.00 %
    The incentive is 25.00 %, The incentive has a maximum of $1000.00
    The system size has a maximum of 50.00 kW, The incentive is 1.05 $/W


    What I want to do is extract, for example, if there is a section of the string that contains "The incentive is (NUMBER I WANT TO EXTRACT) $/kWh", I want to extract that number and generate a new variable with it.

    So for example, for the sample data above, I want the following output

    The incentive has a maximum of $6000.00, The incentive has a maximum of 5.00 Years .
    The incentive is 0.48 $/kWh 0.48
    The incentive is 0.60 $/W, The system size has a minimum of 150.00 kW, The system size has a maximum of 200.00 kW 0.60
    The incentive is 100.00 % .
    The incentive is 25.00 %, The incentive has a maximum of $1000.00 .
    The system size has a maximum of 50.00 kW, The incentive is 1.05 $/W .


    I've read string help, but I'm still struggling with this code.

    Thank you so much for all the help in advance.
    Last edited by Max Ortel; 28 Feb 2019, 11:43.

  • #2
    Hi Max

    I could imagine a few ways to solve this. Note that the 3rd line says "The incentive is 0.60 $/W" and so doesn't contain "$/kWh". Is that common? And if so, would looking for "$/" work? (NOTE: that would then include the "The incentive is 1.05 $/W" from the last line above.)

    Code:
    dataex text_var  // Data shared via -dataex-. To install: ssc install dataex
    clear
    input str113 text_var
    "The incentive has a maximum of $6000.00, The incentive has a maximum of 5.00 Years"                              
    "The incentive is 0.48 $/kWh"                                                                                      
    "The incentive is 0.60 $/W, The system size has a minimum of 150.00 kW, The system size has a maximum of 200.00 kW"
    "The incentive is 100.00 %"                                                                                        
    "The incentive is 25.00 %, The incentive has a maximum of $1000.00"                                                
    "The system size has a maximum of 50.00 kW, The incentive is 1.05 $/W"                                            
    end
    
    replace text_var = itrim(trim( text_var))
    gen start_pos = strpos(text_var, "The incentive is")  // finds first char position of "The incentive is"
    gen end_pos   = strpos(text_var, "$/")  
    gen incentive_string = trim(substr(text_var, end_pos - 10, 10)) if start_pos >0 & end_pos >0  // Extracts the 10 char before "$/"
    * Did 10 because didn't know if they would always have the space before & after the number, & I didn't know how large (or how many decimal places)
    * The "if start_pos >0 & end_pos >0" means will only run on obs where the text variable includes both "The incentive is" and "$/"
    
    * Converting incentive_string to a number (and keeping commas and decimal places)
    strkeep incentive_string, gen(incent_num) numeric keep(, .)
    destring incent_num, replace
    format incent_num %9.2fc
    
    . list text_var start_pos end_pos incentive_string incent_num , abbrev(16) string(40)
    
         +--------------------------------------------------------------------------------------------------+
         | text_var                                     start_pos   end_pos   incentive_string   incent_num |
         |--------------------------------------------------------------------------------------------------|
      1. | The incentive has a maximum of $6000.00,..           0         0                               . |
      2. | The incentive is 0.48 $/kWh                          1        23          e is 0.48         0.48 |
      3. | The incentive is 0.60 $/W, The system si..           1        23          e is 0.60         0.60 |
      4. | The incentive is 100.00 %                            1         0                               . |
      5. | The incentive is 25.00 %, The incentive ..           1         0                               . |
         |--------------------------------------------------------------------------------------------------|
      6. | The system size has a maximum of 50.00 k..          44        66          e is 1.05         1.05 |
         +--------------------------------------------------------------------------------------------------+
    
    compress
    desc
    
    ---------------------------------------------------------------------------------------------------------------------------------------------
                  storage   display    value
    variable name   type    format     label      variable label
    ---------------------------------------------------------------------------------------------------------------------------------------------
    text_var        str113  %113s                 Text_var
    start_pos       byte    %9.0g                
    end_pos         byte    %9.0g                
    extract_length  byte    %9.0g                
    incentive_str~g str9    %9s                  
    incent_num      double  %9.2fc                
    ---------------------------------------------------------------------------------------------------------------------------------------------
    ------------------ copy up to and including the previous line ------------------
    Last edited by David Benson; 28 Feb 2019, 14:33.

    Comment


    • #3
      Originally posted by David Benson View Post
      Hi Max

      I could imagine a few ways to solve this. Note that the 3rd line says "The incentive is 0.60 $/W" and so doesn't contain "$/kWh". Is that common? And if so, would looking for "$/" work? (NOTE: that would then include the "The incentive is 1.05 $/W" from the last line above.)

      Code:
      dataex text_var // Data shared via -dataex-. To install: ssc install dataex
      clear
      input str113 text_var
      "The incentive has a maximum of $6000.00, The incentive has a maximum of 5.00 Years"
      "The incentive is 0.48 $/kWh"
      "The incentive is 0.60 $/W, The system size has a minimum of 150.00 kW, The system size has a maximum of 200.00 kW"
      "The incentive is 100.00 %"
      "The incentive is 25.00 %, The incentive has a maximum of $1000.00"
      "The system size has a maximum of 50.00 kW, The incentive is 1.05 $/W"
      end
      
      replace text_var = itrim(trim( text_var))
      gen start_pos = strpos(text_var, "The incentive is") // finds first char position of "The incentive is"
      gen end_pos = strpos(text_var, "$/")
      gen incentive_string = trim(substr(text_var, end_pos - 10, 10)) if start_pos >0 & end_pos >0 // Extracts the 10 char before "$/"
      * Did 10 because didn't know if they would always have the space before & after the number, & I didn't know how large (or how many decimal places)
      * The "if start_pos >0 & end_pos >0" means will only run on obs where the text variable includes both "The incentive is" and "$/"
      
      * Converting incentive_string to a number (and keeping commas and decimal places)
      strkeep incentive_string, gen(incent_num) numeric keep(, .)
      destring incent_num, replace
      format incent_num %9.2fc
      
      . list text_var start_pos end_pos incentive_string incent_num , abbrev(16) string(40)
      
      +--------------------------------------------------------------------------------------------------+
      | text_var start_pos end_pos incentive_string incent_num |
      |--------------------------------------------------------------------------------------------------|
      1. | The incentive has a maximum of $6000.00,.. 0 0 . |
      2. | The incentive is 0.48 $/kWh 1 23 e is 0.48 0.48 |
      3. | The incentive is 0.60 $/W, The system si.. 1 23 e is 0.60 0.60 |
      4. | The incentive is 100.00 % 1 0 . |
      5. | The incentive is 25.00 %, The incentive .. 1 0 . |
      |--------------------------------------------------------------------------------------------------|
      6. | The system size has a maximum of 50.00 k.. 44 66 e is 1.05 1.05 |
      +--------------------------------------------------------------------------------------------------+
      
      compress
      desc
      
      ---------------------------------------------------------------------------------------------------------------------------------------------
      storage display value
      variable name type format label variable label
      ---------------------------------------------------------------------------------------------------------------------------------------------
      text_var str113 %113s Text_var
      start_pos byte %9.0g
      end_pos byte %9.0g
      extract_length byte %9.0g
      incentive_str~g str9 %9s
      incent_num double %9.2fc
      ---------------------------------------------------------------------------------------------------------------------------------------------
      ------------------ copy up to and including the previous line ------------------
      This worked perfectly, thank you so much, I really appreciate all your help!

      Comment


      • #4
        strkeep (SSC) needs to be explained here as community-contributed.

        Comment

        Working...
        X