Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Extracting a number between recurrent words

    Dear All,
    I tried to find a salution to my problem but no luck so far. I have Synopsis of MnA deals from SDC Platinum, I need to extract from the string a number that is always between the words "issuance of" and "mil", I report examples:

    US - Dynatronics Corp (DC) acquired Rajala Therapy Sales Associates, a wholesaler of physiotherapy equipment. Concurrently, DC acquired Responsive Providers Inc, Therapy & Health Care Products Inc, Cyman Therapy Inc, Al Rice & Associates Inc and Theratech Inc. The six transactions had a combined value of $8.268 mil. The consideration consisted of $3. 3 mil in cash and the issuance of 4.6 mil DC common shares valued at $4.968 mil. The shares were valued based on DC's closing stock price of $1.08 on June 29, the last full trading day prior to the announcement.

    US - Ebix Inc (Ebix) acquired E-Z Data Inc, a Pasadena- based developer of financial and insurance software, for USD 52.827 mil. The consideration consisted of USD 25.35 mil in cash and the issuance of 0.496 mil common shares valued at USD 27.477 mil. The shares were valued based on Ebix's closing stock price of USD 55.36 on 30 September 2009, the last full trading day prior to the announcement.

    I want a variables that takes value 4.6 in the first obs and 0.496 in the second.
    I think I have to do it with a regular expression but I can't manage to make it work. Also, I can I solve the problem given that sometimes I have a blank space between the number and "mil" and sometimes not? It would be fine also if I manage to extract the first number that occurs after "issuance of", that would be probably the first best.
    Thank you so much!

  • #2
    This is using string functions (link), which find the first instance of "issuance of" or "mil", which is why i do it in two steps:
    Code:
    gen issuanceposition = strpos(text, "issuance of")
    gen stringstartatissuance = substr(text, issuanceposition+12, . )
    gen milposition = strpos( stringstartatissuance , " mil")
    gen issuevalue = substr( stringstartatissuance , 1, milposition-1)
    destring issuevalue, gen(issuevaluenum)
    Where the variable 'text' contains the long sentences you posted
    Last edited by Jorrit Gosens; 25 Nov 2016, 06:45.

    Comment


    • #3
      Dear Jorrit,
      thanks for your prompt reply! Unfortunately it does not work. I have the word "mil" occurring several time in my string variable, and not always the first instance of "mil" is the one of interest. I guess I can do it with the first instance of "issuance of" instead. Why the issuanceposition+12 in the second command?

      Comment


      • #4
        I'm saying the code takes into consideration it wont be the first 'mil' in your strings.
        Give it a try on a copy of your data and do the code line by line to see understand what it does.
        The +12 is because "issuance of " is 12 characters long, and is now also cut of in the code in line 2.

        Comment

        Working...
        X