  • Splitting String variables/ extracting numeric characters

    Hi,

    I am working with two string variables from which I am trying to pull the subscription numbers of newspapers.

    Code:
    "171 ejemplares de lunes a s·bado. 217 ejemplares domingo. * CirculaciÛn certificada por Romay Hermida y CÌa., S. C. Periodo certificado: enero-dic. 2014. Fecha certificado: 17/03/2015."                                                                                                                                                                                       
    "1,500 ejemplares. * CirculaciÛn certificada por Romay Hermida y CÌa., S. C. Periodo certificado: ene.-dic.2012. Fecha certificado: 13/08/2013."
    Examples of the values of the string variables are listed above. First, I am only interested in the text prior to the "* CirculaciÛn" portion, so I use the command:


    Code:
    replace `var' = substr(`var', 1, strpos(`var', "CirculaciÛn") - 1) if strpos(`var', "CirculaciÛn")
    to eliminate the latter text.

    I then want to keep only the total subscription number, which is either the sum of the first two numbers (171 + 217) or a single number (1,500), depending on whether the figures are split into Monday-Saturday and Sunday (example 1) or reported for the entire week (example 2).

    I am trying to repeat this procedure for 32 different data sets, for which I use the code:

    Code:
    foreach f of local files {
        use `"`f'"'

        foreach var of varlist freecirc payingcirc {
            replace `var' = substr(`var', 1, strpos(`var', "CirculaciÛn") - 1) if strpos(`var', "CirculaciÛn")
            split `var', p("bado.")
            egen `var'_num1 = sieve(`var'1), keep(numeric)
            egen `var'_num2 = sieve(`var'2), keep(numeric)

            destring `var'_num1, replace
            destring `var'_num2, replace
            egen total`var' = rowtotal(`var'_num1 `var'_num2)
        }
        save `"`f'"', replace
    }
    The problem is that in some datasets no observation of a variable contains disaggregated subscription numbers, so the "split" command generates only `var'1 and not `var'2, and the loop stops at the line: egen `var'_num2 = sieve(`var'2), keep(numeric).

    Is there a way to run this command only if split actually created `var'2?

    Thank you.

  • #2
    I think this will do what you want:
    Code:
    foreach f of local files {
        use `"`f'"'

        foreach var of varlist freecirc payingcirc {
            // keep only the text before the certification note
            replace `var' = substr(`var', 1, strpos(`var', "CirculaciÛn") - 1) if strpos(`var', "CirculaciÛn")
            split `var', p("bado.") destring
            // sieve() is an egen function from the egenmore package (SSC)
            egen `var'_num1 = sieve(`var'1), keep(numeric)
            destring `var'_num1, replace
            // split creates a second piece only if some observation contains the separator
            capture confirm var `var'2
            if c(rc) == 0 {
                egen `var'_num2 = sieve(`var'2), keep(numeric)
                destring `var'_num2, replace
            }
            else {
                // no second piece: use missing so rowtotal() ignores it
                gen `var'_num2 = .
            }
            egen total`var' = rowtotal(`var'_num1 `var'_num2)
        }

        save `"`f'"', replace
    }
    Note: Not tested. Beware of typos or other errors.
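
    For illustration, the existence check can be tried on its own with toy data. This is only a sketch: the variable s and its two values are made up, and creating an empty s2 as a placeholder is just one possible way to handle the missing piece.

    ```stata
    * Toy data (hypothetical): no value contains "bado.",
    * so split creates s1 but not s2
    clear
    input str40 s
    "1,500 ejemplares."
    "2,300 ejemplares."
    end

    split s, parse("bado.")

    * confirm variable fails if s2 does not exist;
    * capture suppresses the error and stores the return code in _rc
    capture confirm variable s2
    if _rc == 0 {
        display "s2 exists"
    }
    else {
        display "s2 was not created"
        generate s2 = ""
    }
    ```

    Checking _rc immediately after the captured command matters, because any later command would overwrite it.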



    • #3
      That worked perfectly, thank you.
