  • Macro string manipulation

    Hi

    I am trying to produce a one-line summary of a categorical variable using Ben Jann's fre command.

    Code:
    ssc install fre
    
    sysuse nlsw88, clear
    
    keep if industry>8
    
    fre industry
    
    return list
    
    mat M = r(valid)
    forval i = 1/`=rowsof(M)' {
          local counts `counts'  `=M[`i',1]'
    }
    display "`counts'"
    
    local n : word count `counts'
    
    forvalues i = 1/`n' {
        local part1 : word `i' of `r(lab_valid)'
        local part2  : word `i' of `counts'
        if `i'!= `n' {
          local summary `summary' `"`part1' (`part2'); "'
        }
        else {
          local summary `summary' `"`part1' (`part2')"'
        }    
    }
    
    macro list _summary
    Output:
    Code:
    . sysuse nlsw88, clear
    (NLSW, 1988 extract)
    
    .
    . keep if industry>8
    (1,118 observations deleted)
    
    .
    . fre industry
    
    industry -- industry
    ------------------------------------------------------------------------------
                                     |      Freq.    Percent      Valid       Cum.
    ---------------------------------+--------------------------------------------
    Valid   9  Personal Services     |         97       8.60       8.71       8.71
            10 Entertainment/Rec Svc |         17       1.51       1.53      10.23
            11 Professional Services |        824      73.05      73.97      84.20
            12 Public Administration |        176      15.60      15.80     100.00
            Total                    |       1114      98.76     100.00           
    Missing .                        |         14       1.24                      
    Total                            |       1128     100.00                      
    ------------------------------------------------------------------------------
    
    .
    . return list
    
    scalars:
                      r(N) =  1128
                r(N_valid) =  1114
              r(N_missing) =  14
                      r(r) =  5
                r(r_valid) =  4
              r(r_missing) =  1
    
    macros:
                 r(depvar) : "industry"
                  r(label) : "industry"
              r(lab_valid) : "`"9 Personal Services"' `"10 Entertainment/Rec Svc"' `"11 Professional Services"' `"12 Public Administration"'"
            r(lab_missing) : "`"."'"
    
    matrices:
                  r(valid) :  4 x 1
                r(missing) :  1 x 1
    
    .
    . mat M = r(valid)
    
    . forval i = 1/`=rowsof(M)' {
      2.       local counts `counts'  `=M[`i',1]'
      3. }
    
    . display "`counts'"
    97 17 824 176
    
    .
    . local n : word count `counts'
    
    .
    . forvalues i = 1/`n' {
      2.     local part1 : word `i' of `r(lab_valid)'
      3.     local part2  : word `i' of `counts'
      4.     if `i'!= `n' {
      5.       local summary `summary' `"`part1' (`part2'); "'
      6.     }
      7.     else {
      8.       local summary `summary' `"`part1' (`part2')"'
      9.     }    
     10. }
    
    .
    . macro list _summary
    _summary:       9 Personal Services (97); `"10 Entertainment/Rec Svc (17); "' `"11 Professional Services (824); "' `"12 Public Administration (176)"'
    If you have scrolled down this far, many thanks!

    I would like to strip all types of quotes from summary so that it reads:
    Code:
    _summary:       9 Personal Services (97); 10 Entertainment/Rec Svc (17); 11 Professional Services (824); 12 Public Administration (176)
    Can anybody help?

    With best wishes and thanks,

    Jane

  • #2
    Thanks for the nice example data and proposed code.

    A potentially dumb question on my part: Are you using compound double quotes because you expect some of your value labels to contain embedded quotes? If so, then stripping your result of all quotes wouldn't make sense to me. If, on the other hand, you don't care about preserving embedded quotes in your value labels, what about not using compound double quotes? The fact that they are present in r(lab_valid) doesn't require you to do so, e.g.
    Code:
    local summary "`summary' `part1' (`part2');"
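The first suggestion, worked into a complete run of the example, might look like the untested sketch below. It assumes fre is installed from SSC and that no value label contains a double quote. Two details are illustrative choices, not part of the thread: the counts are read straight from the r(valid) matrix instead of going through the counts local, and a hypothetical separator local, sep, replaces the if/else test for the last item. The returned label list is also copied into a hypothetical local, labels, so that later commands cannot overwrite r().

```stata
// Sketch (untested): one-line frequency summary using plain double quotes.
// Assumes fre is installed (ssc install fre) and that no value label
// contains a double quote character.
sysuse nlsw88, clear
keep if industry > 8
fre industry

// Copy the returned results before running anything that could clear r()
matrix M = r(valid)
local labels `"`r(lab_valid)'"'
local n = rowsof(M)

local summary ""
local sep ""
forvalues i = 1/`n' {
    local part1 : word `i' of `labels'
    local part2 = M[`i', 1]
    // Append separator, label and count; sep is empty only for the first item
    local summary "`summary'`sep'`part1' (`part2')"
    local sep "; "
}
macro list _summary
```

Setting sep only after the first pass means no semicolon trails the final entry, so the loop body needs no branch.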
    Or, if you have some reason to not want to touch any of your existing code, what about using subinstr() to clean up your summary local at the very end?
    Code:
    local summary = subinstr(`"`summary'"', `"""', "", .) // remove "
    // ... do the same for ` and '
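If the compound-quoted summary is kept, one way to finish the subinstr() route is to delete the opening and closing delimiters as pairs, in a single expression, using char() so that no literal quote characters have to be typed. This is an untested sketch. The hard-coded local only reproduces the macro list output in the question, stritrim() (Stata 14 or later) collapses the doubled blanks left where the delimiters were, and any such pair inside a label would be deleted too. Removing one character at a time leaves unmatched single quotes in the intermediate string, which Stata may then try to read as macro references on the next expansion.

```stata
// Sketch (untested); run from a do-file because of the /// line continuation.
// Stand-in for the summary local built by the original loop, copied from
// the macro list output in the question.
local summary `"9 Personal Services (97); `"10 Entertainment/Rec Svc (17); "' `"11 Professional Services (824); "' `"12 Public Administration (176)"'"'

// char(96) is the left single quote, char(34) the double quote and
// char(39) the right single quote. Remove the opening pairs, then the
// closing pairs, then collapse the doubled blanks left behind.
local summary = stritrim(subinstr(subinstr(`"`summary'"', ///
    char(96) + char(34), "", .), char(34) + char(39), "", .))
macro list _summary
```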


    • #3
      Hi Mike. Thanks so much for this. The first suggestion is perfect.

      Not at all a dumb question. I am generally OK at programming, but quotation marks, etc., in Stata macros confuse me no end. I didn't think it through logically.

      I am adding to the contents of "describe, replace" to document a large number of variables in several datasets obtained externally. This is for potential harmonisation for individual-level meta-analysis. Some of the value labels might well have quotes embedded in them, but I will have to live without them. This is very much a scoping exercise, so I don't need perfection.

      With best wishes and thanks again,

      Jane
