Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Extracting the first number from a variable name

    Hello,

    I have data that looks like the following

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double(it_budget100_657 it_budget101_1267 it_budget102_730 it_budget103_1000 it_budget104_64)
     56810   .      . 28136             3475
    131975   .      . 14813            12850
    367000   .      . 29000   22914.84765625
    331000   .      . 24000     39438.359375
    276000   .      . 21000  30441.775390625
    105000   .      . 13000 11991.0087890625
    131300 129 151500     .            20285
    241794   . 275696     .            34261
    278940   . 276936     .            36480
    277691   . 275696     .           275696
    276178   . 274194     .           274194
    253459   . 274194     .                .
    end
    What I am trying to do is loop through each variable name and extract the first number which I would then save as a local. Then, within this loop, I would loop again through the variable names and extract the first number where I check to see if this extracted number is the same as the one saved in the outer loop.

    For example, in the first loop I would extract the number 100 somehow and then save this as a local. Within this loop, I would extract the next number 101 and then check to see if it is the same as 100. If so, I would use pwcorr to calculate correlations between the two vectors.

    For context, the first number corresponds to a bank (like Wells Fargo), and each variable is a unique branch-bank pair (so some Wells Fargo in the United States). What I want to do is calculate correlations for branches that are not in the same bank. Hence, this is why I would like to skip the branches that have the same first number. I am really only struggling with how to extract the first number from each variable name. Thanks for any help!
    Last edited by Jack Hays; 16 Feb 2023, 22:06.

  • #2
    I am not sure I that I understand why you want this complicated procedure requiring two loops (if the list of variables you show is the list you want to work with, a simple -pwcorr it_budget*- would do what you want) but here is a way to extract the "first number" from the variable name:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double(it_budget100_657 it_budget101_1267 it_budget102_730 it_budget103_1000 it_budget104_64)
     56810   .      . 28136             3475
    131975   .      . 14813            12850
    367000   .      . 29000   22914.84765625
    331000   .      . 24000     39438.359375
    276000   .      . 21000  30441.775390625
    105000   .      . 13000 11991.0087890625
    131300 129 151500     .            20285
    241794   . 275696     .            34261
    278940   . 276936     .            36480
    277691   . 275696     .           275696
    276178   . 274194     .           274194
    253459   . 274194     .                .
    end
    
    * Extract "first number" from variable names:
    foreach v of varlist it_budget* {
       local n : subinstr local v "it_budget" ""
       local n : subinstr local n "_" " "
       local n : word 1 of `n'
       di "first number: `n', variable: `v'"
    }
    the result would be:
    Code:
    first number: 100, variable: it_budget100_657
    first number: 101, variable: it_budget101_1267
    first number: 102, variable: it_budget102_730
    first number: 103, variable: it_budget103_1000
    first number: 104, variable: it_budget104_64
    If you really want to use this general idea to calculate the correlation of pairs of variables with different "first numbers" one way may be:
    Code:
    * Correlate variables if "first numbers" differ:
    foreach v of varlist it_budget* {
       local vlist "`vlist' `v'"
    }
    tokenize `vlist'
    tempvar markvar
    foreach v1 of varlist `vlist' {
       local n1 : subinstr local v1 "it_budget" ""
       local n1 : subinstr local n1 "_" " "
       local n1 : word 1 of `n1'
       macro shift
       local vlist `*'
       if "`vlist'" != "" {
          foreach v2 of varlist `vlist' {
             local n2 : subinstr local v2 "it_budget" ""
             local n2 : subinstr local n2 "_" " "
             local n2 : word 1 of `n2'
             if "`n1'" != "`n2'" {
                mark `markvar'
                markout `markvar' `v1' `v2'
                qui sum `v1' if `markvar', meanonly
                local c1 = r(N)
                qui sum `v2' if `markvar', meanonly
                local c2 = r(N)
                if `c1' > 1 & `c2' > 1 {
                   di "`v1' `v2':"
                   corr `v1' `v2'
                }
                drop `markvar'
             }
          }
       }
    }
    the result would be:
    Code:
    it_budget100_657 it_budget102_730:
    (obs=6)
    
                 | it_b~657 it_b~730
    -------------+------------------
    it_budge~657 |   1.0000
    it_budge~730 |   0.9649   1.0000
    
    it_budget100_657 it_budget103_1000:
    (obs=6)
    
                 | it_b~657 it_~1000
    -------------+------------------
    it_budge~657 |   1.0000
    it_budg~1000 |   0.4248   1.0000
    
    it_budget100_657 it_budget104_64:
    (obs=11)
    
                 | it_b~657 it_bu~64
    -------------+------------------
    it_budge~657 |   1.0000
    it_budget~64 |   0.3382   1.0000
    
    it_budget102_730 it_budget104_64:
    (obs=5)
    
                 | it_b~730 it_bu~64
    -------------+------------------
    it_budge~730 |   1.0000
    it_budget~64 |   0.4387   1.0000
    
    it_budget103_1000 it_budget104_64:
    (obs=6)
    
                 | it_~1000 it_bu~64
    -------------+------------------
    it_budg~1000 |   1.0000
    it_budget~64 |   0.1604   1.0000


    Comment

    Working...
    X