Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • indentifying specific strings across different string variables

    I have a dataset with different strings in different variables. The variables are v1-v3. In one of v1-v3, will be a lowercase state. But in each obs, the state will/could be in a different cell. (In reality there are more than 3 variables.) The variables (v1-v3) that do not have state, have other string values inside them. An example dataset is here.

    Code:
    clear
    input str20 v1 str20 v2 str20 v3
    "zippy" "pancake" "ohio"
    "indiana" "sidewinder" "billy"
    "waffles are good" "new york" "susie"
    end

    I'd like to create a variable called state and fill it with the state from v1-v3. The solution to the dataset would be:

    Code:
    clear
    input str20 v1 str20 v2 str20 v3 str20 state
    "zippy" "pancake" "ohio" "ohio"
    "indiana" "sidewinder" "billy" "indiana"
    "waffles are good" "new york" "susie" "new york"
    end

    This is my attempt to solve the problem, but it doesn't work. (I am not great at loops.)

    Code:
    local states ///
    "alabama" "alaska" "arizona" "arkansas" "california" "colorado" ///
    "connecticut" "delaware" "florida" "georgia" "hawaii" "idaho" ///
    "illinois" "indiana" "iowa" "kansas" "kentucky" "louisiana" ///
    "maine" "maryland" "massachusetts" "michigan" "minnesota" ///
    "mississippi" "missouri" "montana" "nebraska" "nevada" ///
    "new hampshire" "new jersey" "new mexico" "new york" ///
    "north carolina" "north dakota" "ohio" "oklahoma" "oregon" ///
    "pennsylvania" "rhode island" "south carolina" "south dakota" ///
    "tennessee" "texas" "utah" "vermont" "virginia" "washington" ///
    "west virginia" "wisconsin" "wyoming"
    
    gen state = ""
    
    forval i = 1/3 {
        foreach s of local states {
            replace state = v`i' if lower(trim(v`i')) == "`s'" & state == ""
        }
    }
    Thanks in advance.



  • #2
    The use of quotes in local macros is a bit complicated, especially since Stata will not allow you to break lines within a quoted section. I don't want to get into a lengthy post about all the complications of quotes in macros. Suffice it to show a working solution to your problem:

    Code:
    clear
    input str20 v1 str20 v2 str20 v3
    "zippy" "pancake" "ohio"
    "indiana" "sidewinder" "billy"
    "waffles are good" "new york" "susie"
    end
    
    local states alabama alaska arizona arkansas california colorado ///
    connecticut delaware florida georgia hawaii idaho ///
    illinois indiana iowa kansas kentucky louisiana ///
    maine maryland massachusetts michigan minnesota ///
    mississippi missouri montana nebraska nevada ///
    "new hampshire" "new jersey" "new mexico" "new york" ///
    "north carolina" "north dakota" ohio oklahoma oregon ///
    pennsylvania "rhode island" "south carolina" "south dakota" ///
    tennessee texas utah vermont virginia washington ///
    "west virginia" wisconsin wyoming
    
    gen state = ""
    
    forval i = 1/3 {
        foreach s of local states {
            replace state = v`i' if lower(trim(v`i')) == "`s'" & state == ""
        }
    }
    The only changes to your original code are shown in bold face.

    Comment


    • #3
      Clyde Schechter : Thanks so much. I would have never got that. Always appreciate your help. The community is much better off with you in it.

      Comment


      • #4
        Two other options:

        1. A simple change of double quotes to compound quotes in the replace command of your original code would work:

        Code:
        clear
        input str20 v1 str20 v2 str20 v3
        "zippy" "pancake" "ohio"
        "indiana" "sidewinder" "billy"
        "waffles are good" "new york" "susie"
        end
        
        local states ///
        "alabama" "alaska" "arizona" "arkansas" "california" "colorado" ///
        "connecticut" "delaware" "florida" "georgia" "hawaii" "idaho" ///
        "illinois" "indiana" "iowa" "kansas" "kentucky" "louisiana" ///
        "maine" "maryland" "massachusetts" "michigan" "minnesota" ///
        "mississippi" "missouri" "montana" "nebraska" "nevada" ///
        "new hampshire" "new jersey" "new mexico" "new york" ///
        "north carolina" "north dakota" "ohio" "oklahoma" "oregon" ///
        "pennsylvania" "rhode island" "south carolina" "south dakota" ///
        "tennessee" "texas" "utah" "vermont" "virginia" "washington" ///
        "west virginia" "wisconsin" "wyoming"
        
        gen state = ""
        
        forval i = 1/3 {
            foreach s of local states {
                replace state = v`i' if lower(trim(v`i')) == `"`s'"' & state == ""
            }
        }
        
        . list, noobs
        
          +--------------------------------------------------+
          |               v1           v2      v3      state |
          |--------------------------------------------------|
          |            zippy      pancake    ohio       ohio |
          |          indiana   sidewinder   billy    indiana |
          | waffles are good     new york   susie   new york |
          +--------------------------------------------------+

        2. If you have no more than 9 string variables, then the following obviates the need for one level of loop (if you have more than 9, you would need check the OR of multiple inlist() functions, which might start looking messy). If you lowercase and trim the variables beforehand (especially if that is useful for other parts of your code), that would make this method look even neater.

        Code:
        clear
        input str20 v1 str20 v2 str20 v3
        "zippy" "pancake" "ohio"
        "indiana" "sidewinder" "billy"
        "waffles are good" "new york" "susie"
        end
        
        local states ///
        "alabama" "alaska" "arizona" "arkansas" "california" "colorado" ///
        "connecticut" "delaware" "florida" "georgia" "hawaii" "idaho" ///
        "illinois" "indiana" "iowa" "kansas" "kentucky" "louisiana" ///
        "maine" "maryland" "massachusetts" "michigan" "minnesota" ///
        "mississippi" "missouri" "montana" "nebraska" "nevada" ///
        "new hampshire" "new jersey" "new mexico" "new york" ///
        "north carolina" "north dakota" "ohio" "oklahoma" "oregon" ///
        "pennsylvania" "rhode island" "south carolina" "south dakota" ///
        "tennessee" "texas" "utah" "vermont" "virginia" "washington" ///
        "west virginia" "wisconsin" "wyoming"
        
        gen state = ""
        
        foreach s of local states {
            replace state = `"`s'"' if inlist(`"`s'"', lower(trim(v1)), lower(trim(v2)), lower(trim(v3)))
        }
        
        . list, noobs
        
          +--------------------------------------------------+
          |               v1           v2      v3      state |
          |--------------------------------------------------|
          |            zippy      pancake    ohio       ohio |
          |          indiana   sidewinder   billy    indiana |
          | waffles are good     new york   susie   new york |
          +--------------------------------------------------+
        Last edited by Hemanshu Kumar; 30 May 2025, 00:57.

        Comment

        Working...
        X