Assigning observations based on variable names

Klaus Klausen

Join Date: Mar 2021
Posts: 72


15 Nov 2023, 00:34

Hi,

consider the following example data:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str1 Company str20 ceo1 str23 ceo2 str25 ceo3 strL(ceo1speech ceo2speech ceo3speech) str14 ceocorrect
"A" "WATSON"    ""          ""          "bla bla bla bla"     ""            ""            "WATSON"   
"B" "WIRTH"     ""          ""          "bla bla bla"         ""            ""            "WIRTH"    
"B" "WIRTH"     ""          ""          "bla bla bla bla bla" ""            ""            "WIRTH"    
"C" "DOWD"      "GREENBERG" ""          ""                    "bla bla bla" ""            "GREENBERG"
"C" "GREENBERG" "KEOGH"     "DOWD"      "bla bla bla"         ""            ""            "GREENBERG"
"C" "KEOGH"     "DOWD"      "GREENBERG" ""                    ""            "bla bla bla" "GREENBERG"
"C" "GREENBERG" ""          ""          "bla bla bla"         ""            ""            "GREENBERG"
"C" "KEOGH"     "DOWD"      "GREENBERG" ""                    ""            "bla bla bla" "GREENBERG"
"C" "DOWD"      "GREENBERG" ""          ""                    "bla bla bla" ""            "GREENBERG"
end

ceo1-ceo3: The names of managers who attended the conference call and whose job description at the beginning of the transcript is given as "CEO". Not all of them are actually CEOs of the firm named in the Company variable.
ceo1speech, ceo2speech, ceo3speech: The text spoken by the corresponding managers during the conference call. For example, ceo1speech contains the text spoken by ceo1, and so on.
ceocorrect: The name of the actual CEO of the firm identified by Company (the firm hosting the conference call).

Now, my goal is to create a variable that holds the text spoken by ceocorrect. I'm thinking about a solution that involves putting the variable names at the beginning of each string. For instance, the first observation would look like:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str1 Company str20 ceo1 str23 ceo2 str25 ceo3 strL(ceo1speech ceo2speech ceo3speech) str14 ceocorrect
"A" "ceo1: WATSON" "" "" "ceo1: bla bla bla bla" "" "" "WATSON"
end

I would then create the desired variable using a regular-expression match, but I haven't figured out how to do that yet and would appreciate any help.


Nick Cox

Join Date: Mar 2014
Posts: 35754

15 Nov 2023, 04:24

Overwriting your own data is often somewhere between a little risky and downright hazardous.

I won't do that here because I am not confident I understand what you want.

For example, I can't follow what this problem has to do with regular expressions.

In your data example, it's just a matter of looping to find a non-empty string.

See e.g. https://journals.sagepub.com/doi/pdf...36867X20931007 for a discussion of similar problems, including the question of what to do with two or more candidates.
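You can check whether the "two or more candidates" case arises at all before choosing any value. The following is a minimal sketch. It assumes that an absent speech is stored as an empty string, as in the example, and it uses a hypothetical count variable, nspeech. It tallies the non-empty speech variables in each observation:

```stata
* Sketch: count the non-empty speech variables in each observation
* (assumes an absent speech is stored as ""; nspeech is a hypothetical name)
clear
input str1 Company str20 ceo1 str23 ceo2 str25 ceo3 strL(ceo1speech ceo2speech ceo3speech) str14 ceocorrect
"A" "WATSON"    ""          ""          "bla bla bla bla"     ""            ""            "WATSON"
"B" "WIRTH"     ""          ""          "bla bla bla"         ""            ""            "WIRTH"
"B" "WIRTH"     ""          ""          "bla bla bla bla bla" ""            ""            "WIRTH"
"C" "DOWD"      "GREENBERG" ""          ""                    "bla bla bla" ""            "GREENBERG"
"C" "GREENBERG" "KEOGH"     "DOWD"      "bla bla bla"         ""            ""            "GREENBERG"
"C" "KEOGH"     "DOWD"      "GREENBERG" ""                    ""            "bla bla bla" "GREENBERG"
"C" "GREENBERG" ""          ""          "bla bla bla"         ""            ""            "GREENBERG"
"C" "KEOGH"     "DOWD"      "GREENBERG" ""                    ""            "bla bla bla" "GREENBERG"
"C" "DOWD"      "GREENBERG" ""          ""                    "bla bla bla" ""            "GREENBERG"
end

* start at zero and add 1 for every speech variable that is non-empty
gen byte nspeech = 0
forval j = 1/3 {
    replace nspeech = nspeech + (ceo`j'speech != "")
}

* 0 means nobody spoke; 2 or more means "take the non-empty one" is ambiguous
tab nspeech
list Company ceo1 ceo2 ceo3 if nspeech != 1
```

In this example every observation has exactly one non-empty speech, so the final list should print nothing. Any observation with a count of 0 or of 2 or more would need an explicit rule for which text to keep.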

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str1 Company str20 ceo1 str23 ceo2 str25 ceo3 strL(ceo1speech ceo2speech ceo3speech) str14 ceocorrect
"A" "WATSON"    ""          ""          "bla bla bla bla"     ""            ""            "WATSON"   
"B" "WIRTH"     ""          ""          "bla bla bla"         ""            ""            "WIRTH"    
"B" "WIRTH"     ""          ""          "bla bla bla bla bla" ""            ""            "WIRTH"    
"C" "DOWD"      "GREENBERG" ""          ""                    "bla bla bla" ""            "GREENBERG"
"C" "GREENBERG" "KEOGH"     "DOWD"      "bla bla bla"         ""            ""            "GREENBERG"
"C" "KEOGH"     "DOWD"      "GREENBERG" ""                    ""            "bla bla bla" "GREENBERG"
"C" "GREENBERG" ""          ""          "bla bla bla"         ""            ""            "GREENBERG"
"C" "KEOGH"     "DOWD"      "GREENBERG" ""                    ""            "bla bla bla" "GREENBERG"
"C" "DOWD"      "GREENBERG" ""          ""                    "bla bla bla" ""            "GREENBERG"
end

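* empty string variables, to be filled in by the loop
gen wanted1 = ""
gen wanted2 = ""

* if more than one speech is non-empty, the last one (highest j) wins
forval j = 1/3 {
    * which ceo slot spoke, plus that manager's name
    replace wanted1 = "ceo`j':" + ceo`j' if ceo`j'speech != ""
    * which ceo slot spoke, plus the text itself
    replace wanted2 = "ceo`j':" + ceo`j'speech if ceo`j'speech != ""
}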

l wanted? , sep(0)

     +-------------------------------------------+
     |        wanted1                    wanted2 |
     |-------------------------------------------|
  1. |    ceo1:WATSON       ceo1:bla bla bla bla |
  2. |     ceo1:WIRTH           ceo1:bla bla bla |
  3. |     ceo1:WIRTH   ceo1:bla bla bla bla bla |
  4. | ceo2:GREENBERG           ceo2:bla bla bla |
  5. | ceo1:GREENBERG           ceo1:bla bla bla |
  6. | ceo3:GREENBERG           ceo3:bla bla bla |
  7. | ceo1:GREENBERG           ceo1:bla bla bla |
  8. | ceo3:GREENBERG           ceo3:bla bla bla |
  9. | ceo2:GREENBERG           ceo2:bla bla bla |
     +-------------------------------------------+
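The loop above keys on which speech is non-empty. In these data, that always coincides with the manager named in ceocorrect. Below is a sketch of a variant that keys directly on the name match. It assumes names are spelled identically in ceo1-ceo3 and in ceocorrect, and the variable names correctspeech and ceoindex are hypothetical:

```stata
* Sketch: copy the speech of whichever ceo`j' matches ceocorrect
* (assumes identical spelling of names; correctspeech/ceoindex are hypothetical)
clear
input str1 Company str20 ceo1 str23 ceo2 str25 ceo3 strL(ceo1speech ceo2speech ceo3speech) str14 ceocorrect
"A" "WATSON"    ""          ""          "bla bla bla bla"     ""            ""            "WATSON"
"B" "WIRTH"     ""          ""          "bla bla bla"         ""            ""            "WIRTH"
"B" "WIRTH"     ""          ""          "bla bla bla bla bla" ""            ""            "WIRTH"
"C" "DOWD"      "GREENBERG" ""          ""                    "bla bla bla" ""            "GREENBERG"
"C" "GREENBERG" "KEOGH"     "DOWD"      "bla bla bla"         ""            ""            "GREENBERG"
"C" "KEOGH"     "DOWD"      "GREENBERG" ""                    ""            "bla bla bla" "GREENBERG"
"C" "GREENBERG" ""          ""          "bla bla bla"         ""            ""            "GREENBERG"
"C" "KEOGH"     "DOWD"      "GREENBERG" ""                    ""            "bla bla bla" "GREENBERG"
"C" "DOWD"      "GREENBERG" ""          ""                    "bla bla bla" ""            "GREENBERG"
end

* strL so that long transcripts are never truncated
gen strL correctspeech = ""
* records which of ceo1-ceo3 supplied the text
gen byte ceoindex = .

forval j = 1/3 {
    * copy only when the name matches and that manager actually spoke
    replace correctspeech = ceo`j'speech if ceo`j' == ceocorrect & ceo`j'speech != ""
    replace ceoindex = `j' if ceo`j' == ceocorrect & ceo`j'speech != ""
}

* observations where no listed manager both matches ceocorrect and spoke
list Company ceo1 ceo2 ceo3 ceocorrect if missing(ceoindex)

list Company ceocorrect ceoindex correctspeech, sep(0)
```

A missing value of ceoindex would flag transcripts in which the correct CEO is not among ceo1-ceo3 or did not speak. The first approach cannot detect these cases.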

Last edited by Nick Cox; 15 Nov 2023, 04:28.
