  • Frequencies of an array / dummy variables from multencode

    Hi,
    I have a dataset of firms with different characteristics. One variable is a string variable containing many tags separated by commas.
    I created the tags* variables using the following command:

    . split tags, p(",")
    variables created as string:
    tags1 tags2 tags3

    Then I used multencode to create a numeric identifier for each tag:
    . multencode tags1-tags3, gen(ntags1-ntags3)

    See below the dataex.

    The number of tags is not uniform across firms (some firms have 49 tags, some have 3), but all the tags* variables draw on the same set of tag values:

    1) I would like a frequency table across all the tags* variables I created (not tags1, tags2, tags3 separately, but rather how many times, for example, robots appears anywhere in the whole matrix of tags).
    2) I would like to create a dummy variable for each tag, so that instead of a column tags1 that can hold different values, each company has one column per tag: for example, tec equals 1 if tags1=="tec" or tags2=="tec" or tags3=="tec", and so forth for all the tags and all the values.
    I created the example below, but in reality I have more than 5000 firms and 50 possible tags, so doing it manually is impossible.

    Thanks a lot

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str2 company str50 tags str6 tags1 str8 tags2 str7 tags3 byte(ntags1 ntags2 ntags3)
    "a" "tec, medical, COV19"  "tec"    " medical" " COV19"  7 3 2
    "b" "robots, COV19, yeast" "robots" " COV19"   " yeast"  6 2 5
    "c" "tec, AI, mobile"      "tec"    " AI"      " mobile" 7 1 4
    "d" "robots, AI"           "robots" " AI"      ""        6 1 .
    end
    label values ntags1 tags1
    label values ntags2 tags1
    label values ntags3 tags1
    label def tags1 6 "robots", modify
    label def tags1 7 "tec", modify
    label def tags1 1 " AI", modify
    label def tags1 2 " COV19", modify
    label def tags1 3 " medical", modify
    label def tags1 4 " mobile", modify
    label def tags1 5 " yeast", modify

  • #2
    I do not follow question 2. For question 1, just reshape long and count:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str2 company str50 tags str6 tags1 str8 tags2 str7 tags3 byte(ntags1 ntags2 ntags3)
    "a" "tec, medical, COV19"  "tec"    " medical" " COV19"  7 3 2
    "b" "robots, COV19, yeast" "robots" " COV19"   " yeast"  6 2 5
    "c" "tec, AI, mobile"      "tec"    " AI"      " mobile" 7 1 4
    "d" "robots, AI"           "robots" " AI"      ""        6 1 .
    end
    label values ntags1 tags1
    label values ntags2 tags1
    label values ntags3 tags1
    label def tags1 6 "robots", modify
    label def tags1 7 "tec", modify
    label def tags1 1 " AI", modify
    label def tags1 2 " COV19", modify
    label def tags1 3 " medical", modify
    label def tags1 4 " mobile", modify
    label def tags1 5 " yeast", modify
    
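    preserve
    drop tags ntag*                    // keep only company and the string variables tags1-tags3
    reshape long tags, i(company)      // one row per company and tag slot; j() defaults to _j
    contract tags if !missing(tags)    // one row per distinct tag, with its count in _freq
    l
    restore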
    Result:

    Code:
    . l, sep(10)
    
         +------------------+
         |     tags   _freq |
         |------------------|
      1. |       AI       2 |
      2. |    COV19       2 |
      3. |  medical       1 |
      4. |   mobile       1 |
      5. |    yeast       1 |
      6. |   robots       2 |
      7. |      tec       2 |
         +------------------+
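
    If question 2 means one 0/1 indicator per distinct tag, something along these lines might work. This is only a sketch under assumptions: the dataex example above is in memory, every tag is a single word (no embedded blanks), no tag name collides with an existing variable, and strtrim() and strtoname() are available (Stata 14 or later). The names slot, alltags, t, v and x are illustrative choices, not anything from the original post.

```stata
* Step 1: collect the distinct tags (trimmed) in a local macro.
preserve
keep company tags1-tags3
reshape long tags, i(company) j(slot)
replace tags = strtrim(tags)        // split left a leading blank on tags2, tags3
drop if missing(tags)
levelsof tags, local(alltags) clean
restore                             // the local alltags survives restore

* Step 2: one indicator per tag, set to 1 if any tags* variable matches.
foreach t of local alltags {
    local v = strtoname("`t'")      // turn the tag into a legal variable name
    generate byte `v' = 0
    foreach x of varlist tags1-tags3 {
        replace `v' = 1 if strtrim(`x') == "`t'"
    }
}
```

    With the real data, the varlist tags1-tags3 would need to be widened to cover all the split variables. Trimming matters here because split keeps the blank after each comma, so "tec" in tags1 and " tec" in tags2 would otherwise be treated as different tags.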
