Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • counting events in observations for 1-..20 groups, monthly bases

    Hi all,

    here is my data:
    ---------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str8 date str18 source str19 target str4 cameocode int numevents long numarts byte quadclass float goldstein str6 YearMonth str4 year
    "19930101" "USA" "AFR"          "042"  1  4 1  1.9 "199301" "1993"
    "19930101" "USA" "ALB"          "130"  1  2 3 -4.4 "199301" "1993"
    "19930101" "USA" "BUS"          "020"  1  8 1    3 "199301" "1993"
    "19930101" "USA" "BUS"          "154"  1  8 4 -7.2 "199301" "1993"
    "19930101" "USA" "CHE"          "030"  2  9 1    4 "199301" "1993"
    "19930101" "USA" "CHE"          "036"  1  4 1    4 "199301" "1993"
    "19930101" "USA" "CHE"          "042"  1  2 1  1.9 "199301" "1993"
    "19930101" "USA" "CHE"          "046"  1  4 1    7 "199301" "1993"
    "19930101" "USA" "CHN"          "012"  1  4 1  -.4 "199301" "1993"
    "19930101" "USA" "CHN"          "043"  1  8 1  2.8 "199301" "1993"
    "19930101" "USA" "CHN"          "112"  1  6 3   -2 "199301" "1993"
    "19930101" "USA" "CUB"          "013"  1  8 1   .4 "199301" "1993"
    "19930101" "USA" "CUB"          "043"  1  8 1  2.8 "199301" "1993"
    "19930101" "USA" "CUBREF"       "075"  1 16 2    7 "199301" "1993"
    "19930101" "USA" "EDU"          "042"  2  4 1  1.9 "199301" "1993"
    "19930101" "USA" "EDU"          "043"  4  9 1  2.8 "199301" "1993"
    "19930101" "USA" "EDU"          "051"  1  1 1  3.4 "199301" "1993"
    "19930101" "USA" "FRA"          "036"  1  4 1    4 "199301" "1993"
    "19930101" "USA" "FRA"          "043"  3 10 1  2.8 "199301" "1993"
    "19930101" "USA" "GOV"          "010"  3  8 1    0 "199301" "1993"
    "19930101" "USA" "GOV"          "020"  2  4 1    3 "199301" "1993"
    "19930101" "USA" "GOV"          "040"  5 11 1    1 "199301" "1993"
    "19930101" "USA" "GOV"          "042"  2  9 1  1.9 "199301" "1993"
    "19930101" "USA" "GOV"          "052"  1  9 1  3.5 "199301" "1993"
    "19930101" "USA" "GOV"          "057"  2  8 1    8 "199301" "1993"
    "19930101" "USA" "GRD"          "010"  1  2 1    0 "199301" "1993"
    "19930101" "USA" "HLH"          "042"  2  8 1  1.9 "199301" "1993"
    "19930101" "USA" "IGOUNO"       "036"  1 12 1    4 "199301" "1993"
    "19930101" "USA" "IGOUNO"       "040"  2  2 1    1 "199301" "1993"
    "19930101" "USA" "IRQ"          "042"  2 10 1  1.9 "199301" "1993"
    "19930101" "USA" "IRQ"          "080"  2 10 2    5 "199301" "1993"
    "19930101" "USA" "IRQ"          "190"  2  9 4  -10 "199301" "1993"
    "19930101" "USA" "IRQ"          "192"  2  2 4 -9.5 "199301" "1993"
    "19930101" "USA" "ITAGOV"       "043"  1  2 1  2.8 "199301" "1993"
    "19930101" "USA" "JUD"          "043"  1  8 1  2.8 "199301" "1993"
    "19930101" "USA" "JUD"          "071"  1  1 2  7.4 "199301" "1993"
    "19930101" "USA" "LAB"          "114"  1  3 3   -2 "199301" "1993"
    "19930101" "USA" "LBY"          "043"  1 18 1  2.8 "199301" "1993"
    "19930101" "USA" "LKA"          "070"  1  4 2    7 "199301" "1993"
    "19930101" "USA" "LKAGOV"       "070"  2 15 2    7 "199301" "1993"
    "19930101" "USA" "MDV"          "043"  1  2 1  2.8 "199301" "1993"
    "19930101" "USA" "MED"          "014"  2  5 1    0 "199301" "1993"
    "19930101" "USA" "MED"          "042"  1  2 1  1.9 "199301" "1993"
    "19930101" "USA" "MED"          "100"  2  5 3   -5 "199301" "1993"
    "19930101" "USA" "MEX"          "043"  1  6 1  2.8 "199301" "1993"
    "19930101" "USA" "MEX"          "084"  2  4 2    7 "199301" "1993"
    "19930101" "USA" "MIL"          "010"  1  1 1    0 "199301" "1993"
    "19930101" "USA" "MIL"          "027"  1  2 1    4 "199301" "1993"
    "19930101" "USA" "MIL"          "042"  1  1 1  1.9 "199301" "1993"
    "19930101" "USA" "MIL"          "045"  1  2 1    5 "199301" "1993"
    "19930101" "USA" "MIL"          "050"  1  3 1  3.5 "199301" "1993"
    "19930101" "USA" "PRK"          "046"  1  4 1    7 "199301" "1993"
    "19930101" "USA" "PRKGOV"       "046"  1  6 1    7 "199301" "1993"
    "19930101" "USA" "PSE"          "051"  1  5 1  3.4 "199301" "1993"
    "19930101" "USA" "PSEREBOPP"    "013"  1  4 1   .4 "199301" "1993"
    "19930101" "USA" "REB"          "051"  2  4 1  3.4 "199301" "1993"
    "19930101" "USA" "RUS"          "030"  3  9 1    4 "199301" "1993"
    "19930101" "USA" "RUS"          "036"  4  9 1    4 "199301" "1993"
    "19930101" "USA" "RUS"          "046"  1  8 1    7 "199301" "1993"
    "19930101" "USA" "RUS"          "051"  4 22 1  3.4 "199301" "1993"
    "19930101" "USA" "RUS"          "090"  1  3 2   -2 "199301" "1993"
    "19930101" "USA" "RUS"          "130"  1  5 3 -4.4 "199301" "1993"
    "19930101" "USA" "SAU"          "042"  3 18 1  1.9 "199301" "1993"
    "19930101" "USA" "SCGSRB"       "046"  2  4 1    7 "199301" "1993"
    "19930101" "USA" "SOM"          "020"  1  6 1    3 "199301" "1993"
    "19930101" "USA" "SOM"          "036"  1  4 1    4 "199301" "1993"
    "19930101" "USA" "SOM"          "042"  1  2 1  1.9 "199301" "1993"
    "19930101" "USA" "SOM"          "0874" 1 10 2   10 "199301" "1993"
    "19930101" "USA" "SOM"          "180"  3  9 4   -9 "199301" "1993"
    "19930101" "USA" "UKR"          "030"  1  2 1    4 "199301" "1993"
    "19930101" "USA" "USAELI"       "020"  2  5 1    3 "199301" "1993"
    "19930101" "USA" "USAGOV"       "020"  1  2 1    3 "199301" "1993"
    "19930101" "USA" "USAGOV"       "040"  1  1 1    1 "199301" "1993"
    "19930101" "USA" "USAJUD"       "020"  1  1 1    3 "199301" "1993"
    "19930101" "USA" "USAMED"       "043"  3  8 1  2.8 "199301" "1993"
    "19930101" "USA" "USAMIL"       "130"  1  5 3 -4.4 "199301" "1993"
    "19930101" "USA" "USASPY"       "010"  1  7 1    0 "199301" "1993"
    "19930101" "USA" "YEM"          "194"  2 10 4  -10 "199301" "1993"
    "19930102" "USA" "CRM"          "0874" 1  2 2   10 "199301" "1993"
    "19930102" "USA" "CVL"          "100"  1  1 3   -5 "199301" "1993"
    "19930102" "USA" "EDU"          "0311" 2  3 1  5.2 "199301" "1993"
    "19930102" "USA" "EDU"          "043"  2  5 1  2.8 "199301" "1993"
    "19930102" "USA" "EDU"          "053"  1  3 1  3.8 "199301" "1993"
    "19930102" "USA" "EDU"          "071"  1  3 2  7.4 "199301" "1993"
    "19930102" "USA" "EDU"          "100"  1  2 3   -5 "199301" "1993"
    "19930102" "USA" "ETHMED"       "036"  1  2 1    4 "199301" "1993"
    "19930102" "USA" "FRA"          "043"  2  4 1  2.8 "199301" "1993"
    "19930102" "USA" "FRA"          "046"  1  2 1    7 "199301" "1993"
    "19930102" "USA" "FRA"          "090"  1  2 2   -2 "199301" "1993"
    "19930102" "USA" "GOV"          "040"  2  2 1    1 "199301" "1993"
    "19930102" "USA" "GOV"          "043"  1  3 1  2.8 "199301" "1993"
    "19930102" "USA" "GOV"          "046"  1 10 1    7 "199301" "1993"
    "19930102" "USA" "GOV"          "060"  1  9 2    6 "199301" "1993"
    "19930102" "USA" "GRC"          "042"  2  4 1  1.9 "199301" "1993"
    "19930102" "USA" "HTIELIGOV"    "050"  1  3 1  3.5 "199301" "1993"
    "19930102" "USA" "IGOUNO"       "036"  1  3 1    4 "199301" "1993"
    "19930102" "USA" "IGOUNOKIDLAB" "190"  2  6 4  -10 "199301" "1993"
    "19930102" "USA" "IRN"          "042"  1  7 1  1.9 "199301" "1993"
    "19930102" "USA" "ISR"          "061"  1  7 2  6.4 "199301" "1993"
    "19930102" "USA" "ISR"          "070"  2  4 2    7 "199301" "1993"
    end
    ------------------ copy up to and including the previous line ------------------

    After every cameocode: The number of events that take place recorded in a observations is mentioned.

    What I would like is: all events on a monthly basis for example for the 4 existing quadclasses.

    every observations holds 1 or more events.


    I tried:
    egen Quadclass_1=sum(quadclass=="1"),by(YearMonth)
    egen Quadclass_2=sum(quadclass=="2"),by(YearMonth)
    egen Quadclass_3=sum(quadclass=="3"),by(YearMonth)
    egen Quadclass_4=sum(quadclass=="4"),by(YearMonth)
    this I think gives me the monthly count but by observation.

    How can I add in the amount of events per observations?

    thank you guys,










  • #2
    Actually the -egen- commands you show will not give you any results: they will give you the "type mismatch" error because quadclass is a numeric variable, not a string.

    Also, while it doesn't affect this particular task, the use of the kind of date variables you have in your data is going to get you into trouble sooner or later. You need to convert these to Stata internal format date variables sooner or later, so may as well do it now.

    Code:
    gen real_date = daily(date, "YMD")
    format real_date %td
    drop date
    gen real_month = mofd(real_date)
    format real_month %tm
    drop YearMonth
    
    forvalues i = 1/4 {
        egen Quadclass_`i' = total(quadclass == `i'), by(real_month)
        egen Quadclass_`i'_events = total(cond(quadclass==`i', numevents, .))
    }
    Note: the -egen- function -sum()- is now undocumented. Although it still works, it is too easily confused with the -gen- function -sum()- which does something different. So what was previously called -egen, sum()- is now called -egen, total()-, and it is best to avoid the older usage.

    Comment


    • #3
      Clyde Schechter solid and clear advice. thank you.


      egen Quadclass_`i' = total(quadclass == `i'), by(real_month)
      => generates the quadclass count per month


      egen Quadclass_`i'_events = total(cond(quadclass==`i', numevents, .))
      ==> generates the total quadclass event count, for the whole total dataset?

      Q: what if I wanted ==> generate the total quadclass event count per month.

      I tried
      forvalues i = 1/4 {
      egen Quadclass_`i' = total(quadclass == `i'), by(real_month)
      egen Quadclass_events_`i' = total(cond(quadclass==`i', numevents,by(real_month)
      egen Quadclass_`i'_events = total(cond(quadclass==`i', numevents, .))
      }
      error:
      parentheses unbalanced
      r(132);

      tried some omitting did not seem to work just now








      p.s. for other readers:
      use tostring date to change format date en generate real_date here.





      Comment


      • #4
        So, if you want the total event count per quadclass per month, it's:

        Code:
        egen Quadclass_events_`i' = total(cond(quadclass==`i', numevents)), by(real_month)
        Note: The data set you are creating here with separate variables for each quadclass is turning into a hybrid between long and wide that , sooner or later, is likely to become unmanageable. I don't really see any reason for having separate variables like this for each quadclass. Consider instead this approach:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str8 date str18 source str19 target str4 cameocode int numevents long numarts byte quadclass float goldstein str6 YearMonth str4 year
        "19930101" "USA" "AFR"          "042"  1  4 1  1.9 "199301" "1993"
        "19930101" "USA" "ALB"          "130"  1  2 3 -4.4 "199301" "1993"
        "19930101" "USA" "BUS"          "020"  1  8 1    3 "199301" "1993"
        "19930101" "USA" "BUS"          "154"  1  8 4 -7.2 "199301" "1993"
        "19930101" "USA" "CHE"          "030"  2  9 1    4 "199301" "1993"
        "19930101" "USA" "CHE"          "036"  1  4 1    4 "199301" "1993"
        "19930101" "USA" "CHE"          "042"  1  2 1  1.9 "199301" "1993"
        "19930101" "USA" "CHE"          "046"  1  4 1    7 "199301" "1993"
        "19930101" "USA" "CHN"          "012"  1  4 1  -.4 "199301" "1993"
        "19930101" "USA" "CHN"          "043"  1  8 1  2.8 "199301" "1993"
        "19930101" "USA" "CHN"          "112"  1  6 3   -2 "199301" "1993"
        "19930101" "USA" "CUB"          "013"  1  8 1   .4 "199301" "1993"
        "19930101" "USA" "CUB"          "043"  1  8 1  2.8 "199301" "1993"
        "19930101" "USA" "CUBREF"       "075"  1 16 2    7 "199301" "1993"
        "19930101" "USA" "EDU"          "042"  2  4 1  1.9 "199301" "1993"
        "19930101" "USA" "EDU"          "043"  4  9 1  2.8 "199301" "1993"
        "19930101" "USA" "EDU"          "051"  1  1 1  3.4 "199301" "1993"
        "19930101" "USA" "FRA"          "036"  1  4 1    4 "199301" "1993"
        "19930101" "USA" "FRA"          "043"  3 10 1  2.8 "199301" "1993"
        "19930101" "USA" "GOV"          "010"  3  8 1    0 "199301" "1993"
        "19930101" "USA" "GOV"          "020"  2  4 1    3 "199301" "1993"
        "19930101" "USA" "GOV"          "040"  5 11 1    1 "199301" "1993"
        "19930101" "USA" "GOV"          "042"  2  9 1  1.9 "199301" "1993"
        "19930101" "USA" "GOV"          "052"  1  9 1  3.5 "199301" "1993"
        "19930101" "USA" "GOV"          "057"  2  8 1    8 "199301" "1993"
        "19930101" "USA" "GRD"          "010"  1  2 1    0 "199301" "1993"
        "19930101" "USA" "HLH"          "042"  2  8 1  1.9 "199301" "1993"
        "19930101" "USA" "IGOUNO"       "036"  1 12 1    4 "199301" "1993"
        "19930101" "USA" "IGOUNO"       "040"  2  2 1    1 "199301" "1993"
        "19930101" "USA" "IRQ"          "042"  2 10 1  1.9 "199301" "1993"
        "19930101" "USA" "IRQ"          "080"  2 10 2    5 "199301" "1993"
        "19930101" "USA" "IRQ"          "190"  2  9 4  -10 "199301" "1993"
        "19930101" "USA" "IRQ"          "192"  2  2 4 -9.5 "199301" "1993"
        "19930101" "USA" "ITAGOV"       "043"  1  2 1  2.8 "199301" "1993"
        "19930101" "USA" "JUD"          "043"  1  8 1  2.8 "199301" "1993"
        "19930101" "USA" "JUD"          "071"  1  1 2  7.4 "199301" "1993"
        "19930101" "USA" "LAB"          "114"  1  3 3   -2 "199301" "1993"
        "19930101" "USA" "LBY"          "043"  1 18 1  2.8 "199301" "1993"
        "19930101" "USA" "LKA"          "070"  1  4 2    7 "199301" "1993"
        "19930101" "USA" "LKAGOV"       "070"  2 15 2    7 "199301" "1993"
        "19930101" "USA" "MDV"          "043"  1  2 1  2.8 "199301" "1993"
        "19930101" "USA" "MED"          "014"  2  5 1    0 "199301" "1993"
        "19930101" "USA" "MED"          "042"  1  2 1  1.9 "199301" "1993"
        "19930101" "USA" "MED"          "100"  2  5 3   -5 "199301" "1993"
        "19930101" "USA" "MEX"          "043"  1  6 1  2.8 "199301" "1993"
        "19930101" "USA" "MEX"          "084"  2  4 2    7 "199301" "1993"
        "19930101" "USA" "MIL"          "010"  1  1 1    0 "199301" "1993"
        "19930101" "USA" "MIL"          "027"  1  2 1    4 "199301" "1993"
        "19930101" "USA" "MIL"          "042"  1  1 1  1.9 "199301" "1993"
        "19930101" "USA" "MIL"          "045"  1  2 1    5 "199301" "1993"
        "19930101" "USA" "MIL"          "050"  1  3 1  3.5 "199301" "1993"
        "19930101" "USA" "PRK"          "046"  1  4 1    7 "199301" "1993"
        "19930101" "USA" "PRKGOV"       "046"  1  6 1    7 "199301" "1993"
        "19930101" "USA" "PSE"          "051"  1  5 1  3.4 "199301" "1993"
        "19930101" "USA" "PSEREBOPP"    "013"  1  4 1   .4 "199301" "1993"
        "19930101" "USA" "REB"          "051"  2  4 1  3.4 "199301" "1993"
        "19930101" "USA" "RUS"          "030"  3  9 1    4 "199301" "1993"
        "19930101" "USA" "RUS"          "036"  4  9 1    4 "199301" "1993"
        "19930101" "USA" "RUS"          "046"  1  8 1    7 "199301" "1993"
        "19930101" "USA" "RUS"          "051"  4 22 1  3.4 "199301" "1993"
        "19930101" "USA" "RUS"          "090"  1  3 2   -2 "199301" "1993"
        "19930101" "USA" "RUS"          "130"  1  5 3 -4.4 "199301" "1993"
        "19930101" "USA" "SAU"          "042"  3 18 1  1.9 "199301" "1993"
        "19930101" "USA" "SCGSRB"       "046"  2  4 1    7 "199301" "1993"
        "19930101" "USA" "SOM"          "020"  1  6 1    3 "199301" "1993"
        "19930101" "USA" "SOM"          "036"  1  4 1    4 "199301" "1993"
        "19930101" "USA" "SOM"          "042"  1  2 1  1.9 "199301" "1993"
        "19930101" "USA" "SOM"          "0874" 1 10 2   10 "199301" "1993"
        "19930101" "USA" "SOM"          "180"  3  9 4   -9 "199301" "1993"
        "19930101" "USA" "UKR"          "030"  1  2 1    4 "199301" "1993"
        "19930101" "USA" "USAELI"       "020"  2  5 1    3 "199301" "1993"
        "19930101" "USA" "USAGOV"       "020"  1  2 1    3 "199301" "1993"
        "19930101" "USA" "USAGOV"       "040"  1  1 1    1 "199301" "1993"
        "19930101" "USA" "USAJUD"       "020"  1  1 1    3 "199301" "1993"
        "19930101" "USA" "USAMED"       "043"  3  8 1  2.8 "199301" "1993"
        "19930101" "USA" "USAMIL"       "130"  1  5 3 -4.4 "199301" "1993"
        "19930101" "USA" "USASPY"       "010"  1  7 1    0 "199301" "1993"
        "19930101" "USA" "YEM"          "194"  2 10 4  -10 "199301" "1993"
        "19930102" "USA" "CRM"          "0874" 1  2 2   10 "199301" "1993"
        "19930102" "USA" "CVL"          "100"  1  1 3   -5 "199301" "1993"
        "19930102" "USA" "EDU"          "0311" 2  3 1  5.2 "199301" "1993"
        "19930102" "USA" "EDU"          "043"  2  5 1  2.8 "199301" "1993"
        "19930102" "USA" "EDU"          "053"  1  3 1  3.8 "199301" "1993"
        "19930102" "USA" "EDU"          "071"  1  3 2  7.4 "199301" "1993"
        "19930102" "USA" "EDU"          "100"  1  2 3   -5 "199301" "1993"
        "19930102" "USA" "ETHMED"       "036"  1  2 1    4 "199301" "1993"
        "19930102" "USA" "FRA"          "043"  2  4 1  2.8 "199301" "1993"
        "19930102" "USA" "FRA"          "046"  1  2 1    7 "199301" "1993"
        "19930102" "USA" "FRA"          "090"  1  2 2   -2 "199301" "1993"
        "19930102" "USA" "GOV"          "040"  2  2 1    1 "199301" "1993"
        "19930102" "USA" "GOV"          "043"  1  3 1  2.8 "199301" "1993"
        "19930102" "USA" "GOV"          "046"  1 10 1    7 "199301" "1993"
        "19930102" "USA" "GOV"          "060"  1  9 2    6 "199301" "1993"
        "19930102" "USA" "GRC"          "042"  2  4 1  1.9 "199301" "1993"
        "19930102" "USA" "HTIELIGOV"    "050"  1  3 1  3.5 "199301" "1993"
        "19930102" "USA" "IGOUNO"       "036"  1  3 1    4 "199301" "1993"
        "19930102" "USA" "IGOUNOKIDLAB" "190"  2  6 4  -10 "199301" "1993"
        "19930102" "USA" "IRN"          "042"  1  7 1  1.9 "199301" "1993"
        "19930102" "USA" "ISR"          "061"  1  7 2  6.4 "199301" "1993"
        "19930102" "USA" "ISR"          "070"  2  4 2    7 "199301" "1993"
        end
        
        gen real_date = daily(date, "YMD")
        format real_date %td
        drop date
        gen real_month = mofd(real_date)
        format real_month %tm
        drop YearMonth
        
        by real_month quadclass, sort: gen monthly_quadclass_count = _N
        by real_month quadclass: egen monthly_quadclass_event_total = total(numevents)
        This preserves the long layout of the data and will be, for nearly all purposes, easier to work with. Also note how much simpler the code is: including the quadclass variable in the -by- prefix removes the complexity of conditioning on the value of quadclass in the earlier code.

        Comment


        • #5
          Clyde Schechter The reason I asked was because I thought I could expand upon the code in order to also find seperate variables for each Eventrootcode.

          After sorting a variable and generating real_date and realMonth including formating, this is what my data looks like now for example:

          . dataex

          ----------------------- copy starting from the next line -----------------------
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str3 source_country str14 target_country str4 cameocode int numevents long numarts byte quadclass float goldstein str2 Eventrootcode str4 year float(real_date real_month)
          "USA" "ABW" "042"   1   9 1  1.9 "04" "2009" 18031 592
          "USA" "AFG" "0874"  2  15 2   10 "08" "2004" 16274 534
          "USA" "AGO" "040"   1   9 1    1 "04" "2002" 15671 514
          "USA" "AIA" "014"   2   2 1    0 "01" "2010" 18464 606
          "USA" "ALB" "040"   6  12 1    1 "04" "1999" 14370 472
          "USA" "AND" "051"   1   9 1  3.4 "05" "2010" 18512 608
          "USA" "ARE" "042"   4  16 1  1.9 "04" "2009" 18012 591
          "USA" "ARG" "173"   1   5 4   -5 "17" "2010" 18372 603
          "USA" "ARM" "080"   2   6 2    5 "08" "2009" 17993 591
          "USA" "ATG" "040"   1   1 1    1 "04" "2007" 17287 568
          "USA" "AUS" "010"   2   7 1    0 "01" "2012" 19021 624
          "USA" "AUT" "043"   1   3 1  2.8 "04" "1996" 13404 440
          "USA" "AZE" "060"   2   9 2    6 "06" "2011" 18984 623
          "USA" "BDI" "041"   1   2 1    1 "04" "2011" 18698 614
          "USA" "BEL" "051"   2   3 1  3.4 "05" "2009" 18156 596
          "USA" "BEN" "043"   1   2 1  2.8 "04" "2011" 18842 619
          "USA" "BFA" "040"   1   6 1    1 "04" "2012" 19300 634
          "USA" "BGD" "1053"  2   2 3   -5 "10" "2008" 17642 579
          "USA" "BGR" "010"   1  18 1    0 "01" "1999" 14463 475
          "USA" "BHR" "042"   2   2 1  1.9 "04" "2009" 17930 589
          "USA" "BHS" "042"   2   7 1  1.9 "04" "2009" 18194 597
          "USA" "BLR" "046"   1   2 1    7 "04" "1999" 14375 472
          "USA" "BLZ" "042"   2   4 1  1.9 "04" "2008" 17612 578
          "USA" "BMU" "020"   1   7 1    3 "02" "2011" 18747 615
          "USA" "BOL" "042"   1   1 1  1.9 "04" "2008" 17735 582
          "USA" "BRA" "046"   3  37 1    7 "04" "2009" 17971 590
          "USA" "BRB" "042"   6  54 1  1.9 "04" "2009" 18148 596
          "USA" "BRN" "150"   1   3 4 -7.2 "15" "2012" 19099 627
          "USA" "BTN" "020"   1   2 1    3 "02" "2012" 19325 634
          "USA" "BWA" "040"   2   8 1    1 "04" "2003" 15898 522
          "USA" "CAF" "042"   1   8 1  1.9 "04" "2011" 18916 621
          "USA" "CAN" "042"   4  11 1  1.9 "04" "2010" 18415 605
          "USA" "CHE" "043"   4  14 1  2.8 "04" "2012" 19208 631
          "USA" "CHL" "043"   1   2 1  2.8 "04" "2009" 18072 593
          "USA" "CHN" "1122"  1   2 3   -2 "11" "1995" 13013 427
          "USA" "CIV" "130"   1   5 3 -4.4 "13" "2010" 18615 611
          "USA" "CMR" "046"   1   9 1    7 "04" "2012" 19051 625
          "USA" "COD" "190"   2  19 4  -10 "19" "2010" 18550 609
          "USA" "COG" "010"   5  20 1    0 "01" "2012" 19338 635
          "USA" "COK" "046"   4   8 1    7 "04" "2009" 18133 595
          "USA" "COL" "072"   2   6 2  8.3 "07" "2001" 15228 500
          "USA" "COM" "023"   1   9 1  3.4 "02" "2009" 18079 594
          "USA" "CPV" "051"   5  35 1  3.4 "05" "2013" 19369 636
          "USA" "CRI" "040"   1   3 1    1 "04" "2008" 17863 586
          "USA" "CUB" "140"   1   5 3 -6.5 "14" "2010" 18321 601
          "USA" "CYM" "042"   3   6 1  1.9 "04" "2010" 18554 609
          "USA" "CYP" "046"   1   2 1    7 "04" "2013" 19404 637
          "USA" "CZE" "111"   1   2 3   -2 "11" "2009" 17993 591
          "USA" "DEU" "042"   5  14 1  1.9 "04" "2009" 18209 598
          "USA" "DJI" "051"   2  18 1  3.4 "05" "2000" 14690 482
          "USA" "DMA" "042"   1   5 1  1.9 "04" "2003" 15987 525
          "USA" "DNK" "173"   2  18 4   -5 "17" "2010" 18625 611
          "USA" "DOM" "112"   2  12 3   -2 "11" "2010" 18294 601
          "USA" "DZA" "181"   3   9 4   -9 "18" "2012" 19100 627
          "USA" "ECU" "090"   2   4 2   -2 "09" "2012" 19266 632
          "USA" "EGY" "010"   1  10 1    0 "01" "2001" 15067 495
          "USA" "ERI" "010"   1   4 1    0 "01" "2003" 15713 516
          "USA" "ESP" "193"   3   7 4  -10 "19" "2011" 18705 614
          "USA" "EST" "036"   2   6 1    4 "03" "2002" 15499 509
          "USA" "ETH" "0333"  1   9 1  5.2 "03" "1999" 14401 473
          "USA" "FIN" "051"   1   2 1  3.4 "05" "2012" 19300 634
          "USA" "FJI" "043"   2   4 1  2.8 "04" "2001" 15118 496
          "USA" "FRA" "042"   2  32 1  1.9 "04" "2006" 16909 555
          "USA" "FSM" "010"   1   5 1    0 "01" "2011" 18858 619
          "USA" "GAB" "060"   2   2 2    6 "06" "1997" 13698 450
          "USA" "GBR" "040"   2   9 1    1 "04" "2006" 17144 563
          "USA" "GEO" "100"  13 182 3   -5 "10" "2010" 18448 606
          "USA" "GHA" "010"   4   8 1    0 "01" "2013" 19366 636
          "USA" "GIN" "036"   1  10 1    4 "03" "1998" 14234 467
          "USA" "GMB" "042"   2   7 1  1.9 "04" "2004" 16341 536
          "USA" "GNB" "043"   1   1 1  2.8 "04" "2012" 19126 628
          "USA" "GNQ" "010"   2   4 1    0 "01" "2011" 18810 618
          "USA" "GRC" "040"   1   6 1    1 "04" "2010" 18511 608
          "USA" "GRD" "036"   2   3 1    4 "03" "2010" 18494 607
          "USA" "GTM" "010"   2   4 1    0 "01" "2009" 17985 590
          "USA" "GUY" "190"   1   6 4  -10 "19" "2007" 17246 566
          "USA" "HKG" "010"   1   3 1    0 "01" "2011" 18731 615
          "USA" "HND" "073"   1   9 2  7.4 "07" "2009" 18231 598
          "USA" "HTI" "042"   1   9 1  1.9 "04" "2010" 18622 611
          "USA" "HUN" "043"   1  18 1  2.8 "04" "1996" 13430 441
          "USA" "IDN" "036"   1   9 1    4 "03" "2008" 17812 585
          "USA" "IGO" "040"   1   1 1    1 "04" "2001" 15067 495
          "USA" "IMG" "051"   1   9 1  3.4 "05" "2012" 19165 629
          "USA" "IND" "042"   1   6 1  1.9 "04" "2009" 17932 589
          "USA" "INT" "100"   1   9 3   -5 "10" "2009" 18199 597
          "USA" "IRL" "042"   2   9 1  1.9 "04" "2003" 16037 526
          "USA" "IRN" "010"   1   3 1    0 "01" "2009" 18014 591
          "USA" "IRQ" "0874"  1  30 2   10 "08" "2012" 19202 630
          "USA" "ISL" "040"   1   2 1    1 "04" "2010" 18514 608
          "USA" "ISR" "0332"  1   4 1  5.2 "03" "1999" 14267 468
          "USA" "ITA" "042"   7  12 1  1.9 "04" "2009" 18233 599
          "USA" "JAM" "193"   1   3 4  -10 "19" "2010" 18616 611
          "USA" "JOR" "042"   1  10 1  1.9 "04" "2013" 19669 646
          "USA" "JPN" "013"   3   9 1   .4 "01" "2009" 17986 590
          "USA" "KAZ" "070"   2  75 2    7 "07" "2010" 18374 603
          "USA" "KEN" "190"   2   4 4  -10 "19" "2008" 17559 576
          "USA" "KHM" "090"   1  14 2   -2 "09" "1999" 14386 472
          "USA" "KIR" "090"   1   6 2   -2 "09" "2010" 18614 611
          "USA" "KNA" "070"   1   2 2    7 "07" "2006" 17018 559
          "USA" "KOR" "042"   7  24 1  1.9 "04" "2011" 18927 621
          end
          format %td real_date
          format %tm real_month
          ------------------ copy up to and including the previous line ------------------

          Listed 100 out of 3139347 observations
          Use the count() option to list more


          I understand what you mean with my dataset becoming unmanageable when having a long and wide dataset.

          I am interested in keeping monthly event count per quadclass and per Eventrootcode.

          I thought applying :
          egen tag = tag(makestatement_1 YearMonth)
          egen ndistinct = total(tag), by(YearMonth)
          //summarize all the variables now that I have the unique amount of events per cameo code, per Month/Year
          keep if tag==1
          would do that trick pretty much, leaving me with a one shape of data.

          why I am interested in keeping monthly event count per quadclass and per Eventrootcode? I want to use a simple lineair regression and 2 stage regression to test inference of quadclass variables/ Eventrootcode variables on stock market returns and volatility.

          comment:

          by real_month quadclass, sort: gen monthly_quadclass_count = _N by real_month quadclass: egen monthly_quadclass_event_total = total(numevents)
          . generates total monthly count of all quadclass observations, and generates total monthly count of events of all quadclass.


          Question:
          I run


          forvalues i = 1/4 {
          egen Quadclass_`i' = total(quadclass == `i'), by(real_month)
          egen Quadclass_events_`i' = total(cond(quadclass==`i', numevents)), by(real_month)
          }

          -->generates total count observations per quadclass
          -->
          generates total count events per quadclass

          returns: invalid syntax r(198).

          additional question:
          I wanted to expand this similar code to Eventrootcodes.

          I thought I would probably obtain similar syntax error but:

          forvalues i = 1/20 {
          egen Eventrootcode`i' = total(Eventrootcode == `i'), by(real_month)
          egen Eventrootcode`i' = total(cond(Eventrootcode==`i', numevents)), by(real_month)
          }

          error: type mismatch, r(109)

          I tried destring : destring Eventrootcode,replace --> tells me " Eventrootcode contains nonnumeric characters; no replace"

          puzzled since Eventrootcode, contains 01 through 20 .

          tried encode aswell.

          would love some help on this topic. Thank you







          ​​​​​​​

          Comment


          • #6
            Have you looked at contract ?

            Comment


            • #7
              Nick Cox did now, thank you

              contract quadclass numevents
              Very handy commands. consulted help , can not find a way to obtain this monthly since this gives fequency for whole dataset. very handy nontheless.

              still interested in obtaining monthly count of events for every " variable group"( i.e. 1-4 for quadclass, 1-20 for Eventrootcode.

              Comment


              • #8
                Just add month or whatever else as another argument. contract can be for any cross-combination of categories. That is why and how it was written.

                Comment


                • #9
                  Okay yes clear.

                  for other readers also, as Nick states adding it an argument helps.

                  thus:
                  contract quadclass numevents real_month
                  and apply
                  sort quadclass real_month numevents
                  give you column for numevents quadclass real_month and their frequencies. first sort here is quadclass thus from 1 to 4 respectively.

                  Okay clear, and yes this does give me the monthly count of events for every quadclass in every month. clear.

                  I am still very keen on getting: the total event count per quadclass/ eventrootcode per month.

                  I am sure there is a way to obtain this after the contract command however I am not sure how to proceed.

                  applying:

                  forvalues i = 1/4 { egen Quadclass_`i' = total(quadclass == `i'), by(real_month)
                  egen Quadclass_`i'_events = total(cond(quadclass==`i', numevents, .)) }

                  and dropping so I obtain monthly event count seems handy, still slightly confused as to why"


                  forvalues i = 1/4 { egen Quadclass_`i' = total(quadclass == `i'), by(real_month)
                  egen Quadclass_`i'_events = totaltotal(cond(quadclass==`i', numevents)), by(real_month) }


                  does not work for me.


                  Thank you Nick and Clyde for a helpfull hand and sugggestions



                  Comment


                  • #10
                    Given that you want your output reduced to a single observation for each observation of month and quadclass, I think the simplest way to do this is:

                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input str8 date str18 source str19 target str4 cameocode int numevents long numarts byte quadclass float goldstein str6 YearMonth str4 year
                    "19930101" "USA" "AFR"          "042"  1  4 1  1.9 "199301" "1993"
                    "19930101" "USA" "ALB"          "130"  1  2 3 -4.4 "199301" "1993"
                    "19930101" "USA" "BUS"          "020"  1  8 1    3 "199301" "1993"
                    "19930101" "USA" "BUS"          "154"  1  8 4 -7.2 "199301" "1993"
                    "19930101" "USA" "CHE"          "030"  2  9 1    4 "199301" "1993"
                    "19930101" "USA" "CHE"          "036"  1  4 1    4 "199301" "1993"
                    "19930101" "USA" "CHE"          "042"  1  2 1  1.9 "199301" "1993"
                    "19930101" "USA" "CHE"          "046"  1  4 1    7 "199301" "1993"
                    "19930101" "USA" "CHN"          "012"  1  4 1  -.4 "199301" "1993"
                    "19930101" "USA" "CHN"          "043"  1  8 1  2.8 "199301" "1993"
                    "19930101" "USA" "CHN"          "112"  1  6 3   -2 "199301" "1993"
                    "19930101" "USA" "CUB"          "013"  1  8 1   .4 "199301" "1993"
                    "19930101" "USA" "CUB"          "043"  1  8 1  2.8 "199301" "1993"
                    "19930101" "USA" "CUBREF"       "075"  1 16 2    7 "199301" "1993"
                    "19930101" "USA" "EDU"          "042"  2  4 1  1.9 "199301" "1993"
                    "19930101" "USA" "EDU"          "043"  4  9 1  2.8 "199301" "1993"
                    "19930101" "USA" "EDU"          "051"  1  1 1  3.4 "199301" "1993"
                    "19930101" "USA" "FRA"          "036"  1  4 1    4 "199301" "1993"
                    "19930101" "USA" "FRA"          "043"  3 10 1  2.8 "199301" "1993"
                    "19930101" "USA" "GOV"          "010"  3  8 1    0 "199301" "1993"
                    "19930101" "USA" "GOV"          "020"  2  4 1    3 "199301" "1993"
                    "19930101" "USA" "GOV"          "040"  5 11 1    1 "199301" "1993"
                    "19930101" "USA" "GOV"          "042"  2  9 1  1.9 "199301" "1993"
                    "19930101" "USA" "GOV"          "052"  1  9 1  3.5 "199301" "1993"
                    "19930101" "USA" "GOV"          "057"  2  8 1    8 "199301" "1993"
                    "19930101" "USA" "GRD"          "010"  1  2 1    0 "199301" "1993"
                    "19930101" "USA" "HLH"          "042"  2  8 1  1.9 "199301" "1993"
                    "19930101" "USA" "IGOUNO"       "036"  1 12 1    4 "199301" "1993"
                    "19930101" "USA" "IGOUNO"       "040"  2  2 1    1 "199301" "1993"
                    "19930101" "USA" "IRQ"          "042"  2 10 1  1.9 "199301" "1993"
                    "19930101" "USA" "IRQ"          "080"  2 10 2    5 "199301" "1993"
                    "19930101" "USA" "IRQ"          "190"  2  9 4  -10 "199301" "1993"
                    "19930101" "USA" "IRQ"          "192"  2  2 4 -9.5 "199301" "1993"
                    "19930101" "USA" "ITAGOV"       "043"  1  2 1  2.8 "199301" "1993"
                    "19930101" "USA" "JUD"          "043"  1  8 1  2.8 "199301" "1993"
                    "19930101" "USA" "JUD"          "071"  1  1 2  7.4 "199301" "1993"
                    "19930101" "USA" "LAB"          "114"  1  3 3   -2 "199301" "1993"
                    "19930101" "USA" "LBY"          "043"  1 18 1  2.8 "199301" "1993"
                    "19930101" "USA" "LKA"          "070"  1  4 2    7 "199301" "1993"
                    "19930101" "USA" "LKAGOV"       "070"  2 15 2    7 "199301" "1993"
                    "19930101" "USA" "MDV"          "043"  1  2 1  2.8 "199301" "1993"
                    "19930101" "USA" "MED"          "014"  2  5 1    0 "199301" "1993"
                    "19930101" "USA" "MED"          "042"  1  2 1  1.9 "199301" "1993"
                    "19930101" "USA" "MED"          "100"  2  5 3   -5 "199301" "1993"
                    "19930101" "USA" "MEX"          "043"  1  6 1  2.8 "199301" "1993"
                    "19930101" "USA" "MEX"          "084"  2  4 2    7 "199301" "1993"
                    "19930101" "USA" "MIL"          "010"  1  1 1    0 "199301" "1993"
                    "19930101" "USA" "MIL"          "027"  1  2 1    4 "199301" "1993"
                    "19930101" "USA" "MIL"          "042"  1  1 1  1.9 "199301" "1993"
                    "19930101" "USA" "MIL"          "045"  1  2 1    5 "199301" "1993"
                    "19930101" "USA" "MIL"          "050"  1  3 1  3.5 "199301" "1993"
                    "19930101" "USA" "PRK"          "046"  1  4 1    7 "199301" "1993"
                    "19930101" "USA" "PRKGOV"       "046"  1  6 1    7 "199301" "1993"
                    "19930101" "USA" "PSE"          "051"  1  5 1  3.4 "199301" "1993"
                    "19930101" "USA" "PSEREBOPP"    "013"  1  4 1   .4 "199301" "1993"
                    "19930101" "USA" "REB"          "051"  2  4 1  3.4 "199301" "1993"
                    "19930101" "USA" "RUS"          "030"  3  9 1    4 "199301" "1993"
                    "19930101" "USA" "RUS"          "036"  4  9 1    4 "199301" "1993"
                    "19930101" "USA" "RUS"          "046"  1  8 1    7 "199301" "1993"
                    "19930101" "USA" "RUS"          "051"  4 22 1  3.4 "199301" "1993"
                    "19930101" "USA" "RUS"          "090"  1  3 2   -2 "199301" "1993"
                    "19930101" "USA" "RUS"          "130"  1  5 3 -4.4 "199301" "1993"
                    "19930101" "USA" "SAU"          "042"  3 18 1  1.9 "199301" "1993"
                    "19930101" "USA" "SCGSRB"       "046"  2  4 1    7 "199301" "1993"
                    "19930101" "USA" "SOM"          "020"  1  6 1    3 "199301" "1993"
                    "19930101" "USA" "SOM"          "036"  1  4 1    4 "199301" "1993"
                    "19930101" "USA" "SOM"          "042"  1  2 1  1.9 "199301" "1993"
                    "19930101" "USA" "SOM"          "0874" 1 10 2   10 "199301" "1993"
                    "19930101" "USA" "SOM"          "180"  3  9 4   -9 "199301" "1993"
                    "19930101" "USA" "UKR"          "030"  1  2 1    4 "199301" "1993"
                    "19930101" "USA" "USAELI"       "020"  2  5 1    3 "199301" "1993"
                    "19930101" "USA" "USAGOV"       "020"  1  2 1    3 "199301" "1993"
                    "19930101" "USA" "USAGOV"       "040"  1  1 1    1 "199301" "1993"
                    "19930101" "USA" "USAJUD"       "020"  1  1 1    3 "199301" "1993"
                    "19930101" "USA" "USAMED"       "043"  3  8 1  2.8 "199301" "1993"
                    "19930101" "USA" "USAMIL"       "130"  1  5 3 -4.4 "199301" "1993"
                    "19930101" "USA" "USASPY"       "010"  1  7 1    0 "199301" "1993"
                    "19930101" "USA" "YEM"          "194"  2 10 4  -10 "199301" "1993"
                    "19930102" "USA" "CRM"          "0874" 1  2 2   10 "199301" "1993"
                    "19930102" "USA" "CVL"          "100"  1  1 3   -5 "199301" "1993"
                    "19930102" "USA" "EDU"          "0311" 2  3 1  5.2 "199301" "1993"
                    "19930102" "USA" "EDU"          "043"  2  5 1  2.8 "199301" "1993"
                    "19930102" "USA" "EDU"          "053"  1  3 1  3.8 "199301" "1993"
                    "19930102" "USA" "EDU"          "071"  1  3 2  7.4 "199301" "1993"
                    "19930102" "USA" "EDU"          "100"  1  2 3   -5 "199301" "1993"
                    "19930102" "USA" "ETHMED"       "036"  1  2 1    4 "199301" "1993"
                    "19930102" "USA" "FRA"          "043"  2  4 1  2.8 "199301" "1993"
                    "19930102" "USA" "FRA"          "046"  1  2 1    7 "199301" "1993"
                    "19930102" "USA" "FRA"          "090"  1  2 2   -2 "199301" "1993"
                    "19930102" "USA" "GOV"          "040"  2  2 1    1 "199301" "1993"
                    "19930102" "USA" "GOV"          "043"  1  3 1  2.8 "199301" "1993"
                    "19930102" "USA" "GOV"          "046"  1 10 1    7 "199301" "1993"
                    "19930102" "USA" "GOV"          "060"  1  9 2    6 "199301" "1993"
                    "19930102" "USA" "GRC"          "042"  2  4 1  1.9 "199301" "1993"
                    "19930102" "USA" "HTIELIGOV"    "050"  1  3 1  3.5 "199301" "1993"
                    "19930102" "USA" "IGOUNO"       "036"  1  3 1    4 "199301" "1993"
                    "19930102" "USA" "IGOUNOKIDLAB" "190"  2  6 4  -10 "199301" "1993"
                    "19930102" "USA" "IRN"          "042"  1  7 1  1.9 "199301" "1993"
                    "19930102" "USA" "ISR"          "061"  1  7 2  6.4 "199301" "1993"
                    "19930102" "USA" "ISR"          "070"  2  4 2    7 "199301" "1993"
                    end
                    
                    gen real_date = daily(date, "YMD")
                    format real_date %td
                    drop date
                    gen real_month = mofd(real_date)
                    format real_month %tm
                    drop YearMonth
                    
                    assert !missing(numevents)
                    
                    collapse (count) monthly_quadclass_count = numevents ///
                            (sum) monthly_quadclass_event_total = numevents, ///
                            by(real_month quadclass)
                    Note: The (count) operator in -collapse- returns the number of non-missing observations of the variable in the group. What we want for monthly_quadclass_count is a count of the number of observations in that group. So we just need to supply a variable that has no missing values--numevents fits the purpose here. (But that is why we first verify that it is never missing.) If your real data set has no variable that is never missing, you can just create one: -gen never_missing = 1-. Then use never_missing instead of numevents in the (count) part of the command.

                    Comment


                    • #11
                      Although "does not work" is the canonical hard-to-answer question -- do please read and act on FAQ Advice #12 --
                      the first problem is whether you typed in Stata what you typed here.

                      I note as a start that totaltotal() isn't a recognised egen function unless exceptionally you wrote it yourself. That's a typo here on Statalist or a typo there in your Stata, so let's fix it either way.

                      Under the same heading, you aren't making use of CODE delimiters to clarify where your command lines begin and end.

                      Code:
                      forvalues i = 1/4 {
                            egen Quadclass_`i' = total(quadclass == `i'), by(real_month)
                            egen Quadclass_`i'_events = total(cond(quadclass==`i', numevents)), by(real_month)
                      }
                      is now more readable but the cond() call is illegal. It's not clear that it is needed any way.

                      My last suggestion is that you really shouldn't need a different variable for each class, so perhaps the code should be more like.

                      Code:
                      bysort quadclass real_month : gen Quadclass = _N
                      by quadclass real_month:  egen Quadclass_events = total(numevents)



                      Comment


                      • #12
                        @ #10, okay, thank you Clyde. works for its intended purposes.

                        @#11 Thank you Nick for the reminder, yes the totaltotal my typo. Yes no excuse for not using Code delimiters, will do consistently. and thank you for the suggestion also.

                        Comment

                        Working...
                        X