Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Summary statistics by country

    Dear all,

    I have the following dataset

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str4 loc str12 isin str6 datacqtr
    "DEU" ""             "2005Q1"
    "ITA" ""             "2005Q2"
    "DNK" ""             "2005Q3"
    "ITA" ""             "2005Q4"
    "GBR" ""             "2006Q1"
    "SWE" ""             "2006Q2"
    "GBR" ""             "2006Q3"
    "DEU" ""             "2006Q4"
    "NLD" ""             "2007Q1"
    "ITA" ""             "2007Q2"
    "DEU" ""             "2007Q3"
    "GRC" ""             "2007Q4"
    "SWE" ""             "2008Q1"
    "SWE" ""             "2008Q2"
    "DEU" ""             "2008Q3"
    "ITA" ""             "2008Q4"
    "SWE" ""             "2009Q1"
    "GBR" ""             "2009Q2"
    "DEU" ""             "2009Q3"
    "POL" ""             "2009Q4"
    "SWE" ""             "2010Q1"
    "ITA" ""             "2010Q2"
    "NLD" ""             "2010Q3"
    "DEU" ""             "2010Q4"
    "DEU" ""             "2011Q1"
    "GBR" ""             "2011Q2"
    "DEU" ""             "2011Q3"
    "GRC" ""             "2011Q4"
    "POL" ""             "2012Q1"
    "NLD" ""             "2012Q2"
    "DEU" ""             "2012Q3"
    "SWE" ""             "2012Q4"
    "DEU" ""             "2013Q1"
    "NLD" ""             "2013Q2"
    "DNK" ""             "2013Q3"
    "SWE" ""             "2013Q4"
    "SWE" ""             "2014Q1"
    "NLD" ""             "2014Q2"
    "NLD" ""             "2014Q3"
    "NLD" ""             "2014Q4"
    "SWE" ""             "2015Q1"
    "GRC" ""             "2015Q2"
    "GBR" ""             "2015Q3"
    "DNK" ""             "2015Q4"
    "NLD" "ANN4327C1220" "2005Q1"
    "NLD" "ANN4327C1220" "2005Q2"
    "NLD" "ANN4327C1220" "2005Q3"
    "NLD" "ANN4327C1220" "2005Q4"
    "NLD" "ANN4327C1220" "2006Q1"
    "NLD" "ANN4327C1220" "2006Q2"
    "NLD" "ANN4327C1220" "2006Q3"
    "NLD" "ANN4327C1220" "2006Q4"
    "NLD" "ANN4327C1220" "2007Q1"
    "NLD" "ANN4327C1220" "2007Q2"
    "NLD" "ANN4327C1220" "2007Q3"
    "NLD" "ANN4327C1220" "2007Q4"
    "NLD" "ANN4327C1220" "2008Q1"
    "NLD" "ANN4327C1220" "2008Q2"
    "NLD" "ANN4327C1220" "2008Q3"
    "NLD" "ANN4327C1220" "2008Q4"
    "NLD" "ANN4327C1220" "2009Q1"
    "NLD" "ANN4327C1220" "2009Q2"
    "NLD" "ANN4327C1220" "2009Q3"
    "NLD" "ANN4327C1220" "2009Q4"
    "NLD" "ANN4327C1220" "2010Q1"
    "NLD" "ANN4327C1220" "2010Q2"
    "NLD" "ANN4327C1220" "2010Q3"
    "NLD" "ANN4327C1220" "2010Q4"
    "NLD" "ANN4327C1220" "2011Q1"
    "NLD" "ANN4327C1220" "2011Q2"
    "NLD" "ANN4327C1220" "2011Q3"
    "NLD" "ANN4327C1220" "2011Q4"
    "NLD" "ANN4327C1220" "2012Q1"
    "NLD" "ANN4327C1220" "2012Q2"
    "NLD" "ANN4327C1220" "2012Q3"
    "NLD" "ANN4327C1220" "2012Q4"
    "NLD" "ANN4327C1220" "2013Q1"
    "NLD" "ANN4327C1220" "2013Q2"
    "NLD" "ANN4327C1220" "2013Q3"
    "NLD" "ANN4327C1220" "2013Q4"
    "NLD" "ANN4327C1220" "2014Q1"
    "NLD" "ANN4327C1220" "2014Q2"
    "NLD" "ANN4327C1220" "2014Q3"
    "NLD" "ANN4327C1220" "2014Q4"
    "NLD" "ANN4327C1220" "2015Q1"
    "NLD" "ANN4327C1220" "2015Q2"
    "NLD" "ANN4327C1220" "2015Q3"
    "NLD" "ANN4327C1220" "2015Q4"
    "AUT" "AT000000STR1" "2005Q4"
    "AUT" "AT000000STR1" "2006Q1"
    "AUT" "AT000000STR1" "2006Q2"
    "AUT" "AT000000STR1" "2006Q3"
    "AUT" "AT000000STR1" "2006Q4"
    "AUT" "AT000000STR1" "2007Q1"
    "AUT" "AT000000STR1" "2007Q2"
    "AUT" "AT000000STR1" "2007Q3"
    "AUT" "AT000000STR1" "2007Q4"
    "AUT" "AT000000STR1" "2008Q1"
    "AUT" "AT000000STR1" "2008Q2"
    "AUT" "AT000000STR1" "2008Q3"
    end
    and I'm trying to explore the data, with the objective to have a table like the following:

    Click image for larger version

Name:	example.png
Views:	1
Size:	27.9 KB
ID:	1629361


    Where country is the variable "loc", firms are identifies by "ISIN" over quarters ("datacqtr")


    I tried with collapse function

    Code:
    collapse (unique) firms= isin (count) observations = isin , by(loc)
    Nevertheless I realize tha collapse doesn't have stats related to unique/distinct id identifiers.

    More than happy to receive your help!

  • #2
    I think this example will start you in a useful direction.
    Code:
    generate one = 1
    collapse (count) observations=one, by(loc isin)
    collapse (count) firms=observations (sum) observations, by(loc)
    list, clean noobs abbreviate(16) sum(observations)
    Code:
    . list, clean noobs abbreviate(16) sum(observations)
    
            loc   firms   observations  
            AUT       1             12  
            DEU       1             10  
            DNK       1              3  
            GBR       1              5  
            GRC       1              3  
            ITA       1              5  
            NLD       2             51  
            POL       1              2  
            SWE       1              9  
      Sum                          100

    Comment


    • #3
      Thanks William Lisowski
      Your codes have been really helpfuls

      I really appreciate

      Comment


      • #4
        Another way how to achieve the same with something that I call List Table is as follows:

        Code:
        . egen countrytag = tag( loc)
        
        . egen countryfirmtag = tag( loc isin)
        
        . egen Observations = count(isin), by(loc)
        
        . egen UniqueFirms = total(countryfirmtag), by(loc)
        
        . list loc UniqueFirms Observations if countrytag, sep(0) noobs abb(15)
        
          +----------------------------------+
          | loc   UniqueFirms   Observations |
          |----------------------------------|
          | AUT             1             12 |
          | DEU             0              0 |
          | DNK             0              0 |
          | GBR             0              0 |
          | GRC             0              0 |
          | ITA             0              0 |
          | NLD             1             44 |
          | POL             0              0 |
          | SWE             0              0 |
          +----------------------------------+

        Comment


        • #5
          And the results are different because I do not think William's code is quite right... It gives 1 distinct firm and counts observations in countries where there is no firm (the observations on firm are missing).

          Comment


          • #6
            My code was presented to demonstrate how to use collapse to overcome the problem of counting unique firms, to "start in a useful direction" as I wrote. Handling empty firm codes was left as an exercise for the poster. Perhaps they are in fact meaningful to the poster; perhaps those observations contain other variables with non~missing values.

            Comment


            • #7
              You are right, William.

              Additionally the degenerate situation where we have a whole country without any singe firm would never happen in practice, in the real data. This just happens in the few observations example sample.

              What I wrote was just referring to why the output of the two approaches which should be in theory equivalent is different.


              Originally posted by William Lisowski View Post
              My code was presented to demonstrate how to use collapse to overcome the problem of counting unique firms, to "start in a useful direction" as I wrote. Handling empty firm codes was left as an exercise for the poster. Perhaps they are in fact meaningful to the poster; perhaps those observations contain other variables with non~missing values.

              Comment


              • #8
                And I will add that I started with a solution similar to yours, but when I realized it could be taken care of with a second collapse, I opted for the solution that extended the code in post #1. Both approaches are solutions of the problem presented, which left the desired result from the sample data unstated.

                Comment


                • #9
                  Dear Joro Kolev and William Lisowski

                  Sorry for the late reply. I looked to both solution and thanks for clarifying the differences between the two approaches.

                  Really many thanks for your time, I truly appreciate.

                  Comment

                  Working...
                  X