Summary statistics by country

Marco Errico

Join Date: Apr 2020
Posts: 187

Summary statistics by country

28 Sep 2021, 04:48

Dear all,

I have the following dataset

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 loc str12 isin str6 datacqtr
"DEU" ""             "2005Q1"
"ITA" ""             "2005Q2"
"DNK" ""             "2005Q3"
"ITA" ""             "2005Q4"
"GBR" ""             "2006Q1"
"SWE" ""             "2006Q2"
"GBR" ""             "2006Q3"
"DEU" ""             "2006Q4"
"NLD" ""             "2007Q1"
"ITA" ""             "2007Q2"
"DEU" ""             "2007Q3"
"GRC" ""             "2007Q4"
"SWE" ""             "2008Q1"
"SWE" ""             "2008Q2"
"DEU" ""             "2008Q3"
"ITA" ""             "2008Q4"
"SWE" ""             "2009Q1"
"GBR" ""             "2009Q2"
"DEU" ""             "2009Q3"
"POL" ""             "2009Q4"
"SWE" ""             "2010Q1"
"ITA" ""             "2010Q2"
"NLD" ""             "2010Q3"
"DEU" ""             "2010Q4"
"DEU" ""             "2011Q1"
"GBR" ""             "2011Q2"
"DEU" ""             "2011Q3"
"GRC" ""             "2011Q4"
"POL" ""             "2012Q1"
"NLD" ""             "2012Q2"
"DEU" ""             "2012Q3"
"SWE" ""             "2012Q4"
"DEU" ""             "2013Q1"
"NLD" ""             "2013Q2"
"DNK" ""             "2013Q3"
"SWE" ""             "2013Q4"
"SWE" ""             "2014Q1"
"NLD" ""             "2014Q2"
"NLD" ""             "2014Q3"
"NLD" ""             "2014Q4"
"SWE" ""             "2015Q1"
"GRC" ""             "2015Q2"
"GBR" ""             "2015Q3"
"DNK" ""             "2015Q4"
"NLD" "ANN4327C1220" "2005Q1"
"NLD" "ANN4327C1220" "2005Q2"
"NLD" "ANN4327C1220" "2005Q3"
"NLD" "ANN4327C1220" "2005Q4"
"NLD" "ANN4327C1220" "2006Q1"
"NLD" "ANN4327C1220" "2006Q2"
"NLD" "ANN4327C1220" "2006Q3"
"NLD" "ANN4327C1220" "2006Q4"
"NLD" "ANN4327C1220" "2007Q1"
"NLD" "ANN4327C1220" "2007Q2"
"NLD" "ANN4327C1220" "2007Q3"
"NLD" "ANN4327C1220" "2007Q4"
"NLD" "ANN4327C1220" "2008Q1"
"NLD" "ANN4327C1220" "2008Q2"
"NLD" "ANN4327C1220" "2008Q3"
"NLD" "ANN4327C1220" "2008Q4"
"NLD" "ANN4327C1220" "2009Q1"
"NLD" "ANN4327C1220" "2009Q2"
"NLD" "ANN4327C1220" "2009Q3"
"NLD" "ANN4327C1220" "2009Q4"
"NLD" "ANN4327C1220" "2010Q1"
"NLD" "ANN4327C1220" "2010Q2"
"NLD" "ANN4327C1220" "2010Q3"
"NLD" "ANN4327C1220" "2010Q4"
"NLD" "ANN4327C1220" "2011Q1"
"NLD" "ANN4327C1220" "2011Q2"
"NLD" "ANN4327C1220" "2011Q3"
"NLD" "ANN4327C1220" "2011Q4"
"NLD" "ANN4327C1220" "2012Q1"
"NLD" "ANN4327C1220" "2012Q2"
"NLD" "ANN4327C1220" "2012Q3"
"NLD" "ANN4327C1220" "2012Q4"
"NLD" "ANN4327C1220" "2013Q1"
"NLD" "ANN4327C1220" "2013Q2"
"NLD" "ANN4327C1220" "2013Q3"
"NLD" "ANN4327C1220" "2013Q4"
"NLD" "ANN4327C1220" "2014Q1"
"NLD" "ANN4327C1220" "2014Q2"
"NLD" "ANN4327C1220" "2014Q3"
"NLD" "ANN4327C1220" "2014Q4"
"NLD" "ANN4327C1220" "2015Q1"
"NLD" "ANN4327C1220" "2015Q2"
"NLD" "ANN4327C1220" "2015Q3"
"NLD" "ANN4327C1220" "2015Q4"
"AUT" "AT000000STR1" "2005Q4"
"AUT" "AT000000STR1" "2006Q1"
"AUT" "AT000000STR1" "2006Q2"
"AUT" "AT000000STR1" "2006Q3"
"AUT" "AT000000STR1" "2006Q4"
"AUT" "AT000000STR1" "2007Q1"
"AUT" "AT000000STR1" "2007Q2"
"AUT" "AT000000STR1" "2007Q3"
"AUT" "AT000000STR1" "2007Q4"
"AUT" "AT000000STR1" "2008Q1"
"AUT" "AT000000STR1" "2008Q2"
"AUT" "AT000000STR1" "2008Q3"
end

and I'm trying to explore the data, with the objective to have a table like the following:

Click image for larger version

Name: example.png
Views: 1
Size: 27.9 KB
ID: 1629361

Where country is the variable "loc", firms are identifies by "ISIN" over quarters ("datacqtr")

I tried with collapse function

Code:

collapse (unique) firms= isin (count) observations = isin , by(loc)

Nevertheless I realize tha collapse doesn't have stats related to unique/distinct id identifiers.

More than happy to receive your help!

Tags: None

William Lisowski

Join Date: Dec 2014
Posts: 10150

28 Sep 2021, 07:35

I think this example will start you in a useful direction.

Code:

generate one = 1
collapse (count) observations=one, by(loc isin)
collapse (count) firms=observations (sum) observations, by(loc)
list, clean noobs abbreviate(16) sum(observations)

Code:

. list, clean noobs abbreviate(16) sum(observations)

        loc   firms   observations  
        AUT       1             12  
        DEU       1             10  
        DNK       1              3  
        GBR       1              5  
        GRC       1              3  
        ITA       1              5  
        NLD       2             51  
        POL       1              2  
        SWE       1              9  
  Sum                          100

Comment

Marco Errico

Join Date: Apr 2020

Posts: 187
#3

28 Sep 2021, 08:59

Thanks William Lisowski
Your codes have been really helpfuls

I really appreciate
Comment

Joro Kolev

Join Date: Aug 2018
Posts: 3050

29 Sep 2021, 04:15

Another way how to achieve the same with something that I call List Table is as follows:

Code:

. egen countrytag = tag( loc)

. egen countryfirmtag = tag( loc isin)

. egen Observations = count(isin), by(loc)

. egen UniqueFirms = total(countryfirmtag), by(loc)

. list loc UniqueFirms Observations if countrytag, sep(0) noobs abb(15)

  +----------------------------------+
  | loc   UniqueFirms   Observations |
  |----------------------------------|
  | AUT             1             12 |
  | DEU             0              0 |
  | DNK             0              0 |
  | GBR             0              0 |
  | GRC             0              0 |
  | ITA             0              0 |
  | NLD             1             44 |
  | POL             0              0 |
  | SWE             0              0 |
  +----------------------------------+

Comment

Joro Kolev

Join Date: Aug 2018

Posts: 3050
#5

29 Sep 2021, 04:26

And the results are different because I do not think William's code is quite right... It gives 1 distinct firm and counts observations in countries where there is no firm (the observations on firm are missing).
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#6

29 Sep 2021, 06:02

My code was presented to demonstrate how to use collapse to overcome the problem of counting unique firms, to "start in a useful direction" as I wrote. Handling empty firm codes was left as an exercise for the poster. Perhaps they are in fact meaningful to the poster; perhaps those observations contain other variables with non~missing values.
2 likes
Comment
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#7

29 Sep 2021, 09:21

You are right, William.

Additionally the degenerate situation where we have a whole country without any singe firm would never happen in practice, in the real data. This just happens in the few observations example sample.

What I wrote was just referring to why the output of the two approaches which should be in theory equivalent is different.

Originally posted by William Lisowski View Post

My code was presented to demonstrate how to use collapse to overcome the problem of counting unique firms, to "start in a useful direction" as I wrote. Handling empty firm codes was left as an exercise for the poster. Perhaps they are in fact meaningful to the poster; perhaps those observations contain other variables with non~missing values.
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#8

29 Sep 2021, 09:33

And I will add that I started with a solution similar to yours, but when I realized it could be taken care of with a second collapse, I opted for the solution that extended the code in post #1. Both approaches are solutions of the problem presented, which left the desired result from the sample data unstated.
Comment
Marco Errico

Join Date: Apr 2020

Posts: 187
#9

04 Oct 2021, 02:13

Dear Joro Kolev and William Lisowski

Sorry for the late reply. I looked to both solution and thanks for clarifying the differences between the two approaches.

Really many thanks for your time, I truly appreciate.
Comment

Announcement

Summary statistics by country

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment