
  • Calculating geographical sales dispersion by using entropy index of company sales per country

    Hello everyone,
    I am completely new to Stata and getting familiar with the programme.
    As I am currently writing my master's thesis in international business management, I was hoping for some support from Stata veterans like you.

    I want to calculate the geographic dispersion of companies' sales per year, which is done by using the entropy index of geographic dispersion.
    The formula is shown in the attached screenshot.

    Now I am wondering how best to operationalise this formula in Stata 15. Can anyone help me out?

    Thanks in advance!

    Jennifer
    [Attached image: Screenshot 2019-10-16 17.49.41.png]

  • #2
    There are many ways to calculate the entropy index in Stata. See -search entropy-. (The -search- command is always worth trying in Stata; it is a simple trick that many new users don't know.) Among the possibilities you can learn about there, I'd recommend the -entropyetc- command, but telling you how to apply it to your data would require knowing the structure of your data set, which is best communicated to us with an example. Please look at the Statalist FAQ (tab at the top of your screen) and search for "dataex". This Stata command prepares an example excerpt of your data, which is the best (I'd say *only*) way to post an example of your data on Statalist. In any event, you will first need to install the -entropyetc- command, which you can do with
    Code:
    ssc install entropyetc
    After doing this, you should look at -help entropyetc-.



    • #3
      Hello Mike,

      thank you very much for your fast feedback and the tips on how best to find help in Stata!
      I will try to quickly familiarize myself with the different possibilities.
      As suggested, here is a data example of my dataset.
      In general, my dataset consists of panel data from >6000 global companies, each company with a different geographic dispersion of its sales per region/country.
      The data were obtained from the Thomson Reuters Datastream database, which provides up to 10 segments of corporate geographic and product diversification.
      My first variable within the dataset is the company ID, followed by the observation year (time frame 2000-2018).
      The third variable contains the company's total sales, which I need in order to calculate the proportion of sales within each geographic region (sales per segment / total sales).
      Starting with variable 4 (geosalesseg1, ...), my dataset contains the sales per geographic segment per company per year.
      Geosalesseg1 always contains domestic sales. Therefore, if total sales and geosalesseg1 are the same, the company sells only in its domestic market and has not internationalized.
      The number of geographic segments may change from year to year, which is why some years include more geographic regions than others (depending on the company's path of internationalization).

      I have started to calculate P(i,j) as well as log(P(i,j)) in Stata, but I struggled to proceed from there. I also suspect that there must be other ways to calculate an entropy index in Stata than simply following the mathematical formula step by step.

      I will look into the -entropyetc- command; I had already come across the post about it during my initial search on Statalist.
      However, I would highly appreciate some support, as I still feel a bit overwhelmed with the more complex commands in Stata.

      Kind regards,

      Jennifer

      Code:
       
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str6 CompanyID int year float id double(totalsales geosalesseg1 geosalesseg2 geosalesseg3 geosalesseg4 geosalesseg5 geosalesseg6 geosalesseg7 geosalesseg8 geosalesseg9 geosalesseg10) byte X
      "130042" 2000   1    90176    90176      .      .    . . . . . . . .
      "130042" 2001   2   114723   114723      .      .    . . . . . . . .
      "130042" 2002   3   133096   133096      .      .    . . . . . . . .
      "130042" 2003   4   146569   146569      .      .    . . . . . . . .
      "130042" 2004   5   168341   168341      .      .    . . . . . . . .
      "130042" 2005   6   207170   207170      .      .    . . . . . . . .
      "130042" 2006   7   273363   257485  15878      .    . . . . . . . .
      "130042" 2007   8   318302   290161  28141      .    . . . . . . . .
      "130042" 2008   9   415404   356303  53126   5975    0 . . . . . . .
      "130042" 2009  10   374461   316177  55024   3260    . . . . . . . .
      "130042" 2010  11   330757   273900  51511   3168 2178 0 . . . . . .
      "130042" 2011  12   364428        0 302724  61704    0 0 0 . . . . .
      "130042" 2012  13   381259        0 314076  67183    0 . . . . . . .
      "130042" 2013  14   406486        . 331252  75234    . . . . . . . .
      "130042" 2014  15   445474   352344  86698   6432    . . . . . . . .
      "130042" 2015  16   530777   335394  88121 107262    . . . . . . . .
      "130042" 2016  17   508622        .      .      .    . . . . . . . .
      "130042" 2017  18   533549   533549      .      .    . . . . . . . .
      "130042" 2018  19   593229   593229      .      .    . . . . . . . .
      "130062" 2000  20  1333000  1333000      .      .    . . . . . . . .
      "130062" 2001  21  1505691  1505691      .      .    . . . . . . . .
      "130062" 2002  22  1497101  1497101      .      .    . . . . . . . .
      "130062" 2003  23  1711453  1711453      .      .    . . . . . . . .
      "130062" 2004  24  1759613  1759613      .      .    . . . . . . . .
      "130062" 2005  25  2004243  2004243      .      .    . . . . . . . .
      "130062" 2006  26  2206401  2206401      .      .    . . . . . . . .
      "130062" 2007  27  2207141  2207141      .      .    . . . . . . . .
      "130062" 2008  28  2120081  2120081      .      .    . . . . . . . .
      "130062" 2009  29  1702603  1702603      .      .    . . . . . . . .
      "130062" 2010  30  1782857  1782857      .      .    . . . . . . . .
      "130062" 2011  31  1713823  1672077  41746      .    . . . . . . . .
      "130062" 2012  32  2037667  1996142  41525      .    . . . . . . . .
      "130062" 2013  33  2155551  2113068  42483      .    . . . . . . . .
      "130062" 2014  34  2957951  2912115  45836      .    . . . . . . . .
      "130062" 2015  35  3539570  3493462  46108      .    . . . . . . . .
      "130062" 2016  36  3818749  3761651  57098      .    . . . . . . . .
      "130062" 2017  37  3965594  3901323  64271      .    . . . . . . . .
      "130062" 2018  38  4244265  4166339  77926      .    . . . . . . . .
      "130086" 2000  39  1226878  1063000 115000  49000    . . . . . . . .
      "130086" 2001  40  1164913  1012000 113000  40000    . . . . . . . .
      "130086" 2002  41  1117431   989000  87000  41000    . . . . . . . .
      "130086" 2003  42  1100852  1002852  86000  12000    . . . . . . . .
      "130086" 2004  43  1206996  1087000  94000  26000    . . . . . . . .
      "130086" 2005  44  1180700  1122000  49000  10000    . . . . . . . .
      "130086" 2006  45  1229807  1182000  32000  16000    . . . . . . . .
      "130086" 2007  46  1224654  1172000  43000  10000    . . . . . . . .
      "130086" 2008  47  1232100  1220000  12000      0    . . . . . . . .
      "130086" 2009  48  1168567  1136000  33000      .    . . . . . . . .
      "130086" 2010  49  1315233  1228000  87000      .    . . . . . . . .
      "130086" 2011  50  1488642  1378000 111000      0    . . . . . . . .
      "130086" 2012  51  1571000  1467000 104000      0    . . . . . . . .
      "130086" 2013  52  1707822  1428000 280000      .    . . . . . . . .
      "130086" 2014  53   603521   318000 286000      .    . . . . . . . .
      "130086" 2015  54   544874   302074 242800      .    . . . . . . . .
      "130086" 2016  55   788278   507391 280887      .    . . . . . . . .
      "130086" 2017  56   819596   419403 400193      .    . . . . . . . .
      "130086" 2018  57   816138   390396 425742      .    . . . . . . . .
      "130088" 2000  58   759037   759037      .      .    . . . . . . . .
      "130088" 2001  59   849799   849799      .      .    . . . . . . . .
      "130088" 2002  60  1209990  1209990      .      .    . . . . . . . .
      "130088" 2003  61  1472885  1472885      .      .    . . . . . . . .
      "130088" 2004  62  1738843  1738843      .      .    . . . . . . . .
      "130088" 2005  63  2067979  2067979      .      .    . . . . . . . .
      "130088" 2006  64  2369612  2369612      .      .    . . . . . . . .
      "130088" 2007  65  2703212  2703212      .      .    . . . . . . . .
      "130088" 2008  66  3007949  3007949      .      .    . . . . . . . .
      "130088" 2009  67  3206937  3206937      .      .    . . . . . . . .
      "130088" 2010  68  3638336  3638336      .      .    . . . . . . . .
      "130088" 2011  69  4232743  4232743      .      .    . . . . . . . .
      "130088" 2012  70  4664120  4664120      .      .    . . . . . . . .
      "130088" 2013  71  5164784  5164784      .      .    . . . . . . . .
      "130088" 2014  72  5711715  5711715      .      .    . . . . . . . .
      "130088" 2015  73  6226507  6226507      .      .    . . . . . . . .
      "130088" 2016  74  6779579  6779579      .      .    . . . . . . . .
      "130088" 2017  75  7256382  7256382      .      .    . . . . . . . .
      "130088" 2018  76  7911046  7911046      .      .    . . . . . . . .
      "130104" 2000  77        .        .      .      .    . . . . . . . .
      "130104" 2001  78        .        .      .      .    . . . . . . . .
      "130104" 2002  79        .        .      .      .    . . . . . . . .
      "130104" 2003  80    16838        .      .      .    . . . . . . . .
      "130104" 2004  81    20438        .      .      .    . . . . . . . .
      "130104" 2005  82    48918        .      .      .    . . . . . . . .
      "130104" 2006  83    71464        .      .      .    . . . . . . . .
      "130104" 2007  84   109385        .      .      .    . . . . . . . .
      "130104" 2008  85    91059        .      .      .    . . . . . . . .
      "130104" 2009  86    80459        .      .      .    . . . . . . . .
      "130104" 2010  87   113357        .      .      .    . . . . . . . .
      "130104" 2011  88   141577        .      .      .    . . . . . . . .
      "130104" 2012  89   187369        .      .      .    . . . . . . . .
      "130104" 2013  90   263937   200407  33333  30197    . . . . . . . .
      "130104" 2014  91   271183   192613  40956  37614    . . . . . . . .
      "130104" 2015  92   321525   211934  63463  46128    . . . . . . . .
      "130104" 2016  93   383736   250822  92574  40340    . . . . . . . .
      "130104" 2017  94   448845   358655  71978  18212    . . . . . . . .
      "130104" 2018  95   482166   389416  75840  16911    . . . . . . . .
      "130169" 2000  96 12222302 12222302      .      .    . . . . . . . .
      "130169" 2001  97 17779180 17779180      .      .    . . . . . . . .
      "130169" 2002  98 19338160 19338160      .      .    . . . . . . . .
      "130169" 2003  99 19373409 19373409      .      .    . . . . . . . .
      "130169" 2004 100 19912647 19912647      .      .    . . . . . . . .
      end



      • #4
        Thanks for your data example. Mike Lacy is right on all scores. There are community-contributed commands in this territory, but all advice is speculation unless and until we can see your data structure.

        The structure is perfectly reasonable but it's nothing like what is assumed by entropyetc (SSC), which happens to be the command I know best here. I can't rule out there being some other command that will serve directly.

        That's not a great problem, as you just need your own loop. What requires care here is that

        1. Your missings are really zeros. We don't have to fix that but we need to watch out.

        2. Left to its own devices, Stata will tell you that if the proportion is 0 the contribution to entropy is indeterminate.

        Code:
        . display 0 * ln(1/0)
        .


        but that is because Stata did not do a calculus course to see the expression as a whole and know that 0 ln(1/0) is to be returned as 0, as a variety of mathematical arguments will show.
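
        For reference, a sketch of the quantity the loop below computes and of one such argument. The notation is an assumption made here for illustration (p_j for the share of segment j, H for the entropy); the formula in the original screenshot is not reproduced in this thread.

        ```latex
        % Entropy for one company-year, with p_j = geosalesseg_j / totalsales
        H = \sum_{j=1}^{10} p_j \ln\!\left(\frac{1}{p_j}\right)

        % A zero share contributes nothing: substitute t = 1/p
        \lim_{p \to 0^+} p \ln\!\left(\frac{1}{p}\right)
          = \lim_{t \to \infty} \frac{\ln t}{t} = 0

        % Worked example: company 130042 in 2006
        % p_1 = 257485/273363 \approx 0.9419, \quad p_2 = 15878/273363 \approx 0.0581
        H \approx 0.9419 \ln\!\left(\frac{1}{0.9419}\right)
            + 0.0581 \ln\!\left(\frac{1}{0.0581}\right)
          \approx 0.0564 + 0.1653 \approx 0.2217
        ```

        The worked value agrees with the 2006 entry for 130042 in the table further down (.22166254).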

        With that handled carefully, it's a loop:

        Code:
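        * running total, starting at 0, so missing segments add nothing
        gen double entropy = 0
        gen double term = .
        
        quietly forval j = 1/10 {
            * share of total sales in segment j (a proportion, not a percentage)
            replace term = geosalesseg`j'/totalsales
            * skip zero and missing shares, treating 0 * ln(1/0) as 0
            replace entropy = entropy + term * ln(1/term) if !inlist(term, 0, .)
        }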
        
        tabdisp year CompanyID, c(entropy)
        
        ----------------------------------------------------------------------------
                  |                            CompanyID                            
             year |    130042     130062     130086     130088     130104     130169
        ----------+-----------------------------------------------------------------
             2000 |         0          0  .47474111          0          0          0
             2001 |         0          0  .46432472          0          0          0
             2002 |         0          0  .42809316          0          0          0
             2003 |         0          0  .33336537          0          0          0
             2004 |         0          0   .3757673          0          0          0
             2005 |         0          0  .22092732          0          0          
             2006 | .22166254          0  .18954351          0          0          
             2007 | .29884304          0  .19891329          0          0          
             2008 |  .4556623          0  .05488127          0          0          
             2009 | .46594369          0  .12820897          0          0          
             2010 | .52340037          0  .24372402          0          0          
             2011 |  .4548022  .11454782  .26506736          0          0          
             2012 | .46560307   .0995093  .24369638          0          0          
             2013 | .47901547  .09690306  .44607911          0  .71843516          
             2014 | .56522434  .07994926  .69149811          0  .80247398          
             2015 | .91132418  .06948606  .68721839          0  .87357147          
             2016 |         0  .07768164  .65127726          0  .85777191          
             2017 |         0   .0828859  .69287248          0   .6027849          
             2018 |         0  .09158721  .69220906          0   .5809839          
        ----------------------------------------------------------------------------
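
        One caution on reading this table: an entropy of 0 also appears where no segment is reported at all (for example 130042 in 2016, or 130104 before 2013), and the loop cannot tell that apart from a purely domestic company. A minimal sketch of a check follows, assuming the variable names in the data example; segtotal and nseg are names invented here, and the 1% tolerance is arbitrary.

        ```stata
        * Sketch only: flag company-years where the segment data cannot support an entropy value.
        * segtotal and nseg are new variable names introduced for illustration.
        egen double segtotal = rowtotal(geosalesseg*)
        egen nseg = rownonmiss(geosalesseg*)

        * total sales reported, but no segment reported at all
        list CompanyID year totalsales if nseg == 0 & !missing(totalsales)

        * segments reported, but they do not add up to total sales (1% tolerance, arbitrary)
        list CompanyID year totalsales segtotal ///
            if nseg > 0 & !missing(totalsales) & abs(segtotal - totalsales) > 0.01 * totalsales
        ```

        Whether such company-years should be set to missing rather than left at 0 is a substantive decision for the analysis, not something the code can settle.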



        See https://www.stata-journal.com/articl...article=pr0046 for strategic advice for row-wise calculations.

        Make sure in your own writing that you write ln() not Ln(), unlike your own unnamed author(s). There is no reason to do that.

        Even worse, p here is not a percentage. It's a proportion. As a Public Service Announcement you might well give the full reference to expose incompetent explanation.

        Note. See http://www.pamitc.org/documents/mermin.pdf on avoiding awkward wording such as "the following formula".
        Last edited by Nick Cox; 17 Oct 2019, 00:21.



        • #5
          Dear Nick,
          thank you for your fast response, your support and, most of all, your constructive feedback.
          I will make sure to watch out for correct explanation and wording, and I highly appreciate your pointing out the weak spots to help me improve the quality of my work.

          Kind regards!
