  • Separating different industries for similar regions

    Dear all,

    Since I have to write my thesis, I am using Stata intensively. For the most part I am able to find answers to the questions I have. On this issue, however, I have been unable to find what I am looking for. Perhaps I don't know the right search terms, so I'd like to ask for your help with the following.

    My dataset is in the form of panel data. There are 40 regions, each observed over the period 1970 - 1993. Usually my data are in (and I want to keep them in) a workable format with one row per region and year. Now, however, I have data for different industries within each region, so two variables determine the category of an observation: region and industry (type). As with the rest of my data, I want everything organised by region. In case this is unclear, the tables below may clarify it. Currently the data are in this format:
    region     industry    year  employment  production
    Country A  industry 1  1970  xxx         xxx
    Country A  industry 2  1970  xxx         xxx
    Country B  industry 1  1970  xxx         xxx
    Country B  industry 2  1970  xxx         xxx
    Since I need to calculate Shannon's Entropy (if this doesn't ring a bell, then one may as well ignore it), I'd rather have my data in this format:
    region     year  employment_1  production_1  employment_2  production_2  employment_3  production_3
    Country A  1970  xxx           xxx           xxx           xxx           xxx           xxx
    Country B  1970  xxx           xxx           xxx           xxx           xxx           xxx
    Now employment_1 represents the employment for industry 1, employment_2 the employment for industry 2 and so on. I hope you get the idea and perhaps will be able to help me.
    Thank you in advance.

    Kind regards,

    Dennis

  • #2
    This can be accomplished using the reshape command (help reshape). You want to reshape from long to wide (although, instead of year used in many examples, you want to reshape on industry). Note that in most cases, it is much easier to analyze data in long form than wide form.
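
    As a minimal sketch of that reshape, using the variable names from the table in the question: the numbers below are made up, and treating industry as a numeric code is an assumption for illustration only.

    ```stata
    * Toy long-format data (made-up values; industry assumed to be numeric)
    clear
    input str9 region byte industry int year float(employment production)
    "Country A" 1 1970 100 10
    "Country A" 2 1970 200 20
    "Country B" 1 1970 300 30
    "Country B" 2 1970 400 40
    end

    * Trailing underscore so the wide names come out as employment_1, production_1, ...
    rename (employment production) (employment_ production_)

    * One row per region-year; the industry code becomes the variable suffix
    reshape wide employment_ production_, i(region year) j(industry)
    list
    ```

    If industry were stored as a string variable instead, reshape would need the string option alongside j().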
    Stata/MP 14.1 (64-bit x86-64)
    Revision 19 May 2016
    Win 8.1

    • #3
      Welcome to Statalist!

      Allow me to add another voice to Carole's suggesting that you can almost certainly accomplish the calculation of whatever you need with your data in long format rather than wide. The experienced users here generally agree that, with few exceptions, Stata makes it much more straightforward to accomplish complex analyses using a long layout of your data rather than a wide layout of the same data. So my initial reaction is to not reshape this dataset wide, but instead work out how to calculate Shannon's Entropy on your data in its current format.

      I would suggest some code to show the way, but I don't know how you plan to calculate Shannon's Entropy in your current analysis.

      Perhaps if you post a small hand-made example, with just a few variables and observations, and an explanation of how you would calculate Shannon's Entropy by hand on that data, someone will be able to advise you further. You should first review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how to best pose your question and present sample data using the dataex command.

      • #4
        First of all, thank you for the warm welcome and the incredibly quick responses! I have now read (and hopefully understood) the Statalist FAQ. So let me start by posting my dataex output:

        Code:
        clear
        input str35 Regios float ind int yr float emp long Y int(use dY wages soc indtax resinc) float se
        "Oost-Groningen"                       0 1970  39000   979   543   481   211   49   38   181 0
        "Oost-Groningen"                       0 1973  36300  1288   705   638   289   80   36   232 0
        "Oost-Groningen"                       0 1974  37400  1453   787   725   341  102   40   240 0
        "Oost-Groningen"                       0 1975  34600  1533   814   786   364  107   57   256 0
        "Oost-Groningen"                       0 1976  35700  1658   849   888   409  123   77   278 0
        "Oost-Groningen"                       0 1977  34200  1756   929   918   434  129   78   275 0
        "Oost-Groningen"                       0 1978  32900  1792   981   911   453  135   78   243 0
        "Oost-Groningen"                       0 1979  32900  1872  1008   967   475  143   68   279 0
        "Oost-Groningen"                       0 1980  32500  1944  1044  1007   500  155   78   271 0
        "Oost-Groningen"                       0 1981  29600  1957  1060  1005   471  145   92   296 0
        "Oost-Groningen"                       0 1982  29400  2009  1084  1021   497  152   59   311 0
        "Oost-Groningen"                       0 1983  27000  1969  1028  1040   476  155   74   333 0
        "Oost-Groningen"                       0 1984  27800  2144  1142  1089   488  157   71   371 0
        "Oost-Groningen"                       0 1985  27900  2314  1264  1145   515  160   66   403 0
        "Oost-Groningen"                       0 1986  29400  2414  1282  1230   554  164   74   437 0
        "Oost-Groningen"                       0 1987  28700  2313  1212  1206   549  167   41   448 0
        "Oost-Groningen"                       0 1988  28600  2413  1277  1243   554  164   48   476 0
        "Oost-Groningen"                       0 1989  28900  2606  1362  1354   580  157   82   534 0
        "Oost-Groningen"                       0 1990  29600  2724  1420  1420   667  114   73   564 0
        "Oost-Groningen"                       0 1991  29700  2796  1449  1466   687  115   65   597 0
        "Oost-Groningen"                       0 1992  30300  2938  1523  1540   732  118   88   600 0
        "Oost-Groningen"                       0 1993  30300  3047  1577  1592   751  121   63   656 0
        "Oost-Groningen"                       1 1970    700   122    68    54     3    0    0    50 0
        "Oost-Groningen"                       1 1973    600   156    86    69     5    1    0    62 0
        "Oost-Groningen"                       1 1974    900   178   106    72     9    2   -1    62 0
        "Oost-Groningen"                       1 1975    600   191   116    75     6    1    1    65 0
        "Oost-Groningen"                       1 1976    600   230   135    94     7    2    1    83 0
        "Oost-Groningen"                       1 1977    400   222   134    88     4    1    2    79 0
        "Oost-Groningen"                       1 1978    500   201   136    64     5    1    2    55 0
        "Oost-Groningen"                       1 1979    500   177   122    54     4    1    0    48 0
        "Oost-Groningen"                       1 1980    500   190   134    56     4    1    1    48 0
        "Oost-Groningen"                       1 1981    400   210   145    65     4    1    1    57 0
        "Oost-Groningen"                       1 1982    400   224   158    65     4    1    3    55 0
        "Oost-Groningen"                       1 1983    400   225   161    64     4    1    0    57 0
        "Oost-Groningen"                       1 1984    400   229   162    66     4    1    0    58 0
        "Oost-Groningen"                       1 1985    400   239   169    69     5    1    0    61 0
        "Oost-Groningen"                       1 1986    400   249   158    90     5    1    1    81 0
        "Oost-Groningen"                       1 1987    400   217   142    74     5    1    4    63 0
        "Oost-Groningen"                       1 1988    400   217   139    78     5    1    3    67 0
        "Oost-Groningen"                       1 1989    400   231   139    91     5    1    4    80 0
        "Oost-Groningen"                       1 1990    400   194   111    83     6    0    0    75 0
        "Oost-Groningen"                       1 1991    400   202   124    78     6    0    0    69 0
        "Oost-Groningen"                       1 1992    400   226   150    75     6    1    0    67 0
        "Oost-Groningen"                       1 1993    400   228   148    79     6    1    0    70 0
        "Oost-Groningen"                       2 1970      0     0     0     0     0    0    0     0 0
        "Oost-Groningen"                       2 1973      0     0     0     0     0    0    0     0 0
        "Oost-Groningen"                       2 1974      0     0     0     0     0    0    0     0 0
        "Oost-Groningen"                       2 1975      0     0     0     0     0    0    0     0 0
        "Oost-Groningen"                       2 1976      0     0     0     0     0    0    0     0 0
        "Oost-Groningen"                       2 1977      0     0     0     0     0    0    0     0 0
        "Oost-Groningen"                       2 1978      0     0     0     0     0    0    0     0 0
        "Oost-Groningen"                       2 1979      0     0     0     0     0    0    0     0 0
        "Oost-Groningen"                       2 1980      0     0     0     0     0    0    0     0 0
        "Oost-Groningen"                       2 1981      0     0     0     0     0    0    0     0 0
        "Oost-Groningen"                       2 1982      0     0     0     0     0    0    0     0 0
        "Oost-Groningen"                       2 1983      0     0     0     0     0    0    0     0 0
        "Oost-Groningen"                       2 1984      0     0     0     0     0    0    0     0 0
        "Oost-Groningen"                       2 1985      0     0     0     0     0    0    0     0 0
        "Oost-Groningen"                       2 1986      0     0     0     0     0    0    0     0 0
        "Oost-Groningen"                       2 1987      0     0     0     0     0    0    0     0 0
        "Oost-Groningen"                       2 1988      0     0     0     0     0    0    0     0 0
        "Oost-Groningen"                       2 1989      0     0     0     0     0    0    0     0 0
        "Oost-Groningen"                       2 1990      0     0     0     0     0    0    0     0 0
        "Oost-Groningen"                       2 1991      0     0     0     0     0    0    0     0 0
        "Oost-Groningen"                       2 1992      0     0     0     0     0    0    0     0 0
        "Oost-Groningen"                       2 1993      0     0     0     0     0    0    0     0 0
        end

        I hope the following explanation clarifies it: "Oost-Groningen" is the region in this case (in my own dataset I have 39 more regions). Then there is the ind (industry) variable. Each industry is identified by a unique number ranging from 0 to 17; these codes were created with the generate and replace commands. Then there is the yr (year) variable, which ranges from 1970 to 1993, as one may see. Furthermore, there is a variable emp that stands for employment, which is what I am interested in. Lastly, there is a range of other variables, which are not of interest (yet). One may especially ignore the se variable, which is where I initially wanted to store Shannon's Entropy. It was simply generated with gen se = 0.

        Shannon's entropy should be calculated (in my case) by the following formula: Diversity_r = -SIGMA_ind( E_ind,r / E_r * ln(E_ind,r / E_r) ), where r represents the region (Oost-Groningen in the example above), SIGMA_ind is the summation sign (summing over industries), E represents employment and ind again the industry. I was thinking about a for-loop, but I am not sure how to write one in this case, especially since I have never used them. Perhaps that is not even the right way to do it, so all in all I'm quite puzzled.

        Thanks again!

        Dennis

        Edit: Sorry, I forgot to tell that I am using Stata 14.1 on Windows. Furthermore, I want to obtain Shannon's Entropy for each year per region. Then I will include the entropy in my other dataset that does not discriminate per industry, since it is not the industry that I am interested in but only the region. In that other set I will use either Pooled OLS (reg variables, cluster(region)) or xtreg, fe/re which I shall test with a Hausman test. Hopefully this makes it somewhat more clear. Sorry for the edit, as you know, I'm new here and tend to forget to include certain important things. Cheers!
        Last edited by Dennis Jong; 05 May 2016, 04:17.

        • #5
          There are several user-written programs to do this (e.g. ineq (SSC)), but it looks like a few interactive lines to me: First calculate the proportions or probabilities, then just add up the terms over each group of observations. Absolutely no loops needed. It's important to trap the zeros as ln 0 or ln (1/0) will be returned as missing.

          Code:
          egen prop = pc(emp), by(Regios yr) prop
          egen entropy = total(cond(prop == 0, 0, prop * ln(1/prop))), by(Regios yr)
          egen tag = tag(Regios yr)
          l Regios yr entropy if tag
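
          A small self-contained run of the same steps may help in checking the result by hand, followed by one possible way of carrying the entropy into a region-level dataset as described in #4. The toy numbers, the variable gdp and the second dataset are all hypothetical; only Regios, ind, yr and emp come from the thread.

          ```stata
          * Toy long-format data (made-up values): two regions, three industries, one year
          clear
          input str1 Regios byte ind int yr float emp
          "A" 0 1970 50
          "A" 1 1970 50
          "A" 2 1970  0
          "B" 0 1970 90
          "B" 1 1970 10
          "B" 2 1970  0
          end

          * Same calculation as above: shares within region-year, then sum p * ln(1/p)
          egen prop = pc(emp), by(Regios yr) prop
          egen entropy = total(cond(prop == 0, 0, prop * ln(1/prop))), by(Regios yr)

          * Region A has shares 0.5, 0.5 and 0, so its entropy is ln(2)
          * Keep one observation per region-year and store it temporarily
          egen tag = tag(Regios yr)
          keep if tag
          keep Regios yr entropy
          tempfile entropy
          save `entropy'

          * Hypothetical region-level panel standing in for the "other dataset"
          clear
          input str1 Regios int yr float gdp
          "A" 1970 1.0
          "B" 1970 2.0
          end
          merge 1:1 Regios yr using `entropy'
          list
          ```

          The merge is 1:1 because both files then hold exactly one observation per region and year.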
          Sidenote. I don't like the formula -SIGMA( p ln p ) which is how everyone with reasonable high school mathematics tends to write it because the minus sign is more economical than showing a division.

          I prefer SIGMA[ p ln (1/p) ] which to me conveys the meaning better. Entropy is a weighted average of ln (1/p).
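
          The two forms give the same number, because ln(1/p) = -ln p. Written out, with p_i as the share of industry i within a region and year (the symbols H and p_i are notation assumed here, not taken from the thread):

          ```latex
          H \;=\; -\sum_i p_i \ln p_i
            \;=\; \sum_i p_i \,(-\ln p_i)
            \;=\; \sum_i p_i \ln\frac{1}{p_i},
          \qquad \sum_i p_i = 1 .
          ```

          Terms with p_i = 0 are taken to contribute 0, which is what the cond() call in the code enforces.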
          Last edited by Nick Cox; 05 May 2016, 06:07.

          • #6
            At first I thought there was a mistake, but I was proven wrong. Your code is absolutely right, thank you so much. The entropy formula came from an article I read, by the way, but I agree with your preference for how to write it. Once again, thank you very much, everyone!
