
  • summation of multiple variables with different subscripts

    Hello everybody

    This is my very first post, and I have done my best to follow the Statalist posting advice. Please excuse any remaining inaccuracies in advance.



    I am currently trying to construct the following measure of import exposure between two periods for my BA thesis (based on Autor, Dorn & Hanson (2013): "The China Syndrome: Local Labor Market Effects of Import Competition in the United States"):
    [Image: formula of the import-exposure measure (screenshot not reproduced)]
    • "L" denotes the number of employees (given in full-time equivalents)
    • "Delta(Imp)" denotes the difference in imports from China to Switzerland between two periods
    • Subscript "i" denotes a region; "j" an industry, "t" a period
    For example, Lijt is the number of employees in region i and industry j at the beginning of period t. Following this logic, Delta(Imp) is measured at the industry level.
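Reading the bullet definitions together with the code further down (each term is l_1 × d_trade05 / (l_2 × l_3)), the formula in the image is presumably the following. This is a reconstruction, not a transcription of the screenshot: the left-hand-side symbol ImpExp is only a label chosen here, and the structure mirrors the import-exposure measure in Autor, Dorn & Hanson (2013).

```latex
\mathit{ImpExp}_{it} = \sum_{j} \frac{L_{ijt}}{L_{jt}} \cdot \frac{\Delta \mathit{Imp}_{jt}}{L_{it}}
```

The sum runs over industries j within a region i, so each region (and period) gets a single value once the industry terms are added up.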

    My data set looks as follows:
    Code:
    clear
    input float year int(comzone nace2) float(l_1 l_2 l_3) long tradeflow float d_trade05
    2005 1 5812  502.5827  768.1605 275862.34 1  26727504
    2005 1 2059 135.52043  4980.311 275862.34 0         0
    2005 1 2593 73.693985 2244.5874 275862.34 0   3555463
    2005 1 5811  507.7866 2068.2964 275862.34 0 -12758331
    2005 1 1101  30.48767  397.5343 275862.34 1     53194
    2005 1 1712  171.6538 1999.2538 275862.34 1         0
    2005 1 2573  21.25865  8300.515 275862.34 0  10659572
    2005 1 9004  764.4638  2234.022 275862.34 1         0
    2005 1 1071  295.9666  8137.673 275862.34 1         0
    2005 1 2894  9.184786  6182.562 275862.34 1  28994800
    2005 1 3220  9.937362  485.2864 275862.34 0     98052
    2005 1 5310  2974.012  34733.26 275862.34 0    151012
    2005 1 1623  541.0589  27948.59 275862.34 0  21660954
    2005 1 2822 292.46902  8554.805 275862.34 1  14217534
    2005 1 2016  3.310062   3043.33 275862.34 1  12876245
    end
    label values tradeflow tradeflow
    label def tradeflow 0 "Export", modify
    label def tradeflow 1 "Import", modify
    where:
    • comzone denotes regions (106 in total)
    • nace2 denotes a 4-digit industry code
    • l_1 is constructed so that it equals Lijt
    • l_2 equals Ljt
    • l_3 equals Lit
    • d_trade05 equals Delta(Impjt)
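Before building the terms, it can help to confirm that the variables behave as the definitions above say. The sketch below is a suggestion, not part of the original post: it uses only the first six observations of the excerpt and asserts that l_3 (Lit) takes one value per region-year, that l_2 (Ljt) takes one value per industry-year, and that regional industry employment never exceeds either total.

```stata
* Sanity checks on the employment variables (a sketch; uses the first
* six observations of the -dataex- excerpt above)
clear
input float year int(comzone nace2) float(l_1 l_2 l_3) long tradeflow float d_trade05
2005 1 5812  502.5827  768.1605 275862.34 1  26727504
2005 1 2059 135.52043  4980.311 275862.34 0         0
2005 1 2593 73.693985 2244.5874 275862.34 0   3555463
2005 1 5811  507.7866 2068.2964 275862.34 0 -12758331
2005 1 1101  30.48767  397.5343 275862.34 1     53194
2005 1 1712  171.6538 1999.2538 275862.34 1         0
end

* Lit (l_3) must be constant within each region-year
bysort comzone year (l_3): assert l_3[1] == l_3[_N]

* Ljt (l_2) must be constant within each industry-year
bysort nace2 year (l_2): assert l_2[1] == l_2[_N]

* Lijt (l_1) cannot exceed industry or regional employment
assert l_1 <= l_2 & l_1 <= l_3
```

If any of these assertions fails on the full data, the construction of l_1, l_2 or l_3 is worth revisiting before computing the measure.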
    So far I have tried this:

    Code:
    gen numerator = 0
    levelsof nace2, local(nace2)
    foreach l of local nace2{
        replace numerator = (l_1 * d_trade05) / (l_2 * l_3) if nace2 == `l' & tradeflow == 1
    }
    As I understand it, this creates all the individual terms. My main problem now is how to sum these terms to obtain the final measure of trade exposure, which is at the regional level.

    I have thought about continuing like this:

    Code:
    gen imp_exp05 = 0
    forvalues i = 1/106{
    replace imp_exp05 = total(numerator) if comzone == `i'
    }
    But -total()- does not work with -replace-. Does anybody have an idea how to solve this problem? Is there perhaps an easier solution, or have I made a mistake in general?

    Thank you very much for your help in advance.
    Christoph

  • #2
    See -help egen- and look at the -total()- function. -total()- works with -egen-, not with -gen- or -replace-. Moreover, you do not use it in a loop over values of region like that; it is a single command:

    Code:
    by comzone year, sort: egen imp_exp = total(numerator)
    I've deliberately named the variable imp_exp rather than imp_exp05, because this will compute the result for every year.
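If only one observation per region-year is needed (for example, to merge the measure onto region-level outcomes), -collapse- is an alternative to keeping the total on every observation. The sketch below uses the first six observations of the excerpt, computes the regional sum with -egen- as above and with -collapse-, and keeps one row per comzone and year. The variable name imp_exp_c is illustrative, and storing the results as double is a choice made here because d_trade05 values are large.

```stata
clear
input float year int(comzone nace2) float(l_1 l_2 l_3) long tradeflow float d_trade05
2005 1 5812  502.5827  768.1605 275862.34 1  26727504
2005 1 2059 135.52043  4980.311 275862.34 0         0
2005 1 2593 73.693985 2244.5874 275862.34 0   3555463
2005 1 5811  507.7866 2068.2964 275862.34 0 -12758331
2005 1 1101  30.48767  397.5343 275862.34 1     53194
2005 1 1712  171.6538 1999.2538 275862.34 1         0
end

* One term per region-industry observation; export rows contribute 0
gen double numerator = cond(tradeflow == 1, (l_1 * d_trade05) / (l_2 * l_3), 0)

* Regional total attached to every observation (the -egen- approach)
by comzone year, sort: egen double imp_exp = total(numerator)

* Reduce to one observation per region-year; imp_exp_c should equal imp_exp
collapse (first) imp_exp (sum) imp_exp_c = numerator, by(comzone year)
list comzone year imp_exp imp_exp_c
```

Note that -collapse- replaces the data in memory, so wrap it in -preserve-/-restore- if the observation-level data are still needed.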

    (And, yes, in case you're wondering, the calculation of numerator itself could also have been simplified by the use of a -by- instead of a loop. But that's water over the dam now.)
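Because each term uses only values from its own observation, this particular step does not even need -by-: restricting the calculation to import rows with a condition does the same job in one line. The sketch below runs the original loop and a single -generate- with cond() on the first six observations of the excerpt and asserts that both give identical results; numerator2 is a hypothetical name used only for the comparison.

```stata
clear
input float year int(comzone nace2) float(l_1 l_2 l_3) long tradeflow float d_trade05
2005 1 5812  502.5827  768.1605 275862.34 1  26727504
2005 1 2059 135.52043  4980.311 275862.34 0         0
2005 1 2593 73.693985 2244.5874 275862.34 0   3555463
2005 1 5811  507.7866 2068.2964 275862.34 0 -12758331
2005 1 1101  30.48767  397.5343 275862.34 1     53194
2005 1 1712  171.6538 1999.2538 275862.34 1         0
end

* Loop version from the original post
gen numerator = 0
levelsof nace2, local(nace2)
foreach l of local nace2 {
    replace numerator = (l_1 * d_trade05) / (l_2 * l_3) if nace2 == `l' & tradeflow == 1
}

* One-line version: the term for import rows, 0 for all other rows
gen numerator2 = cond(tradeflow == 1, (l_1 * d_trade05) / (l_2 * l_3), 0)

* Both versions must agree observation by observation
assert numerator == numerator2
```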

    I recommend you read -help by-, and then click on the blue link near the top of that page and read the entire chapter on -by- in the PDF manuals that are installed with your Stata. -by- is a "bread and butter" command in Stata and a real workhorse. It cannot be applied to everything, but where it can be used, it is simpler and much more efficient than the kind of looping you are trying to use. In fact, in Stata, whenever you feel the urge to write a loop, you should always stop and ask yourself, "Could I do this using -by- instead?"
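To illustrate why -by- is so useful here: within -by- groups, _N is the number of observations in the current group and sum() builds a running total, so the last running value in each group is the group total. The sketch below uses this to reproduce the -egen, total()- result on the first six observations of the excerpt; running, imp_exp_by and imp_exp_egen are illustrative names.

```stata
clear
input float year int(comzone nace2) float(l_1 l_2 l_3) long tradeflow float d_trade05
2005 1 5812  502.5827  768.1605 275862.34 1  26727504
2005 1 2059 135.52043  4980.311 275862.34 0         0
2005 1 2593 73.693985 2244.5874 275862.34 0   3555463
2005 1 5811  507.7866 2068.2964 275862.34 0 -12758331
2005 1 1101  30.48767  397.5343 275862.34 1     53194
2005 1 1712  171.6538 1999.2538 275862.34 1         0
end

gen double numerator = cond(tradeflow == 1, (l_1 * d_trade05) / (l_2 * l_3), 0)

* Running total of the terms within each region-year
by comzone year (nace2), sort: gen double running = sum(numerator)

* The last observation of each group holds the group total
by comzone year: gen double imp_exp_by = running[_N]

* Same quantity via -egen- for comparison
by comzone year: egen double imp_exp_egen = total(numerator)
```

No loop over the 106 regions is needed: -by- handles every region-year group in one pass.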



    • #3
      Dear Clyde

      Thank you very much for your answer! Everything works now. I was aware of the -by- prefix, but I did not know it could be used for a summation like this. Also, I was not sure whether my approach to constructing the trade-exposure measure was on the right track in general. So another big thank you for taking the time to think through my question.

      Best regards
      Christoph

