Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating variable with the number of distinct observations of a particular variable by region and time

    Hello,

    I am trying to make a line graph of the number of distinct products exported in different provinces in South Africa over a 5 year period. I have transaction level data of all exports within a particular period in South Africa with variables including the province (region) the product exited the country, the time (month and year), the value and the product classification (hs6 code). My raw data looks as follows:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str19 province long hs6_code float t
    "Western Cape"  493 684
    "Western Cape"  516 684
    "Western Cape"  560 684
    "Western Cape"  572 684
    "Western Cape"  824 684
    "Western Cape"  945 684
    "Western Cape"  957 684
    "Western Cape" 1172 684
    "Western Cape" 1224 684
    "Western Cape" 1279 684
    "Western Cape" 1362 684
    "Western Cape" 1459 684
    "Western Cape" 1460 684
    "Western Cape" 1464 684
    "Western Cape" 1465 684
    "Western Cape" 1468 684
    "Western Cape" 1469 684
    "Western Cape" 1470 684
    "Western Cape" 1542 684
    "Western Cape" 1663 684
    "Western Cape" 1682 684
    "Western Cape" 1686 684
    "Western Cape" 1688 684
    "Western Cape" 1701 684
    "Western Cape" 1718 684
    "Western Cape" 2174 684
    "Western Cape" 2433 684
    "Western Cape" 2496 684
    "Western Cape" 2517 684
    "Western Cape" 2845 684
    "Western Cape" 2854 684
    "Western Cape" 2900 684
    "Western Cape" 2932 684
    "Western Cape" 3015 684
    "Western Cape" 3028 684
    "Western Cape" 3032 684
    "Western Cape" 3126 684
    "Western Cape" 3132 684
    "Western Cape" 3156 684
    "Western Cape" 3195 684
    "Western Cape" 3208 684
    "Western Cape" 3220 684
    "Western Cape" 3248 684
    "Western Cape" 3255 684
    "Western Cape" 3259 684
    "Western Cape" 3299 684
    "Western Cape" 3311 684
    "Western Cape" 3374 684
    "Western Cape" 3429 684
    "Western Cape" 3430 684
    "Western Cape" 3557 684
    "Western Cape" 3558 684
    "Western Cape" 3598 684
    "Western Cape" 3602 684
    "Western Cape" 3603 684
    "Western Cape" 3729 684
    "Western Cape" 3750 684
    "Western Cape" 3777 684
    "Western Cape" 3779 684
    "Western Cape" 3780 684
    "Western Cape" 3782 684
    "Western Cape" 3822 684
    "Western Cape" 3824 684
    "Western Cape" 3955 684
    "Western Cape" 3969 684
    "Western Cape" 3971 684
    "Western Cape" 4020 684
    "Western Cape" 4048 684
    "Western Cape" 4049 684
    "Western Cape" 4075 684
    "Western Cape" 4184 684
    "Western Cape" 2430 684
    "Western Cape"  479 684
    "Western Cape"  564 684
    "Western Cape" 1172 684
    "Western Cape" 1235 684
    "Western Cape" 1236 684
    "Western Cape" 1247 684
    "Western Cape" 1253 684
    "Western Cape" 1350 684
    "Western Cape" 1362 684
    "Western Cape" 1446 684
    "Western Cape" 1563 684
    "Western Cape" 1564 684
    "Western Cape" 1565 684
    "Western Cape" 1577 684
    "Western Cape" 1705 684
    "Western Cape" 1714 684
    "Western Cape" 2171 684
    "Western Cape" 2273 684
    "Western Cape" 2287 684
    "Western Cape" 2302 684
    "Western Cape" 2371 684
    "Western Cape" 2428 684
    "Western Cape" 2855 684
    "Western Cape" 3024 684
    "Western Cape" 3104 684
    "Western Cape" 3173 684
    "Western Cape" 3259 684
    "Western Cape" 3405 684
    end
    format %tm t
    label values hs6_code hs6_code
    label def hs6_code 479 "170490", modify
    label def hs6_code 493 "190190", modify
    label def hs6_code 516 "200520", modify
    label def hs6_code 560 "210390", modify
    label def hs6_code 564 "210610", modify
    label def hs6_code 572 "220421", modify
    label def hs6_code 824 "282590", modify
    label def hs6_code 945 "290511", modify
    label def hs6_code 957 "290541", modify
    label def hs6_code 1172 "300691", modify
    label def hs6_code 1224 "320890", modify
    label def hs6_code 1235 "321511", modify
    label def hs6_code 1236 "321519", modify
    label def hs6_code 1247 "330290", modify
    label def hs6_code 1253 "330499", modify
    label def hs6_code 1279 "340399", modify
    label def hs6_code 1350 "380993", modify
    label def hs6_code 1362 "381400", modify
    label def hs6_code 1446 "391710", modify
    label def hs6_code 1459 "391990", modify
    label def hs6_code 1460 "392190", modify
    label def hs6_code 1464 "392321", modify
    label def hs6_code 1465 "392329", modify
    label def hs6_code 1468 "392350", modify
    label def hs6_code 1469 "392390", modify
    label def hs6_code 1470 "392410", modify
    label def hs6_code 1542 "401691", modify
    label def hs6_code 1563 "420219", modify
    label def hs6_code 1564 "420221", modify
    label def hs6_code 1565 "420222", modify
    label def hs6_code 1577 "420340", modify
    label def hs6_code 1663 "480100", modify
    label def hs6_code 1682 "481830", modify
    label def hs6_code 1686 "481920", modify
    label def hs6_code 1688 "481940", modify
    label def hs6_code 1701 "482360", modify
    label def hs6_code 1705 "490199", modify
    label def hs6_code 1714 "490810", modify
    label def hs6_code 1718 "491110", modify
    label def hs6_code 2171 "590700", modify
    label def hs6_code 2174 "591000", modify
    label def hs6_code 2273 "611420", modify
    label def hs6_code 2287 "620119", modify
    label def hs6_code 2302 "620319", modify
    label def hs6_code 2371 "621143", modify
    label def hs6_code 2428 "630621", modify
    label def hs6_code 2430 "630630", modify
    label def hs6_code 2433 "630710", modify
    label def hs6_code 2496 "680422", modify
    label def hs6_code 2517 "681310", modify
    label def hs6_code 2845 "730640", modify
    label def hs6_code 2854 "730729", modify
    label def hs6_code 2855 "730791", modify
    label def hs6_code 2900 "731822", modify
    label def hs6_code 2932 "732620", modify
    label def hs6_code 3015 "760719", modify
    label def hs6_code 3024 "761290", modify
    label def hs6_code 3028 "761510", modify
    label def hs6_code 3032 "761699", modify
    label def hs6_code 3104 "820411", modify
    label def hs6_code 3126 "820790", modify
    label def hs6_code 3132 "820900", modify
    label def hs6_code 3156 "830160", modify
    label def hs6_code 3173 "830629", modify
    label def hs6_code 3195 "840290", modify
    label def hs6_code 3208 "840721", modify
    label def hs6_code 3220 "840999", modify
    label def hs6_code 3248 "841381", modify
    label def hs6_code 3255 "841440", modify
    label def hs6_code 3259 "841480", modify
    label def hs6_code 3299 "842123", modify
    label def hs6_code 3311 "842382", modify
    label def hs6_code 3374 "843319", modify
    label def hs6_code 3405 "843890", modify
    label def hs6_code 3429 "844331", modify
    label def hs6_code 3430 "844360", modify
    label def hs6_code 3557 "847110", modify
    label def hs6_code 3558 "847170", modify
    label def hs6_code 3598 "848110", modify
    label def hs6_code 3602 "848180", modify
    label def hs6_code 3603 "848190", modify
    label def hs6_code 3729 "852530", modify
    label def hs6_code 3750 "853180", modify
    label def hs6_code 3777 "853630", modify
    label def hs6_code 3779 "853649", modify
    label def hs6_code 3780 "853650", modify
    label def hs6_code 3782 "853669", modify
    label def hs6_code 3822 "854420", modify
    label def hs6_code 3824 "854441", modify
    label def hs6_code 3955 "890710", modify
    label def hs6_code 3969 "900319", modify
    label def hs6_code 3971 "900410", modify
    label def hs6_code 4020 "901890", modify
    label def hs6_code 4048 "902610", modify
    label def hs6_code 4049 "902620", modify
    label def hs6_code 4075 "903180", modify
    label def hs6_code 4184 "940380", modify

    I have tried to do it by running the following code:

    collapse (sum) ExportValue = v (mean) AvgExportValue = v, by(province hs6_code t)
    duplicates drop hs6_code if province == "KwaZulu Natal", force
    duplicates drop hs6_code if province == "Gauteng", force
    duplicates drop hs6_code if province == "Western Cape", force
    collapse (count) NumProducts = hs6_code, by(province t)

    However, when I try make a graph of NumProducts over time across provinces, the values for some months seem to be implausibly high. As such, I think I must have made a mistake. Does anyone know how I could do this correctly. I am using Stata 15.


    Thank you!

    Kind regards,
    Chris


  • #2
    Code:
    egen tag = tag(hs6_code province t) 
    egen ndistinct = total(tag), by(province t)
    See https://www.stata-journal.com/articl...article=dm0042 esp. p.563. dm0042 is a good search term to find examples on this forum.

    collapse (count) counts non-missing values, regardless of whether they are distinct, so for example 666 instances of 42 would contribute 666, not 1, to the result.

    Comment


    • #3
      Thanks so much Nick!

      Comment


      • #4
        Nick's solution in #2 works fine, and is, in fact, what I would use. But, if you happen to have the -egenmore- package installed (written by somebody Nick knows well , and available from SSC), you can also do it in one line with
        Code:
        egen ndistinct = nvals(hs6_code), by(province t)
        Last edited by Clyde Schechter; 05 Aug 2022, 09:59.

        Comment


        • #5
          The first published version of the tag() code seems to have been in package egenodd in 1999 (first hosted by StataCorp and later published in the Stata Technical Bulletin). It was later folded into official Stata.

          The first published version of the nvals() code seems to have in package egenmore in 2000. I can't recall whether when writing the latter I was thinking that tag() PLUS total() can give you the same result, but nvals() doesn't call up another egen function itself and that would have been inefficient.

          I've been publicising the two-step mentioned in #2 over most of the intervening period because I don't want people to get the impression that something so basic requires community-contributed code.

          Comment

          Working...
          X