  • Total sum is not equal to the group sum

    Hi all,

    This is Quan. I am having a problem using bysort, egen, sum() and if together, and I hope someone on the forum can give me a tip.
    Below is an example of my data, where
    t is time (year),
    i is the importer,
    j is the exporter,
    k is the 6-digit HS code (stored as a number, so the leading zero is dropped: 010111 appears as 10111),
    v is the trade value,
    and tv is v/1000.


    t i j k v q tv
    2001 410 Thailand 10111 2.313 .152 .002313
    1999 410 USA, Puerto Rico and US Virgin Islands 10111 11.939 12.3 .011939
    2016 410 Czechia 10111 1.724 .004 .001724
    2009 410 Russian Federation 10111 .434 .068 .000434
    2009 410 United Kingdom 10111 2.494 .018 .002494
    2002 410 China 10111 9.899 2.4 .009899
    2005 410 Slovenia 10111 .015 .001 .000015
    2008 410 Thailand 10111 57.4 13 .0574
    2002 410 Philippines 10111 10.423 .74 .010423
    1995 410 China 10111 53.496 5.687 .053496
    1996 410 Belgium-Luxembourg 10111 47.32 19.007 .04732
    2006 410 Other Asia, not elsewhere specified 10111 91.792 2 .091792
    2006 410 Malaysia 10111 14.256 .692 .014256
    2006 410 Kazakhstan 10111 17.6 1.45 .0176
    1997 410 Turkey 10111 57.658 7.125 .057658
    1999 410 Poland 10111 9.458 .222 .009458
    2016 410 Malaysia 10111 1.758 .954 .001758
    2003 410 Canada 10111 1.029 .064 .001029
    2015 410 Germany 10111 13.632 .5 .013632
    2009 410 United Kingdom 10111 2.494 .018 .002494
    2004 410 Canada 10111 .025 .001 .000025
    2009 410 South Africa 10111 2.416 .018 .002416
    2008 410 China 10111 5.98 .038 .00598
    2008 410 Russian Federation 10111 10 1.2 .01
    2018 410 United Arab Emirates 10111 10.056 1.12 .010056
    2009 410 Singapore 10111 26.664 .18 .026664
    2009 410 Other Asia, not elsewhere specified 10111 16.857 .02 .016857
    1995 410 Australia 10111 .944 .105 .000944
    2010 410 Thailand 10111 2.468 .085 .002468
    2000 410 Italy 10111 .917 .06 .000917
    1996 410 France, Monaco 10111 2.979 .002 .002979
    1998 410 Viet Nam 10111 .952 .059 .000952
    2007 410 USA, Puerto Rico and US Virgin Islands 10111 .594 .11 .000594


    I want the total export value of agricultural food (defined as HS01-HS24) and the export value of each HS 2-digit chapter, by year. My code is as follows:

    For the total value:
    bysort t: egen tvagsum=sum(tv) if k<250000

    For each HS 2-digit chapter:
    bysort t: egen tv01sum=sum(tv) if k<20000
    bysort t: egen tv02sum=sum(tv) if k>=20000 & k<30000
    bysort t: egen tv03sum=sum(tv) if k>=30000 & k<40000
    bysort t: egen tv04sum=sum(tv) if k>=40000 & k<50000
    bysort t: egen tv05sum=sum(tv) if k>=50000 & k<60000
    bysort t: egen tv06sum=sum(tv) if k>=60000 & k<70000
    bysort t: egen tv07sum=sum(tv) if k>=70000 & k<80000
    bysort t: egen tv08sum=sum(tv) if k>=80000 & k<90000
    bysort t: egen tv09sum=sum(tv) if k>=90000 & k<100000
    bysort t: egen tv10sum=sum(tv) if k>=100000 & k<110000
    bysort t: egen tv11sum=sum(tv) if k>=110000 & k<120000
    bysort t: egen tv12sum=sum(tv) if k>=120000 & k<130000
    bysort t: egen tv13sum=sum(tv) if k>=130000 & k<140000
    bysort t: egen tv14sum=sum(tv) if k>=140000 & k<150000
    bysort t: egen tv15sum=sum(tv) if k>=150000 & k<160000
    bysort t: egen tv16sum=sum(tv) if k>=160000 & k<170000
    bysort t: egen tv17sum=sum(tv) if k>=170000 & k<180000
    bysort t: egen tv18sum=sum(tv) if k>=180000 & k<190000
    bysort t: egen tv19sum=sum(tv) if k>=190000 & k<200000
    bysort t: egen tv20sum=sum(tv) if k>=200000 & k<210000
    bysort t: egen tv21sum=sum(tv) if k>=210000 & k<220000
    bysort t: egen tv22sum=sum(tv) if k>=220000 & k<230000
    bysort t: egen tv23sum=sum(tv) if k>=230000 & k<240000
    bysort t: egen tv24sum=sum(tv) if k>=240000 & k<250000
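    The 24 commands above follow one pattern: because the 6-digit code is stored without its leading zero, chapter c covers k from c*10000 up to c*10000 + 9999. A minimal sketch of the same computation written as a loop, assuming the variable names t, k and v from the post and using three rows of the sample as toy data. The helper variable hs2 is hypothetical, and the sketch uses total(), the current name of egen's sum() function (sum() is still accepted).

```stata
* Toy data: three rows from the sample, using the post's variable names
clear
input int t long k double v
2008 10111 57.4
2008 10111 5.98
2009 10111 2.494
end
generate double tv = v/1000

* HS 2-digit chapter of a 6-digit code stored without its leading zero
* (10111 -> 1, 240290 -> 24); hs2 is a hypothetical helper variable
generate byte hs2 = floor(k/10000)

* One egen call per chapter, creating tv01sum ... tv24sum as in the post;
* chapters with no observations give an all-missing variable
forvalues c = 1/24 {
    local cc : display %02.0f `c'
    bysort t: egen tv`cc'sum = total(tv) if hs2 == `c'
}
```

    The new variables keep the names tv01sum ... tv24sum, so any later code that uses them is unchanged.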

    The problem is that the sum of all the HS 2-digit values does not equal the total value.
    I have already excluded all missing values, and I can't see anything wrong with the code. Does anyone have an idea? Thanks.




  • #2
    It's likely to be a precision problem of some kind. See e.g. https://www.statalist.org/forums/for...-of-unique-ids

    The general advice is to use a double storage type when you need to hold very large, or even very small, results.
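    A minimal sketch of that advice applied to this thread, assuming the variable names t, k and v from the post. By default generate and egen create float variables, which hold about 7 significant decimal digits, while double holds about 16. Here tv and every sum are stored as double, and the chapter sums are then checked against the year total. The names hs2, tvchap, first and partsum are hypothetical helpers, and the codes 20110 and 30110 are made up for illustration. The check uses reldif() with a tolerance rather than ==, because adding the same numbers in a different order can change the last binary digits even in double precision.

```stata
* Toy data with the post's variable names; the 10111 rows come from the
* sample above, while codes 20110 and 30110 are made up for illustration
clear
input int t long k double v
2008 10111 57.4
2008 10111 5.98
2008 20110 10
2009 10111 2.494
2009 30110 16.857
end

* Store the rescaled value and every sum as double instead of the default float
generate double tv = v/1000
generate byte hs2 = floor(k/10000)

* Year total over HS01-HS24
bysort t: egen double tvagsum = total(tv) if k < 250000

* Chapter totals within each year
bysort t hs2: egen double tvchap = total(tv) if k < 250000

* Add each chapter total once per year (first = 1 on one row per year-chapter pair)
egen byte first = tag(t hs2)
bysort t: egen double partsum = total(tvchap * first) if k < 250000

* The two routes should agree up to rounding in the last few binary digits
assert reldif(partsum, tvagsum) < 1e-12 if k < 250000
```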



    • #3
      Mr. Nick Cox,

      Thank you so much for your fast reply. I will check your advice.

