Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Number of one variable according to two other variables

    Hi Dears,

    My datset includes Firms identification: NPC_FIC, Classification by broad economic categories:cgce, and fluxo=1 for import and fluxo==2 for export. I am going to compute the number of export varieties. I used the following codes though it produces the number of categories with min 1 and max 19 not the number EXPORT of goods. What I need is the number of EXPORT products of firms. As you can see the min and max in Tab cgce which are for both Export & Import. Do you know where I made mistake?

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long(NPC_FIC cgce fluxo)
    500000119  2 1
    502013478 17 1
    502013485  6 1
    502013485 16 1
    501695073 17 2
    500001518  6 2
    500002221  6 1
    501455073 10 1
    502013585 16 1
    502013585 16 1
    500002553 10 1
    502013616 10 1
    502013659 10 1
    500004381 10 1
    500004776 13 2
    502013781 13 2
    502013787 12 2
    501842978  4 2
    500006208 10 1
    500006298 17 2
    500006298 16 2
    500006298 16 2
    502013880 17 1
    500009612 13 2
    500011428 13 2
    500011570  6 1
    500011570  6 2
    500011570  6 2
    500011570  6 2
    500011616 17 2
    500011616 16 2
    500011616 16 2
    500011616 16 2
    500011616 16 2
    500011677  6 2
    502014167 10 1
    501695781  6 2
    500014761  6 1
    502014320 16 1
    500014860 10 1
    500015239  6 1
    500016309 16 1
    500016309 10 1
    500016309 16 1
    500016309 16 1
    500016309 15 1
    500016309 17 1
    500016309 10 1
    500016309 10 1
    500016309 10 1
    500016309 10 1
    500016309  6 1
    500016309 10 1
    500016309 11 1
    502014432  5 2
    502014517 11 1
    502014517 11 1
    502014579 10 1
    500018647  6 2
    500018647  6 2
    500018647  6 2
    500018647  6 2
    500018647  6 2
    500018647  6 2
    500019062  6 2
    500019062  6 2
    500019062  6 2
    500019062 17 2
    500019062 17 2
    500019062 10 2
    500019062  6 2
    500019062 15 2
    500019062  6 2
    500019062  9 2
    502014728  6 1
    502014768 17 1
    502014807 18 1
    500022305  5 1
    500022713 17 1
    500023876 18 1
    501846415 12 2
    501846415 17 2
    501846415 17 2
    501846415 17 2
    501846415  5 2
    500028289  6 2
    500028289  6 2
    501456662 10 1
    501456662 17 1
    500029056 17 1
    500031935  6 2
    500031935 10 2
    500031935 16 2
    500031935 16 2
    500031935 10 2
    500031935 16 2
    500031935  6 2
    500031935  6 2
    500031935 16 2
    500031935 17 2
    end
    label values cgce cgce
    label def cgce 2 "112", modify
    label def cgce 4 "122", modify
    label def cgce 5 "21", modify
    label def cgce 6 "22", modify
    label def cgce 9 "322", modify
    label def cgce 10 "41", modify
    label def cgce 11 "42", modify
    label def cgce 12 "51", modify
    label def cgce 13 "521", modify
    label def cgce 15 "53", modify
    label def cgce 16 "61", modify
    label def cgce 17 "62", modify
    label def cgce 18 "63", modify
    label values fluxo fluxo
    label def fluxo 1 "1", modify
    label def fluxo 2 "2", modify
    
    
    ***Number of export varieties**
    
    egen tag = tag(cgce NPC_FIC)
    egen distinct = total(tag), by(cgce)
    egen n_e = count(fluxo == 2), by(NPC_FIC) 
    ** delet duplicated **keep one firms among repeated firms**
    bysort NPC_FIC fluxo: keep if _n == _N & fluxo == 2
    sum cgce
    
    . sum cgce
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
            cgce |     22,728    10.74485    5.015334          1         19
    
    
    
    
    
    . tab cgce
    
       COD_CGCE |      Freq.     Percent        Cum.
    ------------+-----------------------------------
            111 |     15,398        0.29        0.29
            112 |    147,187        2.81        3.11
            121 |     23,175        0.44        3.55
            122 |    368,501        7.04       10.59
             21 |     85,258        1.63       12.22
             22 |  1,523,180       29.11       41.33
             31 |      1,682        0.03       41.36
            321 |        288        0.01       41.36
            322 |     25,493        0.49       41.85
             41 |    595,740       11.38       53.24
             42 |    561,164       10.72       63.96
             51 |      5,846        0.11       64.07
            521 |     12,508        0.24       64.31
            522 |     10,628        0.20       64.51
             53 |    216,881        4.14       68.66
             61 |    196,818        3.76       72.42
             62 |  1,018,465       19.46       91.88
             63 |    421,203        8.05       99.93
              7 |      3,763        0.07      100.00
    ------------+-----------------------------------
          Total |  5,233,178      100.00
    Listed 100 out of 5233309 observations



  • #2
    Code:
    by NPC_FIC fluxo (cgce), sort: gen wanted = sum(cgce != cgce[_n-1])
    by NPC_FIC fluxo (cgce): replace wanted = wanted[_N]
    This will also give you (separately) the number of distinct cgce's for each NPC_FIC when fluxo == 1. You can just ignore them or replace them with missing values if you want.

    Added: By the way, -egen n_e = count(fluxo == 2), by(NPC_FIC)- does not mean what you think it means. It does not give you a count of the number of observations having fluxo == 2 (in each NPC_FIC). That's not how -egen, count()- works. With -egen-, -count(expression)- gives a count of the number of observations for which the expression is not a missing value. In the case of the logical expression fluxo == 2, in every observation it is either true (1) or false(0), so it is always a non-missing value. Consequently -egen n_e = count(fluxo == 2), by(NPC_FIC)- just gives you a count of the number of observations each NPC_FIC has in the data set. To get a count of the number of observations in which fluxo == 2, you need -egen n_e = total(fluxo == 2), by(NPC_FIC)-. Remember that in each observation fluxo == 2 is either true (which, as a number means 1) or false (which, as a number means 0). So when you total up those 1s and 0s you get a count of the number of observations for which fluxo == 2, within each NPC_FIC. That's still not what you wanted, but it's more in the spirit of what you wanted, and that approach will often be useful.
    Last edited by Clyde Schechter; 13 Jan 2023, 17:11.

    Comment


    • #3
      Dear Clyde,

      Thank you for getting back to me.
      I still could not reach the number of export goods. The code produces the category (19 category exists) while the quantity of products reaches some thousands. I seek the number of goods firms export.
      Moreover, the data is based on workers, one firm can have 100 workers, therefore one firm's identification (NPC_FIC) might be reaped 100 times. So I have to keep one for each firm.
      Code:
      . by NPC_FIC fluxo (cgce), sort: gen n_ex = sum(cgce != cgce[_n-1])
      
      . by NPC_FIC fluxo (cgce): replace n_ex = n_ex[_N]
      (4,638,217 real changes made)
      
      . sum n_ex
      
          Variable |        Obs        Mean    Std. dev.       Min        Max
      -------------+---------------------------------------------------------
              n_ex |  5,402,947    8.229319    4.464022          0         20

      Comment


      • #4
        I think you did not understand the results produced by my code in #2, as your subsequent use of -sum- is not appropriate for that.

        From the beginning:
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input long(NPC_FIC cgce fluxo)
        500000119  2 1
        502013478 17 1
        502013485  6 1
        502013485 16 1
        501695073 17 2
        500001518  6 2
        500002221  6 1
        501455073 10 1
        502013585 16 1
        502013585 16 1
        500002553 10 1
        502013616 10 1
        502013659 10 1
        500004381 10 1
        500004776 13 2
        502013781 13 2
        502013787 12 2
        501842978  4 2
        500006208 10 1
        500006298 17 2
        500006298 16 2
        500006298 16 2
        502013880 17 1
        500009612 13 2
        500011428 13 2
        500011570  6 1
        500011570  6 2
        500011570  6 2
        500011570  6 2
        500011616 17 2
        500011616 16 2
        500011616 16 2
        500011616 16 2
        500011616 16 2
        500011677  6 2
        502014167 10 1
        501695781  6 2
        500014761  6 1
        502014320 16 1
        500014860 10 1
        500015239  6 1
        500016309 16 1
        500016309 10 1
        500016309 16 1
        500016309 16 1
        500016309 15 1
        500016309 17 1
        500016309 10 1
        500016309 10 1
        500016309 10 1
        500016309 10 1
        500016309  6 1
        500016309 10 1
        500016309 11 1
        502014432  5 2
        502014517 11 1
        502014517 11 1
        502014579 10 1
        500018647  6 2
        500018647  6 2
        500018647  6 2
        500018647  6 2
        500018647  6 2
        500018647  6 2
        500019062  6 2
        500019062  6 2
        500019062  6 2
        500019062 17 2
        500019062 17 2
        500019062 10 2
        500019062  6 2
        500019062 15 2
        500019062  6 2
        500019062  9 2
        502014728  6 1
        502014768 17 1
        502014807 18 1
        500022305  5 1
        500022713 17 1
        500023876 18 1
        501846415 12 2
        501846415 17 2
        501846415 17 2
        501846415 17 2
        501846415  5 2
        500028289  6 2
        500028289  6 2
        501456662 10 1
        501456662 17 1
        500029056 17 1
        500031935  6 2
        500031935 10 2
        500031935 16 2
        500031935 16 2
        500031935 10 2
        500031935 16 2
        500031935  6 2
        500031935  6 2
        500031935 16 2
        500031935 17 2
        end
        label values cgce cgce
        label def cgce 2 "112", modify
        label def cgce 4 "122", modify
        label def cgce 5 "21", modify
        label def cgce 6 "22", modify
        label def cgce 9 "322", modify
        label def cgce 10 "41", modify
        label def cgce 11 "42", modify
        label def cgce 12 "51", modify
        label def cgce 13 "521", modify
        label def cgce 15 "53", modify
        label def cgce 16 "61", modify
        label def cgce 17 "62", modify
        label def cgce 18 "63", modify
        label values fluxo fluxo
        label def fluxo 1 "1", modify
        label def fluxo 2 "2", modify
        
        by NPC_FIC fluxo (cgce), sort: gen wanted = sum(cgce != cgce[_n-1])
        by NPC_FIC fluxo (cgce): replace wanted = wanted[_N]
        
        egen tag = tag(NPC_FIC fluxo)
        list NPC_FIC wanted if tag & fluxo == 2, noobs clean
        The output of that last -list- command is just what you want:
        Code:
              NPC_FIC   wanted  
            500001518        1  
            500004776        1  
            500006298        2  
            500009612        1  
            500011428        1  
            500011570        1  
            500011616        2  
            500011677        1  
            500018647        1  
            500019062        5  
            500028289        1  
            500031935        4  
            501695073        1  
            501695781        1  
            501842978        1  
            501846415        3  
            502013781        1  
            502013787        1  
            502014432        1
        Or, if you want to -summarize- it:

        Code:
        . summ wanted if tag & fluxo == 2
        
            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
              wanted |         19    1.578947    1.169795          1          5

        Comment


        • #5
          Dear Clyde,

          That is not really what I need.
          Please have a look at these frequency numbers:
          Code:
           tab cgce
          
             COD_CGCE |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                  111|        150        0.35        0.35
                  112 |        481        1.12        1.47
                  121 |         31        0.07        1.54
                  122 |      1,657        3.86        5.40
                   21 |        692        1.61        7.01
                   22 |      5,651       13.15       20.16
                   31 |          9        0.02       20.18
                  321 |          2        0.00       20.19
                  322 |        111        0.26       20.45
                   41 |      4,916       11.44       31.89
                   42 |      3,319        7.73       39.62
                   51 |        853        1.99       41.60
                  521 |      1,187        2.76       44.37
                  522 |        199        0.46       44.83
                   53 |      2,095        4.88       49.71
                   61 |      2,457        5.72       55.42
                   62 |      8,293       19.30       74.73
                   63 |     10,603       24.68       99.41
                    7 |        253        0.59      100.00
          ------------+-----------------------------------
                Total |     42,959      100.00
          The red numbers are the quantity of goods exported/imported altogether. I am going to count just the number of exported goods which should be 1 to 6000ish goods. Your code gives the blue number which indicates the economic categories to which each product belongs, not the number of goods (its sum is 19).

          Comment


          • #6
            Ok, then I really don't understand at all what you are asking for and how it relates to your data. The fact that you report back with discrepancies based on your whole data set only adds to the confusion, because I have no way to know how those numbers were arrived at. You are not explaining the data in any way that I am able to understand. Here's what I think you should do.

            Here is a new data set involving just two NPC_FIC codes, extracted from your original example data:
            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input long(NPC_FIC cgce fluxo)
            501842978  4 2
            500019062  6 2
            500019062  6 2
            500019062  6 2
            500019062 17 2
            500019062 17 2
            500019062 10 2
            500019062  6 2
            500019062 15 2
            500019062  6 2
            500019062  9 2
            end
            label values cgce cgce
            label def cgce 4 "122", modify
            label def cgce 6 "22", modify
            label def cgce 9 "322", modify
            label def cgce 10 "41", modify
            label def cgce 15 "53", modify
            label def cgce 17 "62", modify
            label values fluxo fluxo
            label def fluxo 2 "2", modify
            In this data set, work out by hand the results you want for those two NPC_FICs and explain in a series of sentences referring specifically to the variable names and numerical values that appear in this example data set, how you arrived at those results. Please do not show results from your full data set, nor mention any variables that do not occur in the example. Stick to this limited example.

            Comment


            • #7
              Sorry, I could not express the problem correctly and confound you in a way.
              Code:
              . tab cgce
              
                     cgce |      Freq.     Percent        Cum.
              ------------+-----------------------------------
                        4 |          1        9.09        9.09
                        6 |          5       45.45       54.55
                        9 |          1        9.09       63.64
                       10 |          1        9.09       72.73
                       15 |          1        9.09       81.82
                       17 |          2       18.18      100.00
              ------------+-----------------------------------
                    Total |         11      100.00
              I need to obtain 11.

              Comment


              • #8
                Well, if that's what you need, the only thing special about the data example is that we have fluxo == 2 throughout. So it sounds like all you need is
                Code:
                tab cgce if fluxo == 2
                If that is not what you want, post back with an example where that gives the wrong answer, and explain why the answer is wrong, and what the right answer would be.

                Comment


                • #9
                  Dear Clyde,

                  First of all, I am thankful you prevented me from making a goof on my work, an unforgivable one. I always repeat to myself 'don't work when you are exhausted, though I do not care about my thought more often!
                  Actually, I made a mistake in reading variables in the dataset (Portuguese abbreviations are really confusing). The true data are as follows: (instead of product codes,artigo, I used the category cgce in my previous post, feel so sorry about that)
                  clear
                  input double NPC_FIC long artigo byte fluxo
                  Code:
                  5.002e+08 61102099 1
                  5.020e+08 84229090 2
                  5.020e+08 94036090 2
                  5.020e+08 63090000 2
                  5.020e+08 69111000 2
                  5.020e+08 71189000 1
                  5.002e+08 84138100 2
                  5.002e+08 39233010 2
                  500200024 90148000 2
                  500200005 84314980 2
                  5.002e+08 62046231 2
                  5.002e+08 62046231 1
                  5.020e+08 87033190 1
                  5.002e+08 94042910 2
                  5.002e+08 94036010 2
                  500208000 94042990 2
                  5.132e+08 94035000 2
                  3.002e+08 94036010 2
                  4.002e+08 94035000 2
                  5.002e+08 94014000 2
                  5.002e+08 94016100 2
                  5.002e+08 7011000 2
                  5.002e+08 40122000 1
                  5.002e+08 25161290 2
                  5.020e+08 91021100 1
                  5.020e+08 91021100 1
                  5.020e+08 91021100 1
                  5.019e+08 6049990 2
                  5.020e+08 49019900 1
                  5.002e+08 39269097 1
                  5.020e+08 85332100 1
                  5020034500 85238099 1
                  5.002e+08 33029090 1
                  5.002e+08 38249098 2
                  5.002e+08 39251000 2
                  500256000 33079000 2
                  5.002e+08 33021040 2
                  5.002e+08 39235090 2
                  5.002e+08 34021900 1
                  500200700 39235090 1
                  5.002e+08 33072000 2
                  5002000090 33072000 1
                  5.002e+08 38091090 1
                  5.002e+08 34011900 1
                  5.002e+08 48173000 1
                  5.002e+08 39233090 1
                  5.002e+08 34011100 1
                  500203000 38089190 1
                  5.002e+08 29241900 1
                  5.002e+08 33059090 1
                  5.002e+08 28151100 2
                  5.002e+08 42029211 2
                  7.002e+08 73269098 2
                  5.002e+08 33049900 2
                  5.002e+08 84818061 2
                  5.002e+08 90178010 2
                  5.002e+08 73079990 2
                  5.002e+08 39239090 2
                  5.002e+08 70109031 2
                  5.002e+08 39269097 2
                  4.002e+08 39233090 2
                  6.002e+08 32049000 2
                  5.002e+08 39119099 1
                  5.002e+08 34022090 1
                  5.002e+08 33071000 1
                  5.002e+08 39233010 1
                  5.002e+08 3026968 1
                  9.020e+08 84158200 2
                  5.020e+08 84182191 2
                  5.020e+08 73089099 2
                  5.020e+08 84388099 2
                  5.002e+08 49019900 1
                  5.002e+08 94036010 2
                  5.002e+08 94035000 2
                  5.002e+08 91021100 1
                  5.020e+08 97019000 1
                  5.020e+08 87163980 2
                  5.020e+08 87033390 2
                  5.020e+08 83062990 1
                  5.002e+08 8093010 1
                  5.002e+08 7031019 1
                  2.002e+08 8045000 1
                  2.402e+08 8082050 1
                  2.302e+08 8081080 1
                  4.562e+08 7095990 1
                  5.002e+08 8094005 1
                  5.002e+08 8104030 1
                  5.002e+08 7089000 1
                  5.002e+08 7020000 1
                  5.002e+08 8071900 1
                  5.002e+08 8043000 1
                  5.002e+08 8101000 1
                  5.002e+08 8044000 1
                  5.002e+08 8092095 1
                  5.002e+08 8093090 1
                  5.002e+08 8094090 1
                  5.002e+08 8105000 1
                  5.002e+08 8061010 1
                  5.002e+08 8072000 1
                  5.002e+08 7096099 1
                  end
                  I am going to extract the Number of exported goods from Portuguese firms. Firms' ID: NPC_FIC- Products code:artigo- fluxo=1 for imports- fluxo=2 for exports.
                  I used code #4
                  Code:
                  by NPC_FIC fluxo (artigo), sort: gen n_ex = sum(artigo != artigo[_n-1])
                  by NPC_FIC fluxo (artigo): replace n_ex = n_ex[_N]
                  
                  
                  
                  *Usually firms export one product to several countries, so one item may appear several times. To eliminate the repeated export products I used this:
                  
                  bysort NPC_FIC artigo: keep if _n == _N & fluxo == 2
                  sum n_ex
                  
                      Variable |        Obs        Mean    Std. dev.       Min        Max
                  -------------+---------------------------------------------------------
                          n_ex |         52    17.51923     13.3556          1         30
                  I am not satisfied with the results though. I am eager to receive your ideas.
                  Thanks a million.
                  Last edited by Paris Rira; 15 Jan 2023, 08:50.

                  Comment


                  • #10
                    I am not satisfied with the results though.
                    I cannot read your mind. What about the results dissatisfies you? How do they differ from the results you want?

                    Comment


                    • #11
                      Do you think code #9 is correct?
                      what about this one
                      Code:
                      by NPC_FIC fluxo (artigo), sort: gen n_ex = sum(artigo != artigo[_n-1])
                      by NPC_FIC fluxo (artigo): replace n_ex = n_ex[_N]
                      
                      egen tag = tag(NPC_FIC artigo)
                      sum n_ex if tag & fluxo == 2
                      Enable GingerCannot connect to Ginger Check your internet connection
                      or reload the browserDisable in this text fieldRephraseRephrase current sentence15Edit in GingerĂ—
                      Last edited by Paris Rira; 15 Jan 2023, 09:33.

                      Comment


                      • #12
                        I think they're both wrong.

                        Both of them have the same first two commands:
                        Code:
                        by NPC_FIC fluxo (artigo), sort: gen n_ex = sum(artigo != artigo[_n-1])
                        by NPC_FIC fluxo (artigo): replace n_ex = n_ex[_N]
                        This code identifies, for each NPC_FIC two (potentially) different values of n_ex, one for observations where fluxo == 1, and the other for when fluxo == 2. The value of n_ex in each case is the number of distinct values of artigo for that NPC_FIC and fluxo. (So if the same artigo appears more than once with that NPC_FIC and fluxo, it is only counted once.)

                        So far, so good. It's important to bear in mind that for any given NPC_FIC, there will be only (at most) two different values of n_ex, one when fluxo == 1 and the other when fluxo == 2. The value of n_ex is the same for all values of artigo in the NPC_FIC fluxo combination, because it is a count of how many different values of artigo there are in the NPC_FIC fluxo combination.

                        Now, you are looking for some summary statistics on n_ex when fluxo == 2. Let's look at
                        Code:
                        bysort NPC_FIC artigo: keep if _n == _N & fluxo == 2
                        sum n_ex
                        The problem with this code is that with -bysort NPC_FIC artigo-, if the same value of artigo occurs in that NPC_FIC with both fluxo == 1 and fluxo == 2, you have no guarantee which of those will be in the _n == _N (final) position. It could be either, at random. So if the one with fluxo == 1 happens to be the last, it fails the -& fluxo == 2- condition, and it is dropped, and there is no observation kept at all for that combination of NPC_FIC and artigo. But even more to the point, grouping by combination of NPC_FIC and artigo makes no sense in this context, because the value of artigo is irrelevant to the value of n_ex.

                        Now let's look at
                        Code:
                        egen tag = tag(NPC_FIC artigo)
                        sum n_ex if tag & fluxo == 2
                        This has pretty much the same problem as the last one. You are identifying a single observation in each NPC_FIC artigo combination. If that NPC_FIC artigo combination appears with both fluxo = 1 and fluxo = 2, then which of those you keep is just random.

                        What I believe you want to do is identify a single observation with fluxo == 2 in each NPC_FIC group and summarize the values of n_ex among these. That would be:
                        Code:
                        egen tag = tag(NPC_FIC fluxo)
                        sum n_ex if tag & fluxo == 2





                        Comment


                        • #13
                          Prof Schechter,

                          It worked perfectly. You saved my day. Please accept my endless gratitude.

                          Comment

                          Working...
                          X