Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • selecting among repeated values of a variable

    Hello everybody,

    I came to share my trouble which is about selecting some values of a variable with repeated values. Specifically, I have two variables, namely Companies and Nationalities. Companies are based on Company Identifiers and some companies have multinational workers (so duplicated identifiers appear). I need to select companies that have ONLY Pt (specific nationality). I would appreciate any help. Thanks.

    Company ID Nationalities

    248702 PT
    248702 BR
    199171 PT
    199171 PT
    199171 PT
    199171 IR
    199171 US
    227354 PT
    227354 PT
    264723 PT
    264723 BR
    1160590 UA
    259217 PT
    259217 PT
    678597 BR
    678597 SP
    678597 PT
    678597 PT

    Last edited by Paris Rira; 22 Dec 2022, 08:54.

  • #2
    Welcome to Statalist, Paris!

    It is hard to provide a solution without a data example. You can do this using the dataex command in Stata. Please see the Statalist FAQ for this and other suggestions on how best to pose questions on this forum.

    Comment


    • #3
      See https://www.stata-journal.com/articl...article=dm0042 and specifically p.563 for the suggestion

      Code:
      egen tag = tag(Companies Nationalities) 
      
      egen wanted = total(tag), by(Companies) 
      
      browse Companies Nationalities if tag & wanted == 1

      Comment


      • #4
        Thank you for the codes.
        I ll have a look at the Distinct Observation paper. BTW, the code does not work. It shows all nationalities. While I need to choose companies that have just PT. In the dataset only the companies with ID 227354 and 259217 have PT, the rest have PT plus others such as US. I need to select/ distinguish those companies that have Only PT.

        Comment


        • #5
          Your data example (in an edit to #1) was visible to me only after I made a guess at what you want -- which was wrong, as I understood you to mean only one nationality.

          I am still guessing because you didn't use dataex, so your storage types are still ambiguous and one variable name has changed from first mention, but perhaps this is closer.


          Code:
          bysort Company_id (Nationalities) : gen PT_only = Nationalities[1] == "PT" & Nationalities[_N] == "PT"
          For the logic see https://www.stata.com/support/faqs/d...ions-in-group/ for an identifiably similar question.

          Comment


          • #6
            Yup, I did dataex. You did not get me totally, I am afraid. I do not need one Nationality. What I need are companies that hire only one specific nationality (PT). I need to separate those companies which have one specific nationality (PT). More often companies have workers from different nationalities as my dataset, PT, US, and SP. I have to group firms that have only and only PT.

            Comment


            • #7
              No; #1 did not include dataex results as they are produced. dataex results make clear the storage types of the variables concerned.

              But my understanding is that #5 contains code to generate an indicator that is 1 for the observations you want and 0 otherwise.

              Comment


              • #8
                My expectation was like yours for running #5 codes, however, the result tells another story! Dear Nick, the problem is not the storage type.
                dataex CompanyID Nationality
                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input double CompanyID long Nationality
                500000001 153
                500000001 153
                500000001 153
                500000002 153
                500000002 153
                500000002 153
                500000002 153
                500000002 153
                500000033 153
                500000033 153
                500000050 153
                500000050 153
                500000083 153
                500000104 153
                500000104 189
                500000119 153
                500000119 153
                500000119 153
                500000119 153
                500000119 153
                500000119 153
                500000119 153
                500000119 153
                500000121 153
                500000121 153
                500000156 153
                500000156 153
                500000157 153
                500000157 153
                500000157 153
                500000157 153
                500000165 153
                500000165 153
                500000165 153
                500000180 153
                500000198 153
                500000198 153
                500000198 153
                500000201 153
                500000204 153
                500000204 153
                500000212 153
                500000215 153
                500000215 153
                500000215 153
                500000215 153
                500000240 153
                500000240 153
                500000276 153
                500000279 153
                500000284 153
                500000284 153
                500000284 153
                500000284 153
                500000305 153
                500000334 153
                500000341 153
                500000341 153
                500000346 153
                500000346  28
                500000346  63
                500000346 153
                500000346 153
                500000376 153
                500000376 153
                500000395 153
                500000395 153
                500000395 153
                500000395 153
                500000395 153
                500000395 153
                500000465 153
                500000465 153
                500000465 153
                500000470 153
                500000470 153
                500000478 153
                500000525 153
                500000543 153
                500000543 153
                500000565 153
                500000583 153
                500000633 153
                500000633 153
                500000675 153
                500000709 153
                500000748 153
                500000757 153
                500000765 153
                500000765 153
                500000765 153
                500000766 153
                500000766 153
                500000774 153
                500000791 153
                500000791 153
                500000795 153
                500000795 153
                500000795 153
                500000795 153
                end
                label values nationalty
                label def nacio1 28 "BR", modify
                label def nacio1 63 "FR", modify
                label def nacio1 153 "PT", modify
                label def nacio1 189 "UA", modify
                bysort CompanyID (Nationalty) : gen PT_only = Nationalty[1] == 153 & Nationality[_N] == 153
                Example generated by -dataex-. For more info, type help dataex
                clear
                input double NPC_FIC long nacio1 float PT_only
                500000001 153 1
                500000001 153 1
                500000001 153 1
                500000002 153 1
                500000002 153 1
                500000002 153 1
                500000002 153 1
                500000002 153 1
                500000003 . 0
                500000005 . 0
                500000007 . 0
                500000008 . 0
                500000009 . 0
                500000010 . 0
                500000013 . 0
                500000019 . 0
                500000021 . 0
                500000022 . 0
                500000023 . 0
                500000026 . 0
                500000027 . 0
                500000028 . 0
                500000031 . 0
                500000033 153 1
                500000033 153 1
                500000034 . 0
                500000035 . 0
                500000037 . 0
                500000038 . 0
                500000040 . 0
                500000043 . 0
                500000045 . 0
                500000047 . 0
                500000048 . 0
                500000049 . 0
                500000050 153 1
                500000050 153 1
                500000051 . 0
                500000052 . 0
                500000053 . 0
                500000055 . 0
                500000056 . 0
                500000058 . 0
                500000061 . 0
                500000063 . 0
                500000066 . 0
                500000067 . 0
                500000069 . 0
                500000070 . 0
                500000071 . 0
                500000073 . 0
                500000075 . 0
                500000079 . 0
                500000080 . 0
                500000081 . 0
                500000083 153 1
                500000087 . 0
                500000088 . 0
                500000089 . 0
                500000090 . 0
                500000091 . 0
                500000092 . 0
                500000094 . 0
                500000096 . 0
                500000098 . 0
                500000101 . 0
                500000102 . 0
                500000104 153 0
                500000104 189 0
                500000105 . 0
                500000106 . 0
                500000109 . 0
                500000111 . 0
                500000113 . 0
                500000114 . 0
                500000115 . 0
                500000116 . 0
                500000117 . 0
                500000118 . 0
                500000119 153 1
                500000119 153 1
                500000119 153 1
                500000119 153 1
                500000119 153 1
                500000119 153 1
                500000119 153 1
                500000119 153 1
                500000121 153 1
                500000121 153 1
                500000122 . 0
                500000123 . 0
                500000127 . 0
                500000128 . 0
                500000133 . 0
                500000139 . 0
                500000140 . 0
                500000144 . 0
                500000148 . 0
                500000149 . 0
                500000150 . 0
                end
                label values Nationality Nationality
                label def nacio1 153 "PT", modify
                label def nacio1 189 "UA", modify
                [/CODE]


                what do you think?
                Last edited by Paris Rira; 22 Dec 2022, 14:12.

                Comment


                • #9
                  You changed the dataex output before posting it here. So, it won't run without error. There is no variable Nationality to label. If you want to change dataex output, you need either a very good understanding of dataex or to check that your changes do not break the code.


                  Code:
                  clear
                  input double NPC_FIC long nacio1 float PT_only
                  500000001 153 1
                  500000001 153 1
                  500000001 153 1
                  500000002 153 1
                  500000002 153 1
                  500000002 153 1
                  500000002 153 1
                  500000002 153 1
                  500000003 . 0
                  500000005 . 0
                  500000007 . 0
                  500000008 . 0
                  500000009 . 0
                  500000010 . 0
                  500000013 . 0
                  500000019 . 0
                  500000021 . 0
                  500000022 . 0
                  500000023 . 0
                  500000026 . 0
                  500000027 . 0
                  500000028 . 0
                  500000031 . 0
                  500000033 153 1
                  500000033 153 1
                  500000034 . 0
                  500000035 . 0
                  500000037 . 0
                  500000038 . 0
                  500000040 . 0
                  500000043 . 0
                  500000045 . 0
                  500000047 . 0
                  500000048 . 0
                  500000049 . 0
                  500000050 153 1
                  500000050 153 1
                  500000051 . 0
                  500000052 . 0
                  500000053 . 0
                  500000055 . 0
                  500000056 . 0
                  500000058 . 0
                  500000061 . 0
                  500000063 . 0
                  500000066 . 0
                  500000067 . 0
                  500000069 . 0
                  500000070 . 0
                  500000071 . 0
                  500000073 . 0
                  500000075 . 0
                  500000079 . 0
                  500000080 . 0
                  500000081 . 0
                  500000083 153 1
                  500000087 . 0
                  500000088 . 0
                  500000089 . 0
                  500000090 . 0
                  500000091 . 0
                  500000092 . 0
                  500000094 . 0
                  500000096 . 0
                  500000098 . 0
                  500000101 . 0
                  500000102 . 0
                  500000104 153 0
                  500000104 189 0
                  500000105 . 0
                  500000106 . 0
                  500000109 . 0
                  500000111 . 0
                  500000113 . 0
                  500000114 . 0
                  500000115 . 0
                  500000116 . 0
                  500000117 . 0
                  500000118 . 0
                  500000119 153 1
                  500000119 153 1
                  500000119 153 1
                  500000119 153 1
                  500000119 153 1
                  500000119 153 1
                  500000119 153 1
                  500000119 153 1
                  500000121 153 1
                  500000121 153 1
                  500000122 . 0
                  500000123 . 0
                  500000127 . 0
                  500000128 . 0
                  500000133 . 0
                  500000139 . 0
                  500000140 . 0
                  500000144 . 0
                  500000148 . 0
                  500000149 . 0
                  500000150 . 0
                  end
                  label values nacio1 nacio1 
                  label def nacio1 153 "PT", modify
                  label def nacio1 189 "UA", modify
                  
                  format NPC_FIC %10.0f
                  
                  * groups is from the Stata Journal 
                  groups PT_only nacio1 NPC_FIC , missing sepby(PT_only)
                  
                    +------------------------------------------------+
                    | PT_only   nacio1     NPC_FIC   Freq.   Percent |
                    |------------------------------------------------|
                    |       0       PT   500000104       1      1.00 |
                    |       0       UA   500000104       1      1.00 |
                    |       0        .   500000003       1      1.00 |
                    |       0        .   500000005       1      1.00 |
                    |       0        .   500000007       1      1.00 |
                    |       0        .   500000008       1      1.00 |
                    |       0        .   500000009       1      1.00 |
                    |       0        .   500000010       1      1.00 |
                    |       0        .   500000013       1      1.00 |
                    |       0        .   500000019       1      1.00 |
                    |       0        .   500000021       1      1.00 |
                    |       0        .   500000022       1      1.00 |
                    |       0        .   500000023       1      1.00 |
                    |       0        .   500000026       1      1.00 |
                    |       0        .   500000027       1      1.00 |
                    |       0        .   500000028       1      1.00 |
                    |       0        .   500000031       1      1.00 |
                    |       0        .   500000034       1      1.00 |
                    |       0        .   500000035       1      1.00 |
                    |       0        .   500000037       1      1.00 |
                    |       0        .   500000038       1      1.00 |
                    |       0        .   500000040       1      1.00 |
                    |       0        .   500000043       1      1.00 |
                    |       0        .   500000045       1      1.00 |
                    |       0        .   500000047       1      1.00 |
                    |       0        .   500000048       1      1.00 |
                    |       0        .   500000049       1      1.00 |
                    |       0        .   500000051       1      1.00 |
                    |       0        .   500000052       1      1.00 |
                    |       0        .   500000053       1      1.00 |
                    |       0        .   500000055       1      1.00 |
                    |       0        .   500000056       1      1.00 |
                    |       0        .   500000058       1      1.00 |
                    |       0        .   500000061       1      1.00 |
                    |       0        .   500000063       1      1.00 |
                    |       0        .   500000066       1      1.00 |
                    |       0        .   500000067       1      1.00 |
                    |       0        .   500000069       1      1.00 |
                    |       0        .   500000070       1      1.00 |
                    |       0        .   500000071       1      1.00 |
                    |       0        .   500000073       1      1.00 |
                    |       0        .   500000075       1      1.00 |
                    |       0        .   500000079       1      1.00 |
                    |       0        .   500000080       1      1.00 |
                    |       0        .   500000081       1      1.00 |
                    |       0        .   500000087       1      1.00 |
                    |       0        .   500000088       1      1.00 |
                    |       0        .   500000089       1      1.00 |
                    |       0        .   500000090       1      1.00 |
                    |       0        .   500000091       1      1.00 |
                    |       0        .   500000092       1      1.00 |
                    |       0        .   500000094       1      1.00 |
                    |       0        .   500000096       1      1.00 |
                    |       0        .   500000098       1      1.00 |
                    |       0        .   500000101       1      1.00 |
                    |       0        .   500000102       1      1.00 |
                    |       0        .   500000105       1      1.00 |
                    |       0        .   500000106       1      1.00 |
                    |       0        .   500000109       1      1.00 |
                    |       0        .   500000111       1      1.00 |
                    |       0        .   500000113       1      1.00 |
                    |       0        .   500000114       1      1.00 |
                    |       0        .   500000115       1      1.00 |
                    |       0        .   500000116       1      1.00 |
                    |       0        .   500000117       1      1.00 |
                    |       0        .   500000118       1      1.00 |
                    |       0        .   500000122       1      1.00 |
                    |       0        .   500000123       1      1.00 |
                    |       0        .   500000127       1      1.00 |
                    |       0        .   500000128       1      1.00 |
                    |       0        .   500000133       1      1.00 |
                    |       0        .   500000139       1      1.00 |
                    |       0        .   500000140       1      1.00 |
                    |       0        .   500000144       1      1.00 |
                    |       0        .   500000148       1      1.00 |
                    |       0        .   500000149       1      1.00 |
                    |       0        .   500000150       1      1.00 |
                    |------------------------------------------------|
                    |       1       PT   500000001       3      3.00 |
                    |       1       PT   500000002       5      5.00 |
                    |       1       PT   500000033       2      2.00 |
                    |       1       PT   500000050       2      2.00 |
                    |       1       PT   500000083       1      1.00 |
                    |       1       PT   500000119       8      8.00 |
                    |       1       PT   500000121       2      2.00 |
                    +------------------------------------------------+
                  Otherwise I think that the code, modified to use values, not value labels. works exactly as intended. The firms identified as PT_only have only employees who are PT, You can keep those observations and drop the others.

                  Code:
                  keep if PT_only == 1
                  Isn't this what you wanted?

                  Comment


                  • #10
                    it is absolutely right! I changed the labels before posting to be clear enough to statalist here. Please accept my apology for asking simple questions.
                    Your output is correct and different from mine. In my data 500000150 or 500000149 do not exist.
                    How did you generate the table?

                    Comment


                    • #11
                      Please look in your own #8 for those firms: I haven't changed anything from your (second) data example.

                      Code:
                      500000149 . 0
                      500000150 . 0

                      Comment


                      • #12
                        oh, you are right.
                        Last but not least, whats the code for the below lets call it table?
                        +------------------------------------------------+ | PT_only nacio1 NPC_FIC Freq. Percent | |------------------------------------------------| | 0 PT 500000104 1 1.00 | | 0 UA 500000104 1 1.00 | | 0 . 500000003 1 1.00 | | 0 . 500000005 1 1.00 | | 0 . 500000007 1 1.00 | | 0 . 500000008 1 1.00 | | 0 . 500000009 1 1.00 | | 0 . 500000010 1 1.00 | | 0 . 500000013 1 1.00 | | 0 . 500000019 1 1.00 | | 0 . 500000021 1 1.00 | | 0 . 500000022 1 1.00 | | 0 . 500000023 1 1.00 | | 0 . 500000026 1 1.00 | | 0 . 500000027 1 1.00 | | 0 . 500000028 1 1.00 | | 0 . 500000031 1 1.00 | | 0 . 500000034 1 1.00 | | 0 . 500000035 1 1.00 | | 0 . 500000037 1 1.00 | | 0 . 500000038 1 1.00 | | 0 . 500000040 1 1.00 | | 0 . 500000043 1 1.00 | | 0 . 500000045 1 1.00 | | 0 . 500000047 1 1.00 | | 0 . 500000048 1 1.00 | | 0 . 500000049 1 1.00 | | 0 . 500000051 1 1.00 | | 0 . 500000052 1 1.00 | | 0 . 500000053 1 1.00 | | 0 . 500000055 1 1.00 | | 0 . 500000056 1 1.00 | | 0 . 500000058 1 1.00 | | 0 . 500000061 1 1.00 | | 0 . 500000063 1 1.00 | | 0 . 500000066 1 1.00 | | 0 . 500000067 1 1.00 | | 0 . 500000069 1 1.00 | | 0 . 500000070 1 1.00 | | 0 . 500000071 1 1.00 | | 0 . 500000073 1 1.00 | | 0 . 500000075 1 1.00 | | 0 . 500000079 1 1.00 | | 0 . 500000080 1 1.00 | | 0 . 500000081 1 1.00 | | 0 . 500000087 1 1.00 | | 0 . 500000088 1 1.00 | | 0 . 500000089 1 1.00 | | 0 . 500000090 1 1.00 | | 0 . 500000091 1 1.00 | | 0 . 500000092 1 1.00 | | 0 . 500000094 1 1.00 | | 0 . 500000096 1 1.00 | | 0 . 500000098 1 1.00 | | 0 . 500000101 1 1.00 | | 0 . 500000102 1 1.00 | | 0 . 500000105 1 1.00 | | 0 . 500000106 1 1.00 | | 0 . 500000109 1 1.00 | | 0 . 500000111 1 1.00 | | 0 . 500000113 1 1.00 | | 0 . 500000114 1 1.00 | | 0 . 500000115 1 1.00 | | 0 . 500000116 1 1.00 | | 0 . 500000117 1 1.00 | | 0 . 500000118 1 1.00 | | 0 . 500000122 1 1.00 | | 0 . 500000123 1 1.00 | | 0 . 500000127 1 1.00 | | 0 . 500000128 1 1.00 | | 0 . 500000133 1 1.00 | | 0 . 500000139 1 1.00 | | 0 . 500000140 1 1.00 | | 0 . 500000144 1 1.00 | | 0 . 500000148 1 1.00 | | 0 . 500000149 1 1.00 | | 0 . 500000150 1 1.00 | |------------------------------------------------| | 1 PT 500000001 3 3.00 | | 1 PT 500000002 5 5.00 | | 1 PT 500000033 2 2.00 | | 1 PT 500000050 2 2.00 | | 1 PT 500000083 1 1.00 | | 1 PT 500000119 8 8.00 | | 1 PT 500000121 2 2.00 | +------------------------------------------------+

                        Comment


                        • #13
                          Explained. within the code in #9

                          Code:
                           * groups is from the Stata Journal  
                          
                          groups PT_only nacio1 NPC_FIC , missing sepby(PT_only)
                          The meaning of "groups is from the Stata Journal" is to download groups from the Stata Journal website In practice groups as a keyword brings up some false positives, so here is a short-cut

                          Code:
                          .   search st0496, entry
                          
                          Search of official help files, FAQs, Examples, and Stata Journals
                          
                          SJ-18-1 st0496_1  . . . . . . . . . . . . . . . . . Software update for groups
                                  (help groups if installed)  . . . . . . . . . . . . . . . .  N. J. Cox
                                  Q1/18   SJ 18(1):291
                                  groups exited with an error message if weights were specified;
                                  this has been corrected
                          
                          SJ-17-3 st0496  . . . . .  Speaking Stata: Tables as lists: The groups command
                                  (help groups if installed)  . . . . . . . . . . . . . . . .  N. J. Cox
                                  Q3/17   SJ 17(3):760--773
                                  presents command for listing group frequencies and percents and
                                  cumulations thereof; for various subsetting and ordering by
                                  frequencies, percents, and so on; for reordering of columns;
                                  and for saving tabulated data to new datasets
                          If you repeat the command

                          Code:
                          search st0496, entry
                          then clicking st0496_1 gives you a link to download files and

                          clicking SJ 17(3):760--773 gives you access to the write-up.

                          Comment


                          • #14
                            its awesome.
                            Many thanks to you, dear Nick.

                            Comment

                            Working...
                            X