Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • computing the number of one variable based on others

    Hi Dear Profs and Colleagues,

    I am going to obtain the number of immigrants in firm i, year t, sector s, and region r. What I seek is to know how many immigrant (nacio~=PT) exist in firm i sector s and region r.

    firms ID: NPC_FIC
    Nationalities: nacio for immigrants I consider nacio~="PT"


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double(NPC_FIC year) float(sector region) str2 nacio
    500979142 2010  3 1 "PT"
    500979162 2010  7 1 "PT"
    500979168 2010  3 0 "PT"
    500979172 2010  3 0 "PT"
    500979182 2010  3 0 "PT"
    500979184 2010  7 1 "PT"
    500979206 2010  7 4 "PT"
    500979220 2010  7 1 "PT"
    500979239 2010  3 1 "PT"
    500979255 2010  3 3 "PT"
    500979261 2010  3 0 "PT"
    500979266 2010  3 1 "PT"
    500979269 2010  3 3 "PT"
    500979271 2010  7 1 "PT"
    500979383 2010  3 1 "PT"
    500979456 2010  3 1 "PT"
    500979474 2010  3 3 "PT"
    500979489 2010  7 1 "PT"
    500979491 2010  3 1 "PT"
    500979522 2010  3 1 "PT"
    500979526 2010  3 3 "PT"
    500979530 2010  3 5 "PT"
    500979532 2010  3 5 "PT"
    500979537 2010  3 3 "PT"
    500979543 2010  3 5 "PT"
    500979551 2010  3 3 "PT"
    500979561 2010  3 1 "PT"
    500979563 2010  3 3 "PT"
    500979564 2010  3 1 "PT"
    500979566 2010  3 4 "PT"
    500979567 2010  3 1 "PT"
    500979569 2010  3 3 "PT"
    500979596 2010  3 3 "PT"
    500979634 2010  7 4 "PT"
    500979705 2010  7 0 "PT"
    500979725 2010  3 0 "PT"
    500979737 2010  7 4 "PT"
    500979740 2010  3 3 "PT"
    500979750 2010  3 3 "PT"
    500979758 2010  3 3 "PT"
    500979797 2010  3 0 "PT"
    500979809 2010  7 1 "PT"
    500979820 2010  7 3 "PT"
    500979825 2010  3 3 "PT"
    500979874 2010  7 4 "PT"
    500979890 2010  3 4 "PT"
    500979959 2010  7 4 "PT"
    500979964 2010  9 4 "PT"
    500979983 2010  7 0 "PT"
    500979987 2010  3 1 "PT"
    500980019 2010  6 0 "PT"
    500980048 2010  3 1 "PT"
    500980053 2010  3 1 "PT"
    500980104 2010  7 1 "PT"
    500980110 2010  7 4 "PT"
    500980138 2010  7 1 "PT"
    500980167 2010  3 1 "PT"
    500980175 2010  3 3 "PT"
    500980231 2010  6 1 "PT"
    500980233 2010  3 0 "PT"
    500980243 2010  3 3 "PT"
    500980247 2010  7 1 "PT"
    500980265 2010  3 4 "PT"
    500980319 2010  3 3 "PT"
    500980385 2010  7 1 "PT"
    500980401 2010  3 1 "PT"
    500980402 2010  3 4 "PT"
    500980415 2010  7 4 "PT"
    500980421 2010  7 1 "PT"
    500980485 2010  7 3 "PT"
    500980507 2010  7 0 "PT"
    500980532 2010  3 1 "PT"
    500980598 2010  3 5 "PT"
    500980626 2010  3 1 "PT"
    500980638 2010  6 0 "PT"
    500980722 2010  7 0 "PT"
    500980854 2010 11 0 "PT"
    500980855 2010  7 0 "PT"
    500980873 2010  7 3 "PT"
    500980898 2010  8 1 "PT"
    500980903 2010  7 4 "PT"
    500980907 2010  3 3 "PT"
    500980935 2010  3 4 "UA"
    500980936 2010  3 3 "PT"
    500980938 2010  3 3 "PT"
    500980947 2010  7 4 "PT"
    500980992 2010  3 3 "PT"
    500981007 2010  7 0 "PT"
    500981020 2010  3 3 "PT"
    500981066 2010  3 1 "PT"
    500981084 2010  7 0 "PT"
    500981094 2010  3 1 "PT"
    500981140 2010  2 3 "PT"
    500981156 2010  3 0 "PT"
    500981164 2010  7 1 "PT"
    500981265 2010  8 4 "PT"
    500981281 2010  7 1 "PT"
    500981292 2010  7 0 "PT"
    500981317 2010  7 0 "PT"
    500981318 2010  7 1 "PT"
    end
    
    by NPC_FIC year (sector region) ,sort: gen n_imm = sum(nacio ~= "PT")
    this gives values 1 and 0.
    Any assistance is appreciated.

    Cheers,
    Paris

  • #2
    You need egen and its total() function. Outside egen, sum() is a cumulative or running sum and not the same thing. Yes; that is confusing until you are used to it.

    Comment


    • #3
      Code:
       by NPC_FIC (sector region),sort: egen n_imm = total(nacio ~= "PT")
      
      . tab n_imm
      
            n_imm |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |      6,682       96.17       96.17
                1 |        266        3.83      100.00
      ------------+-----------------------------------
            Total |      6,948      100.00
      Still it produces 0 and 1

      Comment


      • #4
        That's down, I guess, to feeding individual observations to the function. As they are 1 or 0, so also their totals are. You presumably need different by: arguments.

        Comment


        • #5
          Trying to state Nick's point in another way, your example data has at most one observation per firm, and that apparently is true of your full data, which is why the sum for each firm is one or zero.

          Are you expecting one observation per employee instead of one observation per firm?

          Is there a variable you forgot to include in your example data that gives the number of employees in firm i, year t, sector s, region r. and nacio n?

          Note also that you say you want "how many immigrant (nacio~=PT) exist in firm i sector s and region r" but that will count the same employees repeatedly if they are employed on more than one year.

          Comment


          • #6
            Originally posted by William Lisowski View Post
            Trying to state Nick's point in another way, your example data has at most one observation per firm, and that apparently is true of your full data, which is why the sum for each firm is one or zero.
            Oh My Goodness, I was thinking almost the whole day about what is wrong with the data, Yes, I have to consider them at the Worker level, but I put at the Firm level. Once I corrected it, It worked. Really appreciated for flagging that.
            [CODE]
            ----------------------- copy starting from the next line -----------------------
            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input double NPC_FIC str2 nacio
            500008310 "PT"
            500027353 "PT"
            500027353 "PT"
            500027353 "PT"
            500049799 "PT"
            500049799 "UA"
            500050215 "PT"
            500050215 "PT"
            500135017 "PT"
            500135017 "PT"
            500135017 "PT"
            500135017 "BR"
            500135017 "PT"
            500135017 "PT"
            500135017 "PT"
            500139105 "PT"
            500139105 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500154496 "PT"
            500205042 "PT"
            500205042 "PT"
            500205042 "BR"
            500205042 "AO"
            500206942 "PT"
            500206942 "PT"
            500206942 "PT"
            500251226 "PT"
            500251226 "PT"
            500265754 "PT"
            500321035 "PT"
            500321035 "PT"
            500321035 "PT"
            500369375 "PT"
            500369375 "PT"
            500392001 "PT"
            500392920 "PT"
            500422489 "PT"
            500458395 "PT"
            500508011 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "RU"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "VE"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            500979142 "PT"
            end
            
             by NPC_FIC,sort: egen n_imm = total(nacio ~= "PT")
            
            
            . sum  n_imm
            
                Variable |        Obs        Mean    Std. dev.       Min        Max
            -------------+---------------------------------------------------------
                   n_imm |    868,310    85.46004     216.944          0       1106
            One more thing, I am going to make Log of this variable, n_imm

            Code:
             g l_n_imm= log(n_imm)
            (258,542 missing values generated)
            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input float(n_imm l_n_imm)
            0         .
            0         .
            0         .
            0         .
            1         0
            1         0
            0         .
            0         .
            1         0
            1         0
            1         0
            1         0
            1         0
            1         0
            1         0
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            2  .6931472
            2  .6931472
            2  .6931472
            2  .6931472
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            0         .
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            6 1.7917595
            end

            There are some 0 variables whose Logs are ". "
            I wonder shall I leave them as they are or better to drop them off before estimation.

            Originally posted by William Lisowski View Post
            Note also that you say you want "how many immigrant (nacio~=PT) exist in firm i sector s and region r"
            Actually I will comput for each year I should have written that I need to determine the number of immigrants in each firm each sector each region and each year. Because I am working only for one year now (I have not accessed the rest yet).

            Comment


            • #7
              Logarithm of zero is undefined. The answer to the new question in #6 depends crucially on why you think you need a logarithm here and could be anything from

              zeros are irrelevant for that purpose, so don't worry

              to

              really, don't do that: logarithms make no sense here.

              Comment


              • #8
                This variable, the number of immigrants in each firm, n_imm, is one of the dependent variables in my model.
                The answer to your question about the necessity of using the Log, can be my own intuition plus the related literature.

                Comment


                • #9
                  If you mean that mean number of migrants is to be thought of on log scale, the answer lies in Poisson regression or some other generalized linear model and not at all in a log transformation, which would make much of your data useless, as you have already seen.

                  I can't follow what scale such modelling is to be carried out. You have nearly 1 million firms???

                  Comment


                  • #10
                    Originally posted by Nick Cox View Post
                    I can't follow what scale such modelling is to be carried out. You have nearly 1 million firms???
                    Each year there are almost 29,000 firms and 800000 plants.Enable GingerCannot connect to Ginger Check your internet connection
                    or reload the browserDisable in this text fieldRephraseRephrase current sentence4Edit in GingerĂ—

                    Comment

                    Working...
                    X