
  • Creating one variable without dropping

    Hi,

    The dataset is at the worker level. There are 4 variables:
    year: 2010-2020
    nacio: nationality
    NPC_FIC: firm ID. The IDs repeat each year; we do not have a unique ID for each year. There are around 500,000 firms, which repeat each year.
    re_wage: wages

    Workers have different nationalities. I want to create a variable (without dropping any observations) that shows wages only for one specific nationality (PT). I also need the min, max, mean and std. dev. of this variable.

    Code:
    clear
    input int year str2 nacio long NPC_FIC float re_wage
    2010 "PT" 501195373 237.79385
    2010 "PT" 500996349  745.1175
    2010 "PT" 501112968   234.783
    2010 "UK" 501087261  784.1953
    2011 "PT" 501101640 578.58044
    2011 "UK" 501052779  456.3653
    2011 "PT" 503188955  268.7161
    2012 "GW" 501165899 479.83725
    2012 "BR" 503249542  720.6148
    2012 "PT" 501384409  80.37975
    2013 "BR" 503357509  293.8517
    2013 "PT" 504103455  628.8788
    2014 "PT" 501101765 198.55334
    2014 "US" 501052779  440.7233
    2014 "PT" 502622516 574.46655
    2015 "PT" 501204126  331.4828
    2015 "US" 501297955  356.9078
    2016 "IR" 502910365  686.1664
    2019 "PT" 501081112  636.4105
    2020 "CN" 503184507  629.1139
    2020 "PT" 501139334 105.11755
    2020 "PT" 501129929  344.4485
    2020 "SP" 501192139  722.0615
    2020 "PT" 501130463  726.3924
    end

    Any ideas appreciated.

    Cheers,
    Paris

  • #2
    I am going to create a variable (without dropping) that only shows the wages of specific nationality (PT).
    Code:
    generate wage_pt = re_wage if nacio == "PT"
    I need to know min/max/ mean and std.dev of this variable as well.
    Code:
    summarize wage_pt
    It is also possible to get the summary statistics without the middle step:

    Code:
    summarize re_wage if nacio == "PT"
    Last edited by Ken Chui; 13 Apr 2023, 15:14.
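
    If the four statistics are wanted as a compact table, or need to be reused in later commands, the following is a minimal sketch rather than a definitive recipe. It keeps the variable names from the thread, but the toy values are made up for illustration.

    ```stata
    * Toy data (hypothetical values) using the thread's variable names
    clear
    input int year str2 nacio float re_wage
    2010 "PT" 200
    2010 "PT" 400
    2010 "UK" 800
    2011 "PT" 600
    end

    * One compact table with the four requested statistics, PT workers only
    tabstat re_wage if nacio == "PT", statistics(min max mean sd)

    * summarize leaves its results in r(), so they can be reused afterwards
    summarize re_wage if nacio == "PT"
    display "PT mean wage = " r(mean) ", sd = " r(sd)
    ```

    Neither command changes or drops any observations; the if qualifier only restricts which rows enter the calculation.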


    • #3
      Thank you so much. It worked perfectly.


      • #4
        Now I need to compute the total number of PT workers (workers whose nationality is PT) in each year (2010-2020) for each firm. I used

        Code:
        egen nnacio = total( nacio == "PT"), by(NPC_FIC year) 
        input float nnacio int year long NPC_FIC
        1 2020 500000033
        2 2020 500000073
        2 2020 500000073
        2 2020 500000083
        2 2020 500000083
        1 2020 500000101
        1 2020 500000156
        1 2020 500000204
        2 2020 500000232
        2 2020 500000232
        2 2020 500000240
        2 2020 500000240
        5 2020 500000283
        5 2020 500000283
        5 2020 500000283
        5 2020 500000283
        5 2020 500000283
        3 2020 500000284
        3 2020 500000284
        3 2020 500000284
        2 2020 500000286
        2 2020 500000286
        5 2019 500000294
        5 2019 500000294
        5 2019 500000294
        5 2019 500000294
        5 2019 500000294
        2 2020 500000346
        2 2020 500000346
        4 2020 500000395
        4 2020 500000395
        4 2020 500000395
        4 2020 500000395
        1 2020 500000431
        2 2020 500000470
        2 2020 500000470
        1 2020 500000478
        As you can see, the firm IDs and nnacio values repeat. How can I keep only one nnacio per firm? For example, firm 500000284 appears 3 times; how can I make it appear only once?


        • #5
          If the "without dropping" rule still applies, you may try:

          Code:
          bysort NPC_FIC year nacio: replace nnacio = . if _n > 1
          If it is fine to drop cases, then the collapse command will also work.
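
          Because nacio is part of the bysort grouping above, one value is kept per firm, year and nationality, so a firm-year with both PT and non-PT workers would keep more than one value. If exactly one value per firm and year is wanted, one possible sketch uses egen's tag() function. The names firm_year_tag and nnacio_once are hypothetical, and the toy data is made up.

          ```stata
          * Toy data (hypothetical): firm 1 has two PT workers and one UK worker
          clear
          input int year long NPC_FIC str2 nacio
          2020 1 "PT"
          2020 1 "PT"
          2020 1 "UK"
          2020 2 "PT"
          end

          * Number of PT workers per firm-year, as in post #4
          egen nnacio = total(nacio == "PT"), by(NPC_FIC year)

          * tag() marks exactly one observation in each firm-year group with 1
          egen byte firm_year_tag = tag(NPC_FIC year)

          * Copy the count onto the tagged row only; all other rows stay missing
          generate nnacio_once = nnacio if firm_year_tag

          list, sepby(NPC_FIC year)
          ```

          Writing the result to a new variable leaves the original nnacio intact on every row, which keeps it available for later per-worker calculations.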


          • #6

            Dear Ken,
            Code:
            collapse (sum) nnacio, by (year NPC_FIC CCPCodes)
            Originally posted by Ken Chui

            If it is fine to drop cases, then the collapse command will also work.
            Actually, I need to build a variable with this definition: Share_native = share of natives (workers with PT nationality) in white-collar jobs (work_col = 1) relative to the total employment of a firm.
            Whether to drop or keep observations depends on whether dropping would interfere with creating this variable.

            Code:
            clear
            input int year long NPC_FIC float(national nnacio work_col)
            2020 500000033 1 1 2
            2020 500000073 1 . 1
            2020 500000073 1 2 2
            2020 500000083 1 . 2
            2020 500000083 1 2 2
            2020 500000101 1 1 2
            2020 500000156 1 1 2
            2020 500000204 1 1 2
            2020 500000232 1 2 2
            2020 500000232 1 . 2
            2020 500000240 1 . 2
            2020 500000240 1 2 2
            2020 500000283 1 . 2
            2020 500000283 1 . 2
            2020 500000283 1 . 2
            2020 500000283 1 . 2
            2020 500000283 1 5 2
            2020 500000284 1 . 2
            2020 500000284 1 3 2
            2020 500000284 1 . 2
            2020 500000286 1 . 1
            2020 500000286 1 2 2
            2020 500000294 1 5 1
            2020 500000294 1 . 2
            2020 500000294 1 . 2
            2020 500000294 1 . 2
            2020 500000294 1 . 2
            2020 500000346 1 2 2
            2020 500000346 1 . 2
            2020 500000395 1 . 1
            2020 500000395 1 4 1
            2020 500000395 1 . 1
            2020 500000395 1 . 1
            2020 500000431 1 1 2
            2020 500000470 1 2 2
            2020 500000470 1 . 2
            2020 500000478 1 1 2
            2020 500000554 1 1 2
            2020 500000565 1 2 2
            2020 500000565 1 . 2
            2020 500000600 1 2 2
            2020 500000600 1 . 2
            2020 500000601 1 1 2
            2020 500000633 1 . 2
            2020 500000633 1 . 2
            2020 500000633 1 . 2
            2020 500000633 1 . 2
            2020 500000633 1 . 2
            2020 500000633 1 . 2
            2020 500000633 1 . 2
            2020 500000633 1 8 2
            2020 500000748 1 . 2
            2020 500000748 1 2 2
            2020 500000761 1 . 2
            2020 500000761 1 . 2
            2020 500000761 1 3 2
            2020 500000766 1 1 2
            2020 500000774 1 1 2
            2020 500000843 1 1 2
            2020 500000863 1 . 1
            2020 500000863 0 6 2
            2020 500000863 1 6 2
            2020 500000863 1 . 2
            2020 500000863 1 . 2
            2020 500000863 1 . 2
            2020 500000863 1 . 2
            2020 500000901 1 1 2
            2020 500000918 1 2 2
            2020 500000918 1 . 2
            2020 500001002 1 1 2
            2020 500001003 1 1 2
            2020 500001011 1 . 1
            2020 500001011 1 . 1
            2020 500001011 1 7 2
            2020 500001011 0 7 2
            2020 500001028 1 2 2
            2020 500001040 1 . 2
            2020 500001040 1 2 2
            2020 500001057 1 1 2
            2020 500001066 1 1 2
            2020 500001096 1 1 2
            2020 500001229 1 1 2
            2020 500001272 1 1 2
            2020 500001275 1 1 2
            2020 500001281 1 2 2
            2020 500001281 1 . 2
            2020 500001453 1 . 2
            2020 500001453 1 . 2
            2020 500001453 1 . 2
            2020 500001453 1 4 2
            end
            Do you have any idea of making this variable?


            • #7
              What does the variable "national" stand for?


              • #8
                Ah, sorry. I made a small change:
                Code:
                g national=nacio=="PT"
                I divided the workers into two groups: those who are PT and those who are not.


                • #9
                  Collapsing is one way to get that data:

                  Code:
                  collapse (mean) fraction_pt = national (sum) total_pt = national, by(year NPC_FIC work_col)
                  gen percent_pt = fraction_pt * 100
                  list if work_col == 1, sep(0)
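
                  Note that collapse replaces the data in memory with the aggregated version. If the worker-level rows are still needed afterwards, one option is to wrap the step in preserve and restore. This is a sketch on made-up toy data, not the only way to do it.

                  ```stata
                  * Toy data (hypothetical): two firms, four workers
                  clear
                  input int year long NPC_FIC float(national work_col)
                  2020 1 1 1
                  2020 1 0 1
                  2020 1 1 2
                  2020 2 1 2
                  end

                  * Take a snapshot of the worker-level data
                  preserve

                  * Aggregate to one row per year, firm and job type
                  collapse (mean) fraction_pt = national (sum) total_pt = national, by(year NPC_FIC work_col)
                  list, sep(0)

                  * Bring back the four worker-level rows
                  restore
                  count
                  ```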


                  • #10
                    This gives the share of PT relative to all employees. What I need is *the share of PT workers only in White-collar jobs (work_col=1) relative to the whole employment of the firms*


                    • #11
                      Manipulate the items inside the collapse command to get the desired numbers. I couldn't clearly understand the question.

                      The statement "the share of PT workers only in White-collar jobs (work_col=1) relative to the whole employment of the firms" does not make sense to me because there are two denominators (... of white collar) & (... of the whole employment). It may be more efficient if you can supply some actual calculated results to demonstrate what is meant by that.


                      • #12
                        You are right.
                        I believe that I can break this question into:

                        First: the share of PT workers in white-collar jobs in each firm.
                        Second: computing the total employment of each firm.
                        Afterwards, First / Second


                        • #13
                          Code:
                          **First**the share of PT workers in white-collar jobs in each firm.**
                          egen white_PT=total(work_col==1) if (national ==1), by(year NPC_FIC)
                          egen white_nonPT=total(work_col==1) if (national !=1), by(year NPC_FIC)
                          g tot_white=white_PT+ white_nonPT
                          g share_white_PT= white_PT/tot_white
                          
                          **Second**Total employment of each firm****
                          egen PT = total (national ==1), by(year NPC_FIC)
                          egen non_PT = total (national != 1 ), by(year NPC_FIC)
                          g tot_employment=non_PT+PT
                          The second part is correct, but the first part does not work: it creates missing values, so I cannot construct tot_white.
                          Code:
                          input float(white_PT white_nonPT)
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          . 0
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          0 .
                          . 0
                          0 .
                          Do you have any ideas? Thanks.
                          Last edited by Paris Rira; 14 Apr 2023, 08:00.
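
                          A likely cause of the missing values: the if qualifier restricts egen to the matching observations, so white_PT is missing on every non-PT row and white_nonPT is missing on every PT row, and their sum is missing everywhere. One possible fix, sketched here on made-up toy data, moves the conditions inside total() so every row of the firm-year receives a value. The name share_native follows the definition in #6; which denominator is the intended one remains an assumption.

                          ```stata
                          * Toy data (hypothetical): one firm, five workers; work_col == 1 is white-collar
                          clear
                          input int year long NPC_FIC float(national work_col)
                          2020 1 1 1
                          2020 1 1 1
                          2020 1 0 1
                          2020 1 1 2
                          2020 1 0 2
                          end

                          * Conditions go inside total(), so no row is excluded from the result
                          egen white_PT  = total(work_col == 1 & national == 1), by(year NPC_FIC)
                          egen tot_white = total(work_col == 1), by(year NPC_FIC)

                          * Total employment: number of worker rows in each firm-year
                          bysort year NPC_FIC: generate tot_employment = _N

                          * PT share among the firm's white-collar workers
                          generate share_white_PT = white_PT / tot_white

                          * PT white-collar workers relative to all workers of the firm
                          generate share_native = white_PT / tot_employment

                          list, sep(0)
                          ```

                          In the toy firm, 2 of the 3 white-collar workers are PT and there are 5 workers in total, so the two ratios differ; this is the two-denominator ambiguity raised in #11.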
