Creating one variable without dropping

Paris Rira

Join Date: Dec 2022
Posts: 384

13 Apr 2023, 13:24

Hi,

The dataset is at the worker level. There are 4 variables:
year: 2010-2020
nacio: nationality
NPC_FIC: firm ID, repeated each year (we do not have a unique ID for each year). There are around 500,000 firms, which repeat each year.
re_wage: wage

There are different nationalities. I want to create a variable (without dropping observations) that only shows the wages of one specific nationality (PT). I also need the min, max, mean, and std. dev. of this variable.

Code:

clear
input int year str2 nacio long NPC_FIC float re_wage
2010 "PT" 501195373 237.79385
2010 "PT" 500996349  745.1175
2010 "PT" 501112968   234.783
2010 "UK" 501087261  784.1953
2011 "PT" 501101640 578.58044
2011 "UK" 501052779  456.3653
2011 "PT" 503188955  268.7161
2012 "GW" 501165899 479.83725
2012 "BR" 503249542  720.6148
2012 "PT" 501384409  80.37975
2013 "BR" 503357509  293.8517
2013 "PT" 504103455  628.8788
2014 "PT" 501101765 198.55334
2014 "US" 501052779  440.7233
2014 "PT" 502622516 574.46655
2015 "PT" 501204126  331.4828
2015 "US" 501297955  356.9078
2016 "IR" 502910365  686.1664
2019 "PT" 501081112  636.4105
2020 "CN" 503184507  629.1139
2020 "PT" 501139334 105.11755
2020 "PT" 501129929  344.4485
2020 "SP" 501192139  722.0615
2020 "PT" 501130463  726.3924
end

Any ideas are appreciated.

Cheers,
Paris

Ken Chui

Join Date: Aug 2014

Posts: 1058
#2

13 Apr 2023, 15:09

I am going to create a variable (without dropping) that only shows the wages of specific nationality (PT).

Code:

generate wage_pt = re_wage if nacio == "PT"

I need to know min/max/ mean and std.dev of this variable as well.

Code:

summarize wage_pt

It is also possible to get the summary statistics without the middle step:

Code:

summarize re_wage if nacio == "PT"
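
If only those four statistics are wanted, or if they are needed year by year, tabstat is one option. This is a minimal sketch that assumes the variable names from the example data in the first post:

```stata
* Only min, max, mean and standard deviation of wages for PT workers
tabstat re_wage if nacio == "PT", statistics(min max mean sd)

* The same statistics reported separately for each year
tabstat re_wage if nacio == "PT", statistics(min max mean sd) by(year)
```

Neither command changes the data in memory, so nothing is dropped.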

Last edited by Ken Chui; 13 Apr 2023, 15:14.
Paris Rira

Join Date: Dec 2022

Posts: 384
#3

13 Apr 2023, 15:31

Thank you so much. It worked perfectly.

Paris Rira

Join Date: Dec 2022
Posts: 384

13 Apr 2023, 15:58

Now I need to compute the total number of PT workers (workers whose nationality is PT) in each firm in each year (2010-2020). I used:

Code:

egen nnacio = total(nacio == "PT"), by(NPC_FIC year)

* Resulting data (excerpt):
input float nnacio int year long NPC_FIC
1 2020 500000033
2 2020 500000073
2 2020 500000073
2 2020 500000083
2 2020 500000083
1 2020 500000101
1 2020 500000156
1 2020 500000204
2 2020 500000232
2 2020 500000232
2 2020 500000240
2 2020 500000240
5 2020 500000283
5 2020 500000283
5 2020 500000283
5 2020 500000283
5 2020 500000283
3 2020 500000284
3 2020 500000284
3 2020 500000284
2 2020 500000286
2 2020 500000286
5 2019 500000294
5 2019 500000294
5 2019 500000294
5 2019 500000294
5 2019 500000294
2 2020 500000346
2 2020 500000346
4 2020 500000395
4 2020 500000395
4 2020 500000395
4 2020 500000395
1 2020 500000431
2 2020 500000470
2 2020 500000470
1 2020 500000478
end

As you can see, the firm IDs and nnacio values are repeated. How can I keep one nnacio per firm? For example, firm 500000284 appears 3 times; how can I make it appear only once?


Ken Chui

Join Date: Aug 2014

Posts: 1058
#5

13 Apr 2023, 16:06

If the "without dropping" rule still applies, you may try:

Code:

bysort NPC_FIC year nacio: replace nnacio = . if _n > 1

If it is fine to drop cases, then the collapse command will also work.
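
Note that with nacio in the by-list, the replace command above keeps one value per firm, year and nationality, so a firm-year with several nationalities still shows nnacio more than once. Below is a minimal sketch of two alternatives, assuming one value per firm-year is what is wanted. The names first and firm_year.dta are hypothetical, chosen only for illustration:

```stata
* Option 1: keep every observation, but show nnacio once per firm-year
egen first = tag(NPC_FIC year)        // 1 for exactly one row per firm-year
replace nnacio = . if !first

* Option 2: save a one-row-per-firm-year file and keep the original data
preserve
collapse (max) nnacio, by(NPC_FIC year)   // max ignores missing values
save "firm_year.dta", replace             // hypothetical file name
restore
```

Because Option 2 is wrapped in preserve and restore, the worker-level data are back in memory afterwards, so the "without dropping" rule is still respected.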

Paris Rira

Join Date: Dec 2022
Posts: 384

13 Apr 2023, 17:10

Dear Ken,

Code:

collapse (sum) nnacio, by (year NPC_FIC CCPCodes)

Originally posted by Ken Chui:

If it is fine to drop cases, then the collapse command will also work.

Actually, I need to build a variable defined as follows: Share_native = share of natives (workers with nationality PT) in white-collar jobs (work_col == 1) relative to the total employment of a firm.
Whether to drop or keep observations depends on whether dropping would harm the construction of this variable.

Code:

clear
input int year long NPC_FIC float(national nnacio work_col)
2020 500000033 1 1 2
2020 500000073 1 . 1
2020 500000073 1 2 2
2020 500000083 1 . 2
2020 500000083 1 2 2
2020 500000101 1 1 2
2020 500000156 1 1 2
2020 500000204 1 1 2
2020 500000232 1 2 2
2020 500000232 1 . 2
2020 500000240 1 . 2
2020 500000240 1 2 2
2020 500000283 1 . 2
2020 500000283 1 . 2
2020 500000283 1 . 2
2020 500000283 1 . 2
2020 500000283 1 5 2
2020 500000284 1 . 2
2020 500000284 1 3 2
2020 500000284 1 . 2
2020 500000286 1 . 1
2020 500000286 1 2 2
2020 500000294 1 5 1
2020 500000294 1 . 2
2020 500000294 1 . 2
2020 500000294 1 . 2
2020 500000294 1 . 2
2020 500000346 1 2 2
2020 500000346 1 . 2
2020 500000395 1 . 1
2020 500000395 1 4 1
2020 500000395 1 . 1
2020 500000395 1 . 1
2020 500000431 1 1 2
2020 500000470 1 2 2
2020 500000470 1 . 2
2020 500000478 1 1 2
2020 500000554 1 1 2
2020 500000565 1 2 2
2020 500000565 1 . 2
2020 500000600 1 2 2
2020 500000600 1 . 2
2020 500000601 1 1 2
2020 500000633 1 . 2
2020 500000633 1 . 2
2020 500000633 1 . 2
2020 500000633 1 . 2
2020 500000633 1 . 2
2020 500000633 1 . 2
2020 500000633 1 . 2
2020 500000633 1 8 2
2020 500000748 1 . 2
2020 500000748 1 2 2
2020 500000761 1 . 2
2020 500000761 1 . 2
2020 500000761 1 3 2
2020 500000766 1 1 2
2020 500000774 1 1 2
2020 500000843 1 1 2
2020 500000863 1 . 1
2020 500000863 0 6 2
2020 500000863 1 6 2
2020 500000863 1 . 2
2020 500000863 1 . 2
2020 500000863 1 . 2
2020 500000863 1 . 2
2020 500000901 1 1 2
2020 500000918 1 2 2
2020 500000918 1 . 2
2020 500001002 1 1 2
2020 500001003 1 1 2
2020 500001011 1 . 1
2020 500001011 1 . 1
2020 500001011 1 7 2
2020 500001011 0 7 2
2020 500001028 1 2 2
2020 500001040 1 . 2
2020 500001040 1 2 2
2020 500001057 1 1 2
2020 500001066 1 1 2
2020 500001096 1 1 2
2020 500001229 1 1 2
2020 500001272 1 1 2
2020 500001275 1 1 2
2020 500001281 1 2 2
2020 500001281 1 . 2
2020 500001453 1 . 2
2020 500001453 1 . 2
2020 500001453 1 . 2
2020 500001453 1 4 2
end

Do you have any idea how to create this variable?


Ken Chui

Join Date: Aug 2014

Posts: 1058
#7

13 Apr 2023, 18:30

What does the variable "national" stand for?
Paris Rira

Join Date: Dec 2022

Posts: 384
#8

14 Apr 2023, 05:51

Ah, sorry. I made a small change.

Code:

gen national = nacio == "PT"

I divided the workers into two groups: those who are PT and those who are not.

Ken Chui

Join Date: Aug 2014
Posts: 1058

14 Apr 2023, 06:24

Collapsing is one way to get that data:

Code:

collapse (mean) fraction_pt = national (sum) total_pt = national, by(year NPC_FIC work_col)
gen percent_pt = fraction_pt * 100
list if work_col == 1, sep(0)


Paris Rira

Join Date: Dec 2022

Posts: 384
#10

14 Apr 2023, 06:34

This gives the share of PT relative to all employees. What I need is *the share of PT workers only in White-collar jobs (work_col=1) relative to the whole employment of the firms*
Ken Chui

Join Date: Aug 2014

Posts: 1058
#11

14 Apr 2023, 06:45

Manipulate the items inside the collapse command to get the desired numbers. I couldn't clearly understand the question.

The statement "the share of PT workers only in White-collar jobs (work_col=1) relative to the whole employment of the firms" does not make sense to me because there are two denominators (... of white collar) & (... of the whole employment). It may be more efficient if you can supply some actual calculated results to demonstrate what is meant by that.
Paris Rira

Join Date: Dec 2022

Posts: 384
#12

14 Apr 2023, 06:50

You are right.
I believe that I can break this question into:

First: the share of PT workers in white-collar jobs in each firm.
Second: computing the total employment of each firm.
Afterwards, First / Second

Paris Rira

Join Date: Dec 2022
Posts: 384

#13

14 Apr 2023, 07:15

Code:

* First: the share of PT workers in white-collar jobs in each firm
egen white_PT = total(work_col == 1) if national == 1, by(year NPC_FIC)
egen white_nonPT = total(work_col == 1) if national != 1, by(year NPC_FIC)
gen tot_white = white_PT + white_nonPT
gen share_white_PT = white_PT / tot_white

* Second: total employment of each firm
egen PT = total(national == 1), by(year NPC_FIC)
egen non_PT = total(national != 1), by(year NPC_FIC)
gen tot_employment = non_PT + PT

The second part is correct, but the first part does not work: it creates missing values, so I cannot construct tot_white.

Code:

input float(white_PT white_nonPT)
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
. 0
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
0 .
. 0
0 .
end

Do you have any ideas? Thanks.
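
A possible explanation, offered as a sketch rather than a tested answer: when egen is used with an if qualifier, the result is stored only in the observations that satisfy the condition. white_PT is therefore missing in every non-PT row and white_nonPT is missing in every PT row, so their sum is missing everywhere. Moving the condition inside total() avoids this. The code below assumes one observation per worker, work_col == 1 for white-collar jobs, and national as defined in #8. The names white_pt_n, firm_size and share_native are hypothetical:

```stata
* Count PT white-collar workers per firm-year. The logical expression is
* 1 or 0 in every row, so no observation is left missing.
egen white_pt_n = total(national == 1 & work_col == 1), by(year NPC_FIC)

* Total employment per firm-year = number of worker rows in the group
bysort year NPC_FIC: gen long firm_size = _N

* Share of PT white-collar workers in the firm's total employment
gen double share_native = white_pt_n / firm_size
```

All three variables are constant within a firm-year and no observations are dropped. If one row per firm-year is needed later, the data can be collapsed or tagged at that point.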

Last edited by Paris Rira; 14 Apr 2023, 08:00.
