I have this data:
----------------------- copy starting from the next line -----------------------
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input long COM_NAME int ROUND_DATE long STAGE
1 22592 2
1 22592 2
2 22420 3
2 22420 3
2 22818 3
2 22818 3
2 22818 3
2 22818 3
2 22818 3
3 19205 6
4 21192 2
4 21192 2
4 22195 3
4 22195 3
4 22195 3
4 22195 3
4 22712 3
4 22712 3
4 22712 3
4 22712 3
5 19926 4
6 22834 2
6 22834 2
6 22834 2
6 22834 2
6 22834 2
6 22834 2
7 15809 1
8 18995 2
8 20515 3
8 20515 3
8 21444 3
8 22362 4
8 22362 4
8 22362 4
end
format %tdnn/dd/CCYY ROUND_DATE
label values COM_NAME COM_N
label def COM_N 1 "&Charge GmbH", modify
label def COM_N 2 "&Open Gifts Ltd", modify
label def COM_N 3 "+Plugg Srl", modify
label def COM_N 4 "+Simple.Fr SAS", modify
label def COM_N 5 "004 GmbH", modify
label def COM_N 6 "011h Sustainable Construction SL", modify
label def COM_N 7 "02 Netherlands BV", modify
label def COM_N 8 "07446749 Ltd", modify
label def COM_N 10 "1-2-3TV GmbH", modify
label def COM_N 11 "10-Vins SAS", modify
label def COM_N 12 "100 Percent IT Ltd", modify
label def COM_N 15 "1000mercis SA", modify
label def COM_N 16 "1001 Listes", modify
label def COM_N 19 "100world AG", modify
label def COM_N 20 "10Start GmbH", modify
label def COM_N 21 "10Tacle Studios AG", modify
I want to create a dummy variable ‘Early’ that equals 1 only if the ‘STAGE’ variable in the first (oldest) entry (raw) only is categorized as ‘Startup/Seed’ or ‘Early stage’ and zero otherwise (means if STAGE = 2 or 6). In other words, this is how the file should look like :
----------------------- copy starting from the next line -----------------------
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input long COM_NAME int ROUND_DATE long STAGE float Early
1 22592 2 1
1 22592 2 1
2 22420 3 0
2 22420 3 0
2 22818 3 0
2 22818 3 0
2 22818 3 0
2 22818 3 0
2 22818 3 0
3 19205 6 1
4 21192 2 1
4 21192 2 1
4 22195 3 1
4 22195 3 1
4 22195 3 1
4 22195 3 1
4 22712 3 1
4 22712 3 1
4 22712 3 1
4 22712 3 1
5 19926 4 0
6 22834 2 1
6 22834 2 1
6 22834 2 1
6 22834 2 1
6 22834 2 1
6 22834 2 1
7 15809 1 0
8 18995 2 1
8 20515 3 1
8 20515 3 1
8 21444 3 1
8 22362 4 1
8 22362 4 1
8 22362 4 1
end
format %tdnn/dd/CCYY ROUND_DATE
label values COM_NAME COM_N
label def COM_N 1 "&Charge GmbH", modify
label def COM_N 2 "&Open Gifts Ltd", modify
label def COM_N 3 "+Plugg Srl", modify
label def COM_N 4 "+Simple.Fr SAS", modify
label def COM_N 5 "004 GmbH", modify
label def COM_N 6 "011h Sustainable Construction SL", modify
label def COM_N 7 "02 Netherlands BV", modify
label def COM_N 8 "07446749 Ltd", modify
label def COM_N 10 "1-2-3TV GmbH", modify
label def COM_N 11 "10-Vins SAS", modify
label def COM_N 12 "100 Percent IT Ltd", modify
label def COM_N 15 "1000mercis SA", modify
label def COM_N 16 "1001 Listes", modify
label def COM_N 19 "100world AG", modify
label def COM_N 20 "10Start GmbH", modify
label def COM_N 21 "10Tacle Studios AG", modify
I tried to create other data file that has these variables only: COM_NAME and ROUND_DATE and STAGE. Then I used these commands to keep the first observation only:
then merge this data file with the original file using "m:1" command
but then the code keeps only 18354 observations in COM_NAME while I know from my data that there should be 25890 observations (unique company names).
is there any other easier accurate way to do what I need?
Many thanks[
----------------------- copy starting from the next line -----------------------
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input long COM_NAME int ROUND_DATE long STAGE
1 22592 2
1 22592 2
2 22420 3
2 22420 3
2 22818 3
2 22818 3
2 22818 3
2 22818 3
2 22818 3
3 19205 6
4 21192 2
4 21192 2
4 22195 3
4 22195 3
4 22195 3
4 22195 3
4 22712 3
4 22712 3
4 22712 3
4 22712 3
5 19926 4
6 22834 2
6 22834 2
6 22834 2
6 22834 2
6 22834 2
6 22834 2
7 15809 1
8 18995 2
8 20515 3
8 20515 3
8 21444 3
8 22362 4
8 22362 4
8 22362 4
end
format %tdnn/dd/CCYY ROUND_DATE
label values COM_NAME COM_N
label def COM_N 1 "&Charge GmbH", modify
label def COM_N 2 "&Open Gifts Ltd", modify
label def COM_N 3 "+Plugg Srl", modify
label def COM_N 4 "+Simple.Fr SAS", modify
label def COM_N 5 "004 GmbH", modify
label def COM_N 6 "011h Sustainable Construction SL", modify
label def COM_N 7 "02 Netherlands BV", modify
label def COM_N 8 "07446749 Ltd", modify
label def COM_N 10 "1-2-3TV GmbH", modify
label def COM_N 11 "10-Vins SAS", modify
label def COM_N 12 "100 Percent IT Ltd", modify
label def COM_N 15 "1000mercis SA", modify
label def COM_N 16 "1001 Listes", modify
label def COM_N 19 "100world AG", modify
label def COM_N 20 "10Start GmbH", modify
label def COM_N 21 "10Tacle Studios AG", modify
I want to create a dummy variable ‘Early’ that equals 1 only if the ‘STAGE’ variable in the first (oldest) entry (raw) only is categorized as ‘Startup/Seed’ or ‘Early stage’ and zero otherwise (means if STAGE = 2 or 6). In other words, this is how the file should look like :
----------------------- copy starting from the next line -----------------------
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input long COM_NAME int ROUND_DATE long STAGE float Early
1 22592 2 1
1 22592 2 1
2 22420 3 0
2 22420 3 0
2 22818 3 0
2 22818 3 0
2 22818 3 0
2 22818 3 0
2 22818 3 0
3 19205 6 1
4 21192 2 1
4 21192 2 1
4 22195 3 1
4 22195 3 1
4 22195 3 1
4 22195 3 1
4 22712 3 1
4 22712 3 1
4 22712 3 1
4 22712 3 1
5 19926 4 0
6 22834 2 1
6 22834 2 1
6 22834 2 1
6 22834 2 1
6 22834 2 1
6 22834 2 1
7 15809 1 0
8 18995 2 1
8 20515 3 1
8 20515 3 1
8 21444 3 1
8 22362 4 1
8 22362 4 1
8 22362 4 1
end
format %tdnn/dd/CCYY ROUND_DATE
label values COM_NAME COM_N
label def COM_N 1 "&Charge GmbH", modify
label def COM_N 2 "&Open Gifts Ltd", modify
label def COM_N 3 "+Plugg Srl", modify
label def COM_N 4 "+Simple.Fr SAS", modify
label def COM_N 5 "004 GmbH", modify
label def COM_N 6 "011h Sustainable Construction SL", modify
label def COM_N 7 "02 Netherlands BV", modify
label def COM_N 8 "07446749 Ltd", modify
label def COM_N 10 "1-2-3TV GmbH", modify
label def COM_N 11 "10-Vins SAS", modify
label def COM_N 12 "100 Percent IT Ltd", modify
label def COM_N 15 "1000mercis SA", modify
label def COM_N 16 "1001 Listes", modify
label def COM_N 19 "100world AG", modify
label def COM_N 20 "10Start GmbH", modify
label def COM_N 21 "10Tacle Studios AG", modify
I tried to create other data file that has these variables only: COM_NAME and ROUND_DATE and STAGE. Then I used these commands to keep the first observation only:
Code:
sort COM_NAME ROUND_DATE by COM_NAME: keep if _n==1
then merge this data file with the original file using "m:1" command
but then the code keeps only 18354 observations in COM_NAME while I know from my data that there should be 25890 observations (unique company names).
is there any other easier accurate way to do what I need?
Many thanks[

Comment