Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • creating a dummy variable by group

    This is an example of my data:

    ----------------------- copy starting from the next line -----------------------
    [CODE]
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long COM_NAME int ROUND_DATE long(VC_NAME VC_TYPE)
    2 22592 3113 3
    2 22592 1983 13
    3 22420 2529 13
    3 22420 5053 13
    3 22818 5053 13
    3 22818 4349 13
    3 22818 6532 6
    3 22818 2529 13
    3 22818 4277 13
    4 19205 6847 13
    5 21192 2304 8
    5 21192 505 9
    5 22195 4795 13
    5 22195 2304 8
    5 22195 3385 13
    5 22195 370 13
    5 22712 2304 8
    5 22712 6465 9
    5 22712 3654 13
    5 22712 3385 13
    6 19926 1694 9
    7 22834 91 13
    7 22834 2615 13
    7 22834 1125 13
    7 22834 5355 13
    7 22834 5709 13
    7 22834 328 12
    8 15809 2943 13
    9 18995 5254 3
    9 20515 3720 5
    9 20515 5254 3
    9 21444 5254 3
    9 22362 1506 3
    9 22362 2865 3
    9 22362 6985 13
    11 16526 41 13
    11 16679 3525 9
    11 16679 6978 13
    11 16679 41 13
    11 16679 6329 13
    11 16801 41 13
    11 16891 41 13
    11 16891 41 13
    11 16982 41 13
    11 17135 41 13
    11 17317 41 13
    11 17757 6329 13
    11 18133 1518 13
    12 19822 839 1
    12 20607 839 1
    13 22396 20 13
    13 22396 2233 13
    16 17378 5848 13
    16 18627 2443 2
    16 18627 2443 2
    17 14711 1211 13
    17 15132 1211 13
    17 15737 1211 13
    17 15737 1253 2
    20 15085 41 13
    22 19592 3222 2
    23 16915 4088 13
    24 21083 5067 8
    24 21615 5067 8
    24 21615 4488 2
    24 22445 1322 4
    24 22445 1005 9
    24 22445 3568 2
    24 22445 5067 8
    24 22445 4488 2
    24 22445 6940 11
    25 17623 3511 13
    26 22141 2790 13
    27 18917 679 13
    27 19680 679 13
    31 14643 7167 13
    34 14873 514 13
    end
    format %tdnn/dd/CCYY ROUND_DATE
    label values COM_NAME COM_N
    label def COM_N 2 "&Charge GmbH", modify
    label def COM_N 3 "&Open Gifts Ltd", modify
    label def COM_N 4 "+Plugg Srl", modify
    label def COM_N 5 "+Simple.Fr SAS", modify
    label def COM_N 6 "004 GmbH", modify
    label def COM_N 7 "011h Sustainable Construction SL", modify
    label def COM_N 8 "02 Netherlands BV", modify
    label def COM_N 9 "07446749 Ltd", modify
    label def COM_N 11 "1-2-3TV GmbH", modify
    label def COM_N 12 "10-Vins SAS", modify
    label def COM_N 13 "100 Percent IT Ltd", modify
    label def COM_N 16 "1000mercis SA", modify
    label def COM_N 17 "1001 Listes", modify
    label def COM_N 20 "100world AG", modify
    label def COM_N 22 "10Start GmbH", modify
    label def COM_N 23 "10Tacle Studios AG", modify

    In my data, each observation represents an investment in a company (COM_NAME) by a particular investor (VC_NAME). There are different types of investors (VC_TYPE) >> this is a categorical variable. Note that a particular company (COM_NAME) could have multiple observations with the same ROUND_DATE , which represents the investment date, because investors usually tend to syndicate when investing in a company in my data.

    I need to create a dummy variable (CVC) that =1 if a particular company received an investment at any time from ‘type 3 investor’ . For example, VC_TYPE ==3,
    However, I need the dummy variable (CVC) to be equal to 1 in all the observations of the same company (COM_NAME). not just in the one with the VC_TYPE=3.

    I tried to use:
    Code:
    bysort COM_NAME: gen CVC =1 if VC_TY ==3
    But the code didn’t work, as it will only let CVC =1 in one observation that has the VC_TYPE ==3, but not in all other observations with the same COM_NAME

    Could you please help me with a better code?

    Many thanks
    Last edited by Yusra Noorwali; 07 Mar 2023, 05:02.

  • #2
    This kind of problem is discussed in various places, including https://www.stata.com/support/faqs/d...ble-recording/
    and https://www.stata-journal.com/articl...article=dm0055

    You can solve it by

    Code:
    bysort COM_NAME: gen CVC = VC_TY ==3
    
    bysort COM_NAME (CVC) : replace CVC = CVC[_N]
    or by
    Code:
    bysort COM_NAME: egen CVC = max(VC_TY == 3)



    There are two steps here. The first is to see that asking for

    Code:
     1 if VC_TY ==3
    just creates 1s and missings, which is usually not as helpful as asking for
    Code:
    VC_TY ==3
    which creates 1s and 0s directly. More on that at e.g. https://www.stata.com/support/faqs/d...rue-and-false/ or https://www.stata-journal.com/articl...article=dm0099 Naturally you could follow up by something like
    Code:
    replace CVC = 0 if VC_TY != 3
    but it's simpler to use the direct route.
    The second step is to see that

    Code:
     bysort COM_NAME: gen CVC =1 if VC_TY ==3
    does not do anything different from
    Code:
    gen CVC =1 if VC_TY ==3
    any more than at your desk: add 2 + 2 yields a different result from at your friend's desk: add 2 + 2 or in something closer to Stata
    Code:
    by desk : gen four = 2 + 2
    is not going to be different in result from
    Code:
    gen four = 2 + 2
    This is subtle and perhaps surprising, but the best way I can explain it is that all hinges on whether the result of the code after the colon depends in any way on the stated grouping.
    Last edited by Nick Cox; 07 Mar 2023, 05:29.

    Comment

    Working...
    X