Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Looping over variables to see who left the firm

    Dear all,

    I'm struggling with creating a code that loops over variables values (row) to create a dummy variable to indicate whether a person left or not.
    I have data for companies from 1994-2010 and they list up to 249 employees.
    My data looks as follows:

    year organization exec_name1 exec_name2 .... exec_name249
    1994 B Albert Jim
    1995 B Albert Jim
    1996 B Jim
    ...
    2010 B Jim
    1994 C etc.


    I want to create a dummy for exec_name1, goneclientexec_name1 that is 0 if the employee is working in year +1 at the same firm and 1 if he/she is not (so his/her name does not appear in row [n+1] in variables exec_name1-exec_name249.

    I wrote the following code:

    foreach v of varlist exec_name1-exec_name249 {
    gen clientgone`v' = 0
    replace clientgone`v' = 1 if `v'!="" & year!="2010" & `v'!= exec_name1[_n+1] & `v'!= exec_name2[_n+1] & `v'!= exec_name3[_n+1] & `v'!= exec_name4[_n+1] & `v'!= exec_name5[_n+1] & `v'!= exec_name6[_n+1] & `v'!= exec_name7[_n+1] & `v'!= exec_name8[_n+1] & `v'!= exec_name9[_n+1] & `v'!= exec_name10[_n+1] & `v'!= exec_name11[_n+1] & `v'!= exec_name12[_n+1] & `v'!= exec_name13[_n+1] & `v'!= exec_name14[_n+1] & `v'!= exec_name15[_n+1] & `v'!= exec_name16[_n+1] & `v'!= exec_name17[_n+1] & `v'!= exec_name18[_n+1] & `v'!= exec_name19[_n+1] & `v'!= exec_name20[_n+1] & `v'!= exec_name21[_n+1]& `v'!= exec_name22[_n+1] & `v'!= exec_name23[_n+1] & `v'!= exec_name24[_n+1] & `v'!= exec_name25[_n+1] & `v'!= exec_name26[_n+1] & `v'!= exec_name27[_n+1] & `v'!= exec_name28[_n+1] & `v'!= exec_name29[_n+1] & `v'!= exec_name30[_n+1] & `v'!= exec_name31[_n+1] & `v'!= exec_name32[_n+1] & `v'!= exec_name33[_n+1] & `v'!= exec_name34[_n+1] & `v'!= exec_name35[_n+1] & `v'!= exec_name36[_n+1] & `v'!= exec_name37[_n+1] & `v'!= exec_name38[_n+1] & `v'!= exec_name39[_n+1] & `v'!= exec_name40[_n+1] & `v'!= exec_name41[_n+1] & `v'!= exec_name42[_n+1] & `v'!= exec_name43[_n+1] & `v'!= exec_name44[_n+1] & `v'!= exec_name45[_n+1] & `v'!= exec_name46[_n+1] & `v'!= exec_name47[_n+1] & `v'!= exec_name48[_n+1] & `v'!= exec_name49[_n+1] & `v'!= exec_name50[_n+1] & `v'!= exec_name51[_n+1] & `v'!= exec_name52[_n+1] & `v'!= exec_name53[_n+1] & `v'!= exec_name54[_n+1] & `v'!= exec_name55[_n+1] & `v'!= exec_name56[_n+1] & `v'!= exec_name57[_n+1] & `v'!= exec_name58[_n+1] & `v'!= exec_name59[_n+1] & `v'!= exec_name60[_n+1] & `v'!= exec_name61[_n+1] & `v'!= exec_name62[_n+1] & `v'!= exec_name63[_n+1] & `v'!= exec_name64[_n+1] & `v'!= exec_name65[_n+1] & `v'!= exec_name66[_n+1] & `v'!= exec_name67[_n+1] & `v'!= exec_name68[_n+1] & `v'!= exec_name69[_n+1] & `v'!= exec_name70[_n+1] & `v'!= exec_name71[_n+1] & `v'!= exec_name72[_n+1] & `v'!= exec_name73[_n+1] & `v'!= exec_name74[_n+1] & `v'!= exec_name75[_n+1] & `v'!= exec_name76[_n+1] & `v'!= exec_name77[_n+1] & `v'!= exec_name78[_n+1] & `v'!= exec_name79[_n+1] & `v'!= exec_name80[_n+1] & `v'!= exec_name81[_n+1] & `v'!= exec_name82[_n+1] & `v'!= exec_name83[_n+1] & `v'!= exec_name84[_n+1] & `v'!= exec_name85[_n+1] & `v'!= exec_name86[_n+1] & `v'!= exec_name87[_n+1] & `v'!= exec_name88[_n+1] & `v'!= exec_name89[_n+1] & `v'!= exec_name90[_n+1] & `v'!= exec_name91[_n+1] & `v'!= exec_name92[_n+1] & `v'!= exec_name93[_n+1] & `v'!= exec_name94[_n+1] & `v'!= exec_name95[_n+1] & `v'!= exec_name96[_n+1] & `v'!= exec_name97[_n+1] & `v'!= exec_name98[_n+1] & `v'!= exec_name99[_n+1] & `v'!= exec_name100[_n+1] & `v'!= exec_name101[_n+1] & `v'!= exec_name102[_n+1] & `v'!= exec_name103[_n+1] & `v'!= exec_name104[_n+1] & `v'!= exec_name105[_n+1] & `v'!= exec_name106[_n+1] & `v'!= exec_name107[_n+1] & `v'!= exec_name108[_n+1] & `v'!= exec_name109[_n+1] & `v'!= exec_name110[_n+1] & `v'!= exec_name111[_n+1] & `v'!= exec_name112[_n+1] & `v'!= exec_name113[_n+1] & `v'!= exec_name114[_n+1] & `v'!= exec_name115[_n+1] & `v'!= exec_name116[_n+1] & `v'!= exec_name117[_n+1] & `v'!= exec_name118[_n+1] & `v'!= exec_name119[_n+1] & `v'!= exec_name120[_n+1] & `v'!= exec_name121[_n+1] & `v'!= exec_name122[_n+1] & `v'!= exec_name123[_n+1] & `v'!= exec_name124[_n+1] & `v'!= exec_name125[_n+1] & `v'!= exec_name126[_n+1] & `v'!= exec_name127[_n+1] & `v'!= exec_name128[_n+1] & `v'!= exec_name129[_n+1] & `v'!= exec_name130[_n+1] & `v'!= exec_name131[_n+1] & `v'!= exec_name132[_n+1] & `v'!= exec_name133[_n+1] & `v'!= exec_name134[_n+1] & `v'!= exec_name135[_n+1] & `v'!= exec_name136[_n+1] & `v'!= exec_name137[_n+1] & `v'!= exec_name138[_n+1] & `v'!= exec_name139[_n+1] & `v'!= exec_name140[_n+1] & `v'!= exec_name141[_n+1] & `v'!= exec_name142[_n+1] & `v'!= exec_name143[_n+1] & `v'!= exec_name144[_n+1] & `v'!= exec_name145[_n+1] & `v'!= exec_name146[_n+1] & `v'!= exec_name147[_n+1] & `v'!= exec_name148[_n+1] & `v'!= exec_name149[_n+1] & `v'!= exec_name150[_n+1] & `v'!= exec_name151[_n+1] & `v'!= exec_name152[_n+1] & `v'!= exec_name153[_n+1] & `v'!= exec_name154[_n+1] & `v'!= exec_name155[_n+1] & `v'!= exec_name156[_n+1] & `v'!= exec_name157[_n+1] & `v'!= exec_name158[_n+1] & `v'!= exec_name159[_n+1] & `v'!= exec_name160[_n+1] & `v'!= exec_name161[_n+1] & `v'!= exec_name162[_n+1] & `v'!= exec_name163[_n+1] & `v'!= exec_name164[_n+1] & `v'!= exec_name165[_n+1] & `v'!= exec_name166[_n+1] & `v'!= exec_name167[_n+1] & `v'!= exec_name168[_n+1] & `v'!= exec_name169[_n+1] & `v'!= exec_name170[_n+1] & `v'!= exec_name171[_n+1] & `v'!= exec_name172[_n+1] & `v'!= exec_name173[_n+1] & `v'!= exec_name174[_n+1] & `v'!= exec_name175[_n+1] & `v'!= exec_name176[_n+1] & `v'!= exec_name177[_n+1] & `v'!= exec_name178[_n+1] & `v'!= exec_name179[_n+1] & `v'!= exec_name180[_n+1] & `v'!= exec_name181[_n+1] & `v'!= exec_name182[_n+1] & `v'!= exec_name183[_n+1] & `v'!= exec_name184[_n+1] & `v'!= exec_name185[_n+1] & `v'!= exec_name186[_n+1] & `v'!= exec_name187[_n+1] & `v'!= exec_name188[_n+1] & `v'!= exec_name189[_n+1] & `v'!= exec_name190[_n+1] & `v'!= exec_name191[_n+1] & `v'!= exec_name192[_n+1] & `v'!= exec_name193[_n+1] & `v'!= exec_name194[_n+1] & `v'!= exec_name195[_n+1] & `v'!= exec_name196[_n+1] & `v'!= exec_name197[_n+1] & `v'!= exec_name198[_n+1] & `v'!= exec_name199[_n+1] & `v'!= exec_name200[_n+1] & `v'!= exec_name201[_n+1] & `v'!= exec_name202[_n+1] & `v'!= exec_name203[_n+1] & `v'!= exec_name204[_n+1] & `v'!= exec_name205[_n+1] & `v'!= exec_name206[_n+1] & `v'!= exec_name207[_n+1] & `v'!= exec_name208[_n+1] & `v'!= exec_name209[_n+1] & `v'!= exec_name210[_n+1] & `v'!= exec_name211[_n+1] & `v'!= exec_name212[_n+1] & `v'!= exec_name213[_n+1] & `v'!= exec_name214[_n+1] & `v'!= exec_name215[_n+1] & `v'!= exec_name216[_n+1] & `v'!= exec_name217[_n+1] & `v'!= exec_name218[_n+1] & `v'!= exec_name219[_n+1] & `v'!= exec_name218[_n+1] & `v'!= exec_name219[_n+1] & `v'!= exec_name220[_n+1] & `v'!= exec_name221[_n+1] & `v'!= exec_name222[_n+1] & `v'!= exec_name223[_n+1] & `v'!= exec_name224[_n+1] & `v'!= exec_name225[_n+1] & `v'!= exec_name226[_n+1] & `v'!= exec_name227[_n+1] & `v'!= exec_name228[_n+1] & `v'!= exec_name229[_n+1] & `v'!= exec_name230[_n+1] & `v'!= exec_name231[_n+1] & `v'!= exec_name232[_n+1] & `v'!= exec_name233[_n+1] & `v'!= exec_name234[_n+1] & `v'!= exec_name235[_n+1] & `v'!= exec_name236[_n+1] & `v'!= exec_name237[_n+1] & `v'!= exec_name238[_n+1] & `v'!= exec_name239[_n+1] & `v'!= exec_name240[_n+1] & `v'!= exec_name241[_n+1] & `v'!= exec_name242[_n+1] & `v'!= exec_name243[_n+1] & `v'!= exec_name244[_n+1] & `v'!= exec_name245[_n+1] & `v'!= exec_name246[_n+1] & `v'!= exec_name247[_n+1] & `v'!= exec_name248[_n+1] & `v'!= exec_name249[_n+1]
    }

    Which comes back with the error message that the expression is too long (obviously).

    Now my question is how I can get the same output without having to split the function. I tried a loop within the loop but failed miserably.

    Thanks in advance!

  • #2
    Welcome to Statalist.

    The first thing we notice is that your data is in what Stata would call a "wide" layout with the values of the executive names in different variables for a single organization and year. It will be easier to work with, both for this problem, and for your further analysis, if you transform it to a "long" layout, where each observation has just one value of exec_name for one given organization and year.

    The experienced users here generally agree that, with few exceptions, Stata makes it much more straightforward to accomplish complex analyses using a long layout of your data rather than a wide layout of the same data. It certainly does make things much easier here.

    The following example of the reshape command may start you on your way.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year str1 organization str20(exec_name1 exec_name2)
    1994 "B" "Albert" "Jim"
    1995 "B" "Albert" "Jim"
    1996 "B" "Jim"    ""   
    end
    reshape long exec_name, i(organization year) j(ex_num)
    drop if missing(exec_name)
    drop ex_num
    sort organization exec_name year
    list, noobs sepby(organization exec_name)
    Code:
    . reshape long exec_name, i(organization year) j(ex_num)
    (note: j = 1 2)
    
    Data                               wide   ->   long
    -----------------------------------------------------------------------------
    Number of obs.                        3   ->       6
    Number of variables                   4   ->       4
    j variable (2 values)                     ->   ex_num
    xij variables:
                      exec_name1 exec_name2   ->   exec_name
    -----------------------------------------------------------------------------
    
    . drop if missing(exec_name)
    (1 observation deleted)
    
    . drop ex_num
    
    . sort organization exec_name year
    
    . list, noobs sepby(organization exec_name)
    
      +----------------------------+
      | organi~n   year   exec_n~e |
      |----------------------------|
      |        B   1994     Albert |
      |        B   1995     Albert |
      |----------------------------|
      |        B   1994        Jim |
      |        B   1995        Jim |
      |        B   1996        Jim |
      +----------------------------+

    Comment


    • #3
      Building on William's helpful advice, a simple idea would be to extract members of each firm in the first year and construct a complete dataset (the case where no one leaves across all years). You can then merge back with the original and this will allow you to identify who is there, who left and even who is new. Here is a simple example which builds on yours


      Code:
      input float year str12(organization exec_name1 exec_name2 exec_name3 exec_name4 exec_name5 exec_name6)
      1994 "B" "Albert" "Jim" "Andrew" "Mark" "Ben" "Jordan"
      1995 "B" "Albert" "Jim" "Ben" "Jordan"
      1996 "B" "Jim" "Ben" "James"
      2010 "B" "Jim" "James"
      1994 "C" "Philip" "Morris" "Matt" "Kevin" "Junior"
      1995 "C" "Philip" "Morris" "Matt" "Kevin" "Junior"
      1996 "C" "Philip" "Matt" "Junior" "Marco"
      2010 "C" "Junior" "Marco"
      end
      
      reshape long exec_name, i( organization year ) j(todrop)
      drop todrop
      drop if exec_name==""
      preserve
      bys organization (year): egen fyear= min(year)
      keep if year== fyear
      drop year
      
      *EXPAND N where N is total number of years
      expand 4
      
      *YEAR STARTS IN 1994, ENDS IN 2010 WITH GAPS
      bys organization exec_name: gen year= 1993+_n
      replace year = 2010 if year== year[_N]
      drop fyear
      *SAVE
      tempfile fyear
      save `fyear'
      
      *RESTORE AND MERGE
      restore
      merge 1:1 organization exec_name year using `fyear'
      gen present= 0
      replace present= 1 if _merge==3
      replace present =2 if _merge==1
      drop _merge
      sort organization year
      list, sepby(organization year)

      Code:
       
      . list, sepby(organization year)
      
           +--------------------------------------+
           | organi~n   year   exec_n~e   present |
           |--------------------------------------|
        1. |        B   1994        Jim         1 |
        2. |        B   1994     Albert         1 |
        3. |        B   1994       Mark         1 |
        4. |        B   1994     Jordan         1 |
        5. |        B   1994     Andrew         1 |
        6. |        B   1994        Ben         1 |
           |--------------------------------------|
        7. |        B   1995     Jordan         1 |
        8. |        B   1995        Jim         1 |
        9. |        B   1995     Andrew         0 |
       10. |        B   1995        Ben         1 |
       11. |        B   1995     Albert         1 |
       12. |        B   1995       Mark         0 |
           |--------------------------------------|
       13. |        B   1996     Albert         0 |
       14. |        B   1996        Ben         1 |
       15. |        B   1996      James         2 |
       16. |        B   1996     Jordan         0 |
       17. |        B   1996        Jim         1 |
       18. |        B   1996       Mark         0 |
       19. |        B   1996     Andrew         0 |
           |--------------------------------------|
       20. |        B   2010       Mark         0 |
       21. |        B   2010      James         2 |
       22. |        B   2010        Ben         0 |
       23. |        B   2010     Albert         0 |
       24. |        B   2010        Jim         1 |
       25. |        B   2010     Jordan         0 |
       26. |        B   2010     Andrew         0 |
           |--------------------------------------|
       27. |        C   1994      Kevin         1 |
       28. |        C   1994       Matt         1 |
       29. |        C   1994     Junior         1 |
       30. |        C   1994     Morris         1 |
       31. |        C   1994     Philip         1 |
           |--------------------------------------|
       32. |        C   1995     Junior         1 |
       33. |        C   1995     Morris         1 |
       34. |        C   1995      Kevin         1 |
       35. |        C   1995       Matt         1 |
       36. |        C   1995     Philip         1 |
           |--------------------------------------|
       37. |        C   1996     Junior         1 |
       38. |        C   1996       Matt         1 |
       39. |        C   1996      Kevin         0 |
       40. |        C   1996      Marco         2 |
       41. |        C   1996     Philip         1 |
       42. |        C   1996     Morris         0 |
           |--------------------------------------|
       43. |        C   2010      Marco         2 |
       44. |        C   2010       Matt         0 |
       45. |        C   2010      Kevin         0 |
       46. |        C   2010     Philip         0 |
       47. |        C   2010     Morris         0 |
       48. |        C   2010     Junior         1 |
           +--------------------------------------+

      present=1 identifies those who are still there; 0 left; 2 new member. So a conversation in 2010 between Kevin and Matt who were former employees of C would go something like "You know Junior is still with the firm! Say what? Yeah and Marco also who joined a year after you left."



      Last edited by Andrew Musau; 13 May 2018, 10:01.

      Comment


      • #4
        Thank you William and Andrew!

        I see how reshaping would work and make it easier to eventually achieve what I want. However, because it's quite a large dataset (1,004 variables and 19,664 observations in wide format) and I need to merge it with an even larger dataset (also in wide format) later on, I found another work around;

        code:

        Code:
        foreach v of varlist exec_name1-exec_name249 {
        gen NG`v' = 0
        foreach m  of varlist exec_name1-exec_name249 {
        replace NG`v' = NG`v'+1 if `v'!="" & year!="2010" & `v'!= `m'[_n+1]
        }
        }
        
        foreach v of varlist NGexec_name1-NGexec_name249{
        gen gone`v'=0
        replace gone`v'=1 if `v'==249
        }
        Thanks again!

        Comment

        Working...
        X