Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    I don't understand the new data organization. What on earth is that variable Year doing in there? In the example data, it isn't even a variable--it's just a constant 1950. But assuming in the full data set it does vary, it seems to have no connection whatsoever to any of the other variables. The other variables are chronologically related only to the Date and Year StartRole and EndRole variables. The variable Year should just be dropped from the data set: it is completely out of place.

    In addition, I do not understand clearly what you want to do with the percentage variables. Do you want a total for each manager separately in each company and year? Or do you want a grand total for all the female managers, and another grand total for all the male mangers in each company in year?

    Comment


    • #17
      Thank you for your response.
      I need percentage for each manager in each company each year for example:
      company3 manager1 1950 33%
      company3 manager 1 1951 100%
      company3 manager 1 1952 65% these percentages are calculated based on the fraction of each year a manager are in each company.

      For year, a part of my data is financial ratios for each company from 1950 to 2022. I can't remove it as the year of start and end role don't match them completely. For example I have financial ratio for 1950 but its possible that I don't have any start/end role in 1950. in the other words the years I have financial ratios for are more than the years I have start and end role of managers. So could I ask you please change your loop command in a way that I can keep "Year"?
      I even tried to calculate them in 2 different files and then merge them together but I don't know why it cant match whole the data( I tried merge 1:1, 1:m and m:1)

      Last edited by Fimi Karimi; 19 Jul 2023, 17:27.

      Comment


      • #18
        OK. Since you didn't include those financial variables in your example, I didn't know they exist and the Year variable stuck out as being unrelated to anything else and there was no apparent reason for it to be there.

        I'm still not completely sure I understand what you want the final result to look like, but perhaps it is this:
        Code:
        clear*
        input long(CompanyID MnagerID) int(Year DateStartRole YearStartRole) double StartYearPercentage int(DateEndRole YearEndRole) double EndYearPercentage long Gender
        14714 36350 1950 17481 2007 .13972602739726028 17584 2008 .14246575342465753 0
        24404 274872 1950 20212 2015 .663013698630137 22715 2022 .18904109589041096 0
        3268333 647023 1950 22175 2020 .2876712328767123 23142 2023 .3589041095890411 0
        77289 140814 1950 17106 2006 .16712328767123288 18596 2010 .9123287671232877 0
        2957 341389 1950 15857 2003 .5863013698630137 16192 2004 .3315068493150685 0
        1216462 731518 1950 16751 2005 .13972602739726028 17938 2009 .1095890410958904 0
        19249 27228 1950 13788 1997 .25205479452054796 14792 2000 .4986301369863014 0
        2215970 602625 1950 20089 2015 1 23376 2024 1 0
        2684926 766264 1950 21027 2017 .4328767123287671 21896 2019 .947945205479452 0
        8542 1269387 1950 20676 2016 .3917808219178082 22022 2020 .29315068493150687 0
        3471954 2680690 1950 23001 2022 .0273972602739726 23376 2024 1 0
        734725 371154 1950 16497 2005 .8356164383561644 23376 2024 1 0
        31761 1013779 1950 19724 2014 1 23376 2024 1 0
        3230 310464 1950 16482 2005 .8767123287671232 23376 2024 1 1
        31338 320152 1950 16657 2005 .3972602739726027 19484 2013 .3424657534246575 0
        810610 2142863 1950 21670 2019 .6712328767123288 23376 2024 1 1
        23104 535473 1950 14441 1999 .46301369863013697 14976 2001 0 0
        1949535 33059 1950 19480 2013 .6684931506849315 23010 2022 .9972602739726028 0
        1020882 1055600 1950 18435 2010 .5287671232876713 23376 2024 1 0
        3061398 1601311 1950 21755 2019 .4383561643835616 22280 2020 1 0
        2066595 345640 1950 19851 2014 .6520547945205479 21497 2018 .8547945205479452 0
        15053 57816 1950 13393 1996 .3315068493150685 15854 2003 .4054794520547945 0
        4913 180576 1950 13970 1998 .7534246575342466 16719 2005 .7726027397260274 0
        1635432 34517 1950 18345 2010 .7753424657534247 22791 2022 .3972602739726027 0
        603630 373308 1950 16284 2004 .41643835616438357 17960 2009 .16986301369863013 0
        6357 41901 1950 15116 2001 .6164383561643836 18725 2011 .26575342465753427 0
        2776429 220567 1950 21228 2018 .5643835616438356 21434 2018 .5643835616438356 0
        570234 1538188 1950 20227 2015 .6219178082191781 23376 2024 1 0
        27970 202652 1950 21208 2018 .936986301369863 23376 2024 1 0
        32315 595234 1950 15706 2003 1 19191 2012 .5424657534246575 0
        3454 49346 1950 10227 1988 1 14335 1999 .2465753424657534 0
        2402253 337217 1950 21270 2018 .10136986301369863 21307 2018 .10136986301369863 0
        23165 104112 1950 13844 1997 .09863013698630137 16433 2004 .9917808219178083 0
        1967739 1017642 1950 20963 2017 .6082191780821918 23376 2024 1 0
        25680 320153 1950 15620 2002 .2356164383561644 19170 2012 .4849315068493151 0
        4873 64197 1950 15669 2002 .10136986301369863 17166 2006 .9972602739726028 0
        31132 546904 1950 18994 2012 .9972602739726028 22370 2021 .24383561643835616 1
        19601 2135197 1950 21768 2019 .40273972602739727 23376 2024 1 0
        790219 205102 1950 16588 2005 .5863013698630137 17245 2007 .2136986301369863 0
        127647 64197 1950 15805 2003 .7287671232876712 18828 2011 .547945205479452 0
        28170 1535236 1950 20207 2015 .6767123287671233 23376 2024 1 0
        1025143 64197 1950 20485 2016 .915068493150685 21677 2019 .34794520547945207 0
        1984187 1948857 1950 21867 2019 .13150684931506848 22400 2021 .32602739726027397 1
        1682362 1457678 1950 21537 2018 .03561643835616438 23376 2024 1 0
        167 1818027 1950 12054 1993 1 23376 2024 1 1
        1212957 1124709 1950 18808 2011 .5068493150684932 19026 2012 .09041095890410959 0
        7518 59621 1950 11413 1991 .7534246575342466 17416 2007 .6821917808219178 0
        1230126 1520380 1950 18049 2009 .5863013698630137 23376 2024 1 0
        9281 274510 1950 14579 1999 .08493150684931507 14955 2000 .9452054794520548 1
        26166 31196 1950 17986 2009 .7589041095890411 18416 2010 .4191780821917808 1
        25615 331918 1950 12667 1994 .32054794520547947 15973 2003 .7315068493150685 0
        494219 34164 1950 19332 2012 .07123287671232877 22420 2021 .38082191780821917 0
        26718 332820 1950 13057 1995 .25205479452054796 17304 2007 .37534246575342467 0
        125402 203099 1950 15580 2002 .3452054794520548 17218 2007 .13972602739726028 0
        13136 760730 1950 12874 1995 .7534246575342466 20822 2017 .005479452054794521 0
        1206647 750450 1950 11688 1992 1 17917 2009 .052054794520547946 0
        1937998 762770 1950 20586 2016 .6383561643835617 23376 2024 1 0
        1146485 274693 1950 15706 2003 1 19408 2013 .13424657534246576 0
        1226156 444129 1950 15706 2003 1 17481 2007 .8602739726027397 0
        25597 332813 1950 14792 2000 .5013698630136987 20096 2015 .019178082191780823 1
        7588 484097 1950 20866 2017 .873972602739726 22462 2021 .4958904109589041 0
        1584110 334721 1950 18172 2009 .2493150684931507 20040 2014 .8657534246575342 0
        25680 273657 1950 20305 2015 .40821917808219177 21529 2018 .9424657534246575 0
        2671 38526 1950 16274 2004 .4438356164383562 20235 2015 .4 0
        33237 91477 1950 18877 2011 .3178082191780822 19342 2012 .9561643835616438 0
        753102 13204 1950 16649 2005 .4191780821917808 18794 2011 .4547945205479452 0
        3061358 2505633 1950 22475 2021 .4684931506849315 23376 2024 1 1
        2034213 441136 1950 19940 2014 .40821917808219177 20388 2015 .8191780821917808 0
        2469 38526 1950 17402 2007 .3561643835616438 19631 2013 .7452054794520548 0
        3204648 893094 1950 21384 2018 .4547945205479452 23376 2024 1 0
        25578 55217 1950 15749 2003 .8821917808219178 16925 2006 .336986301369863 0
        27358 1351667 1950 21761 2019 .42191780821917807 23376 2024 1 0
        2366 320145 1950 16342 2004 .25753424657534246 17189 2007 .06027397260273973 0
        857145 554582 1950 17056 2006 .3041095890410959 18203 2009 .8356164383561644 0
        2381322 1948985 1950 21124 2017 .16712328767123288 23376 2024 1 0
        9519 623177 1950 20728 2016 .2493150684931507 22701 2022 .1506849315068493 0
        1601866 592608 1950 21236 2018 .8602739726027397 21732 2019 .4986301369863014 0
        27411 1008473 1950 22273 2020 .019178082191780823 23376 2024 1 0
        23165 63432 1950 22264 2020 .043835616438356165 23376 2024 1 0
        942661 732268 1950 15839 2003 .6356164383561644 16517 2005 .2191780821917808 0
        754662 1337279 1950 15497 2002 .5726027397260274 19887 2014 .4465753424657534 0
        31863 84307 1950 15778 2003 .8027397260273973 21306 2018 .3315068493150685 0
        11888 1126710 1950 18864 2011 .35342465753424657 21353 2018 .4602739726027397 0
        2718020 1279546 1950 22120 2020 .4383561643835616 23376 2024 1 0
        646483 485045 1950 16875 2006 .8 19556 2013 .5397260273972603 0
        33332 450785 1950 11323 1991 1 17169 2007 .005479452054794521 0
        947097 332663 1950 21216 2018 .915068493150685 23376 2024 1 0
        2273 35837 1950 15157 2001 .5041095890410959 18322 2010 .16164383561643836 0
        3393193 1544657 1950 22414 2021 .6356164383561644 23376 2024 1 0
        26883 34015 1950 17496 2007 .09863013698630137 19666 2013 .8410958904109589 0
        2165093 1255358 1950 20030 2014 .16164383561643836 21697 2019 .40273972602739727 0
        17178 140785 1950 13270 1996 .6684931506849315 14245 1999 0 0
        605072 731838 1950 14731 2000 .6684931506849315 15127 2001 .4136986301369863 0
        21520 86932 1950 13911 1998 .915068493150685 15826 2003 .3287671232876712 0
        624022 1395037 1950 20051 2014 .10410958904109589 20468 2016 .038356164383561646 0
        3181143 271656 1950 22740 2022 .7424657534246575 23376 2024 1 0
        2458681 324474 1950 21125 2017 .1643835616438356 23376 2024 1 0
        29776 82957 1950 9132 1985 1 14774 2000 .44931506849315067 0
        1210655 2648152 1950 22036 2020 .6684931506849315 23376 2024 1 0
        712 332941 1950 16859 2006 .8438356164383561 18612 2010 .9561643835616438 0
        end
        format %tdnn/dd/CCYY DateStartRole
        format %tdnn/dd/CCYY DateEndRole
        label values Gender sex
        label def sex 1 "F", modify
        label def sex 0 "M", modify
        
        //    CHECK THAT DATETIME VARIABLES ARE COHERENT
        foreach x in Start End {
            assert year(Date`x'Role) == Year`x'Role
        }
        assert DateEndRole >= DateStartRole
        
        preserve
        
        
        //    GET NUMBER OF DISTINCT MALE, FEMALE, AND TOTAL MANGERS EACH YEAR
        summ YearStartRole, meanonly
        local low = r(min)
        summ YearEndRole, meanonly
        local high = r(max)
        
        forvalues y = `low'/`high' {
            frame put CompanyID MnagerID Gender if inrange(`y', YearStartRole, YearEndRole), into(working)
            frame working {
                by CompanyID Gender (MnagerID), sort: gen n_managers_`y' = sum(MnagerID != MnagerID[_n-1])
                by CompanyID Gender (MnagerID): keep if _n == _N
            }
            frlink m:1 CompanyID Gender, frame(working)
            frget n_managers_`y', from(working)
            drop working
            frame drop working
            foreach x in M F {
                by CompanyID, sort: egen n_managers_`x'_`y' = max(cond(Gender == "`x'":sex, ///
                    n_managers_`y', .))
            }
            mvencode n_managers_*_`y', mv(0)
            replace n_managers_`y' = n_managers_M_`y' + n_managers_F_`y'
            order n_managers_M_`y' n_managers_F_`y', after(n_managers_`y')
        }
        keep CompanyID n_managers*
        by CompanyID, sort: keep if _n == 1
        reshape long n_managers_ n_managers_M_ n_managers_F_, i(CompanyID) j(Year)
        rename *_ *
        tempfile results
        save `results'
        
        restore , preserve
        
        
        //    NOW CALCULATE PERCENT PRESENCE IN EACH YEAR FOR EACH MANAGER
        forvalues y = `low'/`high' {
            frame put CompanyID MnagerID Gender DateStartRole DateEndRole ///
                if inrange(`y', YearStartRole, YearEndRole), into(working)
            frame working {
                gen start = max(DateStartRole, mdy(1, 1, `y'))
                gen end = min(DateEndRole, mdy(12, 31, `y'))
                gen percent_in_year`y' = 100*(end-start)/cond(isleapyear(`y'), 366, 365)
                collapse (sum) percent_in_year`y', by(CompanyID MnagerID)
            }
            frlink m:1 CompanyID Mnager, frame(working)
            frget percent_in_year`y', from(working)
            drop working
            frame drop working
        }
        keep CompanyID MnagerID percent_in_year*
        reshape long percent_in_year, i(CompanyID MnagerID) j(Year)
        drop if missing(percent_in_year) | percent_in_year == 0
        merge m:1 CompanyID Year using `results', nogenerate
        save `results', replace
        
        restore
        
        rangejoin Year YearStartRole YearEndRole using `results', by(CompanyID MnagerID)
        If this is not what you are looking for, when you post back, please show what it should look like.

        Comment


        • #19
          Thank you, Clyde. I run this command.
          the first section was ok. the second section took 4 hours but finally the results was wrong. I mean in previous command in #13 every thing was correct there were just a problem with (Year) that I explained to you. but in this new command the number of managers is wrong and after reshaping the data all other variables are removed.


          Data Wide -> Long
          -----------------------------------------------------------------------------
          Number of observations 13,670 -> 1,230,300
          Number of variables 271 -> 5
          j variable (90 values) -> Year
          xij variables:
          n_managers_1935 n_managers_1936 ... n_managers_2024->n_managers_
          n_managers_M_1935 n_managers_M_1936 ... n_managers_M_2024->n_managers_M_
          n_managers_F_1935 n_managers_F_1936 ... n_managers_F_2024->n_managers_F_
          -----------------------------------------------------------------------------

          . rename *_ *

          . tempfile results

          . save `results'
          . restore , preserve
          nothing to restore
          r(622);

          sum n_managers
          Variable | Obs Mean Std. dev. Min Max
          -------------+---------------------------------------------------------
          n_managers | 1,230,300 .1539608 .5799274 0 11

          in previous command (#13) Max was 33. Could you guess why this problem has happend?

          Comment


          • #20
            When I run the code in #18 with the example data shown there, it gives no error messages. And while I have not checked every company ID and year to verify the correctness of the number of managers by hand, in the handful that I did check, they are all correct.

            I am particularly mystified by the "nothing to restore" message. The data are preserved at the very beginning once the data are read in and a few assumptions checked. Somehow your preserved data got lost by the time the -restore- command was reached. One way that can happen is if you are running this code in chunks or one line at a time. This code must be run without interruption from beginning to end. Otherwise the local macros will not work properly, and neither will -preserve- and -restore-. So perhaps this accounts for all of the problems you are having. Run it uninterrupted start to finish.

            If that does not resolve the problems, please post back with example data that reproduces the problems you are encountering.

            As an aside, in a large data set you can speed things up a bit by using -tolong- or -greshape- instead of -reshape long-. Both of them support the same syntax as -collapse-, so learning to use them is pretty effortless. -tolong- is available from SSC. So is -greshape-, as part of the -gtools- package. The -gtools- package also contains -gcollapse-, which runs faster than Stata's -collapse- in large data sets.
            Last edited by Clyde Schechter; 21 Jul 2023, 11:47.

            Comment

            Working...
            X