Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Dividing and grouping

    Hi dear statalists,

    There are below the data
    Code:
    YEAR     CAE        CAE      CAE         CAE
              01        02       03         04
    2010    6,037.6    1,010.6    534.5    1,360.2
    2011    6,079.8    1,090.4    573.0    1,334.9
    2012    6,211.3    1,118.0    563.4    1,277.0
    2013    6,323.0    1,206.3    553.0    1,162.4
    2014    6,417.3    1,213.2    520.5    1,113.1
    2015    6,631.7    1,273.0    502.9    1,116.4
    2016    6,634.7    1,258.8    570.5    1,121.1
    2017    7,074.2    1,265.1    612.4    1,245.6
    2018    7,304.7    1,346.9    587.6    1,356.6
    2019    7,713.6    1,311.0    646.1    1,326.0
    2020    7,754.9    1,300.3    596.1    1,247.2
    First, I am going to divide the current value by the previous values for each column i.e: 6,079.8 /6,037.6 = 1.006989532
    6,211.3 /6,079.8 = 1.0216290009 so on and so forth.
    Second, I am going to do:
    Code:
    gen Wanted=.
    
        replace wanted=1.600989532   if YEAR==2010 & CAE2==01
        replace wanted=1.021629000  if YEAR==2010 & CAE2==01
        replace wanted=              if YEAR==2010 & CAE2==01
        replace wanted=              if YEAR==2010 & CAE2==01
          .
          .
          .
    
        replace wanted=               if YEAR==2020 & CAE2==04
    Could anyone assist me to create a pretty shortcut path, please?

    Cheers,
    Paris

  • #2
    Please use dataex to provide example data from a dataset imported into Stata. Try to copy and paste your example data in post #1 (omitting the header columns) into Stata's Data Editor window and you will gain an appreciation for the difficulties caused by, for example, commas in the input data, not to mention that the variable names are not provided in a usable form.

    I assume you have a much larger dataset, so something like
    Code:
    dataex YEAR CAE1 CAE2 CAE3 CAE4 if FIRM==123456789
    will help you provide a usable sample of your data.

    Comment


    • #3
      • Please use dataex to post the sample data. As of now that data structure is unclear. There are four CAE variables (which is impossible) and then code part 2 has "CAE2" which does not exist in the data example. The identity of "01" to "04" in part one is also unclear. Are they part of the variable names or actual row of data? Use help dataex to learn more.
      • Please proofread code part 2. Are they really all 2010 in the first four lines? And as described above, what are "CAE2" and "01" to "04"? It implies that the data is in a long form with CAE2 taking values of 1, 2, 3, and 4, but it's incoherent with the data.

      Comment


      • #4
        And also: you appear to have data for 11 years 2010 through 2020 which means you will have the ratios of current value to previous value for 10 years - presumably 2011 through 2020, since there is no "previous value" for 2010. Yet your code suggests you expect to assign values for all 11 years. How is this to be?

        Comment


        • #5
          Thank you for getting back to me prof William and ken. As Prof William mentioned the data is massive I brought a few of them. Thank you for your help.

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input int Year double(CAE_01 CAE_02)
          2010 6037.614 1010.629
          2011 6079.775 1090.424
          2012 6211.331 1118.038
          2013 6323.027 1206.281
          2014 6417.345 1213.217
          2015 6631.652 1272.966
          2016 6634.715 1258.811
          2017 7074.213 1265.114
          2018 7304.728 1346.924
          2019  7713.58 1311.046
          2020 7754.943 1300.256
          end

          Comment


          • #6
            Originally posted by William Lisowski View Post
            And also: you appear to have data for 11 years 2010 through 2020 which means you will have the ratios of current value to previous value for 10 years - presumably 2011 through 2020, since there is no "previous value" for 2010. Yet your code suggests you expect to assign values for all 11 years. How is this to be?
            Yes you are absolutely right.
            (10 years is fine in this case).
            so It begins by
            Code:
             gen Wanted=. replace Wanted 6079.775/ 6037.614 if year==2011 and CAE_01
            Last edited by Paris Rira; 23 Feb 2023, 09:13.

            Comment


            • #7
              This may be a possibility:

              Code:
              clear
              input int Year double(CAE_01 CAE_02)
              2010 6037.614 1010.629
              2011 6079.775 1090.424
              2012 6211.331 1118.038
              2013 6323.027 1206.281
              2014 6417.345 1213.217
              2015 6631.652 1272.966
              2016 6634.715 1258.811
              2017 7074.213 1265.114
              2018 7304.728 1346.924
              2019  7713.58 1311.046
              2020 7754.943 1300.256
              end
              
              * If the ratio is to be attached to the earlier year:
              foreach x in 01 02{
                  gen wanted_`x' = .
                  replace wanted_`x' = CAE_`x'[_n+1] / CAE_`x'
              }
              
              * If the ratio is to be attached to the later year:
              foreach x in 01 02{
                  gen wanted2_`x' = .
                  replace wanted2_`x' = CAE_`x' / CAE_`x'[_n-1]
              }
              Results:
              Code:
                   +------------------------------------------------------------------------+
                   | Year     CAE_01     CAE_02   wan~d_01   wan~d_02   wan~2_01   wan~2_02 |
                   |------------------------------------------------------------------------|
                1. | 2010   6037.614   1010.629   1.006983   1.078956          .          . |
                2. | 2011   6079.775   1090.424   1.021638   1.025324   1.006983   1.078956 |
                3. | 2012   6211.331   1118.038   1.017983   1.078927   1.021638   1.025324 |
                4. | 2013   6323.027   1206.281   1.014917    1.00575   1.017983   1.078927 |
                5. | 2014   6417.345   1213.217   1.033395   1.049248   1.014917    1.00575 |
                6. | 2015   6631.652   1272.966   1.000462   .9888803   1.033395   1.049248 |
                7. | 2016   6634.715   1258.811   1.066242   1.005007   1.000462   .9888803 |
                8. | 2017   7074.213   1265.114   1.032585   1.064666   1.066242   1.005007 |
                9. | 2018   7304.728   1346.924   1.055971    .973363   1.032585   1.064666 |
               10. | 2019    7713.58   1311.046   1.005362   .9917699   1.055971    .973363 |
               11. | 2020   7754.943   1300.256          .          .   1.005362   .9917699 |
                   +------------------------------------------------------------------------+

              Comment


              • #8
                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input int Year double(CAE_01 CAE_02)
                2010 6037.614 1010.629
                2011 6079.775 1090.424
                2012 6211.331 1118.038
                2013 6323.027 1206.281
                2014 6417.345 1213.217
                2015 6631.652 1272.966
                2016 6634.715 1258.811
                2017 7074.213 1265.114
                2018 7304.728 1346.924
                2019  7713.58 1311.046
                2020 7754.943 1300.256
                end
                // I should have asked for this variable
                generate Firm = 101, before(Year) 
                // this is the good part
                sort Firm Year
                foreach sfx in 01 02 {
                    generate ratio_`sfx' = CAE_`sfx'/CAE_`sfx'[_n-1]
                }
                list, clean noobs
                Code:
                . list, clean noobs
                
                    Firm   Year     CAE_01     CAE_02   ratio_01   ratio_02  
                     101   2010   6037.614   1010.629          .          .  
                     101   2011   6079.775   1090.424   1.006983   1.078956  
                     101   2012   6211.331   1118.038   1.021638   1.025324  
                     101   2013   6323.027   1206.281   1.017983   1.078927  
                     101   2014   6417.345   1213.217   1.014917    1.00575  
                     101   2015   6631.652   1272.966   1.033395   1.049248  
                     101   2016   6634.715   1258.811   1.000462   .9888803  
                     101   2017   7074.213   1265.114   1.066242   1.005007  
                     101   2018   7304.728   1346.924   1.032585   1.064666  
                     101   2019    7713.58   1311.046   1.055971    .973363  
                     101   2020   7754.943   1300.256   1.005362   .9917699

                Comment


                • #9
                  Thank you so much. Both worked perfectly.
                  The major time consuming part is the second part which I am going to group them. Since there are too many columns if I want to use
                  Code:
                  g wanted==.
                       replace wanted=   if YEAR==2017 & raio_01
                      replace wanted=    if YEAR==2017 & ratio_02
                  
                  .
                  .
                  .
                  It would take ages. Do you know any shortcuts please?

                  Comment


                  • #10
                    Originally posted by Paris Rira View Post
                    Thank you so much. Both worked perfectly.
                    The major time consuming part is the second part which I am going to group them. Since there are too many columns if I want to use
                    Code:
                    g wanted==.
                    replace wanted= if YEAR==2017 & raio_01
                    replace wanted= if YEAR==2017 & ratio_02
                    
                    .
                    .
                    .
                    It would take ages. Do you know any shortcuts please?
                    This is not making sense to me. "wanted =" what number? It's immediately followed by an "if"?

                    To move this forward, maybe better to type the "wanted" column yourself to show the grouping method?

                    Comment


                    • #11
                      Code:
                      g wanted= .
                      replace wanted =1.006983 if  year==2011& ratio_01
                      Later I am going to use -wanted- in another formula
                      for example:
                      Code:
                      bys idemp1: replace real_capital = net/(wanted/100) if _n==1
                      according to -if i will use the proper -wanted-



                      Last edited by Paris Rira; 23 Feb 2023, 09:45.

                      Comment


                      • #12
                        I agree. You have, in this example, two different variables, ratio_01 and ratio_02, that you want to combine into a single variable.
                        Added in edit:
                        Code:
                        if  year==2011& ratio_01
                        does not mean anything - or rather, it means
                        Code:
                        if  year==2011& ratio_01!=0
                        Perhaps you are unaware of Stata's reshape command. If it is new to you, you should read the documentation - the full PDF documentation linked to from the top of the output of
                        Code:
                        help reshape
                        is a better introduction than the shorter output from help, which assumes you know about reshape and just need a refresher on syntax.

                        Maybe this is what you want to accomplish.
                        Code:
                        // I should have asked for this variable
                        generate Firm = 101, before(Year)
                        // this is the good part
                        sort Firm Year
                        foreach sfx in 01 02 {
                            generate ratio_`sfx' = CAE_`sfx'/CAE_`sfx'[_n-1]
                        }
                        list, clean noobs
                        reshape long CAE_ ratio_, i(Firm Year) j(var) string // string option for leading zero
                        rename (CAE_ ratio_) (CAE ratio)
                        destring(var), replace
                        sort Firm var Year
                        list, noobs sepby(Firm var)
                        Code:
                        . list, noobs sepby(Firm var)
                        
                          +-----------------------------------------+
                          | Firm   Year   var        CAE      ratio |
                          |-----------------------------------------|
                          |  101   2010     1   6037.614          . |
                          |  101   2011     1   6079.775   1.006983 |
                          |  101   2012     1   6211.331   1.021638 |
                          |  101   2013     1   6323.027   1.017983 |
                          |  101   2014     1   6417.345   1.014917 |
                          |  101   2015     1   6631.652   1.033395 |
                          |  101   2016     1   6634.715   1.000462 |
                          |  101   2017     1   7074.213   1.066242 |
                          |  101   2018     1   7304.728   1.032585 |
                          |  101   2019     1    7713.58   1.055971 |
                          |  101   2020     1   7754.943   1.005362 |
                          |-----------------------------------------|
                          |  101   2010     2   1010.629          . |
                          |  101   2011     2   1090.424   1.078956 |
                          |  101   2012     2   1118.038   1.025324 |
                          |  101   2013     2   1206.281   1.078927 |
                          |  101   2014     2   1213.217    1.00575 |
                          |  101   2015     2   1272.966   1.049248 |
                          |  101   2016     2   1258.811   .9888803 |
                          |  101   2017     2   1265.114   1.005007 |
                          |  101   2018     2   1346.924   1.064666 |
                          |  101   2019     2   1311.046    .973363 |
                          |  101   2020     2   1300.256   .9917699 |
                          +-----------------------------------------+
                        Last edited by William Lisowski; 23 Feb 2023, 09:49.

                        Comment


                        • #13
                          Each year has 30 columns, so each year has 30 different values for wanted. i.e. year 2011 and ratio_01-- year2011 and ratio_02-- year 2011 and ratio_03 ....., year 2020 and ratio_20

                          Comment


                          • #14
                            Code:
                            g wanted=.
                            replace wanted=1  if YEAR==2011 & CAE_01
                                replace wanted=1 if YEAR==2011 & CAE_02
                                replace wanted=1 if YEAR==2011 & CAE_03
                                replace wanted=1 if YEAR==2011 & CAE_04
                                replace wanted=1 if YEAR==2011 & CAE_05
                                replace wanted=1 if YEAR==2011 & CAE_06
                                replace wanted=1 if YEAR==2011 & CAE_07
                                replace wanted=1 if YEAR==2011 & CAE_08
                                replace wanted=1 if YEAR==2011 & CAE_09
                                replace wanted=1 if YEAR==2011 & CAE_10
                                replace wanted=1 if YEAR==2011 & CAE_11
                                replace wanted=1 if YEAR==2011 & CAE_12
                                replace wanted=1 if YEAR==2011 & CAE_13
                                replace wanted=1 if YEAR==2011 & CAE_14
                                replace wanted=1 if YEAR==2011 & CAE_15
                                replace wanted=1 if YEAR==2011 & CAE_16
                                replace wanted=1 if YEAR==2011 & CAE_17
                                replace wanted=1 if YEAR==2011 & CAE_18
                                replace wanted=1 if YEAR==2011 & CAE_19
                               .
                              .
                            
                                
                                replace wanted=1 if YEAR==2011 & CAE_29
                                replace wanted=1 if YEAR==2011 & CAE_30
                            it is only for one year. I have do for 10 years, this process. I seek to not repeat this long typing process

                            Comment


                            • #15
                              Please don't overlook post #12.

                              Comment

                              Working...
                              X