
  • Issues with scalar in a loop: a loop to create a variable that contains the maximum value of a certain variable in each yr-mth-decile?

    Dear Statalist
    I want to generate a variable that contains the maximum beginning market value of firms in the same decile, year and month that are traded on a certain stock exchange (abbreviated as N).

    * First, I created my deciles for firms traded on N (New York Stock Exchange); the exchange identifier variable is primexch. Deciles are based on beginning market values, and crmv is the firm's market value:

    gen l_crmv=l.crmv
    egen sizedecile = xtile(crmv) if primexch=="N", by(yr mth) p(1/9)

    * Now I want to create a variable that contains the maximum beginning market value for each decile in a given year and month. Without a loop, I think I would have to do the following:

    sum l_crmv if sizedecile==1 & yr==1990 & mth==1
    scalar max1=r(max)

    sum l_crmv if sizedecile==1 & yr==1990 & mth==2
    scalar max2=r(max)
    .....
    ....
    ....
    sum l_crmv if sizedecile==2 & yr==1990 & mth==1


    ** Obviously that is really time-consuming, especially since I have over 20 years in the sample and want to do it for each decile in each month of each year!
    I tried to create a loop:

    forval j = 1/N {
    gen max`j'=.
    }
    levelsof sizedecile, local(levels)

    foreach x of local levels {
    foreach z of numlist 1990/2012 {
    foreach y of numlist 1/12 {

    sum l_crmv if sizedecile== `x' & yr==`z' & mth==`y'

    forval j = 1/N {
    scalar max`j' = r(max)


    }
    }
    }
    }

    The loop is not correct: the nested loops only summarize the beginning market values for each decile in a given year and month, but they do not generate the variable I am looking for. The first loop is also inappropriate!

    I would appreciate your suggestions! I am sure there is a simple way, but I cannot think of it!


    Many thanks in advance
    Ahmed

  • #2
    There is no need for a loop here. What you seem to need is one command:

    Code:
     
    egen maxl_crmv = max(l_crmv), by(sizedecile yr mth)
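A small illustration of what this one-liner does, on made-up toy data (the values below are hypothetical): egen's max() with by() computes the maximum within each sizedecile-yr-mth cell and copies it to every observation of that cell, so neither a loop nor a scalar is needed.

```stata
* Toy data, made up for illustration
clear
input yr mth sizedecile l_crmv
1990 1 1  5
1990 1 1  8
1990 1 2 20
1990 2 1  7
end

* Each observation receives the maximum l_crmv of its sizedecile-yr-mth cell
egen maxl_crmv = max(l_crmv), by(sizedecile yr mth)
list, sepby(yr mth)
```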



    • #3
      Thanks Nick,
      Great as usual

      * The data contain firms traded on stock exchanges other than "N", so my code, egen sizedecile = xtile(crmv) if primexch=="N", by(yr mth) p(1/9), generated missing values for all firms whose primexch is not "N". I therefore changed your code so that it calculates the maximum lagged market values only for firms traded on "N", as follows:

      egen maxl_crmv = max(l_crmv) if sizedecile !=., by(sizedecile yr mth)

      * Now I want to assign firms that were NOT traded on the New York Stock Exchange "N", i.e. those with missing values for sizedecile (in each month and year), to the relevant deciles such that:
      For a given year and month,
      a firm with l_crmv less than or equal to the maxl_crmv of decile 1 will be assigned to decile 1;
      a firm with l_crmv higher than the maxl_crmv of decile 1 but equal to or lower than the maxl_crmv of decile 2 will be assigned to decile 2;
      and so on, for each month and year...

      Therefore I also ran the following code:

      egen minl_crmv = min(l_crmv) if sizedecile !=., by(sizedecile yr mth)

      However, I can't manage to replace the firms with missing sizedecile as described above. I tried this code; it runs, but the output shows 0 real changes!


      forval i = 1/`=_N' {
      local same yr[`i']==yr & mth[`i'] == mth

      if sizedecile==. {

      replace sizedecile= sizedecile in `i' if l_crmv>minl_crmv & l_crmv<=maxl_crmv & `same'


      }
      }

      Any more suggestions?




      • #4
        The inner command

        Code:
         
        if sizedecile == .
        does not do what you probably imagine it does. It is equivalent to

        Code:
         
        if sizedecile[1] == .
        The condition should, I guess, be transferred to the right-hand side of your replace statement.

        See http://www.stata.com/support/faqs/pr...-if-qualifier/
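The distinction can be seen on a tiny made-up dataset (hypothetical values): the if command is evaluated once, using the first observation, whereas the if qualifier is evaluated separately for every observation.

```stata
* Toy data, made up for illustration: observations 1 and 3 are missing
clear
input sizedecile
.
3
.
end

* if command: Stata evaluates sizedecile[1] only, once, for the whole dataset
if sizedecile == . {
    display "sizedecile[1] is missing, so this block runs once"
}

* if qualifier: evaluated observation by observation
count if sizedecile == .
display "observations with missing sizedecile: " r(N)
```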



        • #5
          Thanks Nick, I read the link you referred to and can see how this can be problematic. I tried to use something other than (if sizedecile==.) as follows:


          levelsof yr, local(levels)
          foreach y of local levels {
          foreach m of numlist 1/12 {

          replace sizedecile= sizedecile if l_crmv > minl_crmv & l_crmv <= maxl_crmv & yr==`m' & mth==`m' & sizedecile==.
          }
          }


          Unfortunately, it gives no real changes.


          I think the problem is how to link the l_crmv of the firms with missing sizedecile to the range of l_crmv of the appropriate deciles, so that they can be replaced!

          I am sure there must be a way to do it... Any suggestions?

          Ahmed



          • #6
            I would need to hold all the details of your post in my head simultaneously to give a good answer, but

            Code:
             
            replace sizedecile= sizedecile
            will do nothing useful. The if qualifier doesn't change that principle.



            • #7
              I can see that the code you pointed to will not perform the task, but I really can't think of any other way to replace those missing values.
              Yes, please take your time, Nick; I appreciate any help.

              Ahmed



              • #8
                Hi Ahmed,
                So I see the problem here is that your "sizedecile" variable is missing for firms that trade on a different stock market. Correct?
                A possible solution could be as follows:
                egen t=group(yr mth)
                levelsof t, local(t)
                foreach i of local t {
                _pctile l_crmv if primexch=="N" & t==`i', p(10 20 30 40 50 60 70 80 90)
                replace sizedecile=1 if t==`i' & l_crmv<=r(r1)
                replace sizedecile=2 if t==`i' & l_crmv<=r(r2) & l_crmv>r(r1)
                **And so on
                }


                HTH
                Fernando
                Last edited by FernandoRios; 07 May 2014, 15:05.



                • #9
                  Thanks Fernando,
                  The problem is not exactly firms that start trading at a later date, but firms that trade on another stock exchange. In my data set, I have firms that trade on three stock exchanges, and I want to create portfolios (size deciles) based on lagged market values for each month and year. If I create those deciles using firms from all stock exchanges, the ranking will not be fair, because firms traded on exchanges other than New York are smaller and will go to lower deciles. The recommended approach is to create the deciles each year and month based only on New York Stock Exchange firms, then determine the maximum market value of each of the 10 deciles, and then assign firms traded on the other stock exchanges to those previously created deciles, such that a firm is assigned to decile 1 if its lagged market value is lower than or equal to the maximum lagged market value of decile 1. This has to be done for each month and year...

                  Do you think your code will achieve the desired result, given my explanation above?

                  I have now tried this code:

                  gen l_crmv=l.crmv
                  egen sizedecile = xtile(l_crmv) if primexch=="N", by(yr mth) p(1/9)
                  egen maxl_crmv = max(l_crmv) if sizedecile !=., by(sizedecile yr mth)

                  ** Now I try a new loop again:

                  gen newdecile=.
                  levelsof yr, local(levels)

                  foreach y of local levels {
                  foreach m of numlist 1/12 {

                  replace newdecile= 1 if yr==`y' & mth==`m' & sizedecile==1 | sizedecile==. & l_crmv > minl_crmv & l_crmv <= maxl_crmv
                  replace newdecile= 2 if newdecile !=1 & yr==`y' & mth==`m' & sizedecile==2 | sizedecile==. & l_crmv > minl_crmv & l_crmv <= maxl_crmv
                  replace newdecile= 3 if newdecile !=1&2 & yr==`y' & mth==`m' & sizedecile==3 | sizedecile==. & l_crmv > minl_crmv & l_crmv <= maxl_crmv

                  * and so on
                  }
                  }

                  The code makes some real changes, but do you think it gives the desired output? I don't feel confident about what I am doing at this stage, but it might be correct. My idea was to exclude, each time, firms that have already been assigned to a newdecile before replacing for the next decile... Maybe I am wrong?

                  I appreciate your suggestions!!



                  • #10
                    Yes, sorry for the confusion. I meant to say at a different stock market, not at a different time (I have just corrected the post).
                    You still have problems with your code, but the solution, while long to type, is "simple":

                    gen newdecile=sizedecile
                    levelsof yr, local(levels)

                    foreach y of local levels {
                    foreach m of numlist 1/12 {
                    forvalues sd =1/10 {
                    sum l_crmv if yr==`y' & mth==`m' & sizedecile==`sd'
                    replace newdecile=`sd' if yr==`y' & mth==`m' & l_crmv > r(min) & l_crmv <= r(max) & newdecile==.
                    }
                    }
                    }
                    At the end, it's possible that some observations from other stock exchanges are not classified because they are smaller than the smallest NYSE firm. These should simply go to decile 1.
                    Hope this works.
                    Fernando
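One way to implement the NYSE-breakpoint procedure without looping over years and months is sketched below on made-up toy data. It assumes, as in the thread, that sizedecile holds the NYSE deciles and is missing for other exchanges; the toy exchange code "A", the local ndec (3 groups in the toy data, 10 for real deciles) and the variables bp1, bp2, ... are hypothetical. egen copies each month's breakpoint (the largest NYSE l_crmv in each decile) to every firm of that month, which links non-NYSE firms to the NYSE deciles. Each firm is then compared with the upper breakpoints in increasing order, so, unlike a range test of the form r(min) < l_crmv <= r(max), no firm can fall into a gap between one decile's maximum and the next decile's minimum.

```stata
* Toy data, made up for illustration: NYSE ("N") firms already carry sizedecile,
* firms on the hypothetical exchange "A" do not
clear
input yr mth str1 primexch l_crmv sizedecile
1990 1 "N" 10 1
1990 1 "N" 20 2
1990 1 "N" 30 3
1990 1 "A"  5 .
1990 1 "A" 15 .
1990 1 "A" 25 .
1990 1 "A" 99 .
1990 2 "N" 40 1
1990 2 "N" 50 2
1990 2 "N" 60 3
1990 2 "A" 45 .
end

* Number of groups: 3 in the toy data, 10 for real deciles
local ndec 3

* bp1, bp2, ... hold, for each yr-mth, the largest NYSE l_crmv in deciles 1, 2, ...
* copied to every firm of that yr-mth (double avoids rounding to float precision)
forvalues d = 1/`ndec' {
    egen double bp`d' = max(cond(sizedecile == `d', l_crmv, .)), by(yr mth)
}

* NYSE firms keep their decile; other firms go to the lowest group whose
* breakpoint they do not exceed
gen newdecile = sizedecile
forvalues d = 1/`ndec' {
    replace newdecile = `d' if missing(newdecile) & !missing(l_crmv) & !missing(bp`d') & l_crmv <= bp`d'
}

* Firms larger than the largest NYSE firm of their month go to the top group
replace newdecile = `ndec' if missing(newdecile) & !missing(l_crmv) & !missing(bp`ndec')

list yr mth primexch l_crmv sizedecile newdecile, sepby(yr mth)
```

With real data, the input ... end block would be dropped and ndec set to 10. Putting the largest non-NYSE firms into the top group is a design choice that may need adjusting.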



                    • #11
                      Fernando
                      I gave your last post a big like. Thanks a lot, I think it works properly. Here are my comments:

                      1- I found that newdecile contains missing values, as you mentioned; these are probably small stocks that fall below the NYSE breakpoints. To replace them I simply use the following code, with no need for any loop:
                      replace newdecile=1 if newdecile==.

                      Is that correct?
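Before setting every remaining missing newdecile to 1, it may be worth checking where the unassigned firms lie. With a range test of the form r(min) < l_crmv <= r(max), a non-NYSE firm can also stay unassigned because it is larger than the largest NYSE firm of its month, or because its value falls between one decile's maximum and the next decile's minimum; only firms at or below the smallest NYSE value clearly belong in decile 1. Note also that replace newdecile=1 if newdecile==. would put firms with missing l_crmv into decile 1. A sketch on made-up data; nysemin, nysemax, unassigned and gap_type are hypothetical names.

```stata
* Toy data, made up for illustration: three NYSE firms with deciles 1-3 and three
* non-NYSE ("A") firms left unassigned by a range test r(min) < l_crmv <= r(max)
clear
input yr mth str1 primexch l_crmv sizedecile newdecile
1990 1 "N" 10 1 1
1990 1 "N" 20 2 2
1990 1 "N" 30 3 3
1990 1 "A"  5 . .
1990 1 "A" 15 . .
1990 1 "A" 99 . .
end

* Smallest and largest NYSE l_crmv of each yr-mth, copied to all firms of that yr-mth
egen double nysemin = min(cond(!missing(sizedecile), l_crmv, .)), by(yr mth)
egen double nysemax = max(cond(!missing(sizedecile), l_crmv, .)), by(yr mth)

* Classify unassigned firms that have a known l_crmv and NYSE firms in their month
gen str7 gap_type = ""
local unassigned missing(newdecile) & !missing(l_crmv) & !missing(nysemin)
replace gap_type = "below"   if `unassigned' & l_crmv <= nysemin
replace gap_type = "between" if `unassigned' & l_crmv > nysemin & l_crmv <= nysemax
replace gap_type = "above"   if `unassigned' & l_crmv > nysemax
tabulate gap_type
```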



                      2- I also wonder why (before replacing the missing sizedecile values) almost all size deciles contain the same number of observations, except decile 10, which contains far more. Does that mean I used the wrong code to create my size deciles each month and year for the New York firms in the first place? In other words, I get the following output from these commands:

                      ** For a certain year and month, sizedecile 10:

                      . sum sizedecile if sizedecile==10 & yr==1998 & mth==10

                          Variable |       Obs        Mean    Std. Dev.       Min        Max
                      -------------+--------------------------------------------------------
                        sizedecile |      2619          10           0         10         10

                      ** Now for sizedecile 1:

                      . sum sizedecile if sizedecile==1 & yr==1998 & mth==10

                          Variable |       Obs        Mean    Std. Dev.       Min        Max
                      -------------+--------------------------------------------------------
                        sizedecile |        29           1           0          1          1

                      ** Trying a different year and month, and deciles other than 1:

                      . sum sizedecile if sizedecile==10 & yr==1999 & mth==5

                          Variable |       Obs        Mean    Std. Dev.       Min        Max
                      -------------+--------------------------------------------------------
                        sizedecile |      2595          10           0         10         10

                      . sum sizedecile if sizedecile==4 & yr==1999 & mth==5

                          Variable |       Obs        Mean    Std. Dev.       Min        Max
                      -------------+--------------------------------------------------------
                        sizedecile |        29           4           0          4          4

                      . sum sizedecile if sizedecile==1 & yr==1999 & mth==5

                          Variable |       Obs        Mean    Std. Dev.       Min        Max
                      -------------+--------------------------------------------------------
                        sizedecile |        29           1           0          1          1


                      Shouldn't the deciles (before allocating the missing values) have fairly similar numbers of obs.? The code I used to create size deciles each month and year for the New York Stock Exchange (N) was:

                      egen sizedecile = xtile(l_crmv) if primexch=="N", by(yr mth) p(1/9)

                      Is p(1/9) wrong? Should it be something else if I am creating 10 deciles? Maybe I am missing something?
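One thing worth checking is the option itself. With egenmore's xtile(), percentiles(numlist) sets cut points at the listed percentiles, and the numlist 1/9 expands to 1 2 ... 9, so p(1/9) cuts at the 1st to 9th percentiles: nine groups of about 1% each and a tenth group holding the remaining ~91%. The counts above look consistent with this: if deciles 1 to 9 each hold 29 firms, as decile 1 does in 1998m10, then with 2,619 firms in decile 10 the total is 2,880, and each small group is about 1% of it. Deciles would instead need the 10th, 20th, ..., 90th percentiles, i.e. p(10(10)90) or nquantiles(10). A sketch on made-up data (x, bad and good are hypothetical names), assuming egenmore is installed from SSC.

```stata
* Requires the egenmore package from SSC (install once with: ssc install egenmore)
clear
set obs 1000
* Toy variable, made up for illustration: the values 1, 2, ..., 1000
gen x = _n

* percentiles(1/9): cut points at the 1st, ..., 9th percentiles
* (nine ~1% groups plus one ~91% group)
egen bad = xtile(x), percentiles(1/9)

* nquantiles(10): cut points at the 10th, 20th, ..., 90th percentiles, i.e. deciles
egen good = xtile(x), nquantiles(10)

tabulate bad
tabulate good
```

With the thread's variables, the decile line would then read:
egen sizedecile = xtile(l_crmv) if primexch=="N", by(yr mth) nquantiles(10)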

                      Any help...

                      Ahmed



                      • #12
                        Hi Ahmed,
                        For your first question, yes, that should be it. As they are smaller than the smallest NYSE firm, they "should" go to decile 1.
                        Regarding the second question, I would suggest checking the values l_crmv takes. Based on your code, there shouldn't be anything wrong with that command; however, you might have, say, 2500 observations with l_crmv equal to the same small number. While Stata tries to break the variable into even groups, there is no rule for observations with equal values: they are usually grouped together regardless of the size of the resulting group.
                        You could start by doing something like:
                        sum l_crmv if yr==1999 & mth==5, d
                        and see how your variable is distributed.
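Because sizedecile was built from NYSE firms only, the distribution that drives the breakpoints is that of the NYSE firms, while a summarize without a primexch condition covers all exchanges. A sketch of both checks on made-up data with a block of tied values; dup is a hypothetical variable created by duplicates tag, which records for each observation how many other observations share its l_crmv value.

```stata
* Toy data, made up for illustration: 20 NYSE firms in 1999m5, the first 15 tied at 1
clear
set obs 20
gen yr = 1999
gen mth = 5
gen str1 primexch = "N"
gen l_crmv = cond(_n <= 15, 1, _n)

* Distribution of the NYSE firms only, since only they define the deciles
summarize l_crmv if primexch == "N" & yr == 1999 & mth == 5, detail

* dup = number of other NYSE firms in that month with the same l_crmv
duplicates tag l_crmv if primexch == "N" & yr == 1999 & mth == 5, generate(dup)
tabulate dup
```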
                        Hope this helps
                        Fernando



                        • #13
                          Fernando
                          If I understand correctly, you mean there might be firms with similar market values (in my case these would be similar high values) that are all grouped into decile 10, right? I am not sure how to detect that, but I ran the summarize command you suggested, and here is my output:
                          . sum l_crmv if yr==1999 & mth==5, d

                                                     l_crmv
                          -------------------------------------------------------------
                                Percentiles      Smallest
                           1%     2.820125          .0095
                           5%     6.869625       .0475625
                          10%     12.07031           .117       Obs                8503
                          25%       35.275        .182875       Sum of Wgt.        8503

                          50%      116.472                      Mean           1747.829
                                                  Largest       Std. Dev.      11204.28
                          75%     485.0966       203999.1
                          90%     2033.052       204691.5       Variance       1.26e+08
                          95%     5157.138       345127.8       Skewness       17.28623
                          99%     31273.16       410377.5       Kurtosis       425.0446


                          Does that mean anything? Do I need to deal with any issues before proceeding with my analysis, or should this weird allocation of obs. in the top decile be fine, in your opinion?

                          Best
                          Ahmed
