  • Sorting Variables based on their values

    Hello everyone,

    I am currently working on a panel data set where I have to create a dummy variable based on firm market value (the rmarkval variable). I have to define each firm, identified by idgvkey, as small or large depending on whether its rmarkval exceeds the median rmarkval value (mkvalmedian). If it does, the firm should be considered large. I am currently running the commands below, but they count every single observation for each idgvkey. How do I get a single result per firm, i.e. whether the firm is large or not based on its idgvkey?

    Code:
    gen large=0
    by idgvkey: replace large=1 if rmarkval>mkvalmedian
    
    egen nlarge=sum(large), by(idgvkey)
    
    egen _Unique= sum(large), by(idgvkey)
    
    replace _Unique = . if idgvkey[_n]==idgvkey[_n-1]
    
    by idgvkey: gen largeee=1 if nlarge>(_Unique/2)
    egen nlargeee=sum(largeee), by(idgvkey)
    
    drop large nlarge largeee
    rename nlargeee large
    [Image: panel.png]

    Thanks for your time!

  • #2
    I am afraid I am not clear on your objective, I cannot follow what your code is intended to do, and your picture of your data omits important variables.

    You apparently have multiple observations of rmarkval for each idgvkey. Since you tell us you have panel data, what is the time variable that is missing from your picture? I am going to assume it is yearly so that I can write about it.

    How is mkvalmedian calculated? As the median of every observation of rmarkval? Or separately for each year?

    You want to determine whether the firm is large or not by comparing rmarkval to mkvalmedian. But no matter how mkvalmedian is calculated, it is possible that, for any firm, some values of rmarkval will be larger than mkvalmedian and other values will be smaller. In that case, is the firm "large" or not? Or do you want to calculate "large" separately for each year?


    • #3
      Dear William, sorry for the confusion. I have only displayed two variables in the picture, and yes, the data are yearly. Also, the idgvkey variable is the firm ID. As you said, some values of rmarkval will be larger than mkvalmedian and others will be smaller, and I want to find out whether the firm is "large" or not by comparing rmarkval to mkvalmedian. However, I want it to be defined per idgvkey, so that the "large" dummy takes a value of 1 for only one observation of each idgvkey.

      Here is the code for mkvalmedian; it is the median value of all observations.
      Code:
      sum rmarkval, detail
      egen mkvalmedian=median(rmarkval)
      Thanks so much!


      • #4
        Hi Ismayil:

        I'm having trouble understanding what you're trying to do also, even after your response to William. The ambiguity revolves around what the unit of analysis is that you desire to calculate the different statistics for. Perhaps I can help you clarify what you want to achieve.

        Using the term "firm-year" instead of "firm" might help you clarify what you want.

        You can also use the phrase "years within a group of cases defined by a firm's unique id" or something like that.

        The following are possibilities of what you want to measure.

        1) median market values

        1a) median yearly market value for one firm
        Plural: median yearly market values for each firm

        1b) median market value across all firms and years
        No plural

        1c) median market value across all firms within one year
        plural: median market values across all firms for each year

        2) Firm's market value compared to #1 above to determine if large or not large.

        2a) market value of firm in one year

        2b) median yearly market value for firm across all years it's observed for

        The next constraint is that the statistic that is produced should be constant within idgvkey, as implied by what you said.

        That means that certain combinations of #1 and #2 are impossible.
        2a can never be used under this constraint.
        2b and 1c can't be mixed and still satisfy the constraint.
        2b and 1a can't be mixed, as you're comparing a firm with itself.
        So that leaves 2b and 1b.
        Next, you say you're using 1b, so that is consistent with this.
        So you want to define firm market value for the comparison as 2b, no?

        Or are you saying you want to compare 1b with 2a, but only keep a specific year? So something like "firm A had a higher market value in 2006 than the median market value of firms between 1995 and 2016"? Or "this variable tells you which firms in 2006 had a higher market value than the median market value of all firms between 1995 and 2016"?

        Next, I'm just guessing, but perhaps instead of

        replace _Unique = . if idgvkey[_n]==idgvkey[_n-1]

        you want to say

        bysort idgvkey: replace _Unique = . if idgvkey[_n]==idgvkey[_n-1] ?

        If you want to change all of the values but one within a firm ID to system missing, you could just do the following.

        bysort idgvkey: replace _Unique = . if _n!=1

        or

        replace _Unique=. if year!=2006

        if it is a particular year you want to keep the values for.
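
        A minimal sketch of keeping one value per firm, run on invented toy data. The values of large and the helper variable firstobs are hypothetical. It also shows egen's tag() function as a possible alternative that marks one row per idgvkey without overwriting anything.

        ```stata
        * Hypothetical toy data: two firms, three years each (values invented)
        clear
        input idgvkey year large
        1 2005 1
        1 2006 0
        1 2007 1
        2 2005 0
        2 2006 0
        2 2007 1
        end

        * per-firm total repeated on every row, as in the original post
        egen _Unique = total(large), by(idgvkey)

        * alternative: mark exactly one row per firm, leaving _Unique intact
        egen byte firstobs = tag(idgvkey)

        * blank the repeated values on all but the first row of each firm
        bysort idgvkey: replace _Unique = . if _n != 1

        * either way, one usable row per firm remains
        count if firstobs
        count if !missing(_Unique)
        ```

        With the tag approach, later commands can simply be restricted with "if firstobs", so the per-firm value stays available on every row for merges or regressions.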

        Or, looking at your code again, perhaps you're trying to measure

        "the number of times a firm had a higher market value in the 1996-2016 period than the median for all firms and years in the 1996-2016 period"?

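        If that last reading were the goal, it could be sketched as follows. The toy panel is invented, and the variable names above and nabove are hypothetical; only idgvkey, rmarkval and mkvalmedian come from the thread.

        ```stata
        * Hypothetical toy panel: 3 firms x 3 years (values invented)
        clear
        input idgvkey year rmarkval
        1 2004 10
        1 2005 20
        1 2006 30
        2 2004 1
        2 2005 2
        2 2006 3
        3 2004 50
        3 2005 60
        3 2006 70
        end

        * median over all firm-years
        egen mkvalmedian = median(rmarkval)

        * firm-year indicator; left missing when rmarkval is missing
        gen byte above = rmarkval > mkvalmedian if !missing(rmarkval)

        * number of years each firm is above the overall median,
        * constant within idgvkey
        egen nabove = total(above), by(idgvkey)
        ```

        Here the overall median is 20, so firm 1 is above it in one year, firm 2 in none, and firm 3 in all three.
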
        Hope that helps,

        Carl


        • #5
          Dear Carl, thanks for your reply! As I am not a native speaker, I sometimes struggle with explanations; my apologies. As you said, I am definitely looking for the 2b and 1b combination. However, I do not really know how to code it since I am a beginner. My code is definitely wrong, so if you can help with calculating the 1b-2b combination it would be much appreciated!

          Regards,
          Ismayil


          • #6
            Hi Ismayil,

            Since you're new... welcome!

            As I understand it, you want to consider a firm "large" if its median market value across all years it was observed for is larger than the median for all firms in all years.

            I think the following will do what you want.

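            egen grandmedian=median(rmarkval)
            egen firmmedian=median(rmarkval), by(idgvkey)
            * Stata treats missing as larger than any number, so skip firms with no rmarkval
            gen largefirm=firmmedian>grandmedian if !missing(firmmedian)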

            You also indicated you wanted the variable to be system missing in all years of a firm except one. I don't really see a reason to do that, but it's up to you, so here is how.

            If you want "largefirm" to be system missing for every observation of a firm but one, you could use

            bysort idgvkey: replace largefirm=. if _n!=1

            or also

            bysort idgvkey: replace largefirm=. if _n!=_N

            depending on whether you want the value to appear in the first or last row.
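
            As a quick check of the approach, here is a minimal sketch on an invented toy panel. The data values, the year variable name, and firm 4 with no market values at all are assumptions made for illustration. The "if !missing(firmmedian)" guard matters because Stata treats missing as larger than any number, so without it a firm with no rmarkval would be flagged as large.

            ```stata
            * Hypothetical toy panel: 4 firms x 3 years; firm 4 has no rmarkval
            clear
            input idgvkey year rmarkval
            1 2004 10
            1 2005 20
            1 2006 30
            2 2004 1
            2 2005 2
            2 2006 3
            3 2004 50
            3 2005 60
            3 2006 70
            4 2004 .
            4 2005 .
            4 2006 .
            end

            * 1b: one median over all firm-years (missing values are ignored)
            egen grandmedian = median(rmarkval)

            * 2b: one median per firm
            egen firmmedian = median(rmarkval), by(idgvkey)

            * 1 if the firm's median exceeds the overall median, 0 otherwise,
            * missing when the firm has no market values
            gen byte largefirm = firmmedian > grandmedian if !missing(firmmedian)

            * the dummy should be constant within each firm
            bysort idgvkey (largefirm): assert largefirm[1] == largefirm[_N]

            list idgvkey year rmarkval firmmedian largefirm, sepby(idgvkey)
            ```

            In this toy data the overall median is 20 and the firm medians are 20, 2 and 60, so only firm 3 is coded as large; firm 1 is not, because the comparison is strict.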

            Hope that helps, let me know if you have other questions or if the above doesn't work.

            Carl


            • #7
              Hi Carl,
              Thanks so much for your help! My results make sense now. I appreciate it!
