
  • Subgroup in panel data

    Hi all.
    I am using panel data for 500 firms over 2000-2010. I have analysed all 500 firms together using -logit-, but would like to split them into 2 groups based on the median of one of my independent variables, "X1": group 1 = those banks that lie below the median of X1, and group 2 = those banks that lie above the median of X1. How can I do this? Thanks a lot.



  • #2
    One issue is that some firms may fall into both groups as you have panel data.

    Code:
    sum X1, d
    gen hix1 = X1 > r(p50) if !missing(X1)    // without the if, missing X1 would be coded as 1



    • #3
      One quick way is to look up the median manually and use this value like:

      Code:
      sum X1, detail
      local med = r(p50)    // save the median, since later commands can overwrite r()
      logit ... if X1 <= `med'
      logit ... if X1 > `med' & !missing(X1)
      EDIT: Andrew made a good point. You may first want to compute an average for each firm over all years and use that value for the median split.
      Best wishes

      (Stata 16.1 MP)



      • #4
        Dear Andrew Musau & Felix Bittmann
        Thanks a lot for your answers.

        I have the two following questions:
        1-) Why may some firms fall into both groups, since group 1 will have the banks that are equal to or above the median, and group 2 will have the banks that lie below the median?
        2-) Do you mean that if the median equals 0.125, the commands will be:
        logit ... if X1 <= 0.125
        logit ... if X1 > 0.125
        Thanks a lot.



        • #5
          1-) Why may some firms fall into both groups, since group 1 will have the banks that are equal to or above the median, and group 2 will have the banks that lie below the median?
          With panel data, firms are observed over several time points, say years, so that an observation is a firm-year. Therefore, if we compute the median based on individual observations, there is a chance that in some years, a firm has values below the median value and in other years, above. As Felix suggests, one way to overcome this is to compute a single average for a firm and then compute the median based on these averages. In this way, you guarantee that a firm is either above or below the median value.
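
          To see how often this happens in your data, a minimal sketch along these lines may help. It assumes "firmid" is the name of your panel identifier; "above", "minabove", "maxabove" and "firmtag" are only illustrative names for new variables.

          Code:
          * median over all firm-year observations
          quietly summarize X1, detail
          generate above = X1 > r(p50) if !missing(X1)
          * lowest and highest value of the indicator within each firm
          bysort firmid: egen minabove = min(above)
          bysort firmid: egen maxabove = max(above)
          * count firms that are below the median in some years and above it in others
          egen firmtag = tag(firmid)
          count if firmtag & (minabove != maxabove)
          A nonzero count means those firms would appear in both subsamples under an observation-level split.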


          2-) Do you mean that if the median equals 0.125, the commands will be:
          logit ... if X1 <= 0.125
          logit ... if X1 > 0.125
          Thanks a lot.
          Precisely; the median is a single value. Putting it all together:


          Code:
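          bys firmid: egen meanX1 = mean(X1)    // firm-level mean over nonmissing years
          egen tag = tag(firmid)                // flag one observation per firm
          sum meanX1 if tag, d                  // median across firms, not firm-years
          gen hiX1 = meanX1 > r(p50) if !missing(meanX1)
          logit ... if hiX1 == 1
          logit ... if hiX1 == 0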
          where you replace "firmid" with the name of your panel identifier.
          Last edited by Andrew Musau; 17 May 2021, 16:22.



          • #6
            Dear Andrew Musau
            Thanks a lot. Much appreciated.
            Now I understand what you mean by some firms falling into both groups. You mentioned that one way to overcome this is to compute a single average for each firm and then compute the median based on these averages. To be sure, do you mean that the commands that overcome this issue are:
            bys firmid: egen meanX1= mean(X1)
            egen tag= tag(firmid)
            If yes, hiX1 will be the first group, yes? And most importantly, does Stata know that !hiX1 means values <= the median of X1? Thanks a lot.



            • #7
              Correct.
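
              If you want to check for yourself which group is which, something like this should work after running the commands in #5 (it only uses the variables created there):

              Code:
              * one row per group: number of firms and range of the firm means
              tabstat meanX1 if tag, by(hiX1) statistics(n min max)
              The hiX1 == 1 row should contain the firms whose mean lies above the median, and the hiX1 == 0 row those at or below it.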



              • #8
                Thanks a lot Dear Andrew Musau



                • #9
                  Dear Andrew Musau
                  Sorry for bothering you with my questions. What if I have missing values in the variable X1? Will your commands in #5 still be the same?



                  • #10
                    Missing values do not affect the calculations: egen's mean() simply averages each firm's nonmissing values. However, they can bias the average upwards or downwards. For example, if your variable is market capitalization and there is a time effect, e.g., firms in the sample generally grew bigger over time, then by averaging nominal values, a firm whose mean is based on the last 8 years in the sample will have an upward bias in its average compared to a similar firm whose mean is calculated over a 20-year period.
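
                    A rough way to see how many nonmissing years each firm's mean is based on, again assuming "firmid" is your panel identifier ("nX1" and "firmtag" are only illustrative names):

                    Code:
                    * number of nonmissing X1 observations per firm
                    bysort firmid: egen nX1 = count(X1)
                    egen firmtag = tag(firmid)
                    * distribution of that number across firms
                    tabulate nX1 if firmtag
                    If the counts vary a lot across firms, the firm averages may not be directly comparable, for the reason described above.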
