
  • Creating dummy variable based on percentiles

    Hi Everyone,

    I have a variable G-Index with the following distribution:


     Governance |
          Index |
      (Gompers, |
         Ishii, |
       Metrick) |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |          1        0.02        0.02
              2 |          9        0.14        0.16
              3 |         55        0.87        1.02
              4 |        161        2.53        3.56
              5 |        331        5.21        8.77
              6 |        549        8.64       17.41
              7 |        738       11.61       29.02
              8 |        857       13.49       42.51
              9 |        917       14.43       56.94
             10 |        803       12.64       69.58
             11 |        698       10.99       80.56
             12 |        521        8.20       88.76
             13 |        396        6.23       95.00
             14 |        189        2.97       97.97
             15 |         98        1.54       99.51
             16 |         20        0.31       99.83
             17 |          6        0.09       99.92
             18 |          4        0.06       99.98
             19 |          1        0.02      100.00
    ------------+-----------------------------------
          Total |      6,354      100.00

    I am trying to create a variable treat, which is equal to 0 if the value of G-index is in the top 25% and 1 if it is in the bottom 75%. Can anyone help me with the proper code for that?

    Thanks!

  • #2
    I don't know what the variable is called, as G-index is not a legal variable name. Regardless, note that something like

    Code:
    * 1 if gindex is 10 or less, 0 if 11 or more; missing gindex stays missing
    gen wanted = gindex <= 10 if gindex < .
    will yield 1 for about 70% of your observations (those with non-missing values), while

    Code:
    * 1 if gindex is 11 or less, 0 if 12 or more; missing gindex stays missing
    gen wanted = gindex <= 11 if gindex < .
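    will yield 1 for about 81%. Since 69.58% of observations have gindex of 10 or less and 80.56% have 11 or less, and all observations with the same value must get the same code, those are necessarily the only choices if the aim is 75%. But why do this at all?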
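    As a cross-check of that arithmetic, here is a small sketch in Python rather than Stata, purely for illustration. It assumes the frequencies typed in from the tabulation in #1 are complete, and the helper names (share_at_or_below, bracketing_cutoffs) are hypothetical. It finds the two cutoffs whose cumulative shares bracket 75%; in Stata itself the same figures can be read directly from the Cum. column of the tabulation.

```python
# Frequencies of gindex values 1..19, typed in from the tabulation in #1
# (assumption: this table covers every non-missing observation).
freq = {
    1: 1, 2: 9, 3: 55, 4: 161, 5: 331, 6: 549, 7: 738, 8: 857, 9: 917,
    10: 803, 11: 698, 12: 521, 13: 396, 14: 189, 15: 98, 16: 20, 17: 6,
    18: 4, 19: 1,
}


def share_at_or_below(freq, cutoff):
    """Percent of observations whose value is <= cutoff."""
    total = sum(freq.values())
    at_or_below = sum(n for value, n in freq.items() if value <= cutoff)
    return 100.0 * at_or_below / total


def bracketing_cutoffs(freq, target):
    """Return (cutoff, share) for the largest cutoff with share < target
    and for the smallest cutoff with share >= target."""
    lower = upper = None
    for value in sorted(freq):
        share = share_at_or_below(freq, value)
        if share < target:
            lower = (value, share)
        elif upper is None:
            upper = (value, share)
    return lower, upper


lower, upper = bracketing_cutoffs(freq, 75.0)
print(f"gindex <= {lower[0]}: {lower[1]:.2f}%")
print(f"gindex <= {upper[0]}: {upper[1]:.2f}%")
```

    Because ties cannot be split between the two groups, a cutoff-based dummy can only jump from the first share to the second; no cutoff lands exactly on 75%.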



    • #3
      Nick Cox The index is named gindex (lower values indicate better-governed firms, and higher values indicate the opposite). I need to create this dummy as a treatment variable in my study: the top 75-80% of the companies based on gindex should be treated (dummy = 1) and the bottom 20-25% should be the control group (dummy = 0). I would really appreciate your assistance with the coding.



      • #4
        So, your question is answered in #2. If you're asking a new question, I don't understand what it is.



        • #5
          Nick Cox Thank you. That answers my question.

          I have gindex figures for 1994, 1996, 1998, etc. Data for the years in between, i.e., 1995, 1997, etc., are missing. If I want to fill each missing year with the same data as the preceding year, what code should I use?

          For example, 1994's data should be filled into 1995, 1996's data into 1997, etc.



          • #6
            To me that sounds like a very bad idea. The idea that things changed, and then did not change, in this way is utterly implausible. This is a recipe for a failed grade or a rejected paper, assuming competent teachers or reviewers. The data you have are the data you should analyse.



            • #7
              Nick Cox This is the way these data work. The index is calculated once every two years, and researchers use it in the way I mentioned earlier. The G-index is a measure of a firm’s governance mechanisms, and it doesn’t change every year.

              I really need your help with the coding.



              • #8
                OK; thanks for the explanation. But there is utterly no point in getting exactly the same result twice. You don't need help with coding here. The result for 1995 is, by fiat, the result for 1994. Why make Stata do the same work twice over?



                • #9
                  Nick Cox The dataset I have only gives the data for these years:
                  1994
                  1996
                  1998

                  which is why I need Stata to fill in 1995 with 1994's data, and so on.



                  • #10
                    Sorry, but I have nothing to add to my previous comments. Giving you code to do something that appears silly at best is not in your best interests. If anyone else thinks differently, they will respond.
