  • Generate a new variable that groups the observations randomly based on certain characteristics

    Hello,

    I want to generate a new variable that is 1 for 5 random observations and 0 otherwise, within each group that has the same values of "TICKER" and "FPEDATS". How can I do that?

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str6 TICKER long FPEDATS
    "0000" 20088
    "0000" 20088
    "0000" 20088
    "0000" 20088
    "0000" 20088
    "0000" 20453
    "0000" 20453
    "0000" 20453
    "0000" 20453
    "0000" 20453
    "0000" 20453
    "0001" 20088
    "0001" 20088
    "0001" 20088
    "0001" 20088
    "0001" 20088
    "0001" 20088
    "0001" 20088
    "0001" 20088
    "0001" 20088
    "0001" 20088
    "0001" 20088
    "0001" 20088
    "0001" 20088
    "0001" 20088
    "0001" 20088
    "0001" 20088
    "0001" 20453
    "0001" 20453
    "0001" 20453
    "0001" 20453
    "0001" 20453
    "0001" 20453
    "0001" 20453
    "0001" 20453
    "0001" 20453
    "0001" 20453
    "0001" 20453
    "0001" 20453
    "0001" 20453
    "0001" 20453
    "0001" 20453
    "0001" 20453
    "0001" 20453
    "0001" 20453
    "0001" 20453
    "0001" 20453
    "0001" 20819
    "0001" 20819
    "0001" 20819
    "0001" 20819
    "0001" 20819
    "0001" 20819
    "0001" 20819
    "0001" 20819
    "0001" 20819
    "0001" 20819
    "0001" 20819
    "0001" 20819
    "0001" 20819
    "0001" 20819
    "0001" 20819
    "0001" 20819
    "0001" 20819
    "0001" 20819
    "0001" 20819
    "0001" 20819
    "0001" 21184
    "0001" 21184
    "0001" 21184
    "0001" 21184
    "0001" 21184
    "0001" 21184
    "0001" 21184
    "0001" 21184
    "0001" 21184
    "0001" 21184
    "0001" 21184
    "0001" 21184
    "0001" 21184
    "0001" 21549
    "0001" 21549
    "0001" 21549
    "0001" 21549
    "0001" 21549
    "0001" 21549
    "0001" 21549
    "0001" 21549
    "0001" 21549
    "0001" 21549
    "0001" 21549
    "000R" 20453
    "000R" 20453
    "000R" 20453
    "000R" 20453
    "000R" 20453
    "000R" 21184
    "000R" 21184
    "000R" 21184
    "000R" 21184
    end
    format %td FPEDATS

  • #2
    In your example data, one combination of TICKER and FPEDATS has only 4 observations, and two others have exactly 5. Those groups will just get all 1's.
    Code:
    by TICKER FPEDATS, sort: gen wanted = (_n <= 5)
    
    //    RANDOMIZE THE ORDER
    set seed 1234
    by TICKER FPEDATS: gen double shuffle = runiform()
    sort TICKER FPEDATS shuffle
    drop shuffle


    • #3
      Thank you! The order doesn't matter, but the assignment of the "1"s should be random. Is that really the case here? The seed is only set for shuffling the observations, which is not necessary.


      • #4
        The variables shown here are the same, but the rest are not, which is why I need to assign "1" to 5 random observations. Maybe shuffle before generating wanted?


        • #5
          Yes, if the observations differed on unshown variables, then the approach requires shuffling first:
          Code:
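          //    ILLUSTRATION ONLY: STAND-IN FOR YOUR OTHER VARIABLES
          gen other_variable = _n
          //    ONE RANDOM DRAW PER OBSERVATION
          set seed 1234
          gen double shuffle = runiform()
          //    WITHIN EACH GROUP, FLAG THE 5 SMALLEST DRAWS
          by TICKER FPEDATS (shuffle), sort: gen wanted = (_n <= 5)
          drop shuffle
          //    RESTORE THE ORIGINAL ORDER
          sort other_variable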
          Here I have created an other_variable so that you can see what is going on. Of course that is not part of the solution--you already have your other variables.
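
          If you want to convince yourself that every TICKER/FPEDATS group ends up with exactly 5 flagged observations (or all of them, when the group has fewer than 5), a check along the following lines may help. This is only a sketch on a small made-up dataset: the toy data and the helper variables n_flagged and n_group are hypothetical names introduced here for illustration, not part of your data or of the solution.

          ```stata
          * Toy data (hypothetical): three groups with 8, 4 and 8 observations
          clear
          set obs 20
          gen str6 TICKER = cond(_n <= 12, "0001", "000R")
          gen long FPEDATS = cond(_n <= 8, 20088, 20453)

          * Shuffle first, then flag the first 5 observations within each group
          set seed 1234
          gen double shuffle = runiform()
          by TICKER FPEDATS (shuffle), sort: gen wanted = (_n <= 5)
          drop shuffle

          * Count flagged observations and group size within each group
          by TICKER FPEDATS, sort: egen n_flagged = total(wanted)
          by TICKER FPEDATS: gen n_group = _N

          * Each group should have min(5, group size) ones; assert stops with an error otherwise
          assert n_flagged == min(5, n_group)
          ```

          The assert is silent when the condition holds in every observation. Note that the by ..., sort: prefix leaves the data sorted by TICKER and FPEDATS, so re-sort afterwards if you rely on a particular order.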


          • #6
            Thank you, that's exactly what I wanted!
