Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • backfilling data

    Dear Statalisters,

    I have a dataset that records firm's IDs and its ratings. Only rating changes are recorded, so if there is no record, that means the ratings haven't change.
    Following is the data that I currently have
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(id month) int year str2 rating
    1  1 2000 "A" 
    1  3 2000 "A-"
    1 12 2000 "A" 
    1  4 2002 "B+"
    1  5 2002 "A" 
    1  9 2002 "B-"
    end
    Now I want to create a data with the ratings for all periods that are not recorded. For example, for firm 1, ratings for 2/2000 will be similar to its latest rating available before 2/2000, which is the rating in 1/2000 (A).
    Following is data that I want to achieve:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(id month) int year str2 rating
    1  1 2000 "A" 
    1  2 2000 "A" 
    1  3 2000 "A-"
    1  4 2000 "A-"
    1  5 2000 "A-"
    1  6 2000 "A-"
    1  7 2000 "A-"
    1  8 2000 "A-"
    1  9 2000 "A-"
    1 10 2000 "A-"
    1 11 2000 "A-"
    1 12 2000 "A" 
    1  1 2001 "A" 
    1  2 2001 "A" 
    1  3 2001 "A" 
    1  4 2001 "A" 
    1  5 2001 "A" 
    1  6 2001 "A" 
    1  7 2001 "A" 
    1  8 2001 "A" 
    1  9 2001 "A" 
    1 10 2001 "A" 
    1 11 2001 "A" 
    1 12 2001 "A" 
    1  1 2002 "A" 
    1  2 2002 "A" 
    1  3 2002 "B+"
    1  4 2002 "B+"
    1  5 2002 "A" 
    1  6 2002 "A" 
    1  7 2002 "A" 
    1  8 2002 "A" 
    1  9 2002 "B-"
    1 10 2002 "B-"
    1 11 2002 "B-"
    1 12 2002 "B-"
    end
    In this example, there is only 1 firm ID , however, in my actual dataset, there will be thousands firm IDs. Some firms can be added into the list. For example, if firm ID 4 first appearance in the dataset is in 5/2003, then when I backfilling the data, there should be NO ratings for this firm before 5/2003.

    Can you please suggest what I can do to create such dataset?

    Thank you very much in advance.

    Kind regards,
    Mia

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(id month) int year str2 rating
    1  1 2000 "A"
    1  3 2000 "A-"
    1 12 2000 "A"
    1  4 2002 "B+"
    1  5 2002 "A"
    1  9 2002 "B-"
    end
    
    // FIRST CREATE A REAL STATA MONTHLY DATE VARIABLE
    gen mdate = ym(year, month)
    format mdate %tm
    xtset id mdate
    drop month year
    tsfill
    by id (mdate), sort: replace rating = rating[_n-1] if missing(rating)

    Comment


    • #3
      Thanks very much Clyde

      I've tried your code and it works well with my current example. However, when I include more firms into the sample data, then it has some problem. For example, with the following data:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte(id month) int year str2 rating
      1  1 2000 "A" 
      1  3 2000 "A-"
      1 12 2000 "A" 
      1  4 2002 "B+"
      1  5 2002 "A" 
      1  9 2002 "B-"
      2  2 2000 "B-"
      2  5 2002 "B+"
      3  7 2001 "A" 
      end
      When using your code, Stata only creates the month/year observation for firm ID 2 for up to 5/2002 and for firm ID 3, there is only 1 observation for 7/2001. I would like to fill the ratings up to the latest time in the dataset. For example, the latest time in this current example is Sep 2002. So I would like all firms in the sample to have rating records from the first time it appears until Sep 2002.

      I'm sorry for not explaining well enough in my post#1. Thank you very much in advance.

      Comment


      • #4
        Dear Clyde,

        I'm just wondering if using "tsfill, full " works in this case. I have tried and it seems working, just don't know if I overlook anything.

        Comment


        • #5
          Mia, you got it right. The -full- option is all you need. But I think you need to be cautious. For example, for firm 3, all entries before 07/2001 are missing as we don't know what the rating was before it was changed to "A". The -full- option fills entries after 07/2001 with "A", but it may be wrong because it's possible that firm 3's rating changed in 09/2001 to "B" but the data collector simply failed to collect this information. Therefore I think, Clyde's code in #2 seems incomplete, but is a safe solution.
          Last edited by Fei Wang; 07 Nov 2021, 21:37.

          Comment


          • #6
            Thank you Fei. What you said totally makes sense. After running the code as in#, but with option full for tsfill, I drop obs with missing ratings, then it seems that it can deal with the issue you raised.

            By the way, I would like to get your help with another issue I have. Now I want to merge the data back to my master data using ID, month and year. In post #2, the var mdate is a combination of month and date. Now I want to separate them using month() and year() function. I did try to use different format for mdate but it still looks very strange. Can you please help me with that? Thank you very much

            Comment


            • #7
              Mia, this may help: https://stats.idre.ucla.edu/stata/fa...ith-tm-format/.
              Last edited by Fei Wang; 08 Nov 2021, 00:58.

              Comment


              • #8
                Thank you very much Fei. The link is helpful and it solves my issue perfectly. Thank you.

                Comment

                Working...
                X