Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Create unique id for observations over 6 month window in unpalanced panel conditional on dummy variable

    Hi, I have an unbalanced panel data set with dbtr_crdtr as the panel variable and DT_INCPTNM the monthly time variable (with gaps). I want to create a unique id, by dbtr_crdtr, for observations in the six months following a "switch" which is indicated by the variable SWITCH_DUMMY == 1. All observations not in this six months window following the switch or dbtr_crdtr ids without a "switch" can be attributed the same id (0 in the example below). I gave an example of the variable to be created. As the dataset is very large ( ~ 4 million observations) and contains close to a million dbtr_crdtr ids I cannot use a reshape.

    Does anybody have an idea how I could produce this "UniqueID" or can at least give me a hint on how this could be realised?

    KR,
    Bob

    dbtr_crdtr DT_INCPTNM SWITCH_DUMMY UniqueID(var to be created)
    1472 2021m2 0 0
    1472 2021m10 0 0
    1472 2021m11 0 0
    1472 2021m12 0 0
    1472 2022m3 0 0
    1472 2022m4 0 0
    1268 2022m3 1 1
    1268 2022m4 0 1
    1268 2022m5 0 1
    1268 2022m6 0 1
    1268 2022m8 0 1
    1273 2022m4 1 2
    1273 2022m8 0 2
    1274 2023m6 0 0
    1275 2021m8 0 0
    1477 2023m8 1 3
    1477 2023m9 0 3
    1477 2024m4 0 0


  • #2
    Well, your example contradicts your description in words of what you want. You say that you want to do this "by dbtr_crdtr," which would mean that uniqueidvar starts over at 1 with each new value of dbtr_crdtr. But in your example, you do the opposite and keep increasing the number. I'm going to ignore your words and assume you meant what you showed in the example.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int dbtr_crdtr float dtincptnm byte(switch_dummy uniqueidvar)
    1472 733 0 0
    1472 741 0 0
    1472 742 0 0
    1472 743 0 0
    1472 746 0 0
    1472 747 0 0
    1268 746 1 1
    1268 747 0 1
    1268 748 0 1
    1268 749 0 1
    1268 751 0 1
    1273 747 1 2
    1273 751 0 2
    1274 761 0 0
    1275 739 0 0
    1477 763 1 3
    1477 764 0 3
    1477 771 0 0
    end
    format %tm dtincptnm
    
    by dbtr_crdtr (dtincptnm), sort: gen stop_date = dtincptnm + 6 if switch_dummy == 1
    format stop_date %tm
    by dbtr_crdtr (dtincptnm): replace stop_date = stop_date[_n-1] if missing(stop_date) ///
        & dtincptnm <= stop_date[_n-1]
    gen wanted = sum(stop_date != stop_date[_n-1] & !missing(stop_date))
    replace wanted = 0 if missing(stop_date)
    As an aside, I think the uniqueidvar is a terrible choice of name for this variable. It is clearly some kind of grouping variable and it uniquely identifies nothing at all.

    Finally, to make it as easy as possible for other people to help you with code, in the future, when showing data examples, please use the -dataex- command to do so, as I have done above. If you are running version 16 or later, or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Thank you very much for your quick reply. The code works perfectly. You are right that the newly created variable is more a grouping id, I will rename it. I will use dataex for future posts.

      Comment

      Working...
      X