  • Repeated time values within panel - xtset - Lending Club dataset (multiple loan applications per day)

    Hey guys,

    I have an issue with structuring my Lending Club dataset in panel data format. The dataset contains loan applications from the Lending Club peer-to-peer lending platform. My intention is to run an xtlogit regression to find out whether loan applications from US states with certain characteristics (e.g. startup activity) have a higher probability of being funded. As there are a lot of loan applications per day, I created a date_time2 variable with clocktime notation. I would like to base my panel on the different states within the US, so I'm trying to run the following command:

    xtset State_Code date_time2, delta(1 second)

    Unfortunately, Stata returns "repeated time values within panel", error r(451). This is because Stata still recognizes a lot of these values as duplicates. Below you find a screenshot which shows how I have structured my data. For State_Code==1 the last loans in my dataset were made on December 31st, 2015; I distinguish them by seconds. Accordingly, the first loan in my dataset for State_Code==2 was made on September 23rd, 2007.

    If more information is required please let me know. Hope you guys can help.

    Thanks in advance!

    [Screenshot: STATA Screenshot.png]

  • #2
    Well, first of all, by posting a screenshot you have made it maximally difficult for people to try to work with your data example. For the future, read the FAQ, especially section 12, for advice on how to post things in a more user-friendly manner.

    At least by eye, I don't see any duplicates in your example. But I have never known Stata to be wrong about these things. -duplicates report State_Code date_time2- will give you a count of duplications. If you want to see them, -duplicates list State_Code date_time2- will show them to you.
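
    That is, you might run:

    Code:
    duplicates report State_Code date_time2   // tabulate how many observations are duplicated
    duplicates list State_Code date_time2     // list the duplicated observations themselves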

    But let's look at this somewhat differently. Do you need to specify a time variable in your -xtset- command? The only later analyses that require that are analyses that use lag and lead operators, or analyses that incorporate autocorrelation structure. Although you don't say it in so many words, you have left me with the impression that the times on these loans are just made up anyway: the dates are real but the times were put in only because you hoped to create a separate timestamp for each loan. If I have that right, then applying lags or leads or estimating autocorrelation parameters would be an irreproducible exercise in meaninglessness anyway. So if I am on the right track here, I would get rid of date_time2 and just stick with a date variable that actually means something, and -xtset State_Code- without a time variable.
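
    That is, something like this minimal sketch:

    Code:
    drop date_time2
    xtset State_Code

    -xtset- with just a panel variable is all that -xtlogit- needs.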



    • #3
      Clyde, thanks for your quick response. Sorry for posting in a user-unfriendly manner; I'm new to this forum and will read up on the dos and don'ts.

      You're correct, the clocktime is artificial, to distinguish between loan applications on a certain day in a certain state. Nevertheless, I combine my Lending Club loan application variables such as "Amount_Requested" with state-specific data such as "Rate of New Entrepreneurs" and "Total Banking Assets" for a certain quarter or year. These latter variables are all lagged by one period. Moreover, I would like to run a fixed effects xtlogit regression (to account for time invariant state characteristics) and therefore I need my panel to also include a unique time indicator, I guess. Below I provide some data for the state of Alabama (a sample of the variables).

      [Lending Club Data] Dummy_Accepted = binary, 1 if loan was accepted, 0 if rejected
      [Lending Club Data] App_Date_Corrected = date of the loan application
      [Lending Club Data] Purpose = String variable - purpose of the loan
      [Lending Club Data] Amount_Requested = the amount of funding requested by the loan applicant in US dollars
      [Kauffman Index Data] New_Entrepreneurs_Lagged = Measures the percent of the adult population of an area that became entrepreneurs in a given month (lagged by one year)
      [Lending Club Data] State = US State
      [Lending Club Data] State_Code = Code of the respective state
      date_time2 = application date, with artificial clocktime to try and distinguish between daily observations

      All other variables such as the "Total banking assets" (within a specific state for a specific quarter) are lagged by one period (year or quarter).
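
      For reference, here is the gist of how I generated date_time2 (a simplified sketch, not my exact code: artificial sequential seconds within each state-day):

      Code:
      bysort State_Code App_Date_Corrected: gen double date_time2 = cofd(App_Date_Corrected) + 1000*(_n - 1)
      format %tc date_time2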

      Can you give me some advice on how to proceed? I don't want to lose the specifics of the data by only using one observation per day. Is there any way to make Stata recognize the uniqueness of my time data?

      Thanks


      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(Dummy_Accepted App_Date_Corrected) str18 Purpose double Amount_Requested float New_Entrepreneurs_Lagged str2 State byte State_Code double date_time2
      0 19278 "debt_consolidation" 25000 .24 "AL" 1 1665619204000
      0 19278 "major_purchase"     10000 .24 "AL" 1 1665619205000
      0 19278 "debt_consolidation"  5000 .24 "AL" 1 1665619206000
      0 19278 "small_business"     20000 .24 "AL" 1 1665619207000
      0 19278 "other"               5000 .24 "AL" 1 1665619208000
      0 19278 "other"              25000 .24 "AL" 1 1665619209000
      0 19278 "debt_consolidation" 35000 .24 "AL" 1 1665619210000
      0 19278 "debt_consolidation" 10000 .24 "AL" 1 1665619211000
      0 19278 "debt_consolidation"  3500 .24 "AL" 1 1665619212000
      0 19278 "moving"              1000 .24 "AL" 1 1665619213000
      0 19278 "debt_consolidation" 20000 .24 "AL" 1 1665619214000
      0 19278 "debt_consolidation" 20000 .24 "AL" 1 1665619215000
      0 19278 "debt_consolidation" 12000 .24 "AL" 1 1665619216000
      0 19278 "debt_consolidation" 10000 .24 "AL" 1 1665619217000
      0 19278 "home_improvement"   30000 .24 "AL" 1 1665619218000
      0 19278 "other"              20000 .24 "AL" 1 1665619219000
      0 19278 "debt_consolidation" 30000 .24 "AL" 1 1665619220000
      0 19279 "home_improvement"   15000 .24 "AL" 1 1.6657056e+12
      0 19279 "debt_consolidation" 10500 .24 "AL" 1 1665705601000
      0 19279 "debt_consolidation"  5000 .24 "AL" 1 1665705602000
      0 19279 "debt_consolidation" 20000 .24 "AL" 1 1665705603000
      0 19279 "debt_consolidation"  3500 .24 "AL" 1 1665705604000
      0 19279 "other"               6000 .24 "AL" 1 1665705605000
      0 19279 "other"               3000 .24 "AL" 1 1665705606000
      0 19279 "debt_consolidation" 30000 .24 "AL" 1 1665705607000
      0 19279 "debt_consolidation" 10000 .24 "AL" 1 1665705608000
      0 19279 "debt_consolidation" 35000 .24 "AL" 1 1665705609000
      0 19279 "major_purchase"     10000 .24 "AL" 1 1665705610000
      0 19279 "debt_consolidation" 25000 .24 "AL" 1 1665705611000
      0 19280 "debt_consolidation"  5000 .24 "AL" 1  1.665792e+12
      0 19280 "debt_consolidation" 10000 .24 "AL" 1 1665792001000
      0 19280 "credit_card"         2550 .24 "AL" 1 1665792002000
      0 19280 "other"               3000 .24 "AL" 1 1665792003000
      0 19280 "debt_consolidation" 10000 .24 "AL" 1 1665792004000
      0 19280 "debt_consolidation" 10000 .24 "AL" 1 1665792005000
      0 19280 "debt_consolidation" 16000 .24 "AL" 1 1665792006000
      0 19280 "other"               9000 .24 "AL" 1 1665792007000
      0 19280 "debt_consolidation" 35000 .24 "AL" 1 1665792008000
      0 19280 "debt_consolidation" 12000 .24 "AL" 1 1665792009000
      0 19280 "debt_consolidation" 10000 .24 "AL" 1 1665792010000
      0 19280 "debt_consolidation"  5000 .24 "AL" 1 1665792011000
      0 19280 "renewable_energy"    5000 .24 "AL" 1 1665792012000
      0 19280 "other"               3000 .24 "AL" 1 1665792013000
      0 19280 "debt_consolidation"  5000 .24 "AL" 1 1665792014000
      0 19280 "credit_card"        15000 .24 "AL" 1 1665792015000
      0 19280 "car"                18000 .24 "AL" 1 1665792016000
      0 19280 "debt_consolidation" 15000 .24 "AL" 1 1665792017000
      0 19280 "credit_card"        18000 .24 "AL" 1 1665792018000
      0 19280 "debt_consolidation" 24000 .24 "AL" 1 1665792019000
      0 19281 "other"               5000 .24 "AL" 1 1.6658784e+12
      0 19281 "medical"             1000 .24 "AL" 1 1665878401000
      0 19281 "debt_consolidation" 17000 .24 "AL" 1 1665878402000
      0 19281 "other"              10000 .24 "AL" 1 1665878403000
      0 19281 "debt_consolidation" 30000 .24 "AL" 1 1665878404000
      0 19281 "moving"              6000 .24 "AL" 1 1665878405000
      0 19281 "credit_card"        25000 .24 "AL" 1 1665878406000
      0 19281 "debt_consolidation" 10000 .24 "AL" 1 1665878407000
      0 19281 "moving"              4500 .24 "AL" 1 1665878408000
      0 19281 "debt_consolidation" 15000 .24 "AL" 1 1665878409000
      0 19281 "debt_consolidation" 20000 .24 "AL" 1 1665878410000
      0 19281 "debt_consolidation" 28000 .24 "AL" 1 1665878411000
      0 19281 "debt_consolidation" 25000 .24 "AL" 1 1665878412000
      0 19281 "debt_consolidation" 35000 .24 "AL" 1 1665878413000
      0 19281 "debt_consolidation"  8000 .24 "AL" 1 1665878414000
      0 19281 "debt_consolidation" 25000 .24 "AL" 1 1665878415000
      0 19281 "vacation"            8000 .24 "AL" 1 1665878416000
      0 19281 "home_improvement"   10000 .24 "AL" 1 1665878417000
      0 19281 "debt_consolidation" 35000 .24 "AL" 1 1665878418000
      0 19281 "debt_consolidation" 18000 .24 "AL" 1 1665878419000
      0 19281 "debt_consolidation" 10000 .24 "AL" 1 1665878420000
      0 19281 "major_purchase"      5000 .24 "AL" 1 1665878420000
      0 19281 "moving"              1000 .24 "AL" 1 1665878422000
      0 19281 "debt_consolidation"  6000 .24 "AL" 1 1665878423000
      0 19281 "credit_card"         3000 .24 "AL" 1 1665878424000
      0 19281 "debt_consolidation" 10000 .24 "AL" 1 1665878424000
      0 19281 "debt_consolidation" 35000 .24 "AL" 1 1665878426000
      0 19281 "debt_consolidation" 30000 .24 "AL" 1 1665878427000
      0 19281 "home_improvement"   10000 .24 "AL" 1 1665878428000
      0 19281 "debt_consolidation"  1500 .24 "AL" 1 1665878428000
      0 19281 "debt_consolidation"  7500 .24 "AL" 1 1665878430000
      0 19281 "major_purchase"     10000 .24 "AL" 1 1665878431000
      0 19281 "debt_consolidation"  8000 .24 "AL" 1 1665878432000
      0 19281 "debt_consolidation"  2500 .24 "AL" 1 1665878432000
      0 19281 "other"              12000 .24 "AL" 1 1665878434000
      0 19281 "debt_consolidation" 20000 .24 "AL" 1 1665878435000
      0 19281 "debt_consolidation" 10000 .24 "AL" 1 1665878436000
      0 19282 "debt_consolidation" 27000 .24 "AL" 1 1.6659648e+12
      0 19282 "small_business"     10000 .24 "AL" 1 1665964801000
      0 19282 "house"              35000 .24 "AL" 1 1665964802000
      0 19282 "debt_consolidation" 10000 .24 "AL" 1 1665964803000
      0 19282 "debt_consolidation" 10000 .24 "AL" 1 1665964804000
      0 19282 "debt_consolidation" 15500 .24 "AL" 1 1665964805000
      0 19282 "debt_consolidation"  1000 .24 "AL" 1 1665964806000
      0 19282 "debt_consolidation" 15000 .24 "AL" 1 1665964807000
      0 19282 "wedding"             3000 .24 "AL" 1 1665964808000
      0 19282 "debt_consolidation" 30000 .24 "AL" 1 1665964809000
      0 19282 "debt_consolidation" 20000 .24 "AL" 1 1665964810000
      0 19282 "debt_consolidation"  8000 .24 "AL" 1 1665964811000
      0 19282 "debt_consolidation" 20000 .24 "AL" 1 1665964812000
      0 19282 "home_improvement"   10000 .24 "AL" 1 1665964813000
      end
      format %tdDD/NN/CCYY App_Date_Corrected
      format %tc date_time2



      • #4
        "Moreover, I would like to run a fixed effects xtlogit regression (to account for time invariant state characteristics) and therefore I need my panel to also include a unique time indicator, I guess."
        No, you don't. That's simply not true. The only thing that suggests a need for a unique time variable is that you want to use "lagged" values of total entrepreneurs and banking assets. But those aren't actually lagged in the sense that Stata's lag operators would produce if you had distinct timestamps of the kind you tried to create. Your entrepreneurs variable is lagged by a year, not by a second. And your banking assets variable is lagged by a year or by a quarter, also not by a second. Had you succeeded in assigning distinct timestamps to the loans and -xtset- your data using that time variable, your lagged entrepreneurship and assets variables would have been lagged by only a second!

        So what you need to do is merge these variables into your data set by linking, for example, the 1995 entrepreneurs variable with the 1996 loans, and analogously for the banking assets variable. This is one situation where the lag operators won't do what you need; see the sketch below.
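
        A minimal sketch of the idea, assuming (hypothetically) that the entrepreneurship data are in a file entrepreneurs.dta with one observation per State_Code and year, and the banking data in banking_assets.dta with one observation per State_Code and quarter; adapt the file and variable names to your actual data:

        Code:
        // key each loan on the *prior* year, so that 1996 loans pick up 1995 values
        gen int year = year(App_Date_Corrected) - 1
        merge m:1 State_Code year using entrepreneurs, keep(master match) nogenerate

        // analogously for a quarterly series, keying on the prior quarter
        gen int qtr = qofd(App_Date_Corrected) - 1
        format %tq qtr
        merge m:1 State_Code qtr using banking_assets, keep(master match) nogenerate

        With the lagged values merged onto each loan record this way, you need no time variable in -xtset- at all.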
