Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to summate points prior to the event(key variable) you are considering.

    I've got data re cars , dummy dataset which I'm trying to practice on.

    The faults could be either Brake failure/Engine failure/Oil failure/Wheelwear

    If car had brake failure = 1 point
    If car had oil failure = 1 point
    If car had Wheelwear = 1 point

    Therefore =
    If car had brake failure + oil failure = 2 points

    The question: - The key end point I'm interested in is engine failure
    I want to find out if all those who had an engine failure, how may faulty car points did the car have previous to the engine failure (ie the summation of brake failure +/- oil failure +/- wheelwear)

    I want to find out post engine failure, did the car get any faults in terms of wheel wear.

    Could you kindly tell me what command I should use ? I can not for the life of me figure out the best way to go about it...



    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double ID str10(AdmissionDate ReleaseDate) float(Faults brakefailure enginefailure oilfailure wheelwear)
    123345672828373 "1 Jan 1960" "4 Jan 1960" 1 1 0 0 0
    123345672828373 "1 Feb 1960" "4 Feb 1960" 3 0 1 0 0
           13455633 "1 Apr 1960" "5 Apr 1960" 1 1 0 0 0
           13455633 "5 Apr 1960" "7 Apr 1960" 2 0 0 1 0
           13455633 "1 Nov 1960" "1 Nov 1960" 3 0 1 1 0
           13455633 "1 Jan 1961" "3 Jan 1961" 1 0 0 0 1
    end
    ------------------ copy up to and including the previous line ------------------

    Listed 6 out of 6 observations

  • #2
    Maybe something like this? This code assumes there is only one engine failure per ID.

    Code:
    gen admission_date = daily(AdmissionDate,"DMY")
    gen release_date = daily(ReleaseDate,"DMY")
    format %td admission_date release_date
    
    egen pre_engine_fails = rowtotal(brakefailure oilfailure wheelwear)
    egen post_engine_fails = rowtotal(wheelwear)
    
    sort ID admission_date
    by ID: egen engine_fail_date = max(admission_date*enginefailure)
    format %td engine_fail_date
    
    by ID: egen total_pre_engine_faults = total(pre_engine_fails*(admission_date<engine_fail_date))
    by ID: egen total_post_engine_faults = total(post_engine_fails*(admission_date>engine_fail_date))
    drop admission_date release_date pre_engine_fails post_engine_fails engine_fail_date // clean-up
    Last edited by Hemanshu Kumar; 27 Sep 2022, 07:04.

    Comment


    • #3
      Here's another way that is, one sense simpler, and in another sense, more complicated.

      Code:
      gen admission_date = daily(AdmissionDate,"DMY")
      gen release_date = daily(ReleaseDate,"DMY")
      format %td admission_date release_date
      
      by ID (admission_date), sort: egen total_pre_engine_faults = ///
          total(cond(sum(enginefailure) == 0, brakefailure+oilfailure+wheelwear, .))
      by ID (admission_date): egen total_post_engine_faults = ///
          total(cond(sum(enginefailure) > 0, wheelwear, .))
      Acknowledgement: The first three lines of code shown here are copied from #2.

      I think the choice between these approaches would depend on whether the additional variables created in #2, but not here, are useful for other purposes, or if the code feels more self-explanatory with them.

      Added: On an unrelated point, it may be problematic to use double for storage of a vehicle ID. In your example, one ID has 15 digits, which is just 1 shy of the limit that a double can handle. If some of your IDs are longer you risk having some digits truncated, with the possible result that two different vehicles' data will be confused with each other. Since one rarely does calculations with IDs anyway, you might be better off storing them as string variables. (Not to mention that at least in the United States, vehicle identification numbers contain letters as well as digits.)
      Last edited by Clyde Schechter; 27 Sep 2022, 10:03.

      Comment

      Working...
      X