Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Manipulating datatime

    Hi all,
    my dataset is like that:

    id data time
    1 01jan2018
    1 01jan2018
    1 03jan2018
    1 03jan2018
    1 03jan2018
    1 06jan2018
    .... ................
    2 01jan2018
    2 01jan2018
    2 01jan2018
    2 01jan2018
    2 05jan2018
    2 05jan2018
    2 07jan2018
    .... ................

    so, I want that to become

    id data time
    1 01jan2018
    1 02jan2018
    1 03jan2018
    1 04jan2018
    1 05jan2018
    1 06jan2018
    .... ................
    2 01jan2018
    2 02jan2018
    2 03jan2018
    2 04jan2018
    2 05jan2018
    2 06jan2018
    2 07jan2018
    .... ................


    Could anyone help me?
    Thanks for each reply.

  • #2
    Dear Mario, Please use (ssc install) dataex to re-post your data.
    Ho-Chuan (River) Huang
    Stata 17.0, MP(4)

    Comment


    • #3
      A problem would seem to be how you know which observation with value 1jan2018 should stay that way and which should have the new value 2jan2018. And so on.

      Comment


      • #4

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input int idimpianto str19 dtcomu
        9828 "01/06/2015 13:28:48"
        9828 "01/06/2015 13:28:48"
        9828 "01/06/2015 13:28:48"
        9828 "04/06/2015 16:53:39"
        9828 "04/06/2015 16:53:39"
        9828 "04/06/2015 16:53:39"
        9828 "04/06/2015 16:53:39"
        9828 "08/06/2015 11:23:37"
        9828 "08/06/2015 11:23:37"
        9828 "08/06/2015 11:23:37"
        9828 "11/06/2015 19:41:05"
        9828 "11/06/2015 19:41:05"
        9828 "11/06/2015 19:41:05"
        9828 "11/06/2015 19:41:05"
        9828 "15/06/2015 11:11:03"
        5345 "01/06/2015 09:11:14"
        5345 "01/06/2015 09:11:14"
        5345 "01/06/2015 09:11:14"
        5345 "01/06/2015 09:11:14"
        5345 "01/06/2015 09:11:14"
        5345 "01/06/2015 09:11:14"
        5345 "01/06/2015 09:11:14"
        5345 "01/06/2015 09:11:14"
        5345 "09/06/2015 08:21:56"
        5345 "09/06/2015 08:21:56"
        5345 "09/06/2015 08:21:56"
        5345 "09/06/2015 08:21:56"
        5345 "09/06/2015 08:21:56"
        5345 "09/06/2015 08:21:56"
        5345 "09/06/2015 08:21:56"
        5345 "09/06/2015 08:21:56"
        5345 "17/06/2015 08:11:36"
        end
        where there is the same date, I need the time variable to be consecutive until it changes.
        So, as we see in that part of my dataset when the "idimpianto=9828" is for three-times consecutive equal to 01/06/2015 13:28:48. so, I need it to be consecutive:

        01/06/2015 13:28:48
        02/06/2015 13:28:48
        03/06/2015 13:28:48
        and so on

        also, I want to ignore the hhmmss

        I hope that now to be clear.

        Comment


        • #5
          As a starting point, you may type this:

          Code:
          . split dtcomu, parse("")
          variables created as string: 
          dtcomu1  dtcomu2
          
          . gen mydata1 = date(dtcomu1, "DMY")
          
          . format mydata1 %td
          
          . sort idimpianto mydata1
          
          . by idimpianto, sort : egen float seq1 = seq(), from(1) to(20) block(1)
          
          . list in 1/20
          
               +---------------------------------------------------------------------------+
               | idimpi~o                dtcomu      dtcomu1    dtcomu2     mydata1   seq1 |
               |---------------------------------------------------------------------------|
            1. |     5345   01/06/2015 09:11:14   01/06/2015   09:11:14   01jun2015      1 |
            2. |     5345   01/06/2015 09:11:14   01/06/2015   09:11:14   01jun2015      2 |
            3. |     5345   01/06/2015 09:11:14   01/06/2015   09:11:14   01jun2015      3 |
            4. |     5345   01/06/2015 09:11:14   01/06/2015   09:11:14   01jun2015      4 |
            5. |     5345   01/06/2015 09:11:14   01/06/2015   09:11:14   01jun2015      5 |
               |---------------------------------------------------------------------------|
            6. |     5345   01/06/2015 09:11:14   01/06/2015   09:11:14   01jun2015      6 |
            7. |     5345   01/06/2015 09:11:14   01/06/2015   09:11:14   01jun2015      7 |
            8. |     5345   01/06/2015 09:11:14   01/06/2015   09:11:14   01jun2015      8 |
            9. |     5345   09/06/2015 08:21:56   09/06/2015   08:21:56   09jun2015      9 |
           10. |     5345   09/06/2015 08:21:56   09/06/2015   08:21:56   09jun2015     10 |
               |---------------------------------------------------------------------------|
           11. |     5345   09/06/2015 08:21:56   09/06/2015   08:21:56   09jun2015     11 |
           12. |     5345   09/06/2015 08:21:56   09/06/2015   08:21:56   09jun2015     12 |
           13. |     5345   09/06/2015 08:21:56   09/06/2015   08:21:56   09jun2015     13 |
           14. |     5345   09/06/2015 08:21:56   09/06/2015   08:21:56   09jun2015     14 |
           15. |     5345   09/06/2015 08:21:56   09/06/2015   08:21:56   09jun2015     15 |
               |---------------------------------------------------------------------------|
           16. |     5345   09/06/2015 08:21:56   09/06/2015   08:21:56   09jun2015     16 |
           17. |     5345   17/06/2015 08:11:36   17/06/2015   08:11:36   17jun2015     17 |
           18. |     9828   01/06/2015 13:28:48   01/06/2015   13:28:48   01jun2015      1 |
           19. |     9828   01/06/2015 13:28:48   01/06/2015   13:28:48   01jun2015      2 |
           20. |     9828   01/06/2015 13:28:48   01/06/2015   13:28:48   01jun2015      3 |
               +---------------------------------------------------------------------------+
          Best regards,

          Marcos

          Comment


          • #6
            A very literal interpretation of what you are asking for:
            Code:
            gen datestr = substr(dtcomu,1,10)
            gen date =  daily(datestr, "DMY")
            format date %td_D/N/Y
            bys idimpianto (date): replace date=date[_n-1]+1 if datestr==datestr[_n-1]
            sort idimpianto date
            But I have to say I have difficulties understanding why you would do this.
            You might be better helped if you better explained the purpose of your data transformation.

            Comment


            • #7
              Thanks Jorrit and Marcos.

              Jorrit the command that you suggested me works well.


              The purpose is to declare data to be panel data -xtset-. If I do not transform my time variable I am unable to do this, because a message advises me that the variables do not uniquely identify the observations. Moreover, I know that each line is a different daily file that I joined vertically with -append-
              Other variables are price, brand, city......


              NB: I have more or less 1,177,000 observations.


              Comment


              • #8
                If you know different files belong to different dates, it would be advisable to append them in such a way that dates or sequence would be copied over or indicated properly.
                There are also ways to xtset even when different obs have same dates, especially if you are aware of the proper ordering of the different observations.
                To add, I am quite surprised that different files would have exact same timestamps down to seconds when in reality they are supposed to be of different days/moments.
                You might want to back up a step and explain more about what the different files are and how they would need to be combined.

                Comment


                • #9
                  The files in my possession represent the daily price that companies communicate to the government.
                  In these files there are several information about these companies. Every file is built at the same way.
                  Each different file represent different day.

                  The problem is that companies communicate the price using an automatic tool. Therefore, if company A communicates the price 1.25 for a whole week, on 01/01/2012, this appears from the file 01/01/2018 to the file 07/01/2018 with the same price and the same date at the first communication.

                  This situation continues until the company communicates the price using the automatic price communication tool again.

                  So, I could manipulate csv file directly, but i have 1460 csv files.
                  I -append- all these files without problems, but I can not declare the data as panel data if I do not delete the repetition of the dates. This is because I do not know other ways to do this.

                  If you want you can suggest me a different way to -xtset-.in that contest.

                  Sorry for my unclear English and I hope now everything is ok.
                  Thanks

                  Comment


                  • #10
                    Alright. If you have reason to be sure that the dates really are the correct ones after you do this manipulation, everything is fine.
                    However, if the date can somehow be understood from the file name, you might want to use that info by generating a variable that holds the filename when you import each file into stata, roughly:

                    Code:
                    foreach file of local myfilelist {
                    *use or import or whatever
                    gen id="`file'" /*assign an id field based on the file name*/
                    *save with some name or save, replace 
                    }

                    Comment

                    Working...
                    X