
  • Date Variable - Change Hours, minutes and seconds

    Hello,
    I have a time variable with this format: DD/MM/YYYY hh:mm:ss, but every observation of the variable has 00:00:00 as its time.
    I would like to know if there's a way to change the hh:mm:ss for each date.

    How can I transform Old_Date into New_Date ?

    Old_Date               New_Date
    01/01/2010 00:00:00    01/01/2010 00:00:00
    01/01/2010 00:00:00    01/01/2010 00:00:30
    01/01/2010 00:00:00    01/01/2010 00:01:00
    01/01/2010 00:00:00    01/01/2010 00:01:30
    01/01/2010 00:00:00    01/01/2010 00:02:00

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(old_date new_date)
    1.5779232e+12 1.5779232e+12
    1.5779232e+12 1577923230000
    1.5779232e+12 1577923260000
    1.5779232e+12 1577923290000
    1.5779232e+12 1577923320000
    end
    format %tc old_date
    format %tc new_date
    
    
    by old_date, sort: gen double wanted = old_date + (_n-1) * 30 * 1000
    format wanted %tc
    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.




    • #3
      I apologize for not using -dataex-; I will do so in my future posts.

      Thank you very much; it worked perfectly. I have now been able to declare my dataset as a panel.

      Best Regards



      • #4
        My panel is composed of daily data with multiple observations in each day. I was wondering whether setting the panel with delta = .001 seconds rather than delta = 1 day will bias the results.
        Is it okay to change delta in this way?



        • #5
          Well, this seems like a very odd question. Based on what you asked for in #1, delta is 30 seconds. If you try to set delta = 0.001 seconds instead, -xtset- will think you have gaps of 29,999 missing observations between each pair of observations. If you set delta = 1 day, I don't know what -xtset- will do, but I'm sure it won't be anything good.
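
          To make the delta arithmetic concrete, here is a minimal sketch on toy data. The panel variable harbor and the single-unit layout are assumptions made for illustration; only wanted follows the construction in #2.

          ```stata
          * Toy data: one hypothetical panel unit observed every 30 seconds
          clear
          set obs 5
          gen long harbor = 1
          gen double wanted = tc(01jan2010 00:00:00) + (_n - 1) * 30 * 1000
          format wanted %tc

          * delta matching the actual spacing of the observations
          xtset harbor wanted, delta(30 seconds)

          * delta of 1 millisecond (0.001 seconds): every millisecond between
          * consecutive observations now counts as a missing time point
          xtset harbor wanted, delta(1)
          ```

          Comparing the two -xtset- displays shows how the choice of delta changes what counts as a gap, even though the data themselves are unchanged.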

          I'm getting the impression now, however, that what's going on here is the fitting of a square peg into a round hole, or perhaps putting lipstick on a pig. In #1, I had assumed that your observations were actually obtained at 30 second intervals and that, for some reason, the original data source didn't record the time, or didn't record it correctly. Now I'm thinking you just have a bunch of observations with the same date that were not, in fact, collected at 30 second intervals, and you're trying to -xtset- the data and -xtset- is complaining that you have multiple observations with the same time. In fact, maybe you don't even really know what time the observations were made, or even what order they were in. In that case, you have no business trying to -xtset- the data with a time variable. As I see it, there are two possibilities here, and in neither case is this approach suitable:

          1. The data were actually collected at different times, and it matters what those times were, or at least it matters what the correct chronological order is. In that case, making up times is just falsifying your data, and -xtset- was doing you a favor by pointing out that your data are not suitable for what you want to do with them. Sweeping the error message under the rug in this case will just lead you to analyze phony data and produce phony results. The solutions in this case would be either to get the correct time data, or, if that isn't possible, change your analysis plan to things that do not depend on knowing the times beyond the level of the date.

          2. The data were not actually collected at different times, or they were but it doesn't matter for your analysis, and it doesn't matter what the chronological order is. In that case, your data are fine, and just -xtset- your data without a time variable. -xtset- does not require a time variable. The time variable is only needed when you want to work with time-series operators like lags and leads or with autoregressive correlation structures: which would only be the case if the correct chronological sequence or exact times matter.
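
          For case 2, declaring the panel without a time variable might look like the sketch below. The variable names harbor and logprice and the toy values are hypothetical, not taken from the data in this thread.

          ```stata
          * Toy data: several same-day observations per hypothetical unit
          clear
          input long harbor float logprice
          1 2.1
          1 2.4
          1 2.2
          2 3.0
          2 3.3
          end

          * Panel variable only: with no time variable declared,
          * repeated dates within a unit are not an issue
          xtset harbor

          * Panel summaries that need only the panel variable still work
          xtsum logprice
          ```

          Time-series operators such as L. are not available after this form of -xtset-, which is the trade-off described above.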

          Added: There is a third possibility I just remembered. Maybe it's your panel id variable that's wrong. Is there something that distinguishes the different observations on each unit? For example, perhaps you are trying to treat firm id as the panel variable, but the observations of a firm on a given date are from different divisions within the firm. In that case, the correct panel identifier is not the firm, but the combination of firm and division. If this is what's going on, you need to create a new variable that represents the combination of firm and division, and then you can -xtset- with that as the panel id variable and your time variable (as it was originally, not modified to 30 second intervals).
          Last edited by Clyde Schechter; 29 Oct 2020, 18:28.



          • #6
            Thank you for your clear and accurate insights.
            The data consists of every fish sale operation carried out in different harbors of the country from 2010 to 2019.

            I have numerous operations for each day. Each operation records: the date (without the time, so precision is lacking), region, harbor name, harbor city, harbor type, boat category, boat type, fishmonger type, group of fish species, category of fish species, subcategory of fish species, price at which it was sold, and amount.
            1. My panel id is the harbor name, which is already a combination of the type and city of the harbor. -xtset- reports repeated time values within panel. I tried using the region as the panel id and the issue persists. I could try combining the harbor with the region; how can I create a combined variable of the harbor and region?
            2. I would like to study the effect of predictors (region, harbor type, boat category, boat type, fishmonger type, work environment, subcategory of fish species) on the log of prices over time. Ideally I would like to introduce a lag of the dependent variable as an explanatory variable. As I couldn't set the data as a panel (r(451)), and as the chronological order doesn't matter (within the day), I thought of getting around it by assigning random time-of-day values to my dates to bypass the repeated time values within panel error. [Stata only accounts for the first operation in each day as an observation and all the rest are treated as duplicates].
            Do you think I could work with the modified time variable in my case or just drop the time variable ?
            I haven't found any other solution for the repeated time values within the panel.

            Thanks in advance



            • #7
              Well, if you are planning to introduce a lagged variable, then you do need to -xtset- with a time variable. You cannot use your modified time variable, because if you do that, the lagged value will come from whichever observation on the same harbor name you arbitrarily designated as being the one 30 seconds earlier! Clearly that would be wrong.

              It sounds like you need a much more fine-grained panel variable. Knowing nothing about fish markets, I cannot say what it would be, but I would guess it involves not only the harbor name but also the fishmonger type, fish species and subcategory of fish species. Maybe some of the boat and fishmonger variables are involved as well.

              What you need to do is think closely about what you mean by a lagged variable. Sit down and look at an observation in your data (one that is not from the first day). And ask yourself, which observation from the previous day would be the lag of this one. If you had to pick the lag of this observation, how would you decide which of the preceding day's observations it is? Then observe which combination of variables you had to take into account in order to pick out which observation would be the correct lag. That combination, which must be identical for an observation to be the lag of another (in addition to being one day earlier), is your correct panel id.

              As for creating such a variable, doing it in generic terms, if the variables that you need to combine are x y z and w:
              Code:
              egen combination = group(x y z w) // WORKS WITH ANY NUMBER OF VARIABLES
              Then you can -xtset combination time- and you'll be good to go. If when you try this you still get a repeated time values error then it either means that you have overlooked something and need to add another variable (or perhaps more than one) to the combination to get it right, or it means there is a problem in your data where the same unit really has two different observations on the same date: that would mean your data is wrong, or at least is not suitable for treatment as panel data.
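
              One way to check a candidate combination before calling -xtset- is sketched below on toy data. The variable names harbor, species, date and price, and all the values, are hypothetical.

              ```stata
              * Toy data: two hypothetical series observed on two consecutive days
              clear
              input str8 harbor str8 species float date float price
              "A" "cod"  18263 10
              "A" "hake" 18263 12
              "A" "cod"  18264 11
              "A" "hake" 18264 13
              end
              format date %td

              * Candidate panel id built from the combination of variables
              egen combination = group(harbor species)

              * Stops with an error if any combination-date pair occurs more than once
              isid combination date

              * Reports how many surplus copies of each pair exist, if any
              duplicates report combination date

              * With unique pairs, the lag is well defined within each combination
              xtset combination date
              gen lag_price = L.price
              ```

              If -isid- fails, -duplicates report- shows how widespread the repeats are, which helps in deciding whether another variable belongs in the combination.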



              • #8
                Dear Clyde,
                I am sorry for my late response.
                Thank you so much for your very helpful solution, it worked perfectly.

                By creating a combined variable, I was able to create a panel id.
                Code:
                . xtset id DateKey
                       panel variable:  id (unbalanced)
                        time variable:  DateKey, 01 Jan 10 to 31 Dec 19, but with gaps
                                delta:  1 day
                I am very grateful for your help,
                Best Regards



                • #9
                  While running some tests on the full data set to check that it is correctly normalized, I found that 99% of the panel id observations are duplicates.

                  What does this mean, please?

                  Code:
                  . unab vlist :  id
                  . sort `vlist'
                  . quietly by `vlist':  gen dup = cond(_N==1,0,_n)
                  . sum dup
                  
                      Variable |       Obs        Mean    Std. Dev.       Min        Max
                  -------------+--------------------------------------------------------
                           dup |   1484908    510.8731     544.394          0       3418
                  
                  . drop if dup>1
                  (1472822 observations deleted)



                  • #10
                    By using the name dup for your variable, you have talked yourself into believing that it identifies duplicate observations. It does not. It identifies observations that have the same id. In panel data where T is fairly large, the vast majority of observations will share their id with other observations. For example, if T = 10, 9 out of every 10 (i.e. 90%) observations will have a duplicate id. There is nothing wrong with that.

                    What the results in #9 show you is that you have precisely 1,484,908-1,472,822 = 12,086 distinct id's in your data set. If that is not in line with your expectations, then you will need to review your data management to see what has gone wrong. But if that is what you expect, then this is just confirmation.
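
                    The number of distinct ids can also be counted directly, without dropping anything. A minimal sketch on toy data follows; the values and the date variable are made up for illustration.

                    ```stata
                    * Toy data: 3 distinct ids spread over 6 observations
                    clear
                    input long id float date
                    1 18263
                    1 18264
                    1 18265
                    2 18263
                    2 18264
                    3 18263
                    end

                    * True duplicates are observations that repeat an id-date pair
                    duplicates report id date

                    * Flag exactly one observation per id, then count the flags
                    egen byte first = tag(id)
                    count if first
                    ```

                    Here -duplicates report id date- looks for genuinely repeated observations, while -count if first- gives the number of distinct ids, which is the quantity the subtraction above recovers.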



                    • #11
                      Clyde Schechter :
                      Thank you very much for your very helpful remarks, you've been of great help.
