
  • generating a new variable based on row numbers - issues with 'gen time = _n'

    Hi,

    I am trying to run the command 'gen time = _n' to create a new variable 'time'. My dataset is very large: about 100,000,000 observations with 3 variables. I need this 'time' variable for the next step of my analysis. The issue I am having after running this command is as follows.

    At some points in this newly generated 'time' variable, I get repeated values. For example, instead of observing:

    .
    .
    .
    56030
    56031
    56032
    .
    .
    .

    I see the following:

    .
    .
    .
    56030
    56030
    56031
    .
    .
    .


    etc.

    This happens at several different points in the dataset, and I can't continue with the next stage of my analysis, since I need to use 'tsset time' and get the error r(451) (repeated time values in sample).

    Does anyone know how I can solve this?

    Many thanks,

    Kishan


  • #2
    Try

    Code:
    gen long time = _n
    Do you really have 100 million different times though? I wonder if maybe you instead need something like

    Code:
    bysort id: gen time = _n
    But even that may not be quite right. Maybe you could describe your data a bit more, including the information that identifies cases. I am surprised there isn't some sort of time variable already. What keeps the records properly sorted by time?
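A likely reason the `long` version helps, offered as a sketch rather than a definitive diagnosis: `generate` stores new variables as `float` by default, and a float holds consecutive integers exactly only up to 16,777,216 (2^24). Beyond that, values of `_n` are rounded to the nearest representable float, so neighbouring rows can end up with the same value. The example below reproduces this on five rows instead of 100 million; the offset and the variable names `t_float` and `t_long` are made up for illustration.

```stata
* Sketch: reproduce the rounding on a small scale (hypothetical names and offset).
clear
set obs 5

* Start just past 2^24 = 16,777,216, where float can no longer hold every integer.
gen float t_float = 16777216 + _n
gen long  t_long  = 16777216 + _n

format t_float t_long %12.0f
list

* isid exits with an error if a variable does not uniquely identify the rows.
capture isid t_float
display "isid t_float return code: " _rc

* The long version keeps every value distinct, so this runs without error.
isid t_long
```

On a real dataset, running `isid time` after `gen long time = _n` is a quick way to confirm that the new variable has no repeats before calling `tsset time`.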
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      Thank you Richard,

      That was such a simple solution!

      gen long time=_n worked well.

      I have accelerometer data measured over 10 days at 100 Hz, which results in the large dataset.
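As a rough check on the sizes involved, 10 days sampled at 100 Hz comes to 86,400,000 observations. That is well above the 16,777,216 limit up to which a `float` holds every integer exactly, and well below the largest value a `long` can store. The lines below are only an arithmetic sketch and assume continuous recording with no gaps.

```stata
* 10 days x 24 hours x 60 minutes x 60 seconds x 100 samples per second
display %14.0fc 10*24*60*60*100

* Largest integer below which a float stores every integer exactly
display %14.0fc 2^24

* Largest value Stata can store in a long
display %14.0fc maxlong()
```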

      Thank you very much.
