  • Retrieve observation number to local

    Hi,


    I have a panel dataset and want to store a specific observation number in a local, to be used later: the observation number of the first observation of each new ID. In the example below, that is the row where each ID first appears.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str15 ID double date float obs
    "Sweden" 21147  1
    "Sweden" 21150  2
    "Sweden" 21151  3
    "Sweden" 21152  4
    "Sweden" 21153  5
    "Sweden" 21154  6
    "Sweden" 21157  7
    "Sweden" 21158  8
    "Sweden" 21159  9
    "Sweden" 21160 10
    "Sweden" 21161 11
    "Sweden" 21164 12
    "Sweden" 21165 13
    "Sweden" 21166 14
    "Sweden" 21167 15
    "Sweden" 21168 16
    "Sweden" 21171 17
    "Sweden" 21172 18
    "Sweden" 21173 19
    "Sweden" 21174 20
    "Sweden" 21175 21
    "Sweden" 21180 22
    "Sweden" 21181 23
    "US"     20809  1
    "US"     20810  2
    "US"     20811  3
    "US"     20815  4
    "US"     20816  5
    "US"     20817  6
    "US"     20818  7
    "US"     20821  8
    "US"     20822  9
    "US"     20823 10
    "US"     20824 11
    "US"     20825 12
    "US"     20828 13
    "US"     20829 14
    "US"     20830 15
    "US"     20831 16
    "US"     20832 17
    "US"     20835 18
    "US"     20836 19
    "US"     20837 20
    "US"     20838 21
    "US"     20839 22
    end
    format %td date
    Any ideas?

  • #2
    Although you ask for a local, observation numbers are not always needed and there will be as many values as there are new panels. If you pack them all into a local, you'll have to unpack them again. So, my answer is to change the question: Why do you want this? My bet is that there is a better different method to store the information. For example

    Code:
    bysort ID (date) : gen byte first = _n == 1
    indicates the start of a new panel.


    • #3
      Originally posted by Nick Cox
      Although you ask for a local, observation numbers are not always needed and there will be as many values as there are new panels. If you pack them all into a local, you'll have to unpack them again. So, my answer is to change the question: Why do you want this? My bet is that there is a better different method to store the information. For example

      Code:
      bysort ID (date) : gen byte first = _n == 1
      indicates the start of a new panel.

      Thanks for asking. Your code gives me a tag for the first occurrence of each ID, but that will not solve my problem.

      For this specific problem I do need a local, which will be used in a loop later on. Sure, there are many panels, but the local will be reset each time the loop runs.

      For each of the (many) panels in my dataset I will run a loop that starts at the first and ends at the last observation within that panel. To do this I need the observation numbers of the first and last observation of the respective panel.

      The code below gives me the ending observation number (local last_seq_start) for a panel. I subtract 19 from the ending observation because I expand the dataset by 20 observations whenever there are no gaps in the date sequence, so the final sequence must start where 20 observations are still left within that panel:

      I do not understand how to get the last observation (local last_seq_start) within a specific panel, though...
      Code:
      levelsof ID, local(testlist) clean
      foreach var in `testlist' {
          local maxIDcount = 0
          local last_seq_start = 0
          local first_seq_start = 0
          qui count if Closeprice_!=. & ID== "`var'"
          if `maxIDcount' < r(N) local maxIDcount =r(N)
          else di ""
          local last_seq_start = `maxIDcount' - 19
      This is the loop I will run for each panel:

      Code:
          forval i = `first_seq_start'/`last_seq_start' {
              local j = `i'+19
              tsspell date in `i'/`j', c(longgap != 1) spell(spell`i') seq(seq`i') end(end`i')
              egen length`i' = max(seq`i'), by(ID_num spell`i')
              expand 2 in `i'/`j' if length`i'[`j']==20, gen(duplicate`i')
              replace ID = ID + "`i'" if duplicate`i'==1
              drop ID_num spell`i' seq`i' end`i' duplicate`i' length`i'
              encode ID, gen(ID_num)
          }
      }
      The loops above are not really the problem (they use variables not given in my original dataex) and will work just fine. But I cannot get the last observation within the respective panels. Ideas?
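
One possible way to get both boundaries is sketched below on a toy dataset, not the real one: store `_n` in a helper variable after sorting, then let `summarize` return its minimum and maximum for each panel. The helper variable `obsno`, the locals `first_obs` and `last_obs`, and the toy values are assumptions made for illustration; only `ID`, `date` and `testlist` come from the posts above.

```stata
* Toy data standing in for the real panel (values are illustrative only)
clear
input str15 ID double date
"Sweden" 21147
"Sweden" 21150
"Sweden" 21151
"US"     20809
"US"     20810
end
format %td date

* Sort so that each panel is contiguous, then record the observation number
sort ID date
* obsno is a hypothetical helper variable holding each row's position
gen long obsno = _n

levelsof ID, local(testlist) clean
foreach c of local testlist {
    * r(min) and r(max) are the first and last observation numbers of this panel
    quietly summarize obsno if ID == "`c'", meanonly
    local first_obs = r(min)
    local last_obs  = r(max)
    display "`c': first = `first_obs', last = `last_obs'"
}
```

With windows of 20 observations, the start of the last window in a panel would then be `last_obs' - 19`, provided the panel has at least 20 observations.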


      • #4
        It seems that you want to run tsspell (SSC) on the first 20 observations in each panel. That doesn't on the face of it need a loop

        Code:
        egen id = group(ID), label 
        tsset id date 
        tsspell ... if obs <= 20


        • #5
          Originally posted by Nick Cox
          It seems that you want to run tsspell (SSC) on the first 20 observations in each panel. That doesn't on the face of it need a loop

          Code:
          egen id = group(ID), label
          tsset id date
          tsspell ... if obs <= 20

          I wish... that would have been much easier. No, I want to:

          1. Run tsspell on the first sequence of 20 obs (obs 1-20) within a panel.
          2. If the sequence has no gaps longer than 3 days, append a copy of the sequence to the end of the dataset and change its ID.
          3. Run tsspell on the next sequence (obs 2-21).
          4. Repeat for the entire panel.

          The result is sequences of 20 obs without longer gaps, which I am going to analyse in a later step.


          • #6
            I still have the same reaction even to a more complicated problem: tell us the real problem and there may well be a much simpler solution. http://xyproblem.info/

            1. Whether there are gaps longer than 3 days is discoverable in one pass. I don't see why you want to cover overlapping segments and look at almost every observation many, many times.

            2. What makes 20 a magic number?

            3. When I look at your data, I see Mondays to Fridays and no Saturdays or Sundays, which seems a case for a business calendar.
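
The one-pass idea in point 1 could be sketched as follows. This is only an illustration under stated assumptions: a toy panel, a window length of 3 instead of 20 to keep the data short, and hypothetical variable names `gap`, `longgap`, `cumgap` and `okstart` that do not appear in the thread.

```stata
* Toy panel (illustrative values); the gap before 21157 is longer than 3 days
clear
input str15 ID double date
"Sweden" 21147
"Sweden" 21150
"Sweden" 21151
"Sweden" 21157
"Sweden" 21158
"Sweden" 21159
end
format %td date

* Window length: 3 here for brevity, 20 in the thread
local w = 3

* Days since the previous observation in the same panel (missing for the first)
bysort ID (date): gen gap = date - date[_n-1]

* Flag long gaps; the missing gap on each panel's first row must not count
gen byte longgap = gap > 3 & !missing(gap)

* Running count of long gaps within each panel
by ID: gen long cumgap = sum(longgap)

* A window starting at row _n spans rows _n to _n + w - 1 of the panel.
* It has no long gap between its rows if the running count does not change.
* Rows too close to the end of the panel for a full window are left missing.
by ID: gen byte okstart = cumgap[_n + `w' - 1] == cumgap if _n + `w' - 1 <= _N
```

Under these assumptions, `list ID date if okstart == 1` would show where each gap-free sequence begins, without revisiting the same observations in a loop.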


            • #7
              Originally posted by Nick Cox
              I still have the same reaction even to a more complicated problem: tell us the real problem and there may well be a much simpler solution. http://xyproblem.info/

              1. Whether there are gaps longer than 3 days is discoverable in one pass. I don't see why you want to cover overlapping segments and look at almost every observation many, many times.

              2. What makes 20 a magic number?

              3. When I look at your data, I see Mondays to Fridays and no Saturdays or Sundays, which seems a case for a business calendar.

              1. I have of course tagged all gaps longer than 3 days (in one pass).
              1.1 I am running analyses on segments which are unique. Sure, they are up to 95% (19/20) identical to other segments, but not 100%.
              2. Nothing. The problem at hand is the same regardless of how many days there are per segment. 20 days is merely 4 weeks of...
              3. ...business calendar! Using a business calendar (bcal) would not solve my problem of getting segments of observations with no long gaps...



              The real problem is that I want to analyze sequences (segments) of observations with no long gaps. I don't look at observations one by one; I analyse sequences (segments, if you like). All sequences are unique, since at least one observation differs.

              Setting: panel data (basically financial data), mostly recorded on a business calendar, although from different countries and with varying degrees of gaps.

              If anyone has a simpler idea for getting rolling sequences (of 20 observations each) within panels, and appending the sequences at the bottom of the dataset with a renamed ID, I would be glad. Since I am generating rolling sequences, my idea was to refer to observation numbers (as in my code example above).

              If I could just get the observation number into a local, the problem would be solved. If you need a more detailed dataex, I will of course provide it.

              Regards,

              Jesper Eriksson
              Last edited by Jesper Eriksson; 18 Oct 2020, 12:16.


              • #8
                Thanks for the further details. Sorry, but I am getting no closer and will leave it there for others.


                • #9
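                  OK, I did an easy fix. I generated totalobs (which equals each row's observation number) and used that in my local.

                  Code:
                  * Record each row's observation number once, before the loop
                  gen long totalobs = _n
                  levelsof ID, local(testlist) clean
                  foreach var in `testlist' {
                      qui su totalobs if ID == "`var'", meanonly
                      * totalobs equals _n, so its minimum is the panel's first observation number
                      local first_seq_start = r(min)
                      * the rolling loop from #3 would follow here
                  }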
