Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replacing values with largest smaller values of a variable

    Dear Statalist members,

    I am following and benefiting this forum for more than 2 years and up to now I found solutions to every single problem arisen during my early reseach stage. Many thanks to all who supported this forum. This time I could not find any similar problem here, so I would like to ask you following question.

    I have a panel data defined by "xtset movieid week" I would like to replace values of weekdum2 variable of a particular cross-sectional unit with the largest smaller value. For instance, lets say I want to do it for movieid==97. This means I want weekdum2 value being 22 (not appear below) for obsid 40767, 23 for obsid 40892 24 for obsid 41029 and 26 for obsid 41159.

    I tried the following code but it generates missing values for some observations.

    sort week2
    by week2: replace weekdum2=weekdum2[_n-1] if movieid==97


    An example from my dataset is as follows:


    Code:
    * Example generated by - dataex -. To install: ssc install dataex
    clear
    input float movieid long obsid float(week2 weekdum2)
       97 40767 335 23
      798 40840 335 23
      813 40780 335 23
      910 40772 335 23
     1277 40781 335 23
     1313 40778 335 23
     1581 40765 335 23
     2332 40776 335 23
     2452 40768 335 23
     3041 40796 335 23
     3148 40771 335 23
     3320 40770 335 23
     3512 40764 335 23
     3557 40791 335 23
     3908 40779 335 23
     4191 40823 335 23
     4612 40800 335 23
     4699 40759 335 23
     5213 40824 335 23
     5484 40783 335 23
     5735 40761 335 23
     5965 40782 335 23
     6199 40773 335 23
     6402 40774 335 23
     6422 40829 335 23
     6541 40758 335 23
     7415 40760 335 23
     7681 40762 335 23
     8629 40827 335 23
     8998 40775 335 23
     9546 40819 335 23
    10605 40763 335 23
    10977 40818 335 23
       97 40892 336 24
      798 40974 336 24
      910 40894 336 24
     1277 40904 336 24
     1581 40902 336 24
     2332 40905 336 24
     2452 40889 336 24
     3148 40897 336 24
     3320 40900 336 24
     3512 40888 336 24
     3557 40930 336 24
     3906 40884 336 24
     3908 40906 336 24
     4191 40952 336 24
     4612 40920 336 24
     4699 40880 336 24
     5213 40927 336 24
     5735 40885 336 24
     5965 40899 336 24
     6134 40879 336 24
     6199 40890 336 24
     6402 40891 336 24
     6422 40923 336 24
     6541 40881 336 24
     7415 40883 336 24
     7681 40886 336 24
     7933 40882 336 24
     8629 40978 336 24
     8998 40909 336 24
    10605 40887 336 24
    10977 40951 336 24
       97 41029 337 26
      910 41028 337 26
     1277 41040 337 26
     1581 41057 337 26
     2168 41026 337 26
     2452 41027 337 26
     2930 41010 337 26
     3148 41037 337 26
     3320 41044 337 26
     3512 41050 337 26
     3906 41017 337 26
     3908 41043 337 26
     4191 41054 337 26
     4612 41062 337 26
     4699 41012 337 26
     4783 41021 337 26
     5213 41079 337 26
     5735 41018 337 26
     5896 41016 337 26
     5965 41038 337 26
     6134 41011 337 26
     6199 41023 337 26
     6402 41024 337 26
     6422 41032 337 26
     6541 41013 337 26
     7415 41015 337 26
     7681 41019 337 26
     7933 41014 337 26
     8629 41117 337 26
     8998 41055 337 26
    10605 41020 337 26
    10977 41093 337 26
       43 41137 338 27
       97 41159 338 27
      285 41146 338 27
      910 41161 338 27
    end



  • #2
    If you use bysort commands (same as sort followed by by), make sure you sort within your by-variable (week2) as well, especially if you use expressions like [_n-1], otherwise the order is not guaranteed and you may get different results depending on which sort commands you used before. So at the least, change your code above to
    Code:
    bysort week2 (movieid OR obsid): replace weekdum2 = weekdum2[_n-1] if movieid==97
    You get missing values because the observations with movieid==97 are often the first observation within the week, and weekdum2[0] doesn't exist so the result is missing.

    I'm afraid it's not really clear what you mean by "largest smaller value". It seems that you want to have the value of the previous week. In that case, you could do:
    Code:
    bysort movieid (week2): gen weekdum_alt = weekdum2[_n-1] if movieid == 97
    I would rather generate a new variable than replacing an existing one, you may still need it later for something else. If you still want to replace it:
    Code:
    replace weekdum2 = weekdum_alt if movieid == 97

    Comment


    • #3
      Originally posted by Wouter Wakker View Post
      If you use bysort commands (same as sort followed by by), make sure you sort within your by-variable (week2) as well, especially if you use expressions like [_n-1], otherwise the order is not guaranteed and you may get different results depending on which sort commands you used before. So at the least, change your code above to
      Code:
      bysort week2 (movieid OR obsid): replace weekdum2 = weekdum2[_n-1] if movieid==97
      You get missing values because the observations with movieid==97 are often the first observation within the week, and weekdum2[0] doesn't exist so the result is missing.

      I'm afraid it's not really clear what you mean by "largest smaller value". It seems that you want to have the value of the previous week. In that case, you could do:
      Code:
      bysort movieid (week2): gen weekdum_alt = weekdum2[_n-1] if movieid == 97
      I would rather generate a new variable than replacing an existing one, you may still need it later for something else. If you still want to replace it:
      Code:
      replace weekdum2 = weekdum_alt if movieid == 97


      Dear Wouter Wakker,

      Thanks for your response and sorry for being not clear. You're right. Indeed, I meant that I want to replace with the value of the weekdum2 corresponding to the previous week2. However, I am afraid that the code you suggest generates missing value for lowest value of weekdum2. I think this is because of if condition. Stata uses ONLY the values corresponding to movieid==97 to replace, but what I want is to use entire values of weekdum2 rather than only the values corresponding to movieid==97. I would be grateful if you have any further idea.

      Comment


      • #4
        This code generates the value of the previous week for every observation of the next week, irrespective of weather a certain movieid was included in that week:
        Code:
        bys week2 (obsid): gen aux = weekdum2 if _n==1
        gen group = !missing(aux)
        bys group (aux): gen aux1 = aux[_n-1]
        bys week2: egen prev_week = max(aux1)
        drop aux aux1 group
        You can then replace the values of weekdum2 :
        Code:
        replace weekdum2 = prev_week if movieid==97 & !missing(prev_week)
        Or better, generate a new variable.

        Comment


        • #5
          Originally posted by Wouter Wakker View Post
          This code generates the value of the previous week for every observation of the next week, irrespective of weather a certain movieid was included in that week:
          Code:
          bys week2 (obsid): gen aux = weekdum2 if _n==1
          gen group = !missing(aux)
          bys group (aux): gen aux1 = aux[_n-1]
          bys week2: egen prev_week = max(aux1)
          drop aux aux1 group
          You can then replace the values of weekdum2 :
          Code:
          replace weekdum2 = prev_week if movieid==97 & !missing(prev_week)
          Or better, generate a new variable.

          I appreciate your time and effort, but I am afraid this code does not replace weekdum2 of some movieids and replaces only largest the value of weekdum2 of some movieids. I guess the example dataset consist of 100 observations is not enough to check whether the code really does the job.

          For others to benefit from discussion, I share my solution to the problem. I basiclly substract 1 from week2 and replace weekdum2 values with mode of weekdum2 values corresponding to substracted week2 values. This works due to structure of my dataset such as having at least 2 observations per week and having delta of 1 unit.

          Code:
          gen week=week2
          sort movieid week
          replace week2=week-1 if movieid==97
          bysort week2 (movieid): egen modeweekdum2=mode(weekdum2) if week2>0 & week2<731
          replace weekdum2=modeweekdum2
          drop modeweekdum2
          Last edited by Bora Sinci; 30 Jul 2019, 09:31.

          Comment

          Working...
          X