Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • keep the observation parallel to the year and replacing the rest duplicates observation within a group with zero

    Hi Statalist,
    I'm new to the forum and I have around 1 million observations. I need to keep one observation in the third and fourth column parallel to the year in the second column and replace the rest duplicated values with zero. I am writing here because after long searching could not find an appropriate command to transfer my data.

    My data look likes:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    *dataex commoditycode year reportercode new_var year1 id
    clear
    input long commoditycode int(year reportercode) float new_var int year1 float id
    10110 2008  56 18 2008  1
    10110 2008  56 18 2009  1
    10110 2008  56 18 2010  1
    10110 2008  56 18 2011  1
    10110 2008  56 18 2012  1
    10110 2008  56 18 2013  1
    10110 2008  56 18 2014  1
    10110 2008  40  4 2008  2
    10110 2008  40  4 2009  2
    10110 2008  40  4 2010  2
    10110 2008  40  4 2011  2
    10110 2008  40  4 2012  2
    10110 2008  40  4 2013  2
    10110 2008  40  4 2014  2
    10110 2011 124 36 2008  3
    10110 2011 124 36 2009  3
    10110 2011 124 36 2010  3
    10110 2011 124 36 2011  3
    10110 2011 124 36 2012  3
    10110 2011 124 36 2013  3
    10110 2011 124 36 2014  3
    10110 2008 251 37 2008  4
    10110 2008 251 37 2009  4
    10110 2008 251 37 2010  4
    10110 2008 251 37 2011  4
    10110 2008 251 37 2012  4
    10110 2008 251 37 2013  4
    10110 2008 251 37 2014  4
    10110 2011 842 60 2008  5
    10110 2011 842 60 2009  5
    10110 2011 842 60 2010  5
    10110 2011 842 60 2011  5
    10110 2011 842 60 2012  5
    10110 2011 842 60 2013  5
    10110 2011 842 60 2014  5
    10110 2008 276 42 2008  6
    10110 2008 276 42 2009  6
    10110 2008 276 42 2010  6
    10110 2008 276 42 2011  6
    10110 2008 276 42 2012  6
    10110 2008 276 42 2013  6
    10110 2008 276 42 2014  6
    10110 2008 100  7 2008  7
    10110 2008 100  7 2009  7
    10110 2008 100  7 2010  7
    10110 2008 100  7 2011  7
    10110 2008 100  7 2012  7
    10110 2008 100  7 2013  7
    10110 2008 100  7 2014  7
    10110 2009 842 62 2008  8
    10110 2009 842 62 2009  8
    10110 2009 842 62 2010  8
    10110 2009 842 62 2011  8
    10110 2009 842 62 2012  8
    10110 2009 842 62 2013  8
    10110 2009 842 62 2014  8
    10110 2008 826 47 2008  9
    10110 2008 826 47 2009  9
    10110 2008 826 47 2010  9
    10110 2008 826 47 2011  9
    10110 2008 826 47 2012  9
    10110 2008 826 47 2013  9
    10110 2008 826 47 2014  9
    10110 2008  31  2 2008 10
    10110 2008  31  2 2009 10
    10110 2008  31  2 2010 10
    10110 2008  31  2 2011 10
    10110 2008  31  2 2012 10
    10110 2008  31  2 2013 10
    10110 2008  31  2 2014 10
    10110 2014 251 43 2008 11
    10110 2014 251 43 2009 11
    10110 2014 251 43 2010 11
    10110 2014 251 43 2011 11
    10110 2014 251 43 2012 11
    10110 2014 251 43 2013 11
    10110 2014 251 43 2014 11
    10110 2008  32 16 2008 12
    10110 2008  32 16 2009 12
    10110 2008  32 16 2010 12
    10110 2008  32 16 2011 12
    10110 2008  32 16 2012 12
    10110 2008  32 16 2013 12
    10110 2008  32 16 2014 12
    10110 2009 251 39 2008 13
    10110 2009 251 39 2009 13
    10110 2009 251 39 2010 13
    10110 2009 251 39 2011 13
    10110 2009 251 39 2012 13
    10110 2009 251 39 2013 13
    10110 2009 251 39 2014 13
    10110 2009 276 45 2008 14
    10110 2009 276 45 2009 14
    10110 2009 276 45 2010 14
    10110 2009 276 45 2011 14
    10110 2009 276 45 2012 14
    10110 2009 276 45 2013 14
    10110 2009 276 45 2014 14
    10110 2010 842 68 2008 15
    10110 2010 842 68 2009 15
    end
    I want my data to look like this:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long commoditycode int(year reportercode) float new_var int year1 float id
    10110 2008  56 18 2008  1
    10110 2008   0  0 2009  1
    10110 2008   0  0 2010  1
    10110 2008   0  0 2011  1
    10110 2008   0  0 2012  1
    10110 2008   0  0 2013  1
    10110 2008   0  0 2014  1
    10110 2008  40  4 2008  2
    10110 2008   0  0 2009  2
    10110 2008   0  0 2010  2
    10110 2008   0  0 2011  2
    10110 2008   0  0 2012  2
    10110 2008   0  0 2013  2
    10110 2008   0  0 2014  2
    10110 2011   0  0 2008  3
    10110 2011   0  0 2009  3
    10110 2011   0  0 2010  3
    10110 2011 124 36 2011  3
    10110 2011   0  0 2012  3
    10110 2011   0  0 2013  3
    10110 2011   0  0 2014  3
    10110 2008 251 37 2008  4
    10110 2008   0  0 2009  4
    10110 2008   0  0 2010  4
    10110 2008   0  0 2011  4
    10110 2008   0  0 2012  4
    10110 2008   0  0 2013  4
    10110 2008   0  0 2014  4
    10110 2011   0  0 2008  5
    10110 2011   0  0 2009  5
    10110 2011   0  0 2010  5
    10110 2011 842 60 2011  5
    10110 2011   0  0 2012  5
    10110 2011   0  0 2013  5
    10110 2011   0  0 2014  5
    10110 2008 276 42 2008  6
    10110 2008   0  0 2009  6
    10110 2008   0  0 2010  6
    10110 2008   0  0 2011  6
    10110 2008   0  0 2012  6
    10110 2008   0  0 2013  6
    10110 2008   0  0 2014  6
    10110 2008 100  7 2008  7
    10110 2008   0  0 2009  7
    10110 2008   0  0 2010  7
    10110 2008   0  0 2011  7
    10110 2008   0  0 2012  7
    10110 2008   0  0 2013  7
    10110 2008   0  0 2014  7
    10110 2009   0  0 2008  8
    10110 2009 842 62 2009  8
    10110 2009   0  0 2010  8
    10110 2009   0  0 2011  8
    10110 2009   0  0 2012  8
    10110 2009   0  0 2013  8
    10110 2009   0  0 2014  8
    10110 2008 826 47 2008  9
    10110 2008   0  0 2009  9
    10110 2008   0  0 2010  9
    10110 2008   0  0 2011  9
    10110 2008   0  0 2012  9
    10110 2008   0  0 2013  9
    10110 2008   0  0 2014  9
    10110 2008  31  2 2008 10
    10110 2008   0  0 2009 10
    10110 2008   0  0 2010 10
    10110 2008   0  0 2011 10
    10110 2008   0  0 2012 10
    10110 2008   0  0 2013 10
    10110 2008   0  0 2014 10
    10110 2014   0  0 2008 11
    10110 2014   0  0 2009 11
    10110 2014   0  0 2010 11
    10110 2014   0  0 2011 11
    10110 2014   0  0 2012 11
    10110 2014   0  0 2013 11
    10110 2014 251 43 2014 11
    10110 2008  32 16 2008 12
    10110 2008   0  0 2009 12
    10110 2008   0  0 2010 12
    10110 2008   0  0 2011 12
    10110 2008   0  0 2012 12
    10110 2008   0  0 2013 12
    10110 2008   0  0 2014 12
    10110 2009   0  0 2008 13
    10110 2009 251 39 2009 13
    10110 2009   0  0 2010 13
    10110 2009   0  0 2011 13
    10110 2009   0  0 2012 13
    10110 2009   0  0 2013 13
    10110 2009   0  0 2014 13
    10110 2009   0  0 2008 14
    10110 2009 276 45 2009 14
    10110 2009   0  0 2010 14
    10110 2009   0  0 2011 14
    10110 2009   0  0 2012 14
    10110 2009   0  0 2013 14
    10110 2009   0  0 2014 14
    10110 2010   0  0 2008 15
    10110 2010   0  0 2009 15
    end
    Your help is highly appreciated

    warm regards

  • #2
    Why not just

    Code:
    keep if year == year1
    as the other observations seem to carry precisely no information?
    Last edited by Nick Cox; 03 Jun 2019, 11:51.

    Comment


    • #3
      Dear Prof Cox,
      they might carry some valuable information from the international trade perspective.

      Comment


      • #4
        Imad:
        probably not that efficient:
        Code:
        egen gather=group( commoditycode year reportercode new_var)
        bysort gather: g flag=1 if _n==1
        bysort gather: replace reportercode=0 if flag==.
        bysort gather: replace new_var=0 if flag==.
        drop flag
        Kind regards,
        Carlo
        (Stata 19.0)

        Comment


        • #5
          You are asking to replace other values by zero. What valuable information is included there in lots and lots of zeros?

          Sure, perhaps you have other variables you have not told us about which you want to keep; but if so that is the answer to give.

          Comment


          • #6
            Dear Prof Carlo, thank you for your effort
            Your command keeps the first observation in a group and replaces others with zero. The way that I need more precisely is to keep the observation corresponding to column year parallel in a row to year1
            Last edited by IMAD KAREEM; 03 Jun 2019, 21:32.

            Comment


            • #7
              Dear Prof Cox,
              indeed you are right. This is only the initial version of my data. I have more variables that need to be merged to get to the complete data.

              Comment

              Working...
              X