  • How to generate new variable that equals observation for another variable in a specific year?

    Dear all,

    I would like to generate a new variable that holds, as a constant within each country, the value of another variable in a specific year.

    I would use this, for example, to keep track of an initial position.

    Below is a simplified version of my data set:

    Country GDP Year StartPos
    AT 1 2000 1
    AT 2 2001 1
    AT 3 2002 1
    BE 4 2000 4
    BE 5 2001 4
    BE 6 2002 4

    StartPos would then take, for each country, the value of GDP in 2000.

    Many thanks for your help!

    Joao



  • #2
    Here's some technique:

    Code:
    clear
    input str2 Country GDP Year StartPos
    AT 1 2000 1
    AT 2 2001 1
    AT 3 2002 1
    BE 4 2000 4
    BE 5 2001 4
    BE 6 2002 4
    end 
    
    egen GDP2000 = total(GDP * (Year == 2000)), by(Country) 
    
    list, sepby(Country) 
    
    
         +-------------------------------------------+
         | Country   GDP   Year   StartPos   GDP2000 |
         |-------------------------------------------|
      1. |      AT     1   2000          1         1 |
      2. |      AT     2   2001          1         1 |
      3. |      AT     3   2002          1         1 |
         |-------------------------------------------|
      4. |      BE     4   2000          4         4 |
      5. |      BE     5   2001          4         4 |
      6. |      BE     6   2002          4         4 |
         +-------------------------------------------+
    Please read

    1. http://www.statalist.org/forums/help#stata

    2. http://www.stata-journal.com/sjpdf.h...iclenum=dm0055
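
    To spell out why the `egen` call above works: `(Year == 2000)` evaluates to 1 when true and 0 when false, so only the 2000 value of GDP contributes to the within-country total. The sketch below compares it with a `mean(cond())` variant. The country CH is a hypothetical addition (not in the original data) with no observation for 2000, included only to show how the two forms differ in that case.

    ```stata
    clear
    input str2 Country GDP Year
    AT 1 2000
    AT 2 2001
    AT 3 2002
    BE 4 2000
    BE 5 2001
    BE 6 2002
    CH 7 2001
    CH 8 2002
    end

    * (Year == 2000) is 1 or 0, so only the 2000 value enters the sum
    egen GDP2000_total = total(GDP * (Year == 2000)), by(Country)

    * cond() yields GDP for 2000 and missing otherwise; mean() ignores missings
    egen GDP2000_mean = mean(cond(Year == 2000, GDP, .)), by(Country)

    list, sepby(Country)
    ```

    For the hypothetical CH, the `total()` form returns 0 and the `mean()` form returns missing, so the second form may be safer when some groups might lack the reference year.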


    • #3
      Perfect! Many, many thanks Nick!


      • #4
        Dear Nick,

        Just one more question. Suppose that, instead of fixing the year at 2000, I would like to take the first year for which I have an observation, which may differ from country to country. How would I do that?

        Thanks!!
        João


        • #5
          This is discussed in the paper referenced in #2.
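
          For readers without the paper to hand, here is a minimal sketch of one way to do this. It is not taken from the paper; the variable name FirstYear and the missing GDP value for BE in 2000 are assumptions made for illustration.

          ```stata
          clear
          input str2 Country GDP Year
          AT 1 2000
          AT 2 2001
          AT 3 2002
          BE . 2000
          BE 5 2001
          BE 6 2002
          end

          * first year with a nonmissing GDP, separately for each country
          egen FirstYear = min(cond(!missing(GDP), Year, .)), by(Country)

          * GDP in that year, spread to every observation of the country
          egen StartPos = mean(cond(Year == FirstYear, GDP, .)), by(Country)

          list, sepby(Country)
          ```

          If GDP is never missing, `bysort Country (Year): gen StartPos = GDP[1]` should give the same result in one line.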


          • #6
            Indeed. I managed it in the meantime. Really, many thanks!


            • #7
              Dear Nick,

              I am facing a new problem.

              As I understand it, the code you suggested only works when exactly one observation in each country satisfies the condition.

              Imagine that I have now the following data set:


              Country Year GDP Countrynumber Yearreference GDPreference
              AT 2000 25700 1 2000 25700
              AT 2001 25800 1 2000 25700
              AT 2002 26800 1 2000 25700
              AT 2003 27200 1 2000 25700
              AT 2004 28600 1 2000 25700
              AT 2005 29800 1 2004 25700
              AT 2006 31100 1 2004 25700
              AT 2007 32500 1 2004 25700
              AT 2008 32700 1 2004 25700
              AT 2009 31100 1 2008 25700
              AT 2010 32100 1 2008 25700
              AT 2011 33500 1 2008 25700
              AT 2012 35100 1 2008 25700
              AT 2013 35200 1 2012 25700
              AT 2014 36000 1 2012 25700
              AT 2015 37700 1 2012 25700
              AT 2016 37200 1 2012 25700
              BE 2000 24500 2 2000 24500
              BE 2001 25100 2 2000 24500
              BE 2002 26200 2 2000 24500
              BE 2003 26300 2 2000 24500
              BE 2004 27300 2 2000 24500
              BE 2005 28300 2 2004 24500
              BE 2006 29300 2 2004 24500
              BE 2007 30400 2 2004 24500
              BE 2008 30100 2 2004 24500
              BE 2009 28900 2 2008 24500
              BE 2010 30600 2 2008 24500
              BE 2011 31300 2 2008 24500
              BE 2012 32200 2 2008 24500
              BE 2013 32100 2 2012 24500
              BE 2014 33000 2 2012 24500
              BE 2015 34400 2 2012 24500
              BE 2016 34300 2 2012 24500

              [...]


              I need to populate GDPreference with the GDP level in the year given by Yearreference. The test "Year == Yearreference" would therefore be 1 in 4 cases...

              Many thanks for your help.

              J.



              • #8
                The principles here are, again, discussed in the paper referenced in #2.

                Code:
                clear
                input str2 Country Year GDP Countrynumber Yearreference GDPreference
                AT 2000 25700 1 2000 25700
                AT 2001 25800 1 2000 25700
                AT 2002 26800 1 2000 25700
                AT 2003 27200 1 2000 25700
                AT 2004 28600 1 2000 25700
                AT 2005 29800 1 2004 25700
                AT 2006 31100 1 2004 25700
                AT 2007 32500 1 2004 25700
                AT 2008 32700 1 2004 25700
                AT 2009 31100 1 2008 25700
                AT 2010 32100 1 2008 25700
                AT 2011 33500 1 2008 25700
                AT 2012 35100 1 2008 25700
                AT 2013 35200 1 2012 25700
                AT 2014 36000 1 2012 25700
                AT 2015 37700 1 2012 25700
                AT 2016 37200 1 2012 25700
                BE 2000 24500 2 2000 24500
                BE 2001 25100 2 2000 24500
                BE 2002 26200 2 2000 24500
                BE 2003 26300 2 2000 24500
                BE 2004 27300 2 2000 24500
                BE 2005 28300 2 2004 24500
                BE 2006 29300 2 2004 24500
                BE 2007 30400 2 2004 24500
                BE 2008 30100 2 2004 24500
                BE 2009 28900 2 2008 24500
                BE 2010 30600 2 2008 24500
                BE 2011 31300 2 2008 24500
                BE 2012 32200 2 2008 24500
                BE 2013 32100 2 2012 24500
                BE 2014 33000 2 2012 24500
                BE 2015 34400 2 2012 24500
                BE 2016 34300 2 2012 24500
                end
                
                egen Wanted = mean(cond(Year == Yearreference, GDP, .)), by(Country)
                assert Wanted == GDPreference

                The mean of a constant is that constant. In statistics, that fact seems uninteresting; for looking up values in a program, it is often useful.
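
                In the data as posted, `Year == Yearreference` is true only for the 2000 observation of each country (the 2004 rows, for instance, carry Yearreference 2000), which is why a single mean per country reproduces GDPreference. If the intent in #7 were instead that each observation looks up GDP in its own Yearreference, one mean per country would not be enough. A minimal sketch of that alternative, using a subset of the posted rows and a `merge` on a lookup table; the names nmatch, GDPatref and lookup are assumptions made for illustration:

                ```stata
                clear
                input str2 Country Year GDP Yearreference
                AT 2000 25700 2000
                AT 2001 25800 2000
                AT 2002 26800 2000
                AT 2003 27200 2000
                AT 2004 28600 2000
                AT 2005 29800 2004
                AT 2006 31100 2004
                BE 2000 24500 2000
                BE 2001 25100 2000
                BE 2002 26200 2000
                BE 2003 26300 2000
                BE 2004 27300 2000
                BE 2005 28300 2004
                BE 2006 29300 2004
                end

                * number of observations per country with Year == Yearreference
                egen nmatch = total(Year == Yearreference), by(Country)

                * lookup table: one row per (Country, Year), with Year renamed to the key
                preserve
                keep Country Year GDP
                rename (Year GDP) (Yearreference GDPatref)
                tempfile lookup
                save `lookup'
                restore

                * attach to each observation the GDP of its own reference year
                merge m:1 Country Yearreference using `lookup', keep(master match) nogenerate
                sort Country Year
                list, sepby(Country)
                ```

                Here nmatch is 1 for both countries, and GDPatref varies within country: AT in 2005 gets 28600, the GDP of AT in 2004. That differs from the constant GDPreference column posted in #7, so which version is wanted depends on the intent.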
                Last edited by Nick Cox; 11 Feb 2018, 03:04.

