Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to first difference observations

    A simplified version of my data looks like the following:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(Amount Year Total)
     12 1999  38
     23 1999  38
      3 1999  38
     23 2000 178
    123 2000 178
     32 2000 178
    end

    Here, amount corresponds to amount produced in a sector, by year. The total corresponds to all the output produced in that year.I want to calculate the growth in total (first difference it)- or generate Total(t+1)-Total(t)

    Is thee a way to do this that does not involve me declaring my data as panel?

    Just to be clear, this variable should take on a value of "." for the year 1999 and a value of 140 for the year 2000.

    Thanks!


  • #2
    What indicates sector? Don't you want: that sector, that year?

    Comment


    • #3
      Chinmay Sharma with the information you've provided there is no way to provide any useful advice. One of the purposes of defining your data as panel is that is explicitly defines a primary key in your data (e.g., i by j). With the information available in your example, no one can identify how unique records are defined in your data over time. As an example:

      Code:
      // Clear all data from memory
      clear
      
      // Set the random number seed 
      set seed 7779311
      
      // Generate some fake data
      set obs 10
      
      // Create year values
      qui: g int year = _n + 2006
      
      // Random numbers of sectors across years
      qui: g nsector = rpoisson(6)
      
      // Create nsector observations of sectors within years
      expandcl nsector, gen(sector) cl(year)
      
      // Creates an ID variable for sectors
      bys year: g sectorid = _n
      
      // Number of firms within sectors/years
      qui: g nfirms = int(runiform(15, 50))
      
      // Expand the data to show firms within sectors and years
      expandcl nfirms, gen(firms) cl(year sectorid)
      
      // Create a firm id
      bys year sectorid: g firmid = _n
      
      // Create some amount value
      g amount = rnormal(100000, 25000)
      
      // Create some total value
      g total = rnormal(1000000, 500000)
       
      // Remove erroneous variables
      drop nsector sector nfirms firms 
      
      // If the primary key is known:
      bys sectorid firmid (year): g difftotal = total - total[_n - 1]
      If the primary key (e.g., the combination of variables that uniquely identifies records in the data set) is known, then the by prefix can be used with the id variables and sorting the data based on time (year in this case). Without the other information, however, it isn't really possible for someone to give you advice on what code to use beyond saying that it is possible to do what you seem to be asking.

      Comment


      • #4
        Clyde Schechter , ,wbuchanan

        Thanks for your responses.
        Please find below the data labelled by sector.


        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float(Amount Year Total sector)
         12 1999  38 1
         23 1999  38 2
          3 1999  38 3
         23 2000 178 1
        123 2000 178 2
         32 2000 178 3
        end
        The difference that I am looking for is the diifference in amount over all sectors produced. If, for instance, I type code of the form:

        Code:
        gen diff=total-total[_n-1]

        as suggested above, then I wont' be first differencing. I will in fact otain 0 for a lot of the observations (as it just subtracts the previous observation's value, not the previous year's value).

        Put differently, how can I difference a variable based on values of another variable?

        In my example, the difference in output over all sectors should be to the effect::

        Code:
        gen difference=total(value in year 2000)-total(value in year 1999)
        The final results should be:




        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float(Amount Year Total sector Difference)
         12 1999  38 1   0
         23 1999  38 2   0
          3 1999  38 3   0
         23 2000 178 1 140
        123 2000 178 2   .
         32 2000 178 3 140
        end


        Please let me know if something is unclear.




        Thanks for your response.

        Comment


        • #5
          Chinmay,

          Thanks for thanking me for my responses, but up to now, I haven't been on this thread! I think you meant to thank Nick Cox, no?

          It appears that the variable Total is always constant. Is that correct? Also, I don't get why you want Difference to be missing value in observation 5, but 140 in observations 4 and 6.

          I also don't understand why you are averse to -xtset-ing your data to do this. -xtset- doesn't do anything irreversible: you can always run -xtset, clear- later if it is somehow getting in the way. But, anyhow, you can do it without -xtset- if there are no gaps in the year variable.
          Code:
          by sector (Year), sort: gen Difference = Total - Total[_n-1]
          replace Difference = 0 if missing(Difference)
          sort Year sector
          If there are gaps in Year, then it is more complicated, and, in fact you would be essentially writing a personalized version of the code that Stata Corp. has already created to handle this problem with the D. operator. So, to cover the possibility that Year has gaps you are unaware of (or will in some future version of this data);

          Code:
          xtset sector Year
          gen Difference = D.Total
          replace Difference = 0 if missing(Difference)
          xtset, clear // IF YOU REALLY DON'T LIKE HAVING IT -xtset-
          sort Year sector
          If the reason you are averse to using -xtset- is that the data are already -xtset- with some different panel or time variables and you don't want to lose that information, you can do this:

          Code:
          // SAVE CURRENT -xtset- VARIABLES IN LOCAL MACROS
          xtset
          local original_panel `r(panelvar)'
          local original_time `r(timevar)'
          
          // CALCULATE DIFFERENCE
          xtset sector Year
          gen Difference = D.Total
          replace Difference = 0 if missing(Difference)
          
          // RESTORE THE ORIGINAL -xtset-
          xtset `original_panel' `original_time'


          Comment


          • #6
            Clyde Schechter You are completely right; I had meant to thank Nick Cox

            I subconsciously thanked you out of force of habit.

            1) Yes you are completely right, the total term does not change.
            2) Datapoint number 5 should be 140, as you have correctly pointed out.

            I think I have left out a very important piece of information, which I just realized is essential to answering this question. When I was transcribing my original problem to the one listed above, I did not do it properly. Please accept my apologies.


            The dataset that I have in mind looks like the above, but it is by country, by year:


            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input float(Amount Year Total sector Country)
             12 1999  38 1 1
             23 1999  38 2 1
              3 1999  38 3 1
             23 2000 178 1 1
            123 2000 178 2 1
             32 2000 178 3 1
            123 1999 278 1 2
            123 1999 278 2 2
             32 1999 278 3 2
             13 2000 345 1 2
             32 2000 345 2 2
            300 2000 345 3 2
            end
            Each country produces goods in different sectors,for each year. The Total variable corresponds to the total output produced by a country (summed over sectors) for a given year. The variable that I wish to create is a lagged variable of total output produced by country. The dataset that I envision looks like:


            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input float(Amount Year Total sector Country Lag)
             12 1999  38 1 1   0
             23 1999  38 2 1   .
              3 1999  38 3 1   0
             23 2000 178 1 1  38
            123 2000 178 2 1  38
             32 2000 178 3 1  38
            123 1999 278 1 2   0
            123 1999 278 2 2   0
             32 1999 278 3 2   0
             13 2000 345 1 2 278
             32 2000 345 2 2 278
            300 2000 345 3 2 278
            end

            Also, this is the reason why I do not use the xtset command- there are repeated time values in the panel!
            Thanks for your help; I apologize for my mistake.
            Last edited by Chinmay Sharma; 10 Apr 2016, 10:41.

            Comment


            • #7
              OK, now it makes more sense. You just have to create a new panel variable that incorporates both country and sector:

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input float(Amount Year Total sector Country Lag)
               12 1999  38 1 1   0
               23 1999  38 2 1   .
                3 1999  38 3 1   0
               23 2000 178 1 1  38
              123 2000 178 2 1  38
               32 2000 178 3 1  38
              123 1999 278 1 2   0
              123 1999 278 2 2   0
               32 1999 278 3 2   0
               13 2000 345 1 2 278
               32 2000 345 2 2 278
              300 2000 345 3 2 278
              end
              
              egen country_sector = group(Country sector), label
              xtset country_sector Year
              gen Difference = D.Total
              sort Country Year sector 
              
              list, noobs sepby(Country)

              By the way, apologies accepted!

              Comment

              Working...
              X