Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Panel data multiplication

    Dear forum,

    I had a question on the multiplication of values in panel data. I have a dataset containing firms with numerous observations per firm. I want to estimate the following: current Real estate value= real estate at cost * (CPI1975/CPIpurchase) * (Real estate index/ Real estate index 1975)

    So to basically if the building was bought pre 1975 I would use the CPI index ratio to inflate the real estate value at cost to 1975 and from then I would inflate it by the real estate index. I tried to run the following code:

    replace AnnualCPI=1 if purchaseRNICK>1974
    replace CPIpurchase=1 if purchaseRNICK>1974
    gen CPITotal=AnnualCPI/CPIpurchase

    gen REITotal1975=REIfyear/REindex1975
    gen Index=CPITotal*REITotal1975


    However the problem Im facing is that pre 1975 STATA cannot execute the multiplication due the missing values of the Real estate index pre 1975 so I basically only get the real estate index numbers from 1975 onwards. How can I excecute the formula?

    Can you guys help me?

    Kind regards

    ps the formula is attached as wel as the data snapshot

  • #2
    Well, the data snapshot you posted doesn't look like the code you posted, so it's hard to know where to begin. There is no variable purchaseRNICK in the snapshot. You also refer in your description to a building being bought pre or post 1975, but there is no variable that, by its name, would seem to indicate what year a building was purchased in. The only variable in the data snapshot that looks like a year is called REI1975, Maybe that's it? There is no variable REIFyear in the data snapshot, although you refer to it in your code. Anyway, with all the confusion, here it seems difficult to give a suggestion that will be reliable.

    So, please repost your question in a clearer way. Place your code in code blocks. (See FAQ #12 if you don't know how to do this.) Also, don't use screenshots to show data: you can't copy and paste from them, so if we want to try out some code on your data, this isn't helpful. Obtain the -dataex- program from SSC (-ssc install dataex-), read its help file (-help dataex-) and then use that to generate a code block that contains instructions to replicate (a relevant sample of) your data set. If your data set's variables don't contain variable labels that explain what the variable names mean, then please also provide a glossary.

    Comment


    • #3
      Sorry! Thanks for the advice.
      This is the code I'm running :
      Code:
      // purchase year rounded
      
      gen purchaseR=trunc(purchase)
      sort gvkey
      by gvkey: egen purchaseRNICK= max(purchaseR)
      
                              //  REvalues
      
      
      gen CPI1975=1975 if purchaseRNICK<1975
      joinby CPI1975 using E:\Tommieboy\CPI.dta, unmatched(both)
      drop _merge
      joinby purchaseRNICK using E:\Tommieboy\CPIpurchase.dta, unmatched(both)
      
      replace AnnualCPI=1 if purchaseRNICK>1974
      replace CPIpurchase=1 if purchaseRNICK>1974
      
      drop _merge
      joinby state purchaseRNICK using E:\Tommieboy\REI.dta, unmatched(both)
      drop _merge
      joinby state fyear using C:\Users\Anky\Downloads\REI2.dta, unmatched(both)
      
      drop _merge
      gen REI1975=1975 if purchaseRNICK>1974
      joinby state REI1975 using E:\Tommieboy\REI4.dta, unmatched(both)
      drop _merge
      
      gen CPITotal=AnnualCPI/CPIpurchase
      gen REITotal1975=REIfyear/REindex1975
      gen Index=CPITotal*REITotal1975
      As for the data I uploaded a DTA file as my computer would hang if I used the dataex method.
      PurchaseRNICK is basically the rounded purchase year.
      Annual CPI is the CPI level merged with with fyear (fiscal year)
      CPI1975 is the CPI level in 1975
      REIindex1975 is the Real estate index in 1975
      REIfyear is the Real Estate Index merged with the fiscal year

      Hope this clarifies it a bit.
      Attached Files

      Comment


      • #4
        Maybe I'm dense but I still don't understand your data and how it relates to the code. I do appreciate the posting of a .dta attachment: that works.

        So let me try to propose a solution using variable names that I can understand and perhaps you can adapt it to your situation.

        purchase_year is the year in which the building was purchased
        purchase_price is the price at which the building was purchased
        cpi_year is the CPI in the year in which the building was purchased
        rei_year_state is the REI for that state in the year the building was purchased (not defined, hence missing value, if purchase_year < 1975)
        rei_1975_state is the REI for that state in 1975

        In addition to these variables, there is a constant value of the CPI in 1975, which I will assume is stored in a local macro called cpi1975 (though I have the impression you are storing it in a non-varying variable called CPI1975).

        All of the above seem to be somewhere in your data under possibly different names.

        Now, if you are trying to get everything in terms of 1975 dollars, then I believe the code you want is:

        Code:
        gen real_estate_value = purchase_price * `cpi1975'/cpi_year if purchase_year < 1975
        replace real_estate_value = purchase_price * rei_1975_state/rei_year_state if year >= 1975
        If that's not what you're trying to do, I'm afraid I don't know where you're heading or what to suggest.

        Comment


        • #5
          Well that's what I thought at first but see the formula in the first post; it is a muliplication of those variables. I will try to elaborate more. What I actually have are two formulas to calculate the REvalue the first one is stated above and is used for RE with a purchase date prior to 1975 and a second one which is for RE post 1975. The second part, for the RE with purchase date post 1975, I managed to do already. However the first part is giving me quite some trouble. The goal is to create a single "TotalIndex", which includes both pre and post 1975 indices.

          Some more information on the first part: What I'm basically trying to do is calculate the current Real estate value for pre1975 purchase dates so I take the real_estate value at cost(t=0) and inflate it using the CPI and REI indices.
          For instance say the building is purchased in 1970 I first have to inflate it using the CPI index to 1975, as no REI is available, then take that value and inflate it using the REI index ratio (as this is only available from 1975). That way we can estimate the RE value now. However, as this is a multiplication of the two I dont know how to make this a single column for pre 1975. If i have that index i can replace the pre 1975 values in the formula for the REpost1975 index using your method.

          Hope this clarifies it a bit more

          Comment


          • #6
            Well, then I don't get your "multiplication" approach. It seems, at best, unnecessarily complicated. To get what you want, there needs to be one more variable:

            rei_2016_state is the value of REI for 2016 in the state of purchase.

            Then use this approach instead of my code in #4:
            Code:
            gen real_estate_value = purchase_price*(`cpi1975'/cpi_year)*rei_2016_state/rei_1975_state if purchase_year < 1975
            replace real_estate_value = purchase_price*rei_2016_state/rei_year_state if purchase_year >= 1975

            Comment


            • #7
              I completely agree with you, however its part of my assignment. I tried it that way but prior to 1975 there is no REI data. So what stata doesn't do is it doesn't match the REI part to the firm (and state) to generate a good value. Now it multiplies CPI data with missing values. I cant find a way to put the above formula into stata commands.

              Comment


              • #8
                If you do it the way I suggest in #6, the absence of REI data prior to 1975 makes no difference, because the only REI values used in the -gen- command are the value for 2016 and the value for 1975.

                If your assignment is to create a multiplier, then you can back into it from the code in #4 as follows:

                Code:
                gen index = (`cpi1975'/cpi_year)*rei_2016_state/rei_1975_state if purchase_year < 1975
                replace index = rei_2016_state/rei_year_state if purchase_year >= 1975

                Comment


                • #9
                  Tried your code and this is what I get, see DTA file. In my eyes it doesn't make sense.
                  Attached Files

                  Comment


                  • #10
                    Please show the data set you started with, the exact code you used (don't retype it: copy and paste from your Results window or log file into the Forum editor), and I'll try to figure out what's going on. I strongly suspect that whatever code you used is not my code translated into your variable names, because with my code there is no possibility of getting a missing value for index.

                    Comment


                    • #11
                      For the data https://onedrive.live.com/redir?resi...int=file%2cdta
                      Sorry to do it this way but the file was probably to big.
                      As there is only depreciation available in 1993, we use that number to calculate the depreciation ratio and therefore age. Attached are the other files needed for the code.
                      The log is as follows:
                      Code:
                      . do "C:\Users\Anky\AppData\Local\Temp\S
                      > TD02000000.tmp"
                      
                      . //Sample selection
                      . use "E:\Laatste Data.dta", clear
                      
                      . //Destring
                      .
                      . quietly destring, replace
                      
                      .
                      . // remove duplicates
                      . sort gvkey fyear
                      
                      . quietly by gvkey fyear:  gen dup = con
                      > d(_N==1,0,_n)
                      
                      . drop if dup>0
                      (52014 observations deleted)
                      
                      .
                      . //Drop non USA
                      . keep if loc=="USA"
                      (24383 observations deleted)
                      
                      . drop if at==0
                      (646 observations deleted)
                      
                      . drop if at==.
                      (31301 observations deleted)
                      
                      . // SIC codes removal
                      .
                      . generate ind2d = int(sic/100)
                      
                      . drop if ind2d== 10
                      (1211 observations deleted)
                      
                      . drop if ind2d== 11
                      (0 observations deleted)
                      
                      . drop if ind2d== 12
                      (322 observations deleted)
                      
                      . drop if ind2d== 13
                      (5656 observations deleted)
                      
                      . drop if ind2d== 14
                      (250 observations deleted)
                      
                      . drop if ind2d== 15
                      (821 observations deleted)
                      
                      . drop if ind2d== 16
                      (491 observations deleted)
                      
                      . drop if ind2d== 17
                      (445 observations deleted)
                      
                      . drop if ind2d== 60
                      (192 observations deleted)
                      
                      . drop if ind2d== 61
                      (1694 observations deleted)
                      
                      . drop if ind2d== 62
                      (1169 observations deleted)
                      
                      . drop if ind2d== 63
                      (1959 observations deleted)
                      
                      . drop if ind2d== 64
                      (698 observations deleted)
                      
                      . drop if ind2d== 65
                      (1725 observations deleted)
                      
                      . drop if ind2d== 66
                      (0 observations deleted)
                      
                      . drop if ind2d== 67
                      (5070 observations deleted)
                      
                      .
                      . // Active / non active
                      .
                      . gen year = year(datadate)
                      
                      . drop if (year==1993 & at==.)
                      (0 observations deleted)
                      
                      . gen active=1
                      
                      . replace active=fyear if fyear==1993
                      (6120 real changes made)
                      
                      . sort gvkey
                      
                      . by gvkey : egen active2 = max(active)
                      
                      . drop if active2==1
                      (53608 observations deleted)
                      
                      . drop active2
                      
                      .
                      . // remove three consecutive years
                      .
                      . sort gvkey fyear
                      
                      . by gvkey : drop if _N<3
                      (608 observations deleted)
                      
                      .
                      . // drop gaps
                      . bysort gvkey (fyear) : drop if _N < (f
                      > year[_N] - fyear[1] + 1)
                      (2322 observations deleted)
                      
                      .
                      . // replace missing values with 0 PPEs
                      .
                      . replace fatb=0 if fatb==.
                      (20441 real changes made)
                      
                      . replace fatc=0 if fatc==.
                      (15897 real changes made)
                      
                      . replace fatp=0 if fatp==.
                      (20355 real changes made)
                      
                      .
                      .                                 // Con
                      > struction at cost values
                      .
                      . // generate Total Marktet Value Power
                      > Plant Equipment (PPE) to restrict it t
                      > o 1993
                      .
                      . gen totalppe=fatb + fatc + fatp
                      
                      . gen PPErest=totalppe if fyear==1993
                      (66450 missing values generated)
                      
                      . by gvkey: egen PPEFin= max(PPErest)
                      
                      .
                      . //Calculating Age
                      .
                      . gen depratio= dpacb/fatb
                      (67813 missing values generated)
                      
                      . gen age= depratio * 40
                      (67813 missing values generated)
                      
                      .
                      . //Purchase year
                      .
                      . gen purchase=1993-age
                      (67813 missing values generated)
                      
                      .
                      . // purchase year rounded
                      .
                      . gen purchaseR=trunc(purchase)
                      (67813 missing values generated)
                      
                      . sort gvkey
                      
                      . by gvkey: egen purchaseRNICK= max(purc
                      > haseR)
                      (47387 missing values generated)
                      
                      .
                      .                                       
                      >           //  REvalues
                      .
                      . //Get AnnualCPI
                      . gen CPI1975=1975 if purchaseRNICK<1975
                      (68585 missing values generated)
                      
                      . joinby CPI1975 using E:\Tommieboy\CPI.
                      > dta, unmatched(both)
                      
                      . drop _merge
                      
                      . joinby purchaseRNICK using E:\Tommiebo
                      > y\CPIpurchase.dta, unmatched(both)
                      
                      .
                      . replace AnnualCPI=1 if purchaseRNICK>1
                      > 974
                      (68607 real changes made)
                      
                      . replace CPIpurchase=1 if purchaseRNICK
                      > >1974
                      (68607 real changes made)
                      
                      .
                      . drop _merge
                      
                      . // Get REIpurchase
                      . joinby state purchaseRNICK using E:\To
                      > mmieboy\REI.dta, unmatched(both)
                      
                      . drop _merge
                      
                      . // get REIfyear
                      . joinby state fyear using C:\Users\Anky
                      > \Downloads\REI2.dta, unmatched(both)
                      
                      . //Get REI1975
                      . drop _merge
                      
                      . gen REI1975=1975 if purchaseRNICK>1974
                      (3522 missing values generated)
                      
                      . joinby state REI1975 using E:\Tommiebo
                      > y\REI4.dta, unmatched(both)
                      
                      . drop _merge
                      
                      .
                      . gen CPITotal=AnnualCPI/CPIpurchase
                      (4341 missing values generated)
                      
                      . gen REITotal1975=REIfyear/REindex1975
                      (7144 missing values generated)
                      
                      . gen Index=CPITotal*REITotal1975 if purchaseRNICK<1975
                      (76461 missing values generated)
                      
                      . replace Index =REIfyear/REIpurchase if purchaseRNICK>=1975
                      (21174 real changes made)
                      Hope this helps
                      Attached Files

                      Comment


                      • #12
                        OK. Your data is rather oddly organized. The five attached files are really just two distinct ones: the three REI files are the same and the two CPI files are the same, just with different names for the variables. The original big data set is, to put it mildly, messy, and I have serious questions about how you selected your sample. Even with all the selections you make, you still end up with a data set that has lots of missing values for key variables such as purchaseRNICK. I've dropped most of the observations that can't be used. See comments in the code below. Your code could also be simplified by using a few loops, or making use of the inlist() function, etc. But I won't go into that as it's more stylistic than anything else. Your use of -joinby- to combine data sets happens to work here given the alignment of the data sets, but it is not usually the correct way to do this. So in the code below I have used the -merge- command for this purpose. I believe the code below does what you need:

                        Code:
                        //Sample selection
                        use stalist, clear
                        
                        //Destring
                        quietly destring, replace
                        
                        // remove duplicates
                        // THIS IS VERY DANGEROUS.  THE OBSERVATIONS THAT ARE DUPLICATES
                        // ON gvkey AND fyear ARE NOT THE SAME ON OTHER VARIABLES, SO YOU ARE
                        // MAKING AN ARBITRARY, AND IRREPRODUCIBLE SELECTION OF ONE SUCH RECORD
                        // FOR EACH GVKEY AND YEAR COMBINATION.  (RUN -duplicates report- TO DEMONSTRATE THIS.
                        
                        // YOU NEED TO UNDERSTAND THE DATA WELL ENOUGH THAT YOU CAN DETERMINE
                        // WHICH OF THE OBSERVATIONS IS THE RIGHT ONE FOR YOUR ANALYSIS (OR IF PERHAPS
                        // THEY NEED TO BE COMBINED IN SOME WAY).  BUT JUST PICKING ONE ARBITRARILY MAKES
                        // NO SENSE.  EACH TIME YOU RUN THIS, YOU WILL GET DIFFERENT RESULTS.
                        sort gvkey fyear
                        quietly by gvkey fyear:  gen dup = cond(_N==1,0,_n)
                        drop if dup>0
                        
                        
                        //Drop non USA
                        keep if loc=="USA"
                        drop if at==0
                        drop if at==.
                        
                        // SIC codes removal
                        generate ind2d = int(sic/100)
                        
                        drop if ind2d== 10
                        drop if ind2d== 11
                        drop if ind2d== 12
                        drop if ind2d== 13
                        drop if ind2d== 14
                        drop if ind2d== 15
                        drop if ind2d== 16
                        drop if ind2d== 17
                        drop if ind2d== 60
                        drop if ind2d== 61
                        drop if ind2d== 62
                        drop if ind2d== 63
                        drop if ind2d== 64
                        drop if ind2d== 65
                        drop if ind2d== 66
                        drop if ind2d== 67
                        
                        // Active / non active
                        gen year = year(datadate)
                        drop if (year==1993 & at==.)
                        gen active=1
                        replace active=fyear if fyear==1993
                        
                        sort gvkey
                        by gvkey : egen active2 = max(active)
                        
                        drop if active2==1
                        drop active2
                        
                        // remove three consecutive years
                        sort gvkey fyear
                        by gvkey : drop if _N<3
                        
                        // drop gaps
                        bysort gvkey (fyear) : drop if _N < (fyear[_N] - fyear[1] + 1)
                        
                        // replace missing values with 0 PPEs
                        replace fatb=0 if fatb==.
                        
                        replace fatc=0 if fatc==.
                        
                        replace fatp=0 if fatp==.
                        
                        // Construction at cost values
                        // generate Total Marktet Value Power Plant Equipment (PPE) to restrict it to 1993
                        gen totalppe=fatb + fatc + fatp
                        gen PPErest=totalppe if fyear==1993
                        by gvkey: egen PPEFin= max(PPErest)
                        
                        //Calculating Age
                        gen depratio= dpacb/fatb
                        gen age= depratio * 40
                        
                        //Purchase year
                        gen purchase=1993-age
                        // purchase year rounded
                        gen purchaseR=trunc(purchase)
                        
                        sort gvkey
                        by gvkey: egen purchaseRNICK= max(purchaseR)
                        drop if missing(purchaseRNICK) // THERE ARE TONS OF THESE. DUE TO MISSINGNESS OF age
                        
                        //    AT THIS POINT I HAVE YOUR SELECTED DATA SET
                        //    NOW LET'S SAVE IT AND BRING IN REI
                        tempfile purchases
                        save `purchases'
                        
                        use REI, clear
                        isid state purchaseRNICK, sort
                        // MERGE BACK THE PURCHASES DATA
                        merge 1:m state purchaseRNICK using `purchases'
                        assert _merge == 2 if purchaseRNICK < 1975
                        // NOTE SOME UNMATCHED OBSERVATIONS AFTER 1975
                        table state purchaseRNICK if _merge == 2 & purchaseRNICK > 1975
                        // CALCULATE REI IN 2014 FOR EACH STATE
                        by state, sort: egen REI2014 = max(cond(purchaseRNICK == 2014, REIpurchase, .))
                        // AND FOR 1975
                        by state: egen REI1975 = max(cond(purchaseRNICK == 1975, REIpurchase, .))
                        // ELIMINATE EXCESS OBSERVATIONS FROM REI WITH NOTHING IN purchases
                        drop if _merge == 1
                        drop _merge
                        save `"`purchases'"', replace
                        count if missing(purchaseRNICK)
                        
                        // NOW BRING IN CPI
                        use CPIPurchase, clear
                        
                        // MERGE TO ORIGINAL DATA
                        merge 1:m purchaseRNICK using `purchases'
                        tab purchaseRNICK if  (purchaseRNICK <= 1975) != (_merge == 3)
                        //     OBTAIN CPI IN 1975 AND STORE IN LOCAL MACRO
                        summ CPIpurchase if purchaseRNICK == 1975, meanonly
                        local cpi1975 = r(mean)
                        drop if _merge == 1
                        drop _merge
                        
                        //    NOW DO CPI AND REI ADJUSTMENTS
                        gen cpi_adjustment = `cpi1975'/CPIpurchase if purchaseRNICK <= 1975
                        replace cpi_adjustment = 1 if purchaseRNICK > 1975
                        
                        gen rei_adjustment = REI2014/REIpurchase if purchaseRNICK > 1975
                        replace rei_adjustment = REI2014/REI1975 if purchaseRNICK <= 1975
                        
                        gen index = cpi_adjustment*rei_adjustment
                        The only observations for which index is missing are those for NY in 1917 (where there is no CPI data) and PR in 1985 (because the REI data set doesn't include anything for PR).



                        Comment


                        • #13
                          I can imagine that it is quite messy! Unfortunately my STATA knowledge is very limited, as you can probably tell. For the data selection I followed the specified conditions in my assignment. I have a question though, with
                          Code:
                          by state, sort: egen REI2014 = max(cond(purchaseRNICK == 2014, REIpurchase, .))
                          you are fixing the REI to 2014, right? Isn't this supposed to be dependent on t, according to the formula?

                          I can't stress enough how deeply I appreciate your help, as I can't ask anyone else for help. Thank you for the help so far!

                          Comment


                          • #14
                            Code:

                            by state, sort: egen REI2014 = max(cond(purchaseRNICK == 2014, REIpurchase, .))
                            you are fixing the REI to 2014, right? Isn't this supposed to be dependent on t, according to the formula?
                            In one of your early posts, you stated that the point of this was to adjust the purchase prices to current. While current means 2016, 2014 is the latest year for which you have REI, so we can't take it any further than that. The way in which the index depends on the year of purchase is in the commands:

                            Code:
                            gen rei_adjustment = REI2014/REIpurchase if purchaseRNICK > 1975
                            replace rei_adjustment = REI2014/REI1975 if purchaseRNICK <= 1975
                            (where REIpurchase is the REI during the year of purchase). So the REI part of the adjustment uses the ratio of current year index to index in year of purchase or 1975 (whichever is later).

                            Comment


                            • #15
                              I understand, with current I meant at time t. As the next part of the"assignment" is to perform a time series regression with uses the RE Valuei,t as one of the independent variables so I believe it needs to be dependent on fyear. Sorry for the confusion.
                              Last edited by Mike Trott; 24 Jan 2016, 13:37.

                              Comment

                              Working...
                              X