  • creating a Variable of a subset

    I would like to create an ROA variable which contains the values for the years 2004-2005 out of the years 2004-2014.
    If I use the following code with Stata 13, I get the right variable with the ROA values for YR 2004-2005, but missing values for all other YR. And this is the problem.

    Code:
    egen ROAh = ROA if YR ==2004 | YR ==2005
    My aim is to run a regression for the years 2006-2014 with this ROAh variable, which contains the ROA values of 2004-2005, as a control variable. But with the ROAh variable as I created it, I run into problems because of the missing values and Stata's listwise deletion.

    Code:
     xtreg ROA LEV SI TAN GR ROAh, re vce(robust)
    Many thanks for any help

  • #2
    Contrary to your implication, the code you cite is illegal: you need gen (or any allowed abbreviation of generate), not egen

    The bigger puzzle is what you expect to happen when year is not 2004 or 2005. What values should the new variable then contain, precisely?
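
    A small sketch may help to show why the missing values matter once egen is replaced by gen. The toy data below are made up for illustration; only the variable names ROA, YR and ROAh come from the thread.

    ```stata
    clear
    input int NAL int YR float ROA
    1 2004 .10
    1 2005 .20
    1 2006 .30
    2 2004 .00
    2 2005 .10
    2 2006 .50
    end

    * gen with an if qualifier fills only the qualifying rows; all others stay missing
    gen ROAh = ROA if YR == 2004 | YR == 2005

    * count the observations from 2006 onwards that have a usable ROAh
    count if YR >= 2006 & !missing(ROAh)
    ```

    The count is zero here, so an estimation command that uses ROAh as a regressor for 2006 onwards would have no complete observations left after listwise deletion.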



    • #3
      Of course you are right, so I will create the new variable with gen. But then I still have the missing value problem. What I would like is an ROA variable which controls for the ROA effect of the two earlier years, 2004-2005. As I have created it now, Stata reports it as omitted. So I am looking for another possibility, or a solution which can deal with this problem. Perhaps a dummy variable, but then I have the problem of how to create it. A lagged variable created with the following command does not give the right result either.

      Code:
       gen ROAlg1 = ROA[_n-1]



      • #4
        Sorry, but I don't understand what you want here.



        • #5
          I am also plenty confused by the write-up of your question. Posting an extract of your data layout, plus a column with values for the variable you are trying to create, might help. Read FAQ points 10 through 12 in particular: http://www.statalist.org/forums/help#question


          That said, your code in #3 should be something like:

          Code:
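          bys ID (year): gen ROAlg1 = ROA[_n-1]
          with ID and year being whatever variables you use in xtset. The parentheses sort by year within each ID without making year part of the by-group, so that [_n-1] refers to the previous observation of the same ID.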


          edit: note that this is entirely different from your code in #1, which simply copies values of ROA to ROAh for years 2004 and 2005. There is no lag or any other operation included in that code.
          Last edited by Jorrit Gosens; 05 Feb 2016, 07:07.
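
          As a possible alternative to explicit subscripts, Stata's lag operator can be used once the panel is declared with xtset. The sketch below uses made-up data with a deliberate gap (no 2006 row for NAL 1); the name ROAprev is hypothetical and only there for comparison.

          ```stata
          clear
          input int NAL int YR float ROA
          1 2004 .10
          1 2005 .20
          1 2007 .40
          2 2004 .00
          2 2005 .10
          2 2006 .50
          end

          xtset NAL YR

          * L.ROA is the ROA of the previous calendar year within the same NAL
          gen ROAlg1 = L.ROA

          * [_n-1] is the previous row within the same NAL, whatever its year
          bysort NAL (YR): gen ROAprev = ROA[_n-1]

          list, sepby(NAL)
          ```

          For NAL 1 in 2007 the lag operator gives a missing value, because there is no 2006 observation, while the subscript version picks up the 2005 value. Which behaviour is wanted depends on how gaps in the panel should be treated.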



          • #6
            I am sorry for confusing you, and I will try to ask my question again.

            This is my panel regression for the panel over the years 2005-2014:
            Code:
            xtset NAL YR
            Code:
            xtreg ROA LEV SI TAN GR ROAh, re vce(robust)
            Data sample:
            Code:
            input int YR float(LEV ROA)
            2006       .17195     .32441425
            2006    .04275328      .1584982
            2006            0     .13191599
            2006            .             .
            2006     .4046698     .17859454
            2006    .13331167     .13362427
            2006     .3514738     .10692054
            2006            .             .
            2006            .             .
            2006     .1380531     .07905605
            2006   .005745375     .25099224
            2006     .1677741      .1118494
            2006            .             .
            2006     .1940078      .1239504
            2006      .321362     .13731825
            2006            0     .16562533
            2006            .             .
            2006    .19064733       .203278
            2006    .53590167     .12990163
            2006    .05260342      .0635585
            2006     .3501284     .24775353
            2006    .20355543     .11819786
            2006     .3515536     .10167561
            2006     .3339846     .13436419
            2006     .2732947     .16149776
            2006    .25581458     .16372384
            2006    .08232636     .02801413
            2006     .3264541     .15814626
            2006     .2154179     .15318905
            2006    .25025356     .18703516
            2006            .             .
            2006     .3095583      .0301685
            2006    .20481825     .14222625
            2006    .52872694     -.3758024
            2006    .12478518      .1753002
            2006            .             .
            2006     .3322882    -.04657931
            2006     .3068142     .14068739
            2006            0      .2480999
            2006            0      .2480999
            2006    .14791071     .06125066
            end


            As already mentioned, I am trying to create the ROAh variable but am having difficulties with it. ROAh should contain the ROA values of the previous years, i.e. the mean over the previous years 2004 and 2005, for which I also have data. I tried the following command, but then I have the missing value problem, and I cannot run the regression with the ROAh variable as I have created it.

            Code:
            gen ROAh = ROA if YR ==2004 | YR ==2005
            Does anybody have advice on how to create the variable I would like to have?

            I hope that it is more understandable now.



            • #7
              Getting closer, but not quite close enough yet. The confusion is what rules you want to apply to create ROAh.

              Examples:
              1) The ROAh of NAL=A in year 2006 should be the average of ROA of NAL=A in years 2004 and 2005. The ROAh of NAL=A in year 2014 should also be the average of ROA of NAL=A in years 2004 and 2005.

              2) The ROAh of NAL=A in year 2006 should be the average of ROA of NAL=A in years 2004 and 2005. The ROAh of NAL=A in year 2014 should be the average of ROA of NAL=A in years 2012 and 2013.

              3) The ROAh of all NAL in year 2006 should be the average of ROA of all NAL in years 2004 and 2005. The ROAh of all NAL in year 2014 should also be the average of ROA of all NAL in years 2004 and 2005.

              4) The ROAh of all NAL in year 2006 should be the average of ROA of all NAL in years 2004 and 2005. The ROAh of all NAL in year 2014 should be the average of ROA of all NAL in years 2012 and 2013.


              Or something like that. Even better would be to make a new column in your data extract with ROA and the ROAh that you are attempting to create, so that people can see the logic in them.
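
              Rules 1 and 3 above could be sketched with egen's mean() function as follows. The toy data and the names ROAh_own and ROAh_all are invented for illustration. cond() passes ROA through only for 2004 and 2005 and returns missing otherwise; mean() ignores missing values, and egen fills the result into every observation of the group.

              ```stata
              clear
              input int NAL int YR float ROA
              1 2004 .10
              1 2005 .20
              1 2006 .30
              1 2007 .40
              2 2004 .00
              2 2005 .10
              2 2006 .50
              2 2007 .20
              end

              * Rule 1: each NAL's own 2004-2005 mean, repeated in all of its years
              egen ROAh_own = mean(cond(inlist(YR, 2004, 2005), ROA, .)), by(NAL)

              * Rule 3: one 2004-2005 mean over all NAL, repeated in every observation
              egen ROAh_all = mean(cond(inlist(YR, 2004, 2005), ROA, .))

              list, sepby(NAL)
              ```

              Because the result is filled into every year, the later observations no longer have a missing ROAh. Rules 2 and 4 need a moving window instead and are not covered by this sketch.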



              • #8
                Many thanks. I have now found my missing link, because I now understand your question regarding the filling rule for the ROAh value across all the years (2005-2014) needed to run a panel regression. Your first example seems to be what I am looking for. If this is the solution, I can manage it, so many thanks.

                By the way, my aim is to test with ROAh a hypothesis called "history". That means I want to test whether the past mean ROA for the years 2004-2005 has an effect on the ROA of the years 2006-2014. I am not sure whether I am thinking in the wrong direction, or am completely wrong in the way I would like to do it, or whether it is possible at all.





                • #9
                  The easiest way to get to that is probably with 'carryforward', a package from SSC.
                  1) install that, 2) calculate the mean value of 2004/2005 by id, filling it in for observations of year 2004 only, and 3) copy that value to the other years, by doing:

                  Code:
                  ssc install carryforward
                  * sort by YR within NAL so that [_n+1] is the same NAL's next row (2005)
                  bys NAL (YR): gen ROAh = (ROA + ROA[_n+1])/2 if YR==2004
                  by NAL: carryforward ROAh, replace
                  That said, I'm not too convinced this is a good idea. If ROA were absolute values, this might make some sense: larger firms might get larger absolute returns. In that case, it would be a far better idea to divide ROA by, e.g., some indicator of company size or value. If ROA are percentages, then comparing with values realized in 2004/2005 seems a bad idea. Companies that perform badly in 2004 are probably more likely to perform badly again in 2005, at least more likely than companies that performed well in 2004. But would performance in 2004/05 really have any influence on what happens up to 10 years later? If that is exactly the question you seek to answer, this is the way to do it. If you are trying to 'correct' for something, I would say leave this variable out of your model. If I were to account for 'historic performance' in such an exercise, I would probably look at performance over, e.g., the last three years, so the average of years 2004-2006 for year 2007, and 2011-2013 for year 2014. If you want to try that, use the code below.
                  Code:
                  bys NAL (YR): gen ROAh = (ROA[_n-1]+ROA[_n-2]+ROA[_n-3])/3
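
                  A possible variant of the three-year average uses Stata's lag operators after xtset. They look up the three preceding calendar years rather than the three preceding rows, so a gap in the years of a NAL yields a missing value instead of an average over the wrong years. This is only a sketch on made-up data, and the name ROAh_3yr is hypothetical.

                  ```stata
                  clear
                  input int NAL int YR float ROA
                  1 2004 .10
                  1 2005 .20
                  1 2006 .30
                  1 2007 .40
                  2 2004 .00
                  2 2005 .10
                  2 2007 .50
                  2 2008 .20
                  end

                  xtset NAL YR

                  * mean of the three preceding years; missing if any of them is absent
                  gen ROAh_3yr = (L1.ROA + L2.ROA + L3.ROA) / 3

                  list, sepby(NAL)
                  ```

                  In this toy panel only NAL 1 in 2007 has all three preceding years, so it is the only observation with a non-missing ROAh_3yr.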



                  • #10
                    Many thanks for your help Jorrit. I will think about the best way and if it really makes sense to test this historic hypothesis as you mentioned.
                    Have a good weekend!

