  • Panel Data-Creating dummy variable

    Hi, I am trying to create a dummy variable in my panel data set based on two criteria. I have 50 ids (banks) and 80 quarters (1994q1-2013q4). I need to create a dummy variable (dummy=1 if asset>$1 billion, dummy=2 if $1 million>asset>$0.5 million, ...) for each bank based on its asset value as of 2013q4. Any help would be appreciated.
    Thanks,
    Dimet

  • #2
    Dummy variables are usually coded 0 and 1. Does the variable that you wish to create take other values besides 1 and 2?
    Last edited by Friedrich Huebler; 05 Jul 2015, 00:04.


    • #3
      Implicit in Friedrich Huebler's remark is that it is better to create two dummy variables rather than one.


      • #4
        I would describe Dimet's variable (coded 1 or 2) as a categorical variable rather than a dummy variable, and would use Stata's factor variable notation (see help fvvarlist) to include dummy variables as part of an analysis. So I don't see any need to code it as 0/1, and I think it's better to create no dummy variables rather than one or two.

        With that said, none of us have addressed Dimet's underlying question, how to create the variable(s) in question, however it is coded.

        Dimet, I don't quite understand what is required. Is it the case that your variable will, for a given bank, have the same value for every quarter, based on that bank's assets in 2013q4? Could you run xtset and describe id quarter asset, then copy and paste the Stata log into a CODE block (see section 12 of the Statalist FAQ linked to at the top of the page for instructions on presenting Stata output) to give readers a better understanding of your data?


        • #5
          Thanks for the responses. I am trying to categorize banks as Large, Medium, and Small size based on their asset value by 2013q4.
          I used the code below, which categorizes banks, but I want to take 2013q4 as the reference quarter. That is the problem. How can I take 2013q4 as the reference quarter in grouping banks? Thanks!
          bysort assetrank: gen bgroup = 1 if (asset > 300000000)
          replace bgroup = 2 if asset > 100000000 & asset < 300000000
          replace bgroup = 3 if asset < 100000000


          • #6
            William Lisowski:
            I would describe Dimet's variable (coded 1 or 2) as a categorical variable rather than a dummy variable
            I agree. But I had at the back of my mind that Dimet wanted to create an explanatory dummy variable, in which case a single categorical variable entered as is would be a bad choice, since it implies that the change from one state to the next is constant.
            If it is to create a dependent variable, then I leave it up to William.


            • #7
              I think something like this might do, assuming id is your bank id variable and quarter is your time variable. If you have further questions, first please post the description of your data that I requested. This code relies on 2013q4 being the final observation for each bank.

              Code:
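              * Stata reads a > b > c as (a > b) > c, so spell out both conditions
              generate bgroup = 1 if asset > 300000000 & !missing(asset)
              replace bgroup = 2 if asset > 100000000 & asset < 300000000
              replace bgroup = 3 if asset < 100000000
              * Copy each bank's last-quarter group to all of its quarters
              bysort id (quarter): replace bgroup = bgroup[_N]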
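
              To see what the last line does, here is a minimal sketch on a toy panel. The two banks and their asset figures are made up for illustration, and year and qtr are hypothetical helper variables. Within each id, sorted by quarter, _N is the position of the last observation, so bgroup[_N] is the group in the final quarter and is copied to every row for that bank. The sketch writes the middle band as two conditions joined by &, because Stata evaluates a chained comparison such as 300000000 > asset > 100000000 from left to right.

              Code:
              clear
              * Toy data: two banks, four quarters each (invented values)
              input id year qtr long asset
              1 2013 1  90000000
              1 2013 2 120000000
              1 2013 3 250000000
              1 2013 4 350000000
              2 2013 1 150000000
              2 2013 2 140000000
              2 2013 3  95000000
              2 2013 4  80000000
              end
              generate quarter = yq(year, qtr)
              format quarter %tq
              * Classify every bank-quarter by that quarter's assets
              generate bgroup = 1 if asset > 300000000 & !missing(asset)
              replace bgroup = 2 if asset > 100000000 & asset < 300000000
              replace bgroup = 3 if asset < 100000000
              * Within bank, sorted by quarter, overwrite with the last quarter's group
              bysort id (quarter): replace bgroup = bgroup[_N]
              list id quarter asset bgroup, sepby(id)

              In this toy panel bank 1 ends above 300000000 and bank 2 ends below 100000000, so each bank carries a single group across all four quarters.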


              • #8
                Eric, I am not arguing against using dummy variables as explanatory variables, but rather against creating them directly. I'm specifically referring to the discussion in 11.4.3 of the Stata User's Guide, "Factor variables". I agree that including Dimet's bgroup (as we now understand it) as an independent variable in a regression would be a bad idea; I would instead include i.bgroup as an independent variable. Factor variable operators (and their cousins, time series operators) are one of the coolest features of Stata.
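
                A minimal sketch of the difference, on simulated data. The outcome roa and its coefficients are invented purely for illustration and do not come from Dimet's data. Entering bgroup as is fits one slope across the codes 1, 2, 3, whereas i.bgroup asks Stata to build the indicators on the fly and estimate a separate coefficient for each non-base group.

                Code:
                clear
                set seed 12345
                set obs 300
                * Hypothetical size group taking the values 1, 2, 3
                generate bgroup = 1 + mod(_n, 3)
                * Made-up outcome with unequal steps between groups
                generate roa = 0.5*(bgroup == 2) + 2*(bgroup == 3) + rnormal()
                * One slope: assumes the 1-to-2 and 2-to-3 steps are equal
                regress roa bgroup
                * Factor variable notation: one coefficient per non-base group
                regress roa i.bgroup
                * Same model with group 3 as the base category
                regress roa ib3.bgroup

                No indicator variables are added to the data set in either of the last two commands.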


                • #9
                  Thanks William. This code worked fine, since 2013q4 is the final observation for each bank. However, I wonder how you would modify the code if I took another quarter as the reference, such as 2008q4?

                  Code:
                  xtset
                         panel variable:  assetrank (unbalanced)
                          time variable:  quarter, 1994q1 to 2013q4
                                  delta:  1 quarter
                  Code:
                                storage   display    value
                  variable name   type    format     label      variable label
                  ------------------------------------------------------------------------------------------
                  id              int     %10.0g                assetrank
                  quarter         float   %tq                  
                  asset           long    %12.0f                ASSET


                  • #10
                    I think that the following will do the trick, although you might want to add some checking to verify that every id has asset data in 2008q4.
                    Code:
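                    * Stata reads a > b > c as (a > b) > c, so spell out both conditions
                    generate bgroup = 1 if asset > 300000000 & !missing(asset)
                    replace bgroup = 2 if asset > 100000000 & asset < 300000000
                    replace bgroup = 3 if asset < 100000000
                    * Keep the group only in the reference quarter, then spread it within bank
                    replace bgroup = . if quarter != yq(2008,4)
                    bysort id: egen temp = total(bgroup)
                    replace bgroup = temp
                    drop temp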
                    Also note that the initial three statements, which I copied from your example, leave bgroup missing when asset is exactly 300000000 or 100000000. You would perhaps be better off using a recode command in place of the initial generate and two replace commands.
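
                    A hedged sketch of both suggestions, on a made-up panel in which bank 3 has no 2008q4 observation. The names nref and sizeq, the helper variables year and qtr, and all asset figures are hypothetical. The recode ranges assume asset is integer-valued (it is stored as long in the describe output) and put the boundary values in the larger group, which is only one possible choice.

                    Code:
                    clear
                    * Toy data: bank 3 is deliberately missing 2008q4
                    input id year qtr long asset
                    1 2008 3 280000000
                    1 2008 4 300000000
                    1 2009 1 310000000
                    2 2008 3 120000000
                    2 2008 4 100000000
                    2 2009 1  90000000
                    3 2008 3  50000000
                    3 2009 1  60000000
                    end
                    generate quarter = yq(year, qtr)
                    format quarter %tq
                    * Count usable reference-quarter observations per bank; anything but 1 needs a look
                    egen nref = total(quarter == yq(2008,4) & !missing(asset)), by(id)
                    tabulate nref
                    * Non-overlapping integer ranges, so every asset value falls in exactly one group
                    recode asset (300000000/max = 1) (100000000/299999999 = 2) (min/99999999 = 3), generate(sizeq)
                    * Keep the group only in 2008q4, then spread it within bank
                    replace sizeq = . if quarter != yq(2008,4)
                    egen bgroup = max(sizeq), by(id)
                    list id quarter asset nref bgroup, sepby(id)

                    Using max() rather than total() here is a design choice: a bank with no reference-quarter observation ends up with a missing bgroup instead of 0.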


                    • #11
                      Note that an overall view of such calculations can be found in http://www.stata-journal.com/article...article=dm0055

                      There is much to be said for laying out the code slowly and carefully, as in William's example. Once you understand the principles, it can be wrapped up more concisely too. I have changed the inequalities arbitrarily to catch exact equalities with the boundaries.

                      Code:
                      generate group = cond(asset >= 300e6, 1, cond(asset >= 100e6, 2, 3))
                      egen bgroup = total(group / (quarter == yq(2008,4))), by(id)
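
                      The second line leans on two Stata behaviours: dividing by a false (0) condition yields missing, and egen's total() ignores missings. A minimal sketch that splits the step in two, on made-up data (the variable picked, the helper variables year and qtr, and the asset figures are hypothetical):

                      Code:
                      clear
                      * Toy data: two banks, three quarters each (invented values)
                      input id year qtr long asset
                      1 2008 3 280000000
                      1 2008 4 300000000
                      1 2009 1  90000000
                      2 2008 3 120000000
                      2 2008 4  99000000
                      2 2009 1 400000000
                      end
                      generate quarter = yq(year, qtr)
                      format quarter %tq
                      generate group = cond(asset >= 300e6, 1, cond(asset >= 100e6, 2, 3))
                      * (quarter == yq(2008,4)) is 1 in the reference quarter and 0 elsewhere,
                      * so the ratio equals group in 2008q4 and is missing in every other quarter
                      generate picked = group / (quarter == yq(2008,4))
                      * total() skips the missings, leaving the single 2008q4 value per bank
                      egen bgroup = total(picked), by(id)
                      list id quarter asset group picked bgroup, sepby(id)

                      Two caveats apply: a bank with no 2008q4 observation gets bgroup 0, because a total over nothing but missings is 0, and a missing asset is classed as group 1 by cond(), because missing counts as larger than any number in Stata.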
