Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Using absorb to create dummy variables

    Hello

    I have firm-pair-quarter panel data where firm-pairs are not repeated or duplicated (i.e., if pair A-B-2010Q1 is in the sample, pair B-A 2021Q1 will not be in the sample).

    I have the identifying information for each observation in a separate column, call it Firm I and Firm J.

    I want to use the absorb() function to identify all of the pairs where firm XYZ is either in the Firm I column OR the Firm J column.

    As best I can tell, the input into the absorb() function relies only on a single categorical variable. Is there some way for me to do this in stata without having to create unique indicators for each firm? I have over 20 million observations so I don't know if my stata would be able to handle me making the indicators by themselves.

    Thanks for any thoughts in advance!


  • #2
    Your question is very unclear. First, there is more than one Stata command that has an -absorb()- option, and that option works differently among them, so we don't know which one you are referring to. Secondly, I can't follow "I want to use the absorb() function to identify all of the pairs where firm XYZ is either in the Firm I column OR the Firm J column" in this context, as the -absorb()- option doesn't identify anything, at least as I understand the term, in any of the commands where it is allowed. So please explain in greater detail what you want to do.

    It almost surely will also help if you provide example data, and illustrate in the example data what kind of result you expect.

    The most useful way to show example data is by using the -dataex- command. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Below is an output of dataex for six observations in my data.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double(PERMNO MATCH_PERMNO) str200 YEARQTR
      10001 10010 "20051"
      10001 10020 "20051"
      10010 10020 "20051"
      10001 10010 "20052"
      10001 10020 "20052"
      10010 10020 "20052"
      end
      In the first row, '10001' is the firm i identifier and '10010' is the firm j identifier. The third column denotes the time period. The absorb function I am using is from the reghdfe package.

      I want to create a dummy indicator variable that takes the value of 1 if '10001' is present in either the firm i OR firm j column (i.e., PERMNO or MATCH_PERMNO). I also want to create a dummy variable for if '10028' is present in either column, etc. In a final step, this indicator will be interacted with the YEARQTR variable to vary over time in my panel dataset.

      Here is what I want the dataset to look like with the final indicator variables (which I can preferably absorb due to space limitations):

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double(PERMNO MATCH_PERMNO) str200 YEARQTR Firm10001_20051 Firm10010_20051 Firm10020_20051 Firm10001_20052 Firm10010_20052 Firm10020_20052
      10001 10010 "20051" 1 1 0 0 0 0
      10001 10020 "20051" 1 0 1 0 0 0
      10010 10020 "20051" 0 1 1 0 0 0
      10001 10010 "20052" 0 0 0 1 1 0
      10001 10020 "20052" 0 0 0 1 0 1
      10010 10020 "20052" 0 0 0 0 1 1
      end
      Please let me know if I can further clarify anything.

      Comment


      • #4
        Thank you. I pretty much get what you want to do now. The only part that remains unclear is what you mean by "etc." in "I also want to create a dummy variable for if '10028' is present in either column, etc." I will show you pseudocode that abstracts away from that issue, and most likely you will be able to tweak that into actual code to do what you illustrate in #3.

        The other thing is that your YEARQTR variable is not going to be useful in analysis because it is a string. So you also need to convert that into a real Stata internal format quarterly date.

        Code:
        gen year = substr(YEARQTR, 1, 4)
        gen quarter = substr(YEARQTR, 5, 1)
        destring year quarter, replace
        gen qdate = yq(year, quarter), after(YEARQTR)
        format qdate %tq
        drop year quarter
        
        local special_permnos 10001 10028 // "ETC."
        levelsof YEARQTR, local(dates)
        foreach sp of local special_permnos {
            foreach d of local dates {
                gen byte Firm`sp'_`d' = inlist(`sp', PERMNO, MATCH_PERMNO) & YEARQTR == `"`d'"'
            }
        }

        Comment


        • #5
          Thanks so much!

          Do I need to manually list permnos for the local special_permnos variable?

          There are over 1,800 permnos in my dataset. I wonder if there is some way to obtain unique values of PERMNOs from the PERMNO and MATCH_PERMNO columns in order to populate this local variable?

          Comment


          • #6
            Well, this is what I was getting at when I said I don't know what you mean by "etc." You singled out 10001 and 10028. I don't know how "etc." fills out the rest of the special PERMNOs. You didn't explain that in your response.

            If there is some regular pattern to which PERMNO's are used, then probably we can write some code that will reproduce that pattern without having to list them individually. Or, if it's something like all the PERMNOs that have some values of some other variables you have not shown me fit some criterion, then the -levelsof- command, with an appropriate -if- condition and -local()- option can create it for you. But if it's just an arbitrary subset of the PERMNOs with no pattern in either the PERMNO itself or the other variables, then, yes, you will have to list them all out.

            Comment


            • #7
              Sorry for the delay, and thanks for your help already.

              By construction, all but one unique PERMNO will be in the PERMNO variable (for each YEARQTR grouping, of which I have sixty).

              Is there an easy way to assign unique values of the PERMNO categorical variable to the local variable special_permnos and then add the extra unique values by hand to the local variable?

              Thanks in advance.

              Comment


              • #8
                By construction, all but one unique PERMNO will be in the PERMNO variable (for each YEARQTR grouping, of which I have sixty).

                Is there an easy way to assign unique values of the PERMNO categorical variable to the local variable special_permnos and then add the extra unique values by hand to the local variable?
                Sorry, but I don't understand this.

                If you want every value of PERMNO to be included in local macro special_permnos, that is done with
                Code:
                levelsof PERMNO, local(special_permnos)
                But that doesn't seem to be quite what you want, and I can't figure out the difference. But perhaps that helps you get started?

                Comment


                • #9
                  Yes, sorry, I will be more specific.

                  The levelsof code is correct. I want every value of PERMNO to be included.

                  However, I want additional values to be included, in addition to the values in PERMNO.

                  Specifically, if PERMNO has unique values 10000-11000, I want to add the additional values of 11001, 11002, 11003, 11004, and 11005 to the local variable special_permnos. These values are not in PERMNO.

                  Thanks

                  Comment


                  • #10
                    I think I have actually figured it out. Thanks so much for your help!

                    Comment

                    Working...
                    X