  • Portfolio analysis, measuring re-occurrence and making panel data from repeated time values within panel (r(451))


    First post, here goes:

    I have data as follows:
    Firm_ID | Year | SICCode | Equity amount invested | Total amount inv. by the firm | Target company_ID | Co-investment | Count of patents
    1       | 1995 | 7372    | 54.4                   | 1500                          | 10001             | 2             | 400
    1       | 1995 | 4565    | 8.7                    | 1500                          | 10003             | 1             | 440
    1       | 1996 | 6383    | 7.9                    | 1500                          | 10007             | 1             | 528
    1       | 2001 | 1781    | 15.4                   | 1500                          | 10012             | 1             | 652
    2       | 1995 | 7372    | 29.9                   | 1480                          | 10001             | 2             | 150
    2       | 2003 | 9773    | 22.9                   | 1480                          | 10005             | 2             | 175
    3       | 1996 | 7372    | 77.8                   | 980                           | 10001             | 3             | 8129
    3       | 1997 | 9444    | 139.9                  | 980                           | 10002             | 1             | 8129
    3       | 2001 | 9773    | 48.8                   | 980                           | 10005             | 1             | 9220
    1. These data describe larger firms/investment portfolios (identified by Firm_ID) investing in smaller companies (Target company_ID).

    2. The assigned SIC (Standard Industrial Classification) codes are the codes of the target companies. SIC codes range from 1,000 to 10,000 and indicate an industry.

    3. As you can see, the panel is unbalanced: Firm_ID 1 has 4 observations, Firm_ID 2 has 2, and Firm_ID 3 has 3.

    4. Time values are sometimes repeated within a firm (e.g., the first 2 rows): Firm_ID 1's investments in companies 10001 and 10003 were both made in 1995. Sometimes a firm also invests in the same company in several rounds, so company 10003 could just as well have been another 10001.

    5. There is a variable named Co-investment. It shows 2 in the first row because in 1995 both Firm_ID 1 and Firm_ID 2 invested in company 10001. It shows 1 when only one investor in the data set has invested in that specific company as of that date. In 1996, Firm_ID 3 invests in the same company that Firm_ID 1 and 2 had invested in a year earlier, which is why its row shows 3.

    My goals, and my plea for your help:

    1. I want to analyze how the SIC dispersion in a firm's portfolio influences variable Y (patents).
    The theory states that the larger the distance (variance?) of SIC codes from the mean or the yearly mean, the stronger the growth in patents. In other words, the more explorative a firm becomes by investing in SIC codes that are distant from each other (e.g., 1781, 6383, 4565, 7372), the more patents it purchases. What would be the right measure of portfolio dispersion in SIC codes: variance, squared percentage growth, or something else? Which commands would I apply?
    Eventually, I want to regress the growth in patents on the growth of this dispersion to show the relationship.

    2. Variable 7, "Co-investment", does not yet exist in my data, and I would like to generate it, presumably from Firm_ID and Target company_ID by measuring re-occurrence. Is there a command for this in STATA? It is going to be a dummy moderator, i.e., something that adds to the explorative power of an investment.

    3. How do I structure my dataset so that time is observed at equally spaced points, making it proper panel data ready for regression? Right now I get error r(451): repeated time values within panel.
    I have only around 2,500 observations for 19 firms and am not keen on dropping much of the data, considering that some portfolios consist of only 59 investments, while one has as many as 700 (and this one has many repeated time values).
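    One possible starting point for point 1 is sketched below in Stata, under stated assumptions. The variable names firm_id, year, sic and equity are hypothetical, because the posted names contain spaces, which Stata does not allow. Dispersion is computed per firm-year rather than over a cumulative portfolio, and the input rows are the ones posted above. The sketch computes three candidate measures:
    (a) the standard deviation of the raw codes, as suggested in the question;
    (b) the number of distinct 2-digit SIC major groups;
    (c) a Blau index of equity shares across those groups.
    SIC codes are hierarchical (the first two digits identify the major group). The numeric distance between two 4-digit codes therefore only loosely reflects how far apart two industries are.

```stata
* Hypothetical names: firm_id, year, sic (4-digit SIC of the target),
* equity (equity amount invested); rows as posted in the question
clear
input firm_id year sic equity
1 1995 7372 54.4
1 1995 4565 8.7
1 1996 6383 7.9
1 2001 1781 15.4
2 1995 7372 29.9
2 2003 9773 22.9
3 1996 7372 77.8
3 1997 9444 139.9
3 2001 9773 48.8
end

* (a) Standard deviation of the raw SIC codes within each firm-year
*     (missing when the firm made only one investment that year)
bysort firm_id year: egen sic_sd = sd(sic)

* (b) Number of distinct 2-digit SIC major groups within each firm-year
gen sic2 = floor(sic/100)
egen sic2_tag = tag(firm_id year sic2)
bysort firm_id year: egen n_sic2 = total(sic2_tag)

* (c) Blau index: 1 - sum of squared equity shares across 2-digit groups
*     (0 = all equity in one group; closer to 1 = spread over many groups)
bysort firm_id year sic2: egen eq_sic2 = total(equity)
bysort firm_id year: egen eq_tot = total(equity)
bysort firm_id year: egen hhi = total(cond(sic2_tag, (eq_sic2/eq_tot)^2, 0))
gen blau = 1 - hhi

* Show one row per firm-year
egen fy_tag = tag(firm_id year)
list firm_id year sic_sd n_sic2 blau if fy_tag, sepby(firm_id)
```

    Whichever measure is chosen can then be differenced or shifted at the firm-year level before entering a regression. Which one matches the theory is a field-specific judgement.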

    I am afraid that making my data set smaller will make it less relevant. Also, if you have any remarks or tips, please don't hesitate to share them!
    I am here to learn and have only had a beginner's course in STATA!

    Thank you in advance!


  • #2
    Welcome to this forum.
    Even the regular contributors to the Stata (not STATA, please) forum were beginners at some point (I'm still a beginner as far as the Stata commands I am not familiar with are concerned), so experience is not an issue.
    A more relevant topic is how a query should be posed/posted to increase one's chances of getting helpful replies (see the FAQ).
    I think that your questions under points 1 and 2 are best answered by looking at what others in your research field (which is miles away from mine) have done when pursuing the same research goal.
    As far as your point 3 question is concerned, if Stata complains about repeated time values within the same panel, and provided that you do not plan to use time-series related commands, such as lags and leads, you can simply -xtset- your dataset with -panelid- only:
    xtset panelid
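    For reference, here is a minimal Stata sketch of how to see which firm-years are repeated. It also shows one alternative to declaring the panel identifier only: aggregating the investment-level rows to one row per firm-year, so that -xtset- with a time variable works. The names firm_id, year and equity are hypothetical. Keeping total equity and the number of investments per firm-year is only one possible choice of -collapse- statistics.

```stata
* Hypothetical names: firm_id, year, equity; rows as posted in the question
clear
input firm_id year equity
1 1995 54.4
1 1995 8.7
1 1996 7.9
1 2001 15.4
2 1995 29.9
2 2003 22.9
3 1996 77.8
3 1997 139.9
3 2001 48.8
end

* How many firm-year combinations occur more than once? These are what
* trigger r(451) "repeated time values within panel" in -xtset firm_id year-
duplicates report firm_id year

* Option 1: declare only the panel identifier (no lags/leads available)
xtset firm_id

* Option 2: aggregate to one row per firm-year, then declare panel and time
collapse (sum) equity_total=equity (count) n_investments=equity, by(firm_id year)
xtset firm_id year
list, sepby(firm_id)
```

    Option 2 keeps every investment's information in aggregated form, so the unit of observation changes from investments to firm-years rather than observations being dropped outright. Gaps between years (e.g., 1996 to 2001 for firm 1) remain and are treated as missing by lag and lead operators.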
    Kind regards,
    (Stata 16.0 SE)


    • #3
      Thank you, Carlo, for your warm welcome and response!

      As for -xtset-ting on panelid alone, I recall doing that after reading your advice in a different thread. However, I do intend to use lags of 1 to 3 years, since the effect of such investments on patents is supposed to kick in after 1 to 3 years.

      I'm wondering whether I can lag the variable before the regression, i.e., turn "Count of Patents" into "Count of Patents 3 years later", and then run the regression.
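      Below is a minimal Stata sketch of creating the forward-shifted outcome before the regression. It assumes the data have already been reduced to one row per firm-year; otherwise -xtset firm_id year- stops with r(451). In Stata's terms, "Count of patents 3 years later" is a lead, written with the F3. operator. The names firm_id, year and patents are hypothetical. The values are the posted patent counts, keeping the larger one where a firm-year appeared twice.

```stata
* Hypothetical names: firm_id, year, patents; one row per firm-year
clear
input firm_id year patents
1 1995 440
1 1996 528
1 2001 652
2 1995 150
2 2003 175
3 1996 8129
3 1997 8129
3 2001 9220
end

* Time-series operators need both a panel and a time variable with no repeats
xtset firm_id year

* F1. and F3. look 1 and 3 calendar years ahead within each firm and
* return missing when that firm-year is absent (gaps are not skipped over)
gen patents_f1 = F1.patents
gen patents_f3 = F3.patents

list firm_id year patents patents_f1 patents_f3, sepby(firm_id)

* With a complete firm-year series, a model such as the following becomes
* possible (dispersion is a placeholder for whichever measure is chosen):
* xtreg patents_f3 dispersion, fe
```

      In these example years every 3-year lead is missing, because no firm is observed exactly three years after any of its rows. With an annual patent series for each firm, the leads would be filled in.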

      I agree that question 1 is field-specific.

      As for question 2, I see that I may have formulated it unclearly. What I want to know is how to create the variable "Co-investment", which is based on the number of unique values of Firm_ID that occur together with the same "Target company_ID".

      Example: the first investment of Firm_ID 3 is in Target company_ID 10001 and has a co-investment value of 3. This 3 means that it is the third Firm_ID to invest in Target company_ID 10001; this investment happened in 1996.
      In 1995, however, Firm_ID 1 and 2 had both invested in the same Target company_ID 10001, and because we cannot tell which one was first, both get a co-investment value of 2.

      My question is how I could create such a variable from the other variables.
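      Below is a minimal Stata sketch of one way to build this variable. For each row, it counts the distinct firms that have invested in the same target up to and including that year, so firms entering in the same year share the same value. The names firm_id, year and target_id are hypothetical, and the input is the posted example. On these rows the result reproduces the posted Co-investment column.

```stata
* Hypothetical names: firm_id, year, target_id; rows as posted in the question
clear
input firm_id year target_id
1 1995 10001
1 1995 10003
1 1996 10007
1 2001 10012
2 1995 10001
2 2003 10005
3 1996 10001
3 1997 10002
3 2001 10005
end

* Flag the earliest row of each firm-target pair (a firm's first round there)
bysort target_id firm_id (year): gen byte first_entry = (_n == 1)

* Running count of distinct firms that have entered each target, in year order
bysort target_id (year): gen cum_entries = sum(first_entry)

* Same-year entrants are tied: give every row in a target-year the year-end count
bysort target_id year (cum_entries): gen co_investment = cum_entries[_N]

* One possible dummy moderator: another firm is (or was already) in the target
gen byte co_invested = (co_investment > 1)

sort firm_id year target_id
list firm_id year target_id co_investment co_invested, sepby(firm_id)
```

      co_invested is one possible dummy coding (1 when at least two distinct firms have invested in the target by that year). A firm's later rounds in the same company do not raise the count, because only its first investment in that company is flagged.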


      • #4
        If you are planning to lag the regressand (admittedly, I'm not clear on this detail of your post), you should move from static to dynamic panel data regression (say, -xtabond-).
        If you have repeated time values within the same panel, lagging your variables beforehand will result in a smaller sample (and your original one is already limited).
        A possible fix (which requires some changes to your original research plan, though) is to create a new -timevar-, say:
        egen new_timevar = group(Year SICCode)
        If I got your last question right, you can use the same trick to create the variable "Co-Investments":
        egen Co_Investments = group(Firm_ID Target_company_ID)
        Kind regards,
        (Stata 16.0 SE)