Hello,

First post, here goes:

I have data as follows:

1. The data cover larger firms/investment portfolios (identified by Firm_ID) investing in smaller companies (identified by Target Company_ID).

| Firm_ID | Year | SIC Code | Equity Amount Invested | Total Amount Invested by the Firm | Target Company_ID | Co-investment | Count of Patents |
|---|---|---|---|---|---|---|---|
| 1 | 1995 | 7372 | 54.4 | 1500 | 10001 | 2 | 400 |
| 1 | 1995 | 4565 | 8.7 | 1500 | 10003 | 1 | 440 |
| 1 | 1996 | 6383 | 7.9 | 1500 | 10007 | 1 | 528 |
| 1 | 2001 | 1781 | 15.4 | 1500 | 10012 | 1 | 652 |
| 2 | 1995 | 7372 | 29.9 | 1480 | 10001 | 2 | 150 |
| 2 | 2003 | 9773 | 22.9 | 1480 | 10005 | 2 | 175 |
| 3 | 1996 | 7372 | 77.8 | 980 | 10001 | 3 | 8129 |
| 3 | 1997 | 9444 | 139.9 | 980 | 10002 | 1 | 8129 |
| 3 | 2001 | 9773 | 48.8 | 980 | 10005 | 1 | 9220 |

2. The assigned SIC (Standard Industrial Classification) codes are those of the target companies. SIC codes range between 1000 and 10000 and indicate an industry.

3. As you can see, the data is unbalanced: Firm_ID 1 has 4 observations, Firm_ID 2 has 2, and Firm_ID 3 has 3.

4. The time values are sometimes repeated within a firm (e.g. the first 2 rows). In this case, the investments in companies 10001 and 10003 were both made in 1995. Sometimes, though, a firm invests in the same company several times as multiple rounds of investment, so company 10003 might just as well have been another 10001.

5. There is a variable named Co-investment. It shows 2 in the first row because in 1995 Firm_ID 1 and Firm_ID 2 both invested in this company, making 2 investors. It shows 1 when only 1 investor in the data set invested in that specific company on that specific date. In 1996, however, Firm_ID 3 decided to invest in the same company that Firm_ID 1 and 2 had invested in a year earlier, which is why it shows 3.
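
To make the definition in point 5 concrete, here is a minimal Stata sketch of one way the count could be built. It assumes Co-investment means the cumulative number of distinct firms in the data set that have invested in a target company up to and including a given year, which is how the sample rows behave. `Target_ID` is an assumed valid variable name standing in for the Target Company_ID column, and `first_entry`, `n_running` and `Co_investment` are hypothetical names.

```stata
* Assumed variable names: Firm_ID, Year, Target_ID (stand-in for Target Company_ID)

* flag the first time each firm invests in each target company
bysort Target_ID Firm_ID (Year): gen byte first_entry = (_n == 1)

* running count of distinct investors per target, in year order
bysort Target_ID (Year Firm_ID): gen n_running = sum(first_entry)

* investors in the same target and year share that year's final count
bysort Target_ID Year: egen Co_investment = max(n_running)

drop first_entry n_running
```

On the nine sample rows this logic gives 2, 2 and 3 for company 10001 and 1 then 2 for company 10005, matching the table. If the variable should instead count only investors within the same year, the running sum would have to be replaced by a within-year count.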

My goals, and my plea for your help:

1. I want to analyze how the SIC dispersion in a firm's portfolio influences variable Y (patents).

The theory states that the larger the distance (variance?) of the SIC codes from the mean or the yearly mean, the stronger the growth of patents. So the more explorative a firm becomes by investing in SIC codes that are distant from each other (e.g. 1781, 6383, 4565, 7372), the more patents it purchases. What would be the right measure of a portfolio's dispersion in SIC codes: variance, squared percentage growth, or something else? Which commands would I apply?

Eventually, I want to regress the growth of patents on the growth of this dispersion to show the relationship.

2. Variable 7, "Co-investment", does not exist yet, and I would like to generate it. I am guessing this can be done from Firm_ID and Target Company_ID by counting recurrences? Is there a command for this in Stata? It is going to be a dummy moderator: something that adds to the explorative power of an investment.

3. How do I structure my dataset so that time runs in equally spaced points, making it panel data that is ready for regression? Right now I get error r(451), repeated time values.
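
A rough sketch of how goals 1 and 3 might fit together, assuming the analysis is run at the firm-year level: collapsing to one row per Firm_ID and Year removes the repeated time values behind r(451) and yields a yearly dispersion measure at the same time. Taking the variance idea literally, the standard deviation of SIC codes is used here, but that is only one candidate, since SIC codes are industry labels rather than quantities. `SICCode` and `Patents` are assumed variable names, `sic_sd` and `n_inv` are hypothetical, and keeping the maximum patent count per year is an arbitrary choice.

```stata
* Assumed variable names: Firm_ID, Year, SICCode, Patents
* collapse replaces the data in memory, so save the investment-level file first

* one row per firm-year: SIC dispersion, number of investments, patent count
collapse (sd) sic_sd=SICCode (count) n_inv=SICCode (max) Patents, by(Firm_ID Year)

* each firm now has at most one observation per year, so the panel can be declared
xtset Firm_ID Year
```

Firm-years with a single investment get a missing `sic_sd`, and years without any investment simply stay absent. `xtset` accepts such gaps, so no investments have to be thrown away before collapsing.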

I have only around 2500 observations for 19 firms and am not keen on omitting much of the data, considering that some portfolios consist of only 59 investments, while one has as many as 700 (this one has a lot of repeated time values).

I am afraid that making my data set smaller will make my data less relevant. Also, if you have any remarks or tips, please don't hesitate!

I am here to learn and have only had a beginner's course in Stata!

Thank you in advance!

Haik

## Comment