I am working on my MSc thesis and need your help. I want to calculate the cosine similarity between 2000 pairs of documents. In more detail: I have two separate datasets (a and b) with 2000 observations each, and observation 1 in dataset a forms a pair with observation 1 in dataset b. In both datasets I have about 100 variables, which are counts of specific words that I selected beforehand. I want to compute the cosine similarity of these word counts for each of the 2000 pairs. So I think I have a total of 4000 vectors, each consisting of 100 numbers.

I don't know how to prepare my dataset (I think I need to merge something before I can start, as a and b are separate files). Also, I read something about the
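A minimal sketch of one way to do this, under stated assumptions: row k of a is paired with row k of b, and the word-count variables have the same names (w1, w2, ...) in both files. Tiny hypothetical stand-ins replace a.dta and b.dta; with the real files, each -input- block would become -use a, clear- or -use b, clear-. The idea is to give each row a pair number, prefix the variables so the two files don't clash, -merge 1:1- on the pair number, and then accumulate the dot product and the two norms word by word.

```stata
* Hypothetical stand-in for dataset a (2 pairs, 3 words)
clear
input w1 w2 w3
1 0 2
3 1 0
end
generate long pair_id = _n          // row number = pair identifier
rename (w*) (a_w*)                  // prefix so a and b don't clash
tempfile fileA
save `fileA'

* Hypothetical stand-in for dataset b
clear
input w1 w2 w3
2 0 4
0 1 3
end
generate long pair_id = _n
rename (w*) (b_w*)

* One row per pair, holding both vectors side by side
merge 1:1 pair_id using `fileA', nogenerate

* cosine = (a . b) / (||a|| * ||b||), accumulated word by word
local K = 3                         // set to the number of words (ca. 100)
generate double dot    = 0
generate double norm_a = 0
generate double norm_b = 0
forvalues k = 1/`K' {
    replace dot    = dot    + a_w`k' * b_w`k'
    replace norm_a = norm_a + a_w`k'^2
    replace norm_b = norm_b + b_w`k'^2
}
generate double cosine = dot / (sqrt(norm_a) * sqrt(norm_b))
list pair_id cosine
```

In the toy data, pair 1 is proportional (cosine 1) and pair 2 gives 1/10. A pair in which either document has no selected words at all gets a missing cosine, because the norm is zero.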


I have a dataset of bond transactions by insurance companies, with characteristics such as bond rating, issuer ID, bond ID, etc. As you can imagine, one bond can have ratings from several agencies, so the dataset could look something like:

```
Bond_ID Issuer_ID Bond_Rating
1 Moody's 23
1 S&P's 21
1 Fitch 20
2 Moody's 25
2 Fitch 26
2 S&P's 23
```

I'd like code that keeps only the lowest rating value for Bond 1 and Bond 2, so that after running it the dataset looks like this:

```
Bond_ID Issuer_ID Bond_Rating
1 Fitch 20
2 S&P's 23
```

I have tried searching for code that does this, but had no luck, perhaps because of how I worded my search.
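A minimal sketch using the example data from the post: within each bond, sort by rating in ascending order and keep the first row, which holds the lowest value.

```stata
* The example data from the post
clear
input byte Bond_ID str7 Issuer_ID byte Bond_Rating
1 "Moody's" 23
1 "S&P's"   21
1 "Fitch"   20
2 "Moody's" 25
2 "Fitch"   26
2 "S&P's"   23
end

* Sort by rating within each bond and keep the first (lowest) row.
* If two agencies tie for the lowest value, which one is kept is arbitrary.
bysort Bond_ID (Bond_Rating): keep if _n == 1
list
```

If tied agencies should all be kept, an alternative is -bysort Bond_ID: egen min_rating = min(Bond_Rating)- followed by -keep if Bond_Rating == min_rating-.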


I have a balanced panel dataset where N=27 and T=20.

I checked the -xtunitroot- pages of the manual, but it is not clear to me which test fits the characteristics of my dataset. Could you help me?

Also, since this is one of the tests I will run before regressing the variables (I still have to choose among -xtreg-, -xtgls-, etc.), in the case of non-stationarity can I use the first differences (d.x) to carry out the analysis?
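A minimal sketch on simulated data with the same dimensions (N = 27, T = 20), assuming a panel identifier id and a time variable year (hypothetical names). With T this short, a test whose asymptotics hold for fixed T, such as Harris-Tzavalis (-xtunitroot ht-, which needs a strongly balanced panel), may be a reasonable starting point, while Im-Pesaran-Shin (-xtunitroot ips-) allows the autoregressive parameter to differ across panels. Running the same test on the first difference addresses the second question: if x looks non-stationary but its difference does not, x behaves like an I(1) series, and working in differences is one option.

```stata
* Simulated balanced panel: 27 units x 20 years
clear
set seed 12345
set obs 27
generate int id = _n
expand 20
bysort id: generate int year = 2000 + _n
xtset id year

* Random walk within each panel (non-stationary by construction)
generate double e = rnormal()
bysort id (year): generate double x = sum(e)

* Unit-root tests on the level
xtunitroot ht x
xtunitroot ips x

* First difference via the D. operator (missing in each panel's first year)
generate double dx = D.x
xtunitroot ips dx
```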

I hope I have not violated any forum rules, this is my first post.

Thank you

2. How can I require in Stata that all control variables be non-missing? The command drop if missing (variables) is not giving me the desired outcome; is there another way?

3. I would like a dummy variable that equals 1 if X and/or Y is greater than 0, and 0 if neither X nor Y is greater than 0. How can I include two variables in the dummy condition, or how can I combine two dummies into one? I did:

4. Also about the dummies: when there is a missing value, I would like the dummy to be missing as well. However, now if I use the command
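The commands from the post are not shown, so this is only a sketch on hypothetical data (X, Y and two controls c1, c2). Two points often explain surprises here: -missing()- takes a comma-separated list and returns 1 if any listed variable is missing, and Stata treats missing values as larger than every number, so X > 0 is true when X is missing.

```stata
* Hypothetical data: X, Y and two controls c1, c2 ("." = missing)
clear
input X Y c1 c2
 1  0  5  2
 0  0  3  .
-1  2  .  4
 .  0  1  1
 0 -3  2  2
end

* (2) Keep complete cases only: note the commas inside missing()
drop if missing(c1, c2)

* (3) and (4): 1 if X > 0 and/or Y > 0, 0 if neither is.
* The if-qualifier leaves the dummy missing whenever X or Y is missing,
* which also stops a missing X from counting as "greater than 0".
generate byte d = (X > 0 | Y > 0) if !missing(X, Y)
list
```

-egen nmiss = rowmiss(c1 c2)- is an alternative when the count of missing controls per observation is also of interest.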

I hope somebody can help me with this, thanks in advance!

```
input str80 name int symbol
"F-000509" 1
"F-000703" 2
"Future Bond Election" 3
"F-000708" 4
"F-000709" 5
"F-000799" 6
"F-000808" 7
"F-000809" 8
"F-000899" 9
end
```

I want to drop the observations with id = 3, 6, 9. The real dataset is very large: the id can range from 1 to 10,000,000.

So, I need Stata code to do this.
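A minimal sketch, assuming the id is the variable symbol and that the ids to drop are the multiples of 3 (the pattern in the example). The id is stored as long here, because an int can hold values only up to 32,740, which is too small for ids up to 10,000,000.

```stata
* The example data, with the id stored as long instead of int
clear
input str80 name long symbol
"F-000509" 1
"F-000703" 2
"Future Bond Election" 3
"F-000708" 4
"F-000709" 5
"F-000799" 6
"F-000808" 7
"F-000809" 8
"F-000899" 9
end

* Assumption: drop every id that is a multiple of 3 (3, 6, 9, ...)
drop if mod(symbol, 3) == 0
list
```

If the ids to drop are instead a short explicit list, -drop if inlist(symbol, 3, 6, 9)- does the same without assuming a pattern.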

Can someone help me in Stata?

Thank you!

I am trying to do network analysis using -nwcommands-. In my survey data each person named up to 6 friends. I reshaped the data into a long data set that looks similar to this:

```
person_ID friend_ID
1 2
1 3
1 4
2 1
2 4
3 1
3 2
3 4
4 2
4 1
```

I would now like to transform this edgelist into an adjacency matrix to then obtain network indices such as eigenvector centrality and closeness centrality.

However, when I use the command

the resulting adjacency matrix does not correspond to the ties I have in my edgelist. In particular, the matrix shows some ties that do not exist in my edgelist or leaves out ties that actually exist.

Does anyone have an idea what the problem could be or what I may have overlooked?
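The -nwcommands- call itself is not shown, so this is only a cross-check: a sketch that builds the directed adjacency matrix directly from the edgelist in Mata, assuming the IDs are consecutive integers 1 to n, so it can be compared cell by cell with the matrix -nwcommands- produces. Note that the edgelist is not symmetric (3 names 2, but 2 does not name 3), so if the network were treated as undirected, ties such as 2-3 would appear in both directions.

```stata
* The edgelist from the post
clear
input byte(person_ID friend_ID)
1 2
1 3
1 4
2 1
2 4
3 1
3 2
3 4
4 2
4 1
end

* Directed adjacency matrix: A[i,j] = 1 if person i named person j.
* Assumes IDs are consecutive integers 1..n, as in this example.
mata:
E = st_data(., ("person_ID", "friend_ID"))
n = max(E)
A = J(n, n, 0)
for (k = 1; k <= rows(E); k++) A[E[k,1], E[k,2]] = 1
st_matrix("A", A)
end
matrix list A
```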

Any help is very much appreciated, thank you!


```
input str80 name str90 symbol
"F-000509" "Environmental Projects4502 - Parks Consolidated Construction Fund"
" " "Future Bond Election "
"F-000703" "Swimming Pool Upgrades"
" " "4502 - Parks Consolidated Construction Fund"
" " "Future Bond Election"
"F-000708" "Parking garage Upgrades"
end
```

I want to copy each value of the first column, "name", down into the blank rows that follow it, up to the next name value.

This is just for illustration.

In fact, the real dataset has many name values, and the gaps between them are of unequal length.
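A minimal sketch using the rows from the post. The blank-looking cells contain a space, and -missing()- counts a string as missing only when it is exactly "", so the condition tests the trimmed value instead. -replace- works through the data in the current sort order, so a value copied into one row is carried on into the next blank row too; the data therefore need to be in their original order when this runs, however long each gap is.

```stata
* The example data from the post
clear
input str80 name str90 symbol
"F-000509" "Environmental Projects4502 - Parks Consolidated Construction Fund"
" " "Future Bond Election "
"F-000703" "Swimming Pool Upgrades"
" " "4502 - Parks Consolidated Construction Fund"
" " "Future Bond Election"
"F-000708" "Parking garage Upgrades"
end

* Fill each blank name with the value from the row above (top-down)
replace name = name[_n-1] if strtrim(name) == "" & _n > 1
list name
```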

Thanks for your help with the Stata code.


```
eststo reg1: regress var treated control1 control2 control3, vce(cluster subgroup)
matrix beta = e(b)
scalar coeff = beta[1,1]
boottest treated, boottype(wild) cluster(subgroup) robust nograph seed(1234) reps(10000)
scalar pval = r(p)
```

I would like to export the treated coefficient from the regression together with the p-value from -boottest- and the corresponding significance stars. Normally I would just use -esttab-, but -esttab- uses the standard errors from the regression, which I do not want. Is there a way to do this, or do I have to construct a custom table from scratch?
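One possible route, sketched under assumptions: attach the bootstrap p-value and a star string to the stored estimates with -estadd- (part of the community-contributed estout package, like -esttab-), then tell -esttab- to print only the coefficient plus those two added results. The built-in nlsw88 data stand in for the real data (union plays the role of treated, industry of subgroup); both packages must be installed from SSC first.

```stata
* Requires: ssc install boottest   and   ssc install estout
sysuse nlsw88, clear
eststo reg1: regress wage union age grade, vce(cluster industry)
boottest union, nograph seed(1234) reps(9999)
scalar pval = r(p)

* Stars based on the bootstrap p-value, not on the regression SEs
local stars = cond(pval < 0.01, "***", cond(pval < 0.05, "**", cond(pval < 0.10, "*", "")))

* Add both to the stored estimates reg1
estadd scalar boot_p = pval : reg1
estadd local boot_stars "`stars'" : reg1

* Coefficient only (no model-based stars or SEs), plus the bootstrap results
esttab reg1, keep(union) cells(b(fmt(3))) ///
    stats(boot_p boot_stars, labels("Wild bootstrap p-value" "Significance"))
```

Printing the stars in a separate row keeps -esttab- from computing its own; attaching them directly to the coefficient would need more manual formatting.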

I am working with firm-level longitudinal data from 2011 to 2020 containing economic information on pharmacies in Italy. Because the opening of new pharmacies is decided by the local administration, some municipalities started opening new pharmacies in 2016, others in 2017, 2018, 2019, or 2020. For this reason I created a treatment variable that equals 1 if a firm is located in a municipality that has opened at least one new pharmacy, and 0 otherwise.

The problem is that the start of treatment differs across firms: some are treated from 2016 onwards, others from 2017 onwards, and so on.

So I have to decide which difference-in-differences estimator to use, because I have a pre-treatment period (from 2011 to 2016) but no post-treatment period (even in the treated municipalities new pharmacies continue to be opened), and the length of the treatment period differs across municipalities and therefore across the firms in my dataset.

At first I tried xtdidregress, but then I thought it might not be the right choice and started to consider flexpaneldid, but I am not sure yet. Could you help me?
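A minimal sketch of coding a staggered treatment on a hypothetical toy panel (the names firm, municipality, year and new_pharmacy are assumptions): record, for each municipality, the first year it opened a new pharmacy, then mark firm-years from that year onward as treated. Because treatment switches on at different dates and never switches off, estimators designed for staggered adoption, such as -xthdidregress- (official since Stata 18) or the community-contributed -csdid-, may be worth comparing with -xtdidregress-; the treatment indicator and the first-treatment year built here are the kind of inputs such estimators work from.

```stata
* Toy panel (hypothetical): 3 firms, each in its own municipality, 2011-2020
clear
set obs 3
generate int firm = _n
generate int municipality = _n
expand 10
bysort firm: generate int year = 2010 + _n

* Hypothetical openings: municipality 1 in 2016, municipality 2 in 2018,
* municipality 3 never opens a new pharmacy
generate byte new_pharmacy = (municipality == 1 & year == 2016) | ///
                             (municipality == 2 & year == 2018)

* First year with a new pharmacy (missing for never-treated municipalities)
bysort municipality: egen int first_open = min(cond(new_pharmacy, year, .))

* Treated from the first opening onwards, regardless of later openings
generate byte treated = !missing(first_open) & year >= first_open
```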

Thank you in advance for your help.

Francesco

The only indication the help file gives is to use "estimates store (namevar)", but what it stores does not make much sense to me. I also tried to use r(gstats), but it does not work (or maybe I am just doing it wrongly).
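The command in question is not named, so this is only a general sketch: -return list- (for r-class results) and -ereturn list- (for e-class results, which is what -estimates store- saves) show everything a command leaves behind, and any listed item can be copied into a scalar before the next command overwrites it. It is illustrated here with -summarize- on the built-in auto data.

```stata
sysuse auto, clear
summarize price
return list                   // every r() result summarize left behind
scalar mean_price = r(mean)   // keep a copy before r() is overwritten
display mean_price
```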

If somebody knows how to do it, it would be a big help!

Thanks in advance

I need to test whether two coefficients of the same variable in different models are significantly different from each other. I wanted to do this with the test command. You can see my Stata code below:

```
regress v2 v1
est store m1
regress v3 v1
est store m2
test [m1]v1 = [m2]v1
```

After running my code, I get an error message: 'equation m1 not found'

Can someone provide me a solution?
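-test- only sees the equations of the most recent estimation, which is why an equation called m1 is not found. One standard route is to combine the stored results with -suest- first; afterwards, the equations of a -regress- model stored as m1 are named m1_mean (and m1_lnvar), and cross-model tests refer to those names. A minimal sketch on the built-in auto data, with mpg and trunk standing in for v2 and v3 and weight for v1 (a hypothetical mapping):

```stata
sysuse auto, clear
regress mpg weight
estimates store m1
regress trunk weight
estimates store m2

* Stack both models with a joint (robust) covariance matrix
suest m1 m2

* After suest the equations are named m1_mean and m2_mean
test [m1_mean]weight = [m2_mean]weight
```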


For my thesis, I observed interactions between managers (five per employee) and employees during an assessment center. I have created two spreadsheets:

1. one with each of the 140 employees and variables such as "number of managers who believe the employee is competent";
2. another with 140 x 5 observations containing "believes the employee is competent (0/1)".

I want to run a logistic regression. The second sheet contains more information about each interaction and shows each manager's reason for accepting or rejecting the employee. However, I would be looking at the same employee five times. Is that problematic?
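Observing each employee five times is common; the usual concern is that the five ratings of one employee are not independent, which the standard errors or the model should account for. A minimal sketch on simulated data (all variable names hypothetical): -logit- with standard errors clustered on employee, and a random-intercept -melogit- as an alternative.

```stata
* Simulated data: 140 employees, each rated by 5 managers
clear
set seed 2024
set obs 140
generate int employee = _n
generate double u = rnormal()        // hypothetical employee-level effect
expand 5
bysort employee: generate byte manager = _n
generate double x = rnormal()        // hypothetical interaction covariate
generate byte competent = runiform() < invlogit(0.5*x + u)

* Option 1: pooled logit, SEs clustered on employee
logit competent x, vce(cluster employee)

* Option 2: random intercept for each employee
melogit competent x || employee:
```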

Best & thank you!

Laeti

My dataset has >75,000 and I understand this may be part of the problem. Please help, because I can't do the subsequent analysis unless this ID variable is converted to a numeric variable. I have more than 10 datasets to merge on this ID. I can't extract the data using -dataex- because I am working in a secure environment where I can't copy the data except as an image below.

Example IDs are shown below.


Thanks.
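Converting may not be necessary: -merge- accepts string key variables directly, and string IDs keep leading zeros that -destring- would drop. -encode- is also risky here, since value labels have a maximum number of codings (see -help limits-), and -egen group()- could number the same ID differently in each dataset. A minimal sketch with two hypothetical datasets keyed on a string id:

```stata
* Hypothetical dataset 1, keyed on a string id
clear
input str10 id double income
"AB-00017" 1200
"AB-00018"  950
"CD-10003" 1800
end
tempfile first
save `first'

* Hypothetical dataset 2, same string key
clear
input str10 id byte age
"AB-00017" 34
"CD-10003" 51
"EF-20001" 29
end

* Merge directly on the string id; _merge shows which file each row came from
merge 1:1 id using `first'
list
```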

Here is the regression I am running:

```
sureg (y1 = x1 x2 x5 x6 i.userid i.date) (y2 = x1 x2 x3 x4 i.userid i.date), isure
```

I'd appreciate it if someone could help me fix this problem.

Thanks in advance.