Hello Professors and Computer Scientists,
I have tried to ask several professors and my fellow students with no success. I have also browsed the internet, including this forum quite thoroughly. I am posting this question as a last resort.
I am trying to “identify the k closest matches for each sample observation based on their revenue and total assets by calculating the Mahalanobis distance between the focal firm and other firms within the industry-year in terms of revenue and total assets”. I can compute the Mahalanobis distance matrix using “mahascores varlists, idvar(id) varprefix(d1_) treated(varlist) compute_invcov”, and I can run the entire mahapick program with no problem. I can also compute a simple absolute difference with "anymatch price, id(id) metric(price) near(1) dist(price_diff)".
My problem is that I need to find each focal observation's k closest neighbors within the same year and industry. There are hundreds of industries over 30 years, which gives me tens of thousands of mini-samples to match. None of the commands mentioned above accepts the "by" prefix or an "if" qualifier, so I can't use the usual "by year industry:" prefix, and I can't write a forvalues loop either, because that would need "if" to restrict each pass to one group. I could cut the file into tens of thousands of pieces, one per industry-year, but I would really like to avoid that because it would be chaotic to manage.
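To make the target computation concrete, here is a minimal sketch in Python (not Stata) of what I am after. It is only an illustration under my own assumptions: the field names (id, industry, year, revenue, assets) are hypothetical, the covariance matrix is estimated separately within each industry-year, and groups with fewer than three firms or a singular covariance matrix simply get no matches.

```python
from collections import defaultdict
from statistics import mean


def k_nearest_within_groups(rows, k):
    """Return {id: [ids of the k nearest firms in the same industry-year]}.

    rows is a list of dicts with the hypothetical keys
    id, industry, year, revenue, assets.
    """
    # Split the data into industry-year cells, the job "by" would normally do.
    groups = defaultdict(list)
    for r in rows:
        groups[(r["industry"], r["year"])].append(r)

    matches = {}
    for members in groups.values():
        for r in members:
            matches[r["id"]] = []
        n = len(members)
        if n < 3:
            continue  # assumption: too few firms to estimate a covariance

        xs = [r["revenue"] for r in members]
        ys = [r["assets"] for r in members]
        mx, my = mean(xs), mean(ys)
        # Sample variances and covariance within this cell.
        sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
        syy = sum((y - my) ** 2 for y in ys) / (n - 1)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
        det = sxx * syy - sxy ** 2
        if det <= 0:
            continue  # assumption: skip cells with a singular covariance

        # Entries of the inverse of the 2x2 covariance matrix.
        a, b, d = syy / det, -sxy / det, sxx / det

        for r in members:
            dists = []
            for o in members:
                if o is r:
                    continue
                dx = r["revenue"] - o["revenue"]
                dy = r["assets"] - o["assets"]
                # Squared Mahalanobis distance; ranking is the same as for
                # the distance itself.
                d2 = a * dx * dx + 2 * b * dx * dy + d * dy * dy
                dists.append((d2, o["id"]))
            dists.sort()
            matches[r["id"]] = [oid for _, oid in dists[:k]]
    return matches
```

Calling k_nearest_within_groups(rows, 5) would give, for every firm, up to five matches drawn only from its own industry-year, which is the behavior I cannot get from the commands above.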
Does anyone know of a nearest-neighbor matching command that can be combined with "by" or "if"? I don't understand why the authors of these commands did not allow them; it makes this kind of task far more difficult.
I look forward to your advice.
Sincerely,
Tracy