Selecting variables in a dataset from observations within another dataset

Roberto Zentgraf

Join Date: Aug 2020

Posts: 4
#1

Selecting variables in a dataset from observations within another dataset

17 Aug 2020, 18:27

Hello,

Hope there is someone who could help me to solve this...

I have a dataset with monthly returns for 130 stocks (the stock code is the name of column/variable). Thinking as a spreadsheet, first column contains dates, and the remaining columns contain the monthly returns. Let us name this dataset as StockReturns.dta

In another dataset, I have all the 130 stocks and some other info, such as Marketvalue, P/B ratio, number of monthly trades and so on. In this one, columns are Code, Mktvalue, etc), each observation is composed by stock Code (the same name used in the first dataset), MKT Value, etc. Let us name this dataset as StockCharacteristics.dta

I would like to select the stocks from StockCharacteristics.dta (for example, get the 10 with highest MKT Value) save their codes in a "list" and then use this list in the StockReturns.dta (could, for example, use "keek command" for the variables in the list...

Well, I know how to select the desired stocks in the second dataset; what I do not know is how to save the list with the name of stocks and the, how to get this list in the first dataset... I

I really don´t know if this is a complete dummy question, since it is very simple to do in an EXCEL environment.

Thanks, best regards,
RZ
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

17 Aug 2020, 18:31

You want the -merge- command. The general scheme would be:

Code:

use StockCharcteristics, clear // HERE INSERT CODE TO IDENTIFY THE STOCKS YOU ARE // INTERESTED IN; // KEEP ONLY THE STOCK CODE VARIABLE // AND BE SURE TO ELIMINATE ANY DUPLICATES keep stock_code duplicates drop merge 1:m stock_code using StockReturns, keep(match) nogenerate

In the future, when asking for help with code, please use the -dataex- command to post example data (from all relevant data sets). It is uncommon that a verbal description provides enough information about the data to suggest correct code.

If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.

Last edited by Clyde Schechter; 17 Aug 2020, 18:37.
1 like
Comment
Roberto Zentgraf

Join Date: Aug 2020

Posts: 4
#3

17 Aug 2020, 19:04

Thank you Clyde! It was my first post, still learning the best way to do it. Anyway, next time I will use the -dataex-!
Best regards
Comment

Roberto Zentgraf

Join Date: Aug 2020
Posts: 4

17 Aug 2020, 21:25

Hi, following Clyde suggestions above, I generated the two datasets I´m working on...
The first is what I called StockCharacteristics, already filtered;
The second is what I called StockReturns.

Trying to merge, it didn´t work...
Any other suggestion?
Tks

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 Codigo
"ABEV3"
"B3SA3"
"BBAS3"
"WEGE3"
"WHRL3"
"YDUQ3"
end

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double(ALSO3 BAZA3 ABEV3 BTOW3)
14.839294751 16.208268295 13.906832154  21.66160962
13.984911114 15.559937563 15.204156093 22.009122074
15.981470772 16.856599027 15.816461869 19.354513052
17.078679232 17.504929758 15.989792958 19.257981815
18.710901138 18.962249799 16.414454127 26.420599613
18.159228377 19.664555347 15.946460186 24.615465478
17.331719236 18.962249799 16.642935162 19.682719258
15.906564603 18.962249799 16.956623992  17.37562269
14.803219082 17.557638703 16.634221583 15.058872998
 13.33209172 16.855333155 17.052204696 14.344541843
15.345697297 17.557638703 16.762440146 14.074254379
14.941137272 17.454805317  16.36730667 14.248010605
 14.87677545 18.182088872 15.805339058  14.70170742
end

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#5

18 Aug 2020, 10:22

Well, of course, it didn't work. The second data set doesn't contain Codigo. There's no way to link these up. To combine two data sets together, they must have a common variable or group of variables so that Stata can tell which observations in each go with which observations in the other. The only exception to this is if you have two data sets with no common variables, but they are sorted so that observation 1 in dataset 1 goes with observation 1 in dataset 2, observation 2 in dataset 1 goes with observation 2 in dataset 2, etc.--in that case there is -merge 1:1 _n-. But you need to be completely certain that the datasets are both sorted so that the matchup by order in the data set is exactly the right pairing.
Comment
Roberto Zentgraf

Join Date: Aug 2020

Posts: 4
#6

18 Aug 2020, 12:39

Tks again! Well, and so the only way is to manually select what I want??
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#7

18 Aug 2020, 12:48

No, and you should definitely not do anything manually for serious work. Even for playing around, it's not a good idea, but for actual work it's really dreadful.

Let me ask you this: if you were going to do this manually, how would you know which observation in the StockCharacteristics data set goes with which observation(s) in the StockReturns data set? Whatever information you would use to do this manually has to be provided to Stata in the -merge- command. Presumably there are some variables in those data sets, variables that you didn't show in your example, that establish that correspondence. What are those variables? Show examples of both dataset that include these variables. (If there are no such variables, then, seriously, how would you know which observations belong with which? Explain in detail.)

Last edited by Clyde Schechter; 18 Aug 2020, 13:14.
Comment

Announcement

Selecting variables in a dataset from observations within another dataset

Comment

Comment

Comment

Comment

Comment

Comment