reshape long problems with some variables with few observations

Salvatore Greco

Join Date: Jun 2022

Posts: 41
#1

reshape long problems with some variables with few observations

21 Jul 2022, 04:26

Hi,
I'm Salvatore. I am doing a research paper using Panel data to study the dynamics of bank NPLs. In my model, I use microeconomic (bank-specific) variables with respect to which I actually managed to use the Stata reshape long command to have a dataset in the long format (roughly 8000 observations). The problem, however, concerns the introduction within my model of macroeconomic variables (such as rates of change in government debt relative to GDP) with respect to which, there is a much smaller number of observations (10 as I use yearly delta data from 2021-2011). Beyond the small number of observations with respect to which I will then make my own conclusions once I have made the estimates, I would like to ask how I can introduce these data into my dataset so that once I run the reshape long command I have no problem getting the result I want. I am asking this because I have made some attempts in the past but obtained a different result from the desired one.
I tried to paste here the last rows for my dataset format-wide, but since I am using 6 variables and a 10-year time period I cannot paste without having problems with the format. However, you can see the data for the Public debt I would like to introduce in the model (delta actually).
These are the data

Code:

-2.897617514

15.80909769

-0.223214286

0.149031297

-0.445103858

-0.36954915

-0.073855244

2.188679245

4.743083004

5.680868839

0.419463087

2.229845626

But I don't know how to do it. I tried to paste those values on an xls file but the result was wrong. Could you please help me as I don't know if it is better to paste them on xls file in a wide format or first launch on Stata the command reshape long and then paste them.
P.s. I can arrange the data here listed both in wide and long format on excel, therefore, this seems to me that this is not the main issue.
Thanks in advance. Hope to be clear. As you can understand Public debt is a variable in my model and not a cross-sectional unit but at the same time it is a variable which is not specific to any banks in my dataset which are my cross-sectional units
To clarify I have 117 cross-sectional units (banks) with 6 variables each measured from 2021-2010. This is to make sure that you understand the dataset I am working with. P.s. I managed to reshape this dataset from wide to long but I don't know how to add the macro variables as the Public debt that I mentioned before (actually there are others but I guess that once understood what to do with just one it would be pretty straightforward for the others later on).
Tags: None
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

21 Jul 2022, 07:57

You should not be attempting to add data using copy-and-paste. You should use the merge command to combine your dataset in long format with your dataset containing macroeconomic variables to be added.
1 like
Comment
Salvatore Greco

Join Date: Jun 2022

Posts: 41
#3

21 Jul 2022, 09:52

Thanks a lot for your help. I looked at some videos on youtube to get familiar with the command, however, after trying to use it with the drop menu above on Stata I have some problems. Before asking for your support let me explain the steps I made.
First I tried to use the command but the file of the dataset (containing the macro variables as GDP, Public Debt, and others) I was trying to merge my bigger dataset (that of 117 cross-sectional units representing the sample of Italian banks) with was not in Stata format.1) Therefore, I paste the macro variable dataset alone on Stata and then saved it using the "save as" option on the menu creating a file in a "dta" format.
2) Then I tried again to use the command. To clarify, here you can see what Stata says, basically, it does not recognize my variables. I can tell you that they all are numeric variables. I really apologize, but I don't know what is the best way to show a screenshot here (please can you tell me? However you can find it as an attachment). To be clear I can tell you that as variables I consider all these listed here but banche and year (banche is the cross sectional unit).
The format for the data is the long format typically used for panel data.

Code:

banche year npl netinteres~n avgequitya~s costtoincome roaa llp assets

This is what Stata says.Therefore, my variables are:
npl

netinteres~n

avgequitya~s

costtoincome

roaa

llp

assets

When I try to launch the merge command Stata says

Code:

merge 1:1 npl netinterestmargin avgequityavgassets costtoincome roaa llp assets using "C:\Users\Salvatore\Desktop\tesi\prova per merge > .dta" variable npl not found r(111);

If I tell Stata not to consider the npl variable (which is the first variable listed in the merge menu) then Stata says

Code:

merge 1:1 netinterestmargin avgequityavgassets costtoincome roaa llp assets using "C:\Users\Salvatore\Desktop\tesi\prova per merge.dta > " variable netinterestmargin not found r(111);

In this case, the first variable listed in the menu was netinterstmargin.
P.s In the merge menu, I use the 1:1 key variables merge options but the macro variables I would like to add are GDP, Public Debt, Stock Exchange performance and others which are not at all related to the banks which are the cross sectional units. Therefore I don't know if the "1:1" option is the right one to be selected.

I truly and genuinely thank you a lot for your help, wish you a great day
Regards,
Attached Files
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

21 Jul 2022, 13:44

As someone whose only language is English, when I order something I do not understand from a menu in French, I sometimes find the results ... unexpected (but usually pleasingly so).

I think you have the same problem using merge from the Stata menu. You admit you are unfamiliar with the command, and I suspect you are unfamiliar with the concept of merging two datasets. And as a consequence, your results are not what you hope for, but in your case not pleasingly so.

When you encounter an unfamiliar command, the first step on the path to knowledge is not Google or YouTube. It is to read Stata's documentation, which is included in your Stata installation as Help files and PDFs.

Either type

Code:

help merge

or from Stata's Data > Combine datasets > Merge two datasets dialog box, click the question mark in the lower left corner.

That will open Stata's Viewer window with the Help file for the merge command. At the top of the file is a clickable link "(View complete PDF manual entry)". Click that and the Stata Data Management Reference Manual PDF will open to the section with the complete documentation for the merge command. Scroll down to the section for the Remarks and examples and read that to achieve an understanding of what merging is all about.

You will see, among other things, that to merge two datasets, you must specify a group of variables that appear in both datasets that the merge command can use to tell how to match the observations. That is what is meant by "Key variables: (match variables)" in the dialog box you show us, where you have filled in the names of variables in your macro dataset of macro variables.

Based on your description, your "key variables" in your macro dataset are banche and year. Presumably you also have variables of that name in your panel dataset; if not, you will have to rename the "key variables" in either your macro dataset or your panel dataset so the names are the same in both before you can do the merge.

But you need to familiarize yourself with the documentation to understand whether you need a 1:1 merge or an m:1 merge.

With that said, let me comment on another aspect of your post.

Copying from Excel and pasting into Stata's Data Manager window is not a reliable way to move data in an Excel worksheet into Stata. The command you want is

Code:

import excel

which is found in Stata's menus at File > Import > Excel spreadsheet (*.xls;*.xlsx)

Again, you should familiarize yourself with the documentation before heading to the menus to use it.

And with that said, a piece of advice that I routinely offer to new users of Stata.

I'm sympathetic to you as a new user of Stata - there is quite a lot to absorb. And even worse if perhaps you are under pressure to produce some output quickly. Nevertheless, I'd like to encourage you to take a step back from your immediate tasks.

When I began using Stata in a serious way, I started, as have others here, by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. All of these manuals are included as PDFs in the Stata installation and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu.

The objective in doing the reading was not so much to master Stata - I'm still far from that goal - as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and PDF manuals.

Stata supplies exceptionally good documentation that amply repays the time spent studying it - there's just a lot of it. The path I followed surfaces the things you need to know to get started in a hurry and to work effectively.
1 like
Comment
Salvatore Greco

Join Date: Jun 2022

Posts: 41
#5

21 Jul 2022, 14:37

Hello,
I thank you for your kind words and encouragement.
I am currently working on my thesis so I am forced to use a software that I have never used without any kind of support other than that offered by this forum. At the same time I am aware that as much as I would be interested in learning more about Stata for a professional career I will not be using the program. So to study it in depth would require me time that at the moment given the timeline required for thesis writing I do not have. For these reasons I am looking on support through the forum because I have no one who can really help me.

Having said that, I thank you. I was able to learn more through your advice about the functional characteristics of the merge command. It says that "merge joins corresponding observations from the dataset currently in memory (called the master dataset) with those from filename.dta (called the using dataset), matching on one or more key variables."
Also based on what you told me it would seem that at least one common variable must exist in order for the merge command to be used to merge two datasets using this common variable as the "key variable." This is not the case for me. The variables contained in my second dataset are entirely different variables than those belonging to the main dataset which contains observations for the banks. In the second dataset, there is no single observation for banks, but these are data for GDP and other macro variables.

In this case what kind of command would you recommend that I use on the premise that I would like to have a single dataset to perform an analysis?

I genuinely thank you.
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#6

21 Jul 2022, 15:45

Please help us understand what variables are in your two datasets.

Code:

clear use (whatever the name is of your bank-specific dataset) codebook clear use (whatever the name is of your macroeconomic dataset) codebook

Then copy and paste the commands and results from the Stata Results window into your next post here.
Comment
Salvatore Greco

Join Date: Jun 2022

Posts: 41
#7

25 Jul 2022, 05:42

Sorry for the delay; I studied a bit to be sure to be as precise as possible in my request. My goal is to merge the two following datasets:
The first one (let's name it MICRO) consists of 117 cross-sectional units (Italian banks) for which I have 7 different variables (no missing observations) with data from 2021-2011. Here you can see a list of the variables in the dataset

Code:

NetInterestMargin

Code:

NPL

Code:

AvgEquityAvgAssets

Code:

CosttoIncome

Code:

ROAA

Code:

LLP

Code:

Assets

P.S. all are numeric variables and the dataset is in a long format.
The second one (let's name it MACRO) consists of a combination of 10-time series (2021-2010) (5 principal time series and their lag_1 and one lag_2). Therefore the name of variables (as specified on Stata) are:

Code:

Delta_Public_DebtGDP

Code:

DeltaNetLendingNetBorrowing

Code:

deltabankloans

Code:

deltaFTSEMIB

Code:

RealGDPGrowth

Code:

DeltaNetLendingNetBorrowing_L1

Code:

Delta_Public_DebtGDP_L1

Code:

deltabankloans_L1

Code:

RealGDPGrowth_L1

Code:

RealGDPGrowth_L2

P.S all are numeric variables and the dataset is in a long format.

As you can notice the variables of the two different datasets are different, and I do not believe the merge command might work. Having in mind that I would like to have as a final result a combined dataset of the two different datasets (MICRO one and MACRO one), what do you think could be the right approach to do so?

I hope that stating the name of the variables may be helpful for creating the right command.
Thanks a lot for the support Mr. Lisowsky you are giving to me. I truly appreciate it.
With really kind regards,
Salvatore
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#8

25 Jul 2022, 08:51

What you have provided in post #7 is not what was requested in post #6.

I am currently working on my thesis so I am forced to use a software that I have never used without any kind of support other than that offered by this forum. At the same time I am aware that as much as I would be interested in learning more about Stata for a professional career I will not be using the program. So to study it in depth would require me time that at the moment given the timeline required for thesis writing I do not have. For these reasons I am looking on support through the forum because I have no one who can really help me.

I am pessimistic about your prospects of accomplishing this task.

In my experience, the thesis is expected to be the culmination of what was already learned in the curriculum. To continue my analogy from post #4, what you are doing is as if you were told your thesis needed to be submitted in Estonian, and you have not previously learned Estonian. If your curriculum has used some program other than Stata for "hands-on" econometric instruction, that is the program you should be using for your thesis work. If you have not previously had experience doing hands-on econometric analysis, then it is likely that Stata will prove to be the least of your problems.

Statalist is a useful forum for questions and answers, but it cannot replace the mentorship of a thesis advisor, nor can it supplant the effort required to actively learn the basics of Stata and data analysis.
1 like
Comment

Salvatore Greco

Join Date: Jun 2022
Posts: 41

26 Jul 2022, 09:52

Hello Mr. Lisowski,
Hope you are fine. I tried to learn a bit of Estonian as you kindly suggested to me when I asked you to give some tips about merging the two datasets (MICRO & MACRO) we were talking about. I would like to show you what I came up with. I would really appreciate it if you would be willing to tell me if the dataset I am going to show you would work as the result of the merger process.

Code:

input int(id year) long NPL double(NetInterestMargin AvgEquityAvgAssets CosttoIncome ROAA) long LLP double(Assets Delta_Public_DebtGDP DeltaNetLendingNetBorrowing deltabankloans deltaFTSEMIB RealGDPGrowth)
 1 2011   19314 3.0665003  8.894316 62.530881  .64168311  1512 .00011159  .41946309 -13.732023  2.4215157 -3.4026191   .85
 1 2012   52350 3.4012127   7.81904 60.919118   .0662024  6397 .00017049  5.6808688  -97.73907 -.76626672 -31.591894 -3.01
 1 2013   64777 2.2446172 7.3237025 66.958581  .34299635  3957 .00018873   4.743083 -1757.4577 -2.8740579  10.930995 -1.86
 1 2014   61749 1.9582811 7.5171206 65.863558  .33335957  4509 .00019803  2.1886792  96.370667  1.1005573  44.138574   .07
 1 2015   54397 1.9604366 7.7805375 77.657658  .31596684  1876 .00018999 -.07385524  -12.09783  .84090043  3.2348891   .66
 1 2016   73094 2.1448878 8.0798279 92.572374 -.32779643  3481 .00024428 -.36954915  39.562322   .3816689 -23.208574   1.4
 1 2017   73897 2.3868627  8.231432 74.750368  .36938713  2770 .00025721 -.44510386  10.284363    1.23481  20.886378  1.74
 1 2018   70294 2.5765006  6.133131 73.268095  .12783611  9033 .00028366   .1490313 -2.4917407 -4.1319113  2.8530874   .81
 1 2019   72401 2.2111859 4.6421887 71.355508  .52595686  3696 .00025955 -.22321429   26.13205 -1.6286674 -1.3125943    .5
 1 2020   51485 2.0623895 4.4296105  79.59674 -.63414315 11306 .00026035  15.809098   9.511647   4.431668 -8.2740222 -9.09
 1 2021   51485 2.0623895 4.4296105  79.59674 -.63414315 11306 .00025123 -2.8976175 -30.753208  5.0594383  29.455174  6.62
 2 2011  620110 3.2349247 17.237876 53.954779  .84728333 30845 .00161717  .41946309 -13.732023  2.4215157 -3.4026191   .85
 2 2012  784941 2.7195258 17.228454 56.402536  .68975915 33924 .00161806  5.6808688  -97.73907 -.76626672 -31.591894 -3.01
 2 2013  982703 2.3040631 16.615386 57.204414 -.22881157 88227 .00178251   4.743083 -1757.4577 -2.8740579  10.930995 -1.86
 2 2014 1110154 2.1936202 16.109134 59.292962  .17999861 59386 .00175794  2.1886792  96.370667  1.1005573  44.138574   .07
 2 2015 1184900 2.1452373 16.020317 60.044954  .23465928 56866 .00174936 -.07385524  -12.09783  .84090043  3.2348891   .66
 2 2016 1178308 2.0612768 15.911901 65.398388 -.00233048 48769 .00162654 -.36954915  39.562322   .3816689 -23.208574   1.4
 2 2017 1129950 2.5462314 15.894939 59.670936  .25559979 48726 .00161476 -.44510386  10.284363    1.23481  20.886378  1.74
 2 2018  823055  2.590812  15.12528 70.538284   .2117942 75007 .00146113   .1490313 -2.4917407 -4.1319113  2.8530874   .81
 2 2019  394612 2.4109246   13.6811 70.754662  .23051058 31291 .00145384 -.22321429   26.13205 -1.6286674 -1.3125943    .5
 2 2020  276581 2.1381271 12.213402 76.745459   .0629762 29135 .00143596  15.809098   9.511647   4.431668 -8.2740222 -9.09
 2 2021  220345 1.9617634 11.075969  63.64592  .24074262 45235  .0014706 -2.8976175 -30.753208  5.0594383  29.455174  6.62
 3 2011   57375 1.8373895 3.9954013 67.904107  .54930344  5615 .00050806  .41946309 -13.732023  2.4215157 -3.4026191   .8

As you can easily notice I have my macro dataset (macro variables are

Code:

 
 Delta_Public_DebtGDP DeltaNetLendingNetBorrowing deltabankloans deltaFTSEMIB RealGDPGrowth

) in a long format. For every 10 cross-sectional units (listed with the id number) the macro variables are repeated according to the reference year (i.g. when you can read the first "2" in the first column you will read the same macro data you will find for the fist "1" in the first column.

Can you please tell me if this is the right approach and therefore correct format for the data to carry out future analysis?

Hoping my Estonian has improved at least a bit

Thanks for your help.
Sincerely,
SG

Comment

William Lisowski

Join Date: Dec 2014

Posts: 10150
#10

26 Jul 2022, 10:10

It appears your data is organized as I would expect it to be to carry out your analyses. In a given year, we can see that both IDs have the same values of the MACRO data.
Comment

-2.897617514
15.80909769
-0.223214286
0.149031297
-0.445103858
-0.36954915
-0.073855244
2.188679245
4.743083004
5.680868839
0.419463087
2.229845626

Announcement