  • On the popularity of xtabond2 - how can codes create an impact?

    Roodman (2009) is an ideal example of how a complex topic in econometrics can be simplified and communicated to a wider audience beyond the realm of econometrics. Researchers from other fields may have only limited to intermediate knowledge of econometrics and need no more than that; it is simply not their thing! For them, econometric tools and applications are simply means of conducting research.

    His paper is well structured and easy to read and follow, which explains the popularity of the xtabond2 command among researchers from related disciplines (e.g., accounting, finance, management, and marketing). For me, apart from the matrix notation common in intermediate econometrics textbooks, some of the technical content of Roodman (2009) is simply econometric jargon; however, the author compensates for this with a jargon-free narrative and a pedagogical emphasis. This combination of technical and pedagogical elements has given the paper and its command a much wider reach.

    I have encountered many user-written commands recently, some as old as xtabond2 or older, whose uptake remains limited largely because of poor communication with audiences beyond econometrics. I hope this message attracts the attention of those involved in command development in the Stata community. I believe the issue deserves greater attention, as the practical consequences could be significant!
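
    For readers outside econometrics who have not seen it, the command reduces difference and system GMM estimation to a single line. A minimal sketch using the Arellano-Bond employment data that ships with Stata (the specification is purely illustrative, not a recommendation; Roodman (2009) explains how to choose instruments and options):

        * xtabond2 is user-written; install it once with -ssc install xtabond2-
        webuse abdata, clear
        * System GMM: lagged employment instrumented GMM-style by its deeper lags;
        * wages and capital treated as exogenous (IV-style instruments)
        xtabond2 n L.n w k, gmm(L.n) iv(w k) twostep robust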

    Reference
    Roodman, D. (2009). How to do xtabond2: An introduction to difference and system GMM in Stata. Stata Journal, 9(1), 86–136. https://doi.org/10.1177/1536867x0900900106

  • #2
    Are you saying that some of the code people write is indecipherable? As in the actual ado-code or do-files? What exact code do we mean here?



    • #3
      Dear Jared,
      It seems that I misused the term 'code'; I meant 'command'.

      By focusing on the paper from the perspective of researchers outside econometrics, I want to emphasise the importance of communicating both the econometric theory underlying a command and the technical aspects of its application from the end user's perspective. In practical applications, both aspects are equally important. So my argument has nothing to do with the programming and coding process; that is of no interest to me or to many researchers, for whom commands 'are simply means of conducting research'.



      • #4
        I agree with you. Lots of papers I read are mathematically rigorous, but readers can easily get lost in complicated mathematical jargon that is familiar to engineers and computer scientists but alien to others.


        I think this paper is a good example. https://jmlr.org/papers/v19/17-777.html

        I understand 85% of the front matter and the algorithm, but they lose me at the proofs.



        • #5
          To be honest, I don't expect purely technical work to be simplified. However, when a work is intended to serve a practical purpose and to reach non-experts, science communication becomes important. This is particularly true for articles in the Stata Journal introducing new commands, and for any publication that addresses Stata commands from an end-user perspective.



          • #6
            https://papers.ssrn.com/sol3/papers....act_id=4196189

            In my paper I try to give a mix of theory and application. I detail the assumptions, what they mean and, more importantly, why they matter. I do use a lot of math and equations for the theory, but I always explain WHY it's important, because the people reading the paper won't just be econometricians. It'll be normal people who have no time for the theoretical details.

            It is unfortunate when people gloss over these details as though they're not important. That is why I give three applied examples from different literatures, so people can get a concrete sense of why scul is useful.



            • #7
              The paper is neatly presented. Well done.

              'Because to me, the people reading the paper won't just be econometricians. It'll be normal people who have no time for the theoretical details.'

              Exactly.



              • #8
                I am a great admirer of Roodman, and I have used his contributions a lot in my own research (-cmp- and -boottest-). I do not use the Arellano-Bond estimator much, but I can confirm that Roodman writes and explains very well. His companion paper for -cmp- is a beautiful articulation of the structure of triangular systems that you cannot find in textbooks, and his companion paper (with coauthors) on -boottest- gathers results on the wild bootstrap that are otherwise difficult to collect and understand.

                But here is a contrarian view, which I call the R problem. Every time a great statistician or econometrician creates a package or command that does something complicated easily, two things happen. First, a method that was initially inaccessible to a lot of people who know what they are doing becomes accessible, because you can run it with one line of code, and this is great. Second, a method that is hard to understand also becomes accessible to a lot of people who do not know what they are doing, again because you can run it with one line of code.

                And here is the problem: when we have lots of packages that do complicated things easily (all you need is to plug in some data, and there you go, you get results), we get a lot of results produced by people who really do not know what is going on. Deep results with deep meaning get washed away, and we get a lot of output from people who should not be producing it.

                Everybody with Stata at hand, and some dataset, can run -reg y x, robust-. Understanding what that does, and what it means, is a whole different matter. The same goes for the Arellano-Bond estimator, for triangular systems, for the wild bootstrap, and so on.
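
                To make the contrast concrete (the dataset and the tested coefficient are purely illustrative; -boottest- is user-written, from SSC):

                    sysuse auto, clear
                    regress price weight foreign, vce(robust)   // robust OLS in one line
                    boottest weight   // wild-bootstrap test of weight = 0 in one more line

                Typing these lines takes seconds; knowing whether the wild bootstrap is appropriate here, and what the resulting p-value means, still takes the reading.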

                I guess my point is that we should not fall asleep at the wheel just because something is easily done now. We still need to read the literature and understand the method.





                • #9
                  I agree; it is a double-edged sword. Roodman (2009) does, however, highlight the possible misuse of the command and show how it can be mitigated by disclosing the relevant tests and statistics. The rest is the responsibility of journal editors and reviewers, at least where journal publications are concerned.
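
                  For example, the diagnostics Roodman stresses are easy to produce and report. A minimal sketch with the official -xtabond- command and its documented postestimation tests (the specification is illustrative only; -xtabond2- prints the corresponding AR and Hansen/Sargan statistics in its default output):

                      webuse abdata, clear
                      xtabond n w k, lags(1)
                      estat abond    // Arellano-Bond test for AR(1)/AR(2) in the first-differenced errors
                      estat sargan   // Sargan test of overidentifying restrictions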



                  • #10
                    The negative side of the easy availability of statistical software and its misuse is a frequent topic over at Andrew Gelman's blog. A particularly relevant thread about this began yesterday. If interested, see https://statmodeling.stat.columbia.e...onable-papers/.



                    • #11
                      This reminds me of:

                      Brodeur, A., Cook, N., & Heyes, A. (2020). Methods matter: p-hacking and publication bias in causal analysis in economics. American Economic Review, 110(11), 3634–3660.

