Stata vs Python for data management

Christopher Bratt

Join Date: May 2019

Posts: 144
#1

06 Dec 2020, 11:34

Stata has the best data management for a single data set (a single data frame) that I have come across. Is anyone willing to give a brief description of the data management capabilities in Python (probably meaning pandas and NumPy)? For instance, Stata has flexible and convenient variations of the -egen- command for grouped data. Another example is how Stata handles if/else: it adapts to the object we use it on. With a simple scalar it is an ordinary condition; with a vector it is automatically vectorised and applied to each observation separately.

Do we have similar data management capabilities within Python when we send a small program from Stata to Python?

Last edited by Christopher Bratt; 06 Dec 2020, 11:47.
Christopher Bratt

Join Date: May 2019

Posts: 144
#2

06 Dec 2020, 17:39

It seems the answer is no. I've now found a reliable source saying that Python's packages are not particularly good for data management. It seems to be better to complete most if not all data management within Stata before calling functions in Python.
Bjarte Aagnes

Join Date: Apr 2014

Posts: 783
#3

07 Dec 2020, 00:50

https://pandas.pydata.org/pandas-doc...son/index.html
Wouter Wakker

Join Date: Nov 2018

Posts: 621
#4

07 Dec 2020, 00:50

What is that source you're referring to?

I've used Python/pandas for data management and found it capable of doing everything that Stata can do and more. It was also much faster at reading in data and reshaping it.
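
To give a rough idea of the reshaping mentioned above, here is a minimal sketch of something like Stata's -reshape long- and -reshape wide- in pandas. The data set and the names (id, inc2019, inc2020) are made up for illustration; pd.wide_to_long and DataFrame.pivot are one way to do this, not the only one.

```python
import pandas as pd

# Hypothetical wide data: one row per id, one income column per year.
wide = pd.DataFrame({
    "id": [1, 2],
    "inc2019": [100, 200],
    "inc2020": [110, 210],
})

# Roughly: reshape long inc, i(id) j(year)
long = pd.wide_to_long(wide, stubnames="inc", i="id", j="year").reset_index()
long = long.sort_values(["id", "year"]).reset_index(drop=True)

# Roughly: reshape wide inc, i(id) j(year)
back = long.pivot(index="id", columns="year", values="inc")
back.columns = [f"inc{year}" for year in back.columns]
back = back.reset_index()

print(long)
print(back)
```

The long frame has one row per id and year, and the pivot step brings it back to the original wide layout.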

Not to say anything bad about Stata. It is often much simpler to code something in Stata which is why it is usually my first choice, but sometimes I turn to Python to take advantage of its flexibility in object assignment.

I guess it all depends on what you need to do exactly. Some tools can handle some problems better than others.
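
On the two examples from the first post (-egen- for grouped data and a vectorised if/else), a minimal sketch in pandas/NumPy could look like the following. The variable names (school, score) and the data are invented for illustration, and the Stata lines in the comments are only approximate counterparts.

```python
import numpy as np
import pandas as pd

# Hypothetical clustered data: pupils nested in schools.
df = pd.DataFrame({
    "school": ["a", "a", "b", "b", "b"],
    "score": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Roughly: bysort school: egen mean_score = mean(score)
df["mean_score"] = df.groupby("school")["score"].transform("mean")

# Roughly: bysort school: egen n = count(score)
df["n"] = df.groupby("school")["score"].transform("count")

# Roughly: gen above = cond(score > mean_score, 1, 0)
# np.where evaluates the condition for every row at once.
df["above"] = np.where(df["score"] > df["mean_score"], 1, 0)

print(df)
```

groupby(...).transform(...) returns one value per original row, which is what makes it behave like -egen- with by:. Unlike Stata, a plain Python if/else does not vectorise on its own, so the condition has to go through something like np.where.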
Christopher Bratt

Join Date: May 2019

Posts: 144
#5

07 Dec 2020, 04:10

Wouter Wakker: Thanks for the answer. I've never used Python myself. I searched the internet for information and ended up at Quora, where a person referred to his own books on Python and R and maintained that R was better for data management than Python. Since I find Stata better than R for data management of a single data set (a single data frame), I assumed that Stata would probably be better for this task than Python. And then, as you note, Stata's language for data management is intuitive and easy to use and, not least, easy to understand for a reader! (Merging, reshaping, or subsetting may be different; I usually don't do that in Stata.)

Many thanks, Bjarte! That's an informative page; I didn't find it while browsing.

(This subject is actually interesting beyond the question of what to do in Stata and what to do in Python. As some of us try to make research reproducible, showing all code in a supplement, I'm uncertain how many languages/software packages we should use. It might be wise to focus most of the code on a single language. Personally, I may be using too many within a single project.)

PS. Ease of use is an important aspect of which language is "better" for a task. Nick Cox's contribution to -egen- has made life very easy for someone who works with clustered data. And Stata makes if/else easier to use than other languages I've seen.

Last edited by Christopher Bratt; 07 Dec 2020, 04:18.
Christopher Bratt

Join Date: May 2019

Posts: 144
#6

07 Dec 2020, 08:24

Rounding off this discussion of Python/pandas: here is an excellent introduction for anyone wanting to extend their Stata-based work with coding in pandas:

https://www.youtube.com/watch?v=5rNu16O3YNE&t=3676s