
  • Stata: Statistics vs "Data Science"

    I have recently joined a company heavy in "Data Science". I thought, great, I love statistics myself, so this is a match made in heaven. Except that our "Data Scientists" don't seem to have much statistical background: they use xgboost, tensorflow, pytorch and various other frameworks, but they have no idea what it means to reject the null hypothesis, or what an r^2, RMSE, AIC, BIC, etc. is.

    Is this common in the industry? Have you observed it in the workplace too, or did I just get lucky?

  • #2
    Jerome:
    unfortunately, I would say that is pretty common.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      https://stats.stackexchange.com/ is one site where statistics people and machine learning people come together, or not, as the case may be.

      If it's obvious to you that ML means maximum likelihood, then you're a statistical person. If it's obvious that ML means machine learning, you are a machine learning person. If you don't know what it means, that's OK: you're a student who has yet to learn it, or you stumbled on this discussion by accident.



      • #4
        I have worked with data scientists on only one occasion. I wrote a report for the university senior management on "Determinants of students' drop out decisions," for which I used the full population of students at the university in question.

        My impression of data scientists was pretty similar to what you are describing. The data scientists I met came across as sophisticated guardians and collectors of data. They had impressive skills in these database management programs, and they could retrieve for me whatever cut of the collected data I could imagine...

        On the negative side, they needed a lot of explaining for every little thing I got from them. Extracting the data from the data scientists was what took the most time in completing this project. It was so bad that we were bouncing emails back and forth for almost a year, and they were sending me garbage data. In the end I went in person to the building where the data scientists were nesting, I hung over the head of the same guy with whom I had communicated through email for a year, and in about an hour and a half we produced a proper dataset from which I could complete the study.

        I went to their building expecting that I would have to deal with a mentally handicapped person, an expectation I had formed over a year of email communication that did not lead to anything... Instead I saw a guy who knew very well what he was doing, who had, from my point of view, impressive database management skills, and who could produce whatever cut of the data you explained to him.

        My overall conclusion from this one episode was that data scientists are masters of the database management programs (languages?) they use, and if you have the patience to hang over their heads while they are working, they can produce whatever cut of the existing data you desire. But they also came across as people absolutely uninterested in what this data says, or in modelling and visualising it.
