Python for hypothesis testing - possibly off-topic

This is more of a general question than a PsychoPy one, but I thought I’d ask here anyway, since after running your experiments in the glorious PsychoPy you may well have considered Python as your analysis platform too.

Given the general growth of Python over the last ~decade, particularly with regard to scientific computing, I’m quite surprised at how difficult it is to implement the statistics we need (as researchers in psych). I know SciPy has some functionality, and of course there is statsmodels, but after looking into this it seems that just to run a few stats tests we need to import a bunch of different modules from various libraries and write a fair amount of code. Yet people still talk about how great Python is for statistics, which puzzles me: it seems fine for the basics, but beyond those my impression is the complete opposite.
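To make the point concrete, here is a minimal sketch of an independent-samples t-test with SciPy's `scipy.stats.ttest_ind` and NumPy. As far as I know, SciPy returns the t statistic and p-value but no effect size, so Cohen's d (pooled-SD version) has to be computed by hand. The function name `independent_ttest_with_d` and the example data are hypothetical.

```python
import numpy as np
from scipy import stats


def independent_ttest_with_d(a, b):
    """Student's t-test (equal variances assumed) plus Cohen's d using the pooled SD."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # SciPy gives the t statistic and the two-sided p-value...
    result = stats.ttest_ind(a, b, equal_var=True)
    # ...but the effect size has to be computed separately
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    d = (a.mean() - b.mean()) / np.sqrt(pooled_var)
    return float(result.statistic), float(result.pvalue), float(d)


if __name__ == "__main__":
    # Hypothetical scores for two independent groups
    control = [5.1, 4.8, 5.6, 5.0, 4.9]
    treatment = [5.9, 6.1, 5.7, 6.3, 5.8]
    t, p, d = independent_ttest_with_d(control, treatment)
    print(f"t = {t:.3f}, p = {p:.4f}, d = {d:.3f}")
```

Passing `equal_var=False` to `ttest_ind` gives Welch's test instead, in which case the pooled-SD d above is no longer the natural companion statistic.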

Am I missing something? Does anybody here have experience in using Python for statistics in psychological research, or does everyone eventually just switch to R (or SPSS!)? It would be so nice to have everything self-contained in Python.

You are right: support for standard statistical methods lags a bit behind what’s possible in, for example, R.

These are the common go-tos:

Pingouin sounds like what you need, but I haven’t tried it myself (yet).


I had never heard of Pingouin before but looking at the docs it seems exactly what is needed. Thanks a lot, that’s really helpful!


Just use whatever tool is best for you for the purpose at hand. For example, I use Python (i.e. PsychoPy) to run eye movement experiments, and Python again to process the resulting raw eye movement recordings (importing and processing files; transforming coordinates; filtering; parsing into saccades, glissades, blinks and fixations) and then to make automated measurements of individual saccades and fixations. The output of that process is a table of saccade measurements, which I then manipulate and analyse in R.
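As a rough illustration of the parsing and measurement stages, here is a minimal velocity-threshold (I-VT) sketch in plain Python. It is not the pipeline described above: it assumes gaze coordinates already converted to degrees of visual angle, a fixed sampling rate, no filtering, and a hypothetical 30 deg/s threshold, and it ignores blinks and glissades. The output is CSV text with one row per saccade, the kind of table that can then be read into R (e.g. with `read.csv`). All function and column names are hypothetical.

```python
import csv
import io
import math


def sample_velocities(xs, ys, sample_rate_hz):
    """Velocity (deg/s) between consecutive samples; xs/ys assumed to be in degrees."""
    return [
        math.hypot(xs[i + 1] - xs[i], ys[i + 1] - ys[i]) * sample_rate_hz
        for i in range(len(xs) - 1)
    ]


def detect_saccades(xs, ys, sample_rate_hz, threshold_deg_s=30.0):
    """I-VT parsing: (start, end) sample indices of runs above the velocity threshold.

    Samples outside these runs would count as fixation; blinks and glissades
    are not handled in this sketch.
    """
    vel = sample_velocities(xs, ys, sample_rate_hz)
    events, start = [], None
    for i, v in enumerate(vel):
        if v > threshold_deg_s and start is None:
            start = i  # vel[i] spans samples i..i+1, so the saccade starts at sample i
        elif v <= threshold_deg_s and start is not None:
            events.append((start, i))  # last fast interval ended at sample i
            start = None
    if start is not None:  # recording ended mid-saccade
        events.append((start, len(vel)))
    return events


def saccade_table_csv(events, xs, ys, sample_rate_hz):
    """One row of measurements per saccade, returned as CSV text."""
    vel = sample_velocities(xs, ys, sample_rate_hz)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["start_sample", "end_sample", "duration_ms",
                     "amplitude_deg", "peak_velocity_deg_s"])
    for start, end in events:
        duration_ms = (end - start) * 1000.0 / sample_rate_hz
        amplitude = math.hypot(xs[end] - xs[start], ys[end] - ys[start])
        peak = max(vel[start:end])
        writer.writerow([start, end, round(duration_ms, 3),
                         round(amplitude, 3), round(peak, 3)])
    return out.getvalue()


if __name__ == "__main__":
    # Synthetic 500 Hz trace: fixation, a 5-degree horizontal step, fixation
    xs = [0.0] * 10 + [1.0, 2.0, 3.0, 4.0, 5.0] + [5.0] * 10
    ys = [0.0] * len(xs)
    events = detect_saccades(xs, ys, sample_rate_hz=500)
    print(saccade_table_csv(events, xs, ys, sample_rate_hz=500))
```

Real recordings would need smoothing before differentiation and a more careful choice of threshold; dedicated event-detection algorithms exist for exactly that.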

I find the data manipulation and analysis tools in R to be fantastic (and, in my previous experience, better than those available in Python), but I would never consider doing any of the earlier stages of the process in that language. Maybe these days the latter parts are also feasible in Python, but I’d rather stick with what I know and am productive in. Realistically, this isn’t a language battle: it’s more about which packages/libraries are available in each ecosystem, and which ones you already have expertise in. Learning to do something you can already do in another system costs you a period of lost productivity, and that cost is a valid consideration that shouldn’t be ignored.

R and Python are each converging on the other in many ways, but in the interim, your own familiarity and expertise are as important a consideration as the “objective” differences or features of the languages.

I’d say that there are, however, strong and compelling and objective arguments to go for an open, reproducible and fully scripted analysis pipeline, in whatever language, rather than using a closed system.


I tend to go for Bayesian stats, which has mostly meant PyMC3 (Python), though sometimes I use JASP. I also recently found dabest, which has excellent visualisations and some basic stats tests (available for Python, R and Matlab, and as a web app). I agree with what others have said: use whatever is the best tool available, which makes experience with multiple languages valuable. For example, I’ve recently been getting into brms, which is R-only.
