
Require only hard dependencies in pip install


#1

This is a continuation of our discussion at https://github.com/psychopy/psychopy/pull/2296.

In short, I find it very annoying that pip install psychopy installs 68(!) dependencies when all I need is the very basic core functionality for a nice little paradigm. Therefore, I suggested listing only the core/hard dependencies in the install_requires section of setup.py. Note that this is the recommended practice as per the official Python packaging documentation, which states that this section should be used “to specify what a project minimally needs to run correctly”. The recommended way to install all dependencies is to provide a requirements.txt.

In my opinion, these are the core dependencies that are absolutely necessary to run a minimal PsychoPy: numpy, scipy, matplotlib, pandas, pyglet, and moviepy. These packages are relatively unproblematic and straightforward to install via pip on all major platforms.

All other packages should go in requirements.txt. This makes it pretty easy for people to either install a minimal or fully-featured PsychoPy via pip.
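A minimal sketch of that split, not PsychoPy's actual setup.py. The core list is the one proposed above; the optional list is only an illustration using packages named in this thread, and the helper names setup_kwargs and requirements_txt are hypothetical.

```python
# Core list as proposed in this post; everything else is opt-in.
CORE_REQUIRES = ["numpy", "scipy", "matplotlib", "pandas", "pyglet", "moviepy"]

# Assumption: illustrative soft dependencies, not the real PsychoPy list.
OPTIONAL_REQUIRES = ["pillow", "pyqt5"]


def setup_kwargs():
    """Keyword arguments a setup.py could pass to setuptools.setup()."""
    return {"name": "psychopy", "install_requires": list(CORE_REQUIRES)}


def requirements_txt():
    """Contents of a requirements.txt that covers the full install."""
    return "\n".join(CORE_REQUIRES + OPTIONAL_REQUIRES) + "\n"
```

With this layout, pip install psychopy would pull in only the six core packages, while pip install -r requirements.txt would add the rest.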

People who do not have a lot of technical background and just want to use PsychoPy should install the stand-alone packages, which include everything (even Python). I think these two options cater to two different groups of people: while pip is probably better for people familiar with Python and its packaging ecosystem, the stand-alone distribution is suitable for people who just want to use PsychoPy.

Let me address two comments @jon made:

This is a nice example. Pillow is technically a soft dependency. I could write a script that never uses it (although the app certainly does and so does ImageStim). How many seconds does it take to install pillow using pip? Conversely, how many seconds does it take a user to debug the message “ImportError: No module named Image” and work out that this was because they need to go back to the terminal and do pip install pillow (not pip install Image nor pip install PIL, which are the names used during import)?

It’s not about the time it takes to install a package; with every additional package you require, the chance of something going wrong increases. If I don’t ever need Pillow, why should I have to install it? I’d really prefer an opt-in approach. Regarding your debug message, this is not what is happening. The actual message is ImportError: No module named PIL, because there is no Image module anymore (they removed it way back in version 1.0). And it’s pretty easy to find out that PIL is actually installed with pip install pillow (I agree that this is an odd case, but we can document it).
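One way the PIL/pillow mismatch could be handled in code is a small import helper that names the right pip package in the error. This is only a sketch; the helper require and the table PIP_NAMES are hypothetical and not part of PsychoPy.

```python
import importlib

# Import name -> pip name, for packages where the two differ.
PIP_NAMES = {"PIL": "pillow"}


def require(import_name):
    """Import a soft dependency, or fail with a message naming the pip package."""
    try:
        return importlib.import_module(import_name)
    except ImportError as err:
        pip_name = PIP_NAMES.get(import_name, import_name)
        raise ImportError(
            f"{import_name} is not installed; try: pip install {pip_name}"
        ) from err
```

A call such as require("PIL") on a system without Pillow would then point the user at pip install pillow instead of leaving them to guess the package name.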

I think a lot of these potential issues could be solved by providing accurate install docs (and PsychoPy already has very good docs FWIW).

As another for-instance, PyQt is not a hard dependency. Some people don’t ever need to open a dialog box. Those people are rare and I would not support suggesting that PyQt is soft even though some people might never need it. When they find that a(nother) demo has failed to run, and have spent time working out how to install it ( pip install pyqt5 not pip install pyqt nor pip install PyQt5 …) they get annoyed that things don’t “just work”.

I disagree. People who want everything to “just work” should install the stand-alone version. People who use pip prefer a minimal setup. They will be able to correctly parse the import errors and install any missing packages.

In short, I’m willing to accept dropping something like pygame, but definitely not the more drastic drops that I think you’re asking for. I think most people prefer “batteries included”.

First, pygame is really something else, because it is a deprecated unmaintained package, which no project should require as a hard dependency. Second, I’m not asking for drops - I’m merely suggesting to provide a minimal install via pip (which I’ve listed above) and have a requirements.txt for the full install. Plus, there is always the option to use stand-alone versions.


#2

The issue is exactly that there will be nearly as many opinions as there are users. Many people will never need moviepy but would find the inability to play a beep sound bizarre, for example.

I understand the desire to keep a minimal and clean installation. So why not simply create a separate environment for PsychoPy and all of its dependencies, and another one that you keep in a minimalist state for other work? It is so easy to create and switch between environments (particularly with Anaconda), and disk space is hardly an issue these days, that you can have your cake and eat it too.

That route also allows you to freeze a particular version of PsychoPy for running an experiment, while still being able to keep up with later releases or the developer install for ongoing work.

One of the guiding principles of the PsychoPy project has always been that it is oriented around the needs of users, rather than developers. Many of those users, even the ones who hand-code their experiments, will be scientists rather than professional programmers, who (like me) will often fall at the first hurdle when inscrutable error messages arise. We want to ease the process for them as much as possible. They won’t notice at all the impact of having many dependencies installed, but Jon nicely illustrates how they might struggle to resolve a missing dependency, particularly when the install name is not the same as the project name.


#3

This list is currently dictated by the source code - all other packages are not required to run because they are imported on demand. Of course, this list is not set in stone and open for discussion, but I’d argue that it should not consist of all packages PsychoPy could ever need. Again: each package you require will increase the likelihood that the installation breaks for some users, and you will lose these users.

I understand the desire to keep a minimal and clean installation. So why not simply create a separate environment for PsychoPy and all of its dependencies, and another one that you keep in a minimalist state for other work? It is so easy to create and switch between environments (particularly with Anaconda), and disk space is hardly an issue these days, that you can have your cake and eat it too. That route also allows you to freeze a particular version of PsychoPy for running an experiment, while still being able to keep up with later releases or the developer install for ongoing work.

This may be true if you actually run an experiment - then I agree freezing is a good idea to facilitate reproducibility. I disagree when you say disk space is hardly an issue. A current PsychoPy installation adds around 750MB, and if I do this for each project I quickly get into the GB range. This is certainly an issue for me.

One of the guiding principles of the PsychoPy project has always been that it is oriented around the needs of users, rather than developers. Many of those users, even the ones who hand-code their experiments, will be scientists rather than professional programmers, who (like me) will often fall at the first hurdle when inscrutable error messages arise. We want to ease the process for them as much as possible. They won’t notice at all the impact of having many dependencies installed, but Jon nicely illustrates how they might struggle to resolve a missing dependency, particularly when the install name is not the same as the project name.

I understand this guiding principle. But that’s why there are standalone releases. Why do you want to make it hard for developers? Why should users care which dependencies get pulled in when you install via pip? Most users probably don’t even know pip and will use standalone releases.

In case my previous post wasn’t clear, here’s a summary of what I’m proposing:

  1. Users who want a fully featured PsychoPy: Standalone release
  2. Users (developers) who want a minimal PsychoPy environment: pip install psychopy
  3. Users (developers) who want a batteries included PsychoPy: pip install -r https://raw.githubusercontent.com/psychopy/psychopy/master/requirements.txt

I really don’t understand the problem.


#4

Really? Here are some components of it:

  • Most users won’t read the documentation telling them to type pip install -r https://raw.githubusercontent.com/psychopy/psychopy/master/requirements.txt. A Python user who has heard of psychopy or used it before will go to a terminal, without going to any web page at all, type pip install psychopy and want it to work for their study. So pip install psychopy should support most studies. They do not want to go to a webpage to look up that complicated URL for requirements.txt
  • The converse option of pip install psychopy --no-deps is really easy to remember, common to all packages, and allows all the things the save-my-disk-space users care about
  • Most users don’t want to keep going back to the terminal to install further packages, as each missing dependency is noticed
  • Most users don’t care if psychopy installs 68 soft-dependencies via pip. You can buy a 4TB HDD for £100
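The repeated trips back to the terminal could be reduced by reporting all missing optional packages in one go. A rough sketch, assuming a hypothetical table OPTIONAL of import names and pip names (not the real PsychoPy list) and top-level module names only:

```python
import importlib.util

# Assumption: illustrative import name -> pip name pairs.
OPTIONAL = {"PIL": "pillow", "pygame": "pygame"}


def missing_packages(optional=OPTIONAL):
    """Return the pip names of optional top-level packages that cannot be found."""
    return [
        pip_name
        for import_name, pip_name in optional.items()
        if importlib.util.find_spec(import_name) is None
    ]
```

Joining the result with spaces after pip install gives a single command that installs everything that is missing.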

Happy to put this to the vote, but I’d rather spend my own time working on really important issues like keyboard timing, so I hope you don’t mind if I leave the discussion here.


#5

Everyone feel free to vote below:

  • I want pip install to give minimal dependencies
  • I want pip install to give most common dependencies
  • I want pip install to give nearly all dependencies



#6

Whatever, I could reply to each of your points, but it won’t change your mind anyway, so I’m out.


#7

I’ll change my mind if lots of people want the change, but right now I have a strong belief that most people want “most dependencies”.


#8

I was in favour of a reduced set of dependencies, but I hadn’t thought of pip install psychopy --no-deps. That works well for me.

(I do run into disk space issues, on my laptop - it doesn’t have that much disk space and what it has is not easily expanded. I use virtualenvs a lot and the disk usage adds up quickly.)
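For anyone who wants to check how much an environment actually takes up, a quick sketch using only the standard library. The function name dir_size_mb is made up for this example, and megabytes are counted as 10**6 bytes.

```python
import os


def dir_size_mb(path):
    """Total size of the regular files under path, in megabytes (10**6 bytes)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so linked files are not counted twice.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total / 1e6
```

Calling dir_size_mb(sys.prefix) inside an activated virtualenv gives the size of that environment.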


#9

Sure, this is always an option. Note that so far, no one has voted for “nearly all dependencies”, which is the current state of the psychopy package. “Most common dependencies” would also be fine with me (better than “nearly all dependencies”), so I guess someone should come up with a list of these.


#10

There are lots of lazy-loaded modules that I don’t normally use but that crash the program when some of their dependencies are not installed. It was suggested that we use some sort of just-in-time import mechanism to only load stuff if people are actually using some function. I think we discussed this as a possible solution here: https://github.com/psychopy/psychopy/issues/2064
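A just-in-time import could look roughly like the proxy below. This is a generic sketch of the idea, not the mechanism discussed in that issue; the class name LazyModule is hypothetical.

```python
import importlib


class LazyModule:
    """Proxy that imports the named module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Only reached for attributes not set in __init__, so _name and
        # _module never trigger the import themselves.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)
```

Creating LazyModule("PIL") never fails; the ImportError only appears if a script actually touches an attribute of the proxy while the package is missing.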


#11

Yep. Happy for that to be done still