Require only hard dependencies in pip install

This is a continuation of our discussion at Remove pygame requirement by cbrnr · Pull Request #2296 · psychopy/psychopy · GitHub.

In short, I find it very annoying that pip install psychopy installs 68(!) dependencies when all I need is the very basic core functionality for a nice little paradigm. Therefore, I suggested listing only the core/hard dependencies in the install_requires section of setup.py. Note that this is the recommended practice as per the official Python packaging documentation, which states that this section should be used “to specify what a project minimally needs to run correctly”. The recommended way to install all dependencies is to provide a requirements.txt.

In my opinion, these are the core dependencies that are absolutely necessary to run a minimal PsychoPy: numpy, scipy, matplotlib, pandas, pyglet, and moviepy. These packages are relatively unproblematic and straightforward to install via pip on all major platforms.

All other packages should go in requirements.txt. This makes it pretty easy for people to install either a minimal or a fully featured PsychoPy via pip.
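
As a minimal sketch of this split, assuming a setuptools-based setup.py: the six packages are the ones listed above, while the variable names and the metadata are illustrative assumptions, not PsychoPy's actual setup.py.

```python
# Hypothetical excerpt of a setup.py. Only the six package names come
# from the list above; everything else is an assumption for illustration.

# Hard dependencies: what the project minimally needs to run.
CORE_REQUIREMENTS = [
    "numpy",
    "scipy",
    "matplotlib",
    "pandas",
    "pyglet",
    "moviepy",
]

setup_kwargs = {
    "name": "psychopy",
    "install_requires": CORE_REQUIREMENTS,
}

# In a real setup.py this would be followed by:
#     from setuptools import setup
#     setup(**setup_kwargs)
```

Everything else would then be listed in requirements.txt, one package per line, and installed on demand with pip install -r requirements.txt.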

People who do not have a lot of technical background and just want to use PsychoPy should install the stand-alone packages, which include everything (even Python). I think these two options cater to two different groups of people: while pip is probably better for people familiar with Python and its packaging ecosystem, the stand-alone distribution is suitable for people who just want to use PsychoPy.

Let me address two comments @jon made:

This is a nice example. Pillow is technically a soft dependency. I could write a script that never uses it (although the app certainly does and so does ImageStim). How many seconds does it take to install pillow using pip? Conversely, how many seconds does it take a user to debug the message “ImportError: No module named Image” and work out that this was because they need to go back to the terminal and do pip install pillow (not pip install Image nor pip install PIL, which are the names used during import)?

It’s not about the time it takes to install a package, but with every additional package you require, the chance of something going wrong increases. If I don’t ever need Pillow, why should I have to install it? I’d really prefer an opt-in approach. Regarding your debug message, this is not what is happening. The actual message is ImportError: No module named PIL, because there is no Image module anymore (they removed it way back in version 1.0). And it’s pretty easy to find out that PIL is actually installed with pip install pillow (I agree that this is an odd case, but we can document it).
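
To illustrate how an import error could point users at the right pip command, here is a rough sketch. The helper names (optional_import, install_hint) and the mapping are hypothetical and not part of PsychoPy; only the PIL/pillow pairing comes from the discussion above.

```python
import importlib

# Import name -> name to pass to "pip install", for cases where they differ.
PIP_NAMES = {
    "PIL": "pillow",
}


def install_hint(import_name):
    """Return the pip command that provides the given import name."""
    return "pip install " + PIP_NAMES.get(import_name, import_name)


def optional_import(import_name):
    """Import an optional dependency, or raise ImportError with a pip hint."""
    try:
        return importlib.import_module(import_name)
    except ImportError as err:
        raise ImportError(
            f"Optional dependency '{import_name}' is missing. "
            f"Install it with: {install_hint(import_name)}"
        ) from err
```

A module that needs Pillow could then call optional_import("PIL") at the point of use, so a missing package produces a message naming pillow rather than a bare "No module named PIL".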

I think a lot of these potential issues could be solved by providing accurate install docs (and PsychoPy already has very good docs FWIW).

As another for-instance, PyQt is not a hard dependency. Some people never need to open a dialog box. Those people are rare and I would not support suggesting that PyQt is soft even though some people might never need it. When they find that a(nother) demo has failed to run, and have spent time working out how to install it (pip install pyqt5 not pip install pyqt nor pip install PyQt5 …) they get annoyed that things don’t “just work”.

I disagree. People who want everything to “just work” should install the stand-alone version. People who use pip prefer a minimal setup. They will be able to correctly parse the import errors and install any missing packages.

In short, I’m willing to accept dropping something like pygame, but definitely not the more drastic drops that I think you’re asking for. I think most people prefer “batteries included”.

First, pygame is really something else, because it is a deprecated unmaintained package, which no project should require as a hard dependency. Second, I’m not asking for drops - I’m merely suggesting to provide a minimal install via pip (which I’ve listed above) and have a requirements.txt for the full install. Plus, there is always the option to use stand-alone versions.

The issue is exactly that there will be nearly as many opinions as there are users. Many people will never need moviepy but would find the inability to play a beep sound bizarre, for example.

I understand the desire to keep a minimal and clean installation. So why not simply create a separate environment for PsychoPy and all of its dependencies, and another one that you keep in a minimalist state for other work? It is so easy to create and switch between environments (particularly with Anaconda), and disk space is hardly an issue these days, that you can have your cake and eat it too.

That route also allows you to freeze a particular version of PsychoPy for running an experiment, while still being able to keep up with later releases or the developer install for ongoing work.

One of the guiding principles of the PsychoPy project has always been that it is oriented around the needs of users, rather than developers. Many of those users, even the ones who hand-code their experiments, will be scientists rather than professional programmers, who (like me) will often fall at the first hurdle when inscrutable error messages arise. We want to ease the process for them as much as possible. They won’t notice at all the impact of having many dependencies installed, but Jon nicely illustrates how they might struggle to resolve a missing dependency, particularly when the install name is not the same as the project name.

This list is currently dictated by the source code - all other packages are not required to run because they are imported on demand. Of course, this list is not set in stone and open for discussion, but I’d argue that it should not consist of all packages PsychoPy could ever need. Again: each package you require will increase the likelihood that the installation breaks for some users, and you will lose these users.

I understand the desire to keep a minimal and clean installation. So why not simply create a separate environment for PsychoPy and all of its dependencies, and another one that you keep in a minimalist state for other work? It is so easy to create and switch between environments (particularly with Anaconda), and disk space is hardly an issue these days, that you can have your cake and eat it too. That route also allows you to freeze a particular version of PsychoPy for running an experiment, while still being able to keep up with later releases or the developer install for ongoing work.

This may be true if you actually run an experiment - then I agree freezing is a good idea to facilitate reproducibility. I disagree when you say disk space is hardly an issue. A current PsychoPy installation adds around 750MB, and if I do this for each project, I quickly get into the GB range. This is certainly an issue for me.

One of the guiding principles of the PsychoPy project has always been that it is oriented around the needs of users, rather than developers. Many of those users, even the ones who hand-code their experiments, will be scientists rather than professional programmers, who (like me) will often fall at the first hurdle when inscrutable error messages arise. We want to ease the process for them as much as possible. They won’t notice at all the impact of having many dependencies installed, but Jon nicely illustrates how they might struggle to resolve a missing dependency, particularly when the install name is not the same as the project name.

I understand this guiding principle. But that’s why there are standalone releases. Why do you want to make it hard for developers? Why should users care which dependencies get pulled in when you install via pip? Most users probably don’t even know pip and will use standalone releases.

In case my previous post wasn’t clear, here’s a summary of what I’m proposing:

  1. Users who want a fully featured PsychoPy: Standalone release
  2. Users (developers) who want a minimal PsychoPy environment: pip install psychopy
  3. Users (developers) who want a batteries included PsychoPy: pip install -r https://raw.githubusercontent.com/psychopy/psychopy/master/requirements.txt

I really don’t understand the problem.

Really? Here are some components of it:

  • Most users won’t read the documentation telling them to type pip install -r https://raw.githubusercontent.com/psychopy/psychopy/master/requirements.txt. A Python user who has heard of psychopy or used it before will go to a terminal, without going to any web page at all, type pip install psychopy and want it to work for their study. So pip install psychopy should support most studies. They do not want to go to a webpage and remind themselves of that complicated URL for requirements.txt
  • The converse option of pip install psychopy --no-deps is really easy to remember, common to all packages, and allows all the things the save-my-disk-space users care about
  • Most users don’t want to keep going back to the terminal to install further packages, as each missing dependency is noticed
  • Most users don’t care if psychopy installs 68 soft-dependencies via pip. You can buy a 4TB HDD for £100

Happy to put this to the vote, but I’d rather spend my own time working on really important issues like keyboard timing, so I hope you don’t mind if I leave the discussion here.

Everyone feel free to vote below:

  • I want pip install to give minimal dependencies
  • I want pip install to give most common dependencies
  • I want pip install to give nearly all dependencies

Whatever, I could reply to each of your points, but it won’t change your mind anyway, so I’m out.

I’ll change my mind if lots of people want the change, but right now I have a strong belief that most people want “most dependencies”.

I was in favour of a reduced set of dependencies, but I hadn’t thought of pip install psychopy --no-deps. That works well for me.

(I do run into disk space issues, on my laptop - it doesn’t have that much disk space and what it has is not easily expanded. I use virtualenvs a lot and the disk usage adds up quickly.)

Sure, this is always an option. Note that so far, no one has voted for “nearly all dependencies”, which is the current state of the psychopy package. “Most common dependencies” would also be fine with me (better than “nearly all dependencies”), so I guess someone should come up with a list of these.

Lots of modules that I don’t normally use are loaded anyway, and they crash the program when some of their dependencies are not installed. It was suggested that we use some sort of just-in-time import mechanism so that modules are only loaded when people actually use the relevant functions. I think we discussed this as a possible solution here: https://github.com/psychopy/psychopy/issues/2064

Yep. Happy for that to be done still

If I may weigh in as an inexperienced user of psychopy: I realize this discussion happened a while ago, but it does not seem to have moved much since then. I just stumbled upon this thread while Googling whether I can opt out of installing the PsychoPy Builder, as I don’t really need it. I hope you don’t mind.

It’s true that it’s nice that pip install psychopy gets you all dependencies you need, regardless of what stimuli you need to show, etc. From that perspective, I agree it’s difficult to make a minimal list of dependencies as that will depend on the user.

However, I do think there are different delineations possible. For example, it is quite standard to separate code that is needed to run a program from code that is needed or helpful to create, write, or test a program. I would, for example, appreciate it if psychopy itself included only what is needed to run an experiment written for psychopy. I don’t need the IDE or the GUI bundled with it by default - those seem helpful if you want to create or modify an experiment, and even then they are optional.

Another way to look at it is to consider the separation of peripherals and deployment. For example, while users might not know in advance what kind of stimuli they will show and which packages that will require, they will probably know if they plan to use external hardware like an fMRI scanner or an eye-tracker, or if they plan to deploy their study online.

From those perspectives it would make sense to me to split it up somewhat like this

  1. Users who want a fully featured PsychoPy: Standalone release. The alternative to a standalone release is to install all components manually: pip install psychopy-core psychopy-coder psychopy-builder psychopy-hardware psychopy-online
  2. Users who want a minimal PsychoPy environment: pip install psychopy-core, or alternatively pip install psychopy-core --no-deps if they really want complete control over their dependencies.
  3. Users who want to use GUI to develop simple programs: pip install psychopy-core psychopy-builder, users who want to use psychopy to talk to, e.g., an eye-tracker: pip install psychopy-core psychopy-hardware, etc.

I think this kind of delineation would make sense to me - it does not obscure it for users who want “the whole thing” to just work, makes it significantly easier and more transparent for users who already have an idea about their development/deployment workflow and experimental setup, etc, even if they are not exactly sure how they will go about programming the whole thing, and also makes it easier for people who exactly know what they are doing and want full control, starting from a minimal install.

I understand that separating the code and maintaining it in multiple separately installable packages probably requires a lot of additional work. I am also not sure how many dependencies would actually be shaved off the core package if the additional bells and whistles were taken out. So I understand if you would be reluctant to commit to this idea depending on these factors.

Just something that came to my mind when stumbling upon this thread, as it did not seem this kind of separation had been mentioned. If I am completely off base, sorry about that; I admit I am writing this without much knowledge of how PsychoPy is set up to work in the first place.

@Kucharssim the more recent discussion of this issue is at Yet another topic on installation & dependencies - #19 by TChauhan

Essentially there is progress towards this idea, but I’ll update on that thread and close this one so we don’t have too many different threads repeating the same concepts.
