Every Python developer or data scientist gets to the point where they need to consume, and often publish, Python packages. The main source of open source, publicly available Python packages is PyPI. Used worldwide, PyPI hosts 3 million Python package releases as of this writing. In some cases, however, your team might need to host a Python package index internally. This article introduces Pulp, an open source project for managing repositories of software packages. Our example uses the Pulp instance hosted in the Operate First environment and is based on how data scientists and Python developers at Red Hat use that deployment.
Managing software repositories with Pulp
Pulp can manage content in various formats: RPM packages, container images, Ansible roles, Maven repositories, Python package indexes, and others. See Pulp's content plugins documentation for a full listing.
From a Python packaging perspective, you are likely most interested in the Python content plugin, which you can use to create and host multiple Python package indexes on a single Pulp instance. This fits a scenario in which several teams each want to manage their own Python package index while the organization operates only one Pulp instance (or very few).
Because Pulp is supported by Red Hat engineers and is modular, our teams within Red Hat decided to use Pulp to host our Python packages. The Pulp Python package index is deployed in the Operate First production environment. We'll use that as our example for using a Pulp instance as a Python package index.
How to use the Pulp Python package index
The documentation at the Operate First index's site walks you through setting up a Python package index, publishing Python packages, and consuming already hosted Python packages from the Pulp Python package index. Let's look at each of these tasks.
Setting up a Pulp Python repository
To set up a repository, submit a request to the Operate First support team, as shown in Figure 1. After your request is processed, the instance and the access to it will be configured and ready for use.
Publishing Python packages
After your private index has been set up, you can publish Python packages there. Currently, you need to follow the steps documented in Project Thoth's hello world example application. Eventually, we hope that role-based access control (RBAC) will be enabled.
Consuming Python packages from a Pulp Python package index
With a simple command, you can consume the packages hosted on the Operate First cloud:
$ pip install --index-url "https://pulp.operate-first.cloud/pypi/<index-name>/simple/" --extra-index-url "https://pypi.org/simple" <package-name>
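Note: By including the --extra-index-url option, you ask pip to search PyPI as well, so packages that are not found on the specified private index can still be retrieved. Be aware that pip does not give either index priority: it picks the best matching version it finds across both, so a package name that exists on both indexes can resolve to the PyPI release.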
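If you install from a script rather than by hand, the same invocation can be assembled programmatically. The following is a minimal sketch, not part of the Operate First documentation: the helper names `pulp_index_url` and `pip_install_args`, the index name `my-team-index`, and the package name `my-package` are all hypothetical and chosen only for illustration. The base URL and the two pip options are the ones from the command above.

```python
from __future__ import annotations

import shlex
import sys

# Base URL taken from the pip command in this article.
PULP_BASE_URL = "https://pulp.operate-first.cloud/pypi"
PYPI_SIMPLE_URL = "https://pypi.org/simple"


def pulp_index_url(index_name: str) -> str:
    """Return the simple-index URL of one index on the shared Pulp instance."""
    return f"{PULP_BASE_URL}/{index_name}/simple/"


def pip_install_args(index_name: str, package: str, use_pypi: bool = True) -> list[str]:
    """Build the argument list for `python -m pip install` (hypothetical helper)."""
    args = [
        sys.executable, "-m", "pip", "install",
        "--index-url", pulp_index_url(index_name),
    ]
    if use_pypi:
        # Also let pip search PyPI for packages missing from the private index.
        args += ["--extra-index-url", PYPI_SIMPLE_URL]
    args.append(package)
    return args


# Hypothetical index and package names, used only for illustration.
command = pip_install_args("my-team-index", "my-package")

# Print the command instead of running it, so the sketch has no side effects.
print(shlex.join(command))
```

To actually run the installation, you could pass the list to `subprocess.run(command, check=True)`. Setting `use_pypi=False` drops the `--extra-index-url` option, which restricts pip to the private index.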
Acknowledgments
The Pulp instance on the Operate First environment is live and available to developers after 10 months of cross-team collaboration between engineers from the Pulp team, the Project Thoth team, the team supporting the Operate First deployments, and Python engineers who were involved during the process.
We would like to thank everyone who was part of this effort. Thanks, especially, to the following engineers who were actively involved in the collaboration:
- Bob Fahr, Insights-core team
- Brian Gollaher, Red Hat Enterprise Linux Product Management
- Chris Hambridge, Ansible Engineering
- Christoph Goern, Project Thoth
- Christian Heimes, Red Hat Identity Management, CPython upstream, Python Packaging Authority
- Daniel Alley, Pulp project
- Gerrod Ubben, Pulp project
- Pavel Tisnovsky, Connected Customer Experience (CCX)
- Sviatoslav Sydorenko, Ansible Core Engineering, Python Packaging Authority
- Tomas Orsava, Python maintenance team
- Tom Coufal, Open Services Team