proxpi: PyPI caching mirror
See Alternatives.
Choose between running inside a Docker container, if you want a known-working
environment, or outside as a Python app (instructions here are for the Flask
development server), if you want more control over the environment.
Note: the index cache and the management of the file cache run in memory and are
not synchronised across multiple processes, so use multiple threads instead of
multiple processes; the cache is thread-safe.
Uses a Gunicorn WSGI server
docker run -p 5000:5000 epicwink/proxpi
Without arguments, it runs with 2 threads. If passing arguments, make sure to bind to
an exposed address (or all interfaces with 0.0.0.0) on port 5000
(i.e. --bind 0.0.0.0:5000).
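As the note above implies, extra container arguments are passed through to the Gunicorn server; for example, to bind to all interfaces and use more worker threads (the thread count here is illustrative):

```shell
# Bind to all interfaces on port 5000 and serve with 4 threads
docker run -p 5000:5000 epicwink/proxpi --bind 0.0.0.0:5000 --threads 4
```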
Alternatively, use Docker Compose
docker compose up
pip install proxpi
Install proxpi[pretty] instead to get coloured logging and tracebacks
(disable by setting environment variable NO_COLOR=1).
FLASK_APP=proxpi.server flask run
See flask run --help
for more information on address and port binding, and certificate
specification to use HTTPS. Alternatively, bring your own WSGI server.
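If bringing your own WSGI server, point it at the Flask application object. A sketch with Gunicorn, assuming the application is exposed as proxpi.server:app (as implied by the flask run invocation above):

```shell
# Serve proxpi with Gunicorn instead of the Flask development server
gunicorn --threads 2 --bind 127.0.0.1:5000 proxpi.server:app
```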
Use pip's index-URL flag to install packages via the proxy
pip install --index-url http://127.0.0.1:5000/index/ simplejson
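To avoid passing the flag on every invocation, the index URL can be set persistently with pip config (which writes to pip's configuration file). For a proxy on a non-localhost host served over plain HTTP, pip also needs a trusted-host entry; the host name below is hypothetical:

```shell
pip config set global.index-url http://127.0.0.1:5000/index/
# Only needed for a remote proxy over plain HTTP (hypothetical host name):
# pip config set global.trusted-host proxpi.internal
```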
Either head to http://127.0.0.1:5000/ in the browser, or run:
curl -X DELETE http://127.0.0.1:5000/cache/simplejson
curl -X DELETE http://127.0.0.1:5000/cache/list
If you need to invalidate a locally cached file, restart the server: files should never
change in a package index.
- PROXPI_INDEX_URL: index URL, default: https://pypi.org/simple/
- PROXPI_INDEX_TTL: index cache time-to-live in seconds
- PROXPI_EXTRA_INDEX_URLS: extra index URLs (comma-separated)
- PROXPI_EXTRA_INDEX_TTLS: corresponding extra index cache times-to-live in seconds
- PROXPI_CACHE_SIZE: size of downloaded project files cache (bytes), default: 5GB
- PROXPI_CACHE_DIR: downloaded project files cache directory path, default: a new
  temporary directory
- PROXPI_BINARY_FILE_MIME_TYPE=1: force file-response content-type to
  "application/octet-stream" instead of letting Flask guess it; this may be needed
  by some clients
- PROXPI_DISABLE_INDEX_SSL_VERIFICATION=1: don't verify any index SSL certificates
- PROXPI_DOWNLOAD_TIMEOUT: time (in seconds) before proxpi will redirect to the
  upstream file URL instead of continuing to wait for the download
- PROXPI_CONNECT_TIMEOUT: time (in seconds) proxpi will wait for a socket to
  connect, before requests raises a ConnectTimeout error
- PROXPI_READ_TIMEOUT: time (in seconds) proxpi will wait for chunks of data,
  before requests raises a ReadTimeout error, to prevent hanging on a stalled
  download
- PROXPI_LOGGING_LEVEL: Python logging level, default: INFO
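For example, to run the development server with a larger file cache in a persistent directory (the values here are illustrative, not recommendations):

```shell
export PROXPI_INDEX_URL=https://pypi.org/simple/
export PROXPI_CACHE_SIZE=10737418240  # 10 GB file cache
export PROXPI_CACHE_DIR=/var/cache/proxpi
FLASK_APP=proxpi.server flask run
```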
proxpi was designed with three goals (particularly for continuous integration (CI)),
including reducing pip install times. Specifically, proxpi was designed to run for CI
services such as Travis, Jenkins, GitLab CI, Azure Pipelines and GitHub Actions.
proxpi works by caching index requests (i.e. which versions, wheel-types, etc. are
available for a given project: the index cache) and the project files themselves (to
a local directory: the package cache). This means identical requests are served from
the cache after the first one, so proxpi is useless for just one pip install.
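The index cache described above can be pictured as a simple time-to-live dictionary. The following is an illustrative sketch of that idea, not proxpi's actual implementation:

```python
import time


class TTLCache:
    """Minimal time-to-live cache, illustrating the index-cache idea."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self._data = {}  # key -> (expiry timestamp, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[0] < time.monotonic():
            return None  # missing or expired
        return entry[1]

    def set(self, key, value):
        self._data[key] = (time.monotonic() + self.ttl, value)


cache = TTLCache(ttl=0.05)
cache.set("simplejson", ["simplejson-3.19.1.tar.gz"])
assert cache.get("simplejson") == ["simplejson-3.19.1.tar.gz"]
time.sleep(0.06)
assert cache.get("simplejson") is None  # entry has expired
```

proxpi's real cache additionally enforces a size limit on downloaded files and coordinates access across threads.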
As a basic end-user of these services, for most of them you won't be able to keep a
proxpi server running between invocations of your projects' CI pipelines: CI
invocations are designed to be independent. This means the best you can do is start
the cache for just the current job.
A more advanced user of these CI services can bring their own runner (personally, my
needs are for running GitLab CI). This means you can run proxpi
on a fully-controlled
server (e.g. an EC2 instance), and proxy PyPI requests (during
a pip
command) through the local cache. See the instructions
below.
Hopefully, in the future these CI services will all implement their own transparent
caching for PyPI. For example, Azure already has
Azure Artifacts which
provides much more functionality than proxpi
, but won’t reduce pip install
times for
CI services not using Azure.
This implementation leverages pip's configurable index URL and Docker networks.
This is to be run on a server you have console access to.
Create a Docker bridge network
docker network create gitlab-runner-network
Start a GitLab CI Docker runner using
their documentation
Run the proxpi
Docker container
docker run \
--detach \
--network gitlab-runner-network \
--volume proxpi-cache:/var/cache/proxpi \
--env PROXPI_CACHE_DIR=/var/cache/proxpi \
--name proxpi epicwink/proxpi:latest
You don't need to publish a port (the -p flag) as we'll be using an internal
Docker network.
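Before wiring up the runner, you can check that the proxy is reachable from inside the network with a one-off container (a sketch; curlimages/curl is just one convenient image with curl installed):

```shell
# Query the proxy's index endpoint from within the Docker network
docker run --rm --network gitlab-runner-network curlimages/curl \
  --silent --fail http://proxpi:5000/index/ > /dev/null && echo "proxpi reachable"
```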
Set pip's index URL to the proxpi server by setting it in the runner environment.
Set runners[0].docker.network_mode
to gitlab-runner-network
.
Add PIP_INDEX_URL=http://proxpi:5000/index/
and PIP_TRUSTED_HOST=proxpi
to runners.environment
in the GitLab CI runner configuration TOML. For example, you
may end up with the following configuration:
[[runners]]
name = "awesome-ci-01"
url = "https://gitlab.com/"
token = "SECRET"
executor = "docker"
environment = [
"DOCKER_TLS_CERTDIR=/certs",
"PIP_INDEX_URL=http://proxpi:5000/index/",
"PIP_TRUSTED_HOST=proxpi",
]
[runners.docker]
network_mode = "gitlab-runner-network"
...
This is designed to not require any changes to the GitLab CI project configuration
(i.e. .gitlab-ci.yml), unless it already sets the index URL for some reason (in
which case you're probably already using a cache).
Another option is to set up a proxy, but that’s more effort than the above method.
- simpleindex: routes URLs to multiple indices (including PyPI), supports a local
  (or S3, with a plugin) directory of packages, no caching without custom plugins
- bandersnatch: mirrors one index (e.g. PyPI), storing packages locally, or on S3
  with a plugin. Manual update, no proxy
- devpi: heavyweight, runs a full index (or multiple) in addition to mirroring (in
  place of proxying), supports proxying (with inheritance), package upload, server
  replication and fail-over
- pypiserver: serves a local directory of packages, proxies to PyPI when a package
  is not found, supports package upload, no caching
- PyPI Cloud: serves a local or cloud-storage directory of packages, with
  redirecting/cached proxying to indexes, authentication and authorisation
- pypiprivate: serves a local (or S3-hosted) directory of packages, no proxy to
  package indices (including PyPI)
- Pulp: generic content repository, can host multiple ecosystems' packages. Python
  package index plugin supports local/S3 mirrors, package upload and proxying to
  multiple indices, no caching
- pip2pi: manual syncing of specific packages, no proxy
- nginx_pypi_cache: caching proxy using nginx, single index
- Flask-Pypi-Proxy: unmaintained, no cache size limit, no caching of index pages
- http.server: standard-library, hosts a directory exactly as laid out, no proxy
  to package indices (e.g. PyPI)
- Apache with mod_rewrite: I'm not familiar with Apache, but it likely has the
  capability to proxy and cache (with e.g. mod_cache_disk)
- Gemfury: hosted, managed. Private index is not free; documentation doesn't say
  anything about proxying