URL normalization for Python
A Python library for standardizing and normalizing URLs with support for internationalized domain names (IDN).
url-normalize provides a robust URI normalization function that:
Inspired by Sam Ruby’s urlnorm.py
pip install url-normalize
from url_normalize import url_normalize
# Basic normalization (uses https by default)
print(url_normalize("www.foo.com:80/foo"))
# Output: https://www.foo.com/foo
# With custom default scheme
print(url_normalize("www.foo.com/foo", default_scheme="http"))
# Output: http://www.foo.com/foo
# With query parameter filtering enabled
print(url_normalize("www.google.com/search?q=test&utm_source=test", filter_params=True))
# Output: https://www.google.com/search?q=test
# With custom parameter allowlist as a dict
print(url_normalize(
"example.com?page=1&id=123&ref=test",
filter_params=True,
param_allowlist={"example.com": ["page", "id"]}
))
# Output: https://example.com?page=1&id=123
# With custom parameter allowlist as a list
print(url_normalize(
"example.com?page=1&id=123&ref=test",
filter_params=True,
param_allowlist=["page", "id"]
))
# Output: https://example.com?page=1&id=123
# With default domain for absolute paths
print(url_normalize("/images/logo.png", default_domain="example.com"))
# Output: https://example.com/images/logo.png
# With default domain and custom scheme
print(url_normalize("/images/logo.png", default_scheme="http", default_domain="example.com"))
# Output: http://example.com/images/logo.png
You can also use url-normalize
from the command line:
$ url-normalize "www.foo.com:80/foo"
# Output: https://www.foo.com/foo
# With custom default scheme
$ url-normalize -s http "www.foo.com/foo"
# Output: http://www.foo.com/foo
# With query parameter filtering
$ url-normalize -f "www.google.com/search?q=test&utm_source=test"
# Output: https://www.google.com/search?q=test
# With custom allowlist
$ url-normalize -f -p page,id "example.com?page=1&id=123&ref=test"
# Output: https://example.com/?page=1&id=123
# With default domain for absolute paths
$ url-normalize -d example.com "/images/logo.png"
# Output: https://example.com/images/logo.png
# With default domain and custom scheme
$ url-normalize -d example.com -s http "/images/logo.png"
# Output: http://example.com/images/logo.png
# Via uv tool/uvx
$ uvx url-normalize www.foo.com:80/foo
# Output: https://www.foo.com:80/foo
For a complete history of changes, see CHANGELOG.md.
Contributions are welcome! Please feel free to submit a Pull Request.
MIT License