Project author: maravento

Project description: Domains Blocklist for Squid-Cache
Language: Shell
Project URL: git://github.com/maravento/blackweb.git
Created: 2016-06-14T17:35:15Z
Project community: https://github.com/maravento/blackweb

License: GNU General Public License v3.0


BlackWeb


BlackWeb is a project that collects and unifies public blocklists of domains (porn, downloads, drugs, malware, spyware, trackers, bots, social networks, warez, weapons, etc.) to make them compatible with Squid-Cache.

DATA SHEET


ACL            Blocked Domains    File Size
blackweb.txt   4,927,229          123.7 MB

GIT CLONE


  git clone --depth=1 https://github.com/maravento/blackweb.git

HOW TO USE


blackweb.txt is already updated and optimized for Squid-Cache. Download it, extract it to the path of your preference, and activate the Squid-Cache rule (see below).

Download

  wget -q -c -N https://raw.githubusercontent.com/maravento/blackweb/master/blackweb.tar.gz && cat blackweb.tar.gz* | tar xzf -

If Multiparts Exist

  #!/bin/bash
  # Variables
  url="https://raw.githubusercontent.com/maravento/blackweb/master/blackweb.tar.gz"
  wgetd="wget -q -c --timestamping --no-check-certificate --retry-connrefused --timeout=10 --tries=4 --show-progress"
  # TMP folder
  output_dir="bwtmp"
  mkdir -p "$output_dir"
  # Download
  if $wgetd "$url"; then
      echo "File downloaded: $(basename "$url")"
  else
      echo "Main file not found. Searching for multiparts..."
      # Multiparts from .aa onwards; the first missing part marks the end of the sequence
      parts_found=0
      for part in {a..z}{a..z}; do
          part_url="${url}.${part}"
          if $wgetd "$part_url"; then
              echo "Part downloaded: $(basename "$part_url")"
              parts_found=$((parts_found + 1))
          else
              break
          fi
      done
      if [ "$parts_found" -gt 0 ]; then
          # Rebuild the original file in the current directory
          cat blackweb.tar.gz.* > blackweb.tar.gz
          echo "Multipart file rebuilt from $parts_found part(s)"
      else
          echo "Multipart process cannot be completed"
          exit 1
      fi
  fi
  # Unzip the file to the output folder
  tar -xzf blackweb.tar.gz -C "$output_dir"
  echo "Done"

Checksum

  wget -q -c -N https://raw.githubusercontent.com/maravento/blackweb/master/checksum.md5
  md5sum blackweb.txt | awk '{print $1}' && awk '{print $1}' checksum.md5
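
The two hashes must match. A minimal sketch to automate the comparison (assuming both files are in the current directory):

  # compare the local hash against the published checksum
  if [ "$(md5sum blackweb.txt | awk '{print $1}')" = "$(awk '{print $1}' checksum.md5)" ]; then
      echo "Checksum OK"
  else
      echo "Checksum MISMATCH" >&2
  fi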

BlackWeb Rule for Squid-Cache


Edit:

  /etc/squid/squid.conf

And add the following lines:

  # INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
  # Block Rule for Blackweb
  acl blackweb dstdomain "/path_to/blackweb.txt"
  http_access deny blackweb
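
After saving squid.conf, it is a good idea to validate the syntax and reload Squid so the ACL takes effect (standard Squid maintenance commands):

  # check the configuration syntax, then apply it without a full restart
  sudo squid -k parse && sudo squid -k reconfigure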

Advanced Rules

BlackWeb contains millions of domains, therefore it is recommended:

Allow Rule for Domains

Use allowdomains.txt to exclude essential domains or subdomains, such as .accounts.google.com, .yahoo.com, .github.com, etc. According to Squid’s documentation, the subdomains accounts.google.com and accounts.youtube.com may be used by Google for authentication within its ecosystem. Blocking them could disrupt access to services like Gmail, Drive, Docs, and others.

  acl allowdomains dstdomain "/path_to/allowdomains.txt"
  http_access allow allowdomains

Block Rule for Domains

Use blockdomains.txt to add domains not included in blackweb.txt (e.g., .youtube.com, .googlevideo.com, .ytimg.com, etc.).

  acl blockdomains dstdomain "/path_to/blockdomains.txt"
  http_access deny blockdomains

Block Rule for gTLD, sTLD, ccTLD, etc

Use blocktlds.txt to block gTLD, sTLD, ccTLD, etc.

  acl blocktlds dstdomain "/path_to/blocktlds.txt"
  http_access deny blocktlds

Input:

  .bardomain.xxx
  .subdomain.bardomain.xxx
  .bardomain.ru
  .bardomain.adult
  .foodomain.com
  .foodomain.porn

Output:

  .foodomain.com

Block Rule for Punycode

Use this rule to block Punycode (RFC 3492) / IDN domains and TLDs (non-ASCII), to prevent IDN homograph attacks. For more information visit welivesecurity: Homograph attacks.

  acl punycode dstdom_regex -i \.xn--.*
  http_access deny punycode

Input:

  .bücher.com
  .mañana.com
  .google.com
  .auth.wikimedia.org
  .xn--fiqz9s
  .xn--p1ai

ASCII Output:

  .google.com
  .auth.wikimedia.org

Block Rule for Words

Use this rule to block words (optional; it can generate false positives).

  # Download ACL:
  sudo wget -P /etc/acl/ https://raw.githubusercontent.com/maravento/vault/refs/heads/master/blackword/blockwords.txt
  # Squid Rule to Block Words:
  acl blockwords url_regex -i "/etc/acl/blockwords.txt"
  http_access deny blockwords

Input:

  .bittorrent.com
  https://www.google.com/search?q=torrent
  https://www.google.com/search?q=mydomain
  https://www.google.com/search?q=porn
  .mydomain.com

Output:

  https://www.google.com/search?q=mydomain
  .mydomain.com

Advanced Rules Summary

Squid evaluates http_access rules in order and applies the first match, so the allow rules must precede the deny rules:

  # INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
  # Allow Rule for Domains
  acl allowdomains dstdomain "/path_to/allowdomains.txt"
  http_access allow allowdomains
  # Block Rule for Punycode
  acl punycode dstdom_regex -i \.xn--.*
  http_access deny punycode
  # Block Rule for gTLD, sTLD, ccTLD
  acl blocktlds dstdomain "/path_to/blocktlds.txt"
  http_access deny blocktlds
  # Block Rule for Words (Optional)
  acl blockwords url_regex -i "/etc/acl/blockwords.txt"
  http_access deny blockwords
  # Block Rule for Domains
  acl blockdomains dstdomain "/path_to/blockdomains.txt"
  http_access deny blockdomains
  # Block Rule for Blackweb
  acl blackweb dstdomain "/path_to/blackweb.txt"
  http_access deny blackweb

BLACKWEB UPDATE


⚠️ WARNING: BEFORE YOU CONTINUE

This section only explains how the update and optimization process works; users do not need to run it. The process can take a long time and consume significant hardware and bandwidth resources, so it is recommended to run it on test equipment.

Bash Update

The update process of blackweb.txt consists of several steps and is executed in sequence by the script bwupdate.sh. The script will request privileges when required.

  wget -q -N https://raw.githubusercontent.com/maravento/blackweb/master/bwupdate/bwupdate.sh && chmod +x bwupdate.sh && ./bwupdate.sh

Dependencies

The update requires Python 3.x and Bash 5.x. It also requires the following dependencies:

  wget git curl libnotify-bin perl tar rar unrar unzip zip gzip python-is-python3 idn2 iconv
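
On Debian/Ubuntu-based systems, most of these can be installed via apt; a sketch (note that iconv ships with libc-bin, and rar/unrar may require the multiverse repository):

  # install the packaged dependencies; iconv is already provided by libc-bin
  sudo apt install -y wget git curl libnotify-bin perl tar rar unrar unzip zip gzip python-is-python3 idn2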

Make sure Squid is installed correctly. If you have any problems, run the following script (sudo ./squid_install.sh):

  #!/bin/bash
  # stop any running instance
  while pgrep squid > /dev/null; do
      echo "Waiting for Squid to stop..."
      killall -s SIGTERM squid &>/dev/null
      sleep 5
  done
  # squid remove (if exist)
  apt purge -y squid* &>/dev/null
  rm -rf /var/spool/squid* /var/log/squid* /etc/squid* /dev/shm/* &>/dev/null
  # squid install (you can use 'squid-openssl' or 'squid')
  apt install -y squid-openssl squid-langpack squid-common squidclient squid-purge
  # create log dir and log files
  mkdir -p /var/log/squid
  for log in access cache store deny; do
      [ -f "/var/log/squid/$log.log" ] || touch "/var/log/squid/$log.log"
  done
  # permissions
  chown -R proxy:proxy /var/log/squid
  # enable service
  systemctl enable squid.service
  systemctl start squid.service
  echo "Done"

Capture Public Blocklists

Captures domains from the downloaded public blocklists (see SOURCES) and unifies them into a single file.

Domain Debugging

Removes overlapping domains ('.sub.example.com' is already covered by '.example.com'), normalizes entries to the Squid-Cache format, and excludes false positives (google, hotmail, yahoo, etc.) with an allowlist (debugwl.txt). A simplified sketch follows the example below.

Input:

  com
  .com
  .domain.com
  domain.com
  0.0.0.0 domain.com
  127.0.0.1 domain.com
  ::1 domain.com
  domain.com.co
  foo.bar.subdomain.domain.com
  .subdomain.domain.com.co
  www.domain.com
  www.foo.bar.subdomain.domain.com
  domain.co.uk
  xxx.foo.bar.subdomain.domain.co.uk

Output:

  .domain.com
  .domain.com.co
  .domain.co.uk
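
The following is a hypothetical sketch of this step, not the actual bwupdate.sh code: it normalizes raw entries (hosts-file prefixes, leading www., leading dot) and then drops any entry whose parent domain is already listed. The file names raw.txt and normalized.txt are illustrative:

  # normalize: strip hosts-file IPs and "www.", ensure exactly one leading dot
  sed -E -e 's/^(0\.0\.0\.0|127\.0\.0\.1|::1)[[:space:]]+//' \
         -e 's/^www\.//' -e 's/^\.+//' -e 's/^/./' raw.txt | sort -u > normalized.txt
  # keep an entry only if none of its parent domains is also in the list
  awk 'NR == FNR { seen[$0] = 1; next }
  {
      d = $0; keep = 1
      while (match(d, /^\.[^.]+\./)) {        # ".a.b.com" -> ".b.com" -> ".com"
          d = substr(d, RSTART + RLENGTH - 1)
          if (d in seen) { keep = 0; break }
      }
      if (keep) print
  }' normalized.txt normalized.txt

Bare TLDs such as ".com" survive this pass; they are handled by the TLD validation step below.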

TLD Validation

Removes domains with invalid TLDs, using a list of public and private suffix TLDs (ccTLD, ccSLD, sTLD, uTLD, gSLD, gTLD, eTLD, etc.), up to the fourth level (4LD).

Input:

  .domain.exe
  .domain.com
  .domain.edu.co

Output:

  .domain.com
  .domain.edu.co
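
A hypothetical sketch of the idea (the real process checks suffixes up to four labels; here only the last one and two labels are tested, and suffixes.txt / domains.txt are illustrative names):

  # keep a domain only if its trailing 1- or 2-label suffix appears in the suffix list
  awk 'NR == FNR { suffix[$0] = 1; next }
  {
      n = split($0, a, ".")                   # ".domain.edu.co" -> a[2]="domain", a[3]="edu", a[4]="co"
      if (("." a[n]) in suffix || ("." a[n-1] "." a[n]) in suffix) print
  }' suffixes.txt domains.txt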

Debugging Punycode-IDN

Removes hostnames longer than 63 characters (RFC 1035) and characters inadmissible under IDN, and converts domains with international (non-ASCII) characters, which can be used in homograph attacks, to Punycode/IDNA format.

Input:

  bücher.com
  café.fr
  españa.com
  köln-düsseldorfer-rhein-main.de
  mañana.com
  mūsųlaikas.lt
  sendesık.com
  президент.рф

Output:

  xn--bcher-kva.com
  xn--caf-dma.fr
  xn--d1abbgf6aiiy.xn--p1ai
  xn--espaa-rta.com
  xn--kln-dsseldorfer-rhein-main-cvc6o.de
  xn--maana-pta.com
  xn--mslaikas-qzb5f.lt
  xn--sendesk-wfb.com
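
A minimal sketch of the conversion using idn2 (listed in Dependencies); input.txt is an illustrative file name:

  # convert each non-ASCII domain to its Punycode/IDNA (xn--) form
  while IFS= read -r domain; do
      idn2 "$domain" 2>/dev/null || echo "invalid IDN: $domain" >&2
  done < input.txt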

DNS Lookup

Most of the SOURCES contain millions of invalid and nonexistent domains. Each domain is therefore double-checked via DNS (in 2 steps), and invalid or nonexistent ones are excluded from Blackweb. This process may take a long time. By default it checks domains in parallel, at roughly 6,000 to 12,000 per minute, depending on hardware and bandwidth.

  HIT google.com
  google.com has address 142.251.35.238
  google.com has IPv6 address 2607:f8b0:4008:80b::200e
  google.com mail is handled by 10 smtp.google.com.
  FAULT testfaultdomain.com
  Host testfaultdomain.com not found: 3(NXDOMAIN)

For more information, check internet live stats
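
A simplified sketch of the idea, not the script's actual implementation: validate each domain with host and tag it HIT or FAULT, running lookups in parallel (domains.txt is illustrative; tune -P to your hardware and bandwidth):

  # resolve one domain and report the result
  check() {
      if host "$1" > /dev/null 2>&1; then
          echo "HIT $1"
      else
          echo "FAULT $1"
      fi
  }
  export -f check
  # run up to 50 lookups at a time
  xargs -a domains.txt -P 50 -I {} bash -c 'check "$1"' _ {}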

Removes government domains (.gov) and other related TLDs from BlackWeb.

Input:

  .argentina.gob.ar
  .mydomain.com
  .gob.mx
  .gov.uk
  .navy.mil

Output:

  .mydomain.com

Run Squid-Cache with BlackWeb

Runs Squid-Cache with BlackWeb; any errors are sent to SquidError.txt on your desktop.

Check execution (/var/log/syslog)

  BlackWeb: Done 06/05/2023 15:47:14
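
To confirm the last run programmatically (the timestamp format matches the log line above):

  # show the most recent completion entry
  grep 'BlackWeb: Done' /var/log/syslog | tail -n 1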

Important about BlackWeb Update

  • The default path of BlackWeb is /etc/acl. You can change it to your preference.
  • bwupdate.sh includes lists of remote-support domains (TeamViewer, AnyDesk, LogMeIn, etc.) and web3 domains. They are commented out by default (unless their domains are in SOURCES). To block or exclude them, activate the corresponding lines in the script (# JOIN LIST), although this is not recommended, to avoid conflicts and false positives.
  • If you interrupt the execution of bwupdate.sh (ctrl + c) during the DNS Lookup step, it will resume from that point. If you stop it earlier, you must start from the beginning or modify the script manually so that it starts from the desired point.
  • If you use aufs, temporarily change it to ufs during the update, to avoid: ERROR: Can't change type of existing cache_dir aufs /var/spool/squid to ufs. A restart is required.

SOURCES


BLOCKLISTS

Active

Inactive, Offline, Discontinued or Private

DEBUG LISTS

WORKTOOLS


NOTICE


  • This project includes third-party components.
  • Changes must be proposed via Issues. Pull Requests are not accepted.
  • BlackWeb is designed exclusively for Squid-Cache. Due to the large number of blocked domains, it is not recommended for other environments (DNSMasq, Pi-Hole, etc.) or for the Windows hosts file, as it could slow them down or crash them. Use it at your own risk. For more information check Issue 10.
  • Blackweb is NOT a blacklist service in itself. It does not independently verify domains. Its purpose is to consolidate and reformat public blacklist sources to make them compatible with Squid.
  • If your domain appears in Blackweb and you believe this is an error, review the public SOURCES to identify where it is listed and contact the maintainer of that list to request its removal. Once the domain is removed from the upstream source, it will automatically disappear from Blackweb in the next update.
    You can also use the following script to perform the same verification:
  wget https://raw.githubusercontent.com/maravento/blackweb/refs/heads/master/bwupdate/tools/checksources.sh
  chmod +x checksources.sh
  ./checksources.sh

e.g.:

  [?] Enter domain to search: kickass.to
  [*] Searching for 'kickass.to'...
  [+] Domain found in: https://github.com/fabriziosalmi/blacklists/releases/download/latest/blacklist.txt
  [+] Domain found in: https://hostsfile.org/Downloads/hosts.txt
  [+] Domain found in: https://raw.githubusercontent.com/blocklistproject/Lists/master/everything.txt
  [+] Domain found in: https://raw.githubusercontent.com/hagezi/dns-blocklists/main/domains/ultimate.txt
  [+] Domain found in: https://raw.githubusercontent.com/Ultimate-Hosts-Blacklist/Ultimate.Hosts.Blacklist/master/hosts/hosts0
  [+] Domain found in: https://sysctl.org/cameleon/hosts
  [+] Domain found in: https://v.firebog.net/hosts/Kowabit.txt
  Done

STARGAZERS



CONTRIBUTIONS


We thank all those who have contributed to this project. Anyone interested can contribute by sending us links to new lists to be included in this project.

Special thanks to: Jhonatan Sneider

SPONSOR THIS PROJECT



PROJECT LICENSES


GPL-3.0
CC BY-NC-ND 4.0

DISCLAIMER


THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

OBJECTION


Due to recent arbitrary changes in computer terminology, it is necessary to clarify the meaning and connotation of the term blacklist, associated with this project:

In computing, a blacklist, denylist or blocklist is a basic access control mechanism that allows through all elements (email addresses, users, passwords, URLs, IP addresses, domain names, file hashes, etc.), except those explicitly mentioned. Those items on the list are denied access. The opposite is a whitelist, which means only items on the list are let through whatever gate is being used. (Source: Wikipedia)

Therefore, blacklist, blocklist, blackweb, blackip, whitelist and similar are terms that have nothing to do with racial discrimination.