Hi fellow admins,
Just wanted to share a quick tip about reducing server load.
Before implementing these Cloudflare rules, the load average on my 4-core Ubuntu box sat consistently between 3.5 and 4.0. Now it’s running much smoother, at around 0.3.
The best part? It only took 3 rules, and they all work with the Cloudflare free plan.
The order of the rules is important (Allowlist first, then Blocklist, then Challenge), so please pay attention.
Allowlist
This rule goes first so that other fediverse servers and well-behaved crawlers never run into the rules below. Use the action Skip.
(http.user_agent contains "Observatory") or
(http.user_agent contains "FediFetcher") or
(http.user_agent contains "FediDB/") or
(http.user_agent contains "+fediverse.observer") or
(http.user_agent contains "FediList Agent/") or
(starts_with(http.user_agent, "Blackbox Exporter/")) or
(http.user_agent contains "Lestat") or
(http.user_agent contains "Lemmy-Federation-Exporter") or
(http.user_agent contains "lemmy-stats-crawler") or
(http.user_agent contains "lemmy-explorer-crawler/") or
(starts_with(http.user_agent, "Lemmy/")) or
(starts_with(http.user_agent, "PieFed/")) or
(http.user_agent contains "Mlmym") or
(http.user_agent contains "Photon") or
(http.user_agent contains "Boost") or
(starts_with(http.user_agent, "Jerboa")) or
(http.user_agent contains "Thunder") or
(http.user_agent contains "VoyagerApp/") or
(cf.verified_bot_category in {
"Search Engine Crawler"
"Search Engine Optimization"
"Monitoring & Analytics"
"Feed Fetcher"
"Archiver"
"Page Preview"
"Academic Research"
"Security"
"Accessibility"
"Webhooks"
}
and http.host ne "old.lemmy.eco.br"
and http.host ne "photon.lemmy.eco.br"
) or
(http.user_agent contains "letsencrypt"
and http.request.uri.path contains "/.well-known/acme-challenge/"
) or
(starts_with(http.request.full_uri, "https://lemmy.eco.br/pictrs/") and
http.request.method eq "GET" and not
starts_with(http.user_agent, "Mozilla") and not
ip.src.asnum in {
200373 198571 26496 31815 18450 398101 50673 7393 14061
205544 199610 21501 16125 51540 264649 39020 30083 35540
55293 36943 32244 6724 63949 7203 201924 30633 208046 36352
25264 32475 23033 31898 210920 211252 16276 23470 136907
12876 210558 132203 61317 212238 37963 13238 2639 20473
63018 395954 19437 207990 27411 53667 27176 396507 206575
20454 51167 60781 62240 398493 206092 63023 213230 26347
20738 45102 24940 57523 8100 8560 6939 14178 46606 197540
397630 9009 11878 49453 29802
})
- The User Agent contains the name of a known fediverse crawler or monitoring tool (e.g., “Observatory”, “FediFetcher”, “lemmy-stats-crawler”).
- The User Agent contains the name of a known Lemmy mobile app or alternative frontend (e.g., “Jerboa”, “Boost”, “VoyagerApp”).
- The request comes from Cloudflare-verified bots in specific categories (like “Search Engine Crawler” or “Monitoring & Analytics”) and is not targeting the specific hosts “old.lemmy.eco.br” or “photon.lemmy.eco.br” where I host alternative frontends.
- The request is a Let’s Encrypt challenge for the domain (used for SSL certificate renewal).
- The request is a GET to the “pictrs” image server that does not come from a standard web browser (a User Agent starting with “Mozilla”) and does not originate from a specified list of Autonomous System Numbers (ASNs). Those ASNs all belong to VPS providers, so there is no excuse for a browser User Agent to come from them.
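If you want to sanity-check this rule after saving it, send a request with one of the allowlisted User Agents and confirm it skips everything below. A minimal sketch using Python’s requests library (the UA string and the nodeinfo path are just examples; point it at your own host):

import requests

# An allowlisted fediverse UA should get straight through, with no
# Cloudflare challenge. Lemmy normally serves this nodeinfo endpoint.
resp = requests.get(
    "https://lemmy.eco.br/nodeinfo/2.0.json",
    headers={"User-Agent": "FediFetcher/1.0"},
    timeout=10,
)
print(resp.status_code)                  # expect 200
print(resp.headers.get("cf-mitigated"))  # expect None, i.e. no challenge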
Blocklist
This list blocks the majority of bad crawlers and bots. Use the action Block.
(cf.verified_bot_category in {"AI Crawler"}) or
(ip.src.country in {"T1"}) or
(starts_with(http.user_agent, "Mozilla/") and
http.request.version in {"HTTP/1.0" "HTTP/1.1" "HTTP/1.2" "SPDY/3.1"} and
any(http.request.headers["accept"][*] contains "text/html")) or
(http.user_agent wildcard r"*HeadlessChrome/*") or
(
http.request.uri.path contains "/xmlrpc.php" or
http.request.uri.path contains "/wp-config.php" or
http.request.uri.path contains "/wlwmanifest.xml"
) or
(ip.src.asnum in {
200373 198571 26496 31815 18450 398101 50673 7393 14061
205544 199610 21501 16125 51540 264649 39020 30083 35540
55293 36943 32244 6724 63949 7203 201924 30633 208046 36352
25264 32475 23033 31898 210920 211252 16276 23470 136907
12876 210558 132203 61317 212238 37963 13238 2639 20473
63018 395954 19437 207990 27411 53667 27176 396507 206575
20454 51167 60781 62240 398493 206092 63023 213230 26347
20738 45102 24940 57523 8100 8560 6939 14178 46606 197540
397630 9009 11878 49453 29802
}
and http.user_agent wildcard r"Mozilla/*"
) or
(http.request.uri.path ne "/robots.txt") and
((http.user_agent contains "Amazonbot") or
(http.user_agent contains "Anchor Browser") or
(http.user_agent contains "Bytespider") or
(http.user_agent contains "CCBot") or
(http.user_agent contains "Claude-SearchBot") or
(http.user_agent contains "Claude-User") or
(http.user_agent contains "ClaudeBot") or
(http.user_agent contains "FacebookBot") or
(http.user_agent contains "Google-CloudVertexBot") or
(http.user_agent contains "GPTBot") or
(http.user_agent contains "meta-externalagent") or
(http.user_agent contains "Novellum") or
(http.user_agent contains "PetalBot") or
(http.user_agent contains "ProRataInc") or
(http.user_agent contains "Timpibot")
) or
(ip.src.asnum eq 32934)
- The request comes from a bot that Cloudflare has verified as an “AI Crawler”.
- The request originates from the Tor network (Cloudflare’s special country code “T1”); admittedly a heavy measure against Tor.
- The request uses a Mozilla browser User Agent together with an old HTTP version while accepting HTML. In 2025 that combination is almost exclusively bots; real browsers speak HTTP/2 or HTTP/3.
- The User Agent advertises HeadlessChrome, which is by definition automated, hence a bot.
- The request path targets common WordPress vulnerability endpoints (/xmlrpc.php, /wp-config.php, /wlwmanifest.xml).
- The request originates from a specific list of Autonomous System Numbers (ASNs) and uses a Mozilla User Agent. Again, more bots.
- The request is not for /robots.txt and the User Agent contains the name of a known crawler or bot (e.g., “GPTBot”, “Bytespider”, “FacebookBot”). /robots.txt is left reachable so well-behaved bots can still read the crawling rules published there.
- The request originates from Autonomous System Number 32934 (Facebook/Meta).
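The block and the /robots.txt carve-out can be verified the same way. A sketch, again with Python’s requests and your own host substituted in:

import requests

UA = {"User-Agent": "GPTBot/1.0"}  # any UA from the blocklist above

# A blocked UA on a normal path should be rejected outright by the Block action.
blocked = requests.get("https://lemmy.eco.br/", headers=UA, timeout=10)
print(blocked.status_code)  # expect 403

# /robots.txt is exempt, so even blocked bots can still read the rules there.
robots = requests.get("https://lemmy.eco.br/robots.txt", headers=UA, timeout=10)
print(robots.status_code)   # expect 200, assuming your instance serves one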
Challenge
This one protects the frontends. I added some conditions so that logged-in users are not forced to verify with Cloudflare; a crawler normally won’t have a user account. Set the action to Managed Challenge.
(http.host eq "old.lemmy.eco.br" and not len(http.request.cookies["jwt"]) > 0)
or (http.host eq "photon.lemmy.eco.br"
and not len(http.request.headers["authorization"]) > 0
and not starts_with(http.cookie, "ph_phc"))
or (http.host wildcard "lemmy.eco.br"
and not len(http.request.cookies["jwt"]) > 0
and not len(http.request.headers["authorization"]) > 0
and starts_with(http.user_agent, "Mozilla")
and not http.referer contains "photon.lemmy.eco.br")
or (http.user_agent contains "yandex"
or http.user_agent contains "sogou"
or http.user_agent contains "semrush"
or http.user_agent contains "ahrefs"
or http.user_agent contains "baidu"
or http.user_agent contains "python-requests"
or http.user_agent contains "neevabot"
or http.user_agent contains "CF-UC"
or http.user_agent contains "sitelock"
or http.user_agent contains "mj12bot"
or http.user_agent contains "zoominfobot"
or http.user_agent contains "mojeek")
or ((http.user_agent contains "crawl"
or http.user_agent contains "spider"
or http.user_agent contains "bot")
and not cf.client.bot)
or (ip.src.asnum in {135061 23724 4808}
and http.user_agent contains "siteaudit")
- A request to the host “old.lemmy.eco.br” that does not have a “jwt” cookie.
- A request to the host “photon.lemmy.eco.br” that lacks both an “Authorization” header and a cookie starting with “ph_phc”.
- A request to the host “lemmy.eco.br” itself that lacks both a “jwt” cookie and an “Authorization” header, uses a Mozilla User Agent, and does not have a referrer from “photon.lemmy.eco.br” (a wildcard pattern with no “*” in it matches only the bare host).
- The User Agent contains the name of a specific crawler, bot, or tool (e.g., “yandex”, “baidu”, “python-requests”, “sitelock”).
- The User Agent contains the words “crawl”, “spider”, or “bot”, but the client is not a bot verified by Cloudflare (cf.client.bot).
- The request originates from specific Autonomous System Numbers (135061, 23724, 4808) and the User Agent contains the word “siteaudit”.
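To confirm that anonymous browser traffic really lands on the challenge: Cloudflare marks challenge responses with a cf-mitigated: challenge header. A sketch under the same assumptions as the earlier snippets:

import requests

# Browser-like UA, no jwt cookie, no Authorization header. We leave requests'
# default Accept of */* alone so the blocklist's "old HTTP version + text/html"
# rule does not fire first. Run this from a residential connection; from a VPS
# in the ASN list above you would hit the blocklist instead.
resp = requests.get(
    "https://lemmy.eco.br/",
    headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"},
    timeout=10,
)
print(resp.status_code)                  # challenge pages answer 403
print(resp.headers.get("cf-mitigated"))  # expect "challenge"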
All these are heavily inspired by this article: https://urielwilson.com/a-practical-guide-to-custom-cloudflare-waf-rules/
Please let me know your thoughts.


Yeah, I know. Still, it is a lot better than hordes of bots loading my server, and it is free. I am an unemployed loser in South America; I can’t really afford anything.
I think it’s pretty reckless to give a company that is almost certainly connected to American military intelligence access to your users’ communications and connection data.
If you are bothered by bots, step one is robots.txt; then you can still block crawlers in your own web server’s config, just as you did on Cloudflare, and then you can roll out tools like Anubis or iocaine to frustrate bots further.
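For step one, a minimal robots.txt along these lines (the tokens are the published names of a few large AI scrapers; extend the list as needed) at least tells the polite ones to leave:

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /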
I get your point, but any self-hosted solution still has to process every request at the server level, and that costs server load; Cloudflare filters them before they ever reach me. Honestly, we’re too small for the U.S. military to even notice us. Plus, the free CDN is a great benefit.
My real work for the revolution happens on the streets, and there’s very little any intelligence agency can do to stop that.