From 79103e1a350a59416efefa569abfa82654dddc6b Mon Sep 17 00:00:00 2001
From: Jeremy Stanley
Date: Fri, 5 Jan 2024 17:04:05 +0000
Subject: [PATCH] Update our Gitea robots.txt from gitea.com's

We've experienced some runaway growth of Gitea archive cache files on
one of our backends, which according to upstream is often caused by web
crawlers indexing the archive URLs. They recommended updating our
robots.txt to the current state of https://gitea.com/robots.txt in
order to help mitigate the issue.

I've kept things we expressly commented out before still commented out,
or anything that seems similar to what we commented out on the
assumption that the reasons would carry over.

After some discussion in IRC, we also decided it would make sense to
disallow /avatars and /user/* like they do.

Change-Id: I2b43b89de08c9a9d170e1ecbd14b1e6336fd2c84
---
 docker/gitea/custom/robots.txt | 73 ++++++++++++++++++++++++++++++----
 1 file changed, 65 insertions(+), 8 deletions(-)

diff --git a/docker/gitea/custom/robots.txt b/docker/gitea/custom/robots.txt
index 0c7127359f..ecf401ffdf 100644
--- a/docker/gitea/custom/robots.txt
+++ b/docker/gitea/custom/robots.txt
@@ -3,6 +3,7 @@
 # and
 # https://github.com/robots.txt
 # at 2020-07-01
+# and https://gitea.com/robots.txt on 2024-01-05
 #
 # Some commented out items are left to indicate we have considered
 # them and would like to explicitly allow them for indexing while they
@@ -10,26 +11,82 @@
 
 User-agent: *
 
-# Disallow: /avatars
-# Disallow: /user/*
+Disallow: /api/*
+Disallow: /avatars
+Disallow: /user/*
+
 # Disallow: /*/*/src/commit/*
 # Disallow: /*/*/commit/*
+# Disallow: /*/*/*/refs/*
+Disallow: /*/*/*/star
+Disallow: /*/*/*/watch
+Disallow: /*/*/labels
 Disallow: /*/*/activity/*
-Disallow: /vendor/librejs.html
-Disallow: /api/swagger
+Disallow: /vendor/*
 Disallow: /swagger.*.json
 
 # Language spam
 Disallow: /*?lang=
 
-# From github
-Disallow: */archive/
-Disallow: */blame/
+# from Github, to be cleaned
+Allow: /*/*/tree/master
+Allow: /*/*/blob/master
+Disallow: /*/*/pulse
+Disallow: /*/*/tree/*
+Disallow: /*/*/blob/*
+Disallow: /*/*/wiki/*/*
+Disallow: /gist/*/*/*
+Disallow: /oembed
+Disallow: /*/forks
+Disallow: /*/stars
+Disallow: /*/download
+Disallow: /*/revisions
+Disallow: /*/*/issues/new
+Disallow: /*/*/issues/search
+Disallow: /*/*/commits/*/*
+Disallow: /*/*/commits/*?author
+Disallow: /*/*/commits/*?path
+Disallow: /*/*/branches
+Disallow: /*/*/tags
+Disallow: /*/*/contributors
+Disallow: /*/*/comments
+Disallow: /*/*/stargazers
+Disallow: /*/*/search
+Disallow: /*/tarball/
+Disallow: /*/zipball/
+Disallow: /*/*/archive/
+
 # Disallow: /raw/*
+
+Disallow: /*/followers
+Disallow: /*/following
+Disallow: /stars/*
+Disallow: /*/blame/
+Disallow: /*/watchers
+Disallow: /*/network
+Disallow: /*/graphs
+
 # Disallow: /*/raw/
+
+Disallow: /*/compare/
+Disallow: /*/cache/
+Disallow: /*/*/blame/
+Disallow: /*/*/watchers
+Disallow: /*/*/network
+Disallow: /*/*/graphs
+
 # Disallow: /*/*/raw/
+
+Disallow: /*/*/compare/
+Disallow: /*/*/cache/
 Disallow: /.git/
-Disallow: */.git/
+Disallow: /*/.git/
 Disallow: /*.git$
+Disallow: /*/sitemap.xml
+Disallow: /search/advanced
+Disallow: /search
 Disallow: /*q=
+Disallow: /*.atom
 Crawl-delay: 2