Robots.txt, need additional entries for TP

Started by kripz, July 22, 2008, 09:21:55 AM

kripz

I'm trying to reduce the bandwidth and load created by bots.
What other entries for TP can I add?

I've decided to use the robots.txt from Youposted.com:

###################################
# YouPosted.com Smart Robots v3.05
###################################
# This is a smart robots.txt which logs the ip and user agent of every visitor.
# Due to the compatibility issues between different bots and whether they support
# wildcards (*), multiple user-agents and end-anchors ($), I am providing different
# blocks for some.
#
# Detected Spider/Bot: None
#
# Headers Sent:
# Content-Type: text/plain
# Expires: Tue, 22 Jul 2008 20:15:57 GMT (12 hour validity)
#
# My Sitemap - I don't provide it just for the fun of it
Sitemap: http://www.youposted.com/sitemap.xml

# Google - Most Important bot
# Unfortunately a robots.txt will only stop it crawling certain urls, and NOT adding any
# urls which it comes across into its index. So we're relying on a meta noindex tag.
User-agent: Googlebot
# Don't index mobile versions
Disallow: /index.php?*;wap
Disallow: /index.php?*;wap2
Disallow: /index.php?*;imode

# Yahoo - Too aggressive
# So limit it as much as possible.
User-agent: Slurp
# Disallow Everything
Disallow: /
# Now allow bits and then disallow bits
Allow: /sitemap.xml$
Allow: /robots.txt$
Allow: /index.php$
Allow: /index.php?topic=*.0$
Allow: /index.php?topic=*.*0$
Allow: /index.php?topic=*.*5$
Allow: /index.php?board=*.0$
Allow: /index.php?board=*.*0$
Allow: /index.php?board=*.*5$
# But don't allow these
Disallow: /index.php?*.msg
Disallow: /index.php?topic=*.msg*0$
Disallow: /index.php?topic=*.msg*5$
Disallow: /index.php?*.new
# Anything with a ; disallow
Disallow: /index.php?*;*
# Arcade Related
Allow: /index.php?action=arcade$
Allow: /index.php?action=stats$
Allow: /index.php?action=arcade;sa=play;game=

# Bad bot - Often ignores robots.txt - Waste of bandwidth
# Despite claiming on their website to be a search engine in development
# I'm suspicious as to whether they are a harvester pretending to be SE
User-agent: Twiceler
Disallow: /

User-Agent: W3C-checklink
Disallow: /

# Stop following PHPSESSID's
User-Agent: MJ12bot
Disallow: /index.php?PHPSESSID

# Catch all (remainder)
# Will be followed by any bots other than ones identified above
# Uses BASIC robots.txt directives without wildcards, end-anchors etc
# So Spiders should understand these (including MSNBOT)
User-agent: *
# Default SMF Folders
Disallow: /attachments/
Disallow: /Packages/
Disallow: /Smileys/
Disallow: /Sources/
Disallow: /Themes/
# Default SMF Actions
Disallow: /index.php?action=activate
Disallow: /index.php?action=admin
Disallow: /index.php?action=calendar
Disallow: /index.php?action=emailuser
Disallow: /index.php?action=findmember
Disallow: /index.php?action=help
Disallow: /index.php?action=helpadmin
Disallow: /index.php?action=login
Disallow: /index.php?action=logout
Disallow: /index.php?action=mlist
Disallow: /index.php?action=modifykarma
Disallow: /index.php?action=pm
Disallow: /index.php?action=post
Disallow: /index.php?action=printpage
Disallow: /index.php?action=profile
Disallow: /index.php?action=recent
Disallow: /index.php?action=register
Disallow: /index.php?action=reminder
Disallow: /index.php?action=search
Disallow: /index.php?action=theme
Disallow: /index.php?action=unread
Disallow: /index.php?action=unreadreplies
Disallow: /index.php?action=verificationcode
Disallow: /index.php?action=who
Disallow: /index.php?theme
# SMF Mod Related
Disallow: /archive.php
Disallow: /index.php?action=blog
Disallow: /index.php?action=viewblog
Disallow: /index.php?action=chess
Disallow: /index.php?action=comment
Disallow: /index.php?action=downloads
Disallow: /index.php?action=links
Disallow: /index.php?action=reporttm
Disallow: /index.php?action=recenttopics
Disallow: /index.php?action=mm
Disallow: /index.php?action=sitemap
Disallow: /index.php?action=staff
Disallow: /index.php?action=tags
Disallow: /index.php?action=thankyou
Disallow: /index.php?action=viewkarma
Disallow: /index.php?action=viewers
Disallow: /index.php?f=
Disallow: /index.php?filter
Disallow: /index.php?referredby
Disallow: /Games/
Disallow: /Downloads/
Disallow: /index.php?action=arcade;favorites
Disallow: /index.php?action=arcade;sa=highscore
Disallow: /index.php?action=arcade;sa=play;random
Disallow: /index.php?action=arcade;category
Disallow: /index.php?action=arcade;sort
Disallow: /index.php?action=arcade;stats
Disallow: /index.php?action=stats;expand
Disallow: /index.php?action=stats;collapse
# TinyPortal


Things off the top of my head:

TP folder, TP Downloads folder...
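
Before deploying extra entries, it may help to check them offline. Below is a minimal sketch using Python's standard urllib.robotparser, which only does plain prefix matching (no wildcards or end-anchors), so it suits the basic directives in the catch-all block. The two tp-* folder names are assumptions standing in for "TP folder" and "TP Downloads folder", and the helper names and "ExampleBot" are made up for illustration; check the actual paths on your install.

```python
from urllib.robotparser import RobotFileParser

# Candidate catch-all block. The first two rules come from the file above;
# the tp-* folder names are ASSUMPTIONS, not confirmed TinyPortal paths.
CANDIDATE_RULES = """\
User-agent: *
Disallow: /Sources/
Disallow: /index.php?action=search
# TinyPortal (hypothetical paths)
Disallow: /tp-files/
Disallow: /tp-downloads/
"""


def build_parser(rules_text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


def is_blocked(parser, path, agent="ExampleBot"):
    """Return True if the given agent may not fetch the path under these rules."""
    return not parser.can_fetch(agent, path)


if __name__ == "__main__":
    parser = build_parser(CANDIDATE_RULES)
    for path in (
        "/tp-files/",
        "/tp-downloads/somefile.zip",
        "/index.php?action=search",
        "/index.php?topic=1.0",
    ):
        print(path, "blocked" if is_blocked(parser, path) else "allowed")
```

Because the parser ignores wildcards, this check is not a reliable guide to how the Googlebot and Slurp blocks behave; it only covers the prefix-style rules meant for all other bots.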