Basic forum SEO: Start with the simple things

djbaxter

Tazmanian Veteran
Joined
Jun 6, 2006
Messages
10,473
minstrel submitted a new Article:

Basic forum SEO: Start with the simple things

Contrary to what you will read in most forum threads on this subject, the starting point for forum SEO is not in rewriting URLs or other add-ons or SEO packages. It's not that these cannot benefit your forum. It's more that there are a number of basic things you can and should do before you even begin to think about more advanced SEO strategies.

The most neglected part of forum SEO, regardless of the specific forum software employed, is what appears on the main forum index page. It's the first thing I notice when reviewing or happening upon a new forum. Remember this:

  1. If search engine spiders can't see it, search engines can't index it.
  2. If Guests (or non-logged in members) can't see it, neither can search engine spiders.
First, look at your home page, if you have one that's different from the forum index. Bear in mind that all that spiders can "see" and index is text. What is there on your home page to index? Make sure that your most important search terms are included in the forum description on this page, in the page title, and in the meta description tag.
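As an illustration only (the forum name and description below are made-up placeholders), the head of your home page might include something like:

Code:
<head>
  <title>Widget Talk - Widget Repair and Collecting Forum</title>
  <meta name="description" content="A friendly community discussing widget repair, restoration, and collecting vintage widgets.">
</head>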

Then go to the actual forum index page. Look at your category and forum titles. Look at your category and forum descriptions. How much do you see written there for search engine spiders to index that will constitute search terms people will actually use to find your forum? Do they also include your most important search terms? Fix those descriptions. Make them descriptive and informative not only for logged in members who already know what your forum is about, but for searchers who will only see what the spiders see.

Now go to the individual forum pages where you'll find a list of threads. Start by looking at the structure of your title. It should have the following basic format:

Code:
Forum title :: Name of your forum
Most forum software will do it the other way around out of the box, but this is generally easy to fix with a quick template edit.
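As a sketch of what that edit looks like (this assumes vBulletin 3.x-style template variables; other software and versions will differ), the title line in the forum display template could be changed to something like:

Code:
<title>$foruminfo[title] :: $vboptions[bbtitle]</title>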

If you are using vBulletin, there is a built-in standard option for displaying forum descriptions at the top of the forum listing pages. Enable that.

Now go to the individual thread pages - the ones where you see the actual posts in sequence. Again, ensure that the structure of your title has the following basic format:

Code:
Thread title :: Forum title :: Name of your forum

or

Code:
Thread title :: Name of your forum
Beyond the Basics
If, and...

Read more about this article here...
 

Jim McClain

Senior Citizen
Joined
Jan 31, 2006
Messages
2,006
It's not hard to do well with one or two particular search terms. The hard part is doing well with a lot of different search terms that are all relevant to your site. If I were interested in learning secrets, it wouldn't be about 10 of them, or even ten of them. It would be a specific secret or secrets about a specific thing, like "secrets of MangAnime" or "video recording secret" or similar.

Look for the keywords on your pages and see how well they do as search terms. If they don't do well and they are important to your site, then work on adding content to those pages that uses those terms.

Jim
 

Erox

Enthusiast
Joined
Mar 22, 2007
Messages
140
Minstrel:

What are your thoughts on excluding select vBulletin files from your robots.txt file?

Example (this is only a partial list of my robots.txt):
Code:
Disallow: /ajax.php
Disallow: /ak_spam.php
Disallow: /all_donors.php
Disallow: /calendar.php
Disallow: /editpost.php
Disallow: /member.php
Disallow: /memberlist.php
Disallow: /misc.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /poll.php
Disallow: /private.php
Disallow: /profile.php
Disallow: /register.php
Disallow: /report.php
Disallow: /reputation.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /subscription.php
Disallow: /threadrate.php
Disallow: /usercp.php
Disallow: /usernote.php
Disallow: /info.txt
Disallow: /admincp/
Disallow: /attachments/
Disallow: /cache/
Disallow: /cgi-bin/
I've read a tutorial elsewhere recommending that. It makes no sense to me for spiders to waste their time on these files. My memberlist is open to members only, and I understand these bots can get all tied up in the calendar.
 

djbaxter

Tazmanian Veteran
Joined
Jun 6, 2006
Messages
10,473
I do that. And previously, I also used robots.txt with phpBB and SMF files to exclude files and directories I didn't want spidered.

It should be the first line of defense against "duplicate content" if that concerns you.

Mine currently looks like this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /misc/
Disallow: /backup/
Disallow: /admincp/
Disallow: /announcement.php
Disallow: /calendar.php
Disallow: /cron.php
Disallow: /editpost.php
Disallow: /faq.php
Disallow: /joinrequests.php
Disallow: /login.php
Disallow: /member.php
Disallow: /misc.php
Disallow: /modcp/
Disallow: /moderator.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /profile.php
Disallow: /register.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /showpost.php
Disallow: /subscription.php
Disallow: /subscriptions.php
Disallow: /threadrate.php
Disallow: /usercp.php
Disallow: ajax.php
Disallow: attachment.php
Disallow: calendar.php
Disallow: cron.php
Disallow: editpost.php
Disallow: global.php
Disallow: image.php
Disallow: inlinemod.php
Disallow: joinrequests.php
Disallow: login.php
Disallow: member.php
Disallow: memberlist.php
Disallow: misc.php
Disallow: moderator.php
Disallow: newattachment.php
Disallow: newreply.php
Disallow: newthread.php
Disallow: online.php
Disallow: poll.php
Disallow: postings.php
Disallow: printthread.php
Disallow: private.php
Disallow: profile.php
Disallow: register.php
Disallow: report.php
Disallow: reputation.php
Disallow: search.php
Disallow: sendmessage.php
Disallow: showgroups.php
Disallow: subscription.php
Disallow: threadrate.php
Disallow: usercp.php
Disallow: usernote.php
Disallow: spiders.php

User-Agent: msnbot
Crawl-Delay: 10

User-Agent: Slurp
Crawl-Delay: 10
 

antivirus

Aspirant
Joined
Sep 17, 2005
Messages
10
Thanks for the effort put into this article minstrel, much appreciated!

:tiphat:
 

djbaxter

Tazmanian Veteran
Joined
Jun 6, 2006
Messages
10,473
Thanks for the effort put into this article minstrel, much appreciated!

:tiphat:

Thank you and welcome to TAZ, antivirus! :)

And thank you for your wonderful EZ Bounce add-on. I cannot tell you how much time this has saved me since I installed it.
 

antivirus

Aspirant
Joined
Sep 17, 2005
Messages
10
Thank you and welcome to TAZ, antivirus! :)
And thank you for your wonderful EZ Bounce add-on. I cannot tell you how much time this has saved me since I installed it.

Oh, lol - glad you're finding it useful. I have a question about your robots.txt file if you don't mind. I'm just learning about SEO and crawling, etc... Is there a reason why you have some Disallow entries with and without the / prefix? For instance:

Disallow: /usercp.php
Disallow: usercp.php

Thanks!
 

djbaxter

Tazmanian Veteran
Joined
Jun 6, 2006
Messages
10,473
Oh, lol - glad you're finding it useful. I have a question about your robots.txt file if you don't mind. I'm just learning about SEO and crawling, etc... Is there a reason why you have some Disallow entries with and without the / prefix? For instance:

Disallow: /usercp.php
Disallow: usercp.php

Thanks!

Yes. :eek:

The reason for those duplicate entries is that I'm not certain if it makes a difference and so I opted for better safe than sorry. I have long meant to do some research into whether or not it matters but it hasn't made it yet to the top of my priority list.

Part of the concern/question for me is that I have my forum installed as a subdomain, with the robots.txt file in the root of the subdomain (and if memory serves also in the primary root of the domain with a different directory structure). I don't know what effect that has and again have long intended to research the issue.

So the truth is, it's a temporary measure - or was supposed to be. :)
 

antivirus

Aspirant
Joined
Sep 17, 2005
Messages
10
Yes. :eek:
The reason for those duplicate entries is that I'm not certain if it makes a difference and so I opted for better safe than sorry. I have long meant to do some research into whether or not it matters but it hasn't made it yet to the top of my priority list. So the truth is, it's a temporary measure - or was supposed to be. :)

Ah, I see - ok then. Our site has a similar directory structure: root/forum/etc...
 

djbaxter

Tazmanian Veteran
Joined
Jun 6, 2006
Messages
10,473
Ah, I see - ok then. Our site has a similar directory structure: root/forum/etc...


But mine is a subDOMAIN not a subFOLDER, which is where I got confused.

In a subdomain, the root is where the forum is.

In a subfolder, the root is the primary domain. So robots.txt should go there, and your robots.txt lines should look like:

Code:
User-agent: *
Disallow: /forum/images/
Disallow: /forum/admincp/
Disallow: /forum/announcement.php
Disallow: /forum/calendar.php

etc.
 

antivirus

Aspirant
Joined
Sep 17, 2005
Messages
10
But mine is a subDOMAIN not a subFOLDER, which is where I got confused.

In a subdomain, the root is where the forum is.

In a subfolder, the root is the primary domain. So robots.txt should go there, and your robots.txt lines should look like:

Code:
User-agent: *
Disallow: /forum/images/
Disallow: /forum/admincp/
Disallow: /forum/announcement.php
Disallow: /forum/calendar.php

etc.

Ah, ok thanks for clearing that up for me
 

Orp

ForumsForums.com
Joined
Apr 26, 2004
Messages
1,043
Good article Minstrel!! Thanks! :D

The robots.txt file has been on my to-do list for too long. With your help here I should 'get er done' this weekend.

A question: what is the effect of these lines in your robots.txt file?
User-Agent: msnbot
Crawl-Delay: 10

User-Agent: Slurp
Crawl-Delay: 10

And why did you pick those two specifically?
 

djbaxter

Tazmanian Veteran
Joined
Jun 6, 2006
Messages
10,473
A question: what is the effect of these lines in your robots.txt file?
User-Agent: msnbot
Crawl-Delay: 10

User-Agent: Slurp
Crawl-Delay: 10

And why did you pick those two specifically?

Yahoo (Slurp) is particularly aggressive at spidering. The crawl-delay directive is supposed to slow it down a tad (theoretically). MSN Search also follows this directive.

The above code is specific to MSN's spider, "MSNBot", and Yahoo's spider, "Slurp", and instructs the spiders to wait the specified amount of time, in seconds (10 seconds above, default is 1 second if not specified) before requesting another page from your site. MSNBot and Slurp have been known to index some sites very heavily, and this allows webmasters to slow down their indexing speed.

Googlebot is normally better behaved by default and so doesn't require this.
 

Orp

ForumsForums.com
Joined
Apr 26, 2004
Messages
1,043
Thanks again Minstrel.

Is the advantage of using robots.txt so that the spider doesn't get stuck in an area where they have no permissions? I see spiders stuck at a dead end (the stop sign showing) all the time.

Also, how do I know for sure that they are using my robots.txt file? Maybe ...no more stop signs??? :D
 

Erox

Enthusiast
Joined
Mar 22, 2007
Messages
140
Oh, lol - glad you're finding it useful. I have a question about your robots.txt file if you don't mind. I'm just learning about SEO and crawling, etc... Is there a reason why you have some Disallow entries with and without the / prefix? For instance:

Disallow: /usercp.php
Disallow: usercp.php

Thanks!

Yes. :eek:

The reason for those duplicate entries is that I'm not certain if it makes a difference and so I opted for better safe than sorry. I have long meant to do some research into whether or not it matters but it hasn't made it yet to the top of my priority list.
I would assume that since the directories need the slash in the robots.txt file
Code:
Disallow: /cgi-bin/
That the files in the root folder would too...
Disallow: /usercp.php
I never gave that a second thought until antivirus brought it up. And now minstrel isn't sure which one works....

Search engines are no longer seen accessing these files on my WOL so I don't know for sure...

I confuse easily!!! Now I am wracked with self-doubt!!!!! HELP!!!! :bonk:
 

Jim McClain

Senior Citizen
Joined
Jan 31, 2006
Messages
2,006
The robots.txt file belongs only in the domain root. A sub-domain is considered a root domain by many search spiders/robots, so you would have a robots.txt file in the main domain and in the sub-domain. Spiders will not process directives of a robots.txt file in sub-folders. That isn't to say they won't read the file, as they do any other file in other folders, but they will process no commands or directives there.

When disallowing a file/page of your domain that resides in the same folder with the robots.txt, you refer to it as /file.ext. That is the path from that location to the file or folder. If it resides in a sub-folder, the directive is /sub-folder/file.ext. Some, but not all spiders allow the wildcard character (asterisk) on the disallow line. For those spiders that do not allow it, the line will be ignored.

To disallow all spiders from crawling any files or folders within a folder, use these 2 lines:
Code:
User-agent: *
Disallow: /folder/
To disallow all spiders from accessing folders or files that start with a specific group of alpha-numeric characters, leave the trailing slash off:
Code:
User-agent: *
Disallow: /fold
This effectively acts as a wildcard, keeping spiders out of any folder or file that starts with "fold", such as the folder /folderama/ or the file /folding_hands.php located in the same folder as robots.txt.

There are some spiders that will follow Allow: directives, but crawling any part of a site that is not disallowed is the default. Using the Allow directive does not guarantee spiders will crawl that folder or file in particular.
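For example, a hypothetical snippet that disallows a whole folder but makes an exception for one script inside it (only spiders that understand the Allow: directive will honor the exception; others simply ignore that line):

Code:
User-agent: *
Disallow: /forum/
Allow: /forum/showthread.php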

It is a mistake to list files or folders that are secret. Most robots.txt files can easily be downloaded by any unscrupulous spider looking for data that will reveal your vulnerabilities. If you have log files or backups, you should be keeping these off the domain name path anyway, or at least password protecting them. It is common for webmasters to rename their vBulletin admincp and modcp folders to make it harder for snoops and hackers, but if you put them in your robots.txt files to prevent spiders, you are announcing to the snoops and hackers what you renamed those folders to. I know, I did it myself. :hopeless:

Anyway, there's lots of information on robots.txt files available, and the very best place to get search engine specific information, is by going to the search engine and looking there. Begin by reading the robots.txt file the search engine uses and then search for their own help pages on crawlers and spiders. I use terms like <search engine> +robots.txt (replace <search engine> with the name of the search engine of your choice). And please do remember that robots.txt is case and spelling specific. It is NOT robot.txt or Robots.txt, it is just robots.txt. Additionally, your folders and files are case sensitive - use the same spelling and case as it exists on your server.

Many spiders comply with the Robots Exclusion Standard. Yahoo! Slurp uses the 1996 Robots Exclusion Standard.

R'gards,

Jim
 

comperr

Master Admin
Joined
Aug 11, 2006
Messages
1,124
Another basic tip: meta tags. Though ignored by some search engines, they can help with others.
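For example (the content values here are just placeholders), the basic tags go in the page's head section:

Code:
<meta name="description" content="A short summary of this page, often used for search result snippets.">
<meta name="keywords" content="forum seo, robots.txt, crawling">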
 