YaBB SE Community  |  Development  |  Mod Ideas and Creation  |  search engine spidering « previous next »
Author Topic: search engine spidering  (Read 2314 times)
cyc
Noobie
*
Posts: 30


I love YaBB SE!

WWW
Re:search engine spidering
« Reply #30 on: December 06, 2002, 11:00:27 AM »

Quote from: Scotty_B on November 30, 2002, 11:37:00 PM
I'm just curious as to why it's spidered a few threads but not all of them  ???

Who knows.....it bloody annoys me  ;)
There is an interesting post over at http://www.vbulletin.com/forum/showthread.php?threadid=17167

cheers
« Last Edit: December 06, 2002, 10:43:03 PM by cyc » Logged

cyc
Noobie
*
Posts: 30


I love YaBB SE!

WWW
Re:search engine spidering
« Reply #31 on: January 17, 2003, 10:56:40 AM »

Quote from: Scotty_B on November 30, 2002, 11:37:00 PM
I'm just curious as to why it's spidered a few threads but not all of them  ???

Scotty, the simple fix is:
  • Convert your board to phpBB
  • Buy a vBulletin licence (it's only a few $$)
  • Convert your phpBB board to vBulletin (there is no straight conversion from YaBB SE atm)
  • Install a simple hack for search-engine-friendly URLs
  • Wait for Google and co

Two problems I found were:
1) Passwords were lost, but this was to be expected
2) A lot of post data now contains invalid HTML tags; this won't matter much in the long term (plus there is a simple fix)

Total time spent.....less than one hour  ;)

cheers!
Logged

Aquilo
The Black Llama
Sr. Member
****
Posts: 416


Wouldn't you like to be a llama too?

WWW
Re:search engine spidering
« Reply #32 on: January 17, 2003, 11:17:16 AM »

Quote from: Nemesis on November 11, 2002, 09:42:37 AM
RewriteEngine on
RewriteBase /
RewriteRule ^index/(.*)\.html$ /index.php?board=$1

try this

This really works! How cool! But how would you get YaBB to rewrite all the URLs?? :(

http://webmail.xtram.com/index/features.htm
is really
http://webmail.xtram.com/index.php?action=features

That is pretty nifty!
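
For anyone who wants to see what the quoted rule does before touching .htaccess, here is a rough Python sketch of the same pattern substitution. It is only an emulation, not how Apache works internally: it assumes the leading slash has already been stripped (as mod_rewrite does for rules in a per-directory .htaccess file), and the helper name rewrite is made up for illustration.

```python
import re

# Pattern and target copied from the quoted RewriteRule; Apache's "$1" is "\1" in Python.
RULE = re.compile(r"^index/(.*)\.html$")
TARGET = r"/index.php?board=\1"


def rewrite(path):
    """Return the internal URL for a friendly path.

    Hypothetical helper. Assumes the leading slash is already stripped.
    If the rule does not match, the path is returned unchanged.
    """
    return RULE.sub(TARGET, path)


if __name__ == "__main__":
    print(rewrite("index/5.html"))        # matches the rule
    print(rewrite("index/features.htm"))  # does not match: ".htm", not ".html"
```

Note that the rule as quoted only matches paths ending in .html and maps them to board=. A URL like index/features.htm that ends up at action=features needs the pattern and target adjusted to suit.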
Logged

cyc
Noobie
*
Posts: 30


I love YaBB SE!

WWW
Re:search engine spidering
« Reply #33 on: January 17, 2003, 11:31:05 AM »

Quote from: Aquilo on January 17, 2003, 11:17:16 AM
Quote from: Nemesis on November 11, 2002, 09:42:37 AM
RewriteEngine on
RewriteBase /
RewriteRule ^index/(.*)\.html$ /index.php?board=$1

try this

This really works! How cool! But how would you get YaBB to rewrite all the URLs?? :(

http://webmail.xtram.com/index/features.htm
is really
http://webmail.xtram.com/index.php?action=features

That is pretty nifty!

Yeah that works no problem, but you do take a performance hit when using .htaccess

The vBulletin hack only edits the URLs which are displayed in the main forum table, not all URLs. You would do this in forumdisplay.php or whatever files look after that side of things.
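
To give a rough idea of what "editing the displayed URLs" means, here is a small Python sketch (not the actual vBulletin hack, and not YaBB SE code) that rewrites board links in a chunk of generated HTML into the static-looking form. The link shape index.php?board=N, the target form /index/N.html and the helper name friendly_links are all assumptions for illustration; links carrying extra parameters are deliberately left alone.

```python
import re

# Assumed link shape: href="index.php?board=N" (optionally with a leading slash),
# where N is digits only and nothing else follows in the query string.
BOARD_LINK = re.compile(r'href="/?index\.php\?board=(\d+)"')


def friendly_links(html):
    """Rewrite plain board links to the form the rewrite rule expects.

    Hypothetical helper; a real forum would do this where the links are
    generated, rather than by post-processing the finished page.
    """
    return BOARD_LINK.sub(r'href="/index/\1.html"', html)


if __name__ == "__main__":
    page = '<a href="index.php?board=3">General</a>'
    print(friendly_links(page))
```

The web server then maps /index/3.html back to index.php?board=3 with a rewrite rule, so the script itself never sees the friendly form.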

cheers!
Logged

Ben_S
Disciple of Joe
Support Team
YaBB God
*****
Posts: 1586


I Love YaBB SE!

WWW
Re:search engine spidering
« Reply #34 on: January 17, 2003, 11:16:25 PM »

Quote from: cyc on January 17, 2003, 10:56:40 AM
Scotty, the simple fix is:
  • Convert your board to phpBB
  • Buy a vBulletin licence (it's only a few $$)
  • Convert your phpBB board to vBulletin (there is no straight conversion from YaBB SE atm)
  • Install a simple hack for search-engine-friendly URLs
  • Wait for Google and co

Two problems I found were:
1) Passwords were lost, but this was to be expected
2) A lot of post data now contains invalid HTML tags; this won't matter much in the long term (plus there is a simple fix)

Total time spent.....less than one hour  ;)

cheers!

Why would I want to do that? I have no interest in paying for a licence to use vB on a site that is entirely not-for-profit.

Anyway, even if that were an option, you would then have the problem of not having an accurate system for keeping track of read threads. YSE keeps track server-side; vB relies on cookies, which is about as useful as a bucket with holes if you read a few threads and go away for an hour or so intending to come back and read the rest later, because by then vB has marked everything as read.  ::)

How utterly annoying.
Logged
GauGau
Noobie
*
Posts: 18


Klugscheisserei grood recht!

WWW
Re:search engine spidering
« Reply #35 on: January 21, 2003, 08:03:44 AM »

guys,

ever wondered why the search engines don't index URLs containing a question mark or ampersand? Because they don't want to!!! And they know why! They have a problem with any dynamic page and take the "?" as a hint that the page is dynamic. Problems:
  • content: be honest, all of you who run boards: 99% of everything published on boards is trash!
  • size: the databases of boards are huge, too much to index (given the low relevance of most content)
  • speed: most databases would break down if they were spidered completely
  • stability: dynamic pages change more often than static pages - this means more dead links for the search engine
Most of you know all this and yet decide to try to trick the spiders? The search engines are spammed by many other sites, so why would we want to spam them too? They are the internet's most valuable tools (if their database is "healthy"). Just because it can be done doesn't mean we should hand a simple tool to all the kids running silly sites and make it easy for them to spam the search engines.
In my opinion search engine spidering can only be done with "mod_rewrite" - but this is a very complex and complicated matter - so let those who understand enough to handle mod_rewrite modify their forums accordingly (on their own), but don't hand out a "copy and paste" solution to the newbies. :-[
Think about this :-\

GauGau
Logged
Fizzy
Full Member
***
Posts: 214


Re:search engine spidering
« Reply #36 on: March 31, 2003, 10:57:07 PM »

Quote from: GauGau on January 21, 2003, 08:03:44 AM
guys,

ever wondered why the search engines don't index URLs containing a question mark or ampersand? Because they don't want to!!! And they know why! They have a problem with any dynamic page and take the "?" as a hint that the page is dynamic. Problems:
  • content: be honest, all of you who run boards: 99% of everything published on boards is trash!
  • size: the databases of boards are huge, too much to index (given the low relevance of most content)
  • speed: most databases would break down if they were spidered completely
  • stability: dynamic pages change more often than static pages - this means more dead links for the search engine
Most of you know all this and yet decide to try to trick the spiders? The search engines are spammed by many other sites, so why would we want to spam them too? They are the internet's most valuable tools (if their database is "healthy"). Just because it can be done doesn't mean we should hand a simple tool to all the kids running silly sites and make it easy for them to spam the search engines.
In my opinion search engine spidering can only be done with "mod_rewrite" - but this is a very complex and complicated matter - so let those who understand enough to handle mod_rewrite modify their forums accordingly (on their own), but don't hand out a "copy and paste" solution to the newbies. :-[
Think about this :-\

GauGau

:o

I can't believe I just read that :o

What the hell is the point of a support and help forum for us noobie morons if people like GauGau have the attitude that we are all mindless pillocks with forums consisting of nothing more than boring twaddle?

GauGau, for your information I run a health and disease support forum, totally non-profit with policies, regulations and procedures that would make your eyelashes curl and your 'GauGau T-shirts' shrink in the wash!
Quote
Long announced, and now it's here: GauGau.de for sale! More precisely, a T-shirt from GauGau.de, available in the new GauGau.de shop
Update: there are now 3 different shirts on offer: for boys, for girls and for tick breeders (hooded shirts)
Yep! That's wonderful content, really worthy of a good spidering!


We discuss in-depth health and medical issues, review unsponsored products and advise thousands of sufferers of the latest news, events, treatments, trials and biological research.

To get this information from the forum onto front pages is all but impossible, so for my members and other sufferers the ability to find the site through a search engine, after they search for the very disease that they have and that we are discussing, is vitally important.

GauGau, having looked at your site I think your attitude sucks. To have an opinion is one thing, but to declare that others should not share info because it may undermine your own website is hypocritical and self-centred.

If you don't want to help others then that's your right, but to put others off from doing so really undermines the fact that there are many great programmers here who willingly put in hours and hours of sweat and toil to help people like me, simple noobies struggling through as best we can, trying to learn as we go. I for one am very grateful to all the mod writers for taking the time to do so.

Apologies to everyone else reading this, but that really wound me up. I don't know PHP all that well but I do know how to support people with life-shattering diseases. If I took the same attitude as GauGau here then I'd keep all my info to myself and not share any of it.

Let the noobies suffer in ignorance.  >:(

* Fizzy kicks the cat and declares his little rant over and done with.
Logged
sensovision
Full Member
***
Posts: 100


WWW
Re:search engine spidering
« Reply #37 on: April 22, 2003, 11:28:29 PM »

Hello Fizzy, I fully agree with you. To put it kindly, it's just not fair to make statements like the ones GauGau made.

Quote
content: be honest, all of you who run boards: 99% of everything published on boards is trash!
GauGau, if you believe that the content of your board is 99% useless trash (people tend to judge others by comparing them to themselves), then maybe it really shouldn't be listed in search engines; that's up to you. But everyone has a right to post their information and make it visible on the web, and search engines were created to help with that. There are always two sides to the coin, and if you try to hide one of them from the search engines by saying it's bad and not worth looking at, people won't get an objective view of the subject they're looking for. That is called censorship (which is not necessarily a bad thing). So I believe that everyone should have access to information, and to this information as well. Sometimes a search engine can't find something, and it could turn out that a forum holds the desired info...
GauGau, if you're so worried about search engine quality, you'd do better to use your time helping Google fight back against the spammers who poison search results with doorway pages, hidden links, repeated keywords and other unfair tricks, by reporting them using this link http://www.google.com/contact/spamreport.html, rather than using that time to attack the creative minds who build this community and help thousands (or maybe even millions) of people use one of the best forum packages I know.
As Fizzy mentioned, this software is used not only to post 99% trash, as you think, but also for some very important needs, and it helps millions of other people.
« Last Edit: April 22, 2003, 11:40:50 PM by sensovision » Logged

Denis

Are you good with graphics? Check out our logo design contest!
I, Brian
Full Member
***
Posts: 238


It is coming...

WWW
Re:search engine spidering
« Reply #38 on: April 23, 2003, 07:35:11 AM »


Good! This subject has been covered, but has anyone actually succeeded in forming a hack so that boards can be fully spidered?

Until I find another way, I've manually added all of my topic URLs to a sitemap index for easy spidering: here.

Did anyone actually succeed in having Google deep-crawl all of their forum? Or if they did, would people not wish to say?


BTW - re: GauGau - ignore everything he says, it's not supported by what I'm reading elsewhere. Here is a thread at Forum-Forum where someone joyfully declares what happens when you add a hack to a phpBB board to allow spidering - note that Forum-Forum also has all of its pages rendered from dynamic PHP into static HTML pages for its archive, for easy spidering.



Logged

I, Brian
Full Member
***
Posts: 238


It is coming...

WWW
Re:search engine spidering
« Reply #39 on: April 23, 2003, 09:39:37 AM »

I mentioned on another thread that phpBB has a hack for closing down the session length issue, thus making phpBB boards capable of being fully indexed -


From: http://www.sitepoint.com/article/971/2

Some shopping carts or forums store session information in the URL when cookies are unable to be written. This effectively kills search engines like Google because search engines key their indexes with URLs, and when you put session information in the URL, that URL will change constantly. This is especially true as Google uses multiple IP addresses to crawl the Web, so each crawler will see a different URL on your site, which basically results in those pages not being listed. It is important that if you use such software, you amend it so that if cookies are unable to be written, the software simply does not track session information.
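
To make the problem in that quote concrete, here is a minimal Python sketch showing how two crawler visits that differ only in a session ID produce different URLs, and how dropping the session parameter makes them identical again. It is an illustration only, not a phpBB or YaBB SE patch: the parameter names PHPSESSID (PHP's default session name) and sid, the example.com URLs and the helper name strip_session are all assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed session parameter names; adjust to whatever the forum actually emits.
SESSION_PARAMS = {"PHPSESSID", "sid"}


def strip_session(url):
    """Return the URL with any session parameters removed from the query string.

    Hypothetical helper for illustration; assumes "&" separates the parameters.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    # Two crawlers get two different session IDs for the same page.
    first = "http://example.com/index.php?board=1&PHPSESSID=abc123"
    second = "http://example.com/index.php?board=1&PHPSESSID=xyz789"
    print(first == second)
    print(strip_session(first) == strip_session(second))
```

The fix the article recommends is on the forum side (do not put the session in the URL at all when cookies are unavailable); the sketch only shows why the session ID is what makes the URLs differ.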
Logged

cyc
Noobie
*
Posts: 30


I love YaBB SE!

WWW
Re:search engine spidering
« Reply #40 on: April 23, 2003, 10:00:40 AM »

It's worth following up. I've switched to vBulletin so I don't really care all that much anymore  :)

One thing to keep in mind is traffic: once you have 10,000-20,000 pages listed in Google and other SEs, even smallish forums (50,000 posts) will see 500-600 MB of traffic per day (15-20+ GB per month).
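
As a quick sanity check on those figures, here is the daily-to-monthly conversion as a tiny Python sketch. The 30-day month and decimal units (1 GB = 1000 MB) are assumptions, and monthly_gb is just an illustrative name.

```python
DAYS_PER_MONTH = 30  # assumption: a 30-day month
MB_PER_GB = 1000     # assumption: decimal units


def monthly_gb(mb_per_day):
    """Convert a daily traffic figure in MB to a monthly figure in GB."""
    return mb_per_day * DAYS_PER_MONTH / MB_PER_GB


if __name__ == "__main__":
    for daily in (500, 600):
        print(daily, "MB/day is roughly", monthly_gb(daily), "GB/month")
```

So 500-600 MB a day lands at roughly 15-18 GB a month under these assumptions, in line with the "15-20+" quoted.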

Before SE-friendly URLs we got only 6 or 7 visitors from Google each day; now we see well over 2,000 on most days just from google.com (many more when you count their other sites, google.com.au etc.)

It takes all of 1 minute to set up URL rewriting; someone just needs to change the links within YaBB to use the rewritten URLs. It's not rocket science, anyone who has ever posted a hack should be able to cope with it.

IMO if this was added to YaBB it would be the best of the free forums without question.

cheers!
Logged

[Unknown]
Global Moderator
YaBB God
*****
Posts: 7830


Re:search engine spidering
« Reply #41 on: April 23, 2003, 03:42:58 PM »

Quote from: cyc on April 23, 2003, 10:00:40 AM
It's worth following up. I've switched to vBulletin so I don't really care all that much anymore  :)

One thing to keep in mind is traffic: once you have 10,000-20,000 pages listed in Google and other SEs, even smallish forums (50,000 posts) will see 500-600 MB of traffic per day (15-20+ GB per month).

Before SE-friendly URLs we got only 6 or 7 visitors from Google each day; now we see well over 2,000 on most days just from google.com (many more when you count their other sites, google.com.au etc.)

It takes all of 1 minute to set up URL rewriting; someone just needs to change the links within YaBB to use the rewritten URLs. It's not rocket science, anyone who has ever posted a hack should be able to cope with it.

IMO if this was added to YaBB it would be the best of the free forums without question.

cheers!

*thinks about how long this has been done in his secret project...*

I am about to write a mod that may help with all of this.  The problem is that YaBB SE uses full URLs not relative ones, and search engines get scared away.

Actually, I'm about to write/release two mods:
- global cookies by domain. (cross-subdomain..)
- short URLs.

If they go well, they will become options in my secret project ;).

-[Unknown]
Logged
David
Destroyer Dave
Global Moderator
YaBB God
*****
Posts: 5761


I'm not a llama!

WWW
Re:search engine spidering
« Reply #42 on: April 23, 2003, 03:53:55 PM »

This really is not that hard to do with mod_rewrite.  And yes, I am setting myself up to do it.  ;)
Logged

I, Brian
Full Member
***
Posts: 238


It is coming...

WWW
Re:search engine spidering
« Reply #43 on: April 23, 2003, 07:33:01 PM »

Well, if you're used to configuring Apache then I'm sure a mod_rewrite isn't that hard.

If not, then [Unknown]'s mod writing looks like the best remaining option for the rest of the YaBB SE community.

I wouldn't want to have to keep manually indexing my forum threads for spidering for too long. ;)


Logged

[Unknown]
Global Moderator
YaBB God
*****
Posts: 7830


Re:search engine spidering
« Reply #44 on: April 23, 2003, 09:59:46 PM »

http://www.yabbse.org/community/index.php?board=158;action=display;threadid=21933

-[Unknown]
Logged