Monday, January 28, 2008

How to increase pages indexed

There are 10 ways to increase the number of pages indexed:

1) PageRank
2) Links
3) Sitemap
4) Speed
5) Google's crawl caching proxy
6) Verify
7) Content
8) Staggered launch
9) Size matters
10) Know how your site is found

PageRank

How many pages get indexed depends a lot on PageRank. The higher your PageRank, the more pages will be indexed. PageRank isn't a blanket number for all your pages; each page has its own PageRank. A high PageRank gives the Googlebot more of a reason to return. Matt Cutts confirms, too, that a higher PageRank means a deeper crawl.

Links

Give the Googlebot something to follow. Links (especially deep links) from a high PageRank site are golden as the trust is already established.

Internal links can help, too. Link to important pages from your homepage. On content pages link to relevant content on other pages.

Sitemap

A lot of buzz around this one. Some report that a clear, well-structured Sitemap helped get all of their pages indexed. Google's Webmaster guidelines recommend submitting a Sitemap file.

That page has other advice for improving crawlability, like fixing violations and validating robots.txt.
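As an example of a clean robots.txt, a minimal file might look like the sketch below. The paths and hostname are placeholders; the Sitemap directive (supported by the major engines since 2007) points crawlers at your Sitemap file.

```
User-agent: *
Disallow: /private/

Sitemap: http://www.yourdomain.com/sitemap.xml
```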

Some recommend having a Sitemap for every category or section of a site.
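For reference, a minimal Sitemap file following the sitemaps.org protocol looks something like this. The URLs and dates are placeholders; `lastmod`, `changefreq`, and `priority` are optional hints, not commands.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.yourdomain.com/</loc>
    <lastmod>2008-01-28</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://www.yourdomain.com/widgets/</loc>
    <lastmod>2008-01-20</lastmod>
  </url>
</urlset>
```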

Speed

A recent O'Reilly report indicated that page load time, and the ease with which the Googlebot can crawl a page, may affect how many pages are indexed. The logic is simple: the faster the Googlebot can crawl, the more pages it can get through.

This could involve simplifying the structure and/or navigation of the site. The spiders have difficulty with Flash and Ajax, so a text version should be added in those instances.
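One common pattern for the Flash case: put plain-HTML links inside the `object` tag as fallback content. Browsers with the plugin show the movie; spiders (and users without Flash) get crawlable links. The file names and paths below are placeholders.

```html
<object type="application/x-shockwave-flash" data="nav.swf"
        width="600" height="100">
  <!-- Plain-HTML fallback the Googlebot can follow -->
  <ul>
    <li><a href="/products/">Products</a></li>
    <li><a href="/about/">About</a></li>
  </ul>
</object>
```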

Google's crawl caching proxy

Matt Cutts provides diagrams on his blog of how Google's crawl caching proxy works. This was part of the Big Daddy update to make the engine faster. Any one of Google's crawlers may crawl a site and send the pages to a remote caching server; the remaining indexes (like the blog index or the AdSense index) then read from that mirror instead of their bots physically visiting your site.

Verify

Verify the site with Google using the Webmaster tools.

Content

Make sure content is original. If a page is a verbatim copy of another page, the Googlebot may skip it. Update frequently to keep the content fresh; pages with an older timestamp might be viewed as static, outdated, or already indexed.

Staggered launch

Launching a huge number of pages at once could send off spam signals. In one forum, it was suggested that a webmaster launch a maximum of 5,000 pages per week.
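To make the idea concrete, here is a small sketch that splits a large launch into weekly batches under that cap. The 5,000 figure is the forum suggestion, not an official Google limit, and the function name is my own.

```python
import math
from datetime import date, timedelta

MAX_PAGES_PER_WEEK = 5000  # forum-suggested cap, not an official limit

def launch_schedule(total_pages, start, cap=MAX_PAGES_PER_WEEK):
    """Split a launch into weekly batches of at most `cap` pages."""
    weeks = int(math.ceil(total_pages / float(cap)))
    schedule = []
    remaining = total_pages
    for i in range(weeks):
        batch = min(cap, remaining)
        schedule.append((start + timedelta(weeks=i), batch))
        remaining -= batch
    return schedule

# A hypothetical 12,000-page launch starting Feb 4, 2008:
for day, batch in launch_schedule(12000, date(2008, 2, 4)):
    print(day, batch)
```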

Size matters

If you want tens of millions of pages indexed, your site will probably have to be on an Amazon.com or Microsoft.com level.

Know how your site is found, and tell Google

Find the top queries that lead to your site and remember that anchor text helps in links. Use Google's tools to see which of your pages are indexed, and if there are violations of some kind. Specify your preferred domain so Google knows what to index.
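Alongside the preferred-domain setting in Webmaster Tools, you can enforce one canonical host at the server level with a 301 redirect, so link equity isn't split between www and non-www. A sketch for Apache, with the hostname as a placeholder:

```apache
# .htaccess: 301-redirect the non-preferred host to the preferred www host
RewriteEngine On
RewriteCond %{HTTP_HOST} ^yourdomain\.com$ [NC]
RewriteRule ^(.*)$ http://www.yourdomain.com/$1 [R=301,L]
```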

Thursday, January 17, 2008

New Google Filter

Is There an Anchor Text Problem?

Aaron Wall put up a post about a new Google filter that causes sites with high-ranking terms to be bumped down to position #6. There is also a thread at Webmaster World about this phenomenon. This is still reasonably speculative in nature, but a lot of people have seen it.

Aaron offers some really interesting speculation about why this may be occurring. The most interesting theory was the notion that it was an anchor text problem. Here is what Aaron had to say:

I think this issue is likely tied to a stagnant link profile with a too tightly aligned anchor text profile, with the anchor text being overly-optimized when compared against competing sites.

Whether or not this is occurring now, this makes complete sense. It is well within Google’s (or any other search engine’s) ability to detect an unusually high density of one form of anchor text to a given domain. For example, if your site is called yourdomain.com, and you sell widgets, and the anchor text in 48 of your 65 links says “Widgets on Sale”, this is not natural.

Most of the links to your site should be the name of your domain itself (i.e. in this example, “yourdomain”). Such a distribution of anchor text is a flag that the anchor text of your links is being artificially influenced. How is that done? Why, by purchasing links, or by heavy-duty link swapping.
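The kind of check described above is easy to sketch: measure what fraction of inbound links share the single most common anchor text. The function name and the 50% threshold are my own illustration, not anything Google has published.

```python
from collections import Counter

def anchor_text_share(anchors):
    """Fraction of links using the single most common anchor text."""
    counts = Counter(anchors)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / float(len(anchors))

# Hypothetical profile from the example: 48 of 65 links say "Widgets on Sale".
profile = (["Widgets on Sale"] * 48
           + ["yourdomain"] * 10
           + ["click here"] * 7)

share = anchor_text_share(profile)
print("%.0f%% of links share one anchor text" % (share * 100))
if share > 0.5:  # illustrative threshold only
    print("Skewed profile: likely to look unnatural")
```

A natural profile would be dominated by the domain name and assorted phrases, so this share would be far lower.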

This is potentially another step in Google’s stepped-up war against the practice of link buying. I have long maintained that the main advantage link buying has over natural links is the fact that people who buy links get to specify the exact (keyword-rich) anchor text used. Looking for unnatural patterns of anchor text provides a backdoor into detecting people who are purchasing links.

It might be a bit heavy-handed for Google to ban a site based on this type of evidence, but reducing the impact of anchor text on rankings when there is an unnatural distribution in play still helps them meet their goal. After all, even if the unnatural anchor text campaign does not represent the result of a link buying campaign, and all those keyword-laden links are in fact completely natural, it might still provide better relevance for Google to filter in this manner.

Thinking about this further, this might be a simple search quality adjustment for skewed anchor text distribution. If it affects paid links, from Google’s perspective, this might just be a bonus.

Wednesday, January 2, 2008

Google Video Sitemaps

Creating and submitting Video Sitemaps files

About Google Video Sitemaps

Google Video Sitemaps is an extension of the Sitemap protocol that enables you to publish and syndicate online video content and its relevant metadata to Google in order to make it searchable in the Google Video index. You can use a Video Sitemap to add descriptive information – such as a video’s title, description, duration, etc. – that makes it easier for users to find a particular piece of content. When a user finds your video through Google, they will be linked to your hosted environments for the full playback.

When you submit a Video Sitemap to Google, we will make the included video URLs searchable on Google Video. Search results will contain a thumbnail image (provided by you or autogenerated by Google) of your video content, as well as information (such as title) contained in your Video Sitemap. In addition, your video may also appear in other Google search products. During this beta period, we can’t predict or guarantee when or if your videos will be added to our index, but as we refine our product, we expect both coverage and indexing speed to improve.

Google can crawl the following video file types: .mpg, .mpeg, .mp4, .mov, .wmv, .asf, .avi, .ra, .ram, .rm, .flv. All files must be accessible via HTTP. Metafiles that require a download of the source via streaming protocols are not supported at this time.
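A minimal Video Sitemap entry might look like the sketch below. The URLs, title, and description are placeholders, and the tag set here is only the common subset; check Google's Video Sitemaps documentation for the full list of supported tags.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.0">
  <url>
    <loc>http://www.yourdomain.com/videos/widget-demo.html</loc>
    <video:video>
      <video:content_loc>http://www.yourdomain.com/videos/widget-demo.flv</video:content_loc>
      <video:thumbnail_loc>http://www.yourdomain.com/thumbs/widget-demo.jpg</video:thumbnail_loc>
      <video:title>Widget demo</video:title>
      <video:description>A short demonstration of the widget in action.</video:description>
      <video:duration>120</video:duration>
    </video:video>
  </url>
</urlset>
```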