Myths of a Search Engine, Ranking, and SEO
Over the past several years, a number of misconceptions have emerged about how the search engines operate. For the beginner SEO, this causes confusion about what's required to perform effectively. So, I'm going to explain the real story behind the myths.
In the late 1990s, search engines had "submission" forms that were part of the optimization process. Webmasters & site owners would tag their sites & pages with keyword information, and "submit" them to the engines. Soon after submission, a bot would crawl those resources and include them in the engine's index. Simple!
Unfortunately, this process didn't scale very well, the submissions were often spam, and the practice eventually gave way to purely crawl-based engines. Since 2001, search engine submission has been not only unnecessary but virtually useless. The engines all publicly note that they rarely use "submission" URLs, and that the best practice is to earn links from other sites. This will naturally expose your site to search engines.
You can still sometimes find submission pages (here's one for Bing), but these are remnants of a time long past, and are essentially useless to the practice of modern SEO. If you hear a pitch from an SEO offering "search engine submission" services, run, don't walk, to a real SEO. Even if the engines used the submission service to crawl your site, you'd be unlikely to earn enough "link juice" to be included in their indices or rank competitively for search queries.
Once upon a time, much like search engine submission, meta tags (in particular, the meta keywords tag) were an important part of the SEO process. You would include the keywords you wanted your site to rank for and when users typed in those terms, your page could come up in a query. This process was quickly spammed to death, and eventually dropped by all the major engines as an important ranking signal.
It is true that other tags, namely the TITLE TAG (not strictly a meta tag, but often grouped with them) and META DESCRIPTION TAG, are of critical importance to SEO best practices. Additionally, the META ROBOTS TAG is an important tool for controlling spider access. However, SEO is not "all about meta tags", at least, not anymore.
KEYWORD STUFFING
Ever see a page that just looks spammy? Perhaps something like:
"Bob's cheap Seattle plumber is the best cheap Seattle plumber for all your plumbing needs. Contact a cheap Seattle plumber before it's too late"
Not surprisingly, a persistent myth in SEO revolves around the concept that keyword density - a mathematical formula that divides the number of instances of a given keyword by the number of words on a page - is used by the search engines for relevancy & ranking calculations.
Despite being proven untrue time and again, this myth is still touted as true. Many SEO tools still feed on the concept that keyword density is an important metric. It's not. Ignore it and use keywords intelligently and with usability in mind. The value from an extra 10 instances of your keyword on the page is far less than earning one good editorial link from a source that doesn't think you're a search spammer.
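To make the formula concrete, here is a minimal sketch of how a typical keyword density tool might compute the number. It is only an illustration of the metric, not anything the engines are known to use; the function name `keyword_density` and the simple tokenizer are my own assumptions, and tools differ on details such as whether a three-word phrase counts once or three times.

```python
import re


def keyword_density(text: str, phrase: str) -> float:
    """Occurrences of `phrase` divided by the total word count of `text`.

    Hypothetical helper: each phrase match counts once, which is only one
    of several conventions used by density tools.
    """
    # Lowercase and split into simple word tokens (letters, digits, apostrophes).
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = re.findall(r"[a-z0-9']+", phrase.lower())
    if not words or not target:
        return 0.0
    n = len(target)
    # Slide a window of n words across the text and count exact matches.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits / len(words)


sample = (
    "Bob's cheap Seattle plumber is the best cheap Seattle plumber "
    "for all your plumbing needs. Contact a cheap Seattle plumber "
    "before it's too late"
)
print(f"{keyword_density(sample, 'cheap Seattle plumber'):.1%}")  # prints 12.5%
```

The spammy example above scores 12.5% (3 matches in 24 words), which is exactly the kind of number these tools encourage you to chase and exactly why the metric says so little about quality.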
PAID SEARCHES = ORGANIC RESULTS
Put on your tin foil hats: it's time for the most common SEO conspiracy theory, that spending money on search engine advertising (PPC) improves your organic SEO rankings.
In everything I've witnessed or heard about, this has never been proven, nor has it ever been a probable explanation for effects in the organic results. Google, Yahoo! & Bing all have very effective walls in their organizations to prevent precisely this type of crossover.
At Google in particular, advertisers spending tens of millions of dollars each month have noted that even they cannot get special access or consideration from the search quality or web spam teams. So long as the existing barriers are in place and the search engines' cultures maintain their separation, I (and others) believe that this will remain a myth. That said, I have seen anecdotal evidence that bidding on keywords you already organically rank for can help increase your organic click-through rate.
SEARCH ENGINE SPAM
As long as there is search, there will always be spam. The practice of spamming the search engines - creating pages and schemes designed to artificially inflate rankings or abuse the ranking algorithms employed to sort content - has been rising since the mid-1990s.
With payouts so high (at one point, a fellow SEO noted that a single day ranking atop Google's search results for the query "buy viagra" could bring upwards of $20,000 in affiliate revenue), it's little wonder that manipulating the engines is such a popular activity on the web. However, it's become increasingly difficult and, in my opinion, less and less worthwhile for two reasons.
- Users hate spam, and the search engines have a financial incentive to fight it. Many believe that Google's greatest product advantage over the last 10 years has been their ability to control and remove spam better than their competitors. It's undoubtedly something all the engines spend a great deal of time, effort and resources on. While spam still works on occasion, it generally takes more effort to succeed than producing "good" content, and the long term payoff is virtually non-existent.
Instead of putting all that time and effort into something that the engines will throw away, why not invest in a value added, long term strategy instead?
- Search engines have done a remarkable job identifying scalable, intelligent methodologies for fighting spam manipulation, making it dramatically more difficult to adversely impact their intended algorithms. Complex concepts like TrustRank, HITS, statistical analysis, historical data and more have all driven down the value of search spam and made so-called "white hat" tactics (those that don't violate the search engines' guidelines) far more attractive.
More recently, Google's Panda update introduced sophisticated machine learning algorithms to combat spam and low value pages at a scale never before witnessed online. If the search engines' job is to deliver quality results, they have raised the bar year after year.
For additional details about spam from the engines, see Google's Webmaster Guidelines and Bing's Webmaster FAQs (pdf).
The important thing to remember is this: Not only do manipulative techniques not help you in most cases, but they often cause search engines to impose penalties on your site.
Page Level Spam
Search engines perform spam analysis across individual pages and entire websites (domains). Let's first look at how they evaluate manipulative practices on the URL level.
KEYWORD STUFFING
One of the most obvious and unfortunate spamming techniques, keyword stuffing, involves littering repetitions of keyword terms or phrases into a page in order to make it appear more relevant to the search engines. The thought behind this - that increasing the number of times a term is mentioned can considerably boost a page's ranking - is generally false. Studies looking at thousands of the top search results across different queries have found that keyword repetitions play an extremely limited role in boosting rankings, and have a low overall correlation with top placement.
The engines have very obvious and effective ways of fighting this. Scanning a page for stuffed keywords is not challenging, and the engines' algorithms are all up to the task. You can read more about this practice, and Google's views on the subject, in a blog post from the head of their web spam team - SEO Tip: Avoid Keyword Stuffing.
One of the most popular forms of web spam, manipulative link acquisition relies on the search engines' use of link popularity in their ranking algorithms to attempt to artificially inflate these metrics and improve visibility. This is one of the most difficult forms of spamming for the search engines to overcome because it can come in so many forms. A few of the many ways manipulative links can appear include:
- Reciprocal link exchange programs, wherein sites create link pages that point back and forth to one another in an attempt to inflate link popularity. The engines are very good at spotting and devaluing these as they fit a very particular pattern.
- Link schemes, including "link farms" and "link networks" where fake or low value websites are built or maintained purely as link sources to artificially inflate popularity. The engines combat these through numerous methods of detecting connections between site registrations, link overlap or other common factors.
- Paid links, where those seeking to earn higher rankings buy links from sites and pages willing to place a link in exchange for funds. These sometimes evolve into larger networks of link buyers and sellers, and although the engines work hard to stop them (and Google in particular has taken dramatic actions), they persist in providing value to many buyers & sellers.
- Low quality directory links are a frequent source of manipulation for many in the SEO field. A large number of "pay-for-placement" web directories exist to serve this market and pass themselves off as legitimate with varying degrees of success. Google often takes action against these sites by removing the PageRank score from the toolbar (or reducing it dramatically), but won't do this in all cases.
There are many more manipulative link building tactics that the search engines have identified and, in most cases, found algorithmic methods for reducing their impact. As new spam systems emerge, engineers will continue to fight them with targeted algorithms, human reviews and the collection of spam reports from webmasters & SEOs.
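The reciprocal link pattern described above is simple enough to sketch in a few lines. This is a toy illustration only, assuming a link graph stored as a plain dictionary; the function name `reciprocal_pairs` and the example domains are hypothetical, and the engines' real detection methods are not public.

```python
def reciprocal_pairs(links: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Return every pair of sites that link to each other.

    `links` maps a site to the set of sites it links out to
    (an assumed, simplified representation of a link graph).
    """
    pairs = set()
    for source, targets in links.items():
        for target in targets:
            # A reciprocal link exists when the target also links back.
            if source != target and source in links.get(target, set()):
                # Sort so (a, b) and (b, a) are stored as one pair.
                pairs.add(tuple(sorted((source, target))))
    return pairs


link_graph = {
    "a.example": {"b.example", "c.example"},
    "b.example": {"a.example"},
    "c.example": {"d.example"},
}
print(reciprocal_pairs(link_graph))  # {('a.example', 'b.example')}
```

A single reciprocal pair proves nothing on its own, since plenty of legitimate sites link to each other. The point is that exchange programs produce the pattern in bulk, which is what makes it easy to spot.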
All the search engine guidelines say the same thing: show the same content to the engine's crawlers that you'd show to an ordinary visitor. This means, among other things, not hiding text in the HTML code of your website that a normal visitor can't see.
When this guideline is broken, the engines call it "cloaking" and take action to prevent these pages from ranking in their results. Cloaking can be accomplished in any number of ways and for a variety of reasons, both positive and negative. In some cases, the engines may let practices that are technically "cloaking" pass, as they're done for positive user experience reasons. For more on the subject of cloaking and the levels of risk associated with various tactics and intents, see this post on White Hat Cloaking.
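If you want to check your own site against this guideline, one rough approach is to compare the text served to a crawler with the text served to a regular browser. The sketch below assumes you have already fetched both versions of a page as HTML strings; `visible_text` and `content_similarity` are hypothetical helper names, and the check ignores text hidden with CSS, so treat it as a starting point rather than a cloaking detector.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the page's text content as one space-joined string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def content_similarity(html_for_crawler: str, html_for_visitor: str) -> float:
    """Similarity between 0.0 and 1.0 of the text in the two responses."""
    return SequenceMatcher(
        None, visible_text(html_for_crawler), visible_text(html_for_visitor)
    ).ratio()
```

A score of 1.0 means both audiences received the same text. A noticeably lower score is worth investigating, though personalization, ads and timestamps can all lower it for perfectly innocent reasons.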
Low Value Pages
Although it may not technically be considered "web spam," the engines all have methods to determine if a page provides unique content and "value" to its searchers before including it in their web indices and search results. The most commonly filtered types of pages are "thin" affiliate content, duplicate content, and dynamically generated content pages that provide very little unique text or value. The engines are against including these pages and use a variety of content and link analysis algorithms to filter out "low value" pages from appearing in the results.
Google's 2011 Panda update took the most aggressive steps ever seen in reducing low quality content across the web, and Google continues to update this process.
Domain Level Spam
In addition to watching individual pages for spam, engines can also identify traits and properties across entire root domains or subdomains that could flag them as spam. Obviously, excluding entire domains is tricky business, but it's also much more practical in cases where greater scalability is required.
Just as with individual pages, the engines can monitor the kinds of links and quality of referrals sent to a website. Sites that are clearly engaging in the manipulative activities described above in a consistent or seriously impactful way may see their search traffic suffer, or even have their sites banned from the index. You can read about some examples of this from the past - take a look at the JC Penney Google penalty.
Websites that earn trusted status are often treated differently from those that have not. In fact, many SEOs have commented on the "double standards" that exist for judging "big brand" and high importance sites vs. newer, independent sites. For the search engines, trust most likely has a lot to do with the links your domain has earned. Thus, if you publish low quality, duplicate content on your personal blog, then buy several links from spammy directories, you're likely to encounter considerable ranking problems. However, if you were to post that same content to a page on Wikipedia and get those same spammy links to point to that URL, it would likely still rank tremendously well - such is the power of domain trust & authority.
Trust built through links is also a great method for the engines to employ. A little duplicate content and a few suspicious links are far more likely to be overlooked if your site has earned hundreds of links from high quality, editorial sources like CNN.com or Cornell.edu. On the flip side, if you have yet to earn high quality links, judgments may be far stricter from an algorithmic view.
Just as a page's value is judged against criteria such as uniqueness and the experience it provides to search visitors, so too is the value of an entire domain. Sites that primarily serve non-unique, non-valuable content may find themselves unable to rank, even if classic on and off page factors are performed acceptably. The engines simply don't want thousands of copies of Wikipedia or Amazon affiliate websites filling up their index, and thus use algorithmic and manual review methods to prevent this.
Search engines constantly evaluate the effectiveness of their own results. They measure when users click on a result, quickly hit the "back" button on their browser, and try another result. This indicates that the result they served didn't meet the user's query.
It's not enough just to rank for a query and have the visitor stay on the page. Once you've earned your ranking, you have to prove it over and over again.
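The click-then-back behavior described above can be expressed as a simple rate. The sketch below is purely illustrative: the engines don't publish how they measure this, so the `Click` record, the `quick_return_rate` function and the 10 second threshold are all assumptions of mine.

```python
from dataclasses import dataclass


@dataclass
class Click:
    """One click on a search result (hypothetical log record)."""
    url: str
    dwell_seconds: float        # time spent on the page
    returned_to_results: bool   # did the user come back to the results page?


def quick_return_rate(clicks: list[Click], url: str,
                      threshold_seconds: float = 10.0) -> float:
    """Share of clicks on `url` that bounced back to the results quickly.

    The threshold is an arbitrary illustrative value, not a known cutoff.
    """
    relevant = [c for c in clicks if c.url == url]
    if not relevant:
        return 0.0
    quick = sum(
        1 for c in relevant
        if c.returned_to_results and c.dwell_seconds < threshold_seconds
    )
    return quick / len(relevant)


log = [
    Click("https://a.example/page", 3.0, True),
    Click("https://a.example/page", 95.0, False),
    Click("https://a.example/page", 5.5, True),
    Click("https://a.example/page", 40.0, True),
    Click("https://b.example/page", 2.0, True),
]
print(quick_return_rate(log, "https://a.example/page"))  # 0.5
```

A page where half of the visitors leave within seconds and try another result is, by this rough measure, probably not answering the query it ranks for.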