The DO’s and DON’T’s of duplicate content.

What is duplicate content?
According to Google’s own Webmaster Central blog:

Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Most of the time when we see this, it’s unintentional or at least not malicious in origin: forums that generate both regular and stripped-down mobile-targeted pages, store items shown (and — worse yet — linked) via multiple distinct URLs, and so on. In some cases, content is duplicated across domains in an attempt to manipulate search engine rankings or garner more traffic via popular or long-tail queries.

Common situations
Company “ACME” wants to have several entry points for the same website, so they register the following domains: acme.com, acme.be, acme.org, acme.eu, etc. They point all those domains to the same website (content). After a while the site gets crawled by the Search Engines. Guess what happens?

The search engine crawls acme.be and finds an article about the product “acme generic”. It indexes it and continues. After a while it crawls acme.com and finds the same article about “acme generic”. It then flags the article at acme.com as a duplicate, penalizing it in the search results.
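
A common way out of this situation is to pick one preferred domain and 301-redirect all the other domains to it. Below is a minimal sketch in PHP, assuming acme.com is chosen as the canonical host (the host names are just the example above, not a recommendation):

php

<?php
// Sketch: send requests for acme.be, acme.org, acme.eu, ... to the canonical acme.com.
// The canonical host below is only illustrative.
$canonical = 'www.acme.com';
if (strcasecmp($_SERVER['HTTP_HOST'], $canonical) !== 0) {
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://' . $canonical . $_SERVER['REQUEST_URI']);
    exit;
}
?>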

Why do the Search Engines do this?
To keep decent search results… A lot of (blackhat?!?) webmasters use the RSS feeds of other sites to republish that content as their own. If someone browses the net and wants to find information about “acme generic”, they should only get unique & relevant search results.

What should I do then?
Continue reading “The DO’s and DON’T’s of duplicate content.”

The quality of the internet turning SEO-black

Spam Comments
Sigh… A month ago I posted that the total number of spam comments had passed the 1K mark. Today, about a month later, they have passed the 3K mark.

RSS
Along with the increase I see a lot of content stealing, which is then reused by blackhat SEO websites. SEO techniques are very important to understand if you want to run a decent website, but there are a lot of individuals who abuse the system (for financial gain).

The statistics report that my RSS feeds are only “leeched” and that there aren’t any clickthroughs. So I wonder if I should keep the RSS feeds online?!? Apparently they’re only used to leech content, and aren’t that interesting to returning visitors.

SEO Project
A while back I started a small POC project that combined all the SEO techniques I knew. The website was a WordPress blog with permalinks. I created a customized eBay plugin that would fetch data through the eBay API and present it on the blog. Each category had its own permalink (so it was very SE friendly), but it was really a single page driven by query variables. The categories contained the latest items along with the most popular search terms for that category. The plugin relies on a small redirect page to obfuscate the fact that it’s just republished eBay content.
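
As an illustration only, such a redirect page can be very small; the parameter name, the host check and the fallback in the sketch below are purely hypothetical and not the actual plugin code:

php

<?php
// Hypothetical redirect page that masks outgoing links to the auction listings.
// "to" is an assumed query parameter carrying the destination URL.
$target = isset($_GET['to']) ? $_GET['to'] : '';
if ($target !== '' && parse_url($target, PHP_URL_HOST) === 'www.ebay.com') {
    header('HTTP/1.1 302 Found');
    header('Location: ' . $target);
} else {
    header('Location: /'); // anything unexpected falls back to the front page
}
exit;
?>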

After submitting the site to Google, it took about one month to be fully indexed. At the start of the project I took the time to write a few “original” articles to spice up the website, just to see what the difference between the two would be… One extra side note: I have not used any linking techniques to improve the traffic or rating of the site. It was purely about the permalink & content effect.

The results after three months:

  • Visitors : ~30 / day
  • Article ratio : 5% (The articles have some hits, but 95% is on the eBay plugin.)
  • Income (last month) : 0.80€ (eBay redirect) & 0.10€ (adsense)
  • Ranking : Pagerank 0 & Alexa 0

The income is extremely low, but bear in mind that this website was just a small POC project to check what the impact of SEO was… Without any maintenance, it would be self-sufficient for its domain costs. Even this website has a lower revenue than that POC project… 😉

(NOTE: the domain will be taken offline in 9 months, when it’s due for renewal)

Dashes (hyphens) versus underscores

Cutting straight to the point: Dashes are better than underscores.

  • Google indexes on keywords in hyphenated urls but not on keywords in underscored or conjoined urls.
  • Yahoo indexes on keywords in hyphenated and underscored urls but not keywords in conjoined urls.
  • MSN indexes on some keywords in hyphenated, underscored and conjoined urls, but the exact circumstances in which it does so are at the moment unclear.

With underscores, Google’s programmer roots are showing. Lots of computer programming languages have identifiers like _MAXINT, which may be different from MAXINT. So if you have a url like word1_word2, Google will only return that page if the user searches for word1_word2 (which almost never happens). If you have a url like word1-word2, that page can be returned for the searches word1, word2, and even “word1 word2”.
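
If you generate permalinks yourself, a small helper can make sure the word separators end up as dashes rather than underscores. The sketch below assumes PHP; the function name and the regular expression are only illustrative:

php

<?php
// Turn a title into a dash-separated slug, e.g. "ACME Generic 2.0" becomes "acme-generic-2-0".
function make_slug($title) {
    $slug = strtolower($title);
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug); // collapse any non-alphanumeric run into one dash
    return trim($slug, '-');                          // strip leading/trailing dashes
}

echo make_slug('ACME Generic 2.0'); // prints "acme-generic-2-0"
?>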

References
Matt Cutts’ history lesson on the matter
SEO Blog’s experiment

Doing the 301 redirect

When a website has been online for a while and needs some maintenance, changing the structure is sometimes a necessary evil. The disadvantage is that this usually breaks the links to already indexed pages: the search engines (and thus also your visitors) will get a 404 (page not found), and your PageRank and such will be lost.

What can you do? Simply put a 301 redirect on it. This tells search engines that the page has been moved permanently, so the bot will adjust the referenced link in its database and show the correct URL in the future. How to do this?

php

Header( "HTTP/1.1 301 Moved Permanently" );
Header( "Location: http://www.new-url.com" );

asp

Response.Status="301 Moved Permanently";
Response.AddHeader("Location","http://www.new-url.com/");

asp.net

private void Page_Load(object sender, System.EventArgs e)
{
    Response.Status = "301 Moved Permanently";
    Response.AddHeader("Location", "http://www.new-url.com");
}

Ruby on Rails

def old_action
  headers["Status"] = "301 Moved Permanently"
  redirect_to "http://www.new-url.com/"
end

3 way linking – ABC linking

Every webmaster knows about the concept of getting a decent Alexa rating or Google PageRank. One of the factors used for PageRank is incoming links, so every webmaster started getting links to his/her website. This was mostly done by “link exchanging”: two webmasters trade a link to each other’s website, creating a kind of win-win situation.

So far so good? Yes and no… Of course Google changed their algorithm a bit, so that the weight of those exchanges decreased. Action causes reaction, and webmasters found an alternative: 3-way linking, a technique also known as ABC linking. It’s a slightly more complicated way to set up link trades.

The benefit of this kind of trading is that it is harder for search engine robots to detect the exchange. A link from site A to site B, where B does not link back to A but instead links on to site C, carries more weight, because it is not obvious that the owners of sites A and B are trying to inflate their number of incoming links.

Addendum
If you’re looking for a good tool to automate your link-exchanges, try “linkex”, a real timesaver!