What is Duplicate Content?
Duplicate content is content that appears at more than one URL on the web. It is one of the biggest threats to a website’s ranking, because search engines filter duplicate pages out of their results.
It is difficult for search engines to decide which version is the most relevant for a given query.
The effects of duplicate content are:
- The duplicate page will not be ranked, and its weight is nullified.
- Search engines don’t know whether to direct the link metrics to one page or to the duplicate version.
How does Google filter duplicate content?
- Google filters duplicate content by crawling, indexing and analysing the pages of each site with its crawlers (robots).
- The robots go through different websites, reading each page and saving it in their database. They then compare the data across sites using an algorithm that determines whether the content is spam, duplicate or original.
- Google compares the duplicate versions and returns the most relevant one for the given query.
- Google considers the age and authority of each page before deciding which one is the duplicate.
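The comparison step described above can be illustrated with a small sketch. Google's actual algorithm is not public, so this is only one well-known technique for spotting near-duplicate text: split each page into overlapping word groups ("shingles") and measure how much the two sets overlap (Jaccard similarity). The function names and the 0.8 threshold are assumptions made for illustration.

```python
from __future__ import annotations

import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams (shingles) in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        # Very short texts become a single shingle (or nothing at all).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 3) -> float:
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical rule: flag pages whose similarity reaches the threshold."""
    return jaccard_similarity(a, b) >= threshold
```

A score close to 1.0 suggests the two pages are largely copies of each other, while a score close to 0.0 suggests original content. Real systems also weigh signals such as the age and authority of the page, which this sketch ignores.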
What is the effect of ‘Thin’ and ‘Boiler-plate’ content?
- Google released the ‘Panda’ search algorithm in 2011 and updated it in 2014. Its main purpose was to award better rankings to websites with quality content and to penalize sites with ‘thin content’ that gives users little or no information.
- The updated Panda algorithm targets content farms and scraper websites, that is, sites that have scraped text from other websites.
- Boiler-plate content is content that is repeated throughout a website with little or no modification. It occurs on many ecommerce sites, where retailers copy the same content and proliferate it through all their pages without any changes. Google marks such content as a ‘spun article’ because it results in a high bounce rate.
A desktop application to detect ‘thin content’
“Screaming Frog” is a desktop application that can be used to detect thin content on a given site. It crawls all the pages and pulls out the SEO-related data, which can be exported to CSV files. A drawback of this approach is that it accounts only for the length of the content, not its value or type. It is best suited to finding thin content on wiki, blog, article and resource-oriented pages, and is less effective on pages made up of graphics, videos or news updates.
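Once the crawl data is in a CSV file, a length-based check like the one described above takes only a few lines. The following is a minimal sketch: the column names "Address" and "Word Count" are assumptions about the export and should be adjusted to match the actual file, and the 300-word threshold is only a rule of thumb.

```python
import csv
import io


def find_thin_pages(csv_file, min_words=300,
                    url_column="Address", count_column="Word Count"):
    """Return (url, word_count) pairs below min_words, shortest first.

    csv_file is any open text file or file-like object with a header row.
    The column names are assumptions about the crawl export.
    """
    thin = []
    for row in csv.DictReader(csv_file):
        try:
            words = int(row[count_column])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without a usable word count
        if words < min_words:
            thin.append((row.get(url_column, ""), words))
    return sorted(thin, key=lambda item: item[1])


if __name__ == "__main__":
    # Made-up sample data standing in for a real export file.
    sample = io.StringIO(
        "Address,Word Count\n"
        "https://example.com/a,120\n"
        "https://example.com/b,850\n"
    )
    for url, words in find_thin_pages(sample):
        print(f"{url}: {words} words")
```

As with the tool itself, this only measures length. A short page can still be valuable, so the result is a list of pages to review by hand, not a verdict.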
How is ‘Aggregated content’ Panda friendly?
Content that is taken from an external source is called ‘aggregated content’. Some websites, for example, publish content from guest authors to bring traffic to the site. Google can treat this as non-duplicated content under the following conditions:
- Links to the aggregated content are marked with the “nofollow” attribute.
- A commentary space is created, which adds unique, user-generated content to the page.
- Significant and unique value is added to the aggregated content (for example, taking a concept from another site and presenting it in your own voice).
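The first condition in the list above is easy to check automatically. The sketch below uses Python's standard HTML parser to list external links on a page that are missing rel="nofollow". The function name, the class name and the idea of passing in your own host name are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkAudit(HTMLParser):
    """Collect external links that do not carry rel="nofollow"."""

    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host
        self.followed_external = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        host = urlsplit(href).hostname
        if host is None or host == self.own_host:
            return  # relative or internal link, nothing to check
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" not in rel_tokens:
            self.followed_external.append(href)


def external_links_without_nofollow(html, own_host):
    """Return the hrefs of external links lacking rel="nofollow"."""
    parser = LinkAudit(own_host)
    parser.feed(html)
    parser.close()
    return parser.followed_external
```

Running this over pages that republish external material gives a quick list of links to review.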
How to avoid duplicate content on the website?
- The best way to avoid duplicate content on the website is to verify the content first, then re-word or modify it to create original content.
- Prepare content with relevant information rather than focusing only on keywords. Review your content to see whether it adds value, and use split testing and analytics to check.
- A domain that can be reached at two URLs, with WWW and without WWW, can be categorised as duplicate content. This can be rectified by using a 301 redirect, which redirects one version to the other. Verify and set your preferred domain in Google Webmaster Tools.
- Always post unique articles with more than 300 words.
- Stop reproducing the same content on all the pages, and also move the boiler plate content to separate pages.
- Add a noindex tag to a page to tell Google not to include that page in its index. Google can still follow the links it contains and index the content they point to.
- Taxonomy: this is the way in which your content is sorted. Content can be classified by category, by tag or by date, but using several of these at once results in duplicate content. Hence, it is best to organise the content under a single taxonomy: by category, by date or by tag.
- Getting Google authorship will help to establish you as the recognised author of the content.
- Be sure that syndicated content includes a link back to the source.
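The WWW versus non-WWW point in the list above comes down to one rule: pick a preferred host and send a 301 redirect from the other version. In practice this is usually configured in the web server, but the logic can be sketched in a few lines. The host name `www.example.com` and the function names are hypothetical, and the sketch ignores ports and credentials in the URL.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.example.com"  # hypothetical preferred domain


def _strip_www(host):
    """Drop a leading "www." so both host variants compare equal."""
    return host[4:] if host.startswith("www.") else host


def redirect_target(url, preferred_host=PREFERRED_HOST):
    """Return the URL to 301-redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if _strip_www(host) != _strip_www(preferred_host):
        return None  # a different site altogether; not handled here
    if host == preferred_host:
        return None  # already on the preferred version
    return urlunsplit(
        (parts.scheme, preferred_host, parts.path, parts.query, parts.fragment)
    )
```

A request handler would call `redirect_target` for each incoming URL and, when it returns a value, answer with status 301 and that value in the `Location` header. The path and query string are preserved, so every duplicate URL maps to exactly one preferred URL.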