Imagine working hard to write a great story or article, only to find someone else claiming it as their own. That’s what happens when people steal your website content.
Content stealing, or ‘scraping’, is a big problem for website owners. These people are thieves who copy your work, use it on their own sites, and sometimes even pretend it is theirs. This can be really frustrating and unfair.
In this article, we will cover what blog content scraping is, how you can reduce and prevent content scraping, and even how to take advantage of content scrapers for your own benefit.
What Is Blog Content Scraping in WordPress?
Blog content scraping is when content is taken from numerous sources and republished on another site. Usually, this is done automatically via your blog’s RSS feed.
Unfortunately, it is very easy and very common to have your WordPress blog content stolen in this way. If it has happened to you, then you understand how stressful and frustrating it can be.
Sometimes, your content will simply be copied and pasted directly to another website, including your formatting, images, videos, and more.
Other times, your content will be reposted with attribution and a link back to your website, but without your permission. Although this can help your SEO, you may want to keep your original content hosted on your site only.
Why Do Content Scrapers Steal Content?
Some of our users have asked us why scrapers are stealing content. Usually, the main motivation for content theft is to profit from your hard work:
- Affiliate commission: Dishonest affiliate marketers may use your content to bring traffic to their site through search engines in order to promote their niche products.
- Lead Generation: Lawyers and realtors may pay someone to add content to their sites to build authority in their community, without realizing that the content is being scraped from other sources.
- Advertising Revenue: Blog owners may scrape content to create a hub of knowledge in a certain niche ‘for the good of the community’ and then plaster the site with ads.
Is It Possible to Completely Prevent Content Scraping?
In this article, we will show you some steps you can take to reduce and prevent content scraping. But unfortunately, there is no way to completely stop a determined thief.
That’s why we end this article with a section on how you can take advantage of content scrapers. While you can’t always stop a thief, you may be able to gain some traffic and revenue from the content they have stolen from you.
What Should You Do When You Discover Someone Has Scraped Your Content?
Since it’s not possible to completely stop scrapers, you may one day discover that someone is using content they stole from your blog. You may wonder what to do when that happens.
Here are a few approaches that people take when dealing with content scrapers:
- Do Nothing: You can spend a lot of time fighting scrapers, so some popular bloggers decide to do nothing. Google already sees well-known sites as authorities, but that’s not true of smaller sites. So this approach is not always the best, in our opinion.
- Take Down: You can contact the scraper and ask them to take the content down. If they refuse, then you can submit a takedown notice. You can learn how in our guide on how to easily find and remove stolen content in WordPress.
- Take Advantage: While we actively work at having content scraped from WPBeginner taken down, we also use a few techniques to get traffic and make money from scrapers. You can learn how in the ‘Take Advantage of Content Scrapers’ section below.
With that being said, let’s take a look at how to prevent blog scraping in WordPress. Since this is a comprehensive guide, we have included a table of contents for easier navigation:
- Copyright or Trademark Your Blog’s Name and Logo
- Make Your RSS Feed More Difficult to Scrape
- Disable Trackbacks, Pingbacks, and REST API
- Block the Scraper’s Access to Your WordPress Website
- Prevent Image Theft in WordPress
- Discourage Manual Copying of Your Content
- Take Advantage of Content Scrapers
1. Copyright or Trademark Your Blog’s Name and Logo
Trademark and copyright laws protect your intellectual property rights, brand, and business against many legal challenges. This includes plagiarism and illegal use of your copyrighted material or your brand’s name and logo.
You should clearly display a copyright notice on your site. While your website content is automatically covered by copyright laws, displaying a notice lets visitors know that your content is copyrighted and that they cannot use your protected work for their own business.
For example, you can add a copyright notice with a dynamic date to your WordPress footer. This will keep your copyright notice up to date.
This may discourage some people from stealing your content. It will also help if you ever need to send a cease and desist letter or file a DMCA complaint to have your stolen content taken down.
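WordPress footers are PHP templates, so the exact code depends on your theme. To show the idea behind a dynamic date, here is a minimal Python sketch, assuming you know the year your site launched. The function name `copyright_notice` and the notice wording are hypothetical.

```python
from datetime import date
from typing import Optional


def copyright_notice(site_name: str, first_year: int, current_year: Optional[int] = None) -> str:
    """Return a copyright line whose end year is always the current year."""
    if current_year is None:
        current_year = date.today().year
    # Show a single year until the site is older than one calendar year.
    if first_year >= current_year:
        years = str(current_year)
    else:
        years = f"{first_year}\u2013{current_year}"  # en dash between the two years
    return f"\u00a9 {years} {site_name}. All rights reserved."


print(copyright_notice("Example Blog", 2009, 2025))  # © 2009–2025 Example Blog. All rights reserved.
```

Because the end year is computed every time the footer is rendered, the notice never goes stale.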
You can also apply for copyright registration online. This process can be complicated, but luckily, there are low-cost legal services that can help small businesses and individuals.
Learn how in our guide on how to trademark and copyright your blog’s name and logo.
2. Make Your RSS Feed More Difficult to Scrape
Since blog content scraping is usually done automatically via your blog’s RSS feed, let’s look at a few helpful changes you can make to your feed.
Don’t Include the Full Post Content in Your WordPress RSS Feed
You can include just a summary of each post in your RSS feed instead of the full content. This includes an excerpt as well as post metadata such as the date, author, and category.
There is certainly debate in the blogging community about whether to have full RSS feeds or summary feeds. We won’t get into that now except to say that one of the pros of just having a summary is that it helps prevent content scraping.
You can change the settings by going to Settings » Reading in your WordPress admin panel. You need to select the ‘Excerpt’ option and then click the ‘Save Changes’ button.
Now, the RSS feed will only show an excerpt of your article. If someone is stealing your content through your RSS feed, then they will only get the summary, not the full post.
If you would like to tweak the summary, then you can see our guide on how to customize WordPress excerpts.
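When a post has no manual excerpt, WordPress generates one automatically by stripping the post's HTML and keeping the first 55 words. The Python sketch below approximates that behavior so you can see roughly what a scraper receives from a summary feed. It is not WordPress's own code, which also handles shortcodes, blocks, and filters such as `excerpt_length`; the function name `auto_excerpt` is hypothetical.

```python
import re


def auto_excerpt(html: str, max_words: int = 55, more: str = " [\u2026]") -> str:
    """Roughly imitate an automatic excerpt: drop tags, keep the first words."""
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripper; fine for a sketch
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + more


print(auto_excerpt("<p>One two <b>three</b> four</p>", max_words=2))  # One two […]
```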
Optimize Your RSS Feed to Prevent Scraping
There are other ways you can optimize your WordPress RSS feed to protect your content, get more backlinks, increase your web traffic, and more. One of the best ways is to delay posts from appearing in the RSS feed.
The benefit is that when you delay posts from appearing in your RSS feed, you give the search engines time to crawl and index your content before it appears elsewhere, such as on scrapers’ websites. The search engines will then see your site as the authority.
The safest and easiest way to do this is using WPCode because it has a recipe that automatically adds the correct custom code to WordPress.
For detailed instructions, see our guide on how to delay posts from appearing in your WordPress RSS feed.
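The delay itself is a simple time comparison: a post only enters the feed once a waiting period has passed since it was published. A WordPress snippet would do this in PHP, typically by adjusting the feed query. The Python sketch below only illustrates the logic; the post dictionaries, the function name `feed_items`, and the 10-minute default are assumptions.

```python
from datetime import datetime, timedelta, timezone


def feed_items(posts, now, delay=timedelta(minutes=10)):
    """Return posts published at least `delay` before `now`, newest first."""
    cutoff = now - delay
    visible = [post for post in posts if post["published"] <= cutoff]
    return sorted(visible, key=lambda post: post["published"], reverse=True)


now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
posts = [
    {"title": "Older post", "published": now - timedelta(hours=2)},
    {"title": "Just published", "published": now - timedelta(minutes=3)},
]
print([post["title"] for post in feed_items(posts, now)])  # ['Older post']
```

The post published three minutes ago stays out of the feed until the delay has passed, giving search engines a head start on your own copy.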
3. Disable Trackbacks, Pingbacks, and REST API
In the early days of blogging, trackbacks and pingbacks were introduced as a way for blogs to notify each other about links. When someone links to a post on your blog, their website will automatically send a ping to yours.
This pingback will then appear in your blog’s comment moderation queue with a link to their website. If you approve it, then they get a backlink and mention from your site.
This gives spammers an incentive to scrape your site and send trackbacks. Luckily, you can disable trackbacks and pingbacks to give scrapers one less reason to steal your content.
For more information, check out our guide on disabling trackbacks on all future posts. You might also like to learn how to disable trackbacks and pings on existing WordPress posts.
Disable WordPress REST API
Aside from trackbacks and pingbacks, we also recommend disabling the WordPress REST API, as it can make it easier for spammers to scrape your content.
We have a detailed guide on how you can disable the WordPress REST API.
All you need to do is install and activate the free WPCode plugin and use its pre-made snippet to disable the REST API.
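To see why the REST API matters, note that on many WordPress sites a single anonymous request to `/wp-json/wp/v2/posts` returns post content as JSON, which is very convenient for scrapers. The Python sketch below checks whether your own site still exposes posts this way after you disable the API. It assumes pretty permalinks are enabled (sites without them use a `?rest_route=` URL instead), and the function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urljoin


def posts_endpoint(site_url: str) -> str:
    """Build the public REST API URL for the most recent post."""
    return urljoin(site_url.rstrip("/") + "/", "wp-json/wp/v2/posts?per_page=1")


def rest_api_exposes_posts(site_url: str, timeout: float = 10.0) -> bool:
    """True if an anonymous request gets post content back from the REST API."""
    request = urllib.request.Request(
        posts_endpoint(site_url), headers={"User-Agent": "rest-api-exposure-check"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            data = json.load(response)
    except (OSError, ValueError):
        # HTTP errors such as 401/403, network failures, and non-JSON replies
        # all mean the content was not served to us.
        return False
    return isinstance(data, list) and bool(data) and isinstance(data[0], dict) and "content" in data[0]


# Example (placeholder domain): print(rest_api_exposes_posts("https://example.com"))
```

If the check still returns `True` after you add the snippet, clear any page or server cache and try again.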
4. Block the Scraper’s Access to Your WordPress Website
One way to stop scrapers from stealing your content is to take away their access to your website. You can do this manually by blocking their IP addresses, but most users will find it easier to use a security plugin or a web application firewall.
Block the Scraper Using a Security Plugin (Recommended)
Blocking scrapers manually is tricky and a lot of work, especially since many hacking attempts and attacks come from a wide range of random IP addresses all over the world. It’s almost impossible to keep up with all of them.
That’s why you need a Web Application Firewall (WAF) such as Wordfence or Sucuri. These act as a shield between your website and all incoming traffic, monitoring your website traffic and blocking common security threats before they reach your WordPress site.
For the WPBeginner website, we use Sucuri. It is a website security service that protects your site against such attacks using a web application firewall.
Basically, all your website traffic goes through the security service’s servers, where it is examined for suspicious activity. They automatically block suspicious IP addresses from reaching your website altogether. See how Sucuri helped us block 450,000 WordPress attacks in 3 months.
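Firewalls combine many signals, but one of the simplest is rate limiting: an IP that makes far more requests than a human reader could is treated as suspicious. The Python sketch below shows a sliding-window version of that idea. It is not how Sucuri or Wordfence actually work, and the class name and limits are placeholder assumptions.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Block an IP once it sends more than `limit` requests within `window` seconds."""

    def __init__(self, limit=60, window=60.0):
        self.limit = limit
        self.window = window
        self.recent = defaultdict(deque)  # ip -> timestamps of its recent requests
        self.blocked = set()

    def allow(self, ip, now):
        """Record a request from `ip` at time `now` (seconds) and say whether to serve it."""
        if ip in self.blocked:
            return False
        timestamps = self.recent[ip]
        timestamps.append(now)
        # Forget requests that have slid out of the time window.
        while timestamps and timestamps[0] <= now - self.window:
            timestamps.popleft()
        if len(timestamps) > self.limit:
            self.blocked.add(ip)  # a real firewall would usually block for a limited time
            return False
        return True


limiter = RateLimiter(limit=3, window=10)
print([limiter.allow("203.0.113.7", t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
```

A scraper fetching dozens of pages per second trips the limit almost immediately, while a normal visitor never gets close.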
Manually Block or Redirect the Scraper’s IP Address
Advanced users may also wish to manually block a scraper’s IP address. This is more work, but it lets you target the scraper’s address specifically once you know it. Web developer Jeff Starr suggests this approach when he writes about how he handles content scrapers.
Note: Adding code to website files can be dangerous. Even a small mistake can cause major errors on your site. That’s why we only recommend this method for advanced users.
You can find the scraper’s IP address by visiting ‘Raw Access Logs’ in the cPanel dashboard of your web hosting account. You need to look for IP addresses with an unusually high number of requests and keep a record of them, say by copying them into a separate text file.
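If your log file is large, counting requests by hand is impractical. The Python sketch below tallies requests per IP, assuming the log uses the common Apache/Nginx "combined" format, where each line starts with the client IP. The function name, file name, and threshold are placeholders you would adjust.

```python
import re
from collections import Counter

# In the common/combined log format, each line starts: IP ident user [timestamp] ...
LOG_LINE = re.compile(r"^(\S+) \S+ \S+ \[")


def top_requesters(lines, threshold):
    """Return (ip, request_count) pairs at or above `threshold`, busiest first."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # skip lines that are not in the expected format
            counts[match.group(1)] += 1
    return [(ip, n) for ip, n in counts.most_common() if n >= threshold]


# Example usage (file name is a placeholder; use gzip.open(path, "rt") for .gz logs):
# with open("access.log", encoding="utf-8", errors="replace") as log:
#     for ip, count in top_requesters(log, threshold=1000):
#         print(ip, count)
```

The output is a shortlist to investigate, not a blocklist: busy IPs can also be search engines or your own uptime monitor.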
Tip: You need to make sure that you don’t end up blocking yourself, legitimate users, or search engines from accessing your website. Copy a suspicious-looking IP address and use online IP lookup tools to find out more about it.
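The user agent string alone is easy to fake, so Google documents verifying its crawler with a reverse DNS lookup followed by a forward lookup, and Bing offers a similar check for Bingbot. The Python sketch below checks that an IP resolves to a hostname under googlebot.com, google.com, or search.msn.com, and that the hostname resolves back to the same IP. It only handles IPv4 forward lookups, the function names are hypothetical, and the list of trusted domains is an assumption you should check against each search engine's current documentation.

```python
import socket

# Hostname suffixes used by Google's and Bing's crawlers (verify against current docs).
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def hostname_is_trusted(hostname):
    """True if the hostname belongs to one of the trusted crawler domains."""
    return hostname.lower().rstrip(".").endswith(TRUSTED_SUFFIXES)


def is_verified_crawler(ip):
    """Reverse-resolve the IP, check the domain, then confirm the forward lookup matches."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:  # no PTR record, or the lookup failed
        return False
    if not hostname_is_trusted(hostname):
        return False
    try:
        _name, _aliases, forward_ips = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    # A spoofed PTR record will not resolve back to the same address.
    return ip in forward_ips
```

Checking the suffix with a leading dot matters: it accepts `crawl-1.googlebot.com` but rejects look-alikes such as `fakegooglebot.com`.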
Once you are confident that the IP address belongs to a scraper, you can block it using the cPanel ‘IP Blocker’ tool or by adding code like this in your root .htaccess file:
```apache
# Block one scraper IP (Apache 2.2 syntax; Apache 2.4 needs mod_access_compat for this)
Deny from 123.45.67.89
```
Make sure you replace the IP address in the code with the one you want to block. You can block multiple IP addresses by entering them on the same line, separated by spaces.
For detailed instructions, see our guide on how to block IP addresses in WordPress.
Instead of simply blocking the scrapers, Jeff suggests you could send them dummy RSS feeds instead. You could create feeds full of Lorem Ipsum and annoying images or even send them right back to their own website, causing an infinite loop and crashing their server.
To redirect them to a dummy feed, you will need to add code like this to your .htaccess file:
```apache
# Send requests from the scraper's IP to a dummy feed instead (requires mod_rewrite)
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^123\.45\.67\.89$
RewriteRule .* http://dummyfeed.com/feed [R,L]
```
5. Prevent Image Theft in WordPress
It’s not just your written content that you need to protect. You should also prevent image theft in WordPress.
Like text, there is no way to completely stop people from stealing your images, but there are plenty of ways to discourage image theft on a WordPress website.
For example, you can disable the hotlinking of your WordPress images. This will mean that if someone scrapes your HTML content, their images will not load on their site.
It will also reduce your server load and bandwidth usage, boosting your WordPress speed and performance.
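Hotlink protection is usually configured with server rules (such as `.htaccess`) or in your CDN, but the decision behind it is simple: look at the `Referer` header of each image request and refuse requests that come from someone else's pages. The Python sketch below shows only that decision; `allow_image_request` is a hypothetical name and `own_hosts` is a placeholder for your own domain names.

```python
from urllib.parse import urlparse


def allow_image_request(referer, own_hosts):
    """Decide whether to serve an image based on the request's Referer header."""
    if not referer:
        # Direct visits, some feed readers and privacy tools send no Referer at all.
        return True
    host = (urlparse(referer).hostname or "").lower()
    return host in own_hosts


OWN_HOSTS = {"example.com", "www.example.com"}  # placeholder domain names
print(allow_image_request("https://www.example.com/my-post/", OWN_HOSTS))    # True
print(allow_image_request("https://scraper.example.net/copied/", OWN_HOSTS))  # False
```

Allowing an empty `Referer` avoids breaking images for legitimate readers; you may also want to add search engine domains so your images still appear in image search.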
Alternatively, you can add a watermark to your images that gives you credit. This will make it clear that the scraper has stolen your content.
You can learn these two techniques, as well as other ways to protect your images, in our guide on ways to prevent image theft in WordPress.
6. Discourage Manual Copying of Your Content
While most scrapers use automatic tools, some content thieves may try to manually copy all or part of your content.
One way to make this more difficult is to prevent them from copying and pasting your text. You can do this by making it harder for them to select the text on your website.
To learn how to stop manual copying of your content, see our step-by-step guide on how to prevent text selection and copy/paste in WordPress.
However, this will not completely protect your content. Remember, tech-savvy users can still view the source code or use the Inspect tool to copy anything they want. Also, this method will not work with all web browsers.
Keep in mind that not everyone copying your text is a content thief. For instance, some people may want to copy the title to share your post on social media.
That’s why we recommend you only use this method if you feel it’s truly needed for your site.
7. Take Advantage of Content Scrapers
As your blog gets larger, it is almost impossible to stop or keep track of all content scrapers. We still send out DMCA complaints. However, we know that there are tons of other sites that are stealing our content that we just cannot keep up with.
Instead, our approach is to try to take advantage of content scrapers. It’s not so bad when you see that you’re making money from your stolen content or receiving a lot of traffic from a scraper’s website.
Make Internal Linking a Habit to Gain Traffic and Backlinks from Scrapers
In our ultimate guide on SEO, we recommend that you make internal linking a habit. By placing links to your other content in your blog posts, you can increase pageviews and reduce the bounce rate on your own site.
But there is a second benefit when it comes to scraping. Internal links will get you valuable backlinks from the people who are stealing your content. Search engines like Google use backlinks as a ranking signal, so the additional backlinks are good for your SEO.
Lastly, these internal links allow you to steal the scraper’s audience. Talented bloggers place links on interesting keywords, making it tempting for users to click. Visitors to the scraper’s website will also click the links, which will lead them straight back to your own website.
Auto Link Keywords With Affiliate Links to Make Money from Scrapers
If you make money from affiliate marketing on your website, then we recommend enabling auto-linking in your RSS feeds. This will help you maximize your earnings from people who only follow your website through RSS readers.
Even better, it will help you make money from the sites that are stealing your content.
Simply use a WordPress plugin like ThirstyAffiliates that will automatically replace assigned keywords with affiliate links. We show you how in our guide on how to automatically link keywords with affiliate links in WordPress.
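Plugins like ThirstyAffiliates handle this for you, and their exact rules differ. To illustrate the basic idea, here is a Python sketch that links the first occurrence of each keyword, assuming plain-text input (a real plugin must also avoid touching existing links and HTML attributes). The function name, keyword map, and URLs are hypothetical; `rel="sponsored nofollow"` follows Google's guidance for marking affiliate links.

```python
import html
import re


def auto_link(text, links):
    """Wrap the first occurrence of each keyword in an affiliate link (plain-text input)."""
    if not links:
        return text
    lookup = {keyword.lower(): url for keyword, url in links.items()}
    # Try longer keywords first so "web hosting" is preferred over "hosting".
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in alternatives) + r")\b", re.IGNORECASE
    )
    already_linked = set()

    def replace(match):
        word = match.group(0)
        key = word.lower()
        if key in already_linked:
            return word  # link each keyword only once per post
        already_linked.add(key)
        url = html.escape(lookup[key], quote=True)
        return f'<a href="{url}" rel="sponsored nofollow">{word}</a>'

    return pattern.sub(replace, text)
```

Because the replacement happens in the content itself, the links travel with your post into RSS feeds and onto any site that scrapes them.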
Promote Your Website in Your RSS Footer
You can use the All in One SEO plugin to add custom items to your RSS footer.
For example, you can add a banner that promotes your own products, services, or content.
The best part is that those banners will appear on the scraper’s website as well.
In our case, we always add a little disclaimer at the bottom of posts in our RSS feeds. By doing this, we get a backlink to the original article from the scraper’s site.
This lets Google and other search engines know we are the authority. It also lets their users know that the site is stealing our content.
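All in One SEO lets you set this up from its settings, so no code is required. The Python sketch below only shows what the result looks like: an attribution paragraph, with links back to the original post and your homepage, appended to each feed item's content. The wording and the function name `add_feed_footer` are hypothetical.

```python
from html import escape


def add_feed_footer(content_html, post_title, post_url, site_name, site_url):
    """Append an attribution paragraph that links back to the original post and site."""
    footer = (
        f'<p>The post <a href="{escape(post_url)}">{escape(post_title)}</a> '
        f'first appeared on <a href="{escape(site_url)}">{escape(site_name)}</a>.</p>'
    )
    return content_html + footer


print(add_feed_footer("<p>Body</p>", "How to X", "https://example.com/x/", "Example", "https://example.com"))
```

When a scraper republishes the feed item unchanged, this paragraph is republished too, along with both backlinks.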
For more tips, check out our guide on how to control your RSS feed footer in WordPress.
We hope this tutorial helped you learn how to prevent blog content scraping in WordPress. You may also want to see our ultimate WordPress security guide or our expert pick of the best content protection plugins for WordPress.