256 Kilobytes

Spoofing Your User-Agent as Googlebot: Avoid Paywalls, Circumvent Log-In Requirements, and Exploit Other Quality Loopholes

By August R. Garcia

Published Mon, 31 Dec 2018 22:55:38 -0800 | Last updated Mon, 18 Feb 2019 23:51:35 -0800

Do you really think someone would do that? Just go on the Internet and tell lies?


While there are ways to detect whether a request actually comes from Googlebot or whether the user-agent is being spoofed, many sites accept the user-agent at face value, which often results in behavior somewhere between "moderately amusing" and "occasionally useful."

User-Agent Spoofing Overview

What is a user-agent?

When a web browser, bot, or other client makes a request to a web server for a webpage, it sends various pieces of metadata that help the server return content that will work best for that client. One such piece of information is the user-agent. As described by the Mozilla developer documentation (MDN):

The User-Agent request header contains a characteristic string that allows the network protocol peers to identify the application type, operating system, software vendor or software version of the requesting software user agent.

Source: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent

For a regular human user, this is all handled behind the scenes by the browser. For example, a user running Opera 12 on Windows XP may send a user-agent string along these lines:

Opera/9.80 (Windows NT 5.1) Presto/2.12.388 Version/12.18

The server may then use this information to adjust the user experience. For example, if the user-agent indicates that the user is on a mobile device, the server may return a version of the website optimized for mobile users.
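As a rough illustration of that kind of server-side adjustment, here is a minimal Python sketch. The function name `choose_layout` and the list of "mobile" marker substrings are assumptions made for this example, not how any particular site does it; real device detection is considerably more involved.

```python
from typing import Optional

# Hypothetical, deliberately naive markers; production sites use much richer detection.
MOBILE_MARKERS = ("Mobile", "Android", "iPhone", "iPad", "Opera Mini")


def choose_layout(user_agent: Optional[str]) -> str:
    """Return "mobile" if the user-agent looks like a mobile browser, else "desktop"."""
    if not user_agent:
        return "desktop"  # header missing or empty: serve the default layout
    if any(marker in user_agent for marker in MOBILE_MARKERS):
        return "mobile"
    return "desktop"


print(choose_layout("Opera/9.80 (Windows NT 5.1) Presto/2.12.388 Version/12.18"))  # desktop
print(choose_layout(
    "Mozilla/5.0 (iPhone; CPU iPhone OS 12_1 like Mac OS X) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/12.0 Mobile/15E148 Safari/604.1"
))  # mobile
```

Because the decision rests entirely on a string the client sends, changing that string changes what the server returns, which is the whole premise of user-agent spoofing.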

What is user-agent spoofing?

Since user-agents are reported entirely by the client (such as your browser), they can be altered to any arbitrary string at the discretion of the user. The video below demonstrates a hacker spoofing his user-agent:

Sending a user-agent that differs from the default ("correct") one your browser would otherwise send is known as user-agent spoofing (for example, identifying as Firefox on Linux when you are actually running Opera on Windows 10).

What does it mean to spoof a user-agent as Googlebot?

Robots that make requests to websites generally send a user-agent that identifies them and provides additional information about the bot (although not all bots do so, and some send inaccurate or misleading user-agents). Since user-agents are set at the sole discretion of the user (or, in this case, the bot's creator), the string can be anything the developers want. The user-agents that Google's engineers set for Googlebot are generally strings similar to the following:

Googlebot/2.1 (+http://www.googlebot.com/bot.html)

In addition to the classic Googlebot, Google also has a number of other bots, as listed in the documentation on Google crawlers:

Spoofing a user-agent as Googlebot means setting your user-agent (or that of a bot you've created) to a string that identifies the request as coming from Googlebot.

How to Spoof a User-Agent

Why Spoof a User-Agent as Googlebot?

As stated in the introduction of this article:

While there are ways to detect whether a request actually comes from Googlebot or whether the user-agent is being spoofed, many sites accept the user-agent at face value, which often results in behavior somewhere between "moderately amusing" and "occasionally useful."
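Google documents one such detection method: do a reverse DNS lookup on the requesting IP address, check that the resulting hostname belongs to a Google domain, then do a forward DNS lookup on that hostname and confirm it resolves back to the same IP. The sketch below is one way to write that in Python with only the standard library. The function names, the injectable resolver parameters (added so the logic can be tested without network access), and the exact domain list are assumptions for illustration; check Google's current crawler documentation for the authoritative list. The forward lookup shown only handles IPv4.

```python
import socket
from typing import Callable, List

# Assumed list of Google crawler domains; confirm against Google's current docs.
GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """PTR lookup: socket.gethostbyaddr returns (hostname, aliases, addresses)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """IPv4-only lookup: socket.gethostbyname_ex returns (name, aliases, addresses)."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """True only if the IP reverse-resolves to a Google crawler domain AND
    that hostname forward-resolves back to the same IP."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    # A matching hostname alone is not proof: whoever controls an IP's PTR
    # record can make it claim any name, hence the forward confirmation.
    if not hostname.endswith(GOOGLE_CRAWLER_DOMAINS):
        return False
    try:
        return ip in forward(hostname)
    except OSError:
        return False
```

Calling `is_verified_googlebot("66.249.66.1")` with the default resolvers performs real DNS queries. A spoofer can set the user-agent, and even the reverse DNS of their own IP, but cannot make a googlebot.com hostname resolve to their address. DNS lookups are slow, so a site would typically cache the result per IP.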

How to Spoof Your User-Agent as Googlebot

In general, the easiest way to spoof your user-agent (when browsing as a human, as opposed to a bot) is to install a user-agent spoofing browser plugin. This plugin works fine for Googlebot, Bingbot, and Yahoo!'s crawlers, as well as most common user-agents:

When writing a custom bot, you can generally set the user-agent through a function, method, or option built into whichever language or library you use for web crawling and/or scraping; every major one supports this.
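For example, with Python's standard-library `urllib.request`, the user-agent is just another request header. A minimal sketch, where the helper name `build_request` and the choice of the Googlebot string quoted earlier in this article are illustrative assumptions:

```python
import urllib.request

# The Googlebot string quoted earlier; any string works, since the server
# only ever sees what the client chooses to send.
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"


def build_request(url: str, user_agent: str = GOOGLEBOT_UA) -> urllib.request.Request:
    """Build a GET request that reports `user_agent` instead of Python's default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("https://example.com/")
# urllib normalizes header names with str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))  # Googlebot/2.1 (+http://www.googlebot.com/bot.html)

# To actually send it:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     html = resp.read().decode("utf-8", errors="replace")
```

Third-party HTTP libraries work the same way; with `requests`, for instance, you would pass `headers={"User-Agent": ...}` to `requests.get`.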

Quality Hacks Achievable by Spoofing Your User-Agent as Googlebot

Quora, Forbes, and Tumblr are classic examples of three use cases for spoofing a user-agent as Googlebot:

  • Getting around flexible sampling restrictions, such as metering and lead-in restrictions (Quora)
  • Avoiding advertisements (Forbes)
  • Accessing login-required areas of websites without logging in (Tumblr)

Quora

Quora.com is a classic example of flexible sampling. When a logged-out user first clicks through to Quora (likely from a search result, since much of Quora's traffic comes from long-tail keywords phrased as questions), the user can view the full page they clicked through to. Upon clicking any link to another page, a log-in prompt pops up, requiring the user to log in to continue.

However, this would cripple Googlebot's ability to crawl Quora (Googlebot does not log into websites) and severely hinder Quora's ability to rank in the SERPs. Quora therefore makes an exception for requests that carry the Googlebot user-agent. Setting your own user-agent to Googlebot lets you get around the restriction as well.
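To make the loophole concrete, here is a hypothetical sketch of a metered gate of this kind. It is not Quora's actual code (which isn't public); the function name, its parameters, and the one-free-page rule are assumptions that mirror the behavior described above.

```python
from typing import Optional

GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
BROWSER_UA = "Opera/9.80 (Windows NT 5.1) Presto/2.12.388 Version/12.18"


def needs_login_prompt(user_agent: Optional[str], logged_in: bool, pages_viewed: int) -> bool:
    """Hypothetical flexible-sampling gate.

    pages_viewed is the number of pages this visitor has already seen, so the
    first page (pages_viewed == 0) is free and later pages require a login,
    unless the client *claims* to be Googlebot.
    """
    if logged_in:
        return False
    # Trusting a client-supplied header at face value: this is the loophole.
    if user_agent and "Googlebot" in user_agent:
        return False
    return pages_viewed >= 1


print(needs_login_prompt(BROWSER_UA, logged_in=False, pages_viewed=1))    # True
print(needs_login_prompt(GOOGLEBOT_UA, logged_in=False, pages_viewed=1))  # False
```

The weak point is the substring check on a header the client controls. A site that wanted to close the loophole would verify the requesting IP address (for example with a reverse-then-forward DNS lookup) instead of trusting the header.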

Forbes

The fucking massive advertisement that Forbes shows when users first land on the site is worse than AIDS. Spoofing your user-agent as Googlebot will prevent this advertisement from showing, since otherwise Forbes would be much more obnoxious and time-consuming to crawl.

Of course, now you're reading content on Forbes.com, which is somehow even worse.

Tumblr

Until roughly four seconds ago, Tumblr consisted entirely of Internet pornography. Often, search results would appear for pages that would then require you to log in. Graphic and/or adult content required users to be logged in, since this content was placed behind an age-verification/NSFW filter.

However, since Googlebot does not log into websites, an exception is made for Googlebot. By spoofing your user-agent to Googlebot, you can access Tumblr's age-restricted material without logging in. 



August Garcia is some guy who used to sell Viagra on the Internet. He made this website to LARP as a sysadmin while posting about garbage like user-agent spoofing, spintax, the only good keyboard, virtual assistants from Pakistan, links with the rel="nofollow" attribute, proxies, sin, the developer console, literally every link building method, and other junk.

Available at arg@256kilobytes.com, via Twitter, or arg.256kilobytes.com. Open to business inquiries based on availability.




Seems like some websites look at the HTTP_REFERER value as well. I.e., websites check to see if you clicked through from Google. If yes, show the page (and then show a paywall/force a login/whatever) when you try to go to another page; else, always wall/other. Seems like LinkedIn might do this, based on a few tests from the other day, but don't quote me on that.
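A minimal sketch of the kind of Referer check described above, in Python. The function name and the pattern of accepted Google hostnames are assumptions for illustration (it covers forms like google.com, google.de, google.co.uk, and google.com.au, not every Google domain), and nothing here is confirmed to be what LinkedIn or any other site actually does:

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical pattern: "google." at the start of the host or after a dot,
# followed by .com, a two-letter TLD, co.XX, or com.XX, then end of host.
GOOGLE_HOST = re.compile(r"(?:^|\.)google\.(?:com|[a-z]{2}|co\.[a-z]{2}|com\.[a-z]{2})$")


def came_from_google(referer: Optional[str]) -> bool:
    """Return True if the Referer header's hostname looks like a Google domain."""
    if not referer:
        return False  # no Referer sent: treat as a direct visit
    host = urlsplit(referer).hostname or ""  # .hostname is already lowercased
    return GOOGLE_HOST.search(host) is not None


print(came_from_google("https://www.google.com/"))  # True
print(came_from_google("https://evilgoogle.com/"))  # False
```

Like the user-agent, the Referer header is supplied by the client, so a check like this is just as easy to spoof.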

Comment posted by August R. Garcia at 14 February, 2019 06:20 AM PST
