256 Kilobytes

[MatthewGraham] [Easy Trick] How to Scrape All Competitor URLs, Titles, Descriptions, and Meta KWs in 5 Minutes Max

Articles in Internet Marketing | By August R. Garcia

Published Sat, 13 Apr 2019 01:11:43 -0700 | Last updated Sat, 13 Apr 2019 01:12:12 -0700

Old/archive.


Step 1: Find a Competitor's Sitemap

Basically any site large enough to be worth scraping will have an XML sitemap. It is usually located here:

  • [domain].com/sitemap.xml
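If you'd rather script this step, here is a minimal Python sketch. It assumes the sitemap sits at the conventional /sitemap.xml path over HTTPS. It also scans robots.txt text you have already downloaded, because many sites announce their sitemap there with a "Sitemap:" line (part of the sitemaps.org protocol). The helper name candidate_sitemap_urls is hypothetical.

```python
def candidate_sitemap_urls(domain: str, robots_txt: str = "") -> list[str]:
    """Return likely sitemap URLs for a domain (hypothetical helper).

    The conventional /sitemap.xml location comes first; any "Sitemap:" lines
    found in the supplied robots.txt text are appended, skipping duplicates.
    """
    candidates = [f"https://{domain}/sitemap.xml"]  # assumed default location
    for line in robots_txt.splitlines():
        # partition() splits on the FIRST colon, so the URL's own colon stays intact.
        key, sep, value = line.partition(":")
        # The directive name is matched case-insensitively.
        if sep and key.strip().lower() == "sitemap":
            url = value.strip()
            if url and url not in candidates:
                candidates.append(url)
    return candidates
```

For example, if robots.txt contains "Sitemap: https://example.com/sitemap_index.xml", the function returns the default /sitemap.xml first and the announced index second, so you can try them in order.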

Step 2: Copy the Full Sitemap

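The easiest way to do this is a simple Ctrl+A (select all) followed by Ctrl+C (copy) while viewing the sitemap in your browser. If the file is a sitemap index (a list of other sitemaps rather than pages), open and copy each of the sitemaps it links to.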
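Copy-pasting works fine for one sitemap. For several (for example, every child sitemap listed in a sitemap index), a short standard-library Python sketch can download them instead. The function name fetch_text and the User-Agent string are placeholders of my own, not anything the original workflow requires.

```python
import urllib.request


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download a page (e.g. a sitemap) and return it as text.

    Hypothetical helper: a scripted stand-in for Ctrl+A / Ctrl+C.
    The User-Agent value below is an arbitrary placeholder.
    """
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0 (sitemap-sketch)"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Use the charset the server declares, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage would look like xml = fetch_text("https://example.com/sitemap.xml"). Be polite when fetching many files: add a delay between requests.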

Step 3: Extract the URLs

Go to this URL:

Paste (Ctrl+V) the copied sitemap into the "Test String" field. Next, paste this into the "Regular Expression" field:

Code:

    http[^<"'\n\r]*

If you're wondering, this matches every string that starts with "http" and continues until it runs into a "<", a single or double quote, or a line break. That captures every URL in the sitemap. It may also pick up a few non-page URLs, such as the sitemaps.org namespace in the <urlset> tag, which you can simply delete.
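The same pattern can be run in Python instead of an online tester. This sketch applies the regex above verbatim, drops duplicates, and skips a few namespace/schema URLs (sitemaps.org, w3.org, Google's sitemap extension schemas) that appear in sitemap markup but aren't pages; that skip list is an assumption, so extend it as needed. If you have the raw XML rather than text copied from the browser view, parsing the <loc> elements with xml.etree.ElementTree is a sturdier alternative, also shown. Function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The pattern from Step 3, unchanged: "http" followed by anything except
# "<", a double quote, a single quote, or a line break.
URL_PATTERN = re.compile(r"""http[^<"'\n\r]*""")

# Assumed list of namespace/schema prefixes that are not real pages.
SKIP_PREFIXES = (
    "http://www.sitemaps.org/",
    "https://www.sitemaps.org/",
    "http://www.w3.org/",
    "http://www.google.com/schemas/",
)


def extract_urls_regex(text: str) -> list[str]:
    """Regex approach (hypothetical helper): raw XML or browser-copied text."""
    matches = (m.strip() for m in URL_PATTERN.findall(text))
    # dict.fromkeys() removes duplicates while keeping first-seen order.
    return [u for u in dict.fromkeys(matches) if not u.startswith(SKIP_PREFIXES)]


def extract_locs_xml(xml_text: str) -> list[str]:
    """Parser approach (hypothetical helper): text of every <loc> element.

    Works for <urlset> sitemaps and for <sitemapindex> files, whose <loc>
    entries point at child sitemaps. Requires well-formed XML.
    """
    root = ET.fromstring(xml_text)
    locs = []
    for elem in root.iter():
        # Tags include the namespace, e.g. "{http://www.sitemaps.org/...}loc".
        if elem.tag.rsplit("}", 1)[-1] == "loc" and elem.text:
            locs.append(elem.text.strip())
    return locs
```

The regex version is forgiving about input format; the parser version won't be fooled by URLs that happen to appear in other attributes, but it fails on text that isn't valid XML.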

Step 4: Extract the Meta Tags

This is one of several free tools that can do this:

The tool has no usage cap and requires no account. It extracted data for 5,000 URLs in roughly 1 to 3 minutes. Scroll to the bottom of the page to download the results as a CSV.

For each URL, the CSV will contain the HTML title tag, the meta description, and the meta keywords.
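If you'd rather not depend on a third-party tool, the extraction itself is simple. Below is a minimal sketch using Python's built-in html.parser to pull the same three fields from one page's HTML and write rows in a url/title/description/keywords CSV layout. The class, function, and column names are assumptions, and the sketch does no fetching, rate limiting, or error handling.

```python
import csv
import io
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Hypothetical helper: collects <title>, meta description, meta keywords."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self.keywords = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; normalize them to "".
        attrs = {k.lower(): (v or "") for k, v in attrs}
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Attribute VALUES keep their case, so lower-case name="Description".
            name = attrs.get("name", "").lower()
            if name == "description":
                self.description = attrs.get("content", "").strip()
            elif name == "keywords":
                self.keywords = attrs.get("content", "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_meta(html: str) -> dict[str, str]:
    """Return the three fields for one page's HTML."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "description": parser.description,
            "keywords": parser.keywords}


def rows_to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize rows (url plus the three fields) as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["url", "title", "description", "keywords"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Feeding each page's HTML through extract_meta() and adding the page URL to each result gives rows ready for rows_to_csv(); the csv module quotes fields that contain commas, such as comma-separated keywords.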

Step 5: Exploit the Data for Profit

At this point, you have the data. Use your imagination; there's a lot you can do with it, like checking search volume, finding seed keywords, etc.

Bonus: Quickly Parse Common Data from Common Conventions/Locations

Quick tricks to parse data in the scraped fields:

  • Meta Keywords
    • Some sites use the meta keywords field to keep track of keywords, which gives you a ton of comma-separated keywords in the CSV's "Meta Keywords" field to work with right off the bat
    • No parsing required
  • Title Tags
    • Title tags are also a great place to start and frequently contain keywords verbatim.
      • It's very common for title tags to be formatted like these:
        • Some Keyword Here | Non-SEO Comment Here
        • Buy Green Shoes | We're the #1 Company!
        • How to Buy a Couch | Top 3 Tricks
    • You can split those up into multiple cells with this Google Sheets formula (TRIM removes the spaces left around each "|"):
      • =ARRAYFORMULA(TRIM(SPLIT(B2,"|")))
  • URL
    • Keywords are often included in the URL, separated by dashes. You can extract them with these formulas:
      • =REGEXREPLACE(REGEXEXTRACT(A2, "\.com\/(.*)"), "[^a-zA-Z0-9]", " ")
        • Removes the base URL and replaces every non-alphanumeric character in the path with a space
      • =REGEXREPLACE(REGEXEXTRACT(A2, "\.com\/(.*)"), "[^a-zA-Z]", " ")
        • Replaces every non-alphabetical character with a space, so numbers are removed as well
      • Note: If the domain you scraped is not a .com (a .net or other TLD), change the ".com" in either of those to your TLD.
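The same three parsing tricks can be done in Python once the CSV is loaded. This sketch mirrors the spreadsheet formulas above; the function names are hypothetical. Two deliberate differences: url_path_words() uses urllib.parse.urlparse() to isolate the path, so it works for any TLD without editing a ".com" pattern and ignores any query string that the spreadsheet's "(.*)" would capture; it also collapses runs of separators into a single space instead of one space per character.

```python
import re
from urllib.parse import urlparse


def split_meta_keywords(keywords: str) -> list[str]:
    """Hypothetical helper: comma-separated meta keywords -> trimmed list."""
    return [kw.strip() for kw in keywords.split(",") if kw.strip()]


def split_title(title: str) -> list[str]:
    """Hypothetical counterpart of =SPLIT(B2,"|"), with each piece trimmed."""
    return [part.strip() for part in title.split("|") if part.strip()]


def url_path_words(url: str, keep_digits: bool = True) -> str:
    """Hypothetical counterpart of the REGEXREPLACE/REGEXEXTRACT formulas.

    keep_digits=True mirrors "[^a-zA-Z0-9]"; False mirrors "[^a-zA-Z]".
    """
    path = urlparse(url).path  # domain, TLD, and query string are excluded
    pattern = r"[^a-zA-Z0-9]+" if keep_digits else r"[^a-zA-Z]+"
    return re.sub(pattern, " ", path).strip()
```

For instance, url_path_words("https://example.net/blog/top-3-couch-tips/") yields "blog top 3 couch tips", and the same call with keep_digits=False yields "blog top couch tips".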

