256 Kilobytes

[MatthewGraham] [Easy Trick] How to Scrape All Competitor URLs, Titles, Descriptions, and Meta KWs in 5 Minutes Max

Articles in Internet Marketing | By August R. Garcia

Published Sat, 13 Apr 2019 01:11:43 -0700 | Last update Sat, 13 Apr 2019 01:12:12 -0700

Old/archive.


Step 1: Find a Competitor's Sitemap

Basically any site large enough to be worth scraping will have an XML sitemap. It is usually located here:

  • [domain].com/sitemap.xml
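If you'd rather script this step, here's a minimal Python sketch (3.9+, standard library only) that builds the conventional /sitemap.xml URL for a domain. It assumes the site is served over https. If you also pass in the text of the site's robots.txt, it adds any `Sitemap:` lines declared there, since some sites list their sitemap in robots.txt instead of (or as well as) /sitemap.xml. The function name `candidate_sitemaps` is just for illustration.

```python
from urllib.parse import urlunsplit
from urllib.robotparser import RobotFileParser


def candidate_sitemaps(domain: str, robots_txt: str = "") -> list[str]:
    """Return sitemap URLs worth trying for a domain (illustrative helper).

    Starts with the conventional /sitemap.xml location (https assumed),
    then appends any 'Sitemap:' lines found in the robots.txt text, if given.
    """
    # Conventional location from Step 1, e.g. https://example.com/sitemap.xml
    urls = [urlunsplit(("https", domain, "/sitemap.xml", "", ""))]

    if robots_txt:
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        # site_maps() returns None when robots.txt declares no sitemaps
        for url in parser.site_maps() or []:
            if url not in urls:  # avoid listing the same sitemap twice
                urls.append(url)
    return urls
```

Usage: `candidate_sitemaps("example.com")` gives `["https://example.com/sitemap.xml"]`. Grab https://example.com/robots.txt as well and pass its text as the second argument to pick up any extra sitemaps it declares.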

Step 2: Copy the Full Sitemap

The easiest way to do this is to open the sitemap in your browser and press Ctrl+A, then Ctrl+C.
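You can also skip the copy/paste and download the sitemap in code. Below is a rough sketch using only `urllib` from the standard library. `fetch_sitemap` is a hypothetical helper name, and the User-Agent string is an arbitrary placeholder (some servers reject Python's default one). It doesn't handle gzipped (.xml.gz) sitemaps.

```python
from urllib.request import Request, urlopen


def fetch_sitemap(url: str, timeout: float = 30.0) -> str:
    """Download a sitemap and return its raw XML as text (illustrative helper).

    Stands in for the manual Ctrl+A / Ctrl+C step. The User-Agent value is a
    placeholder assumption, not something the site requires.
    """
    request = Request(url, headers={"User-Agent": "Mozilla/5.0 (sitemap-sketch)"})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server doesn't declare a charset
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Usage: `xml_text = fetch_sitemap("https://example.com/sitemap.xml")`. If what comes back is a sitemap index (a `<sitemapindex>` listing other sitemaps) rather than a `<urlset>`, pull the URLs out of it and fetch each child sitemap the same way.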

Step 3: Extract the URLs

Go to this URL:

Paste (Ctrl+V) the copied sitemap into the "Test String" field. Next, paste this into the "Regular Expression" field:

Code:

  • http[^<"'\n\r]*

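If you're wondering, this searches for every string that starts with "http" and returns everything from that point until it hits a "<", a single or double quote, or a line break. That matches every http and https URL in the sitemap. It also catches any namespace URLs in the sitemap's header (such as http://www.sitemaps.org/schemas/sitemap/0.9), so delete those from the results.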
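The same regex works outside the browser, too. Here's a sketch using Python's `re` module. The optional filter that drops schema/namespace URLs (like the sitemaps.org declaration at the top of most sitemaps) is an addition of mine, not part of the original trick, and `extract_urls` is just an illustrative name.

```python
import re

# Same pattern as Step 3: "http" followed by everything up to a "<",
# a double or single quote, or a line break.
URL_PATTERN = re.compile(r"http[^<\"'\n\r]*")


def extract_urls(sitemap_xml: str, skip_schema: bool = True) -> list[str]:
    """Return every http(s) URL found in the sitemap text (illustrative helper)."""
    urls = URL_PATTERN.findall(sitemap_xml)
    if skip_schema:
        # Assumption: you don't want namespace/schema URLs from the header,
        # e.g. http://www.sitemaps.org/schemas/sitemap/0.9
        urls = [u for u in urls if "sitemaps.org/schemas" not in u and "www.w3.org/" not in u]
    return urls
```

An XML parser reading the `<loc>` elements would be stricter, but the regex has the advantage of working on whatever you pasted, even if it isn't valid XML.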

Step 4: Extract the Meta Tags

This is one of several free tools that can do this:

The tool has no cap and requires no account. It processed 5,000 URLs in roughly 1-3 minutes. Scroll to the bottom of the page to download the results as a CSV.

For each URL, the CSV will contain the HTML title tag, the meta description, and the meta keywords.
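If you'd rather not depend on a third-party tool, the extraction itself is simple enough to sketch with Python's built-in `html.parser`. This minimal version assumes you've already downloaded each page's HTML (fetching is left out). The class and function names, and the CSV column names, are made up for the example to mirror the fields described above.

```python
import csv
import io
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects the <title> text plus the meta description and keywords."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self.keywords = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag/attribute names; values may be None
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "title" and not self.title:
            self._in_title = True  # only capture the first <title>
        elif tag == "meta":
            name = attributes.get("name", "").lower()
            if name == "description":
                self.description = attributes.get("content", "").strip()
            elif name == "keywords":
                self.keywords = attributes.get("content", "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_meta(html: str) -> dict[str, str]:
    """Return the title, meta description, and meta keywords for one page."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return {
        "Title": " ".join(parser.title.split()),  # collapse stray whitespace
        "Meta Description": parser.description,
        "Meta Keywords": parser.keywords,
    }


def rows_to_csv(rows: list[dict[str, str]]) -> str:
    """Write URL + meta rows to CSV text with one column per field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["URL", "Title", "Meta Description", "Meta Keywords"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Usage: for each URL, build a row with `{"URL": url, **extract_meta(page_html)}`, then pass the list of rows to `rows_to_csv` and save the result to a .csv file.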

Step 5: Exploit the Data for Profit

At this point, you have the data. Use your imagination. There's a lot you can do with it, such as checking search volume or finding seed keywords.

Bonus: Quickly Parse Common Data from Common Conventions/Locations

Quick tricks to parse data in the scraped fields:

  • Meta Keywords
    • Some sites use the meta keywords field to keep track of keywords, which gives you a ton of comma-separated keywords in the CSV's "Meta Keywords" field to work with right off the bat
    • No parsing required
  • Title Tags
    • Title tags are also a great place to start and frequently contain keywords verbatim.
      • It's very common for title tags to be formatted like these:
        • Some Keyword Here | Non-SEO Comment Here
        • Buy Green Shoes | We're the #1 Company!
        • How to Buy a Couch | Top 3 Tricks
    • You can split those up into multiple cells with this spreadsheet formula (with the title in cell B2):
      • =SPLIT(B2,"|")
  • URL
    • Keywords are often included in the URL, separated by dashes. Those can be extracted with these formulas:
      • =REGEXREPLACE(REGEXEXTRACT(A2, "\.com\/(.*)"), "[^a-zA-Z0-9]", " ")
        • Removes the base URL and replaces every non-alphanumeric character in the path with a space.
      • =REGEXREPLACE(REGEXEXTRACT(A2, "\.com\/(.*)"), "[^a-zA-Z]", " ")
        • Same as above, but replaces every non-alphabetic character (including numbers) with a space.
      • Note: If the domain you scraped is not a .com (e.g. a .net or another TLD), change the ".com" in both formulas to that TLD.
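Here are the same bonus tricks sketched in Python, for when the CSV is too big to be comfortable in a spreadsheet. The function names are hypothetical. One deliberate difference: `url_words` uses `urllib.parse.urlsplit` to strip the domain, so it works for any TLD without editing the pattern, and it ignores any query string. It also collapses runs of separators into a single space, whereas the spreadsheet formulas replace each character individually.

```python
import re
from urllib.parse import urlsplit


def split_keywords(meta_keywords: str) -> list[str]:
    """Split a comma-separated meta keywords field into trimmed keywords."""
    return [kw.strip() for kw in meta_keywords.split(",") if kw.strip()]


def split_title(title: str) -> list[str]:
    """Rough equivalent of =SPLIT(B2,"|"), with each part trimmed."""
    return [part.strip() for part in title.split("|") if part.strip()]


def url_words(url: str, keep_digits: bool = True) -> str:
    """Turn a URL path into space-separated words.

    keep_digits=True mirrors the [^a-zA-Z0-9] formula; False mirrors
    [^a-zA-Z], which drops numbers as well.
    """
    path = urlsplit(url).path  # domain and query string are dropped here
    pattern = r"[^a-zA-Z0-9]+" if keep_digits else r"[^a-zA-Z]+"
    return re.sub(pattern, " ", path).strip()
```

For example, `url_words("https://example.com/how-to-buy-a-couch")` returns `"how to buy a couch"`.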