256 Kilobytes

Web Scraping, Data Analysis Comments

Type: Comments. Responses to top-level threads.
Category: Web Scraping, Data Analysis. Web scraping, crawling, and analyzing related data.

Reply to [Solved] Scraping Number of Search Results from Bing with Google Sheets

Comments in Web Scraping, Data Analysis | By August R. Garcia

Here's a better version that removes the " results" string and then casts the result to a number instead of a string: =VALUE(substitute(importXML...
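For anyone doing the same clean-up outside of Sheets, here is a rough Python sketch of the two steps described above: strip the " results" label, then cast to a number. The function name `parse_result_count` and the sample input format ("1,230,000 results") are assumptions for illustration, not something taken from the original formula.

```python
import re


def parse_result_count(text: str) -> int:
    """Turn a scraped string such as '1,230,000 results' into an integer.

    The input format is an assumption; adjust the clean-up if the page
    wraps the number in other words.
    """
    # Drop the trailing label, mirroring the SUBSTITUTE step.
    cleaned = text.replace(" results", "")
    # Remove thousands separators and stray whitespace before casting,
    # mirroring the VALUE step.
    digits = re.sub(r"[,\s]", "", cleaned)
    return int(digits)
```

If the scraped string carries extra words, `int()` raises `ValueError`, which is probably what you want for a scraper: a loud failure instead of a silently wrong count.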
🗨 1
🐏 0
👁 526
Profile Photo - August R. Garcia

For Python 3: import http.client [...] try: msg = " " + http.client.responses[status_code] [...] Code is basically th...
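Since the preview above is cut off, here is a minimal self-contained sketch of the same idea: look up the reason phrase for a status code in `http.client.responses`. The function name `describe_status` and the `except KeyError` fallback are assumptions; the original code between the `[...]` markers is not shown.

```python
import http.client


def describe_status(status_code: int) -> str:
    """Return e.g. '404 Not Found', or just the code if it is unknown."""
    try:
        # http.client.responses maps status codes to reason phrases.
        msg = " " + http.client.responses[status_code]
    except KeyError:
        # Assumed fallback: unknown codes get no description.
        msg = ""
    return f"{status_code}{msg}"
```

Usage would look like `describe_status(404)`, which gives "404 Not Found".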
🗨 1
🐏 0
👁 416

Reply to The Basics to Web Scraping with cURL and XPath

Comments in Web Scraping, Data Analysis | By August R. Garcia

Bump. Some more cURL, used to determine the quality of proxygo's proxies:
for i in {1..48} ; do curl https://www.blackhatworld.com/seo/ssl-proxi...
🗨 4
🐏 2
👁 334
Profile Photo - August R. Garcia

Here's a shorter version with cURL and BASH that does basically the same thing:
for i in $( seq 1 10 ) ; do curl --user-agent "Some User-Agent St...
🗨 2
🐏 0
👁 31

Reply to The Basics to Web Scraping with cURL and XPath

Comments in Web Scraping, Data Analysis | By August R. Garcia

Basic DuckDuckGo result scraping:
curl https://duckduckgo.com/html?q=asdf | awk '{$1=$1};1' | sed '/^$/d' | hxselect -c a.result__url | sed '/^$...
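The `hxselect -c a.result__url` step pulls the text content out of every `<a class="result__url">` element. A rough standard-library Python equivalent of just that extraction step might look like the sketch below. It works on an HTML string you have already downloaded; the names `ResultUrlParser` and `extract_result_urls` are made up for this example, and whether the live page still uses the `result__url` class is an assumption carried over from the pipeline above.

```python
from html.parser import HTMLParser
from typing import List


class ResultUrlParser(HTMLParser):
    """Collect the text inside every <a> tag that has class 'result__url'."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: List[str] = []
        self._inside = False
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        # The class attribute may hold several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "result__url" in classes:
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._inside:
            # Trim whitespace and skip empty matches, like the sed clean-up.
            text = "".join(self._buffer).strip()
            if text:
                self.urls.append(text)
            self._inside = False


def extract_result_urls(html: str) -> List[str]:
    parser = ResultUrlParser()
    parser.feed(html)
    parser.close()
    return parser.urls
```

This skips the download entirely, so it can be tried against a saved copy of the results page without sending any requests.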
🗨 4
🐏 2
👁 334

Reply to The Basics to Web Scraping with cURL and XPath

Comments in Web Scraping, Data Analysis | By August R. Garcia

Have been experimenting with cURL scraping from the terminal some more. One issue that has come up a few times with XMLLint is getting the set of matc...
🗨 4
🐏 2
👁 334

Reply to The Basics to Web Scraping with cURL and XPath

Comments in Web Scraping, Data Analysis | By August R. Garcia

Crawling/Scraping Reddit with cURL
Seems like you can use the process described above for Reddit. Basic examples: curl -s -A "256Kilobyte...
🗨 4
🐏 2
👁 334

Reply to Analyzing the Web: Downloading the Majestic Million, Setting up SQLite, Crawling the Web, and Generating Reports

Comments in Web Scraping, Data Analysis | By August R. Garcia

The longest domain names in the Majestic Million, most of which are expired and basically all of which are terrible garbage: 255461  ...
🗨 2
🐏 2
👁 40
Profile Photo - Hash Brown

This is excellent!
🗨 2
🐏 2
👁 40

Reply to Downloading Bulk "ThisPersonDoesNotExist" Images with Python and urllib2

Comments in Web Scraping, Data Analysis | By August R. Garcia

Also, this is a crime against God:
🗨 2
🐏 0
👁 31

Reply to How to replace NA with 0 in R?

Comments in Web Scraping, Data Analysis | By August R. Garcia

some_data_with_nas[is.na(some_data_with_nas)] <- 0
Also, see this duplicate thread: https://www.256kilobytes.com/content/show/900/how-t...
🗨 1
🐏 0
👁 432

Reply to What is ScrapeBox used for?

Comments in Web Scraping, Data Analysis | By August R. Garcia

This is ScrapeBox: http://www.scrapebox.com/ It is a relatively popular tool used to gather data from the internet, as well as to do some...
🗨 1
🐏 0
👁 474
Profile Photo - August R. Garcia

As you might expect, a non-capturing group is not captured in the match. For example, if you want to match phone numbers, you might require that a whi...
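To make the difference concrete, here is a small Python sketch. The pattern requires a separator between the digit groups but wraps it in `(?:...)`, so only the digits show up in the captured groups. The specific separators (hyphen, dot, or whitespace), the 3-3-4 digit layout, and the name `phone_parts` are my own choices for illustration, since the original example is cut off.

```python
import re
from typing import Optional, Tuple

# Three capturing groups for the digits; the separators sit in
# non-capturing groups, so they must match but are not returned.
PHONE_PATTERN = re.compile(r"(\d{3})(?:-|\.|\s)(\d{3})(?:-|\.|\s)(\d{4})")


def phone_parts(text: str) -> Optional[Tuple[str, ...]]:
    """Return the captured digit groups of the first phone number, if any."""
    match = PHONE_PATTERN.search(text)
    if match is None:
        return None
    # groups() only includes capturing groups, so no separators appear here.
    return match.groups()
```

Swapping `(?:` for a plain `(` would still match the same strings, but `groups()` would then return five items, with the separators mixed in between the digits.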
🗨 1
🐏 0
👁 438
Profile Photo - August R. Garcia

Just tested this from a clean Ubuntu install and it worked fine. Run this from the terminal to install: sudo apt-get install r-base Then...
🗨 1
🐏 0
👁 377
Profile Photo - August R. Garcia

If you need one-off graphs that contain data that is not expected to change, it can be easiest to generate that data locally to image files and to the...
🗨 1
🐏 0
👁 403
Profile Photo - August R. Garcia

Very nice, as they say.
Other scraping methods
For those who don’t perhaps have the skills needed to code something there are also other...
🗨 1
🐏 1
👁 805

Reply to Scraping results from Google Search

Comments in Web Scraping, Data Analysis | By Hash Brown

If you're going to do this in any real volume you're going to get IP blocked pretty quick, even on Google Sheets. If I were you I would loo...
🗨 1
🐏 0
👁 412
Profile Photo - August R. Garcia

A script like this, run from the terminal or saved as a Bash script, works for this type of file renaming:
INDEX=1; for i in *.jpg; do mv $i ${INDEX}_cover.jpg;...
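If Bash is not an option, the same kind of numbered rename can be sketched in Python with `pathlib`. This assumes the goal is what the visible part of the loop suggests (each `.jpg` becomes `1_cover.jpg`, `2_cover.jpg`, and so on); the function name `rename_covers`, the alphabetical ordering, and the refusal to overwrite existing files are assumptions added here, not part of the original one-liner.

```python
from pathlib import Path
from typing import List


def rename_covers(directory: str) -> List[str]:
    """Rename every .jpg in `directory` to '<index>_cover.jpg', starting at 1."""
    folder = Path(directory)
    # Snapshot and sort the matches first so the renames below cannot
    # disturb the listing while it is being walked.
    sources = sorted(p for p in folder.glob("*.jpg") if p.is_file())
    new_names: List[str] = []
    for index, source in enumerate(sources, start=1):
        target = folder / f"{index}_cover.jpg"
        # Assumed safety check: stop instead of silently overwriting a file.
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        source.rename(target)
        new_names.append(target.name)
    return new_names
```

Unlike the unquoted `$i` in a shell loop, this handles filenames with spaces without extra quoting. Try it on a copy of the folder first.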
🗨 1
🐏 0
👁 1