256 Kilobytes

Web Scraping, Data Analysis

Web scraping, crawling, and analyzing related data.

An introduction to scraping with Python and BeautifulSoup

Articles in Web Scraping, Data Analysis | By Hash Brown

Published Tue, 08 Jan 2019 22:34:09 -0800 | Last updated Mon, 11 Mar 2019 22:19:33 -0700

Last reply by August R. Garcia (Wed, 09 Jan 2019 03:16:31 -0800): Very nice, as they say. Other scraping methods: for those who perhaps don't have the skills needed to code something, there are also other...
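
The article uses BeautifulSoup. As a dependency-free illustration of the same idea (parse the HTML, then pull out specific elements), here is a minimal sketch that swaps in Python's standard-library html.parser instead. The names LinkExtractor and scrape, and the sample markup, are hypothetical; the sketch collects the page title and the href of every link.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the <title> text and the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; skip links without href
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text can arrive in several chunks, so accumulate it
        if self._in_title:
            self.title += data


def scrape(html):
    """Return (title, links) for an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links


if __name__ == "__main__":
    sample = (
        "<html><head><title>Example</title></head><body>"
        '<a href="/one">One</a> <a href="https://example.com/two">Two</a>'
        "</body></html>"
    )
    print(scrape(sample))
```

With BeautifulSoup installed, roughly the same result comes from soup.title.string and [a.get("href") for a in soup.find_all("a")].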

By August R. Garcia

TFW

See code. Yeah, it seems like you do in fact have to make a bunch of additional HTTP requests. def resolve_t_co(t_co_url): if t_co_url !...

By August R. Garcia

It's time to post code so that I can find this the next eight times I need to do this: ...

By August R. Garcia

Last reply by August R. Garcia (Tue, 08 Oct 2019 09:17:50 -0700): Here's a better version that removes the " results" string and then casts the result to a number instead of a string: =VALUE(substitute(importXML...

[Solved] Python - Getting the remote/server IP address and port from an HTTP response using the requests library?

Answers in Web Scraping, Data Analysis | By August R. Garcia

Published Wed, 25 Sep 2019 15:54:55 -0700 | Last updated Thu, 26 Sep 2019 13:25:27 -0700

Some edge case what even is this.

The solution here seems to generally work: https://stackoverflow.com/questions/22492484/how-do-i-get-the-ip-address-from-a-http-request-using-th...

[Solved] Color code HTTP status codes in Python

Answers in Web Scraping, Data Analysis | By August R. Garcia

Published Mon, 29 Jul 2019 22:34:12 -0700

Last reply by August R. Garcia (Mon, 09 Sep 2019 12:10:10 -0700): For Python 3: import http.client [...] try: msg = " " + http.client.responses[status_code] [...] Code is basically th...
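
The reply builds its message from http.client.responses, the standard-library dictionary that maps status codes to reason phrases. A minimal sketch of the color-coding part follows, assuming an ANSI-capable terminal; the function name color_status and the color assigned to each status class are illustrative choices, not the original post's code.

```python
import http.client

# ANSI color escape sequences; assumes a terminal that interprets them.
GREEN = "\033[32m"
CYAN = "\033[36m"
YELLOW = "\033[33m"
RED = "\033[31m"
RESET = "\033[0m"


def color_status(status_code):
    """Return e.g. '404 Not Found' wrapped in a color chosen by status class."""
    try:
        # http.client.responses maps standard status codes to reason phrases
        msg = " " + http.client.responses[status_code]
    except KeyError:
        msg = ""  # non-standard code: show the number only
    if 200 <= status_code < 300:
        color = GREEN   # 2xx success
    elif status_code < 400:
        color = CYAN    # 1xx informational and 3xx redirects
    elif status_code < 500:
        color = YELLOW  # 4xx client errors
    else:
        color = RED     # 5xx server errors and anything higher
    return f"{color}{status_code}{msg}{RESET}"


if __name__ == "__main__":
    for code in (200, 301, 404, 500, 599):
        print(color_status(code))
```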

The Basics to Web Scraping with cURL and XPath

Articles in Web Scraping, Data Analysis | By August R. Garcia

Published Fri, 28 Jun 2019 09:02:04 -0700 | Last updated Tue, 02 Jul 2019 17:37:33 -0700

Last reply by August R. Garcia (Thu, 29 Aug 2019 18:31:09 -0700): Bump. Some more cURL, used to determine the quality of proxygo's proxies: for i in {1..48} ; do curl https://www.blackhatworld.com/seo/ssl-proxi...

[BASH, cURL] Yellow Pages Scraper: Fully Functional Script with Source Code

Articles in Web Scraping, Data Analysis | By August R. Garcia

Published Fri, 05 Jul 2019 23:22:06 -0700 | Last updated Sat, 06 Jul 2019 01:44:02 -0700

What a nice, free YellowPages scraper.

Edit: When trying to scrape indefinitely (~100+ pages), there's some buggy behavior with exit conditions currently. If/when an updated script is poste...

Downloading Bulk Images: ThisPersonDoesNotExist with Python and urllib2

Articles in Web Scraping, Data Analysis | By August R. Garcia

Published Thu, 14 Mar 2019 06:25:36 -0700 | Last updated Thu, 14 Mar 2019 08:05:08 -0700

Last reply by August R. Garcia (Thu, 04 Jul 2019 12:56:36 -0700): Here's a shorter version with cURL and BASH that does basically the same thing: for i in $( seq 1 10 ) ; do curl --user-agent "Some User-Agent St...

[cURL, BASH] How to Crawl and Scrape DuckDuckGo Search Results

Articles in Web Scraping, Data Analysis | By August R. Garcia

Published Tue, 02 Jul 2019 17:29:24 -0700 | Last updated Thu, 04 Jul 2019 19:21:52 -0700

As discussed recently, it is relatively easy to scrape various arbitrary pieces of data using cURL (and XPath). You can use these same concepts to build...
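
The same select-by-XPath idea can be tried in Python. This sketch uses the standard library's xml.etree.ElementTree, which supports only a limited XPath subset and requires well-formed markup; real search-result pages usually need an HTML-tolerant parser such as lxml. The markup, class names, and the function name extract_results are hypothetical and do not reflect DuckDuckGo's actual page structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a downloaded results page.
SAMPLE = """
<html>
  <body>
    <div class="result"><a href="https://example.com/a">First</a></div>
    <div class="result"><a href="https://example.com/b">Second</a></div>
    <div class="ad"><a href="https://example.com/ad">Sponsored</a></div>
  </body>
</html>
"""


def extract_results(markup):
    """Return (link text, href) pairs for links inside div class="result"."""
    root = ET.fromstring(markup.strip())
    # ElementTree's XPath subset supports descendant search (.//)
    # and attribute-equality predicates ([@class='result']).
    return [
        (link.text, link.get("href"))
        for link in root.iterfind(".//div[@class='result']/a")
    ]


if __name__ == "__main__":
    for text, href in extract_results(SAMPLE):
        print(text, href)
```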

By August R. Garcia

What a nice trick. How to Extract Emails from HTML with Google Sheets. Code: function get_raw_html(url) { // The code below logs the H...
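
The post does this in Google Sheets. For comparison, here is a minimal Python sketch of the core step: pulling email-like strings out of raw HTML with a regular expression. The pattern is a deliberately simple assumption (it will miss some valid addresses and accept some invalid ones), and extract_emails is a hypothetical name.

```python
import re

# Deliberately simple pattern: local part, "@", then a domain ending in a
# dot and at least two letters. Not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html):
    """Return unique email-like strings, in order of first appearance."""
    # dict.fromkeys drops duplicates while keeping insertion order
    return list(dict.fromkeys(EMAIL_RE.findall(html)))


if __name__ == "__main__":
    page = '<a href="mailto:info@example.com">info@example.com</a>'
    print(extract_emails(page))
```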

[Infographic] The Beginner's SQLite Cheat Sheet

Articles in Web Scraping, Data Analysis | By August R. Garcia

Published Sat, 04 May 2019 23:59:37 -0700

The important hotkeys, commands, and tri...

What a great infographic. Copy-Pasteable Version of the SQLite Commands Cheat Sheet. General SQLite Commands and Information. Opening SQLit...
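
The cheat sheet covers the sqlite3 command-line shell. The same basics (opening a database, creating a table, inserting rows, listing tables) can also be scripted with Python's built-in sqlite3 module. This is a sketch with a hypothetical pages table; the sqlite_master queries are only rough stand-ins for the shell's .tables and .schema commands.

```python
import sqlite3


def sqlite_basics():
    # ":memory:" opens a throwaway database; pass a filename such as
    # "example.db" instead to create or reopen a file on disk.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT NOT NULL, status INTEGER)"
    )
    # "?" placeholders let sqlite3 quote values safely
    cur.executemany(
        "INSERT INTO pages (url, status) VALUES (?, ?)",
        [("https://example.com/", 200), ("https://example.com/missing", 404)],
    )
    conn.commit()

    # Rough stand-in for the shell's ".tables" command
    tables = [row[0] for row in cur.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # Rough stand-in for ".schema pages"
    schema = cur.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'pages'").fetchone()[0]
    errors = cur.execute("SELECT url FROM pages WHERE status >= 400").fetchall()
    conn.close()
    return tables, schema, errors


if __name__ == "__main__":
    for item in sqlite_basics():
        print(item)
```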

Analyzing the Web: Downloading the Majestic Million, Setting up SQLite, Crawling the Web, and Generating Reports

Articles in Web Scraping, Data Analysis | By August R. Garcia

Published Wed, 24 Apr 2019 03:29:27 -0700 | Last updated Thu, 25 Apr 2019 09:14:10 -0700

Last reply by August R. Garcia (Mon, 29 Apr 2019 09:10:47 -0700): The longest domain names in the Majestic Million, most of which are expired and basically all of which are terrible garbage: 255461 ...
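
Once the data is in SQLite, a query along these lines produces a longest-domains list. This is a sketch, not the article's query: the table name majestic, its global_rank and domain columns, and the sample rows are all hypothetical, so adapt them to the real CSV's column names.

```python
import sqlite3

# Hypothetical sample rows: (global_rank, domain).
SAMPLE_ROWS = [
    (1, "example.com"),
    (2, "this-is-a-very-long-example-domain-name.com"),
    (3, "ab.io"),
    (4, "medium-length-example.org"),
]


def longest_domains(rows, limit=10):
    """Return (rank, domain, length) tuples for the longest domain names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE majestic (global_rank INTEGER, domain TEXT)")
    conn.executemany("INSERT INTO majestic VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT global_rank, domain, LENGTH(domain) AS domain_length
        FROM majestic
        ORDER BY domain_length DESC, global_rank ASC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in longest_domains(SAMPLE_ROWS, limit=2):
        print(row)
```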

How to replace NA with 0 in R?

Answers in Web Scraping, Data Analysis | By Some Guy

Published Sun, 16 Dec 2018 09:39:45 -0800

Last reply by August R. Garcia (Mon, 21 Jan 2019 14:28:21 -0800): some_data_with_nas[is.na(some_data_with_nas)] <- 0 Also, see this duplicate thread: https://www.256kilobytes.com/content/show/900/how-t...

What is ScrapeBox used for?

Answers in Web Scraping, Data Analysis | By Some Guy

Published Sun, 13 Jan 2019 05:21:48 -0800

Last reply by August R. Garcia (Wed, 16 Jan 2019 15:43:47 -0800): This is ScrapeBox: http://www.scrapebox.com/ It is a relatively popular tool used to gather data from the internet, as well as to do some...

What is a regex non-capturing group?

Answers in Web Scraping, Data Analysis | By Some Guy

Published Tue, 15 Jan 2019 11:54:03 -0800

Last reply by August R. Garcia (Tue, 15 Jan 2019 14:58:59 -0800): As you might expect, a non-capturing group is not captured: its text still counts toward the overall match, but it is not stored as a separate, numbered group. For example, if you want to match phone numbers, you might require that a whi...
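
A short Python illustration of the difference, assuming a simple hypothetical phone format (three digits, optional separator, three digits, optional separator, four digits). Both patterns match the same text; only the non-capturing version leaves the separators out of the returned groups.

```python
import re

# Hypothetical phone format: 3 digits, optional separator (space, dot, or
# hyphen), 3 digits, optional separator, 4 digits.
capturing = re.compile(r"(\d{3})([ .-]?)(\d{3})([ .-]?)(\d{4})")
non_capturing = re.compile(r"(\d{3})(?:[ .-]?)(\d{3})(?:[ .-]?)(\d{4})")

text = "Call 555-123-4567 or 555 987 6543."

if __name__ == "__main__":
    # Five groups per match; the separators are captured too:
    # [('555', '-', '123', '-', '4567'), ('555', ' ', '987', ' ', '6543')]
    print(capturing.findall(text))
    # Three groups per match; (?:...) groups the separator without capturing it:
    # [('555', '123', '4567'), ('555', '987', '6543')]
    print(non_capturing.findall(text))
    # The separator is still part of the overall match (group 0): 555-123-4567
    print(non_capturing.search(text).group(0))
```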

By August R. Garcia

Sometimes, you have to extract emails an...

By Some Guy

I'm trying to do some web scraping and I'm getting the following: "error in ogrlistlayers(dsn = dsn) : cannot open data source". H...

How to install the R programming language on Ubuntu?

Answers in Web Scraping, Data Analysis | By Some Guy

Published Fri, 11 Jan 2019 18:46:45 -0800

Last reply by August R. Garcia (Fri, 11 Jan 2019 18:47:39 -0800): Just tested this from a clean Ubuntu install and it worked fine. Run this from the terminal to install: sudo apt-get install r-base Then...

How can I use R to generate graphs for my website?

Answers in Web Scraping, Data Analysis | By Some Guy

Published Fri, 11 Jan 2019 18:42:58 -0800

Last reply by August R. Garcia (Fri, 11 Jan 2019 18:45:31 -0800): If you need one-off graphs that contain data that is not expected to change, it can be easiest to generate that data locally to image files and to the...

How do I fix "IndentationError: expected an indented block"?

Answers in Web Scraping, Data Analysis | By Some Guy

Published Sun, 16 Dec 2018 09:40:51 -0800

I have written some Python code using Notepad in Windows. Every time I try to run the code using the command line I get an error message and it does...
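
This error means a line ending in a colon (if, for, def, and so on) is not followed by an indented body. The sketch below reproduces it with the built-in compile() rather than a file, so it runs anywhere; the check function and the sample sources are hypothetical. Mixing tabs and spaces inconsistently (easy to do in Notepad) raises TabError instead, a subclass of IndentationError, so the same except clause catches it; running python -m tabnanny on the file can also flag ambiguous indentation.

```python
# The body of the "if" is not indented, so compiling this fails.
BROKEN = "if True:\nprint('hello')\n"
# Indenting the body (four spaces is the PEP 8 convention) fixes it.
FIXED = "if True:\n    print('hello')\n"


def check(source):
    """Compile source without running it; report any indentation error."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as exc:  # also catches its subclass TabError
        return f"{type(exc).__name__}: {exc.msg} (line {exc.lineno})"
    return "OK"


if __name__ == "__main__":
    print(check(BROKEN))
    print(check(FIXED))
```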

How to change column name in R?

Answers in Web Scraping, Data Analysis | By Some Guy

Published Sun, 16 Dec 2018 09:29:57 -0800

I am working with DataFrames for a work project and want to rename a column to "R". How do you do this?

How to resolve a non-numeric argument to a binary operator?

Answers in Web Scraping, Data Analysis | By Some Guy

Published Sun, 16 Dec 2018 09:29:32 -0800

I have just started learning R from scratch, so I'm having trouble executing the following code in RStudio. How can I resolve the non-numeric argume...

By Some Guy

I am not exactly an old hand when it comes to DataFrames, but I like to think I have been getting the hang of it. The message "Number of items to...

Scraping results from Google Search

Forum in Web Scraping, Data Analysis | By Some Guy

Published Sun, 02 Dec 2018 03:42:06 -0800

Last reply by Hash Brown (Mon, 03 Dec 2018 13:46:23 -0800): If you're going to do this in any real volume, you're going to get IP blocked pretty quickly, even on Google Sheets. If I were you I would loo...

I downloaded a bunch of stock photos for testing. Is there a quick way to bulk rename in Bash (or something similar)?

Forum in Web Scraping, Data Analysis | By Some Guy

Published Sun, 02 Dec 2018 03:42:45 -0800 | Last updated Mon, 03 Dec 2018 06:33:04 -0800

Last reply by August R. Garcia (Sun, 02 Dec 2018 03:46:44 -0800): A script like this, run from the terminal or as a Bash script, works for this type of file renaming: INDEX=1; for i in *.jpg; do mv "$i" "${INDEX}_cover.jpg";...
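
A Python counterpart to the Bash loop, for systems without Bash, sketched with pathlib. It keeps the reply's "N_cover.jpg" naming; the function name bulk_rename is hypothetical. Files are numbered in sorted-name order (Bash already sorts glob results, but Path.glob makes no ordering promise), and the function refuses to overwrite an existing target rather than silently clobbering it.

```python
from pathlib import Path


def bulk_rename(directory, suffix="_cover"):
    """Rename every *.jpg in directory to 1_cover.jpg, 2_cover.jpg, ...

    Returns a list of (old_name, new_name) pairs.
    """
    folder = Path(directory)
    renamed = []
    # sorted() makes the numbering deterministic
    for index, old in enumerate(sorted(folder.glob("*.jpg")), start=1):
        new = folder / f"{index}{suffix}.jpg"
        if new.exists() and new != old:
            raise FileExistsError(f"refusing to overwrite {new}")
        old.rename(new)
        renamed.append((old.name, new.name))
    return renamed
```

Usage: bulk_rename("path/to/photos") renames the JPEGs in that folder and leaves other files untouched.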