256 Kilobytes

Answers in Web Scraping, Data Analysis | By August R. Garcia


Tags: Python, Urllib, Brotli, urllib3, requests


The solution below generally seems to work:

# Basically works, except the edge case shown below
r.server_ip           = r.raw._connection.sock.getpeername()[0]     
r.server_port         = r.raw._connection.sock.getpeername()[1]

However, there seems to be an edge case where this occasionally doesn't work: r.raw._connection.sock is sometimes None, so the getpeername() call fails. See these debugging statements:

Example of Failure

dict_keys(['reason', 'request', 'status_code', 'raw', '_content_consumed', 'url', 'encoding', 'connection', 'history', 'elapsed', 'headers', '_content', 'cookies'])
r.status_code              200
r.raw                      <requests.packages.urllib3.response.HTTPResponse object at 0x7fd9f4069d68>
r.raw._connection          <requests.packages.urllib3.connection.HTTPConnection object at 0x7fd9e4783ac8>
r.raw._connection.sock     None
r.history                  []
r.elapsed                  0:00:00.937903
r.request                  <PreparedRequest [GET]>
r.connection               <requests.adapters.HTTPAdapter object at 0x7fd9e47e5f98>
r.headers                  {'Content-Encoding': 'br', 'Content-Disposition': 'attachment; filename="file.txt"', 'Date': 'Wed, 25 Sep 2019 22:21:36 GMT', 'X-Cache': 'MISS from barracuda.greenetwp.us', 'Pragma': 'no-cache', 'Via': '1.0 barracuda.greenetwp.us:8080 (http_scan/4.0.2.6.19)', 'Cache-Control': 'no-cache, must-revalidate', 'X-Frame-Options': 'SAMEORIGIN', 'Expires': '-1', 'Content-Type': 'text/javascript; charset=UTF-8', 'X-XSS-Protection': '0', 'Server': 'gws', 'Proxy-Connection': 'close'}
r.reason                   OK
r.cookies                  <RequestsCookieJar[]>
r._content_consumed        False
r._content                 False

Example of Success

dict_keys(['server_ip', 'reason', 'server_port', 'request', 'status_code', 'raw', '_content_consumed', 'url', 'encoding', 'connection', 'history', 'elapsed', 'headers', '_content', 'cookies'])
r.status_code              200
r.raw                      <requests.packages.urllib3.response.HTTPResponse object at 0x7fd9e47e51d0>
r.raw._connection          <requests.packages.urllib3.connection.HTTPConnection object at 0x7fd9e47a0fd0>
r.raw._connection.sock     <socket.socket fd=63, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('192.168.1.154', 47000), raddr=('36.55.230.146', 8888)>
r.history                  []
r.elapsed                  0:00:00.592083
r.request                  <PreparedRequest [GET]>
r.connection               <requests.adapters.HTTPAdapter object at 0x7fd9e47a0ac8>
r.headers                  {'Content-Encoding': 'gzip', 'Content-Disposition': 'attachment; filename="file.txt"', 'Date': 'Wed, 25 Sep 2019 22:21:36 GMT', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache, must-revalidate', 'Server': 'nginx/1.14.0', 'X-Frame-Options': 'SAMEORIGIN', 'Connection': 'keep-alive', 'Expires': '-1', 'Content-Type': 'text/javascript; charset=UTF-8', 'X-XSS-Protection': '0', 'Transfer-Encoding': 'chunked'}
r.reason                   OK
r.cookies                  <RequestsCookieJar[]>
r._content_consumed        False
r._content                 False
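
Dumps like the two above can be produced with a small helper along these lines. This is only a sketch of one way to do it, not necessarily how the output above was generated: the names `FIELDS`, `resolve`, and `dump_response` are made up for illustration, and the helper only assumes a response-like object with the attributes shown in the dumps.

```python
# Hypothetical debugging helper; names are illustrative, not from requests.
FIELDS = (
    "status_code",
    "raw",
    "raw._connection",
    "raw._connection.sock",
    "history",
    "elapsed",
)


def resolve(obj, dotted):
    """Follow a dotted attribute path, returning None if any step is missing."""
    for name in dotted.split("."):
        obj = getattr(obj, name, None)
        if obj is None:
            return None
    return obj


def dump_response(r, fields=FIELDS):
    """Return a text dump: the attribute names of r, then one line per field."""
    lines = [str(vars(r).keys())]
    for field in fields:
        label = "r." + field
        # Pad the label to 27 columns so the values line up.
        lines.append(f"{label:<27}{resolve(r, field)}")
    return "\n".join(lines)
```

Calling `print(dump_response(r))` on each response makes it easy to spot the ones where `r.raw._connection.sock` is `None`.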

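The most obvious difference between the requests that succeeded and the ones that failed is that the failed responses had a Content-Encoding of 'br' (Brotli). This may be connected, since there are known issues with Brotli support in the requests library. The failed response above also came back through a proxy with 'Proxy-Connection': 'close' (see its Via and X-Cache headers), while the successful one reported 'Connection': 'keep-alive', so a closed connection is another possible factor.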

Anyway, using this version instead of the code at the start of this post seems to work consistently:

# Seems to work consistently 
r.server_ip           = r.raw._original_response.fp.raw._sock.getpeername()[0]
r.server_port         = r.raw._original_response.fp.raw._sock.getpeername()[1]
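
Both attribute paths rely on private internals of requests, urllib3, and http.client, so either could go missing in another version or in another edge case. A defensive sketch is to try both and return None instead of raising. The function name `get_peer_address` is made up for illustration, and the code assumes only that the response object is shaped like the ones in the dumps above.

```python
def get_peer_address(response):
    """Return (ip, port) of the remote end of the socket, or None if unavailable."""
    raw = getattr(response, "raw", None)
    candidate_paths = (
        # Path from the second snippet: the socket behind http.client's reader.
        ("_original_response", "fp", "raw", "_sock"),
        # Path from the first snippet: the socket held by the urllib3 connection.
        ("_connection", "sock"),
    )
    for path in candidate_paths:
        obj = raw
        for name in path:
            obj = getattr(obj, name, None)
            if obj is None:
                break
        if obj is None:
            continue  # This path is missing or None; try the next one.
        try:
            peer = obj.getpeername()
        except OSError:
            continue  # The socket exists but is no longer connected.
        # peer is (host, port) for IPv4 and (host, port, flowinfo, scope_id) for IPv6.
        return peer[0], peer[1]
    return None
```

With a real response this would be used as `addr = get_peer_address(r)`, called before the body has been read, since the dumps above were taken with `_content_consumed` still False. Note that the address is whatever the TCP socket is connected to, so when the request goes through a proxy it is the proxy's address rather than the origin server's.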

Posted 25 September 2019, 15:54 PDT
