256 Kilobytes

Answers in Web Scraping, Data Analysis | By August R. Garcia

Published Wed, 25 Sep 2019 15:54:55 -0700 | Last update Thu, 26 Sep 2019 13:25:27 -0700

Tags: Python, Urllib, Brotli, urllib3, requests

The solution here seems to generally work:

# Basically works, except the edge case shown below.
# Note: _connection is a private urllib3 attribute, so this may break between versions.
r.server_ip   = r.raw._connection.sock.getpeername()[0]
r.server_port = r.raw._connection.sock.getpeername()[1]

However, there seems to be an edge case where this occasionally doesn't work: r.raw._connection.sock is None, so the getpeername() call fails. See the debugging output below:

Example of Failure

dict_keys(['reason', 'request', 'status_code', 'raw', '_content_consumed', 'url', 'encoding', 'connection', 'history', 'elapsed', 'headers', '_content', 'cookies'])
r.status_code              200
r.raw                      <requests.packages.urllib3.response.HTTPResponse object at 0x7fd9f4069d68>
r.raw._connection          <requests.packages.urllib3.connection.HTTPConnection object at 0x7fd9e4783ac8>
r.raw._connection.sock     None
r.history                  []
r.elapsed                  0:00:00.937903
r.request                  <PreparedRequest [GET]>
r.connection               <requests.adapters.HTTPAdapter object at 0x7fd9e47e5f98>
r.headers                  {'Content-Encoding': 'br', 'Content-Disposition': 'attachment; filename="file.txt"', 'Date': 'Wed, 25 Sep 2019 22:21:36 GMT', 'X-Cache': 'MISS from barracuda.greenetwp.us', 'Pragma': 'no-cache', 'Via': '1.0 barracuda.greenetwp.us:8080 (http_scan/4.0.2.6.19)', 'Cache-Control': 'no-cache, must-revalidate', 'X-Frame-Options': 'SAMEORIGIN', 'Expires': '-1', 'Content-Type': 'text/javascript; charset=UTF-8', 'X-XSS-Protection': '0', 'Server': 'gws', 'Proxy-Connection': 'close'}
r.reason                   OK
r.cookies                  <RequestsCookieJar[]>
r._content_consumed        False
r._content                 False

Example of Success

dict_keys(['server_ip', 'reason', 'server_port', 'request', 'status_code', 'raw', '_content_consumed', 'url', 'encoding', 'connection', 'history', 'elapsed', 'headers', '_content', 'cookies'])
r.status_code              200
r.raw                      <requests.packages.urllib3.response.HTTPResponse object at 0x7fd9e47e51d0>
r.raw._connection          <requests.packages.urllib3.connection.HTTPConnection object at 0x7fd9e47a0fd0>
r.raw._connection.sock     <socket.socket fd=63, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('192.168.1.154', 47000), raddr=('36.55.230.146', 8888)>
r.history                  []
r.elapsed                  0:00:00.592083
r.request                  <PreparedRequest [GET]>
r.connection               <requests.adapters.HTTPAdapter object at 0x7fd9e47a0ac8>
r.headers                  {'Content-Encoding': 'gzip', 'Content-Disposition': 'attachment; filename="file.txt"', 'Date': 'Wed, 25 Sep 2019 22:21:36 GMT', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache, must-revalidate', 'Server': 'nginx/1.14.0', 'X-Frame-Options': 'SAMEORIGIN', 'Connection': 'keep-alive', 'Expires': '-1', 'Content-Type': 'text/javascript; charset=UTF-8', 'X-XSS-Protection': '0', 'Transfer-Encoding': 'chunked'}
r.reason                   OK
r.cookies                  <RequestsCookieJar[]>
r._content_consumed        False
r._content                 False
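
Dumps like the two above can be produced with a small helper. This is only a sketch, not the exact code used for this post: dump_attrs, its parameters, and the stand-in object are made up for illustration, and the 27-character column width is just what matches the output shown.

```python
from operator import attrgetter
from types import SimpleNamespace


def dump_attrs(obj, paths, prefix="r", width=27):
    """Return one 'prefix.path<padding>value' line per dotted attribute path."""
    lines = []
    for path in paths:
        try:
            # attrgetter accepts dotted names such as "raw._connection.sock".
            value = attrgetter(path)(obj)
        except AttributeError as exc:
            value = f"<unavailable: {exc}>"
        lines.append(f"{prefix + '.' + path:<{width}}{value}")
    return lines


# Stand-in object shaped like the failing response (hypothetical, no network needed).
fake = SimpleNamespace(
    status_code=200,
    raw=SimpleNamespace(_connection=SimpleNamespace(sock=None)),
)
for line in dump_attrs(fake, ["status_code", "raw._connection.sock"]):
    print(line)
```

With a real response, pass the Response object and whatever attribute paths you want to inspect. Paths that don't exist are reported instead of raising.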

The only obvious difference between the requests that succeeded and the ones that failed was that the failed requests had a Content-Encoding of 'br' (Brotli). It's possible this is somehow connected, since there are known issues related to Brotli support in the requests library.
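
One way to sanity-check which headers actually differ between the two dumps is to diff them as plain dictionaries. The sketch below copies the header values from the two examples above. diff_headers is a made-up helper name, and plain dicts are case-sensitive, unlike the headers object that requests returns.

```python
failed_headers = {
    "Content-Encoding": "br",
    "Content-Disposition": 'attachment; filename="file.txt"',
    "Date": "Wed, 25 Sep 2019 22:21:36 GMT",
    "X-Cache": "MISS from barracuda.greenetwp.us",
    "Pragma": "no-cache",
    "Via": "1.0 barracuda.greenetwp.us:8080 (http_scan/4.0.2.6.19)",
    "Cache-Control": "no-cache, must-revalidate",
    "X-Frame-Options": "SAMEORIGIN",
    "Expires": "-1",
    "Content-Type": "text/javascript; charset=UTF-8",
    "X-XSS-Protection": "0",
    "Server": "gws",
    "Proxy-Connection": "close",
}
ok_headers = {
    "Content-Encoding": "gzip",
    "Content-Disposition": 'attachment; filename="file.txt"',
    "Date": "Wed, 25 Sep 2019 22:21:36 GMT",
    "Pragma": "no-cache",
    "Cache-Control": "no-cache, must-revalidate",
    "Server": "nginx/1.14.0",
    "X-Frame-Options": "SAMEORIGIN",
    "Connection": "keep-alive",
    "Expires": "-1",
    "Content-Type": "text/javascript; charset=UTF-8",
    "X-XSS-Protection": "0",
    "Transfer-Encoding": "chunked",
}


def diff_headers(a, b):
    """Return (keys only in a, keys only in b, shared keys whose values differ)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return only_a, only_b, changed


only_failed, only_ok, changed = diff_headers(failed_headers, ok_headers)
print("only in failed: ", only_failed)
print("only in success:", only_ok)
print("different value:", changed)
```

For these two samples the diff shows more than Content-Encoding: the failed response also came back through a proxy (Via, X-Cache, Proxy-Connection: close), so that may be worth ruling out as well.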

Anyway, using this version instead of the code at the start of this post seems to work consistently:

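# Seems to work consistently.
# Reaches the socket through the underlying http.client response instead of the urllib3 connection.
r.server_ip   = r.raw._original_response.fp.raw._sock.getpeername()[0]
r.server_port = r.raw._original_response.fp.raw._sock.getpeername()[1]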
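
Since both lookups go through private attributes, it may be safer to wrap them in one helper that tries each path and gives up quietly. This is a rough sketch, not something from the requests or urllib3 API: get_peer_address, FakeSock, and the stand-in response are hypothetical names, and 203.0.113.7 is just a documentation address.

```python
from types import SimpleNamespace


def get_peer_address(r):
    """Return (ip, port) of the remote end, or None if no socket is reachable."""
    candidates = (
        # The lookup that seemed to work consistently in this post.
        lambda: r.raw._original_response.fp.raw._sock,
        # The original lookup, kept as a fallback.
        lambda: r.raw._connection.sock,
    )
    for get_sock in candidates:
        try:
            sock = get_sock()
            if sock is not None:
                return tuple(sock.getpeername()[:2])
        except (AttributeError, OSError):
            # Attribute missing, or the socket has already been closed.
            continue
    return None


class FakeSock:
    """Stand-in for a socket, used so the example runs without a network."""

    def __init__(self, addr):
        self.addr = addr

    def getpeername(self):
        return self.addr


# Shaped like the failing case above: _connection.sock is None.
fake_response = SimpleNamespace(
    raw=SimpleNamespace(
        _connection=SimpleNamespace(sock=None),
        _original_response=SimpleNamespace(
            fp=SimpleNamespace(raw=SimpleNamespace(_sock=FakeSock(("203.0.113.7", 8080))))
        ),
    )
)
print(get_peer_address(fake_response))
```

With a real response you would then set r.server_ip and r.server_port from the returned tuple, after checking that it is not None.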