256 Kilobytes

Answers in Web Scraping, Data Analysis | By August R. Garcia

Published Thu, 17 Oct 2019 07:37:05 -0700

Tags: Python, Selenium, Firefox, Web Crawling

It's time to post code so that I can find it the next eight times I need to run Selenium with Firefox through a proxy (with JavaScript rendering):

# ##### ##### ################# ##### ##### #
# ##### ##### With JS Rendering ##### ##### #
# ##### ##### ################# ##### ##### #
import random, time

import selenium
from selenium import webdriver 
from selenium.webdriver.firefox.options import Options
        
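# Debug function: return one line chosen uniformly at random from a file.
# Uses reservoir sampling, so the whole file never has to be held in memory.
# Raises StopIteration if the file is empty.
def random_line(fn):
        with open(fn) as afile:
                line = next(afile)
                for num, aline in enumerate(afile, 2):
                        # Replace the current pick with probability 1/num
                        if random.randrange(num):
                                continue
                        line = aline
                return line.strip()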

# Debug Function - Replace with your own list of proxies 
def random_proxy(): 
        return random_line("lists/2019-10-16--11--proxies.txt")


# Actual Function   
def init_webdriver(ip, port, headless=True):
        profile = webdriver.FirefoxProfile()

        # The port value must be an integer, such as 8080, not a string such as "8080"   
        port = int(port) 

        profile.set_preference("network.proxy.type"     ,  1    )
        profile.set_preference("network.proxy.http"     ,  ip   )
        profile.set_preference("network.proxy.http_port",  port )
        profile.set_preference("network.proxy.ssl"      ,  ip   )
        profile.set_preference("network.proxy.ssl_port" ,  port )

        # Set some other options, if that's something that you want to do. 
        #profile.set_preference("browser.content.main-window.width", 20)
        #profile.set_preference("browser.content.main-window.height", 30)

        profile.update_preferences()

        # Run without opening a full window
        options = Options()
        if headless:
                options.headless = True

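        # Alternatives: profile only, or options only
        #driver = webdriver.Firefox(profile, executable_path='./geckodriver')
        #driver = webdriver.Firefox(options=options, executable_path='./geckodriver')

        # Selenium 3 style call; expects geckodriver in the current directory.
        # (Selenium 4 deprecates the profile and executable_path arguments in
        # favor of preferences set on Options and a Service object.)
        driver = webdriver.Firefox(firefox_profile=profile, options=options, executable_path='./geckodriver')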

        # A bare `profile._create_tempfolder` (no parentheses) is only an attribute
        # lookup, not a call, so it does nothing and is not needed here.

        return driver

# Attempt to make a request with additional exception handling built in, 
# since proxies may fail due to being down or otherwise unusable.  
def try_webdriver_request(driver, url):
        try: 
                driver.get(url)
        except selenium.common.exceptions.WebDriverException as e:
                print("Selenium Request Failed. Exception raised: ", e)

        # TODO - Put your own additional exception handling here. 

        return driver 

# Select a proxy:  
proxy_string = random_proxy()
ip, port     = proxy_string.split(":", 1)

print("Proxy: %s" % proxy_string, "IP:    %s" % ip, "Port:  %s" % port, sep="\n")
        
# Create the webdriver object (uses Firefox)
driver = init_webdriver(ip, port, headless=False)
        
# Make the HTTP request and verify that the IP is correctly configured 
url    = "https://www.whatismyip.com/"
driver = try_webdriver_request(driver, url)
info   = driver.find_element_by_css_selector('ul.list-group.text-center').text
print("\n\n===== INFO ====\n", info, sep="")


# Make another HTTP request to verify that JavaScript is executing correctly 
url    = "http://avi.im/stuff/js-or-no-js.html"
driver = try_webdriver_request(driver, url)
info   = driver.find_element_by_css_selector('#intro-text').text
print("\n\n===== INFO ====\n", info, sep="")
        
        
# Get the HTML from the page (after JavaScript has rendered) 
#html = driver.page_source 
#print (html)

# Quit driver and close everything  
driver.quit()
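
The script above splits the proxy string on ":" and trusts whatever comes out, so a blank or malformed line in the proxy list only shows up later as an IndexError or a failed int() call. Here is a minimal sketch of a stricter parser, assuming the list file holds one host:port entry per line. The names parse_proxy and load_proxies are hypothetical helpers, not part of the original script or of Selenium.

```python
from typing import List, Tuple


def parse_proxy(proxy_string: str) -> Tuple[str, int]:
    """Split 'host:port' into (host, port), with the port as an int."""
    # rpartition splits on the LAST colon, so the port is always the tail
    host, sep, port_text = proxy_string.strip().rpartition(":")
    if not sep or not host:
        raise ValueError("expected 'host:port', got %r" % proxy_string)
    port = int(port_text)  # raises ValueError for a non-numeric port
    if not 0 < port < 65536:
        raise ValueError("port out of range: %d" % port)
    return host, port


def load_proxies(path: str) -> List[Tuple[str, int]]:
    """Read a proxy list file, skipping blank or malformed lines."""
    proxies = []
    with open(path) as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            try:
                proxies.append(parse_proxy(line))
            except ValueError:
                continue  # skip lines that are not 'host:port'
    return proxies
```

With something like this, random.choice(load_proxies("lists/2019-10-16--11--proxies.txt")) would give an (ip, port) pair that can be passed straight to init_webdriver, at the cost of reading the whole list into memory instead of sampling it line by line.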