Is it possible to save an image from a site while Selenium is minimized?
At the moment I'm using this code:
img = driver.find_element_by_xpath('//*[@id="image_img"]')
img.screenshot('C:/foo.png')
This of course works, but this approach brings the browser window up while it takes the screenshot.
Is it possible to save an image from a given XPath in a minimized browser?
Unfortunately, downloading the photo from its URL is pointless, because the image is generated only once: when I download it via the URL, I get an empty file or a different image from the one that should be there.
site: https://thispersondoesnotexist.com/
You don't need Selenium to get pictures from this website; you can use this code
to download the image directly to your local disk.
import requests
r1 = requests.get("https://thispersondoesnotexist.com/image")
r1.raise_for_status()
print(r1.status_code, r1.reason)
tts_url = 'https://thispersondoesnotexist.com/image'
r2 = requests.get(tts_url, timeout=100, cookies=r1.cookies)
print(r2.status_code, r2.reason)
try:
    with open('test.jpeg', "w+b") as f:
        f.write(r2.content)
except IOError:
    print("IOError: could not write a file")
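The asker's worry was ending up with an empty file or the wrong image. A minimal sketch, assuming the endpoint returns JPEG data, that refuses to write a body that is empty or lacks the JPEG start-of-image marker (FF D8 FF); save_jpeg is a hypothetical helper name, meant to be fed r2.content from the code above:

```python
from pathlib import Path

# A JPEG file starts with the SOI marker FF D8, followed by the FF of the next marker
JPEG_MAGIC = b"\xff\xd8\xff"


def save_jpeg(content: bytes, path: str) -> int:
    """Write content to path only if it looks like a JPEG; return the number of bytes written.

    Hypothetical helper (not part of requests): raises ValueError instead of
    silently saving an empty or non-JPEG response body.
    """
    if not content:
        raise ValueError("empty response body, nothing to save")
    if not content.startswith(JPEG_MAGIC):
        raise ValueError("response body is not JPEG data")
    return Path(path).write_bytes(content)
```

For example, save_jpeg(r2.content, 'test.jpeg') either writes the image or fails loudly, which makes a bad download easy to spot.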
I'm new to Python and Selenium. My goal is to download an image from the Google Image Search results page and save it as a file in a local directory, but I haven't been able to download the image in the first place.
I'm aware there are other options (retrieving the image via its URL using requests, etc.), but I want to know whether it's possible to do it using the image's "src" attribute, e.g., "..."
My code is below (I have removed all imports, etc., for brevity.):
# This creates the folder to store the image in
if not os.path.exists(save_folder):
    os.mkdir(save_folder)
driver = webdriver.Chrome(PATH)
# Goes to the given web page
driver.get("https://www.google.com/imghp?hl=en&ogbl")
# "q" is the name of the google search field input
search_bar = driver.find_element_by_name("q")
# Input the search term(s)
search_bar.send_keys("Ben Folds Songs for Silverman Album Cover")
# Returns the results (basically clicks "search")
search_bar.send_keys(Keys.RETURN)
# Wait 10 seconds for the images to load on the page before moving on to the next part of the script
try:
    # Returns a list
    search_results = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "islrg"))
    )
    # print(search_results.text)
    # Gets all of the images on the page (it should be a list)
    images = search_results.find_elements_by_tag_name("img")
    # I just want the first result
    image = images[0].get_attribute('src')
    ### Need help here ###
except:
    print("Error")
    driver.quit()
# Closes the browser
driver.quit()
I have tried:
urllib.request.urlretrieve(image, "00001.jpg")
and
urllib3.request.urlretrieve(image, f"{save_folder}/captcha.png")
But I've always hit the "except" block using those methods. After reading a promising post, I also tried:
bufferedImage = imageio.read(image)
outputFile = f"{save_folder}/image.png"
imageio.write(bufferedImage, "png", outputFile)
with similar results, though I believe the latter example used Java in the post and I may have made an error in translating it to Python.
I'm sure it's something obvious, but what am I doing wrong? Thank you for any help.
The URL you are dealing with in this case is a data URL, which carries the image data itself, encoded in base64.
Since Python 3.4 you can read this data and decode it to bytes with urllib.request.urlopen:
import urllib.request
data_url = "..."
with urllib.request.urlopen(data_url) as response:
    data = response.read()
with open("some_image.jpg", mode="wb") as f:
    f.write(data)
Alternatively, you can decode the base64-encoded part of the data URL yourself with base64:
import base64
data_url = "..."
base64_image_data = data_url.split(",")[1]
data = base64.b64decode(base64_image_data)
with open("some_image.jpg", mode="wb") as f:
    f.write(data)
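The src in the question's script could be either a data URL or an ordinary http(s) URL, so it helps to check the scheme and media type before deciding how to save it. A standard-library sketch of parsing a data URL as laid out in RFC 2397 (base64 or percent-encoded payload); decode_data_url is a hypothetical name:

```python
from __future__ import annotations

import base64
from urllib.parse import unquote_to_bytes


def decode_data_url(data_url: str) -> tuple[str, bytes]:
    """Return (media_type, raw_bytes) for a URL like 'data:image/png;base64,iVBOR...'."""
    if not data_url.startswith("data:"):
        raise ValueError("not a data URL")
    header, sep, payload = data_url[len("data:"):].partition(",")
    if not sep:
        raise ValueError("malformed data URL: missing comma")
    params = header.split(";")
    # RFC 2397: an omitted media type defaults to text/plain
    media_type = params[0] or "text/plain"
    if params[-1] == "base64":
        return media_type, base64.b64decode(payload)
    # Without ';base64' the payload is percent-encoded
    return media_type, unquote_to_bytes(payload)
```

In the question's script, mime, data = decode_data_url(image) would give the bytes to write with open(..., "wb") plus a media type (such as image/jpeg) that can pick the file extension; an http(s) src can still go through urllib.request.urlretrieve.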
Goal:
Save an SVG from Wikipedia
Requirements:
Needs to be automated
Currently I am using Selenium to get some other information, and I tried to use a Python script like the one below to extract the SVG, but the extracted SVG file gives an error when rendering.
Edit:
The same error occurs when using requests; maybe it has something to do with the file Wikipedia uploaded?
Error code:
[image: error message for the SVG file]
Later it renders part of the SVG:
[image: rendered part of the SVG]
How it should look:
[image: map of Oslo, zoomed out]
Wikipedia file
Code imageEctractSingle.py:
from selenium import webdriver
DRIVER_PATH = 'chromedriver.exe'
link = 'https://upload.wikimedia.org/wikipedia/commons/7/75/NO_0301_Oslo.svg'
driver = webdriver.Chrome(executable_path=DRIVER_PATH)
driver.get(link)
image = driver.page_source
#Creates an SVG File
f = open("kart/oslo.svg", "w")
f.write(image)
f.close()
driver.close()
Original article, from which I get the file link by iterating through the table in the article.
Any ideas on how to extract this image? I know Chrome has a built-in Save As function; how can I access that through Selenium?
Or is there a tool for saving SVG files from Selenium?
Thanks in advance for any help :D
It's not Selenium, but I got it working with requests; you shouldn't need Selenium for something this simple unless you are doing more alongside it:
import requests
def write_bytes(data: bytes, path: str):
    # Write the raw bytes so the file keeps the server's original encoding
    with open(path, 'wb') as file:
        file.write(data)
url = 'https://upload.wikimedia.org/wikipedia/commons/7/75/NO_0301_Oslo.svg'
response = requests.get(url, timeout=30)
response.raise_for_status()
write_bytes(response.content, './NO_0301_Oslo.svg')
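Since the complaint was that the saved SVG fails to render, one way to tell a truncated or mis-encoded download apart from a renderer problem is to check that the file parses as XML with an svg root element. A minimal standard-library sketch (looks_like_svg is a hypothetical name); it only checks well-formedness, not whether every SVG feature renders:

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def looks_like_svg(path: str) -> bool:
    """Return True if the file at path is well-formed XML whose root element is <svg>."""
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError:
        return False
    # ElementTree reports the root as '{namespace}svg' when the SVG namespace is declared
    return root.tag in (SVG_NS + "svg", "svg")
```

If looks_like_svg('./NO_0301_Oslo.svg') returns False, the problem is in the saved bytes rather than in the viewer.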
I need to download the images inside the custom-made CAPTCHA on this login site. How can I do it? :(
This is the login page; it has five images,
and this is the link: https://portalempresas.sb.cl/login.php
I've been trying with this code that another user (@EnriqueBet) helped me with:
from io import BytesIO
from PIL import Image
# Download image function
def downloadImage(element,imgName):
    img = element.screenshot_as_png
    stream = BytesIO(img)
    image = Image.open(stream).convert("RGB")
    image.save(imgName)
# Find all the web elements of the captcha images
image_elements = driver.find_elements_by_xpath("/html/body/div[1]/div/div/section/div[1]/div[3]/div/div/div[2]/div[*]")
# Output name for the images
image_base_name = "Imagen_[idx].png"
# Download each image
for i in range(len(image_elements)):
    downloadImage(image_elements[i],image_base_name.replace("[idx]","%s"%i))
But when it tries to get all of the image elements
image_elements = driver.find_elements_by_xpath("/html/body/div[1]/div/div/section/div[1]/div[3]/div/div/div[2]/div[*]")
It fails and doesn't get any of them. Please, help! :(
Instead of defining an explicit path to the images, why not simply download all the images on the page? That works here because the page itself only has 5 images and you want all of them. See the method below.
The following should extract all images from a given page and write them to the directory the script is run from.
import re
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup
site = ''  # set the page URL here
response = requests.get(site)
soup = BeautifulSoup(response.text, 'html.parser')
img_tags = soup.find_all('img')
urls = [img['src'] for img in img_tags if img.get('src')]
for url in urls:
    # sometimes an image source is relative,
    # so resolve it against the page URL
    url = urljoin(site, url)
    filename = re.search(r'/([\w_-]+[.](jpg|gif|png))$', url)
    if filename is None:
        continue  # skip sources that don't end in a recognised image file name
    response = requests.get(url)
    with open(filename.group(1), 'wb') as f:
        f.write(response.content)
The code is taken from here and credit goes to the respective owner.
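Resolving relative src values against the page URL is the step most likely to go wrong with this approach. A sketch of that step in isolation, using urllib.parse.urljoin and skipping inline data: sources (they have no file name to keep); absolute_image_urls is a hypothetical helper name:

```python
from __future__ import annotations

from urllib.parse import urljoin


def absolute_image_urls(page_url: str, srcs: list[str]) -> list[str]:
    """Resolve each <img> src against the page URL, dropping empty and data: sources."""
    urls = []
    for src in srcs:
        if not src or src.startswith("data:"):
            continue  # inline images carry their bytes inside the URL itself
        # urljoin handles 'img/a.png', '/static/a.png', '../a.png' and absolute URLs alike
        urls.append(urljoin(page_url, src))
    return urls
```

It can be fed [img.get('src') for img in soup.find_all('img')] from the BeautifulSoup code above.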
This is a follow-up to my earlier answer.
I have had no success getting Selenium to run because of version mismatches between Selenium and my browser.
I have, though, thought of another way to download and extract all the images that appear in the captcha. As you can tell, the images change on each visit, so to collect all of them the best option is to automate the process rather than manually saving each image from the site.
To automate it, follow the steps below.
First, navigate to the site using Selenium and take a screenshot of the page. For example:
from selenium import webdriver
DRIVER = 'chromedriver'
driver = webdriver.Chrome(DRIVER)
driver.get('https://www.spotify.com')
screenshot = driver.save_screenshot('my_screenshot.png')
driver.quit()
This saves it locally. You can then open the image with a library such as PIL (Pillow) and crop out the captcha images.
This would be done like so:
from PIL import Image
im = Image.open('my_screenshot.png').convert('L')
im = im.crop((1, 1, 98, 33))
im.save('captcha_0.png')
Hopefully you get the idea. You will need to do this for each of the images, ideally in a for loop with the crop dimensions changed appropriately.
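A sketch of that loop, assuming the captcha images sit side by side in one row with equal size and spacing; the offsets you pass in, and the names captcha_boxes and crop_captcha_images, are hypothetical and have to be measured from your own screenshot. Pillow is imported inside the cropping function so the box arithmetic can be reused on its own:

```python
def captcha_boxes(left, top, width, height, count, gap=0):
    """Return (left, top, right, bottom) crop boxes for count equally sized images in one row."""
    boxes = []
    for i in range(count):
        x = left + i * (width + gap)  # each image starts one width plus one gap further right
        boxes.append((x, top, x + width, top + height))
    return boxes


def crop_captcha_images(screenshot_path, boxes, name_pattern="captcha_{}.png"):
    """Crop each box out of the screenshot and save it as its own file."""
    from PIL import Image  # Pillow, as in the answer above

    with Image.open(screenshot_path) as im:
        for i, box in enumerate(boxes):
            im.crop(box).save(name_pattern.format(i))
```

For example, crop_captcha_images('my_screenshot.png', captcha_boxes(1, 1, 97, 32, 5, gap=5)) would save captcha_0.png to captcha_4.png, the first box matching the (1, 1, 98, 33) crop above.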
You can also try this; it saves only the captcha image:
from PIL import Image
from selenium.webdriver.common.by import By
def crop_captcha(location, size, path='screenshot.png'):
    # Crop the full-page screenshot down to the captcha element's bounding box
    im = Image.open(path)
    left = location['x']
    top = location['y']
    right = location['x'] + size['width']
    bottom = location['y'] + size['height']
    im = im.crop((left, top, right, bottom))  # defines crop points
    im.save(path)
    return True
element = driver.find_element(By.ID, 'captcha_image')  # div id or the captcha container id
location = element.location
size = element.size
driver.save_screenshot('screenshot.png')
crop_captcha(location, size)
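One caveat with cropping by element.location: Selenium reports it in CSS pixels, while on a high-DPI display (for example with 150% display scaling) the screenshot is taken in device pixels, so the crop can land in the wrong place. A minimal sketch that scales the box by the page's devicePixelRatio, which can be read with driver.execute_script('return window.devicePixelRatio'); scaled_crop_box is a hypothetical helper:

```python
def scaled_crop_box(location, size, ratio=1.0):
    """Convert Selenium's CSS-pixel location/size dicts into a device-pixel crop box."""
    left = round(location['x'] * ratio)
    top = round(location['y'] * ratio)
    right = round((location['x'] + size['width']) * ratio)
    bottom = round((location['y'] + size['height']) * ratio)
    return (left, top, right, bottom)
```

The returned tuple can be passed straight to Image.crop(...) in place of the unscaled box.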
I need to download an image with Python (I can't post the URL since I think the content type is not allowed here).
The image is loaded in a page, so I first get the image's direct URL from the page source with BeautifulSoup and then try to download it with:
import urllib
urllib.urlretrieve(image, folder)
which usually works fine, but doesn't work on this server.
If I open the page with the image in my browser, then copy the image's direct link into another browser window and open it, I get the image; but if I try to open the image directly in another window without first opening the web page that contains it, I get error 403:
Forbidden You don't have permission to access /1.jpg on this server.
Opening the image directly still works if I close the browser and reopen it, but if I switch to a different browser I get error 403 again.
I thought it could be something cookie-related, so I've tried:
response = urllib2.urlopen(PAGE_WITH_IMAGE)
cookie = response.headers.get('Set-Cookie')
req2 = urllib2.Request(DIRECT_LINK_TO_IMAGE)
req2.add_header('cookie', cookie)
urllib2.urlretrieve(req2, folder)
But I still get error 403.
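The symptoms (the image opens only after the page has been visited in the same browser) suggest the server checks a session cookie, the Referer header, or both; that is an assumption the question doesn't confirm. A Python 3 standard-library sketch that visits the page with a cookie-aware opener, then requests the image with the page as Referer; download_after_visit is a hypothetical name and the User-Agent value is only an example:

```python
import http.cookiejar
import urllib.request


def download_after_visit(page_url: str, image_url: str, path: str) -> int:
    """Fetch page_url first (keeping its cookies), then save image_url to path.

    Returns the number of bytes written.
    """
    jar = http.cookiejar.CookieJar()
    # The opener stores cookies from every response and sends them on later requests
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Example browser-like User-Agent, in case the server also filters on it
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    with opener.open(page_url) as response:
        response.read()
    # Tell the server which page the image request comes from
    request = urllib.request.Request(image_url, headers={"Referer": page_url})
    with opener.open(request) as response, open(path, "wb") as f:
        return f.write(response.read())
```

Unlike copying a single Set-Cookie header by hand, the cookie jar keeps every cookie the page sets and also sends the Referer, so both likely checks are covered in one call.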
I wrote a small Python script to download a picture from a website using Selenium:
from selenium import webdriver
import urllib.request
class FirefoxTest:
    def firefoxTest(self):
        self.driver=webdriver.Firefox()
        self.driver.get("http://www.sitew.com")
        self.r=self.driver.find_element_by_tag_name('img')
        self.uri=self.r.get_attribute("src")
        self.g=urllib.request.urlopen(self.uri)
        with open("begueradj.png",'b+w') as self.f:
            self.f.write(self.g.read())
if __name__=='__main__':
    FT=FirefoxTest()
    FT.firefoxTest()
How can I modify my code in order to:
download all the pictures on the webpage?
keep each image's default name instead of naming it myself?
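You need to switch from find_element to find_elements (find_elements_by_tag_name in older Selenium; Selenium 4 uses find_elements(By.TAG_NAME, ...)). For downloading files, I'd use urllib.request.urlretrieve(). Note that it won't pick the file name for you: called without a filename it saves to a temporary file, so pass the last part of the URL path as the name:
import os
import urllib.request
from urllib.parse import urlparse
from selenium.webdriver.common.by import By
images = self.driver.find_elements(By.TAG_NAME, 'img')
for image in images:
    src = image.get_attribute("src")
    if src:
        filename = os.path.basename(urlparse(src).path)  # keep the default name from the URL
        urllib.request.urlretrieve(src, filename)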
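Taking the name from the URL path keeps each image's default name, but two images can share a name (for example logo.png in different folders) and some URLs end in a slash. A sketch that derives the name from the URL path and appends a counter on collisions; default_filename and the "image" fallback are hypothetical choices:

```python
import os
from urllib.parse import unquote, urlparse


def default_filename(url, taken, fallback="image"):
    """Return the file name at the end of url's path, made unique against the set taken."""
    # Drop query string and fragment, decode %20 etc., keep only the last path segment
    name = os.path.basename(unquote(urlparse(url).path)) or fallback
    stem, ext = os.path.splitext(name)
    candidate, n = name, 1
    while candidate in taken:
        candidate = f"{stem}_{n}{ext}"
        n += 1
    taken.add(candidate)
    return candidate
```

Inside the loop this would replace the basename line: create seen = set() before the loop, then call urllib.request.urlretrieve(src, default_filename(src, seen)).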
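You can use the Ruby gem Nokogiri to open the web page and download the images using their XPath.
require 'open-uri'
require 'nokogiri'
page_url = 'http://www.sitew.com'
doc = Nokogiri::HTML(URI.open(page_url))
doc.xpath('//img/@src').each do |src|
  next if src.value.start_with?('data:') # skip inline images
  image_url = URI.join(page_url, src.value)
  filename = File.basename(image_url.path)
  next if filename.empty? || filename == '/'
  File.open(filename, 'wb') { |f| f.write(URI.open(image_url).read) }
end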