Using a constantly changing downloaded file python?

Hello everyone. My program navigates to a webpage and clicks a link to download the PDF document I need. Is there a way for Python to pick up the name of the downloaded file and upload it to my Google Drive? I don't want to type the upload file name manually, because it changes every time I click a different download link. For example, the current file is invoice_sample-1234, but the next download would be invoice_sample-5678.
How do I cut out the process of typing each invoice name?
Thank you for any help.
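from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(options=options)
wait = WebDriverWait(driver, 10)  # `wait` was not defined in the original snippet; the 10 s timeout is assumed
driver.get("myurl.com")
window_before = driver.window_handles[0]
driver.find_element(By.ID, "Invoice_Links").click()
window_after = driver.window_handles[1]
driver.switch_to.window(window_after)
# click() returns None, so its result is not assigned to a variable
wait.until(EC.visibility_of_element_located((By.ID, "Download Doc"))).click()

def upload_Drive():
    # `drive` is presumably an authenticated PyDrive GoogleDrive instance created elsewhere
    upload_file_list = ['invoice_sample-1234.pdf']
    for upload_file in upload_file_list:
        gfile = drive.CreateFile({'parents': [{'id': 'Folder'}]})
        gfile.SetContentFile(upload_file)
        gfile.Upload()  # Upload the file.
        print('file Uploaded')

upload_Drive()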

I think you could use a random number generator or a timestamp to name the file. The file name is just a string, so for example:
import time
filename = str(int(time.time())) + ".pdf"
I think this should work.
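Another option that avoids typing or inventing a name at all is to look up whichever file landed most recently in the download folder. Here is a minimal sketch of that idea. The helper name `newest_download` and the `download_dir` argument are made up for illustration, and it assumes the browser saves into a folder you know and that the download has already finished.

```python
from pathlib import Path
from typing import Optional


def newest_download(download_dir: str, pattern: str = "*.pdf") -> Optional[Path]:
    """Return the most recently modified file matching `pattern`, or None."""
    # Collect regular files only; sub-directories are ignored.
    candidates = [p for p in Path(download_dir).glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # The file with the largest modification time is the latest download.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

The returned path could then replace the hard-coded entry, e.g. `upload_file_list = [str(newest_download(download_dir))]`, so the upload follows whatever invoice was downloaded last.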

Related

How to download file from a page using python

I am having trouble downloading a txt file from this page: https://www.ceps.cz/en/all-data#RegulationEnergy (scroll down to see Download: txt, xls and xml).
My goal is to create a scraper that goes to the linked page, clicks the txt link, for example, and saves the downloaded file.
Main problems that I am not sure how to solve:
The file doesn't have a real link that I can call to download it; the link is created with JS based on filters and file type.
When I use the requests library for Python and call the link with all headers, it just redirects me to https://www.ceps.cz/en/all-data .
Approaches tried:
Using a scraper such as ParseHub to download the link didn't work as intended, although this scraper came closest to what I wanted.
Using the requests library to connect to the link with the headers that the XHR request uses for downloading the file; it just redirects me to https://www.ceps.cz/en/all-data .
If you could propose a solution for this task, thank you in advance. :-)

You can download this data to a directory of your choice with Selenium; you just need to specify the directory in which the data will be saved. In the example below, I save the txt data to my desktop:
from selenium import webdriver
from selenium.webdriver.common.by import By

download_dir = '/Users/doug/Desktop/'
chrome_options = webdriver.ChromeOptions()
prefs = {'download.default_directory': download_dir}
chrome_options.add_experimental_option('prefs', prefs)
driver = webdriver.Chrome(options=chrome_options)
driver.get('https://www.ceps.cz/en/all-data')
# Click the first <li> inside the download container
container = driver.find_element(By.CLASS_NAME, 'download-graph-data')
button = container.find_element(By.TAG_NAME, 'li')
button.click()
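One thing to watch with this approach: click() returns right away, while the browser may still be writing the file. Below is a rough sketch of a polling helper that waits until no partial-download files are left. The name `wait_for_downloads` and its parameters are my own. It relies on Chrome using a `.crdownload` suffix and Firefox a `.part` suffix for unfinished downloads, and it works best when the download directory is otherwise empty, since any finished file already there counts as "done".

```python
import time
from pathlib import Path


def wait_for_downloads(download_dir: str, timeout: float = 30.0, poll: float = 0.5) -> bool:
    """Return True once files exist and none are partial; False on timeout."""
    partial_suffixes = {".crdownload", ".part"}  # Chrome / Firefox temp files
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        files = list(Path(download_dir).iterdir())
        in_progress = [p for p in files if p.suffix in partial_suffixes]
        if files and not in_progress:
            return True
        time.sleep(poll)
    return False
```

You would call it after the click, e.g. `wait_for_downloads(download_dir)`, and only read or quit once it returns True.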

You can do it like this:
import requests

txt_format = 'txt'
xls_format = 'xls'  # written in binary mode
xml_format = 'xml'  # written in binary mode

def download(file_type):
    url = f'https://www.ceps.cz/download-data/?format={file_type}'
    response = requests.get(url)
    if file_type == txt_format:
        with open(f'file.{file_type}', 'w') as file:
            file.write(response.text)
    else:
        with open(f'file.{file_type}', 'wb') as file:
            file.write(response.content)

download(txt_format)

How to save the screenshot in particular folder using python

I have written the script below for taking a screenshot. Currently, it saves the file in the directory where the Python file is located. I want to save the screenshot in a particular folder.
from selenium import webdriver
import option
import time
#PhantomJS
driver = webdriver.PhantomJS(executable_path=r'D:\PhantomJS\phantomjs-2.1.1-windows\bin\phantomjs.exe')
#Selenium
#driver = webdriver.Chrome("D:\Selenium\Chrome\chromedriver.exe")
#Maximizes window to full screen
driver.maximize_window()
#Gets the URL for OMS
driver.get(option.OMS_QUERY)
#Gets the username & Password
driver.find_element_by_xpath(option.LOG_IN).click()
driver.find_element_by_id("username").send_keys(option.USERNAME)
driver.find_element_by_xpath(option.ENTER).click()
time.sleep(3)
driver.find_element_by_id("password").send_keys(option.PASSWORD)
driver.find_element_by_xpath(option.ENTER).click()
time.sleep(15)
#Saves the screenshot for OMS_SWR
driver.save_screenshot('oms_swr.png')
#Gets the URL for DMS
driver.get(option.DMS_QUERY)
time.sleep(15)
#Saves the screenshot for DMS_SWR
driver.save_screenshot('dms_swr.png')
driver.quit()

You have to set the path where you want to store it. For example, to store it on a system drive:
driver.save_screenshot('D:/Folder_name/dms_swr.png')

To save the screenshot in a particular folder, you can use either of the following options:
Within your project space:
driver.save_screenshot('./project_directory/save_screenshot.png')
Within your system:
driver.save_screenshot('C:/system_directory/save_screenshot.png')
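One thing that can make either option look like it "does nothing" is a target folder that doesn't exist yet. As far as I know, Selenium's Python save_screenshot returns False in that case rather than raising. Here is a small sketch of a wrapper that creates the folder first and checks the return value; `save_screenshot_to` is a hypothetical helper name, and `driver` is any object with a `save_screenshot(path)` method.

```python
from pathlib import Path


def save_screenshot_to(driver, folder: str, filename: str) -> Path:
    """Create `folder` if needed, save a screenshot there, return its path."""
    target_dir = Path(folder)
    target_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    target = target_dir / filename
    # save_screenshot reports failure through its return value
    if not driver.save_screenshot(str(target)):
        raise OSError(f"could not write screenshot to {target}")
    return target
```

Usage would look like `save_screenshot_to(driver, 'D:/Folder_name', 'dms_swr.png')`.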

I tried doing this as well, and it didn't work. I created a directory named image and then tried using driver.save_screenshot('/Users/name/PycharmProjects/RunPage/image/homepage.png')
but this didn't work.
I also tried
driver.get_screenshot_as_file('/Users/name/PycharmProjects/RunPage/image/homepage.png')

Cannot open new tab for selenium in order to download a CSV file python

Problem description
I am working on Ubuntu 16.04.
I want to download CSV files from a website. They are presented as links. When I click on a link, I want it to open in a new tab, which will download the file. I used the solution provided in https://gist.github.com/lrhache/7686903.
Setup
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList",2)
fp.set_preference("browser.download.manager.showWhenStarting",False)
fp.set_preference("browser.download.dir",download_path)
fp.set_preference("browser.helperApps.neverAsk.saveToDisk","text/csv")
# create a selenium webdriver
browser = webdriver.Firefox(firefox_profile=fp)
# open QL2 website
browser.get('http://live.ql2.com')
Code
csvList = browser.find_elements_by_class_name("csv")
for l in csvList:
    if 'error' not in l.text and 'after' not in l.text:
        l.send_keys(Keys.CONTROL + 't')
Every Element l is represented as follows:
<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="9003fc6a-d8be-472b-bced-94fffdb5fdbe", element="27e1638a-0e37-411d-8d30-896c15711b49")>
Question
Why am I not able to open a new tab? Is there something missing?

The problem seems to be that you are just making a new tab, not opening the link in a new tab.
Try using ActionChains:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

# create browser as detailed in OP's setup
key_code = Keys.CONTROL
csvList = browser.find_elements_by_class_name("csv")
for l in csvList:
    if 'error' not in l.text and 'after' not in l.text:
        actions = webdriver.ActionChains(browser)
        # Holds down the key specified in key_code
        actions.key_down(key_code)
        # Clicks the link
        actions.click(l)
        # Releases the key specified in key_code
        actions.key_up(key_code)
        # Performs the actions specified in the ActionChain
        actions.perform()

Downloading multiple files using Selenium click()?

Using Firefox/Python/Selenium-- I am able to use click() on a file link on a webpage to download it, and the file downloads to my Downloads folder as expected.
However, when I add more lines to click() on more than 1 link, the script no longer runs as expected. Instead of the files being downloaded, they are all opening in separate browser windows, which all close after the script completes.
Is this by design or is there a way around it or a better way to download multiple files on a webpage?
This is the website in question: https://www.treasury.gov/about/organizational-structure/ig/Pages/igdeskbook.aspx
I am trying to download the links to the Introduction and all parts of Volumes 1-4.
I have a dictionary of the locators:
IgDeskbookPageMap = dict(
    IgDeskbookBannerXpath="//div[contains(text(), 'The Inspector General Deskbook')]",
    IgDeskbookIntroId="anch_202",
    IgDeskbookVol1Part1Id="anch_203",
    IgDeskbookVol1Part2Id="anch_204",
    IgDeskbookVol1Part3Id="anch_205",
    IgDeskbookVol1Part4Id="anch_206",
    IgDeskbookVol2Id="anch_207",
    IgDeskbookVol3Id="anch_208",
    IgDeskbookVol4Part1Id="anch_209",
    IgDeskbookVol4Part2Id="anch_210",
    IgDeskbookVol4Part3Id="anch_211",
)
This is the method:
def click(self, waitTime, locatorMode, Locator):
    self.wait_until_element_clickable(waitTime, locatorMode, Locator).click()
These are the click() calls (there are more than 3; I'm truncating here for space):
self.click(10,
"id",
IgDeskbookPageMap['IgDeskbookIntroId']
)
self.click(10,
"id",
IgDeskbookPageMap['IgDeskbookVol1Part1Id']
)
self.click(10,
"id",
IgDeskbookPageMap['IgDeskbookVol1Part2Id']
)

I added the following code for launching Firefox and now the download behavior works as expected when clicking on each file:
profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2)
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.helperApps.alwaysAsk.force', False)
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/pdf,application/x-pdf')
profile.set_preference("plugin.disable_full_page_plugin_for_types", "application/pdf")
profile.set_preference("pdfjs.disabled", True)
self.driver = webdriver.Firefox(profile)

One way to download multiple files that open in different tabs is to follow these steps, in whatever language you are using:
for (all such links):
    click() the pdf link
    findElement the download element
    click() the download link
    close the tab
    switch back to the last tab // should ideally be done together with the previous step
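In Python, the steps above could be sketched roughly as follows. This is only an outline under some assumptions: `download_from_tabs` is a name I made up, each locator is a `(by, value)` tuple such as `("id", "anch_202")` (Selenium's `By.ID` is the string `"id"`), and the new tab is assumed to be open by the time click() returns. A real script would probably want an explicit wait there.

```python
def download_from_tabs(driver, link_locators, download_locator):
    """Click each link, click the download element in the tab it opens, then close that tab."""
    main_window = driver.current_window_handle
    for locator in link_locators:
        handles_before = set(driver.window_handles)
        driver.find_element(*locator).click()
        # Any handle that was not there before the click is the new tab
        new_handles = [h for h in driver.window_handles if h not in handles_before]
        if not new_handles:
            continue  # this link did not open a new tab
        driver.switch_to.window(new_handles[0])
        driver.find_element(*download_locator).click()
        driver.close()                        # closes the tab that currently has focus
        driver.switch_to.window(main_window)  # back to the original tab
```

Closing the tab and switching back happen back to back, which matches the note that those two steps belong together.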

How can I download a file on a click event using selenium?

I am working with Python and Selenium. I want to download a file via a click event using Selenium. I wrote the following code.
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
browser = webdriver.Firefox()
browser.get("http://www.drugcite.com/?q=ACTIMMUNE")
browser.close()
I want to download both files from the links named "Export Data" at the given URL. How can I achieve this, given that the download works through a click event only?

Find the link using find_element(s)_by_*, then call its click method.
from selenium import webdriver
# To prevent download dialog
profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2) # custom location
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', '/tmp')
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'text/csv')
browser = webdriver.Firefox(profile)
browser.get("http://www.drugcite.com/?q=ACTIMMUNE")
browser.find_element_by_id('exportpt').click()
browser.find_element_by_id('exporthlgt').click()
Added profile manipulation code to prevent download dialog.

I'll admit this solution is a little more "hacky" than the Firefox Profile saveToDisk alternative, but it works across both Chrome and Firefox, and doesn't rely on a browser-specific feature which could change at any time. And if nothing else, maybe this will give someone a little different perspective on how to solve future challenges.
Prerequisites: Ensure you have selenium and pyvirtualdisplay installed...
Python 2: sudo pip install selenium pyvirtualdisplay
Python 3: sudo pip3 install selenium pyvirtualdisplay
The Magic
import pyvirtualdisplay
import selenium
import selenium.webdriver
import time
import base64
import json
root_url = 'https://www.google.com'
download_url = 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png'
print('Opening virtual display')
display = pyvirtualdisplay.Display(visible=0, size=(1280, 1024,))
display.start()
print('\tDone')
print('Opening web browser')
driver = selenium.webdriver.Firefox()
#driver = selenium.webdriver.Chrome() # Alternately, give Chrome a try
print('\tDone')
print('Retrieving initial web page')
driver.get(root_url)
print('\tDone')
print('Injecting retrieval code into web page')
driver.execute_script("""
    window.file_contents = null;
    var xhr = new XMLHttpRequest();
    xhr.responseType = 'blob';
    xhr.onload = function() {
        var reader = new FileReader();
        reader.onloadend = function() {
            window.file_contents = reader.result;
        };
        reader.readAsDataURL(xhr.response);
    };
    xhr.open('GET', %(download_url)s);
    xhr.send();
""".replace('\r\n', ' ').replace('\r', ' ').replace('\n', ' ') % {
    'download_url': json.dumps(download_url),
})
print('Looping until file is retrieved')
downloaded_file = None
while downloaded_file is None:
    # Returns the file retrieved base64 encoded (perfect for downloading binary)
    downloaded_file = driver.execute_script('return (window.file_contents !== null ? window.file_contents.split(\',\')[1] : null);')
    print(downloaded_file)
    if not downloaded_file:
        print('\tNot downloaded, waiting...')
        time.sleep(0.5)
print('\tDone')
print('Writing file to disk')
fp = open('google-logo.png', 'wb')
fp.write(base64.b64decode(downloaded_file))
fp.close()
print('\tDone')
driver.close() # close web browser, or it'll persist after python exits.
display.popen.kill() # close virtual display, or it'll persist after python exits.
Explanation
We first load a URL on the domain we're targeting a file download from. This allows us to perform an AJAX request on that domain without running into cross-origin restrictions.
Next, we inject some JavaScript into the DOM which fires off an AJAX request. Once the AJAX request returns a response, we take the response and load it into a FileReader object. From there we can extract the base64-encoded content of the file by calling readAsDataURL(). We then take the base64-encoded content and attach it to window, a globally accessible variable.
Finally, because the AJAX request is asynchronous, we enter a Python while loop waiting for the content to be attached to the window. Once it is there, we decode the base64 content retrieved from the window and save it to a file.
This solution should work across all modern browsers supported by Selenium, and works whether text or binary, and across all mime types.
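To make the decoding step a bit more concrete: readAsDataURL() produces a string of the form `data:<mime type>;base64,<payload>`, and the script above keeps only the part after the comma. Here is a small standalone sketch of that decoding on the Python side. `decode_data_url` is a hypothetical helper, not part of Selenium, and it assumes the whole data URL is passed in rather than the pre-split payload.

```python
import base64


def decode_data_url(data_url: str) -> bytes:
    """Return the raw bytes carried by a base64 `data:` URL."""
    # Everything before the first comma is the header, the rest is the payload
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    return base64.b64decode(payload)
```

Because the payload is base64, the same code path handles text and binary files alike, which is why the approach is independent of MIME type.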
Alternate Approach
While I haven't tested this, Selenium does afford you the ability to wait until an element is present in the DOM. Rather than looping until a globally accessible variable is populated, you could create an element with a particular ID in the DOM and use the binding of that element as the trigger to retrieve the downloaded file.

In Chrome, I download the files by clicking on the links, then open the chrome://downloads page and retrieve the list of downloaded files from the shadow DOM like this:
docs = document
    .querySelector('downloads-manager')
    .shadowRoot.querySelector('#downloads-list')
    .getElementsByTagName('downloads-item')
This solution is restricted to Chrome. The data also contains information such as the file path and download date. (Note: this code is JavaScript, not Python syntax.)

Here is the full working code. You can use Selenium to fill in the username, password and other fields. To find the field names that appear on the webpage, use Inspect Element. Elements (username, password or a button to click) can be located by class or by name.
from selenium import webdriver
from selenium.webdriver.common.by import By

# Using Chrome to access web
options = webdriver.ChromeOptions()
# The download path is a Chrome preference, not a command-line argument
options.add_experimental_option("prefs", {"download.default_directory": "C:/Test"})
driver = webdriver.Chrome(options=options)
# Open the website
try:
    driver.get('xxxx')  # Your Website Address
    password_box = driver.find_element(By.NAME, 'password')
    password_box.send_keys('xxxx')  # Password
    download_button = driver.find_element(By.CLASS_NAME, 'link_w_pass')
    download_button.click()
except Exception:
    print("Faulty URL")
finally:
    driver.quit()
