I have a CSV file with the URLs for a list of invoices in Coupa. I need to use Python to go to Coupa and download each image-scan PDF file to a specific folder. I have the following code. It runs, but when I open the PDF file it is corrupted. Any help would be appreciated.
import requests

file_url = "https://mercuryinsurance.coupahost.com/invoice/15836/image_scan"
r = requests.get(file_url, stream=False)
with open("python.pdf", "wb") as pdf:
    for chunk in r.iter_content(chunk_size=1024):
        # writing one chunk at a time to the pdf file
        if chunk:
            pdf.write(chunk)
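One likely cause (not confirmed in the post) is that the server is returning an HTML login or error page rather than the PDF, which then gets saved with a .pdf extension. A minimal sanity-check sketch, assuming an authenticated requests.Session is available; the helper names are illustrative, not part of any Coupa API:

```python
def looks_like_pdf(payload: bytes) -> bool:
    """Real PDF files start with the magic bytes %PDF."""
    return payload[:4] == b"%PDF"

def fetch_pdf(url, dest, session):
    """Download `url` via an authenticated requests.Session and save it to
    `dest`, raising if the body is not actually a PDF."""
    r = session.get(url)
    r.raise_for_status()  # surface 401/403/500 instead of writing them to disk
    if not looks_like_pdf(r.content):
        # Most likely an HTML login or error page, not the scanned invoice
        raise ValueError(
            f"Not a PDF (Content-Type: {r.headers.get('Content-Type')})")
    with open(dest, "wb") as f:
        f.write(r.content)
```

If the check raises, the fix is usually authentication (a session cookie or API key), not the write loop.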
I'm trying to download some pictures from a website. When I download them with the browser they are much smaller than the ones downloaded with my code. They have the same resolution as the ones downloaded with my code, but the difference in file size is very large!
def download(url, pathname):
    """
    Downloads a file given a URL and puts it in the folder `pathname`
    """
    # if path doesn't exist, make that path dir
    if not os.path.isdir(pathname):
        os.makedirs(pathname)
    # download the body of response by chunk, not immediately
    response = requests.get(url, stream=True)
    # get the total file size
    file_size = int(response.headers.get("Content-Length", 0))
    # get the file name
    filename = os.path.join(pathname, url.split("/")[-1])
    # progress bar, changing the unit to bytes instead of iterations (the tqdm default)
    progress = tqdm(response.iter_content(1024), f"Downloading (unknown)", total=file_size, unit="B", unit_scale=True,
                    unit_divisor=1024, disable=True)
    with open(filename, "wb") as f:
        for data in progress.iterable:
            # write data read to the file
            f.write(requests.get(url).content)
            # update the progress bar manually
            progress.update(len(data))
Example: https://wallpaper-mania.com/wp-content/uploads/2018/09/High_resolution_wallpaper_background_ID_77700030695.jpg
Browser: About 222 KB
Code: about 48.4 MB
How does this difference come about? How can I improve the code so the downloaded images are smaller?
f.write(requests.get(url).content)
It looks like you're re-downloading the entire file for every 1024-byte chunk, so you end up with roughly 222 copies of the image concatenated together. Make that:
f.write(data)
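To see the corrected loop in context, the write side can be isolated into a small helper that takes any iterable of byte chunks (the helper name is made up for illustration; in the original code the iterable would be response.iter_content(1024)):

```python
import os

def save_chunks(chunks, filepath):
    """Write an iterable of byte chunks to `filepath`, one chunk at a time,
    and return the resulting file size in bytes."""
    with open(filepath, "wb") as f:
        for data in chunks:
            f.write(data)  # write the chunk itself, not a fresh full download
    return os.path.getsize(filepath)
```

Called as save_chunks(response.iter_content(1024), "image.jpg"), this downloads each byte exactly once.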
Hi everyone, this is my first post here. I want to know how I can write image files that I scraped from a website to a CSV file, or, if that's not possible with CSV, how I can write the header, description, time info and image to, say, a Word file. Here is the code.
Everything works perfectly; I just want to know how I can write the images that I downloaded to disk into a CSV or Word file.
Thanks for your help.
import csv
import requests
from bs4 import BeautifulSoup

site_link = requests.get("websitenamehere").text
soup = BeautifulSoup(site_link, "lxml")

read_file = open("blogger.csv", "w", encoding="UTF-8")
csv_writer = csv.writer(read_file)
csv_writer.writerow(["Header", "links", "Publish Time"])

counter = 0
for article in soup.find_all("article"):
    ### Counting lines
    counter += 1
    print(counter)
    # Article headers
    headers = article.find("a")["title"]
    print(headers)
    #### Links
    links = article.find("a")["href"]
    print(links)
    #### Publish time
    publish_time = article.find("div", class_="mkdf-post-info-date entry-date published updated")
    publish_time = publish_time.a.text.strip()
    print(publish_time)
    ### Image links
    images = article.find("img", class_="attachment-full size-full wp-post-image nitro-lazy")["nitro-lazy-src"]
    print(images)
    ### Download article pictures to disk
    pic_name = f"{counter}.jpg"
    with open(pic_name, 'wb') as handle:
        response = requests.get(images, stream=True)
        for block in response.iter_content(1024):
            handle.write(block)
    ### CSV rows
    csv_writer.writerow([headers, links, publish_time])
    print()

read_file.close()
You could basically convert the image to base64 and write that to a file as you need it:

import base64

with open("image.png", "rb") as image_file:
    encoded_string = base64.b64encode(image_file.read())
print(encoded_string.decode('utf-8'))
A CSV file is supposed to contain only text fields. Even though the csv module does its best to quote fields so that almost any character is allowed in them, including the separator or a newline, it cannot handle the NUL characters that are likely to appear in an image file.
That means that you will have to encode the image bytes if you want to store them in a CSV file. Base64 is a well-known format natively supported by the Python standard library. So you could change your code to:
import base64
...
### Download article pictures
response = requests.get(images, stream=True)
image = b''.join(block for block in response.iter_content(1024))  # raw image bytes
image = base64.b64encode(image)  # base64-encoded (text) string

### CSV rows
csv_writer.writerow([headers, links, publish_time, image])
Simply the image will have to be decoded before being used...
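A minimal round-trip sketch of that decode step, using made-up stand-in bytes rather than a real image:

```python
import base64
import csv
import io

raw = b"\x89PNG\r\n\x1a\nfake image bytes"      # stand-in for real image data
encoded = base64.b64encode(raw).decode("ascii")  # text-safe form for the CSV field

# write one row containing the encoded image column
buf = io.StringIO()
csv.writer(buf).writerow(["header", "link", "time", encoded])

# read the row back and decode the image column
row = next(csv.reader(io.StringIO(buf.getvalue())))
decoded = base64.b64decode(row[3])
assert decoded == raw  # the original bytes are recovered intact
```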
I was trying to do that task with Matlab using:

url = 'the url of the file';
file_name = 'data.mat';
outfilename = websave(file_name, url);
load(outfilename);

but it didn't work. How can I do that using Python (Python 3)? Kindly note that I want the .mat file as it is, not HTML, CSV or any other format: I just want that file downloaded. (I can do it manually, but I have hundreds, which is why I need this.)
Using urllib.request (in Python 3, urllib2 was merged into urllib.request); note the file must be opened in binary mode ('wb'), since writing the .mat bytes in text mode would corrupt them:

import urllib.request

response = urllib.request.urlopen("the url")
with open("filename.mat", 'wb') as f:
    f.write(response.read())
So I am trying to download a file from an API, which will be in CSV format.
I generate a link from user inputs and store it in a variable, exportLink:
import requests
#getProjectName
projectName = raw_input('ProjectName')
#getApiToken
apiToken = "mytokenishere"
#getStartDate
startDate = raw_input('Start Date')
#getStopDate
stopDate = raw_input('Stop Date')
url = "https://api.awrcloud.com/get.php?action=export_ranking&project=%s&token=%s&startDate=%s&stopDate=%s" % (projectName,apiToken,startDate,stopDate)
exportLink = requests.get(url).content
exportLink will store the generated link, which I must then call with another requests.get() on exportLink to download the CSV file.
When I click the link it opens the download in a browser. Is there any way to automate this so it opens the zip and I can begin to edit the CSV using Python, i.e. remove some stuff?
If you have a bytes object zipdata that you got with requests.get(url).content, you can extract the archive file by file into another bytes object:
import zipfile
import io
import csv
with zipfile.ZipFile(io.BytesIO(zipdata)) as z:
    for f in z.filelist:
        csvdata = z.read(f)
and then do something with csvdata
reader = csv.reader(io.StringIO(csvdata.decode()))
...
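The same pattern can be exercised end to end without a network call by first building a zip in memory (the archive name and CSV contents below are invented for the example):

```python
import csv
import io
import zipfile

# build a zip containing one CSV file, standing in for requests.get(url).content
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("report.csv", "keyword,rank\npython,1\n")
zipdata = buf.getvalue()

# extract file by file, as above
with zipfile.ZipFile(io.BytesIO(zipdata)) as z:
    for f in z.filelist:
        csvdata = z.read(f)

rows = list(csv.reader(io.StringIO(csvdata.decode())))
# rows == [['keyword', 'rank'], ['python', '1']]
```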
I am new to Python. Here is my environment setup:
I have Anaconda 3 (Python 3). I would like to be able to download a CSV file from this website:
https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD
I would like to use the requests library. I would appreciate any help in figuring out how I can use the requests library to download the CSV file to a local directory on my machine.
It is recommended to download the data as a stream and flush it into the target (or an intermediate) local file.
import requests

def download_file(url, output_file, compressed=True):
    """
    compressed: enable response compression support
    """
    # NOTE the stream=True parameter. It enables more optimized, buffered data loading.
    headers = {}
    if compressed:
        headers["Accept-Encoding"] = "gzip"
    r = requests.get(url, headers=headers, stream=True)
    with open(output_file, 'wb') as f:  # open for block writes
        for chunk in r.iter_content(chunk_size=4096):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
        f.flush()  # after all chunks, force the data flush into the output file (optional)
    return output_file
Considering the original post:
remote_csv = "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD"
local_output_file = "test.csv"
download_file(remote_csv, local_output_file)
#Check file content, just for test purposes:
print(open(local_output_file).read())
Base code was extracted from this post: https://stackoverflow.com/a/16696317/176765
Here you can find more detailed information about body-stream usage with the requests lib:
http://docs.python-requests.org/en/latest/user/advanced/#body-content-workflow