Python3 to Download csv files from URL in txt file

The code below parses JSON from the URL to retrieve 10 URLs and puts them in an output.txt file.
import json
import urllib.request
response = urllib.request.urlopen('https://json-test.com/test').read()
jsonResponse = json.loads(response.decode('utf-8'))
for child in jsonResponse['results']:
    print(child['content'], file=open("C:\\Users\\test\\Desktop\\test\\output.txt", "a"))
Now that there are 10 links to CSV files in output.txt, I am trying to figure out how to download and save the 10 files. I tried something like this, but it is not working:
urllib.request.urlretrieve(['content'], "C:\\Users\\test\\Desktop\\test\\test1.csv")
Even if I get the above working, it is just for one file; there are 10 file links in output.txt. Any ideas?

Here is an exhaustive guide on how to download files over HTTP.
If the text file contains one link per line, you can iterate through the lines like this:
import urllib.request

id = 0
with open('path/to/file.ext', 'r') as file:
    for line in file:
        # ... some regex checking that the text is actually a valid URL
        # strip the trailing newline before using the line as a URL
        response = urllib.request.urlretrieve(line.strip(), 'path/to/file' + str(id) + '.ext')
        id += 1
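Applied to the question, a minimal end-to-end sketch (assuming output.txt holds one CSV URL per line, and reusing the Windows paths from the question):
import urllib.request

# path from the question; one CSV URL per line
links_path = "C:\\Users\\test\\Desktop\\test\\output.txt"

with open(links_path, "r") as links:
    for i, line in enumerate(links, start=1):
        url = line.strip()
        if not url:
            continue  # skip blank lines
        # saves as test1.csv, test2.csv, ... alongside output.txt
        urllib.request.urlretrieve(url, "C:\\Users\\test\\Desktop\\test\\test{}.csv".format(i))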

Related

is it possible to write image to csv file?

Hi everyone, this is my first post here. I wanted to know how I can write the image files I scraped from a website to a CSV file, or, if that is not possible with CSV, how I can write the header, description, time info and image to, say, a Word file. Here is the code.
Everything works perfectly; I just want to know how I can write the images I downloaded to disk into a CSV or Word file.
Thanks for your help.
import csv
import requests
from bs4 import BeautifulSoup
site_link = requests.get("websitenamehere").text
soup = BeautifulSoup(site_link,"lxml")
read_file = open("blogger.csv","w",encoding="UTF-8")
csv_writer = csv.writer(read_file)
csv_writer.writerow(["Header","links","Publish Time"])
counter = 0
for article in soup.find_all("article"):
    ### Counting lines
    counter += 1
    print(counter)
    ### Article headers
    headers = article.find("a")["title"]
    print(headers)
    ### Links
    links = article.find("a")["href"]
    print(links)
    ### Publish time
    publish_time = article.find("div", class_="mkdf-post-info-date entry-date published updated")
    publish_time = publish_time.a.text.strip()
    print(publish_time)
    ### Image links
    images = article.find("img", class_="attachment-full size-full wp-post-image nitro-lazy")["nitro-lazy-src"]
    print(images)
    ### Download article pictures to disk
    pic_name = f"{counter}.jpg"
    with open(pic_name, 'wb') as handle:
        response = requests.get(images, stream=True)
        for block in response.iter_content(1024):
            handle.write(block)
    ### CSV rows
    csv_writer.writerow([headers, links, publish_time])
    print()
read_file.close()
You could basically convert the image to base64 and write it to the file as you need it:
import base64

with open("image.png", "rb") as image_file:
    encoded_string = base64.b64encode(image_file.read())
print(encoded_string.decode('utf-8'))
A CSV file is supposed to contain only text fields. Even though the csv module does its best to quote fields, allowing almost any character in them, including the separator or a newline, it is not able to handle the NULL characters that can occur in an image file.
That means that you will have to encode the image bytes if you want to store them in a CSV file. Base64 is a well-known format natively supported by the Python standard library, so you could change your code to:
import base64
...
    ### Download article pictures
    response = requests.get(images, stream=True)
    image = b''.join(block for block in response.iter_content(1024))  # raw image bytes
    image = base64.b64encode(image).decode('ascii')  # base64-encoded (text) string
    ### CSV rows
    csv_writer.writerow([headers, links, publish_time, image])
The image will simply have to be decoded before being used...
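For completeness, a small sketch of reading an image back out of the CSV (assuming the base64 text was stored as a fourth column, as in the snippet above):
import base64
import csv

with open("blogger.csv", encoding="UTF-8") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row_number, row in enumerate(reader, start=1):
        image_bytes = base64.b64decode(row[3])  # fourth column holds the base64 text
        with open(f"restored_{row_number}.jpg", "wb") as out:
            out.write(image_bytes)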

how to make urllib.request append to an existing file?

I'm trying to download a load of text in Python and want it all to save to a single file.
The code I'm currently using creates a separate file for each url. It loops through an archive of urls, requests the data and then saves it to its own file.
filename = archive[i]
urllib.request.urlretrieve(url, path + filename + ".pgn")
I've tried using the same filename for each url but it just overwrites the file.
Is there a way to loop through the archive and, rather than saving the data in its own separate file, add each block of text to a single file? Or do I need to just loop through all the files afterwards and concatenate them together?
Python's urlretrieve docs say that
If you wish to retrieve a resource via URL and store it in a temporary location, you can do so via the urlretrieve() function
so if you wish to append the retrieved data to one file, you have to use urlopen for that,
like this:
import urllib.request

filename = "MY_FILE_PATH"
# ----------- inside your i loop -------------
with urllib.request.urlopen(url) as response:
    data = response.read()

# open in append-binary mode ("ab"), since urlopen returns bytes
with open(filename + ".pgn", "ab") as fp:
    fp.write(data)
Note that urlretrieve might become deprecated at some point in the future. So use urlopen instead.
import urllib.request
import shutil
...
filename = archive[i]
with urllib.request.urlopen(url) as response, open(filename, 'ab') as out_file:
    shutil.copyfileobj(response, out_file)
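Putting this together with the loop from the question, a minimal sketch that appends every download to a single file (the archive entries, base URL, and output name below are placeholders for whatever the question's script actually uses):
import shutil
import urllib.request

archive = ["game1", "game2"]               # placeholder archive entries
base_url = "https://example.com/archive/"  # placeholder base URL
combined_path = "all_games.pgn"            # the single output file

for name in archive:
    url = base_url + name + ".pgn"
    # append each response to the same file instead of creating one file per URL
    with urllib.request.urlopen(url) as response, open(combined_path, "ab") as out_file:
        shutil.copyfileobj(response, out_file)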

Downloading XML files from a web services URL in python

Please correct me if I am wrong as I am a beginner in python.
I have a web services URL which contains an XML file:
http://abc.tch.xyz.edu:000/patientlabtests/id/1345
I have a list of values, and I want to append each value in that list to the URL, download a file for each value, and name each downloaded file after the value appended from the list.
It is possible to download one file at a time, but I have thousands of values in the list; I was trying to write a function with a for loop and I am stuck.
x = [1345, 7890, 4729]
for i in x:
    url = "http://abc.tch.xyz.edu:000/patientlabresults/id/{}".format(i)
    response = requests.get(url)
    # ****** Missing part of the code ********
    with open('.xml', 'wb') as file:
        file.write(response.content)
    file.close()
The files downloaded from URL should be like
"1345patientlabresults.xml"
"7890patientlabresults.xml"
"4729patientlabresults.xml"
I know there is a part of the code which is missing and I am unable to fill in that missing part. I would really appreciate if anyone can help me with this.
Accessing your web service URL does not seem to be working. Check this:
import requests

x = [1345, 7890, 4729]
for i in x:
    url2 = "http://abc.tch.xyz.edu:000/patientlabresults/id/"
    response = requests.get(url2 + str(i))  # i must be converted to a string
Note: when you use 'with' to open a file, you do not have to close the file, since it will be closed automatically.
with open(filename, mode) as file:
    file.write(data)
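Putting the pieces together, a minimal sketch of the loop with the requested file names (assuming the URL from the question actually serves the XML):
import requests

x = [1345, 7890, 4729]
for i in x:
    url = "http://abc.tch.xyz.edu:000/patientlabresults/id/{}".format(i)
    response = requests.get(url)
    # produces "1345patientlabresults.xml", "7890patientlabresults.xml", ...
    with open("{}patientlabresults.xml".format(i), "wb") as file:
        file.write(response.content)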
Since the URL you provided is not working, I am going to use a different URL; I hope you get the idea of how to write to a file using a custom name.
import requests

categories = ['fruit', 'car', 'dog']
for category in categories:
    url = "https://icanhazdadjoke.com/search?term="
    file_name = category + "_JOKES_2018"  # files will be saved as fruit_JOKES_2018, etc.
    r = requests.get(url + category)
    data = r.status_code  # storing the status code in the 'data' variable
    with open(file_name + ".txt", 'w+') as f:
        f.write(str(data))  # writing the status code of each URL to the file
After running this code, a status code will be written to each of the files, and the files will be named as follows:
car_JOKES_2018.txt
dog_JOKES_2018.txt
fruit_JOKES_2018.txt
I hope this gives you an understanding of how to name the files and write to them.
I think you just want to create a path using str.format, as you (almost) are doing for the URL. Maybe something like the following:
import os.path

import requests

x = [1345, 7890, 4729]
for i in x:
    path = '{}patientlabresults.xml'.format(i)
    # ignore this file if we've already got it
    if os.path.exists(path):
        continue
    # try to get the file, raising an exception on failure
    url = 'http://abc.tch.xyz.edu:000/patientlabresults/id/{}'.format(i)
    res = requests.get(url)
    res.raise_for_status()
    # write the successful file out (binary mode, since res.content is bytes)
    with open(path, 'wb') as fd:
        fd.write(res.content)
I've added some error handling and better behaviour on re-running: files that already exist are skipped.

How to translate url encoded string python

This code is supposed to download a list of pdfs into a directory
for pdf in preTag:
    pdfUrl = "https://the-eye.eu/public/Books/Programming/" + pdf.get("href")
    print("Downloading...%s" % pdfUrl)
    # downloading pdf from url
    page = requests.get(pdfUrl)
    page.raise_for_status()
    # saving pdf to new directory
    pdfFile = open(os.path.join(filePath, os.path.basename(pdfUrl)), "wb")
    for chunk in page.iter_content(1000000):
        pdfFile.write(chunk)
    pdfFile.close()
I used os.path.basename() just to make sure the files would actually download. However, I want to know how to change the file name from 3D%20Printing%20Blueprints%20%5BeBook%5D.pdf to something like "3D Printing Blueprints.pdf"
You can use the unquote function (in Python 3 it lives in urllib.parse):
from urllib.parse import unquote

print(unquote("3D%20Printing%20Blueprints%20%5BeBook%5D.pdf"))  # 3D Printing Blueprints [eBook].pdf
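To save with the decoded name directly, a small sketch (filePath here is a stand-in for the question's target directory):
import os.path
from urllib.parse import unquote

filePath = "books"  # stand-in for the question's target directory
pdfUrl = "https://the-eye.eu/public/Books/Programming/3D%20Printing%20Blueprints%20%5BeBook%5D.pdf"

# decode the percent-encoded basename before using it as a file name
file_name = unquote(os.path.basename(pdfUrl))
print(file_name)                          # 3D Printing Blueprints [eBook].pdf
print(os.path.join(filePath, file_name))  # where the download loop would save it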
use this:
os.rename("3D%20Printing%20Blueprints%20%5BeBook%5D.pdf", "3D Printing Blueprints.pdf")

Opening a csv file from an API with python

So I am trying to download a file from an API which will be in CSV format.
I generate a link from user inputs and store it in a variable exportLink:
import requests
#getProjectName
projectName = raw_input('ProjectName')
#getApiToken
apiToken = "mytokenishere"
#getStartDate
startDate = raw_input('Start Date')
#getStopDate
stopDate = raw_input('Stop Date')
url = "https://api.awrcloud.com/get.php?action=export_ranking&project=%s&token=%s&startDate=%s&stopDate=%s" % (projectName,apiToken,startDate,stopDate)
exportLink = requests.get(url).content
exportLink will store the generated link, which I must then call with another requests.get() to download the CSV file.
When I click the link it opens the download in a browser; is there any way to automate this so it opens the zip and I can begin to edit the CSV using Python, i.e. removing some stuff?
If you have a bytes object zipdata that you got with requests.get(url).content, you can extract it file by file into another bytes object:
import zipfile
import io
import csv
with zipfile.ZipFile(io.BytesIO(zipdata)) as z:
    for f in z.filelist:
        csvdata = z.read(f)
and then do something with csvdata
reader = csv.reader(io.StringIO(csvdata.decode()))
...
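Continuing from the question's snippet, a minimal end-to-end sketch (it assumes exportLink, fetched as bytes in the question's code, holds the export URL, and that the URL returns a zip of CSV files):
import csv
import io
import zipfile
import requests

export_url = exportLink.decode().strip()    # exportLink comes back as bytes in the question's code
zipdata = requests.get(export_url).content  # the zip archive returned by the export URL

with zipfile.ZipFile(io.BytesIO(zipdata)) as z:
    for f in z.filelist:
        csvdata = z.read(f)
        reader = csv.reader(io.StringIO(csvdata.decode()))
        for row in reader:
            print(row)  # edit or filter rows here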
