Download specific files from .txt url using Python - python

How do I download specific files from a .txt URL?
I have a URL, https://.storage.public.eu/opendata/files/open_data_files_access.txt (not real), that lists multiple files which can be downloaded separately (around 5k files in reality). I need to download only specific files, and to do this with Python.
For instance, I have a list of names, each combining a folder name and a file name. How do I download only the files that are on the list? Let's say the list is:
files = ['folder1_file_1.jpg', 'folder1_file_2.jpg', 'folder1_file_3.jpg', 'folder1_file_4.jpg', 'folder1_file_10.jpg', 'folder2_file_2.jpg', 'folder2_file_3.jpg', 'folder3_file_1.jpg', 'folder3_file_3.jpg', 'folder3_file_4.jpg']
How do I download only the files in this list and save them in a specified directory?
I assume the answer is something like the code below, but it does not work for me:
from requests import get  # to make GET request
uurl = 'https://.storage.public.eu/opendata/files/open_data_files_access.txt'
def download(url, file_name):
    # open in binary mode
    with open(file_name, "wb") as file:
        # get request
        response = get(url)
        # write to file
        file.write(response.content)
file_name = ['folder1_file_1.jpg', 'folder1_file_2.jpg', 'folder1_file_3.jpg', 'folder1_file_4.jpg', 'folder1_file_10.jpg', 'folder2_file_2.jpg', 'folder2_file_3.jpg', 'folder3_file_1.jpg', 'folder3_file_3.jpg', 'folder3_file_4.jpg']
download(uurl, file_name)
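One possible approach, sketched under assumptions the question does not confirm: the .txt file holds one download URL per line, and each wanted name equals the last path segment of its URL. The helper names `file_name_of`, `select_urls` and `download_selected` are hypothetical, and the standard-library `urllib` is used here in place of `requests` only to keep the sketch free of dependencies.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlsplit
from urllib.request import urlopen


def file_name_of(url):
    """Return the last path segment of a URL, e.g. 'folder1_file_1.jpg'."""
    return PurePosixPath(urlsplit(url).path).name


def select_urls(listing_text, wanted_names):
    """Keep only the listed URLs whose file name is in wanted_names."""
    wanted = set(wanted_names)
    selected = []
    for line in listing_text.splitlines():
        url = line.strip()
        if url and file_name_of(url) in wanted:
            selected.append(url)
    return selected


def download_selected(listing_url, wanted_names, target_dir):
    """Fetch the listing, then save every wanted file into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with urlopen(listing_url) as response:
        listing_text = response.read().decode("utf-8")
    for url in select_urls(listing_text, wanted_names):
        with urlopen(url) as response:
            (target / file_name_of(url)).write_bytes(response.read())
```

Calling `download_selected(uurl, files, "downloads")` would then fetch only the listed files. If the listing uses a different layout (for example a folder and a file name in separate path segments), only `file_name_of` should need adjusting.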

Related

How do I close multiple files after uploading them via a single POST request?

files = [('file', open(os.path.join(MONITOR_DIRECTORY, f), 'rb')) for f in new_files] # wrap all new files in list for POST request
response = requests.post(SERVER_IP, files = files)
After I wrap my files and upload them to a Flask server via a POST request, I need to be able to delete the files locally. However, when I try to remove the files via os.remove(), I get a permission error (WinError 32).
I know that there is a with open() statement which I can use for opening and closing individual files, but in this case, because I want to send multiple files in a single request, how can I remove them all at once after the request is sent?
Not sure why you would need to open all the files up front, but if your POST method doesn't require a batch of files, you can send them individually with a for loop and a file context manager:
for f in new_files:
    with open(os.path.join(MONITOR_DIRECTORY, f), 'rb') as fh:
        # I don't know why you're using this tuple but I just added it again for you
        response = requests.post(SERVER_IP, files=[('file', fh)])
You have references to the file objects in files. Just close them.
# your code:
files = [('file', open(os.path.join(MONITOR_DIRECTORY, f), 'rb')) for f in new_files] # wrap all new files in list for POST request
response = requests.post(SERVER_IP, files = files)
# then afterwards:
for (_, f) in files:
    f.close()
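Another way to get the same effect, as a minimal sketch: `contextlib.ExitStack` from the standard library can hold any number of open files and close them all when the block ends, even if the request raises. The function name `upload_then_delete` and the `send` parameter are hypothetical; `send` stands in for the `requests.post` call from the question.

```python
import os
from contextlib import ExitStack


def upload_then_delete(paths, send):
    """Open every path, pass the open handles to send(), then delete the files."""
    with ExitStack() as stack:
        # Each handle is registered with the stack, which closes it on exit.
        files = [("file", stack.enter_context(open(p, "rb"))) for p in paths]
        result = send(files)
    # All handles are closed at this point, so the files can be removed.
    for p in paths:
        os.remove(p)
    return result
```

With the names from the question, the call might look like `upload_then_delete([os.path.join(MONITOR_DIRECTORY, f) for f in new_files], lambda files: requests.post(SERVER_IP, files=files))`.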

Extracting files from zip file from GET request

I currently have a GET request to a URL that returns three things: .zip file, .zipsig file, and a .txt file.
I'm only interested in the .zip file, which contains dozens of .json files. I would like to extract all of these .json files, preferably directly into a single pandas data frame, but extracting them into a folder also works.
Code so far, mostly stolen:
license = requests.get(url, headers={'Authorization': "Api-Token " + 'blah'})
z = zipfile.ZipFile(io.BytesIO(license.content))
billingRecord = z.namelist()[0]
z.extract(billingRecord, path = "C:\\Users\\Me\\Downloads\\Json license")
This extracts the entire .zip file to the path. I would like to extract the individual .json files from said .zip file to the path.
import io
import zipfile
import pandas as pd
import json
dfs = []
with zipfile.ZipFile(io.BytesIO(license.content)) as zfile:
    for info in zfile.infolist():
        if info.filename.endswith('.zip'):
            # read the inner .zip into memory and open it as its own archive
            zfiledata = io.BytesIO(zfile.read(info.filename))
            with zipfile.ZipFile(zfiledata) as json_zips:
                for json_info in json_zips.infolist():
                    if json_info.filename.endswith('.json'):
                        json_data = pd.json_normalize(json.loads(json_zips.read(json_info.filename)))
                        dfs.append(json_data)
df = pd.concat(dfs, sort=False)
print(df)
I would do something like this. This uses my own test.zip file, but the steps are:
1. List the files in the archive using the .infolist() method on your z archive
2. Check whether the filename ends with the .json extension using .endswith('.json')
3. Extract that member with .extract(info, path), where path is the target directory
You've called your archive z and mine is archive, but that should get you started.
Example code:
import zipfile
with zipfile.ZipFile("test.zip", mode="r") as archive:
    for info in archive.infolist():
        print(info.filename)
        if info.filename.endswith('.png'):  # use '.json' for your archive
            print('Match: ', info.filename)
            # the second argument is the target directory ("extracted" is a placeholder)
            archive.extract(info, path="extracted")
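The list, filter and read steps can also be done without touching the disk. The following is a sketch only, using the standard library; `read_json_members` is a hypothetical helper name, and it assumes the .json files sit directly inside the archive whose bytes are passed in.

```python
import io
import json
import zipfile


def read_json_members(zip_bytes):
    """Return {member name: parsed JSON} for every .json file in the archive."""
    records = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.filename.endswith(".json"):
                # archive.read() returns bytes, which json.loads accepts directly
                records[info.filename] = json.loads(archive.read(info))
    return records
```

Each parsed value could then be passed to `pd.json_normalize` and the results concatenated, as in the first answer.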

How to download more than one file in Streamlit

I need to make a download button for more than one file. Streamlit's download button doesn't let you download more than one file. I tried to make a few buttons, but the rest just disappear when I click the first one. Is there any way to download two or more files in Streamlit?
I tried this solution from Github, this is what the code looks like:
if st.button("Rozpocznij proces"):
    raport2 = Raport.raport_naj_10(gender,year,week,engine)
    raportM = raport2[0]
    raportO = raport2[1]
    st.dataframe(raportM)
    st.dataframe(raportO)
    zipObj = ZipFile("sample.zip", "w")
    # Add multiple files to the zip
    zipObj.write("raportM")
    zipObj.write("raportO")
    # close the Zip File
    zipObj.close()
    ZipfileDotZip = "sample.zip"
    with open(ZipfileDotZip, "rb") as f:
        bytes = f.read()
    b64 = base64.b64encode(bytes).decode()
    href = f"<a href=\"data:file/zip;base64,{b64}\" download='{ZipfileDotZip}.zip'>\
        Click last model weights\
    </a>"
    st.sidebar.markdown(href, unsafe_allow_html=True)
But I get this error:
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'raportM'
It says that it can't find the file named "raportM".
You are getting this error because the code assumes that the files already exist on disk and that you want to generate a zip file from them. zipObj.write("raportM") looks for a file named "raportM", and there isn't one, because in your case these files were never saved. You are passing variable names as if they were file names, and that is not going to work.
What you will have to do is save those DataFrames as CSV files on your local machine before doing the above operations.
So let's modify your code. Before that, we need to initialize a session state for the button st.button("Rozpocznij proces"), because st.button returns True only for the single rerun triggered by the click, so the block below it would otherwise disappear on the next interaction.
import base64
from zipfile import ZipFile
import streamlit as st
processbtn = st.button("Rozpocznij proces")
# Initialize the session state
if "processbtn_state" not in st.session_state:
    st.session_state.processbtn_state = False
if processbtn or st.session_state.processbtn_state:
    st.session_state.processbtn_state = True
    raport2 = Raport.raport_naj_10(gender,year,week,engine)
    raportM = raport2[0]
    raportO = raport2[1]
    st.dataframe(raportM)
    st.dataframe(raportO)
    # Save files
    raportM.to_csv('raportM.csv')  # You can specify a directory where you want
    raportO.to_csv('raportO.csv')  # these files to be stored
    # Create a zip file and add both CSV files; the with block closes it
    with ZipFile("sample.zip", "w") as zipObj:
        zipObj.write("raportM.csv")
        zipObj.write("raportO.csv")
    ZipfileDotZip = "sample.zip"
    with open(ZipfileDotZip, "rb") as f:
        data = f.read()
    b64 = base64.b64encode(data).decode()
    # download= already ends in .zip, so no extra extension is appended
    href = f"<a href=\"data:file/zip;base64,{b64}\" download='{ZipfileDotZip}'>\
        Click last model weights\
    </a>"
    st.sidebar.markdown(href, unsafe_allow_html=True)
At this point you will find 'raportM.csv' and 'raportO.csv' in your working directory. If you don't want to keep them, you can delete them once the download has been made.
Note: If you encounter a FileNotFoundError, it does not mean the approach is broken; check where the files are being saved, since relative paths resolve against the current working directory.
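If the intermediate CSV files are not wanted at all, the zip can also be assembled in memory. This is a sketch rather than the answer's method: `build_zip_bytes` is a hypothetical helper that takes a mapping of archive member names to text content and returns the bytes of a zip file.

```python
import io
import zipfile


def build_zip_bytes(files):
    """Pack {member name: text content} into a zip archive held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            # writestr stores the string as a member without touching the disk
            archive.writestr(name, content)
    return buffer.getvalue()
```

In the Streamlit script this could be fed with `{"raportM.csv": raportM.to_csv(), "raportO.csv": raportO.to_csv()}`, since `DataFrame.to_csv()` returns a string when no path is given, and the resulting bytes passed to `st.download_button("Download", data=zip_bytes, file_name="sample.zip", mime="application/zip")`, which offers both files through a single button.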

Change file extension of download with Python

If I'm downloading a file with a certain extension from a certain link, but want to download the file with another extension (e.g. .doc instead of .bin), how would I go about doing this in python code?
It can be done in the following way:
1. Download the file in its default / original format.
2. Use pypandoc in a Python script to create a new file in the desired format from the original file.
3. Delete the original file.
These 3 steps can all be automated from a Python script.
https://pypi.org/project/pypandoc/
Example: convert a Markdown file to an rst file (remember to correct the URL):
import os
import requests
import pypandoc
# Download file
# TODO: Update URL
url = 'some_url/somefile.md'
r = requests.get(url)
orig_file = '/Users/user11508332/Downloads/somefile.md'
with open(orig_file, 'wb') as f:
    f.write(r.content)
# pypandoc format conversion; outputfile makes pypandoc write the result to disk
new_file = '/Users/user11508332/Downloads/somefile.rst'
pypandoc.convert_file(orig_file, 'rst', outputfile=new_file)
# Clean-up: delete the original file only once the new file has been created
if os.path.exists(new_file):
    os.remove(orig_file)
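If the content does not need converting and only the saved name should carry a different extension, the local file name is free to choose at the moment the response is written. A small sketch of deriving that name follows; `renamed_target` is a hypothetical helper, and the example URL is made up.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def renamed_target(url, new_suffix):
    """Return the URL's file name with its extension replaced by new_suffix."""
    name = PurePosixPath(urlsplit(url).path).name
    # with_suffix expects the leading dot, e.g. ".doc"
    return str(PurePosixPath(name).with_suffix(new_suffix))


print(renamed_target("https://example.org/files/report.bin", ".doc"))  # report.doc
```

The returned name can be passed to `open(..., 'wb')` when writing `r.content`. This changes only the file name; the bytes stay in whatever format the server sent.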

How to extract a specific file from an archive downloaded from the internet using only memory

I'm looking for a way to extract a specific file (whose name I know) from an archive containing multiple files, without writing any file to the hard drive.
I tried to use both StringIO and zipfile, but I only get the entire archive, or the same error from ZipFile (open requires an argument other than a StringIO object).
Needed behaviour:
archive.zip #containing ex_file1.ext, ex_file2.ext, target.ext
extracted_file #the targeted unzipped file
archive.zip = getFileFromUrl("file_url")
extracted_file = extractFromArchive(archive.zip, target.ext)
What I've tried so far:
import zipfile, requests
data = requests.get("file_url")
zfile = StringIO.StringIO(zipfile.ZipFile(data.content))
needed_file = zfile.open("Needed file name", "r").read()
There is a builtin library, zipfile, made for working with zip archives.
https://docs.python.org/2/library/zipfile.html
You can list the files in an archive:
ZipFile.namelist()
and extract a subset:
ZipFile.extract(member[, path[, pwd]])
EDIT:
The question "Python in-memory zip library" covers in-memory zip handling. TL;DR: ZipFile does work with in-memory file-like objects.
After a few hours of testing I finally found why it did not work:
I was buffering the ZipFile object instead of buffering the downloaded file itself and then opening that buffer as a ZipFile object, which raised a type error.
Here is the way to do it:
import io, zipfile, requests
data = requests.get(url)  # Getting the archive from the url
# Opening it in an emulated file (io.BytesIO on Python 3, StringIO.StringIO on Python 2)
zfile = zipfile.ZipFile(io.BytesIO(data.content))
filenames = zfile.namelist()  # Listing all files
for name in filenames:
    if name == "Needed file name":  # Verify the file is present
        needed_file = zfile.open(name, "r").read()  # Getting the needed file content
        break
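The same idea can be wrapped in a small function that works on bytes alone, which makes it easy to try without a network connection. This is a sketch; `extract_member` is a hypothetical name, and returning None for a missing member is a design choice made here, not something the answers above specify.

```python
import io
import zipfile


def extract_member(zip_bytes, member_name):
    """Return the content of member_name from an in-memory zip, or None if absent."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        if member_name not in archive.namelist():
            return None
        return archive.read(member_name)
```

With the download from the answer above, the call would be `extract_member(data.content, "Needed file name")`.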
