I am trying to build a simple Streamlit app that uploads a CSV file, loads it into a dataframe, displays the dataframe, and then uploads it to a pre-defined FTP server.
The first part works: the file is successfully uploaded and visualized. However, I cannot upload it to the FTP server. This is my code:
import ftplib
import pandas as pd
import streamlit as st
ftp_server = "ftp.test.com"
ftp_username = "user"
ftp_password = "password"
input_file = st.file_uploader("Upload a CSV File",type=['csv'])
if (input_file is not None) and input_file.name.endswith(".csv"):
    df = pd.read_csv(input_file, delimiter="\t", encoding='ISO-8859-1')
    st.dataframe(df)
    session = ftplib.FTP(ftp_server, ftp_username, ftp_password)
    file = open(input_file, "rb")
    session.storbinary(input_file.name, input_file)
    input_file.close()
    session.quit()
    st.success(f"The {input_file.name} was successfully uploaded to the FTP server: {ftp_server}!")
I am getting this error:
TypeError: expected str, bytes or os.PathLike object, not UploadedFile.
I am using Streamlit v.1.1.0.
Please note that I have simplified my code and replaced the FTP credentials. In the real world, I would probably use try/except for the session connection, etc.
I guess you get the error here:
file = open(input_file, "rb")
That line is both wrong and useless (you never use the file). Remove it.
You might also need to seek input_file back to the beginning after read_csv has consumed it:
input_file.seek(0)
You are also missing the upload command (STOR) in the storbinary call:
session.storbinary("STOR " + input_file.name, input_file)
I have the following Python function to write the given content to a bucket in Cloud Storage:
import gzip
from google.cloud import storage
def upload_to_cloud_storage(json):
"""Write to Cloud Storage."""
# The contents to upload as a JSON string.
contents = json
storage_client = storage.Client()
# Path and name of the file to upload (file doesn't yet exist).
destination = "path/to/name.json.gz"
# Gzip the contents before uploading
with gzip.open(destination, "wb") as f:
f.write(contents.encode("utf-8"))
# Bucket
my_bucket = storage_client.bucket('my_bucket')
# Blob (content)
blob = my_bucket.blob(destination)
blob.content_encoding = 'gzip'
# Write to storage
blob.upload_from_string(contents, content_type='application/json')
However, I receive an error when running the function:
FileNotFoundError: [Errno 2] No such file or directory: 'path/to/name.json.gz'
Highlighting this line as the cause:
with gzip.open(destination, "wb") as f:
I can confirm that the bucket and path both exist, although the file itself is new and yet to be written.
I can also confirm that removing the gzipping part results in the file being successfully written to Cloud Storage.
How can I gzip a new file and upload to Cloud Storage?
Other answers I've used for reference:
https://stackoverflow.com/a/54769937
https://stackoverflow.com/a/67995040
Although @David's answer wasn't complete at the time I solved my problem, it got me on the right track. Here's what I ended up using, along with explanations I found out along the way.
import gzip
from google.cloud import storage
from google.cloud.storage import fileio
def upload_to_cloud_storage(json_string):
"""Gzip and write to Cloud Storage."""
storage_client = storage.Client()
bucket = storage_client.bucket('my_bucket')
# Filename (include path)
blob = bucket.blob('path/to/file.json')
# Set blog meta data for decompressive transcoding
blob.content_encoding = 'gzip'
blob.content_type = 'application/json'
writer = fileio.BlobWriter(blob)
# Must write as bytes
gz = gzip.GzipFile(fileobj=writer, mode="wb")
# When writing as bytes we must encode our JSON string.
gz.write(json_string.encode('utf-8'))
# Close connections
gz.close()
writer.close()
We use the GzipFile() class instead of the convenience method (gzip.compress) so that we can pass in the mode. When trying to write using w or wt you will receive the error:
TypeError: memoryview: a bytes-like object is required, not 'str'
So we must write in binary mode (wb), which also gives us the .write() method. When doing so, however, we need to encode our JSON string, which can be done with str.encode() using utf-8. Failing to do this will also result in the same error.
Finally, I wanted to enable decompressive transcoding, where the requester (a browser in my case) receives the uncompressed version of the file on request. To enable this, google.cloud.storage.blob allows you to set metadata, including content_type and content_encoding, so we can follow best practices.
This sees the JSON object in memory written to your chosen destination in Cloud Storage in a compressed format and decompressed on the fly (without needing to download a gzip archive).
Thanks also to @JohnHanley for the troubleshooting advice.
The best solution is not to write the gzip to a file at all, and directly compress and stream to GCS.
import gzip
from google.cloud import storage
from google.cloud.storage import fileio
storage_client = storage.Client()
bucket = storage_client.bucket('my_bucket')
blob = bucket.blob('my_object')
writer = fileio.BlobWriter(blob)
gz = gzip.GzipFile(fileobj=writer, mode="w") # use "wb" if bytes
gz.write(contents)
gz.close()
writer.close()
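As noted in the answer above, GzipFile.write() expects bytes, so in practice contents should be a bytes object written with mode="wb"; for example, a hypothetical payload might be built like this:
import json

# Hypothetical example payload: encode the JSON string before passing it to gz.write().
contents = json.dumps({"status": "ok"}).encode("utf-8")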
I am trying to create an API function that takes an uploaded .csv file and opens it as a pandas DataFrame, like this:
from fastapi import FastAPI
from fastapi import UploadFile, Query, Form
import pandas as pd
app = FastAPI()
#app.post("/check")
def foo(file: UploadFile):
df = pd.read_csv(file.file)
return len(df)
Then, I am invoking my API:
import requests
url = 'http://127.0.0.1:8000/check'
file = {'file': open('data/ny_pollution_events.csv', 'rb')}
resp = requests.post(url=url, files=file)
print(resp.json())
But I got such error: FileNotFoundError: [Errno 2] No such file or directory: 'ny_pollution_events.csv'
As far as I understand from the docs, pandas is able to read a .csv file from a file-like object, which file.file is supposed to be. But it seems that here read_csv() receives the name (not the file object itself) and tries to find it locally.
Am I doing something wrong?
Can I somehow implement this logic?
To read the file with pandas, the file must first be stored on disk. Don't forget to import shutil. If you don't need to keep the file on disk, delete it afterwards using os.remove(filepath).
if not file.filename.lower().endswith((".csv", ".xlsx", ".xls")):
    return 404, "Please upload xlsx, csv or xls file."
if file.filename.lower().endswith(".csv"):
    extension = ".csv"
elif file.filename.lower().endswith(".xlsx"):
    extension = ".xlsx"
elif file.filename.lower().endswith(".xls"):
    extension = ".xls"
# eventid = datetime.datetime.now().strftime('%Y%m-%d%H-%M%S-') + str(uuid4())
filepath = "location where you want to store file" + extension
with open(filepath, "wb") as buffer:
    shutil.copyfileobj(file.file, buffer)
try:
    if filepath.endswith(".csv"):
        df = pd.read_csv(filepath)
    else:
        df = pd.read_excel(filepath)
except:
    return 401, "File is not proper"
I'm creating an API using Flask where a zip file should be downloaded on the client side. The zip file is converted into binary data and sent to the client, and the client regenerates the binary data back into a zip file. The server side is working fine and a zip file is downloaded, but the file is empty inside. How do I fix this?
This is the server side:
@app.route('/downloads/', methods=['GET'])
def download():
    from flask import Response
    import io
    import zipfile
    import time
    FILEPATH = "/home/Ubuntu/api/files.zip"
    fileobj = io.BytesIO()
    with zipfile.ZipFile(fileobj, 'w') as zip_file:
        zip_info = zipfile.ZipInfo(FILEPATH)
        zip_info.date_time = time.localtime(time.time())[:6]
        zip_info.compress_type = zipfile.ZIP_DEFLATED
        with open(FILEPATH, 'rb') as fd:
            zip_file.writestr(zip_info, fd.read())
    fileobj.seek(0)
    return Response(fileobj.getvalue(),
                    mimetype='application/zip',
                    headers={'Content-Disposition': 'attachment;filename=files.zip'})
# client side
bin_data=b"response.content" #Whatever binary data you have store in a variable
binary_file_path = 'files.zip' #Name for new zip file you want to regenerate
with open(binary_file_path, 'wb') as f:
f.write(bin_data)
I have a small but mysterious and so far unsolvable problem using Python to open a password-protected file in an AWS S3 bucket.
The password I have been given is definitely correct and I can download the zip to Windows and extract it to reveal the csv data I need.
However I need to code up a process to load this data into a database regularly.
The password has a pattern like this (includes mixed case letters, numbers and a single "#"):-
ABCD#Efghi12324567890
The code below works with other zip files I place in the location with the same password:-
import boto3
import pyzipper
from io import BytesIO
s3_resource = boto3.resource('s3', aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key)
zip_obj = s3_resource.Object(bucket_name=my_bucket, key=my_folder + my_zip)
buffer = BytesIO(zip_obj.get()["Body"].read())
z = pyzipper.ZipFile(buffer)
my_newfile=z.namelist()[0]
s3_resource.meta.client.upload_fileobj(
    z.open(my_newfile, pwd=b"ABCD#Efghi12324567890"),  # HERE IS THE OPEN COMMAND
    Bucket=my_bucket,
    Key=my_folder + my_newfile)
I am told the password is incorrect:-
RuntimeError: Bad password for file 'ThisIsTheFileName.csv'
I resorted to using pyzipper rather than zipfile, since zipfile didn't support the compression method of the file in question:-
That compression method is not supported
In 7-zip I can see the following for the zip file:-
Method: AES-256 Deflate
Characteristics: WzAES: Encrypt
Host OS: FAT
So to confirm:-
-The password is definitely correct (can open it manually)
-The code seems ok - it opens my zip files with the same password
What is the issue here please and how do I fix it?
You would have my sincere thanks!
Phil
With some help from a colleague and a useful article, I now have this working.
Firstly, given the compression type, I found it necessary to use the AESZipFile() method of pyzipper (although this method also seemed to work on other compression types).
Secondly, the AESZipFile() method apparently accepts a BytesIO object as well as a file path, presumably because this is what it sees when it opens the file.
Therefore the zip file can be extracted in situ without having to download it first.
This method creates the pyzipper object which you can then read by specifying the file name and the password.
The final code looks like this:-
import pyzipper
import boto3
from io import BytesIO
my_bucket = ''
my_folder = ''
my_zip = ''
my_password = b''
aws_access_key_id=''
aws_secret_access_key=''
s3 = boto3.client('s3', aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key)
s3_file = s3.get_object(Bucket=my_bucket, Key=my_folder + my_zip)
s3_iodata = BytesIO(s3_file['Body'].read())
f = pyzipper.AESZipFile(s3_iodata)
my_file = f.namelist()[0]
file_content = f.read(my_file, pwd = my_password)
response = s3.put_object(Body=file_content, Bucket=my_bucket, Key=my_folder + my_file)
Here is an article that was useful:-
https://www.linkedin.com/pulse/extract-files-from-zip-archives-in-situ-aws-s3-using-python-tom-reid
I hope this is helpful to someone,
Phil
I am trying to read a CSV file (downloaded via FTP) in pandas using read_csv:
df = pandas.read_csv("file.csv")
but I get this error:
CParserError: Error tokenizing data. C error: EOF inside string starting at line 652
The code to download the file via FTP:
f = open(file_name, 'wb')
ftp.retrbinary("RETR " + file_name, f.write)
But when I download the same file in a browser and parse it, it works fine. Please suggest a solution.
Try this instead:
df = pandas.read_csv('ftp://...')  # put the real FTP URL there
From the docs:
filepath_or_buffer : str, pathlib.Path, py._path.local.LocalPath or any object with a read() method (such as a file handle or StringIO)
The string could be a URL. Valid URL schemes include http, ftp, s3, and file.
For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/table.csv