Pandas CParserError, file downloaded via FTP - python

I am trying to read a CSV file (downloaded via FTP) in Pandas using read_csv:
df = pandas.read_csv("file.csv")
but I get this error:
CParserError: Error tokenizing data. C error: EOF inside string starting at line 652
The code to download the file via FTP:
f = open(file_name, 'wb')
ftp.retrbinary("RETR " + file_name, f.write)
But when I download the same file in a browser and parse it, it parses fine. Please suggest a solution.

Try this instead:
df = pandas.read_csv('ftp://...')  # put the real FTP URL here
from the docs:
filepath_or_buffer : str, pathlib.Path, py._path.local.LocalPath or any object with a read() method (such as a file handle or StringIO)
The string could be a URL. Valid URL schemes include http, ftp, s3, and file.
For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/table.csv
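If you would rather keep the explicit FTP download, note that the snippet in the question never closes the file, so buffered bytes may not be flushed to disk before read_csv opens it, which can truncate the file and produce exactly this kind of "EOF inside string" error. A minimal sketch of that route (the host here is a placeholder; reuse your own connection):
import ftplib
import pandas as pd

file_name = "file.csv"
ftp = ftplib.FTP("ftp.example.com")  # placeholder host
ftp.login()

# The with-block guarantees the file is flushed and closed
# before pandas tries to parse it.
with open(file_name, "wb") as f:
    ftp.retrbinary("RETR " + file_name, f.write)

df = pd.read_csv(file_name)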

Related

Streamlit upload CSV file, parse it and push it to FTP

I am trying to build a simple Streamlit app where I upload a CSV file, load it into a dataframe, display the dataframe, and then upload it to a pre-defined FTP server.
The first part works: the file is successfully uploaded and visualized, but then I cannot upload it to the FTP server. This is my code:
import ftplib
import pandas as pd
import streamlit as st

ftp_server = "ftp.test.com"
ftp_username = "user"
ftp_password = "password"

input_file = st.file_uploader("Upload a CSV File", type=['csv'])
if (input_file is not None) and input_file.name.endswith(".csv"):
    df = pd.read_csv(input_file, delimiter="\t", encoding='ISO-8859-1')
    st.dataframe(df)
    session = ftplib.FTP(ftp_server, ftp_username, ftp_password)
    file = open(input_file, "rb")
    session.storbinary(input_file.name, input_file)
    input_file.close()
    session.quit()
    st.success(f"The {input_file.name} was successfully uploaded to the FTP server: {ftp_server}!")
I am getting this error:
TypeError: expected str, bytes or os.PathLike object, not UploadedFile
I am using Streamlit v.1.1.0.
Please note that I have simplified my code and replaced the FTP credentials. In the real world, I would probably use try/except for the session connection, etc.
I guess you get the error here:
file = open(input_file, "rb")
That line is both wrong and useless (you never use the file). Remove it.
You might need to seek the input_file back to the beginning after you have read it in read_csv:
input_file.seek(0)
You are missing the upload command (STOR) in the storbinary call:
session.storbinary("STOR " + input_file.name, input_file)

Python FileNotFoundError when reading a public json file

I have a public viewable JSON file that is hosted on s3. I can access the file directly by clicking on the public url to the s3 object and view the JSON in my browser fine. Anyone on the internet can view the file easily.
Yet when the below code is run in Python (using Lambda connected to an API trigger), I get [Errno 2] No such file or directory: as the errorMessage, and FileNotFoundError as the errorType.
def readLeetDictionary(self):
    jsonfile = 'https://s3.amazonaws.com/url/to/public/facing/file/data.json'
    with open(jsonfile, 'r') as data_file:
        self.data = json.load(data_file)
What am I missing here? Since the file is a publicly viewable JSON file, I would assume I wouldn't be forced to use the boto3 library and formally handshake to the file in order to read it (with object = s3.get_object(Bucket=bucket_names, Key=object_name), for example) - would I?
The code you need should be something like the following. Note that the built-in open() only accepts local file paths, which is why you get FileNotFoundError for a URL; urllib handles the HTTP fetch:
import json
from urllib.request import urlopen  # Python 3; in Python 2 this was urllib.urlopen

def readLeetDictionary():
    jsonfile = 'https://s3.amazonaws.com/url/to/public/facing/file/data.json'
    response = urlopen(jsonfile)
    data = json.loads(response.read())
    print(data)
Feel free to ask further if this does not suit you.
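And to answer the question directly: no, you are not forced to use boto3 for a publicly readable object, but it works too and spares you the URL handling. A sketch of that route, with hypothetical bucket and key names standing in for the scrubbed URL:
import json
import boto3

s3 = boto3.client('s3')
# 'my-bucket' and the key are placeholders for the real S3 location.
obj = s3.get_object(Bucket='my-bucket', Key='path/to/data.json')
data = json.loads(obj['Body'].read())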

Writing a file to S3 using Lambda in Python with AWS

In AWS, I'm trying to save a file to S3 in Python using a Lambda function. While this works on my local computer, I am unable to get it to work in Lambda. I've been working on this problem for most of the day and would appreciate help. Thank you.
def pdfToTable(PDFfilename, apiKey, fileExt, bucket, key):
    # parsing a PDF using an API
    fileData = (PDFfilename, open(PDFfilename, "rb"))
    files = {"f": fileData}
    postUrl = "https://pdftables.com/api?key={0}&format={1}".format(apiKey, fileExt)
    response = requests.post(postUrl, files=files)
    response.raise_for_status()

    # this code is probably the problem!
    s3 = boto3.resource('s3')
    bucket = s3.Bucket('transportation.manifests.parsed')
    with open('/tmp/output2.csv', 'rb') as data:
        data.write(response.content)
        key = 'csv/' + key
        bucket.upload_fileobj(data, key)

    # FYI, on my own computer, this saves the file
    with open('output.csv', "wb") as f:
        f.write(response.content)
In S3, there is a bucket transportation.manifests.parsed containing the folder csv where the file should be saved.
The type of response.content is bytes.
From AWS, the error from the current set-up above is [Errno 2] No such file or directory: '/tmp/output2.csv': FileNotFoundError. In fact, my goal is to save the file to the csv folder under a unique name, so tmp/output2.csv might not be the best approach. Any guidance?
In addition, I've tried to use wb and w instead of rb also to no avail. The error with wb is Input <_io.BufferedWriter name='/tmp/output2.csv'> of type: <class '_io.BufferedWriter'> is not supported. The documentation suggests that using 'rb' is the recommended usage, but I do not understand why that would be the case.
Also, I've tried s3_client.put_object(Key=key, Body=response.content, Bucket=bucket) but receive An error occurred (404) when calling the HeadObject operation: Not Found.
Assuming Python 3.6. The way I usually do this is to wrap the bytes content in a BytesIO wrapper to create a file-like object. And, per the boto3 docs, you can use the transfer manager for a managed transfer:
from io import BytesIO
import boto3
s3 = boto3.client('s3')
fileobj = BytesIO(response.content)
s3.upload_fileobj(fileobj, 'mybucket', 'mykey')
If that doesn't work I'd double check all IAM permissions are correct.
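For completeness: since response.content is already bytes, put_object (which the question also tried) can write it without any file object at all. A sketch with the question's bucket name and a hypothetical key:
import boto3

s3_client = boto3.client('s3')
s3_client.put_object(
    Bucket='transportation.manifests.parsed',
    Key='csv/output2.csv',   # hypothetical key under the csv/ prefix
    Body=response.content,   # raw bytes from the API response
)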
You have a writable stream that you're asking boto3 to use as a readable stream, which won't work.
Write the file first, and then simply use bucket.upload_file() afterwards, like so:
s3 = boto3.resource('s3')
bucket = s3.Bucket('transportation.manifests.parsed')

with open('/tmp/output2.csv', 'wb') as data:  # 'wb', since response.content is bytes
    data.write(response.content)

key = 'csv/' + key
bucket.upload_file('/tmp/output2.csv', key)

How to download FTP file using its full FTP path?

Using ftplib in Python, you can download files, but it seems you are restricted to using the file name only (not the full file path). The following code successfully downloads the requested file:
import ftplib
ftp=ftplib.FTP("ladsweb.nascom.nasa.gov")
ftp.login()
ftp.cwd("/allData/5/MOD11A1/2002/001")
ftp.retrbinary('RETR MOD11A1.A2002001.h00v08.005.2007079015634.hdf',open("MOD11A1.A2002001.h00v08.005.2007079015634.hdf",'wb').write)
As you can see, first a login to the site (ftp.login()) is established and then the current directory is set (ftp.cwd()). After that you need to declare the file name to download the file that resides in the current directory.
How about downloading the file directly by using its full path/link?
import ftplib
ftp = ftplib.FTP("ladsweb.nascom.nasa.gov")
ftp.login()
a = 'allData/5/MOD11A1/2002/001/MOD11A1.A2002001.h00v08.005.2007079015634.hdf'
fhandle = open('ftp-test', 'wb')
ftp.retrbinary('RETR ' + a, fhandle.write)
fhandle.close()
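If you'd rather keep the download in memory instead of writing straight to disk, the retrbinary callback can just as well append to an in-memory buffer; a small sketch of that variant, using the same host and path:
import ftplib
from io import BytesIO

ftp = ftplib.FTP("ladsweb.nascom.nasa.gov")
ftp.login()

a = 'allData/5/MOD11A1/2002/001/MOD11A1.A2002001.h00v08.005.2007079015634.hdf'
buf = BytesIO()
ftp.retrbinary('RETR ' + a, buf.write)  # each received chunk is appended to the buffer
data = buf.getvalue()                   # the whole file as bytes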
This solution uses the urlopen function from the urllib module. urlopen will let you download ftp and http URLs. I like using it because you can connect and get all the data in one line. The last three lines extract the filename from the URL and then save the data to that filename.
from urllib.request import urlopen  # Python 3; in Python 2: from urllib import urlopen

url = 'ftp://ladsweb.nascom.nasa.gov/allData/5/MOD11A1/2002/001/MOD11A1.A2002001.h00v08.005.2007079015634.hdf'
data = urlopen(url).read()
filename = url.split('/')[-1]
with open(filename, 'wb') as f:
    f.write(data)

Django. How to upload file from memory to flickr?

I would like to know, how to upload a file from memory to Flickr.
I am using the Python Flickr API kit (http://stuvel.eu/flickrapi).
Does the file in memory have a path that can be passed as filename?
Code
response = flickr.upload(filename=f.read(), callback=None, **keywords)
Error
TypeError at /image/new/
must be encoded string without NULL bytes, not str
Thanks in advance
You can try using the tempfile module to write it to disk before uploading it:
import tempfile

with tempfile.NamedTemporaryFile(delete=True) as tfile:
    tfile.write(f.read())
    tfile.flush()
    response = flickr.upload(filename=tfile.name, callback=None, **keywords)
