python stream data between sources - python

I have a large file in my FTP, and I need to write it to google cloud storage.
My first thought was to download it from FTP to a local file and then upload the local file to remote storage. But I would prefer to do it without downloading it locally.
So far I came up with the following code:
from ftplib import FTP
import io
ftp = FTP('example.com')
ftp.voidcmd('TYPE I')
sock = ftp.transfercmd('RETR file.csv')
raw = io.BytesIO()
file = io.BufferedRandom(raw)
blob = bucket.blob('blobname.csv', chunk_size=262144) # gcs blob
blob.upload_from_file(file, content_type='text/csv', rewind=True)
But I get:
Traceback (most recent call last):
  File "/home/tsh/example.py", line 65, in <module>
    file = io.BufferedRandom(raw)
io.UnsupportedOperation: File or stream is not seekable.
Is there a way to pipe data from FTP to cloud storage (or any other remote resource) without downloading it to the local machine? I am using Python 3.6.

I think you can achieve what you want without Python, using rclone. If you must use Python, maybe there is a wrapper for it, or you could call it through the subprocess module.
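If the transfer has to stay in Python, one option is to let ftplib hand each downloaded block straight to a writable stream for the destination object, so nothing touches the local disk. The sketch below is only an illustration under assumptions: `stream_ftp_file` is a hypothetical helper name, and the usage shown in the trailing comments relies on `Blob.open('wb')`, which exists in recent releases of google-cloud-storage but may be missing from older ones.

```python
def stream_ftp_file(ftp, remote_name, writer, blocksize=262144):
    """Copy one FTP file into `writer` block by block.

    `ftp` is a logged-in ftplib.FTP (or compatible) object and `writer`
    is any binary file-like object with a write() method.
    Returns the number of bytes transferred.
    """
    total = 0

    def handle_block(block):
        # Called by ftplib for every block received from the server.
        nonlocal total
        writer.write(block)
        total += len(block)

    # retrbinary sets binary mode and feeds each block to the callback.
    ftp.retrbinary('RETR ' + remote_name, handle_block, blocksize)
    return total


# Possible usage (untested sketch; `bucket` is the GCS bucket from the question):
#   from ftplib import FTP
#   ftp = FTP('example.com')
#   ftp.login()
#   blob = bucket.blob('blobname.csv')
#   with blob.open('wb') as writer:
#       stream_ftp_file(ftp, 'file.csv', writer)
```

Only one block is held in memory at a time, so the file size does not matter; the trade-off is that a failure on either side aborts the whole transfer.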

Related

Writing data in Python to a local file and uploading to FTP at the same time does not work

I have this weird issue with my code on Raspberry Pi 4.
from gpiozero import CPUTemperature
from datetime import datetime
import ftplib
cpu = CPUTemperature()
now = datetime.now()
time = now.strftime('%H:%M:%S')
# Save data to file
f = open('/home/pi/temp/temp.txt', 'a+')
f.write(str(time) + ' - Temperature is: ' + str(cpu.temperature) + ' C\n')
# Login and store file to FTP server
ftp = ftplib.FTP('10.0.0.2', 'username', 'pass')
ftp.cwd('AiDisk_a1/usb/temperature_logs')
ftp.storbinary('STOR temp.txt', f)
# Close file and connection
ftp.close()
f.close()
With this code, the script doesn't write anything to the .txt file, and the file transferred to the FTP server has a size of 0 bytes.
When I remove this part of the code, the script writes to the file just fine.
# Login and store file to FTP server
ftp = ftplib.FTP('10.0.0.2', 'username', 'pass')
ftp.cwd('AiDisk_a1/usb/temperature_logs')
ftp.storbinary('STOR temp.txt', f)
...
ftp.close()
I also tried writing some random text to the file and then running the script, and the file was transferred normally.
Do you have any idea what I am missing?
After you write to the file, the file pointer is at the end. So if you pass the file handle to FTP, it reads nothing. Hence nothing is uploaded.
I do not have a direct explanation for the fact that the local file ends up empty, but the unusual combination of "append" mode and reading may be the reason. The open function documentation does not list a+ as such; it lists a and + as separate mode characters that can be combined.
If you want to append the data both to a local file and to the FTP copy, I suggest you either:
Append the data to the file, seek back to the position where the new data starts, and upload only the appended contents.
Write the data to memory and then separately 1) dump the in-memory data to a file and 2) upload it.
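The second option could be sketched roughly as follows. This is an illustration, not a drop-in fix: `append_and_upload` is a hypothetical helper, and it assumes the server accepts the FTP `APPE` (append) command. If it does not, re-uploading the whole local file with `STOR` after it has been closed is the fallback.

```python
import io


def append_and_upload(ftp, local_path, remote_name, line):
    """Append `line` to a local log file and to the remote copy.

    The data is built once in memory, so neither step depends on the
    position of a shared file pointer.
    """
    data = line.encode('utf-8')

    # 1) Append to the local file; the with-block flushes and closes it.
    with open(local_path, 'ab') as f:
        f.write(data)

    # 2) Upload the same bytes from a fresh in-memory stream (position 0).
    #    APPE support on the server is an assumption.
    ftp.storbinary('APPE ' + remote_name, io.BytesIO(data))
```

With the names from the question, a call might look like `append_and_upload(ftp, '/home/pi/temp/temp.txt', 'temp.txt', time + ' - Temperature is: ' + str(cpu.temperature) + ' C\n')`.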

Cannot write xlsx to GCS from pandas

I have a strange issue.
I trigger a K8s job from Airflow as a data pipeline. At the end I need to write the dataframe to Google Cloud Storage as .parquet and .xlsx files.
[...]
export_app.to_parquet(f"{output_path}.parquet")
export_app.to_excel(f"{output_path}.xlsx")
Everything is OK for the parquet file, but I get an error for the xlsx:
severity: "INFO"
textPayload: "[Errno 2] No such file or directory: 'gs://my_bucket/incidents/prediction/2020-04-29_incidents_result.xlsx'
I also tried writing the file as a CSV as a test:
export_app.to_parquet(f"{output_path}.parquet")
export_app.to_csv(f"{output_path}.csv")
export_app.to_excel(f"{output_path}.xlsx")
I get the same message every time, and the other files are written as expected.
Is there any limitation on writing an xlsx file?
I have the openpyxl package installed in my environment.
As requested, here is some code showing how I created a new xlsx file directly with the GCS Python 3 API. I used this tutorial and this API reference:
# Import the Google Cloud client library
from google.cloud import storage
# Instantiate a client
storage_client = storage.Client()
# Create the bucket object
bucket = storage_client.get_bucket("my-new-bucket")
# Confirm the bucket is connected
print("Bucket {} connected.".format(bucket.name))
# Create the file in the bucket from a local xlsx file
blob = bucket.blob('test.xlsx')
with open("/home/vitooh/test.xlsx", "rb") as my_file:
    blob.upload_from_file(my_file)
I hope it will help!
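If the intermediate local file is unwanted, one possible variation is to let pandas write the workbook into an in-memory buffer and upload the resulting bytes. A minimal sketch under assumptions: `upload_dataframe_as_xlsx` is a hypothetical helper, `df` is expected to behave like a pandas DataFrame (recent pandas with openpyxl or another Excel engine installed), and `blob` like a google-cloud-storage Blob.

```python
import io

XLSX_CONTENT_TYPE = (
    'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
)


def upload_dataframe_as_xlsx(df, blob):
    """Serialize `df` to .xlsx in memory and upload it to `blob`.

    Returns the number of bytes uploaded.
    """
    buffer = io.BytesIO()
    # to_excel accepts a binary file-like object as well as a path.
    df.to_excel(buffer, index=False)
    data = buffer.getvalue()
    # upload_from_string accepts bytes as well as text.
    blob.upload_from_string(data, content_type=XLSX_CONTENT_TYPE)
    return len(data)
```

With the objects from the snippets above, this might be called as `upload_dataframe_as_xlsx(export_app, bucket.blob('test.xlsx'))`. The whole workbook is held in memory, so this suits moderately sized dataframes.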

ftplib upload and download get stuck

I'm trying to upload a file to my VPS (hosted by GoDaddy) via Python's ftplib library:
from ftplib import FTP
session = FTP('ftp.wangsibo.xyz','wsb','Wsb.139764')
file = open('source10.png','rb')
session.storbinary('store_source10.png', file)
file.close()
session.quit()
However, it gets stuck at line 4 (the file is only a few kilobytes, yet it takes minutes). The same thing happens when I try to read using retrbinary.
I've tried using FileZilla and it worked fine. Any suggestions?
FTP.storbinary(command, fp[, blocksize, callback, rest])
Store a file in binary transfer mode. command should be an appropriate
STOR command: "STOR filename". fp is an open file object which is read
until EOF using its read() method in blocks of size blocksize to
provide the data to be stored.
store_source10.png is not an FTP command; the command has to be STOR followed by the remote file name, so try STOR source10.png instead.
e.g.
from ftplib import FTP
session = FTP('ftp.wangsibo.xyz','wsb','Wsb.139764')
# The with-block closes the file even if the upload fails
with open('source10.png','rb') as file:
    # The command must be 'STOR <remote name>'
    session.storbinary('STOR source10.png',file)
session.quit()
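Since the question mentions that reading with retrbinary hangs as well, it may be worth checking that the download call passes a full `RETR <name>` command in the same way. A short sketch, where `download_file` is a hypothetical helper name and `session` is assumed to be a logged-in `ftplib.FTP` object:

```python
def download_file(session, remote_name, local_path):
    """Download `remote_name` from an FTP session into `local_path`."""
    with open(local_path, 'wb') as f:
        # The command must be 'RETR <name>'; ftplib calls f.write
        # once for every block it receives.
        session.retrbinary('RETR ' + remote_name, f.write)
```

For example, `download_file(session, 'source10.png', 'copy_of_source10.png')` would fetch the file uploaded above; the local name here is made up for illustration.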

Django. How to upload file from memory to flickr?

I would like to know, how to upload a file from memory to Flickr.
I am using the Python Flickr API kit (http://stuvel.eu/flickrapi).
Does the file in memory have a path that can be passed as filename?
Code
response = flickr.upload(filename=f.read(), callback=None, **keywords)
Error
TypeError at /image/new/
must be encoded string without NULL bytes, not str
Thanks in advance
You can try using the tempfile module to write it to disk before uploading it:
import tempfile
with tempfile.NamedTemporaryFile(delete=True) as tfile:
    tfile.write(f.read())
    tfile.flush()
    # Upload inside the with-block: the temporary file is deleted when it closes
    response = flickr.upload(filename=tfile.name,callback=None,**keywords)

PIL.Image.save() to an FTP server

Right now, I have the following code:
pilimg = PILImage.open(img_file_tmp) # img_file_tmp just contains the image to read
pilimg.thumbnail((200,200), PILImage.ANTIALIAS)
pilimg.save(fn, 'PNG') # fn is a filename
This works just fine for saving to a local file pointed to by fn. However, what I would want this to do instead is to save the file on a remote FTP server.
What is the easiest way to achieve this?
Python's ftplib library can initiate an FTP transfer, but PIL cannot write directly to an FTP server.
What you can do is write the result to a file and then upload it to the FTP server using the FTP library. There are complete examples of how to connect in the ftplib manual so I'll focus just on the sending part:
# (assumes you already created an instance of FTP
# as "ftp", and already logged in)
f = open(fn, 'rb') # binary mode, since storbinary sends raw bytes
ftp.storbinary("STOR remote_filename.png", f)
f.close()
If you have enough memory for the compressed image data, you can avoid the intermediate file by having PIL write to an in-memory buffer (io.BytesIO in Python 3, StringIO.StringIO in Python 2), and then passing that object into the FTP library:
import io
f = io.BytesIO()
pilimg.save(f, 'PNG')
f.seek(0) # return the buffer's file pointer to the beginning of the data
# again this assumes you already connected and logged in
ftp.storbinary("STOR remote_filename.png", f)
