Empty file stored on Firebase with Python

My goal is to generate files (txt/pdf/excel) on my Python server and then push them to Firebase Storage.
For the Firebase Storage integration I use the pyrebase package.
So far I have managed to generate the file locally and store it at the intended path in Firebase Storage.
However, the files I store are always empty. What is the reason for this?
1. Generating the localFile
import os

def save_templocalfile(specs):
    # Random something
    localFileName = "test.txt"
    localFile = open(localFileName, "w+")
    for i in range(1000):
        localFile.write("This is line %d\r\n" % (i + 1))
    return {
        'localFileName': localFileName,
        'localFile': localFile
    }
2. Storing the localFile
# Required Libraries
import pyrebase
import time

# Firebase Setup & Admin Auth
config = {
    "apiKey": "<PARAMETER>",
    "authDomain": "<PARAMETER>",
    "databaseURL": "<PARAMETER>",
    "projectId": "<PARAMETER>",
    "storageBucket": "<PARAMETER>",
    "messagingSenderId": "<PARAMETER>"
}

firebase = pyrebase.initialize_app(config)
storage = firebase.storage()

def fb_upload(localFile):
    # Define childref
    childRef = "/test/test.txt"
    storage.child(childRef).put(localFile)
    # Get the file url
    fbResponse = storage.child(childRef).get_url(None)
    return fbResponse

The problem was that I had opened my file in write mode only:
localFile = open(localFileName, "w+")
The solution was to close the file after writing and reopen it in binary read mode:
# Close the file so the writes are flushed to disk
localFile.close()

# Reopen in binary read mode
my_file = open(localFileName, "rb")
my_bytes = my_file.read()

# Store the bytes on Firebase Storage
fbUploadObj = storage.child(storageRef).put(my_bytes)
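For reference, here is a minimal sketch of the fixed end-to-end flow, assuming the same pyrebase config dict as above (the file name and storage path are just placeholders):
import pyrebase

firebase = pyrebase.initialize_app(config)  # same config dict as above
storage = firebase.storage()

def save_and_upload(localFileName="test.txt", childRef="test/test.txt"):
    # 1. Write the file; the with-block closes it so the contents are flushed to disk
    with open(localFileName, "w") as f:
        for i in range(1000):
            f.write("This is line %d\r\n" % (i + 1))
    # 2. Reopen in binary mode and upload the raw bytes
    with open(localFileName, "rb") as f:
        storage.child(childRef).put(f.read())
    # 3. Return the download URL
    return storage.child(childRef).get_url(None)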

Related

Google Translate API - Reading and Writing to Cloud Storage - Python

I'm using the Google Translation API to translate a csv file with multiple columns and rows. The target language is English and the file has text in multiple languages.
The code posted below uses local files for testing, but I'd like to read the input file from a Cloud Storage bucket and export the translated file to a different Cloud Storage bucket.
I tried to run the script below with my sample file and got the error message: "FileNotFoundError: [Errno 2] No such file or directory".
I stumbled upon this page on "Reading and Writing to Cloud Storage", but I was not able to work the suggested solution into the script below: https://cloud.google.com/appengine/docs/standard/python/googlecloudstorageclient/read-write-to-cloud-storage#reading_from_cloud_storage
May I ask for a suggested modification of the script to import (and translate) the file from a Google Cloud bucket and export the translated file to a different Google Cloud bucket? Thank you!
Script mentioned:
from google.cloud import translate
import csv

def listToString(s):
    """Transform list to string"""
    str1 = " "
    return str1.join(s)

def detect_language(project_id, content):
    """Detecting the language of a text string."""
    client = translate.TranslationServiceClient()
    location = "global"
    parent = f"projects/{project_id}/locations/{location}"
    response = client.detect_language(
        content=content,
        parent=parent,
        mime_type="text/plain",  # mime types: text/plain, text/html
    )
    for language in response.languages:
        return language.language_code

def translate_text(text, project_id, source_lang):
    """Translating Text."""
    client = translate.TranslationServiceClient()
    location = "global"
    parent = f"projects/{project_id}/locations/{location}"
    # Detail on supported types can be found here:
    # https://cloud.google.com/translate/docs/supported-formats
    response = client.translate_text(
        request={
            "parent": parent,
            "contents": [text],
            "mime_type": "text/plain",  # mime types: text/plain, text/html
            "source_language_code": source_lang,
            "target_language_code": "en-US",
        }
    )
    # Display the translation for each input text provided
    for translation in response.translations:
        print("Translated text: {}".format(translation.translated_text))

def main():
    project_id = "your-project-id"
    csv_files = ["sample1.csv", "sample2.csv"]
    # Perform your content extraction here if you have a different file format #
    for csv_file in csv_files:
        csv_file = open(csv_file)
        read_csv = csv.reader(csv_file)
        content_csv = []
        for row in read_csv:
            content_csv.extend(row)
        content = listToString(content_csv)  # convert list to string
        detect = detect_language(project_id=project_id, content=content)
        translate_text(text=content, project_id=project_id, source_lang=detect)

if __name__ == "__main__":
    main()
You could download the file from GCS, run your logic against the local (downloaded) file, and then upload the result to another GCS bucket. Example:
Download the file from "my-bucket" to /tmp:
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-bucket")
source_blob = bucket.blob("blob/path/file.csv")
new_file = "/tmp/file.csv"
source_blob.download_to_filename(new_file)  # writes the blob to /tmp/file.csv (returns None)
After translating/running your code logic, upload to a bucket:
bucket = client.get_bucket('my-other-bucket')
blob = bucket.blob('myfile.csv')
blob.upload_from_filename('myfile.csv')
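Putting the pieces together, a rough sketch of the whole round trip might look like the following. It assumes the detect_language and translate_text helpers from the question, with translate_text changed to return the translated string instead of printing it; the bucket names, object keys, and /tmp paths are placeholders:
import csv
from google.cloud import storage

def translate_csv_between_buckets(project_id, src_bucket, src_key, dst_bucket, dst_key):
    client = storage.Client()
    # 1. Download the source CSV from the input bucket
    local_in = "/tmp/input.csv"
    client.get_bucket(src_bucket).blob(src_key).download_to_filename(local_in)
    # 2. Translate every cell (assumes translate_text returns the translated text)
    translated_rows = []
    with open(local_in, newline="") as f:
        for row in csv.reader(f):
            translated_rows.append([
                translate_text(cell, project_id, detect_language(project_id, cell))
                for cell in row
            ])
    # 3. Write the translated CSV locally
    local_out = "/tmp/output.csv"
    with open(local_out, "w", newline="") as f:
        csv.writer(f).writerows(translated_rows)
    # 4. Upload the result to the destination bucket
    client.get_bucket(dst_bucket).blob(dst_key).upload_from_filename(local_out)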

Django - AWS S3 - Moving Files

I am using AWS S3 as my default file storage system. I have a model with a file field like so:
class Segmentation(models.Model):
    file = models.FileField(...)
I am running image processing jobs on a second server that dumps processed images into a different AWS S3 bucket.
I want to save the processed image in my Segmentation table.
Currently I am using boto3 to manually download the file to my "local" server (where my django-app lives) and then upload it to my own S3 bucket, like so:
from django.core.files import File
import boto3
import os

def save_file(segmentation, foreign_s3_key):
    # set foreign bucket
    foreign_bucket = 'foreign-bucket'
    # create a temp file:
    temp_local_file = 'tmp/temp.file'
    # use boto3 to download the foreign file locally:
    s3_client = boto3.client('s3')
    s3_client.download_file(foreign_bucket, foreign_s3_key, temp_local_file)
    # save file to segmentation:
    segmentation.file = File(open(temp_local_file, 'rb'))
    segmentation.save()
    # delete temp file:
    os.remove(temp_local_file)
This works fine but it is resource intensive. I have some jobs that need to process hundreds of images.
Is there a way to copy a file from the foreign bucket to my local bucket and set the segmentation.file field to the copied file?
I am assuming you want to move some files from a source bucket to a destination bucket, as the question title suggests, and do some processing in between.
import boto3

my_west_session = boto3.Session(region_name='us-west-2')
my_east_session = boto3.Session(region_name='us-east-1')
backup_s3 = my_west_session.resource("s3")
video_s3 = my_east_session.resource("s3")
local_bucket = backup_s3.Bucket('localbucket')
foreign_bucket = video_s3.Bucket('foreignbucket')

for obj in foreign_bucket.objects.all():
    # do some processing
    # on objects
    copy_source = {
        'Bucket': foreign_bucket.name,  # CopySource expects the bucket name string, not the Bucket resource
        'Key': obj.key
    }
    local_bucket.copy(copy_source, obj.key)
See the boto3 documentation on Session configurations and on S3 Resource copy / CopyObject, depending on your requirement.
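If the aim is to avoid pulling the file through the Django server entirely, a server-side copy plus pointing the FileField at the new key may be enough. This is only a sketch, under the assumption that Django's default file storage (e.g. via django-storages) is backed by the destination bucket; the bucket and key names are placeholders:
import boto3

def copy_and_attach(segmentation, foreign_bucket_name, foreign_s3_key, local_bucket_name):
    s3 = boto3.resource('s3')
    # Server-side copy: the object bytes never pass through the Django server
    s3.Bucket(local_bucket_name).copy(
        {'Bucket': foreign_bucket_name, 'Key': foreign_s3_key},
        foreign_s3_key,
    )
    # Point the FileField at the key that now exists in the default storage bucket
    segmentation.file.name = foreign_s3_key
    segmentation.save()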

Upload and delete file in Sharepoint with python automatically

I need to put together the following scenario.
What libraries or frameworks should I use to complete this scenario?
I have basic knowledge of Python.
I found the following way to implement a file upload and delete process for SharePoint with a few lines of Python.
You will need the two Python libraries 'sharepoint' and 'shareplum':
To install 'sharepoint': pip install sharepoint
To install 'shareplum': pip install SharePlum
Then you can implement the main code to upload and delete files as follows:
sharepoint.py
from shareplum import Site, Office365
from shareplum.site import Version
import json, os

ROOT_DIR = os.path.dirname(os.path.abspath(__file__))
config_path = '\\'.join([ROOT_DIR, 'config.json'])

# read config file
with open(config_path) as config_file:
    config = json.load(config_file)
    config = config['share_point']

USERNAME = config['user']
PASSWORD = config['password']
SHAREPOINT_URL = config['url']
SHAREPOINT_SITE = config['site']
SHAREPOINT_DOC = config['doc_library']

class SharePoint:
    def auth(self):
        self.authcookie = Office365(SHAREPOINT_URL, username=USERNAME, password=PASSWORD).GetCookies()
        self.site = Site(SHAREPOINT_SITE, version=Version.v365, authcookie=self.authcookie)
        return self.site

    def connect_folder(self, folder_name):
        self.auth_site = self.auth()
        self.sharepoint_dir = '/'.join([SHAREPOINT_DOC, folder_name])
        self.folder = self.auth_site.Folder(self.sharepoint_dir)
        return self.folder

    def upload_file(self, file, file_name, folder_name):
        self._folder = self.connect_folder(folder_name)
        with open(file, mode='rb') as file_obj:
            file_content = file_obj.read()
            self._folder.upload_file(file_content, file_name)

    def delete_file(self, file_name, folder_name):
        self._folder = self.connect_folder(folder_name)
        self._folder.delete_file(file_name)
I save the above code in a sharepoint.py file. Then I import the methods from 'sharepoint.py' and use them as follows:
updelsharepoint.py
from sharepoint import SharePoint

# e.g. file_dir_path = r'E:\project\file_to_be_uploaded.xlsx'
file_dir_path = r'E:\File_Path\File_Name_with_extension'
# the name the file will be saved under in SharePoint
file_name = 'File_Name_with_extension'
# the folder in SharePoint that it will be saved under
folder_name = 'SampleUploads'

# upload file
SharePoint().upload_file(file_dir_path, file_name, folder_name)

# delete file
SharePoint().delete_file(file_name, folder_name)
Finally, to configure your email, password, and SharePoint account, create a 'config.json' as follows.
config.json
{
    "share_point": {
        "user": "{email}",
        "password": "{password}",
        "url": "https://{domain}.sharepoint.com",
        "site": "https://{domain}.sharepoint.com/sites/{Name}/",
        "doc_library": "Shared Documents/{Document}"
    }
}
I hope this helps you solve your problem. You can further improve on the sample code above and adapt it to your needs.

Read text file from Firebase Storage in python

I am trying to read a file from Firebase storage under the sub-folder called Transcripts. When I try to read a text file which is in the root folder it works perfectly. However, it fails to read any text file under the sub-folder called "Transcripts".
Here is the structure of my Firebase Storage bucket:
Transcripts/
    Audio 1.txt
    Audio 2.txt
Audio 1.amr
Audio 2.amr
Audio Name.txt
Here is the python code where I try to read the file in the root folder:
import pyrebase
import urllib.request

config = {
    "apiKey": "XXXXXXXX",
    "authDomain": "XXXXXXXX.firebaseapp.com",
    "databaseURL": "https://XXXXXXXX.firebaseio.com",
    "projectId": "XXXXXXXX",
    "storageBucket": "XXXXXXXX.appspot.com",
    "messagingSenderId": "XXXXXXXX",
    "appId": "XXXXXXXX",
    "measurementId": "XXXXXXXX",
    "serviceAccount": "/Users/faizarahman/Desktop/MY-PROJECT.json"
}

firebase = pyrebase.initialize_app(config)  # initializing firebase
storage = firebase.storage()  # getting storage reference 1
storage2 = firebase.storage()  # getting storage reference 2 (to avoid overwriting storage reference 1)

url = storage.child("Audio Name").get_url(None)  # getting the url from storage
print(url)  # printing the url
text_file = urllib.request.urlopen(url).read()  # reading the text file

name_list = storage.child("Transcripts/").list_files()  # listing all files inside the Transcripts folder
folder_name = "Transcripts/ "
for file in name_list:  # iterating through all the files in the list
    try:
        if folder_name in file.name:  # check if the path has "Transcripts"
            transcript_name = file.name.replace("Transcripts/ ", "")  # extract the name from "Transcripts/ Audio Number"
            unicode_text = text_file.decode("utf-8")  # convert the content of the Audio Name file to a string value
            if transcript_name == unicode_text:  # if the content of the Audio Name file (a file name) matches the file name, read that file
                text_file1 = storage.child("Audio Name").get_url(None)  # for testing purposes, "Audio Name" works here...
                print(text_file1)
    except:
        print('Download Failed')
The link that it gives me looks like this:
https://firebasestorage.googleapis.com/v0/b/MY-PROJECT-ID.appspot.com/o/Audio%20Name?alt=media
Here is what I get when I click the link:
Reading Audio Name text file successful.
Here is the python code where I try to read the file in the "Transcripts" folder:
import pyrebase
import urllib.request

config = {
    "apiKey": "XXXXXXXX",
    "authDomain": "XXXXXXXX.firebaseapp.com",
    "databaseURL": "https://XXXXXXXX.firebaseio.com",
    "projectId": "XXXXXXXX",
    "storageBucket": "XXXXXXXX.appspot.com",
    "messagingSenderId": "XXXXXXXX",
    "appId": "XXXXXXXX",
    "measurementId": "XXXXXXXX",
    "serviceAccount": "/Users/faizarahman/Desktop/MY-PROJECT.json"
}

firebase = pyrebase.initialize_app(config)  # initializing firebase
storage = firebase.storage()  # getting storage reference 1
storage2 = firebase.storage()  # getting storage reference 2 (to avoid overwriting storage reference 1)

url = storage.child("Audio Name").get_url(None)  # getting the url from storage
print(url)  # printing the url
text_file = urllib.request.urlopen(url).read()  # reading the text file

name_list = storage.child("Transcripts/").list_files()  # listing all files inside the Transcripts folder
folder_name = "Transcripts/ "
for file in name_list:  # iterating through all the files in the list
    try:
        if folder_name in file.name:  # check if the path has "Transcripts"
            transcript_name = file.name.replace("Transcripts/ ", "")  # extract the name from "Transcripts/ Audio Number"
            unicode_text = text_file.decode("utf-8")  # convert the content of the Audio Name file to a string value
            if transcript_name == unicode_text:  # if the content of the Audio Name file (a file name) matches the file name, read that file
                text_file1 = storage2.child("Transcripts/" + unicode_text).get_url(None)  # "Audio Name" works here for testing, but reading the file under Transcripts does not work...
                print(text_file1)
    except:
        print('Download Failed')
The link that it gives me looks like this:
https://firebasestorage.googleapis.com/v0/b/MY-PROJECT-ID.appspot.com/o/Transcripts%2FAudio%202?alt=media
Here is what I get when I try to read the file inside the "Transcripts" subfolder:
Reading Audio 2 under transcript sub folder failed.
I believe the error is in this line:
text_file1 = storage2.child("Transcripts/" + unicode_text).get_url(None)

How to store Dataframe data to Firebase Storage?

Given a pandas DataFrame that contains some data, what is the best way to store this data in Firebase?
Should I convert the DataFrame to a local file (e.g. .csv, .txt) and then upload it to Firebase Storage, or is it also possible to store the pandas DataFrame directly without conversion? Or are there better best practices?
Update 01/03 - So far I've come up with the solution below, which requires writing a csv file locally, then reading it back in, uploading it, and finally deleting the local file. I doubt this is the most efficient method, so I would like to know whether it can be done better and quicker.
import os
import firebase_admin
from firebase_admin import db, storage

cred = firebase_admin.credentials.Certificate(cert_json)
app = firebase_admin.initialize_app(cred, config)
bucket = storage.bucket(app=app)

def upload_df(df, data_id):
    """
    Upload a Dataframe as a csv to Firebase Storage
    :return: storage_ref
    """
    # Storage location + extension
    storage_ref = data_id + ".csv"
    # Store locally
    df.to_csv(data_id)
    # Upload to Firebase Storage
    blob = bucket.blob(storage_ref)
    with open(data_id, 'rb') as local_file:
        blob.upload_from_file(local_file)
    # Delete locally
    os.remove(data_id)
    return storage_ref
With python-firebase and to_dict:
postdata = my_df.to_dict()
# Assumes any auth/headers you need are already taken care of.
result = firebase.post('/my_endpoint', postdata, {'print': 'pretty'})
print(result)
# Snapshot info
You can get the data back using the snapshot info and the endpoint, and rebuild the DataFrame with from_dict(). You could adapt this approach to the SQL and JSON formats that pandas also supports.
Alternatively, depending on where your script executes from, you might consider treating Firebase as a database and using the db API from firebase_admin.
As for whether it's according to best practice, it's difficult to say without knowing anything about your use case.
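To illustrate the round trip with python-firebase, here is a rough sketch; the database URL, the /my_endpoint path, and my_df are placeholders, and it assumes the plain REST-style FirebaseApplication client:
import pandas as pd
from firebase import firebase  # the python-firebase package

fb_app = firebase.FirebaseApplication('https://YOUR-PROJECT.firebaseio.com', None)

# Write the DataFrame as a nested dict
postdata = my_df.to_dict()
result = fb_app.post('/my_endpoint', postdata, {'print': 'pretty'})

# Read it back using the snapshot name returned by post() and rebuild the DataFrame
snapshot = fb_app.get('/my_endpoint/' + result['name'], None)
restored_df = pd.DataFrame.from_dict(snapshot)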
If you just want to reduce code length and skip the steps of creating and deleting local files, you can use upload_from_string:
import firebase_admin
from firebase_admin import db, storage

cred = firebase_admin.credentials.Certificate(cert_json)
app = firebase_admin.initialize_app(cred, config)
bucket = storage.bucket(app=app)

def upload_df(df, data_id):
    """
    Upload a Dataframe as a csv to Firebase Storage
    :return: storage_ref
    """
    storage_ref = data_id + '.csv'
    blob = bucket.blob(storage_ref)
    blob.upload_from_string(df.to_csv())
    return storage_ref
https://googleapis.github.io/google-cloud-python/latest/storage/blobs.html#google.cloud.storage.blob.Blob.upload_from_string
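For completeness, reading the CSV back into a DataFrame without touching the local disk could look like this sketch, reusing the bucket object from the setup above (index_col=0 assumes the default to_csv() output, which includes the index column):
import io
import pandas as pd

def download_df(storage_ref):
    # Fetch the blob contents as bytes and parse them straight into pandas
    blob = bucket.blob(storage_ref)
    csv_bytes = blob.download_as_string()
    return pd.read_csv(io.BytesIO(csv_bytes), index_col=0)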
After hours of figuring this out, the following solution works for me: you need to convert your csv data to bytes and then upload it.
import pyrebase
import pandas as pd

firebaseConfig = {
    "apiKey": "xxxxx",
    "authDomain": "xxxxx",
    "projectId": "xxxxx",
    "storageBucket": "xxxxx",
    "messagingSenderId": "xxxxx",
    "appId": "xxxxx",
    "databaseURL": "xxxxx"
}

firebase = pyrebase.initialize_app(firebaseConfig)
storage = firebase.storage()

df = pd.read_csv("/content/Future Prices.csv")

# Here is the magic: convert your csv data to bytes and then upload it
df_string = df.to_csv(index=False)
db_bytes = bytes(df_string, 'utf8')

fileName = "Future Prices.csv"
storage.child("predictions/" + fileName).put(db_bytes)
That's all. Happy coding!
I found that starting from very modest DataFrame sizes (below 100 KB!), and certainly for bigger ones, it pays off to serialize the data before storing it. It does not have to be a DataFrame; it can be any object (e.g. a dictionary). I used pickle below for the serialization. Your object still shows up in the usual Firebase Storage this way, and you gain memory and speed, both when writing and when reading, compared to storing it as uncompressed text. For big objects it is also worth adding a timeout to avoid a ConnectionError after the default timeout of 60 seconds.
import firebase_admin
from firebase_admin import credentials, storage
import pickle

cred = credentials.Certificate(json_cert_file)
firebase_admin.initialize_app(cred, {'storageBucket': 'YOUR_storageBucket (without gs://)'})
bucket = storage.bucket()

file_name = data_id + ".pkl"
blob = bucket.blob(file_name)

# write df to storage (the timeout belongs to the upload call, not to pickle.dumps)
blob.upload_from_string(pickle.dumps(df), timeout=300)

# read df from storage
df = pickle.loads(blob.download_as_string(timeout=300))
