I'm trying to upload and replace a file using google colab.
Currently what I do is
from google.colab import files
upload = files.upload()
Then, if I need to modify the file, I do it locally on my computer. If I upload it again using the same cell, the new version of the file is uploaded as "filename(1)". I would like the new version to replace the old one.
What I do then is
!rm "filename"
And then I run the first cell again. But it is not great.
Is there an option like the following?
upload = files.upload(replace=True)
My approach is as follows.
import os
import re
from google.colab import files

lsdi = os.listdir('/content')
uploaded = files.upload()
for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))
    if fn in lsdi:
        os.remove(fn)
    lsdi = os.listdir('/content')  # list is in arbitrary order
    for k in sorted(lsdi, reverse=True):  # sorted to get the most recent file name
        fil_dados = re.match(fn[:fn.rfind('.')], k)
        if fil_dados:
            fn = k
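An alternative sketch (assuming the dict returned by files.upload() is keyed by the original filename, which is how the question's print output reads): write the returned bytes back under that name so the existing copy is overwritten, then delete any renamed duplicate Colab created during the upload.
import os
from google.colab import files

before = set(os.listdir('/content'))
uploaded = files.upload()
for name, data in uploaded.items():
    # overwrite the existing copy with the freshly uploaded bytes
    with open(os.path.join('/content', name), 'wb') as f:
        f.write(data)
for extra in set(os.listdir('/content')) - before - set(uploaded):
    # drop any duplicate such as "filename (1)" that Colab may have saved
    os.remove(os.path.join('/content', extra))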
I would like to write a script which will detect new files (CSV files in this case) that have been added to a folder, then upload these new files to my AWS S3 bucket. I would like them to keep their original names. Currently, the script I have allows me to manually select a file and then upload it with a name of my choice.
import boto3
import pandas as pd
from io import StringIO

hc = pd.read_csv(r'CSV PATH')
s3 = boto3.client('s3',
                  aws_access_key_id='ACCESSKEYID',
                  aws_secret_access_key='ACCESSKEY')
csv_buf = StringIO()
hc.to_csv(csv_buf, header=True, index=False)
csv_buf.seek(0)
s3.put_object(Bucket='BucketName', Body=csv_buf.getvalue(), Key='Original CSV Name from Above')
I assume I need the following sections in the code:
Code to monitor said location (but only when running the app - does not need to run 24/7)
Code to pull new file from said location
Code to upload to S3 Bucket
Any Tips?
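One possible shape for this, as a rough sketch (the watch folder, bucket name, credentials and the uploaded_files.txt record below are all placeholders): keep a small record of which CSVs have already been sent, and on each run upload any new ones with their original names as the S3 keys.
import os
import boto3

WATCH_FOLDER = r'CSV FOLDER PATH'   # placeholder: folder to monitor
BUCKET = 'BucketName'               # placeholder: your bucket
SEEN_FILE = 'uploaded_files.txt'    # placeholder: record of already-uploaded names

s3 = boto3.client('s3',
                  aws_access_key_id='ACCESSKEYID',
                  aws_secret_access_key='ACCESSKEY')

# load the names uploaded on previous runs
seen = set()
if os.path.exists(SEEN_FILE):
    with open(SEEN_FILE) as f:
        seen = {line.strip() for line in f}

# upload any csv not seen before, keeping its original name as the key
for name in os.listdir(WATCH_FOLDER):
    if name.endswith('.csv') and name not in seen:
        s3.upload_file(os.path.join(WATCH_FOLDER, name), BUCKET, name)
        seen.add(name)

# remember what has been uploaded for the next run
with open(SEEN_FILE, 'w') as f:
    f.write('\n'.join(sorted(seen)))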
I have a requirement to copy files between two buckets, detailed below.
Bucket A/Folder A is the source inbound box for daily files, which are created as f1_abc_20210304_000. I want to scan for the latest file in Folder A (10 files arrive every day), copy that latest file, and drop it into Bucket B/Folder B/FILE name (i.e. from the 10 files)/2021/03/04, i.e. into the 04 folder.
Any suggestion how I should proceed with the design?
Thanks
RG
Did you want to do this copy task using Airflow?
If yes, Airflow provides GCSToGCSOperator.
One approach is to use the client libraries; in the example below I'm using the Python client library for Google Cloud Storage.
move.py
from google.cloud import storage
from google.oauth2 import service_account
import os

# as mentioned on https://cloud.google.com/docs/authentication/production
key_path = "credentials.json"
credentials = service_account.Credentials.from_service_account_file(key_path)
storage_client = storage.Client(credentials=credentials)

bucket_name = "source-bucket-id"
destination_bucket_name = "destination-bucket-id"

source_bucket = storage_client.bucket(bucket_name)

# prefix 'original_data' is the folder where I store the data
array_blobs = source_bucket.list_blobs(prefix='original_data')

filtered_dict = []
for blob in array_blobs:
    if str(blob.name).endswith('.csv'):
        # add additional logic to handle the files you want to ingest
        filtered_dict.append({'name': blob.name, 'time': blob.time_created})

orderedlist = sorted(filtered_dict, key=lambda d: d['time'], reverse=True)
latestblob = orderedlist[0]['name']

# prefix 'destination_data' is the folder where I want to move the data
destination_blob_name = "destination_data/{}".format(os.path.basename(latestblob))

source_blob = source_bucket.blob(latestblob)
destination_bucket = storage_client.bucket(destination_bucket_name)
blob_copy = source_bucket.copy_blob(source_blob, destination_bucket, destination_blob_name)

print(
    "Blob {} in bucket {} copied to blob {} in bucket {}.".format(
        source_blob.name,
        source_bucket.name,
        blob_copy.name,
        destination_bucket.name,
    )
)
For a bit of context on the code: I use the Google Cloud Storage Python client, authenticate, list the files from my source folder original_data inside the bucket source-bucket-id, and collect the relevant files (you can modify the pick-up logic by adding your own criteria to fit your situation). After that I pick the latest file based on creation time and use that name to copy it into destination-bucket-id. As a note, the destination_blob_name variable includes the folder where I want to place the file as well as the final filename.
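If you also need the date-based folder layout mentioned in the question (e.g. .../2021/03/04/), one option, as a sketch (the FolderB prefix is just a placeholder, and I'm assuming the blob's creation time should drive the path), is to build destination_blob_name from the timestamp already collected above:
created = orderedlist[0]['time']   # datetime taken from blob.time_created
destination_blob_name = "FolderB/{}/{}".format(
    created.strftime("%Y/%m/%d"),  # e.g. 2021/03/04
    os.path.basename(latestblob),
)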
UPDATE: I missed the airflow tag. In that case you should use the operator that comes with the Google provider, which is GCSToGCSOperator. The parameters to pass can be obtained from a Python task and handed to the operator. It will work like this:
from airflow.decorators import task
from airflow.providers.google.cloud.transfers.gcs_to_gcs import GCSToGCSOperator

@task(task_id="get_gcs_params")
def get_gcs_params(**kwargs):
    date = kwargs["next_ds"]
    # logic should be as displayed on move.py
    # ...
    return {"source_objects": source, "destination_object": destination}

gcs_params = get_gcs_params()

copy_file = GCSToGCSOperator(
    task_id='copy_single_file',
    source_bucket='data',
    source_objects=gcs_params['source_objects'],
    destination_bucket='data_backup',
    destination_object=gcs_params['destination_object'],
    gcp_conn_id=google_cloud_conn_id,
)
For additional guidance you can check the Cloud Storage samples list; I used "Copy an object between buckets" for guidance.
I have a .txt file that I need to upload into a Dropbox folder. On my PC it works great as it is; however, I need to put the code into a Google Cloud Function, and as the GCP file system is read-only, this method is failing.
Can anyone recommend an alternative way of doing this that doesn't require me to save the data locally before pushing it up into Dropbox?
Here is my current working code for my local version:
import pathlib
import dropbox

api_key = 'XXXXXXXXXX'

# Build String And Save Locally To File
string = ["Item_A", "Item_B", "Item_C", "Item_D"]
string = str(string)
with open('Item_List.txt', 'w') as f:
    f.write(string)

# Define Local File Path
localfolder = pathlib.Path(".")
localpath = localfolder / 'Item_List.txt'

# Define Dropbox Target Location
targetfile = '/Data/' + 'Item_List.txt'

# Initialize Dropbox
d = dropbox.Dropbox(api_key)

# Upload File To Dropbox
with localpath.open("rb") as f:
    d.files_upload(f.read(), targetfile, mode=dropbox.files.WriteMode("overwrite"))
If you need to simply use byte data, you can use the built-in bytes function to convert a string to byte data (you need to also specify encoding):
data = ["Item_A", "Item_B", "Item_C", "Item_D"]
string_data = str(data)
byte_data = bytes(string_data, encoding='utf-8')
And then later just use the byte data as the argument:
d.files_upload(byte_data, targetfile, mode=dropbox.files.WriteMode("overwrite"))
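Putting it together with the question's code, a minimal in-memory version (same placeholder api_key and /Data/Item_List.txt target as above) could look like this:
import dropbox

api_key = 'XXXXXXXXXX'
targetfile = '/Data/' + 'Item_List.txt'

# Build the byte data entirely in memory, no local file needed
data = ["Item_A", "Item_B", "Item_C", "Item_D"]
byte_data = bytes(str(data), encoding='utf-8')

# Initialize Dropbox and upload straight from memory
d = dropbox.Dropbox(api_key)
d.files_upload(byte_data, targetfile, mode=dropbox.files.WriteMode("overwrite"))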
I am trying to create a simple GUI with Streamlit and Python for my aspect-based sentiment analysis project. The user should be able to upload a .txt file so that I can run the model on that file. I have already created the widget for uploading a file. My question is:
The uploaded file should be added to a specific folder; how can I specify an exact location for the uploaded file to be saved?
uploaded_file = st.file_uploader('FILE UPLOAD')
(This is the code for the upload widget)
The file_uploader function does not save the file to disk; it writes to a BytesIO buffer.
The UploadedFile class is a subclass of BytesIO and is therefore "file-like". This means you can pass it anywhere a file is expected.
https://docs.streamlit.io/en/stable/api.html?highlight=file_uploader#streamlit.file_uploader
If you want to save the result as a file, use the standard Python file io capabilities:
with open(filename, "wb") as f:
    f.write(buf.getbuffer())
To add to what @RandyZwitch said, you can use this function to save to a directory of your choice (here the directory/folder "tempDir"):
import os
import streamlit as st

def save_uploaded_file(uploadedfile):
    with open(os.path.join("tempDir", uploadedfile.name), "wb") as f:
        f.write(uploadedfile.getbuffer())
    return st.success("Saved file: {} in tempDir".format(uploadedfile.name))
And apply the function to your uploaded file as below:
import pandas as pd

datafile = st.file_uploader("Upload CSV", type=['csv'])
if datafile is not None:
    file_details = {"FileName": datafile.name, "FileType": datafile.type}
    df = pd.read_csv(datafile)
    st.dataframe(df)
    # Apply function here
    save_uploaded_file(datafile)
You can define the target path and write the uploaded file to it like this:
from pathlib import Path
import streamlit as st

file_path = Path("C:/Projects/ex1/your_file")
uploaded_file = st.file_uploader("FILE UPLOAD")
if uploaded_file is not None:
    file_path.write_bytes(uploaded_file.getbuffer())
I created a localhost API to analyse images and compare them (a computer vision project).
My plan is to upload images from my data folder to the server. Each image file in the folder is named (fake_name.jpg/jpeg). I am trying to add the file name as the person name in the parameters, but I can only do it manually and for each file.
I am also trying to figure out how to upload multiple files.
import base64
import urllib.parse
import requests

def image_to_base64(self, img):
    # convert image to base64
    prependInfo = 'data:image/jpeg;base64,'
    encodedString = base64.b64encode(img).decode("utf-8")
    fullString = str(prependInfo) + encodedString
    return str(fullString)

# the following part is to create an entry in the database:
def create_person_entry(self, img):
    base_url = "http://localhost:8080/service/api/person/create?"
    parameters = {
        "person-name": 'homer simson'  # manually change the name here before each upload
    }
    data = {
        "image-data": self.image_to_base64(img)
    }
    r = requests.post(base_url + urllib.parse.urlencode(parameters),
                      headers={'Authorization': self.auth_tok}, data=data).json()
    return r

# to import 1 image I used:
with open("///data/homer simpson.jpg", "rb") as img:
    person_name = cvis.create_person(img.read())
print(person_name)
It uploads successfully, but I have to manually set the person entry name via the "person-name" parameter for each person I upload! I have researched everywhere for a way to automate this.
Edit 1: I managed to get the following code working.
# to upload multiple images
import os

# folder with JPEG/JPG files to upload
folder = "/home///data/"
# list for the uploaded file names
upload_list = []
for files in os.listdir(folder):
    with open("{folder}{name}".format(folder=folder, name=files), "rb") as data:
        upload_list.append(files)
        person_name = cvis.create_person(data.read())
        print(person_name)
I managed to upload all images from the directory to the server, and it worked, but now all my files are named homer simpson :)
I finally managed to get this right with the suggestion made by AKX; his solution is below, please upvote, thanks.
Now I need to figure out how to delete the previous no-name entries; I will check the API documentation.
Am I missing something – why not just add another argument to your create_person_entry() function?
def create_person_entry(self, name, img):
    parameters = {
        "person-name": name,
    }
    # ...
    return r

# ...
cvis.create_person_entry("homer_simpson", img.read())
And if you have a folderful of images,
import os
import glob
for filename in glob.glob("people/*.jpg"):
    file_basename = os.path.splitext(os.path.basename(filename))[0]
    with open(filename, "rb") as img:
        cvis.create_person_entry(file_basename, img.read())
will use the file's name sans extension, e.g. people/homer_simpson.jpg is homer_simpson.