How to get the path of a file upload on ipywidgets? - python

How can I get the actual path of the file uploaded from the FileUpload widget? In my case, I am uploading a binary file and just need to know where it is so I can process it using an internal application.
When I iterate on the FileUpload value I see there is a file name as a key and a dict as the value:
up_value = self.widgets['file_upload'].value
for file_name, file_dict in up_value.items():
    print(file_dict.keys())             # -> dict_keys(['metadata', 'content'])
    print(file_dict['metadata'].keys()) # -> dict_keys(['name', 'type', 'size', 'lastModified'])
I know the content of the file is uploaded, but I really don't need that, and I'm not sure how I would pass that content to my internal processing application anyway. I just want to create a dict that stores the filename as a key and the file path as the value.
thx

Once 'uploaded', the file exists only as data in memory, so there is no concept of the originating 'path' in the widget.
If you want to read or write the file, access the content bytes, which in your loop is file_dict['content']. Your processing application needs to be able to handle bytes input, or you can wrap the data in a BytesIO object, as sketched below.
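For example, a minimal sketch that wraps each uploaded file's bytes for code that expects a file-like object (process_file is a hypothetical stand-in for your internal application):
import io

up_value = self.widgets['file_upload'].value
for file_name, file_dict in up_value.items():
    # Wrap the raw upload bytes so downstream code can treat them like an open file
    file_like = io.BytesIO(file_dict['content'])
    process_file(file_like)  # hypothetical: your internal processing entry point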

Related

Getting a file name after being loaded into Google Colab

What I want: to get the file name of the file that was just uploaded into Google Colab via the following code.
from google.colab import files
uploaded = files.upload()
I tried: printing the file once I uploaded it and I get the following output.
print(uploaded)
{'2019.06.01_Short Maintenance - Vehicle, Malibu_TGZ.csv': b'Year,Miles,$/Gallon,Total $,Vehicle\r\n6/1/2019,343.4,2.529,28,Malibu\r\n6/8/2019,34.3,2.529,5,Malibu\r\n6/8/2019,315.6,2.529,33.1,Malibu\r\n6/30/2019,323,2.399,30,Malibu\r\n7/5/2019,316.4,2.559,31,Malibu\r\n7/12/2019,334.6,2.529,30.45,Malibu\r\n7/21/2019,288.7,2.459,33.75,Malibu\r\n7/29/2019,336.7,2.419,28,Malibu\r\n8/6/2019,317.3,2.379,30.45,Malibu\r\n8/14/2019,340.9,2.359,30.1,Malibu\r\n8/22/2019,307.4,2.299,29.85,Malibu\r\n9/1/2019,239.1,2.279,29.7,Malibu\r\n9/14/2019,237.8,2.419,28.9,Malibu\r\n9/6/2019,288,2.469,30.4,Malibu\r\n10/13/2019,305.7,2.299,27.81,Malibu\r\n10/20/2019,330.7,2.369,30.05,Malibu\r\n11/8/2019,257,2.429,32.4,Malibu\r\n12/3/2019,249.3,2.319,5.01,Malibu\r\n12/7/2019,37.2,2.099,25,Malibu\r\n12/22/2019,276.4,2.229,29.4,Malibu\r\n1/12/2020,334,2.199,5,Malibu\r\n1/19/2020,51,2.009,28.15,Malibu\r\n2/8/2020,231.5,2.079,25.8,Malibu\r\n2/23/2020,254.7,2.159,25.75,Malibu\r\n3/19/2020,235.3,1.879,23.15,Malibu\r\n5/22/2020,303,1.699,23.15,Malibu\r\n'}
It appears to be a dict with the file name as the key and a byte string of all the data in the file as the value. I don't know how to get the key value, assuming that is what I need to do.
It's in the keys of uploaded. You can use iter() and next() to get it.
filename = next(iter(uploaded))
Alternatively, you can access it by iterating over the keys:
filenames = uploaded.keys()
for file in filenames:
    data = uploaded[file]
If you have more than one file, just create a list for your data and append the retrieved values, for example:
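A minimal sketch, assuming you want every uploaded file's data kept together:
# Collect (filename, data) pairs for every uploaded file
all_files = []
for file in uploaded.keys():
    all_files.append((file, uploaded[file]))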

How to iterate over JSON files in a directory and upload to mongodb

So I have a folder with about 500 JSON files. I need to upload all of them to a local mongodb database. I tried using Mongo Compass, but Compass can only upload one file at a time. In Python I tried to write some simple code to iterate through the folder and upload them one by one, but I ran into some problems. First of all, the JSON files are not comma-separated but line-separated (one JSON object per line). So the files look like:
{ some JSON object }
{ some JSON object }
...
I wrote the following code to iterate through the folder and upload it:
import os
import json
import pymongo
import pandas as pd
import numpy as np

myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient['Test']
mycol = mydb['data']

directory = os.fsencode("C:/Users/PB/Desktop/test/")
for file in os.listdir(directory):
    filename = os.fsdecode(file)
    if filename.endswith(".json"):
        mycol.insert_many(filename)
The code basically goes through a folder, checks if it's a .json file, then inserts it into the database. That is what should happen. However, I get this error:
TypeError: document must be an instance of dict, bson.son.SON,
bson.raw_bson.RawBSONDocument, or a type that inherits from
collections.MutableMapping
I cannot seem to upload them through Python. I tried multiple variations of the code, but for some reason Python does not accept the JSON files.
The problem with these files seems to be that Python only allows comma-separated JSON files.
How could I fix this to upload all the files?
You're inserting the names of the files into Mongo, not the contents of the files.
Assuming you have multiple JSON files in a directory, where each file contains one JSON object per line, you need to go through all the files, filter them, open them, read them line by line, parse each line into a dict, and then insert it. Something like below:
directory = "C:/Users/PB/Desktop/test/"    # use a plain string path; os.fsencode gives bytes
os.chdir(directory)
for file in os.listdir(directory):
    if file.endswith(".json"):
        with open(file) as f:
            for line in f:
                mongo_obj = json.loads(line)   # parse one JSON object per line
                mycol.insert_one(mongo_obj)    # insert() is deprecated in pymongo; use insert_one()
I did a chdir first to avoid having to pass the whole path to open.
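If the files are large, batching the inserts is usually faster than one round trip per line. A minimal sketch of the same loop using insert_many, under the same assumptions:
for file in os.listdir(directory):
    if file.endswith(".json"):
        with open(file) as f:
            # Parse every line first, then insert the whole file in one call
            docs = [json.loads(line) for line in f]
            if docs:
                mycol.insert_many(docs)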

Uploading files with plotly-dash

I'm making an app with plotly-dash to view *.mdf files (using the Python library asammdf to load the files). I made an Upload component (https://dash.plot.ly/dash-core-components/upload) to load the files. I thought of taking the full filename and passing it to the MDF function in the asammdf library to load the file and put the data in a graph. However, the dash Upload component only returns the filename, not the complete path, so I cannot use the MDF function on it. The Upload component also outputs the content of the file as a base64-encoded string, but I am not sure how I can pass this to the MDF function.
Does somebody know a way forward for this problem?
Actually, I found out it is possible to work with the contents variable. The MDF function (as well as most read-in functions, I assume) checks whether the input is a 'file like' object or a string. If it is a 'file like' object, it reads directly from that object. The contents can be transformed as follows:
import base64
import io

content_type, content_string = contents[0].split(',')  # contents comes from the Upload component
decoded = base64.b64decode(content_string)              # decode the base64 payload of the data URL
file_like_object = io.BytesIO(decoded)                  # MDF can read directly from this object
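A minimal sketch of a helper built on this, assuming MDF accepts file-like objects as described above (parse_upload is a hypothetical name):
import base64
import io
from asammdf import MDF

def parse_upload(contents):
    # contents is the base64 data-URL string supplied by the dash Upload component
    content_type, content_string = contents.split(',')
    decoded = base64.b64decode(content_string)
    return MDF(io.BytesIO(decoded))  # hand MDF a file-like object instead of a path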

Django:Any way to get date and name of uploaded file later?

Is there any way to get the uploaded file's date and name, which we have stored in the database using forms?
Right now I am just creating two more database fields for name and date and storing them like this: file_name = request.FILES['file'].name for the name, and upload_date = datetime.datetime.now() for the date.
You can get an approximate date by reading the file's metadata using the stat module.
http://docs.python.org/release/2.5.2/lib/module-stat.html
It is OS-specific, but ST_CTIME should give you approximately what you are looking for, for example:
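A minimal sketch, assuming uploaded_path points at the stored file (the variable name is hypothetical):
import os
import stat

st = os.stat(uploaded_path)
# OS-specific: creation time on Windows, metadata-change time on most Unixes
upload_time = st[stat.ST_CTIME]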
For the name, you can easily get it from the way you store the file. Specify a custom handler that stores the file at /your/file/path/filename.extension and just manipulate that string to recover the filename.
Just read this in the Flask docs. Not sure how much of it applies to Django, but pasting it here for reference:
*If you want to know how the file was named on the client before it was uploaded to your application, you can access the filename attribute. However please keep in mind that this value can be forged so never ever trust that value. If you want to use the file-name of the client to store the file on the server, pass it through the secure_filename() function that Werkzeug provides for you*
You can use the original file's name as part of the file name when storing it on disk, and you can probably use the file's creation/modification date for the upload date. IMO, though, you should just store both explicitly in the database.
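A minimal sketch of storing both explicitly, assuming a Django model along these lines (the names are hypothetical):
from django.db import models

class UploadedFile(models.Model):
    file = models.FileField(upload_to='uploads/')
    original_name = models.CharField(max_length=255)        # set from request.FILES['file'].name
    uploaded_at = models.DateTimeField(auto_now_add=True)   # filled in automatically on insert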

Google Docs python gdata 2.0.16 upload file to existing collection

I have managed to create a simple app which deletes (bypassing the recycle bin) any files I want to. It can also upload files. The problem I am having is that I cannot specify which collection the new file should be uploaded to.
def UploadFile(folder, filename, local_file, client):
    print "Upload Resource"
    doc = gdata.docs.data.Resource(type='document', title=filename)
    path = _GetDataFilePath(local_file)
    media = gdata.data.MediaSource()
    media.SetFileHandle(path, 'application/octet-stream')
    create_uri = gdata.docs.client.RESOURCE_UPLOAD_URI + '?convert=false'
    collection_resource = folder
    upload_doc = client.CreateResource(doc, create_uri=create_uri, collection=collection_resource, media=media)
    print 'Created, and uploaded:', upload_doc.title, doc.resource_id
From what I understand, the CreateResource function requires a resource object representing the collection. How do I get this object? The variable folder is currently just the string 'daily', which is the name of the collection; it is this variable which I need to replace with the collection resource.
From various sources, snippets and generally stuff all over the place, I managed to work this out. You need to pass a uri to the FindAllResources function (one which I found no mention of in the sample code from gdata).
I have written up in more detail how I managed to upload, delete (bypassing the bin), search for and move files into collections here.
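A minimal sketch of looking up the collection resource before uploading, following the FindAllResources call described above; the query parameters are assumptions based on the Documents List API, and the method name is taken from the answer, so check it against your gdata version:
# Python 2 / gdata 2.0.16
uri = gdata.docs.client.RESOURCE_FEED_URI + '?title=daily&title-exact=true&show-collections=true'
collections = client.FindAllResources(uri=uri)
folder = collections[0]  # the Resource object for the 'daily' collection, as needed by UploadFile above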
