I'm making an app with plotly-dash to view *.mdf files (using the Python library asammdf to load them). I made an Upload component (https://dash.plot.ly/dash-core-components/upload) to load the files. I thought of taking the full filename and passing it to the MDF function in the asammdf library to load the file and put the data in a graph. However, the dash Upload component only returns the filename and not the complete path, so I cannot use the MDF function on it. The Upload component also outputs the content of the file as a binary string, but I'm not sure how I can pass this to the MDF function.
Does somebody know a way forward for this problem?
Actually, I found out it is possible to work with the contents variable. The MDF function (as well as most read-in functions, I assume) checks whether the input is a file-like object or a string. If it is a file-like object, it reads directly from it. The contents can be transformed as follows:
import base64
import io

content_type, content_string = contents[0].split(',')
decoded = base64.b64decode(content_string)
file_like_object = io.BytesIO(decoded)
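For context, here is a minimal sketch of how this could look as a helper used inside a dash callback, assuming MDF accepts a file-like object as described above (the function name mdf_from_upload is just illustrative):

import base64
import io

from asammdf import MDF

def mdf_from_upload(contents):
    # 'contents' is the data-URL string that dcc.Upload passes to the
    # callback ("data:<type>;base64,<payload>"); with multiple uploads
    # it is a list, so pass one element at a time.
    content_type, content_string = contents.split(',')
    decoded = base64.b64decode(content_string)
    file_like_object = io.BytesIO(decoded)
    # MDF reads directly from the file-like object
    return MDF(file_like_object)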
Related
How can I get the actual path of the file uploaded from the FileUpload widget? In my case, I am uploading a binary file and just need to know where it is so I can process it using an internal application.
When I iterate over the FileUpload value, I see a file name as a key and a dict as the value:
up_value = self.widgets['file_upload'].value
for file_name, file_dict in up_value.items():
    print(file_dict.keys())              # -> dict_keys(['metadata', 'content'])
    print(file_dict['metadata'].keys())  # -> dict_keys(['name', 'type', 'size', 'lastModified'])
I know the content of the file is uploaded, but I really don't need that. I'm also not sure how I would pass that content to my internal processing application. I just want to create a dict that stores the filename as a key and the file path as the value.
Thanks!
Once 'uploaded', the file exists as data in memory, so there is no concept of the originating 'path' in the widget.
If you want to read or write the file, access the uploaded content (file_dict['content'] in your loop above). Your processing application needs to be able to handle bytes input, or you could wrap the data in a BytesIO object.
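As a rough illustration, here is a sketch assuming the value structure shown in the question (process() is a hypothetical stand-in for the internal application):

import io

up_value = self.widgets['file_upload'].value
for file_name, file_dict in up_value.items():
    # The upload only exists in memory, so wrap the raw bytes in a
    # file-like object instead of looking for a path on disk.
    file_like = io.BytesIO(file_dict['content'])
    process(file_like)  # hypothetical entry point of the internal application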
I'm trying to zip a few files from Google Cloud Storage.
Python's zipfile doesn't find the files in gcloud, only in the project.
How can I make my code find the files in gcloud?
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, 'w') as zip_file:
    for revenue in revenues:
        # queryset with many files, so add each one to the zip
        t = tempfile.NamedTemporaryFile()
        t.write(revenue.revenue.name)
        if revenue.revenue.name:
            t.seek(0)
            with default_storage.open(revenue.revenue.name, "r") as file_data:
                zip_file.write(file_data.name, compress_type=zipfile.ZIP_DEFLATED)
                # the code doesn't get past this point
        t.close()

response = HttpResponse(content_type='application/x-zip-compressed')
response['Content-Disposition'] = 'attachment; filename=my_zip.zip'
response.write(zip_buffer.getvalue())
return response
In this part, I write the file that I opened from gcloud, but it stops inside this function from zipfile:
def write(self, filename, arcname=None, compress_type=None):
    """Put the bytes from filename into the archive under the name
    arcname."""
    if not self.fp:
        raise RuntimeError(
            "Attempt to write to ZIP archive that was already closed")
    st = os.stat(filename)
    # when I try to find the file, os.stat searches in the project, not in gcloud
the "os.stat(filename)" search for a file in project, how can I do for find in the gcloud?
I will post my findings as an answer, since I would like to comment on a few things.
This is what I have understood:
You are using the Python zipfile library to work with ZIP files.
You look for files locally and add them one by one to the ZIP file.
You would like to do the same for files located in a Google Cloud Storage bucket, but it fails to find the files.
If I have misunderstood the use-case scenario, please elaborate further in a comment.
However, if this is exactly what you are trying to do, then it is not supported. In the StackOverflow question Compress files saved in Google cloud storage, it is stated that compressing files that are already in Google Cloud Storage is not possible. The solution in that question is to subscribe to newly created files, download them locally, compress them, and overwrite them in GCS. As you can see, you can list or iterate through the files stored in GCS, but you first need to download them to be able to process them.
Workaround
Therefore, for your use-case scenario, I would recommend the following workaround using the Python client API:
You can use the Listing objects Python API to get all the objects from GCS.
Then you can use the Downloading objects Python API to download the objects locally.
As soon as the objects are in a local directory, you can use the zipfile Python library to ZIP them together, as you are already doing.
Once the objects are zipped, if you no longer need the downloaded copies, you can delete them with os.remove("downloaded_file.txt").
In case you need the compressed ZIP file in the Google Cloud Storage bucket, you can use the Uploading objects Python API to upload the ZIP file to the GCS bucket.
As I have mentioned above, processing files (e.g. adding them to a ZIP file) directly in a Google Cloud Storage bucket is not supported; you first need to download them locally. I hope this workaround will be helpful to you.
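For reference, here is a minimal sketch of those steps using the google-cloud-storage client library (the bucket name, local paths and file names are placeholders, not part of the original answer):

import os
import zipfile

from google.cloud import storage

BUCKET_NAME = "my-bucket"   # placeholder
DOWNLOAD_DIR = "temp"       # local scratch directory
ZIP_NAME = "zipedFile.zip"

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)
os.makedirs(DOWNLOAD_DIR, exist_ok=True)

# 1. List the objects in the bucket and download them locally.
local_files = []
for blob in client.list_blobs(BUCKET_NAME):
    if blob.name.endswith('/'):
        continue  # skip directory placeholder objects
    local_path = os.path.join(DOWNLOAD_DIR, os.path.basename(blob.name))
    blob.download_to_filename(local_path)
    local_files.append(local_path)

# 2. Zip the downloaded files.
with zipfile.ZipFile(ZIP_NAME, 'w') as zip_file:
    for path in local_files:
        zip_file.write(path, compress_type=zipfile.ZIP_DEFLATED)

# 3. Optionally upload the ZIP back to the bucket and clean up.
bucket.blob(ZIP_NAME).upload_from_filename(ZIP_NAME)
for path in local_files:
    os.remove(path)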
UPDATE
As I have mentioned above, zipping files while they are in a GCS bucket is not supported. Therefore, I have prepared a working example in Python that uses the workaround.
NOTE: As I am not an expert in running OS commands from Python or in the zipfile library, there is probably a better and more efficient way of achieving this. However, the code that can be found in this GitHub link does the following:
Under the # Public variables: section, change BUCKET_NAME to your corresponding bucket name and execute the Python script in Google Cloud Shell.
Now my bucket structure is as follows:
gs://my-bucket/test.txt
gs://my-bucket/test1.txt
gs://my-bucket/test2.txt
gs://my-bucket/directory/test4.txt
When executing the script, the app does the following:
It will get the path from which the script is executed, e.g. /home/username/myapp.
It will create a temporary directory within this directory, e.g. /home/username/myapp/temp.
It will iterate through all the files located in the bucket that you have specified and download them locally into that temp directory.
NOTE: If a file in the bucket is under a directory, it will simply download the file instead of recreating that sub-directory. You can modify the code later to make it work as you wish.
So the new downloaded files will look like this:
/home/username/myapp/temp/test.txt
/home/username/myapp/temp/test1.txt
/home/username/myapp/temp/test2.txt
/home/username/myapp/temp/test4.txt
After that, the code will zip all those files into a new zipedFile.zip located in the same directory as the main.py script that you executed.
When this step is done as well, the script will delete the directory /home/username/myapp/temp/ with all of its contents.
As I have mentioned above, after executing the script locally, you should be able to see main.py and a zipedFile.zip file containing all the zipped files from the GCS bucket. Now you can take the idea of the implementation and modify it according to your project's needs.
The final code:
zip_buffer = io.BytesIO()
base_path = '/home/everton/compressedfiles/'
fiscal_compentecy_month = datetime.date(int(year), int(month), 1)
revenues = CompanyRevenue.objects.filter(company__pk=company_id, fiscal_compentecy_month=fiscal_compentecy_month)

if revenues.count() > 0:
    path = base_path + str(revenues.first().company.user.pk) + "/"
    zip_name = "{}-{}-{}-{}".format(revenues.first().company.external_id, revenues.first().company.external_name, month, year)

    # download each file from storage into a local directory
    for revenue in revenues:
        filename = revenue.revenue.name.split('revenues/')[1]
        if not os.path.exists(path):
            os.makedirs(path)
        with open(path + filename, 'wb+') as file:
            file.write(revenue.revenue.read())

    # zip the downloaded files into the in-memory buffer
    with zipfile.ZipFile(zip_buffer, 'w') as zip_file:
        for file in os.listdir(path):
            zip_file.write(path + file, compress_type=zipfile.ZIP_DEFLATED)

    # serve the zip and remove the local copies
    response = HttpResponse(content_type='application/x-zip-compressed')
    response['Content-Disposition'] = 'attachment; filename={}.zip'.format(zip_name)
    response.write(zip_buffer.getvalue())
    shutil.rmtree(path)
    return response
I am trying to load a dataset for my machine learning project and it requires me to load files having no extensions.
I tried:
import os
import glob

files = filter(os.path.isfile, glob.glob("./[0-9]*"))
for name in files:
    with open(name) as fh:
        contents = fh.read()
But it doesn't return anything; the glob call finds nothing.
I also tried:
import os
import glob

path = './dataset1/training_validation/2012-07-10/'
for infile in glob.glob(os.path.join(path, '*')):
    print("test")
    file = open(infile, 'r')
    print(file)
but this returns [] from the glob call.
I'm stuck here and couldn't find anything on the internet.
My actual problem is loading files without extensions for the training and testing sets from two folders (validation and test). I can iterate through the folders, but I don't know how to handle those file types.
When I open those files in a text editor, I see rows of 0's and 1's, so I know it's a binary representation of an image, but I have no idea how I can store and train on them.
Any help would be appreciated. Thanks.
Two things:
File extensions (.txt, .dat, .bat, .f90, etc.) are not meaningful to Python, at least when using glob or numpy or something of the sort, because the extension is just part of the filename string. Some of us were raised (within Windows) to believe that file extensions mean something (I too fell for it).
The file you are looking at is a text file containing the ASCII representation of a binary image as 0's and 1's. So it's not a binary file, and it's not an image file per se, but it is a text file, which means we can read it as such from Python.
To read this in, you could do either of the following (a short sketch of both options is given after this list):
1. Use numpy to do data = numpy.loadtxt(<filename>); however, you might have trouble delimiting the digits.
2. Use Python's standard open function on the file and loop through each line using for line in <file_handle>:. This way, each row of data is a string, which can be parsed easily (see documentation on string indexing).
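A rough sketch of both options, where 'data_file' is a placeholder for one of the extension-less files:

import numpy as np

# Option 1: numpy -- depending on how the digits are separated, a
# delimiter may be needed, as noted above.
data = np.loadtxt('data_file')

# Option 2: plain Python -- each line is a string whose characters can
# be indexed or converted one by one.
rows = []
with open('data_file') as fh:
    for line in fh:
        rows.append([int(ch) for ch in line.strip() if ch in '01'])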
Good luck!
IMO this simply means that your path does not exist.
As a first test, perhaps try an absolute path to your folder, since you may have confused the relative position of the folder with respect to your current working directory.
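For instance, a quick check with a placeholder absolute path:

import glob
import os

abs_path = '/home/username/dataset1/training_validation/2012-07-10/'  # placeholder
print(os.path.exists(abs_path))                # should print True if the folder exists
print(glob.glob(os.path.join(abs_path, '*')))  # should no longer be an empty list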
I got it to work with the following code.
import random
from os import listdir
from os.path import isfile, join

fileNames = [f for f in listdir(dirName) if isfile(join(dirName, f))]
random.shuffle(fileNames)
for files in fileNames:
    data = open(dirName + '/' + files, 'r')
Thanks for your responses.
I am not a developer and I have very limited knowledge of programming. I understand how to use and run Python scripts; however, writing them is something I have yet to learn. Please can someone help a total noob :)
I am using an API by sightengine to assess a large folder of .jpg images for their sharpness and colour properties. The documentation for the API only provides a small script for assessing one image at a time. I have spoken to Sightengine's support and they are unwilling to provide a script for batch processing, which is bizarre considering other API companies usually do.
I need some help creating a for loop that will iterate through a folder of images and output the API results into a single JSON file. Any help with how to structure this script would be greatly appreciated.
Here is the sightengine code for a simple one-image check:
from sightengine.client import SightengineClient
client = SightengineClient("{api_user}", "{api_secret}")
output = client.check('properties','type').set_file('/path/to/local/file.jpg')
Thank you
Part of this is sort of a guess, as I don't know exactly what output will look like. I am assuming it's returned in JSON format. If that's the case, you can append the individual JSON responses to a single structure, then use json.dump() to write it to a file.
So that part is a guess. The other aspect is that you want to iterate through your .jpg files, which you can do using os and fnmatch. Just adjust the root directory/folder for it to walk through while it searches for all the .jpg extensions.
from sightengine.client import SightengineClient
import os
from fnmatch import fnmatch
import json

client = SightengineClient("{api_user}", "{api_secret}")

# Get your jpg files into a list
r = 'C:/path/to/local'
pattern = "*.jpg"
filenames = []
for path, subdirs, files in os.walk(r):
    for name in files:
        if fnmatch(name, pattern):
            #print(path+'/'+name)
            filenames.append(path+'/'+name)

# Now iterate through those jpg files
jsonData = []
for file in filenames:
    output = client.check('properties','type').set_file(file)
    jsonData.append(output)

with open('C:/result.json', 'w') as fp:
    json.dump(jsonData, fp, indent=2)
I'm trying to retrieve a complete list of files from a given directory with code like this:
uri = '%s' % fentry.content.src
feed = gd_client.GetDocumentListFeed(uri=uri)
for r in feed.entry:
    print r.title.text.decode("utf-8")
It works, except that it only returns "real" Google Documents files and does not return files which were uploaded but not converted, e.g. *.docx files.
Is there any way to get a complete list of files in a given directory?
I suspect that you are using the wrong URI. Read here about the different options you have:
http://code.google.com/intl/en-US/apis/documents/docs/3.0/developers_guide_protocol.html#ListDocs