Allen Brain Institute - brain observatory example

I'm trying to follow the Brain Observatory example IPython notebook.
However, I get stuck when loading the NWB file, as shown below.
from allensdk.core.brain_observatory_cache import BrainObservatoryCache
boc = BrainObservatoryCache(manifest_file='boc/manifest.json')
data_set = boc.get_ophys_experiment_data(501940850) # problem here
So I tried opening the NWB files with HDFView.
None of the Brain Observatory NWB files could be opened except 502376461.nwb.
The Python call above failed with the following error:
IOError: Unable to open file (Truncated file: eof = 82280448, sblock->base_addr = 0, stored_eoa = 204046519)
When I opened 502376461.nwb in the IPython notebook example from Allen, it worked. But the others (501940850, 503820068, ...) failed as above.

Summarizing the thread from GitHub:
https://github.com/AllenInstitute/AllenSDK/issues/22
The files were partially downloaded or corrupted somehow. No exceptions were reported during the download, so urllib must not have noticed a problem.
AllenSDK developers are investigating some sort of file consistency check and/or a different HTTP library.
https://github.com/AllenInstitute/AllenSDK/issues/28
If others run into this, delete the bad file and re-run the download function (BrainObservatoryCache.get_ophys_experiment_data). Files are downloaded into a subdirectory of the directory containing the BrainObservatoryCache manifest file, which defaults to the current working directory if unspecified.
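If you want to automate the delete-and-retry step, a minimal sketch could look like the following. It assumes h5py is installed and that a truncated file makes h5py.File raise an OSError, as in the error message above. The helper name remove_if_unreadable and its opener parameter are made up for illustration and are not part of AllenSDK.

```python
import os


def _open_with_h5py(path):
    """Open an HDF5 file read-only and close it again (assumes h5py is installed)."""
    import h5py  # imported lazily so the helper can be exercised without h5py

    with h5py.File(path, "r"):
        pass


def remove_if_unreadable(path, opener=_open_with_h5py):
    """Delete `path` if `opener(path)` raises OSError; return True if it was deleted.

    Hypothetical helper, not part of AllenSDK.
    """
    try:
        opener(path)
    except OSError:  # IOError is an alias of OSError on Python 3
        os.remove(path)
        return True
    return False
```

If the helper returns True for a given .nwb file, calling get_ophys_experiment_data for that experiment again should download a fresh copy, as described above.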

Related

python-win32com excel com model started generating errors

Over the last few days, I have been working on automating the generation of some pivot tables for a number of reports.
Boiled down to the minimum, the following code was working without issue:
import win32com.client
objExcelApp = win32com.client.gencache.EnsureDispatch('Excel.Application')
objExcelApp.Visible = 1
This would pop up an instance of Excel and I could continue working in Python. But suddenly, today, my scripts are failing with the following:
>>> import win32com.client
>>> objExcelApp = win32com.client.gencache.EnsureDispatch('Excel.Application')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Program Files (x86)\Python37-32\lib\site-packages\win32com\client\gencache.py", line 534, in EnsureDispatch
mod = EnsureModule(tla[0], tla[1], tla[3], tla[4], bForDemand=bForDemand)
File "C:\Program Files (x86)\Python37-32\lib\site-packages\win32com\client\gencache.py", line 391, in EnsureModule
module = GetModuleForTypelib(typelibCLSID, lcid, major, minor)
File "C:\Program Files (x86)\Python37-32\lib\site-packages\win32com\client\gencache.py", line 266, in GetModuleForTypelib
AddModuleToCache(typelibCLSID, lcid, major, minor)
File "C:\Program Files (x86)\Python37-32\lib\site-packages\win32com\client\gencache.py", line 552, in AddModuleToCache
dict = mod.CLSIDToClassMap
AttributeError: module 'win32com.gen_py.00020813-0000-0000-C000-000000000046x0x1x9' has no attribute 'CLSIDToClassMap'
The code has not changed since yesterday, and I have no idea what is happening.
Another interesting detail: if I run the same line again in the same session, I get a different error:
>>> objExcelApp = win32com.client.gencache.EnsureDispatch('Excel.Application')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Program Files (x86)\Python37-32\lib\site-packages\win32com\client\gencache.py", line 534, in EnsureDispatch
mod = EnsureModule(tla[0], tla[1], tla[3], tla[4], bForDemand=bForDemand)
File "C:\Program Files (x86)\Python37-32\lib\site-packages\win32com\client\gencache.py", line 447, in EnsureModule
if module.MinorVersion != tlbAttributes[4] or genpy.makepy_version != module.makepy_version:
AttributeError: module 'win32com.gen_py.00020813-0000-0000-C000-000000000046x0x1x9' has no attribute 'MinorVersion'
>>>
So I switched to a machine with a fresh Windows install, installed Python 3.7 and ran pip install pypiwin32. The very same lines open Excel there, just as they did yesterday on my original machine.
I tried uninstalling and reinstalling, with no luck. Any idea what is going on here?
NOTE:
Dynamic dispatch still works:
import win32com.client
objExcelApp = win32com.client.Dispatch("Excel.Application")
objExcelApp.Visible = 1
But I specifically need static dispatch as Pivot Tables won't work with a dynamically dispatched object (much later in my code):
objExcelPivotCache = objExcelWorkbook.PivotCaches().Create(SourceType=win32c.xlDatabase, SourceData=objExcelPivotSourceRange)

I had the same issue and I resolved it by following the instructions here: https://mail.python.org/pipermail/python-win32/2007-August/006147.html
Deleting the gen_py output directory and re-running makepy SUCCEEDS
and subsequently the test application runs OK again.
So the symptom is resolved, but any clues as to how this could have
happened. This is a VERY long running application (think 24x7 for
years) and I'm concerned that whatever caused this might occur again.
To find the output directory, run this in your python console / python session:
import win32com
print(win32com.__gen_path__)
Based on the exception message in your post, the directory you need to remove will be titled '00020813-0000-0000-C000-000000000046x0x1x9'. So delete this directory and re-run the code. And if you're nervous about deleting it (like I was) just cut the directory and paste it somewhere else.
💡Note that this directory is usually in your "TEMP" directory (copy-paste %TEMP%/gen_py in Windows File Explorer and you will arrive there directly).
I have no idea why this happens nor do I know how to prevent it from happening again, but the directions in the link I provided seemed to work for me.
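For anyone who prefers to script the "cut and paste it somewhere else" step, here is a rough sketch. It moves the whole cache directory rather than a single subdirectory, and the function name set_aside_gen_py and its backup_root parameter are hypothetical. On the affected machine you would pass win32com.__gen_path__ as shown in the trailing comment.

```python
import shutil
import tempfile
from pathlib import Path


def set_aside_gen_py(gen_path, backup_root=None):
    """Move the gen_py cache directory out of the way and return its new location.

    Returns None if `gen_path` does not exist. Hypothetical helper.
    """
    gen_path = Path(gen_path)
    if not gen_path.is_dir():
        return None
    # Use a fresh, uniquely named backup directory unless one is supplied
    backup_root = Path(backup_root or tempfile.mkdtemp(prefix="gen_py_backup_"))
    backup_root.mkdir(parents=True, exist_ok=True)
    target = backup_root / gen_path.name
    shutil.move(str(gen_path), str(target))
    return target


# On the affected Windows machine (requires pywin32):
#   import win32com
#   set_aside_gen_py(win32com.__gen_path__)
```

After the move, start a new Python session and run the EnsureDispatch line again; going by the reports in this thread, the cache is then regenerated.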

A more straightforward solution was posted in the related question "Issue in using win32com to access Excel file".
Basically, you just need to delete the folder C:\Users\<your username>\AppData\Local\Temp\gen_py and rerun your code.
💡TIP: You can also paste %TEMP%\gen_py into the Windows File Explorer address bar to get there directly, and then delete its contents.

Run this command in PowerShell or cmd (NOT as Administrator; that did not work for me):
python -m win32com.client.makepy "Excel.Application"
It fixes all the errors, and you don't have to change your Python code.
Keep using
win32com.client.gencache.EnsureDispatch("Excel.Application")
With gencache.EnsureDispatch you get access to the application's constants, which makepy generates from the registered application (in our case Excel.Application).
If you have the same problem with Outlook, use "Outlook.Application" in the command above.
If it still doesn't work, reinstall pywin32 in your Python distribution:
<path to python root or venv>\pip.exe uninstall pywin32
<path to python root or venv>\pip.exe install pywin32

What has worked for me is:
excel = win32.gencache.EnsureDispatch('Excel.Application')
#change to =>
excel = win32.Dispatch('Excel.Application')

In my case, the issue seems to have been that I have multiple processes interacting with Windows apps through win32com.
Since win32com creates the "gen_py" directory under win32api.GetTempPath(), all of these processes share it, which can cause conflicts and corrupt the cache.
My solution is to set a custom location for "gen_py" for each process. A simple example:
from pathlib import Path
import win32com
gen_py_path = '/some/custom/location/gen_py'
Path(gen_py_path).mkdir(parents=True, exist_ok=True)
win32com.__gen_path__ = gen_py_path
# Any other imports/code that uses win32com
This way you don't have to delete the default "gen_py" folder and wonder what issues might arise. But if you still find you need to delete, you can just delete the custom folder and know you're deleting the cache just for that process.
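One way to derive a distinct location per process is to include the process ID in the directory name. This is only a sketch: the helper per_process_gen_path is hypothetical, and the commented-out assignment simply mirrors the example above, so whether it is sufficient on your pywin32 version is an assumption you should verify.

```python
import os
import tempfile
from pathlib import Path


def per_process_gen_path(base_dir=None):
    """Create and return a gen_py directory unique to the current process ID."""
    base = Path(base_dir or tempfile.gettempdir())
    path = base / "gen_py_{}".format(os.getpid())
    path.mkdir(parents=True, exist_ok=True)
    return str(path)


# Usage on Windows (requires pywin32), before any other code that uses win32com:
#   import win32com
#   win32com.__gen_path__ = per_process_gen_path()
```

Process IDs are reused over time, so these directories accumulate; removing a process's directory when it exits keeps the base directory tidy.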

To add to this discussion: if you hit this error in an unsupervised automated process, you can fully automate the recovery so that the process continues without intervention.
As answers by Ian and Qin mention, we need to delete the gen_py output directory and restart the process. As far as automation goes, there are two problems with this: the gen_py output directory is a temporary directory that can change, and currently running processes that rely on gen_py continue to fail even after it is regenerated.
To address these problems, we can dynamically look up the location, then we can completely nuke the whole process and restart it. I've tried deleting and re-importing win32com, but it seems a reference to the corrupted cache is still maintained, so the whole process needs to be restarted.
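import os
import shutil
import sys
from pathlib import Path

temp_data_dir = os.environ.get('LOCALAPPDATA')
gen_py_dir = ''
for curr_path, dirs, files in os.walk(temp_data_dir):
    if 'gen_py' in dirs:
        gen_py_dir = Path(curr_path).joinpath('gen_py')
if gen_py_dir:
    shutil.rmtree(gen_py_dir)
restart_args = [sys.executable] + sys.argv  # assumption: re-run the same script
os.execv(restart_args[0], restart_args)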
The restarted process should call win32com.client.gencache.EnsureDispatch('Excel.Application') again and a fresh copy of gen_py will now be utilized, then we can retry the failed Excel automation code.
The issue happens sporadically in a completely unreliable way that I can't replicate. My best guess is that the gen_py cache is sometimes corrupted somehow and just needs to be refreshed.

Why isn't my code working in command line?

I am using Python3.6 and I need to run my code in command line. The code works when I run it in PyCharm but when I use command line I get this error:
File "path", line 43, in <module>
rb = ds.GetRasterBand(1)
AttributeError: 'NoneType' object has no attribute 'GetRasterBand'
It seems that I have a problem with these lines:
ds = gdal.Open('tif_file.tif', gdal.GA_ReadOnly)
rb = ds.GetRasterBand(1)
img_array = rb.ReadAsArray()
Does anyone know what I might have done wrong?
EDIT
Some magic just happened. I tried to run my code this morning and everything seems fine. I guess what my computer needed was a restart or something. Thanks to you all for help.

From the GDAL documentation:
from osgeo import gdal
dataset = gdal.Open(filename, gdal.GA_ReadOnly)
if not dataset:
    ...
Note that if GDALOpen() returns NULL it means the open failed, and
that an error messages will already have been emitted via CPLError().
If you want to control how errors are reported to the user review the
CPLError() documentation. Generally speaking all of GDAL uses
CPLError() for error reporting. Also, note that pszFilename need not
actually be the name of a physical file (though it usually is). It's
interpretation is driver dependent, and it might be an URL, a filename
with additional parameters added at the end controlling the open or
almost anything. Please try not to limit GDAL file selection dialogs
to only selecting physical files.
It looks like the file you are trying to open is not a valid GDAL file, or something else is going wrong in the file selection. Check that gdal.Open did not return None before using the dataset, and try pointing the program at a known-good file online to test it.
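A common reason for code working in an IDE but not on the command line is that a relative path such as 'tif_file.tif' is resolved against a different working directory. The thread does not confirm that this was the cause here, so treat the following as a sketch under that assumption. The helper names path_beside_script and open_raster are made up for illustration; only gdal.Open and gdal.GA_ReadOnly come from the question.

```python
import os


def path_beside_script(filename, script_file):
    """Return the absolute path of `filename` in the directory containing `script_file`."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(script_dir, filename)


def open_raster(path):
    """Open `path` with GDAL and raise instead of returning None on failure."""
    from osgeo import gdal  # assumes the GDAL Python bindings are installed

    dataset = gdal.Open(path, gdal.GA_ReadOnly)
    if dataset is None:
        raise IOError("GDAL could not open {!r} (cwd: {})".format(path, os.getcwd()))
    return dataset


# In the script itself:
#   ds = open_raster(path_beside_script("tif_file.tif", __file__))
#   rb = ds.GetRasterBand(1)
```

Raising at the point where the open fails puts the path and the working directory in the error message, which is easier to diagnose than the later AttributeError on None.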

Django TemporaryUploadedFile does not exist but nevertheless it is read successfully

I have the following situation. My OS reports that the Django TemporaryUploadedFile I received via a POST request no longer exists, yet the uploaded file can still be read.
Here is the code
text_file = request.FILES['text_file']
print(text_file.temporary_file_path())
os.system('ls -l ' + text_file.temporary_file_path())
fs = FileSystemStorage()
file_new = fs.save(text_file.name, text_file)
print(text_file.temporary_file_path())
os.system('ls -l ' + text_file.temporary_file_path())
fs.delete(file_new)
for chunk in text_file.chunks():
    text += chunk.decode(encoding)
print('Got text OK.')
This gives the following output:
/tmp/tmp0tngal9t.upload foo.txt
-rw------- 1 mine machine 3072889 oct 18 19:29 /tmp/tmp0tngal9t.upload
/tmp/tmp0tngal9t.upload foo.txt
ls: cannot access '/tmp/tmp0tngal9t.upload': No such file or directory
Got text OK.
So the TemporaryUploadedFile disappears after it is saved as file_new, which is later deleted as well. Nevertheless, text_file is read successfully in chunks, and I get all the text from the uploaded foo.txt file. How is this possible? Where does text_file.chunks() get the data from if text_file no longer exists?
I use:
python 3.5.2
django 1.10.2
ubuntu 16.04.1

I found that the same behaviour occurs in plain Python, so it is not specific to Django: in this example I am just reading text_file, which was already opened in request.FILES['text_file'].
I re-asked a similar question focusing on Python only. It turned out that the behaviour is not really about Python either, but about how Linux/Unix systems manage files. I quote the answer from Jean-François Fabre:
Nothing to do with Python. In C, Fortran, or Visual Cobol you'd have
the same behaviour as long as the code gets its handle from open
system call.
On Linux/Unix systems, once a process has a handle on a file, it can
read it, even if the file is deleted. For more details check that
question (I wasn't sure if it was OK to do that, it seems to be)
On Windows you just wouldn't be able to delete the file as long as
it's locked by a process.
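The behaviour described in the quote can be reproduced without Django. The sketch below is meant for Linux/Unix only (on Windows the os.remove call is expected to fail while the file is open), and the function name read_after_unlink is made up for illustration.

```python
import os
import tempfile


def read_after_unlink(data):
    """Write `data` to a temp file, delete the file by name, then read via the open handle.

    Returns (path_still_exists, bytes_read). Intended for POSIX systems.
    """
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w+b") as handle:
        handle.write(data)
        handle.flush()
        os.remove(path)  # removes the directory entry only
        still_listed = os.path.exists(path)
        handle.seek(0)  # the open handle still refers to the file's data
        return still_listed, handle.read()


print(read_after_unlink(b"uploaded text"))
```

On Linux this prints (False, b'uploaded text'): the name is gone, as ls reported in the question, but the data stays readable until the last open handle is closed.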

Python cannot read "warc.gz" file completely

For my work, I scrape web-sites and write them to gzipped web-archives (with extension "warc.gz"). I use Python 2.7.11 and the warc 0.2.1 library.
I noticed that for the majority of files I cannot read them completely with the warc library. For example, if a warc.gz file has 517 records, I can read only about 200 of them.
After some research I found that this problem happens only with the gzipped files. Files with the extension "warc" do not have this problem.
Other people have run into this as well (https://github.com/internetarchive/warc/issues/21), but no solution has been found.
I suspect there might be a bug in "gzip" in Python 2.7.11. Does anyone have experience with this and know what can be done about it?
Thanks in advance!
Example:
I create new warc.gz files like this:
import warc
warc_path = r"\\some_path\file_name.warc.gz"
warc_file = warc.open(warc_path, "wb")
To write records I use:
record = warc.WARCRecord(payload=value, headers=headers)
warc_file.write_record(record)
This creates perfectly valid "warc.gz" files. Everything, including the "\r\n" sequences, is correct. The problem starts when I read these files.
To read files I use:
warc_file = warc.open(warc_path, "rb")
To loop through records I use:
for record in warc_file:
    ...
The problem is that not all records are found when looping over a "warc.gz" file, while all of them are found for "warc" files. The warc library itself is meant to handle both file types.

It seems that the custom gzip handling in warc.gzip2.GzipFile, the file splitting in warc.utils.FilePart and the reading in warc.warc.WARCReader are broken as a whole (tested with Python 2.7.9, 2.7.10 and 2.7.11). Reading stops short when it receives no data instead of a new header.
The basic stdlib gzip module seems to handle the concatenated files just fine, so this should work as well:
import gzip
import warc
with gzip.open('my_test_file.warc.gz', mode='rb') as gzf:
    for record in warc.WARCFile(fileobj=gzf):
        print record.payload.read()
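To check the claim about concatenated gzip data independently of the warc library, here is a small standalone sketch. It is written for Python 3 (gzip.compress is not available in Python 2.7), and the byte strings are placeholders rather than real WARC records.

```python
import gzip
import io


def concat_members(chunks):
    """Compress each chunk as its own gzip member and join them into one byte string."""
    return b"".join(gzip.compress(chunk) for chunk in chunks)


def read_all_members(blob):
    """Decompress a multi-member gzip byte string with the standard library."""
    with gzip.GzipFile(fileobj=io.BytesIO(blob), mode="rb") as gzf:
        return gzf.read()


records = [b"record one\r\n\r\n", b"record two\r\n\r\n", b"record three\r\n\r\n"]
roundtrip = read_all_members(concat_members(records))
```

Here roundtrip equals the three placeholder records joined together, so the standard library does not stop after the first gzip member.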

Issues in copying files on iOS using NSStreams

I am trying to copy image and media files using NSStreams. I cannot use NSFileManager copyItemAtPath, because I have to copy the files using streams.
The data is transferred over the network, and the stream is read by a Python script that writes the data to a file. This worked fine on Mac OS X, but when I tried it on iOS, the file was not saved in the proper format.
I am able to copy all the files, but some of the metadata, such as dimensions (for image and media files) and duration (for media files), is missing in the copied file, and the kind is always Document. The other metadata is fine.
When I try to read the file attributes using the NSFileManager
[[NSFileManager defaultManager] attributesOfItemAtPath:@"filePath" error:&error];
It shows an error in the console:
The operation couldn't be completed. No such file or directory
I also observed that all the copied files, irrespective of their extension (.png, .jpeg, .mov, .zip), have a kind of Document.
How do I copy the source image metadata into the copied file?
Are there any Xcode optimizations I need to turn off?
OS : Mac OSX 10.8.4, iOS 6
Xcode : 4.6.3

This works for me for any type of file:
if (![NSFileManager.defaultManager copyItemAtPath: sourceFileName toPath: targetFileName error: &error]) {
    NSAlert *alert = [NSAlert alertWithError: error];
    [alert runModal];
    return;
}

I found out it was an issue with the file extension. Some junk characters were appended after the extension (something like 1.png\\\).
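Since the receiving side is a Python script, one possible safeguard is to clean the file name there before writing. This is only a sketch: the function name strip_trailing_junk is made up, and which characters count as junk (trailing backslashes, NULs and whitespace here) is an assumption based on the 1.png\\\ example.

```python
def strip_trailing_junk(name, junk="\\\x00 \t\r\n"):
    """Remove trailing backslashes, NULs and whitespace from a received file name."""
    return name.rstrip(junk)
```

Fixing the sender so that it transmits the exact file name is still the better solution; this only guards the receiving end.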
