Using a .NET Assembly in Julia with DotNET.jl

I am trying to use a .NET assembly from the Julia programming language. There is one package that supports this: https://github.com/azurefx/DotNET.jl. The DLL with the types and methods that I want to access is here: https://github.com/nwamsley1/Thermo-DLLs, and there is more documentation here: https://github.com/thermofisherlsms/RawFileReader.
I can do this easily with python given the pythonnet package (https://github.com/pythonnet/pythonnet) in the following example code:
import clr
clr.AddReference('./ThermoFisher.CommonCore.RawFileReader')
from ThermoFisher.CommonCore.RawFileReader import RawFileReaderAdapter
rawFile = RawFileReaderAdapter.FileFactory("./MA4358_FFPE_HPVpos_01_071522.raw")
print("Is Open? ", rawFile.IsOpen)
print("Is Error? ", rawFile.IsError)
##### output
Is Open? True
Is Error? False
This ThermoFisher.CommonCore.RawFileReader.dll allows access to large files in a proprietary ".raw" file format. Python is too slow to work with these data efficiently, so I would like easy access from Julia. Here is an attempted minimal example that does the same as the above, but in Julia instead of Python (using the DotNET.jl package):
using DotNET
reader = T"System.Reflection.Assembly".LoadFrom(raw"./ThermoFisher.CommonCore.RawFileReader.dll")
rawfilereaderadapter = reader.GetType("ThermoFisher.CommonCore.RawFileReader.RawFileReaderAdapter", true, true)
filefactory = rawfilereaderadapter.GetMethod("FileFactory")
filepath = convert(CLRObject, "./MA4358_FFPE_HPVpos_01_071522.raw")
raw_file = filefactory.Invoke(filefactory, [filepath])
raw_file.IsOpen
raw_file.IsError
###### output
false
true
As the output shows, raw_file.IsError and raw_file.IsOpen return true and false respectively, the opposite of the Python result: there is an error opening the raw file and the raw file is not open. It took trial and error to get this far. Some observations:
filefactory.Invoke() takes two arguments, as specified here: https://learn.microsoft.com/en-us/dotnet/api/system.reflection.methodbase.invoke?view=net-7.0. However, the result is not sensitive to the first argument I provide in filefactory.Invoke(something, [filepath]): "something" can be anything and this still works, as long as something is there.
If instead of [filepath] I supply something that is not of type System.String, or if I supply more arguments, [filepath, someotherargument], I get an error (see the comments). That makes sense, because the FileFactory method accepts one argument of type System.String.
I am not getting any error message, but I am unable to access the .raw file as I can in Python. I have clearly gotten somewhere, though, because providing the wrong kind or number of arguments to the method does raise an error. Are there suggestions or comments as to what the problem is or how I might diagnose it further?
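One cross-check I can think of (a sketch I have not actually run against this DLL) is to repeat the same reflection-based call in Python with pythonnet; since FileFactory is static, the target passed to Invoke should be None (a CLR null):
import clr
from System.Reflection import Assembly
from System import Array, Object

# Load the assembly by path and look up the static factory, mirroring the Julia code.
asm = Assembly.LoadFrom("./ThermoFisher.CommonCore.RawFileReader.dll")
adapter = asm.GetType("ThermoFisher.CommonCore.RawFileReader.RawFileReaderAdapter", True, True)
file_factory = adapter.GetMethod("FileFactory")

# FileFactory is static, so the instance argument to Invoke is None (CLR null).
args = Array[Object](["./MA4358_FFPE_HPVpos_01_071522.raw"])
raw_file = file_factory.Invoke(None, args)
print(raw_file.IsOpen, raw_file.IsError)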
If it helps, here is part of the .xml documentation for the DLL:
<member name="T:ThermoFisher.CommonCore.RawFileReader.RawFileReaderAdapter">
    <summary>
    This static class contains factories to open raw files
    </summary>
</member>
<member name="M:ThermoFisher.CommonCore.RawFileReader.RawFileReaderAdapter.FileFactory(System.String)">
    <summary>
    Create an IRawDataExtended interface to read data from a raw file
    </summary>
    <param name="fileName">File to open</param>
    <returns>Interface to read data from file</returns>
</member>

Related

Modify flow file attributes in NiFi with Python sys.stdout?

In my pipeline I have a flow file that contains some data I'd like to add as attributes to the flow file. I know in Groovy I can add attributes to flow files, but I am less familiar with Groovy and much more comfortable with using Python to parse strings (which is what I'll need to do to extract the values of these attributes). The question is, can I achieve this in Python when I use ExecuteStreamCommand to read in a file with sys.stdin.read() and write out my file with sys.stdout.write()?
So, for example, I use the code below to extract the timestamp from my flowfile. How do I then add ts as an attribute when I'm writing out ff?
import sys
ff = sys.stdin.read()
t_split = ff.split('\t')
ts = t_split[0]
sys.stdout.write(ff)
Instead of writing back the entire file, you can simply write the attribute value from the input FlowFile:
sys.stdout.write(ts)  # the timestamp in your case
and then set the Output Destination Attribute property of the ExecuteStreamCommand processor to the desired attribute name.
Hence, the output of the stream command will be put into an attribute of the original FlowFile and the same can be found in the original relationship queue.
For more details, you can refer to ExecuteStreamCommand-Properties
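Putting that together, the script from the question reduces to something like this:
import sys

# Read the FlowFile content from stdin and keep only the timestamp field.
ff = sys.stdin.read()
ts = ff.split('\t')[0]

# Write only the timestamp; with "Output Destination Attribute" set,
# ExecuteStreamCommand stores this output in that attribute of the original FlowFile.
sys.stdout.write(ts)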
If you're not importing any native (CPython) modules, you can try ExecuteScript with Jython rather than ExecuteStreamCommand. I have an example in Jython in an ExecuteScript cookbook. Note that you don't use stdin/stdout with ExecuteScript, instead you have to get the flow file from the session and either transfer it as-is (after you're done reading) or overwrite it (there are examples in the second part of the cookbook).
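For completeness, here is roughly what that looks like in Jython with ExecuteScript (untested here, and adapted from memory of the cookbook pattern rather than copied from it):
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import InputStreamCallback

# Read the FlowFile content and pull out the timestamp field.
class ReadTimestamp(InputStreamCallback):
    def __init__(self):
        self.ts = None
    def process(self, inputStream):
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        self.ts = text.split('\t')[0]

flowFile = session.get()
if flowFile is not None:
    callback = ReadTimestamp()
    session.read(flowFile, callback)                          # content is left as-is
    flowFile = session.putAttribute(flowFile, 'ts', callback.ts)
    session.transfer(flowFile, REL_SUCCESS)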

Passing to SOAP arguments from the command line

I have a python script that successfully sends SOAP to insert a record into a system. The values are static in the test. I need to make the value dynamic/argument that is passed through the command line or other stored value.
execute: python myscript.py
<d4p1:Address>MainStreet</d4p1:Address> ....this works to add hard coded "MainStreet"
execute: python myscript.py MainStreet
...this is now trying to pass the argument MainStreet
<d4p1:Address>sys.argv[1]</d4p1:Address> ....this does not work
It saves the literal text "sys.argv[1]" as the address. I have imported sys, and I have tried %, {}, etc. from web searches. What syntax am I missing?
You need to read a little about how to create strings in Python; below is how it could look in your code. Sorry, it's hard to say more without seeing your actual code. Also, you really shouldn't build XML like that; you should use, for instance, the xml module from the standard library.
test = "<d4p1:Address>" + sys.argv[1] + "</d4p1:Address>"
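Or, equivalently, with str.format (the variable name is illustrative, and the print is just to show the result):
import sys

# Build the element from the first command-line argument instead of hard-coding it.
address = sys.argv[1]
test = "<d4p1:Address>{}</d4p1:Address>".format(address)
print(test)  # python myscript.py MainStreet -> <d4p1:Address>MainStreet</d4p1:Address>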

Dealing with à in generating code for a string literal using Python 3.5's AST module, need to open with right coding

To generate JavaScript from Python in the Transcrypt Python to JS compiler, Python 3.5's ast module is used in combination with the following code:
class Generator (ast.NodeVisitor):
    ...
    ...
    def visit_Str (self, node):
        self.emit (repr (node.s))  # Simplified to need less context on StackOverflow
    ...
    ...
This works fine e.g. for the following line of Python:
test = "âäéèêëiîïoôöùüû"
which is correctly translated to:
var test = 'âäéèêëiîïoôöùüû';
Only the character à gives problems:
test = "àâäéèêëiîïoôöùüû"
is translated to:
var test = 'Ĝxa0âäéèêëiîïoôöùüû';
Is there any way to have the ast module read the source file respecting coding directives like:
# coding=<encoding name>
To open a Python file for parsing, use tokenize.open rather than the ordinary open function.
It will open the file, read the PEP 263 coding hint, and return the open file as if it had been opened by the ordinary open with the right encoding.
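For example (the file name here is just a placeholder):
import ast
import tokenize

# tokenize.open reads the PEP 263 "# coding=..." declaration and decodes
# the source with that encoding before returning a text file object.
with tokenize.open("some_module.py") as source_file:
    tree = ast.parse(source_file.read())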
This is quite hard to find; it is not currently in the Green Tree Snakes doc. I actually found it by searching for 'coding' in the CPython sources on GitHub, and I have created an issue for the Green Tree Snakes doc to add this.

How do I save python code in JSON

I've got a large python project with several components, that exchange information with JSON files. Actually, this project is our internal tool for analysis and integration testing, and our developers use it either from web-UI, or from a command line.
The Python modules process a labeled database consisting of a large number of files, with labels encoded in the file names. For example, the file name ab001l_AS_5_15Fps_1.raw indicates that it stores data from user ab001l, collected in session number 1 under conditions that we encode as AS.
Several such encodings exist.
The JSON files usually store file names.
My question is: how can I save Python code in a JSON file, so that another module can load it and decode a file name into its components?
I guess you can store Python code as text in JSON, then use the exec built-in function to execute the text; see
https://docs.python.org/3/library/functions.html?highlight=exec#exec.
But it seems a much better approach to share your module and import it like any other Python code.
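For illustration, a minimal sketch of the exec approach (the function name and JSON key here are made up):
import json

# Store a small decoder function as source text inside JSON...
payload = json.dumps({"decoder": "def user_id(name):\n    return name.split('_')[0]"})

# ...and execute it on the receiving side with exec(), then call it.
namespace = {}
exec(json.loads(payload)["decoder"], namespace)
print(namespace["user_id"]("ab001l_AS_5_15Fps_1.raw"))  # -> ab001l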
You can use jsonpickle. Please check the documentation page for usage.
import jsonpickle

class Thing(object):
    def __init__(self, name):
        self.name = name

obj = Thing('Awesome')
frozen = jsonpickle.encode(obj)
thawed = jsonpickle.decode(frozen)

read() from a ExFileObject always cause StreamError exception

I am trying to read only one file from a tar.gz file. All operations on the tarfile object work fine, but when I read from a concrete member, a StreamError is always raised; check this code:
import tarfile

fd = tarfile.open('file.tar.gz', 'r|gz')
for member in fd.getmembers():
    if not member.isfile():
        continue
    cfile = fd.extractfile(member)
    print(cfile.read())
    cfile.close()
fd.close()
cfile.read() always raises "tarfile.StreamError: seeking backwards is not allowed".
I need to read the contents into memory, not dump them to a file (extractall works fine).
Thank you!
The problem is this line:
fd = tarfile.open('file.tar.gz', 'r|gz')
You don't want 'r|gz', you want 'r:gz'.
If I run your code on a trivial tarball, I can even print out the member and see test/foo, and then I get the same error on read that you get.
If I fix it to use 'r:gz', it works.
From the docs:
mode has to be a string of the form 'filemode[:compression]'
...
For special purposes, there is a second format for mode: 'filemode|[compression]'. tarfile.open() will return a TarFile object that processes its data as a stream of blocks. No random seeking will be done on the file… Use this variant in combination with e.g. sys.stdin, a socket file object or a tape device. However, such a TarFile object is limited in that it does not allow to be accessed randomly, see Examples.
'r|gz' is meant for when you have a non-seekable stream, and it only provides a subset of the operations. Unfortunately, it doesn't seem to document exactly which operations are allowed—and the link to Examples doesn't help, because none of the examples use this feature. So, you have to either read the source, or figure it out through trial and error.
But, since you have a normal, seekable file, you don't have to worry about that; just use 'r:gz'.
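For example, with the mode fixed, the code from the question works (a quick sketch):
import tarfile

# 'r:gz' opens the archive for random access, so reading individual members works.
with tarfile.open('file.tar.gz', 'r:gz') as fd:
    for member in fd.getmembers():
        if not member.isfile():
            continue
        data = fd.extractfile(member).read()
        print(member.name, len(data))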
In addition to the file-mode issue described above, in my case I was attempting to seek on a network stream.
I had the same error when trying to requests.get the file, so I extracted everything to a tmp directory instead:
import os
import tarfile
from lzma import LZMAFile

# stream == requests.get
inputs = [tarfile.open(fileobj=LZMAFile(stream), mode='r|')]
t = "/tmp"
for tarfileobj in inputs:
    tarfileobj.extractall(path=t, members=None)

for fn in os.listdir(t):
    with open(os.path.join(t, fn)) as payload:
        print(payload.read())
