Using Python modules in Swift and PythonKit - python

I was looking to get some help or clarification on the limitations of using PythonKit in Swift. Well, I say PythonKit; I actually installed the TensorFlow toolchain in Xcode, as I couldn't get PythonKit to work on its own as a single dependency (my MacBook would spin its wheels with fans blasting while trying to import numpy).
Anyway, I wanted to say it's brilliant that I can use Python modules in Swift; it makes it much easier to potentially start using Swift for more than just iOS apps.
My issue is that I can import Python modules fine, but it's not clear how much functionality they will have. I assume ones like numpy will be pretty much the same, but as a scientist I use NetCDF files a lot, so I have been trying to use netCDF4. It imports fine, and I can load the Dataset object, attributes, etc., but I can't get the actual array out.
Here is an example:
import PythonKit
PythonLibrary.useVersion(3, 7)
let nc = Python.import("netCDF4")
var Data = nc.Dataset("ncfile path")
var lat_z = Data.variables["lat_z"][:]
The [:] causes an error that is picked up by Xcode; removing it allows the script to run, but it yields the Variable object rather than the array. I can append things to get the attributes, e.g. lat_z.long_name, but I'm not sure how to extract the array without using [:].
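(For reference, the pure-Python idiom I'm trying to translate is below; in Python, slicing the Variable with [:] is what actually reads the data into an array. The placeholder path is the same as above.)
import netCDF4
ds = netCDF4.Dataset("ncfile path")  # placeholder path, as above
lat_z = ds.variables["lat_z"][:]     # [:] pulls the values out of the Variable object
print(type(lat_z))                   # a numpy (masked) array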
I am hoping this is just a syntax difference that I need to learn with Swift (I'm very much in the early days of using it), or is it a limitation of PythonKit? I have not found anyone actually using netCDF4 (examples are mostly numpy and Matplotlib). If so, are there some general limitations on using Python modules in Swift?
I am also trying to get Matplotlib to work, but I'm pretty sure that issue is due to using a command-line tool project in Xcode, which has no view, so it makes sense that it can't show me an image.
Any pointers, and maybe links to up-to-date documentation, would be great; there seem to have been some changes, e.g. import PythonKit rather than import Python.
Many Thanks

You can use the count property on a Python iterable, which is equivalent to len. You can index a numpy array in two ways: (i) with Swift range syntax and (ii) with numpy range objects:
import Foundation
import PythonKit
let np = Python.import("numpy")
let array = np.array([1, 2, 3, 4, 5])
print(array) // [1, 2, 3, 4, 5]
let subArray = array[0..<array.count]
print(subArray) // [1, 2, 3, 4, 5]
let subArray2 = array[np.arange(0, 2)]
print(subArray2) // [1, 2]
// Swift equivalent of Python ":"
let subArray3 = array[...]
You can also convert numpy arrays to Swift arrays and use Swift methods and subscripts:
let swiftArray = Array(array)
let swiftSubArray = swiftArray[0..<3]
print(swiftSubArray) // [1, 2, 3]
Note that you should prefer Python.len(...) over the count property when working with PythonObjects: count incurs a performance penalty because PythonKit does not automatically conform PythonObject to RandomAccessCollection, so count is O(n).

Related

Working Around the Windows-numpy astype(int) Bug in Pandas

I have a codebase I've been developing on a Mac (and running on Linux machines), based largely on pandas (and therefore numpy). Very commonly, I type-cast with astype(int).
Recently a Windows-based developer joined our team. In an effort to make the codebase more platform-independent, we're trying to gracefully tackle this tricky issue, whereby numpy on Windows uses a 32-bit integer type for int instead of the 64-bit type, which breaks longer integers.
On a Mac, we see:
ipdb> ids.astype(int)
id
1818726176 1818726176
1881879486 1881879486
2590366906 2590366906
284399109 284399109
299981685 299981685
370708200 370708200
387277023371 387277023371
387343898032 387343898032
406885699892 406885699892
5262665206 5262665206
544687374 544687374
6978317806 6978317806
Whereas on a Windows machine (in PowerShell), we see:
ipdb> ids.astype(int)
id
1818726176 1818726176
1881879486 1881879486
2590366906 -1704600390
284399109 284399109
299981685 299981685
370708200 370708200
387277023371 729966731
387343898032 796841392
406885699892 -1136193228
5262665206 967697910
544687374 544687374
6978317806 -1611616786
Other than using a sed call to change every astype(int) to astype(np.int64) (which would also require adding import numpy as np at the top of every module where it doesn't currently exist), is there a way to do this?
In particular, I was hoping to map int to numpy.int64 somehow in a pandas option or something.
Thank you!
I'm not saying that this is a really good idea, but you can simply redefine int to whatever you want:
import numpy as np
x = 2384351503.0
print(np.array(x).astype(int))
#-2147483648
old_int = int
int = np.int64
print(np.array(x).astype(int))
#2384351503
int = old_int
print(np.array(x).astype(int))
#-2147483648
In the case you described I'd, however, strongly prefer to fix the source code instead of redefining standard data types. It's a one-time effort and any IDE can do it easily.
Numpy is already implicitly imported by pandas, so it doesn't cost any additional time or resources. If you really want to avoid the explicit import (for whatever reason), you can use pd.Int64Dtype.type instead of np.int64 (see source).
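For illustration, a minimal sketch of that approach with hypothetical values; instantiating the dtype, pd.Int64Dtype().type, is the version-safe way to reach the underlying scalar type, which is numpy.int64:
import pandas as pd
# pd.Int64Dtype().type resolves to numpy.int64, with no explicit numpy import.
int64_type = pd.Int64Dtype().type
ids = pd.Series([2384351503.0, 387277023371.0])
print(ids.astype(int64_type))  # both values survive intact, even on Windows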

Issues with accessing PyObjects after writing

I am trying to do some fairly simple list manipulation, using a Jupyter notebook that calls a DLL function. I'd like my Jupyter notebook/Python code to pass in a Python list to a C++ function, which modifies the list, and then I'd like the Python code to be able to access the new list values.
I can actually read (in Jupyter) the items that were not edited by the C++ code, so there must be some issue with how I'm writing, yet every example I can find looks just like my code. When I try to access the item in the list that the C++ code writes, my Jupyter kernel dies with no explanation; I've tried running the same Python code in the terminal, and the terminal session just exits, again with no explanation.
Running on Windows 10, in an environment with Python 3.9.2. Here's the Python:
import os
import ctypes
import _ctypes
# Import the DLL
mydll = ctypes.cdll.LoadLibrary(*path to DLL*)
# Set up
data_in = [3,6,9]
mydll.testChange.argtypes = [ctypes.py_object]
mydll.testChange.restype = ctypes.c_float
mydll.testChange(data_in)
# Returns 0.08
After running this and closing the DLL, running data_in[1] returns 6, data_in[2] returns 9, and data_in[0] causes my kernel to die.
C code for the DLL:
#include <Python.h>

float testChange(PyObject *data_out) {
    Py_SetPythonHome(L"%user directory%\\anaconda3");
    Py_Initialize();
    PyList_SetItem(data_out, 0, PyLong_FromLong(1L));
    return 0.08;
}
I can also insert a number of print statements in this code showing that I can read out all three items in the DLL, both before and after the call to PyList_SetItem, using calls like PyLong_AsLong(PyList_GetItem(data_out, 1)). It's not clear to me that any reference counts need changing or anything like that, but perhaps I misunderstand the idea. Any ideas you all have would be greatly appreciated.
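(A hedged debugging sketch, not a fix: per the CPython docs, PyList_SetItem steals a reference to the new item and drops the list's reference to the old one, so this call by itself should not need manual refcount management. Inspecting the list before the DLL is unloaded can narrow down whether the write or the unload is what corrupts item 0.)
import sys
data_in = [3, 6, 9]
# ... call mydll.testChange(data_in) here, then inspect *before* unloading the DLL ...
print(data_in)                      # is item 0 already bad at this point?
print(sys.getrefcount(data_in[0]))  # a wildly implausible count hints at a corrupted object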

IronPython: Message: expected c_double, got c_double_Array_3

I'm currently developing a script using the Python script editor in Rhino. As I'm currently working on a Windows machine, the script editor uses IronPython as the language.
In the same script, I want to interact with FE software (Straus7) that has a Python API. In doing so, I have experienced some problems, as the ctypes module does not seem to work in IronPython the same way it does in regular Python. In particular, I'm finding problems when initializing arrays using the command:
ctypes.c_double*3
For example, if I want to obtain the XYZ coordinates of node #1 in the FE model, in regular Python I would write the following:
XYZType = ctypes.c_double*3
XYZ = XYZType()
node_num = 1
st.St7GetNodeXYZ(1,node_num,XYZ)
And this returns a variable XYZ which is a 3-element array such that:
XYZ -> <straus_userfunctions.c_double_Array_3 at 0xc5787b0>
XYZ[0] = -0.7xxxxx -> (X_coord)
XYZ[1] = -0.8xxxxx -> (Y_coord)
XYZ[2] = -0.9xxxxx -> (Z_coord)
On the other hand, if I copy the exact same script into IronPython, the following error message appears:
Message: expected c_double, got c_double_Array_3
Obviously, if I change the variable XYZ to c_double, then it becomes a double variable containing only a single entry, which corresponds to the first element of the array (in this case, the X coordinate).
This situation is quite annoying, as in all FEM software the use of matrices and arrays is pervasive. Consequently, I wanted to ask if anyone knows a simple fix for this situation.
I was thinking of using the memory address of the first element of the array to obtain the rest, but I'm not sure how to do so.
Thanks a lot. Gerard
I've found that when working with IronPython you need to explicitly cast the "array of three doubles" to a "pointer to double". So if you're using Grasshopper with the Strand7 / Straus7 API, you will need to add an extra bit like this:
import St7API
import ctypes
# Make the pointer conversion functions
PI = ctypes.POINTER(ctypes.c_long)
PD = ctypes.POINTER(ctypes.c_double)
XYZType = ctypes.c_double*3
XYZ = XYZType()
node_num = 1
# Cast arrays whenever you pass them to St7API from IronPython
St7API.St7GetNodeXYZ(1, node_num, PD(XYZ))
I don't have access to IronPython or Strand7 / Straus7 at the moment, but from memory that will do it. If it doesn't work for you, you can email Strand7 Support - you would typically get feedback on something like this within a day or so.
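If calling the pointer type directly still complains, ctypes.cast is the documented way in CPython to view an array as a pointer, and may be worth trying from IronPython too (a sketch, with the Straus7 call left commented out since it needs the API):
import ctypes
XYZType = ctypes.c_double * 3
XYZ = XYZType()
# ctypes.cast reinterprets the array as a plain pointer to its first element.
XYZ_ptr = ctypes.cast(XYZ, ctypes.POINTER(ctypes.c_double))
# St7API.St7GetNodeXYZ(1, node_num, XYZ_ptr)  # as in the snippet above
print(XYZ_ptr[0], XYZ_ptr[1], XYZ_ptr[2])  # reads the three doubles via the pointer (0.0 until filled)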

Python/Numpy array element assignment issue

I'm trying to use Python/Numpy for a project that I'd normally do in Matlab, so I'm somewhat new to this environment (though I have played with Python/Django on the web development side). I'm now running into what I have to believe is a super simple issue that occurs when I try to assign an element of one numpy array to another numpy array. The basic offending code is as follows. It does have some other fluff around it, which I don't believe could be causing the issue, but I can provide that code as well if it would help.
import numpy as np
tf = 100
dt = 10
X0 = np.array([6978,0,5.8787,5.8787])
xhist = np.zeros(tf/dt+1)
yhist = np.zeros(tf/dt+1)
xhist[0] = X0[0]
yhist[0] = X0[1]
print(X0[0])
print(xhist[0])
When I run the above code, the first print statement gives me 6978, as expected; however, the second print statement gives me 0, and I can't figure out for the life of me why. Any ideas? Thanks in advance!
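(An aside a reader may find useful: in Python 3, tf/dt is a float, and recent numpy versions reject float lengths in np.zeros, while older ones silently truncated them. A minimal sketch of the same assignment with floor division, which reads back as expected:)
import numpy as np
tf = 100
dt = 10
X0 = np.array([6978, 0, 5.8787, 5.8787])
xhist = np.zeros(tf // dt + 1)  # floor division keeps the length an int
xhist[0] = X0[0]
print(X0[0])     # 6978.0
print(xhist[0])  # 6978.0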

Using Numpy in different platforms

I have a piece of code which computes the Helmholtz-Hodge Decomposition.
I had been running it on my Mac (OS X Yosemite) and it was working just fine. A month ago, however, my Mac got pretty slow (it was really old), and I opted to buy a new notebook (Windows 8.1, Dell).
After installing all the Python libraries and so on, I continued my work, running this same code (versioned in Git). And then the result was pretty weird, completely different from the one obtained on the old notebook.
For instance, what I do is construct two matrices a and b (a really long calculation) and then call the solver:
s = numpy.linalg.solve(a, b)
This returned a wrong s, different from the result obtained on my Mac (which was right).
Then, I tried to use:
s = scipy.linalg.solve(a, b)
And the program exits with code 0, but in the middle of execution.
Then, I tried a simple test:
print 'here1'
s = scipy.linalg.solve(a, b)
print 'here2'
And here2 is never printed.
I tried:
print 'here1'
x, info = numpy.linalg.cg(a, b)
print 'here2'
And the same happens.
I also tried to check the solution after using numpy.linalg.solve:
print numpy.allclose(numpy.dot(a, s), b)
And I got False (?!).
I don't know what is happening or how to find a solution; I just know that the same code runs on my Mac, and it would be very good if I could run it on other platforms. Now I'm stuck on this problem (I don't have a Mac anymore) with no clue about the cause.
The weirdest thing is that I don't get any error or runtime warning - no feedback at all.
Thank you for any help.
EDIT: The numpy and scipy test suite results were attached as images in the original post.
Download the Anaconda package manager:
http://continuum.io/downloads
When you download it, it will already have all the dependencies for numpy worked out for you. It installs locally and works on most platforms.
This is not really an answer, but this blog discusses at length the problems of having a numpy ecosystem that evolves fast, at the expense of reproducibility.
By the way, which version of numpy are you using? The documentation for the latest 1.9 does not list any method called cg like the one you use...
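(For what it's worth, a conjugate-gradient solver does live in scipy, under scipy.sparse.linalg rather than numpy.linalg - a sketch with a small made-up symmetric positive-definite system, since cg only applies to those:)
import numpy as np
from scipy.sparse.linalg import cg
a = np.array([[4.0, 1.0], [1.0, 3.0]])  # symmetric positive-definite
b = np.array([1.0, 2.0])
x, info = cg(a, b)
print(x)     # solution vector
print(info)  # 0 means the iteration converged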
I suggest using this example so that you (and others) can check the results:
>>> import numpy as np
>>> import scipy.linalg
>>> np.random.seed(123)
>>> a = np.random.random(size=(10000, 10000))
>>> b = np.random.random(size=(10000,))
>>> s_np = np.linalg.solve(a, b)
>>> s_sc = scipy.linalg.solve(a, b)
>>> np.allclose(s_np,s_sc)
>>> s_np
array([-15.59186559, 7.08345804, 4.48174646, ..., -16.43310046,
-8.81301553, -10.77509242])
I hope you can find the answer - one option in the future is to create an isolated, reproducible environment for each of your projects using a container tool such as Docker. This allows easy portability.
See a great article here discussing Docker for research.
