I didn't see an actual example in pyfftw's documentation of how to use the 'wisdom' feature, so I'm a little confused.
My code looks something like the following:
# first FFT
input = pyfftw.zeros_aligned(arraySize, dtype='complex64')
input[:] = image
fftwObj = pyfftw.builders.fft2(input, planner_effort='FFTW_EXHAUSTIVE')
imageFFT = fftwObj(input)
wisdom = pyfftw.export_wisdom()
pyfftw.import_wisdom(wisdom)
# second FFT with the same input size but different input
input = pyfftw.zeros_aligned(arraySize, dtype='complex64')
input[:] = image2
fftwObj = pyfftw.builders.fft2(input, planner_effort='FFTW_EXHAUSTIVE')
imageFFT2 = fftwObj(input)
The docs say that export_wisdom outputs a tuple of strings and that import_wisdom takes in this tuple as an argument.
When am I supposed to export the wisdom and am I supposed to save this tuple out to a file for each FFT?
When do I load it back in? Before the call to each FFT?
Basically, exporting and importing wisdom is a method to maintain state between sessions.
The wisdom is the knowledge about how best to plan an FFT. During a session, the internal "wisdom" is made up of all the plans that are made, and the wisdom that has been imported. Repeatedly importing the same wisdom file is not useful because that knowledge is already known after the first import.
You export wisdom when you want the knowledge about a particular transform plan to be reused later instead of having to be worked out again. Within a single session, though, a given transform only needs to be planned once anyway.
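For example, a minimal sketch of persisting wisdom between sessions (the pickle file name is arbitrary):
import pickle
import pyfftw
# End of a planning session: export_wisdom() returns a tuple of byte strings
# describing every plan made (or imported) so far; save it once, not per FFT.
with open("fftw_wisdom.pkl", "wb") as f:
    pickle.dump(pyfftw.export_wisdom(), f)
# Start of a later session: import the tuple once, before building any plans,
# so FFTW_EXHAUSTIVE planning for shapes seen before is close to instant.
with open("fftw_wisdom.pkl", "rb") as f:
    pyfftw.import_wisdom(pickle.load(f))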
Hello friends!
Summary:
I have an ee.FeatureCollection containing around 8500 ee.Point objects. I would like to calculate the distance of these points to a given coordinate, let's say (0.0, 0.0).
For this I use the function geopy.distance.distance() (ref: https://geopy.readthedocs.io/en/latest/#module-geopy.distance). The function takes two coordinates as input, each in the form of a tuple containing two floats.
Problem: Whenever I need to convert the coordinates held in an ee.List to floats, I use the getInfo() function. I know this is a callback and very time intensive, but I don't know another way to extract them. Long story short: extracting the data as ee.Number takes less than a second; if I want it as floats it takes more than an hour. Is there any trick to fix this?
Code:
fc_containing_points = ee.FeatureCollection('projects/ee-philadamhiwi/assets/Flensburg_100') #ee.FeatureCollection
list_containing_points = fc_containing_points.toList(fc_containing_points.size()) #ee.List
fc_containing_points_length = fc_containing_points.size() #ee.Number
for index in range(fc_containing_points_length.getInfo()): # need to convert ee.Number to int
    point_tmp = list_containing_points.get(index) # ee.ComputedObject
    point = ee.Feature(point_tmp) # cast ee.ComputedObject to ee.Feature
    coords = point.geometry().coordinates() # ee.List containing 2 ee.Numbers
    # when I run the loop up to this point, I get all the data I want
    # as ee.Number objects in under 1 sec
    coords_as_tuple_of_ints = (coords.getInfo()[1], coords.getInfo()[0]) # tuple containing 2 floats
    # when I add this line, the loop takes hours
PS: This is my first question, please be patient with me.
I would use .map instead of your looping. This stays server-side until you export the table (or possibly call .getInfo on the whole thing).
fc_containing_points = ee.FeatureCollection('projects/ee-philadamhiwi/assets/Flensburg_100')
fc_with_distance = fc_containing_points.map(
    lambda feature: feature.set(
        "distance_to_point",
        feature.distance(ee.Feature(ee.Geometry.Point([0.0, 0.0])))))
# Then export using ee.batch.Export.Table.toXXX or call getInfo
(An alternative might be to use ee.Image.paint to convert the target point to an image, then use ee.Image.distance to calculate the distance to the point (as an image), then use reduceRegions over the feature collection with all the points, but 1) you can only calculate distance up to a certain range and 2) I don't think it would be any faster.)
To comment on your code: you are probably aware that loops (especially client-side loops) are frowned upon in GEE (primarily for the performance reasons you've run into), but also note that any time you call .getInfo on a server-side object it incurs a performance cost. So this line
coords_as_tuple_of_ints = (coords.getInfo()[1],coords.getInfo()[0])
would take roughly twice as long as this:
coords_client = coords.getInfo()
coords_as_tuple_of_ints = (coords_client[1],coords_client[0])
Finally, you could always just export your entire feature collection to a shapefile (using ee.batch.Export.Table.... as above) and do all the operations using geopy locally.
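For the export route, a minimal sketch assuming the Drive export workflow (the task description is a placeholder, and fc_with_distance is the mapped collection from the snippet above):
# Export the mapped collection as a shapefile to Google Drive, then finish the
# distance work locally with geopy if desired. Assumes ee has been initialized.
task = ee.batch.Export.Table.toDrive(
    collection=fc_with_distance,
    description="Flensburg_100_distances",  # placeholder task name
    fileFormat="SHP",
)
task.start()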
I have a very large dataset comprised of data for several dozen samples, and several hundred subsamples within each sample. I need to get the mean, standard deviation, confidence intervals, etc. However, I'm running into a (suspected) massive performance problem that causes the code to never finish executing. I'll begin by explaining what my actual code does (I'm not sure how much of the actual code I can share, as it is part of an active research project; I hope to open-source it, but that will depend on the IP rules in the agreement) and then I'll share some code that replicates the problem and should hopefully allow somebody a bit more well-versed in Vaex to tell me what I'm doing wrong!
My code currently calls the unique() method on my large Vaex dataframe to get a list of samples, and loops through that list of unique samples. On each loop, it uses the sample number to make an expression representing that sample (so: df[df["sample"] == i]) and uses unique() on that subset to get a list of subsamples. Then, it uses another for-loop to repeat that process, creating an expression for the subsample and getting the statistical results for that subsample. This isn't the exact code but, in concept, it works like the code block below:
means = {}
list_of_samples = df["sample"].unique()
for sample_number in list_of_samples:
    sample = df[ df["sample"] == sample_number ]
    list_of_subsamples = sample["subsample"].unique()
    means[sample_number] = {}
    for subsample_number in list_of_subsamples:
        subsample = sample[ sample["subsample"] == subsample_number ]
        means[sample_number][subsample_number] = subsample["value"].mean()
If I try to run this code, it hangs on the line means[sample_number][subsample_number] = subsample["value"].mean() and never completes (not within around an hour, at least), so something is clearly wrong there. To try and diagnose the issue, I have tested the mean function by itself, and in expressions without the looping and other stuff. If I run:
mean = df["value"].mean()
it successfully gives me the mean for the entire "value" column within about 45 seconds. However, if I instead run:
sample = df[ df["sample"] == 1 ]
subsample = sample[ sample["subsample"] == 1 ]
mean = subsample["value"].mean()
The program just hangs. I've left it for an hour and still not gotten a result!
How can I fix this, and what am I doing wrong, so I can avoid this mistake in the future? If my reading of some discussions regarding Vaex is correct, I think I might be able to fix this using Vaex "selections", but I've tried to read the documentation on those and can't wrap my head around how I would properly use them here. Any help from a more experienced Vaex user would be greatly appreciated!
Edit: In case anyone finds this in the future, I was able to fix it by using the groupby method. I'm still really curious what was going wrong here, but I'll have to wait until I have more time to investigate it.
Looping can be slow, especially if you have many groups; it's more efficient to rely on the built-in grouping:
import vaex
df = vaex.example()
df.groupby(by='id', agg="mean")
# for more complex cases, can use by=['sample', 'sub_sample']
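Mapped onto the column names from the question ('sample', 'subsample', 'value'), a hedged sketch could look like this (the exact agg syntax can vary between vaex versions):
import vaex
# df is the dataframe from the question; one row per (sample, subsample) pair
# with the mean of 'value' for that pair, computed in a single pass.
stats = df.groupby(
    by=['sample', 'subsample'],
    agg={'value_mean': vaex.agg.mean('value')},
)
print(stats)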
The whole objective behind asking this question stems from trying to multi-process the creation of Linear Constraint Equations (http://abaqus.software.polimi.it/v6.14/books/usb/default.htm?startat=pt08ch35s02aus129.html#usb-cni-pequation) in Abaqus/CAE for applying periodic boundary conditions to a meshed model. Since my model has over a million elements and I need to perform a Monte Carlo simulation of 1000 such models, I would like to parallelize the procedure for which I haven't found a solution due to the licensing and multi-threading restrictions associated with Abaqus/CAE. Some discussions on this here: Python multiprocessing from Abaqus/CAE
I am currently trying to define the equations outside of Abaqus using the node sets already created, since I know the syntax of the *Equation keyword in the input file.
** Constraint: <name>
*Equation
<number of terms>
<set1>, <dof>, <coefficient1>
<set2>, <dof>, <coefficient2>
<set3>, <dof>, <coefficient3>
e.g.
** Constraint: Corner_c1_Constraint-1-pair1
*Equation
3
All-1.c1_Node-1, 1, 1.
All-1.c5_Node-1, 1, -1.
RefPoint-3.SetRefPoint3, 1, -1.
Instead of writing these lines directly into the .inp file, I can also write them to a separate file and link it to the model's .inp file using
*EQUATION, INPUT=file_name
I am looking for the Abaqus Python command to write a keyword such as above to the .inp file instead of specifying the Equation constraints themselves.
The User's Guide linked above explains how to specify this via the GUI, but I haven't been able to do that in my version, Abaqus/CAE 2018:
Abaqus/CAE Usage:
Interaction module: Create Constraint: Equation: click mouse button 3 while holding the cursor over the data table, and select Read from File.
So I am looking for a command from the Scripting Reference manual to do this instead. There are commands to parse input files (http://abaqus.software.polimi.it/v6.14/books/ker/pt01ch24.html), but nothing to directly write keywords to the input file. I know I can hard-code this into the input file, but the sheer number of simulations that I would like to perform calls for every bit of automation possible. I have already tried optimising the code using appropriate algorithms and numpy arrays, yet the pre-processing itself takes hours for one single model.
P.S. This is my first post on SO, so I'm not sure if this question is phrased in the appropriate format. I would appreciate any answers to the actual question, or any other solutions that achieve the intended outcome of parallelizing the pre-processing steps in Abaqus/CAE.
You are looking for the KeywordBlock object. This allows you to write anything to the input file (keywords, comments, etc) as a formatted string. I believe this approach is much less error-prone than alternatives that require you to programmatically write an input file (or update an existing one) on your own.
The KeywordBlock object is created when a Part Instance is created at the Assembly level. It stores everything you do in the CAE with the corresponding keywords.
Note that any changes you make to the KeywordBlock object will be synced to the Job input file when it is written, but will not update the CAE Model database. So for example, if you use the KeywordBlock to store keywords for constraint equations, the constraint definitions will not show up in the CAE model tree. The rest of the Mdb does not know they exist.
As you know, keywords must be written to the appropriate section of an input file. For example, the *equation keyword may be defined at the Part, Part Instance, or Assembly level (see the Keywords Reference Manual). This must also be considered when you store keywords in the KeywordBlock object (it will not auto-magically put your keywords in the right spot, unfortunately!). A side effect is that it is not always safe to write to the KeywordBlock - that is, whenever it may conflict with subsequent changes made via the CAE GUI. I believe that the Abaqus docs recommend that you add your keywords as a last step.
So basically, read the KeywordBlock object, find the right place, and insert your new keyword. Here's an example snippet to get you started:
partname = "Part-1"
model = mdb.models["Model-1"]
modelkwb = model.keywordBlock
assembly = model.rootAssembly
if assembly.isOutOfDate:
assembly.regenerate()
# Synch edits to modelkwb with those made in the model. We don't need
# access to *nodes and *elements as they would appear in the inp file,
# so set the storeNodesAndElements arg to False.
modelkwb.synchVersions(storeNodesAndElements=False)
# Search the modelkwb for the desired insertion point. In this example,
# we are looking for a line that indicates we are beginning the Part-Level
# block for the specific Part we are interested in. If it is found, we
# break the loop, storing the line number, and then write our keywords
# using the insert method (which actually inserts just below the specified
# line number, fyi).
line_num = 0
for n, line in enumerate(modelkwb.sieBlocks):
if line.replace(" ","").lower() == "*part,name={0}".format(partname.lower()):
line_num = n
break
if line_num:
kwds = "your keyword as a string here (may be multiple lines)..."
modelkwb.insert(position=line_num, text=kwds)
else:
e = ("Error: Part '{}' was not found".format(partname),
"in the Model KeywordBlock.")
raise Exception(" ".join(e))
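For the *EQUATION, INPUT= use case from the question, the inserted string might look something like this (the file name is a placeholder, and line_num should point at the level where the equations are allowed to live, per the note above):
# Hypothetical keyword string referencing an external equation file;
# modelkwb and line_num come from the snippet above.
kwds = "*EQUATION, INPUT=constraint_equations.inp"
modelkwb.insert(position=line_num, text=kwds)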
I would like to calculate some hash value of an h2o.frame.H2OFrame. Ideally, in both R and python. My understanding of h2o.frame.H2OFrame is that these objects basically "live" on the h2o server (i.e., are represented by some Java objects) and not within R or python from where they might have been uploaded.
I want to calculate the hash value "as close as possible" to the actual training algorithm. That rules out calculation of the hash value on (serializations of) the underlying R or python objects, as well as on any underlying files from where the data was loaded.
The reason for this is that I want to capture all (possible) changes that h2o's upload functions perform on the underlying data.
Inferring from the h2o docs, there is no hash-like functionality exposed through h2o.frame.H2OFrame.
One possibility to achieve a hash-like summary of the h2o data is through summing over all numerical columns and doing something similar for categorical columns. However, I would really like to have some avalanche effect in my hash function so that small changes in the function input result in large differences of the output. This requirement rules out simple sums and the like.
Is there already some interface which I might have overlooked?
If not, how could I achieve the task described above?
import h2o
h2o.init()
iris_df=h2o.upload_file(path="~/iris.csv")
# what I would like to achieve
iris_df.hash()
# >>> ab2132nfqf3rf37
# ab2132nfqf3rf37 is the (made up) hash value of iris_df
Thank you for your help.
It is available in the REST API (the frames endpoint returns a checksum field); you can probably get to it from the H2OFrame object in Python as well, but it is not directly exposed.
So here is a complete solution in Python, based on Michal Kurka's and Tom Kraljevic's suggestions:
import h2o
import requests
import json
h2o.init()
iris_df = h2o.upload_file(path="~/iris.csv")
apiEndpoint = "http://127.0.0.1:54321/3/Frames/"
res = json.loads(requests.get(apiEndpoint + iris_df.frame_id).text)
print("Checksum 1: ", res["frames"][0]["checksum"])
#change a bit
iris_df[0, 1] = iris_df[0, 1] + 1e-3
res = json.loads(requests.get(apiEndpoint + iris_df.frame_id).text)
print("Checksum 2: ", res["frames"][0]["checksum"])
h2o.cluster().shutdown()
This gives
Checksum 1: 8858396055714143663
Checksum 2: -4953793257165767052
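As a possible simplification, the same endpoint can be queried through the h2o client itself, assuming the h2o.api helper available in recent h2o-py releases:
# Query the frames endpoint without the requests/json imports;
# iris_df is the frame uploaded above.
res = h2o.api("GET /3/Frames/%s" % iris_df.frame_id)
print("Checksum:", res["frames"][0]["checksum"])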
Thanks for your help!
I just moved from Matlab to Python recently so I can use SimpleITK, and sorry if this is a dumb question.
I have a transformation tx after demons registration using simpleitk. I wish to get the displacement field and its inverse by doing the following,
disp_field = tx.GetDisplacementField()
disp_field_inv = tx.GetInverseDisplacementField()
It turns out disp_field is exactly what I need --- an image volume of 256*256*176. But disp_field_inv is an empty array. Does anyone know why?
Then I tried the following,
disp_field_inv = sitk.InverseDisplacementField(disp_field, disp_field.GetSize(),
                                               disp_field.GetOrigin(), disp_field.GetSpacing(),
                                               subsamplingFactor=16)
But python is just running like forever. Does anybody know how to do it properly?
The following is the specification for calling the InvertDisplacementField procedural interface:
Image itk::simple::InvertDisplacementField (const Image & image1,
uint32_t maximumNumberOfIterations = 10u,
double maxErrorToleranceThreshold = 0.1,
double meanErrorToleranceThreshold = 0.001,
bool enforceBoundaryCondition = true)
So I think that passing
disp_field.GetSize(), disp_field.GetOrigin(), disp_field.GetSpacing(), subsamplingFactor=16
as parameters 2 to 5 means you are not passing what the interface expects.
Try just running disp_field_inv = sitk.InverseDisplacementField(disp_field)
and see if it iterates to a result!
For what it's worth after all these years, I just wanted to point out that the original question and the (so far only) answer by g.stevo mix up two different filters available in SimpleITK, namely:
sitk.InverseDisplacementField
sitk.InvertDisplacementField
Each of these procedural APIs and their respective image filters have different Execute function arguments.
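To make the distinction concrete, a hedged sketch contrasting the two calls (parameter values are only illustrative; disp_field is the displacement field image from the question):
import SimpleITK as sitk
# Filter 1: scattered-data interpolation of the inverse, as used in the question
# (arguments passed positionally, matching the question's call).
inv_a = sitk.InverseDisplacementField(disp_field, disp_field.GetSize(),
                                      disp_field.GetOrigin(), disp_field.GetSpacing(),
                                      subsamplingFactor=16)
# Filter 2: iterative fixed-point inversion, matching the signature quoted above.
inv_b = sitk.InvertDisplacementField(disp_field,
                                     maximumNumberOfIterations=10,
                                     maxErrorToleranceThreshold=0.1,
                                     meanErrorToleranceThreshold=0.001,
                                     enforceBoundaryCondition=True)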