map function on multi-dimensional array - Python

I have a large 4D array (time,height,latitude,longitude) of float values. I want to efficiently force any values in the array that are greater than 100.0 to be 100.0. I think the map function (+lambda?) can do this, but I'm stuck. Currently I have a crude for loop that goes through each index, but this is taking much too long!
Thanks for your help in advance!
Solution: numpy.clip(array, 0, 100.0)

In order to be efficient, you should probably be using NumPy.
With NumPy you get space-efficient multidimensional arrays and a ready-to-use solution to your problem.

I have a large 4D array (time,height,latitude,longitude) of float values. I want to efficiently
stop.
Use numpy.
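A minimal sketch of the clip approach on a small 4D array (the shape and values here are made up for illustration):

```python
import numpy as np

# Hypothetical small 4D array (time, height, latitude, longitude)
data = np.array([[[[50.0, 150.0], [99.0, 101.0]]]])

# Cap every value at 100.0; None means "no lower bound"
capped = np.clip(data, None, 100.0)

# Or clip in place, avoiding a copy of a large array
np.clip(data, None, 100.0, out=data)
```

The in-place form matters for a large 4D array, since it avoids allocating a second array of the same size.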

Related

What is the best way to store a non-rectangular array?

I would like to store a non-rectangular array in Python. The array has millions of elements and I will be applying a function to each element in the array, so I am concerned about performance. What data structure should I use? Should I use a Python list or a numpy array of type object? Is there another data structure that would work even better?
You can use the dictionary data structure to store everything. If you have ample memory, dictionaries are a good option; hashing makes lookups fast.
I'd suggest you to use scipy sparse matrices.
Update: some elaboration below.
I assume that "non-rectangular" implies there would be empty elements in a plain 2D array. With millions of elements, these 'holes' become a tax on memory usage. A sparse matrix provides the familiar array interface while occupying only the necessary amount of memory.
Though if array-like indexing is not required, a dictionary is a perfectly fine storage choice.
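A minimal sketch of the dictionary approach, with made-up ragged data: keys are (row, col) tuples, so missing cells simply have no entry, and applying a function to every element is a single comprehension.

```python
# Hypothetical ragged data: rows have different lengths
ragged = {
    (0, 0): 1.0, (0, 1): 2.0, (0, 2): 3.0,
    (1, 0): 4.0,
    (2, 0): 5.0, (2, 1): 6.0,
}

# Apply a function to every stored element; holes cost nothing
squared = {key: value ** 2 for key, value in ragged.items()}
```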

How can I change the data type to run the program faster?

I am writing a genetic algorithm and I represent the population as a list of 50 chromosomes, where every chromosome is a list of 20 bit strings. To illustrate, here is an example:
[['111000110','110000110','011000110','001000110'],
['111001010','111110110','111001110','111000100']]
The operations on strings are time-consuming and I want to find another way to represent the population. Any suggestions? Thank you!
You can use NumPy arrays. They are faster than Python lists. Since you are so concerned with speed, NumPy is probably the best option for you.
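A hedged sketch of one way to do this: store the whole population as a 3D uint8 array of bits (population x gene x bit), so operations like mutation become vectorized instead of string manipulation. The shapes and mutation rate below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Population of 50 chromosomes, each 20 genes of 9 bits, as 0/1 bytes
population = rng.integers(0, 2, size=(50, 20, 9), dtype=np.uint8)

# Vectorized mutation: flip each bit with probability 0.01
mask = rng.random(population.shape) < 0.01
population ^= mask.astype(np.uint8)

# Convert one gene back to a string only when needed for display
gene_str = "".join(map(str, population[0, 0]))
```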

Fastest way to sort 2d list by a column in reverse order

Let's say I have a rectangular (i.e. matrix-like) 2D Python list which I want to sort in descending order based on its 2nd column, in place rather than as a copy. What is the best alternative (using NumPy, ...) to this approach:
arr.sort(key=lambda i: i[1], reverse=True)
I've searched a lot but couldn't find an intuitive way that seemed better than the code above.
Any help would be highly appreciated.
You have to give some more information. A 2D list can be something other than a rectangular 2D NumPy array.
In this answer I assume that your data can be represented as a 2D array.
import numpy as np
np_arr = np.array(arr)           # create a NumPy array from your list
idx = np.argsort(-np_arr[:, 1])  # indices that sort column 1 in descending order
np_arr = np_arr[idx, :]          # reorder the rows (this produces a copy, not an in-place sort)
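For comparison, the plain-list approach from the question does sort in place, while the NumPy route above reorders into a new array. A quick check on made-up data:

```python
arr = [[1, 5], [2, 9], [3, 1]]

# In-place descending sort on column 1
arr.sort(key=lambda row: row[1], reverse=True)
# arr is now [[2, 9], [1, 5], [3, 1]]
```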

insertion of a list in additional dimension of numpy array

My first question; I have searched a lot!
I have an array in NumPy, like
myarray = np.zeros((rows, cols))
Then I have rows*cols one-dimensional NumPy arrays, all of the same length, say deep.
I would like to insert each of these one-dimensional arrays into myarray.
Expected result:
newarray.shape
(rows, cols, deep)
I use this in a bigger function, and the fact that I operate this way is due to a parallelization paradigm.
Thank you in advance.
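A hedged sketch of what this might look like (the names rows, cols, deep and the per-cell arrays are assumptions from the question): preallocate the 3D result with the extra dimension up front, then assign each 1D array into its (row, col) slot, which works regardless of the order the cells are filled in.

```python
import numpy as np

rows, cols, deep = 2, 3, 4

# Preallocate the target with the extra dimension up front
newarray = np.zeros((rows, cols, deep))

# Hypothetical per-cell 1D arrays, all of length deep
for r in range(rows):
    for c in range(cols):
        vec = np.full(deep, r * cols + c, dtype=float)  # stand-in data
        newarray[r, c, :] = vec
```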

Construct huge numpy array with pytables

I generate feature vectors for examples from a large amount of data, and I would like to store them incrementally while I am reading it. The feature vectors are NumPy arrays. I do not know the number of arrays in advance, and I would like to store/retrieve them incrementally.
Looking at pytables, I found two options:
Arrays: they require a predetermined size, and I am not quite sure how computationally efficient appending is.
Tables: the column types do not support lists or arrays.
If it is a plain numpy array, you should probably use Extendable Arrays (EArray) http://pytables.github.io/usersguide/libref/homogenous_storage.html#the-earray-class
If you have a numpy structured array, you should use a Table.
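A hedged sketch of the EArray route, assuming fixed-length float vectors of (made-up) length 8 and a temporary file path; the 0 in the shape marks the extendable dimension, so vectors can be appended as they are generated:

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical path and vector length for illustration
path = os.path.join(tempfile.mkdtemp(), "features.h5")
VEC_LEN = 8

with tables.open_file(path, mode="w") as h5:
    vectors = h5.create_earray(
        h5.root, "vectors",
        atom=tables.Float64Atom(),
        shape=(0, VEC_LEN),  # 0 marks the extendable dimension
    )
    for _ in range(3):
        vec = np.random.rand(1, VEC_LEN)  # one feature vector at a time
        vectors.append(vec)

# Reopen and read everything back
with tables.open_file(path, mode="r") as h5:
    stored = h5.root.vectors[:]
```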
Can't you just store them in a list? Your code should be a loop that grabs things from the data and generates each example; create a list outside the loop and append each vector to it for storage:
array = []
for row in file:
    # here is your code that creates the vector
    array.append(vector)
Then, after you have gone through the whole file, you have a list with all of your generated vectors. Hopefully that is what you need; you were a bit unclear. Next time, please provide some code.
Oh, and you did say you wanted PyTables, but I don't think it's necessary, especially because of the limitations you mentioned.
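Once the loop finishes, the list can be turned into a single 2D array in one step, assuming all vectors share the same length (the vectors below are made up for illustration):

```python
import numpy as np

# Hypothetical vectors collected during the loop
collected = [np.arange(4, dtype=float) for _ in range(3)]

# Stack the list of 1D arrays into one (n_examples, n_features) array
features = np.vstack(collected)
```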
