I wish to store strings in a multidimensional array. I tried using the numpy package with the following line:
co_entity = np.zeros((5000,4))
However, I need to store strings later on, and this matrix cannot hold them since it stores floats/ints. I tried using a list to store the strings, but since the number of inputs is dynamic, I have to use a multidimensional array with an upper limit.
Any ideas for this?
You could try the object dtype with the empty() function, like so:
co_entity = np.empty((5000,4), dtype='object')
This will allow you to store a string in each of the elements generated.
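For example, a minimal sketch (the indices and values here are just placeholders):

import numpy as np

co_entity = np.empty((5000, 4), dtype='object')  # every cell starts as None
co_entity[0, 0] = 'first entity'                 # strings go in directly
co_entity[0, 3] = 42                             # any Python object works too
print(co_entity[0, 0])  # first entity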
arr = np.array([Myclass(np.random.random(100)) for _ in range(10000)])
Is there a way to save time in this statement by creating a numpy array of objects directly (avoiding the list construction which is costly)?
I need to create and process a large number of objects of class Myclass, where each object contains several ints, several floats, and a list (or tuple) of floats. The point of using the array of objects is to take advantage of numpy arrays' fast computation (e.g., column sums) on slices of it (and other things; the array on which slices are taken has each row made up of one Myclass object and other scalar fields). Other than using np.array as above, is there any other time-saving strategy in this case? Thanks.
Numpy needs to know the length of the array in advance because it must allocate enough memory in a block.
You can start with an empty array of the appropriate type using np.empty(10_000, object). (Beware that for most data types empty arrays may contain garbage data; it's usually safer to start with np.zeros() unless you really need the performance. dtype object, however, does get properly initialized to Nones.)
You can then apply any callable you like (such as a class) over all the values using np.vectorize. It's faster to use numpy's built-in vectorized functions when you can instead of wrapping your own callable, since vectorize basically has to call it for each element in a Python for loop. But sometimes you can't.
In the case of random numbers, you can create an array of samples of any shape you like using np.random.rand(). It would still have to be converted to a new array of dtype object when you apply your class to it, though. I'm not sure whether that's any faster than creating the samples in each __init__ (or whatever the callable is); you'd have to profile it.
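Putting those pieces together, a rough sketch (Myclass here is a stand-in for the class in the question; the signature argument makes vectorize pass one whole row of samples per call, and whether this beats the list comprehension is exactly what you'd have to profile):

import numpy as np

class Myclass:  # stand-in for the class in the question
    def __init__(self, values):
        self.values = values

arr = np.empty(10_000, dtype=object)   # object cells start as None

samples = np.random.rand(10_000, 100)  # draw all the samples at once
# signature='(n)->()' tells vectorize to consume one row per call
make = np.vectorize(Myclass, otypes=[object], signature='(n)->()')
arr[:] = make(samples)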
I am trying to preallocate a list in Python:
c = [1]*mM  # preallocate list
My problem is that I run into a MemoryError, since
mM=4999999950000000
What is the best way to deal with this? I am thinking about creating a new object where I split my list at around a value of 500000000.
Is this what I should do, or is there a best practice for creating an array with a lot of entries?
Using a Generator
You are attempting to create an object that you very likely will not be able to fit into your computer's memory. If you truly need to represent a list of that length, you can use a generator that dynamically produces values as they are needed.
def ones_generator(length):
    for _ in range(length):
        yield 1
gen = ones_generator(4999999950000000)
for i in gen:
    print(i)  # prints 1, a lot
Note: The question is tagged for Python 3, but if you are using Python 2.7, you will want to use xrange instead of range.
Using a Dictionary
By the sound of your question, you do not actually need to preallocate a list of that length; rather, you want to store values very sparsely at indexes that are very large. This pattern matches the dict type in Python more than the list. You can simply store values in a dictionary without pre-allocating the keys/space; Python handles that under the hood for you.
dct = {}
dct[100000] = "A string"
dct[592091] = 123
dct[4999999950000000] = "I promise, I need to be at this index"
print(dct[4999999950000000])
# I promise, I need to be at this index
In that example I just stored str and int values, but they can be any object in Python. The best part is that this dictionary will not consume memory based on the maximum index (like a list would), but instead based on how many values are stored within it.
I have a lot of arrays; each is 2D, but they have different sizes. I am looking for a good way to keep them in one variable. Their order is important. What do you recommend? Arrays? Dictionaries? Any ideas?
My problem:
I have a numpy array:
b=np.array([])
Now I want to add to it, e.g., the array:
a=np.array([0,1,2])
And later:
c=np.array([[0,1,2],[3,4,5]])
Etc
Result should be:
b=([0,1,2], [[0,1,2],[3,4,5]])
I don't know how to do this in numpy without initializing the size of the first array.
If the ordering is important, store them in a list (mylist = [array1, array2, ...]) - or, if you're not going to need to change or shuffle them around after creating the list, store them in a tuple (mylist = (array1, array2, ...)).
Both of these structures can store arbitrary object types (they don't care that your arrays are different sizes, or even that they are all the same kind of object at all) and both maintain a consistent ordering which can be accessed through mylist[0], mylist[1] etc. They will also appear in the correct order when you go through them using for an_array in mylist: etc.
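For instance, a minimal sketch using the arrays from the question:

import numpy as np

a = np.array([0, 1, 2])
c = np.array([[0, 1, 2], [3, 4, 5]])

b = [a]      # ordered container; no sizes declared up front
b.append(c)  # grows dynamically as new arrays arrive

for an_array in b:
    print(an_array.shape)  # (3,) then (2, 3)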
I want to pass a list of arrays (or a 2D array) such as [[1,2,3],[4,5,6]] from C to a Python script which computes and returns a list. What changes would be required to the embedding code in order to achieve this? The Python script to be executed is as follows:
abc.py
import math
def xyz(size, wss):
    result = [0 for i in range(size)]
    for i in range(size):
        wss_mag = math.sqrt(wss[i][0]*wss[i][0] + wss[i][1]*wss[i][1] + wss[i][2]*wss[i][2])
        result[i] = 1/wss_mag
    return result
Here size is the number of 1D arrays in wss (e.g. 2 in the case wss=[[1,2,3],[4,5,6]]). The question differs from the suggested duplicate in that it has to return a list back to C as a 1D array.
I think what you want to do is pass in some lists, have them converted to C arrays, and then converted back to lists before returning to Python.
The lists are received in C as pointers to PyObject, so to get the data out of them you'll have to use PyArg_ParseXX(YY), where XX and YY depend on what type of list object you had in Python and how you want it to be interpreted in C. This is where you would specify the shape information of the input lists and turn them into whatever shape you need for processing.
To return the arrays back to Python you'll have to look at the Python-C API, which gives methods for creating and manipulating Python objects in C. As others have suggested, using the numpy C API is also an option with many advantages. In that case, you can use PyArray_SimpleNew to create an array and populate it with your output.
I am building a Python application where I retrieve a list of objects and I want to plot them (for plotting I use matplotlib). Each object in the list contains two properties.
For example, let's say I have the list rawdata, and the objects stored in it have the properties timestamp and power:
rawdata[0].timestamp == 1
rawdata[1].timestamp == 2
rawdata[2].timestamp == 3
etc
rawdata[0].power == 1232.547
rawdata[1].power == 2525.423
rawdata[2].power == 1125.253
etc
I want to be able to plot the two dimensions that these properties represent, and I want to do it in a time- and space-efficient way. That means I want to avoid iterating over the list and sequentially constructing something like a numpy array out of it.
Is there a way to apply an on-the-fly transformation to the list? Or to somehow plot it as it is? Since all the information is already included in the list, I believe there should be a way.
The closest answer I found was this, but it includes sequential iteration over the list.
update
As pointed out by Antonio Ragagnin, I can use the map builtin function to construct a numpy array efficiently. But that also means I will have to create a second data structure. Can I use map to transform the list on the fly into a two-dimensional numpy array?
From the matplotlib tutorial (emphasis mine):
If matplotlib were limited to working with lists, it would be fairly useless for numeric processing. Generally, you will use numpy arrays. In fact, all sequences are converted to numpy arrays internally.
So you lose nothing by converting it to a numpy array: if you don't, matplotlib will do it for you anyway.
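For instance, np.fromiter consumes a generator expression directly, so no intermediate Python list is built (a minimal sketch; Record is a hypothetical stand-in for the objects described in the question):

import numpy as np
import matplotlib.pyplot as plt

class Record:  # hypothetical stand-in for the question's objects
    def __init__(self, timestamp, power):
        self.timestamp = timestamp
        self.power = power

rawdata = [Record(1, 1232.547), Record(2, 2525.423), Record(3, 1125.253)]

# count=len(rawdata) lets numpy preallocate the output in one block
timestamps = np.fromiter((obj.timestamp for obj in rawdata), dtype=float, count=len(rawdata))
powers = np.fromiter((obj.power for obj in rawdata), dtype=float, count=len(rawdata))

plt.plot(timestamps, powers)
plt.show()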