In Python 3 I am importing several data files in a loop, and I would like to store all the data in a single 2-dimensional array. I start with something like data = np.array([]) and on each iteration I want to add a new array datai = np.array([1,2,3]). How can I get my final array to look like this? [[1,2,3],[1,2,3],...,[1,2,3]]
I have tried np.append, np.concatenate, and np.stack, but none seem to work. An example of the code I'm trying:
data = np.array([])
for i in range(datalen):
    datai = *func to load data as array*
    data = np.append(data, datai)
but of course this returns a flattened array. Is there any way I can get back a 2-dimensional array of length datalen with each element being the array datai?
Thanks!
The fastest way would be vstack:
data = np.vstack([get_data() for i in range(datalen)])
vstack requires a sequence of arrays (a tuple or a list); passing a bare generator is deprecated in newer NumPy releases:
data = np.vstack((data1, data2, data3))
or you can do this by appending with axis=0
data = np.empty(shape=(0, 3))
data = np.append(data, datai.reshape((-1, 3)), axis=0) # -1 will make the rows automatic
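As a rough sketch of the two approaches above, the following stacks a list of rows in one call and, for comparison, grows an array with np.append along axis 0. The function load_data is a hypothetical stand-in for whatever loads one file; it is not part of the question.

```python
import numpy as np

def load_data(i):
    # Hypothetical loader: stands in for reading the i-th file.
    return np.array([1, 2, 3])

datalen = 4

# Approach 1: collect the rows in a list, then stack once.
rows = [load_data(i) for i in range(datalen)]
stacked = np.vstack(rows)

# Approach 2: start from an empty (0, 3) array and append along axis 0.
appended = np.empty(shape=(0, 3))
for i in range(datalen):
    appended = np.append(appended, load_data(i).reshape((-1, 3)), axis=0)

print(stacked.shape)   # (4, 3)
print(appended.shape)  # (4, 3)
```

Stacking once avoids copying the whole array on every iteration, which is what np.append does each time it is called.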
You can reshape your array using np.reshape like this
flattened_array = np.array([1,2,3,1,2,3,1,2,3])
wanted_array = np.reshape(flattened_array, (-1, 3))
This would result in
[[1, 2, 3],[1, 2, 3],[1, 2, 3]]
Solution 1 using list comprehensions:
data = []
datalen = 4
datai = range(1,4)
data = [list(datai) for _ in range(datalen)]
print (data)
Output
[[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]]
Solution 2 (just a bit lengthy)
data = []
datalen = 4
for i in range(datalen):
    datai = range(1,4)
    data.append(list(datai))
print (data)
with the same output as above. In the second method you can also just simply use data.append(list(range(1,4))). You can choose if you want to convert datai to list or not. If you want the final output as an array, you can just use np.array()
You can try this:
data = np.zeros(shape=(datalen, len(datai)))
for i in range(datalen):
    data[i] = datai
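A minimal sketch of this preallocation idea, assuming every file yields a row of the same length. The loader load_data is hypothetical and returns a different row per index only so the result is easy to check.

```python
import numpy as np

def load_data(i):
    # Hypothetical loader; the row depends on i purely for illustration.
    return np.array([i, i + 1, i + 2])

datalen = 4
first = load_data(0)

# Allocate the full result once, using the first row for width and dtype.
data = np.zeros(shape=(datalen, len(first)), dtype=first.dtype)
for i in range(datalen):
    data[i] = load_data(i)  # assigns into row i, no reallocation

print(data)
```

Taking the dtype from the first row keeps integer data as integers; np.zeros would otherwise default to float64.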
It's called numpy.tile.
From the official docs:
>>> c = np.array([1,2,3,4])
>>> np.tile(c,(3,1))
array([[1, 2, 3, 4],
       [1, 2, 3, 4],
       [1, 2, 3, 4]])
so, for your datai do np.tile(datai,(N_repeats,1))
Related
I want to copy a chunk from a matrix into a piece of another matrix.
To use this with any kind of n-dimensional array, I need to apply a list with offsets via the [] operator. Is there a way to do this?
mat_bigger[0:5, 0:5, ..] = mat_smaller[2:7, 2:7, ..]
like:
off_min = [0,0,0]
off_max = [2,2,2]
for i in range(len(off_min)):
    mat_bigger[off_min[i] : off_max[i], ..] = ..
You can do this by creating a tuple of slice objects. For example:
mat_big = np.zeros((4, 5, 6))
mat_small = np.random.rand(2, 2, 2)
off_min = [2, 3, 4]
off_max = [4, 5, 6]
slices = tuple(slice(start, end) for start, end in zip(off_min, off_max))
mat_big[slices] = mat_small
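The same tuple-of-slices idea can be applied to both sides of the assignment, which is closer to the mat_bigger[0:5, ...] = mat_smaller[2:7, ...] case in the question. This is only a sketch: copy_chunk and its parameter names are made up for illustration, and the shapes are arbitrary.

```python
import numpy as np

def copy_chunk(dst, src, dst_min, src_min, size):
    # Hypothetical helper: copy a block of the given size from src into dst.
    dst_slices = tuple(slice(o, o + n) for o, n in zip(dst_min, size))
    src_slices = tuple(slice(o, o + n) for o, n in zip(src_min, size))
    dst[dst_slices] = src[src_slices]

mat_bigger = np.zeros((6, 6, 6))
mat_smaller = np.arange(8 * 8 * 8, dtype=float).reshape(8, 8, 8)

copy_chunk(mat_bigger, mat_smaller,
           dst_min=[0, 0, 0], src_min=[2, 2, 2], size=[3, 3, 3])

# The copied block matches the source block.
print(np.array_equal(mat_bigger[0:3, 0:3, 0:3], mat_smaller[2:5, 2:5, 2:5]))
```

Because the slices are built from lists, the helper works for any number of dimensions as long as the three lists have one entry per axis.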
I have an array of arrays:
parameters = [np.array([ 2.1e-04, -8.3e-03, 9.8e-01]), np.array([ 5.5e-04, 1.2e-01, 9.9e-01]), ...]
whose length is:
print(len(parameters))
100
If we label the elements of parameters as parameters[i][j], it is then possible to access each number, e.g. print(parameters[1][2]) gives 0.99
I also have an array:
temperatures = [110.51, 1618.079, ...]
whose length is also 100:
print(len(temperatures))
100
Let the elements of temperatures be indexed by k.
I would like to append the kth element of temperatures to the kth array in parameters, in order to obtain final:
final = [np.array([ 2.1e-04, -8.3e-03, 9.8e-01, 110.51]), np.array([ 5.5e-04, 1.2e-01, 9.9e-01, 1618.079]), ...]
I have tried to make something like a zip loop:
for i,j in zip(parameters, valid_temperatures):
    final = parameters[2][i].append(valid_temperatures[j])
but this does not work. I would appreciate if you could help me.
EDIT: Based on @hpaulj's answer:
If you run Solution 1:
parameters = [np.array([ 2.1e-04, -8.3e-03, 9.8e-01]), np.array([ 5.5e-04, 1.2e-01, 9.9e-01])]
temperatures = [110.51, 1618.079]
for i,(arr,t) in enumerate(zip(parameters,temperatures)):
    parameters[i] = np.append(arr,t)
print(parameters)
It gives:
[array([ 2.10000000e-04, -8.30000000e-03, 9.80000000e-01,
1.10510000e+02]), array([ 5.50000000e-04, 1.20000000e-01, 9.90000000e-01,
1.61807900e+03])]
which is the desired output.
In addition, Solution 2:
parameters = [np.array([ 2.1e-04, -8.3e-03, 9.8e-01]), np.array([ 5.5e-04, 1.2e-01, 9.9e-01])]
temperatures = [110.51, 1618.079]
parameters = [np.append(arr,t) for arr, t in zip(parameters,temperatures)]
print(parameters)
also gives the desired output.
As opposed to Solution 1, Solution 2 doesn't use the ith enumerate index. Therefore, if I just split Solution 2's [np.append ... for arr ] syntax the following way:
parameters = [np.array([ 2.1e-04, -8.3e-03, 9.8e-01]), np.array([ 5.5e-04, 1.2e-01, 9.9e-01])]
temperatures = [110.51, 1618.079]
for arr, t in zip(parameters,temperatures):
    parameters = np.append(arr,t)
print(parameters)
The output contains only the last iteration, and not in an "array-format":
[ 5.50000000e-04 1.20000000e-01 9.90000000e-01 1.61807900e+03]
How would it be possible to make this work and print all the iterations?
Thanks
You have a list of arrays, plus another list or array:
In [656]: parameters = [np.array([1,2,3]) for _ in range(5)]
In [657]: temps=np.arange(5)
To combine them, just iterate through both (a list comprehension works fine for that) and perform a concatenate (array append) for each pair:
In [659]: [np.concatenate((arr,[t])) for arr, t in zip(parameters, temps)]
Out[659]:
[array([1, 2, 3, 0]),
 array([1, 2, 3, 1]),
 array([1, 2, 3, 2]),
 array([1, 2, 3, 3]),
 array([1, 2, 3, 4])]
Using append saves us two pairs of brackets; otherwise it is the same:
[np.append(arr,t) for arr, t in zip(parameters,temps)]
A clean 'in-place' version:
for i,(arr,t) in enumerate(zip(parameters,temps)):
    parameters[i] = np.append(arr,t)
================
If the subarrays are all the same length, you could turn parameters into a 2d array, and concatenate the temps:
In [663]: np.hstack((np.vstack(parameters),temps[:,None]))
Out[663]:
array([[1, 2, 3, 0],
       [1, 2, 3, 1],
       [1, 2, 3, 2],
       [1, 2, 3, 3],
       [1, 2, 3, 4]])
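Applied to the data from the question, the 2d approach might look like the sketch below. Note that temperatures there is a plain list, so it is converted with np.asarray before the [:, None] indexing; the names temps, final_2d and final are chosen here for illustration.

```python
import numpy as np

parameters = [np.array([2.1e-04, -8.3e-03, 9.8e-01]),
              np.array([5.5e-04, 1.2e-01, 9.9e-01])]
temperatures = [110.51, 1618.079]

# temperatures is a plain list, so convert it before adding a column axis.
temps = np.asarray(temperatures)
final_2d = np.hstack((np.vstack(parameters), temps[:, None]))

# Turn the rows back into a list of 1-D arrays, matching the requested format.
final = list(final_2d)
print(final_2d.shape)  # (2, 4)
```

This only works when all subarrays have the same length; for ragged data the list comprehension version is the one to use.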
I have a performance problem with replacing values of a list of arrays using a dictionary.
Let's say this is my dictionary:
# Create a sample dictionary
keys = [1, 2, 3, 4]
values = [5, 6, 7, 8]
dictionary = dict(zip(keys, values))
And this is my list of arrays:
# import numpy as np
# List of arrays
listvalues = []
arr1 = np.array([1, 3, 2])
arr2 = np.array([1, 1, 2, 4])
arr3 = np.array([4, 3, 2])
listvalues.append(arr1)
listvalues.append(arr2)
listvalues.append(arr3)
listvalues
>[array([1, 3, 2]), array([1, 1, 2, 4]), array([4, 3, 2])]
I then use the following function to replace all values in a nD numpy array using a dictionary:
# Replace function
def replace(arr, rep_dict):
    rep_keys, rep_vals = np.array(list(zip(*sorted(rep_dict.items()))))
    idces = np.digitize(arr, rep_keys, right=True)
    return rep_vals[idces]
This function is really fast, however I need to iterate over my list of arrays to apply this function to each array:
replaced = []
for i in range(len(listvalues)):
    replaced.append(replace(listvalues[i], dictionary))
This is the bottleneck of the process, as it needs to iterate over thousands of arrays.
How could I achieve the same result without using the for-loop? It is important that the result is in the same format as the input (a list of arrays with replaced values).
Many thanks guys!!
This will do the trick efficiently, using the numpy_indexed package. It can be further simplified if all values in 'listvalues' are guaranteed to be present in 'keys', but I'll leave that as an exercise to the reader.
import numpy_indexed as npi
arr = np.concatenate(listvalues)
idx = npi.indices(keys, arr, missing='mask')
remap = np.logical_not(idx.mask)
arr[remap] = np.array(values)[idx[remap]]
replaced = np.array_split(arr, np.cumsum([len(a) for a in listvalues][:-1]))
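The concatenate-then-split pattern also works without numpy_indexed if you reuse the replace function from the question. The sketch below assumes every value in listvalues is a key of the dictionary; the names flat and boundaries are introduced here for illustration.

```python
import numpy as np

def replace(arr, rep_dict):
    # Same idea as the question's function: look values up via sorted keys.
    rep_keys, rep_vals = np.array(list(zip(*sorted(rep_dict.items()))))
    idces = np.digitize(arr, rep_keys, right=True)
    return rep_vals[idces]

dictionary = {1: 5, 2: 6, 3: 7, 4: 8}
listvalues = [np.array([1, 3, 2]), np.array([1, 1, 2, 4]), np.array([4, 3, 2])]

# One replace call on the concatenated data instead of one call per array.
flat = replace(np.concatenate(listvalues), dictionary)

# Split back at the original array boundaries.
boundaries = np.cumsum([len(a) for a in listvalues])[:-1]
replaced = np.split(flat, boundaries)
print(replaced)
```

The Python-level loop is reduced to computing the lengths, so the per-array overhead of calling replace thousands of times goes away.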
I'm new to Python, but I use it to process data. I've got a large amount of data stored as float arrays:
data1
data2
data3
I want to run similar processing for each data file. I was thinking of using a for loop:
for i in range(1,4):
I would then like to multiply the three data files by two, but I'm not sure how to continue afterwards. I imagine it would look like this:
for i in range(1,4):
    data_i=data_i*2
Thank you.
You could make a two-dimensional array, meaning you put your float arrays inside another array.
Your situation right now would look like this:
data1 = [12, 2, 5]
data2 = [2, 4, 8]
data3 = [3, 0, 1]
By putting your arrays inside another array by doing this:
datax = [data1, data2, data3]
Your new situation would look like this:
datax = [[12, 2, 5], [2, 4, 8], [3, 0, 1]]
Now we can loop over the new datax array and perform an action on its elements, data1, data2 and data3.
Something along the lines of:
datax = [[12, 2, 5], [2, 4, 8], [3, 0, 1]]
for sub_array in datax:
    perform_action(sub_array)
You can simply store the data in, e.g., a list:
data_list = [data1, data2, data3]
for i, data in enumerate(data_list):
    # some fancy stuff
    data_list[i] = data * 2
Some explanation: enumerate yields each item of the list together with its index i, so data is data_list[i]. Then you can do whatever you want with data and its index i.
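Here is a small runnable sketch of that loop, assuming the data are NumPy float arrays as the question suggests. The values are made up and stand in for the real data1, data2 and data3.

```python
import numpy as np

# Made-up values standing in for the real data1, data2, data3.
data1 = np.array([12.0, 2.0, 5.0])
data2 = np.array([2.0, 4.0, 8.0])
data3 = np.array([3.0, 0.0, 1.0])

data_list = [data1, data2, data3]
for i, data in enumerate(data_list):
    data_list[i] = data * 2  # element-wise for NumPy arrays

print(data_list[0])
```

With plain Python lists instead of NumPy arrays, data * 2 would repeat the list rather than double each value, so a comprehension such as [x * 2 for x in data] would be needed there.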
You could also use a Python list comprehension instead of loops. Here is an example to illustrate it:
>>> a1 = [1,2,3]
>>> a2 = [4,5,6]
>>> a3 = [7,8,9]
>>> A = [a1, a2, a3]
>>>
>>> print([[x*2 for x in a] for a in A])
[[2, 4, 6], [8, 10, 12], [14, 16, 18]]
>>>
Let me explain it also.
The following construction is called comprehension:
a = [x*2 for x in X]
It produces a list (note the brackets [ ]) of processed values (in the example, each value multiplied by two) from the list X. It's as if we wrote:
a = []
for x in X:
    a.append(x*2)
but in a more concise way.
In your situation we used two comprehensions, one nested inside the other:
[x*2 for x in a]
and:
[ [....] for a in A]
So it's the same as if we did:
result = []
for a in A:
    row = []
    for x in a:
        row.append(x*2)
    result.append(row)
but in a more concise way again.
If you were asking about generating variable names such as data_i dynamically, plain Python syntax does not support that, and a list or dictionary is the usual substitute. But I guess you don't need it for this task at all.
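If the numbered names themselves matter, one common substitute for data_i style variables is a dictionary keyed by name. The sketch below uses made-up values and a hypothetical dictionary called datasets.

```python
# Hypothetical container: maps each name to its data.
datasets = {
    "data1": [12, 2, 5],
    "data2": [2, 4, 8],
    "data3": [3, 0, 1],
}

for i in range(1, 4):
    name = "data{}".format(i)  # builds "data1", "data2", "data3"
    datasets[name] = [x * 2 for x in datasets[name]]

print(datasets["data2"])  # [4, 8, 16]
```

This keeps the range(1, 4) loop from the question while avoiding any need to construct variable names at run time.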
I am trying to use array slicing to reverse part of a NumPy array. If my array is, for example,
a = np.array([1,2,3,4,5,6])
then I can get a slice b
b = a[::-1]
Which is a view on the original array. What I would like is a view that is partially reversed, for example
1,4,3,2,5,6
I have encountered performance problems with NumPy if you don't play along exactly with how it is designed, so I would like to avoid "fancy" indexing if it is possible.
If you don't like the off-by-one indices:
>>> a = np.array([1,2,3,4,5,6])
>>> a[1:4] = a[1:4][::-1]
>>> a
array([1, 4, 3, 2, 5, 6])
>>> a = np.array([1,2,3,4,5,6])
>>> a[1:4] = a[3:0:-1]
>>> a
array([1, 4, 3, 2, 5, 6])
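Regarding views: a reversed sub-slice is itself a view, even though the whole array in partially reversed order cannot be described by a single stride. The following sketch checks this with np.shares_memory; the name middle_reversed is chosen here for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

# Reversed slice of a slice: still a view, no data is copied.
middle_reversed = a[1:4][::-1]
print(np.shares_memory(a, middle_reversed))  # True

# Writing through the view changes the original array.
middle_reversed[0] = 40
print(a)
```

So you can work on the reversed middle section without fancy indexing, as long as you treat it as a separate view rather than as part of one combined array.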
You can use a permutation matrix (that's the numpiest way to partially reverse an array).
a = np.array([1,2,3,4,5,6])
new_order_for_index = [1,4,3,2,5,6] # Careful: index from 1 to n !
# Permutation matrix
m = np.zeros( (len(a),len(a)) )
for index, new_index in enumerate(new_order_for_index):
    m[index, new_index - 1] = 1
print(np.dot(m, a))
# [1. 4. 3. 2. 5. 6.] (a float array, because m is float)