Apologies if this is a simple question - I'm new to Python and numpy - but I'd be very grateful for your help.
I've got a 2D numpy array of data arranged in rows and columns, with the first column representing time, and subsequent columns showing values of different parameters at each point in time.
I want to read down a given column of data from top to bottom (i.e. time = 0 to time = number of rows), and test each element in that column in sequence to find the very first instance (and only the first instance) where the data values in that column meet given criteria.
This is different from testing 'all' or 'any' of the elements in a column at once, e.g. by iterating over indices generated with numpy's arange() function.
As a minimal working example in pseudocode, if my array is:
myarray =
[[1, 4 ....]
[2, 3 ....]
[3, 8 ....]
[4, 9 ....]....]
...where the first column is time, and the second column contains the values of data collected at each time point.
I want to be able to iterate over the rows in sequence from top to bottom and test:
threshold = 5
for row = 0 to number of rows:
    if data in [column 1, row] > threshold:
        print "The first time point at which the data exceed the threshold is at time = 3"
        break
What is the most Pythonic (i.e. efficient and intelligible) way of doing this?
Is it necessary to convert the array into a list before iterating & testing, or is it possible to sequentially iterate and test over the array directly?
Hope this makes some sort of sense...
Many thanks in anticipation
Dave
Try this code:
>>> myarray = [[1, 4], [2, 3], [3, 8], [4, 9]]
>>> threshold = 5
>>> for row in myarray:
...     if row[1] > threshold:  # test only the data column, not the time column
...         print("Row:", row, "Data:", row[1])
...         break
...
Row: [3, 8] Data: 8
>>>
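Since the question is tagged NumPy, a loop-free alternative is worth noting. This is only a sketch, assuming the data of interest sits in column 1 of a 2-D array: a boolean mask plus argmax gives the index of the first qualifying row.

```python
import numpy as np

myarray = np.array([[1, 4], [2, 3], [3, 8], [4, 9]])
threshold = 5

mask = myarray[:, 1] > threshold  # boolean mask over the data column
if mask.any():                    # guard: argmax on an all-False mask returns 0
    first = int(np.argmax(mask))  # index of the first True in the mask
    print("First time point exceeding the threshold: t =", myarray[first, 0])
```

Note that `argmax` on a boolean array returns the position of the first `True`, which is why the `mask.any()` guard is needed to distinguish "first match at row 0" from "no match at all".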
Given a list of numpy arrays, I want to go over each index across the corresponding arrays and keep a running count that depends on the element, storing the result in a single list.
The minimal runnable code example below showcases the problem.
import numpy as np
lst = [
np.array([1,0,1]),
np.array([1,1,1]),
np.array([2,2,1])
]
print(lst)
#Problem, take count of each corresponding index within list
#If the element == 2, subtract one. If it's 1, add 1. 0 has no value
#Make the output just one list with the calculated count for each index
#Expected output:
#lst = [1, 0, 3]
# 1 in index 0 because there are two 1s in each list's first index, but a 2 subtracts one from the count
# 0 in index 1 since there's one 1, which brings the count to 1, but a 2 subtracts one from the count
# 3 in index 2 since there are three 1s, adding 3 to the count
I have looked into from collections import defaultdict and from itertools import groupby, but I couldn't understand how to use that to do anything else but sort a nested list.
Apologies if this issue is within the two sources I provided above.
If all the inner arrays have the same length you could transform it into a 2-dimensional array to leverage numpy vectorization:
# transform into two dimensional array
arr = np.array(lst)
# set -1 where values are equal to 2
t = np.where(arr == 2, -1, arr)
# sum across the first axis (corresponding positions of the inner lists)
res = t.sum(axis=0).tolist()
print(res)
Output
[1, 0, 3]
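As a variation on the same idea (a sketch, not the only way): the mapping 0 → 0, 1 → +1, 2 → −1 can also be expressed as a lookup table, which generalizes cleanly if more codes are added later.

```python
import numpy as np

lst = [np.array([1, 0, 1]),
       np.array([1, 1, 1]),
       np.array([2, 2, 1])]

# weights[v] is the contribution of value v: 0 -> 0, 1 -> +1, 2 -> -1
weights = np.array([0, 1, -1])
res = weights[np.array(lst)].sum(axis=0).tolist()
print(res)  # [1, 0, 3]
```

Here `weights[np.array(lst)]` uses fancy indexing to replace every value by its weight in one step before summing down the columns.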
I'm trying to make a script where the input is an array of random numbers. Deleting the lowest number in the array is no problem. But if this number occurs several times in the array, how do I make sure that only its first occurrence gets deleted?
Let's say we have the following array:
a = np.array([2,6,2,1,6,1,9])
Here the lowest number is 1, but since it occurs two times, I only want to remove the first occurrence so I get the following array as a result:
a = np.array([2,6,2,6,1,9])
Since you're using NumPy, not native Python lists:
a = np.array([2,6,2,1,6,1,9])
a = np.delete(a, a.argmin())
print(a)
# [2 6 2 6 1 9]
np.delete: Return a new array with sub-arrays along an axis deleted.
np.argmin: Returns the indices of the minimum values along an axis.
With a NumPy array, you cannot delete elements with del as you can in a list.
A simple way to do this with a native Python list is:
>> a = [1,2,3,4,1,2,1]
>> del a[a.index(min(a))]
>> a
[2, 3, 4, 1, 2, 1]
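An equivalent sketch for the NumPy case using a boolean mask instead of np.delete, in case explicit indexing reads more clearly:

```python
import numpy as np

a = np.array([2, 6, 2, 1, 6, 1, 9])

# Build a mask that is False only at the first index of the minimum
mask = np.ones(a.size, dtype=bool)
mask[a.argmin()] = False   # argmin returns the first index of the minimum
result = a[mask]
print(result)  # [2 6 2 6 1 9]
```

This relies on the documented behavior that `argmin` returns the first occurrence when the minimum appears multiple times.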
You can simply do two things: first sort, then shift the array. For example (JavaScript):
var list = [2, 1, 4, 5, 1];
list.sort(function (a, b) { return a - b; }); // numeric sort in place: [1, 1, 2, 4, 5]
list.shift(); // removes the first element; list is now [1, 2, 4, 5]
Note that shift() returns the removed element, not the array, so don't reassign its result. Also note that sorting discards the original order of the elements.
I am trying to select sparse matrix elements row-wise based on BFS output array. Suppose my BFS output is
[1, 2, 3, 6, 4, 7, 5, 8, 11, 9, 12, 10, 13, 15, 14, 16, 17, 18, 19, 20]
and I have a sparse matrix of 20x20 for example.
Now I want to use the BFS output as row indices, select the nonzero values from the sparse matrix in the same order as the BFS output array, and plot them. Here is my code; it does part of the job, but not exactly what I wanted.
import numpy as np
import matplotlib.pyplot as plt

a = np.loadtxt('sparsematrix.txt', float, delimiter=',')  # import data
y = np.reshape(a, np.size(a))
pos = np.delete(y, np.arange(0, y.size, 19))
plt.plot(pos)
plt.xlabel('sample')
plt.ylabel('position')
Problems with the above code:
1. It selects every value row-wise, but not in the defined order of my BFS output. (It should use the BFS output array as the row index to select nonzero values one by one.)
2. It selects all the values, even zeros. How do I get only nonzero values?
3. Indexing starts from 0 and goes to 19. I want indexing to start from 1 onwards.
Important Update
Now I get what you completely wanted by reading Christian's answer. I made another function to complement the one already given. See the whole program here:

sparseMatrix = ([0,4,5,0], [2,0,4,0], [0,3,3,0], [6,6,0,0])
iList = [3,1,2,4]

def AnalyzeSparseMatrix(sMatrix, iList):
    orderedArray = []  # the array you want as output
    for i in iList:
        # add the non-zero entries of the selected row (1-based index)
        orderedArray += AnalyzeRowWise(sMatrix[i - 1])
    return orderedArray  # non-zero values ordered by the BFS output list

def AnalyzeRowWise(oldArray):
    newMatrix = []
    # analyze row-wise: keep only the non-zero entries
    for data in oldArray:
        if data != 0:
            newMatrix.append(data)
    return newMatrix

# Test program
print(AnalyzeSparseMatrix(sparseMatrix, iList))  # Output: [3, 3, 4, 5, 2, 4, 6, 6]
The new method AnalyzeSparseMatrix() takes two arguments, the first argument is the sparse matrix, the second argument is the BFS output list. The method returns a list, which is the desired list. So you can assign that list to another list you want, for example:
finalOrderedList = AnalyzeSparseMatrix( sparseMatrix, iList )
Find more details about what almost every line of code does, in the code above.
I've assumed that this is what you want:
bfs_output = list of row indexes where 1 is the first/top row of a matrix.
matrix m = some matrix with elements that can be 0 or non-zero
list l = a list composed of non-zero elements chosen from m
Elements from m are chosen as follows:
1. Select the row, r, in m indicated by the first/next value in bfs_output.
2. Starting from the first column in r, select the non-zero elements of r.
3. Append the elements chosen in 2 to l.
4. Repeat until there are no more row indexes in bfs_output.
For example:
bfs_output = [2 3 1]

             [0 3 1]
    matrix = [0 2 0]   ==>   list = [2 4 3 1]
             [4 0 0]
I am not sure if this is what you are after. But if it is, we can use NumPy's built-in selection functions to select non-zero elements from a NumPy array and also to choose rows in the order we want.
from io import StringIO
import numpy as np
bfs_output = [2,3,1]
file = StringIO(u"0,3,1\n0,2,0\n4,0,0")
matrix = np.loadtxt(file, delimiter=",")
# we are subtracting 1 from all elements in bfs_output
# in order to comply with indexes starting from 1
select_rows = matrix[np.subtract(bfs_output,1)]
select_rows_1d = np.reshape(select_rows,np.size(select_rows))
result = select_rows_1d[select_rows_1d != 0]  # renamed to avoid shadowing the built-in list
print(result)  # output = [2. 4. 3. 1.]
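If you prefer to keep the row-by-row structure explicit, the same selection can be sketched as a comprehension over the BFS indices (assuming, as above, that the indices are 1-based):

```python
import numpy as np

matrix = np.array([[0, 3, 1],
                   [0, 2, 0],
                   [4, 0, 0]], dtype=float)
bfs_output = [2, 3, 1]  # 1-based row indices

# For each BFS row, keep its nonzero entries in column order, then concatenate
rows = [matrix[i - 1][matrix[i - 1] != 0] for i in bfs_output]
result = np.concatenate(rows)
print(result)  # [2. 4. 3. 1.]
```

This makes the per-row boundaries visible in `rows` before flattening, which can help when debugging the ordering.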
I have a Pandas series (it could equally be a list, this is not very important) of lists, each containing positive and negative numbers (to simplify; these could also be letters or words),
such as
0 [12,-13,0,6]
1 [2,-3,8,233]
2 [0,6,8,3]
For each of these, I want to fill a row in a three-column data frame with a list of all positive values, a list of all negative values, and a list of all values within some interval. Such as:
[[12,6],[-13],[0,6]]
[[2,8,233],[-3],[2,8]]
[[6,8,3],[],[6,8,3]]
What I first thought of was using a list comprehension to generate a list of triads of lists, which would then be converted with pd.DataFrame to the right form.
This was because I don't want to loop over the list of lists 3 times, applying a different selection heuristic each time; that feels slow and dull.
But the problem is that I can't actually generate the lists of the triad [[positive], [negative], [interval]] properly.
I was using a syntax like
[[[positivelist.extend(number)],[negativelist], [intervalist.extend(number)]]\
for listofnumbers in listoflists for number in listofnumbers\
if number>0 else [positivelist],[negativelist.extend(number)], [intervalist.extend(number)]]
but let's be honest, this is unreadable, and anyway it doesn't do what I want since extend yields None.
So how could I go about that without looping three times? (I could have many millions of elements in the list of lists, and in the sublists, and I might want to apply more complex formulae to these numbers too; this is a first approach.)
I thought about using functional programming, map/lambda; but it is unpythonic. The catch is: what in Python may help to do it right?
My guess would be something as:
newlistoflists = []
for sublist in listoflists:
    positive = []
    negative = []
    interval = []
    for element in sublist:
        if element > 0:
            positive.append(element)
        if element < 0:
            negative.append(element)
        if n < element < m:
            interval.append(element)
    triad = [positive, negative, interval]
    newlistoflists.append(triad)
what do you think?
You can do:
import numpy
l = [[12,-13,0,6], [2,-3,8,233], [0,6,8,3]]
l = numpy.array([x for e in l for x in e])
positive = l[l>0]
negative = l[l<0]
n,m = 1,5
interval = l[((l>n) & (l<m))]
print(positive, negative, interval)
Output: [ 12 6 2 8 233 6 8 3] [-13 -3] [2 3]
Edit: Triad version:
import numpy
l = numpy.array([[12,-13,0,6], [2,-3,8,233], [0,6,8,3]])
n,m = 1,5
triad = numpy.array([[e[e>0], e[e<0], e[((e>n) & (e<m))]] for e in l], dtype=object)
print(triad)
Output:
[[array([12, 6]) array([-13]) array([], dtype=int64)]
[array([ 2, 8, 233]) array([-3]) array([2])]
[array([6, 8, 3]) array([], dtype=int64) array([3])]]
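On recent NumPy versions, building a ragged object array with numpy.array can raise an error unless dtype=object is given; a plain list of triads (a sketch of the same idea) sidesteps the issue entirely:

```python
import numpy as np

l = np.array([[12, -13, 0, 6], [2, -3, 8, 233], [0, 6, 8, 3]])
n, m = 1, 5

# One boolean-indexing pass per row; store the triads in a plain list
triads = [[e[e > 0].tolist(), e[e < 0].tolist(), e[(e > n) & (e < m)].tolist()]
          for e in l]
print(triads)
# [[[12, 6], [-13], []], [[2, 8, 233], [-3], [2]], [[6, 8, 3], [], [3]]]
```

A list of plain Python lists is also easier to hand directly to pd.DataFrame, which is what the question ultimately wants.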
I have 3 numpy recarrays with following structure.
The first column is some position (Integer) and the second column is a score (Float).
Input:
a = array([(1, 5.41), (2, 5.42), (3, 12.32)],
          dtype=[('position', '<i4'), ('score', '<f4')])

b = array([(3, 8.41), (6, 7.42), (4, 6.32)],
          dtype=[('position', '<i4'), ('score', '<f4')])

c = array([(3, 7.41), (7, 6.42), (1, 5.32)],
          dtype=[('position', '<i4'), ('score', '<f4')])
All 3 arrays contain the same amount of elements.
I am looking for an efficient way to combine these three 2d arrays into one array based on the position column.
The output arary for the example above should look like this:
Output:
output = array([(3, 12.32, 8.41, 7.41)],
               dtype=[('position', '<i4'), ('score1', '<f4'), ('score2', '<f4'), ('score3', '<f4')])
Only the row with position 3 is in the output array because this position appears in all 3 input arrays.
Update: My naive approach would be the following steps:
1. Create a vector from the first column of each of my 3 input arrays.
2. Use intersect1d to get the intersection of these 3 vectors.
3. Somehow retrieve the indices of the intersection values in all 3 input arrays.
4. Create a new array with the filtered rows from the 3 input arrays.
Update2:
Each position value can be in one, two or all three input arrays. In my output array I only want to include rows for position values which appear in all 3 input arrays.
Here is one approach; I believe it should be reasonably fast. I think the first thing you want to do is count the number of occurrences of each position. This function will handle that:
def count_positions(positions):
    positions = np.sort(positions)
    diff = np.ones(len(positions), 'bool')
    diff[:-1] = positions[1:] != positions[:-1]
    count = diff.nonzero()[0]
    count[1:] = count[1:] - count[:-1]
    count[0] += 1
    uniqPositions = positions[diff]
    return uniqPositions, count
Now, using the function from above, take only the positions that occur 3 times:
positions = np.concatenate((a['position'], b['position'], c['position']))
uinqPos, count = count_positions(positions)
uinqPos = uinqPos[count == 3]
We will be using searchsorted, so we sort a, b, and c:
a.sort(order='position')
b.sort(order='position')
c.sort(order='position')
Now we can use searchsorted to find where each of our uinqPos values lies in each array:
new_array = np.empty((len(uinqPos), 4))
new_array[:, 0] = uinqPos
index = a['position'].searchsorted(uinqPos)
new_array[:, 1] = a['score'][index]
index = b['position'].searchsorted(uinqPos)
new_array[:, 2] = b['score'][index]
index = c['position'].searchsorted(uinqPos)
new_array[:, 3] = c['score'][index]
There might be a more elegant solution using dictionaries, but I thought of this one first so I'll leave that to someone else.
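The naive plan from the question (intersect1d on the position columns) can also be sketched directly; this assumes each position occurs at most once per input array:

```python
import numpy as np

dt = [('position', '<i4'), ('score', '<f4')]
a = np.array([(1, 5.41), (2, 5.42), (3, 12.32)], dtype=dt)
b = np.array([(3, 8.41), (6, 7.42), (4, 6.32)], dtype=dt)
c = np.array([(3, 7.41), (7, 6.42), (1, 5.32)], dtype=dt)

# Positions present in all three arrays
common = np.intersect1d(np.intersect1d(a['position'], b['position']),
                        c['position'])

# Sort so searchsorted can locate each common position
a.sort(order='position')
b.sort(order='position')
c.sort(order='position')

out = np.empty((len(common), 4))
out[:, 0] = common
out[:, 1] = a['score'][a['position'].searchsorted(common)]
out[:, 2] = b['score'][b['position'].searchsorted(common)]
out[:, 3] = c['score'][c['position'].searchsorted(common)]
print(out)
```

Note the scores are stored as float32 ('<f4'), so the printed values may show small rounding artifacts when copied into the float64 output array.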