Removing values >0 in data set - python

I have a data set which is a list of lists, looking like this:
[[-0.519418066, -0.680905835],
[0.895518429, -0.654813183],
[0.092350219, 0.135117023],
[-0.299403315, -0.568458405],....]
Its shape is (9760,), and I am trying to remove every entry whose first number is greater than 0. In this example the 2nd and 3rd entries would be removed, leaving:
[[-0.519418066, -0.680905835],
[-0.299403315, -0.568458405],....]
So far I have written:
for x in range(9670):
    for j in filterfinal[j][0]:
        if filterfinal[j][0] > 0:
            np.delete(filterfinal[j])
this returns: TypeError: list indices must be integers or slices, not list
Thanks in advance for any help on this problem!

You can use numpy's boolean indexing:
>>> import numpy as np
>>> x = np.random.randn(10).reshape((5,2))
>>> x
array([[-0.46490993,  0.09064271],
       [ 1.01982349, -0.46011639],
       [-0.40474591, -1.91849573],
       [-0.69098115,  0.19680831],
       [ 2.00139248, -1.94348869]])
>>> x[x[:,0] > 0]  # keeps only the rows whose first value is > 0
array([[ 1.01982349, -0.46011639],
       [ 2.00139248, -1.94348869]])
Some explanation:
x[:,0] selects the first column of your array.
x > 0 returns a boolean array of the same shape as x, holding the result of the element-wise comparison (i.e., is each value > 0 or not?).
So x[:,0] > 0 gives you a 1-D boolean array of shape (n,), with one True or False per row depending on the first value of that row.
You can then pass this boolean array as an index to your original array, which returns only the rows where the mask is True. Because the mask has one entry per row, the selection happens per row. To remove the rows whose first value is greater than 0, as in the question, invert the condition: x[x[:,0] <= 0].
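As a minimal sketch of the same idea applied to the data from the question (assuming the list of lists has been converted to a 2-D numpy array; the names mask, kept and also_kept are only illustrative), the rows can be dropped either with an inverted mask or with np.delete, which returns a new array rather than modifying its argument:

```python
import numpy as np

# The four sample rows from the question, as a 2-D array.
filterfinal = np.array([[-0.519418066, -0.680905835],
                        [0.895518429, -0.654813183],
                        [0.092350219, 0.135117023],
                        [-0.299403315, -0.568458405]])

# One boolean per row: True where the first value is greater than 0.
mask = filterfinal[:, 0] > 0

# Keep the rows where the mask is False.
kept = filterfinal[~mask]

# Same result with np.delete: pass the row indices to drop and axis=0.
also_kept = np.delete(filterfinal, np.flatnonzero(mask), axis=0)

print(kept)
```

Both variants leave filterfinal itself untouched, so the result has to be assigned to a name.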

You are talking about "shape" and you mention np in your example code, so I assume that you are using numpy. In that case you can combine an element-wise comparison with boolean indexing:
import numpy as np
array = np.array([[-0.519418066, -0.680905835],
                  [0.895518429, -0.654813183],
                  [0.092350219, 0.135117023],
                  [-0.299403315, -0.568458405]])
filtered = array[array[:, 0] <= 0]  # <= 0 so that only values greater than 0 are removed

Use a list comprehension:
lol = [[-0.519418066, -0.680905835],[0.895518429, -0.654813183],[0.092350219, 0.135117023],[-0.299403315, -0.568458405]]
filtered_lol = [l for l in lol if l[0] <= 0]

You can use a list comprehension that unpacks the first item from each sub-list and retains only those with the first item <= 0 (assuming your list of lists is stored as variable l):
[[a, b] for a, b in l if a <= 0]

You can go through this in a for loop and make a new list without the positives, like so:
new_list = []
for item in old_list:
    if item[0] <= 0:
        new_list.append(item)
But I'd prefer to use the built-in filter function instead, if you are comfortable with it, and do something like:
def first_is_not_positive(item):
    # each item is a sub-list, so test its first element
    return item[0] <= 0
filtered_list = filter(first_is_not_positive, old_list)
This is similar to a list comprehension or a for loop. However, in Python 3 filter returns a lazy iterator rather than a list, so you never have to hold two lists in memory at once, which makes the code more memory-efficient. Wrap the result in list(...) if you need an actual list.
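To illustrate the lazy behaviour of filter in Python 3, here is a small sketch (the shortened sample values and the names rows, lazy, first, rest and leftover are made up for the illustration). The returned object produces matching items on demand and can only be consumed once:

```python
# Shortened sample rows; only the first number of each row is tested.
rows = [[-0.5, -0.6], [0.9, -0.6], [0.1, 0.1], [-0.3, -0.5]]

# filter does no work yet; it returns an iterator.
lazy = filter(lambda row: row[0] <= 0, rows)

first = next(lazy)      # pulls the first matching row
rest = list(lazy)       # pulls the remaining matching rows
leftover = list(lazy)   # the iterator is now exhausted

print(first, rest, leftover)
```

If the filtered data is needed more than once, converting it to a list straight away is usually the simpler choice.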

Related

Searching index position in python

cols = [2,4,6,8,10,12,14,16,18] # selected the columns i want to work with
df = pd.read_csv('mywork.csv')
df1 = df.iloc[:, cols]
b= np.array(df1)
b
outcome
b = [['WV5 6NY' 'RE4 9VU' 'BU4 N90' 'TU3 5RE' 'NE5 4F']
['SA8 7TA' 'BA31 0PO' 'DE3 2FP' 'LR98 4TS' 0]
['MN0 4NU' 'RF5 5FG' 'WA3 0MN' 'EA15 8RE' 'BE1 4RE']
['SB7 0ET' 'SA7 0SB' 'BT7 6NS' 'TA9 0LP' 'BA3 1OE']]
a = np.concatenate(b) #concatenated to get a single array, this worked well
a = np.array([x for x in a if x != 'nan'])
a = a[np.where(a != '0')] #removed the nan
print(np.sort(a)) # to sort alphabetically
#Sorted array
['BA3 1OE' 'BA31 0PO' 'BE1 4RE' 'BT7 6NS' 'BU4 N90'
 'DE3 2FP' 'EA15 8RE' 'LR98 4TS' 'MN0 4NU' 'NE5 4F' 'RE4 9VU'
 'RF5 5FG' 'SA7 0SB' 'SA8 7TA' 'SB7 0ET' 'TA9 0LP' 'TU3 5RE'
 'WA3 0MN' 'WV5 6NY']
#Find the index position of all elements of b in a(sorted array)
def findall_index(b, a )
    result = []
    for i in range(len(a)):
        for j in range(len(a[i])):
            if b[i][j] == a:
                result.append((i, j))
    return result
print(findall_index(0,result))
I am still very new to Python. I tried to find the index positions of all elements of b in a (the sorted array) above, but the code block above doesn't seem to give me any result. Can someone please help me?
Thank you in advance.
One way you could approach this is by zipping (creating pairs of) each element of b with its index and then sorting these pairs based on the elements only. Now you have a mapping between positions in the original array and positions in the sorted array: loop over the sorted pairs, and the position of each pair in the sorted order corresponds to the original index it carries.
I would strongly suggest that you code this yourself, since it will help you learn!
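For readers who want to check their own attempt, the pairing idea could be sketched roughly as follows. This is only one possible reading of the suggestion: it uses a shortened, made-up subset of the postcodes, treats the string "0" as the placeholder to skip, and the names pairs, sorted_values and position_in_sorted are hypothetical.

```python
# A small stand-in for the 2-D data in the question.
b = [["WV5 6NY", "RE4 9VU", "BU4 N90"],
     ["SA8 7TA", "BA31 0PO", "0"]]

# Pair every value with its (row, column) position, skipping the placeholder.
pairs = [(value, (i, j))
         for i, row in enumerate(b)
         for j, value in enumerate(row)
         if value != "0"]

# Sorting orders the pairs by value; each pair still carries its origin.
pairs.sort()

sorted_values = [value for value, _ in pairs]

# Map each original (row, column) position to its index in the sorted list.
position_in_sorted = {pos: k for k, (_, pos) in enumerate(pairs)}

print(sorted_values)
print(position_in_sorted)
```

Here position_in_sorted[(0, 0)] gives the place of b[0][0] in the sorted values, and iterating over pairs gives the reverse direction.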

Generating a list using another list and an index list

Suppose I have the following two lists and a smaller list of indices:
list1=[2,3,4,6,7]
list2=[0,0,0,0,0]
idx=[1,2]
I want to replace the values in list 2 using the values in list 1 at the specified indices.
I could do so using the following loop:
for i in idx:
    list2[i]=list1[i]
If I just have list1 and idx, how could I write a list comprehension to generate list2 (same length as list1) such that list2 has the values of list1 at the indices in idx and 0 elsewhere?
This calls __contains__ on idx once for every element, but should be reasonable for small(ish) lists.
list2 = [list1[i] if i in idx else 0 for i in range(len(list1))]
or
list2 = [e if i in idx else 0 for i, e in enumerate(list1)]
Also, do not write code like this. It is much less readable than your example. Furthermore, numpy may give you the kind of syntax you desire without sacrificing readability or speed.
import numpy as np
...
arr1 = np.array(list1)
arr2 = np.zeros_like(list1)
arr2[idx] = arr1[idx]
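If idx is long, the repeated membership test in the comprehension can be made cheaper by converting idx to a set first. A minimal sketch, using the lists from the question (idx_set is just an illustrative name):

```python
list1 = [2, 3, 4, 6, 7]
idx = [1, 2]

# Set membership is a hash lookup instead of a scan through the list.
idx_set = set(idx)

# Take the value from list1 at the chosen indices, 0 everywhere else.
list2 = [e if i in idx_set else 0 for i, e in enumerate(list1)]

print(list2)
```

For a handful of indices the difference is negligible, so the plain version is fine there.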
I assume that you want to generate list2 by appending the values of list1 at specific indexes. All you need to do is check whether the idx list contains any values and then use a for-each loop to append the corresponding list1 values to list2. If idx is empty, you would append only list1[0] to list2.
list2 = []  # start from an empty list
if len(idx) > 0:
    for i in idx:
        list2.append(list1[i])
else:
    list2.append(list1[0])

How to find max element index for a specific value in list composed of class objects?

I want to find the index of the maximum element in each row of a nested list.
I got this error:
maxe = array[i].index(max(e_object.x for e_object in array[i]))
ValueError: 5 is not in list
class e_object():
    def __init__(self,x):
        self.x = x
array = []
array.append( [])
array[0].append(e_object(0))
array[0].append(e_object(2))
array[0].append(e_object(-3))
array[0].append(e_object(5))
array.append( [])
array[1].append(e_object(0))
array[1].append(e_object(2))
array[1].append(e_object(8))
array[1].append(e_object(5))
max_array = []
for i in range(len(array)):
    maxe = array[i].index(max(e_object.x for e_object in array[i]))
    max_array.append(maxe)
print(max_array)
How can I get this result?
[3,2]
Use a list comprehension to convert the nested list into one of x values, and then use np.argmax:
import numpy as np
np.argmax([[element.x for element in row] for row in array], axis=1)
Output:
array([3, 2], dtype=int64)
The problem is line
maxe = array[i].index(max(e_object.x for e_object in array[i]))
You are asking for the index of an x value, but the list actually contains e_object instances. Change the max call so that it compares the objects by their x attribute and returns the object itself:
maxe = array[i].index(max(array[i], key=lambda o: o.x))
The error happens because your max effectively operates on a list of integers [0, 2, -3, 5], while index searches in a list of e_objects. There are any number of ways of fixing this issue.
The simplest is probably to just have max return the index:
max_array = [max(range(len(a)), key=lambda x: a[x].x) for a in array]
This is very similar to using numpy's argmax, but without the heavy import and intermediate memory allocations. Notice that this version does not require two passes over each list since you don't call index on the result of max.
A more long term solution would be to add the appropriate comparison methods, like __eq__ and __gt__/__lt__ to the e_object class.
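As a rough sketch of that longer-term option, assuming objects should compare purely by their x attribute: the class below is a hypothetical variant of e_object (renamed EObject here), and functools.total_ordering fills in the remaining comparison methods from __eq__ and __lt__.

```python
from functools import total_ordering


@total_ordering
class EObject:
    """Hypothetical variant of e_object that compares by its x attribute."""

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        if not isinstance(other, EObject):
            return NotImplemented
        return self.x == other.x

    def __lt__(self, other):
        if not isinstance(other, EObject):
            return NotImplemented
        return self.x < other.x


array = [[EObject(v) for v in (0, 2, -3, 5)],
         [EObject(v) for v in (0, 2, 8, 5)]]

# max() and list.index() now work directly on the objects.
max_array = [row.index(max(row)) for row in array]
print(max_array)
```

One design consequence worth knowing: defining __eq__ without __hash__ makes instances unhashable, so they can no longer be used as dictionary keys or set members unless __hash__ is defined as well.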

how to get a range of values from a list on python?

I'd like to do the following Matlab code:
indexes=find(data>0.5);
data2=data(indexes(1):indexes(length(indexes))+1);
in Python, so I did:
indexes=[x for x in data if x>0.5]
init=indexes[1]
print(indexes)
end=indexes[len(indexes)]+1
data2=data[init:end]
but I'm getting this error:
end=indexes[len(indexes)]+1 IndexError: list index out of range
I think the indexes in Python may not be the same ones as I get in Matlab?
Your list comprehension isn't building a list of indices, but a list of the items themselves. You should generate the indices alongside the items using enumerate:
ind = [i for i, x in enumerate(data) if x > 0.5]
And no need to be so verbose with slicing:
data2 = data[ind[0]: ind[-1]+1] # Matlab's index 1 is Python's index 0
Indexing the list of indices with len(ind) will give an IndexError as indexing in Python starts from 0 (unlike Matlab) and the last index should be fetched with ind[len(ind)-1] or simply ind[-1].
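A small worked example of the enumerate approach, with made-up sample values for data (the question does not show the real ones):

```python
# Made-up sample data for illustration.
data = [0.1, 0.7, 0.3, 0.9, 0.2, 0.4]

# Collect the positions (not the values) of the items above 0.5.
ind = [i for i, x in enumerate(data) if x > 0.5]

# Slice from the first such position up to and including the last one.
# Note: this assumes at least one value is above 0.5; otherwise ind[0] fails.
data2 = data[ind[0]: ind[-1] + 1]

print(ind)
print(data2)
```

Because Python slices exclude the end position, the + 1 is what makes the last matching element part of the result.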
len(indexes) is one more than the index of the last element of the list, so indexes[len(indexes)] is already out of the range of the list. The last element is indexes[len(indexes)-1].
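It looks like what you're trying to do is find the entries of the list that have values greater than 0.5 and put those values into data2. This is better suited to a numpy array. Note that this keeps only the values above 0.5, not everything between the first and the last of them.
import numpy as np
data = np.asarray(data)  # boolean masks need an array, not a plain list
data2 = data[data > 0.5]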
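If the goal is instead the contiguous range between the first and last value above 0.5, closer to what Matlab's find is used for in the question, np.flatnonzero returns the (0-based) positions where a condition holds. A sketch with made-up sample values:

```python
import numpy as np

# Made-up sample data for illustration.
data = np.array([0.1, 0.7, 0.3, 0.9, 0.2, 0.4])

# Positions where the condition is True, counted from 0.
indexes = np.flatnonzero(data > 0.5)

# Everything from the first to the last such position, inclusive.
# Assumes at least one value is above 0.5.
data2 = data[indexes[0]: indexes[-1] + 1]

print(indexes)
print(data2)
```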

Function squaring 2-d array python

I have a function that takes in any 2-d array and returns a 2-d array (in the same format as the input array) with the values squared.
i.e [[1,2],[3,4]] -----> [[1,4],[9,16]]
my code so far:
m0 = [[1,2],[3,4]]
empty_list = []
for x in m0:
    for i in x:
        empty_list.append(i**2)
This gives me a 1-d array, but how would I return a 2-d array shaped like the input value?
You can make a recursive function to handle any depth of nested lists:
def SquareList(L):
    if type(L) is list:
        return [SquareList(x) for x in L]
    else:
        return L**2
Example:
> print(SquareList([1,[3],[2,[3]],4]))
[1, [9], [4, [9]], 16]
Working with an outer list
The point is that you need an extra, outer list to store the rows. So we build up a temporary list for each row and append it to the result:
m0 = [[1,2],[3,4]]
result = []
for sublist in m0:
    row = []
    for item in sublist:
        row.append(item**2)
    result.append(row)
Notice that we here iterate over the items of the sublist.
Using list comprehension
We can, however, write this more elegantly with a list comprehension:
result = [[x*x for x in sublist] for sublist in m0]
Note: if you have to square a number x, it is usually more efficient to use x * x than to write x ** 2.
Using numpy (for rectangular lists)
In case the list is rectangular (all sublists have the same length), we can use numpy instead:
from numpy import array
a0 = array(m0)
result = a0 ** 2
You can just do this by a list comprehension:
empty_list = [[m0[i][j]**2 for j in range(len(m0[i]))] for i in range(len(m0))]
Or, in your coding style:
empty_list = [row[:] for row in m0]  # copy each row so that m0 itself is not modified
for i in range(len(m0)):
    for j in range(len(m0[i])):
        empty_list[i][j] = m0[i][j] ** 2
Your problem is that you never created a 2D-list and you just append the values on the created 1D-list.
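Since the question asks for a function, the nested comprehension can be wrapped in one. A minimal sketch (square_2d is a hypothetical name, and it assumes a list of lists of numbers):

```python
def square_2d(matrix):
    """Return a new list of lists with every element squared."""
    return [[value ** 2 for value in row] for row in matrix]


m0 = [[1, 2], [3, 4]]
squared = square_2d(m0)

print(squared)
print(m0)  # the input is left unchanged
```

Building new inner lists, rather than writing into the input, keeps the original data intact for the caller.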
