I have a list of lists
list = [[-2.0, 5.0], [-1.0, -3.0], [1.0, 3.0], [2.0, -5.0]]
I want to remove elements that produce the same value when the first element is divided by the second. For example, [-2.0, 5.0] gives -2/5 and [2.0, -5.0] also gives -2/5, so I want to delete either [-2.0, 5.0] or [2.0, -5.0].
Any ideas?
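You could try it like this:
A tuple can be a dictionary key, so I convert each element to a tuple of its absolute values and use that as the key, keeping the original list as the value. Note that taking abs discards the signs, so pairs whose ratios differ only in sign, such as [-2.0, 5.0] and [2.0, 5.0], would also be treated as duplicates.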
>>> lis
[[-2.0, 5.0], [-1.0, -3.0], [1.0, 3.0], [2.0, -5.0]]
>>> dict([(tuple([abs(x[0]), abs(x[1])]), x) for x in lis]).values()
[[2.0, -5.0], [1.0, 3.0]]
>>>
Assuming all your values are floats (so you can always use float division), you can do the following:
my_list = [[-2.0, 5.0], [-1.0, -3.0], [1.0, 3.0], [2.0, -5.0]]
values_seen = []
new_list = []
for x, y in my_list:
    if x/y in values_seen:
        continue
    else:
        values_seen.append(x/y)
        new_list.append([x, y])
Now the list you want will be stored as new_list. Note that you should avoid using list as a variable name, as you have above, because it shadows the built-in list type.
*Clarification: I am assuming that if more than two pairs give the same ratio (for example [[1,3],[2,6],[3,9]]), you want to keep only one of them.
If you want to eliminate all equivalent fractions (meaning [-2.0, 5.0] and [4.0, -10.0] are considered equivalent), then the following code would work.
def unique_fractions(lst):  # yield only works inside a function; the name is illustrative
    seen = set()
    for numerator, denominator in lst:
        quotient = numerator / denominator
        if quotient not in seen:
            seen.add(quotient)
            yield numerator, denominator
Otherwise, if you want the final list to contain both [-2.0, 5.0] and [4.0, -10.0]:
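def unique_signed_pairs(lst):  # yield only works inside a function; the name is illustrative
    seen = set()
    for numerator, denominator in lst:
        # sign() is not a Python built-in; see the note that follows
        value = (abs(numerator), abs(denominator), sign(numerator)*sign(denominator))
        if value not in seen:
            seen.add(value)
            yield numerator, denominator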
If you're writing this in Python, which lacks a built-in sign function, you'll need to use either math.copysign or (numerator > 0) ^ (denominator > 0), where ^ is the xor operator.
This code assumes both numerator and denominator are nonzero.
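To make the math.copysign suggestion concrete, here is a minimal sketch. The names sign and dedup_key are hypothetical helpers introduced only for illustration, and both inputs are assumed to be nonzero floats.

```python
import math

def sign(x):
    # Hypothetical helper: returns 1.0 or -1.0 for a nonzero float x.
    return math.copysign(1.0, x)

def dedup_key(numerator, denominator):
    # Same magnitudes and same overall sign give the same key.
    return (abs(numerator), abs(denominator), sign(numerator) * sign(denominator))

print(dedup_key(-2.0, 5.0) == dedup_key(2.0, -5.0))   # True
print(dedup_key(-2.0, 5.0) == dedup_key(4.0, -10.0))  # False
```

With this key, [-2.0, 5.0] and [2.0, -5.0] collapse into one entry, while [4.0, -10.0] is kept as a separate one.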
If you really are keeping a list of numerator-denominator number pairs, consider storing the pairs as immutable tuples or better yet, as Python fractions.
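As a rough sketch of the fractions idea, the standard-library Fraction type can serve as an exact key, which avoids rounding in the division itself. The function name unique_by_exact_ratio is made up for this example, and denominators are assumed to be nonzero.

```python
from fractions import Fraction

def unique_by_exact_ratio(pairs):
    # Keep the first pair seen for each exact ratio.
    seen = set()
    result = []
    for numerator, denominator in pairs:
        # Fraction(float) converts the float's exact binary value.
        ratio = Fraction(numerator) / Fraction(denominator)
        if ratio not in seen:
            seen.add(ratio)
            result.append([numerator, denominator])
    return result

lst = [[-2.0, 5.0], [-1.0, -3.0], [1.0, 3.0], [2.0, -5.0]]
print(unique_by_exact_ratio(lst))  # [[-2.0, 5.0], [-1.0, -3.0]]
```

Keep in mind that Fraction(0.1) is the exact value of the float 0.1, not 1/10, so this only helps when the inputs themselves are exactly representable or are stored as fractions from the start.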
I would first get a unique set of ratios using set:
In [1]: lst = [[-2.0, 5.0], [-1.0, -3.0], [1.0, 3.0], [2.0, -5.0]]
In [2]: rs = list(set([ l[0]/l[1] for l in lst]))
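And then just pick the first occurrence of each ratio (this is a Python 2 session; in Python 3, filter returns an iterator, so use next(filter(...)) instead of indexing with [0]):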
In [3]: [ filter(lambda m: m[0]/m[1] == r , lst )[0] for r in rs ]
Out[3]: [[-2.0, 5.0], [-1.0, -3.0]]
In [4]:
Quick and dirty way, since keys in a dictionary are unique.
{num/denom : [num, denom] for (num, denom) in lst}.values()
In general, comparing floats using == is unreliable; it's normally better to check whether they're within a tolerance, e.g.
abs(x-y) < tolerance
A more robust way might look like the following. An else attached to a for loop runs only if the loop finished without hitting break, which is quite handy. This version, however, is quadratic rather than linear time.
tolerance = 1e-9  # assumed value; choose one suited to your data
div = lambda x, y: x / y
unique = []
for j in range(len(lst)):
    for i in range(j):
        if abs(div(*lst[i]) - div(*lst[j])) < tolerance:
            break
    else:
        unique.append(lst[j])
unique
I have an array of digits: array = [1.0, 1.0, 2.0, 4.0, 1.0]
I would like to create a function that extracts sequences of digits from the input array and appends them to one of two lists, depending on whether defined conditions are met.
The first condition, f, specifies the number of places to look ahead from index i and check whether a valid index exists. If true, append array[i] to list1. If false, append it to list2.
I have implemented it as follows:
def somefunc(array, f):
    list1, list2 = [], []
    for i in range(len(array)):
        if i + f < len(array):
            list1.append(array[i])
        else:
            list2.append(array[i])
    return list1, list2
This functions correctly as follows:
somefunc(array,f=1) returns ([1.0, 1.0, 2.0, 4.0], [1.0])
somefunc(array,f=2) returns ([1.0, 1.0, 2.0], [4.0, 1.0])
somefunc(array,f=3) returns ([1.0, 1.0], [2.0, 4.0, 1.0])
However, I would like to add a second condition to this function, b, that specifies the window length for previous digits to be summed and then appended to the lists according to the f condition above.
The logic is this:
iterate through array and at each index i check if i+f is a valid index.
If true, append the sum of the previous b digits to list1
If false, append the sum of the previous b digits to list2
If the length of window b isn't possible (i.e. b=2 when i=0) continue to next index.
With both the f and b conditions implemented, I would expect:
somefunc(array,f=1, b=1) returns ([1.0, 1.0, 2.0, 4.0], [1.0])
somefunc(array,f=1, b=2) returns ([2.0, 3.0, 6.0], [5.0])
somefunc(array,f=2, b=2) returns ([2.0, 3.0], [6.0, 5.0])
My first challenge is implementing the b condition. I cannot seem to figure out how. (See the edit below.)
I also wonder if there is a more efficient approach than the iterative method I have begun?
Given only the f condition, I know that the following functions correctly and would bypass the need for iteration:
def somefunc(array, f):
    return array[:-f], array[-f:]
However, I again don't know how to implement the b condition in this approach.
Edit
I have managed an iterative solution which implements the f and b conditions:
def somefunc(array, f, b):
    list1, list2 = [], []
    for i in range(len(array)):
        if i >= (b-1):
            if i + f < len(array):
                list1.append(sum(array[i+1-b: i+1]))
            else:
                list2.append(sum(array[i+1-b: i+1]))
    return list1, list2
However, the indexing syntax feels horrible, so I am certain there must be a more elegant solution. Also, anything with improved runtime would be preferable.
I can see two minor improvements you could implement in your code:
def somefunc(array, f, b):
    list1, list2 = [], []
    size = len(array)  # Will only measure the length of the array once
    for i in range(b-1, size):  # By starting from b-1 you can remove an if statement
        if i + f < size:  # We use the size here
            list1.append(sum(array[i+1-b: i+1]))
        else:
            list2.append(sum(array[i+1-b: i+1]))
    return list1, list2
Edit:
An even better solution would be to add the new digit and subtract the oldest one at each iteration. This way you don't need to redo the whole sum each time:
def somefunc(array, f, b):
    list1, list2 = [], []
    value = 0
    size = len(array)
    for i in range(b-1, size):
        if value != 0:
            value = value - array[i-b] + array[i]  # Get the last value, add the value at index i and remove the value at index i-b
        else:
            value = sum(array[i+1-b: i+1])
        if i + f < size:
            list1.append(value)
        else:
            list2.append(value)
    return list1, list2
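Another way to avoid re-summing each window is to use prefix sums, sketched here under the assumption that f >= 0 and b >= 1. The name window_sums_split is hypothetical, and this is a swapped-in technique rather than the approach in the answer above.

```python
from itertools import accumulate

def window_sums_split(array, f, b):
    # prefix[k] holds the sum of the first k elements of array.
    prefix = [0.0] + list(accumulate(array))
    # Sum of the b elements ending at index i, for every valid i.
    sums = [prefix[i + 1] - prefix[i + 1 - b] for i in range(b - 1, len(array))]
    # Indices i with i + f < len(array) go to the first list.
    n_first = max(0, len(array) - f - (b - 1))
    return sums[:n_first], sums[n_first:]

array = [1.0, 1.0, 2.0, 4.0, 1.0]
print(window_sums_split(array, f=1, b=2))  # ([2.0, 3.0, 6.0], [5.0])
print(window_sums_split(array, f=2, b=2))  # ([2.0, 3.0], [6.0, 5.0])
```

Because the split point depends only on f and b, the two lists come from a single slice. With non-integer floats, prefix-sum differences can round slightly differently from summing each window directly.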
In the following code I want to find how many unique values are in the list, which I do with a for loop. Once I know the unique values, I want to count how many times each one appears in a. Can someone please guide me on how to do that? The list contains floating-point numbers. Would it help to convert it to a NumPy array and then find the equal values?
a = [1.0, 1.0, 1.0, 1.0, 1.5, 1.5, 1.5, 3.0, 3.0]
list = []
for i in a:
    if i not in list:
        list.append(i)
print(list)
for j in range(len(list)):
    g = np.argwhere(a == list[j])
    print(g)
You can use np.unique to get it done
np.unique(np.array(a),return_counts=True)
You can also do it using Counter from collections:
from collections import Counter
Var=dict(Counter(a))
print(Var)
The primitive way is to use loops
[[x,a.count(x)] for x in set(a)]
If you are not familiar with list comprehensions, here is the equivalent loop:
ls = []
for x in set(a):
    ls.append([x, a.count(x)])
print(ls)
If you want it using if else,
counter = dict()
for k in a:
    if k not in counter:
        counter[k] = 1
    else:
        counter[k] += 1
print(counter)
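If you also want the positions of each value (which is what the np.argwhere call in the question seems to be after), a small sketch with a dictionary of index lists could look like this. The variable names positions and counts are just illustrative.

```python
from collections import defaultdict

a = [1.0, 1.0, 1.0, 1.0, 1.5, 1.5, 1.5, 3.0, 3.0]

# Map each distinct value to the list of indices where it occurs.
positions = defaultdict(list)
for index, value in enumerate(a):
    positions[value].append(index)

# The count of each value is the length of its index list.
counts = {value: len(indices) for value, indices in positions.items()}

print(dict(positions))  # {1.0: [0, 1, 2, 3], 1.5: [4, 5, 6], 3.0: [7, 8]}
print(counts)           # {1.0: 4, 1.5: 3, 3.0: 2}
```

This relies on exact equality between floats, the same as the other approaches here.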
I have a list of around 131000 arrays, each of length 300. I am using Python.
I want to check which of the arrays are repeated in this list. I am trying to do this by comparing each array with all the others, like this:
import numpy as np
wordEmbeddings = [[0.8,0.4....upto 300 elements]....upto 131000 arrays]
count = 0
for i in range(0, len(wordEmbeddings)):
    for j in range(0, len(wordEmbeddings)):
        if i != j:
            if np.array_equal(wordEmbeddings[i], wordEmbeddings[j]):
                count += 1
This is running very slowly and might take hours to finish. How can I do this efficiently?
You can use collections.Counter to count the frequency of each sub list
>>> from collections import Counter
>>> Counter(list(map(tuple, wordEmbeddings)))
We need to convert each sublist to a tuple because a list is unhashable, i.e. it cannot be used as a key in a dict.
This will give you result like this:
>>> Counter({(...4, 5, 6...): 1, (...1, 2, 3...): 1})
Each key of the Counter object is one of the sublists (as a tuple), and its value is the number of times that sublist occurs. Next you can filter the resulting Counter object to keep only the elements whose count is > 1:
>>> items = Counter(list(map(tuple, wordEmbeddings)))
>>> list(filter(lambda x: items[x] > 1,items))
Timeit results:
$ python -m timeit -s "a = [range(300) for _ in range(131000)]" -s "from collections import Counter" "Counter(list(map(tuple, a)))"
10 loops, best of 3: 1.18 sec per loop
You can avoid comparing each pair twice by starting the inner loop just after i, which also makes the i != j check unnecessary:
for i in range(0, len(wordEmbeddings)):
    for j in range(i + 1, len(wordEmbeddings)):
You could look into PyPy for general-purpose speedups.
It might also be worth looking into hashing the arrays somehow.
Here's a question on speeding up NumPy array comparison. Does the order of the elements matter to you?
You can use set and tuple to find duplicated arrays inside another array. Create a new list of tuples (tuples are needed because lists are unhashable), then filter the new list using a set. Note that calling count for every element is quadratic in the length of the list.
tuples = list(map(tuple, wordEmbeddings))  # renamed so the built-in tuple is not shadowed
duplications = set([t for t in tuples if tuples.count(t) > 1])
print(duplications)
Maybe you can reduce the initial list to unique hashes, or non-unique sums, and go over the hashes first, which may be a faster way to compare elements.
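One way the hashing idea might be sketched is a single pass that remembers each row in a set. This assumes the rows are sequences of plain floats that can be turned into tuples, and the function name find_duplicates is hypothetical.

```python
def find_duplicates(rows):
    # Return the set of rows (as tuples) that occur more than once.
    seen = set()
    duplicates = set()
    for row in rows:
        key = tuple(row)  # tuples are hashable, lists are not
        if key in seen:
            duplicates.add(key)
        else:
            seen.add(key)
    return duplicates

rows = [[0.8, 0.4, 0.3], [0.2, 0.3, 0.7], [0.8, 0.4, 0.3]]
print(find_duplicates(rows))  # {(0.8, 0.4, 0.3)}
```

Each row is hashed once, so the work grows roughly linearly with the number of rows instead of quadratically.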
I suggest you first sort the list (might also be helpful for further processing) and then compare. The advantage is that you only need to compare every array element to the previous one:
import numpy as np
from functools import cmp_to_key
wordEmbeddings = [[0.8, 0.4, 0.3, 0.2], [0.2,0.3,0.7], [0.8, 0.4, 0.3, 0.2], [ 1.0, 3.0, 4.0, 5.0]]
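def smaller(x, y):
    # Comparator for cmp_to_key; larger elements sort first (descending order)
    for i in range(min(len(x), len(y))):
        if x[i] < y[i]:
            return 1
        elif y[i] < x[i]:
            return -1
    # All shared positions are equal: fall back to comparing lengths
    if len(x) > len(y):
        return 1
    elif len(x) < len(y):
        return -1
    return 0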
wordEmbeddings = sorted(wordEmbeddings, key=cmp_to_key(smaller))
print(wordEmbeddings)
# output: [[1.0, 3.0, 4.0, 5.0], [0.8, 0.4, 0.3, 0.2], [0.8, 0.4, 0.3, 0.2], [0.2, 0.3, 0.7]]
count = 0
for i in range(1, len(wordEmbeddings)):
    if np.array_equal(wordEmbeddings[i], wordEmbeddings[i-1]):
        count += 1
print(count)
# output: 1
If N is the length of wordEmbeddings and n is the length of each inner array, then your approach does O(N*N*n) comparisons. When reducing the comparisons as in con--'s answer, you still have O(N*N*n/2) comparisons.
Sorting takes O(N*log(N)*n) time and the subsequent counting step only takes O(N*n) time, which overall is less than O(N*N*n/2).
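If the rows are plain Python lists of floats (an assumption; this does not work directly on NumPy arrays), the built-in lexicographic ordering of lists can replace the custom comparator. A minimal sketch, with count_adjacent_duplicates as a made-up name:

```python
def count_adjacent_duplicates(rows):
    # Python compares lists element by element, so sorted() groups equal rows together.
    ordered = sorted(rows)
    # Count positions where a row equals the one just before it.
    return sum(1 for previous, current in zip(ordered, ordered[1:]) if previous == current)

wordEmbeddings = [[0.8, 0.4, 0.3, 0.2], [0.2, 0.3, 0.7],
                  [0.8, 0.4, 0.3, 0.2], [1.0, 3.0, 4.0, 5.0]]
print(count_adjacent_duplicates(wordEmbeddings))  # 1
```

The sort order here is ascending rather than descending, which makes no difference for counting neighbours that are equal.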
I have a nested list which looks like this.
[[0.0, 1.4142135623730951, 2.8284271247461903, 2.23606797749979],
[1.4142135623730951, 0.0, 1.4142135623730951, 1.0],
[2.8284271247461903, 1.4142135623730951, 0.0, 1.0],
[2.23606797749979, 1.0, 1.0, 0.0]]
I want to find the minimum element in every sub list. Thanks for the help!
Well because others are already posting answers, you can store the minimum value of each sublist in a list using what is called list comprehension like so:
new_s = [min(x) for x in s]
Python has a built-in min() function that takes an iterable (i.e. one of your sublists) and finds the minimum value. By using list comprehension you build a list of those values. It can be read as:
"A list of minimum values for each x (sublist) in s (parent list)"
Edit: For the use case raised in the comments (the second-smallest element of each sublist):
new_s = [sorted(x)[1] for x in s]
Can be read as:
"A list of the 2nd element in the sorted array of x for each x (sublist) in s (parent list)"
You can use map, which is slightly more efficient than list comprehension when utilizing a builtin function, in this case min:
s = [[0.0, 1.4142135623730951, 2.8284271247461903, 2.23606797749979],
[1.4142135623730951, 0.0, 1.4142135623730951, 1.0],
[2.8284271247461903, 1.4142135623730951, 0.0, 1.0],
[2.23606797749979, 1.0, 1.0, 0.0]]
new_s = list(map(min, s))
Output:
[0.0, 0.0, 0.0, 0.0]
An alternative is the list comprehension that @pstatix mentioned:
new_s = [min(i) for i in s]
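For the second-smallest variant mentioned earlier in this thread, heapq.nsmallest is one possible alternative to sorting each sublist in full. This is only a sketch and assumes every sublist has at least two elements.

```python
import heapq

s = [[0.0, 1.4142135623730951, 2.8284271247461903, 2.23606797749979],
     [1.4142135623730951, 0.0, 1.4142135623730951, 1.0],
     [2.8284271247461903, 1.4142135623730951, 0.0, 1.0],
     [2.23606797749979, 1.0, 1.0, 0.0]]

# nsmallest(2, row) returns the two smallest values in ascending order.
second_smallest = [heapq.nsmallest(2, row)[-1] for row in s]
print(second_smallest)  # [1.4142135623730951, 1.0, 1.0, 1.0]
```

For sublists this short the difference from sorted(x)[1] is negligible; it mainly matters when the sublists are long.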
Hi I am quite new to python and what I want to do is simple but I just can't seem to get around it.
I have a simple array as shown below:
A1 = [('1.000000', '4.000000'), ('2.000000', '5.000000'), ('3.000000', '6.000000'), ('1.000000', '4.000000'), ('2.000000', '5.000000'), ('3.000000', '6.000000')]
I want to change all elements within the array into floats so I can do calculations on them (such as sum etc.). The end results should look something like this:
A2 = [(1.000000, 4.000000), (2.000000, 5.000000), (3.000000, 6.000000), (1.000000, 4.000000), (2.000000, 5.000000), (3.000000, 6.000000)]
I have tried the following:
A2 = [float(i) for i in A1]
however I get the error:
TypeError: float() argument must be a string or a number
Could anyone point me towards a solution?
Thanks in advance
Here's one pretty simple way:
>>> [list(map(float, x)) for x in A1]  # list() is needed in Python 3, where map returns an iterator
[[1.0, 4.0], [2.0, 5.0], [3.0, 6.0], [1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
I like it because it's short (some would say terse), and since map() doesn't hard-code the expected format of each x, it only assumes that A1 is a list of sequences.
I have no idea how this compares performance-wise to other solutions (such as the more explicit [(float(x), float(y)) for (x, y) in A1] seen below).
Each element of A1 is a tuple ('1.000000', '4.000000'). You will have to convert each item of the tuple:
A2 = [(float(i), float(j)) for (i, j) in A1]
You need to iterate over the inner tuples as well.
A2 = [tuple(float(s) for s in i) for i in A1]
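Once the strings are converted, the calculations mentioned in the question become straightforward. As a small sketch (the names column_sums and row_sums are only illustrative):

```python
A1 = [('1.000000', '4.000000'), ('2.000000', '5.000000'), ('3.000000', '6.000000'),
      ('1.000000', '4.000000'), ('2.000000', '5.000000'), ('3.000000', '6.000000')]

# Convert every string in every tuple to a float.
A2 = [tuple(float(s) for s in pair) for pair in A1]

# zip(*A2) regroups the values by position, giving one sequence per column.
column_sums = [sum(column) for column in zip(*A2)]
row_sums = [sum(pair) for pair in A2]

print(column_sums)  # [12.0, 30.0]
print(row_sums)     # [5.0, 7.0, 9.0, 5.0, 7.0, 9.0]
```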