I have a 3 element tuple where one of the elements is dissimilar from the other two. For example, it could be something like: (0.456, 0.768, 0.456).
What is the easiest way to find the index of this dissimilar element? One way I can think of is consider index (0, 1) and (1, 2) and one of these will be dissimilar. If it is (0, 1) then compare their elements to the element at 2 otherwise, compare elements of (1, 2) to index 0 to find the dissimilar element.
Feels like I am missing a pythonic way to do this.
A simple approach:
def func(arr):
    x, y, z = arr
    return 2 * (x == y) + (x == z)
Test:
func(['B', 'A', 'A'])
# 0
func(['A', 'B', 'A'])
# 1
func(['A', 'A', 'B'])
# 2
You could count the occurrences of each element in the tuple and then find the position whose count is 1, but I have a feeling this may not be as performant as your solution. It also wouldn't work if all 3 values are distinct.
[my_tuple.count(x) for x in my_tuple].index(1)
You could try this:
index = [my_tuple.index(i) for i in my_tuple if my_tuple.count(i) == 1][0]
I'm not sure it is great performance-wise though.
This may look like huge overkill in Python 3, but I couldn't help posting it:
import collections
a = (0.768, 0.456, 0.456)
counts = collections.Counter(a)
odd_one_out = list(counts.keys())[list(counts.values()).index(1)]
print("Dissimilar object index:", a.index(odd_one_out))
Explanation:
collections.Counter(a) returns a frequency mapping, e.g. {0.768: 1, 0.456: 2}. We build parallel lists of its keys and values so we can use index(1) on the values to locate the value that occurs exactly once (the odd one out). Then a.index(...) on that value gives its position in the tuple.
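For comparison, Counter.most_common() already orders entries by frequency, so the odd one out is simply the last entry; this is a shorter sketch of the same idea:

```python
from collections import Counter

a = (0.768, 0.456, 0.456)
# most_common() orders by descending count, so the unique
# value is the last entry
odd_value, _ = Counter(a).most_common()[-1]
print(a.index(odd_value))  # 0
```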
Related
I'm trying to get the number of the two elements that are the most frequent in an array. For example, in the list ['aa','bb','cc','dd','bb','bb','cc','ff'] the number of the most frequent should be 3(the number of times 'bb' appear in the array) and the second most frequent 2(number of times 'cc' appear in the array).
I tried this:
max = 0
snd_max = 0
for i in x:
    aux = x.count(i)
    if aux > max:
        snd_max = max
        max = aux
print(max, snd_max)
But I was in doubt if there is an easier way?
You can use collections.Counter:
from collections import Counter
x = ['aa','bb','cc','dd','bb','bb','cc','ff']
counter = Counter(x)
print(counter.most_common(2))
[('bb', 3), ('cc', 2)]
Try this:
l = ['aa','bb','cc','dd','bb','bb','cc','ff']
b = list(dict.fromkeys(l))
a = [(l.count(x), x) for x in b]
a.sort(reverse=True)
a = a[:2]
print(a)
I use max(); it's simple (though note it only returns the single most frequent element):
lst = ['aa','bb','cc','dd','bb','bb','cc','ff']
print(max(set(lst), key=lst.count))
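If you also want the second most frequent element, the same key function works with heapq.nlargest (an extension of this answer, not part of the original):

```python
import heapq

lst = ['aa', 'bb', 'cc', 'dd', 'bb', 'bb', 'cc', 'ff']
# nlargest ranks the distinct elements by how often they occur
print(heapq.nlargest(2, set(lst), key=lst.count))  # ['bb', 'cc']
```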
You could use pandas value_counts()
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html
Put the list into a Series, then use value_counts(). That will give you each element and how many times it appears, sorted with the most common on top.
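A minimal sketch of that approach (assuming pandas is installed):

```python
import pandas as pd

x = ['aa', 'bb', 'cc', 'dd', 'bb', 'bb', 'cc', 'ff']
# value_counts() tallies each element, most common first
counts = pd.Series(x).value_counts()
print(counts.head(2))  # 'bb' appears 3 times, 'cc' appears 2 times
```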
I need to check to see if a list of tuples is sorted, by the first attribute of the tuple. Initially, I thought to check this list against its sorted self, such as...
list1 = [(1, 2), (4, 6), (3, 10)]
sortedlist1 = sorted(list1, reverse=True)
How can I then check to see if list1 is identical to sortedlist1? Identical, as in list1[0] == sortedlist1[0], and list1[1] == sortedlist1[1].
The list may have a length of 5 or possibly 100, so carrying out list1[0] == sortedlist1[0], and list1[1] == sortedlist1[1] would not be an option because I am not sure how long the list is.
Thanks
I believe you can just do list1 == sortedlist1, without having to look into each element individually.
@joce already provided an excellent answer (and I would suggest accepting that one, as it is more concise and directly answers your question), but I wanted to address this portion of your original post:
The list may have a length of 5 or possibly 100, so carrying out list1[0] == sortedlist1[0], and list1[1] == sortedlist1[1] would not be an option because I am not sure how long the list is.
If you want to compare every element of two lists, you do not need to know exactly how long the lists are. Programming is all about being lazy, so you can bet no good programmer would write out that many comparisons by hand!
Instead, we can iterate through both lists with an index. This will allow us to perform operations on each element of the two lists simultaneously. Here's an example:
def compare_lists(list1, list2):
    # Let's initialize our index to the first element
    # in any list: element #0.
    i = 0
    # And now we walk through the lists. We have to be
    # careful that we do not walk outside the lists,
    # though...
    while i < len(list1) and i < len(list2):
        if list1[i] != list2[i]:
            # If any two elements are not equal, say so.
            return False
        i += 1
    # We made it all the way through at least one list.
    # However, they may have been different lengths, so
    # check that the index reached the end of both lists.
    if i != len(list1) or i != len(list2):
        # The index is not at the end of one of the lists.
        return False
    # At this point we know two things:
    # 1. Each element we compared was equal.
    # 2. The index is at the end of both lists.
    # Therefore, we compared every element of both lists
    # and they were equal. So we can safely say the lists
    # are in fact equal.
    return True
That said, this is such a common thing to check for that Python has this functionality built in through the equality operator, ==. So it's much easier to simply write:
list1 == list2
If you want to check if a list is sorted or not, a very simple solution comes to mind:
last_elem, is_sorted = None, True
for elem in mylist:
    if last_elem is not None:
        if elem[0] < last_elem[0]:
            is_sorted = False
            break
    last_elem = elem
This has the added advantage of only going over your list once. If you sort it and then compare, you traverse the list more than once.
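The same single-pass check can be written more compactly with all() and zip(), pairing each element with its successor:

```python
mylist = [(1, 2), (4, 6), (3, 10)]
# Compare each tuple's first item with its successor's
is_sorted = all(a[0] <= b[0] for a, b in zip(mylist, mylist[1:]))
print(is_sorted)  # False
```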
If you still want to do it that way, here's another method:
list1 = [(1, 2), (4, 6), (3, 10)]
sortedlist1 = sorted(list1, reverse=True)
all_equal = all(i[0] == j[0] for i, j in zip(list1, sortedlist1))
In Python 3.x, you can check whether two lists of tuples a and b contain the same elements using the eq operator:
import operator

a = [(1,2),(3,4)]
b = [(3,4),(1,2)]
# convert both lists to sets before calling the eq function
print(operator.eq(set(a), set(b)))  # True
Note that converting to sets discards order and duplicates, so this does not tell you whether either list is sorted.
Use this:
sorted(list1) == sorted(list2)
Say I have 3 integer arrays: {1,2,3}, {2,3}, {1}
I must take exactly one element from each array, to form a new array where all numbers are unique. In this example, the correct answers are: {2,3,1} and {3,2,1}. (Since I must take one element from the 3rd array, and I want all numbers to be unique, I must never take the number 1 from the first array.)
What I have done:
for a in array1:
    for b in array2:
        for c in array3:
            if a != b and a != c and b != c:
                AddAnswer(a, b, c)
This is brute force, which works, but it doesn't scale well. What if now we are dealing with 20 arrays instead of just 3. I don't think it's good to write a 20 nested for-loops. Is there a clever way to do this?
What about:
import itertools

arrs = [[1,2,3], [2,3], [1]]

for x in itertools.product(*arrs):
    if len(set(x)) < len(arrs):
        continue
    AddAnswer(x)
AddAnswer(x) is called twice, with the tuples:
(2, 3, 1)
(3, 2, 1)
You can think of this as finding a matching in a bipartite graph.
You are trying to select one element from each set, but are not allowed to select the same element twice, so you are trying to match sets to numbers.
You can use the matching function in the graph library NetworkX to do this efficiently.
Python example code:
import networkx as nx

A = [[1, 2, 3], [2, 3], [1]]

# Collect every number that appears in any set
numbers = set()
for s in A:
    for n in s:
        numbers.add(n)

B = nx.Graph()
for n in numbers:
    B.add_node(n, bipartite=1)
for i, s in enumerate(A):
    set_name = 's%d' % i
    B.add_node(set_name, bipartite=0)
    for n in s:
        B.add_edge(set_name, n)

matching = nx.maximal_matching(B)
if len(matching) != len(A):
    print('No complete matching')
else:
    for u, v in matching:
        # Edge tuples are unordered, so work out which end is the set
        set_name, number = (u, v) if isinstance(u, str) else (v, u)
        print('choose', number, 'from', set_name)
This is a simple, efficient method for finding a single matching. (Note that nx.maximal_matching is greedy, so in unlucky cases it can return a maximal matching that is not complete even though a complete one exists; nx.bipartite.maximum_matching avoids that.)
If you want to enumerate through all matchings you may wish to read:
Algorithms for Enumerating All Perfect, Maximum and
Maximal Matchings in Bipartite Graphs by Takeaki UNO which gives O(V) complexity per matching.
A recursive solution (not tested):
def max_sets(list_of_sets, excluded=[]):
    if not list_of_sets:
        return [set()]
    else:
        res = []
        for x in list_of_sets[0]:
            if x not in excluded:
                for candidate in max_sets(list_of_sets[1:], excluded + [x]):
                    candidate.add(x)
                    res.append(candidate)
        return res
(You could probably dispense with the set but it's not clear if it was in the question or not...)
Variations of this question have been asked before, but I'm still having trouble understanding how to actually slice a Python series/pandas DataFrame based on conditions that I'd like to set.
In R, what I'm trying to do is:
df[which(df[,colnumber] > somenumberIchoose),]
The which() function finds indices of row entries in a column in the dataframe which are greater than somenumberIchoose, and returns this as a vector. Then, I slice the dataframe by using these row indices to indicate which rows of the dataframe I would like to look at in the new form.
Is there an equivalent way to do this in Python? I've seen references to enumerate, which I don't fully understand after reading the documentation. Right now, my attempt to get the row indices looks like this:
indexfuture = [ x.index(), x in enumerate(df['colname']) if x > yesterday]
However, I keep on getting an invalid syntax error. I can hack a workaround by for looping through the values, and manually doing the search myself, but that seems extremely non-pythonic and inefficient.
What exactly does enumerate() do? What is the pythonic way of finding indices of values in a vector that fulfill desired parameters?
Note: I'm using Pandas for the dataframes
I may not understand the question clearly, but it looks like the answer is easier than you think:
using pandas DataFrame:
df['colname'] > somenumberIchoose
returns a pandas series with True / False values and the original index of the DataFrame.
Then you can use that boolean series on the original DataFrame and get the subset you are looking for:
df[df['colname'] > somenumberIchoose]
should be enough.
See http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing
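Put together, a tiny runnable example (df and somenumber are made-up placeholders):

```python
import pandas as pd

df = pd.DataFrame({'colname': [1, 8, 3, 9]})
somenumber = 5
# The comparison yields a boolean Series aligned with df's index
mask = df['colname'] > somenumber
print(df[mask])  # the rows where colname is 8 and 9
```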
From what I know of R, you might be more comfortable working with numpy -- a scientific computing package similar to MATLAB.
If you want the indices of an array whose values are divisible by two, the following would work.
arr = numpy.arange(10)
truth_table = arr % 2 == 0
indices = numpy.where(truth_table)
values = arr[indices]
It's also easy to work with multi-dimensional arrays:
arr2d = arr.reshape(2, 5)
col_index = 0  # index of the row to inspect
col_indices = numpy.where(arr2d[col_index] % 2 == 0)
col_values = arr2d[col_index, col_indices]
enumerate() returns an iterator that yields an (index, item) tuple in each iteration, so you can't (and don't need to) call .index() again.
Furthermore, your list comprehension syntax is wrong:
indexfuture = [(index, x) for (index, x) in enumerate(df['colname']) if x > yesterday]
Test case:
>>> [(index, x) for (index, x) in enumerate("abcdef") if x > "c"]
[(3, 'd'), (4, 'e'), (5, 'f')]
Of course, you don't need to unpack the tuple:
>>> [tup for tup in enumerate("abcdef") if tup[1] > "c"]
[(3, 'd'), (4, 'e'), (5, 'f')]
unless you're only interested in the indices, in which case you could do something like
>>> [index for (index, x) in enumerate("abcdef") if x > "c"]
[3, 4, 5]
And if you need an additional condition, pandas.Series allows you to do operations between Series (+, -, /, *).
Just multiply the boolean indexes:
idx1 = df['lat'] == 49
idx2 = df['lng'] > 15
idx = idx1 * idx2
new_df = df[idx]
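Multiplying boolean Series works, but the idiomatic spelling uses the element-wise & operator; a sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'lat': [49, 50, 49], 'lng': [16, 20, 10]})
# & combines boolean Series element-wise; each comparison needs
# its own parentheses because & binds more tightly than ==
new_df = df[(df['lat'] == 49) & (df['lng'] > 15)]
print(new_df)  # the single row with lat == 49 and lng == 16
```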
Instead of enumerate, I usually just use .items() (.iteritems in older pandas versions). This saves a .index. Namely,
[k for k, v in (df['c'] > t).items() if v]
Otherwise, one has to do
df[df['c'] > t].index
This duplicates the typing of the data frame name, which can be very long and painful to type.
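In current pandas, the matching indices can also be pulled straight from the boolean mask without a comprehension; a sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 5, 3, 7]})
t = 2
# df.index[mask] selects the index labels where the mask is True
idx = df.index[df['c'] > t].tolist()
print(idx)  # [1, 2, 3]
```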
A nice simple and neat way of doing this is the following:
SlicedData1 = df[df.colname > somenumber]
This can easily be extended to include other criteria, such as non-numeric data:
SlicedData2 = df[(df.colname1 > somenumber) & (df.colname2 == '24/08/2018')]
And so on...
It may be the fact I haven't slept yet, but I can't find the solution to this problem, so I come to you all. I have a list, with a series of sub-lists, each containing two values, like so:
list = (
    (2, 5),
    (-1, 4),
    (7, -3)
)
I also have a variable, a similar list with two values, that is as such:
var = (0, 0)
I want to add all the x values in list, then all the y values, and then store the sums in var, so the desired value of var is:
var = (8, 6)
How could I do it? I apologize if the answer is something stupid simple, I just need to get this done before I can sleep.
sumvar = tuple(map(sum, zip(*my_list)))
should do what you want, I think (in Python 3, map() returns a lazy iterator, hence the tuple() wrapper).
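A quick demonstration on the data from the question (zip(*my_list) transposes the pairs into two columns):

```python
my_list = [(2, 5), (-1, 4), (7, -3)]
# zip(*my_list) yields (2, -1, 7) and (5, 4, -3);
# sum each column and collect the results into a tuple
sumvar = tuple(map(sum, zip(*my_list)))
print(sumvar)  # (8, 6)
```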
This sounds like a job for reduce to me:
from functools import reduce  # reduce lives in functools in Python 3

reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), my_list)
# (8, 6)
You could also use a pair of generator expressions (a bit more readable), where tpl is your list of pairs:
sum(a for a, b in tpl), sum(b for a, b in tpl)
# (8, 6)