Python: Combining numbers in a list

I have a list, say
x = [0,1,2,3,"a","b","cd"]
I want to keep the smallest number and all of the letters, hence in the example it would become
x = [0,"a","b","cd"]
How can I do this? Ideally the code would be very efficient, as I'm doing this for millions of lists.
Attempts: I've tried min(x), but it raises an error because there are strings in the list.

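I think the code below is about as efficient as it gets. It is O(N), and it compacts the strings to the front of x in place, so the loop itself allocates no extra list (only the final slice and concatenation do).
import sys
x = [0, 1, 2, 3, "a", "b", "cd"]
minimum = sys.maxsize  # for Python 3.x (not a true upper bound for ints; float('inf') is safer)
# minimum = sys.maxint  # for Python 2
j = 0  # next free slot for a string at the front of x
for i in range(len(x)):
    if isinstance(x[i], str):
        x[j] = x[i]  # move the string forward, overwriting a number
        j += 1
    else:
        minimum = min(minimum, x[i])
print([minimum] + x[:j])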
Output
[0, 'a', 'b', 'cd']
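Since the question mentions millions of lists, the same single pass can be wrapped in a function and called once per list. A minimal sketch, assuming the numbers are int or float and everything else is a string; the name keep_min_and_strings is hypothetical, and float('inf') replaces sys.maxsize as the starting value. Unlike the loop above, it leaves the input list untouched:

```python
def keep_min_and_strings(items):
    """Hypothetical helper: return [smallest number] + strings in one pass."""
    minimum = float('inf')  # compares larger than any finite int or float
    strings = []
    for item in items:
        if isinstance(item, str):
            strings.append(item)
        elif item < minimum:
            minimum = item
    # Only prepend a number if the list actually contained one
    if minimum == float('inf'):
        return strings
    return [minimum] + strings


lists = [[0, 1, 2, 3, "a", "b", "cd"], [5, "x", 2], ["only", "text"]]
results = [keep_min_and_strings(lst) for lst in lists]
print(results)  # [[0, 'a', 'b', 'cd'], [2, 'x'], ['only', 'text']]
```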

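Try this (note that the number ends up last rather than first):
Python 2.7 (numbers compare smaller than strings here, so min(x) returns the smallest number):
output = [s for s in x if isinstance(s, str)]
output.append(min(x))
#>>> output
#['a', 'b', 'cd', 0]
Python 3 (ints and strings cannot be compared, so take the min of the ints only):
output = [s for s in x if isinstance(s, str)]
output.append(min([i for i in x if isinstance(i, int)]))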
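In Python 3, min() raises ValueError when the filtered list of numbers is empty. A small sketch, assuming such lists can occur in your data, passes a generator together with the default keyword argument (available since Python 3.4), so a list without numbers just keeps its strings; accepting float as well as int is also an assumption:

```python
x = [0, 1, 2, 3, "a", "b", "cd"]

strings = [s for s in x if isinstance(s, str)]
# default=None avoids ValueError when x contains no numbers at all
smallest = min((i for i in x if isinstance(i, (int, float))), default=None)
output = strings if smallest is None else strings + [smallest]
print(output)  # ['a', 'b', 'cd', 0]
```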

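You can use itertools.groupby (min(n, *g) consumes the rest of the int group, so only one number is emitted):
>>> from itertools import groupby
>>> x = [0,1,2,3,"a","b","cd"]
>>> [min(n, *g) if t == int else n for t, g in groupby(x, type) for n in g]
[0, 'a', 'b', 'cd']
More efficient would be to just take the min of the integers and unpack the strings. This assumes the list is exactly one run of numbers followed by one run of strings, and the * unpacking needs Python 3.5+:
>>> x = [0,1,2,3,"a","b","cd"]
>>> grouped = [list(g) for t, g in groupby(x, type)]
>>> [min(grouped[0]), *grouped[1]]
[0, 'a', 'b', 'cd']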
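To see why the grouped version works, it helps to look at what groupby(x, type) yields: one (type, iterator) pair per run of consecutive items of the same type. A short sketch; the interleaved list y is a made-up example showing that groupby only merges adjacent items, which is why grouped[0] and grouped[1] assume exactly two runs:

```python
from itertools import groupby

x = [0, 1, 2, 3, "a", "b", "cd"]
runs = [(t.__name__, list(g)) for t, g in groupby(x, type)]
print(runs)  # [('int', [0, 1, 2, 3]), ('str', ['a', 'b', 'cd'])]

# Hypothetical interleaved input: each adjacent run becomes its own group
y = [3, "a", 1, "b"]
y_runs = [(t.__name__, list(g)) for t, g in groupby(y, type)]
print(y_runs)  # [('int', [3]), ('str', ['a']), ('int', [1]), ('str', ['b'])]
```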

One option is to find the minimum number while building a new list of the strings, in a single pass. This is O(n): not super fast, but not slow either. Note that .isnumeric() is a str method, so it only helps if the numbers are stored as strings; for real int/float items (as in the question), test the type instead:
min_number = None
string_list = []
for i in x:
    if isinstance(i, (int, float)):
        if min_number is None or i < min_number:
            min_number = i
    elif isinstance(i, str):
        string_list.append(i)
if min_number is not None:
    string_list.insert(0, min_number)  # insert() works in place and returns None
x = string_list
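Keep in mind that .isnumeric() checks the characters of a string, not the type of an item, and ints do not have the method at all. A quick Python 3 check:

```python
print("12".isnumeric())         # True: every character is numeric
print("cd".isnumeric())         # False
print("-3".isnumeric())         # False: the minus sign is not a numeric character
print(hasattr(5, "isnumeric"))  # False: calling 5.isnumeric() would raise AttributeError
```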

I don't think this would be the most efficient, but you could separate the list into two lists, one with the ints and one with the strings, then find the min and rejoin them. This would look something like:
x = [0, 1, 2, 'a', 'b', 'c']
nums = []
strings = []
for item in x:
    if isinstance(item, int):
        nums.append(item)
    else:
        strings.append(item)
Now, after you run this, you can get the min and then rejoin the lists:
result = [min(nums)] + strings
This will give [0, 'a', 'b', 'c'].

Something like this should work (it modifies the list in place and also returns it):
def minNum(array):
    min_val = None  # avoid shadowing the built-in min()
    num_pos = []    # positions of all numeric items
    for pos, item in enumerate(array):
        if type(item) == int or type(item) == float:
            if min_val is None or item < min_val:
                min_val = item
            num_pos.append(pos)
    # delete from the back so the earlier positions stay valid
    for j in reversed(num_pos):
        if array[j] != min_val:
            del array[j]
    return array
Definitely not the only solution, but it's fairly compact and works well for all the test cases I gave it.

Related

How to work with items from a list that is parallel to another list?

I tried the following code to get the list k to be ['c','b'] (i.e. no 'X'), but I get a syntax error on the penultimate line. Could someone please clarify where I have gone wrong? j1 was introduced when j became a list instead of remaining an integer. Tuple M will hold columns of matching strings, to be imported from XL later, while list m is user input.
j = 29
k = []
M = (['A', 'B', 'C'], ['a','b', 'c'])
m = ['X','C','B']
for i in (range(len(m))):
    j = [M[0].index(m[i]) if m[i] in M[0] else 30,]
    j1 = j[0]
    k[i]= [M[1][j1] if j1 < 30 else k[j1]='']
print (k)
Your Problem
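Assignment (=) is a statement. You cannot use it inside a conditional expression such as a if cond else b, which may only contain expressions. Therefore this line does not work (and since k starts empty, k[i] = ... would also raise an IndexError):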
k[i]= [M[1][j1] if j1 < 30 else k[j1]='']
Better Solution
Use a dictionary and a list comprehension:
>>> M = (['A', 'B', 'C'], ['a','b', 'c'])
>>> m = ['X','C','B']
>>> mapping = dict(zip(*M))
>>> k = [mapping[x] for x in m if x in mapping]
>>> k
['c', 'b']
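The key step is dict(zip(*M)): unpacking M passes its two lists to zip, which pairs them column by column. A short sketch; the k_padded variant assumes you wanted '' for unmatched entries (as the else k[j1]='' in the question suggests) rather than dropping them:

```python
M = (['A', 'B', 'C'], ['a', 'b', 'c'])
m = ['X', 'C', 'B']

pairs = list(zip(*M))   # [('A', 'a'), ('B', 'b'), ('C', 'c')]
mapping = dict(pairs)   # {'A': 'a', 'B': 'b', 'C': 'c'}

# Drop keys that have no match
k = [mapping[x] for x in m if x in mapping]   # ['c', 'b']
# Or keep a '' placeholder for keys that have no match
k_padded = [mapping.get(x, '') for x in m]    # ['', 'c', 'b']
print(k, k_padded)
```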

I want to write a function that takes a list and returns a count of the total number of duplicate elements in the list

I have tried this. For some unknown reason, when it prints h it prints None, so I thought that if I counted the number of Nones printed and divided by 2, it would give the number of duplicates, but I can't use the count function here:
a = [1, 4, "hii", 2, 4, "hello", "hii"]
def duplicate(L):
    li = []
    lii = []
    h = ""
    for i in L:
        y = L.count(i)
        if y > 1:
            h = y
            print h
    print h.count(None)
duplicate(a)
Use the Counter container:
from collections import Counter
c = Counter(['a', 'b', 'a'])
c is now a dictionary-like object holding the counts: Counter({'a': 2, 'b': 1})
If you want a list of all duplicated elements (without repetition), you can do as follows:
duplicates = [k for k in c if c[k] > 1]
If you only want to count the duplicates, you can then just set
duplicates_len = len(duplicates)
You can use a set to get the count of unique elements and then compare the sizes, something like this:
def duplicates(l):
    uniques = set(l)
    return len(l) - len(uniques)
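The two answers above count different things: the Counter version counts how many distinct elements are repeated, while the set version counts the extra copies. A small sketch contrasting them on a made-up list in which one value appears three times:

```python
from collections import Counter

items = [1, 1, 1, 2, "hii", "hii"]

c = Counter(items)
distinct_repeated = len([k for k in c if c[k] > 1])  # 1 and "hii" -> 2
extra_copies = len(items) - len(set(items))          # 6 - 3 -> 3
print(distinct_repeated, extra_copies)               # 2 3
```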
I found an answer, which is:
a = [1, 4, "hii", 2, 4, "hello", 7, "hii"]
def duplicate(L):
    li = []
    for i in L:
        y = L.count(i)
        if y > 1:
            li.append(i)
    print(len(li) // 2)  # assumes every duplicated element appears exactly twice
duplicate(a)
The answer by egualo is much better, but here is another way using a dictionary:
def find_duplicates(arr):
    duplicates = {}  # element -> has it already been reported as a duplicate?
    duplicate_elements = []
    for element in arr:
        if element not in duplicates:
            duplicates[element] = False
        else:
            if duplicates[element] == False:
                duplicate_elements.append(element)
                duplicates[element] = True
    return duplicate_elements
It's pretty simple and doesn't go through the list twice, which is kind of nice.
>>> test = [1,2,3,1,1,2,2,4]
>>> find_duplicates(test)
[1, 2]

How to convert dictionary of indices to list of keys?

Say you have a dictionary listing the indices where each unique value appears. For example, if your alphabet is just a and b, the dictionary will look something like: d = {'a': [1, 2, 6], 'b': [3, 7]}. I would like to convert it to the raw list that shows the right value at the right index, so that for this example l = ['a','a','b',None,None,'a','b']. I prefer an easy, small solution rather than one with tedious for loops. Thanks!
Obviously doing this without for loops is a terrible idea, because the easiest way is (it's not perfect, but it does the job):
r = {}
for key, value in d.items():
    for element in value:
        r[element] = key  # index -> value
l = [r.get(i) for i in range(1, max(r) + 1)]  # missing indices become None
But if you REALLY want to know how to do this without any for then have a look:
m = {}
i = 0
d_keys = list(d.keys())  # list() so it can be indexed in Python 3
max_value = 0
while i < len(d):
    d_i = d[d_keys[i]]
    j = 0
    while j < len(d_i):
        d_i_j = d_i[j]
        if max_value < d_i_j:
            max_value = d_i_j
        m[d_i_j] = d_keys[i]
        j += 1
    i += 1
l = []
i = 1
while i <= max_value:
    l.append(m.get(i))
    i += 1
It's quite easy, isn't it?
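Another way to phrase the for-loop version, assuming (as in the question) that indices start at 1: preallocate a list of None with the right length and write each key into its slots. A sketch:

```python
d = {'a': [1, 2, 6], 'b': [3, 7]}

size = max(i for positions in d.values() for i in positions)
l = [None] * size
for key, positions in d.items():
    for i in positions:
        l[i - 1] = key  # shift the 1-based index to Python's 0-based list
print(l)  # ['a', 'a', 'b', None, None, 'a', 'b']
```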
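I don't know why you need that, but here is a dirty answer without loops (Python 3: map and filter are lazy there, so they are wrapped in list(), and reduce lives in functools):
from functools import reduce
d = {'a': [1, 2, 6], 'b': [3, 7]}
list(map(lambda x: x[0] if x else None, map(lambda x: list(filter(lambda l: x in d[l], d)), range(1, max(reduce(lambda x, y: x+y, map(lambda x: d[x], d)))+1))))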
From the Python docs, on d.keys(): "Return a copy of the dictionary’s list of keys. See the note for dict.items()."
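That wording comes from the Python 2 documentation. In Python 3, d.keys() returns a view rather than a list, and a view cannot be indexed, so code like d_keys[i] needs list() first. A quick check:

```python
d = {'a': [1, 2, 6], 'b': [3, 7]}

indexing_failed = False
try:
    d.keys()[0]  # dict_keys is a view, not a list
except TypeError as exc:
    indexing_failed = True
    print("indexing a keys view fails:", exc)

d_keys = list(d.keys())  # a real list, so it can be indexed
print(d_keys[0])
```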

Get a unique list of items that occur more than once in a list

I have a list of items:
mylist = ['A','A','B','C','D','E','D']
I want to return a unique list of items that appear more than once in mylist, so that my desired output would be:
['A', 'D']
I'm not sure how to even begin this, but my thought process is to first append a count of each item, then remove anything equal to 1, then dedupe. This seems like a really roundabout, inefficient way to do it, so I am looking for advice.
You can use collections.Counter to do what you have described easily:
from collections import Counter
mylist = ['A','A','B','C','D','E','D']
cnt = Counter(mylist)
print([k for k, v in cnt.items() if v > 1])
# ['A', 'D']
>>> mylist = ['A','A','B','C','D','E','D']
>>> set([i for i in mylist if mylist.count(i)>1])
set(['A', 'D'])
import collections
cc = collections.Counter(mylist)  # Counter({'A': 2, 'D': 2, 'B': 1, 'C': 1, 'E': 1})
cc.subtract(cc.keys())  # Counter({'A': 1, 'D': 1, 'B': 0, 'C': 0, 'E': 0})
cc += collections.Counter()  # remove zeros (trick from the docs)
print(list(cc.keys()))  # ['A', 'D']
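In Python 3.3 and later the same cleanup can be written with unary plus, which returns a new Counter that keeps only the positive counts. A sketch of the trick in that form (snapshotting the keys with list() before subtracting):

```python
from collections import Counter

mylist = ['A', 'A', 'B', 'C', 'D', 'E', 'D']
cc = Counter(mylist)
cc.subtract(list(cc))  # every count drops by one
dups = +cc             # unary plus drops zero and negative counts
print(list(dups))      # ['A', 'D']
```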
Try something like this:
import collections
a = ['A','A','B','C','D','E','D']
print([x for x, y in collections.Counter(a).items() if y > 1])
['A', 'D']
Reference: How to find duplicate elements in array using for loop in Python?
OR
def list_has_duplicate_items(mylist):
    return len(mylist) > len(set(mylist))
def get_duplicate_items(mylist):
    return [item for item in set(mylist) if mylist.count(item) > 1]
mylist = ['oranges', 'apples', 'oranges', 'grapes']
print('List:', mylist)
print('Does list have duplicate item(s)?', list_has_duplicate_items(mylist))
print('Redundant item(s) in list:', get_duplicate_items(mylist))
Reference https://www.daniweb.com/software-development/python/threads/286996/get-redundant-items-in-list
Using a similar approach to others here, here's my attempt:
from collections import Counter
def return_more_then_one(my_list):
    counts = Counter(my_list)
    out_list = [i for i in counts if counts[i] > 1]
    return out_list
It can be as simple as ...
print(list(set([i for i in mylist if mylist.count(i) > 1])))
Use a set to help you do that, maybe like this:
X = ['A','A','B','C','D','E','D']
Y = set(X)
Z = []
for val in Y:
    occurrences = X.count(val)
    if occurrences > 1:
        #print(val, 'occurs', occurrences, 'times')
        Z.append(val)
print(Z)
The list Z holds the items that occur more than once. The commented-out line (#), if you enable it, shows how many times each of those items occurs.
It might not be as fast as the built-in implementations, but it takes linear time on average, since set lookups and insertions are O(1) on average (sets are hash tables):
mylist = ['A','A','B','C','D','E','D']
myset = set()  # items seen so far
dups = set()   # items seen more than once
for x in mylist:
    if x in myset:
        dups.add(x)
    else:
        myset.add(x)
dups = list(dups)
print(dups)
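Another solution, checking whether each item appears again later in the list:
def delete_rep(list_):
    new_list = []
    for i, item in enumerate(list_):
        # item occurs again after position i and has not been recorded yet
        if item in list_[i + 1:] and item not in new_list:
            new_list.append(item)
    return new_list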
This is my approach without using packages (listy is the input list):
listy = ['A','A','B','C','D','E','D']
result = []
for e in listy:
    if listy.count(e) > 1:
        result.append(e)
print(list(set(result)))

How To Get All The Contiguous Substrings Of A String In Python?

Here is my code, but I want a better solution. How would you approach the problem?
def get_all_substrings(string):
    length = len(string)
    alist = []
    for i in xrange(length):
        for j in xrange(i, length):
            alist.append(string[i:j + 1])
    return alist
print get_all_substrings('abcde')
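Whatever the implementation, a string of length n has n(n+1)/2 non-empty contiguous substrings (n possible starts, and n - i possible ends for start i), so the size of the result grows quadratically. A quick check of that count, using a Python 3 version of the loop above:

```python
def get_all_substrings(string):
    length = len(string)
    alist = []
    for i in range(length):
        for j in range(i, length):
            alist.append(string[i:j + 1])
    return alist


s = 'abcde'
n = len(s)
subs = get_all_substrings(s)
print(len(subs), n * (n + 1) // 2)  # 15 15
```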
The only improvement I could think of is to use a list comprehension, like this:
def get_all_substrings(input_string):
    length = len(input_string)
    return [input_string[i:j+1] for i in xrange(length) for j in xrange(i, length)]
print get_all_substrings('abcde')
Here is a timing comparison between yours and mine:
def get_all_substrings(string):
    length = len(string)
    alist = []
    for i in xrange(length):
        for j in xrange(i, length):
            alist.append(string[i:j + 1])
    return alist
def get_all_substrings_1(input_string):
    length = len(input_string)
    return [input_string[i:j + 1] for i in xrange(length) for j in xrange(i, length)]
from timeit import timeit
print timeit("get_all_substrings('abcde')", "from __main__ import get_all_substrings")
# 3.33308315277
print timeit("get_all_substrings_1('abcde')", "from __main__ import get_all_substrings_1")
# 2.67816185951
This can be done concisely with itertools.combinations:
from itertools import combinations
def get_all_substrings_2(string):
    length = len(string) + 1
    return [string[x:y] for x, y in combinations(range(length), r=2)]
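The trick is that each pair (x, y) with x < y drawn from range(len(string) + 1) is exactly one pair of slice boundaries. A small sketch printing the pairs for a three-character string:

```python
from itertools import combinations

s = 'abc'
pairs = list(combinations(range(len(s) + 1), 2))
print(pairs)                        # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
subs = [s[x:y] for x, y in pairs]
print(subs)                         # ['a', 'ab', 'abc', 'b', 'bc', 'c']
```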
You could write it as a generator to avoid storing all the strings in memory at once, if you don't need them all at once:
def get_all_substrings(string):
    length = len(string)
    for i in xrange(length):
        for j in xrange(i + 1, length + 1):
            yield string[i:j]
for i in get_all_substrings("abcde"):
    print i
You can still make a list if you really need one:
alist = list(get_all_substrings("abcde"))
The function can be reduced to return a generator expression:
def get_all_substrings(s):
    length = len(s)
    return (s[i: j] for i in xrange(length) for j in xrange(i + 1, length + 1))
Or, of course, you can change two characters to return a list if you don't care about memory:
def get_all_substrings(s):
    length = len(s)
    return [s[i: j] for i in xrange(length) for j in xrange(i + 1, length + 1)]
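The practical benefit of the generator form is that you can stop early without building every substring. A sketch in Python 3 (range instead of xrange), using itertools.islice to take just the first few results:

```python
from itertools import islice


def get_all_substrings(s):
    length = len(s)
    return (s[i:j] for i in range(length) for j in range(i + 1, length + 1))


gen = get_all_substrings('abcde')
first_four = list(islice(gen, 4))  # only these four slices are ever created
print(first_four)                  # ['a', 'ab', 'abc', 'abcd']
```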
I've never been fond of range(len(seq)). How about using enumerate and just using the index value:
def indexes(seq, start=0):
    return (i for i, _ in enumerate(seq, start=start))
def gen_all_substrings(s):
    return (s[i:j] for i in indexes(s) for j in indexes(s[i:], i+1))
def get_all_substrings(string):
    return list(gen_all_substrings(string))
print(get_all_substrings('abcde'))
Python 3
s='abc'
list(s[i:j+1] for i in range (len(s)) for j in range(i,len(s)))
['a', 'ab', 'abc', 'b', 'bc', 'c']
Use itertools.permutations to generate all pairs of possible start and end indexes, and keep only those where the start index is less than the end index. Then use these pairs to return slices of the original string.
from itertools import permutations
def gen_all_substrings(s):
    lt = lambda pair: pair[0] < pair[1]
    index_pairs = filter(lt, permutations(range(len(s)+1), 2))
    return (s[i:j] for i, j in index_pairs)
def get_all_substrings(s):
    return list(gen_all_substrings(s))
print(get_all_substrings('abcde'))
Another solution:
def get_all_substrings(string):
    length = len(string)+1
    return [string[x:y] for x in range(length) for y in range(length) if string[x:y]]
print(get_all_substrings('abcde'))
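Another solution, using a 2-D matrix approach:
p = "abc"
a = list(p)
b = list(p)
c = list(p)  # starts with the single characters
for i in range(0, len(a)):
    dump = a[i]
    for j in range(0, len(b)):
        if i < j:
            c.append(dump + b[j])  # extend the substring that starts at i
            dump = dump + b[j]
print(c)  # ['a', 'b', 'c', 'ab', 'abc', 'bc']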
If you want to get the substrings sorted by length:
from typing import List
s = 'abcde'
def allSubstrings(s: str) -> List[str]:
    length = len(s)
    mylist = []
    for i in range(1, length+1):     # substring length
        for j in range(length-i+1):  # start position
            mylist.append(s[j:j+i])
    return mylist
print(allSubstrings(s))
['a', 'b', 'c', 'd', 'e', 'ab', 'bc', 'cd', 'de', 'abc', 'bcd', 'cde', 'abcd', 'bcde', 'abcde']
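Because Python's sort is stable, the same ordering can also be obtained by sorting the start-then-end list by length: substrings of equal length keep their original order, which is by start position. A sketch:

```python
s = 'abcde'
subs = [s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)]
by_length = sorted(subs, key=len)  # stable sort: equal lengths stay in start order
print(by_length)
```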
