Map unique strings to integers in Python [duplicate]

This question already has answers here:
Assign a number to each unique value in a list
(9 answers)
Closed 5 years ago.
I have a list, let's say
L = ['apple','bat','apple','car','pet','bat'].
I want to convert it into
Lnew = [1, 2, 1, 3, 4, 2].
Every unique string is associated with a number.
I have a Java solution using a HashMap, but I don't know how to use a hashmap in Python.
Please help.

Here's a quick solution:
l = ['apple','bat','apple','car','pet','bat']
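Create a dict that maps all unique strings to integers (sorting makes the numbering alphabetical, which happens to match the order of first appearance in this list):
d = dict([(y,x+1) for x,y in enumerate(sorted(set(l)))])
Map each string in the original list to its respective integer:
print([d[x] for x in l])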
# [1, 2, 1, 3, 4, 2]

x = list(set(L))  # set order is arbitrary, so the numbering can differ between runs
dic = dict(zip(x, range(1, len(x) + 1)))
>>> [dic[v] for v in L]
[1, 2, 1, 3, 4, 2]  # one possible result

You can use a map dictionary:
d = {'apple':1, 'bat':2, 'car':3, 'pet':4}
L = ['apple','bat','apple','car','pet','bat']
[d[x] for x in L] # [1, 2, 1, 3, 4, 2]
To build the map dictionary automatically, you can use defaultdict(int) with a counter:
from collections import defaultdict
d = defaultdict(int)
co = 1
for x in L:
    if not d[x]:
        d[x] = co
        co += 1
d # defaultdict(<class 'int'>, {'pet': 4, 'bat': 2, 'apple': 1, 'car': 3})
Or, as @Stuart mentioned, you can use d = dict(zip(set(L), range(len(L)))) to create the dictionary, although the numbers then start at 0 and follow the set's arbitrary order.

You'd use a hashmap in Python, too, but we call it a dict.
>>> L = ['apple','bat','apple','car','pet','bat']
>>> idx = 1
>>> seen_first = {}
>>>
>>> for word in L:
...     if word not in seen_first:
...         seen_first[word] = idx
...         idx += 1
...
>>> [seen_first[word] for word in L]
[1, 2, 1, 3, 4, 2]
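The same first-seen numbering can be wrapped in a small helper. This is only a minimal sketch: the name encode_first_seen and its start parameter are made up for illustration and do not come from any of the answers.

```python
def encode_first_seen(items, start=1):
    """Map each distinct item to an integer in order of first appearance.

    Hypothetical helper for illustration; `start` is the first number handed out.
    """
    codes = {}
    encoded = []
    for item in items:
        # setdefault stores the next free number only when the item is new,
        # and returns the stored number either way
        encoded.append(codes.setdefault(item, len(codes) + start))
    return encoded, codes


L = ['apple', 'bat', 'apple', 'car', 'pet', 'bat']
Lnew, mapping = encode_first_seen(L)
print(Lnew)     # [1, 2, 1, 3, 4, 2]
print(mapping)  # {'apple': 1, 'bat': 2, 'car': 3, 'pet': 4}
```

Returning the mapping alongside the encoded list makes it possible to encode further lists with the same numbers later.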

You can try:
>>> L = ['apple','bat','apple','car','pet','bat']
>>> l_dict = dict(zip(set(L), range(len(L))))
>>> print l_dict
{'pet': 0, 'car': 1, 'bat': 2, 'apple': 3}
>>> [l_dict[x] for x in L]
[3, 2, 3, 1, 0, 2]

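Lnew = []
for s in L:
    # hash(s) returns an int derived from the string: equal strings give equal values,
    # but the values are large, are not guaranteed to be unique, and in Python 3
    # change between interpreter runs unless PYTHONHASHSEED is fixed
    Lnew.append(hash(s))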
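If small, repeatable numbers are wanted instead of hash values, the numbers can be assigned explicitly. The following is a sketch that assumes Python 3.7 or newer, where dicts keep insertion order; the variable names are illustrative.

```python
L = ['apple', 'bat', 'apple', 'car', 'pet', 'bat']

# dict.fromkeys keeps one key per distinct string, in first-seen order;
# enumerate then numbers those keys starting from 1
codes = {word: i for i, word in enumerate(dict.fromkeys(L), start=1)}

Lnew = [codes[word] for word in L]
print(Lnew)  # [1, 2, 1, 3, 4, 2]
```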

Related

How to efficiently count each element in a list in Python? [duplicate]

This question already has answers here:
Using a dictionary to count the items in a list
(8 answers)
Closed 7 months ago.
Given an unordered list of values like
a = [5, 1, 2, 2, 4, 3, 1, 2, 3, 1, 1, 5, 2]
How can I get the frequency of each value that appears in the list, like so?
# `a` has 4 instances of `1`, 4 of `2`, 2 of `3`, 1 of `4,` 2 of `5`
b = [4, 4, 2, 1, 2] # expected output

In Python 2.7 (or newer), you can use collections.Counter:
>>> import collections
>>> a = [5, 1, 2, 2, 4, 3, 1, 2, 3, 1, 1, 5, 2]
>>> counter = collections.Counter(a)
>>> counter
Counter({1: 4, 2: 4, 5: 2, 3: 2, 4: 1})
>>> counter.values()
dict_values([2, 4, 4, 1, 2])
>>> counter.keys()
dict_keys([5, 1, 2, 4, 3])
>>> counter.most_common(3)
[(1, 4), (2, 4), (5, 2)]
>>> dict(counter)
{5: 2, 1: 4, 2: 4, 4: 1, 3: 2}
>>> # Get the counts in order matching the original specification,
>>> # by iterating over keys in sorted order
>>> [counter[x] for x in sorted(counter.keys())]
[4, 4, 2, 1, 2]
If you are using Python 2.6 or older, you can download an implementation here.

If the list is sorted, you can use groupby from the itertools standard library (if it isn't, you can just sort it first, although this takes O(n lg n) time):
from itertools import groupby
a = [5, 1, 2, 2, 4, 3, 1, 2, 3, 1, 1, 5, 2]
[len(list(group)) for key, group in groupby(sorted(a))]
Output:
[4, 4, 2, 1, 2]
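The reason for sorting first is that groupby only merges consecutive equal items. A short sketch of the difference (run_lengths is a hypothetical helper name):

```python
from itertools import groupby


def run_lengths(values):
    """Return (value, length) for each run of consecutive equal values."""
    return [(key, len(list(group))) for key, group in groupby(values)]


a = [1, 1, 2, 1]
print(run_lengths(a))          # [(1, 2), (2, 1), (1, 1)]  the two runs of 1 stay separate
print(run_lengths(sorted(a)))  # [(1, 3), (2, 1)]          sorting brings equal values together
```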

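Python 2.7+ introduces dictionary comprehensions. Building the dictionary from the list will get you the counts as well as get rid of duplicates. Note that a.count(x) rescans the whole list for every element, so this takes quadratic time on long lists.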
>>> a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
>>> d = {x:a.count(x) for x in a}
>>> d
{1: 4, 2: 4, 3: 2, 4: 1, 5: 2}
>>> a, b = d.keys(), d.values()
>>> a
[1, 2, 3, 4, 5]
>>> b
[4, 4, 2, 1, 2]

Count the number of appearances manually by iterating through the list and counting them up, using a collections.defaultdict to track what has been seen so far:
from collections import defaultdict
appearances = defaultdict(int)
for curr in a:
    appearances[curr] += 1
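To turn that tally into the list of counts the question asks for, the keys can be read back in sorted order. A minimal sketch, with count_sorted as an assumed helper name:

```python
from collections import defaultdict


def count_sorted(values):
    """Count each value in one pass, then list the counts by ascending value."""
    appearances = defaultdict(int)
    for curr in values:
        appearances[curr] += 1  # missing keys start at 0
    return [appearances[key] for key in sorted(appearances)]


a = [5, 1, 2, 2, 4, 3, 1, 2, 3, 1, 1, 5, 2]
print(count_sorted(a))  # [4, 4, 2, 1, 2]
```

The counting pass is linear in the length of the list; only the distinct keys are sorted.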

In Python 2.7+, you could use collections.Counter to count items
>>> a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
>>>
>>> from collections import Counter
>>> c=Counter(a)
>>>
>>> c.values()
[4, 4, 2, 1, 2]
>>>
>>> c.keys()
[1, 2, 3, 4, 5]

Counting the frequency of elements is probably best done with a dictionary:
b = {}
for item in a:
    b[item] = b.get(item, 0) + 1
To remove the duplicates, use a set:
a = list(set(a))

You can do this:
import numpy as np
a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
np.unique(a, return_counts=True)
Output:
(array([1, 2, 3, 4, 5]), array([4, 4, 2, 1, 2], dtype=int64))
The first array is values, and the second array is the number of elements with these values.
So If you want to get just array with the numbers you should use this:
np.unique(a, return_counts=True)[1]

Here's another succinct alternative using itertools.groupby which also works for unordered input:
from itertools import groupby
items = [5, 1, 1, 2, 2, 1, 1, 2, 2, 3, 4, 3, 5]
results = {value: len(list(freq)) for value, freq in groupby(sorted(items))}
results
format: {value: num_of_occurrences}
{1: 4, 2: 4, 3: 2, 4: 1, 5: 2}

I would simply use scipy.stats.itemfreq in the following manner:
from scipy.stats import itemfreq
a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
freq = itemfreq(a)
a = freq[:,0]
b = freq[:,1]
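You may check the documentation here: http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.itemfreq.html
Note that itemfreq was deprecated in later SciPy releases in favour of np.unique(a, return_counts=True).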

from collections import Counter
import numpy as np
import pandas as pd
a=["E","D","C","G","B","A","B","F","D","D","C","A","G","A","C","B","F","C","B"]
counter=Counter(a)
kk=[list(counter.keys()),list(counter.values())]
pd.DataFrame(np.array(kk).T, columns=['Letter','Count'])

seta = set(a)
b = [a.count(el) for el in seta]
a = list(seta) #Only if you really want it.

Suppose we have a list:
fruits = ['banana', 'banana', 'apple', 'banana']
We can find out how many of each fruit we have in the list like so:
import numpy as np
(unique, counts) = np.unique(fruits, return_counts=True)
{x:y for x,y in zip(unique, counts)}
Result:
{'banana': 3, 'apple': 1}

This answer is more explicit
a = [1,1,1,1,2,2,2,2,3,3,3,4,4]
d = {}
for item in a:
    if item in d:
        d[item] = d.get(item)+1
    else:
        d[item] = 1
for k,v in d.items():
    print(str(k)+':'+str(v))
# output
#1:4
#2:4
#3:3
#4:2
#remove dups
d = set(a)
print(d)
#{1, 2, 3, 4}

For your first question, iterate over the list and use a dictionary to keep track of each element's existence.
For your second question, just use the set operator.

def frequencyDistribution(data):
    return {i: data.count(i) for i in data}
print(frequencyDistribution([1,2,3,4]))
...
{1: 1, 2: 1, 3: 1, 4: 1} # originalNumber: count

I am quite late, but this will also work, and will help others:
a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
freq_list = []
a_l = list(set(a))
for x in a_l:
    freq_list.append(a.count(x))
print('Freq', freq_list)
print('number', a_l)
will produce this:
Freq [4, 4, 2, 1, 2]
number [1, 2, 3, 4, 5]

a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
counts = dict.fromkeys(a, 0)
for el in a: counts[el] += 1
print(counts)
# {1: 4, 2: 4, 3: 2, 4: 1, 5: 2}

a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
# 1. Get counts and store in another list
output = []
for i in set(a):
    output.append(a.count(i))
print(output)
# 2. Remove duplicates using set constructor
a = list(set(a))
print(a)
A set does not allow duplicates, so passing a list to the set() constructor gives an iterable of unique objects. The list's count() method returns how many times a given object occurs in the list. Each unique object is counted in turn, and the count is appended to the initially empty list output.
The list() constructor then converts set(a) back into a list, which is bound to the same variable a.
Output
D:\MLrec\venv\Scripts\python.exe D:/MLrec/listgroup.py
[4, 4, 2, 1, 2]
[1, 2, 3, 4, 5]

Simple solution using a dictionary.
def frequency(l):
    d = {}
    for i in l:
        if i in d:
            d[i] += 1
        else:
            d[i] = 1
    # return the first value that reaches the highest count, plus all distinct values
    for k, v in d.items():
        if v == max(d.values()):
            return k, d.keys()
print(frequency([10,10,10,10,20,20,20,20,40,40,50,50,30]))

#!/usr/bin/python
def frq(words):
    freq = {}
    for w in words:
        if w in freq:
            freq[w] = freq.get(w)+1
        else:
            freq[w] = 1
    return freq
fp = open("poem","r")
text = fp.read()  # avoid shadowing the built-ins list and input
fp.close()
words = text.split()
print(words)
d = frq(words)
print("frequency of input\n: ")
print(d)
fp1 = open("output.txt","w+")
for k,v in d.items():
    fp1.write(str(k)+':'+str(v)+"\n")
fp1.close()

from collections import OrderedDict
a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
def get_count(lists):
    dictionary = OrderedDict()
    for val in lists:
        dictionary.setdefault(val,[]).append(1)
    return [sum(val) for val in dictionary.values()]
print(get_count(a))
>>>[4, 4, 2, 1, 2]
To remove duplicates and Maintain order:
list(dict.fromkeys(get_count(a)))
>>>[4, 2, 1]

I'm using Counter to generate a frequency dict from the words of a text file in a single statement:
import re
from collections import Counter
def _fileIndex(fh):
    ''' create a dict using Counter of a
    flat list of words (re.findall(re.compile(r"[a-zA-Z]+"), lines)) in (lines in file->for lines in fh)
    '''
    return Counter(
        [wrd.lower() for wrdList in
         [words for words in
          [re.findall(re.compile(r'[a-zA-Z]+'), lines) for lines in fh]]
         for wrd in wrdList])

For the record, a functional answer:
>>> L = [1,1,1,1,2,2,2,2,3,3,4,5,5]
>>> import functools
>>> functools.reduce(lambda acc, e: [v+(i==e) for i, v in enumerate(acc,1)] if e<=len(acc) else acc+[0 for _ in range(e-len(acc)-1)]+[1], L, [])
[4, 4, 2, 1, 2]
It's cleaner if you count zeroes too:
>>> functools.reduce(lambda acc, e: [v+(i==e) for i, v in enumerate(acc)] if e<len(acc) else acc+[0 for _ in range(e-len(acc))]+[1], L, [])
[0, 4, 4, 2, 1, 2]
An explanation:
we start with an empty acc list;
if the next element e of L is lower than the size of acc, we just update this element: v+(i==e) means v+1 if the index i of acc is the current element e, otherwise the previous value v;
if the next element e of L is greater than or equal to the size of acc, we have to expand acc to host the new 1.
The elements do not have to be sorted (unlike with itertools.groupby). You'll get weird results if you have negative numbers.
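The steps in that explanation can also be written as an ordinary loop, which may be easier to follow than the reduce call. This is a sketch of the zero-counting variant only; it assumes the elements are non-negative integers, and counts_by_index is a made-up name.

```python
def counts_by_index(values):
    """Return a list where position e holds how often e occurs in values.

    Assumes every element is a non-negative integer.
    """
    acc = []
    for e in values:
        if e >= len(acc):
            # grow acc with zeroes so that index e exists
            acc.extend([0] * (e - len(acc) + 1))
        acc[e] += 1
    return acc


L = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5]
print(counts_by_index(L))  # [0, 4, 4, 2, 1, 2]
```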

Another approach of doing this, albeit by using a heavier but powerful library - NLTK.
import nltk
fdist = nltk.FreqDist(a)
fdist.values()
fdist.most_common()

Found another way of doing this, using sets.
#ar is the list of elements
#convert ar to set to get unique elements
sock_set = set(ar)
#create dictionary of frequency of socks
sock_dict = {}
for sock in sock_set:
    sock_dict[sock] = ar.count(sock)

For an unordered list you should use:
[a.count(el) for el in set(a)]
The output is
[4, 4, 2, 1, 2]

Yet another solution with another algorithm without using collections (it assumes the values are non-negative integers):
def countFreq(A):
    if not A:
        return []
    # one slot per possible value 0..max(A); sizing by len(A) fails when a value >= len(A)
    count = [0] * (max(A) + 1)
    for value in A:
        count[value] += 1 # increase occurrence for this value
    return [x for x in count if x] # return non-zero counts

num=[3,2,3,5,5,3,7,6,4,6,7,2]
print ('\nelements are:\t',num)
count_dict={}
for elements in num:
    count_dict[elements]=num.count(elements)
print ('\nfrequency:\t',count_dict)

You can use the built-in list method count:
l.count(l[i])
d=[]
for i in range(len(l)):
    if l[i] not in d:
        d.append(l[i])
        print(l.count(l[i]))
The above code builds the list d without duplicates and also prints the frequency of each element of the original list.
Two birds with one stone! XD

This approach can be tried if you don't want to use any library and keep it simple and short!
a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
marked = []
b = [(a.count(i), marked.append(i))[0] for i in a if i not in marked]
print(b)
o/p
[4, 4, 2, 1, 2]

Sum integer list when next integer is the same value

So I need to have a code that checks one integer, and checks if the integer after it is the same value. If so, it will add the value to x.
input1 = [int(i) for i in str(1234441122)]
x= 0
So my code currently gives the result [1, 2, 3, 4, 4, 4, 1, 1 ,2 ,2]. I want it to give the result of x = 0+4+4+1+2.
I do not know any way to do that.

The following will work. Zip together adjacent pairs and only take the first elements if they are the same as the second ones:
>>> lst = [1, 2, 3, 4, 4, 4, 1, 1, 2, 2]
>>> sum(x for x, y in zip(lst, lst[1:]) if x == y)
11
While this should be a little less [space-]efficient in theory (as the slice creates an extra list), it still has O(N) complexity in time and space and is far more readable than most solutions based on indexed access. A tricky way to avoid the slice while still being concise and avoiding any imports would be:
>>> sum((lst[i] == lst[i-1]) * lst[i] for i in range(1, len(lst))) # Py2: xrange
11
This makes use of the fact that lst[i]==lst[i-1] will be cast to 0 or 1 appropriately.
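For reuse, the zip-based version can be put in a function and applied to the digits from the question. A minimal sketch; sum_repeats is an illustrative name.

```python
def sum_repeats(values):
    """Sum every element that is immediately followed by an equal element."""
    # zip pairs each element with its successor; the last element has none
    return sum(x for x, y in zip(values, values[1:]) if x == y)


digits = [int(c) for c in str(1234441122)]
print(digits)               # [1, 2, 3, 4, 4, 4, 1, 1, 2, 2]
print(sum_repeats(digits))  # 11, i.e. 4 + 4 + 1 + 2
```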

Another way using itertools.groupby
l = [1, 2, 3, 4, 4, 4, 1, 1 ,2 ,2]
from itertools import groupby
sum(sum(g)-k for k,g in groupby(l))
#11

You can try this:
s = str(1234441122)
new_data = [int(a) for i, a in enumerate(s) if i+1 < len(s) and a == s[i+1]]
print(new_data)
final_data = sum(new_data)
Output:
[4, 4, 1, 2]
11

No need for that list. You can remove the "non-repeated" digits from the string already:
>>> n = 1234441122
>>> import re
>>> sum(map(int, re.sub(r'(.)(?!\1)', '', str(n))))
11

You are simply iterating over the string and converting each character to an integer. You need to iterate and compare each character to the next one.
a = str(1234441122)
total = 0  # named total so the built-in sum is not shadowed
for i in range(len(a) - 1):
    if a[i] == a[i+1]:
        total += int(a[i])
print(total)
Output
11

Try this one too:
input1 = [int(i) for i in str(1234441122)]
x= 0
res = [input1[i] for i in range(len(input1)-1) if input1[i+1]==input1[i]]
print(res)
print(sum(res))
Output:
[4, 4, 1, 2]
11

Here's a slightly more space-efficient version of @schwobaseggl's answer.
>>> lst = [1, 2, 3, 4, 4, 4, 1, 1, 2, 2]
>>> it = iter(lst)
>>> next(it) # throw away first value
1
>>> sum(x for x,y in zip(lst, it) if x == y)
11
Alternatively, using an islice from the itertools module is equivalent but looks a bit nicer.
>>> from itertools import islice
>>> sum(x for x,y in zip(lst, islice(lst, 1, None, 1)) if x == y)
11

Count the same list's occur frequency in a multi-dimensional list?

I have a multi-dimensional list as like below
multilist = [[1,2],[3,4,5],[3,4,5],[5,6],[5,6],[5,6]]
How can I get below results fast:
[1,2]: count 1 times
[3,4,5]: count 2 times
[5,6]: count 3 times
and also get the unique multi-dimensional list (remove duplicates) :
multi_list = [[1,2],[3,4,5],[5,6]]
Thanks a lot.

You can use tuples which are hashable and collections.Counter:
>>> multilist = [[1,2],[3,4,5],[3,4,5],[5,6],[5,6],[5,6]]
>>> multituples = [tuple(l) for l in multilist]
>>> from collections import Counter
>>> tc = Counter(multituples)
>>> tc
Counter({(5, 6): 3, (3, 4, 5): 2, (1, 2): 1})
To get the set of elements you just need the keys:
>>> tc.keys()
dict_keys([(1, 2), (3, 4, 5), (5, 6)])
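Putting the two pieces together, the counts and the de-duplicated list can be produced in one go. This sketch assumes Python 3.7 or newer, where Counter keeps keys in first-seen order; the names counts and unique are illustrative.

```python
from collections import Counter

multilist = [[1, 2], [3, 4, 5], [3, 4, 5], [5, 6], [5, 6], [5, 6]]

# lists are not hashable, so each inner list is converted to a tuple first
counts = Counter(tuple(item) for item in multilist)

for key, n in counts.items():
    print("%r: count %d times" % (list(key), n))

# convert the tuple keys back to lists for the unique multi-dimensional list
unique = [list(key) for key in counts]
print(unique)  # [[1, 2], [3, 4, 5], [5, 6]]
```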

If you want to guarantee that the order of the unique items is the same as in the original list, you could do something like:
>>> class Seen(set):
...     def __contains__(self, item):
...         res = super(Seen, self).__contains__(item)
...         self.add(item)
...         return res
...
>>> seen = Seen()
>>> [item for item in multilist if tuple(item) not in seen]
[[1, 2], [3, 4, 5], [5, 6]]
>>>

As @ReutSharabani suggested, you can use tuples as dictionary keys, and then convert back to lists for display purposes. The code below doesn't rely on collections (not that there's anything wrong with that).
multilist = [[1,2],[3,4,5],[3,4,5],[5,6],[5,6],[5,6]]
histogram = {}
for x in multilist:
    xt = tuple(x)
    if xt not in histogram:
        histogram[xt] = 1
    else:
        histogram[xt] += 1
for k,c in histogram.items():
    print("%r: count %d times" % (list(k),c))
print([list(x) for x in histogram.keys()])

You can try like this,
>>> multilist = [[1,2],[3,4,5],[3,4,5],[5,6],[5,6],[5,6]]
>>> c = [multilist.count(l) for l in multilist]
>>> for ind, l in enumerate(multilist):
...     print( "%s: count %d times" % (str(l), c[ind]))
...
[1, 2]: count 1 times
[3, 4, 5]: count 2 times
[3, 4, 5]: count 2 times
[5, 6]: count 3 times
[5, 6]: count 3 times
[5, 6]: count 3 times
>>> {str(item): multilist.count(item) for item in multilist }
{'[1, 2]': 1, '[3, 4, 5]': 2, '[5, 6]': 3}

How about using repr( alist) to convert it to its text string representation?
from collections import defaultdict
d = defaultdict(int)
for e in multilist: d[ repr(e)] += 1
for k,v in d.items(): print("{0}: count {1} times".format( k,v))

You can use a dictionary for this
count_data = {}
for my_list in multilist:
    count_data.setdefault(tuple(my_list), 0)
    count_data[tuple(my_list)] += 1

Update dictionary items with a for loop

I would like to update a dictionary's items in a for loop. Here is what I have:
>>> d = {}
>>> for i in range(0,5):
...     d.update({"result": i})
>>> d
{'result': 4}
But I want d to have following items:
{'result': 0,'result': 1,'result': 2,'result': 3,'result': 4}

As mentioned, the whole idea of dictionaries is that they have unique keys.
What you can do is have 'result' as the key and a list as the value, then keep appending to the list.
>>> d = {}
>>> for i in range(0,5):
...     d.setdefault('result', [])
...     d['result'].append(i)
>>> d
{'result': [0, 1, 2, 3, 4]}
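The same list-per-key idea can be written with collections.defaultdict, which creates the empty list on first access so that no setdefault call is needed. A minimal sketch:

```python
from collections import defaultdict

d = defaultdict(list)  # missing keys start out as an empty list
for i in range(5):
    d['result'].append(i)

print(dict(d))  # {'result': [0, 1, 2, 3, 4]}
```

This tends to pay off when several different keys are filled in the same loop.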

Keys have to be unique in a dictionary, so what you are trying to achieve is not possible. When you assign another item with the same key, you simply overwrite the previous entry, hence the result you see.
Maybe this would be useful to you?
>>> d = {}
>>> for i in range(3):
...     d['result_' + str(i)] = i
>>> d
{'result_0': 0, 'result_1': 1, 'result_2': 2}
You can modify this to fit your needs.

In a dictionary the keys can't be the same, so your example
{'result': 0,'result': 1,'result': 2,'result': 3,'result': 4}
is not possible. You can use a list of multiple dicts instead:
[{},{},{},{}]

You can't have different values for the same key in your dictionary. One option would be to number the result:
d = {}
for i in range(0,5):
    result = 'result' + str(i)
    d[result] = i
d
>>> {'result0': 0, 'result1': 1, 'result4': 4, 'result2': 2, 'result3': 3}

d = {"key1": [8, 22, 38], "key2": [7, 3, 12], "key3": [5, 6, 71]}
print(d)
for key, value in d.items():
    value_new = [sum(value)]
    d.update({key: value_new})
print(d)

>>> d = {"result": []}
>>> for i in range(0,5):
...     d["result"].append(i)
...
>>> d
{'result': [0, 1, 2, 3, 4]}

Looking for more pythonic list comparison solution

Ok so I have two lists:
x = [1, 2, 3, 4]
y = [1, 1, 2, 5, 6]
I compare them in such a way so I get the following output:
x = [3, 4]
y = [1, 5, 6]
The basic idea is to go through each list and compare them. If they have an element in common, remove that element, but only one occurrence of it, not all of them. If they don't have an element in common, leave it. Two identical lists would become x = [], y = []
Here is my rather hacked up and pretty lame solution. I'm hoping others can recommend a better and/or more pythonic way of doing this. 3 loops seems excessive...
done = True
while not done:
    done = False
    for x in xlist:
        for y in ylist:
            if x == y:
                xlist.remove(x)
                ylist.remove(y)
                done = False
print(xlist, ylist)
Thanks as always for taking the time to read this question. XOXO

It's possible that the data structure you are looking for is the multiset (or "bag"), and if so, a good way to implement it in Python is to use collections.Counter:
>>> from collections import Counter
>>> x = Counter([1, 2, 3, 4])
>>> y = Counter([1, 1, 2, 5, 6])
>>> x - y
Counter({3: 1, 4: 1})
>>> y - x
Counter({1: 1, 5: 1, 6: 1})
If you want to convert the Counter objects back to lists with multiplicity, you can use the elements method:
>>> list((x - y).elements())
[3, 4]
>>> list((y - x).elements())
[1, 5, 6]
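Both directions of the multiset difference can be wrapped in one function. This is a sketch under the assumption that the elements are hashable and sortable; multiset_diff is a hypothetical name, and the results are sorted only to make the output predictable.

```python
from collections import Counter


def multiset_diff(xs, ys):
    """Return (xs minus ys, ys minus xs), removing one occurrence per match."""
    cx, cy = Counter(xs), Counter(ys)
    # Counter subtraction drops counts that fall to zero or below
    return sorted((cx - cy).elements()), sorted((cy - cx).elements())


x = [1, 2, 3, 4]
y = [1, 1, 2, 5, 6]
print(multiset_diff(x, y))  # ([3, 4], [1, 5, 6])
print(multiset_diff(x, x))  # ([], [])
```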

If you don't care about order, use collections.Counter to do it in one line:
>>> Counter(x)-Counter(y)
Counter({3: 1, 4: 1})
>>> Counter(y)-Counter(x)
Counter({1: 1, 5: 1, 6: 1})
If you care about order, you can probably iterate through your lists grabbing elements from the above dictionaries:
def prune(seq, toPrune):
    """Prunes elements from front of seq in O(N) time"""
    remainder = Counter(seq)-Counter(toPrune)
    R = []
    for x in reversed(seq):
        if remainder.get(x):
            remainder[x] -= 1
            R.append(x)  # append and reverse once; insert(0, x) would make this quadratic
    R.reverse()
    return R
Demo:
>>> prune(x,y)
[3, 4]
>>> prune(y,x)
[1, 5, 6]

To build on Gareth's answer:
>>> a = Counter([1, 2, 3, 4])
>>> b = Counter([1, 1, 2, 5, 6])
>>> list((a - b).elements())
[3, 4]
>>> list((b - a).elements())
[1, 5, 6]
Benchmark code:
from collections import Counter
from collections import defaultdict
import random
# short lists
#a = [1, 2, 3, 4, 7, 8, 9]
#b = [1, 1, 2, 5, 6, 8, 8, 10]
# long lists
a = []
b = []
for i in range(0, 1000):
    q = random.choice((1, 2, 3, 4))
    if q == 1:
        a.append(i)
    elif q == 2:
        b.append(i)
    elif q == 3:
        a.append(i)
        b.append(i)
    else:
        a.append(i)
        b.append(i)
        b.append(i)
# Modifies the lists in-place! Naughty! And it doesn't actually work, to boot.
def original(xlist, ylist):
    done = False
    while not done:
        done = True
        for x in xlist:
            for y in ylist:
                if x == y:
                    xlist.remove(x)
                    ylist.remove(y)
                    done = False
    return xlist, ylist # not strictly necessary, see above
def counter(xlist, ylist):
    x = Counter(xlist)
    y = Counter(ylist)
    return ((x-y).elements(), (y-x).elements())
def nasty(xlist, ylist):
    x = sum(([i]*(xlist.count(i)-ylist.count(i)) for i in set(xlist)),[])
    y = sum(([i]*(ylist.count(i)-xlist.count(i)) for i in set(ylist)),[])
    return x, y
def gnibbler(xlist, ylist):
    d = defaultdict(int)
    for i in xlist: d[i] += 1
    for i in ylist: d[i] -= 1
    return [k for k,v in d.items() for i in range(v)], [k for k,v in d.items() for i in range(-v)]
# substitute algorithm to test in the call
for x in range(0, 100000):
    original(list(a), list(b))
Running the Insufficiently Rigorous Benchmarks[tm] (short lists are the original ones, long lists are randomly generated lists approximately 1000 entries long with a mix of matches and repeats, times given in multipliers of the Original algorithm):
            100K iterations, short lists    1K iterations, long lists
Original    1.0                             1.0
Counter     9.3                             0.06
Nasty       2.9                             1.4
Gnibbler    2.4                             0.02
Note 1: The creation of the Counter object seems to overshadow the actual algorithm at small list sizes.
Note 2: Original and gnibbler are the same at list lengths of approximately 35, above which gnibbler (and Counter) are faster.
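A comparable measurement can be taken with the standard timeit module instead of a hand-written loop. The sketch below times only a Counter-based variant on the short lists; counter_diff is an illustrative name, and the timing will differ from machine to machine, so no figure is shown.

```python
import timeit
from collections import Counter


def counter_diff(xlist, ylist):
    """Counter-based difference, materialised as lists so the work is actually done."""
    x, y = Counter(xlist), Counter(ylist)
    return list((x - y).elements()), list((y - x).elements())


a = [1, 2, 3, 4, 7, 8, 9]
b = [1, 1, 2, 5, 6, 8, 8, 10]

# time 1000 calls; swap in another function to compare algorithms
elapsed = timeit.timeit(lambda: counter_diff(a, b), number=1000)
print("1000 calls took %.4f s" % elapsed)
```

Converting elements() to a list matters here: the method returns a lazy iterator, so timing it without consuming the result would understate the cost.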

This uses only collections.defaultdict, so it will work on Python 2.5+
>>> x = [1, 2, 3, 4]
>>> y = [1, 1, 2, 5, 6]
>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> for i in x:
...     d[i] += 1
...
>>> for i in y:
...     d[i] -= 1
...
>>> [k for k,v in d.items() for i in range(v)]
[3, 4]
>>> [k for k,v in d.items() for i in range(-v)]
[1, 5, 6]
I find this is better than range (or xrange) if the number of repetitions gets large
>>> from itertools import repeat
>>> [k for k,v in d.items() for i in repeat(None, v)]

Quite nasty :P
a = sum(([i]*(x.count(i)-y.count(i)) for i in set(x)),[])
b = sum(([i]*(y.count(i)-x.count(i)) for i in set(y)),[])
x,y = a,b

This is simple if you don't care about the duplicates (note that the extra 1 in y is lost):
>>> x=[1,2,3,4]
>>> y=[1,1,2,5,6]
>>> list(set(x).difference(set(y)))
[3, 4]
>>> list(set(y).difference(set(x)))
[5, 6]
