python .count for multidimensional arrays (list of lists) - python

How would I count the number of occurrences of some value in a multidimensional array made with nested lists? For example, when looking for 'foobar' in the following list:
list = [['foobar', 'a', 'b'], ['x', 'c'], ['y', 'd', 'e', 'foobar'], ['z', 'f']]
It should return 2.
(Yes, I am aware that I could write a loop that searches through all of it, but I dislike that solution because it is rather time-consuming, both to write and at runtime.)
Maybe .count?

>>> list = [['foobar', 'a', 'b'], ['x', 'c'], ['y', 'd', 'e', 'foobar'], ['z', 'f']]
>>> sum(x.count('foobar') for x in list)
2

First join the lists together using itertools, then count each occurrence using the collections module:
import itertools
from collections import Counter
some_list = [['foobar', 'a', 'b'], ['x', 'c'], ['y', 'd', 'e', 'foobar'], ['z', 'f']]
# Counter accepts any iterable, so the chained sublists can be passed in directly
totals = Counter(itertools.chain.from_iterable(some_list))
print(totals["foobar"])

>>> from collections import Counter
>>> my_list = [['foobar', 'a', 'b'], ['x', 'c'], ['y', 'd', 'e', 'foobar'], ['z', 'f']]
>>> counted = Counter(item for sublist in my_list for item in sublist)
>>> counted.get('foobar', 'not found!')
2
>>> counted.get('missing', 'not found!')  # a value that is not in the counter
'not found!'
This flattens the sublists and then uses Counter from the collections module to count every item.
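As a rough sketch of the flatten-then-count idea wrapped in a function (the name count_nested is made up for illustration, and it assumes the nesting is exactly one level deep): indexing a Counter with a key it has never seen gives 0 rather than raising KeyError, so no default is needed.

```python
from collections import Counter
from itertools import chain

def count_nested(nested, target):
    """Count how often target appears in a list of lists (one level deep)."""
    counts = Counter(chain.from_iterable(nested))
    # Indexing a Counter with a missing key returns 0 instead of raising KeyError.
    return counts[target]

data = [['foobar', 'a', 'b'], ['x', 'c'], ['y', 'd', 'e', 'foobar'], ['z', 'f']]
print(count_nested(data, 'foobar'))   # 2
print(count_nested(data, 'missing'))  # 0
```

If only one value is ever counted, the sum(x.count(...)) one-liner is probably simpler; building a Counter pays off when several different values are looked up.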

Related

replace duplicate values in a list with 'x'?

I am trying to understand the process of creating a function that can replace duplicate strings in a list of strings. For example, I want to convert this list
mylist = ['a', 'b', 'b', 'a', 'c', 'a']
to this
mylist = ['a', 'b', 'x', 'x', 'c', 'x']
Initially, I know I need to create my function and iterate through the list:
def replace(foo):
    newlist= []
    for i in foo:
        if foo[i] == foo[i+1]:
            foo[i].replace('x')
    return foo
However, I know there are two problems with this. The first is that I get an error stating
list indices must be integers or slices, not str
so I believe I should instead be operating on the range of this list, but I'm not sure how to implement that. The other is that this would only help if the duplicate letter comes directly after the current item (i).
Unfortunately, that's as far as my understanding of the problem reaches. If anyone can provide some clarification on this procedure for me, I would be very grateful.
Go through the list, and keep track of what you've seen in a set. Replace things you've seen before in the list with 'x':
mylist = ['a', 'b', 'b', 'a', 'c', 'a']
seen = set()
for i, e in enumerate(mylist):
    if e in seen:
        mylist[i] = 'x'
    else:
        seen.add(e)
print(mylist)
# ['a', 'b', 'x', 'x', 'c', 'x']
Simple Solution.
my_list = ['a', 'b', 'b', 'a', 'c', 'a']
new_list = []
for i in range(len(my_list)):
    if my_list[i] in new_list:
        new_list.append('x')
    else:
        new_list.append(my_list[i])
print(my_list)
print(new_list)
# output
#['a', 'b', 'b', 'a', 'c', 'a']
#['a', 'b', 'x', 'x', 'c', 'x']
The other solutions use indexing, which isn't strictly required.
Put simply, you can check whether the value is already in the new list: if it is, append 'x'; otherwise append the value itself. If you wanted to use a function:
old = ['a', 'b', 'b', 'a', 'c']
def replace_dupes_with_x(l):
    tmp = list()
    for char in l:
        if char in tmp:
            tmp.append('x')
        else:
            tmp.append(char)
    return tmp
new = replace_dupes_with_x(old)
You can use the following solution:
from collections import defaultdict
mylist = ['a', 'b', 'b', 'a', 'c', 'a']
ret, appear = [], defaultdict(int)
for c in mylist:
    appear[c] += 1
    ret.append(c if appear[c] == 1 else 'x')
Which will give you:
['a', 'b', 'x', 'x', 'c', 'x']
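For what it's worth, the set-based idea can also be written as a function that returns a new list instead of modifying the input. This is only a sketch: the name replace_duplicates and the placeholder parameter are made up here, and it assumes the items are hashable (strings are).

```python
def replace_duplicates(items, placeholder='x'):
    """Return a new list where repeats of an earlier value become placeholder."""
    seen = set()      # values encountered so far; membership tests are O(1) on average
    result = []
    for item in items:
        if item in seen:
            result.append(placeholder)
        else:
            seen.add(item)
            result.append(item)
    return result

print(replace_duplicates(['a', 'b', 'b', 'a', 'c', 'a']))
# ['a', 'b', 'x', 'x', 'c', 'x']
```

Tracking seen values in a separate set, rather than checking membership in the output list, also keeps the placeholders themselves from being mistaken for earlier values.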

Getting specific indexed distinct values in nested lists

I have a nested list of around 1 million records like:
l = [['a', 'b', 'c', ...], ['d', 'b', 'e', ...], ['f', 'z', 'g', ...],...]
I want to get the distinct values at the second index of the inner lists, so that my resulting list looks like:
resultant = ['b', 'z', ...]
I have tried nested loops, but they are not fast. Any help will be appreciated!
Since you want the unique items, you can use collections.OrderedDict.fromkeys() to keep both the order and the uniqueness (the keys are stored in a hash table), and use zip() to get the second items. In Python 2:
from collections import OrderedDict
list(OrderedDict.fromkeys(zip(*my_lists)[1]))
In Python 3.x, since zip() returns an iterator, you can do this:
colls = zip(*my_lists)
next(colls)  # skip the first column
list(OrderedDict.fromkeys(next(colls)))
Or use a generator expression within OrderedDict.fromkeys():
list(OrderedDict.fromkeys(i[1] for i in my_lists))
Demo:
>>> lst = [['a', 'b', 'c'], ['d', 'b', 'e'], ['f', 'z', 'g']]
>>>
>>> list(OrderedDict().fromkeys(sub[1] for sub in lst))
['b', 'z']
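You can unzip the list of lists, then pick the second tuple and pass it to set, as shown below.
This code took 4.05311584473e-06 milliseconds on my laptop (Python 2; in Python 3, zip() returns an iterator, so use list(zip(*lst))[1] instead):
list(set(zip(*lst)[1]))
Input:
lst = [['a', 'b', 'c'], ['d', 'b', 'e'], ['f', 'z', 'g']]
Output (a set is unordered, so the order of the items may vary):
['b', 'z']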
Would that work for you?
result = set([inner_list[1] for inner_list in l])
I can think of two options.
Set comprehension:
res = {x[1] for x in l}
NumPy arrays may be faster than list/set comprehensions, so converting this list to an array and then using array functions could be faster (this assumes every sublist has the same length, and is worth timing on your own data). Here:
import numpy as np
res = np.unique(np.array(l)[:, 1])
Let me explain: np.array(l) converts the list to a 2D array, then [:, 1] takes the second column (counting from 0), which consists of the second item of each sublist in the original l, and finally np.unique keeps only the unique values (returned in sorted order).
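On Python 3.7+, a plain dict also keeps insertion order, so the OrderedDict approach above can be written without the extra import. A minimal sketch, assuming every inner list has at least two items; the function name distinct_column and its index parameter are made up for illustration:

```python
def distinct_column(rows, index=1):
    """Return the distinct values found at position index, in first-seen order."""
    # dict keys are unique and keep insertion order (guaranteed from Python 3.7)
    return list(dict.fromkeys(row[index] for row in rows))

lst = [['a', 'b', 'c'], ['d', 'b', 'e'], ['f', 'z', 'g']]
print(distinct_column(lst))  # ['b', 'z']
```

The generator expression makes a single pass over the rows and never builds the transposed columns, which may matter with around a million records.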

Removing duplicates (not by using set)

My data look like this:
let = ['a', 'b', 'a', 'c', 'a']
How do I remove the duplicates? I want my output to be something like this:
['b', 'c']
When I use the set function, I get:
set(['a', 'c', 'b'])
This is not what I want.
One option would be (as derived from Ritesh Kumar's answer here)
let = ['a', 'b', 'a', 'c', 'a']
onlySingles = [x for x in let if let.count(x) < 2]
which gives
>>> onlySingles
['b', 'c']
Try this,
>>> let
['a', 'b', 'a', 'c', 'a']
>>> dict.fromkeys(let).keys()
['a', 'c', 'b']
>>>
Sort the input, then removing duplicates becomes trivial:
data = ['a', 'b', 'a', 'c', 'a']
def uniq(data):
    last = None
    result = []
    for item in data:
        # in sorted input, equal items are adjacent, so compare with the previous one
        if item != last:
            result.append(item)
            last = item
    return result
print(uniq(sorted(data)))
# prints ['a', 'b', 'c']
This is basically the shell's cat data | sort | uniq idiom.
The cost is O(N * log N), same as with a tree-based set.
Instead of sorting, or linearly scanning and re-counting the main list for each item, count the number of occurrences once and then filter on the items that appear only once:
>>> from collections import Counter
>>> let = ['a', 'b', 'a', 'c', 'a']
>>> [k for k, v in Counter(let).items() if v == 1]
['c', 'b']
You have to look at the sequence at least once regardless, although it makes sense to limit the number of times you do so.
If you really want to avoid any type of set or other hashed container (perhaps because you can't use them?), then yes, you can sort it, then use:
>>> from itertools import groupby, islice
>>> [k for k,v in groupby(sorted(let)) if len(list(islice(v, 2))) == 1]
['b', 'c']
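If the original order of the surviving items matters, one possible variation on the Counter approach is to count once and then filter the original list rather than the counter's keys. This is just a sketch; the function name only_singles is made up here.

```python
from collections import Counter

def only_singles(items):
    """Keep the items that occur exactly once, in their original order."""
    counts = Counter(items)  # one pass to count every item
    return [item for item in items if counts[item] == 1]

print(only_singles(['a', 'b', 'a', 'c', 'a']))  # ['b', 'c']
```

This does two passes over the list instead of calling let.count(x) for every element, so the work grows linearly with the length of the list.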

Python: how to separate a list into several lists based on empty strings?

I'm working on a list like this, a = ['a','b','','','c','d']; the real list includes thousands of data entries. Is there a fancy way to turn the list a into [['a','b'],['c','d']], given that the data is really huge?
You can use itertools.groupby for this. You basically group by consecutive empty strings, or consecutive non-empty strings. Then keep all groups that were grouped by True from the lambda in a list comprehension.
>>> from itertools import groupby
>>> [list(i[1]) for i in groupby(a, lambda i: i != '') if i[0]]
[['a', 'b'], ['c', 'd']]
For another example
>>> b = ['a','b','','','c','d', '', 'e', 'f', 'g', '', '', 'h']
>>> [list(i[1]) for i in groupby(b, lambda i: i != '') if i[0]]
[['a', 'b'], ['c', 'd'], ['e', 'f', 'g'], ['h']]
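If groupby feels opaque, the same splitting can be spelled out as an explicit loop. This is only a sketch of the idea, not a replacement for the one-liner; the function name split_on_empty is made up for illustration.

```python
def split_on_empty(items):
    """Split items into sublists, using runs of empty strings as separators."""
    groups = []
    current = []
    for item in items:
        if item != '':
            current.append(item)
        elif current:
            # an empty string closes the group collected so far;
            # consecutive empty strings are skipped because current is already empty
            groups.append(current)
            current = []
    if current:
        groups.append(current)  # the last group has no trailing separator
    return groups

print(split_on_empty(['a', 'b', '', '', 'c', 'd']))
# [['a', 'b'], ['c', 'd']]
```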

Naming lists dynamically in Python

As the title says I'm trying to name lists dynamically in Python. The purpose of the code is to create lists around consecutive letters. Here is my code:
consecutive_duplicates=["a","a","a","a","b","c","c","a","a","d","e","e","e","e","X"]
count=0
name=0
for i in consecutive_duplicates:
    if (consecutive_duplicates[count]==consecutive_duplicates[count+1] or
            consecutive_duplicates[count]==consecutive_duplicates[count-1]):
        consecutive_duplicates[name].append(i)
        count=count+1
    else:
        consecutive_duplicates[name+1].append(i)
        name=name+1
I'm at a loss of how to name the lists. Obviously this doesn't work as it is. What would make it work?
I'm also having trouble defining the dynamic lists. How should I do that?
You can use itertools.groupby:
>>> from itertools import groupby
>>> lis = ["a","a","a","a","b","c","c","a","a","d","e","e","e","e","X"]
>>> [list(g) for k,g in groupby(lis)]
[['a', 'a', 'a', 'a'], ['b'], ['c', 'c'], ['a', 'a'], ['d'], ['e', 'e', 'e', 'e'], ['X']]
And instead of creating dynamic variables it's better to use a dict:
>>> dic = { 'lis'+str(i): list(g) for i,(k,g) in enumerate(groupby(lis), 1)}
>>> dic['lis1']
['a', 'a', 'a', 'a']
>>> dic['lis2']
['b']
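To show roughly what groupby does here, the grouping can also be written by hand: keep a list of lists and compare each item with the last item of the most recent group. This is a sketch with a made-up function name (group_runs), and the groups are reached by position instead of by a generated name.

```python
def group_runs(items):
    """Collect consecutive equal items into sublists."""
    groups = []
    for item in items:
        if groups and groups[-1][-1] == item:
            groups[-1].append(item)   # same as the previous item: extend the current run
        else:
            groups.append([item])     # different item: start a new run
    return groups

lis = ["a","a","a","a","b","c","c","a","a","d","e","e","e","e","X"]
runs = group_runs(lis)
print(runs[0])  # ['a', 'a', 'a', 'a']
print(runs[1])  # ['b']
```

Indexing into runs plays the role the dynamically named lists were meant to play, without creating any variable names at runtime.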
