counting 2D list in python - python

My 2D list looks like this:
log = [[time1, 'aaa', '123.123.123.123'], [time2, 'def', '123.123.123.123'], [time3, 'aaa', '123.123.123.123'], [time4, 'bbb', '123.123.123.123'], [time5, 'bbb', '123.123.123.123']]
What I want is the output below, produced with a for loop:
aaa: 2
def: 1
bbb: 2
How can I count a specific column in a 2D list with a loop?

This should give you the solution:
from collections import Counter
for k, v in Counter([a[1] for a in log]).items():
    print(f"{k}: {v}")
Output:
aaa: 2
def: 1
bbb: 2

If you want to try with the regular dict:
log = [[time1, 'aaa', '123.123.123.123'], [time2, 'def', '123.123.123.123'], [time3, 'aaa', '123.123.123.123'], [time4, 'bbb', '123.123.123.123'], [time5, 'bbb', '123.123.123.123']]
#Keep track of the counts in the dictionary
counter = dict()
for item in log:
    key = item[1]
    counter[key] = counter.get(key, 0) + 1 #If the key doesn't exist, initialize its count to 0
print(counter)
This prints the counts as a dictionary:
{'aaa': 2, 'def': 1, 'bbb': 2}

This code should meet your requirements:
import numpy as np
from collections import Counter
Counter(np.array(log)[:,1])

from collections import Counter
ele = [r[1] for r in log]
ele_counts = Counter(ele)
print(dict(ele_counts))
OUTPUT
{'aaa': 2, 'def': 1, 'bbb': 2}

Check this code:
log = [[time1, 'aaa', '123.123.123.123'], [time2, 'def', '123.123.123.123'], [time3, 'aaa', '123.123.123.123'], [time4, 'bbb', '123.123.123.123'], [time5, 'bbb', '123.123.123.123']]
ans = [0,0,0]
for x in log:
    if x[1] == 'aaa':
        ans[0] += 1
    elif x[1] == 'def':
        ans[1] += 1
    else:
        ans[2] += 1
print(f'aaa: {ans[0]}\ndef: {ans[1]}\nbbb: {ans[2]}')
You must define time1 ~ time5 before running the code.
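Note that the if/elif version counts every value other than 'aaa' and 'def' as 'bbb', so it only fits these three names. A minimal sketch of the same loop for arbitrary values in the column, assuming placeholder strings in place of time1 ~ time5 (those stand-ins are not from the question):

```python
from collections import defaultdict

# Placeholder timestamps: the real time1..time5 values are assumed here.
log = [
    ["t1", "aaa", "123.123.123.123"],
    ["t2", "def", "123.123.123.123"],
    ["t3", "aaa", "123.123.123.123"],
    ["t4", "bbb", "123.123.123.123"],
    ["t5", "bbb", "123.123.123.123"],
]

counts = defaultdict(int)  # missing keys start at 0
for row in log:
    counts[row[1]] += 1  # column index 1 holds the name

for name, n in counts.items():
    print(f"{name}: {n}")
```

Because dicts keep insertion order, the names are printed in the order they first appear in log.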

Related

Splitting consecutive similar characters of a specific length in an array of strings

I have an array
["ejjjjmmtthh", "zxxuueeg", "aanlljrrrxx", "dqqqaaabbb", "oocccffuucccjjjkkkjyyyeehhh"]
and need to extract the runs of consecutive identical characters of length k (in this case 3) from each string element, without using regex or groupby.
This is what I have so far:
s = ["ejjjjmmtthh", "zxxuueeg", "aanlljrrrxx", "dqqqaaabbb", "oocccffuucccjjjkkkjyyyeehhh"]
k = 3
output = []
for i in s:
    result = ""
    for j in range(1,len(i)-1):
        if i[j]==i[j-1] or i[j]==i[j+1]:
            result+=i[j]
    if i[-1] == result[-1]:
        result+=i[-1]
    if i[0]==result[0]:
        result=i[0]+result
    output.append(result)
print(output)
#current output = ['jjjjmmtthh', 'xxuuee', 'aallrrrxx', 'qqqaaabbb', 'oocccffuucccjjjkkkyyyeehhh']
#expected outcome(for k =3) = ['rrr','qqq','aaa','bbb','ccc','ccc','jjj','kkk','yyy','hhh']
My questions:
How can I accommodate the k condition?
Is there a more optimal way to do this?
This solution is more readable and not too long. It works for k > 0.
s = ["ejjjjmmtthh", "zxxuueeg", "aanlljrrrxx", "dqqqaaabbb", "oocccffuucccjjjkkkjyyyeehhh"]
k = 3
output = []
for element in s:
    state = "" #State variable (reset on every list item)
    for char in element: #For each character
        if state != "" and char == state[-1]: # Check if the last character is the same (only if state isn't empty)
            state += char #Add it to the state
        else:
            if len(state) == k: #Otherwise, check if we have k characters
                output.append(state) #Append the result if we do
            state = char #Reset the state
    #If there are no more characters (end of element), check too
    if len(state) == k:
        output.append(state)
print(output)
Output for k = 3
['rrr', 'qqq', 'aaa', 'bbb', 'ccc', 'ccc', 'jjj', 'kkk', 'yyy', 'hhh']
Output for k = 1
['e', 'z', 'g', 'n', 'j', 'd', 'j']
Here I manually group consecutive identical letters, then add a group to the result only if its length equals k. This works, but I am sure there is a more optimal way:
s = ["ejjjjmmtthh", "zxxuueeg", "aanlljrrrxx", "dqqqaaabbb", "oocccffuucccjjjkkkjyyyeehhh"]
k = 3
def _next_group(st):
    if not st:
        return None
    first = st[0]
    res = [first]
    for s in st[1:]:
        if s == first:
            res.append(s)
        else:
            break
    return res
result = []
for st in s:
    while True:
        group = _next_group(st)
        if not group:
            break
        if len(group) == k:
            result.append("".join(group))
        if len(group) == len(st):
            break
        st = st[len(group):]
print(result)
Output:
['rrr', 'qqq', 'aaa', 'bbb', 'ccc', 'ccc', 'jjj', 'kkk', 'yyy', 'hhh']
A for-loop approach.
Remark: I suggest a divide-and-conquer solution: focus on a single string (and not on a list of strings) and make a function that works and then generalize it with loops/comprehension...
def repeated_chars(string, k=3):
    out = []
    c, tmp = 0, '' # counter, tmp char
    for char in string: # iterate over the argument, not a global
        if tmp == '':
            tmp = char
            c += 1
            continue
        if tmp == char:
            c += 1
        else:
            if c == k:
                out.append((tmp, c))
            tmp = char
            c = 1
    # last term
    if c == k:
        out.append((tmp, c))
    return [char * i for char, i in out]
data = ['jjjjmmtthh', 'xxuuee', 'aallrrrxx', 'qqqaaabbb', 'oocccffuucccjjjkkkyyyeehhh']
# apply the function to all strings
out = []
for s in data:
    out.extend(repeated_chars(s, k=3))
print(out)
#['rrr', 'qqq', 'aaa', 'bbb', 'ccc', 'ccc', 'jjj', 'kkk', 'yyy', 'hhh']
Edit: Yes, groupby shouldn't be used as per the requirements, but doing the job requires grouping in some way (see the accepted answer, for example), so it seems a good idea to split the responsibilities into multiple functions, as is good practice, by reimplementing a groupby.
groupby seems the obvious tool in this case; if you can't use the one from itertools, just write one.
Also, the core function should work on a single string, not a list of strings; just use a for loop if you have a list of strings.
Once you have your groupby, it is straightforward:
def extract_groups(s: str, k: int):
    return [group for group in groupby(s) if len(group) == k]
Let's try it out:
input_strings = [
    "ejjjjmmtthh",
    "zxxuueeg",
    "aanlljrrrxx",
    "dqqqaaabbb",
    "oocccffuucccjjjkkkjyyyeehhh",
]
expected_outputs = [
    [],
    [],
    ["rrr"],
    ["qqq", "aaa", "bbb"],
    ["ccc", "ccc", "jjj", "kkk", "yyy", "hhh"],
]
outputs = [extract_groups(s, k=3) for s in input_strings]
print(outputs == expected_outputs) # True
As it stands, outputs is a list of lists of groups, one list per input string:
In [ ]: outputs
Out[ ]: [[], [], ['rrr'], ['qqq', 'aaa', 'bbb'], ['ccc', 'ccc', 'jjj', 'kkk', 'yyy', 'hhh']]
If you really want it flat, flatten it:
In [ ]: from itertools import chain
... : list(chain.from_iterable(outputs))
Out[ ]: ['rrr', 'qqq', 'aaa', 'bbb', 'ccc', 'ccc', 'jjj', 'kkk', 'yyy', 'hhh']
In [ ]: [group for s in input_strings for group in extract_groups(s, k=3)]
Out[ ]: ['rrr', 'qqq', 'aaa', 'bbb', 'ccc', 'ccc', 'jjj', 'kkk', 'yyy', 'hhh']
The groupby function for reference:
def groupby(s: str):
    if not s:
        return []
    result = []
    tgt = s[0]
    counter = 1
    for c in s[1:]:
        if c == tgt:
            counter += 1
        else:
            result.append(tgt * counter)
            tgt = c
            counter = 1
    result.append(tgt * counter)
    return result
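For anyone who wants to paste a single block, here is a self-contained sketch of the same idea: split each string into runs of identical characters, then keep the runs of length k. The names runs and extract_runs are illustrative only and do not come from the answers above.

```python
from typing import Iterator, List


def runs(s: str) -> Iterator[str]:
    """Yield maximal runs of identical consecutive characters in s."""
    start = 0
    for i in range(1, len(s) + 1):
        # A run ends at the end of the string or where the character changes.
        if i == len(s) or s[i] != s[start]:
            yield s[start:i]
            start = i


def extract_runs(strings: List[str], k: int) -> List[str]:
    """Collect every run of exactly k characters, in input order."""
    return [run for s in strings for run in runs(s) if len(run) == k]


input_strings = [
    "ejjjjmmtthh",
    "zxxuueeg",
    "aanlljrrrxx",
    "dqqqaaabbb",
    "oocccffuucccjjjkkkjyyyeehhh",
]
result = extract_runs(input_strings, 3)
print(result)
```

Slicing by start and end index avoids building each run character by character, and an empty string simply yields no runs.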

Minimum number of items to cover all cases

I am looking for a way to find the minimum number of items needed to cover all the cases in a key-value pair setting.
pd.DataFrame({'key': ['AAA', 'BBB', 'BBB','BBB', 'CCC', 'CCC'],
'value': ['1', '1', '2','4', '1','3']})
I have 4 values (1,2,3,4) and in order to cover them all I need at least the following keys
BBB is the only one to give me 2 and 4
CCC is the only one to give me 3
and both BBB and CCC give me 1
So in that case the minimum number of keys to include all the values is 2 (BBB and CCC)
Is there a model/library to help with this type of calculation?
The problem you are describing is an instance of the set cover problem (equivalently, a hitting set problem), and finding a minimum cover is NP-hard.
I have implemented a brute-force solution as follows:
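# Assumes df is the DataFrame from the question and pandas is imported as pd
keys = pd.unique(df['key'])
values = pd.unique(df['value'])
x = len(keys)
count = x
result = keys
for i in range(1 << x):
    # Bit j of i decides whether keys[j] is in the current subset
    subset_keys = [keys[j] for j in range(x) if (i & (1 << j))]
    subset_values = []
    for key in subset_keys:
        subset_values += list(df.query("key=='"+key+"'")['value'])
    if len(set(subset_values))==len(list(values)) and len(subset_keys)<count:
        result = subset_keys
        count = len(subset_keys) # remember the smallest cover found so far
print(result)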
Complexity is O(2^n) where n is the number of unique keys.
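If pandas is not needed, the exhaustive search can also be sketched with itertools.combinations, trying smaller subsets first so that the first cover found is a minimum one. This is only a sketch: the pairs list mirrors the question's data, and the names covers, universe and min_cover are made up for the example. The worst case is still exponential in the number of keys.

```python
from itertools import combinations

# (key, value) pairs mirroring the DataFrame in the question.
pairs = [("AAA", "1"), ("BBB", "1"), ("BBB", "2"),
         ("BBB", "4"), ("CCC", "1"), ("CCC", "3")]

# Map each key to the set of values it covers.
covers = {}
for key, value in pairs:
    covers.setdefault(key, set()).add(value)

# Every value that has to be covered.
universe = set().union(*covers.values())


def min_cover(covers, universe):
    """Return a smallest list of keys whose values cover the universe."""
    for size in range(1, len(covers) + 1):
        for subset in combinations(covers, size):
            covered = set().union(*(covers[k] for k in subset))
            if covered == universe:
                return list(subset)  # first hit at this size is minimal
    return []


print(min_cover(covers, universe))  # ['BBB', 'CCC']
```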
You could also approach the problem greedily with .mode(), repeatedly taking the key with the most remaining rows:
import pandas as pd
df = pd.DataFrame({'key': ['AAA', 'BBB', 'BBB','BBB', 'CCC', 'CCC'],
'value': ['1', '1', '2','4', '1','3']})
lst = list()
while not df.empty:
    x = df['key'].mode().iloc[0]
    df = df[~df['value'].isin(df.loc[df['key'].eq(x), 'value'])]
    lst.append(x)
print(lst)
# ['BBB', 'CCC']
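The .mode() loop is in effect a greedy heuristic: at each step it takes the key with the most rows left, then drops every value that key covers. Greedy selection is fast, but in general it is not guaranteed to return a minimum set of keys. A rough pure-Python sketch of the same idea, with the covers dict and the greedy_cover name assumed for illustration:

```python
# Key -> set of values it covers, mirroring the question's data.
covers = {
    "AAA": {"1"},
    "BBB": {"1", "2", "4"},
    "CCC": {"1", "3"},
}


def greedy_cover(covers):
    """Greedy heuristic: repeatedly pick the key covering the most uncovered values."""
    uncovered = set().union(*covers.values())
    chosen = []
    while uncovered:
        # Ties are broken by dict order because max() returns the first maximum.
        best = max(covers, key=lambda k: len(covers[k] & uncovered))
        chosen.append(best)
        uncovered -= covers[best]
    return chosen


print(greedy_cover(covers))  # ['BBB', 'CCC']
```

The loop always terminates because every uncovered value belongs to at least one key, so each step removes at least one value.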

Replace duplicates in a list column

I have a list whose last element is a string of comma-separated items:
temp = ['AAA', 'BBB', 'CCC-DDD', 'EE,FFF,FFF,EE']
Now I want to remove the duplicates in that element.
I tried to make a list out of every element:
e = [s.split(',') for s in temp]
print(e)
Which gave me:
[['AAA'], ['BBB'], ['CCC-DDD'], ['EE', 'FFF', 'FFF', 'EE']]
Then I tried to remove the duplicates with:
y = list(set(e))
print(y)
which ended in an error:
TypeError: unhashable type: 'list'
I'd appreciate any help.
Edit:
I didn't say exactly what the end result should be. The list should look like this:
temp = ['AAA', 'BBB', 'CCC-DDD', 'EE', 'FFF']
Only the duplicates in the last element should be removed.
Apply set on the elements of the list not on the list of lists. You want your set to contain the strings of each list, not the lists.
e = [list(set(x)) for x in e]
You can do it directly as well:
e = [list(set(s.split(','))) for s in temp]
>>> e
[['AAA'], ['BBB'], ['CCC-DDD'], ['EE', 'FFF']]
You may want sorted(set(s.split(','))) instead to ensure lexicographic order (sets aren't ordered, even in Python 3.7).
For a flat, sorted list, build a flat set comprehension and sort it:
e = sorted({x for s in temp for x in s.split(',')})
result:
['AAA', 'BBB', 'CCC-DDD', 'EE', 'FFF']
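Sorting happens to match the requested result here, but it reorders the items in general. If the original order should be kept instead, one option (a sketch, relying on dicts preserving insertion order in Python 3.7+; the names flat and unique are only for illustration) is dict.fromkeys:

```python
temp = ['AAA', 'BBB', 'CCC-DDD', 'EE,FFF,FFF,EE']

# Flatten: split every element on commas and chain the pieces together.
flat = [item for s in temp for item in s.split(',')]

# dict keys are unique and keep first-seen order, so this deduplicates in order.
unique = list(dict.fromkeys(flat))
print(unique)  # ['AAA', 'BBB', 'CCC-DDD', 'EE', 'FFF']
```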
Here is a solution that uses itertools.chain:
import itertools
temp = ['AAA', 'BBB', 'CCC-DDD', 'EE,FFF,FFF,EE']
y = list(set(itertools.chain(*[s.split(',') for s in temp])))
# ['EE', 'FFF', 'AAA', 'BBB', 'CCC-DDD']
a = ['AAA', 'BBB', 'CCC-DDD', 'EE,FFF,FFF,EE']
b = [s.split(',') for s in a]
c = []
for i in b:
    c = c + i
c = list(set(c))
# ['EE', 'FFF', 'AAA', 'BBB', 'CCC-DDD']
Here is a pure functional way to do it in Python:
from functools import partial
split = partial(str.split, sep=',')
list(map(list, map(set, (map(split, temp)))))
[['AAA'], ['BBB'], ['CCC-DDD'], ['EE', 'FFF']]
Or, since the expected result doesn't need lists inside a list:
from itertools import chain
list(chain(*map(set, (map(split, temp)))))
['AAA', 'BBB', 'CCC-DDD', 'EE', 'FFF']

Join items in python list separated by delimiter [duplicate]

I have a list like the following
list_1 = ['>name', 'aaa', 'bbb', '>name_1', 'ccc', '>name_2', 'ddd', 'eee', 'fff']
I am trying to join the items that sit between the items starting with the '>' sign. So what I want is:
list_1 = ['>name', 'aaabbb', '>name_1', 'ccc', '>name_2', 'dddeeefff']
How can I do that in python?
Use a generator function; that lets you control when items are 'done' to yield:
def join_unescaped(it):
    tojoin = []
    for element in it:
        if element.startswith('>'):
            if tojoin:
                yield ''.join(tojoin)
                tojoin = []
            yield element
        else:
            tojoin.append(element)
    if tojoin:
        yield ''.join(tojoin)
To produce a new list from your input, pass the generator object to the list() function:
result = list(join_unescaped(list_1))
Demo:
>>> list_1 = ['>name', 'aaa', 'bbb', '>name_1', 'ccc', '>name_2', 'ddd', 'eee', 'fff']
>>> def join_unescaped(it):
...     tojoin = []
...     for element in it:
...         if element.startswith('>'):
...             if tojoin:
...                 yield ''.join(tojoin)
...                 tojoin = []
...             yield element
...         else:
...             tojoin.append(element)
...     if tojoin:
...         yield ''.join(tojoin)
...
>>> list(join_unescaped(list_1))
['>name', 'aaabbb', '>name_1', 'ccc', '>name_2', 'dddeeefff']
>>> from itertools import groupby
>>> list_1 = ['>name', 'aaa', 'bbb', '>name_1', 'ccc', '>name_2', 'ddd', 'eee', 'fff']
>>> [''.join(v) for k, v in groupby(list_1, key=lambda s: s.startswith('>'))]
['>name', 'aaabbb', '>name_1', 'ccc', '>name_2', 'dddeeefff']
The only case to watch for here is if you have no items between > signs, which requires a simple fix.
>>> list_1 = ['>name', '>name0', 'aaa', 'bbb', '>name_1', 'ccc', '>name_2', 'ddd', 'eee', 'fff']
>>> [''.join(v) for k,v in groupby(list_1,key=lambda s:s.startswith('>')and s)]
['>name', '>name0', 'aaabbb', '>name_1', 'ccc', '>name_2', 'dddeeefff']
Side note: in the extremely unlikely case that you can have duplicate > names like ['>name', '>name', 'aaa'....], just change "and s" to "and object()" (each object() is unique), and that handles every possible case.
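To make the object() variant concrete, here is a small sketch. The input list and the helper name header_key are made up for the example. Every '>' item gets a fresh object() as its key, which never compares equal to the previous key, so each header stays in its own group even when two identical headers are adjacent. Other items all share the key False and are joined.

```python
from itertools import groupby

# Hypothetical input with a duplicated header at the start.
list_1 = ['>name', '>name', 'aaa', 'bbb', '>name_1', 'ccc']


def header_key(s):
    # False for data items; a brand-new, unique object for every header.
    return s.startswith('>') and object()


result = [''.join(group) for _, group in groupby(list_1, key=header_key)]
print(result)  # ['>name', '>name', 'aaabbb', '>name_1', 'ccc']
```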

Split a list in sublists according to character length

I have a list of strings and I would like to split it into different "sublists" based on the character length of the words in the list, e.g.:
List = [a, bb, aa, ccc, dddd]
Sublist1 = [a]
Sublist2 = [bb, aa]
Sublist3 = [ccc]
Sublist4 = [dddd]
How can I achieve this in Python?
Thank you
By using itertools.groupby:
values = ['a', 'bb', 'aa', 'ccc', 'dddd', 'eee']
from itertools import groupby
output = [list(group) for key,group in groupby(sorted(values, key=len), key=len)]
The result is:
[['a'], ['bb', 'aa'], ['ccc', 'eee'], ['dddd']]
If your list is already sorted by string length and you just need to do grouping, then you can simplify the code to:
output = [list(group) for key,group in groupby(values, key=len)]
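The sorting step matters because groupby only merges adjacent items with the same key. A short sketch with a made-up, unsorted list shows the difference:

```python
from itertools import groupby

values = ['a', 'bb', 'ccc', 'aa']  # 'bb' and 'aa' are not adjacent

# Without sorting, equal-length words that are apart end up in separate groups.
unsorted_groups = [list(g) for _, g in groupby(values, key=len)]

# Sorting by length first puts equal-length words next to each other.
sorted_groups = [list(g) for _, g in groupby(sorted(values, key=len), key=len)]

print(unsorted_groups)  # [['a'], ['bb'], ['ccc'], ['aa']]
print(sorted_groups)    # [['a'], ['bb', 'aa'], ['ccc']]
```

sorted is stable, so words of the same length keep their original relative order.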
I think you should use dictionaries:
>>> dict_sublist = {}
>>> for el in List:
...     dict_sublist.setdefault(len(el), []).append(el)
...
>>> dict_sublist
{1: ['a'], 2: ['bb', 'aa'], 3: ['ccc'], 4: ['dddd']}
>>> from collections import defaultdict
>>> l = ["a", "bb", "aa", "ccc", "dddd"]
>>> d = defaultdict(list)
>>> for elem in l:
...     d[len(elem)].append(elem)
...
>>> sublists = list(d.values())
>>> print(sublists)
[['a'], ['bb', 'aa'], ['ccc'], ['dddd']]
Assuming you're happy with a list of lists, indexed by length, how about something like
by_length = []
for word in List:
    wl = len(word)
    # Grow the outer list until index wl exists
    while len(by_length) <= wl:
        by_length.append([])
    by_length[wl].append(word)
print("The words of length 3 are %s" % by_length[3])
