Find all index values for string t in string s? - python

Input:
s = "ADOBECODEBANC"
t = "ABC"
Output:
{A:[0,10], B:[3,9], C:[5,12]}
Is there a built-in function for this?

There's no builtin function, but you can use enumerate() for the task:
s = "ADOBECODEBANC"
t = "ABC"
out = {}
for i, ch in enumerate(s):
    if ch in t:
        out.setdefault(ch, []).append(i)
print(out)
Prints:
{'A': [0, 10], 'B': [3, 9], 'C': [5, 12]}
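If every character of t should appear in the result even when it never occurs in s, one option is to pre-fill the dictionary. This is only a minimal sketch; the helper name find_indices is made up for illustration and is not part of the answer above.

```python
def find_indices(s, t):
    """Map each character of t to the list of its positions in s."""
    # Pre-fill so that characters missing from s map to an empty list.
    out = {ch: [] for ch in t}
    for i, ch in enumerate(s):
        if ch in out:  # dict lookup instead of scanning t each time
            out[ch].append(i)
    return out


print(find_indices("ADOBECODEBANC", "ABC"))
# {'A': [0, 10], 'B': [3, 9], 'C': [5, 12]}
print(find_indices("ADOBECODEBANC", "AZ"))
# {'A': [0, 10], 'Z': []}
```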

Related

How to get the index of a given string in a Python list?

My list looks like this; in this example the strings are 'a' and 'b'.
I want to return the positions of the string 'a' (and likewise for 'b'), and then count how many times 'a' occurs in list1:
list1=['a','a','b','a','a','b','a','a','b','a','b','a','a']
I want the position of every 'a' in list1.
The result should look like this:
a_position=[1,2,4,5,7,8,10,12,13]
I also want to count how many times 'a' occurs in list1:
a_rep=9
You could do the following:
a_positions = [idx + 1 for idx, el in enumerate(list1) if el == 'a']
a_repetition = len(a_positions)
print(a_positions)
# [1, 2, 4, 5, 7, 8, 10, 12, 13]
print(a_repetition)
# 9
If you need the number of repetitions of each element, you can also use collections.Counter:
from collections import Counter
counter = Counter(list1)
print(counter['a'])
# 9
If you want to get the indices and counts of all letters:
list1=['a','a','b','a','a','b','a','a','b','a','b','a','a']
pos = {}
for i, c in enumerate(list1, start=1):  # 1-based indexing
    pos.setdefault(c, []).append(i)
pos
# {'a': [1, 2, 4, 5, 7, 8, 10, 12, 13],
# 'b': [3, 6, 9, 11]}
counts = {k: len(v) for k,v in pos.items()}
# {'a': 9, 'b': 4}

How can I find groups of consecutive items in a python list?

I have a sorted list like this [1,2,3,4,6,7,8,9,10,12,14]
I have looked at several similar solutions, but they don't help in my case.
I want to turn this list into the following output:
[ [1,4], [6,10], [12], [14] ]
So basically a list of lists with the start and end of each sequence.
Honestly, it looks very easy, but I am kind of stuck on it right now. Any help would be greatly appreciated!
You can use more_itertools.consecutive_groups for this; the package is available at https://pypi.org/project/more-itertools/
from more_itertools import consecutive_groups
#Get the groupings of consecutive items
li = [list(item) for item in consecutive_groups([1,2,3,4,6,7,8,9,10,12,14])]
#[[1, 2, 3, 4], [6, 7, 8, 9, 10], [12], [14]]
#Use the result to get range groupings
result = [ [item[0],item[-1]] if len(item) > 1 else [item[0]] for item in li]
print(result)
#[[1, 4], [6, 10], [12], [14]]
Use groupby from the standard itertools:
from itertools import groupby
lst = [1,2,3,4,6,7,8,9,10,12,14]
result = []
for k, g in groupby(enumerate(lst), lambda x: x[0] - x[1]):
    g = list(map(lambda x: x[1], g))
    if len(g) > 1:
        result.append([g[0], g[-1]])
    else:
        result.append([g[0]])
print(result)
# [[1, 4], [6, 10], [12], [14]]
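Why the key x[0] - x[1] works may not be obvious at first glance. The sketch below prints the key for every element; it assumes, as in the question, that the list is sorted and contains no duplicates. The names keys and runs are only illustrative.

```python
from itertools import groupby

lst = [1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 14]

# Within a run of consecutive values, index and value grow together,
# so index - value stays constant; every gap changes it.
keys = [i - v for i, v in enumerate(lst)]
print(keys)
# [-1, -1, -1, -1, -2, -2, -2, -2, -2, -3, -4]

# groupby therefore starts a new group exactly where a gap occurs.
runs = [
    [v for _, v in group]
    for _, group in groupby(enumerate(lst), key=lambda pair: pair[0] - pair[1])
]
print(runs)
# [[1, 2, 3, 4], [6, 7, 8, 9, 10], [12], [14]]
```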
Using pandas
import pandas as pd
s = pd.Series([1,2,3,4,6,7,8,9,10,12,14])
s.groupby(s.diff().ne(1).cumsum()).apply(lambda x: [x.iloc[0], x.iloc[-1]] if len(x) >= 2 else [x.iloc[0]]).tolist()
Outputs
[[1, 4], [6, 10], [12], [14]]
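A plain-Python solution could look like this:
def make_ranges(l: list):
    prev = l[0]
    start = l[0]  # the first run starts at the first element
    res = []
    for v in l[1:]:
        if v - 1 != prev:
            # gap found: close the current run
            if start == prev:
                res.append([start])
            else:
                res.append([start, prev])
            start = v
        prev = v
    # close the last run, which the loop itself never appends
    if start == prev:
        res.append([start])
    else:
        res.append([start, prev])
    return res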
For example:
print(make_ranges(list(range(10)) + list(range(13, 20)) + [22]))
This code will print [[0, 9], [13, 19], [22]]
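A hand-written loop like this is easy to get wrong at the edges (empty input, a single element, a trailing single value). As a hedged alternative, here is a short sketch that extends the last range in place; the function name to_ranges is hypothetical and the input is assumed to be sorted integers without duplicates.

```python
def to_ranges(values):
    """Collapse sorted integers into [start, end] pairs.

    A run of length one is returned as a single-element list.
    """
    ranges = []
    for v in values:
        if ranges and v == ranges[-1][-1] + 1:
            # v extends the current run: keep its start, move its end
            ranges[-1] = [ranges[-1][0], v]
        else:
            # v starts a new run
            ranges.append([v])
    return ranges


print(to_ranges([1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 14]))
# [[1, 4], [6, 10], [12], [14]]
print(to_ranges([]))
# []
```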
Using numpy
import numpy as np
myarray = [1,2,3,4,6,7,8,9,10,12,14]
sequences = np.split(myarray, np.array(np.where(np.diff(myarray) > 1)[0]) + 1)
l = []
for s in sequences:
    if len(s) > 1:
        l.append((np.min(s), np.max(s)))
    else:
        l.append(s[0])
print(l)
Output:
[(1, 4), (6, 10), 12, 14]

merge two dictionaries with same key values

I have two dictionaries that share some of the same keys:
a = {'a':[3,2,5],
'b':[9,8],
'c':[1,6]}
b = {'b':[7,4],
'c':[10,11]}
When I merge them, the values from dictionary b replace those of a because the keys have the same name. Here's the merge code I am using:
z = dict(list(a.items()) + list(b.items()))
Is there a way to keep all the entries? I know a dictionary can't hold the same key twice, but I could work with something like this:
a = {'a':[3,2,5],
'b':[9,8],
'c':[1,6],
'b_1':[7,4],
'c_1':[10,11]}
You can use a generator expression inside the method update():
a.update((k + '_1' if k in a else k, v) for k, v in b.items())
# {'a': [3, 2, 5], 'b': [9, 8], 'c': [1, 6], 'b_1': [7, 4], 'c_1': [10, 11]}
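Note that this modifies a in place, and a key such as 'b_1' that already exists in a would be overwritten. If that matters, something along these lines might help. It is only a sketch: the function name merge_keep_both and the numbered-suffix scheme (_1, _2, ...) are assumptions, not something the question specifies.

```python
def merge_keep_both(a, b):
    """Return a new dict holding every item of a and every item of b.

    A key of b that is already taken gets a numeric suffix
    (_1, _2, ...) so that nothing is overwritten.
    """
    merged = dict(a)  # shallow copy, a itself is left untouched
    for key, value in b.items():
        new_key = key
        n = 0
        while new_key in merged:  # look for the first free name
            n += 1
            new_key = f"{key}_{n}"
        merged[new_key] = value
    return merged


a = {'a': [3, 2, 5], 'b': [9, 8], 'c': [1, 6]}
b = {'b': [7, 4], 'c': [10, 11]}
print(merge_keep_both(a, b))
# {'a': [3, 2, 5], 'b': [9, 8], 'c': [1, 6], 'b_1': [7, 4], 'c_1': [10, 11]}
```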
Do something like this perhaps:
a = {'a':[3,2,5],
'b':[9,8],
'c':[1,6]}
b = {'b':[7,4],
'c':[10,11]}
z = {}
for key in a:
    if key in b:
        z[key + "_1"] = b[key]
        z[key] = a[key]
    else:
        z[key] = a[key]
print(z)
Output:
{'a': [3, 2, 5], 'b_1': [7, 4], 'b': [9, 8], 'c_1': [10, 11], 'c': [1, 6]}
While I think Usman's answer is probably the "right" solution, technically you asked for this:
for key, value in b.items():
    if key in a:
        a[key + "_1"] = value
    else:
        a[key] = value
For each key in b, check whether it is already present in a. If it is, store b's value in a under key + '_1'; otherwise store it in a under the original key.
a = {'a':[3,2,5],
'b':[9,8],
'c':[1,6]}
b = {'b':[7,4],
'c':[10,11]}
for k in b:
    if k in a:
        a[k+'_1']=b[k]
    else:
        a[k]=b[k]
print(a)

How to assign certain scores from a list to values in multiple lists and get the sum for each value in python?

Could you explain how to assign certain scores from a list to values in multiple lists and get the total score for each value?
score = [1,2,3,4,5]  # assign a score based on the position in the list
l_1 = [a,b,c,d,e]
assign a=1, b=2, c=3, d=4, e=5
l_2 = [c,a,d,e,b]
assign c=1, a=2, d=3, e=4, b=5
I am trying to get the result like
{'e':9, 'b': 7, 'd':7, 'c': 4, 'a': 3}
Thank you!
You can zip the values of score to each list, which gives you a tuple of (key, value) for each letter-score combination. Make each zipped object a dict. Then use a dict comprehension to add the values for each key together.
d_1 = dict(zip(l_1, score))
d_2 = dict(zip(l_2, score))
{k: v + d_2[k] for k, v in d_1.items()}
# {'a': 3, 'b': 7, 'c': 4, 'd': 7, 'e': 9}
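The comprehension above looks up d_2[k] directly, so it raises a KeyError if a letter of l_1 is missing from l_2 (and it silently ignores letters that occur only in l_2). A possible variant, sketched here with a made-up l_2 that contains 'f' instead of 'b', iterates over the union of the keys instead:

```python
score = [1, 2, 3, 4, 5]
l_1 = ['a', 'b', 'c', 'd', 'e']
l_2 = ['c', 'a', 'd', 'e', 'f']  # hypothetical: 'f' does not occur in l_1

d_1 = dict(zip(l_1, score))
d_2 = dict(zip(l_2, score))

# Iterate over the union of both key sets; a missing key counts as 0.
total = {k: d_1.get(k, 0) + d_2.get(k, 0) for k in sorted(d_1.keys() | d_2.keys())}
print(total)
# {'a': 3, 'b': 2, 'c': 4, 'd': 7, 'e': 9, 'f': 5}
```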
You could use the zip function:
dic = {'a':0, 'b': 0, 'c':0, 'd': 0, 'e': 0}
def score(dic, *args):
    for lst in args:
        for k, v in zip(lst, range(len(lst))):
            dic[k] += v+1
    return dic
l_1 = ['a','b','c','d','e']
l_2 = ['c','a','d','e','b']
score(dic, l_1, l_2)
Instead of storing your lists in separate variables, you should put them in a list of lists so that you can iterate through it and calculate the sums of the scores according to each key's indices in the sub-lists:
score = [1, 2, 3, 4, 5]
lists = [
    ['a','b','c','d','e'],
    ['c','a','d','e','b']
]
d = {}
for l in lists:
    for i, k in enumerate(l):
        d[k] = d.get(k, 0) + score[i]
d would become:
{'a': 3, 'b': 7, 'c': 4, 'd': 7, 'e': 9}
from collections import defaultdict
score = [1,2,3,4,5] # note 0: this list is only needed if the scores are not consecutive, e.g. [5,6,9,10,4]
l_1 = ['a','b','c','d','e']
l_2 = ['c','a','d','e','b']
score_dict = defaultdict(int)
'''
Regarding note 0:
if your scores are always consecutive,
like score = [2,3,4,5,6] or [5,6,7,8,9]...
you don't need a separate list of scores; you can set
start = score_of_char_at_first_position_ie_at_zero-th_index
like start = 2, or start = 5
Otherwise use this function:
def add2ScoreDict(lst):
    for pos_score, char in zip(score, lst):
        score_dict[char] += pos_score
'''
def add2ScoreDict(lst):
    for pos, char in enumerate(lst, start=1):
        score_dict[char] += pos
# note: 1
add2ScoreDict( l_1)
add2ScoreDict( l_2)
#print(score_dict) # defaultdict(<class 'int'>, {'a': 3, 'b': 7, 'c': 4, 'd': 7, 'e': 9})
score_dict = dict(sorted(score_dict.items(), reverse = True, key=lambda x: x[1]))
print(score_dict) # {'e': 9, 'b': 7, 'd': 7, 'c': 4, 'a': 3}
edit 1:
if you have multiple lists put them in list_of_list = [l_1, l_2] so that you don't have to call func add2ScoreDict yourself again and again.
# for note: 1
for lst in list_of_list:
    add2ScoreDict(lst)
You could zip both lists with score into one list l3, then use a dictionary comprehension with filter to construct your dictionary. The key is index 1 of each tuple in l3 (the letter), and the value is the sum of the index-0 entries (the scores) of the sublist of l3 that is filtered for tuples with the same letter at index 1.
score = [1,2,3,4,5]
l_1 = ['a', 'b', 'c', 'd', 'e']
l_2 = ['c', 'a', 'd', 'e', 'b']
l3 = [*zip(score, l_1), *zip(score,l_2)]
d = {i[1]: sum([j[0] for j in list(filter(lambda x: x[1] ==i[1], l3))]) for i in l3}
{'a': 3, 'b': 7, 'c': 4, 'd': 7, 'e': 9}
Expanded Explanation:
d = {}
for i in l3:
    f = list(filter(lambda x: x[1] == i[1], l3))
    vals = []
    for j in f:
        vals.append(j[0])
    total_vals = sum(vals)
    d[i[1]] = total_vals
The simplest way is probably to use a Counter from the Python standard library.
from collections import Counter
tally = Counter()
scores = [1, 2, 3, 4, 5]
def add_scores(letters):
    for letter, score in zip(letters, scores):
        tally[letter] += score
L1 = ['a', 'b', 'c', 'd', 'e']
add_scores(L1)
L2 = ['c', 'a', 'd', 'e', 'b']
add_scores(L2)
print(tally)
$ python tally.py
Counter({'e': 9, 'b': 7, 'd': 7, 'c': 4, 'a': 3})
zip is used to pair letters and scores, a for loop to iterate over them and a Counter to collect the results. A Counter is actually a dictionary, so you can write things like
tally['a']
to get the score for letter a or
for letter, score in tally.items():
    print('Letter %s scored %s' % (letter, score))
to print the results, just as you would with a normal dictionary.
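A Counter also offers most_common(), which returns the (key, count) pairs sorted from highest to lowest and so gives the ranking asked for in the question without a separate sort. A small self-contained sketch that rebuilds the same tally (the variable name ranking is only illustrative):

```python
from collections import Counter

scores = [1, 2, 3, 4, 5]
tally = Counter()
for letters in (['a', 'b', 'c', 'd', 'e'], ['c', 'a', 'd', 'e', 'b']):
    for letter, score in zip(letters, scores):
        tally[letter] += score

# Pairs sorted by descending score; ties keep first-seen order.
ranking = tally.most_common()
print(ranking)
# [('e', 9), ('b', 7), ('d', 7), ('c', 4), ('a', 3)]

# Passing n limits the result to the n highest entries.
print(tally.most_common(1))
# [('e', 9)]
```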
Finally, small ells and letter O's can be troublesome as variable names because they are hard to distinguish from ones and zeros. The Python style guide (often referred to as PEP8) recommends avoiding them.

Pandas find duplicate concatenated values across selected columns

I want to find duplicates in a selection of columns of a df,
# converts the sub df into matrix
mat = df[['idx', 'a', 'b']].values
str_dict = defaultdict(set)
for x in np.ndindex(mat.shape[0]):
    concat = ''.join(str(x) for x in mat[x][1:])
    # take idx as values of each key a + b
    str_dict[concat].update([mat[x][0]])
dups = {}
for key in str_dict.keys():
    dup = str_dict[key]
    if len(dup) < 2:
        continue
    dups[key] = dup
The code finds duplicates of the concatenation of a and b. Uses the concatenation as key for a set defaultdict (str_dict), updates the key with idx values; finally uses a dict (dups) to store any concatenation if the length of its value (set) is >= 2.
I am wondering if there is a better way to do that in terms of efficiency.
You can just concatenate and convert to set:
res = set(df['a'].astype(str) + df['b'].astype(str))
Example:
df = pd.DataFrame({'idx': [1, 2, 3],
'a': [4, 4, 5],
'b': [5, 5,6]})
res = set(df['a'].astype(str) + df['b'].astype(str))
print(res)
# {'56', '45'}
If you need to map indices too:
df = pd.DataFrame({'idx': [1, 2, 3],
'a': [41, 4, 5],
'b': [3, 13, 6]})
df['conc'] = (df['a'].astype(str) + df['b'].astype(str))
df = df.reset_index()
res = df.groupby('conc')['index'].apply(set).to_dict()
print(res)
# {'413': {0, 1}, '56': {2}}
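Be aware that plain string concatenation can merge distinct pairs: in the example above, (41, 3) and (4, 13) both become '413'. If that is not intended, keying on the pair itself avoids the collision. The following is a plain-Python sketch rather than a pandas solution; it assumes the rows are available as (idx, a, b) tuples (for instance from df[['idx', 'a', 'b']].itertuples(index=False)) and adds one made-up row so that a genuine duplicate exists.

```python
from collections import defaultdict

# (idx, a, b) rows: the three from the example plus one real duplicate.
rows = [(1, 41, 3), (2, 4, 13), (3, 5, 6), (4, 5, 6)]

groups = defaultdict(set)
for idx, a, b in rows:
    # A tuple key keeps (41, 3) and (4, 13) apart, unlike '41' + '3'.
    groups[(a, b)].add(idx)

# Keep only the (a, b) pairs that occur under at least two idx values.
dups = {key: ids for key, ids in groups.items() if len(ids) >= 2}
print(dups)
# {(5, 6): {3, 4}}
```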
You can select the columns you need before calling drop_duplicates:
df[['a','b']].drop_duplicates().astype(str).apply(np.sum,1).tolist()
Out[1027]: ['45', '56']
