Get the highest number of a list using regex - python

I have a dict like this:
my_dict = {
"['000A']":
['1653418_a0001b001.jpg',
'2132018_a0002b002.jpg',
'4789562_a0001b003.jpg',
'8469844_a0009b004.jpg',
'4815099_a0004b000.jpg',
'9085654_a0001b001.jpg',
'9742212_a0007b002.jpg',
'1325874_a0002b009.jpg',
'1474856_a0090f014.jpg']
,
"['000B']":
['1653418_a0001b001.jpg',
'2132018_a0002b002.jpg',
'4789562_a0001b003.jpg',
'8469844_a0009b004.jpg',
'4815099_a0004b000.jpg',
'9085654_a0001b001.jpg',
'9742212_a0007b002.jpg',
'1325874_a0002b009.jpg',
'123456_a0090f020.jpg']
}
I want to find the highest number following the "b" for each key of the dict:
import re
for key, value in my_dict.items():
    for string in value:
        number = re.findall(r'\d+', string)
        # convert it into integer
        number = map(int, number)
        print("Max_value:", max(number))
It doesn't work because it finds the number at the beginning of the string. I was thinking of using .endswith((".....")) instead,
but I still don't know how to formulate it to match my need: a pattern that matches 4 digits after 'b', or 4 digits followed by '.jpg', or even one that ends with 'b' + 4 digits + '.jpg'.
I would also like the code to find which bXXXX number is the highest and then return:
{"['000A']": '1474856_a0090f014.jpg', "['000B']": '123456_a0090f020.jpg'}

I suppose the letter can be either "b" or "f":
import re
output_dict = {}
for key, value in my_dict.items():
    # keep the file name whose digits after the last 'b' or 'f' are the largest
    output_dict[key] = max(value, key=lambda x: int(re.match(r".*[bf](\d+)\.jpg", x).group(1)))

There are 3 digits after the b or f. If there can be other lowercase letters, you can match a single lowercase char with [a-z]. If the number of digits can vary, you can match 1 or more using \d+, or 3 or more using \d{3,}.
Then match .jpg at the end of the string.
If there is a match, take the capture group 1 value, convert it to an int and use that to sort on.
After sorting in descending order, take the first item from the list (assuming there are no empty lists).
import re
my_dict = {
"['000A']":
['1653418_a0001b001.jpg',
'4789562_a0001b003.jpg',
'8469844_a0009b004.jpg',
'4815099_a0004b000.jpg',
'1474856_a0090f014.jpg',
'9085654_a0001b001.jpg',
'9742212_a0007b002.jpg',
'1325874_a0002b009.jpg']
,
"['000B']":
['1653418_a0001b001.jpg',
'2132018_a0002b002.jpg',
'4789562_a0001b003.jpg',
'8469844_a0009b004.jpg',
'4815099_a0004b000.jpg',
'9085654_a0001b001.jpg',
'9742212_a0007b002.jpg',
'123456_a0090f020.jpg',
'1325874_a0002b009.jpg']
}
dct_highest_number = {}
for key, value in my_dict.items():
    dct_highest_number[key] = sorted(
        value,
        key=lambda x: [int(m.group(1)) for m in [re.search(r"[a-z](\d+)\.jpg$", x)] if m],
        reverse=True
    )[0]
print(dct_highest_number)
Output
{"['000A']": '1474856_a0090f014.jpg', "['000B']": '123456_a0090f020.jpg'}
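Sorting the whole list only to take its first element does more work than needed. A minimal sketch, assuming the same file-name layout, uses max() with a key function instead. The helper names suffix_number and highest_per_key are hypothetical, and names that don't end in a letter, digits and .jpg get the key -1 so they never win.

```python
import re

# Digits between the last lowercase letter and the ".jpg" extension
SUFFIX = re.compile(r"[a-z](\d+)\.jpg$")


def suffix_number(name):
    """Return the number before '.jpg', or -1 if the name does not match."""
    m = SUFFIX.search(name)
    return int(m.group(1)) if m else -1


def highest_per_key(d):
    """Map each key to the file name with the largest suffix number."""
    # max() scans each list once; default=None handles empty lists
    return {key: max(names, key=suffix_number, default=None) for key, names in d.items()}


my_dict = {
    "['000A']": ['1653418_a0001b001.jpg', '4815099_a0004b000.jpg', '1474856_a0090f014.jpg'],
    "['000B']": ['1325874_a0002b009.jpg', '123456_a0090f020.jpg'],
}
print(highest_per_key(my_dict))
# {"['000A']": '1474856_a0090f014.jpg', "['000B']": '123456_a0090f020.jpg'}
```

Unlike sorting, max() runs in linear time; if two names share the highest number, the first one in the list is returned.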

Related

Smallest letter with maximum occurrences in a Python string

There is a Python string, for example "programming".
I have to find the smallest letter with the highest number of occurrences.
For the input string "programming",
the output should be g 2.
I tried this, but I'm not able to get the solution:
str1 = 'programming'
max_freq = {}
for i in str1:
    if i in max_freq:
        max_freq[i] += 1
    else:
        max_freq[i] = 1
res = max(max_freq, key=max_freq.get)
Any help would be really appreciated
You can use Counter to achieve this:
Count the frequency of each letter using Counter. This gives you a dict.
Sort its items first by value in descending order, then by key in ascending order.
The first item is your answer.
from collections import Counter
str1 = 'programming'
d = Counter(str1)
d = sorted(d.items(), key= lambda x: (-x[1], x[0]))
print(d[0])
('g', 2)
For your code to work, replace the last line with this:
res = sorted(max_freq.items(), key=lambda x: (-x[1], x[0]))[0]
res will hold the smallest letter with the maximum number of occurrences, i.e., ('g', 2).
You are close; you just aren't getting the max correctly. If all you care about is the number, you could modify your example slightly:
str1 = 'programming'
max_freq = {}
for i in str1:
    if i in max_freq:
        max_freq[i] += 1
    else:
        max_freq[i] = 1
res = max(max_freq.values())
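Combining the two ideas above, you can avoid sorting entirely: find the highest count first, then take the smallest letter that has that count. A minimal sketch (the function name smallest_most_common is hypothetical, and it assumes a non-empty string):

```python
from collections import Counter


def smallest_most_common(s):
    """Return (letter, count) for the smallest letter with the highest count."""
    counts = Counter(s)
    top = max(counts.values())  # highest frequency; raises ValueError on ""
    # Among the letters sharing that frequency, pick the alphabetically smallest
    letter = min(ch for ch, n in counts.items() if n == top)
    return letter, top


print(smallest_most_common("programming"))  # ('g', 2)
```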

How to match repeating patterns (that have random text between them) using regex?

Given the string below:
"This is my value 3435 string with number ID123. Also I have the value 1234 with number ID999. Also a random number 23 preceding the real value 3434 and the number ID34"
I would like to get the values and the IDs that are close to each other:
3435, ID123
1234, ID999
3434, ID34
How could I do it? I'm using Python 3.7.
Assuming the value is always numeric and space-separated, and the ID is always formatted as "ID" followed by digits:
>>> import re
>>> s = "This is my value 3435 string with number ID123. Also I have the value 1234 with number ID999"
>>> re.findall(r'\s?(\d+)\s.*?(ID\d+)', s)
[('3435', 'ID123'), ('1234', 'ID999')]
If you can guarantee there are no digits between the value and the ID, you can use this:
>>> import re
>>> s = "This is my value 3435 string with number ID123. Also I have the value 1234 with number ID999. Also a random number 23 preceding the real value 3434 and the number ID34"
>>> re.findall(r'\s?(\d+)\s[^\d]*?(ID\d+)', s)
[('3435', 'ID123'), ('1234', 'ID999'), ('3434', 'ID34')]
You could try this:
import re
x = "This is my value 3435 string with number ID123. Also I have the value 1234 with number ID999"
ids = re.findall(r'\bID\w+', x)
words = x.split(' ')
numbers = [word for word in words if word.isnumeric()]
res = dict(zip(numbers, ids))
print(res)
This prints a dictionary mapping each value to its ID: {'3435': 'ID123', '1234': 'ID999'}
You could do this:
import re
text = "This is my value 3435 string with number ID123. Also I have the value 1234 with number ID999. Also a random number 23 preceding the real value 3434 and the number ID34"
nums = re.findall(r'\d{4}', text)
ids = re.findall(r'ID\d+', text)
result = list(zip(nums, ids))
print(result) # [('3435', 'ID123'), ('1234', 'ID999'), ('3434', 'ID34')]
Or simply:
result = re.findall(r'\d{4}|ID\d+', text)
print(result) # ['3435', 'ID123', '1234', 'ID999', '3434', 'ID34']
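The \d{4} answer relies on every value having exactly four digits, and the [^\d]*? answer on there being no digits between a value and its ID. A sketch under a different assumption, that the value you want is the last standalone number before each ID, scans numbers and IDs in order of appearance and pairs each ID with the most recent number (pair_values_with_ids is a hypothetical name):

```python
import re


def pair_values_with_ids(text):
    """Pair each ID with the last standalone number that appears before it."""
    pairs = []
    last_number = None
    # IDs and standalone numbers, in the order they appear in the text
    for m in re.finditer(r"\bID\d+\b|\b\d+\b", text):
        token = m.group()
        if token.startswith("ID"):
            if last_number is not None:
                pairs.append((last_number, token))
                last_number = None  # each number is used for at most one ID
        else:
            last_number = token  # a later number replaces an earlier one


    return pairs


text = ("This is my value 3435 string with number ID123. Also I have the value 1234 "
        "with number ID999. Also a random number 23 preceding the real value 3434 "
        "and the number ID34")
print(pair_values_with_ids(text))
# [('3435', 'ID123'), ('1234', 'ID999'), ('3434', 'ID34')]
```

The random 23 is skipped because 3434 replaces it before ID34 is reached.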

Common words frequency and their frequency sum?

I have two dictionaries. Each dictionary contains words; some words are common to both and some are not. I want to output each common word with frequency1, frequency2 and the frequency sum. How can I do that? I also have to find the top 20.
For example, my output must be like:
   Common word  freq1  freq2  freqsum
1  print        10     5      15
2  number       2      1      3
3  program      19     20     39
Here is my code:
commonwordsbook1andbook2 = []
for element in finallist1:
    if element in finallist2:
        commonwordsbook1andbook2.append(element)
common1 = {}
for word in commonwordsbook1andbook2:
    if word not in common1:
        common1[word] = 1
    else:
        common1[word] += 1
common1 = sorted(common1.items(), key=lambda x: x[1], reverse=True) #distinct2
for k, v in wordcount2[:a]:
    print(k, v)
Assuming the dictionaries hold the frequency of each word, you can do something simpler, like:
print("Common Word | Freq-1 | Freq-2 | Freq-Sum")
for i in freq1:
    if i in freq2:
        print(i, freq1[i], freq2[i], freq1[i] + freq2[i])
Since you aren't allowed to use Counter, you can implement the same functionality using dictionaries. Let's define a function that returns a dictionary containing the counts of all words in the given list. Dictionaries have a get() method that returns the value for the given key and lets you specify a default if the key is not found.
def countwords(lst):
    dct = {}
    for word in lst:
        dct[word] = dct.get(word, 0) + 1
    return dct
count1 = countwords(finallist1)
count2 = countwords(finallist2)
words1 = set(count1.keys())
words2 = set(count2.keys())
count1.keys() gives us all the unique words in finallist1, and count2.keys() those in finallist2.
We convert both of these to sets and then find their intersection to get the common words.
common_words = words1.intersection(words2)
Now that you know the common words, printing them and their counts should be trivial:
for w in common_words:
    print(f"{w}\t{count1[w]}\t{count2[w]}\t{count1[w] + count2[w]}")
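Neither answer covers the "top 20" part of the question. A minimal sketch, assuming count1 and count2 are word-to-frequency dicts like the ones built by countwords above, ranks the common words by their frequency sum. The function top_common_words and the sample counts are hypothetical:

```python
def top_common_words(count1, count2, n=20):
    """Return (word, freq1, freq2, freqsum) for the n common words with the largest sum."""
    common = count1.keys() & count2.keys()  # words present in both dicts
    rows = [(w, count1[w], count2[w], count1[w] + count2[w]) for w in common]
    # Highest sum first; ties broken alphabetically so the order is deterministic
    rows.sort(key=lambda r: (-r[3], r[0]))
    return rows[:n]


# Hypothetical sample counts
count1 = {"print": 10, "number": 2, "program": 19, "only1": 7}
count2 = {"print": 5, "number": 1, "program": 20, "only2": 4}

print(f"{'':<3}{'Common word':<12}{'freq1':>6}{'freq2':>6}{'freqsum':>8}")
for rank, (w, f1, f2, s) in enumerate(top_common_words(count1, count2), start=1):
    print(f"{rank:<3}{w:<12}{f1:>6}{f2:>6}{s:>8}")
```

Sorting on the negated sum keeps the alphabetical tie-break ascending while the sums come out in descending order.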

How can I shift patterns in a string one place ahead (removing the first, replacing the last)?

I have a string in Python, and I would like to shift a pattern 1 place earlier.
This is my string:
my_string = "[AudioLengthInSecs: 37.4]hello[seconds_silence: 0.65]one[seconds_silence: 0.54]two[seconds_silence: 0.59]three[seconds_silence: 0.48]hello[seconds_silence: 2.32]"
I would like to shift the numbers in each [seconds_silence: XXXX] tag one place earlier, dropping the first value and the tag after the last word (since its value has been shifted). The result should look like this:
my_desired_string = "[AudioLengthInSecs: 37.4]hello[seconds_silence: 0.54]one[seconds_silence: 0.59]two[seconds_silence: 0.48]three[seconds_silence: 2.32]hello"
Here is my code:
import re
my_string = "[AudioLengthInSecs: 37.4]hello[seconds_silence:0.65]one[seconds_silence: 0.54]two[seconds_silence: 0.59]three[seconds_silence: 0.48]hello[seconds_silence: 2.32]"
# First, find all the numbers in the string
all_numbers = re.findall(r'\d+', my_string)
# Secondly, remove the first 4 numbers
all_numbers = all_numbers[4:]
# Combine the numbers into one string
combined_numbers = [i + j for i, j in zip(all_numbers[::2], all_numbers[1::2])]
# Then loop over the string and insert
for word in my_string.split():
    print(word)
    if word == "[seconds_silence":
        print(word)
        # here I wanted to check if [seconds_silence was recognized
        # and replace it with a value from combined_numbers
        # however, this is failing, obviously
The idea is to find all pairs of:
the text preceding each [seconds_silence: ...] fragment (capturing group 1),
and the fragment itself (capturing group 2).
Then:
drop the first [seconds_silence: ...] fragment,
and join both lists,
but as they now have different lengths, itertools.zip_longest is needed.
So the whole code to do your task is:
import itertools
import re
my_string = ('[AudioLengthInSecs: 37.4]hello[seconds_silence:0.65]'
             'one[seconds_silence: 0.54]two[seconds_silence: 0.59]'
             'three[seconds_silence: 0.48]hello[seconds_silence: 2.32]')
gr1 = []
gr2 = []
for mtch in re.findall(r'(.+?)(\[seconds_silence: ?[\d.]+\])', my_string):
    g1, g2 = mtch
    gr1.append(g1)
    gr2.append(g2)
gr2.pop(0)
my_desired_string = ''
for g1, g2 in itertools.zip_longest(gr1, gr2, fillvalue=''):
    my_desired_string += g1 + g2
print(my_desired_string)
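An alternative sketch uses re.split() with one capturing group, which returns the words and the captured silence values interleaved, and then puts each value (after the first) in front of the word that originally preceded it. It assumes the string ends with a tag, as in the question, and it rewrites every tag as "[seconds_silence: X]" with one space, so "[seconds_silence:0.65]" spacing is normalized. shift_silences is a hypothetical name:

```python
import re

# One capturing group, so re.split() keeps the captured values in its output
TAG = re.compile(r"\[seconds_silence: ?([\d.]+)\]")


def shift_silences(s):
    """Move every [seconds_silence: X] value one word earlier.

    Assumes s ends with a tag: the first value is dropped and the
    last word ends up with no tag after it.
    """
    parts = TAG.split(s)  # word, value, word, value, ..., trailing ''
    words, values = parts[0::2], parts[1::2]
    # Value i (i >= 1) now goes in front of word i instead of after it
    shifted = "".join(f"[seconds_silence: {v}]{w}" for v, w in zip(values[1:], words[1:]))
    return words[0] + shifted


my_string = ("[AudioLengthInSecs: 37.4]hello[seconds_silence:0.65]"
             "one[seconds_silence: 0.54]two[seconds_silence: 0.59]"
             "three[seconds_silence: 0.48]hello[seconds_silence: 2.32]")
print(shift_silences(my_string))
```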

Loop through json array in string python

I have a string, and I created a dictionary that maps names to string values:
amount = 0
a = "asaf,almog,arnon,elbar"
values_li={'asaf':'1','almog':'6','elbar':'2'}
How can I create a loop that searches a for every key of values_li and, for each key it finds, does
amount = amount + value (the value from values_li for the key found in a)
I tried this, but it doesn't work:
for k, v in values_li.items():
    if k in a:
        amount = amount + v
I figured out my problem:
v is a string and I tried to do math with it, so I had to convert v to an int:
amount = amount + int(v)
Now it's working :)
You should be careful using:
if k in a:
a is the string: "asaf,almog,arnon,elbar" not a list. This means that:
"bar" in a # == True
"as" in a # == True
...and so on, which is probably not what you want.
You should consider splitting it into a list; then you'll only get complete matches. With that you can simply use:
a = "asaf,almog,arnon,elbar".split(',')
values_li={'asaf':'1','almog':'6','elbar':'2'}
amount = sum([int(values_li[k]) for k in a if k in values_li])
# 9
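If you prefer to keep the explicit loop from the question, the two fixes above (converting v to an int and splitting a on commas) combine like this; it is the same idea as the sum() one-liner, just written as a loop:

```python
a = "asaf,almog,arnon,elbar"
values_li = {'asaf': '1', 'almog': '6', 'elbar': '2'}

amount = 0
for name in a.split(','):
    # Exact name matches only: "bar" no longer matches "elbar"
    if name in values_li:
        amount += int(values_li[name])  # values are strings, so convert first

print(amount)  # 9
```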
collections.Counter() is your friend:
from collections import Counter
a = "asaf,almog,arnon,elbar"
values_li = Counter({'asaf':1,'almog':6,'elbar':2})
values_li.update(a.split(','))
values_li
That will result in:
Counter({'almog': 7, 'elbar': 3, 'asaf': 2, 'arnon': 1})
And if you want the sum of all values in values_li, you can simply do:
sum(values_li.values())
Which will result in 13, for the key/value pairs in your example.
