Comparing values in two dictionaries in Python

I am stuck in the middle of my coding because of this:
I have two dictionaries as follows:
a = {0:['1'],1:['0','-3']}
b = {'box 4': ['0 and 2', '0 and -3', ' 0 and -1', ' 2 and 3'], 'box 0': [' 1 ', ' 1 and 4 ', ' 3 and 4']}
I want to check whether the values in the first dictionary match values in the second and, if they do, return the matching key and values from dictionary b.
For example, the comparison should return box 4, ['0','-3'], because ['0','-3'] is a value in a and it also appears in b as '0 and -3'. If only '3' is found, nothing should be returned, because no value matches it. The result should also include box 0, ['1'], because ['1'] is a value in a and it also appears in b.
Any ideas? I appreciate any help.

You say, "the result of the comparison will return box4 here as ['0','-3'] is an item in a and it has been found also in b ['0 and -3'],". I do not see '0 and -3' in b.
Also, your question is not clear enough. Your code snippets are not complete and you have presented just one case here.
Nevertheless, I will make the mistake of assuming that you want something like this
normalized_values = {" and ".join(tokens) for tokens in a.values()}
for k in b:
    # Strip the padding so that ' 1 ' compares equal to '1'
    if normalized_values.intersection(s.strip() for s in b[k]):
        print(k)

Here you go, a simple version:
>>> a_values = a.values()
>>> for x, y in b.items():
...     for i in y:
...         i = i.strip()
...         if len(i) > 1:
...             i = i.split()[::2]
...             if i in a_values:
...                 print(x, i)
...         else:
...             if list(i) in a_values:
...                 print(x, list(i))
...
box 4 ['0', '-3']
box 0 ['1']
A more Pythonic way (this needs import re first):
>>> [[x, i] for x, y in b.items() for i in y if re.findall(r'-?\d', i) in a_values]
[['box 4', ' 0 and -3'], ['box 0', ' 1 ']]
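For Python 3, the same idea can be sketched as a small function. This is only an illustration under assumptions: the name find_matches is made up here, and it uses the pattern -?\d+ (one or more digits) rather than the single-digit pattern above so that multi-digit numbers stay in one token.

```python
import re

a = {0: ['1'], 1: ['0', '-3']}
b = {'box 4': ['0 and 2', '0 and -3', ' 0 and -1', ' 2 and 3'],
     'box 0': [' 1 ', ' 1 and 4 ', ' 3 and 4']}


def find_matches(a, b):
    """Return {key of b: [token lists that equal some value of a]}."""
    wanted = list(a.values())
    matches = {}
    for key, items in b.items():
        for item in items:
            # '0 and -3' -> ['0', '-3']; surrounding spaces are ignored
            tokens = re.findall(r'-?\d+', item)
            if tokens in wanted:
                matches.setdefault(key, []).append(tokens)
    return matches


print(find_matches(a, b))
# {'box 4': [['0', '-3']], 'box 0': [['1']]}
```

Comparing whole token lists is what keeps a lone '3' from matching ['0', '-3'].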

Related

Find "n" in a list full of strings

I'm looking to go through a list and find any element with a number.
This is what i got so far
home_pitchers = ['Alvarez, S', 'Crawford, B', 'Fury, 8', 'Mayweather, F', 'Lopez, 44']
num = '8'
for s in home_pitchers:
    if num in s:
        print(s)
This prints:
>>> Fury, 8
What I'm looking to do is to have num be 0 - 9. I thought about using '[^0-9]' but that didn't work.
Ultimately I'm looking to print out this
print
>>> Fury, 8
>>> Lopez, 44
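Just a heads up, I'm pretty new to coding so some concepts might go over my head.
You can combine the str.isdigit method with the built-in any function. isdigit returns True if every character in the string is a digit and False otherwise, so calling it on each character tells you whether the string contains at least one digit.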
>>> lst = ['Alvarez, S', 'Crawford, B', 'Fury, 8', 'Mayweather, F', 'Lopez, 44']
>>>
>>> for s in lst:
...     if any(char.isdigit() for char in s):
...         print(s)
...
Fury, 8
Lopez, 44
Using the re library:
import re
lst = ['Alvarez, S', 'Crawford, B', 'Fury, 8', 'Mayweather, F', 'Lopez, 44']
list(filter(lambda x:re.match(".*[0-9]+$",x), lst))
OUTPUT:
['Fury, 8', 'Lopez, 44']
The pattern matches any string that ends with one or more digits.
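If the digit may appear anywhere in the string rather than only at the end, re.search with \d is one option. A minimal sketch; the names has_digit and with_digits, and the extra string 'Room 8, B', are made up for illustration:

```python
import re

lst = ['Alvarez, S', 'Crawford, B', 'Fury, 8', 'Mayweather, F', 'Lopez, 44']

# \d matches a single digit; search() looks for it anywhere in the string
has_digit = re.compile(r"\d")

with_digits = [s for s in lst if has_digit.search(s)]
print(with_digits)  # ['Fury, 8', 'Lopez, 44']

# A string with a digit in the middle is found by search(),
# but not by the end-anchored pattern ".*[0-9]+$"
print(bool(has_digit.search('Room 8, B')))            # True
print(bool(re.match(".*[0-9]+$", 'Room 8, B')))       # False
```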

Delete spaces in dictionary values python

I'm reading the data from an external source. The data has "Name" and "Value with warnings" fields, so I put those in a dictionary like this:
d[data[i:i+6]] = data[i+8:i+17], data[i+25:i+36]
Thus at the end I have my dict as;
{'GPT-P ': ('169 ', 'H '), 'GOT-P ': ('47 ', ' '), .....
As seen above both keys and values might have unnecessary spaces.
I was able to overcome spaces in keys with;
d = {x.replace(' ',''): v
for x, v in d.items()}
but can't seem to manage similar for values. I tried using d.values() but it trims the key name and also works only for 1 of the values.
Can you help me understand how I can remove space for several values (2 values in this particular case) and end up with something like;
{'GPT-P': ('169', 'H'), 'GOT-P ': ('47', ''), .....
Thanks. Stay safe and healthy
You will need to do the space replacement in your v values as well, but in your case the values in your dictionary are tuples. I guess you want to remove spaces in all elements of each tuple, so you need a second iteration here. You can do something like this:
d = {'GPT-P ': ('169 ', 'H '), 'GOT-P ': ('47 ', ' ')}
{x.replace(' ', ''): tuple(w.replace(' ', '') for w in v) for x, v in d.items()}
Which returns:
{'GPT-P': ('169', 'H'), 'GOT-P': ('47', '')}
Notice that a generator expression, tuple(w.replace(' ', '') for w in v), is nested inside the dictionary comprehension to rebuild each tuple.
Given:
DoT={'GPT-P ': ('169 ', 'H '), 'GOT-P ': ('47 ', ' ')}
Since you have tuples of strings as your values, you need to apply .strip() to each string in the tuple:
>>> tuple(e.strip() for e in ('47 ', ' '))
('47', '')
Apply that to each key, value in a dict comprehension and there you are:
>>> {k.strip():tuple(e.strip() for e in t) for k,t in DoT.items()}
{'GPT-P': ('169', 'H'), 'GOT-P': ('47', '')}
You use .replace(' ','') in your attempt. That will replace ALL spaces:
>>> ' 1 2 3 '.replace(' ','')
'123'
It is more typical to use one of the .strips():
>>> ' 1 2 3 '.strip()
'1 2 3'
>>> ' 1 2 3 '.lstrip()
'1 2 3 '
>>> ' 1 2 3 '.rstrip()
' 1 2 3'
You can use .replace or any of the .strips() in the comprehensions that I used above.

Align numbers in sublist

I have a set of numbers that I want to align considering the comma:
10 3
200 4000,222 3 1,5
200,21 0,3 2
30000 4,5 1
mylist = [['10', '3', '', ''],
['200', '4000,222', '3', '1,5'],
['200,21', '', '0,3', '2'],
['30000', '4,5', '1', '']]
What I want is to align this list considering the comma:
expected result:
mylist = [['   10   ', '   3    ', '   ', '   '],
          ['  200   ', '4000,222', '3  ', '1,5'],
          ['  200,21', '        ', '0,3', '2  '],
          ['30000   ', '   4,5  ', '1  ', '   ']]
I tried to turn the list:
mynewlist = list(zip(*mylist))
and to find the longest part after the comma in every sublist:
for m in mynewlist:
    max([x[::-1].find(',') for x in m])
and to use rjust and ljust but I don't know how to ljust after a comma and rjust before the comma, both in the same string.
How can I resolve this without using format()?
(I want to align with ljust and rjust)
Here's another approach that currently does the trick. Unfortunately, I can't see any simple way to make this work, maybe due to the time :-)
Either way, I'll explain it. r is the result list created before hand.
r = [[] for i in range(4)]
Then we loop through the values and also grab an index with enumerate:
for ind1, vals in enumerate(zip(*mylist)):
Inside the loop we grab the max length of the decimal digits present and the max length of the word (the word w/o the decimal digits):
    l = max(len(v.partition(',')[2]) for v in vals) + 1
    mw = max(len(v if ',' not in v else v.split(',')[0]) for v in vals)
Now we go through the values inside the tuple vals and build our results (yup, can't currently think of a way to avoid this nesting).
    for ind2, v in enumerate(vals):
If it contains a comma, it should be formatted differently. Specifically, we rjust it based on the max length of a word mw and then add the decimal digits and any white-space needed:
        if ',' in v:
            n, d = v.split(',')
            v = "".join((n.rjust(mw),',', d, " " * (l - 1 - len(d))))
In the opposite case, we simply .rjust and then add whitespace:
        else:
            v = "".join((v.rjust(mw) + " " * l))
finally, we append to r.
        r[ind1].append(v)
All together:
r = [[] for i in range(4)]
for ind1, vals in enumerate(zip(*mylist)):
    l = max(len(v.partition(',')[2]) for v in vals) + 1
    mw = max(len(v if ',' not in v else v.split(',')[0]) for v in vals)
    for ind2, v in enumerate(vals):
        if ',' in v:
            n, d = v.split(',')
            v = "".join((n.rjust(mw),',', d, " " * (l - 1 - len(d))))
        else:
            v = "".join((v.rjust(mw) + " " * l))
        r[ind1].append(v)
Now, we can print it out:
>>> print(*map(list,zip(*r)), sep='\n')
['   10   ', '   3    ', '   ', '   ']
['  200   ', '4000,222', '3  ', '1,5']
['  200,21', '        ', '0,3', '2  ']
['30000   ', '   4,5  ', '1  ', '   ']
Here's a bit different solution that doesn't transpose my_list but instead iterates over it twice. On the first pass it generates a list of tuples, one for each column. Each tuple is a pair of numbers where first number is length before comma and second number is length of comma & everything following it. For example '4000,222' results to (4, 4). On the second pass it formats the data based on the formatting info generated on first pass.
from functools import reduce
mylist = [['10', '3', '', ''],
['200', '4000,222', '3', '1,5'],
['200,21', '', '0,3', '2'],
['30000', '4,5', '1', '']]
# Return tuple (left part length, right part length) for given string
def part_lengths(s):
    left, sep, right = s.partition(',')
    return len(left), len(sep) + len(right)

# Return string formatted based on part lengths
def format(s, first, second):
    left, sep, right = s.partition(',')
    return left.rjust(first) + sep + right.ljust(second - len(sep))

# Generator yielding part lengths row by row
parts = ((part_lengths(c) for c in r) for r in mylist)
# Combine part lengths to find maximum for each column
# For example data it looks like this: [[5, 3], [4, 4], [1, 2], [1, 2]]
sizes = reduce(lambda x, y: [[max(z) for z in zip(a, b)] for a, b in zip(x, y)], parts)
# Format result based on part lengths
res = [[format(c, *p) for c, p in zip(r, sizes)] for r in mylist]
print(*res, sep='\n')
Output:
['   10   ', '   3    ', '   ', '   ']
['  200   ', '4000,222', '3  ', '1,5']
['  200,21', '        ', '0,3', '2  ']
['30000   ', '   4,5  ', '1  ', '   ']
This works for Python 2 and 3. I didn't use ljust or rjust though; I just added as many spaces before and after each number as it is missing compared to the widest number in the column:
mylist = [['10', '3', '', ''],
['200', '4000,222', '3', '1,5'],
['200,21', '', '0,3', '2'],
['30000', '4,5', '1', '']]
transposed = list(zip(*mylist))
sizes = [[(x.index(",") if "," in x else len(x), len(x) - x.index(",") if "," in x else 0)
for x in l] for l in transposed]
maxima = [(max([x[0] for x in l]), max([x[1] for x in l])) for l in sizes]
withspaces = [
[' ' * (maxima[i][0] - sizes[i][j][0]) + number + ' ' * (maxima[i][1] - sizes[i][j][1])
for j, number in enumerate(l)] for i, l in enumerate(transposed)]
result = list(zip(*withspaces))
Printing the result in python3:
>>> print(*result, sep='\n')
('   10   ', '   3    ', '   ', '   ')
('  200   ', '4000,222', '3  ', '1,5')
('  200,21', '        ', '0,3', '2  ')
('30000   ', '   4,5  ', '1  ', '   ')

How to grep for same pattern multiple times in a string using python?

I am using python to pattern match a string multiple times in a string.
Problem:
string = 'The value = 1 The value = 2 The value = 3'
I want to grep only value but my output should be like:
['value = 1 value = 2 value = 3']
I am doing like this:
pattern = re.compile('[value = (\d+)]*')
values = pattern.search(string)
values.group(0)
Output:
''
i.e NULL (no match)
Please help me give the right regular expression to grep the required output.
>>> [' '.join(re.findall(r'value = \d+', string))]
['value = 1 value = 2 value = 3']
You are using square brackets, which create a character class rather than a group. Use a normal group with parentheses instead.
import re
string = 'The value = 1 The value = 2 The value = 3'
pattern = re.compile(r'(value = \d+)')
pattern.findall(string)
# OUT: ['value = 1', 'value = 2', 'value = 3']
" ".join(pattern.findall(string))
# OUT: 'value = 1 value = 2 value = 3'
Your use of square brackets ([]) in the RE source is very odd. Those form character sets.
You should use something like:
>>> pattern = re.compile(r'([^=]+)\s*=\s*(\d+)')
>>> pattern.findall(string)
[('The value ', '1'), (' The value ', '2'), (' The value ', '3')]
Note use of findall() to get all the matches, and grouping to get the value names too.
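To see why the pattern from the question returned an empty string, it may help to run both forms side by side. This is a small sketch; the variable names char_class and grouped are chosen here for illustration only.

```python
import re

string = 'The value = 1 The value = 2 The value = 3'

# Inside [...] every character is just a member of a set, so this means
# "zero or more of: v a l u e space = ( digit + )".
char_class = re.compile(r'[value = (\d+)]*')

# 'T' is not in the set, and * allows zero repetitions, so search()
# succeeds immediately at position 0 with an empty match.
first = char_class.search(string)
print(repr(first.group(0)))  # ''

# With parentheses, (\d+) is a real capturing group.
grouped = re.compile(r'value = (\d+)')
print(grouped.findall(string))  # ['1', '2', '3']
```

So the original search did not fail; it matched an empty string at the very start of the input.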

Finding the index and sum in a list of lists

Hello I have what I hope is an easy problem to solve. I am attempting to read a csv file and write a portion into a list. I need to determine the index and the value in each row and then summarize.
so the row will have 32 values...each value is a classification (class 0, class 1, etc.) with a number associated with it. I need a pythonic solution to make this work.
import os,sys,csv
csvfile=sys.argv[1]
f=open(csvfile,'rt')
reader=csv.reader(f)
classes=[]
for row in reader:
    classes.append(row[60:92])
f.close()
classes = [' ', '1234', '645', '9897'], [' ', '76541', ' ', '8888']
how would i extract the index values from each list to get a sum for each?
for example: 0=(' ', ' ') 1=('1234', '76541') 2= ('645', ' ') 3= ('9897', '8888')
then find the sum of each
class 0 = 0
class 1 = 77775
class 2 = 645
class 3 = 18785
Any assistance would be greatly appreciated
I find your use case a bit difficult to understand, but does this list comprehension give you some new ideas about how to solve your problem?
>>> classes = [' ', '1234', '645', '9897'], [' ', '76541', ' ', '8888']
>>> [sum(int(n) for n in x if n != ' ') for x in zip(*classes)]
[0, 77775, 645, 18785]
>>> classes = [[' ', '1234', '645', '9897'], [' ', '76541', ' ', '8888']]
>>> my_int = lambda s: int(s) if s.isdigit() else 0
>>> class_groups = dict(zip(range(32), zip(*classes)))
>>> class_groups[1]
('1234', '76541')
>>> class_sums = {}
>>> for class_ in class_groups:
...     group_sum = sum(map(my_int, class_groups[class_]))
...     class_sums[class_] = group_sum
...
>>> class_sums[1]
77775
>>> class_sums[3]
18785
>>>
You could also sum as you go through the CSV file rows (for example, by putting the for class_ loop inside your rows loop):
>>> classes
[[' ', '1234', '645', '9897'], [' ', '76541', ' ', '8888']]
>>> sums = {}
>>> for row in classes:
...     for class_, num in enumerate(row):
...         try:
...             num = int(num)
...         except ValueError:
...             num = 0
...         sums[class_] = sums.get(class_, 0) + num
...
>>> sums
{0: 0, 1: 77775, 2: 645, 3: 18785}
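Summing while reading could look roughly like the sketch below. The function name column_sums and the in-memory sample are assumptions made for illustration; with the real file you would pass csv.reader(f) and, going by the slice in the question, start=60 and stop=92.

```python
import csv
import io

# Hypothetical stand-in for the real CSV file from the question
sample = io.StringIO(" ,1234,645,9897\n ,76541, ,8888\n")


def column_sums(reader, start=0, stop=None):
    """Sum each column of row[start:stop]; blank cells count as 0."""
    sums = {}
    for row in reader:
        for class_, cell in enumerate(row[start:stop]):
            cell = cell.strip()
            # Assumes every non-blank cell is an integer; int() raises otherwise
            sums[class_] = sums.get(class_, 0) + (int(cell) if cell else 0)
    return sums


totals = column_sums(csv.reader(sample))
print(totals)  # {0: 0, 1: 77775, 2: 645, 3: 18785}
```

This avoids building the intermediate classes list, since each row is added to the running totals as soon as it is read.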
