Hello, I have what I hope is an easy problem to solve. I am attempting to read a CSV file and write a portion of each row into a list. I need to determine the index and the value of each item in a row and then summarize by index.
Each row will have 32 values. Each value is a classification (class 0, class 1, etc.) with a number associated with it. I need a Pythonic solution to make this work.
import sys, csv
csvfile = sys.argv[1]
classes = []
# the with block closes the file automatically
with open(csvfile, 'rt', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        # keep only the 32 class columns
        classes.append(row[60:92])
classes = [' ', '1234', '645', '9897'], [' ', '76541', ' ', '8888']
How would I extract the values at each index across the lists to get a sum for each index?
For example: 0=(' ', ' ') 1=('1234', '76541') 2=('645', ' ') 3=('9897', '8888')
Then find the sum of each:
class 0 = 0
class 1 = 77775
class 2 = 645
class 3 = 18785
Any assistance would be greatly appreciated.
I find your use case a bit difficult to understand, but does this list comprehension give you some new ideas about how to solve your problem?
>>> classes = [' ', '1234', '645', '9897'], [' ', '76541', ' ', '8888']
>>> [sum(int(n) for n in x if n != ' ') for x in zip(*classes)]
[0, 77775, 645, 18785]
>>> classes = [[' ', '1234', '645', '9897'], [' ', '76541', ' ', '8888']]
>>> my_int = lambda s: int(s) if s.isdigit() else 0
>>> class_groups = dict(zip(range(32), zip(*classes)))
>>> class_groups[1]
('1234', '76541')
>>> class_sums = {}
>>> for class_ in class_groups:
...     group_sum = sum(map(my_int, class_groups[class_]))
...     class_sums[class_] = group_sum
...
>>> class_sums[1]
77775
>>> class_sums[3]
18785
>>>
You could also sum as you go through the CSV file rows (e.g. put the for class_ loop inside your rows loop):
>>> classes
[[' ', '1234', '645', '9897'], [' ', '76541', ' ', '8888']]
>>> sums = {}
>>> for row in classes:
...     for class_, num in enumerate(row):
...         try:
...             num = int(num)
...         except ValueError:
...             num = 0
...         sums[class_] = sums.get(class_, 0) + num
...
>>> sums
{0: 0, 1: 77775, 2: 645, 3: 18785}
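Putting the pieces together, summing while the file is being read might look roughly like the sketch below. It assumes Python 3. The function name `sum_class_columns`, its `start`/`stop` parameters, and the in-memory sample data are illustrative and do not come from the question.

```python
import csv
import io


def sum_class_columns(lines, start=0, stop=None):
    """Sum each column of row[start:stop], counting non-numeric cells as 0."""
    sums = {}
    for row in csv.reader(lines):
        for class_, cell in enumerate(row[start:stop]):
            try:
                value = int(cell)
            except ValueError:
                value = 0  # blank or non-numeric cell
            sums[class_] = sums.get(class_, 0) + value
    return sums


# Hypothetical sample standing in for the real CSV file.
sample = io.StringIO(" ,1234,645,9897\n ,76541, ,8888\n")
result = sum_class_columns(sample)
print(result)
```

For the real file, the call would presumably be `sum_class_columns(f, 60, 92)` on the opened file object, so the full list of rows never has to be kept in memory.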
I'm reading data from an external source. The data has a "Name" and a "Value with warnings", so I put those in a dictionary like this:
d[data[i:i+6]] = data[i+8:i+17], data[i+25:i+36]
Thus at the end I have my dict as;
{'GPT-P ': ('169 ', 'H '), 'GOT-P ': ('47 ', ' '), .....
As seen above both keys and values might have unnecessary spaces.
I was able to overcome spaces in keys with;
d = {x.replace(' ',''): v
for x, v in d.items()}
but I can't seem to manage something similar for the values. I tried using d.values(), but it loses the key names and also works for only one of the values.
Can you help me understand how I can remove the spaces from several values (two values in this particular case) and end up with something like:
{'GPT-P': ('169', 'H'), 'GOT-P': ('47', ''), .....
Thanks. Stay safe and healthy
You will need to do the space replacement in your v values also but
it seems that in your case the values in your dictionary are tuples.
I guess you will want to remove spaces in all elements of each tuple so you will need a second iteration here. You can do something like this:
d = {'GPT-P ': ('169 ', 'H '), 'GOT-P ': ('47 ', ' ')}
{x.replace(' ', ''): tuple(w.replace(' ', '') for w in v) for x, v in d.items()}
Which returns:
{'GPT-P': ('169', 'H'), 'GOT-P': ('47', '')}
Notice that there is a generator expression passed to tuple(), tuple(w.replace(' ', '') for w in v), within the dictionary comprehension.
Given:
DoT={'GPT-P ': ('169 ', 'H '), 'GOT-P ': ('47 ', ' ')}
Since you have tuples of strings as your values, you need to apply .strip() to each string in the tuple:
>>> tuple(e.strip() for e in ('47 ', ' '))
('47', '')
Apply that to each key, value in a dict comprehension and there you are:
>>> {k.strip():tuple(e.strip() for e in t) for k,t in DoT.items()}
{'GPT-P': ('169', 'H'), 'GOT-P': ('47', '')}
You use .replace(' ','') in your attempt. That will replace ALL spaces:
>>> ' 1 2 3 '.replace(' ','')
'123'
It is more typical to use one of the strip methods (.strip(), .lstrip() or .rstrip()), which only remove leading and/or trailing whitespace:
>>> ' 1 2 3 '.strip()
'1 2 3'
>>> ' 1 2 3 '.lstrip()
'1 2 3 '
>>> ' 1 2 3 '.rstrip()
' 1 2 3'
You can use .replace or any of the .strips() in the comprehensions that I used above.
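As a small reusable sketch of the comprehension above: the helper name `strip_dict` and the second sample entry (`' ALK P '`) are made up for illustration. The second entry is there only to show that `.strip()` keeps inner spaces, which `.replace(' ', '')` would remove.

```python
def strip_dict(d):
    """Return a new dict with keys and tuple elements stripped of outer whitespace."""
    return {k.strip(): tuple(e.strip() for e in v) for k, v in d.items()}


raw = {'GPT-P ': ('169 ', 'H '), 'GOT-P ': ('47 ', ' ')}
cleaned = strip_dict(raw)
print(cleaned)

# Hypothetical entry with inner spaces: strip() leaves them alone.
inner = strip_dict({' ALK P ': (' 1 200 ',)})
print(inner)
```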
I have a header list as below:
h_list = ['h1','h2','h3','h4','h5']
Now I have data list (Nested):
d_list = [
[1, None, 3, ' ', 5],
[1, ' ', 2, ' ', 9]
]
Both lists always have the same length. For each index position, I want to check the values at that position in every nested list; if all of them are either None or ' ', the item at that position in h_list should be replaced with ' ' (a blank string).
My expected output is:
h_list = ['h1',' ','h3',' ','h5']
Try a list comprehension:
h_list = ['h1','h2','h3','h4','h5']
d_list = [
[1, None, 3, ' ', 5],
[1, ' ', 2, ' ', 9]
]
empty = [' ', None]
h_list = [' ' if any(b[i] in empty for b in d_list) else v for i, v in enumerate(h_list)]
print(h_list)
Output:
['h1', ' ', 'h3', ' ', 'h5']
Breaking down this part of the code:
h_list = [' ' if any(b[i] in empty for b in d_list) else v for i, v in enumerate(h_list)]
First, let's have only
[(i, v) for i, v in enumerate(h_list)]
The above will be a list of the indices and values of each element in h_list.
Now, we use an if statement to determine when to add the ' '. First, we need to recognize the any() function:
any(b[i] in empty for b in d_list)
returns True if the value at index i of any of the lists inside d_list is in the empty list. We want ' ' in place of every string in h_list whose index holds a ' ' or None in any of the lists in d_list, so:
[' ' for i, v in enumerate(h_list) if any(b[i] in empty for b in d_list)]
Finally, we want to keep the original string if not any(b[i] in empty for b in d_list). For that, we use a conditional expression with else (note that with an else, the condition moves to the left side of the for clause):
h_list = [' ' if any(b[i] in empty for b in d_list) else v for i, v in enumerate(h_list)]
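One detail worth checking is that `any()` blanks a header as soon as a single row is empty at that index, while the question asks for the header to be blanked only when all rows are empty there. The two give the same result on the sample data but can differ in general. The sketch below shows the difference; the helper name `blank_headers`, its `test` parameter, and the three-column sample are invented for illustration.

```python
EMPTY = (None, ' ')


def blank_headers(headers, rows, test=all):
    """Blank a header when test() holds over 'cell is empty' for its column."""
    return [' ' if test(row[i] in EMPTY for row in rows) else h
            for i, h in enumerate(headers)]


headers = ['h1', 'h2', 'h3']
rows = [[1, None, ' '],
        [2, ' ', 7]]

with_all = blank_headers(headers, rows, all)  # column 3 keeps its header
with_any = blank_headers(headers, rows, any)  # column 3 is blanked too
print(with_all)
print(with_any)
```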
I believe this should work for your examples:
new_list = []
for orig_element, *values in zip(h_list, *d_list):
    new_list.append(orig_element if any(not (v is None or str(v).strip() == '') for v in values) else '')
If you want to modify the list in-place simply do:
for i, (orig_element, *values) in enumerate(zip(h_list, *d_list)):
    h_list[i] = orig_element if any(not (v is None or str(v).strip() == '') for v in values) else ''
You can use zip with all:
h_list = ['h1','h2','h3','h4','h5']
d_list = [[1, None, 3, ' ', 5], [1, ' ', 2, ' ', 9]]
r = [' ' if all(k in {None, ' '} for k in j) else a for a, j in zip(h_list, zip(*d_list))]
Output:
['h1', ' ', 'h3', ' ', 'h5']
Basic solution
h_list = ['h1','h2','h3','h4','h5']
d_list = [
[1, None, 3, ' ', 5],
[1, ' ', 2, ' ', 9]
]
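# loop over each row of d_list
for row in d_list:
    # loop over the row, keeping track of each item's position
    # (row.index(k) would always return the first match, which breaks on repeated values)
    for idx, k in enumerate(row):
        # if the item is not an int, blank the header at that position
        if not isinstance(k, int):
            h_list[idx] = " "
print(h_list)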
I have a set of numbers that I want to align on the decimal comma:
   10       3
  200    4000,222 3   1,5
  200,21          0,3 2
30000       4,5   1
mylist = [['10', '3', '', ''],
['200', '4000,222', '3', '1,5'],
['200,21', '', '0,3', '2'],
['30000', '4,5', '1', '']]
What I want is to align this list considering the comma:
expected result:
mylist = [['   10   ', '   3    ', '   ', '   '],
          ['  200   ', '4000,222', '3  ', '1,5'],
          ['  200,21', '        ', '0,3', '2  '],
          ['30000   ', '   4,5  ', '1  ', '   ']]
I tried to turn the list:
mynewlist = list(zip(*mylist))
and to find the longest part after the comma in every sublist:
for m in mynewlist:
    max([x[::-1].find(',') for x in m])
and to use rjust and ljust but I don't know how to ljust after a comma and rjust before the comma, both in the same string.
How can I resolve this without using format()?
(I want to align with ljust and rjust)
Here's another approach that does the trick. Unfortunately, I can't see a simpler way to make this work right now, maybe because of the late hour :-)
Either way, I'll explain it. r is the result list, created beforehand.
r = [[] for i in range(4)]
Then we loop through the values and also grab an index with enumerate:
for ind1, vals in enumerate(zip(*mylist)):
Inside the loop we grab the max length of the decimal digits present and the max length of the word (the word w/o the decimal digits):
    l = max(len(v.partition(',')[2]) for v in vals) + 1
    mw = max(len(v if ',' not in v else v.split(',')[0]) for v in vals)
Now we go through the values inside the tuple vals and build our results (yup, can't currently think of a way to avoid this nesting).
    for ind2, v in enumerate(vals):
If it contains a comma, it should be formatted differently. Specifically, we rjust it based on the max length of a word mw and then add the decimal digits and any white-space needed:
        if ',' in v:
            n, d = v.split(',')
            v = "".join((n.rjust(mw),',', d, " " * (l - 1 - len(d))))
In the opposite case, we simply .rjust and then add whitespace:
        else:
            v = "".join((v.rjust(mw) + " " * l))
Finally, we append to r.
        r[ind1].append(v)
All together:
r = [[] for i in range(4)]
for ind1, vals in enumerate(zip(*mylist)):
    l = max(len(v.partition(',')[2]) for v in vals) + 1
    mw = max(len(v if ',' not in v else v.split(',')[0]) for v in vals)
    for ind2, v in enumerate(vals):
        if ',' in v:
            n, d = v.split(',')
            v = "".join((n.rjust(mw),',', d, " " * (l - 1 - len(d))))
        else:
            v = "".join((v.rjust(mw) + " " * l))
        r[ind1].append(v)
Now, we can print it out:
>>> print(*map(list,zip(*r)), sep='\n')
['   10   ', '   3    ', '   ', '   ']
['  200   ', '4000,222', '3  ', '1,5']
['  200,21', '        ', '0,3', '2  ']
['30000   ', '   4,5  ', '1  ', '   ']
Here's a slightly different solution that doesn't transpose mylist but instead iterates over it twice. On the first pass it generates a list of tuples, one for each column. Each tuple is a pair of numbers where the first number is the length before the comma and the second number is the length of the comma and everything following it. For example, '4000,222' results in (4, 4). On the second pass it formats the data based on the formatting info generated on the first pass.
from functools import reduce
mylist = [['10', '3', '', ''],
          ['200', '4000,222', '3', '1,5'],
          ['200,21', '', '0,3', '2'],
          ['30000', '4,5', '1', '']]
# Return tuple (left part length, right part length) for given string
def part_lengths(s):
    left, sep, right = s.partition(',')
    return len(left), len(sep) + len(right)
# Return string formatted based on part lengths
# (named format_cell so that it does not shadow the built-in format())
def format_cell(s, first, second):
    left, sep, right = s.partition(',')
    return left.rjust(first) + sep + right.ljust(second - len(sep))
# Generator yielding part lengths row by row
parts = ((part_lengths(c) for c in r) for r in mylist)
# Combine part lengths to find maximum for each column
# For example data it looks like this: [[5, 3], [4, 4], [1, 2], [1, 2]]
sizes = reduce(lambda x, y: [[max(z) for z in zip(a, b)] for a, b in zip(x, y)], parts)
# Format result based on part lengths
res = [[format_cell(c, *p) for c, p in zip(r, sizes)] for r in mylist]
print(*res, sep='\n')
Output:
['   10   ', '   3    ', '   ', '   ']
['  200   ', '4000,222', '3  ', '1,5']
['  200,21', '        ', '0,3', '2  ']
['30000   ', '   4,5  ', '1  ', '   ']
This works for Python 2 and 3. I didn't use ljust or rjust though; I just added as many spaces before and after each number as it is missing compared with the widest number in the column:
mylist = [['10', '3', '', ''],
['200', '4000,222', '3', '1,5'],
['200,21', '', '0,3', '2'],
['30000', '4,5', '1', '']]
transposed = list(zip(*mylist))
sizes = [[(x.index(",") if "," in x else len(x), len(x) - x.index(",") if "," in x else 0)
for x in l] for l in transposed]
maxima = [(max([x[0] for x in l]), max([x[1] for x in l])) for l in sizes]
withspaces = [
[' ' * (maxima[i][0] - sizes[i][j][0]) + number + ' ' * (maxima[i][1] - sizes[i][j][1])
for j, number in enumerate(l)] for i, l in enumerate(transposed)]
result = list(zip(*withspaces))
Printing the result in python3:
>>> print(*result, sep='\n')
('   10   ', '   3    ', '   ', '   ')
('  200   ', '4000,222', '3  ', '1,5')
('  200,21', '        ', '0,3', '2  ')
('30000   ', '   4,5  ', '1  ', '   ')
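Since the question asked specifically for rjust and ljust, the idea shared by the answers above can also be sketched as follows: split each cell at the comma, right-justify the part before it, and left-justify the comma together with the part after it. This is a minimal sketch; the names `align_on_comma` and `pad_cell` are illustrative and do not appear in any of the answers.

```python
def pad_cell(cell, left, right, sep=','):
    """rjust the part before sep, ljust sep plus the part after it."""
    head, found, tail = cell.partition(sep)
    return head.rjust(left) + (found + tail).ljust(right)


def align_on_comma(table, sep=','):
    """Pad every cell so that the separators line up within each column."""
    widths = []
    for column in zip(*table):
        parts = [cell.partition(sep) for cell in column]
        left = max(len(p[0]) for p in parts)               # widest part before sep
        right = max(len(p[1]) + len(p[2]) for p in parts)  # sep + widest part after it
        widths.append((left, right))
    return [[pad_cell(cell, left, right, sep)
             for cell, (left, right) in zip(row, widths)]
            for row in table]


mylist = [['10', '3', '', ''],
          ['200', '4000,222', '3', '1,5'],
          ['200,21', '', '0,3', '2'],
          ['30000', '4,5', '1', '']]

aligned = align_on_comma(mylist)
for row in aligned:
    print(' '.join(row).rstrip())
```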
I am stuck in the middle of my coding because of this:
I have two dictionaries as follows:
a = {0:['1'],1:['0','-3']}
b = {'box 4': ['0 and 2', '0 and -3', ' 0 and -1', ' 2 and 3'], 'box 0': [' 1 ', ' 1 and 4 ', ' 3 and 4']}
I want to find out whether the values in the first dictionary match values in the second and, if they do, return the matching key and values from dictionary b.
For example: the comparison should return box 4, ['0','-3'], because ['0','-3'] is an item in a and it has also been found in b as '0 and -3'. However, if only '3' had been found, I don't want anything returned, as no value matches it. The result should also include box 0, ['1'], as ['1'] is an item in a and it has also been found in b.
Any ideas? I appreciate any help.
You say, "the result of the comparison will return box4 here as ['0','-3'] is an item in a and it has been found also in b ['0 and -3'],". I do not see '0 and -3' in b.
Also, your question is not clear enough. Your code snippets are not complete and you have presented just one case here.
Nevertheless, I will make the mistake of assuming that you want something like this
normalized_values = set(" and ".join(tokens) for tokens in a.values())
for k in b:
    # strip each value so that ' 1 ' matches '1'
    if normalized_values.intersection(s.strip() for s in b[k]):
        print(k)
Here you go, a simple version:
>>> a_values = a.values()
>>> for x, y in b.items():
...     for i in y:
...         i = i.strip()
...         if len(i) > 1:
...             i = i.split()[::2]
...             if i in a_values:
...                 print(x, i)
...         else:
...             if list(i) in a_values:
...                 print(x, list(i))
...
box 4 ['0', '-3']
box 0 ['1']
A more Pythonic way:
>>> import re
>>> [[x, i] for x, y in b.items() for i in y if re.findall(r'-?\d', i) in a_values]
[['box 4', '0 and -3'], ['box 0', ' 1 ']]
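The same idea can be wrapped in a function that returns the matching key together with the parsed values, which is closer to what the question asks for. This is a sketch for Python 3; the name `find_matches` is made up, and the pattern uses `\d+` rather than `\d` on the assumption that multi-digit numbers may occur in the real data.

```python
import re


def find_matches(a, b):
    """Return (key, tokens) pairs from b whose numbers equal a value list in a."""
    wanted = list(a.values())
    matches = []
    for key, items in b.items():
        for item in items:
            # Pull out the (possibly negative) integers as strings.
            tokens = re.findall(r'-?\d+', item)
            if tokens in wanted:
                matches.append((key, tokens))
    return matches


a = {0: ['1'], 1: ['0', '-3']}
b = {'box 4': ['0 and 2', '0 and -3', ' 0 and -1', ' 2 and 3'],
     'box 0': [' 1 ', ' 1 and 4 ', ' 3 and 4']}

matches = find_matches(a, b)
print(matches)
```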
After a certain calculation I am getting output like:
(' ','donor',' ','distance')
(' ','ARG','A','43',' ','3.55')
(' ','SOD','B','93', ' ','4.775')
(' ','URX','C','33', ' ','3.55')
while I was intending to get something like:
donor distance
ARG A 43 3.55
SOD B 93 4.77
URX C 33 3.55
What I am getting is a tuple, and I am confused about how to turn this tuple into the nicely formatted output I want.
Please give me some ideas.
Thank you.
Use str.join() on each tuple:
' '.join(your_tuple)
before printing.
If your data looks like this
data = [
(' ', 'donor', ' ', 'distance'),
(' ', 'ARG', 'A', '43', ' ', '3.55'),
(' ', 'SOD', 'B', '93', ' ', '4.775'),
(' ', 'URX', 'C', '33', ' ', '3.55')
]
Then you can just
print('\n'.join(map(' '.join, data)))
You can use a for-loop and str.join:
lis = [
(' ','donor',' ','distance'),
(' ','ARG','A','43',' ','3.55'),
(' ','SOD','B','93', ' ','4.775'),
(' ','URX','C','33', ' ','3.55')
]
for item in lis:
    print(" ".join(item))
Output:
donor distance
ARG A 43 3.55
SOD B 93 4.775
URX C 33 3.55
It sounds like you want to use format strings. For example, assuming that you are not storing padding strings in your items:
print("{0} {1} {2} {3:>10.2f}".format(*item))
You can specify the exact format (including width and alignment) of each field of the record in the format string. In this example, the fourth field is right-aligned to fit into 10 characters, with 2 digits displayed to the right of the decimal point; it has to be a number (not a string) for the f format code to work.
Example using your data:
>>> x = ((' ','ARG','A','43',' ','3.55'),(' ','SOD','B','93', ' ','4.775'),(' ','URX','C','33', ' ','3.55'))
>>> f = "{0:3s}{1:1s}{2:2s} {3:>10.3f}"
>>> for item in x: print(f.format(item[1], item[2], item[3], float(item[5])))
...
ARGA43      3.550
SODB93      4.775
URXC33      3.550
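If the padding strings are only there for spacing, one option is to drop them first and let the format string handle all of the alignment. The sketch below assumes Python 3; the function name `format_rows` and the chosen column widths are arbitrary illustrations, not something from the question.

```python
data = [
    (' ', 'ARG', 'A', '43', ' ', '3.55'),
    (' ', 'SOD', 'B', '93', ' ', '4.775'),
    (' ', 'URX', 'C', '33', ' ', '3.55'),
]


def format_rows(rows):
    """Drop blank padding items, then lay each record out in fixed-width columns."""
    lines = []
    for row in rows:
        # Keep only the items that contain something other than whitespace.
        name, chain, number, distance = [field for field in row if field.strip()]
        lines.append("{:<4}{:<2}{:>4} {:>8.3f}".format(
            name, chain, number, float(distance)))
    return lines


lines = format_rows(data)
print('\n'.join(lines))
```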