I have a set of numbers that I want to align considering the comma:
10 3
200 4000,222 3 1,5
200,21 0,3 2
30000 4,5 1
mylist = [['10', '3', '', ''],
['200', '4000,222', '3', '1,5'],
['200,21', '', '0,3', '2'],
['30000', '4,5', '1', '']]
What I want is to align this list considering the comma:
expected result:
mylist = [['   10   ', '   3    ', '   ', '   '],
          ['  200   ', '4000,222', '3  ', '1,5'],
          ['  200,21', '        ', '0,3', '2  '],
          ['30000   ', '   4,5  ', '1  ', '   ']]
I tried to transpose the list:
mynewlist = list(zip(*mylist))
and to find the longest part after the comma in every column:
for m in mynewlist:
    max([x[::-1].find(',') for x in m])
and to use rjust and ljust, but I don't know how to ljust after the comma and rjust before the comma, both in the same string.
How can I resolve this without using format()?
(I want to align with ljust and rjust)
Here's an approach that does the trick. I can't see a simpler way to do it right now, maybe because of the hour :-)
Either way, I'll explain it. r is the result list, created beforehand.
r = [[] for i in range(4)]
Then we loop through the values and also grab an index with enumerate:
for ind1, vals in enumerate(zip(*mylist)):
Inside the loop we grab the max length of the decimal digits present (plus one for the comma) and the max length of the word (the part without the decimal digits):
    l = max(len(v.partition(',')[2]) for v in vals) + 1
    mw = max(len(v if ',' not in v else v.split(',')[0]) for v in vals)
Now we go through the values inside the tuple vals and build our results (yup, can't currently think of a way to avoid this nesting).
    for ind2, v in enumerate(vals):
If it contains a comma, it should be formatted differently. Specifically, we rjust it based on the max word length mw and then add the decimal digits and any whitespace needed:
        if ',' in v:
            n, d = v.split(',')
            v = "".join((n.rjust(mw), ',', d, " " * (l - 1 - len(d))))
In the opposite case, we simply .rjust and then add whitespace:
        else:
            v = v.rjust(mw) + " " * l
Finally, we append to r.
        r[ind1].append(v)
All together:
r = [[] for i in range(4)]
for ind1, vals in enumerate(zip(*mylist)):
    l = max(len(v.partition(',')[2]) for v in vals) + 1
    mw = max(len(v if ',' not in v else v.split(',')[0]) for v in vals)
    for ind2, v in enumerate(vals):
        if ',' in v:
            n, d = v.split(',')
            v = "".join((n.rjust(mw), ',', d, " " * (l - 1 - len(d))))
        else:
            v = v.rjust(mw) + " " * l
        r[ind1].append(v)
Now, we can print it out:
>>> print(*map(list, zip(*r)), sep='\n')
['   10   ', '   3    ', '   ', '   ']
['  200   ', '4000,222', '3  ', '1,5']
['  200,21', '        ', '0,3', '2  ']
['30000   ', '   4,5  ', '1  ', '   ']
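The same idea can be packed into a small helper. This is only a sketch, assuming every cell contains at most one comma; the name align_column is made up for this illustration. It pads the part before the comma with rjust and the comma plus decimals with ljust, which is what the question asked for.

```python
def align_column(cells):
    """Align one column of strings on the comma (hypothetical helper)."""
    # Split every cell into (integer part, comma-or-empty, decimal part).
    parts = [cell.partition(',') for cell in cells]
    left_width = max(len(left) for left, _, _ in parts)
    # The right-hand width counts the comma itself plus the decimals.
    right_width = max(len(sep) + len(right) for _, sep, right in parts)
    return [left.rjust(left_width) + (sep + right).ljust(right_width)
            for left, sep, right in parts]


mylist = [['10', '3', '', ''],
          ['200', '4000,222', '3', '1,5'],
          ['200,21', '', '0,3', '2'],
          ['30000', '4,5', '1', '']]

# Transpose to columns, align each column, transpose back to rows.
aligned = [list(row) for row in zip(*map(align_column, zip(*mylist)))]
print(*aligned, sep='\n')
```

Every string in a column ends up with the same width, so the rows line up when printed one under another.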
Here's a slightly different solution that doesn't transpose mylist but instead iterates over it twice. On the first pass it generates a list of pairs, one for each column. Each pair holds two numbers: the first is the length before the comma and the second is the length of the comma plus everything following it. For example, '4000,222' results in (4, 4). On the second pass it formats the data based on the formatting info generated on the first pass.
from functools import reduce
mylist = [['10', '3', '', ''],
['200', '4000,222', '3', '1,5'],
['200,21', '', '0,3', '2'],
['30000', '4,5', '1', '']]
# Return tuple (left part length, right part length) for given string
def part_lengths(s):
    left, sep, right = s.partition(',')
    return len(left), len(sep) + len(right)
# Return string formatted based on part lengths
# (note: this name shadows the built-in format)
def format(s, first, second):
    left, sep, right = s.partition(',')
    return left.rjust(first) + sep + right.ljust(second - len(sep))
# Generator yielding part lengths row by row
parts = ((part_lengths(c) for c in r) for r in mylist)
# Combine part lengths to find maximum for each column
# For example data it looks like this: [[5, 3], [4, 4], [1, 2], [1, 2]]
sizes = reduce(lambda x, y: [[max(z) for z in zip(a, b)] for a, b in zip(x, y)], parts)
# Format result based on part lengths
res = [[format(c, *p) for c, p in zip(r, sizes)] for r in mylist]
print(*res, sep='\n')
Output:
['   10   ', '   3    ', '   ', '   ']
['  200   ', '4000,222', '3  ', '1,5']
['  200,21', '        ', '0,3', '2  ']
['30000   ', '   4,5  ', '1  ', '   ']
This works for Python 2 and 3. I didn't use ljust or rjust, though; I just added as many spaces before and after each number as it is short of the widest number in its column:
mylist = [['10', '3', '', ''],
['200', '4000,222', '3', '1,5'],
['200,21', '', '0,3', '2'],
['30000', '4,5', '1', '']]
transposed = list(zip(*mylist))
sizes = [[(x.index(",") if "," in x else len(x), len(x) - x.index(",") if "," in x else 0)
for x in l] for l in transposed]
maxima = [(max([x[0] for x in l]), max([x[1] for x in l])) for l in sizes]
withspaces = [
[' ' * (maxima[i][0] - sizes[i][j][0]) + number + ' ' * (maxima[i][1] - sizes[i][j][1])
for j, number in enumerate(l)] for i, l in enumerate(transposed)]
result = list(zip(*withspaces))
Printing the result in python3:
>>> print(*result, sep='\n')
('   10   ', '   3    ', '   ', '   ')
('  200   ', '4000,222', '3  ', '1,5')
('  200,21', '        ', '0,3', '2  ')
('30000   ', '   4,5  ', '1  ', '   ')
I have a column of data containing id numbers that are between 4 and 10 digits in length. However, these id numbers are manually entered and have no systematic delimiters. In some cases, id numbers are delimited by a comment. With the caveat that the real data is unpredictable, here is an example of values in a python list.
[ '13796352',
'2113146, 2113148, 2113147',
'asdf ee A070_321 on 4.3.99 - MC',
'blah blah3',
'1914844\xa0, 3310339, 1943270, 2190351, 1215262',
'789702/ 89057',
'1 of 5 blah blah',
'688327/ 6712563/> 5425153',
'1820196/1964143/ 249805/ 300510',
'731862\n\nAccepted: 176666\nRejected: 8787' ]
Here is the regex that is not working:
r'^[0-9]{4,10}([\s\S]*)[[0-9]{4,10}]*'
The desired output (looping through the list) is:
[''],
[', ',', '],
[''],
[''],
['\xa0, ',', ',', ',', '],
['/ '],
[''],
['/ ','/> '],
['/','/ ','/ '],
['\n\nAccepted: ','\nRejected: ']
I am not getting this with the regex above. What am I doing wrong?
If you want to extract the ids, you could use for example:
import re
data = [
'13796352',
'2113146, 2113148, 2113147',
'asdf ee A070_321 on 4.3.99 - MC',
'blah blah3',
'1914844\xa0, 3310339, 1943270, 2190351, 1215262',
'789702/ 89057',
'1 of 5 blah blah',
'688327/ 6712563/> 5425153',
'1820196/1964143/ 249805/ 300510',
'731862\n\nAccepted: 176666\nRejected: 8787'
]
for el in data:
    print(re.findall(r'(?<!\d)\d{4,10}(?!\d)', el))
Resulting in:
['13796352']
['2113146', '2113148', '2113147']
[]
[]
['1914844', '3310339', '1943270', '2190351', '1215262']
['789702', '89057']
[]
['688327', '6712563', '5425153']
['1820196', '1964143', '249805', '300510']
['731862', '176666', '8787']
(?<!\d)\d{4,10}(?!\d) means match a sequence of 4 to 10 digits that is not preceded or followed by a digit.
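To see what the two lookarounds buy you, here is a small sketch comparing the pattern with and without them. The sample string is made up for the illustration and is not taken from the question's data.

```python
import re

ID_RE = re.compile(r'(?<!\d)\d{4,10}(?!\d)')

sample = '12345678901 and 4321'   # an 11-digit run, then a 4-digit id

# With the lookarounds, the 11-digit run is rejected as a whole.
with_guards = ID_RE.findall(sample)
print(with_guards)        # ['4321']

# Without them, the first 10 digits of the long run are matched.
without_guards = re.findall(r'\d{4,10}', sample)
print(without_guards)     # ['1234567890', '4321']
```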
This is just a quick sketch, but it looks pretty close to what you want. Basically, try to match 4 or more digits, split at the matches, and exclude empty strings and entries without any matches.
>>> data = [...] # your sample
>>> import re
>>> num_re = re.compile(r'\d{4,}')
>>> [[x for x in num_re.split(d) if x] if num_re.search(d) else [] for d in data]
[[],
[', ', ', '],
[],
[],
['\xa0, ', ', ', ', ', ', '],
['/ '],
[],
['/ ', '/> '],
['/', '/ ', '/ '],
['\n\nAccepted: ', '\nRejected: ']]
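If the 4 to 10 digit limit from the question matters, the same split can be done with a bounded pattern. This is a sketch only; the function name separators is made up here, and it assumes that a run of more than 10 digits should not count as an id at all.

```python
import re

# 4 to 10 digits, not preceded or followed by another digit.
ID_RE = re.compile(r'(?<!\d)\d{4,10}(?!\d)')


def separators(text):
    """Return the non-empty text between ids, or [] if there are no ids."""
    if not ID_RE.search(text):
        return []
    return [chunk for chunk in ID_RE.split(text) if chunk]


print(separators('2113146, 2113148, 2113147'))   # [', ', ', ']
print(separators('789702/ 89057'))               # ['/ ']
print(separators('blah blah3'))                  # []
```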
I'm reading data from an external source. Each record has a "Name" and a "Value with warnings", so I put those in a dictionary like this:
d[data[i:i+6]] = data[i+8:i+17], data[i+25:i+36]
So in the end my dict looks like this:
{'GPT-P ': ('169 ', 'H '), 'GOT-P ': ('47 ', ' '), .....
As seen above, both keys and values may have unnecessary spaces.
I was able to get rid of the spaces in the keys with:
d = {x.replace(' ',''): v
for x, v in d.items()}
but I can't seem to manage something similar for the values. I tried using d.values(), but it trims the key name and also works for only one of the values.
Can you help me understand how I can remove the spaces from several values (2 values in this particular case) and end up with something like:
{'GPT-P': ('169', 'H'), 'GOT-P': ('47', ''), .....
Thanks. Stay safe and healthy
You will need to do the space replacement in your v values as well, but in your case the values in the dictionary are tuples.
I assume you want to remove the spaces from every element of each tuple, so you need a second iteration here. You can do something like this:
d = {'GPT-P ': ('169 ', 'H '), 'GOT-P ': ('47 ', ' ')}
{x.replace(' ', ''): tuple(w.replace(' ', '') for w in v) for x, v in d.items()}
Which returns:
{'GPT-P': ('169', 'H'), 'GOT-P': ('47', '')}
Notice the generator expression passed to tuple(), tuple(w.replace(' ', '') for w in v), nested inside the dictionary comprehension.
Given:
DoT={'GPT-P ': ('169 ', 'H '), 'GOT-P ': ('47 ', ' ')}
Since you have tuples of strings as your values, you need to apply .strip() to each string in the tuple:
>>> tuple(e.strip() for e in ('47 ', ' '))
('47', '')
Apply that to each key, value in a dict comprehension and there you are:
>>> {k.strip():tuple(e.strip() for e in t) for k,t in DoT.items()}
{'GPT-P': ('169', 'H'), 'GOT-P': ('47', '')}
You use .replace(' ','') in your attempt. That will replace ALL spaces:
>>> ' 1 2 3 '.replace(' ','')
'123'
It is more typical to use one of the .strips():
>>> ' 1 2 3 '.strip()
'1 2 3'
>>> ' 1 2 3 '.lstrip()
'1 2 3 '
>>> ' 1 2 3 '.rstrip()
' 1 2 3'
You can use .replace or any of the .strips() in the comprehensions that I used above.
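The difference only shows up when a key or value has a space in the middle. Here is a small sketch with one extra entry, 'Total Protein ', which is made up for the illustration and is not from the question; the function name clean is also just an assumption.

```python
def clean(mapping):
    """Strip outer whitespace from keys and from each string in the value tuples."""
    return {key.strip(): tuple(item.strip() for item in values)
            for key, values in mapping.items()}


raw = {'GPT-P ': ('169 ', 'H '),
       'GOT-P ': ('47 ', ' '),
       'Total Protein ': (' 7.1', '')}   # hypothetical entry with an inner space

cleaned = clean(raw)
print(cleaned)
# {'GPT-P': ('169', 'H'), 'GOT-P': ('47', ''), 'Total Protein': ('7.1', '')}
```

With .replace(' ', '') the last key would come out as 'TotalProtein', which may or may not be what you want.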
I have a header list as below:
h_list = ['h1','h2','h3','h4','h5']
Now I have data list (Nested):
d_list = [
[1, None, 3, ' ', 5],
[1, ' ', 2, ' ', 9]
]
The header list and every inner list always have the same length. For each index position, I want to look at the corresponding values in all of the inner lists; if they are all either None or ' ', the item at that position in h_list should be replaced with ' ' (a blank string).
My expected output is:
h_list = ['h1',' ','h3',' ','h5']
Try a list comprehension:
h_list = ['h1','h2','h3','h4','h5']
d_list = [
[1, None, 3, ' ', 5],
[1, ' ', 2, ' ', 9]
]
empty = [' ', None]
h_list = [' ' if all(b[i] in empty for b in d_list) else v for i, v in enumerate(h_list)]
print(h_list)
Output:
['h1', ' ', 'h3', ' ', 'h5']
Breaking down this part of the code:
h_list = [' ' if all(b[i] in empty for b in d_list) else v for i, v in enumerate(h_list)]
First, let's look at just
[(i, v) for i, v in enumerate(h_list)]
This is a list of the index and value of each element in h_list.
Now we need a condition that decides when to use ' '. That is the job of the all() function:
all(b[i] in empty for b in d_list)
returns True only if the value at index i of every list inside d_list is in the empty list. We want ' ' in place of each string in h_list whose index holds ' ' or None in all of the lists in d_list, so:
[' ' for i, v in enumerate(h_list) if all(b[i] in empty for b in d_list)]
Finally, we want to keep the original string when all(b[i] in empty for b in d_list) is False. For that, we use a conditional expression with else (note that with an else, the condition moves to the left of the for clause):
h_list = [' ' if all(b[i] in empty for b in d_list) else v for i, v in enumerate(h_list)]
I believe this should work for your examples:
new_list = []
for orig_element, *values in zip(h_list, *d_list):
    new_list.append(orig_element if any(not (v is None or str(v).strip() == '') for v in values) else '')
If you want to modify the list in-place simply do:
for i, (orig_element, *values) in enumerate(zip(h_list, *d_list)):
    h_list[i] = orig_element if any(not (v is None or str(v).strip() == '') for v in values) else ''
You can use zip with all:
h_list = ['h1','h2','h3','h4','h5']
d_list = [[1, None, 3, ' ', 5], [1, ' ', 2, ' ', 9]]
r = [' ' if all(k in {None, ' '} for k in j) else a for a, j in zip(h_list, zip(*d_list))]
Output:
['h1', ' ', 'h3', ' ', 'h5']
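Wrapped in a function, the zip/all version is easy to check against a case where only some of the rows are empty in a column. This is a sketch; the function name and the sample data are made up here, and it assumes every cell is hashable so it can be looked up in a set.

```python
EMPTY = {None, ' '}


def blank_empty_headers(headers, rows):
    """Return headers with ' ' wherever every row is empty in that column."""
    return [' ' if all(cell in EMPTY for cell in column) else header
            for header, column in zip(headers, zip(*rows))]


headers = ['h1', 'h2', 'h3']
rows = [[1, None, ' '],
        [2, 7,    None]]

result = blank_empty_headers(headers, rows)
print(result)   # ['h1', 'h2', ' ']
```

The second column keeps its header because one row holds a real value (7); only the third column, empty in every row, is blanked.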
Basic solution
h_list = ['h1','h2','h3','h4','h5']
d_list = [
[1, None, 3, ' ', 5],
[1, ' ', 2, ' ', 9]
]
# loop over the rows of d_list
for row in d_list:
    # loop over each row, keeping track of the position
    for idx, k in enumerate(row):
        # if the value is not an int, blank the header at that index
        if type(k) != int:
            h_list[idx] = " "
print(h_list)
code = ("1 2 3 4")
b = code.split()
print(b)
This code returns ['1','2','3','4'] when I want it to return ['1',' ','2',' ','3',' ','4']. How would I do that?
I suppose you want to split words, not characters (otherwise list(code) would decompose that for you).
Use re.split and keep the separator by wrapping the split regex in parentheses (a capture group):
import re
code = "1 2 3 4"
print(re.split("( )",code))
result:
['1', ' ', '2', ' ', '3', ' ', '4']
Make it re.split(r"(\s+)", code) if you want to match runs of more than one space, tabs, etc.
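A quick sketch of the \s+ variant on a string with mixed whitespace (the sample string is made up for the illustration):

```python
import re

code = "1  2\t3 4"   # two spaces, a tab, then a single space

# The capture group keeps each whitespace run as its own list item.
parts = re.split(r"(\s+)", code)
print(parts)   # ['1', '  ', '2', '\t', '3', ' ', '4']
```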
>>> code.replace(' ', ', ,').split(',')
['1', ' ', '2', ' ', '3', ' ', '4']
Here's a readable and easy approach that doesn't import any library:
code = ("1 2 3 4")
b = []
for element in code.split():
    b.append(element)
    b.append(" ")
del b[-1]
print(b)
>>> [x for y in code.split() for x in (y, ' ')][:-1]
['1', ' ', '2', ' ', '3', ' ', '4']
Or
>>> from itertools import cycle,chain
>>> list(chain(*zip(code.split(), cycle(' '))))[:-1]
['1', ' ', '2', ' ', '3', ' ', '4']
Someone posted this and deleted it, but this works quite simply. list() will give you a list of an iterable.
code = ("1 2 3 4")
b = list(code)
print(b)
['1', ' ', '2', ' ', '3', ' ', '4']
Simply convert it to list
code = ("1 2 3 4")
b = list(code)
print(b)
Note: if you have numbers with two or more digits, this will split them into single characters as well.
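To make that concrete, here is a small sketch with multi-digit numbers (the sample string is made up), comparing list() with the re.split approach from the other answer:

```python
import re

code = "10 2 33"

by_char = list(code)
print(by_char)    # ['1', '0', ' ', '2', ' ', '3', '3']

by_word = re.split("( )", code)
print(by_word)    # ['10', ' ', '2', ' ', '33']
```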
code = ("1 2 3 4")
list(code)
After a certain calculation I am getting output like:
(' ','donor',' ','distance')
(' ','ARG','A','43',' ','3.55')
(' ','SOD','B','93', ' ','4.775')
(' ','URX','C','33', ' ','3.55')
while I was intending to get something like:
donor distance
ARG A 43 3.55
SOD B 93 4.77
URX C 33 3.55
What I am getting is a tuple, and I am confused about how to turn it into nicely formatted output like the above.
Please give some idea...
thank you.
Use str.join() on each tuple:
' '.join(your_tuple)
before printing.
If your data looks like this
data = [
(' ', 'donor', ' ', 'distance'),
(' ', 'ARG', 'A', '43', ' ', '3.55'),
(' ', 'SOD', 'B', '93', ' ', '4.775'),
(' ', 'URX', 'C', '33', ' ', '3.55')
]
Then you can just
print('\n'.join(map(' '.join, data)))
You can use a for-loop and str.join:
lis = [
(' ','donor',' ','distance'),
(' ','ARG','A','43',' ','3.55'),
(' ','SOD','B','93', ' ','4.775'),
(' ','URX','C','33', ' ','3.55')
]
for item in lis:
    print(" ".join(item))
Output:
donor distance
ARG A 43 3.55
SOD B 93 4.775
URX C 33 3.55
It sounds like you want to use format strings. For example, assuming that you are not storing padding strings in your items:
print("{0} {1} {2} {3:>10.2f}".format(*item))
You can specify the exact format (including width and alignment) of each field of the record in the format string. In this example, the fourth field is a number right-aligned to fit into 10 characters, with 2 digits displayed to the right of the decimal point.
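A minimal Python 3 sketch of the same idea, assuming the padding entries are dropped and the numeric fields are converted before formatting. The column widths in template are arbitrary choices for the illustration, not something taken from the question.

```python
data = [(' ', 'ARG', 'A', '43', ' ', '3.55'),
        (' ', 'SOD', 'B', '93', ' ', '4.775'),
        (' ', 'URX', 'C', '33', ' ', '3.55')]

# name: left-aligned in 4, chain: left-aligned in 2,
# number: right-aligned in 3, distance: right-aligned in 10 with 3 decimals.
template = "{0:<4}{1:<2}{2:>3}{3:>10.3f}"

lines = [template.format(name, chain, int(num), float(dist))
         for _, name, chain, num, _, dist in data]
print("\n".join(lines))
```

Because every field has a fixed width, each line comes out the same length (19 characters here) and the columns line up.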
Example using your data:
>>> x = ((' ','ARG','A','43',' ','3.55'),(' ','SOD','B','93', ' ','4.775'),(' ','URX','C','33', ' ','3.55'))
>>> f = "{0:3s}{1:1s}{2:2s} {3:>10.3f}"
>>> for item in x: print(f.format(item[1], item[2], item[3], float(item[5])))
...
ARGA43      3.550
SODB93      4.775
URXC33      3.550