I'm trying to understand a solution to question 5 from pythonchallenge, but I don't understand how the for loop prints the data from the tuples.
Data contains a list of tuples, eg. data = [[(' ', 95)], [(' ', 14), ('#', 5), (' ', 70), ('#', 5), (' ', 1) ...]]
for line in data:
    print("".join([k * v for k, v in line]))
What should be printed out is an ASCII graphic made up of '#'.
This one is sneaky. It's a list of lists of tuples. The inner list is a row on the terminal, and each tuple is a character followed by the number of times that
character should be printed.
The loop iterates through the outer list and, for each tuple in a row,
builds tuple[0] repeated tuple[1] times.
It prints ' ' 95 times, then ' ' 14 times, then '#' 5 times, etc., with a newline
between each inner list.
Consider:
>>> line = [(' ', 3), ('#', 5), (' ', 3), ('#', 5)]
>>> strs = [k * v for k, v in line]
Then:
>>> strs
['   ', '#####', '   ', '#####']
Furthermore:
>>> ''.join(strs)
'   #####   #####'
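The same idea can be wrapped in a small function. This is only a minimal sketch, assuming each row is a list of (character, count) pairs as in the question; the name decode_rows and the sample data are made up for illustration and are not the real challenge input:

```python
def decode_rows(rows):
    """Expand run-length encoded rows into plain strings."""
    # k * v repeats the one-character string k exactly v times
    return ["".join(k * v for k, v in row) for row in rows]


# Hypothetical sample data, not the real challenge input
sample = [
    [(' ', 2), ('#', 3)],
    [('#', 1), (' ', 3), ('#', 1)],
]

for text in decode_rows(sample):
    print(text)
```

Calling print once per decoded row is what puts each inner list on its own line.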
I have a set of numbers that I want to align considering the comma:
10 3
200 4000,222 3 1,5
200,21 0,3 2
30000 4,5 1
mylist = [['10', '3', '', ''],
['200', '4000,222', '3', '1,5'],
['200,21', '', '0,3', '2'],
['30000', '4,5', '1', '']]
What I want is to align this list considering the comma:
expected result:
mylist = [['   10   ', '   3    ', '   ', '   '],
          ['  200   ', '4000,222', '3  ', '1,5'],
          ['  200,21', '        ', '0,3', '2  '],
          ['30000   ', '   4,5  ', '1  ', '   ']]
I tried to turn the list:
mynewlist = list(zip(*mylist))
and to find the longest part after the comma in every sublist:
for m in mynewlist:
    max([x[::-1].find(',') for x in m])
and to use rjust and ljust but I don't know how to ljust after a comma and rjust before the comma, both in the same string.
How can I resolve this without using format()?
(I want to align with ljust and rjust)
Here's another approach that currently does the trick. Unfortunately, I can't see any simple way to make this work, maybe due to the time :-)
Either way, I'll explain it. r is the result list created before hand.
r = [[] for i in range(4)]
Then we loop through the values and also grab an index with enumerate:
for ind1, vals in enumerate(zip(*mylist)):
Inside the loop we grab the max length of the decimal digits present and the max length of the word (the word w/o the decimal digits):
l = max(len(v.partition(',')[2]) for v in vals) + 1
mw = max(len(v if ',' not in v else v.split(',')[0]) for v in vals)
Now we go through the values inside the tuple vals and build our results (yup, can't currently think of a way to avoid this nesting).
for ind2, v in enumerate(vals):
If it contains a comma, it should be formatted differently. Specifically, we rjust it based on the max length of a word mw and then add the decimal digits and any white-space needed:
if ',' in v:
n, d = v.split(',')
v = "".join((n.rjust(mw),',', d, " " * (l - 1 - len(d))))
In the opposite case, we simply .rjust and then add whitespace:
else:
v = "".join((v.rjust(mw) + " " * l))
finally, we append to r.
r[ind1].append(v)
All together:
r = [[] for i in range(4)]
for ind1, vals in enumerate(zip(*mylist)):
    l = max(len(v.partition(',')[2]) for v in vals) + 1
    mw = max(len(v if ',' not in v else v.split(',')[0]) for v in vals)
    for ind2, v in enumerate(vals):
        if ',' in v:
            n, d = v.split(',')
            v = "".join((n.rjust(mw), ',', d, " " * (l - 1 - len(d))))
        else:
            v = "".join((v.rjust(mw) + " " * l))
        r[ind1].append(v)
Now, we can print it out:
>>> print(*map(list, zip(*r)), sep='\n')
['   10   ', '   3    ', '   ', '   ']
['  200   ', '4000,222', '3  ', '1,5']
['  200,21', '        ', '0,3', '2  ']
['30000   ', '   4,5  ', '1  ', '   ']
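Both the loop and the final print rely on zip(*...) to switch between rows and columns. A short sketch of that step on its own, using a cut-down version of the data (the names rows, columns and restored are just for illustration):

```python
rows = [['10', '3'],
        ['200', '4000,222'],
        ['200,21', '']]

# zip(*rows) pairs up the first items, then the second items, and so on
columns = list(zip(*rows))
print(columns)

# Transposing a second time restores the original row layout
restored = [list(t) for t in zip(*columns)]
print(restored == rows)
```

zip yields tuples, which is why the answer wraps each one in list before printing.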
Here's a slightly different solution that doesn't transpose mylist but instead iterates over it twice. On the first pass it generates a list of pairs, one for each column. Each pair holds two numbers: the first is the length of the part before the comma, and the second is the length of the comma plus everything following it. For example, '4000,222' results in (4, 4). On the second pass it formats the data based on the widths collected on the first pass.
from functools import reduce
mylist = [['10', '3', '', ''],
['200', '4000,222', '3', '1,5'],
['200,21', '', '0,3', '2'],
['30000', '4,5', '1', '']]
# Return tuple (left part length, right part length) for given string
def part_lengths(s):
    left, sep, right = s.partition(',')
    return len(left), len(sep) + len(right)

# Return string formatted based on part lengths
def format(s, first, second):
    left, sep, right = s.partition(',')
    return left.rjust(first) + sep + right.ljust(second - len(sep))

# Generator yielding part lengths row by row
parts = ((part_lengths(c) for c in r) for r in mylist)
# Combine part lengths to find maximum for each column
# For example data it looks like this: [[5, 3], [4, 4], [1, 2], [1, 2]]
sizes = reduce(lambda x, y: [[max(z) for z in zip(a, b)] for a, b in zip(x, y)], parts)
# Format result based on part lengths
res = [[format(c, *p) for c, p in zip(r, sizes)] for r in mylist]
print(*res, sep='\n')
Output:
['   10   ', '   3    ', '   ', '   ']
['  200   ', '4000,222', '3  ', '1,5']
['  200,21', '        ', '0,3', '2  ']
['30000   ', '   4,5  ', '1  ', '   ']
This works for Python 2 and 3. I didn't use ljust or rjust, though; I just added as many spaces before and after each number as it is missing compared to the widest number in the column:
mylist = [['10', '3', '', ''],
['200', '4000,222', '3', '1,5'],
['200,21', '', '0,3', '2'],
['30000', '4,5', '1', '']]
transposed = list(zip(*mylist))
sizes = [[(x.index(",") if "," in x else len(x), len(x) - x.index(",") if "," in x else 0)
for x in l] for l in transposed]
maxima = [(max([x[0] for x in l]), max([x[1] for x in l])) for l in sizes]
withspaces = [
[' ' * (maxima[i][0] - sizes[i][j][0]) + number + ' ' * (maxima[i][1] - sizes[i][j][1])
for j, number in enumerate(l)] for i, l in enumerate(transposed)]
result = list(zip(*withspaces))
Printing the result in python3:
>>> print(*result, sep='\n')
('   10   ', '   3    ', '   ', '   ')
('  200   ', '4000,222', '3  ', '1,5')
('  200,21', '        ', '0,3', '2  ')
('30000   ', '   4,5  ', '1  ', '   ')
I have a list in Python with each element being a single German word, e.g.:
my_list = [..., 'Stahl ', 'Stahl ', 'Die ', '*die ', 'Rheinhausener ', 'Rhein=Hausener ', 'Mittelstreckenraketen', 'Mittel=Strecken=Rakete', 'Mittel=strecken=Rakete', 'Mittels=trecken=Rakete',...]
In this list, compound nouns are immediately followed by their possible decompositions/splits (there can be an arbitrary number of decompositions/splits)
e.g. 'Mittelstreckenraketen' has 3 decompositions/splits:
'Mittel=Strecken=Rakete', 'Mittel=strecken=Rakete', 'Mittels=trecken=Rakete'
while 'Rheinhausener ' has only one:
'Rhein=Hausener '
The list is approximately 50,000 elements in length.
What I would like to do is extract only the compound nouns and their decompositions/splits, (discarding all other elements in the list) and read them into a dictionary with the compound noun as the key, and the decomposition/splits as values, e.g.:
my_dict = {...,'Rheinhausener ': ['Rhein=Hausener '], 'Mittelstreckenraketen': ['Mittel=Strecken=Rakete', 'Mittel=strecken=Rakete', 'Mittels=trecken=Rakete'],...}
Thereby discarding elements such as:
'Stahl ', 'Stahl ', 'Die ', '*die '
I was thinking of looping through the list and every time an element with one or more equals signs '=' appears, taking the preceding element and storing it as the key. But I'm too much of a Python newbie to figure out how to account for the arbitrary number of values for each dictionary entry; so I appreciate any help.
Here's one way to do it, using a defaultdict. The defaultdict automatically creates an empty list if we attempt to access a key that doesn't exist.
#!/usr/bin/env python
from collections import defaultdict
my_list = [
'Stahl ',
'Stahl ',
'Die ',
'*die ',
'Rheinhausener ',
'Rhein=Hausener ',
'Mittelstreckenraketen',
'Mittel=Strecken=Rakete',
'Mittel=strecken=Rakete',
'Mittels=trecken=Rakete'
]
my_dict = defaultdict(list)
key = None
for word in my_list:
    if '=' in word:
        if key is None:
            # A decomposition appeared before any candidate key; skip it
            print('Error: No key found for %r' % word)
            continue
        my_dict[key].append(word)
    else:
        key = word

for key in my_dict:
    print('%r: %r' % (key, my_dict[key]))
output
'Rheinhausener ': ['Rhein=Hausener ']
'Mittelstreckenraketen': ['Mittel=Strecken=Rakete', 'Mittel=strecken=Rakete', 'Mittels=trecken=Rakete']
Note that this code will not function correctly if the key element doesn't immediately precede the series of decompositions.
You can use a defaultdict:
from collections import defaultdict
my_list = ['Stahl ', 'Stahl ', 'Die ', '*die ', 'Rheinhausener ', 'Rhein=Hausener ', 'Mittelstreckenraketen', 'Mittel=Strecken=Rakete', 'Mittel=strecken=Rakete', 'Mittels=trecken=Rakete']
my_dict = defaultdict(list)
value = ''
for item in my_list:
    if '=' not in item:
        value = item
    else:
        my_dict[value].append(item)
print(my_dict)
which prints (on Python 3)
defaultdict(<class 'list'>, {'Rheinhausener ': ['Rhein=Hausener '], 'Mittelstreckenraketen': ['Mittel=Strecken=Rakete', 'Mittel=strecken=Rakete', 'Mittels=trecken=Rakete']})
It assumes the last item it saw without a '=' character in it, is the word we're trying to get the decomposition of.
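If you would rather end up with an ordinary dict, the same grouping can be sketched with dict.setdefault. This is just an illustration: the name collect_splits is made up, and the trailing spaces have been stripped from the sample words to keep the example short.

```python
def collect_splits(words):
    """Map each compound to the decompositions that directly follow it."""
    result = {}
    candidate = None  # most recent word without '='
    for word in words:
        if '=' in word:
            # Ignore a decomposition that has no preceding candidate word
            if candidate is not None:
                result.setdefault(candidate, []).append(word)
        else:
            candidate = word
    return result


# Sample words from the question, with trailing spaces stripped
words = ['Stahl', 'Stahl', 'Die', '*die',
         'Rheinhausener', 'Rhein=Hausener',
         'Mittelstreckenraketen', 'Mittel=Strecken=Rakete',
         'Mittel=strecken=Rakete', 'Mittels=trecken=Rakete']

for key, splits in collect_splits(words).items():
    print('%r: %r' % (key, splits))
```

Words that are never followed by a decomposition, such as 'Stahl', never become keys, because a key is only created when the first '=' item arrives.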
line = (' 1.')
print(line.split(), len(line.split()))
This gives
['1.'] 1
But if I do
for value in line.split():
    val = value
    print(val, len(val))
I get
1. 2
Inspecting val gives me
val[0]
'1'
val[1]
'.'
I'm confused as to why ".split()" is dividing from 1 index to 2 in the second example?
split() with no arguments divides a string on whitespace (a run of several spaces counts as a single separator for this method).
See: http://docs.python.org/2.7/library/stdtypes.html#str.split
You can also pass the separator you want as a parameter; for example, mystr.split(",") uses the comma as the separator to split mystr.
There is also a second parameter that tells the method the maximum number of splits to perform.
So:
mystr = "1 - 2 - 3 - 4"
print(mystr.split()) # split using spaces
print(mystr.split("-")) # split using "-"
print(mystr.split("-",2)) # split using "-" with 2 splits maximum
will produce the following output:
['1', '-', '2', '-', '3', '-', '4']
['1 ', ' 2 ', ' 3 ', ' 4']
['1 ', ' 2 ', ' 3 - 4']
When you call line.split(), it returns a list with one element: ['1.'].
When you iterate over that list (for value in line.split()), the variable value is each element of the list produced by the split (the string '1.', not the list ['1.']).
Then you call len on that string, which has 2 characters (the "1" and the ".").
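The difference is easier to see with both lengths printed side by side. A small sketch using the string from the question (the variable name parts is just for illustration):

```python
line = ' 1.'
parts = line.split()

# len of the list: how many pieces split() produced
print(parts, len(parts))

for value in parts:
    # len of one element: how many characters that piece contains
    print(repr(value), len(value))
```

So nothing is being divided a second time: the first len counts list items and the second counts characters.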
After a certain calculation I am getting output like:
(' ','donor',' ','distance')
(' ','ARG','A','43',' ','3.55')
(' ','SOD','B','93', ' ','4.775')
(' ','URX','C','33', ' ','3.55')
while i was intending to get like:
donor distance
ARG A 43 3.55
SOD B 93 4.77
URX C 33 3.55
What I am getting is a tuple, but I am confused about how to turn these tuples into the nicely formatted output I want.
Please give some idea...
thank you.
Use str.join() on each tuple:
' '.join(your_tuple)
before printing.
If your data looks like this
data = [
(' ', 'donor', ' ', 'distance'),
(' ', 'ARG', 'A', '43', ' ', '3.55'),
(' ', 'SOD', 'B', '93', ' ', '4.775'),
(' ', 'URX', 'C', '33', ' ', '3.55')
]
Then you can just
print('\n'.join(map(' '.join, data)))
You can use a for-loop and str.join:
lis = [
(' ','donor',' ','distance'),
(' ','ARG','A','43',' ','3.55'),
(' ','SOD','B','93', ' ','4.775'),
(' ','URX','C','33', ' ','3.55')
]
for item in lis:
    print(" ".join(item))
Output:
donor distance
ARG A 43 3.55
SOD B 93 4.775
URX C 33 3.55
It sounds like you want to use format strings. For example, assuming that you are not storing padding strings in your items:
print("{0} {1} {2} {3:>10.2f}".format(*item))
You can specify the exact format (including width and alignment) of each field of the record in the format string. In this example, the fourth field (which must be a number for the f format code to work) is right-aligned to fit into 10 characters, with 2 digits displayed to the right of the decimal point.
Example using your data:
>>> x = ((' ','ARG','A','43',' ','3.55'),(' ','SOD','B','93', ' ','4.775'),(' ','URX','C','33', ' ','3.55'))
>>> f = "{0:3s}{1:1s}{2:2s} {3:>10.3f}"
>>> for item in x: print(f.format(item[1], item[2], item[3], float(item[5])))
...
ARGA43      3.550
SODB93      4.775
URXC33      3.550
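To get spaces between the fields as in the desired output, the padding items can be dropped first and the widths put into the format string instead. A rough sketch, assuming every data row has exactly four non-blank fields and the last one is numeric; the helper name format_row and the chosen widths are made up, and the header row would need its own format string because 'distance' is not a number:

```python
rows = [
    (' ', 'ARG', 'A', '43', ' ', '3.55'),
    (' ', 'SOD', 'B', '93', ' ', '4.775'),
    (' ', 'URX', 'C', '33', ' ', '3.55'),
]


def format_row(row):
    # Drop the whitespace-only padding items and keep the real fields
    name, chain, number, distance = [field for field in row if field.strip()]
    # Left-align the text fields, right-align the number with 3 decimals
    return "{0:<4}{1:<2}{2:<4}{3:>8.3f}".format(name, chain, number, float(distance))


for row in rows:
    print(format_row(row))
```

Keeping the widths in one format string means the column layout can be changed in a single place.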