Python - Huffman bit compression - python

I need help with this. I wrote a Huffman compression program. For example, if I input "google", I can print the code table like this:
Character | Frequency | Huffman Code
------------------------------------
'g' | 2 | 0
'o' | 2 | 11
'l' | 1 | 101
'e' | 1 | 100
I also need to print the combined bits. If I just concatenate the codes from that table, I get:
011101100
That is wrong because, as far as I understand Huffman compression, every character of the input has to be encoded in order, so it should be:
011110101100
So my output should be:
Character | Frequency | Huffman Code
------------------------------------
'g' | 2 | 0
'o' | 2 | 11
'o' | 2 | 11
'g' | 2 | 0
'l' | 1 | 101
'e' | 1 | 100
Basically, I need to display the rows in the order of my input. So if I input "test", it should print t, e, s, t vertically with their corresponding bits, since I'm just appending the bits and displaying them. How do I achieve this? Here's the printing part:
freq = {}
for c in word:
    if c in freq:
        freq[c] += 1
    else:
        freq[c] = 1
freq = sorted(freq.items(), key=lambda x: x[1], reverse=True)
if check:
    print(" Char | Freq ")
    for key, c in freq:
        print(" %4r | %d" % (key, c))
nodes = freq
while len(nodes) > 1:
    key1, c1 = nodes[-1]
    key2, c2 = nodes[-2]
    nodes = nodes[:-2]
    node = NodeTree(key1, key2)
    nodes.append((node, c1 + c2))
    nodes = sorted(nodes, key=lambda x: x[1], reverse=True)
if check:
    print("left: %s" % nodes[0][0].nodes()[0])
    print("right: %s" % nodes[0][0].nodes()[1])
huffmanCode = huffman(nodes[0][0])
print("\n")
print(" Character | Frequency | Huffman code ")
print("---------------------------------------")
for char, frequency in freq:
    print(" %-9r | %10d | %12s" % (char, frequency, huffmanCode[char]))
P.S. I know I shouldn't be sorting them. I'll remove the sorting part, don't worry.
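One possible way to get the rows in input order is to loop over the word itself rather than over the frequency table, looking up each character's count and code as you go. This is only a minimal sketch: the `huffman_code` dict below is copied by hand from the table in the question and stands in for whatever `huffman(nodes[0][0])` returns in the real program, and `Counter` is swapped in for the manual counting loop.

```python
from collections import Counter

word = "google"
# Assumption: codes copied from the table in the question; in the real
# program this dict would come from huffman(nodes[0][0]).
huffman_code = {"g": "0", "o": "11", "l": "101", "e": "100"}
freq = Counter(word)

print(" Character | Frequency | Huffman code ")
print("---------------------------------------")
bits = []
for char in word:  # iterate over the input, not over the frequency table
    print(" %-9r | %10d | %12s" % (char, freq[char], huffman_code[char]))
    bits.append(huffman_code[char])

# One code per input character, in input order
encoded = "".join(bits)
print(encoded)
```

Because repeated characters are looked up again each time they occur, the joined string has one code per input character.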

Related

How should I solve logic error in timestamp using Python?

I have written code to calculate a, b, and c. They are initialized to 0.
This is my input file:
-------------------------------------------------------------
| Line | Time | Command | Data |
-------------------------------------------------------------
| 1 | 0015 | ACTIVE | |
| 2 | 0030 | WRITING | |
| 3 | 0100 | WRITING_A | |
| 4 | 0115 | PRECHARGE | |
| 5 | 0120 | REFRESH | |
| 6 | 0150 | ACTIVE | |
| 7 | 0200 | WRITING | |
| 8 | 0314 | PRECHARGE | |
| 9 | 0318 | ACTIVE | |
| 10 | 0345 | WRITING_A | |
| 11 | 0430 | WRITING_A | |
| 12 | 0447 | WRITING | |
| 13 | 0503 | WRITING | |
The timestamps and commands are used to calculate a, b, and c.
import re
count = {}
timestamps = {}
with open ("page_stats.txt", "r") as f:
    for line in f:
        m = re.split(r"\s*\|\s*", line)
        if len(m) > 3 and re.match(r"\d+", m[1]):
            count[m[3]] = count[m[3]] + 1 if m[3] in count else 1
            #print(m[2])
            if m[3] in timestamps:
                timestamps[m[3]].append(m[2])
                #print(m[3], m[2])
            else:
                timestamps[m[3]] = [m[2]]
                #print(m[3], m[2])
a = b = c = 0
for key in count:
    print("%-10s: %2d, %s" % (key, count[key], timestamps[key]))
if timestamps["ACTIVE"] > timestamps["PRECHARGE"]: #line causing logic error
    a = a + 1
print(a)
Before getting into the calculation, I assign the timestamps with respect to the commands. This is the output for this section.
ACTIVE : 3, ['0015', '0150', '0318']
WRITING : 4, ['0030', '0200', '0447', '0503']
WRITING_A : 3, ['0100', '0345', '0430']
PRECHARGE : 2, ['0115', '0314']
REFRESH : 1, ['0120']
To get a, the timestamp of ACTIVE must be greater than that of PRECHARGE, and the timestamp of WRITING must be greater than that of ACTIVE. (Lines 4, 6, and 7 contribute to the first a, and Lines 8, 9, and 12 contribute to the second a.)
To get b, the timestamps of WRITING must be greater than ACTIVE. For the lines that contribute to a such as Line 4, 6, 7, 8, 9, and 12, they cannot be used to calculate b. So, Line 1 and 2 contribute to b.
To get c, the rest of the unused lines containing WRITING will contribute to c.
The expected output:
a = 2
b = 1
c = 1
However, in my code, when I print a, it displays 0, which shows the logic has some error. Any suggestion to amend my code to achieve the goal? I have tried for a few days and the problem is not solved yet.
I made a function that returns the commands, in order, that match a pattern, with gaps allowed.
I also made a more compact version of your file reading.
There is probably a better way to divide the list into two parts; the difficulty was to only accept elements that belong to a complete match of the pattern. In this version I iterate over the elements twice.
import re
commands = list()
with open ("page_stats.txt", "r") as f:
    for line in f:
        m = re.split(r"\s*\|\s*", line)
        if len(m) > 3 and re.match(r"\d+", m[1]):
            _, line, time, command, data, _ = m
            commands.append((line,time,command))
def search_pattern(pattern, iterable, key=None):
    iter = 0
    count = 0
    length = len(pattern)
    results = []
    sentinel = object()
    # First pass: tag each element as a pattern hit or a miss
    for elem in iterable:
        original_elem = elem
        if key is not None:
            elem = key(elem)
        if elem == pattern[iter]:
            iter += 1
            results.append((original_elem,sentinel))
            if iter >= length:
                iter = iter % length
                count += length
        else:
            results.append((sentinel,original_elem))
    matching = []
    nonmatching = []
    # Second pass: keep only the hits that belong to a complete pattern
    for res in results:
        first,second = res
        if count > 0:
            if second is sentinel:
                matching.append(first)
                count -= 1
            elif first is sentinel:
                nonmatching.append(second)
        else:
            value = first if second is sentinel else second
            nonmatching.append(value)
    return matching, nonmatching
pattern_a = ['PRECHARGE','ACTIVE','WRITING']
pattern_b = ['ACTIVE','WRITING']
pattern_c = ['WRITING']
matching, nonmatching = search_pattern(pattern_a, commands, key=lambda t: t[2])
a = len(matching)//len(pattern_a)
matching, nonmatching = search_pattern(pattern_b, nonmatching, key=lambda t: t[2])
b = len(matching)//len(pattern_b)
matching, nonmatching = search_pattern(pattern_c, nonmatching, key=lambda t: t[2])
c = len(matching)//len(pattern_c)
print(f'{a=}')
print(f'{b=}')
print(f'{c=}')
Output:
a=2
b=1
c=1
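As a side note on why the original `if` never fired: comparing two lists with `>` compares them element by element, so only the first differing pair decides the result. The sketch below uses the timestamp lists printed in the question; the `later_active` lookup is just a hypothetical illustration of comparing individual timestamps, not part of the answer above.

```python
active = ['0015', '0150', '0318']
precharge = ['0115', '0314']

# Lists compare element by element, so '0015' vs '0115' decides the result.
whole_list = active > precharge

# Hypothetical alternative: for each PRECHARGE, find the first later ACTIVE.
later_active = [
    next((int(a) for a in active if int(a) > int(p)), None)
    for p in precharge
]
print(whole_list, later_active)
```

Here the list comparison comes out False even though each PRECHARGE is followed by a later ACTIVE, which is why a stayed at 0.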

Calculating comparison matrix using strings as input in python - HARD

I have two strings of DNA sequences and I want to compare both sequences, character by character, in order to get a matrix of comparison values. The general idea rests on three rules:
If there is the complementary AT (A in one sequence and T in the other) then 2/3.
If there is the complementary CG (C in one sequence and G in the other) then 1.
Otherwise, 0 is returned.
For example if I have two sequences ACTG then the result would be:
 |  A  |  C  |  T  |  G  |
A|  0  |  0  | 2/3 |  0  |
C|  0  |  0  |  0  |  1  |
T| 2/3 |  0  |  0  |  0  |
G|  0  |  1  |  0  |  0  |
I saw there is some help in this post:
Calculating a similarity/difference matrix from equal length strings in Python. It really works if you are using only a 4-nucleotide-long sequence.
When I tried a longer sequence, this error was printed:
ValueError: shapes (5,4) and (5,4) not aligned: 4 (dim 1) != 5 (dim 0)
I have the code in R which is
##2.1 Split the strings
seq <- "ACTG"
seq1 <- unlist(as.matrix(strsplit(seq,""),ncol=nchar(seq),
nrow=nchar(seq)))
a <- matrix(ncol=length(seq),nrow=length(seq))
a[,1] <- seq1
a[1,] <- seq1
b <- matrix(ncol=length(a[1,]),nrow=length(a[1,]))
for (i in seq(nchar(seq))){
for (j in seq(nchar(seq))){
if (a[i,1] == "A" & a[1,j] == "T" | a[i,1] == "T" & a[1,j] == "A"){
b[[i,j]] <- 2/3
} else if (a[i,1] == "C" & a[1,j] == "G" | a[i,1] == "G" & a[1,j] == "C"){
b[[i,j]] <- 1
} else
b[[i,j]] <- 0
}
But I can't get it to work in Python.
I think you're making it harder than it needs to be.
import numpy as np
seq1 = 'AACCTTGG'
seq2 = 'ACGTACGT'
# Rows follow seq1 and columns follow seq2; unmatched pairs stay 0
matrix = np.zeros((len(seq1),len(seq2)))
for y,c2 in enumerate(seq2):
    for x,c1 in enumerate(seq1):
        if c1+c2 in ('TA','AT'):
            matrix[x,y] = 2/3
        elif c1+c2 in ('CG','GC'):
            matrix[x,y] = 1.
print(matrix)
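If you would rather not depend on NumPy, the same rules can be sketched with a lookup table and a nested list comprehension. The names `PAIR_SCORE` and `comparison_matrix` are made up for this example, and the result is a plain list of lists rather than an array.

```python
# Score for each unordered complementary pair; anything else scores 0.
PAIR_SCORE = {
    frozenset("AT"): 2 / 3,
    frozenset("CG"): 1.0,
}

def comparison_matrix(seq1, seq2):
    """Rows follow seq1, columns follow seq2; the lengths may differ."""
    return [
        [PAIR_SCORE.get(frozenset((c1, c2)), 0.0) for c2 in seq2]
        for c1 in seq1
    ]

for row in comparison_matrix("ACTG", "ACTG"):
    print(row)
```

Using `frozenset` as the key means A/T and T/A hit the same entry, and since nothing is multiplied, sequences of different lengths should not run into the shape error from the question.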

How to print a number so it takes up exactly same space regardless of its number of digits? [duplicate]

I'm looking for a way to pretty-print tables like this:
=======================
| column 1 | column 2 |
=======================
| value1 | value2 |
| value3 | value4 |
=======================
I've found the asciitable library but it doesn't do the borders, etc. I don't need any complex formatting of data items, they're just strings. I do need it to auto-size columns.
Do other libraries or methods exist, or do I need to spend a few minutes writing my own?
I read this question a long time ago, and ended up writing my own pretty-printer for tables: tabulate.
My use case is:
I want a one-liner most of the time
which is smart enough to figure the best formatting for me
and can output different plain-text formats
Given your example, grid is probably the most similar output format:
from tabulate import tabulate
print(tabulate([["value1", "value2"], ["value3", "value4"]], ["column 1", "column 2"], tablefmt="grid"))
+------------+------------+
| column 1   | column 2   |
+============+============+
| value1     | value2     |
+------------+------------+
| value3     | value4     |
+------------+------------+
Other supported formats are plain (no lines), simple (Pandoc simple tables), pipe (like tables in PHP Markdown Extra), orgtbl (like tables in Emacs' org-mode), rst (like simple tables in reStructuredText). grid and orgtbl are easily editable in Emacs.
Performance-wise, tabulate is slightly slower than asciitable, but much faster than PrettyTable and texttable.
P.S. I'm also a big fan of aligning numbers by a decimal column. So this is the default alignment for numbers if there are any (overridable).
Here's a quick and dirty little function I wrote for displaying the results from SQL queries I can only make over a SOAP API. It expects an input of a sequence of one or more namedtuples as table rows. If there's only one record, it prints it out differently.
It is handy for me and could be a starting point for you:
def pprinttable(rows):
    if len(rows) > 1:
        headers = rows[0]._fields
        # Column width: the longest cell or header, measured as text
        lens = []
        for i in range(len(rows[0])):
            lens.append(max(len(str(x)) for x in [row[i] for row in rows] + [headers[i]]))
        formats = []
        hformats = []
        for i in range(len(rows[0])):
            # Right-align integers, left-align everything else
            if isinstance(rows[0][i], int):
                formats.append("%%%dd" % lens[i])
            else:
                formats.append("%%-%ds" % lens[i])
            hformats.append("%%-%ds" % lens[i])
        pattern = " | ".join(formats)
        hpattern = " | ".join(hformats)
        separator = "-+-".join(['-' * n for n in lens])
        print(hpattern % tuple(headers))
        print(separator)
        for line in rows:
            print(pattern % tuple(line))
    elif len(rows) == 1:
        # A single record is printed as "field = value" lines
        row = rows[0]
        hwidth = max(len(x) for x in row._fields)
        for i in range(len(row)):
            print("%*s = %s" % (hwidth, row._fields[i], row[i]))
Sample output:
pkid                                 | fkn                                  | npi
-------------------------------------+--------------------------------------+----
405fd665-0a2f-4f69-7320-be01201752ec | 8c9949b9-552e-e448-64e2-74292834c73e |   0
5b517507-2a42-ad2e-98dc-8c9ac6152afa | f972bee7-f5a4-8532-c4e5-2e82897b10f6 |   0
2f960dfc-b67a-26be-d1b3-9b105535e0a8 | ec3e1058-8840-c9f2-3b25-2488f8b3a8af |   1
c71b28a3-5299-7f4d-f27a-7ad8aeadafe0 | 72d25703-4735-310b-2e06-ff76af1e45ed |   0
3b0a5021-a52b-9ba0-1439-d5aafcf348e7 | d81bb78a-d984-e957-034d-87434acb4e97 |   1
96c36bb7-c4f4-2787-ada8-4aadc17d1123 | c171fe85-33e2-6481-0791-2922267e8777 |   1
95d0f85f-71da-bb9a-2d80-fe27f7c02fe2 | 226f964c-028d-d6de-bf6c-688d2908c5ae |   1
132aa774-42e5-3d3f-498b-50b44a89d401 | 44e31f89-d089-8afc-f4b1-ada051c01474 |   1
ff91641a-5802-be02-bece-79bca993fdbc | 33d8294a-053d-6ab4-94d4-890b47fcf70d |   1
f3196e15-5b61-e92d-e717-f00ed93fe8ae | 62fa4566-5ca2-4a36-f872-4d00f7abadcf |   1
Example
>>> from collections import namedtuple
>>> Row = namedtuple('Row',['first','second','third'])
>>> data = Row(1,2,3)
>>> data
Row(first=1, second=2, third=3)
>>> pprinttable([data])
 first = 1
second = 2
 third = 3
>>> pprinttable([data,data])
first | second | third
------+--------+------
    1 |      2 |     3
    1 |      2 |     3
For some reason when I included 'docutils' in my google searches I stumbled across texttable, which seems to be what I'm looking for.
I too wrote my own solution to this. I tried to keep it simple.
https://github.com/Robpol86/terminaltables
from terminaltables import AsciiTable
table_data = [
['Heading1', 'Heading2'],
['row1 column1', 'row1 column2'],
['row2 column1', 'row2 column2']
]
table = AsciiTable(table_data)
print(table.table)
+--------------+--------------+
| Heading1     | Heading2     |
+--------------+--------------+
| row1 column1 | row1 column2 |
| row2 column1 | row2 column2 |
+--------------+--------------+
table.inner_heading_row_border = False
print(table.table)
+--------------+--------------+
| Heading1     | Heading2     |
| row1 column1 | row1 column2 |
| row2 column1 | row2 column2 |
+--------------+--------------+
table.inner_row_border = True
table.justify_columns[1] = 'right'
table.table_data[1][1] += '\nnewline'
print(table.table)
+--------------+--------------+
| Heading1     |     Heading2 |
+--------------+--------------+
| row1 column1 | row1 column2 |
|              |      newline |
+--------------+--------------+
| row2 column1 | row2 column2 |
+--------------+--------------+
I just released termtables for this purpose. For example, this
import termtables as tt
tt.print(
[[1, 2, 3], [613.23236243236, 613.23236243236, 613.23236243236]],
header=["a", "bb", "ccc"],
style=tt.styles.ascii_thin_double,
padding=(0, 1),
alignment="lcr"
)
gets you
+-----------------+-----------------+-----------------+
| a | bb | ccc |
+=================+=================+=================+
| 1 | 2 | 3 |
+-----------------+-----------------+-----------------+
| 613.23236243236 | 613.23236243236 | 613.23236243236 |
+-----------------+-----------------+-----------------+
By default, the table is rendered with Unicode box-drawing characters,
┌─────────────────┬─────────────────┬─────────────────┐
│ a │ bb │ ccc │
╞═════════════════╪═════════════════╪═════════════════╡
│ 1 │ 2 │ 3 │
├─────────────────┼─────────────────┼─────────────────┤
│ 613.23236243236 │ 613.23236243236 │ 613.23236243236 │
└─────────────────┴─────────────────┴─────────────────┘
termtables are very configurable; check out the tests for more examples.
If you want a table with column and row spans, then try my library dashtable
from dashtable import data2rst
table = [
["Header 1", "Header 2", "Header3", "Header 4"],
["row 1", "column 2", "column 3", "column 4"],
["row 2", "Cells span columns.", "", ""],
["row 3", "Cells\nspan rows.", "- Cells\n- contain\n- blocks", ""],
["row 4", "", "", ""]
]
# [Row, Column] pairs of merged cells
span0 = ([2, 1], [2, 2], [2, 3])
span1 = ([3, 1], [4, 1])
span2 = ([3, 3], [3, 2], [4, 2], [4, 3])
my_spans = [span0, span1, span2]
print(data2rst(table, spans=my_spans, use_headers=True))
Which outputs:
+----------+------------+----------+----------+
| Header 1 | Header 2 | Header3 | Header 4 |
+==========+============+==========+==========+
| row 1 | column 2 | column 3 | column 4 |
+----------+------------+----------+----------+
| row 2 | Cells span columns. |
+----------+----------------------------------+
| row 3 | Cells | - Cells |
+----------+ span rows. | - contain |
| row 4 | | - blocks |
+----------+------------+---------------------+
You can try BeautifulTable. It does what you want to do. Here's an example from its documentation:
>>> from beautifultable import BeautifulTable
>>> table = BeautifulTable()
>>> table.columns.header = ["name", "rank", "gender"]
>>> table.rows.append(["Jacob", 1, "boy"])
>>> table.rows.append(["Isabella", 1, "girl"])
>>> table.rows.append(["Ethan", 2, "boy"])
>>> table.rows.append(["Sophia", 2, "girl"])
>>> table.rows.append(["Michael", 3, "boy"])
>>> print(table)
+----------+------+--------+
| name | rank | gender |
+----------+------+--------+
| Jacob | 1 | boy |
+----------+------+--------+
| Isabella | 1 | girl |
+----------+------+--------+
| Ethan | 2 | boy |
+----------+------+--------+
| Sophia | 2 | girl |
+----------+------+--------+
| Michael | 3 | boy |
+----------+------+--------+
Version using w3m designed to handle the types MattH's version accepts:
import subprocess
import tempfile
import html
def pprinttable(rows):
    esc = lambda x: html.escape(str(x))
    # Build an HTML table and let w3m render it as text
    sour = "<table border=1>"
    if len(rows) == 1:
        for i in range(len(rows[0]._fields)):
            sour += "<tr><th>%s<td>%s" % (esc(rows[0]._fields[i]), esc(rows[0][i]))
    else:
        sour += "<tr>" + "".join(["<th>%s" % esc(x) for x in rows[0]._fields])
        sour += "".join(["<tr>%s" % "".join(["<td>%s" % esc(y) for y in x]) for x in rows])
    with tempfile.NamedTemporaryFile(suffix=".html") as f:
        f.write(sour.encode("utf-8"))
        f.flush()
        print(
            subprocess
            .Popen(["w3m","-dump",f.name], stdout=subprocess.PIPE)
            .communicate()[0].decode("utf-8").strip()
        )
from collections import namedtuple
Row = namedtuple('Row',['first','second','third'])
data1 = Row(1,2,3)
data2 = Row(4,5,6)
pprinttable([data1])
pprinttable([data1,data2])
results in:
┌───────┬─┐
│ first │1│
├───────┼─┤
│second │2│
├───────┼─┤
│ third │3│
└───────┴─┘
┌─────┬───────┬─────┐
│first│second │third│
├─────┼───────┼─────┤
│1 │2 │3 │
├─────┼───────┼─────┤
│4 │5 │6 │
└─────┴───────┴─────┘
I know the question is a bit old, but here's my attempt at this:
https://gist.github.com/lonetwin/4721748
It is a bit more readable IMHO (although it doesn't differentiate between single / multiple rows like @MattH's solution does, nor does it use NamedTuples).
I use this small utility function.
def get_pretty_table(iterable, header):
    # Widest entry per column, starting from the header widths
    max_len = [len(x) for x in header]
    for row in iterable:
        row = [row] if type(row) not in (list, tuple) else row
        for index, col in enumerate(row):
            if max_len[index] < len(str(col)):
                max_len[index] = len(str(col))
    output = '-' * (sum(max_len) + 1) + '\n'
    output += '|' + ''.join([h + ' ' * (l - len(h)) + '|' for h, l in zip(header, max_len)]) + '\n'
    output += '-' * (sum(max_len) + 1) + '\n'
    for row in iterable:
        row = [row] if type(row) not in (list, tuple) else row
        output += '|' + ''.join([str(c) + ' ' * (l - len(str(c))) + '|' for c, l in zip(row, max_len)]) + '\n'
    output += '-' * (sum(max_len) + 1) + '\n'
    return output
print(get_pretty_table([[1, 2], [3, 4]], ['header 1', 'header 2']))
output
-----------------
|header 1|header 2|
-----------------
|1       |2       |
|3       |4       |
-----------------
from sys import stderr, stdout
def create_table(table: dict, full_row: bool = False) -> None:
    min_len = len(min((v for v in table.values()), key=lambda q: len(q)))
    max_len = len(max((v for v in table.values()), key=lambda q: len(q)))
    if min_len < max_len:
        stderr.write("Table is out of shape, please make sure all columns have the same length.")
        stderr.flush()
        return
    additional_spacing = 1
    heading_separator = '| '
    horizontal_split = '| '
    rc_separator = ''
    key_list = list(table.keys())
    rc_len_values = []
    # Write the heading row and build the separator line
    for key in key_list:
        rc_len = len(max((v for v in table[key]), key=lambda q: len(str(q))))
        rc_len_values += ([rc_len, [key]] for n in range(len(table[key])))
        heading_line = (key + (" " * (rc_len + (additional_spacing + 1)))) + heading_separator
        stdout.write(heading_line)
        rc_separator += ("-" * (len(key) + (rc_len + (additional_spacing + 1)))) + '+-'
        if key is key_list[-1]:
            stdout.flush()
            stdout.write('\n' + rc_separator + '\n')
    value_list = [v for vl in table.values() for v in vl]
    aligned_data_offset = max_len
    row_count = len(key_list)
    next_idx = 0
    newline_indicator = 0
    iterations = 0
    # Values are stored column by column, so jump by the column length
    # to emit them row by row
    for n in range(len(value_list)):
        key = rc_len_values[next_idx][1][0]
        rc_len = rc_len_values[next_idx][0]
        line = ('{:{}} ' + " " * len(key)).format(value_list[next_idx], str(rc_len + additional_spacing)) + horizontal_split
        if next_idx >= (len(value_list) - aligned_data_offset):
            next_idx = iterations + 1
            iterations += 1
        else:
            next_idx += aligned_data_offset
        if newline_indicator >= row_count:
            if full_row:
                stdout.flush()
                stdout.write('\n' + rc_separator + '\n')
            else:
                stdout.flush()
                stdout.write('\n')
            newline_indicator = 0
        stdout.write(line)
        newline_indicator += 1
    stdout.write('\n' + rc_separator + '\n')
    stdout.flush()
Example:
table = {
"uid": ["0", "1", "2", "3"],
"name": ["Jon", "Doe", "Lemma", "Hemma"]
}
create_table(table)
Output:
uid   | name       |
------+------------+-
0     | Jon        |
1     | Doe        |
2     | Lemma      |
3     | Hemma      |
------+------------+-
Here's my solution:
def make_table(columns, data):
    """Create an ASCII table and return it as a string.

    Pass a list of strings to use as columns in the table and a list of
    dicts. The strings in 'columns' will be used as the keys to the dicts in
    'data.'

    Not all column values have to be present in each data dict.

    >>> print(make_table(["a", "b"], [{"a": "1", "b": "test"}]))
    | a | b    |
    |----------|
    | 1 | test |
    """
    # Calculate how wide each cell needs to be
    cell_widths = {}
    for c in columns:
        values = [str(d.get(c, "")) for d in data]
        # Longest value or header by length (not by alphabetical order)
        cell_widths[c] = max(len(v) for v in values + [c])
    # Used for formatting rows of data
    row_template = "|" + " {} |" * len(columns)
    # CONSTRUCT THE TABLE
    # The top row with the column titles
    justified_column_heads = [c.ljust(cell_widths[c]) for c in columns]
    header = row_template.format(*justified_column_heads)
    # The second row contains separators
    sep = "|" + "-" * (len(header) - 2) + "|"
    # Rows of data
    rows = []
    for d in data:
        fields = [str(d.get(c, "")).ljust(cell_widths[c]) for c in columns]
        row = row_template.format(*fields)
        rows.append(row)
    return "\n".join([header, sep] + rows)
This can be done with only builtin modules fairly compactly using list and string comprehensions. Accepts a list of dictionaries all of the same format...
def tableit(dictlist):
    # Widest value or key per column; assumes every value supports len()
    lengths = [ max([len(x.get(k)) for x in dictlist] + [len(k)]) for k in dictlist[0].keys() ]
    lenstr = " | ".join("{:<%s}" % m for m in lengths)
    lenstr += "\n"
    outmsg = lenstr.format(*dictlist[0].keys())
    outmsg += "-" * (sum(lengths) + 3*len(lengths))
    outmsg += "\n"
    outmsg += "".join(
        lenstr.format(*v) for v in [ item.values() for item in dictlist ]
    )
    return outmsg
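For the exact style shown in the question (a row of `=` above and below the header and at the bottom), a few lines of plain Python may already be enough. This is only a rough sketch with a made-up function name, `simple_table`; it assumes every row has the same number of cells and converts everything with `str()`.

```python
def simple_table(header, rows):
    """Return a bordered table string; every cell is converted with str()."""
    table = [list(map(str, header))] + [list(map(str, r)) for r in rows]
    # Auto-size: each column is as wide as its longest cell
    widths = [max(len(cell) for cell in col) for col in zip(*table)]

    def fmt(cells):
        padded = (c.ljust(w) for c, w in zip(cells, widths))
        return "| " + " | ".join(padded) + " |"

    border = "=" * len(fmt(table[0]))
    body = [fmt(r) for r in table[1:]]
    return "\n".join([border, fmt(table[0]), border] + body + [border])

print(simple_table(["column 1", "column 2"],
                   [["value1", "value2"], ["value3", "value4"]]))
```

The libraries mentioned in the other answers handle alignment, multi-line cells and wide characters, which this sketch does not attempt.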

Does anyone know how to return a grid into the shell based on a txt file?

I am trying to make a small interior design app: I write a txt file and my program prints the corresponding grid in the shell.
I just need to know how to make a grid whose height and width are both 20.
This is the code I have so far. I only know how to make the width, not the height. I also don't know how to get the numbers and letters from my txt file, but I have read the file into a list, line by line.
f = open('Apt_3_4554_Hastings_Coq.txt','r')
bigListA = [ line.strip().split(',') for line in f ]
offset = " "
width = 20
string1 = offset
for number in range(width):
    if len(str(number)) == 1:
        string1 += " " + str(number) + " "
    else:
        string1 += str(number) + " "
print (string1)
A bit of an overkill, but it was fun to make a class around it:
def decimal_string(number, before=True):
    """
    Convert a number between 0 and 99 to a space padded string.

    Parameters
    ----------
    number: int
        The number to convert to string.
    before: bool
        Whether to place the spaces before or after the number.

    Examples
    --------
    >>> decimal_string(1)
    ' 1'
    >>> decimal_string(1, False)
    '1 '
    >>> decimal_string(10)
    '10'
    >>> decimal_string(10, False)
    '10'
    """
    number = int(number)%100
    if number < 10:
        if before:
            return ' ' + str(number)
        else:
            return str(number) + ' '
    else:
        return str(number)
class Grid(object):
    def __init__(self, doc=None, shape=(10,10)):
        """
        Create new grid object from a given file or with a given shape.

        Parameters
        ----------
        doc: file, None
            The name of the file from where to read the data.
        shape: (int, int), (10, 10)
            The shape to use if no `doc` is provided.
        """
        if doc is not None:
            self.readfile(doc)
        else:
            self.empty_grid(shape)
    def __repr__(self):
        """
        Representation method.
        """
        # first lines
        #     0  1  2  3  4  5  6  7 ...
        #     -  -  -  -  -  -  -  - ...
        # three leading spaces line up with the 'NN|' row labels
        number_line = '   '
        traces_line = '   '
        for i in range(len(self.grid)):
            number_line += decimal_string(i) + ' '
            traces_line += ' - '
        lines = ''
        for j in range(len(self.grid[0])):
            line = decimal_string(j, False) + '|'
            for i in range(len(self.grid)):
                line += ' ' + self.grid[i][j] + ' '
            lines += line + '|\n'
        return '\n'.join((number_line, traces_line, lines[:-1], traces_line))
    def readfile(self, doc):
        """
        Read instructions from a file, overwriting current grid.
        """
        with open(doc, 'r') as open_doc:
            lines = open_doc.readlines()
        shape = lines[0].split(' ')[-2:]
        # grabs the first line (lines[0]),
        # splits the line into pieces by the ' ' symbol
        # grabs the last two of them ([-2:])
        shape = (int(shape[0]), int(shape[1]))
        # and turns them into ints
        self.empty_grid(shape=shape)
        for instruction in lines[1:]:
            self.add_pieces(*self._parse(instruction))
    def empty_grid(self, shape=None):
        """
        Empty grid, changing the shape to the new one, if provided.
        """
        if shape is None:
            # retain current shape
            shape = (len(self.grid), len(self.grid[0]))
        self.grid = [[' ' for i in range(shape[0])]
                     for j in range(shape[1])]
    def _parse(self, instruction):
        """
        Parse string instructions in the shape:
            "C 5 6 13 13"
        where the first element is the character,
        the second and third elements are the vertical indexes
        and the fourth and fifth are the horizontal indexes
        """
        pieces = instruction.split(' ')
        char = pieces[0]
        y_start = int(pieces[1])
        y_stop = int(pieces[2])
        x_start = int(pieces[3])
        x_stop = int(pieces[4])
        return char, y_start, y_stop, x_start, x_stop
    def add_pieces(self, char, y_start, y_stop, x_start, x_stop):
        """
        Add a piece to the current grid.

        Parameters
        ----------
        char: str
            The char to place in the grid.
        y_start: int
            Vertical start index.
        y_stop: int
            Vertical stop index.
        x_start: int
            Horizontal start index.
        x_stop: int
            Horizontal stop index.

        Examples
        --------
        >>> b = Grid(shape=(4, 4))
        >>> b
            0  1  2  3
            -  -  -  -
        0 |            |
        1 |            |
        2 |            |
        3 |            |
            -  -  -  -
        >>> b.add_pieces('a', 0, 1, 0, 0)
        >>> b
            0  1  2  3
            -  -  -  -
        0 | a          |
        1 | a          |
        2 |            |
        3 |            |
            -  -  -  -
        >>> b.add_pieces('b', 3, 3, 2, 3)
        >>> b
            0  1  2  3
            -  -  -  -
        0 | a          |
        1 | a          |
        2 |            |
        3 |       b  b |
            -  -  -  -
        """
        assert y_start <= y_stop < len(self.grid[0]),\
            "Vertical index out of bounds."
        assert x_start <= x_stop < len(self.grid),\
            "Horizontal index out of bounds."
        for i in range(x_start, x_stop+1):
            for j in range(y_start, y_stop+1):
                self.grid[i][j] = char
You can then have file.txt with:
20 20
C 5 6 13 13
C 8 9 13 13
C 5 6 18 18
C 8 9 18 18
C 3 3 15 16
C 11 11 15 16
E 2 3 3 6
S 17 18 2 7
t 14 15 3 6
T 4 10 14 17
and make:
>>> a = Grid('file.txt')
>>> a
    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
    -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
0 |                                                            |
1 |                                                            |
2 |          E  E  E  E                                        |
3 |          E  E  E  E                          C  C          |
4 |                                           T  T  T  T       |
5 |                                        C  T  T  T  T  C    |
6 |                                        C  T  T  T  T  C    |
7 |                                           T  T  T  T       |
8 |                                        C  T  T  T  T  C    |
9 |                                        C  T  T  T  T  C    |
10|                                           T  T  T  T       |
11|                                              C  C          |
12|                                                            |
13|                                                            |
14|          t  t  t  t                                        |
15|          t  t  t  t                                        |
16|                                                            |
17|       S  S  S  S  S  S                                     |
18|       S  S  S  S  S  S                                     |
19|                                                            |
    -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
You can represent the room as a 20x20 grid. One idea is a list of lists; personally I prefer a dict.
Read through the file and assign each point. (I assume you've handled parsing the file, since that wasn't the issue you posted.) For instance:
room[5, 13] = 'C'
Then you can iterate over the coordinates to provide your output.
for i in range(N_ROWS):
    for j in range(N_COLS):
        # Print the character if it exists, or a blank space.
        # (dict.get takes the default positionally, not as a keyword.)
        print(room.get((i, j), ' '), end='')
    print() # Start a new line.
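Putting the dict idea together end to end might look like the sketch below. The two instruction strings are hypothetical and borrow the "char y_start y_stop x_start x_stop" layout from the sample file in the previous answer; your own file is comma-separated, so the parsing would need adjusting.

```python
N_ROWS = N_COLS = 20

# Hypothetical instructions: "char y_start y_stop x_start x_stop"
instructions = ["C 5 6 13 13", "E 2 3 3 6"]

room = {}
for instruction in instructions:
    char, y0, y1, x0, x1 = instruction.split()
    # Fill every cell of the rectangle, both ends inclusive
    for i in range(int(y0), int(y1) + 1):
        for j in range(int(x0), int(x1) + 1):
            room[i, j] = char

# Cells missing from the dict are rendered as blanks
lines = [
    "".join(room.get((i, j), " ") for j in range(N_COLS))
    for i in range(N_ROWS)
]
print("\n".join(lines))
```

Only occupied cells are stored, so an empty room is just an empty dict and the height falls out of the outer loop over `N_ROWS`.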

How can I pretty-print ASCII tables with Python? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
The community reviewed whether to reopen this question 4 months ago and left it closed:
Original close reason(s) were not resolved
Improve this question
I'm looking for a way to pretty-print tables like this:
=======================
| column 1 | column 2 |
=======================
| value1 | value2 |
| value3 | value4 |
=======================
I've found the asciitable library but it doesn't do the borders, etc. I don't need any complex formatting of data items, they're just strings. I do need it to auto-size columns.
Do other libraries or methods exist, or do I need to spend a few minutes writing my own?
I've read this question long time ago, and finished writing my own pretty-printer for tables: tabulate.
My use case is:
I want a one-liner most of the time
which is smart enough to figure the best formatting for me
and can output different plain-text formats
Given your example, grid is probably the most similar output format:
from tabulate import tabulate
print tabulate([["value1", "value2"], ["value3", "value4"]], ["column 1", "column 2"], tablefmt="grid")
+------------+------------+
| column 1 | column 2 |
+============+============+
| value1 | value2 |
+------------+------------+
| value3 | value4 |
+------------+------------+
Other supported formats are plain (no lines), simple (Pandoc simple tables), pipe (like tables in PHP Markdown Extra), orgtbl (like tables in Emacs' org-mode), rst (like simple tables in reStructuredText). grid and orgtbl are easily editable in Emacs.
Performance-wise, tabulate is slightly slower than asciitable, but much faster than PrettyTable and texttable.
P.S. I'm also a big fan of aligning numbers by a decimal column. So this is the default alignment for numbers if there are any (overridable).
Here's a quick and dirty little function I wrote for displaying the results from SQL queries I can only make over a SOAP API. It expects an input of a sequence of one or more namedtuples as table rows. If there's only one record, it prints it out differently.
It is handy for me and could be a starting point for you:
def pprinttable(rows):
if len(rows) > 1:
headers = rows[0]._fields
lens = []
for i in range(len(rows[0])):
lens.append(len(max([x[i] for x in rows] + [headers[i]],key=lambda x:len(str(x)))))
formats = []
hformats = []
for i in range(len(rows[0])):
if isinstance(rows[0][i], int):
formats.append("%%%dd" % lens[i])
else:
formats.append("%%-%ds" % lens[i])
hformats.append("%%-%ds" % lens[i])
pattern = " | ".join(formats)
hpattern = " | ".join(hformats)
separator = "-+-".join(['-' * n for n in lens])
print hpattern % tuple(headers)
print separator
_u = lambda t: t.decode('UTF-8', 'replace') if isinstance(t, str) else t
for line in rows:
print pattern % tuple(_u(t) for t in line)
elif len(rows) == 1:
row = rows[0]
hwidth = len(max(row._fields,key=lambda x: len(x)))
for i in range(len(row)):
print "%*s = %s" % (hwidth,row._fields[i],row[i])
Sample output:
pkid | fkn | npi
-------------------------------------+--------------------------------------+----
405fd665-0a2f-4f69-7320-be01201752ec | 8c9949b9-552e-e448-64e2-74292834c73e | 0
5b517507-2a42-ad2e-98dc-8c9ac6152afa | f972bee7-f5a4-8532-c4e5-2e82897b10f6 | 0
2f960dfc-b67a-26be-d1b3-9b105535e0a8 | ec3e1058-8840-c9f2-3b25-2488f8b3a8af | 1
c71b28a3-5299-7f4d-f27a-7ad8aeadafe0 | 72d25703-4735-310b-2e06-ff76af1e45ed | 0
3b0a5021-a52b-9ba0-1439-d5aafcf348e7 | d81bb78a-d984-e957-034d-87434acb4e97 | 1
96c36bb7-c4f4-2787-ada8-4aadc17d1123 | c171fe85-33e2-6481-0791-2922267e8777 | 1
95d0f85f-71da-bb9a-2d80-fe27f7c02fe2 | 226f964c-028d-d6de-bf6c-688d2908c5ae | 1
132aa774-42e5-3d3f-498b-50b44a89d401 | 44e31f89-d089-8afc-f4b1-ada051c01474 | 1
ff91641a-5802-be02-bece-79bca993fdbc | 33d8294a-053d-6ab4-94d4-890b47fcf70d | 1
f3196e15-5b61-e92d-e717-f00ed93fe8ae | 62fa4566-5ca2-4a36-f872-4d00f7abadcf | 1
Example
>>> from collections import namedtuple
>>> Row = namedtuple('Row',['first','second','third'])
>>> data = Row(1,2,3)
>>> data
Row(first=1, second=2, third=3)
>>> pprinttable([data])
first = 1
second = 2
third = 3
>>> pprinttable([data,data])
first | second | third
------+--------+------
1 | 2 | 3
1 | 2 | 3
For some reason when I included 'docutils' in my google searches I stumbled across texttable, which seems to be what I'm looking for.
I too wrote my own solution to this. I tried to keep it simple.
https://github.com/Robpol86/terminaltables
from terminaltables import AsciiTable
table_data = [
['Heading1', 'Heading2'],
['row1 column1', 'row1 column2'],
['row2 column1', 'row2 column2']
]
table = AsciiTable(table_data)
print table.table
+--------------+--------------+
| Heading1     | Heading2     |
+--------------+--------------+
| row1 column1 | row1 column2 |
| row2 column1 | row2 column2 |
+--------------+--------------+
table.inner_heading_row_border = False
print table.table
+--------------+--------------+
| Heading1     | Heading2     |
| row1 column1 | row1 column2 |
| row2 column1 | row2 column2 |
+--------------+--------------+
table.inner_row_border = True
table.justify_columns[1] = 'right'
table.table_data[1][1] += '\nnewline'
print table.table
+--------------+--------------+
| Heading1     |     Heading2 |
+--------------+--------------+
| row1 column1 | row1 column2 |
|              |      newline |
+--------------+--------------+
| row2 column1 | row2 column2 |
+--------------+--------------+
I just released termtables for this purpose. For example, this
import termtables as tt
tt.print(
[[1, 2, 3], [613.23236243236, 613.23236243236, 613.23236243236]],
header=["a", "bb", "ccc"],
style=tt.styles.ascii_thin_double,
padding=(0, 1),
alignment="lcr"
)
gets you
+-----------------+-----------------+-----------------+
| a | bb | ccc |
+=================+=================+=================+
| 1 | 2 | 3 |
+-----------------+-----------------+-----------------+
| 613.23236243236 | 613.23236243236 | 613.23236243236 |
+-----------------+-----------------+-----------------+
By default, the table is rendered with Unicode box-drawing characters,
┌─────────────────┬─────────────────┬─────────────────┐
│ a │ bb │ ccc │
╞═════════════════╪═════════════════╪═════════════════╡
│ 1 │ 2 │ 3 │
├─────────────────┼─────────────────┼─────────────────┤
│ 613.23236243236 │ 613.23236243236 │ 613.23236243236 │
└─────────────────┴─────────────────┴─────────────────┘
termtables is very configurable; check out the tests for more examples.
If you want a table with column and row spans, then try my library dashtable
from dashtable import data2rst
table = [
["Header 1", "Header 2", "Header3", "Header 4"],
["row 1", "column 2", "column 3", "column 4"],
["row 2", "Cells span columns.", "", ""],
["row 3", "Cells\nspan rows.", "- Cells\n- contain\n- blocks", ""],
["row 4", "", "", ""]
]
# [Row, Column] pairs of merged cells
span0 = ([2, 1], [2, 2], [2, 3])
span1 = ([3, 1], [4, 1])
span2 = ([3, 3], [3, 2], [4, 2], [4, 3])
my_spans = [span0, span1, span2]
print(data2rst(table, spans=my_spans, use_headers=True))
Which outputs:
+----------+------------+----------+----------+
| Header 1 | Header 2   | Header3  | Header 4 |
+==========+============+==========+==========+
| row 1    | column 2   | column 3 | column 4 |
+----------+------------+----------+----------+
| row 2    | Cells span columns.              |
+----------+----------------------------------+
| row 3    | Cells      | - Cells             |
+----------+ span rows. | - contain           |
| row 4    |            | - blocks            |
+----------+------------+---------------------+
You can try BeautifulTable. It does what you want to do. Here's an example from its documentation:
>>> from beautifultable import BeautifulTable
>>> table = BeautifulTable()
>>> table.columns.header = ["name", "rank", "gender"]
>>> table.rows.append(["Jacob", 1, "boy"])
>>> table.rows.append(["Isabella", 1, "girl"])
>>> table.rows.append(["Ethan", 2, "boy"])
>>> table.rows.append(["Sophia", 2, "girl"])
>>> table.rows.append(["Michael", 3, "boy"])
>>> print(table)
+----------+------+--------+
| name | rank | gender |
+----------+------+--------+
| Jacob | 1 | boy |
+----------+------+--------+
| Isabella | 1 | girl |
+----------+------+--------+
| Ethan | 2 | boy |
+----------+------+--------+
| Sophia | 2 | girl |
+----------+------+--------+
| Michael | 3 | boy |
+----------+------+--------+
Version using w3m designed to handle the types MattH's version accepts:
import subprocess
import tempfile
import html
def pprinttable(rows):
    esc = lambda x: html.escape(str(x))
    sour = "<table border=1>"
    if len(rows) == 1:
        for i in range(len(rows[0]._fields)):
            sour += "<tr><th>%s<td>%s" % (esc(rows[0]._fields[i]), esc(rows[0][i]))
    else:
        sour += "<tr>" + "".join(["<th>%s" % esc(x) for x in rows[0]._fields])
        sour += "".join(["<tr>%s" % "".join(["<td>%s" % esc(y) for y in x]) for x in rows])
    with tempfile.NamedTemporaryFile(suffix=".html") as f:
        f.write(sour.encode("utf-8"))
        f.flush()
        print(
            subprocess
            .Popen(["w3m","-dump",f.name], stdout=subprocess.PIPE)
            .communicate()[0].decode("utf-8").strip()
        )
from collections import namedtuple
Row = namedtuple('Row',['first','second','third'])
data1 = Row(1,2,3)
data2 = Row(4,5,6)
pprinttable([data1])
pprinttable([data1,data2])
results in:
┌───────┬─┐
│ first │1│
├───────┼─┤
│second │2│
├───────┼─┤
│ third │3│
└───────┴─┘
┌─────┬───────┬─────┐
│first│second │third│
├─────┼───────┼─────┤
│1 │2 │3 │
├─────┼───────┼─────┤
│4 │5 │6 │
└─────┴───────┴─────┘
I know the question is a bit old, but here's my attempt at this:
https://gist.github.com/lonetwin/4721748
It is a bit more readable IMHO (although it doesn't differentiate between single and multiple rows like @MattH's solution does, nor does it use namedtuples).
I use this small utility function.
def get_pretty_table(iterable, header):
    max_len = [len(x) for x in header]
    for row in iterable:
        row = [row] if type(row) not in (list, tuple) else row
        for index, col in enumerate(row):
            if max_len[index] < len(str(col)):
                max_len[index] = len(str(col))
    output = '-' * (sum(max_len) + 1) + '\n'
    output += '|' + ''.join([h + ' ' * (l - len(h)) + '|' for h, l in zip(header, max_len)]) + '\n'
    output += '-' * (sum(max_len) + 1) + '\n'
    for row in iterable:
        row = [row] if type(row) not in (list, tuple) else row
        output += '|' + ''.join([str(c) + ' ' * (l - len(str(c))) + '|' for c, l in zip(row, max_len)]) + '\n'
    output += '-' * (sum(max_len) + 1) + '\n'
    return output
print get_pretty_table([[1, 2], [3, 4]], ['header 1', 'header 2'])
output
-----------------
|header 1|header 2|
-----------------
|1       |2       |
|3       |4       |
-----------------
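Note that in the output above the dashed rules are 17 characters wide while each row is 19, because `sum(max_len) + 1` does not count the `|` that follows every column. Below is a minimal Python 3 sketch of a variant whose rules span the full row width. The name `pretty_table` is made up for illustration, and it assumes every row is a list or tuple.

```python
def pretty_table(rows, header):
    """Variant of the helper above whose rules span the full row width."""
    widths = [len(str(h)) for h in header]
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(str(cell)))

    # One '|' before the first column plus one '|' after every column.
    rule = "-" * (sum(widths) + len(widths) + 1)

    def fmt(row):
        return "|" + "".join(str(c).ljust(w) + "|" for c, w in zip(row, widths))

    lines = [rule, fmt(header), rule] + [fmt(r) for r in rows] + [rule]
    return "\n".join(lines)

print(pretty_table([[1, 2], [3, 4]], ["header 1", "header 2"]))
```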
from sys import stderr, stdout
def create_table(table: dict, full_row: bool = False) -> None:
    min_len = len(min((v for v in table.values()), key=lambda q: len(q)))
    max_len = len(max((v for v in table.values()), key=lambda q: len(q)))
    if min_len < max_len:
        stderr.write("Table is out of shape, please make sure all columns have the same length.")
        stderr.flush()
        return
    additional_spacing = 1
    heading_separator = '| '
    horizontal_split = '| '
    rc_separator = ''
    key_list = list(table.keys())
    rc_len_values = []
    for key in key_list:
        rc_len = len(max((v for v in table[key]), key=lambda q: len(str(q))))
        rc_len_values += ([rc_len, [key]] for n in range(len(table[key])))
        heading_line = (key + (" " * (rc_len + (additional_spacing + 1)))) + heading_separator
        stdout.write(heading_line)
        rc_separator += ("-" * (len(key) + (rc_len + (additional_spacing + 1)))) + '+-'
        if key is key_list[-1]:
            stdout.flush()
            stdout.write('\n' + rc_separator + '\n')
    value_list = [v for vl in table.values() for v in vl]
    aligned_data_offset = max_len
    row_count = len(key_list)
    next_idx = 0
    newline_indicator = 0
    iterations = 0
    for n in range(len(value_list)):
        key = rc_len_values[next_idx][1][0]
        rc_len = rc_len_values[next_idx][0]
        line = ('{:{}} ' + " " * len(key)).format(value_list[next_idx], str(rc_len + additional_spacing)) + horizontal_split
        if next_idx >= (len(value_list) - aligned_data_offset):
            next_idx = iterations + 1
            iterations += 1
        else:
            next_idx += aligned_data_offset
        if newline_indicator >= row_count:
            if full_row:
                stdout.flush()
                stdout.write('\n' + rc_separator + '\n')
            else:
                stdout.flush()
                stdout.write('\n')
            newline_indicator = 0
        stdout.write(line)
        newline_indicator += 1
    stdout.write('\n' + rc_separator + '\n')
    stdout.flush()
Example:
table = {
"uid": ["0", "1", "2", "3"],
"name": ["Jon", "Doe", "Lemma", "Hemma"]
}
create_table(table)
Output:
uid   | name       |
------+------------+-
0     | Jon        |
1     | Doe        |
2     | Lemma      |
3     | Hemma      |
------+------------+-
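The function above walks a flattened list of values and jumps between columns with index arithmetic. As an alternative, here is a short sketch of the same column-dict-to-rows idea that lets `zip` do the transposing. This is not the code above: `format_columns` is a made-up name, and the layout (left-aligned cells, `-+-` rule) is my own choice.

```python
def format_columns(table):
    """Format a dict of equal-length columns as text (illustrative sketch)."""
    keys = list(table)
    if len({len(col) for col in table.values()}) > 1:
        raise ValueError("all columns must have the same length")

    # Width of each column = longest of its header and its values.
    widths = [max(len(str(v)) for v in [k] + list(table[k])) for k in keys]

    def fmt(cells):
        return " | ".join(str(c).ljust(w) for c, w in zip(cells, widths))

    rule = "-+-".join("-" * w for w in widths)
    # zip(*columns) turns the column lists into row tuples.
    rows = zip(*(table[k] for k in keys))
    return "\n".join([fmt(keys), rule] + [fmt(r) for r in rows])

table = {
    "uid": ["0", "1", "2", "3"],
    "name": ["Jon", "Doe", "Lemma", "Hemma"],
}
print(format_columns(table))
```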
Here's my solution:
def make_table(columns, data):
    """Create an ASCII table and return it as a string.
    Pass a list of strings to use as columns in the table and a list of
    dicts. The strings in 'columns' will be used as the keys to the dicts in
    'data.'
    Not all column values have to be present in each data dict.
    >>> print(make_table(["a", "b"], [{"a": "1", "b": "test"}]))
    | a | b    |
    |----------|
    | 1 | test |
    """
    # Calculate how wide each cell needs to be
    cell_widths = {}
    for c in columns:
        values = [str(d.get(c, "")) for d in data]
        # key=len picks the longest string; plain max() would compare alphabetically
        cell_widths[c] = len(max(values + [c], key=len))
    # Used for formatting rows of data
    row_template = "|" + " {} |" * len(columns)
    # CONSTRUCT THE TABLE
    # The top row with the column titles
    justified_column_heads = [c.ljust(cell_widths[c]) for c in columns]
    header = row_template.format(*justified_column_heads)
    # The second row contains separators
    sep = "|" + "-" * (len(header) - 2) + "|"
    # Rows of data
    rows = []
    for d in data:
        fields = [str(d.get(c, "")).ljust(cell_widths[c]) for c in columns]
        row = row_template.format(*fields)
        rows.append(row)
    return "\n".join([header, sep] + rows)
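One thing to keep in mind when computing column widths from a list of cell strings: `max` without a `key` compares strings lexicographically rather than by length, so the widest cell is found with `key=len`. A small illustration (the values are made up for the example):

```python
values = ["b", "aaaa"]

# Without a key, max() compares strings lexicographically: "b" > "aaaa".
lexicographic_max = max(values)
# With key=len, max() returns the longest string instead.
longest = max(values, key=len)

print(len(lexicographic_max))  # 1
print(len(longest))            # 4
```

A width taken from the lexicographic maximum can be too small, which leaves longer cells sticking out past the column border.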
This can be done fairly compactly with only builtin modules, using list comprehensions and string formatting. It accepts a list of dictionaries that all have the same keys and string values.
def tableit(dictlist):
    # A list comprehension (rather than map) so that "+ [len(k)]" also works on Python 3
    lengths = [ max([len(x.get(k)) for x in dictlist] + [len(k)]) for k in dictlist[0].keys() ]
    lenstr = " | ".join("{:<%s}" % m for m in lengths)
    lenstr += "\n"
    outmsg = lenstr.format(*dictlist[0].keys())
    outmsg += "-" * (sum(lengths) + 3*len(lengths))
    outmsg += "\n"
    outmsg += "".join(
        lenstr.format(*v) for v in [ item.values() for item in dictlist ]
    )
    return outmsg
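As written, the function measures each value with `len`, so it expects string values. Here is a hedged Python 3 sketch of the same approach that also tolerates non-string values such as ints, with a usage example. The name `table_from_dicts` and the sample data are made up for illustration.

```python
def table_from_dicts(dictlist):
    """Format a list of same-keyed dicts as a text table (Python 3 sketch)."""
    keys = list(dictlist[0])
    # str() lets non-string values such as ints be measured and printed.
    widths = [max(len(str(d.get(k, ""))) for d in dictlist) for k in keys]
    widths = [max(w, len(k)) for w, k in zip(widths, keys)]
    template = " | ".join("{:<%d}" % w for w in widths)
    lines = [template.format(*keys), "-+-".join("-" * w for w in widths)]
    lines += [
        template.format(*(str(d.get(k, "")) for k in keys)) for d in dictlist
    ]
    return "\n".join(lines)

people = [{"name": "Jacob", "rank": 1}, {"name": "Isabella", "rank": 2}]
print(table_from_dicts(people))
```

Looking values up by key for every row keeps the columns aligned with the header even if the dicts were built with their keys in a different order.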
