Python - Printing in line from dict - python

I'm a beginner at Python and I'm having trouble printing output neatly.
I made a program that stores names and prices in a dictionary
(e.g. {"PERSON_1":"50","PERSON_2":"75","PERSON_WITH_EXTREMELY_LONG_NAME":"80"}).
Now I want to print the keys and their corresponding values in a neat table.
I used the code:
for i in eter.eters:
    print(i + "\t | \t" + str(eter.eters[i]))
with eter.eters being my dictionary.
The problem is that some names are a lot longer than others, so the tabs don't align.
My header, "Names" | "Price", should also be aligned with the information below it.
I've already looked up some solutions, but I don't really understand the ones I found.
Desired outcome:
**********************************************************************
The people staying for dinner are:
**********************************************************************
Name                            | amount
----------------------------------------------------------------------
PERSON 1                        | 50
PERSON 2                        | 75
PERSON WITH EXTREMELY LONG NAME | 80
**********************************************************************

Try this, given that eter.eters is your dictionary:
print('%-35s | %6s' % ('Names', 'Price'))  # '-' left-aligns the name column
for k in eter.eters:
    print('%-35s | %6s' % (k, eter.eters[k]))
or
print("{0:<35}".format('Name') + '|' + "{0:>6}".format('Price'))
for k in eter.eters:
    print("{0:<35}".format(k) + '|' + "{0:>6}".format(eter.eters[k]))

You can collect all of the names and find the length of the longest one, then pad every name to that width with spaces instead of using a tab (\t). This session should explain:
>>> d={"Marius":"50","John":"75"}
>>> d
{'Marius': '50', 'John': '75'}
>>> for i in d:
...     print(i)
...
Marius
John
>>> d = {"Marius":"50","John":"75"}
>>> m = 0
>>> for i in d:
...     m = max(m, len(i))
...
>>> m
6 # now we know the place reserved for the Name column should be 6 chars wide
>>> for i in d:
...     print(i + (m-len(i))*' ', d[i]) # pad each name with spaces up to those 6 chars
...
Marius 50
John   75
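The same idea can be written with format specifiers instead of manual padding. The sketch below is one possible way to do it, not the only one: the function name format_price_table and the default header labels are made up for illustration, and the column widths are computed from the data rather than hard-coded.

```python
def format_price_table(prices, headers=("Name", "Price")):
    """Return the table as a list of lines with both columns aligned."""
    # Width of each column = longest entry in that column, header included.
    name_w = max([len(headers[0])] + [len(str(k)) for k in prices])
    price_w = max([len(headers[1])] + [len(str(v)) for v in prices.values()])

    # '<' left-aligns, '>' right-aligns; the widths are filled in via nested fields.
    row = "{:<{nw}} | {:>{pw}}"
    lines = [row.format(headers[0], headers[1], nw=name_w, pw=price_w)]
    lines.append("-" * (name_w + 3 + price_w))
    for name, price in prices.items():
        lines.append(row.format(name, price, nw=name_w, pw=price_w))
    return lines


prices = {"PERSON_1": "50", "PERSON_2": "75",
          "PERSON_WITH_EXTREMELY_LONG_NAME": "80"}
lines = format_price_table(prices)
print("\n".join(lines))
```

Because the widths come from the dictionary itself, adding a longer name later widens the whole table without further changes.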

Related

Formatted strings, decimals and commas question

I have a .txt file that I read in and wish to create formatted strings using these values. Columns 3 and 4 need commas as thousands separators, and the last column needs a percent sign and 2 decimal places. The formatted string will say something like "The overall attendance at Bulls was 894659, average attendance was 21,820 and the capacity was 104.30%".
The shortened .txt file has these lines:
1 Bulls 894659 21820 104.3
2 Cavaliers 843042 20562 100
3 Mavericks 825901 20143 104.9
4 Raptors 812863 19825 100.1
5 NY_Knicks 812292 19812 100
So far my code looks like this and it's mostly working, minus the commas and decimal places.
file_1 = open ('basketball.txt', 'r')
count = 0
list_1 = [ ]
for line in file_1:
    count += 1
    textline = line.strip()
    items = textline.split()
    list_1.append(items)
print('Number of teams: ', count)
for line in list_1:
    print ('Line: ', line)
file_1.close()
for line in list_1: #iterate over the lines of the file and print the lines with formatted strings
    a, b, c, d, e = line
    print (f'The overall attendance at the {b} game was {c}, average attendance was {d}, and the capacity was {e}%.')
Any help with how to format the code to show the numbers with commas (21820 -> 21,820) and the last column with 2 decimals and a percent sign (104.3 -> 104.30%) is greatly appreciated.
You've got some options for how to tackle this.
Option 1: Using f-strings (Python 3.6+ only)
Since your provided code already uses f-strings, this solution should work for you. For others reading here, this will only work if you are using Python 3.6 or later.
You can do string formatting within f-strings by putting a colon : after the expression within the curly brackets {}, after which you can use all of the usual Python format specifiers.
Thus, you could just change one of your lines of code to get this done. Your print line would look like:
print(f'The overall attendance at the {b} game was {int(c):,}, average attendance was {int(d):,}, and the capacity was {float(e):.2f}%.')
The variables are getting interpreted as:
The {b} just prints the string b.
The {int(c):,} and {int(d):,} print the integer versions of c and d, respectively, with commas (indicated by the :,).
The {float(e):.2f} prints the float version of e with two decimal places (indicated by the :.2f).
Option 2: Using str.format()
For others here who are looking for a Python 2 friendly solution (Python 2.7, where the , separator and auto-numbered {} fields are available), you can change the print line to the following:
print("The overall attendance at the {} game was {:,}, average attendance was {:,}, and the capacity was {:.2f}%.".format(b, int(c), int(d), float(e)))
Note that both options use the same formatting syntax; the f-string option simply has the benefit of letting you write the variable name right where it will appear in the resulting string.
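As a self-contained illustration of the format specifiers described above, here is a minimal sketch that formats one line of the sample file. The function name attendance_sentence is hypothetical, and it assumes every line has exactly five whitespace-separated fields, as in the sample.

```python
def attendance_sentence(line):
    """Build the formatted sentence for one whitespace-separated line."""
    # Fields: rank, team name, total attendance, average attendance, capacity %.
    rank, team, total, average, capacity = line.split()
    return (
        f"The overall attendance at the {team} game was {int(total):,}, "  # ',' adds thousands separators
        f"average attendance was {int(average):,}, "
        f"and the capacity was {float(capacity):.2f}%."  # '.2f' forces two decimals
    )


sentence = attendance_sentence("1 Bulls 894659 21820 104.3")
print(sentence)
```

The values are converted with int() and float() first because the , and .2f specifiers apply to numbers, not to the strings produced by split().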
This is how I ended up doing it, very similar to the response from Bibit.
file_1 = open ('something.txt', 'r')
count = 0
list_1 = [ ]
for line in file_1:
    count += 1
    textline = line.strip()
    items = textline.split()
    items[2] = int(items[2])
    items[3] = int(items[3])
    items[4] = float(items[4])
    list_1.append(items)
print('Number of teams/rows: ', count)
for line in list_1:
    print ('Line: ', line)
file_1.close()
for line in list_1:
    print ('The overall attendance at the {:s} games was {:,}, average attendance was {:,}, and the capacity was {:.2f}%.'.format(line[1], line[2], line[3], line[4]))

How to replace() a specific value within a list item directly after a keyword [duplicate]

I currently have a standard txt-style file that I'm trying to open, copy, and change a specific value within. However, a standard replace() call isn't producing any difference. Here's what the 14th line of the file looks like:
' Bursts: 1 BF: 50 OF: 1 On: 2 Off: 8'
Here's the current code I have:
conf_file = 'configs/m-b1-of1.conf'
read_conf = open(conf_file, 'r')
conf_txt = read_conf.readlines()
conf_txt[14].replace(conf_txt[14][13], '6')
v_conf
Afterwards, though, no changes have been applied to the specific value I'm referencing (in this case, the first '1' in the 14th line above).
Any help would be appreciated - thanks!
There are a few things going on here, I think:
I copied your string and the first 1 is actually character 12
The result of replace() needs to be assigned back to something (it gives you a new string)
replace() will replace all "1"s with "6"s!
Example:
>>> a = ' Bursts: 1 BF: 50 OF: 1 On: 2 Off: 8'
>>> a = a.replace(a[12], '6')
>>> a
' Bursts: 6 BF: 50 OF: 6 On: 2 Off: 8'
If you only want to replace the first instance (or N instances) of that character you need to let replace() know:
>>> a = ' Bursts: 1 BF: 50 OF: 1 On: 2 Off: 8'
>>> a = a.replace(a[12], '6', 1)
>>> a
' Bursts: 6 BF: 50 OF: 1 On: 2 Off: 8'
Note that above only the value after "Bursts" is replaced, not the one after "OF".
Try this:
conf_file = 'configs/m-b1-of1.conf'
read_conf = open(conf_file, 'r')
conf_txt = read_conf.readlines()
conf_txt[14] = conf_txt[14].replace(conf_txt[14][13], '6')
Strings are immutable, so replace() does not edit the string in place; it returns a new string, and you have to assign that back to the list element. Note that this call replaces every occurrence of that character in the line; pass 1 as a third argument to replace only the first one.
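Since hard-coded character positions are fragile (the answers above already disagree on the index), another option is to locate the value by the keyword in front of it. The sketch below uses re.sub for this; the helper name set_value_after is made up for illustration, and it assumes the value is the first run of non-space characters after the keyword.

```python
import re


def set_value_after(line, keyword, new_value):
    """Replace the first whitespace-delimited value that follows `keyword`."""
    # Group 1 keeps the keyword and the spacing after it; \S+ is the old value.
    pattern = "(" + re.escape(keyword) + r"\s*)\S+"
    # A function is used as the replacement so new_value is inserted literally.
    return re.sub(pattern, lambda m: m.group(1) + new_value, line, count=1)


line = " Bursts: 1 BF: 50 OF: 1 On: 2 Off: 8"
updated = set_value_after(line, "Bursts:", "6")
print(updated)
```

As with str.replace(), the result is a new string, so in the original script it would still have to be assigned back, for example conf_txt[14] = set_value_after(conf_txt[14], "Bursts:", "6").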

regex extract data from raw text

I work in a hotel. Here is a raw file from the reports I have. I need to extract data so that I end up with something like data['roomNumber'] = ('paxNumber', isbb).
Here is a sample that concerns only two rooms, 10 and 12, so the data I need should be BreakfastData = {'10':['2','BB'],'12':['1','BB']}
1) roomNumber: a line that starts and ends with a number, or a line that starts with a number followed by one or more spaces and then a string
2) paxNumber: the two numbers just before the 'VA' string
3) isbb: defined by the 'BB' or 'HPDJ' occurrence that can be found between two '/'. Sometimes the formatting is inconsistent, so it can be '/HPDJ/' or '/ HPDJ /' or '/ HPDJ/' etc.
10 PxxxxD,David,Mme, Mr T- EXPEDIA TRAVEL
08.05.17 12.05.17 TP
SUP DBL / HPDJ / DEBIT CB AGENCE - NR
2 0 VA
NR
12
LxxxxSH,Claudia,Mrs
08.05.17 19.05.17 TP
1 0 VA
NR BB
SUP SGL / BB / EN ATTENTE DE VIREMENT- EVITER LA 66 -
.... etc
Edit (latest):
import re
data = {}
pax = ''
r = re.compile(r"(\d+)\W*(\d+)\W*VA")
r2 = re.compile(r"/\s*(BB|HPDJ)\s*/")
r3 = re.compile(r"\d+\n")
r4 = re.compile(r"\d+\s+\w")
PATH = "/home/ryms/regextest"
with open(PATH, 'rb') as raw:
    text = raw.read()
#roomNumber = re.search(r4, text).group()
#roomNumber2 = re.search(r3, text).group()
roomNumber = re.search(r4, text).group().split()[0]
roomNumber2 = re.search(r3, text).group().split()[0]
pax = re.findall(r, text)
adult = pax[0]; enfant = pax[1]
# if enfant is '0':
#     pax=adult
# else:
#     pax=(str(adult)+'+'+str(enfant))
bb = re.findall(r2, text)  # look for BB or HPDJ
data[roomNumber] = pax, bb
print(data)
print(roomNumber)
print(roomNumber2)
Output:
{'10': ([('2', '2'), ('1', '1')], ['HPDJ', 'BB'])}
10
12
[Finished in 0.1s]
How can I get both room numbers in my output?
I am having a lot of trouble with the \n issue and with read(), readline() and readlines(). What is the trick?
Once I have all the raw data, how do I build the proper BreakfastData dictionary? Should I use zip()?
At the beginning I wanted to split the file and then parse it, but I tried so many things that I got lost. For that I need a regex that matches both patterns.
In the first case, you want to select the two numbers that are followed by 'VA'. You can do it like this:
r = re.compile(r"(\d+)\W*(\d+)\W*VA")
In the second case, you can get HPDJ or BB like this:
r = re.compile(r"/\s*(HPDJ|BB)\s*/")
This handles all the cases you mentioned: '/HPDJ/', '/ HPDJ /' or '/ HPDJ/'.
The regex expression to get the text before the VA is as follows:
r = re.compile(r"(.*) VA")
Then the "number" (which will be a string) will be stored in the first group of the search match object, once you run the search.
I am not quite sure what the room number even is, because your description is a bit unclear, so I cannot help with that unless you clarify.
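One way to get every room into the result is to split the text into one record per room first and only then apply the two patterns from the answers to each record. The following is only a sketch under assumptions: the ROOM pattern is my reading of the room-number rule in the question (a line that is only digits, or digits followed by spaces and a letter), SAMPLE re-types the excerpt from the question, and parse_breakfast is a hypothetical name. It stores the meal code as found (HPDJ or BB) rather than normalising it.

```python
import re

# Excerpt re-typed from the question; the real report layout may differ.
SAMPLE = """\
10 PxxxxD,David,Mme, Mr T- EXPEDIA TRAVEL
08.05.17 12.05.17 TP
SUP DBL / HPDJ / DEBIT CB AGENCE - NR
2 0 VA
NR
12
LxxxxSH,Claudia,Mrs
08.05.17 19.05.17 TP
1 0 VA
NR BB
SUP SGL / BB / EN ATTENTE DE VIREMENT- EVITER LA 66 -
"""

# Assumed room rule: a digits-only line, or digits + spaces + a letter.
ROOM = re.compile(r"^(\d+)(?:[ \t]+[A-Za-z].*)?$", re.MULTILINE)
PAX = re.compile(r"(\d+)\W*(\d+)\W*VA")   # two numbers just before 'VA'
MEAL = re.compile(r"/\s*(BB|HPDJ)\s*/")   # meal code between two slashes


def parse_breakfast(text):
    """Return {room: [pax, meal_code]} with one entry per room record."""
    data = {}
    rooms = list(ROOM.finditer(text))
    for i, match in enumerate(rooms):
        # A record runs from this room line to the start of the next one.
        end = rooms[i + 1].start() if i + 1 < len(rooms) else len(text)
        record = text[match.end():end]
        pax, meal = PAX.search(record), MEAL.search(record)
        if pax is None or meal is None:
            continue  # skip records that lack one of the two pieces
        adults, children = pax.groups()
        label = adults if children == "0" else adults + "+" + children
        data[match.group(1)] = [label, meal.group(1)]
    return data


breakfast = parse_breakfast(SAMPLE)
print(breakfast)
```

Reading the whole file with read() and letting re.MULTILINE handle the line boundaries avoids having to juggle readline() and readlines().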

How to edit .csv in python to proceed NLP

Hello, I am not very familiar with programming and found Stack Overflow while researching my task. I want to do natural language processing on a .csv file that looks like this and has about 15,000 rows:
ID | Title | Body
----------------------------------------
1 | Who is Jack? | Jack is a teacher...
2 | Who is Sam? | Sam is a dog....
3 | Who is Sarah?| Sarah is a doctor...
4 | Who is Amy? | Amy is a wrestler...
I want to read the .csv file, do some basic NLP operations, and write the results back to a new file or to the same file. After some research, Python and NLTK seem to be the technologies I need (I hope that's right). After tokenizing, I want my .csv file to look like this:
ID | Title | Body
-----------------------------------------------------------
1 | "Who" "is" "Jack" "?" | "Jack" "is" "a" "teacher"...
2 | "Who" "is" "Sam" "?" | "Sam" "is" "a" "dog"....
3 | "Who" "is" "Sarah" "?"| "Sarah" "is" "a" "doctor"...
4 | "Who" "is" "Amy" "?" | "Amy" "is" "a" "wrestler"...
What I have achieved after a day of research and putting pieces together looks like this:
ID | Title | Body
----------------------------------------------------------
1 | "Who" "is" "Jack" "?" | "Jack" "is" "a" "teacher"...
2 | "Who" "is" "Sam" "?" | "Jack" "is" "a" "teacher"...
3 | "Who" "is" "Sarah" "?"| "Jack" "is" "a" "teacher"...
4 | "Who" "is" "Amy" "?" | "Jack" "is" "a" "teacher"...
My first idea was to read a specific cell in the .csv, perform an operation on it, and write it back to the same cell, and then somehow do that automatically for all rows. I managed to read a cell and tokenize it, but I could not manage to write it back to that specific cell, and I am far from doing it automatically for all rows. I would appreciate some help if possible.
My code:
import csv
from nltk.tokenize import word_tokenize
############Read CSV File######################
########## ID , Title, Body####################
line_number = 1 #line to read (need some kind of loop here)
column_number = 2 # column to read (need some kind of loop here)
with open('test10in.csv', 'rb') as f:
    reader = csv.reader(f)
    reader = list(reader)
    text = reader[line_number][column_number]
    stringtext = ''.join(text) #tokenizing just work on strings
    tokenizedtext = (word_tokenize(stringtext))
    print(tokenizedtext)
#############Write back in same cell in new CSV File######
with open('test11out.csv', 'wb') as g:
    writer = csv.writer(g)
    for row in reader:
        row[2] = tokenizedtext
        writer.writerow(row)
I hope I asked the question correctly and that someone can help me out.
The pandas library will make all of this much easier.
pd.read_csv() will handle the input much more easily, and you can apply the same function to a column using pd.DataFrame.apply()
Here's a quick example of how the key parts you'll want work. In the .applymap() call, you can replace my lambda function with word_tokenize to apply that across all elements instead. (In pandas 2.1 and later, applymap() is deprecated in favour of DataFrame.map().)
In [58]: import pandas as pd
In [59]: pd.read_csv("test.csv")
Out[59]:
0 1
0 wrestler Amy dog is teacher dog dog is
1 is wrestler ? ? Sarah doctor teacher Jack
2 a ? Sam Sarah is dog Sam Sarah
3 Amy a a doctor Amy a Amy Jack
In [60]: df = pd.read_csv("test.csv")
In [61]: df.applymap(lambda x: x.split())
Out[61]:
0 1
0 [wrestler, Amy, dog, is] [teacher, dog, dog, is]
1 [is, wrestler, ?, ?] [Sarah, doctor, teacher, Jack]
2 [a, ?, Sam, Sarah] [is, dog, Sam, Sarah]
3 [Amy, a, a, doctor] [Amy, a, Amy, Jack]
Also see: http://pandas.pydata.org/pandas-docs/stable/basics.html#row-or-column-wise-function-application
You first need to parse your file and then process (tokenize, etc.) each field separately.
If your file really looks like your sample, I wouldn't call it a CSV. You could parse it with the csv module, which is specifically for reading all sorts of CSV files: add delimiter="|" to the arguments of csv.reader() to separate your rows into cells. (And don't open the file in binary mode.) But your file is easy enough to parse directly:
with open('test10in.csv', encoding="utf-8") as fp: # Or whatever encoding is right
    content = fp.read()
lines = content.splitlines()
allrows = [ [ fld.strip() for fld in line.split("|") ] for line in lines ]
# Headers and data:
headers = allrows[0]
rows = allrows[2:]
You can then use nltk.word_tokenize() to tokenize each field of rows, and go on from there.
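To make the last step concrete, here is a minimal sketch of tokenizing every row separately, which is what the code in the question missed (it tokenized one cell and wrote that same result into every row). To keep the example runnable without downloading NLTK data, simple_tokenize is a regex stand-in for nltk.word_tokenize and is not equivalent to it; tokenize_rows and the two sample rows are also made up for illustration.

```python
import csv
import io
import re


def simple_tokenize(text):
    # Stand-in for nltk.word_tokenize: runs of word characters, or single
    # punctuation marks. Swap in word_tokenize for real NLP work.
    return re.findall(r"\w+|[^\w\s]", text)


def tokenize_rows(rows, columns=(1, 2)):
    """Return new rows in which the given columns hold quoted tokens."""
    result = []
    for row in rows:
        new_row = list(row)
        for col in columns:
            # Tokenize this row's own cell, not a value computed once up front.
            tokens = simple_tokenize(row[col])
            new_row[col] = " ".join('"%s"' % tok for tok in tokens)
        result.append(new_row)
    return result


rows = [["1", "Who is Jack?", "Jack is a teacher"],
        ["2", "Who is Sam?", "Sam is a dog"]]
tokenized = tokenize_rows(rows)

# Write to an in-memory buffer here; a real script would open an output file.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="|")
writer.writerow(["ID", "Title", "Body"])
writer.writerows(tokenized)
print(buffer.getvalue())
```

Note that csv.writer quotes and escapes fields that contain double quotes, so the file on disk will look slightly different from the tokens printed in memory.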

How to format string padding both sides to center it around particular point

I'm sorry, I feel like there is a better title for my question, but I can't think of it.
The situation, essentially, is this: I am creating a fixed-width information table. I have a list of (k,v) for k in a list of keys.
When done, the " : " should be centered in the line, and there should be a "|" on the far left and right sides.
My problem is that a few of my lists are too long to fit on a single line. Either I need the text to start a new line once the list is x characters in, with the continuation indented to the same level, or I need to be able to encode a number of values so that they align with the same left padding (what I'd call the "tabbed" version of the content).
An example of what I have so far:
def make_information_file(t):
    pairs=[("SERiES","Game of Thrones"),("SiZE","47,196,930,048 bytes"),("AUDiO TRACKS",['English: DTS-HD Master Audio 5.1', 'French: DTS 5.1', 'Spanish: DTS 5.1', 'Polish: DTS 2.0', 'Spanish: DTS 2.0'])]
    general_keys=["COLLECTiON NAME","UPC","RETAiL RELEASE DATE","REGiON","STUDiO"]
    video_keys=["ViDEO","SOURCE","RESOLUTiON","ASPECT RATiO"]
    audio_keys=["AUDiO FORMATS"]
    subtitle_keys=["SUBTiTLES"]
    all_keys=general_keys+video_keys+audio_keys+subtitle_keys
    longest_key=(sorted(all_keys,key=len) or [""])[-1]
    longest_key_length=len(longest_key)
    left_padding=longest_key_length+5
    right_padding=106-left_padding
    empty_left_padding=left_padding+3
    empty_right_padding=106-left_padding-3
    line_formatter=lambda p: "|{field:>{left_padding}} : {value:<{right_padding}}|".format(field=p[0],value=p[-1],left_padding=left_padding,right_padding=right_padding)
Notice that, depending on how long the longest key is, everything will be aligned so that the ":" is at a fixed point, with one space on either side of it, the text to its left right-aligned, and the text to its right left-aligned.
However, the "AUDiO TRACKS" list is too long to fit on one line. One option (my preference, I believe) is to split it automatically when a word would push the line over its limit, in which case the second line (and any lines after it) must be indented to stay in line with the first line's text. The other option is to start every value after empty_left_padding spaces, followed by the string value, followed by enough blank spaces that the line's final length is the standard 111 characters, with a "|" as the first and last characters.
desired_output="""
| SERiES : Game of Thrones |
| SiZE : 47,196,930,048 bytes |
| AUDiO FORMATS : English: DTS-HD Master Audio 5.1, French: DTS 5.1, Spanish: DTS 5.1, |
| Polish: DTS 2.0, Spanish: DTS 2.0 |
| UPC : 883929191505 |
| RETAiL RELEASE DATE : 03-06-2012 |
| REGiON : A, B, C |
| STUDiO : HBO Home Video |
| ViDEO : 1080p 1.78:1 |
| SOURCE : BD50 |
| RESOLUTiON : 1080p |
| ASPECT RATiO : 16:9 |"""
So, I can't figure out how to deal with the "AUDiO FORMATS" case above (I have the same issue with the list of subtitles available).
First:
Store your information in an ordered dictionary, to preserve its order and to keep the data organized as key-value pairs:
import collections
d = collections.OrderedDict([
('SERiES', 'Game of Thrones'),
('SiZE', '47,196,930,048 bytes'),
('AUDiO FORMATS', 'English: DTS-HD Master Audio 5.1, French: DTS 5.1, Spanish: DTS 5.1, Polish: DTS 2.0, Spanish: DTS 2.0'),
('UPC', '883929191505'),
('RETAiL RELEASE DATE', '03-06-2012'),
('REGiON', 'A, B, C'),
('STUDiO', 'HBO Home Video'),
('ViDEO', '1080p 1.78:1'),
('SOURCE', 'BD50'),
('RESOLUTiON', '1080p'),
('ASPECT RATiO', '16:9')
])
Second:
you will need a word-wrapper, to wrap only whole words at a given column length:
def word_wrap(string, width=80, indent=4, tab=True):
    output = list()
    for ln in string.replace('\t', ' ' * indent).split('\n'):
        line = list()
        for wd in ln.split(' '):
            if len(' '.join(line) + ' ' + wd) > width:
                output.append(' '.join(line))
                line = list()
            line.append(wd)
        output.append(' '.join(line))
    return [l.replace(' ' * indent, '\t') for l in output] if tab else output
Third:
you need a dictionary formatter, where you can set up the separator, the width of the whole text, the padding space before the texts and the spaces before and after the separator:
def format_dict(dictionary, sep=':', width=80, pad=2, pre=1, post=1):
    max_len = max(len(k) for k in dictionary)
    para_pad = '%s%s' % ('\n', (pad + max_len + pre + len(sep) + post)*' ')
    separator = '%s%s%s' % (pre*' ', sep, post*' ')
    output = list()
    for key, value in dictionary.items():  # items() also works in Python 3, unlike iteritems()
        output.append(
            '%s%s%s' % (
                '%s%s' % ((max_len + pad - len(key))*' ', key),
                separator,
                para_pad.join(word_wrap(str(value), width - len(para_pad)))
            )
        )
    return [''.join(l) for l in output]
Fourth:
decorate the left and the right side of the paragraph:
def decorate_para(string, deco='|', width=80):
    output = list()
    for line in string.split('\n'):
        output.append('%s%s%s%s' % (deco, line, (width - len(line))*' ', deco))
    return '\n'.join(output)
Fifth:
test the output by printing it:
w = 79
print(decorate_para('\n'.join(format_dict(d, width=w - 2)), width=w))
And voilà! The output is:
|               SERiES : Game of Thrones                                        |
|                 SiZE : 47,196,930,048 bytes                                   |
|        AUDiO FORMATS : English: DTS-HD Master Audio 5.1, French: DTS 5.1,     |
|                        Spanish: DTS 5.1, Polish: DTS 2.0, Spanish: DTS 2.0   |
|                  UPC : 883929191505                                           |
|  RETAiL RELEASE DATE : 03-06-2012                                             |
|               REGiON : A, B, C                                                |
|               STUDiO : HBO Home Video                                         |
|                ViDEO : 1080p 1.78:1                                           |
|               SOURCE : BD50                                                   |
|           RESOLUTiON : 1080p                                                  |
|         ASPECT RATiO : 16:9                                                   |
Here are some hints that might help you:
Store your pairs as a dict. This is more intuitive than using a list and will come in handy later on:
pairs = dict([("SERiES","Game of Thrones"),("SiZE","47,196,930,048 bytes"),
('AUDiO TRACKS',['English: DTS-HD Master Audio 5.1', 'French: DTS 5.1',
'Spanish: DTS 5.1', 'Polish: DTS 2.0', 'Spanish: DTS 2.0'])])
To get the longest key, use the built-in max function:
longest_key = max(pairs.keys(), key=len)
longest_key_length=len(longest_key)
To align a string to the right, use the string method str.rjust:
left_padding = longest_key_length + 5
for k in pairs.keys():
    print('|' + (k + ' : ').rjust(left_padding))
To pad a string with spaces to a fixed width, use the string method str.ljust:
max_l = 105
max_right = max_l - left_padding
for k, v in pairs.items():
    left = '|' + (k + ' : ').rjust(left_padding)
    right = str(v) if len(str(v)) < max_right else ''
    right = right.ljust(max_right) + '|'
    print(left + right)
Finally, you can use the textwrap library to wrap paragraphs, or use a more advanced library for this kind of thing.
You can also use other string methods like center, capitalize, title...
With this, you should be able to get everything going.
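Putting these hints together with textwrap, a compact version might look like the sketch below. It is one possible arrangement rather than the method from either answer: format_table is a hypothetical name, list values are joined with ", " as an assumption about the desired look, and the total width is a parameter rather than the 111 characters from the question.

```python
import textwrap


def format_table(pairs, width=79, sep=" : "):
    """Return boxed lines with keys right-aligned and long values wrapped."""
    key_width = max(len(key) for key, _ in pairs)
    # Space left for the value once both borders, the key and sep are placed.
    value_width = width - 2 - key_width - len(sep)
    lines = []
    for key, value in pairs:
        if not isinstance(value, str):
            value = ", ".join(value)  # assumption: lists are shown comma-separated
        chunks = textwrap.wrap(value, value_width) or [""]
        for i, chunk in enumerate(chunks):
            # Only the first line of a wrapped value shows the key and separator.
            label = key if i == 0 else ""
            joiner = sep if i == 0 else " " * len(sep)
            lines.append("|" + label.rjust(key_width) + joiner
                         + chunk.ljust(value_width) + "|")
    return lines


pairs = [
    ("SERiES", "Game of Thrones"),
    ("SiZE", "47,196,930,048 bytes"),
    ("AUDiO TRACKS", ["English: DTS-HD Master Audio 5.1", "French: DTS 5.1",
                      "Spanish: DTS 5.1", "Polish: DTS 2.0", "Spanish: DTS 2.0"]),
]
table = format_table(pairs, width=60)
print("\n".join(table))
```

textwrap.wrap breaks only at whitespace by default (a single word longer than the available width would still be split), so continuation lines stay aligned under the first line of the value.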
