Below is c.txt:
CO11 CSE C1 8
CO12 ETC C1 8
CO13 Electrical C2 12
CO14 Mech E 5
My program needs to print a course summary on screen and save that summary to a file
named cr.txt. Given the c.txt above, the program output should look like the sample
below. The content of cr.txt should be the same, except for the last line. In the
second column, * marks a compulsory course and - marks an elective course.
The fourth column is the number of students enrolled in the course. The fifth column is the average score of the course.
CID Name Points. Enrollment. Average.
----------------------------------
CO11 * CSE 8 2 81
CO12 * ETC 8 10 71
CO13 * Electrical 12 8 61
CO14 - Mech 5 4 51
----------------------------------
poor-performing subject is CO14 with an average 51.
cr.txt generated!
Below is what I've tried:
def read(self):
    ctype = []
    fi = open("c.txt", "r")
    l = fi.readline()
    while l != "":
        fields = l.strip().split(" ")
        self.c.append(fields)
        l = fi.readline().strip()
    fi.close()
    # print(f"{'CID'}{'Name':>20}{'Points.':>16}{'Enrollment.':>18}{'Average.':>10}")
    # print("-" * 67, end="")
    print()
    for i in range(0, len(self.c)):
        for j in range(len(self.c[i])):
            obj = self.c[i][j]
            print(obj.ljust(18), end="")
        print()
    print("-" * 67, end="")
    print()
You can use 'file.read' or 'file.readlines' after calling 'open'. If you choose 'file.readlines', you'll have to loop with 'for row in file.readlines()'. Here is my example with 'file.read':
headers = ['CID', 'Name', 'Points.', 'Enrollment.', 'Average.']
compulsory_course = ['CO11', 'CO12', 'CO13']
elective_course = ['CO14']
count = 0
with open('c.txt', 'r') as file_c:
    file_c.seek(0, 0)
    # Join all lines into one space-separated string, then split into fields
    file_string = file_c.read().replace('\n', ' ')
    fields = file_string.split(' ')
with open('cr.txt', 'w') as file_cr:
    for field in headers:
        file_cr.write(f'{field} ')
    file_cr.write('\n')
    for v in fields:
        # Every course has four fields, so start a new line after the fourth
        if count == 4:
            file_cr.write('\n')
            count = 0
        count += 1
        if v in compulsory_course:
            file_cr.write(f'{v} * ')
            continue
        elif v in elective_course:
            file_cr.write(f'{v} - ')
            continue
        elif count == 3:
            # The third field is the course type (C1, C2, E); skip it
            file_cr.write(' ')
            continue
        file_cr.write(f'{v} ')
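A different way to approach the same task is to parse each line of c.txt into its four fields and build the report lines first, then print and save them. This is only a sketch under stated assumptions: the function names build_report and save_report are made up for illustration, a type starting with "C" is assumed to mean compulsory, and the Enrollment and Average columns are left out because c.txt does not contain that data.

```python
def build_report(course_text):
    """Return report lines for rows of the form 'CID Name Type Points'."""
    lines = [f"{'CID':<6}{'Name':<14}{'Points':>6}", "-" * 26]
    for row in course_text.splitlines():
        if not row.strip():
            continue  # skip blank lines
        cid, name, ctype, points = row.split()
        # Assumption: types starting with "C" (C1, C2) are compulsory, "E" is elective.
        marker = "*" if ctype.startswith("C") else "-"
        lines.append(f"{cid:<6}{marker + ' ' + name:<14}{points:>6}")
    lines.append("-" * 26)
    return lines


def save_report(lines, path):
    """Write the report lines to a text file, one per line."""
    with open(path, "w") as report_file:
        report_file.write("\n".join(lines) + "\n")


sample = """CO11 CSE C1 8
CO12 ETC C1 8
CO13 Electrical C2 12
CO14 Mech E 5
"""
report = build_report(sample)
print("\n".join(report))
```

Calling save_report(report, 'cr.txt') afterwards would write the same lines to the file, so the screen output and the file stay in sync.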
I am trying to create a program that takes a bulk paste, extracts the relevant information, and writes that information out one line per track.
Here is some example data:
1. Track1 03:01
VOC:PersonA
LYR:LyrcistA
COM:ComposerA
ARR:ArrangerA
ARR:ArrangerB
2. Track2 04:18
VOC:PersonB
VOC:PersonC
LYR:LyrcistA
LYR:LyrcistC
COM:ComposerA
ARR:ArrangerA
I would like the output to group the relevant data for each track on a single line, with "; " joining entries that share the same role and " - " separating the different roles.
LyrcistA - ComposerA - ArrangerA; ArrangerB
LyrcistA; LyrcistC - ComposerA - ArrangerA
I have not gotten very far despite my best efforts:
while True:
    YodobashiData = input("")
    SplitData = YodobashiData.splitlines()
This returns the following:
['1. Track1 03:01']
['VOC:PersonA ']
['LYR:LyrcistA']
['COM:ComposerA']
['ARR:ArrangerA']
['ARR:ArrangerB']
[]
['2. Track2 04:18']
['VOC:PersonB']
['VOC:PersonC']
['LYR:LyrcistA']
['LYR:LyrcistC']
['COM:ComposerA']
['ARR:ArrangerA']
Whilst I now have all the data in separate lists, I have no idea how to pick out the entries I need from the ones I do not.
Also, it seems I need the while loop, or else it only returns the first list and nothing else.
Here's a script that doesn't use regular expressions.
It assumes that header lines, and only the header lines, will always start with a digit, and that the overall structure of header line then credit lines is consistent. Empty lines are ignored.
Extraction and formatting of the track data are handled separately, so it's easier to change formats, or use the extracted data in other ways.
import collections
import unicodedata
data_from_question = """\
1. Track1 03:01
VOC:PersonA
LYR:LyrcistA
COM:ComposerA
ARR:ArrangerA
ARR:ArrangerB
2. Track2 04:18
VOC:PersonB
VOC:PersonC
LYR:LyrcistA
LYR:LyrcistC
COM:ComposerA
ARR:ArrangerA
"""
def prepare_data(data):
    # The "colons" in the credits lines are actually
    # "full width colons". Replace them (and other such characters)
    # with their normal width equivalents.
    # If full normalisation is undesirable then we could return
    # data.replace('\N{FULLWIDTH COLON}', ':')
    return unicodedata.normalize('NFKC', data)
def is_new_track(line):
    return line[0].isdigit()
def parse_track_header(line):
    id_, title, duration = line.split()
    return {'id': id_.rstrip('.'), 'title': title, 'duration': duration}
def get_credit(line):
    credit, _, name = line.partition(':')
    return credit.strip(), name.strip()
def format_track_heading(track):
    return 'id: {id} title: {title} length: {duration}'.format(**track)
def format_credits(track):
    order = ['ARR', 'COM', 'LYR', 'VOC']
    parts = ['; '.join(track[k]) for k in order]
    return ' - '.join(parts)
def get_data():
    # The data is expected to be a multiline string.
    return data_from_question
def parse_data(data):
    track = None
    for line in filter(None, data.splitlines()):
        if is_new_track(line):
            if track:
                yield track
            track = collections.defaultdict(list)
            header_data = parse_track_header(line)
            track.update(header_data)
        else:
            role, name = get_credit(line)
            track[role].append(name)
    yield track
def report(tracks):
    for track in tracks:
        print(format_track_heading(track))
        print(format_credits(track))
        print()
def main():
    data = get_data()
    prepared_data = prepare_data(data)
    tracks = parse_data(prepared_data)
    report(tracks)
main()
Output:
id: 1 title: Track1 length: 03:01
ArrangerA; ArrangerB - ComposerA - LyrcistA - PersonA
id: 2 title: Track2 length: 04:18
ArrangerA - ComposerA - LyrcistA; LyrcistC - PersonB; PersonC
Here's another take on an answer to your question:
data = """
1. Track1 03:01
VOC:PersonA
LYR:LyrcistA
COM:ComposerA
ARR:ArrangerA
ARR:ArrangerB

2. Track2 04:18
VOC:PersonB
VOC:PersonC
LYR:LyrcistA
LYR:LyrcistC
COM:ComposerA
ARR:ArrangerA"""
import re
import collections
# Regular expression to pull apart the headline of each entry
headlinePattern = re.compile(r"(\d+)\.\s+(.*?)\s+(\d\d:\d\d)")
def main():
    # break the data into lines
    lines = data.strip().split("\n")
    # while we have more lines...
    while lines:
        # The next line should be a title line
        line = lines.pop(0)
        m = headlinePattern.match(line)
        if not m:
            raise Exception("Unexpected data format")
        id = m.group(1)
        title = m.group(2)
        length = m.group(3)
        people = collections.defaultdict(list)
        # Now read person lines until we hit a blank line or the end of the list
        while lines:
            line = lines.pop(0)
            if not line:
                break
            # Break the line into label and name
            label, name = re.split(r"\W+", line, maxsplit=1)
            # Add this entry to a map of lists, where the map's keys are the label and the
            # map's values are all the people who had that label
            people[label].append(name)
        # Now we have everything for one entry in the data. Print everything we got.
        print("id:", id, "title:", title, "length:", length)
        print(" - ".join(["; ".join(person) for person in people.values()]))
        # go on to the next entry...
main()
Result:
id: 1 title: Track1 length: 03:01
PersonA - LyrcistA - ComposerA - ArrangerA; ArrangerB
id: 2 title: Track2 length: 04:18
PersonB; PersonC - LyrcistA; LyrcistC - ComposerA - ArrangerA
You can comment out the line that prints the headline info if you only want the line with all of the people on it. If you want to read the data from the user instead of the built-in string, keep in mind that input("") returns only one line per call, so a multi-line paste has to be read with something like sys.stdin.read().
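Since the question mentions that input() only ever returned the first line of the paste, here is a minimal sketch of reading a whole pasted block at once. The helper name read_pasted_text is hypothetical, and an in-memory stream stands in for real keyboard input so the example runs on its own.

```python
import io
import sys


def read_pasted_text(stream=None):
    """Read everything up to end-of-input from a text stream (stdin by default)."""
    if stream is None:
        stream = sys.stdin
    return stream.read()


# Demonstration: an in-memory stream stands in for a pasted block of text.
demo = io.StringIO("1. Track1 03:01\nVOC:PersonA\n\n2. Track2 04:18\nVOC:PersonB\n")
pasted = read_pasted_text(demo)
print(pasted.splitlines())
```

When reading from a real terminal, call read_pasted_text() with no argument and finish the paste with end-of-input (Ctrl-D on Unix-like systems, Ctrl-Z followed by Enter on Windows).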
Assuming your data is in the format you specified in a file called tracks.txt, the following code should work:
import re
with open('tracks.txt') as fp:
tracklines = fp.read().splitlines()
def split_tracks(lines):
    # Group the lines into tracks, using blank lines as separators
    track = []
    all_tracks = []
    while True:
        try:
            if lines[0] != '':
                track.append(lines.pop(0))
            else:
                all_tracks.append(track)
                track = []
                lines.pop(0)
        except IndexError:
            # No lines left: store the last track and stop
            all_tracks.append(track)
            return all_tracks
def gather_attrs(tracks):
    track_attrs = []
    for track in tracks:
        attrs = {}
        for line in track:
            match = re.match('([A-Z]{3}):', line)
            if match:
                attr = line[:3]
                val = line[4:].strip()
                try:
                    attrs[attr].append(val)
                except KeyError:
                    attrs[attr] = [val]
        track_attrs.append(attrs)
    return track_attrs
if __name__ == '__main__':
    tracks = split_tracks(tracklines)
    attrs = gather_attrs(tracks)
    for track in attrs:
        semicolons = map(lambda va: '; '.join(va), track.values())
        hyphens = ' - '.join(semicolons)
        print(hyphens)
The only thing you may have to change is the colon characters in your data: some of them are ASCII colons (:) and others are full-width Unicode colons (：, U+FF1A), which will break the regex.
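If you would rather not edit the data, one option is to let the pattern accept both kinds of colon. This is just a sketch of that idea, not part of the answer above; the names CREDIT and split_credit are made up for illustration.

```python
import re

# Accept either an ASCII colon or a full-width colon (U+FF1A) after the role tag.
CREDIT = re.compile(r'([A-Z]{3})[:\uFF1A]\s*(.*)')


def split_credit(line):
    """Return (role, name) for a credit line, or None if the line is not one."""
    match = CREDIT.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


print(split_credit('VOC:PersonA'))
print(split_credit('LYR\uFF1ALyrcistA'))
print(split_credit('1. Track1 03:01'))
```

Header lines such as "1. Track1 03:01" do not start with three capital letters, so they simply fall through as None.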
import re
list_ = data_.split('\n') # here data_ is your data
regObj = re.compile(rf'[A-Za-z]+(:|{chr(65306)})[A-Za-z]+')
l = []
pre = ''
for i in list_:
    if regObj.findall(i):
        if i[:3] != 'VOC':
            if pre == i[:3]:
                l.append('; ')
            else:
                l.append(' - ')
            l.append(i[4:].strip())
    else:
        l.append(' => ')
    pre = i[:3]
track_list = list(map(lambda item: item.strip(' - '), filter(lambda item: item, ''.join(l).split(' => '))))
print(track_list)
Output: the list of results you want
['LyrcistA - ComposerA - ArrangerA; ArrangerB', 'LyrcistA; LyrcistC - ComposerA - ArrangerA']
So, I'm using Python with PyQt and I have a very strange problem. A string that prints OK at one point doesn't print OK after a few lines of code! Here's my code:
name = str(self.lineEdit.text().toUtf8())
self.let_change = Search()
name_no_ind = self.let_change.indentation(name)
print(name_no_ind)
name_cap = self.let_change.capital(name)
name_low = self.let_change.lower(name)
print(name_no_ind, name_cap, name_low)
col = self.combobox.currentIndex()
row = 0
for i in range(0, self.tableWidget.rowCount()):
    try:
        find_no_ind = self.let_change.indentation(self.tableWidget.item(row, col).text())
        find_cap = self.let_change.capital(self.tableWidget.item(row, col).text())
        find_lower = self.let_change.lower(self.tableWidget.item(row, col).text())
        if name_no_ind or name_cap or name_low in find_no_ind or find_cap or find_lower:
            self.tableWidget.setItemSelected(self.tableWidget.item(row, col), True)
            print("Item found in %d, %d" % (row,col))
        row += 1
    except AttributeError:
        row += 1
And here's what I get:
Αντωνης
('\xce\x91\xce\xbd\xcf\x84\xcf\x89\xce\xbd\xce\xb7\xcf\x82', '\xce\x91\xce\x9d\xce\xa4\xcf\x8e\xce\x9d\xce\x97\xce\xa3', '\xce\xb1\xce\xbd\xcf\x84\xcf\x8e\xce\xbd\xce\xb7\xcf\x82')
Item found in 0, 0
Isn't that strange? It prints OK and then it doesn't. Does anybody know what I can do?
P.S.: Here are the functions:
# -*- coding: utf-8 -*-
class Search():
    # A function that removes accents (tonos marks):
    def indentation(self, name):
        a = name
        b = ["ά", "Ά", "ή", "Ή", "ώ", "Ώ", "έ", "Έ", "ύ", "Ύ", "ί", "Ί", "ό", "Ό"]
        c = ['α', 'Α', 'η', 'Η', 'ω', 'Ω', 'ε', 'Ε', 'υ', 'Υ', 'ι', 'Ι', 'ο', 'Ο']
        for i in b:
            a = a.replace(i, c[b.index(i)])
        return a
    # A function that makes letters capital:
    def capital(self, name):
        a = name
        greek_small = ["α", "β", "γ", "δ", "ε", "ζ", "η", "θ", "ι", "κ", "λ", "μ", "ν", "ξ", "ο", "π", "ρ", "σ", "τ", "υ", "φ", "χ", "ψ", "ω", "ς"]
        greek_capital = ["Α", "Β", "Γ", "Δ", "Ε", "Ζ", "Η", "Θ", "Ι", "Κ", "Λ", "Μ", "Ν", "Ξ", "Ο", "Π", "Ρ", "Σ", "Τ", "Υ", "Φ", "Χ", "Ψ", "Ω", "Σ"]
        for i in greek_small:
            a = a.replace(i, greek_capital[greek_small.index(i)])
        return a
    # A function that makes letters lower:
    def lower(self, name):
        a = name
        greek_small = ["α", "β", "γ", "δ", "ε", "ζ", "η", "θ", "ι", "κ", "λ", "μ", "ν", "ξ", "ο", "π", "ρ", "σ", "τ", "υ", "φ", "χ", "ψ", "ω", "ς"]
        greek_capital = ["Α", "Β", "Γ", "Δ", "Ε", "Ζ", "Η", "Θ", "Ι", "Κ", "Λ", "Μ", "Ν", "Ξ", "Ο", "Π", "Ρ", "Σ", "Τ", "Υ", "Φ", "Χ", "Ψ", "Ω", "Σ"]
        for i in greek_capital:
            a = a.replace(i, greek_small[greek_capital.index(i)])
        return a
Basically, it capitalizes or lowers Greek characters...
SOLUTION!!!:
Steve solved the initial problem and based on what he said, I came up with this that solves everything:
name = str(self.lineEdit.text().toUtf8())
self.let_change = Search()
name_no_ind = self.let_change.indentation(name)
name_cap = self.let_change.capital(name)
name_low = self.let_change.lower(name)
name_list = [name, name_no_ind, name_cap, name_low]
col = self.combobox.currentIndex()
row = 0
for i in range(0, self.tableWidget.rowCount()):
    try:
        item_ = str(self.tableWidget.item(row, col).text().toUtf8())
        find_no_ind = self.let_change.indentation(item_)
        find_cap = self.let_change.capital(item_)
        find_lower = self.let_change.lower(item_)
        item_list = [find_no_ind, find_cap, find_lower]
        for x in name_list:
            for y in item_list:
                if x in y:
                    self.tableWidget.setItemSelected(self.tableWidget.item(row, col), True)
        row += 1
    except AttributeError:
        row += 1
I would say that one or both of self.let_change.capital(name) and self.let_change.lower(name) is overwriting it by using the name of the input parameter, or possibly changing the encoding. Since you have not posted the code for them, I cannot be sure.
Sorry, they are not the problem. The problem is that you are printing them differently:
>>> print(capital(name))
ΑΝΤΩΝΗΣ
>>> print(capital(name), name)
('\xce\x91\xce\x9d\xce\xa4\xce\xa9\xce\x9d\xce\x97\xce\xa3', '\xce\x91\xce\xbd\xcf\x84\xcf\x89\xce\xbd\xce\xb7\xcf\x82')
>>> print(capital(name))
ΑΝΤΩΝΗΣ
>>> print(name, name)
('\xce\x91\xce\xbd\xcf\x84\xcf\x89\xce\xbd\xce\xb7\xcf\x82', '\xce\x91\xce\xbd\xcf\x84\xcf\x89\xce\xbd\xce\xb7\xcf\x82')
>>> print(name,)
('\xce\x91\xce\xbd\xcf\x84\xcf\x89\xce\xbd\xce\xb7\xcf\x82',)
>>> print(name)
Αντωνης
>>> print("%s = %s" % (name, capital(name)))
Αντωνης = ΑΝΤΩΝΗΣ
>>>
So you either need separate print statements or the use of a format string.
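The session above is Python 2, where print(a, b) prints a tuple and every element of a tuple is shown through repr(). For readers on Python 3, roughly the same effect can be reproduced with a bytes value. This is only a sketch, with hypothetical helper names as_text and as_tuple_text, and UTF-8 bytes standing in for the Python 2 byte string.

```python
# UTF-8 bytes stand in for the Python 2 byte string from the question.
name = "Αντωνης".encode("utf-8")


def as_text(value):
    """Decode the bytes and format them with %s, like printing a single value."""
    return "%s" % value.decode("utf-8")


def as_tuple_text(*values):
    """Format a tuple of values; each element is rendered via repr()."""
    return str(values)


print(as_text(name))              # readable Greek text
print(as_tuple_text(name, name))  # escaped bytes, because repr() is used
```

The data is identical in both cases; only the way it is rendered differs, which is why a format string fixes the display.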
I have a column of data (easily imported from Google Docs thanks to gspread) that I'd like to intelligently align. I ingest entries into a dictionary. Input can include email, twitter handle or a blog URL. For example:
mike.j@gmail.com
@mikej45
j.mike@world.eu
_http://tumblr.com/mikej45
Right now, the "dumb" version is:
import re
def NomineeCount(spreadsheet):
    worksheet = spreadsheet.sheet1
    nominees = worksheet.col_values(6)  # F = 6
    unique_nominees = {}
    pattern = re.compile(r'\s+')
    for c in nominees:
        # Remove all whitespace from the entry
        c = re.sub(pattern, '', c)
        if c in unique_nominees:  # If we already have the name
            unique_nominees[c] += 1
        else:
            unique_nominees[c] = 1
    # Print out the alphabetical list of nominees with leading vote count
    for w in sorted(unique_nominees.keys()):
        print(str(unique_nominees[w]).rjust(2) + " " + w)
    return nominees
What's an efficient(-ish) way to add in some smarts during the if process?
You can try with defaultdict:
from collections import defaultdict
unique_nominees = defaultdict(lambda: 0)
unique_nominees[c] += 1
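Along the same lines, collections.Counter does the counting in one step and leaves room for a normalisation pass before the entries are counted. The sketch below is an illustration only: the function name count_nominees and the sample entries are made up, and lowercasing the entries is an assumption about what "the same nominee" means.

```python
import re
from collections import Counter


def count_nominees(entries):
    """Count entries after removing whitespace and lowercasing them."""
    # Assumption: entries that differ only in case or spacing are the same nominee.
    cleaned = (re.sub(r'\s+', '', entry).lower() for entry in entries)
    return Counter(entry for entry in cleaned if entry)


votes = count_nominees(['@mikej45', ' @MikeJ45 ', 'j.mike@world.eu', ''])
for name, total in sorted(votes.items()):
    print(str(total).rjust(2), name)
```

Any extra "smarts" (for example mapping a Twitter handle and an email address to one person) would go into the cleaning step, before the Counter sees the value.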
How can I do the following in Python:
I have a command whose output looks like this:
Datexxxx
Clientxxx
Timexxx
Datexxxx
Client2xxx
Timexxx
Datexxxx
Client3xxx
Timexxx
And I want to turn this into a dict like:
Client:(date,time), Client2:(date,time) ...
After reading the data into a string subject, you could do this:
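import re
d = {}
# A raw string is needed here: in verbose mode (?x) literal newline characters
# in the pattern are ignored, so \r and \n must reach the regex engine as escapes.
for match in re.finditer(
        r"""(?mx)
        ^Date(.*)\r?\n
        Client\d*(.*)\r?\n
        Time(.*)""",
        subject):
    # group(1): text after Date, group(2): text after Client, group(3): text after Time
    d[match.group(2)] = (match.group(1), match.group(3))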
How about something like:
rows = {}
thisrow = []
for line in output.split('\n'):
    if line[:4].lower() == 'date':
        thisrow.append(line)
    elif line[:6].lower() == 'client':
        thisrow.append(line)
    elif line[:4].lower() == 'time':
        thisrow.append(line)
    elif line.strip() == '':
        # A blank line ends the record: key it by the client line
        rows[thisrow[1]] = (thisrow[0], thisrow[2])
        thisrow = []
print(rows)
This assumes the records are separated by blank lines, the output ends with a trailing newline, no line has leading spaces, etc.
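If the output has no blank lines between records, as in the sample in the question, another option is to walk the non-blank lines three at a time. This is a sketch that assumes every record is exactly a Date line, a Client line and a Time line in that order; the function name parse_records and the sample values are made up.

```python
def parse_records(output):
    """Map each client line to its (date, time) pair, three lines per record."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    records = {}
    # Step through the non-blank lines in groups of three: Date, Client, Time.
    for i in range(0, len(lines) - 2, 3):
        date, client, time = lines[i:i + 3]
        records[client] = (date, time)
    return records


sample = "Date0101\nClientA\nTime0900\nDate0102\nClient2B\nTime1000\n"
print(parse_records(sample))
```

An incomplete trailing record (fewer than three lines) is ignored rather than raising an error.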
What about using a dict with tuples?
Create a dictionary and add the entries:
clients = {}  # named clients so the built-in dict is not shadowed
clients['Client'] = ('date1', 'time1')
clients['Client2'] = ('date2', 'time2')
Accessing the entries:
>>> clients['Client']
('date1', 'time1')