I’m struggling to write a Python script to process a file and produce an output text file containing the tickets in a format that is ready for printing via a dot matrix printer. For reference I have also attached an example of what the resultant text file should look like.
ConcertTickets.txt and
ConcertTickets_result.txt
My major problem is architecting an approach to this problem. I can’t figure out how to print column by column. I was able to read the file, print row by row, do the validation and write the file with a new name. I’m not sure how to handle layout_name, columns, column_width, column_spacing, left_margin, row_spacing and line_item; the best I could do was ljust() for the left margin between the tickets.
I don’t expect someone to do the work for me, but would greatly appreciate tips on architectural approaches with and without third party packages.
The input concert ticket file consists of a header containing formatting information and a body containing the actual tickets.
The header lines are as follows:
download_datetime - the date and time the file was downloaded
order_datetime - the date and time the order for the tickets was placed
layout_name - the name of the layout used for formatting the tickets
columns - the number of columns of tickets per page width
column_width - the width of each ticket column
column_spacing - the number of spaces between ticket columns
left_margin - the leading space to the left of the first ticket column
row_spacing - the number of horizontal lines between tickets
line_item - the line items represent how the ticket elements must appear in the ticket, e.g. the PIN at the top, followed by two empty lines, then the description, serial number and expiry date. Valid values for line items are: pin, description, serial_number, expiry_date and empty (space)
ticket_summary - Each ticket summary contains the ticket description followed by the number of tickets of that type in the file and the total face value of those tickets, e.g. "Gold 10.00,10,100.00" means there are 10 Gold $10.00 tickets to the value of $100.00 in the file
ticket_fields - the ticket fields indicate the fields and their order that are present in the ticket data that follows. This is the last line of the header and all data that follows this line should be interpreted as body data, i.e. the actual tickets in a CSV type format
The script also needs to do some basic file validation by checking that the number of actual tickets in the body of the file matches the ticket summary values in the header of the file. If file validation fails, the program must exit with an appropriate error message.
The resultant output file name must be the same as the input file name, but with the word "_result" appended to it just before the file extension. E.g. if the input file name is ConcertTickets.txt then the output file name must be ConcertTickets_result.txt
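(For the renaming itself, I assume os.path.splitext would handle this more robustly than the split('.') I use in my code below:)

import os

def result_name(input_name):
    # "ConcertTickets.txt" -> "ConcertTickets_result.txt"
    root, ext = os.path.splitext(input_name)
    return root + "_result" + ext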
I also need to develop a set of test cases for the script.
This is my code thus far
data = []
data_description = []
data_pin = []
data_serial_number = []
data_expiry_date = []
tickets_in_body = 0

# read file from line 19 and create two-dimensional array
result_f = open('ConcertTickets.txt')
for each_line in result_f.readlines()[18:]:
    (description, pin, serial_number, expiry_date) = each_line.split(',')
    data_description.append(description)
    data_pin.append(pin)
    data_serial_number.append(serial_number)
    data_expiry_date.append(expiry_date.replace("\r\n", ""))
    tickets_in_body += 1
data = [data_description, data_pin, data_serial_number, data_expiry_date]

# ticket validation and writing to file
result_golden_summary = open('ConcertTickets.txt')
golden_summary = result_golden_summary.readlines()
(golden_description, golden_summary_amount, golden_summary_value) = golden_summary[15 - 1].split(',')
if int(golden_summary_amount) != tickets_in_body:
    print('The ticket summary in the header does not match the amount of tickets in body')
else:
    (filename, extension) = result_f.name.split('.')
    result_f = open(filename + "_result.txt", 'w')
    for row in data:
        result_f.write("".join(str(item).ljust(25) for item in row))
    result_f.close()
Here's some code for you:
import math

result_f = open('ConcertTickets.txt')
all_lines_arr = []
for each_line in result_f.readlines()[18:]:
    (description, pin, serial_number, expiry_date) = each_line.split(',')
    line_dict = {}
    line_dict["description"] = description
    line_dict["pin"] = pin
    line_dict["serial_number"] = serial_number
    line_dict["expiry_date"] = expiry_date.strip()
    all_lines_arr.append(line_dict)

per_row = 5
line_space = 30
rows = math.ceil(len(all_lines_arr) / per_row)
for i in range(0, rows):
    row_val = (i * per_row) + per_row
    if row_val > len(all_lines_arr):
        row_val = row_val - (row_val - len(all_lines_arr))
    for j in range((i * per_row), row_val):
        print(all_lines_arr[j]["pin"] + (line_space - len(all_lines_arr[j]["pin"]) % line_space) * " ", end="")
    print("\n" * 2)
    for j in range((i * per_row), row_val):
        print(all_lines_arr[j]["description"] + (line_space - len(all_lines_arr[j]["description"]) % line_space) * " ", end="")
    print()
    for j in range((i * per_row), row_val):
        print(all_lines_arr[j]["serial_number"] + (line_space - len(all_lines_arr[j]["serial_number"]) % line_space) * " ", end="")
    print()
    for j in range((i * per_row), row_val):
        print(all_lines_arr[j]["expiry_date"] + (line_space - len(all_lines_arr[j]["expiry_date"]) % line_space) * " ", end="")
    print("\n" * 5)
First we read the lines and put them into an array of dictionaries, i.e. each array element is a dictionary with addressable values such as description.
Next, we use per_row to decide how many tickets to print per row (you can change this).
Then the code will print the dictionary values for each element in the array.
The key to the formatting is using the modulus operator % to pad each field with the correct number of spaces. I used 30 as the column width.
I stripped out a lot of your code in order to just do the print formatting for you. It will be up to you to modify this to print to file or do anything else you need it to.
It is a bit too hardcoded for my liking, but without knowing more about exactly what you need, it works for your simple case.
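If you want to go further and avoid hardcoding per_row and line_space, you could drive the layout from the header fields listed in your question. A sketch, with the header values assumed as literals rather than parsed from the file:

# Sketch: derive the layout from the header instead of hardcoding it.
# These values would come from parsing the header; samples assumed here.
columns = 3          # tickets per page width
column_width = 25    # width of each ticket column
column_spacing = 5   # spaces between ticket columns
left_margin = 2      # leading spaces before the first ticket column
row_spacing = 2      # blank lines between rows of tickets

tickets = [
    {"pin": "1234", "description": "Gold 10.00",
     "serial_number": "SN001", "expiry_date": "2023-01-01"},
    {"pin": "5678", "description": "Gold 10.00",
     "serial_number": "SN002", "expiry_date": "2023-01-01"},
]

def render_line(row, field):
    # One printed line holds the same field from every ticket in the row.
    cells = (t[field].ljust(column_width) for t in row)
    return " " * left_margin + (" " * column_spacing).join(cells)

for start in range(0, len(tickets), columns):
    row = tickets[start:start + columns]
    for field in ("pin", "description", "serial_number", "expiry_date"):
        print(render_line(row, field))
    print("\n" * row_spacing, end="")

The same render_line helper then works for whatever line_item order the header describes.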
Hope this helps!
This is the recommended way to open (and close) a file:
# open file for reading ('r')
with open('ConcertTickets.txt', 'r') as file:
    for line in file.readlines()[18:]:
        # your logic

# open result file for writing ('w'); the file is created if it does not
# exist, and '+' additionally opens it for reading
with open('ConcertTickets_result.txt', 'w+') as file:
    # your logic
My goal is to count occurrences of a substring in a list of strings
passwordlog.txt:
2022-09-12 03:22:18.170604, password < 6
2022-09-12 03:22:33.878446, password > 10
2022-09-12 03:22:40.686602, password > 10
My program:
# Create list from contents of log file
log_list = open("passwordlog.txt").read().splitlines()
# Print lines of list until end of list
for i in range(0, len(log_list)):
    print(log_list[i])
# Output count of passwords below minimum and above maximum length
print("Amount of invalid passwords below minimum length:", sum("< 6" in i for i in log_list))
print("Amount of invalid passwords above maximum length:", sum("> 10" in i for i in log_list))
I understand the majority of this code, but the first and final lines confuse me immensely. They produce the intended results, but I don't understand how they work, particularly ".read().splitlines()" and "in i for i in".
To be very clear, I'm quite a beginner, so please explain these parts of the code as explicitly as you can. Maybe there's also a more beginner-friendly method that could replace the first and last two lines, which you could also please explain as explicitly as possible.
Well, there are many ways to change this code and make it more beginner-friendly.
You can split the first line into multiple, simpler lines.
smallerSix = 0
greaterTen = 0
with open("passwordlog.txt") as log_file:
    lines = log_file.read()
    lines = lines.splitlines()
for item in lines:
    if "< 6" in item:
        smallerSix += 1
    if "> 10" in item:
        greaterTen += 1
print("Amount of invalid passwords below minimum length:", smallerSix)
print("Amount of invalid passwords above maximum length:", greaterTen)
with open("passwordlog.txt") as log_file opens the file and creates a reference, log_file. With that reference you can read or edit the file's contents.
lines = log_file.read() reads the content of log_file and copies it into lines as one string.
lines = lines.splitlines() splits that string at every line break, which creates a list. Each element of the list contains one line (a string).
Now you have a list with every line of the log_file. The next step is to search each line for "< 6" and "> 10". We can do that by going through each element of the list and checking each string for those specific substrings.
To do that we loop through each element of the list: for item in lines.
Now we check each item for the specific strings "< 6" and "> 10". If one of them is in item, the corresponding counter is increased by 1.
At the end you just print out the variable values.
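As for the original one-liner, sum("< 6" in i for i in log_list) works because each membership test produces a boolean, and Python counts True as 1 when summing. A small demonstration of the same idea, with made-up lines:

log_list = ["a, password < 6", "b, password > 10", "c, password > 10"]

# "< 6" in line is a membership test: True if the substring occurs in the line.
flags = ["< 6" in line for line in log_list]
print(flags)       # [True, False, False]

# sum() treats True as 1 and False as 0, so summing the flags counts matches.
print(sum(flags))  # 1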
I have been writing a program to get all the coordinates within a ten-mile radius of a given point. When the distance data is printed, its output is different from the data in the file. Not only that, but the program creates a blank line at the end of the file. What should I do?
import geopy.distance

distance_data = open("Distance from start.txt", "w")
distance_data.truncate()
distance_data_to_add = []
for i in range(360):
    bearing = i
    lat = 51.8983
    long = 177.1822667
    for i in range(10):
        distance = i
        new_lat_long = geopy.distance.distance(miles=distance).destination((lat, long), bearing=bearing)
        distance_data_to_add.append(new_lat_long)
for element in distance_data_to_add:
    distance_data.write(str(element) + "\n")
print(distance_data_to_add)
An example line from the file is:
51 56m 30.0669s N, 177 10m 51.749s E
An example of the printed info in the console is:
Point(51.94168524994642, 177.1810413957121, 0.0)
The reason the objects look different in a list is that you are seeing their repr version, not their str version. So write repr(element) to the file instead of str(element).
The reason there is a newline at the end of the file is that you write \n after every element.
Replace this:
for element in distance_data_to_add:
distance_data.write(str(element) + "\n")
with this:
distance_data.write('\n'.join(map(repr, distance_data_to_add)))
That will write the repr of each object, with a newline between each one (but not at the end).
And don't forget distance_data.close() after you finish writing your file.
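If you want to see the str/repr difference in isolation, here is a minimal illustration with a made-up class (not geopy's actual Point, just the same mechanism):

class Coord:
    def __init__(self, lat, lon):
        self.lat, self.lon = lat, lon

    def __str__(self):
        # Human-friendly form, used by print(obj) and str(obj)
        return f"{self.lat} N, {self.lon} E"

    def __repr__(self):
        # Unambiguous form, used for elements inside containers
        return f"Coord({self.lat}, {self.lon})"

c = Coord(51.94, 177.18)
print(c)    # 51.94 N, 177.18 E
print([c])  # [Coord(51.94, 177.18)]  <- lists show the repr of their elements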
So I have a file that looks like this
mass (GeV) spectrum (1-100 GeV)
10 0.06751019803888393
20 0.11048827045815585
30 0.1399367785958526
40 0.1628781532692572
I want to multiply the spectrum by half or any percentage, then create a new file with the same data, but with the spectrum column replaced by the original spectrum multiplied by the multiplier.
DM_file = input("Name of DM.in file: ")  # name of file is DMmumu.in
print(DM_file)
n = float(input('Enter the percentage of annihilation: '))
N = n * 100
pct = (1 - n)
counter = 0
with open(DM_file, 'r+') as f:
    with open('test.txt', 'w') as output:
        lines = f.readlines()
        print(type(lines))
        Spectrumnew = []
        Spectrum = []
        for i in range(8, 58):
            single_line = lines[i].split("\t")
            old_number = single_line[1]
            new_number = float(single_line[1]) * pct
            Spectrumnew.append(new_number)
            Spectrum.append(old_number)
            f.replace(Spectrum, Spectrumnew)
            output.write(str(new_number))
The problem I'm having is that f.replace(Spectrum,Spectrumnew) is not working, and if I comment it out, a new file called test.txt is created with just Spectrumnew and nothing else. What is wrong with f.replace? Am I using the wrong string method?
replace is a function that works on strings. f is not a string. (For that matter, neither is Spectrum nor Spectrumnew.)
You need to construct the line you want in the output file as a string and then write it out. You already have string output working. To construct the output line, you can just concatenate the first number from the input, a tab character, and the product of the second number and the multiplier. You can convert a number to a string with the str() function and you can concatenate strings with the + operator.
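A minimal sketch of that construction, reusing DM_file and pct from your code (the tab-separated layout and the line range 8-58 are taken from the question):

with open(DM_file) as f, open('test.txt', 'w') as output:
    lines = f.readlines()
    for i in range(8, 58):
        mass, spectrum = lines[i].split("\t")
        # first number + tab + scaled second number, one line per row
        output.write(mass + "\t" + str(float(spectrum) * pct) + "\n")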
There are several more specific answers on this site already that may be helpful, such as replacing text in a file with Python.
I have a text file that contains users' usernames, passwords and highest scores. I want to overwrite a user's high score when they achieve a new one, but I only want to overwrite that specific value and no others.
This is my text file (called 'users.txt') :
david 1234abc 34 hannah 5678defg 12 conor 4d3c2b1a 21
For example, if 'hannah' gets a new score of 15, I want to change 12 to 15
Here is what I've tried:
# splitting the file
file = open("users.txt", "r")
read = file.read()
users = read.split()
file.close()

# finding indexes for username, password and score
usernamePosition1 = users.index(user)
passwordPosition1 = usernamePosition1 + 1
scorePosition1 = passwordPosition1 + 1

file = open("users.txt", "a")
# setting previous high score to an integer
player1OldScore = int(users[scorePosition1])
if player1Score > player1OldScore:
    # setting it back to a str for the text file
    player1ScoreStr = str(player1Score)
    # from here on I don't really know what I was doing
    users.insert([scorePosition1], player1ScoreStr)
    file.write(users)
    print(player2 + "\n \nAchieved a new high score")
else:
    print("\n \n" + player1 + " , you didn't achieve a new high score")
Your text file format is rather brittle. If David uses "hannah" as a password, then when Hannah tries to update her score, instead of locating her score (the sixth field), it will find her name as the second field and try using the fourth field (her name) as her score! Anyone using a space in their password would also cause problems, although a sneaky person could use “abcd 1000000” as their initial password and seed their initial score as one million.
These problems can be fixed by:
- using one line per user, or
- searching for user names only in the first of every 3 fields,
and
- disallowing spaces in passwords, or
- encoding/encrypting the passwords.
In any case, you must read in and store the existing data, and then write out the entire dataset to the file. The reason is that the data is not stored in fixed-width fields. A score changing from 99 to 100 would require moving all subsequent characters of the file one character forward, which is not a modification you can make without reading and rewriting the file in its entirety.
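For example, with one record per line ("name password score"), the whole read-modify-write looks roughly like this (a sketch; the field layout and function name are assumptions):

def update_score(user, new_score, path="users.txt"):
    # Read and store the existing data.
    with open(path) as f:
        lines = f.read().splitlines()
    for i, line in enumerate(lines):
        name, password, score = line.split()
        # Match on the name field only, never on the password field.
        if name == user and new_score > int(score):
            lines[i] = " ".join([name, password, str(new_score)])
    # Write the entire dataset back out.
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")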
You are going to need to find and replace the strings. This means you will need to format the users.txt file in a way that lets you simply replace the user data. If you have each user and their data on a separate line, this should be fairly easy:
# Read the whole file, replace that user's record, and write it back.
with open("users.txt", "r") as s:
    content = s.read()
print(content)
content = content.replace('hannah 5678defg 12', 'hannah gfed8765 21')
print(content)
with open("users.txt", "w") as s:
    s.write(content)
You have the right idea (note that your code will only work for one user, but I'll let you figure out how to extend it), but there is no way to change the file without writing the entire file.
As such I recommend something like this:
...
file = open("users.txt", "w")  # change this from 'a' to 'w' to overwrite

player1OldScore = int(users[scorePosition1])
if player1Score > player1OldScore:
    users[scorePosition1] = str(player1Score)  # change the score
    file.write(" ".join(users))  # write a string with spaces between elements
    print(player1 + "\n \nAchieved a new high score")
...
I have two files A and B in FASTQ format, which are basically several hundred million lines of text organized in groups of 4 lines starting with an @ as follows:
@120412_SN549_0058_BD0UMKACXX:5:1101:1156:2031#0/1
GCCAATGGCATGGTTTCATGGATGTTAGCAGAAGACATGAGACTTCTGGGACAGGAGCAAAACACTTCATGATGGCAAAAGATCGGAAGAGCACACGTCTGAACTCN
+120412_SN549_0058_BD0UMKACXX:5:1101:1156:2031#0/1
bbbeee_[_ccdccegeeghhiiehghifhfhhhiiihhfhghigbeffeefddd]aegggdffhfhhihbghhdfffgdb^beeabcccabbcb`ccacacbbccB
I need to compare the
5:1101:1156:2031#0/
part between files A and B and write the groups of 4 lines in file B that matched to a new file. I got a piece of code in Python that does that, but it only works for small files, as it parses through all the @-lines of file B for every @-line in file A, and both files contain hundreds of millions of lines.
Someone suggested that I should create an index for file B; I have googled around without success and would be very grateful if someone could point out how to do this or let me know of a tutorial so I can learn. Thanks.
==EDIT==
In theory each group of 4 lines should only exist once in each file. Would it increase the speed enough to break out of the parsing after each match, or do I need a different algorithm altogether?
An index is just a shortened version of the information you are working with. In this case, you will want the "key" - the text between the first colon (':') on the @-line and the final slash ('/') near the end - as well as some kind of value.
Since the "value" in this case is the entire contents of the 4-line block, and since our index is going to store a separate entry for each block, we would be storing the entire file in memory if we used the actual value in the index.
Instead, let's use the file position of the beginning of the 4-line block. That way, you can move to that file position, print 4 lines, and stop. Total cost is the 4 or 8 or however many bytes it takes to store an integer file position, instead of however-many bytes of actual genome data.
Here is some code that does the job, but also does a lot of validation and checking. You might want to throw stuff away that you don't use.
import sys

def build_index(path):
    index = {}
    for key, pos, data in parse_fastq(path):
        if key not in index:
            # Don't overwrite duplicates - use the first occurrence.
            index[key] = pos
    return index

def error(s):
    sys.stderr.write(s + "\n")

def extract_key(s):
    # This much is fairly constant:
    assert(s.startswith('@'))
    (machine_name, rest) = s.split(':', 1)
    # Per Wikipedia, this changes in different variants of FASTQ format:
    (key, rest) = rest.split('/', 1)
    return key

def parse_fastq(path):
    """
    Parse the 4-line FASTQ groups in path.
    Validate the contents, somewhat.
    """
    f = open(path)
    i = 0
    # Note: iterating a file is incompatible with fh.tell(). Fake it.
    pos = offset = 0
    for line in f:
        offset += len(line)
        lx = i % 4
        i += 1
        if lx == 0:    # @machine: key
            key = extract_key(line)
            len1 = len2 = 0
            data = [line]
        elif lx == 1:  # genome data
            data.append(line)
            len1 = len(line)
        elif lx == 2:  # +machine: key or something
            assert(line.startswith('+'))
            data.append(line)
        else:          # lx == 3: quality data
            data.append(line)
            len2 = len(line)
            if len2 != len1:
                error("Data length mismatch at line "
                      + str(i - 2)
                      + " (len: " + str(len1) + ") and line "
                      + str(i)
                      + " (len: " + str(len2) + ")\n")
            #print "Yielding #%i: %s" % (pos, key)
            yield key, pos, data
            pos = offset
    if i % 4 != 0:
        error("EOF encountered in mid-record at line " + str(i))

def match_records(path, index):
    results = []
    for key, pos, d in parse_fastq(path):
        if key in index:
            # Found a match!
            results.append(key)
    return results

def write_matches(inpath, matches, outpath):
    rf = open(inpath)
    wf = open(outpath, 'w')
    for m in matches:
        rf.seek(m)
        wf.write(rf.readline())
        wf.write(rf.readline())
        wf.write(rf.readline())
        wf.write(rf.readline())
    rf.close()
    wf.close()

#import pdb; pdb.set_trace()
index = build_index('afile.fastq')
matches = match_records('bfile.fastq', index)
posns = [index[k] for k in matches]
write_matches('afile.fastq', posns, 'outfile.fastq')
Note that this code goes back to the first file to get the blocks of data. If your data is identical between files, you would be able to copy the block from the second file when a match occurs.
Note also that depending on what you are trying to extract, you may want to change the order of the output blocks, and you may want to make sure that the keys are unique, or perhaps make sure the keys are not unique but are repeated in the order they match. That's up to you - I'm not sure what you're doing with the data.
These guys claim to parse files of a few gigabytes using a dedicated library (Biopython's SeqIO); see http://www.biostars.org/p/15113/
from Bio import SeqIO

fastq_parser = SeqIO.parse(fastq_filename, "fastq")
wanted = (rec for rec in fastq_parser if ...)
SeqIO.write(wanted, output_file, "fastq")
A better approach IMO would be to parse it once and load the data into some database (e.g. MySQL) instead of that output_file, and later run the queries there.
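A minimal sketch of that idea, using the standard library's sqlite3 in place of MySQL (the database file, table and column names are made up, and the full record ID is used as the key for simplicity):

import sqlite3
from Bio import SeqIO

conn = sqlite3.connect("reads.db")
conn.execute("CREATE TABLE IF NOT EXISTS reads (key TEXT PRIMARY KEY, record TEXT)")

# Parse file A once and load every record, keyed by its ID.
with conn:
    for rec in SeqIO.parse("afile.fastq", "fastq"):
        conn.execute("INSERT OR IGNORE INTO reads VALUES (?, ?)",
                     (rec.id, rec.format("fastq")))

# Later, look up file B's keys with indexed queries instead of rescanning A.
for rec in SeqIO.parse("bfile.fastq", "fastq"):
    row = conn.execute("SELECT record FROM reads WHERE key = ?",
                       (rec.id,)).fetchone()
    if row:
        print(row[0], end="")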