Using txt files in Python

I have a txt file which I need to access through Python. The file holds football league results in CSV format: each row lists a team's games played, won and lost. From this I need to calculate each team's points (2 points for a win, 0 for a loss). I have an idea of how to start, but I'm not sure I have started on the right foot.
How do I calculate the total points for each team? And can I print headings above the data from the txt file (Team, Played, Won, Lost, Total)? Any support would be appreciated.
CSV Data:
Liverpool,19,7,12
Chelsea,19,8,11
Arsenal,19,0,19
Tottenham,19,7,12
Man Utd,19,7,12
Man City,19,5,14
Southampton,19,3,16
Code:
text_file = open ("leagueResults.txt","r")
print (text_file.read())
text_file.close()

As mentioned in the comments, you should look into the csv module.
However, since I assume you have just started learning Python and the problem is relatively simple, we can do it by reading the file line by line and splitting each line on the delimiter ','.
team_name = []
games_won = []
num_records = 0
with open('leagueResults.txt') as f:
    for line in f:
        record = line.strip().split(',')
        team_name.append(record[0])
        games_won.append(record[2])
        num_records += 1
print("Points Table")
print("============")
for i in range(0, num_records):
    print("%s: %d" % (team_name[i], (int(games_won[i]) * 2)))
Output:
Points Table
============
Liverpool: 14
Chelsea: 16
Arsenal: 0
Tottenham: 14
Man Utd: 14
Man City: 10
Southampton: 6
Notice that I only use team_name and games_won, since those are the only two values needed to calculate each team's points in this problem (games_played is always 19, and games_lost has no effect on the total because a loss is worth 0 points).
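For the csv module route and the headings asked about in the question, a minimal sketch could look like the following. The function name build_table, the column widths and the in-memory sample are assumptions made for illustration; with the real file you would pass an object from open('leagueResults.txt', newline='') instead of the StringIO.

```python
import csv
import io

POINTS_PER_WIN = 2  # from the question: 2 points for a win, 0 for a loss


def build_table(lines):
    """Return (header, rows); each row is [team, played, won, lost, total]."""
    header = ["Team", "Played", "Won", "Lost", "Total"]
    rows = []
    for record in csv.reader(lines):
        if not record:  # skip blank lines
            continue
        team = record[0]
        played, won, lost = (int(value) for value in record[1:4])
        rows.append([team, played, won, lost, won * POINTS_PER_WIN])
    return header, rows


# In-memory stand-in for leagueResults.txt (first three rows of the data).
sample = io.StringIO("Liverpool,19,7,12\nChelsea,19,8,11\nArsenal,19,0,19\n")
header, rows = build_table(sample)

row_format = "{:<12}{:>7}{:>5}{:>6}{:>7}"  # column widths are arbitrary
print(row_format.format(*header))
for row in rows:
    print(row_format.format(*row))
```

Converting the numeric columns to int while reading keeps the later arithmetic and formatting simple.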

Related

Formatted strings, decimals and commas question

I have a .txt file that I read in and wish to create formatted strings using these values. Columns 3 and 4 need thousands separators (commas), and the last column needs 2 decimal places and a percent sign. The formatted string will say something like "The overall attendance at Bulls was 894659, average attendance was 21,820 and the capacity was 104.30%".
the shortened .txt file has these lines:
1 Bulls 894659 21820 104.3
2 Cavaliers 843042 20562 100
3 Mavericks 825901 20143 104.9
4 Raptors 812863 19825 100.1
5 NY_Knicks 812292 19812 100
So far my code looks like this and it's mostly working, minus the commas and decimal places.
file_1 = open('basketball.txt', 'r')
count = 0
list_1 = []
for line in file_1:
    count += 1
    textline = line.strip()
    items = textline.split()
    list_1.append(items)
print('Number of teams: ', count)
for line in list_1:
    print('Line: ', line)
file_1.close()
for line in list_1:  # iterate over the lines of the file and print the lines with formatted strings
    a, b, c, d, e = line
    print(f'The overall attendance at the {b} game was {c}, average attendance was {d}, and the capacity was {e}%.')
Any help with how to format the code to show the numbers with commas (21820 -> 21,820) and the last column with 2 decimals and a percent sign (104.3 -> 104.30%) is greatly appreciated.

You've got some options for how to tackle this.
Option 1: Using f-strings (Python 3.6+ only)
Since your provided code already uses f-strings, this solution should work for you. For others reading here, this will only work if you are using Python 3.6 or later.
You can do string formatting within f strings, signified by putting a colon : after the variable name within the curly brackets {}, after which you can use all of the usual python string formatting options.
Thus, you could just change one of your lines of code to get this done. Your print line would look like:
print(f'The overall attendance at the {b} game was {int(c):,}, average attendance was {int(d):,}, and the capacity was {float(e):.2f}%.')
The variables are getting interpreted as:
The {b} just prints the string b.
The {int(c):,} and {int(d):,} print the integer versions of c and d, respectively, with commas (indicated by the :,).
The {float(e):.2f} prints the float version of e with two decimal places (indicated by the :.2f).
Option 2: Using str.format()
For others here who are looking for a Python 2 friendly solution, you can change the print line to the following:
print("The overall attendance at the {} game was {:,}, average attendance was {:,}, and the capacity was {:.2f}%.".format(b, int(c), int(d), float(e)))
Note that both options use the same formatting syntax; the f-string option just has the benefit of letting you write each variable name right where it will appear in the printed string.
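To see the two format specifiers in isolation, here is a small self-contained sketch. The helper name describe is hypothetical, and the input is the first sample line from the question.

```python
def describe(line):
    """Format one whitespace-separated record from the attendance file."""
    rank, team, total, average, capacity = line.split()
    return (
        f"The overall attendance at the {team} game was {int(total):,}, "
        f"average attendance was {int(average):,}, "
        f"and the capacity was {float(capacity):.2f}%."
    )


message = describe("1 Bulls 894659 21820 104.3")
print(message)
```

The values must be converted with int() or float() first, because the ',' and '.2f' specifiers are not defined for strings.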

This is how I ended up doing it, very similar to the response from Bibit.
file_1 = open('something.txt', 'r')
count = 0
list_1 = []
for line in file_1:
    count += 1
    textline = line.strip()
    items = textline.split()
    items[2] = int(items[2])
    items[3] = int(items[3])
    items[4] = float(items[4])
    list_1.append(items)
print('Number of teams/rows: ', count)
for line in list_1:
    print('Line: ', line)
file_1.close()
for line in list_1:
    print('The overall attendance at the {:s} games was {:,}, average attendance was {:,}, and the capacity was {:.2f}%.'.format(line[1], line[2], line[3], line[4]))

Update my first line value in a txt file and add a line to the next line in the txt file

For example, let's say I spent 10 dollars on a book and I want to update the first value in the file, which is my total spending so far. It starts at zero, so it becomes 0 + 10. I also want to add a line below it recording how I spent the money, like this:
Total value : 10
Historic : Bought a science book with a value of 10 dollars
The first line will be treated as an integer and the second is just a string recording my spending history. Then I buy a piece of chocolate for 4 dollars:
Total value : 14
Historic : Bought a science book with a value of 10 dollars
Historic : Bought a chocolate bar for a value of 4 dollars
My code is below. I'm having trouble understanding this even after watching 3 videos. My problem is how to add a new last line and how to update the first line of an existing text file.
while True:
    try:
        x = int(input("Enter the value spend by Tino: "))
        break
    except:
        print("is not a number")
history = input("How did Tino spend this money?")
with open('tinodebt.txt') as file:
    read = file.readlines()
read[0] = str(int(read[0]) + x)
with open('tinodebt.txt','w') as file:
    file.write(read)
print("The value spent is : "+read[0]+ "$")
I tried to find a way to update the first line of a given text file and append new lines to it, but I can't work out how to write it in code.
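One possible approach, sketched under the assumption that the file uses exactly the layout shown in the question ("Total value : N" on the first line, followed by "Historic : ..." lines): read all lines, recompute the total, then rewrite the whole file. The function name record_spend is hypothetical, and the demo writes to a temporary directory rather than a real tinodebt.txt.

```python
import os
import tempfile


def record_spend(path, amount, description):
    """Add amount to the total on line 1 and append one history line."""
    lines = []
    if os.path.exists(path):
        with open(path) as f:
            lines = f.read().splitlines()
    # Assumed layout: "Total value : <int>" first, history lines after it.
    total = int(lines[0].split(":")[1]) if lines else 0
    total += amount
    history = lines[1:] + ["Historic : " + description]
    # A text file cannot be edited in place, so write everything back.
    with open(path, "w") as f:
        f.write("Total value : {}\n".format(total))
        f.write("\n".join(history) + "\n")
    return total


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "tinodebt.txt")
    record_spend(demo_path, 10, "Bought a science book with a value of 10 dollars")
    record_spend(demo_path, 4, "Bought a chocolate bar for a value of 4 dollars")
    with open(demo_path) as f:
        demo_contents = f.read()
print(demo_contents)
```

Note that write() expects a single string, so passing it the list returned by readlines() raises a TypeError; writelines() or "".join(...) would be needed for a list.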

Rewriting a txt file in python, creating new lines where there is a certain string

I have converted a PDF bank statement to a txt file. Here is a snippet of the .txt file:
15 Apr 20DDOPEN 100.00DDBENNON WATER SRVCS29.00DDBG BUSINESS106.00BPC BOB PETROL MINISTRY78.03BPC BARBARA STREAMING DATA30.50CRPAYPAL Z4J22FR450.00CRPAYNAL AAWDL4Z4J22222KHMG30.0019,028.4917 Apr 20CRCASH IN AT HSBC BANK
What is the easiest way of rewriting the text file in Python to create a new line at certain points? For instance, after a number ‘xx.xx’ there is a new date such as ‘xx APR’.
For example the text to become:
15 Apr 20DDOPEN 100.00
BENNON WATER SRVCS29.00
DDBG BUSINESS106.00...(etc)
I am just trying to make a PDF more readable and useful when working amongst my other files.
If you know of another PDF to txt python converter which works better, I would also be interested.
Thanks for your help

The first step would be getting the text file into Python:
with open("file.txt") as file:
    data = file.read()
Initially I thought you wouldn't be able to do this next part, but in your example each part contains a number XX.XX. The important thing to notice here is that there is a '.' in each number.
Using Python's string find command, you can iteratively look for that '.' and add a newline character after the two digits that follow it. You can change my indices below to remove the DD as well if you want.
index = 0
while index != -1:
    index = data.find('.', index)
    if index != -1:
        # insert the newline after the two decimal digits
        data = data[:index+3] + '\n' + data[index+3:]
        # continue searching after the inserted newline,
        # otherwise the same '.' is found again forever
        index += 4
Then you need to write the new data back to the file.
with open('file.txt', 'w') as file:
    file.write(data)

For the given input the following should work:
import re
counter = 0
l = "15 Apr 20DDOPEN 100.00DDBENNON WATER SRVCS29.00DDBG BUSINESS106.00BPC BOB PETROL MINISTRY78.03BPC BARBARA STREAMING DATA30.50CRPAYPAL Z4J22FR450.00CRPAYNAL AAWDL4Z4J22222KHMG30.0019,028.4917 Apr 20CRCASH IN AT HSBC BANK"
# raw string so that \d and \. reach the regex engine unchanged
nums = re.finditer(r"[\d]+[\.][\d]+", l)
for elem in nums:
    # spans refer to the original string, so shift by the newlines added so far
    idx = elem.span()[1] + counter
    l = l[:idx] + '\n' + l[idx:]
    counter += 1
print(l)
The output is:
15 Apr 20DDOPEN 100.00
DDBENNON WATER SRVCS29.00
DDBG BUSINESS106.00
BPC BOB PETROL MINISTRY78.03
BPC BARBARA STREAMING DATA30.50
CRPAYPAL Z4J22FR450.00
CRPAYNAL AAWDL4Z4J22222KHMG30.0019
,028.4917
Apr 20CRCASH IN AT HSBC BANK
Then you should easily be able to write it line by line to a file.
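In the output above, the balance and the next date run together ("30.0019", ",028.4917") because the pattern takes every digit after the decimal point. A variant that assumes every amount has exactly two decimal places, which holds for the sample but is not guaranteed for a whole statement, could be sketched as follows. The name split_after_amounts and the shortened sample string are for illustration only.

```python
import re

# Assumption: every money amount ends with exactly two decimal digits.
AMOUNT_RE = re.compile(r"\d+\.\d{2}")


def split_after_amounts(text):
    """Insert a newline after each amount with two decimal places."""
    return AMOUNT_RE.sub(lambda m: m.group(0) + "\n", text)


# Shortened tail of the sample line from the question.
sample = "DATA30.50CRPAYNAL KHMG30.0019,028.4917 Apr 20CRCASH IN AT HSBC BANK"
result_lines = split_after_amounts(sample).splitlines()
for line in result_lines:
    print(line)
```

With this pattern the running balance 19,028.49 ends up on its own line, and the following date starts a new line.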

Python checking if line above or below equals to phrase

I am trying to make an automated monthly cost calculator for my family. The idea is that whenever they shop, they take a picture of the receipt and send it to an e-mail address. A Python script downloads that picture and uses the Google Vision API to scan for the total amount, which then gets written into a .csv file for later use. (I have yet to make the csv part, so it's only being saved into txt files for now.)
This works because in my country all receipts look the same due to regulations. However, the Google Vision API returns the OCRed text line by line. What I am trying to do now is check the text line by line for the total amount, which is always in the format (Numbers space Currency), and then check whether the OCR messed something up, such as putting the "Total amount" label above or below the actual numbers.
My problem is that if I run this script on more than 3 .txt files of OCR data, it only gets the first 2 right, even though they look the same when I check them manually. If I run it on them one by one, it gets them right every time.
The OCR data looks like this:
Total amount:
1000 USD
or
1000 USD
Total amount:
My code so far:
import re
import os
import codecs
for files in os.listdir('texts/'):
    filedir="texts/"+str(files)
    with codecs.open(filedir,'rb','utf-8') as f:
        lines=f.readlines()
        lines=[l.strip() for l in lines]
    for index,line in enumerate(lines):
        match=re.search(r"(\d+) USD",line)
        if match:
            if lines[index+1].endswith("USD"):
                amount=re.sub(r'(\d)\s+(\d)',r'\1\2',lines[index])
                amount=amount.replace(" USD","")
                print(amount)
                with open('amount.txt',"a") as data:
                    data.write(amount)
                    data.write("\n")
            if lines[index-1].endswith("USD"):
                amount=re.sub(r'(\d)\s+(\d)',r'\1\2',lines[index])
                amount=amount.replace(" USD","")
                print(amount)
                with open('amount.txt',"a") as data:
                    data.write(amount)
                    data.write("\n")

Question: checking if line above or below equals to phrase
Simplify to the following:
Assumptions:
The Amount line has the following format (Numbers space Currency).
The exact phrase "Total amount:" always exists in the other line.
The two lines are separated by a blank line.
FILE1 = u"""Total amount:

1000 USD
"""
FILE2 = u"""1000 USD

Total amount:"""
import io
import os
import codecs
total = []
#for files in os.listdir('texts/'):
for files in [FILE1, FILE2]:
    # filedir="texts/"+str(files)
    # with codecs.open(filedir,'rb','utf-8') as f:
    with io.StringIO(files) as f:
        v1 = next(f).rstrip()
        # eat empty line
        next(f)
        v2 = next(f).rstrip()
    if v1 == 'Total amount:':
        total.append(v2.split()[0])
    else:
        total.append(v1.split()[0])
print(total)
# csv_writer.writerows(total)
Output:
[u'1000', u'1000']
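If the blank-line assumption does not always hold, a slightly more tolerant sketch is to drop empty lines first and then look at the lines directly below and above the "Total amount:" label. The function name extract_total is hypothetical, the currency is assumed to be USD as in the example, and digits separated by single spaces (such as "1 000") are joined, as the question's code attempts to do.

```python
import re

# Digits, optionally grouped by single spaces, followed by " USD".
AMOUNT_RE = re.compile(r"(\d+(?: \d+)*) USD")


def extract_total(text):
    """Return the amount next to 'Total amount:' as an int, or None."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for i, line in enumerate(lines):
        if line != "Total amount:":
            continue
        # Check the line below first, then the line above.
        for j in (i + 1, i - 1):
            if 0 <= j < len(lines):
                match = AMOUNT_RE.fullmatch(lines[j])
                if match:
                    return int(match.group(1).replace(" ", ""))
    return None


print(extract_total("Total amount:\n1000 USD\n"))
print(extract_total("1000 USD\nTotal amount:"))
```

The explicit bounds check avoids the IndexError that lines[index+1] raises on the last line, and avoids lines[index-1] silently wrapping around to the end of the list when index is 0.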

How to read then parse with split and write into a text file?

I'm struggling to get readline() and split() to work together as I was expecting. I'm trying to use .split(')') to cut down some data from a text file and write some of that data to a new text file.
I have tried writing everything from the line.
I have tried [cnt % 2] to get what I expected.
line = fp.readline()
fw = open('output.txt', "w+")
cnt = 1
while line:
    print("Line {}: {}".format(cnt, line.strip()))
    line = fp.readline()
    line = line.split(')')[0]
    fw.write(line + "\n")
    cnt += 1
Example from the text file I'm reading from:
WELD 190 Manufacturing I Introduction to MasterCAM (3)
1½ hours lecture - 4½ hours laboratory
Note: Cross listed as DT 190/ENGR 190/IT 190
This course will introduce the students to MasterCAM and 2D and basic 3D
modeling. Students will receive instructions and drawings of parts requiring
2- or 3-axis machining. Students will design, model, program, set-up and run
their parts on various machines, including plasma cutters, water jet cutters and
milling machines.
WELD 197 Welding Technology Topics (.5 - 3)
I'm very far off from actually effectively scraping this data but I'm trying to get a start.
My goal is to extract only class name and number and remove descriptions.
Thanks as always!

To solve your current problem, if you're only attempting to parse one line, I believe you simply need to move your second line = fp.readline() call to the end of the while loop. Currently, you are actually starting the parsing from the second line, because the first line has already been consumed by the readline() call at the top of your example code.
After the change it would look like this:
line = fp.readline()  # read in the first line
fw = open('output.txt', "w+")
cnt = 1
while line:
    print("Line {}: {}".format(cnt, line.strip()))
    line = line.split(')')[0]
    fw.write(line + "\n")
    cnt += 1
    line = fp.readline()  # read in next line after parsing done
Output for your example input text:
WELD 190 Manufacturing I Introduction to MasterCAM (3

Assuming your other class text blocks share the same structure as the one you showed, you might want to use a regular expression to extract the class name and class number.
In what follows I assume that every text block contains the text "XX hours lecture" in the same position, where 'XX' stands for any number (the time frame). The variable 'match_re' holds a regular expression that matches everything from the start of the string up to 'XX hours lecture', and 'match.group(2)' restricts the result to the part captured by the innermost pair of parentheses, i.e. the text before that number.
The expression below probably won't be complete for you yet, since I don't know your whole text file.
Below I extract the string: WELD 190 Manufacturing I Introduction to MasterCAM (3)
import re
string = "WELD 190 Manufacturing I Introduction to MasterCAM (3) 1½ hours lecture - 4½ hours laboratory Note: Cross listed as DT 190/ENGR 190/IT 190 This course will introduce the students to MasterCAM and 2D and basic 3D modeling. Students will receive instructions and drawings of parts requiring 2- or 3-axis machining. Students will design, model, program, set-up and run their parts on various machines, including plasma cutters, water jet cutters and milling machines. WELD 197 Welding Technology Topics (.5 - 3)"
match_re = r"(^(.*)\d.* hours lecture)"  # raw string so \d is passed to the regex engine as written
match = re.search(match_re, string)
if match:
    print(match.group(2))
else:
    print("No match")
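Since the stated goal is to keep only the class name and number and drop the descriptions, another option is to test each line against a pattern for the heading lines. This is only a sketch: it assumes headings always look like the two in the sample (an upper-case department code, a course number, a title, and units in parentheses at the end of the line). HEADER_RE and course_headers are names made up for this example.

```python
import re

# Assumed heading shape: "DEPT 123 Some Title (units)".
HEADER_RE = re.compile(r"^([A-Z]{2,}) (\d+) (.+?) \(([^)]*)\)\s*$")


def course_headers(lines):
    """Return (code, title, units) for each line that looks like a heading."""
    courses = []
    for line in lines:
        match = HEADER_RE.match(line.strip())
        if match:
            dept, number, title, units = match.groups()
            courses.append((dept + " " + number, title, units))
    return courses


sample = [
    "WELD 190 Manufacturing I Introduction to MasterCAM (3)",
    "1½ hours lecture - 4½ hours laboratory",
    "Note: Cross listed as DT 190/ENGR 190/IT 190",
    "This course will introduce the students to MasterCAM and 2D and basic 3D",
    "milling machines.",
    "WELD 197 Welding Technology Topics (.5 - 3)",
]
found = course_headers(sample)
for code, title, units in found:
    print(code, "|", title, "|", units)
```

Working line by line means the description text never needs to be parsed; lines that do not match the heading pattern are simply skipped.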
