calculations with floats in a txt file - python

I have a txt file that is composed of three columns, the first column is integers and the second and third column are floats. I want to do a calculation with each float and separate by line. My pseudocode is below:
def first_function(file):
    pogt = 0.
    f=open(file, 'r')
    for line in f:
        pogt += otherFunction(first float, second float)
        f.close
Also, would the "for line in f" guarantee that my pogt will be the sum of my otherFunction calculation of all the lines in the txt file?

Assuming you extract the first and second float correctly, your code is close to correct. You need to dedent (the inverse of indent) the f.close line so it runs after the loop, and it should be f.close() rather than f.close, which only references the method without calling it. Better still, use with, which closes the file for you.
Do not use file as a variable name: it is not a reserved word, but it shadows the built-in file type in Python 2.
Also use more descriptive names for your variables.
Assuming the columns in your file are separated by whitespace, you can define get_numbers as follows:
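def get_numbers(line):
    # split() with no argument splits on any whitespace and ignores the trailing newline
    the_integer, first_float, second_float = line.split()
    # the fields are still strings at this point, so convert them
    return float(first_float), float(second_float)

def first_function(filename):
    the_result = 0
    with open(filename, 'r') as f:
        for line in f:
            first_float, second_float = get_numbers(line)
            the_result += other_function(first_float, second_float)
    return the_result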
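As a runnable sketch of the pattern above: other_function is not defined in the question, so a hypothetical stand-in that multiplies its two arguments is used here. The helper name sum_over_lines is also made up; it accepts any iterable of lines, and an open file qualifies.

```python
def other_function(a, b):
    # Hypothetical stand-in; the question does not say what this computes.
    return a * b


def sum_over_lines(lines):
    """Sum other_function over every non-blank 'int float float' line."""
    total = 0.0
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        _, first, second = fields
        total += other_function(float(first), float(second))
    return total


def first_function(filename):
    # Iterating over the file visits every line exactly once,
    # so the total covers the whole file.
    with open(filename) as f:
        return sum_over_lines(f)


print(sum_over_lines(["1 2.0 3.0\n", "2 0.5 4.0\n"]))  # 2.0*3.0 + 0.5*4.0
```

Splitting the parsing from the file handling keeps the summing logic easy to try out on a plain list of strings.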


How to load .txt file as a numpy array such that it only reads in certain lines?

I have a text file that contains xyz coordinates broken up by lines of text (specifically, the first 2 lines are text, then the next 22 are coordinates, and the next 2 are text, etc. It goes on like that for the rest of the file). I want to read in the file such that it will be a numpy array (or list, either works) that contains all the different sets of coordinates in separate lists/arrays.
So:
[[x1 y1 z1],[x2 y2 z2],...]
Here is what I have tried:
def convert_xyz_bat(filename, newfile): #add self later
    with open(filename, "r") as f:
        coords = []
        for line in f:
            if "C" in line or "H" in line:
                atom,x,y,z = line.split(" ")
                coords.append([float(x), float(y), float(z)])
            else:
                pass
    coordinates = np.array(coords, dtype=object)
    return print(coordinates[0])
This takes up a lot of memory since it writes all the lines to this variable (the file is really large). I'm not sure if this will use less memory or not, but I could also do something like this, where I make another file which contains all the coordinates:
with open(filename, "r") as f:
    with open(newfile, "r+") as f1:
        for line in f:
            if "C" in line or "H" in line:
                atom, x,y,z = line.split(" ")
                f1.write(str([float(x), float(y), float(z)]))
            else:
                pass
return
If I make the file, the problem with that is it only lets me write the coordinates in as strings, so I would have to define a variable that opens the file and writes it in as an array (so that I can use indexing later with loop functions).
I am not sure which option would work better, or if there is a better third/fourth option that I have not considered.
Your first snippet has a few problems: return print() is an odd combination (print() returns None, so the function returns None too), and the indentation around the with statement is off.
As you mention, your second option uses less memory, but the data is then only available by reading it back from the new file when you need it.
I think you need to decide what your main goal is. If you only want to convert the data from one file format to another, the second option is better. If you need to apply some logic to the data, the first option (with its high memory consumption) is the solution. You can also do something in between: instead of reading all the data at once, read it in chunks and work your way through the file. Something like:
import numpy as np

class ReadFile:
    def __init__(self, file_path):
        self.file_pipe = open(file_path, "r")
        self.number_of_lines_to_read = 1000

    def __del__(self):
        self.file_pipe.close()

    def get_next_cordinates(self):
        cnt = 0
        coords = []
        for line in self.file_pipe:
            cnt += 1
            # every number_of_lines_to_read lines, hand back what has been collected so far
            if cnt % self.number_of_lines_to_read == 0:
                yield np.array(coords, dtype=object)
                coords = []
            if "C" in line or "H" in line:
                atom, x, y, z = line.split(" ")
                coords.append([float(x), float(y), float(z)])
            else:
                pass
        # whatever is left after the last full chunk
        yield np.array(coords, dtype=object)
and then you can use it as follows:
read_file = ReadFile(file_path)
for coords in read_file.get_next_cordinates():
    # do something with the coords
    pass
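If the goal is one list (or array) per block of coordinates, another option is to yield one block at a time. This is only a sketch under assumptions taken from the question: coordinate lines have four whitespace-separated fields (atom x y z) and any other line marks a boundary between blocks. The name iter_frames is made up for illustration.

```python
def iter_frames(lines):
    """Yield one list of [x, y, z] rows per block of coordinate lines."""
    frame = []
    for line in lines:
        fields = line.split()
        if len(fields) == 4:
            try:
                frame.append([float(v) for v in fields[1:]])
                continue
            except ValueError:
                pass  # four words of text, not a coordinate line
        # Any non-coordinate line closes the current block, if there is one.
        if frame:
            yield frame
            frame = []
    if frame:
        yield frame


sample = [
    "2\n", "comment\n",
    "C 0.0 0.0 0.0\n", "H 0.0 0.0 1.1\n",
    "2\n", "comment\n",
    "C 1.0 0.0 0.0\n", "H 1.0 0.0 1.1\n",
]
for frame in iter_frames(sample):
    print(frame)
```

An open file can be passed as lines, and each frame can be handed to np.array(frame) if NumPy arrays are needed. Only one block is held in memory at a time.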

Python highest integer in a file

I am working on writing a function that returns the highest integer number in a specified file. The files only contain numbers. I came up with the following code;
def max_num_in_file(filename):
    """DOCSTRING"""
    with open(filename, 'r') as file:
        return max(file.read())
When I test this with a text file that I created, it returns the highest digit in any of the lines in the file. I need it to return the overall highest number rather than a single digit.
Assuming your file contains one number on each line:
with open(path, 'r') as file:
    m = max(file.readlines(), key=lambda x: int(x))
Then m holds as a string the greatest number of the file, and int(m) is the value you are looking for.
file.readlines() gives you a list whose elements are the lines of the file.
The max built-in function takes an iterable (here, that list of lines), and an optional key argument.
The key argument is how you want the elements to be compared.
The elements of my iterable are strings which I know represent integers.
Therefore, I want them to be compared as integers.
So my key is lambda x: int(x), which is an anonymous function that returns int(x) when fed x.
Now, why did max(file.read()) not work?
file.read() gives you the string corresponding to the whole content of the file.
Then again, max compares the elements of the iterable it is passed, and returns the greatest one, according to the order relation defined on the elements' type(s).
For strings (str instances), it is the lexicographical order.
So if your file contains only numbers, all characters are digits, and the greatest element is the character corresponding to the greatest digit.
So max(file.read()) returns the largest single digit that appears anywhere in the file, which will be '9' in most cases.
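A small illustration of the difference, using a string in place of the file contents (the sample values are made up):

```python
content = "12\n345\n9\n"

# Iterating over a string yields single characters, so max() compares characters.
print(repr(max(content)))                        # '9'

# Splitting first and converting each piece compares whole numbers instead.
print(max(int(tok) for tok in content.split()))  # 345
```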
As long as your file is clean and has no empty/non number lines:
def max_num_in_file(filename):
    """DOCSTRING"""
    with open(filename, 'r') as file:
        return max([int(_x.strip()) for _x in file.readlines()])
You need to iterate over the file object and convert each line with int(). If the file is very large, I would advise against using readlines(), as it allocates a huge list in memory. It's better to use an iterator that handles one line at a time:
def max_num_in_a_file(filename):
    def line_iterator(filename):
        with open(filename) as f:
            for line in f:
                yield int(line)
    return max(line_iterator(filename))
Beware that the script will raise a ValueError if any line in your file is not convertible to an int. You can protect your iterator against that case and just skip the line, as follows:
def max_num_in_a_file(filename):
    def line_iterator(filename):
        with open(filename) as f:
            for line in f:
                try:
                    num = int(line)
                except ValueError:
                    continue
                yield num
    return max(line_iterator(filename))
This function will work for a file with numbers and other data, and will just skip lines that are not convertible to int().
d=f.read()
max(map(int,d.split())) #given that file contains only numbers separated by ' '
# if file has other characters as well
max(map(int,[i for i in d.split() if i.isdigit()]))
You may also try this:
def max_num_in_file(filename):
    """DOCSTRING"""
    with open(filename, 'r') as file:
        # read every line and split it into a list of tokens
        ls = [x.strip().split() for x in file.readlines()]
        return max(map(int, sum(ls, [])))
        # sum(ls, []) flattens the list of lists into a single list
        # map() converts each string to int

I can't properly define a function in Python. Can anyone tell me where I am going wrong?

The directions are to:
Write a function called calc_average that takes a string representing a filename of the following format, for example:
Smith 82
Jones 75
Washington 91
The function should calculate and return the class average from the data in the file.
So far I have:
def calc_average():
    infile = open("filename.txt", "r")
    readlines = infile.readlines()
    for line in readlines:
        parts = line.split()
        name = parts[0]
        clsavg = parts[1]
        average = 0
I have tried to work past this part, but I can't find the right way to do the average. Any suggestions? I only need the function and the file is only an example.
def calc_average():
    infile = open("filename.txt", "r")
    readlines = infile.readlines()
    for line in readlines:
        parts = line.split()
        name = parts[0]
        clsavg = parts[1]
        average = 0
After all that, you're just literally sending something into the function, but not asking for anything to come out.
Using return will help get something out of the function.
return [variable] is a way to use it.
Here:
Add this line
return [variable] to the end of your code, such that it looks like this:
def calc_average():
    infile = open("filename.txt", "r")
    readlines = infile.readlines()
    for line in readlines:
        parts = line.split()
        name = parts[0]
        clsavg = parts[1]
        average = 0
    return variable #where you replace variable with
                    #the thing you want to get out of your function
To call this function (or should I say "run" it), write its name followed by parentheses on a dedented line, outside the function body.
def calc_average():
    infile = open("filename.txt", "r")
    readlines = infile.readlines()
    for line in readlines:
        parts = line.split()
        name = parts[0]
        clsavg = parts[1]
        average = 0
    return variable
calc_average() #<- this calls the function
You might also want to read up on parameters:
Parameters are values passed into a function for it to use.
Example:
def test1(number):
    number = number + 1 #this can be written as number += 1 as well
    return number
x = test1(5)
First I define the function with a number parameter. This would mean that number will be used in this function. Notice how the lines below def test1(number) also use the variable number. whatever is passed into the function as number will be considered number in the function.
Then, I call the function and use 5 as a parameter.
When it's called, the function takes 5 (since that was the input parameter) and binds it to the name number (from def test1(number)). Thus, it's like writing number = 5 in the function itself.
Afterwards, return number will take the number (which in this case is added to become 6, number = 6) and give it back to the outside code. Thus, it's like saying return 6.
Now back to the bottom few lines. x = test1(5) will make x = 6, since the function returned 6.
Hope I helped you understand functions more.
The function needs an argument. It also needs to return the average, so there should be a return statement at the end.
def calc_average(file_name):
    ...
    return <something>
def calc_average():
    infile = open("filename.txt", "r")
    readlines = infile.readlines()
    clsavg = 0
    counter = 0
    for line in readlines:
        parts = line.split()
        clsavg = clsavg+ float(parts[1])
        counter = counter + 1
    print(clsavg/counter)
To continue using mostly your own code and your example, you could do:
def calc_average(filename):
    infile = open(filename, "r")
    readlines = infile.readlines()
    average = 0 # use this to sum all grades
    index = 0 # keep track of how many records or lines in our txt
    for line in readlines:
        parts = line.split()
        name = parts[0] # basically useless
        # for each line in txt we sum all grades
        average = average + float(parts[1]) # need to convert this string to a float
        index = index + 1
    # divide by the number of entries in your txt file to get the class average
    return average / index
then:
calc_average('grades.txt')
returns:
82.66666666666667
Alright, let's look at this, part by part:
Write a function called calc_average
def calc_average():
that takes a string representing a filename
let's make this a meaningful variable name:
def calc_average(filename):
Now that we have the basics down, let's talk about how to actually solve your problem:
Each line in your file contains a name and a grade. You want to keep track of the grades so that you can compute the average out of them. So, you need to be able to:
read a file one line at a time
split a line and take the relevant part
compute the average of the relevant parts
So, it seems that holding the relevant parts in a list would be helpful. We would then need a function that computes the average of a list of numbers. So let's write that
def average(L):
    total = 0  # named total so the built-in sum() is not shadowed
    for num in L:
        total += num
    return total/len(L)
Of course, there's an easier way to write this:
def average(L):
    return sum(L)/len(L) # python has a built-in sum function to compute the sum of a list
Now that we have a function to compute the average, let's read the file and create a list of numbers whose average we want to compute:
def read_from_file(filename):
    answer = [] # a list of all the numbers in the file
    with open(filename) as infile: # open the file to read
        for line in infile:
            parts = line.split()
            grade = int(parts[-1]) # the integer value of the last entity in that line
            answer.append(grade)
    return answer
Now that we have a function that returns the relevant information from a file, and a function that computes averages, we just have to use the two together:
def calc_average(filename):
    numbers = read_from_file(filename)
    answer = average(numbers)
    return answer
Now, you might notice that you don't need to keep track of each number, as you just sum them up and divide by the number of numbers. So, this can be done more concisely as follows:
def calc_average(filename):
    nums = 0
    total = 0
    with open(filename) as infile:
        for line in infile:
            total += int(line.split()[-1])
            nums += 1
    return total/nums
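A sketch along the same lines that also tolerates blank lines and an empty file. It assumes the grade is always the last whitespace-separated field, and that returning None for an empty file is acceptable (the assignment does not say). The helper name average_of_lines is made up.

```python
def average_of_lines(lines):
    """Return the mean of the last field on each non-blank line, or None."""
    total = 0.0
    count = 0
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        total += float(fields[-1])
        count += 1
    return total / count if count else None


def calc_average(filename):
    # An open file is an iterable of lines, so it can be passed straight through.
    with open(filename) as infile:
        return average_of_lines(infile)


print(average_of_lines(["Smith 82\n", "Jones 75\n", "Washington 91\n"]))
```

Using float() for the running total avoids integer division on Python 2 and gives the same result on Python 3.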
You didn't calculate the average, and you didn't return anything from calc_average. Try this:
def calc_average():
    with open('filename.txt') as text:
        nums = [int(i.split()[1]) for i in text]
        avg = float(sum(nums)) / float(len(nums))
        return avg
>>> print(calc_average())
82.6666666667
First thing you're doing wrong is not marking the source code in your post, as source code. Put it on separate lines to the rest of your post, and use the {} link at the top of the editor to mark it as source code. Then it should come out like this:
def calc_average():
    infile = open("filename.txt", "r")
    readlines = infile.readlines()
    for line in readlines:
        parts = line.split()
        name = parts[0]
        clsavg = parts[1]
        average = 0
You should do the same with the file contents: I am assuming that you have one name and one number per line.
If you want to put a snippet of code inline with your text, e.g. "the foo() function", put a backtick on each side of the code. The backtick looks like a grave accent, and is sometimes wrongly used as an opening quote character in text files.
Next, you were to write a function that takes a string containing a filename. But you have
def calc_average():
    infile = open("filename.txt", "r")
That doesn't take anything. How about
def calc_average(filename):
    infile = open(filename, "r")
Now, your function reads the lines fine and splits each one into a name and a number, but both are still strings. The string containing the number is put in the variable clsavg, and then the variable average is set to 0 every time a line is read.
But what do you want to do? I think when you say "class average", these are all people in a class, the numbers are their scores, and you want to calculate the average of the numbers? So that means adding up all the numbers, and dividing by the number of rows in the file.
So you need to set a variable to 0 ONCE, before the loop starts, then on each time you read a line, increment it by the number value. I would imagine the clsavg variable would be the one to use. So you need to convert parts[1] to an integer. You can use int() for that. Then you need to increment it, with += or with a statement like x = x + y. Ask google if you want more details. That way you build up a total value of all the numbers. Finally, after the loop is finished (meaning on a line that is only indented as far as the for), you need to divide the total by the number of rows. That would be the number of elements in readlines. You should google the len() function. Division uses the / operator. You can use x /= y to set x to the value of x/y.
That's making lots of assumptions: that you want an integer average, that every line in the file has the name and number (no blank lines or comments etc.) By the way, you can use float() instead of int() if you want more precision.

TypeError in for loop

I'm having trouble with some code, where I have a text file with 633,986 tuples, each with 3 values (example: the first line is -0.70,0.34,1.05). I want to create an array where I take the magnitude of the 3 values in the tuple, so for elements a,b,c, I want magnitude = sqrt(a^2 + b^2 + c^2).
However, I'm getting an error in my code. Any advice?
import math
fname = '\\pathname\\GerrysTenHz.txt'
open(fname, 'r')
Magn1 = [];
for i in range(0, 633986):
    Magn1[i] = math.sqrt((fname[i,0])^2 + (fname[i,1])^2 + (fname[i,2])^2)
TypeError: string indices must be integers, not tuple
You need to open the file properly (use the open file object and the csv module to parse the comma-separated values), read each row and convert the strings into float numbers, then apply the correct formula:
import math, csv
fname = '\\pathname\\GerrysTenHz.txt'
magn1 = []
# 'rb' is what the csv module expects on Python 2; on Python 3 use open(fname, newline='')
with open(fname, 'rb') as inputfile:
    reader = csv.reader(inputfile)
    for row in reader:
        magn1.append(math.sqrt(sum(float(c) ** 2 for c in row)))
which can be simplified with a list comprehension to:
import math, csv
fname = '\\pathname\\GerrysTenHz.txt'
with open(fname, 'rb') as inputfile:
    reader = csv.reader(inputfile)
    magn1 = [math.sqrt(sum(float(c) ** 2 for c in row)) for row in reader]
The with statement assigns the open file object to inputfile and makes sure it is closed again when the code block is done.
We add up the squares of the column values with sum(), which is fed a generator expression that converts each column to float() before squaring it.
You need to use the lines of the file and the csv module (as Martijn Pieters points out) to examine each value. This can be done with a list comprehension and with:
with open(fname) as f:
    reader = csv.reader(f)
    magn1 = [math.sqrt(sum(float(i)**2 for i in row)) for row in reader]
Just make sure you import csv and math as well.
To explain the issues you're having (there are quite a few), I'll walk through a more drawn-out way to do this.
You need to use what open returns. open takes a string and returns a file object.
f = open(fname)
I'm assuming the range in your for loop is supposed to be the number of lines in the file. You can instead iterate over each line of the file one by one
for line in f:
Then to get the numbers on each line, use the str.split method to split the line on the commas
x, y, z = line.split(',')
convert all three to floats so you can do math with them
x, y, z = float(x), float(y), float(z)
Then use the ** operator to raise to a power, and take the sqrt of the sum of the three numbers.
n = math.sqrt(x**2 + y**2 + z**2)
Finally use the append method to add to the back of the list
Magn1.append(n)
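Put together, the steps above might look like the sketch below. It takes any iterable of comma-separated lines (an open file works) and skips blank lines; the function name magnitudes is made up. Note that ^ in Python is bitwise XOR, not exponentiation, which is why ** is used.

```python
import math


def magnitudes(lines):
    """Return sqrt(x**2 + y**2 + z**2) for each 'x,y,z' line."""
    result = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        x, y, z = (float(v) for v in line.split(","))
        result.append(math.sqrt(x**2 + y**2 + z**2))
    return result


print(magnitudes(["3,4,0\n", "1,2,2\n"]))  # [5.0, 3.0]
print(2 ^ 3, 2 ** 3)                       # 1 8  (XOR vs. power)
```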
Let's look at fname. That's a string. So if you try to subscript it (i.e., fname[i, 0]), you should use an integer, and you'll get back the character at index i. Since you're using [i, 0] as the string indices, you're passing a tuple. That's no integer!
Really, you should be reading a line from the file, then doing things with that. So,
with open(fname, 'r') as f: # You're also opening the file and doing nothing with it
    for line in f:
        print('doing something with %s' % line)

How to read in floats from a file?

How can I open a file and read in the floats from the file, when it is in string format, in Python? I would also like to change the values of the each float and rewrite the file over with the new values.
Assuming there's one float per line:
with open("myfile") as f:
    floats = list(map(float, f))  # list() so the values are read before the file is closed
# change floats
with open("myfile", "w") as f:
    f.write("\n".join(map(str, floats)))
If you want more control over formatting, use the format method of string. For instance, this will only write 3 digits after each decimal point:
f.write("\n".join(map("{0:.3f}".format, floats)))
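As a complete round trip, here is a sketch that reads one float per line, changes every value, and writes the file back with three decimals. The function name scale_floats_in_file and the choice of multiplying by a factor are made up for illustration; the demonstration runs on a throwaway temporary file.

```python
import os
import tempfile


def scale_floats_in_file(path, factor):
    """Multiply every float in a one-per-line file by factor, in place."""
    with open(path) as f:
        # Build the list before the file is closed; skip blank lines.
        floats = [float(line) for line in f if line.strip()]
    floats = [value * factor for value in floats]
    with open(path, "w") as f:
        f.write("\n".join("{0:.3f}".format(value) for value in floats))
    return floats


# Demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "myfile")
    with open(path, "w") as f:
        f.write("1.5\n2.25\n")
    scale_floats_in_file(path, 2)
    with open(path) as f:
        print(f.read())
```

Note that this overwrites the original file, so anything that goes wrong between reading and writing loses data; writing to a new file first is safer for anything important.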
The "float()" function accepts strings as input and converts them into floats.
>>> float("123.456")
123.456
def get_numbers():
    with open("yourfile.txt") as input_file:
        for line in input_file:
            line = line.strip()
            for number in line.split():
                yield float(number)
Then just write them back when you're done.
And as a shorter version:
with open("yourfile.txt") as input_file:
    numbers = [float(number) for line in input_file for number in line.split()]
If you want to read input_num floats from a binary file (here fin is assumed to be a file object opened in 'rb' mode):
import numpy as np
import struct
float_size=4
np.array(struct.unpack('<'+str(input_num)+'f',
                       fin.read(float_size*input_num)))
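The binary case can be tried without a real file. The sketch below assumes little-endian 32-bit floats, as in the snippet above, and uses an in-memory buffer as a stand-in for a file opened in 'rb' mode; the function name read_floats is made up.

```python
import io
import struct


def read_floats(stream, count):
    """Read `count` little-endian 32-bit floats from a binary stream."""
    fmt = "<%df" % count
    data = stream.read(struct.calcsize(fmt))
    return list(struct.unpack(fmt, data))


# Round-trip through an in-memory buffer standing in for a binary file.
buffer = io.BytesIO(struct.pack("<3f", 1.0, 2.5, -4.0))
print(read_floats(buffer, 3))  # [1.0, 2.5, -4.0]
```

struct.unpack raises struct.error if the stream holds fewer bytes than requested, so a short file fails loudly rather than returning partial data.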
