I am working on writing a function that returns the highest integer in a specified file. The files only contain numbers. I came up with the following code:
def max_num_in_file(filename):
    """DOCSTRING"""
    with open(filename, 'r') as file:
        return max(file.read())
When I test this with a text file that I created, it returns the highest digit in any of the lines in the file. I need it to return the overall highest number rather than a single digit.
Assuming your file contains one number on each line:
with open(path, 'r') as file:
    m = max(file.readlines(), key=lambda x: int(x))
Then m holds the greatest number in the file as a string, and int(m) is the value you are looking for.
file.readlines() gives you a list whose elements are the lines of the file.
The max built-in function takes an iterable (here, that list of lines), and an optional key argument.
The key argument is how you want the elements to be compared.
The elements of my iterable are strings which I know represent integers.
Therefore, I want them to be compared as integers.
So my key is lambda x: int(x), which is an anonymous function that returns int(x) when fed x.
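To see the effect of the key, here is a quick illustration with made-up values:
lines = ['12\n', '7\n', '345\n']          # pretend these came from file.readlines()
print(max(lines))                          # '7\n'   -- strings compared lexicographically
print(max(lines, key=lambda x: int(x)))    # '345\n' -- compared by integer value
Note that max still returns the original string element; the key only controls how elements are compared.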
Now, why did max(file.read()) not work?
file.read() gives you the string corresponding to the whole content of the file.
As before, max compares the elements of the iterable it is passed and returns the greatest one, according to the ordering defined for the elements' type.
For strings (str instances), it is the lexicographical order.
So if your file contains only numbers, all characters are digits, and the greatest element is the character corresponding to the greatest digit.
So max(file.read()) returns the single greatest digit character, which will most likely be '9'.
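A quick demonstration on made-up file content:
content = "12\n7\n345\n"   # what file.read() might return
print(max(content))        # '7' -- the single greatest character, not 345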
As long as your file is clean and has no empty or non-number lines:
def max_num_in_file(filename):
    """DOCSTRING"""
    with open(filename, 'r') as file:
        return max([int(_x.strip()) for _x in file.readlines()])
You need to iterate over the file object and convert each line with int(). If the file is very large, I would advise against using readlines(), as it will allocate a huge list in memory. It's better to use an iterator and process one line at a time:
def max_num_in_a_file(filename):
    def line_iterator(filename):
        with open(filename) as f:
            for line in f:
                yield int(line)
    return max(line_iterator(filename))
Beware that the script will throw an exception if any line in your file is not convertible to an int. You can protect your iterator against that case and just skip the line, as follows:
def max_num_in_a_file(filename):
    def line_iterator(filename):
        with open(filename) as f:
            for line in f:
                try:
                    num = int(line)
                except ValueError:
                    continue
                yield num
    return max(line_iterator(filename))
This function will work for a file containing numbers and other data, and will simply skip lines that are not convertible to int().
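For example, assuming a hypothetical file numbers.txt that mixes numbers with other text:
print(max_num_in_a_file('numbers.txt'))   # prints the largest integer found, skipping non-numeric lines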
with open(filename) as f:
    d = f.read()

max(map(int, d.split()))  # given that the file contains only numbers separated by whitespace
# if the file has other characters as well
max(map(int, [i for i in d.split() if i.isdigit()]))
You may also go through this approach:
def max_num_in_file(filename):
    """DOCSTRING"""
    with open(filename, 'r') as file:
        # read every line and convert it into a list of lists
        ls = [x.strip().split() for x in file.readlines()]
        # sum(ls, []) flattens the nested lists into a single list
        # map() converts each string to an int
        return max(map(int, sum(ls, [])))
I'm working on Problem Set 6: DNA. My approach is to save the different types of STR sequences as "all_sequences", then find the maximum number of repeats for each sequence in "all_sequences".
My question is: Why does next() ensure I will only select the first row of the csv? I understand [1:] is to remove the name column, but how does next() ensure I only select the first row?
import csv
import sys

f = sys.argv[1]  # name of CSV file
t = sys.argv[2]  # name of text file with DNA sequence

# Open the CSV file and read its contents into memory.
with open(f, "r") as database:
    index = csv.reader(database)
    all_sequences = next(index)[1:]

# Open the DNA sequence and read its contents into memory.
with open(t, "r") as dnaseq:
    s = dnaseq.read()

actual = [maxrepeats(s, seq) for seq in all_sequences]
print_match(index, actual)
In your example, index is a csv.reader object, which is an iterator. next(index) yields the next element of the iterator (here, a list of the fields in one row). The list is then sliced to omit the first value.
It is strange to see next used only once, because a single call simply yields the first row of the index iterator. It starts making more sense when next is called repeatedly.
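Here is a small self-contained sketch (with made-up CSV content) of how successive next() calls walk through the rows:
import csv
import io

data = io.StringIO("name,AGATC,AATG,TATC\nAlice,2,8,3\nBob,4,1,5\n")  # made-up CSV, just for illustration
reader = csv.reader(data)

print(next(reader))   # ['name', 'AGATC', 'AATG', 'TATC']  <- first call: the header row
print(next(reader))   # ['Alice', '2', '8', '3']           <- second call: the next row
Because the reader starts at the beginning of the file, the first next() call always returns the first row; slicing it with [1:] then drops the leading name field.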
I have a CSV file that contains a matrix:
1,9,5,78
4.9,0,24,7
6,2,3,8
10,21.4,8,7
I want to create a function that returns a list of lists:
[[1.0,9.0,5.0,78.0],[4.9,0.0,24.0,7.0],[6.0,2.0,3.0,8.0],[10.0,21.4,8.0,7.0]]
This is my attempt:
fileaname=".csv"
def get_csv_matrix(fileaname):
    mat=open(fileaname,'r')
    mat_list=[]
    for line in mat:
        line=line.strip()
        mat_line=[line]
        mat_list.append(mat_line)
    return mat_list
but I get a list of lists, each containing a single string:
[['1,9,5,78'], ['4.9,0,24,7'], ['6,2,3,8'], ['10,21.4,8,7']]
How can I turn the lists of strings into lists of floats?
mat_line = [line]
This line just takes the line as a single string and makes it into a one-element list. If you want to separate it by commas, instead do:
mat_line = line.split(',')
If you want to also turn them into numbers, you'll have to do:
mat_line = [float(i) for i in line.split(',')]
I find it easier to read a list comprehension than a for loop.
def get_csv_matrix(filename):
    with open(filename) as input_file:
        return [[float(i) for i in line.split(',')] for line in input_file]

print(get_csv_matrix("data.csv"))
The above function opens a file (I use with to avoid leaking open file descriptors), iterates over the lines, splits each line, and converts each item into a floating-point number.
Try
fileaname = ".csv"

def get_csv_matrix(fileaname):
    mat = open(fileaname, 'r')
    mat_list = []
    for line in mat:
        line = line.strip()
        mat_line = line.split(",")
        for position, value in enumerate(mat_line):
            mat_line[position] = float(value)
        mat_list.append(mat_line)
    return mat_list
If any value in mat_line cannot be converted to a number, float() will raise a ValueError, so I suggest you create a validation method if you need to be absolutely sure that every entry is numeric.
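One possible validation helper, as a sketch (the name to_float and the return-None-on-failure behaviour are just illustrative choices):
def to_float(value):
    """Return value as a float, or None if it cannot be converted."""
    try:
        return float(value)
    except ValueError:
        return None

row = [to_float(i) for i in "6,2,x,8".split(",")]
print(row)   # [6.0, 2.0, None, 8.0] -- decide yourself how to handle the None entries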
I've been stuck on this Python homework problem for a while now: "Write a complete python program that reads 20 real numbers from a file inner.txt and outputs them in sorted order to a file outter.txt."
Alright, so what I do is:
f=open('inner.txt','r')
n=f.readlines()
n.replace('\n',' ')
n.sort()
x=open('outter.txt','w')
x.write(print(n))
So my thought process is: open the text file, n is the list of lines read from it, I replace all the newline characters so it can be properly sorted, then I open the text file I want to write to and print the list to it. The first problem is it won't let me replace the newlines, and the second problem is I can't write a list to a file.
I just tried this:
>>> x = "34\n"
>>> print(int(x))
34
So, you shouldn't have to filter out the "\n" like that, but can just put it into int() to convert it into an integer. This is assuming you have one number per line and they're all integers.
You then need to store each value into a list. A list has a .sort() method you can use to then sort the list.
EDIT:
Forgot to mention, as others have already said, you need to iterate over the values in n, as it's a list, not a single item.
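Putting those pieces together, a minimal sketch (assuming one integer per line, as above) could look like this:
numbers = []
with open('inner.txt') as f:
    for line in f:
        numbers.append(int(line))   # int() copes with the trailing "\n"

numbers.sort()
print(numbers)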
Here's a step by step solution that fixes the issues you have :)
Opening the file, nothing wrong here.
f=open('inner.txt','r')
Don't forget to close the file:
f.close()
n is now a list of each line:
n=f.readlines()
There is no list.replace method, so I suggest changing the above line to n = f.read(). Then, this will work (don't forget to reassign n, as strings are immutable):
n = n.replace('\n','')
You still only have a string full of numbers. However, instead of replacing the newline character, I suggest splitting the string using the newline as a delimiter:
n = n.split('\n')
Then, convert these strings to integers, skipping any empty string left over by a trailing newline:
n = [int(x) for x in n if x]
Now, these two will work:
n.sort()
x=open('outter.txt','w')
You want to write the numbers themselves, so use this:
x.write('\n'.join(str(i) for i in n))
Finally, close the file:
x.close()
Using a context manager (the with statement) is good practice as well, when handling files:
with open('inner.txt', 'r') as f:
    # do stuff with f
    # automatically closed at the end
I guess real means float. So you have to convert your results to float to sort properly.
raw_lines = f.readlines()
floats = map(float, raw_lines)
Then you have to sort it. To write the result back, you have to convert the values to strings and join them with line endings:
sorted_floats = sorted(floats)
sorted_as_string = map(str, sorted_floats)
result = '\n'.join(sorted_as_string)
Finally, you have to write the result to the destination file.
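For example (assuming the output file is named outter.txt, as in the assignment):
with open('outter.txt', 'w') as out_file:
    out_file.write(result)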
OK, let's look step by step at what you want to do.
First: Read some integers out of a text file.
Pythonic Version:
fileNumbers = [int(line) for line in open(r'inner.txt', 'r').readlines()]
Easier-to-understand version:
fileNumbers = list()
with open(r'inner.txt', 'r') as fh:
    for singleLine in fh.readlines():
        fileNumbers.append(int(singleLine))
What it does:
Open the file
Read each line, convert it to int (because readlines returns string values) and append it to the list fileNumbers
Second: Sort the list
fileNumbers.sort()
What it does:
The sort function sorts the list by its values, e.g. [5,3,2,4,1] -> [1,2,3,4,5]
Third: Write it to a new text file
with open(r'outter.txt', 'w') as fh:
    for entry in fileNumbers:
        fh.write('{0}\n'.format(entry))
I have a txt file that is composed of three columns; the first column is integers and the second and third columns are floats. I want to do a calculation with the two floats on each line. My pseudocode is below:
def first_function(file):
    pogt = 0.
    f=open(file, 'r')
    for line in f:
        pogt += otherFunction(first float, second float)
        f.close
Also, would the "for line in f" guarantee that my pogt will be the sum of my otherFunction calculation of all the lines in the txt file?
Assuming that you get the values for first float and second float correctly, your code is close to correct. You'll need to dedent (the inverse of indent) the f.close line, or even better, use with, which will handle the close for you. (By the way, you should call f.close() instead of f.close.)
And do not use file as a variable name; it is not a reserved word, but it shadows the built-in file type in Python 2.
Also use better names for your variables.
Assuming your file is separated by spaces, you can define get_numbers as follows:
def get_numbers(line):
    [the_integer, first_float, second_float] = line.strip().split()
    return (float(first_float), float(second_float))

def first_function(filename):
    the_result = 0
    with open(filename, 'r') as f:
        for line in f:
            (first_float, second_float) = get_numbers(line)
            the_result += other_function(first_float, second_float)
    return the_result
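And yes, for line in f visits every line of the file exactly once, so the_result ends up as the sum of other_function over all lines. A hypothetical usage sketch (other_function here is just a stand-in for whatever calculation you actually need, and data.txt is a made-up file name):
def other_function(a, b):
    # placeholder calculation, purely for illustration
    return a * b

total = first_function('data.txt')
print(total)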
There's a text file that I'm reading line by line. It looks something like this:
3
3
67
46
67
3
46
Each time the program encounters a new number, it writes it to a text file. The way I'm thinking of doing this is writing the first number to the file, then looking at the second number and checking if it's already in the output file. If it isn't, it writes THAT number to the file. If it is, it skips that line to avoid repetitions and goes on to the next line. How do I do this?
Rather than searching your output file, keep a set of the numbers you've written, and only write numbers that are not in the set.
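A minimal sketch of that idea (the file names are just placeholders):
seen = set()
with open('input.txt') as src, open('output.txt', 'w') as dst:
    for line in src:
        n = int(line)
        if n not in seen:
            seen.add(n)
            dst.write('%d\n' % n)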
Instead of checking the output file to see whether the number was already written, it is better to keep that information in a variable (a set or a list). It will save you disk reads.
To search a file for numbers you need to loop through each line of that file; you can do that with a for line in open('input'): loop, where input is the name of your file. On each iteration line contains one line of the input file, ending with the newline character '\n'.
In each iteration you should try to convert the value on that line to a number; the int() function may be used. You may want to protect yourself against empty lines or non-number values with a try statement.
In each iteration, having the number, you should check whether the value was already written to the output file by checking the set of already-written numbers. If the value is not in the set yet, add it and write it to the output file.
#!/usr/bin/env python
numbers = set()  # create a set for storing numbers that were already written
out = open('output', 'w')  # open 'output' file for writing
for line in open('input'):  # loop through each line of 'input' file
    try:
        i = int(line)  # try to convert line to integer
    except ValueError:  # if conversion to integer fails display a warning
        print("Warning: cannot convert string '%s' to a number" % line.strip())
        continue  # skip to next line on error
    if i not in numbers:  # check if the number wasn't already added to the set
        out.write('%d\n' % i)  # write the number to the 'output' file followed by EOL
        numbers.add(i)  # add number to the set to mark it as already added
This example assumes that your input file contains a single number on each line. In case of an empty or incorrect line, a warning will be displayed on stdout.
You could also use list in the above example, but it may be less efficient.
Instead of numbers = set() use numbers = [], and instead of numbers.add(i) use numbers.append(i). The if condition stays the same.
Don't do that. Use a set() to keep track of all the numbers you have seen. It will only have one of each.
numbers = set()
for line in open("numberfile"):
    numbers.add(int(line.strip()))

open("outputfile", "w").write("\n".join(str(n) for n in numbers))
Note this reads them all, then writes them all out at once. Because a set is unordered, the numbers will come out in an arbitrary order rather than the order they had in the original file (iterate over sorted(numbers) if you want them sorted). If you want to preserve the original order, you can also write them as you read them, but only if they are not already in the set:
numbers = set()
with open("outfile", "w") as outfile:
    for line in open("numberfile"):
        number = int(line.strip())
        if number not in numbers:
            outfile.write(str(number) + "\n")
            numbers.add(number)
Are you working with exceptionally large files? You probably don't want to try to "search" the file you're writing to for a value you just wrote. You (probably) want something more like this:
encountered = set()
with open('file1') as fhi, open('file2', 'w') as fho:
    for line in fhi:
        if line not in encountered:
            encountered.add(line)
            fho.write(line)
If you want to scan through a file to see if it contains a number on any line, you could do something like this:
def file_contains(f, n):
    with f:
        for line in f:
            if int(line.strip()) == n:
                return True
        return False
However, as Ned points out in his answer, this isn't a very efficient solution; if you have to search through the file again for each line, the running time of your program will grow in proportion to the square of the number of numbers.
If the number of values is not incredibly large, it is more efficient to use a set (documentation). Sets are designed to keep track of unordered values very efficiently. For example:
with open("input_file.txt", "rt") as in_file:
with open("output_file.txt", "wt") as out_file:
encountered_numbers = set()
for line in in_file:
n = int(line.strip())
if n not in encountered_numbers:
encountered_numbers.add(n)
out_file.write(line)