Everyone knows how to count the characters from STDIN in C. However, when I tried to do the same in Python 3, I found it puzzling. (counter.py)
import sys
chrCounter = 0
for line in sys.stdin.readline():
    chrCounter += len(line)
print(chrCounter)
Then I tested the program with:
python3 counter.py < counter.py
The result is only the length of the first line, "import sys" (including its trailing newline). In fact, the program reads ONLY the first line from standard input and ignores the rest.
It works if I replace sys.stdin.readline() with sys.stdin.read():
import sys
print(len(sys.stdin.read()))
However, this program is obviously NOT suitable for large input, since it reads the whole input into memory at once. Please suggest an elegant solution. Thank you!
It's simpler:
for line in sys.stdin:
    chrCounter += len(line)
The file-like object sys.stdin is automatically iterated over line by line. If you call .readline() on it, you read only the first line (and iterate over that character by character); if you call .read(), you read the entire input into a single string and iterate over that character by character.
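Putting that together, the counting loop can be wrapped in a small helper. This is a sketch: the name count_chars is illustrative, and taking the stream as a parameter (instead of using sys.stdin directly) just makes it easy to try with io.StringIO or even a list of strings. It counts decoded characters, newlines included, like the original program.

```python
def count_chars(stream):
    """Count the characters in a text stream, reading one line at a time."""
    total = 0
    for line in stream:  # a file object yields its lines lazily
        total += len(line)  # len(line) includes the trailing newline, if any
    return total
```

In counter.py you would then call print(count_chars(sys.stdin)) after import sys. Only one line is held in memory at a time, so this scales to large input as long as individual lines are not huge.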
Tim Pietzcker's answer is, IMHO, the correct one. There are two similar ways of doing this, using:
for line in sys.stdin:
and
for line in sys.stdin.readlines():
The second option is closer to your original code. The difference between the two becomes clear if you modify the loop body as follows and type the input on the keyboard:
for line in sys.stdin.readlines():
    line_len = len(line)
    print('Last line was', line_len, 'chars long.')
    chrCounter += len(line)
If you use the first option (for line in sys.stdin:), then the lines are processed right after you hit enter.
If you use the second option (for line in sys.stdin.readlines():), then the whole input is read first (until EOF), split into lines, and only then processed.
If I just wanted a character count, I'd read in blocks at a time instead of lines at a time:
import sys
# 4096 chosen arbitrarily. Pick any other number you want to use.
print(sum(iter(lambda: len(sys.stdin.read(4096)), 0)))
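The one-liner works because iter(callable, sentinel) keeps calling the lambda until it returns 0, which happens once read() returns an empty string at EOF. The same block-at-a-time idea can be written as an explicit loop; the function name and the 4096 default are illustrative choices. Passing sys.stdin.buffer (the underlying binary stream) instead of sys.stdin would count bytes rather than decoded characters.

```python
def count_chars_in_blocks(stream, block_size=4096):
    """Count characters by reading fixed-size blocks, so memory use stays
    bounded even if the input contains very long lines."""
    total = 0
    while True:
        block = stream.read(block_size)  # at most block_size characters (or bytes)
        if not block:  # "" (or b"") signals EOF
            break
        total += len(block)
    return total
```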
Related
I recently came across an answer that uses the code below to remove the last line from a big file in Python. It is very fast and efficient, but I cannot make it work to delete the first line of a file.
Can anyone please help?
Here is that answer
https://stackoverflow.com/a/10289740/9311781
Below is the code:
import os
import sys

with open(sys.argv[1], "r+", encoding="utf-8") as file:
    # Move the pointer (similar to a cursor in a text editor) to the end of the file
    file.seek(0, os.SEEK_END)
    # The search below skips the very last character in the file,
    # i.e. if the file ends with a newline (an empty "last line"), we delete
    # that empty line and the penultimate one
    pos = file.tell() - 1
    # Read each character in the file one at a time, going backwards,
    # searching for a newline character.
    # If we find a newline, exit the search
    while pos > 0 and file.read(1) != "\n":
        pos -= 1
        file.seek(pos, os.SEEK_SET)
    # So long as we're not at the start of the file, delete all the characters
    # from this position onwards
    if pos > 0:
        file.seek(pos, os.SEEK_SET)
        file.truncate()
As the comments already mention, removing the last line like that is easy because you can truncate the file at a given position, and AFAIK that works across many operating systems and filesystems.
However, truncating from the start of the file in the same way is not a standard operation in many filesystems. Linux does support it on new enough kernels (3.15 and later, AFAIK), and macOS might have something similar.
You could try the fallocate package from PyPI [0], or implement something similar with the underlying syscall [1] if your OS and filesystem support it.
[0] - https://pypi.org/project/fallocate/
[1] - https://man7.org/linux/man-pages/man2/fallocate.2.html
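If neither option is available, a portable alternative (a different technique from fallocate, not constant-time) is to shift everything after the first line toward the start of the file in place and then truncate the leftover tail. It rewrites the rest of the file but keeps only one block in memory and needs no temporary file. Note also that, per the fallocate(2) man page, collapsing a range typically requires offset and length to be multiples of the filesystem block size, so that syscall alone generally cannot remove an arbitrary-length first line. The function name and block size below are illustrative; the file is opened in binary mode so positions are plain byte offsets. This sketch is not crash-safe: if interrupted midway, the file is left partially shifted.

```python
def delete_first_line(path, block_size=1 << 16):
    """Remove the first line of the file at `path` in place.

    Copies everything after the first newline toward the start of the file,
    block by block, then truncates the now-duplicated tail.
    """
    with open(path, "r+b") as f:
        f.readline()            # skip the first line, including its b"\n"
        read_pos = f.tell()     # where the remaining content starts
        write_pos = 0           # where it should end up
        while True:
            f.seek(read_pos)
            block = f.read(block_size)
            if not block:
                break
            read_pos += len(block)
            # write_pos always trails read_pos, so unread data is never overwritten
            f.seek(write_pos)
            f.write(block)
            write_pos += len(block)
        f.truncate(write_pos)   # drop the bytes left over at the end
```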
I have an input.txt file with the following content:
3
4 5
I want to use it as standard input with the following command:
python a.py < input.txt
In the a.py script, I am trying to read the input line by line using the input() function. I know there are better ways to read stdin, but I need to use input().
A naive approach of
line1 = input()
line2 = input()
did not work; I got the following error message:
File "<string>", line 1
4 5
^
SyntaxError: unexpected EOF while parsing
That approach is fine; this works:
read = input()
print(read)
but you are just reading one line.
From the input() doc:
The function then reads a line from input, converts it to a string
(stripping a trailing newline), and returns that.
That means that if the file does not end with a newline (in other words, if the last non-blank line of the file does not end with an end-of-line character), you will get a SyntaxError and the last line will not be read.
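If you must use input(), a minimal Python 3 sketch is to call it until it raises EOFError, which is how input() signals that there is nothing left to read. In Python 3, input() returns the line as a plain string, even when the last line has no trailing newline. (The File "<string>" in the traceback suggests the line was being evaluated as an expression, which is what Python 2's input() does; in Python 2, raw_input() is the function that returns the raw string.) The helper name read_all_lines is hypothetical.

```python
def read_all_lines():
    """Read lines with input() until end of input; return them as a list."""
    lines = []
    while True:
        try:
            lines.append(input())  # input() strips the trailing newline
        except EOFError:           # raised once there is nothing left to read
            return lines
```

For the input.txt above, n = int(lines[0]) and a, b = map(int, lines[1].split()) would then give 3, 4 and 5.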
You mention HackerRank; looking at some of my old submissions, I think I gave up on input() in favor of working with sys.stdin directly. input() is very similar to next(sys.stdin) (except that next() keeps the trailing newline), but iterating over sys.stdin handles EOF cleanly: the loop simply ends.
By way of example, my answer for https://www.hackerrank.com/challenges/maximize-it/
import sys
import itertools
# next(sys.stdin) is functionally identical to input() here
nK, M = (int(n) for n in next(sys.stdin).split())
# but I can also iterate over it
K = [[int(n) for n in line.split()][1:] for line in sys.stdin]
print(max(sum(x**2 for x in combo) % M for combo in itertools.product(*K)))
Currently, I am using
def eofapproached(f):
    pos = f.tell()
    near = f.read(1) == ''
    f.seek(pos)
    return near
to detect if a file open in 'r' mode (the default) is "at EOF" in the sense that the next read would produce the EOF condition.
I might use it like so:
f = open('filename.ext') # default 'r' mode
print(eofapproached(f))
FYI, I am working with some existing code that stops when EOF occurs, and I want my code to do some action just before that happens.
I am also interested in any suggestions for a better (e.g., more concise) function name. I thought of eofnear, but that does not necessarily convey as specific a meaning.
Currently, I use Python 3, but I may be forced to use Python 2 (part of a legacy system) in the future.
You can use f.tell() to find out your current position in the file.
The problem is that you also need to know how big the file is.
The naive (and efficient) solution is to compare os.path.getsize(filepath) with the result of tell(), but getsize() returns the size in bytes, which is only relevant if you are reading in binary mode ('rb'); in text mode your file may contain multi-byte characters.
Your best solution is to seek to the end and back to find out the size.
def char_count(f):
    current = f.tell()
    f.seek(0, 2)  # 2 == os.SEEK_END: jump to the end of the file
    end = f.tell()
    f.seek(current)  # go back to where we were
    return end

def chars_left(f, length=None):
    if length is None:
        length = char_count(f)
    return length - f.tell()
Preferably, run char_count once at the beginning and then pass its result into chars_left, so the end-of-file position is only looked up once.
If you are reading line by line, and want to know before reading the last line, you also have to know how long your last line is to see if you are at the beginning of the last line.
If you are reading line by line and only want to know whether the next read will hit EOF, then chars_left(f, total) == 0 tells you that you are there (no more lines left to read).
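A minimal sketch of that comparison, assuming a seekable file that is read with readline() rather than a for loop (CPython disables tell() on a text file while it is being iterated with next()). Python documents tell() on text files as returning an opaque number, so the sketch only compares two positions for equality and never treats them as character counts. The name at_eof is just a suggestion.

```python
import os


def at_eof(f):
    """Return True if the next read from the seekable file f would hit EOF."""
    here = f.tell()
    f.seek(0, os.SEEK_END)  # jump to the end to learn its position
    end = f.tell()
    f.seek(here)            # restore the original position
    return here == end
```

Calling it after each readline() lets you run the extra action just before the existing code reaches EOF.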
I've written this code to avoid using tell() (although using tell() is perhaps simpler):
import os

class NearEOFException(Exception):
    pass

def tellMe_before_EOF(filePath, chunk_size):
    fileSize = os.path.getsize(filePath)
    chunks_num = fileSize // chunk_size  # how many full chunks can we read from the file?
    with open(filePath) as f:
        if chunks_num == 0:
            raise NearEOFException("File is near EOF")
        # Yield every full chunk except the last one...
        for i in range(chunks_num - 1):
            yield f.read(chunk_size)
        # ...then signal that we are near EOF instead of reading further
        raise NearEOFException("File is near EOF")

if __name__ == "__main__":
    g = tellMe_before_EOF("xyz", 3)  # read in chunks of 3 chars
    while True:
        print(next(g), end='')  # raises NearEOFException near EOF
The name of the function is debatable; naming things is boring, and I'm just not good at it.
The function works like this: it takes the size of the file, computes approximately how many N-sized chunks can be read from it, and stores that in chunks_num. This simple division gets us near EOF; the question is where you consider "near EOF" to be: near the last character, or within the last N characters? Keep that in mind if it matters.
Trace through this code to see how it works.
I am reading input from sys.stdin in Python and need to perform some extra operations when the last line is encountered. How can I tell whether the current line being processed is the last one?
for line in sys.stdin:
    if <line is last line>:
        # do some extra operation
    else:
        # rest of stuff
The only way to know that you're at the end of the stream is to try to read from it and find nothing there. Logic placed after the for loop runs in the end-of-stream case.
If you need to detect end-of-stream in the input stream, before you've finished with the previous record, then your logic can't use a "for"-loop. Instead, you must use "while." You must pre-read the first record, then, "while" the latest-thing-read isn't empty, you must read the next record and then process the current one. Only in this way can you know, before processing the current record, that there will be no records following it.
Before starting the loop, read the first line of the input. Then, in the loop, always process the line previously read. After the loop terminates, you'll still have the last line from sys.stdin in line_prev for your special processing.
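import sys

line_prev = sys.stdin.readline()
for line in sys.stdin:
    rest_of_stuff(line_prev)  # placeholder for the normal per-line processing
    line_prev = line
if line_prev:  # an empty string here means the input was empty
    do_some_extra_operation(line_prev)  # line_prev now holds the last line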
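The same read-one-ahead idea can be packaged as a small generator that pairs each line with a flag saying whether it is the last one, so the loop body can branch on it. This is a sketch; the name with_last_flag is hypothetical, and it works with any iterable, including sys.stdin.

```python
def with_last_flag(iterable):
    """Yield (item, is_last) pairs, reading one item ahead to detect the end."""
    it = iter(iterable)
    try:
        prev = next(it)  # pre-read the first item
    except StopIteration:
        return  # empty input: nothing to yield
    for item in it:
        yield prev, False  # another item followed, so prev was not the last
        prev = item
    yield prev, True  # the loop ended, so prev is the final item
```

Usage: for line, is_last in with_last_flag(sys.stdin): then check is_last inside the loop to run the extra operation.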
Why not try this:
for a in iter(raw_input, ""):
    pass  # do something with a
The loop stops when the input equals the sentinel (the second argument to iter). You can also keep a reference to the last line, for example:
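last_line = None
for a in iter(raw_input, ""):
    if last_line is not None:
        pass  # do stuff with last_line (a line that is known not to be the final one)
    last_line = a
# Do more stuff: after the loop, last_line holds the final line (or None if there was none)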
For your understanding: functions such as raw_input don't have their own input source; they read from sys.stdin. Think of it like this:
import sys

def raw_input():
    return sys.stdin.readline().rstrip('\n')
It's not exactly like that, but it's similar to that concept.
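A related idiom, assuming Python 3, passes sys.stdin.readline with an empty-string sentinel instead of raw_input. Because readline() keeps the trailing newline, a blank line comes back as "\n" and does not match the sentinel, so the loop ends only at the real end of input, where readline() returns "". The iter(raw_input, "") form, by contrast, stops at the first blank line, and raw_input raises EOFError if the input ends without one. The stream parameter and the name collect_lines are illustrative, so the sketch can be tried with io.StringIO.

```python
def collect_lines(stream):
    """Collect lines until end of input using iter() with a sentinel."""
    lines = []
    for line in iter(stream.readline, ""):  # stops when readline() returns ""
        lines.append(line.rstrip("\n"))
    return lines
```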
I have a text file that another program writes to every 10 seconds.
My code goes through this file and parses the data I want, but at some point the for loop reaches the end of the file and the program exits.
GOAL: I want the program to wait inside the for loop for more data to arrive, so that it parses the new data too.
I tried using a while loop with a condition on the number of lines left to read, but for some reason the program stops shortly after exiting the while loop. If I add, say, 25 lines, it processes 9 of them, then exits the for loop and finishes (it does not crash).
QUESTION: Is there a better way to idle the program until new data arrives? What is wrong with this code?
k = -1
with open('epideiksh.txt') as weather_file:
    for line in weather_file:
        k = k+1
        lines_left = count_lines_of('epideiksh.txt') - k
        while ( lines_left <= 10 ):
            print("waiting for more data")
            time.sleep(10)
            pointer = count_lines('epideiksh.txt') - k
        if line.startswith('Heat Index'):
            do_my_thing()
        time.sleep(10)
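The simplest, but slightly error-prone, way of simulating tail -f is:
import io
import time

a_little = 1.0  # seconds to wait between polls (pick any value)

with open("filename") as input:
    while True:
        for line in input:
            if interesting(line):  # interesting() and do_something_with() are placeholders
                do_something_with(line)
        time.sleep(a_little)
        input.seek(0, io.SEEK_CUR)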
In my very limited testing, that seemed to work without the seek. But it shouldn't, since normally you have to do something like that in order to clear the eof flag. One thing to keep in mind is that tell() cannot be used on a (text) file while it is being iterated, and seeking from SEEK_CUR invokes tell(). So in the above code snippet, you could not break out of the for loop and fall into the input.seek() call.
The problem with the above is that it is possible that the readline (implicit in the iterator) will only read part of the line currently being written. So you need to be prepared to abandon and reread partial lines:
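import time

a_little = 1.0  # seconds to wait between polls (pick any value)

with open("filename") as input:
    # where is the end of the last complete line read
    where = input.tell()
    # use readline explicitly because next() and tell() are incompatible
    while True:
        line = input.readline()
        if not line or line[-1] != '\n':
            # nothing new, or only a partial line: wait, then reread from `where`
            time.sleep(a_little)
            input.seek(where)
        else:
            where = input.tell()
            if interesting(line):  # interesting() and do_something_with() are placeholders
                do_something_with(line)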
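The same approach can be wrapped in a generator that yields only complete lines. As an assumption added here so it can terminate (and be tested), it gives up after a number of consecutive empty polls; pass max_idle_polls=None to wait forever, like tail -f. The names follow, poll_interval and max_idle_polls are illustrative.

```python
import time


def follow(f, poll_interval=1.0, max_idle_polls=None):
    """Yield complete lines from the text file f, waiting for more to be appended.

    A partial line (one without a trailing newline) is not yielded; the file
    position is rewound so the line is read again once it is complete.
    """
    where = f.tell()  # position just after the last complete line
    idle_polls = 0
    while True:
        line = f.readline()
        if line.endswith("\n"):
            where = f.tell()
            idle_polls = 0
            yield line
        else:
            # Nothing new, or only part of a line: rewind and wait.
            f.seek(where)
            if max_idle_polls is not None:
                idle_polls += 1
                if idle_polls >= max_idle_polls:
                    return
            time.sleep(poll_interval)
```

Applied to the question's loop, you could iterate with for line in follow(weather_file, poll_interval=10): and keep the if line.startswith('Heat Index'): check, with no need to count the remaining lines.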