I have a 15-16 GB file containing JSON objects separated by newlines (\n).
I am new to Python and am reading the file with the following code:
with open(filename, 'rb') as file:
    for data in file:
        dosomething(data)
If my script fails partway through reading, say after 5 GB, how can I resume from the last read position and continue from there?
I am trying to do this by using file.tell() to get the position and seek() to move the pointer back to it.
Since this file contains JSON objects, after the seek operation I get the following error:
ValueError: No JSON object could be decoded
I assume that after the seek operation the pointer is not at the start of a complete JSON object.
How can I solve this? Is there any other way to read from the last read position in Python?
Use another file to store the current location:
cur_loc = open("location.txt", "w+")
cur_loc.write('0')
exception = False
i = 0
with open("test.txt", "r") as f:
    while True:
        i += 1
        if exception:
            # resume from the position saved in location.txt
            cur_loc.seek(0)
            pos = int(cur_loc.readline())
            f.seek(pos)
            exception = False
        try:
            read = f.readline()
            print read,
            if i == 5:
                print "Exception Happened while reading file!"
                x = 1/0  # to make an exception
            # remove above if block and do everything you want here.
            if read == '':
                break
        except:
            exception = True
        # save the position of the next unread line
        cur_loc.seek(0)
        cur_loc.write(str(f.tell()))
cur_loc.close()
Let's assume we have the following test.txt as the input file:
# contents of test.txt
1
2
3
4
5
6
7
8
9
10
When you run the above program, you get:
>>> ================================ RESTART ================================
>>>
1
2
3
4
5
Exception Happened while reading file!
6
7
8
9
10
>>>
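Note that this answer opens location.txt in "w+" mode, which truncates it, so the saved position only survives within a single run; to resume after the whole script has crashed, the offset has to be read back on startup. A likely reason the question's seek() landed mid-object is that in Python 2, iterating with `for data in file` uses a hidden read-ahead buffer, so file.tell() reports where that buffer ends rather than where the current line ends. Below is a minimal Python 3 sketch, assuming one JSON object per line; the checkpoint file and the function names (load_offset, save_offset, process_jsonl) are hypothetical. It reads in binary mode with readline() and records tell() only after a line has been fully handled, so a resume always starts at a line boundary.

```python
import json


def load_offset(checkpoint_path):
    """Return the saved byte offset, or 0 if no checkpoint exists yet."""
    try:
        with open(checkpoint_path) as cp:
            return int(cp.read().strip() or 0)
    except FileNotFoundError:
        return 0


def save_offset(checkpoint_path, offset):
    """Overwrite the checkpoint with the offset of the next unread line."""
    with open(checkpoint_path, "w") as cp:
        cp.write(str(offset))


def process_jsonl(data_path, checkpoint_path, handle):
    """Call handle(obj) for every JSON line, resuming from the checkpoint."""
    with open(data_path, "rb") as f:
        f.seek(load_offset(checkpoint_path))
        while True:
            line = f.readline()  # readline() keeps tell() accurate
            if not line:
                break  # end of file
            if line.strip():
                handle(json.loads(line))  # json.loads accepts bytes (3.6+)
            # Record the position only after the line was handled, so a
            # crash inside handle() re-processes this line on resume.
            save_offset(checkpoint_path, f.tell())
```

For a 15-16 GB file, writing the checkpoint after every line costs one small write per line; saving every few thousand lines instead is a reasonable trade-off, at the price of re-processing up to that many lines after a crash.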
You can use for i, line in enumerate(opened_file) to get the line numbers and store the current one in a variable. When your script fails, you can display this variable to the user. You will then need to add an optional command-line argument for this value. If the argument is given, your script needs to call opened_file.readline() that many times before processing; this way you get back to the point where you left off.
for i in range(passed_variable):
    opened_file.readline()
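The same idea can be sketched in Python 3 with itertools.islice, which skips the first N lines without the manual readline() loop. This assumes the line number is passed as an optional command-line argument; the --start-line flag and the names process_from and main are hypothetical. Unlike a byte-offset approach, this still reads through every skipped line, so resuming deep into a large file takes time proportional to what is skipped.

```python
import argparse
import itertools


def process_from(path, start_line, handle):
    """Call handle(i, line) for every line, skipping the first start_line lines."""
    with open(path) as opened_file:
        # islice lazily consumes and discards lines 0..start_line-1.
        remaining = itertools.islice(opened_file, start_line, None)
        for i, line in enumerate(remaining, start=start_line):
            handle(i, line)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Resumable line processor")
    parser.add_argument("path")
    # Hypothetical flag: the line number reported when the last run failed.
    parser.add_argument("--start-line", type=int, default=0)
    args = parser.parse_args(argv)
    current = {"line": args.start_line}

    def handle(i, line):
        current["line"] = i
        # ... real per-line work on `line` goes here ...

    try:
        process_from(args.path, args.start_line, handle)
    except Exception:
        n = current["line"]
        print(f"Failed at line {n}; rerun with --start-line {n}")
        raise
```

Calling main() from an `if __name__ == "__main__":` guard would let you run it as, for example, `python script.py data.txt --start-line 5000`.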
Related
I am having trouble getting part of my code to read a value from a text file, convert it to an integer, add a value entered by the user, and then write the new value back into the file. This is for a simple inventory program that keeps track of certain items.
Example:
The user inputs 10 to be added to the number in the file. The number in the file is 231, so 10 + 231 = 241. 241 is the new number that is written to the file in place of the original number. I have tried many different things and researched this topic, but none of the code I came up with has worked. If it isn't apparent by now, I am new to Python. If anyone can help, it would be greatly appreciated!
The steps you need to take are:
Open the file in read mode: file = open("path/to/file", "r")
Read the file to a python string: file_str = file.read()
Convert the string to an integer: n = int(file_str)
Add 10 and convert the result back to a string: num_str = str(n + 10)
Close the file: file.close()
Reopen the file in write mode: file = open("path/to/file", "w")
Write the num string to the file: file.write(num_str)
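Put together, the steps above might look like the following Python 3 sketch, which uses with blocks so the file is closed automatically after each step. The function name add_to_count is hypothetical, and it assumes the file holds a single integer (int() tolerates surrounding whitespace such as a trailing newline).

```python
def add_to_count(path, amount):
    """Add amount to the integer stored in path and write the new total back."""
    # Read the current value; int() ignores a trailing newline.
    with open(path, "r") as f:
        n = int(f.read())
    total = n + amount
    # Reopening in "w" mode truncates the file before writing the new number.
    with open(path, "w") as f:
        f.write(str(total))
    return total
```

In the inventory program this could be called as add_to_count("inventory.txt", int(input("Amount to add: "))), where inventory.txt stands in for your actual file.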
If your file looks like this:
1 2 3 4 5 6
7 8 9 10 11 12
....
....
then search line by line to find the line number and the index of your number:
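with open('data.txt') as f:
    content = f.readlines()
for x in range(len(content)):
    # split() with no argument also drops the trailing newline,
    # so the last number on each line can be found too
    if '5' in content[x].split():
        lno = x
        index = content[x].split().index('5')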
Now you have the line number and the index. Add the user input to that number and save it back into the file at that position.
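That last step could be sketched like this in Python 3, assuming the numbers are separated by whitespace and that only the first occurrence of the target should change. The function name update_number is hypothetical; note that rewriting a line this way normalizes its spacing to single spaces.

```python
def update_number(path, target, amount):
    """Add amount to the first occurrence of target and rewrite the file.

    Returns (line_number, index) of the updated value, or None if absent.
    """
    with open(path) as f:
        content = f.readlines()
    for lno, line in enumerate(content):
        fields = line.split()  # split() also drops the trailing newline
        if target in fields:
            index = fields.index(target)
            fields[index] = str(int(fields[index]) + amount)
            content[lno] = " ".join(fields) + "\n"
            # Rewrite the whole file with the modified line in place.
            with open(path, "w") as f:
                f.writelines(content)
            return lno, index
    return None
```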
I'm trying to loop over every 2 characters in a file, do some tasks on them, and write the resulting characters into another file.
So I tried to open the file and read the first two characters, then set the pointer to the 3rd character in the file, but it gives me the following error:
'bytes' object has no attribute 'seek'
This is my code:
the_file = open('E:\\test.txt',"rb").read()
result = open('E:\\result.txt',"w+")
n = 0
s = 2
m = len(the_file)
while n < m :
    chars = the_file.seek(n)
    chars.read(s)
    #do something with chars
    result.write(chars)
    n =+ 1
    m =+ 2
I should mention that test.txt contains only integers (digits).
Its content is a series of binary data (0s and 1s), like this:
01001010101000001000100010001100010110100110001001011100011010000001010001001
Although it's not the point here, I just want to replace every 2 characters with something else and write the result into result.txt.
Call seek on the file object, not on its contents
Use an if statement to break out of the loop, as you do not have the length
Use n += 2, not n =+ 1 (n =+ 1 just assigns +1 to n)
Finally, we seek 2 further along and read 2
Hopefully this will get you close to what you want.
Note: I changed the file names for the example
the_file = open('test.txt',"rb")
result = open('result.txt',"wb+")  # binary, since chars is bytes
n = 0
s = 2
while True:
    the_file.seek(n)
    chars = the_file.read(s)
    if not chars:
        break
    #do something with chars
    print(chars)
    result.write(chars)
    n += s
the_file.close()
result.close()
Note that because in this case you are reading the file sequentially in chunks, i.e. read(2) rather than read(), the seek is superfluous: each read(2) already advances the position by 2.
seek() is only required if you want to move the position pointer within the file, for example to start reading at the 100th byte (seek(99)).
The above could be written as:
the_file = open('test.txt',"rb")
result = open('result.txt',"wb+")  # binary, since chars is bytes
while True:
    chars = the_file.read(2)
    if not chars:
        break
    #do something with chars
    print(chars)
    result.write(chars)
the_file.close()
result.close()
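To illustrate where seek() actually is needed, here is a small Python 3 sketch that starts reading 2-byte chunks from an arbitrary byte offset instead of the beginning. The function name read_pairs_from is hypothetical; offsets are 0-based, so seek(99) positions the file at the 100th byte.

```python
def read_pairs_from(path, offset):
    """Return the 2-byte chunks of path, starting at the given byte offset."""
    pairs = []
    with open(path, "rb") as f:
        f.seek(offset)  # jump straight to the byte at this 0-based offset
        while True:
            chars = f.read(2)  # read() advances the position by itself
            if not chars:
                break
            pairs.append(chars)
    return pairs
```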
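You were trying to call the .seek() method on the data you had read, because you thought it was a file object, but the file's .read() method returns the file's contents (here a bytes object, since the file was opened in "rb" mode), not the file itself.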
Here's a general approach I might take to what you were going for:
# open the file and load its contents as a string file_contents
with open('E:\\test.txt', "r") as f:
    file_contents = f.read()
# get the length, as you were doing
m = len(file_contents)
# initialize a result string
result = ""
# iterate over the file_contents, incrementing by 2, adding to result
for i in range(0, m, 2):
    result += file_contents[i]
# write to result.txt ('w', because result is a text string)
with open('E:\\result.txt', 'w') as f:
    f.write(result)
Edit: It seems like there was a change to the question. If you want to change every second character, you'll need to make some adjustments.
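For the actual goal of replacing every 2 characters with something else, one Python 3 sketch reads the input in 2-character chunks and looks each chunk up in a mapping. The mapping below (00 becomes A, 01 becomes B, 10 becomes C, 11 becomes D) and the name transform_pairs are hypothetical placeholders for whatever replacement you need; a chunk not in the mapping, such as a lone final character, is copied through unchanged.

```python
# Hypothetical replacement table: substitute your own.
PAIR_MAP = {"00": "A", "01": "B", "10": "C", "11": "D"}


def transform_pairs(src, dst, mapping=PAIR_MAP):
    """Write each 2-character chunk of src to dst, replaced via mapping."""
    with open(src) as fin, open(dst, "w") as fout:
        while True:
            pair = fin.read(2)
            if not pair:
                break  # end of file
            # Unknown chunks (e.g. an odd last character) pass through as-is.
            fout.write(mapping.get(pair, pair))
```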
I have an issue with a bit of code that works in Python 3 but fails in 2.7. I have the following code:
def getDimensions(file,log):
    noStations = 0
    noSpanPts = 0
    dataSet = False
    if log:
        print("attempting to retrieve dimensions. Opening file",file)
    while not dataSet:
        try: # read until error occurs
            string = file.readline().rstrip() # to avoid breaking on an empty line
        except IOError:
            break
        # stations
        if "Ax dist hub" in string: # parse out number of stations
            if log:
                print("found ax dist hub location")
            next(file) # skip empty line
            eos = False # end of stations
            while not eos:
                string = file.readline().rstrip()
                if string =="":
                    eos = True
                else:
                    noStations = int(string.split()[0])
This raises an error:
ValueError: Mixing iteration and read methods would lose data.
I understand that the issue is how I read my string in the while loop, or at least that is what I believe. Is there a quick way to fix this? Any help is appreciated. Thank you!
The problem is that you are using next and readline on the same file. As the docs say:
As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right.
The fix is trivial: replace next(file) with file.readline().
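A minimal sketch of the station-parsing part with next() replaced by readline(), so it behaves the same under Python 2.7 and 3. The function name count_stations is a hypothetical simplification of getDimensions. It also checks for end of file explicitly: readline() returns an empty string at EOF rather than raising IOError, so a loop that waits for IOError never ends on its own.

```python
def count_stations(file):
    """Return the station number from the last row of the 'Ax dist hub' table."""
    no_stations = 0
    while True:
        line = file.readline()
        if line == "":
            return no_stations  # reached end of file
        if "Ax dist hub" in line:
            file.readline()  # skip the empty line (was next(file))
            while True:
                row = file.readline().rstrip()
                if row == "":
                    return no_stations  # blank line or EOF ends the table
                no_stations = int(row.split()[0])
```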
If you want a shorter way to read only the non-empty lines, try:
with open(filename) as f:
    lines = [line for line in f if line.strip()]
Then you can run your tests on lines, with the empty lines already filtered out.
I have a problem reading characters from a file. I have a file called fst.fasta and I want to know the number of occurrences of the letters A and T.
This is the first code sample:
f = open("fst.fasta","r")
a = f.read().count("A")
t = f.read().count("T")
print "nbr of A : ", a
print "nbr of T : ", t
The result:
nbr of A : 255
nbr of T : 0
Even though there are Ts, I always get 0.
Then I tried this:
f = open("fst.fasta","r")
a = f.read().count("A")
f = open("fst.fasta","r")
t = f.read().count("T")
print "nbr of A : ", a
print "nbr of T : ", t
This worked! Is there any other way to avoid repeating f = open("fst.fasta","r") ?
You're dealing with the fact that read() has a side effect (to use the term really loosely): it reads through the file and, as it does so, moves a pointer that tracks where it is in that file. When it returns, you can expect that pointer to be at the end of the file. Therefore, calling read() again starts from that position and gives you nothing back (an empty string). This is what you want:
f = open("fst.fasta","r")
contents = f.read()
a = contents.count("A")
t = contents.count("T")
The documentation also indicates other ways you can use read:
next_value = f.read(1)
if next_value == "":
    # We have reached the end of the file
    pass
What has happened in the code above is that, instead of getting all the characters in the file, the file handler has only returned 1 character. You could replace 1 with any number, or even a variable to get a certain chunk of the file. The file handler remembers where the above-mentioned pointer is, and you can pick up where you left off. (Note that this is a really good idea for very large files, where reading it all into memory is prohibitive.)
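As a sketch of that chunked approach applied to the original problem, the following Python 3 function counts A and T while holding only one chunk in memory at a time. The name count_bases and the 1 MiB default chunk size are arbitrary choices for illustration. Because each counted item is a single character, it can never be split across two chunks, so summing per-chunk counts gives the exact total.

```python
def count_bases(path, bases="AT", chunk_size=1024 * 1024):
    """Count each character in bases, reading path chunk_size characters at a time."""
    counts = dict.fromkeys(bases, 0)
    with open(path) as f:
        while True:
            chunk = f.read(chunk_size)
            if chunk == "":
                break  # read() returns an empty string at end of file
            for base in bases:
                counts[base] += chunk.count(base)
    return counts
```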
Only once you call f.close() does the file handler 'forget' where it is - but it also forgets the file, and you'd have to open() it again to start from the beginning.
There are other functions provided (such as seek() and readline()) that will let you move around a file using different semantics. f.tell() will tell you where the pointer is in the file currently.
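A quick way to see the pointer at work is to check tell() around a few reads. This Python 3 sketch uses an in-memory io.StringIO as a stand-in for fst.fasta so it runs anywhere. A real file opened with open() supports the same calls, but for text-mode files Python 3 documents tell() as returning an opaque number, so it is best passed back to seek() rather than interpreted as a character count.

```python
import io

f = io.StringIO("AATTGC\nTTAA\n")  # stand-in for open("fst.fasta")

print(f.tell())          # 0: nothing read yet
first = f.read(3)        # reads "AAT" and advances the pointer
print(first, f.tell())   # AAT 3
rest = f.read()          # reads everything that is left
print(f.read() == "")    # True: the pointer is at the end
f.seek(0)                # rewind to the start
print(f.read().count("A"))  # 4
```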
Each time you call f.read(), it consumes the entire remaining contents of the file and returns them. You use that data only to count the As, and then attempt to read data that has already been consumed. There are two solutions:
Option 1: Use f.seek(0)
a = f.read().count("A")
f.seek(0)
t = f.read().count("T")
The f.seek(0) call sets the position in the file back to the beginning.
Option 2: Store the result of f.read():
data = f.read()
a = data.count("A")
t = data.count("T")
f.seek(0) before the second f.read() will reset the file pointer to the beginning of the file. Or more sanely, save the result of f.read() to a variable, and you can then call .count on that variable to your heart's content without rereading the file pointlessly.
Try the with construct:
with open("fst.fasta","r") as f:
    file_as_string = f.read()
a = file_as_string.count("A")
t = file_as_string.count("T")
This keeps the file open only until you exit the block, at which point it is closed automatically.
Read it into a list of lines:
f = open("fst.fasta")
allLines = f.readlines()
f.close()
# At this point, you are no longer using the file handler.
for line in allLines:
    print (line.count("A"), " ", line.count("T"))
I would like to read one line at a time and assign that string to a variable in my Python script. Once this value is assigned, I would like to delete that line from the .txt file. So far I have the following code:
import os
# Open file with a bunch of keywords
inputkeywordsfile = open(os.path.join(os.path.dirname(__file__),'KeywordDatabase.txt'),'r')
# Assigning
keyword = inputkeywordsfile.readline().strip()
So for example if the .txt file has this structure:
dog
cat
horse
The first time I run my script, dog will be assigned to keyword.
The second time I run my script, cat will be assigned to keyword and dog will be deleted from the text file.
SOLVED:
readkeywordsfile = open(os.path.join(os.path.dirname(__file__),'KeywordDatabase.txt'),'r')
firstline = readkeywordsfile.readline().strip()
# readline() above already consumed the first line, so lines holds only the rest
lines = readkeywordsfile.readlines()
readkeywordsfile.close()
writekeywordsfile = open(os.path.join(os.path.dirname(__file__),'KeywordDatabase.txt'),'w')
writekeywordsfile.writelines(lines)
writekeywordsfile.close()
keyword = firstline
Try this out and let me know how you get on. As a point to note, when dealing with file objects, the Pythonic way is to use the with open syntax as this ensures that the file is closed once you leave the indented code block. :)
import os
# Open file with a bunch of keywords
with open(os.path.join(os.path.dirname(__file__),'KeywordDatabase.txt'),'r') as inputkeywordsfile:
    # Read all lines into a list and retain the first one
    keywords = inputkeywordsfile.readlines()
    keyword = keywords[0].strip()
with open(os.path.join(os.path.dirname(__file__),'KeywordDatabase.txt'),'w') as outputkeywordsfile:
    for w in keywords[1:]:
        outputkeywordsfile.write(w)
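A slightly more defensive variant of the same idea, sketched in Python 3: it returns None when the file is empty instead of raising IndexError on keywords[0]. The function name pop_first_line is hypothetical, and the path is passed in rather than built from __file__.

```python
def pop_first_line(path):
    """Remove the first line of path and return it stripped, or None if empty."""
    with open(path) as f:
        lines = f.readlines()
    if not lines:
        return None  # nothing left to consume
    with open(path, "w") as f:
        f.writelines(lines[1:])  # everything except the consumed line
    return lines[0].strip()
```

In the script above this would be called as keyword = pop_first_line(os.path.join(os.path.dirname(__file__), 'KeywordDatabase.txt')), giving dog on the first run, cat on the second, and so on.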
There may be a better solution, but this worked for me as I understand your question. During execution, each line in turn is assigned to the variable keyword; that is why I used print keyword, to show this. Just for demonstration, I also used time.sleep(5): during each 5-second pause you can check your .txt file and see that it contains the data you wanted (when the second line is assigned to the variable, the first line has already been removed from the file).
Code:
import time
f = open("KeywordDatabase.txt","r")
lines = f.readlines()
f.close()
for k, line in enumerate(lines):
    keyword = line #Assignment takes place here
    print keyword
    # rewrite the file without the lines already processed
    f = open("KeywordDatabase.txt","w")
    for w in lines[k:]:
        f.write(w)
    f.close()
    time.sleep(5) #Time to check the txt file :)