Switching files to read from in python's .csv reader - python

I am reading a csv file several times, but cutting its size every time I go through it. So, once I've reached the bottom, I am writing a new csv file which is, say, the bottom half of the .csv file. I then wish to change the csv reader to use this new file instead, but it doesn't seem to be working... Here's what I've done.
sent = open(someFilePath)
r_send = csv.reader(sent)
try:
    something = r_send.next()
except StopIteration:
    sent.seek(0)
    sent.close()
    newFile = cutFile(someFilePath, someLineNumber)
    sent = open(newFile, "r")
    r_send = csv.reader(sent)
where cutFile does:
def cutFile(sender, lines):
    sent = open(sender, "r")
    new_sent = open(sender + ".temp.csv", "w")
    counter = 0
    for line in sent:
        counter = counter + 1
        if counter >= lines:
            print >> new_sent, ",".join(line)
    new_sent.close()
    return sender + ".temp.csv"
Why is this not working?

Is something = r_send.next() in some kind of loop? The way you wrote it, it's only going to read one line.
Why do you need ",".join(line)? You can simply print line itself, and it should work.
Plus, there really is no need to seek(0) before closing a file.

I suggest the following:
Use for something in r_send: rather than something = r_send.next(); you won't even need the try... except blocks, as you'll just put the stuff closing the original file outside that loop (as someone else mentioned, you aren't even looping through the original file in your current code). Then you'll probably want to wrap all that in another loop so it keeps continuing until the file has been fully manipulated.
Use new_sent.write(line) instead of print >> new_sent, ",".join(line). The ",".join bit is wrong here: line is already a string, so joining it puts a comma between every single character. You don't need it anyway, since you aren't using the csv module to write the file. Beyond that, write() doesn't make much of a difference, but it makes the fact that you're writing to a file more evident.
So...
sent = open(someFilePath)
someLineNumber = len(sent.readlines())
sent.seek(0)  # readlines() consumed the file, so rewind before reading it again
r_send = csv.reader(sent)
while someLineNumber > 0:
    for something in r_send:
        pass  # do stuff
    someLineNumber /= 2  # //= 2 in Python 3
    sent.close()
    newFile = cutFile(someFilePath, someLineNumber)
    sent = open(newFile, "r")
    r_send = csv.reader(sent)
Something like that.
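For reference, here is a minimal Python 3 sketch of the cutFile idea that writes lines as they are instead of going through print >> and ",".join. The name cut_file and the use of itertools.islice are my own choices, not from the question, and it assumes lines is a 1-based line number, matching the counter logic in the original.

```python
from itertools import islice


def cut_file(sender, lines):
    """Copy `sender` from line number `lines` (1-based) onward into a new file.

    Returns the path of the new file, like the cutFile in the question.
    """
    new_path = sender + ".temp.csv"
    # newline="" keeps the original line endings untouched on every platform.
    with open(sender, "r", newline="") as src, \
            open(new_path, "w", newline="") as dst:
        # islice skips the first lines - 1 lines lazily, so the file is
        # never loaded into memory as a whole.
        dst.writelines(islice(src, max(lines - 1, 0), None))
    return new_path
```

Because both files are opened in a with block, they are closed even if something goes wrong halfway through, so there is no need for an explicit close() or seek(0).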

Related

Trying to skip over several lines but the skipped lines are still being worked on

My program takes an input file, reads the file using whitespace as the delimiter and puts the data into an array, then I want to iterate over each line and if certain strings are found write that info to another file.
When a specific string is found, I want to skip over several lines, meaning that these lines are NOT iterated over. I thought that if I increased the 'line' variable (i) that would do it, but despite the fact that i is increased by 50, those 50 lines are still being worked on, which is not what I want.
Hopefully I have explained this problem well. Thank you in advance for your feedback.
def create_outfile(infile):
    gto_found = 0
    outfile = "output.txt"  # Output file
    outfile = open(outfile, 'w')  # Open output file for writing
    for i in range(len(infile)):  # iterate over each line
        if len(infile[i]) == 6:
            if (infile[i][4][1:-1]) == "GTO" and gto_found == 0:  # now skip
                print(i)
                print(infile[i])
                debugPause = input("\nPausing to debug...\n")
                i = i + 50  # Skip over the GTO section
                gto_found = 1
                print(i)
                debugPause = input("\nPausing to debug...\n")
                print(infile[i])
        for j in range(len(infile[i])):  # iterate over each element
            # Command section
            if (infile[i][j])[:5] == "#ACS_":
                # Do some work
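Unfortunately, Python's for loop does not let you jump ahead like that. You can reassign i inside the loop body, but the change only lasts until the end of the current iteration: on the next pass the loop takes the next value from range() and overwrites i. This is the same as this question here, so check it out. This other topic shows some workarounds that you could use.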
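One common workaround is to drive the index yourself with a while loop, so that a jump actually sticks. The sketch below is only an illustration: process_with_skip, marker and skip are hypothetical names, and the marker test is simplified to a plain membership check rather than the exact infile[i][4][1:-1] comparison from the question.

```python
def process_with_skip(rows, marker="GTO", skip=50):
    """Return the indices that were actually processed.

    The first row containing `marker` and the skip - 1 rows after it
    are jumped over; processing resumes at index i + skip.
    """
    visited = []
    marker_found = False
    i = 0
    while i < len(rows):
        if not marker_found and marker in rows[i]:
            marker_found = True
            i += skip  # in a while loop this change is not overwritten
            continue   # re-check the bound before touching rows[i]
        visited.append(i)  # stand-in for the real per-row work
        i += 1
    return visited
```

The continue after the jump matters: it sends control back to the while condition, so a jump past the end of the list ends the loop instead of raising IndexError.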

Is there a way to read file in reverse using with open using Python

I'm trying to read a file.out server log file, but I only need the latest data within a datetime range.
Is it possible to read a file in reverse using with open() and one of its modes?
The a+ mode gives access to the end of the file:
``a+'' Open for reading and writing. The file is created if it does not
exist. The stream is positioned at the end of the file. Subsequent writes
to the file will always end up at the then current end of the file,
irrespective of any intervening fseek(3) or similar.
Is there a way to use maybe a+ or other modes(methods) to access the end of the file and read a specific range?
The regular r mode reads the file from the beginning:
with open('file.out','r') as file:
I have tried using reversed():
for line in reversed(list(open('file.out').readlines())):
but it returns no rows for me.
Or are there other ways to read a file in reverse?
EDIT
What I got so far:
import os
import time
from datetime import datetime as dt
start_0 = dt.strptime('2019-01-27','%Y-%m-%d')
stop_0 = dt.strptime('2019-01-27','%Y-%m-%d')
start_1 = dt.strptime('09:34:11.057','%H:%M:%S.%f')
stop_1 = dt.strptime('09:59:43.534','%H:%M:%S.%f')
os.system("touch temp_file.txt")
process_start = time.clock()
count = 0
print("reading data...")
for line in reversed(list(open('file.out'))):
    try:
        th = dt.strptime(line.split()[0],'%Y-%m-%d')
        tm = dt.strptime(line.split()[1],'%H:%M:%S.%f')
        if (th == start_0) and (th <= stop_0):
            if (tm > start_1) and (tm < stop_1):
                count += 1
                print("%d occurancies" % (count))
                os.system("echo '"+line.rstrip()+"' >> temp_file.txt")
        if (th == start_0) and (tm < start_1):
            break
    except KeyboardInterrupt:
        print("\nLast line before interrupt:%s" % (str(line)))
        break
    except IndexError as err:
        continue
    except ValueError as err:
        continue
process_finish = time.clock()
print("Done:" + str(process_finish - process_start) + " seconds.")
I'm adding these limits so that once the matching rows are found, it at least prints the number of occurrences and then stops reading the file.
The problem is that it reads the file, but it's way too slow.
EDIT 2
(2019-04-29 9.34am)
All the answers I received work well for reading logs in reverse, but in my case (and maybe for other people too), where the log is several GB in size, Rocky's answer below suited me best.
The code that works for me
(I only added a for loop to Rocky's code):
import collections
log_lines = collections.deque()
for line in open("file.out", "r"):
    log_lines.appendleft(line)
    if len(log_lines) > number_of_rows:
        log_lines.pop()
log_lines = list(log_lines)
for line in log_lines:
    print(str(line).split("\n"))
Thanks people, all the answers work.
-lpkej
There's no way to do it with open() parameters, but if you want to read the last part of a large file without loading the whole file into memory (which is what reversed(list(fp)) will do), you can use a two-pass solution.
LINES_FROM_END = 1000
with open(FILEPATH, "r") as fin:
    s = 0
    while fin.readline():  # fixed typo, readlines() will read everything...
        s += 1
    fin.seek(0)
    mylines = []
    for i, e in enumerate(fin):
        if i >= s - LINES_FROM_END:
            mylines.append(e)
This won't keep your whole file in memory. You can also reduce this to one pass by using collections.deque:
# one pass (a lot faster):
import collections
mylines = collections.deque()
for line in open(FILEPATH, "r"):
    mylines.appendleft(line)
    if len(mylines) > LINES_FROM_END:
        mylines.pop()
mylines = list(mylines)
# mylines will contain #LINES_FROM_END count of lines from the end, newest first.
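As a side note, collections.deque accepts a maxlen argument that does the trimming for you, so the one-pass version can be written more compactly. This is a sketch of that variant rather than the code from the answer; the function name tail is my own.

```python
from collections import deque


def tail(path, n):
    """Return the last n lines of the file at `path`, oldest first."""
    with open(path, "r") as f:
        # A bounded deque silently discards items from the left once it
        # holds n entries, so only n lines are in memory at any time.
        return list(deque(f, maxlen=n))
```

To walk the result from newest to oldest, wrap it in reversed(), e.g. for line in reversed(tail("file.out", 1000)).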
Sure there is:
filename = 'data.txt'
for line in reversed(list(open(filename))):
    print(line.rstrip())
EDIT:
As mentioned in comments this will read the whole file into memory. This solution should not be used with large files.
Another option is to mmap.mmap the file and then use rfind from the end to search for the newlines and then slice out the lines.
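A rough sketch of the mmap approach could look like the following. The function name reverse_lines is made up for this example, and it assumes "\n" line endings and UTF-8 text; adjust both if your log differs.

```python
import mmap
import os


def reverse_lines(path):
    """Yield the lines of a file from last to first, without the newline."""
    # mmap cannot map an empty file, so handle that case up front.
    if os.path.getsize(path) == 0:
        return
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            end = len(mm)
            # Ignore one trailing newline so it does not count as an
            # extra empty line at the end of the file.
            if mm[end - 1:end] == b"\n":
                end -= 1
            while True:
                # rfind returns -1 once no newline is left before `end`.
                start = mm.rfind(b"\n", 0, end)
                yield mm[start + 1:end].decode("utf-8")
                if start == -1:
                    break
                end = start
```

Since this is a generator, you can break out of the loop as soon as you reach a timestamp older than the range you care about, and the rest of the file is never touched.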
Hey m8, I made this code and it works for me: I can read my file in reversed order. Hope it helps :)
I start by creating a new text file, so I don't know how important that part is for you.
def main():
    f = open("Textfile.txt", "w+")
    for i in range(10):
        f.write("line number %d\r\n" % (i+1))
    f.close()  # the parentheses are needed, a bare f.close does nothing
def readReversed():
    for line in reversed(list(open("Textfile.txt"))):
        print(line.rstrip())
main()
readReversed()

Python- lazily read file which does not have new lines [duplicate]

I usually read files like this in Python:
f = open('filename.txt', 'r')
for x in f:
    doStuff(x)
f.close()
However, this splits the file by newlines. I now have a file which has all of its info in one line (45,000 strings separated by commas). While a file of this size is trivial to read in using something like
f = open('filename.txt', 'r')
doStuff(f.read())
f.close()
I am curious if for a much larger file which is all in one line it would be possible to achieve a similar iteration effect as in the first code snippet but with splitting by comma instead of newline, or by any other character?
The following function is a fairly straightforward way to do what you want:
def file_split(f, delim=',', bufsize=1024):
    prev = ''
    while True:
        s = f.read(bufsize)
        if not s:
            break
        split = s.split(delim)
        if len(split) > 1:
            yield prev + split[0]
            prev = split[-1]
            for x in split[1:-1]:
                yield x
        else:
            prev += s
    if prev:
        yield prev
You would use it like this:
for item in file_split(open('filename.txt')):
    doStuff(item)
This should be faster than the solution that EMS linked, and will save a lot of memory over reading the entire file at once for large files.
Open the file using open(), then use the file.read(x) method to read (approximately) the next x bytes from the file. You could keep requesting blocks of 4096 characters until you hit end-of-file.
You will have to implement the splitting yourself - you can take inspiration from the csv module, but I don't believe you can use it directly because it wasn't designed to deal with extremely long lines.

Check to see if ID is contained in a txt file

I want to download new tweets from a particular user and filter with a few other rules. How do I cross reference the tweet ID from the tweet I am handling with what ID's are in the tweetid.txt file to avoid duplicating what I am saving in the NRE_tweet file?
This is what I have written so far that is producing duplicates.
i = 0
for tweet in NRE_tweets:
    tweet_ids = open('tweetid.txt', 'a+')
    if NRE_tweets[i]['in_reply_to_screen_name'] is None:
        if NRE_tweets[i]['id_str'] not in tweet_ids.readlines():
            print("adding tweet " + str(NRE_tweets[i]['id_str']))
            info_wanted.append(NRE_tweets[i]['text'])
            info_wanted.append(NRE_tweets[i]['id_str'])
            info_wanted.append(NRE_tweets[i]['created_at'])
            NRE_file = open('NRE.txt', 'a')
            NRE_file.write(str(info_wanted) + '\n')
            NRE_file.close()
            append_tweet_ids = open('tweetid.txt', 'a')
            append_tweet_ids.write(NRE_tweets[i]['id_str'] + '\n')
            append_tweet_ids.close()
    tweet_ids.close()
    info_wanted = []
    i += 1
EDIT: Thanks for the advice, the working code is now sorted. There are a few things I can do to make it cleaner, but for now... it works.
NRE_tweets = t.statuses.user_timeline(screen_name='NRE_northern')
i = 0
NRE_file = open('NRE.txt', 'a')
openFile = shelve.open('tweetid')
try:
    loadIDs = openFile['list_id']
    print("list_id's loaded")
except:
    print("exception entered")
    loadIDs = []
for tweet in NRE_tweets:
    if NRE_tweets[i]['in_reply_to_screen_name'] is None:  # check that tweet isn't a reply
        if NRE_tweets[i]['id_str'] in loadIDs:
            print(str(NRE_tweets[i]['id_str']) + ' already stored')
        else:
            print("adding " + str(NRE_tweets[i]['id_str']))
            # added wanted elements to a list
            info_wanted.append(NRE_tweets[i]['text'])
            info_wanted.append(NRE_tweets[i]['id_str'])
            info_wanted.append(NRE_tweets[i]['created_at'])
            # added list to txt file
            NRE_file.write(str(info_wanted) + '\n')
            loadIDs.append(NRE_tweets[i]['id_str'])
            openFile['list_id'] = loadIDs
    info_wanted = []
    i += 1
print(openFile['list_id'])
NRE_file.close()
openFile.close()
Don't use if x is None: in your code, unless there's a chance that x is literally None. Because only None is None and everybody else (0, empty iterables, etc) are fakers :) Instead, you should use if not x.
readlines() returns the lines in the file, including the line ending \n for each line. So you should write if (NRE_tweets[i]['id_str'] + '\n') not in tweet_ids.readlines():
Like you've been advised in a comment, open the file once before your for loop and close after the for loop. Also consider using the shelve module (or sqlite3) for this; it'll make handling the data a lot easier.
EDIT:
Also, I notice you opened tweetid.txt twice without closing it in between. There's no need for the second open() inside the if block. You can simply call write() on the first file handle to add the new ID to the file. You should also call readlines() once, outside the loop, and save the result to a list that you then check inside the for loop, because with your new code structure subsequent calls to readlines() will return an empty list, as the file has been exhausted. So when you find a new ID, you append it to this list, as well as call write() to add the ID to tweetid.txt.
An alternative is to open the file in read mode at first, call readlines(), save the result to a list and close the file. Start the loop and perform all your operations on the list: add new IDs, delete, whatever. At the end of the loop, re-open tweetid.txt in write mode and write the list's contents to the file; this will overwrite the old contents. Use this method if you could be adding a lot of new IDs.
Structure your code so that you only open files once, operate on them and finally close them.
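To make that concrete, here is a small sketch of the "read once, check in memory, append new IDs" structure. The helper names load_seen_ids and record_new_ids are hypothetical, and the IDs are assumed to be plain strings stored one per line; the tweet-handling part is left out.

```python
import os


def load_seen_ids(path):
    """Read the ID file once and return its IDs as a set (no trailing '\\n')."""
    if not os.path.exists(path):
        return set()
    with open(path, "r") as f:
        return {line.strip() for line in f if line.strip()}


def record_new_ids(path, candidate_ids):
    """Append unseen IDs to the file and return them in order."""
    seen = load_seen_ids(path)
    added = []
    # The file is opened exactly once for appending, outside the loop.
    with open(path, "a") as f:
        for tweet_id in candidate_ids:
            if tweet_id in seen:
                continue  # already stored, skip the duplicate
            f.write(tweet_id + "\n")
            seen.add(tweet_id)  # also catches repeats within this batch
            added.append(tweet_id)
    return added
```

Stripping the line endings when loading avoids the readlines() pitfall mentioned above, and a set makes each membership test cheap even when the ID file grows.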

Prevent closing file in Python

I have a problem reading characters from a file. I have a file called fst.fasta and I want to know the number of occurrences of the letters A and T.
This is the first code sample :
f = open("fst.fasta","r")
a = f.read().count("A")
t = f.read().count("T")
print "nbr of A : ", a
print "nbr of T : ", t
The result:
nbr of A : 255
nbr of T : 0
Even though there are Ts in the file, I always get 0.
But after that, I tried this :
f = open("fst.fasta","r")
a = f.read().count("A")
f = open("fst.fasta","r")
t = f.read().count("T")
print "nbr of A : ", a
print "nbr of T : ", t
This worked! Is there any other way to avoid repeating f = open("fst.fasta","r") ?
You're dealing with the fact that read() has a side effect (to use the term really loosely): it reads through the file and as it does so sets a pointer to where it is in that file. When it returns you can expect that pointer to be set to the last position. Therefore, executing read() again starts from that position and doesn't give you anything back. This is what you want:
f = open("fst.fasta","r")
contents = f.read()
a = contents.count("A")
t = contents.count("T")
The documentation also indicates other ways you can use read:
next_value = f.read(1)
if next_value == "":
    pass  # We have reached the end of the file
What has happened in the code above is that, instead of getting all the characters in the file, the file handler has only returned 1 character. You could replace 1 with any number, or even a variable to get a certain chunk of the file. The file handler remembers where the above-mentioned pointer is, and you can pick up where you left off. (Note that this is a really good idea for very large files, where reading it all into memory is prohibitive.)
Only once you call f.close() does the file handler 'forget' where it is - but it also forgets the file, and you'd have to open() it again to start from the beginning.
There are other functions provided (such as seek() and readline()) that will let you move around a file using different semantics. f.tell() will tell you where the pointer is in the file currently.
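To illustrate the chunked-reading idea for large files, here is a small sketch that counts characters without ever holding the whole file in memory. The function name count_chars and the 4096-character chunk size are arbitrary choices for the example.

```python
from collections import Counter


def count_chars(path, wanted="AT", chunk_size=4096):
    """Count occurrences of each character in `wanted`, reading in chunks."""
    counts = Counter()
    with open(path, "r") as f:
        while True:
            # Each read() continues from where the previous one stopped.
            chunk = f.read(chunk_size)
            if chunk == "":
                break  # an empty string means end of file
            for ch in wanted:
                counts[ch] += chunk.count(ch)
    return counts
```

You would call it as counts = count_chars("fst.fasta") and then look up counts["A"] and counts["T"]. Counting single characters per chunk is safe; counting multi-character patterns this way would miss matches that straddle a chunk boundary.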
Each time you call f.read(), it consumes the entire remaining contents of the file and returns it. You then use that data only to count the As, and then attempt to read data that has already been consumed. There are two solutions:
Option 1: Use f.seek(0)
a = f.read().count("A")
f.seek(0)
t = f.read().count("T")
The f.seek call sets the position of the file back to the beginning.
Option 2: Store the result of f.read()
data = f.read()
a = data.count("A")
t = data.count("T")
f.seek(0) before the second f.read() will reset the file pointer to the beginning of the file. Or more sanely, save the result of f.read() to a variable, and you can then call .count on that variable to your heart's content without rereading the file pointlessly.
Try the with construct:
with open("fst.fasta","r") as f:
    file_as_string = f.read()
a = file_as_string.count("A")
t = file_as_string.count("T")
This keeps the file open until you exit the block.
Read it into a list of lines:
f = open ("fst.fasta")
allLines = f.readlines()
f.close()
# At this point, you are no longer using the file handler.
for line in allLines:
    print (line.count("A"), " ", line.count("T"))
