Pandas Read Continuously Growing CSV File - python

I have a continuously growing CSV file that I want to read periodically. I am also only interested in the new values.
I was hoping to do something like:
file_chunks = pd.read_csv('file.csv', chunksize=1)
while True:
    do_something(next(file_chunks))
    time.sleep(0.1)
at a frequency that is faster than the .csv file grows.
However, as soon as the iterator fails to return a value once, it "breaks" and stops returning values, even if the .csv file has grown in the meantime.
Is there a way to read continuously growing .csv files line by line?

You could wrap it in a try/except, or add an if statement that checks whether file_chunks returned something first.
That way it shouldn't break anymore, and it only sleeps when there are no more chunks left.
while True:
    # note: this re-reads the whole file from the top each time
    file_chunks = pd.read_csv('file.csv', chunksize=1)
    while True:
        try:
            do_something(next(file_chunks))
        except StopIteration:
            # iterator exhausted: wait, then re-open the file
            time.sleep(0.1)
            break
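A rough variation that avoids re-processing rows that were already handled, assuming the file has no header row and only ever grows by appending, is to remember how many rows have been consumed and pass that count to skiprows on each re-read:
import time
import pandas as pd

rows_seen = 0
while True:
    try:
        # re-read the file each pass, skipping rows that were already handled
        new_rows = pd.read_csv('file.csv', header=None, skiprows=rows_seen)
    except pd.errors.EmptyDataError:
        new_rows = pd.DataFrame()
    if len(new_rows):
        for _, row in new_rows.iterrows():
            do_something(row)
        rows_seen += len(new_rows)
    else:
        time.sleep(0.1)  # nothing new yet, poll again shortly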

This is easier to do with the standard csv module, where you can write your own line iterator that knows how to read an updating file. This generator reads in binary mode so that it can track the file position, closes the file at EOF, and polls its size for appended data. It can fail if the reader gets a partial file update because the other side hasn't flushed yet, or if a CSV cell contains an embedded newline that invalidates the reader's assumption that a binary-mode newline always terminates a row.
import csv
import time
import os
import threading
import random

def rolling_reader(filename, poll_period=.1, encoding="utf-8"):
    pos = 0
    while True:
        # wait until the file exists and has grown past our last position
        while True:
            try:
                if os.stat(filename).st_size > pos:
                    break
            except FileNotFoundError:
                pass
            time.sleep(poll_period)
        with open(filename, "rb") as fp:
            fp.seek(pos)
            for line in fp:
                if line.strip():
                    yield line.decode(encoding)
            pos = fp.tell()
# ---- TEST - thread updates test.csv periodically
class GenCSVThread(threading.Thread):
    def __init__(self, csv_name):
        super().__init__(daemon=True)
        self.csv_name = csv_name
        self.start()

    def run(self):
        val = 1
        while True:
            with open(self.csv_name, "a") as fp:
                for _ in range(random.randrange(4)):
                    fp.write(",".join(str(val) for _ in range(4)) + "\n")
                    val += 1
            time.sleep(random.random())

if os.path.exists("test.csv"):
    os.remove("test.csv")

test_gen = GenCSVThread("test.csv")
reader = csv.reader(rolling_reader("test.csv"))
for row in reader:
    print(row)
A platform-dependent improvement would be to use a facility such as inotify to trigger reads on a file-close event, reducing the risk of reading partial data.
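For example, the cross-platform watchdog package (which uses inotify on Linux) can wake a reader whenever the file changes; the handler name, callback, and file name below are just placeholders for a sketch:
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class CSVChangedHandler(FileSystemEventHandler):
    def __init__(self, filename, callback):
        self.filename = filename
        self.callback = callback

    def on_modified(self, event):
        # only react when the file we care about was touched
        if event.src_path.endswith(self.filename):
            self.callback()

def on_change():
    print("test.csv changed - pull the new rows here")

observer = Observer()
observer.schedule(CSVChangedHandler("test.csv", on_change), path=".", recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)
finally:
    observer.stop()
    observer.join()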

Related

Read a CSV file and validate whether an input parameter is in the CSV file; if so, bypass the purge process, otherwise initiate the purge process, using Python

Sample CSV file (test.csv):
process_cd
ramsize
dbsize
protocal
random
The function will be called with the parameters below:
self.complete_stage_purge_process(self.targetCnxn, self.targetTable, self.processCode)
sample process codes:
test
protocal
forensic
Each time the function is called, I need to read the CSV file for those process codes; if the process code matches, then bypass the internal delete call.
def complete_stage_purge_process(self, target_cnxn, stage_table, process_cd):
    delete_sql = "delete from " + schemaName.STAGE.value + "." + \
        stage_table + " where run_pk in (" + run_pk_sql + ")"
    try:
        trgt_cursor = target_cnxn.cursor()
        trgt_cursor.execute(delete_sql)
        target_cnxn.commit()
        self.logger.debug("deletes processed successfully ")
        target_cnxn.close()
    except:
        self.logger.exception('Error in processing deletes')
        raise
    else:
        self.logger.debug('purge process is not required for this process')
How do I achieve that CSV read in a loop?
I tried the piece of code below, but it still goes into the purge process and does not run the process-code search in a loop.
non_purge_process_file = open(self.file_path)
reader = csv.reader(non_purge_process_file)
for row in reader:
    if process_cd in row:
        self.logger.debug("Do not perform stage purge process.")
        return
    else:
        delete_dt = datetime.today() - timedelta(days=30)
        delete_dt = str(delete_dt)
I solved the problem above using the following approach:
def check_process_cd(self, process_cd):
    self.logger.debug(datetime.now())
    self.logger.debug('check_process_cd')
    purge_process_flag = 0
    reader = csv.reader(purge_process_codes_file)
    for row in reader:
        if process_cd == row[0]:
            self.logger.debug("Perform stage purge process.")
            purge_process_flag = 1
    return purge_process_flag
Based on the flag, perform the function call:
if self.purge_process_flag == 1:
    self.complete_stage_purge_process(self.targetCnxn, self.targetTable, self.processCode)
else:
    self.logger.debug('Do not perform purge process')
import csv

def validate_and_purge(input_param, csv_file):
    found = False  # flag to indicate whether input_param is found in the CSV file
    with open(csv_file, 'r') as f:
        reader = csv.reader(f)
        for row in reader:
            if input_param in row:
                found = True
                break
    if found:
        return True
    return False
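Hooked into the question's method, the call might then look roughly like this (targetCnxn, targetTable, and processCode are the attributes used in the question):
if validate_and_purge(self.processCode, self.file_path):
    # process code is listed in the CSV: skip the purge
    self.logger.debug('Do not perform stage purge process.')
else:
    self.complete_stage_purge_process(self.targetCnxn, self.targetTable, self.processCode)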

How to save data from a PLC in a txt file by using a value-change trigger in Python

I have a project that needs a value from a PLC to trigger a printing program. The program I made can only read the value; when the value changes, the action of saving it into a txt file doesn't execute. Instead, the program just stops reading the tag's value. Where did my code go wrong?
from pylogix import PLC
import time

with PLC() as comm:
    comm.IPAddress = 'IPAddress'
    read = True
    ret = comm.Read('TagName')
    new = ret
    old = new
    while read:
        if new == old:
            print(ret.TagName, ret.Value)
            time.sleep(60)
        else:
            print('exiting')
            for r in new:
                with open("output.txt", "a") as f:
                    print(r.TagName, r.Value, file=f)
            old = new
            read = False
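A likely cause is that new and old are assigned once before the loop and never re-read inside it, so new == old stays true forever. Below is a minimal sketch of a polling loop that re-reads the tag on every pass and appends to the file when the value changes; the tag name, IP address, and output path are the placeholders from the question:
from pylogix import PLC
import time

with PLC() as comm:
    comm.IPAddress = 'IPAddress'        # placeholder, use the real PLC address
    old = comm.Read('TagName').Value    # initial value
    while True:
        ret = comm.Read('TagName')      # re-read the tag on every pass
        print(ret.TagName, ret.Value)
        if ret.Value != old:
            # value changed: append it to the text file
            with open("output.txt", "a") as f:
                print(ret.TagName, ret.Value, file=f)
            old = ret.Value
        time.sleep(1)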

Python: Read a line from a file then remove it with threading

I have this program that uses Python threading to read different lines from a file; if it reads a duplicate line it reads another one, and once a line has been read it removes it from the file. The problem is that the file doesn't seem to get updated when it's read, or I'm not quite sure what's happening: it can sometimes read the same line as before, which breaks it. I'm not sure my code is the most effective way to do this.
def read_tokens_list():
    tokens = []
    with open('inputTokens.txt', 'r', encoding='UTF-8') as file:
        lines = file.readlines()
        for line in lines:
            tokens.append(line.replace('\n', ''))
    return tokens

def worker(token_list):
    while True:
        token = random.choice(token_list)
        print(token)
        ver = open("Fullyverified.txt", "a+")
        ver.write(token + "\n")
        with open("inputTokens.txt", "r") as f:
            lines = f.readlines()
        with open("inputTokens.txt", "w") as f:
            for line in lines:
                if line.strip("\n") != token:
                    f.write(line)
        time.sleep(1)

def main():
    threads = []
    num_thread = input('Number of Threads: ')
    num_thread = int(num_thread)
    token_list = read_tokens_list()  # read in the inputTokens.txt file
    random.shuffle(token_list)  # shuffle the list into random order
    tokens_per_worker = len(token_list) // num_thread  # how many tokens from the list each worker will get (roughly)
    for i in range(num_thread):
        if ((i+1) < num_thread):
            num_tokens_for_this_worker = tokens_per_worker  # give each worker an even share of the list
        else:
            num_tokens_for_this_worker = len(token_list)  # except the last worker gets whatever is left
        # we'll give the first (num_tokens_for_this_worker) tokens in the list to this worker
        tokens_for_this_worker = token_list[0:num_tokens_for_this_worker]
        # and remove those tokens from the list so that they won't get used by anyone else
        token_list = token_list[num_tokens_for_this_worker:]
        t = threading.Thread(target=worker, args=(tokens_for_this_worker, ))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()

if __name__ == '__main__':
    main()
Use a lock.
something like:
from threading import Lock
# ...
lock = Lock()
# ...
def worker(token_list, lock=lock):
    # ...
    with lock:
        with open("inputTokens.txt", "r") as f:
            lines = f.readlines()
        with open("inputTokens.txt", "w") as f:
            for line in lines:
                if line.strip("\n") != token:
                    f.write(line)
    # ...
The idea of the lock is to protect the resource from being accessed by several threads simultaneously, so while one thread is working with the file, the others wait.
The next question is whether this approach still makes sense, because depending on the size of your file, threads might be stuck waiting for the lock most of the time.
What about a database instead of a file? Then you don't have to rewrite the full file, just delete/update an entry.
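As a rough sketch of that idea with the standard-library sqlite3 module (the tokens table and column name are made up here), each worker can claim and delete one row atomically instead of rewriting the whole file:
import sqlite3

def take_one_token(db_path="tokens.db"):
    # each worker opens its own connection; sqlite3 connections should not be
    # shared across threads by default
    conn = sqlite3.connect(db_path)
    try:
        while True:
            row = conn.execute("SELECT token FROM tokens LIMIT 1").fetchone()
            if row is None:
                return None                  # nothing left to hand out
            with conn:                       # transaction around the delete
                cur = conn.execute("DELETE FROM tokens WHERE token = ?", (row[0],))
            if cur.rowcount == 1:
                return row[0]                # this worker claimed the token
            # rowcount == 0 means another worker deleted it first; retry
    finally:
        conn.close()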

How to write numeric values to a file and read the same values in another thread in Python? [closed]

I want to write 4 numeric values to a file using one thread and read the same values in another thread. All of this should of course be simultaneous. The second thread will only read the values, not modify them. Can it be done?
First of all, if you want to transfer data among threads that belong to the same process, you don't need to use files as the communication point. The easiest way is to use queues, which are thread-safe Python mechanisms for sending data from one thread to another in FIFO order.
#!/usr/bin/python3
from threading import Thread
from queue import Queue
from time import sleep

q = Queue()

def run_func_in_diff_thread(func, *args):
    # non-daemon threads, so the program waits for them to finish
    t = Thread(target=func, args=args)
    t.start()

def writer(data_set):
    global q  # queue must be shared/common for our threads
    print("Writer thread started")
    for elem in data_set:
        q.put(elem)  # put current elem into queue
        print("%s > %d" % ("writer", elem))
        sleep(1)
    q.put(-1)  # something like a last element
    print("Writer thread finished since last element has been put")

def reader():
    global q  # queue must be shared/common for our threads
    print("Reader thread started")
    while True:
        elem = q.get()  # remove and return an item from the queue
        print("%s > %d" % ("reader", elem))
        sleep(0.1)
        if elem == -1:
            break
    print("Reader thread finished since last element has been received")

data = [x for x in range(5)]
run_func_in_diff_thread(writer, data)
run_func_in_diff_thread(reader)
You will get the following output:
Writer thread started
writer > 0
Reader thread started
reader > 0
writer > 1
reader > 1
writer > 2
reader > 2
writer > 3
reader > 3
writer > 4
reader > 4
Writer thread finished since last element has been put
reader > -1
Reader thread finished since last element has been received
This is very simple, isn't it?
Ok, if you really need to use a file to send data between your threads, I can suggest the following:
#!/usr/bin/python3
from threading import Thread, Lock
from time import sleep

lock = Lock()  # lock to synchronize your threads

def run_func_in_diff_thread(func, *args):
    # non-daemon threads, so the program waits for them to finish
    t = Thread(target=func, args=args)
    t.start()

def read_or_write(path, type='r', data=None):
    if type == 'r':
        with open(path, type) as f:
            elem = f.readlines()[-1]  # read last element
            print("Read data from file < %d" % int(elem))
            if int(elem) == -1:
                print("Received last element. Stop reading")
                return False
            return True
    else:
        with open(path, type) as f:
            print("Write data to file > %d" % data)
            f.write("%s\n" % str(data))
            return True

def func_runner(func, file_path, type, data_set=None):
    if type == "a":
        # writer
        for data in data_set:
            lock.acquire()
            func(file_path, type, data)
            lock.release()
            sleep(1)
    else:
        # reader
        while True:
            lock.acquire()
            res = func(file_path, type)
            lock.release()
            if not res:
                break
            sleep(1)

data_set = [x for x in range(5)]
data_set.append(-1)  # something like a last element
file_path = "file.txt"
run_func_in_diff_thread(func_runner, read_or_write, file_path, "a", data_set)
run_func_in_diff_thread(func_runner, read_or_write, file_path, "r")
And you'll get the following output:
Write data to file > 0
Read data from file < 0
Read data from file < 0
Write data to file > 1
Read data from file < 1
Write data to file > 2
Read data from file < 2
Write data to file > 3
Read data from file < 3
Write data to file > 4
Read data from file < 4
Write data to file > -1
Read data from file < -1
Received last element. Stop reading
In this case you need to use a Lock, the built-in mechanism for synchronizing threads, to avoid data corruption.
As I said, please consider using a Queue rather than a file for thread syncing.

Python: Why is DictWriter writing 'NULL' bytes?

class WhatsGoingOn:
    def __init__(self, filename, fieldNames, maxLines):
        self.file_to_write = filename
        self.fieldNames = fieldNames
        self.maxLines = maxLines
        # Open the file for reading and writing. Create it if it doesn't exist,
        # and truncate it if it does.
        self.file = open(self.file_to_write, 'w+b')
        self.csvReader = csv.DictReader(self.file, fieldnames=self.fieldNames)
        self.csvWriter = csv.DictWriter(self.file, fieldnames=self.fieldNames, extrasaction='ignore')

    def looper(self):
        # Infinitely (don't worry about that - this is a daemon),
        # write to the file. When a certain number of lines have been written,
        # read the file and then truncate it.
        try:
            numRowsWritten = 0
            while True:
                # Nevermind what's being written
                self.csvWriter.writerow({'name_of_field_0': str(numRowsWritten), 'name_of_field_1': str(numRowsWritten)})
                numRowsWritten += 1
                if numRowsWritten >= self.maxLines:
                    # Iterate through each row of the file
                    self.file.seek(0)
                    # This only works the first time...
                    for row in self.csvReader:
                        print row['name_of_field']
                    # Truncate the file, and restart the count
                    self.file.truncate(0)
                    numRowsWritten = 0
        except Exception as e:
            print 'Exception!: {0}'.format(e)
            self.file.close()
Output: Exception!: line contains NULL byte
The second time the for row in self.csvReader: line gets hit, the exception gets thrown, and when I look at the file, the file does indeed have a whole bunch of NULL bytes at the start. Apparently, after the file got truncated, the DictWriter wrote a whole bunch of NULL bytes (or at least that's my assumption). How do I prevent NULL bytes from being written to my file?
Apparently, by truncating the file, you mess up some internal state of the writer. Instead of truncating, close and re-open the file (mode w+b truncates), then re-initialize the csvReader and csvWriter.
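A rough sketch of that suggestion, reusing the names from the question in a hypothetical reset_file helper (the underlying issue is that truncate(0) does not move the file position back to the start, so the next write lands at the old offset and the gap is padded with NUL bytes):
def reset_file(self):
    # close the current handle and re-open with 'w+b', which truncates the file,
    # then rebuild the reader/writer so their state matches the fresh file
    self.file.close()
    self.file = open(self.file_to_write, 'w+b')
    self.csvReader = csv.DictReader(self.file, fieldnames=self.fieldNames)
    self.csvWriter = csv.DictWriter(self.file, fieldnames=self.fieldNames, extrasaction='ignore')
Alternatively, calling self.file.seek(0) immediately before self.file.truncate(0) keeps the existing handle and avoids the gap in the first place.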
