I have written code that writes to a CSV file in parallel in Python.
When my program finishes, I see that a few lines are merged instead of being on separate lines. Each line should contain only 3 columns, but instead the output looks like this:
Example:
myname myage myvalue
myname myage myvaluemyname
myname myage myvalue
myage
From reading a few other questions, I understood that I need to lock my file to avoid such scenarios, so I added the fcntl module. But it seems my file is still not being locked, as it produces similar output.
My code
def getdata(x):
    try:
        # get data from API
        c.writefile(x,x1,x2)
    except Exception,err:
        print err

class credits:
    def __init__(self):
        self.d = dict()
        self.details = dict()
        self.filename = "abc.csv"
        self.fileopen = open(self.filename,"w")

    def acquire(self):
        fcntl.flock (self.fileopen, fcntl.LOCK_EX)

    def release(self):
        fcntl.flock(self.fileopen, fcntl.LOCK_UN)

    def __del__(self):
        self.fileopen.close()

    def writefile(self,x,x1,x2,x3):
        try:
            self.acquire()
            self.fileopen.write(str(x)+","+str(x1)+","+str(x2)+"\n")
        except Exception, e:
            raise e
        finally:
            self.release()

if __name__ == '__main__':
    conn = psycopg2.connect()
    curr = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    curr.execute("select * from emp")
    rows = curr.fetchall()
    listdata = []
    for each in rows:
        listdata.append(each[0])
    c = credits()
    p = Pool(processes = 5)
    results = p.map(getdata,listdata)
    conn.close()
I had to declare getdata as a top-level function; otherwise I got a "Can't pickle function" error.
Why don't you write to multiple files in each separate process and then merge them? It might be more computationally expensive but it will ensure thread safety.
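The per-process-file idea could be sketched roughly as follows. This is only an illustration, assuming Python 3: write_part, merge_parts, and the part-N.csv file names are hypothetical and not taken from the question's code. Each worker writes its rows to a file of its own, so no locking is needed, and the parent concatenates the parts once all workers have finished.

```python
import csv
import os
import tempfile


def write_part(part_path, rows):
    """Write one worker's rows to its own CSV file (no locking needed)."""
    with open(part_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return part_path


def merge_parts(part_paths, out_path):
    """Concatenate the per-worker files into the final CSV, in the given order."""
    with open(out_path, "w", newline="") as out:
        for part_path in part_paths:
            with open(part_path, newline="") as part:
                out.write(part.read())


# Demonstration without a pool: in the real program each Pool worker would
# call write_part() with a path of its own (for example one based on os.getpid()).
workdir = tempfile.mkdtemp()
parts = [
    write_part(os.path.join(workdir, "part-0.csv"), [("a", 1, 10)]),
    write_part(os.path.join(workdir, "part-1.csv"), [("b", 2, 20), ("c", 3, 30)]),
]
merged_path = os.path.join(workdir, "abc.csv")
merge_parts(parts, merged_path)
```

Because every file has exactly one writer, rows cannot interleave; the cost is the extra merge pass at the end.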
Related
I would like to use a try/except statement, but across two functions. I catch an exception in function, but function2 runs anyway. How can I stop it from running when there is an exception?
I want to turn this into a window application. If the file does not load, I want to display an information window. I only want the program to go on (to function2) when the file loads.
class Files:
    def __init__(self):
        self.name = "fle.txt"

    def function(self):
        try:
            self.f = open(self.name, 'rb')
        except OSError:
            print("Problem!!!")

    def function2(self):
        print(self.f.read())

def main():
    file=Files()
    file.function()
    file.function2()
Don't catch an exception unless you actually know how to handle it.
class Files:
    def __init__(self):
        self.name = "fle.txt"
        self.f = None

    def function(self):
        self.f = open(self.name, 'rb')

    def function2(self):
        if self.f is None:
            raise Exception("File not initialized!") #Example
            #return #just if you don't throw or exit
        print(self.f.read())

def main():
    file=Files()
    try:
        file.function()
    except OSError:
        print("Problem!!!")
    else:
        file.function2()

main()
Wrap your function calls in a higher level try/except.
Of course, you would never write anything like this because it's so inflexible. This answer does not condone the OP's approach but suggests how that could be made to work.
class Files:
    def __init__(self):
        self.name = 'fle.txt'

    def function_1(self):
        self.fd = open(self.name)

    def function_2(self):
        print(self.fd.read())

    def __del__(self):
        try:
            self.fd.close()
        except Exception:
            pass

file = Files()
try:
    file.function_1()
    file.function_2()
except Exception as e:
    print(e)
So we don't do any exception handling (except in __del__ where we ignore any issues) within the class functions but allow all/any exceptions to propagate to the caller. Here we want to call two class functions but we wrap them in the same try/except block.
If function_1 raises an exception, function_2 won't be called.
__del__ was added to show how one could clean up, but it's not the way this should be handled.
@tomgalpin is right: you could just exit right there after the problem.
But since this is a class, maybe you want to print the error and pass back no data?
Here's one way to look at that, with Tom's sys.exit included (commented out).
Also, if you keep your code, be sure to close the file handle. Calling open on a file without .close() can leave file handles open and cause problems if your class continues running afterwards.
class Files:
    def __init__(self):
        self.name = "fle.txt"
        # Empty string in data if no data
        self.data = ""

    def function(self):
        try:
            #self.f = open(self.name, 'rb')
            with open(self.name, 'rb') as f:
                self.data = f.read()
        except OSError as err:
            print("Problem!!!", err)
            # You could exit
            # sys.exit()
            # But you could also return an empty string,
            # which is "falsy", regardless of what happens
        finally:
            return self.data

    def function2(self):
        print(f"Data 3 {self.data}")

def main():
    file=Files()
    # You could print this, or use it to get the data
    print("Data 1", file.function())
    data = file.function()
    print(f"Data 2 {data}")
    # this now also shows the data
    file.function2()
Use a flag variable that is normally True but becomes False if function fails.
Example:
class Files:
    def __init__(self):
        self.name = "file.txt"
        self.Continue = True
        self.data = ""

    def function(self):
        try:
            with open(self.name, 'rb') as f:
                self.data = f.read()
        except OSError as err:
            print("Problem!!!", err)
            self.Continue = False
        # Return whatever was read (an empty string if the open failed)
        return self.data

    def function2(self):
        if self.Continue:
            print(self.data)
        else:
            # Code to run if function failed
            pass

def main():
    file = Files()
    file.function()
    file.function2()
Hello, I have a problem with the queue not returning items in the order it's supposed to, so it doesn't check all the passwords. Code below:
class ZipFile:
    def __init__(self):
        self.zip_file = self.return_zip_file()  # just grabbing the zip file here
        self.password_word_list = self.password_file()  # grabbing the password file
        self.q = queue.Queue(maxsize=50)  # making the queue here

    def word_list(self):
        with open(self.password_word_list, "r") as f:
            data = f.readlines()
            for password in data:
                password = password.strip()
                yield password

    def extract_zip_file(self, zip_file, password):
        try:
            zip_file.extractall(pwd=password.encode())  # extracting zip
            print(f"[+] Password -> {password}")
        except Exception as e:
            print(e)  # for debugging
            pass

    def brute_force_zip(self):
        get_word_list = self.word_list()
        count = 0
        get_zip_file = zipfile.ZipFile(self.zip_file)
        for password in get_word_list:
            self.q.put(password)
            if self.q.qsize() == 50:
                while not self.q.empty():
                    thread = Process(target=self.extract_zip_file, args=(get_zip_file, self.q.get()), daemon=True)
                    thread.start()
                    count += 1
                    print(f"\rAttempts: {str(count)}")
So basically self.q.get() returns everything out of order and sometimes doesn't even get all the words. How can I fix it? Thanks!
Actually, I figured it out: I forgot that multiprocessing was running the attempts concurrently, and that was the reason why.
I have a very simple threading example using Python 3.4.2. In this example I am creating five threads that each just return the string "Result", and I append each thread to a list titled threads. In another for loop, iterated five times, the threads are joined. I am trying to print the result x, which should yield a list that looks like ['Result','Result','Result','Result','Result'], but instead the print command only yields the name of the thread and the fact that it has stopped. I'm obviously misunderstanding how to use threads in Python. If someone could provide an example of how to adequately complete this test case, I would be very grateful.
import threading

def Thread_Test():
    return ("Result")

number = 5
threads = []
for i in range(number):
    Result = threading.Thread(target=Thread_Test)
    threads.append(Result)
    Result.start()
for x in threads:
    x.join()
    print (x)
There is a difference between creating a thread and trying to get values out of a thread. Generally speaking, you should never try to use return in a thread to provide a value back to its caller. That is not how threads work. When you create a thread object, you have to figure out a different way of getting any values calculated in the thread to some other part of your program. The following is a simple example showing how values might be returned using a list.
#! /usr/bin/env python3
import threading

def main():
    # Define a few variables including storage for threads and values.
    threads_to_create = 5
    threads = []
    results = []
    # Create, start, and store all of the thread objects.
    for number in range(threads_to_create):
        # Pass number through args so each thread gets its own value.
        thread = threading.Thread(target=results.append, args=(number,))
        thread.start()
        threads.append(thread)
    # Ensure all threads are done and show the results.
    for thread in threads:
        thread.join()
    print(results)

if __name__ == '__main__':
    main()
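Another way to get values back, sketched here as an alternative rather than as part of the answer above, is the standard library's concurrent.futures module (available since Python 3.2). The function name thread_test and its return string are made up for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def thread_test(index):
    # Stand-in for real work; returns a value just like an ordinary function.
    return "Result %d" % index


with ThreadPoolExecutor(max_workers=5) as executor:
    # map() runs the calls in worker threads and yields results in input order.
    results = list(executor.map(thread_test, range(5)))

print(results)
```

The executor collects each return value for you, so no shared list or queue has to be managed by hand.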
If you absolutely insist that you must have the ability to return values from the target of a thread, it is possible to override some methods in threading.Thread using a child class to get the desired behavior. The following shows more advanced usage and demonstrates how multiple methods require a change in case someone desires to inherit from and override the run method of the new class. This code is provided for completeness and probably should not be used.
#! /usr/bin/env python3
import sys as _sys
import threading

def main():
    # Define a few variables including storage for threads.
    threads_to_create = 5
    threads = []
    # Create, start, and store all of the thread objects.
    for number in range(threads_to_create):
        # Bind number as a default so each thread returns its own value.
        thread = ThreadWithReturn(target=lambda number=number: number)
        thread.start()
        threads.append(thread)
    # Ensure all threads are done and show the results.
    print([thread.returned for thread in threads])

class ThreadWithReturn(threading.Thread):
    def __init__(self, group=None, target=None, name=None,
                 args=(), kwargs=None, *, daemon=None):
        super().__init__(group, target, name, args, kwargs, daemon=daemon)
        self.__value = None

    def run(self):
        try:
            if self._target:
                return self._target(*self._args, **self._kwargs)
        finally:
            del self._target, self._args, self._kwargs

    def _bootstrap_inner(self):
        try:
            self._set_ident()
            self._set_tstate_lock()
            self._started.set()
            with threading._active_limbo_lock:
                threading._active[self._ident] = self
                del threading._limbo[self]
            if threading._trace_hook:
                _sys.settrace(threading._trace_hook)
            if threading._profile_hook:
                _sys.setprofile(threading._profile_hook)
            try:
                self.__value = True, self.run()
            except SystemExit:
                pass
            except:
                exc_type, exc_value, exc_tb = self._exc_info()
                self.__value = False, exc_value
                if _sys and _sys.stderr is not None:
                    print("Exception in thread %s:\n%s" %
                          (self.name, threading._format_exc()), file=_sys.stderr)
                elif self._stderr is not None:
                    try:
                        print((
                            "Exception in thread " + self.name +
                            " (most likely raised during interpreter shutdown):"), file=self._stderr)
                        print((
                            "Traceback (most recent call last):"), file=self._stderr)
                        while exc_tb:
                            print((
                                ' File "%s", line %s, in %s' %
                                (exc_tb.tb_frame.f_code.co_filename,
                                 exc_tb.tb_lineno,
                                 exc_tb.tb_frame.f_code.co_name)), file=self._stderr)
                            exc_tb = exc_tb.tb_next
                        print(("%s: %s" % (exc_type, exc_value)), file=self._stderr)
                    finally:
                        del exc_type, exc_value, exc_tb
            finally:
                pass
        finally:
            with threading._active_limbo_lock:
                try:
                    del threading._active[threading.get_ident()]
                except:
                    pass

    @property
    def returned(self):
        if self.__value is None:
            self.join()
        if self.__value is not None:
            valid, value = self.__value
            if valid:
                return value
            raise value

if __name__ == '__main__':
    main()
Please find below a simple example using a queue and threads:
import threading
import queue
import timeit

q = queue.Queue()
number = 5

t1 = timeit.default_timer()
# Step 1: for example, we are running multiple functions normally
result = []
def fun(x):
    result.append(x)
    return x

for i in range(number):
    fun(i)
print(result, "# normal result")
print(timeit.default_timer() - t1)

t2 = timeit.default_timer()
# Step 2: by using threads and a queue
def fun_thrd(x, q):
    q.put(x)
    return

for i in range(number):
    t = threading.Thread(target=fun_thrd, args=(i, q))
    t.start()
    t.join()  # wait for this thread before starting the next one
thrd_result = []
while True:
    if not q.empty():
        thrd_result.append(q.get())
    else:
        break
print(thrd_result, "# result with threads involved")
print(timeit.default_timer() - t2)

t3 = timeit.default_timer()
# Step 3: if you want each thread to run without depending on the previous thread
threads = []
def fun_thrd_independent(x, q):
    q.put(x)
    return

def thread_indep(number):
    for i in range(number):
        t = threading.Thread(target=fun_thrd_independent, args=(i, q))
        t.start()
        threads.append(t)

thread_indep(5)
for j in threads:
    j.join()
thread_indep_result = []
while True:
    if not q.empty():
        thread_indep_result.append(q.get())
    else:
        break
print(thread_indep_result)  # result when threads are independent of each other
print(timeit.default_timer() - t3)
output:
[0, 1, 2, 3, 4] # normal result
3.50475311279e-05
[0, 1, 2, 3, 4] # result with threads involved
0.000977039337158
[0, 1, 2, 3, 4] result when threads are independent on each other
0.000933170318604
It will hugely differ according to the scale of the data
Hope this helps, Thanks
This is a code example implementing a file lock so that only one instance of the application can run. It currently works, but it saves the lock file in the home folder (Ubuntu). If the application crashes, the lock file does not get removed, which is not good.
I cannot easily see where I should change the code to save it in the c:/tmp-folder instead.
#!/usr/bin/python
# -*- coding: utf-8 -*-
#implements a lockfile if program already is open
import os
import socket
from fcntl import flock

class flock(object):
    '''Class to handle creating and removing (pid) lockfiles'''
    # custom exceptions
    class FileLockAcquisitionError(Exception): pass
    class FileLockReleaseError(Exception): pass
    # convenience callables for formatting
    addr = lambda self: '%d#%s' % (self.pid, self.host)
    fddr = lambda self: '<%s %s>' % (self.path, self.addr())
    pddr = lambda self, lock: '<%s %s#%s>' %\
                              (self.path, lock['pid'], lock['host'])

    def __init__(self, path, debug=None):
        self.pid = os.getpid()
        self.host = socket.gethostname()
        self.path = path
        self.debug = debug # set this to get status messages

    def acquire(self):
        '''Acquire a lock, returning self if successful, False otherwise'''
        if self.islocked():
            if self.debug:
                lock = self._readlock()
                print 'Previous lock detected: %s' % self.pddr(lock)
            return False
        try:
            fh = open(self.path, 'w')
            fh.write(self.addr())
            fh.close()
            if self.debug:
                print 'Acquired lock: %s' % self.fddr()
        except:
            if os.path.isfile(self.path):
                try:
                    os.unlink(self.path)
                except:
                    pass
            raise (self.FileLockAcquisitionError,
                   'Error acquiring lock: %s' % self.fddr())
        return self

    def release(self):
        '''Release lock, returning self'''
        if self.ownlock():
            try:
                os.unlink(self.path)
                if self.debug:
                    print 'Released lock: %s' % self.fddr()
            except:
                raise (self.FileLockReleaseError,
                       'Error releasing lock: %s' % self.fddr())
        return self

    def _readlock(self):
        '''Internal method to read lock info'''
        try:
            lock = {}
            fh = open(self.path)
            data = fh.read().rstrip().split('#')
            fh.close()
            lock['pid'], lock['host'] = data
            return lock
        except:
            return {'pid': 8**10, 'host': ''}

    def islocked(self):
        '''Check if we already have a lock'''
        try:
            lock = self._readlock()
            os.kill(int(lock['pid']), 0)
            return (lock['host'] == self.host)
        except:
            return False

    def ownlock(self):
        '''Check if we own the lock'''
        lock = self._readlock()
        return (self.fddr() == self.pddr(lock))

    def __del__(self):
        '''Magic method to clean up lock when program exits'''
        self.release()

#now testing to see if file is locked = other instance of this program is running already
lock = flock('tmp.lock', True).acquire()
if lock:
    print 'doing stuff'
else:
    print 'locked!'
    exit()
#end of lockfile
Use tempfile
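As a small illustration of that suggestion (a sketch, not the asker's code), the standard tempfile module can supply the platform's temporary directory, so the same script works on Linux and Windows without a hard-coded path. The variable name lock_path is chosen here for the example.

```python
import os
import tempfile

# gettempdir() returns the platform's temporary directory
# (often /tmp on Linux, the user's Temp folder on Windows).
lock_path = os.path.join(tempfile.gettempdir(), "tmp.lock")
print(lock_path)
```

The resulting path could then be handed to the class from the question, for example flock(lock_path, True).acquire().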
At the end of the script:
lock = flock('tmp.lock', True).acquire()
The 'tmp.lock' is the path to the file in the current directory. Change it to the path you need, i.e. 'c:/tmp-folder/tmp.lock'.
lock = flock('c:/tmp-folder/tmp.lock', True).acquire()
However, as @g19fanatic notes: are you on a Windows ('c:/...') or Linux (Ubuntu) system?
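For the crash concern raised in the question, one option on a Unix-like system such as Ubuntu is an advisory lock taken with fcntl.flock: the operating system drops the lock when the process exits, even after a crash, so a leftover file does not block the next start. The following is a rough sketch assuming Python 3; acquire_single_instance_lock and the file name myapp-example.lock are made-up names for illustration, and the fcntl module is not available on Windows.

```python
import fcntl
import os
import tempfile


def acquire_single_instance_lock(path):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "a")
    try:
        # LOCK_NB makes flock() fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


lock_path = os.path.join(tempfile.gettempdir(), "myapp-example.lock")
lock = acquire_single_instance_lock(lock_path)
if lock is None:
    print("locked!")
else:
    print("doing stuff")
```

The returned file object has to stay referenced for as long as the program runs, because closing it releases the lock.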
I'm creating a Python script that parses a large (but simple) CSV.
It'll take some time to process. I would like the ability to interrupt the parsing of the CSV so I can continue at a later stage.
Currently I have this, which lives in a larger class (unfinished):
Edit:
I have some changed code. But the system will parse over 3 million rows.
def parseData(self):
    reader = csv.reader(open(self.file))
    for id, title, disc in reader:
        print "%-5s %-50s %s" % (id, title, disc)
        l = LegacyData()
        l.old_id = int(id)
        l.name = title
        l.disc_number = disc
        l.parsed = False
        l.save()
This is the old code.
def parseData(self):
    #first line start
    fields = self.data.next()
    for row in self.data:
        items = zip(fields, row)
        item = {}
        for (name, value) in items:
            item[name] = value.strip()
        self.save(item)
Thanks guys.
If under linux, hit Ctrl-Z and stop the running process. Type "fg" to bring it back and start where you stopped it.
You can use signal to catch the event. This is a mockup of a parser that can catch Ctrl-C on Windows and stop parsing:
import signal, time, sys

class Interupted(Exception): pass

def onInterupt(signum, frame):
    raise Interupted()

# Ctrl-C is delivered as SIGINT; signal.CTRL_C_EVENT is only meant for
# os.kill() and is not accepted by signal.signal().
signal.signal(signal.SIGINT, onInterupt)

class InteruptableParser(object):
    def __init__(self, previous_parsed_lines=0):
        self.parsed_lines = previous_parsed_lines

    def _parse(self, line):
        # do stuff
        time.sleep(1) #mock up
        self.parsed_lines += 1
        print 'parsed %d' % self.parsed_lines

    def parse(self, filelike):
        for line in filelike:
            try:
                self._parse(line)
            except Interupted:
                print 'caught interupt'
                self.save()
                print 'exiting ...'
                sys.exit(0)

    def save(self):
        # do what you need to save state
        # like write the parse_lines to a file maybe
        pass

parser = InteruptableParser()
parser.parse([1,2,3])
Can't test it though as I'm on linux at the moment.
The way I'd do it:
Put the actual processing code in a class, and on that class implement the pickle protocol (http://docs.python.org/library/pickle.html) (basically, write proper __getstate__ and __setstate__ functions).
This class would accept the filename and keep the open file and the CSV reader instance as instance members. The __getstate__ method would save the current file position, and __setstate__ would reopen the file, seek to the proper position, and create a new reader.
I'd perform the actual work in an __iter__ method that would yield to an external function after each line was processed.
This external function would run a "main loop" monitoring input for interrupts (sockets, keyboard, state of a specific file on the filesystem, etc.). While everything is quiet, it would just call for the next iteration of the processor. If an interrupt happens, it would pickle the processor state to a specific file on disk.
When starting, the program just has to check whether there is a saved execution; if so, use pickle to retrieve the executor object and resume the main loop.
Here goes some (untested) code; the idea is simple enough:
from cPickle import load, dump
import csv
import os, sys

SAVEFILE = "running.pkl"
STOPNOWFILE = "stop.now"

class Processor(object):
    def __init__(self, filename):
        self.file = open(filename, "rt")
        self.reader = csv.reader(self.file)

    def __iter__(self):
        for line in self.reader:
            # do stuff
            yield None

    def __getstate__(self):
        return (self.file.name, self.file.tell())

    def __setstate__(self, state):
        self.file = open(state[0],"rt")
        self.file.seek(state[1])
        self.reader = csv.reader(self.file)

def check_for_interrupts():
    # Use your imagination here!
    # One simple thing would be to check for the existence of a specific file
    # on disk.
    # But you could go all the way up to instantiating a TCP server and
    # listening for interruptions on the network
    if os.path.exists(STOPNOWFILE):
        return True
    return False

def main():
    if os.path.exists(SAVEFILE):
        with open(SAVEFILE, "rb") as savefile:
            processor = load(savefile)
        os.unlink(SAVEFILE)
    else:
        #Assumes the name of the .csv file to be passed on the command line
        processor = Processor(sys.argv[1])
    for line in processor:
        if check_for_interrupts():
            with open(SAVEFILE, "wb") as savefile:
                dump(processor, savefile)
            break

if __name__ == "__main__":
    main()
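A simpler variation on the same resume idea, sketched here under the assumption of Python 3 and not taken from the answer above, is to store only the number of rows already handled and skip that many on the next start. The names process_csv, handle_row, should_stop, and the file names are hypothetical.

```python
import csv
import os
import tempfile


def process_csv(csv_path, checkpoint_path, handle_row, should_stop):
    """Process rows, skipping those already handled in an earlier run."""
    done = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = int(f.read().strip() or 0)

    with open(csv_path, newline="") as f:
        for index, row in enumerate(csv.reader(f)):
            if index < done:
                continue  # handled before the last interruption
            if should_stop():
                break
            handle_row(row)
            done = index + 1
            # Record progress after every row so a restart can resume here.
            with open(checkpoint_path, "w") as cp:
                cp.write(str(done))
    return done


# Small demonstration with a throwaway file.
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "legacy.csv")
with open(csv_path, "w", newline="") as f:
    csv.writer(f).writerows([[1, "a", "x"], [2, "b", "y"], [3, "c", "z"]])
checkpoint_path = os.path.join(workdir, "progress.txt")

seen = []
first_run = process_csv(csv_path, checkpoint_path, seen.append,
                        should_stop=lambda: len(seen) >= 2)  # simulated interrupt
second_run = process_csv(csv_path, checkpoint_path, seen.append,
                         should_stop=lambda: False)
```

Skipping rows on restart still reads them from disk, so for millions of rows it may be worth checkpointing less often or storing a file offset instead; that trade-off is left open here.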
My Complete Code
I followed the advice of @jsbueno with a flag, but instead of another file, I kept it within the class as a variable.
I create a class; when I call it, it asks for ANY input and then begins another process doing my work. As it's looped, if I press a key, the flag is set and only checked when the loop comes around for my next parse. Thus I don't kill the current action.
Adding a processed flag in the database for each object from the data I'm calling means I can start this at any time and resume where I left off.
class MultithreadParsing(object):
    process = None
    process_flag = True

    def f(self):
        print "\nMultithreadParsing has started\n"
        while self.process_flag:
            ''' get my object from database '''
            legacy = LegacyData.objects.filter(parsed=False)[0:1]
            if legacy:
                print "Processing: %s %s" % (legacy[0].name, legacy[0].disc_number)
                for l in legacy:
                    ''' ... Do what I want it to do ...'''
                sleep(1)
            else:
                self.process_flag = False
                print "Nothing to parse"

    def __init__(self):
        self.process = Process(target=self.f)
        self.process.start()
        print self.process
        a = raw_input("Press any key to stop \n")
        print "\nKILL FLAG HAS BEEN SENT\n"
        if a:
            print "\nKILL\n"
            self.process_flag = False
Thanks for all your help guys (especially yours, @jsbueno). If it wasn't for you I wouldn't have got this class idea.