Nested loops with Python cPickle - python

I would like to be able to have a series of nested loops that use the same pickle file. See below:
def pickleRead(self):
    try:
        with open(r'myfile', 'rb') as file:
            print 'Reading File...'
            while True:
                try:
                    main = pickle.load(file)
                    id = main[0]
                    text = main[1]
                    while True:
                        try:
                            data = pickle.load(file)
                            data_id = data[0]
                            data_text = data[1]
                            coefficient = Similarity().jaccard(text.split(),data_text.split())
                            if coefficient > 0 and data_text is not None:
                                print str(id) + '\t' + str(data_id) + '\t' + str(coefficient)
                        except EOFError:
                            break
                        except Exception as err:
                            print err
                except EOFError:
                    break
            print 'Done Reading File...'
            file.close()
    except Exception as err:
        print err
The second (inner) loop runs without any problems, but the first one does a single iteration and then stops. I am trying to grab a single row at a time and then compare it against every other row in the file. There are several thousand rows, and I have found that the cPickle module outperforms anything similar. The problem is that it is limited in what it exposes. Can anyone point me in the right direction?

The inner loop only stops when it hits an EOFError while reading the file, so by the time you get to what would have been the second iteration of the outer loop, you've read the entire file. So trying to read more just gives you another EOFError, and you're out.
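A minimal sketch of this behaviour, using an in-memory io.BytesIO buffer as a stand-in for the pickle file (the records are made up for illustration, and the sketch uses Python 3's pickle rather than cPickle). Once the inner loop has read to the end, the next outer read fails immediately:

```python
import io
import pickle

# Hypothetical records standing in for the rows in the pickle file.
buffer = io.BytesIO()
for record in [(1, "a b c"), (2, "b c d"), (3, "x y z")]:
    pickle.dump(record, buffer)
buffer.seek(0)

first = pickle.load(buffer)  # the "outer" read

# The "inner" loop consumes every remaining record.
remaining = []
while True:
    try:
        remaining.append(pickle.load(buffer))
    except EOFError:
        break

# The stream is now at its end, so the next "outer" read fails at once.
try:
    pickle.load(buffer)
    hit_eof = False
except EOFError:
    hit_eof = True
```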

First, I should say ben w's answer does explain the behavior you're experiencing.
As for your broader question of "how do I accomplish my task using Python?" I recommend just using a single loop through the file to load all the pickled objects into a data structure in memory (a dictionary with IDs as keys and text as values seems like a natural choice). Once all the objects are loaded, you don't mess with the file at all; just use the in-memory data structure. You can use your existing nested-loop logic, if you like. It might look something like (pseudocode)
for k1 in mydict:
    for k2 in mydict:
        if k1 != k2:
            do_comparison(mydict[k1], mydict[k2])
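The single-pass idea could be sketched roughly as follows in Python 3. The jaccard and load_all helpers are hypothetical stand-ins (the question's Similarity class is not shown), and the sample records are written to an in-memory buffer rather than a real file:

```python
import io
import pickle
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two token lists (hypothetical helper)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def load_all(stream):
    """Read every pickled (id, text) pair from the stream into a dict."""
    rows = {}
    while True:
        try:
            row_id, text = pickle.load(stream)
        except EOFError:
            return rows
        rows[row_id] = text


# Made-up sample data written to an in-memory buffer instead of a real file.
buffer = io.BytesIO()
for record in [(1, "a b c"), (2, "b c d"), (3, "x y z")]:
    pickle.dump(record, buffer)
buffer.seek(0)

rows = load_all(buffer)
results = []
for id1, id2 in combinations(sorted(rows), 2):
    coefficient = jaccard(rows[id1].split(), rows[id2].split())
    if coefficient > 0:
        results.append((id1, id2, coefficient))
```

Note that combinations visits each unordered pair once, whereas the nested loops in the pseudocode visit every pair in both orders. Since Jaccard similarity is symmetric, the single visit should be enough here.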

Related

Python continue for loop from file

I have code that generates strings from 000000000000 to ffffffffffff, which are written to a file.
I'm trying to implement a check to see if the program was closed, so that I can read the last value from the file, let's say 00000000781B, and continue the for loop from there.
The variable "attempt" in (for attempt in to_attempt:) is a tuple and always starts from zero.
Is it possible to continue the for loop from the specified value?
import itertools
f = open("G:/empty/last.txt", "r")
lines = f.readlines()
rand_string = str(lines[0])
f.close()
letters = '0123456789ABCDEF'
print(rand_string)
for length in range(1, 20):
    to_attempt = itertools.product(letters, repeat=length)
    for attempt in to_attempt:
        gen_string = rand_string[length:] + ''.join(attempt)
        print(gen_string)
You have to store the value in a file to keep track of which value was processed last. I'm assuming the main for loop running from 000000000000 to ffffffffffff is the to_attempt one. All you need to do is store the loop's current position in a file. You can use a new variable to keep track of it.
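try:
    with open('save.txt', 'r') as reader:
        save = int(reader.read())
except FileNotFoundError:
    save = 0
# rest of the code
# itertools.product has no len(), so skip the first `save` items with islice
remaining = itertools.islice(to_attempt, save, None)
for i, attempt in enumerate(remaining, start=save):
    # open in write mode and write the counter as a string
    with open('save.txt', 'w') as writer:
        writer.write(str(i))
    # rest of the code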
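Because the generated values are fixed-width hexadecimal counters, one alternative sketch is to skip itertools.product altogether, convert the last saved string to an integer, and count upward from there. The function name resume_from and the 12-character width are assumptions based on the range given in the question, not part of the original code:

```python
def resume_from(last_value, width=12):
    """Yield zero-padded uppercase hex strings that follow last_value."""
    start = int(last_value, 16) + 1
    for number in range(start, 16 ** width):
        yield format(number, "0{}X".format(width))


# Continue after the value mentioned in the question.
generator = resume_from("00000000781B")
next_three = [next(generator) for _ in range(3)]
```

This avoids stepping through all the earlier combinations just to reach the resume point, which skipping items in an iterator would still have to do.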

Python readlines() doesn't function outputs bug in for loop

Intro:
I'm a beginner at Python, learning syntax at the moment. I've come across the concept of reading and writing files, which Python supports natively. I figured I'd give it a try, and ran into bugs after attempting to loop the reading and writing commands. I wanted to randomly pick a name from a name file and then write it into a new file. My file includes 19239 lines of names, randrange(18238) generates from 0 - 18238, and, supposedly, would read a random line between 1 - 18239. The problem is that the code that reads and writes works without the for loop but not with the for loop.
My attempt:
from random import randrange
rdname = open("names.dat", "r")
wrmain = open("main.dat", "a")
rdmain = open("main.dat", "r")
for x in range(6):
    nm = rdname.readlines()[randrange(18238)]
    print(str(randrange(18238)) + ": " + nm)
    wrmain.write("\n" + nm)
...
Error code:
Exception has occurred: IndexError
list index out of range
Good luck with your programming journey.
The readlines() method has some non-intuitive behaviour. When you call readlines(), it reads the entire content of the file and returns a list of strings, one per line, leaving the file position at the end of the file. Thus the second time you call rdname.readlines()[randrange(18238)], there is nothing left to read and you actually get an empty list. So functionally you are telling your programme to run [][randrange(18238)] on the second iteration of the loop.
I also took the liberty of fixing the random number call. The way you had implemented it, randrange was called twice, so the index used to select the name in nm = rdname.readlines()[randrange(18238)] differed from the line number printed by print(str(randrange(18238)) + ": " + nm).
...
rdname = open("names.dat", "r")
wrmain = open("main.dat", "a")
rdmain = open("main.dat", "r")
rdname_list = rdname.readlines()
for x in range(6):
    rd_number = randrange(18238)
    nm = rdname_list[rd_number]
    print(str(rd_number) + ": " + nm)
    wrmain.write("\n" + nm)
...
rdname.readlines() exhausts your file handle. Running rdname.readlines() gives you the list of lines the first time, but returns an empty list every subsequent time. Obviously, you can't access an element in an empty list. To fix this, assign the result of readlines() to a variable just once, before your loop.
rdlines = rdname.readlines()
maxval = len(rdlines)
for x in range(6):
    randval = randrange(maxval)
    nm = rdlines[randval]
    print(str(randval) + ": " + nm)
    wrmain.write("\n" + nm)
Also, making sure your random number can only go to the length of your list is a good idea. No need to hardcode the length of the list though -- the len() function will give you that.
I highly recommend you take a look at how to debug small programs. Using a debugger to step through your code is immensely helpful because you can see how each line affects the values of your variables. In this case, if you'd looked at the value returned by readlines() in each iteration, it would be obvious why you got the IndexError: finding out that readlines() returns an empty list on the second call would point you in the direction of the answer.
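The exhaustion described in both answers can be reproduced in a few lines. This is only a sketch: an in-memory io.StringIO object stands in for names.dat, the names are made up, and random.choice is swapped in for the manual index so the list length never needs to be hardcoded:

```python
import io
import random

# In-memory stand-in for names.dat; the names are made up.
rdname = io.StringIO("Ada\nGrace\nLinus\n")

first_call = rdname.readlines()   # reads every line
second_call = rdname.readlines()  # nothing left to read

# Pick from the stored list instead of re-reading the file.
picked = [random.choice(first_call).strip() for _ in range(6)]
```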

Unable to load a list of classes using the pickle function in Python 3.7

I have a script I've written that stores class instances of expenses and income in lists.
When first starting the module, I use a try/except block to check for a pickled file:
try:
    #list definitions
    expenselist = pickle.load(open(filename,'rb'))
    incomelist = pickle.load(open(filename,'rb'))
except:
    print ("Generating new lists.")
    #list definitions
    expenselist = []
    incomelist = []
This works as intended when run in IDLE:
[Screenshot: test-slide-1]
The program prints the debug message about generating new lists and prints the blank lists. It then prompts the user for input to create a class instance that is appended to the empty lists.
enterValue = str(input("Hi, press 1 to enter income and 2 to expense"))
try:
    if enterValue == '1':
        addIncome ()
        #printRev()#debug to print class
    elif enterValue == '2':
        addExpense ()
        #printExp() #debug to print class
    else:
        raise ValueError
except ValueError:
    print ("Incorrect Value.")
This produces this result in IDLE:
[Screenshot: test-slide-2]
It is then pickled using:
#pickle expenselist
pickle.dump(expenselist,open(filename,'wb'))
#pickle incomelist
pickle.dump(incomelist,open(filename,'wb'))
The pickled data is stored here:
[Screenshot: file_structure]
When I open this file using Notepad++, I get this:
€]q .
So I know it stored something.
Everything until now has gone according to design, but when I run the script again, I get:
[Screenshot: test-slide-3]
My debugging message about generating new lists didn't print, so I know it detected and tried to load the file, but it printed blank lists instead of the data I had stored in the list during the previous session.
I've tried for weeks now to figure out what was going on. I originally thought it was saving the location of the class instance and not the instance itself, but it should print the address of the instance if that were the case. Instead I'm getting blank lists.
My questions are: 1, is it generating new lists regardless of loading the pickle and how do I stop it? and 2, if that isn't the problem, what is?
I'm about 50% self taught and I've exhausted the resources I know of. I've had two programmers take a look at the code, but neither are Python experts. So this is the only thing I haven't tried.
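Pickle is very simple to use. I recommend using a with statement. Also, as @John Anderson noted in the comments, it looks like you are overwriting the file: each pickle.dump call opens the same filename in 'wb' mode, so the second dump (incomelist) replaces the first (expenselist).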
Usage example:
import pickle
a = [1,2,3]
b = [4,5,6]
with open("pickle1.dat", 'wb') as f:
    pickle.dump(a, f)
    pickle.dump(b, f)
c = None
d = None
with open("pickle1.dat", 'rb') as f:
    c = pickle.load(f)
    d = pickle.load(f)
print(c)
print(d)
Outputs
[1, 2, 3]
[4, 5, 6]
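To make the overwriting concrete, here is a small sketch that reproduces the pattern from the question and then shows one possible alternative, storing both lists in a single pickled object. The sample data, the temporary path, and the dictionary keys are all made up for illustration, and plain strings stand in for the question's class instances:

```python
import os
import pickle
import tempfile

expenselist = ["rent", "food"]   # made-up sample data
incomelist = ["salary"]

path = os.path.join(tempfile.mkdtemp(), "lists.dat")  # hypothetical file name

# What the question does: two separate 'wb' opens of the same file.
with open(path, "wb") as f:
    pickle.dump(expenselist, f)
with open(path, "wb") as f:      # truncates the file, discarding expenselist
    pickle.dump(incomelist, f)
with open(path, "rb") as f:
    overwritten = pickle.load(f)

# One alternative: store both lists in a single object.
with open(path, "wb") as f:
    pickle.dump({"expenses": expenselist, "income": incomelist}, f)
with open(path, "rb") as f:
    saved = pickle.load(f)
```

After the first part, only the list dumped last survives in the file. That would be consistent with the blank lists reported in the question if the list written last happened to be empty.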

Searching a column in a tab delimited array for a specific value

So I'm trying to use a "column_matches" function to search a txt file of data, which I have stored in an array, for a specific value in a column and then print the line containing that value.
The code I have right now looks something like this:
f = open( r'file_directory' )
a = []
for line in f:
    a.append(line)
def column_matches(line, substring, which_column):
    for line in a:
        if column_matches(line, '4', 6):
            print (line)
        else:
            print('low multiplicity')
In this example I'm trying to search the 7th column for the value 4. However, this is currently not printing anything.
I'm a beginner programmer so this might be very wrong, but I would love some feedback as I haven't been able to solve it from other people's questions.
Ideally the program should search all lines and print (or save) every line with a specific value in a specific column!
Edit: example input:
K00889.01 0.9990 8.884922995 10.51 0.114124 89.89 1 153 0.8430 0.8210
K01009.01 0.0000 5.09246539 1.17 0.014236 89.14 1 225 0.7510 0.7270
Your existing function doesn't actually have any logic to handle the case that you are trying to search for. Indeed, you have if column_matches(line, '4', 6): inside the function called column_matches so you imply that it has to call itself in order to determine what action to take... that logically just forms an infinite loop (though in your case, nothing actually runs).
This should be similar to your existing approach but should do what you want. It should be relatively resilient to your actual file structure but let me know if it throws errors.
data = []
with open('example.txt', 'r') as infile:
    # Will automatically close the file once you're done reading it
    for row in infile:
        data.append(row.replace('\n', '').split())
def column_matches(line, target, column_index):
    try:
        file_data = int(line[column_index])
        if file_data == target:
            return True
        else:
            return False
    except ValueError:
        print('Not a valid number: {}'.format(line[column_index]))
        return False
matching_rows = [] # To store items in data that meet our criteria
for line in data:
    if column_matches(line, 4, 6):
        matching_rows.append(line) # Function has to return True for this to happen
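As a quick check against the two sample rows from the question, here is a variant sketch that compares the field as text rather than converting it to int. The helper name rows_matching is made up for this example. Note that in the sample data the 7th column is 1 in both rows, so a search for 4 finds nothing:

```python
# The two sample rows from the question, split on whitespace.
sample = [
    "K00889.01 0.9990 8.884922995 10.51 0.114124 89.89 1 153 0.8430 0.8210",
    "K01009.01 0.0000 5.09246539 1.17 0.014236 89.14 1 225 0.7510 0.7270",
]
rows = [line.split() for line in sample]


def rows_matching(rows, target, column_index):
    """Return rows whose field at column_index equals target (as text)."""
    return [
        row for row in rows
        if len(row) > column_index and row[column_index] == target
    ]


matches_one = rows_matching(rows, "1", 6)
matches_four = rows_matching(rows, "4", 6)
```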

Trying to write python CSV extractor

I am a complete newbie to programming and this is the first real program I am trying to write.
So I have this huge CSV file (hundreds of columns and thousands of rows) from which I am trying to extract only a few columns, based on the value in one field. It works fine and I get nice output, but the problem arises when I try to encapsulate the same logic in a function.
It returns only the first extracted row, although print works fine.
I have been playing with this for hours and have read other examples here, and now my mind is mush.
import csv
import sys
newlogfile = csv.reader(open(sys.argv[1], 'rb'))
outLog = csv.writer(open('extracted.csv', 'w'))
def rowExtractor(logfile):
    for row in logfile:
        if row[32] == 'No':
            a = []
            a.append(row[44])
            a.append(row[58])
            a.append(row[83])
            a.append(row[32])
            return a
outLog.writerow(rowExtractor(newlogfile))
You are exiting prematurely. Because return a is inside the for loop, the function returns as soon as the first matching row is found, so no later rows are ever processed.
A simple way to do this would be to do:
def rowExtractor(logfile):
    #output holds all of the rows
    output = []
    for row in logfile:
        if row[32] == 'No':
            a = []
            a.append(row[44])
            a.append(row[58])
            a.append(row[83])
            a.append(row[32])
            output.append(a)
    #notice that the return statement is outside of the for-loop
    return output
outLog.writerows(rowExtractor(newlogfile))
You could also consider using yield
You've got a return statement in your function...when it hits that line, it will return (thus terminating your loop). You'd need yield instead.
See What does the "yield" keyword do in Python?
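A generator version might look something like the sketch below, written for Python 3. The function name row_extractor and its flag_col and keep_cols parameters are assumptions introduced for this example, and the three-column sample data is made up (the question's file uses columns 32, 44, 58 and 83). In-memory io.StringIO objects stand in for the input and output files:

```python
import csv
import io


def row_extractor(logfile, flag_col=2, keep_cols=(0, 1, 2)):
    """Yield the selected columns of each row whose flag column is 'No'."""
    for row in logfile:
        if row[flag_col] == "No":
            yield [row[i] for i in keep_cols]


# Made-up three-column data; the question's file uses columns 32, 44, 58, 83.
source = io.StringIO("a,1,No\nb,2,Yes\nc,3,No\n")
out = io.StringIO()
csv.writer(out).writerows(row_extractor(csv.reader(source)))
extracted = out.getvalue().splitlines()
```

Because writerows accepts any iterable of rows, the generator can be passed to it directly and the matching rows never need to be collected in a list first.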
