I am trying to generate random codes and store them in a file. After some googling I came across the pickle module and used it exactly as the tutorial did. Now I need to know how to store all of the codes that I create in the file. Here is my code:
import string
import pickle
from random import randint
data = list(string.ascii_lowercase)
[data.append(n) for n in range(0, 10)]
x = [str(data[randint(0, len(data)-1)]) for n in range(0, 21)]
y = ''.join(x)
print (y)
inUse = []
inUse.append(y)
pickle.dump(inUse, open("data.pkl", "wb"))
inUse = pickle.load(open("data.pkl", "rb"))
Your way of generating x is overly convoluted. This is simpler:
import string
import random
data = string.ascii_lowercase + string.digits
x = ''.join(random.choice(data) for n in range(20))
Now you can simply print x to a file like this:
with open("data.txt", "a") as fout:
    print(x, file=fout)
If you wish to append N codes to the file:
with open("data.txt", "a") as fout:
    for i in range(N):
        x = ''.join(random.choice(data) for n in range(20))
        print(x, file=fout)
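If the stored codes must never repeat, one option is to read the codes already in the file into a set before generating new ones. The following is a minimal sketch under that assumption; the helper names make_code and append_unique_codes are hypothetical and not part of the answer above.

```python
import random
import string

ALPHABET = string.ascii_lowercase + string.digits

def make_code(length=20):
    """Return one random code drawn from lowercase letters and digits."""
    return ''.join(random.choice(ALPHABET) for _ in range(length))

def append_unique_codes(path, count, length=20):
    """Append `count` new codes to `path`, skipping any already stored there."""
    try:
        with open(path) as fin:
            seen = {line.strip() for line in fin if line.strip()}
    except FileNotFoundError:
        seen = set()  # first run: nothing has been stored yet
    new_codes = []
    while len(new_codes) < count:
        code = make_code(length)
        if code not in seen:
            seen.add(code)
            new_codes.append(code)
    with open(path, "a") as fout:
        for code in new_codes:
            print(code, file=fout)
    return new_codes
```

Calling append_unique_codes("data.txt", 5) twice should leave ten distinct lines in the file. If the codes act as secrets, secrets.choice can stand in for random.choice.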
In the line below:
y = ''.join(x)
Let's say x is a list of random characters like ['a', 'x', 'c', 'j'].
After the above line executes, you will get y = 'axcj'.
You can use pickle to serialize the list object itself, so you would not even need the y or inUse lists.
The code would look like this:
import string
import pickle
from random import randint
data = list(string.ascii_lowercase)
[data.append(n) for n in range(0, 10)]
x = [str(data[randint(0, len(data)-1)]) for n in range(0, 21)]
pickle.dump(x, open("data.pkl", "ab"))
x = pickle.load(open("data.pkl", "rb"))
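Note the ab file mode: it appends to the file instead of overwriting it. Each pickle.dump call then adds a separate pickled object to the file, and a single pickle.load call reads back only the first one.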
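When a file has been appended to with several pickle.dump calls, reading everything back takes one pickle.load call per stored object. A minimal sketch of that loop follows; the function names append_pickled and load_all_pickled are hypothetical.

```python
import pickle

def append_pickled(path, obj):
    """Append one pickled object to the end of the file."""
    with open(path, "ab") as fout:
        pickle.dump(obj, fout)

def load_all_pickled(path):
    """Return every object stored in the file, in the order it was dumped."""
    items = []
    with open(path, "rb") as fin:
        while True:
            try:
                items.append(pickle.load(fin))
            except EOFError:
                break  # no more pickled objects in the file
    return items
```

pickle.load raises EOFError once the file is exhausted, which is what ends the loop.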
Does anyone know how I can optimize this code to handle larger files? It works with smaller inputs, but I need it to run on a file with over 200,000 words. Any suggestions?
Thank you.
import random
import re
def quick_sort(a,i,n):
    if n <= 1:
        return
    mid = (len(a)) // 2
    x = a[random.randint(0,len(a)-1)]
    p = i - 1
    j = i
    q = i + n
    while j < q:
        if a[j] < x:
            p = p + 1
            a[j],a[p] = a[p],a[j]
            j = j + 1
        elif a[j] > x:
            q = q - 1
            a[j],a[q] = a[q],a[j]
        else:
            j = j + 1
    quick_sort(a,i,p-i+1)
    quick_sort(a,q,n-(q-i))
file_name = input("Enter file name: ")
my_list = []
with open(file_name,'r') as f:
    for line in f:
        line = re.sub('[!#?,.:";\']', '', line).lower()
        token = line.split()
        for t in token:
            my_list.append(t)
a = my_list
quick_sort(a,0,len(my_list))
print("List After Calling Quick Sort: ",a)
Your random selection of an index to use for your pivot x is using the whole size of the input list a, not just the part you're supposed to be sorting on the current call. This means that very often your pivot won't be in the current section at all, and so you won't be able to usefully reduce your problem (because all of the values will be on the same side of the pivot). This leads to lots and lots of recursion, and for larger inputs you'll almost always hit the recursion cap.
The fix is simple, just change how you get x:
x = a[random.randrange(i, i+n)]
I like randrange a lot better than randint, but you could use randint(i, i+n-1) if you feel the other way.
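For reference, here is a sketch of the question's three-way partitioning quicksort with only the pivot selection changed as described. The comments on p, j and q are my reading of the original code, not something the asker stated.

```python
import random

def quick_sort(a, i, n):
    """Sort the n items of list a starting at index i, in place."""
    if n <= 1:
        return
    x = a[random.randrange(i, i + n)]  # pivot drawn from the current slice only
    p = i - 1      # a[i..p] holds items smaller than the pivot
    j = i          # next item to examine
    q = i + n      # a[q..i+n-1] holds items larger than the pivot
    while j < q:
        if a[j] < x:
            p += 1
            a[j], a[p] = a[p], a[j]
            j += 1
        elif a[j] > x:
            q -= 1
            a[j], a[q] = a[q], a[j]
        else:
            j += 1
    quick_sort(a, i, p - i + 1)       # items smaller than the pivot
    quick_sort(a, q, n - (q - i))     # items larger than the pivot
```

Because the pivot now always lies inside the slice being sorted, at least one item equals it, so both recursive calls work on strictly smaller slices.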
Must you use a quicksort? If you can use heapq or a PriorityQueue, repeatedly calling heapq.heappop (or PriorityQueue.get) returns the items in sorted order:
import sys
from queue import PriorityQueue
pq = PriorityQueue()
inp = open(sys.stdin.fileno(), newline='\n')
#inp = ['dag', 'Rug', 'gob', 'kex', 'mog', 'Wes', 'pox', 'sec', 'ego', 'wah'] # for testing
for word in inp:
    word = word.rstrip('\n')
    pq.put(word)
while not pq.empty():
    print(pq.get())
Then test with some large random word input or file e.g.:
shuf /usr/share/dict/words | ./word_pq.py
where shuf is GNU shuf (/usr/local/bin/shuf).
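The same idea can be sketched with heapq directly, which avoids the locking overhead of PriorityQueue in a single-threaded script. The generator name heap_sorted is hypothetical.

```python
import heapq

def heap_sorted(words):
    """Yield the words in ascending order by popping a binary heap."""
    heap = list(words)
    heapq.heapify(heap)            # builds the heap in O(n)
    while heap:
        yield heapq.heappop(heap)  # each pop costs O(log n)
```

For example, list(heap_sorted(my_list)) gives the same ordering as the built-in sorted(my_list), which is usually the simplest choice when the whole list fits in memory.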
I am trying to write unique values to a csv that already has a list of ints inside it.
Currently I have tried to loop through a range of possible numbers then check if those numbers are in the csv. It appears that the checking is not working properly.
def generateUserCode():
    with open ('/MyLocation/user_codes.csv') as csvDataFile:
        userCodes = csv.reader(csvDataFile)
        for x in range(0, 201):
            if x not in userCodes:
                return x
def writeUserCode(userCode):
    with open ('/MyLocation/user_codes.csv', 'a') as csvDataFile:
        csvDataFile.write('\n' + str(userCode))
userCode = generateUserCode()
writeUserCode(userCode)
So it should print the first number not in the csv and add that number to the csv. However, all it does is print 0 and add 0 to my csv every time it is run, even if 0 is already in the csv.
Update:
The csv looks something like this:
3
4
5
35
56
100
There are more values but it is generally the same with no repeats and values between 0 and 200
The problem is with the following line:
if x not in userCodes:
userCodes is not a list; it is a csv.reader object. Read its rows into a list first, and compare strings with strings, because every field read from the file is a string:
if str(x) not in userCodes:
#use str(x) instead of x
This is the code that works for me:
import csv
def generateUserCode():
    with open ('file.csv') as csvDataFile:
        csvread = csv.reader(csvDataFile)
        userCodes = []
        #print(userCodes)
        for line in csvread:
            try:
                userCodes.append(line[0]) # As long as the code is the first
                # element in that line, it should work
            except IndexError:
                pass # Skip blank lines
    print(userCodes)
    for x in range(0, 201):
        if str(x) not in userCodes:
            return x
def writeUserCode(userCode):
    with open ('file.csv', 'a') as csvDataFile:
        csvDataFile.write('\n' + str(userCode))
userCode = generateUserCode()
writeUserCode(userCode)
Iterating userCodes shows each item is a list of strings:
for x in userCodes:
    print(x)
returns:
['3']
['4']
['5']
['35']
['56']
['100']
There are many possible fixes; one would be:
def generateUserCode():
    with open ('/MyLocation/user_codes.csv') as csvDataFile:
        userCodes = csv.reader(csvDataFile)
        # skip empty rows so item[0] cannot raise IndexError
        userCodes = [int(item[0]) for item in userCodes if item]
    for x in range(0, 201):
        if x not in userCodes:
            return x
It’s tricky to answer without seeing the CSV, but when you read the CSV, all fields are strings. Therefore you need to convert either the userCodes to int or x to a string for the comparison to work.
For example:
userCodes = [int(d[0]) for d in csv.reader(csvDataFile)]
for x in range(0, 201):
    if x not in userCodes:
        return x
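You are checking whether a value is in an instance of csv.reader. The in test iterates over the object row by row (line by line for a file) and compares each whole row with the value, so it does not search the contents the way you expect, even with a normal file handle: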
with open('somefile.txt') as fh:
    x = fh.read()
x
'Spatial Reference: 43006\nName: Jones Tract\n424564.620666, 4396443.55267\n425988.30892, 4395630.01652\n426169.09473, 4395426.63249\n426214.291182, 4395268.4449\n\nName: Lewis Tract\n427909.158152, 4393935.14955\n428587.104939, 4393731.76552\n428700.096071, 4393528.38148\n428745.292523, 4393347.59567\n\nName: Adams Tract\n424180.450819, 4393957.74778\n424361.236629, 4393709.16729\n424655.013571, 4393641.37261\n424858.397607, 4393776.96197\n'
# now check if 'e' is in fh
with open('somefile.txt') as fh:
    'e' in fh
False
'e' in x
True
Also, your csv file isn't really a csv file, so I'd just use a normal file handle and ignore the csv entirely.
The better approach may be to aggregate your codes in a set and check from there:
def get_codes():
    with open('user_codes.csv') as fh:
        # return a set to test membership quickly (blank lines are skipped)
        return {line.strip() for line in fh if line.strip()}
codes = get_codes()
def add_code(code):
    code = str(code)  # codes read from the file are strings
    if code not in codes:
        codes.add(code)
        with open('user_codes.csv', 'a') as fh:
            fh.write('\n' + code)  # one code per line, as in the existing file
    else:
        raise ValueError("Code already exists")
        # or do something else
add_code(88)
add_code(88)
# ValueError
To generate a user code automatically, since you are using a range, this becomes relatively easy:
def generate_user_code():
    try:
        # this returns the first number not in codes
        # (codes holds strings, so compare against str(i))
        return next(i for i in range(201) if str(i) not in codes)
    except StopIteration:
        # you've exhausted your range, nothing is left
        raise ValueError("No unique codes available")
# and your write method can be changed to
def add_code(code):
    with open('user_codes.csv', 'a') as fh:
        codes.add(str(code))
        fh.write('\n' + str(code))
codes = get_codes()
user_code = generate_user_code()
add_code(user_code)
You may try to do this:
....
userCodes = csv.reader(csvDataFile)
uc = []
for y in userCodes:
    uc += y
for x in range(0, 201):
    if str(x) not in uc:
        return x
....
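Putting the pieces from these answers together, a self-contained sketch might look like the following. It assumes the file holds one integer per line, as in the question's sample; the function names read_codes, first_free_code and claim_code are hypothetical.

```python
def read_codes(path):
    """Return the integer codes stored one per line, ignoring blank lines."""
    with open(path) as fh:
        return {int(line) for line in fh if line.strip()}

def first_free_code(codes, limit=201):
    """Return the smallest number in range(limit) that is not in codes."""
    for x in range(limit):
        if x not in codes:
            return x
    raise ValueError("No unique codes available")

def claim_code(path, limit=201):
    """Pick the first free code, append it to the file, and return it."""
    code = first_free_code(read_codes(path), limit)
    with open(path, 'a') as fh:
        fh.write('\n' + str(code))  # same line format as the question's writer
    return code
```

Converting to int once while reading keeps the membership test a plain int-to-int comparison, and the set makes each lookup constant time.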
The problem is to read the file, find the integers with re.findall() and the regular expression '[0-9]+', convert the extracted strings to integers, and sum them up.
My code, in which sample.txt is my text file:
import re
hand = open('sample.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall('[0-9]+',line)
    print x
x = [int(i) for i in x]
add = sum(x)
print add
OUTPUT:
You need to append the results found on each line to another list, so that the numbers found on the current line are kept when the loop moves on to the next line.
import re
hand = open('sample.txt')
l = []
for line in hand:
    x = re.findall('[0-9]+',line)
    l.extend(x)
j = [int(i) for i in l]
add = sum(j)
print(add)
or
with open('sample.txt') as f:
    print(sum(map(int, re.findall(r'\d+', f.read()))))
Try this:
import re
x = []
with open("a.txt") as hand:
    for line in hand:
        y = re.findall('[0-9]+', line)
        x = x + y
total = 0  # named total so the built-in sum is not shadowed
for z in x:
    total = total + int(z)
print(total)
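The same approach can be wrapped in a function that keeps a running total instead of building an intermediate list. This is a minimal Python 3 sketch; the function name sum_integers is hypothetical.

```python
import re

def sum_integers(path):
    """Add up every run of digits found anywhere in the file."""
    total = 0
    with open(path) as fh:
        for line in fh:
            # findall returns the digit runs on this line as strings
            total += sum(int(n) for n in re.findall(r'[0-9]+', line))
    return total
```

Reading line by line keeps memory use flat even for large files, since only one line and one integer total are held at a time.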
I need to create a csv file with 100,000 random pairs.
So far I have:
import random
randfile = open("Random.csv", "w" )
pairs = []
for i in range(100000):
    line1 = str(random.randint(1, 100))
    line2 = str(random.randint(1, 100))
    pair = line1, line2
    pairs.append(pair)
randfile.close()
You are on the right track actually. You can use csv.writer to easily write to a csv file:
>>> import csv
>>> import random
>>> randfile = open("Random.csv", "w")
>>> writer = csv.writer(randfile, delimiter=",")
>>> for i in range(100000):
...     pair = random.randint(1, 100), random.randint(1, 100)
...     writer.writerow(pair)
...
>>> randfile.close()
You don't need pairs or pairs.append(pair); you can remove them from your code.
import csv
import random
with open("Random.csv", "w" ) as outf:
    outfile = csv.writer(outf)
    for i in range(100000):
        num1 = str(random.randint(1, 100))
        num2 = str(random.randint(1, 100))
        row = (num1,num2)
        outfile.writerow(row)
Your logic is correct, but you did not write anything to your file; you just opened and closed it.
Just add this line before closing and it will work fine (%s is used because the values in pairs are already strings):
randfile.write('\n'.join('%s,%s' % (a,b) for a, b in pairs))
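For Python 3, a sketch of the csv.writer approach wrapped in a function is shown below. The function name write_random_pairs and its low and high parameters are hypothetical; newline='' is what the csv module documentation recommends when opening a file for a writer.

```python
import csv
import random

def write_random_pairs(path, count, low=1, high=100):
    """Write `count` rows of two random integers each to a CSV file."""
    # newline='' lets the csv module control line endings itself
    with open(path, "w", newline='') as outf:
        writer = csv.writer(outf)
        for _ in range(count):
            writer.writerow((random.randint(low, high), random.randint(low, high)))
```

Calling write_random_pairs("Random.csv", 100000) should produce the file described in the question without holding all the pairs in memory.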
I'm loading a text document containing some random strings, and I'm trying to print every possible permutation of the characters in each string.
If the notepad contains for example:
123
abc
I want my output to be
123,132,213,231,312,321
abc,acb,bac,bca,cab,cba
The text file contains some pretty large strings so I can see why I am getting this MemoryError.
In my first attempt I used this:
import sys
import itertools
import math
def organize(to_print):
    number_list = []
    upper_list = []
    lower_list = []
    for x in range(0,len(to_print)):
        if str(to_print[x]).isdigit() is True:
            number_list.append(to_print[x])
        elif to_print[x].isupper() is True:
            upper_list.append(to_print[x])
        else:
            lower_list.append(to_print[x])
    master_list = number_list + upper_list + lower_list
    return master_list
number = open(*file_dir*, 'r').readlines()
factorial = math.factorial(len(number))
complete_series = ''
for x in range(0,factorial):
    complete_string = ''.join((list(itertools.permutations(organize(number)))[x]))
    complete_series += complete_string+','
edit_series = complete_series[:-1]
print(edit_series)
The reason for def organize is that if I have a string such as 1aB, I need to preorder it by number, uppercase, lowercase before I start the permutations.
I got the memory error here: complete_string = ''.join((list(itertools.permutations(organize(number)))[x])) so my next step was to move that call out of the for loop.
My second attempt is this:
import sys
import itertools
import math
def organize(to_print):
    number_list = []
    upper_list = []
    lower_list = []
    for x in range(0,len(to_print)):
        if str(to_print[x]).isdigit() is True:
            number_list.append(to_print[x])
        elif to_print[x].isupper() is True:
            upper_list.append(to_print[x])
        else:
            lower_list.append(to_print[x])
    master_list = number_list + upper_list + lower_list
    return master_list
number = open(*file_dir*, 'r').readlines()
factorial = math.factorial(len(number))
complete_series = ''
the_permutation = list(itertools.permutations(organize(number)))
for x in range(0,factorial):
    complete_string = ''.join((the_permutation[x]))
    complete_series += complete_string+','
edit_series = complete_series[:-1]
print(edit_series)
But I am still getting a memory error. I don't necessarily need or want the answer directly as this is good learning practice to reduce my inefficiencies, so hints in the right direction would be nice.
Added 3rd attempt:
import sys
import itertools
import math
def organize(to_print):
    number_list = []
    upper_list = []
    lower_list = []
    for x in range(0,len(to_print)):
        if str(to_print[x]).isdigit() is True:
            number_list.append(to_print[x])
        elif to_print[x].isupper() is True:
            upper_list.append(to_print[x])
        else:
            lower_list.append(to_print[x])
    master_list = number_list + upper_list + lower_list
    return master_list
number = open(*file_dir*, 'r').readlines()
factorial = math.factorial(len(number))
complete_series = ''
the_permutation = itertools.permutations(organize(number))
for x in itertools.islice(the_permutation,factorial):
    complete_string = ''.join(next(the_permutation))
    complete_series += complete_string+','
edit_series = complete_series[:-1]
print(edit_series)
Don't call list, just iterate over the permutations:
the_permutation = itertools.permutations(organize(number))
for x in the_permutation:
    complete_string = ''.join(x)
list(itertools.permutations(organize(number))) stores all the permutations in memory, and then your loop stores them all again in one string. Depending on how much data is in the_permutation, there is no guarantee that you can hold it all even with this approach.
If you only want a certain number of the permutations, you can call next on the permutations object:
the_permutation = itertools.permutations(organize(number))
for x in range(factorial):
    complete_string = ''.join(next(the_permutation))
Or use itertools.islice, which already yields each permutation as x, so next must not be called again inside the loop:
for x in itertools.islice(the_permutation,factorial):
    complete_string = ''.join(x)
Keep in mind that factorials grow enormously fast, so even for a string of moderate length the number of permutations is enormous. For 12 letters it is about 479 million (12! = 479,001,600).
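One way to keep memory use flat is to write each permutation out as soon as it is produced, rather than collecting them in a list or one big string. The following is a minimal sketch of that idea; the function name write_permutations and its out parameter are hypothetical.

```python
import itertools
import sys

def write_permutations(chars, out=sys.stdout):
    """Write every permutation of chars, comma-separated, one at a time."""
    first = True
    for perm in itertools.permutations(chars):
        if not first:
            out.write(',')
        out.write(''.join(perm))  # only the current permutation is in memory
        first = False
    out.write('\n')
```

With this shape, write_permutations('123') prints 123,132,213,231,312,321. The running time still grows with the factorial of the string length, so long strings remain impractical even though memory is no longer the limit.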