What's wrong with my Python multiprocessing code?

I am a fairly new programmer and have been learning Python for a few months. For the last two weeks, I have been writing a script that searches for permutations of numbers that make magic squares.
Finally, I succeeded in finding all 880 sets of numbers for 4x4 magic squares within 30 seconds. After that, I wrote a different program for perimeter magic squares. It finds more than 10,000,000 permutations, so I want to store them in files, part by part. The problem is that my program doesn't use all my CPU cores: while it is writing partial data to a file, it stops searching for new number sets. I would like one process to keep searching while the others write the found data to files.
The following code has a similar structure to my magic square program:
while True:
    print('How many digits do you want? (more than 20): ', end='')
    ansr = input()
    if ansr.isdigit() and int(ansr) > 20:
        ansr = int(ansr)
        break
    else:
        continue
fileNum = 0
itemCount = 0
def fileMaker():
    global fileNum, itemCount
    tempStr = ''
    for i in permutationList:
        itemCount += 1
        tempStr += str(sum(i[:3])) + ' : ' + str(i) + ' : ' + str(itemCount) + '\n'
    fileNum += 1
    file = open('{0} Permutations {1:03}.txt'.format(ansr, fileNum), 'w')
    file.write(tempStr)
    file.close()
numList = [i for i in range(1, ansr+1)]
permutationList = []
itemCount = 0
def makePermutList(numList, ansr):
    global permutationList
    for i in numList:
        numList1 = numList[:]
        numList1.remove(i)
        for ii in numList1:
            numList2 = numList1[:]
            numList2.remove(ii)
            for iii in numList2:
                numList3 = numList2[:]
                numList3.remove(iii)
                for iiii in numList3:
                    numList4 = numList3[:]
                    numList4.remove(iiii)
                    for v in numList4:
                        permutationList.append([i, ii, iii, iiii, v])
                        if len(permutationList) == 200000:
                            print(permutationList[-1])
                            fileMaker()
                            permutationList = []
    fileMaker()
makePermutList(numList, ansr)
I added from multiprocessing import Pool at the top, and I replaced the two fileMaker() calls at the end with the following:
if __name__ == '__main__':
    workers = Pool(processes=2)
    workers.map(fileMaker, ())
The result? Oh no. It just behaves strangely. For now, multiprocessing looks too difficult for me.
Could anybody please teach me something? How should I modify my code?

Well, let me address some things that are bugging me before getting to the question you asked.
numList = [i for i in range(1, ansr+1)]
I know list comprehensions are cool, but please just do list(range(1, ansr+1)) if you need the iterable to be a list (which you probably don't need, but I digress).
def makePermutList(numList, ansr):
...
This is quite the hack. Is there a reason you can't use itertools.permutations(numList,n)? It's certainly going to be faster, and friendlier on memory.
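To illustrate the itertools.permutations suggestion, here is a small sketch that checks it produces the same 5-element permutations, in the same order, as the question's five nested loops. The recursive helper nested_loops is a hypothetical restatement of those loops written only for the comparison; the sample input range(1, 8) is likewise an assumption.

```python
import itertools

def nested_loops(nums, depth=5, prefix=()):
    """Hypothetical recursive restatement of the question's five nested loops."""
    if depth == 0:
        yield list(prefix)
        return
    for x in nums:
        remaining = nums[:]      # copy, like numList[:] in the question
        remaining.remove(x)      # each number is used at most once
        yield from nested_loops(remaining, depth - 1, prefix + (x,))

nums = list(range(1, 8))
# permutations() yields tuples lazily, in the same order as the nested loops
from_itertools = [list(p) for p in itertools.permutations(nums, 5)]
from_loops = list(nested_loops(nums))
```

In real use you would iterate over itertools.permutations(numList, 5) directly instead of building a list, which is where the memory saving comes from.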
Lastly, answering your question: if you are looking to improve I/O performance, the last thing you should do is make it multithreaded. I don't mean you shouldn't do it; I mean it should literally be the last thing you do. Refactor and improve other things first.
You need to take all of that top-level code that uses globals, apply the backspace key to it, and rewrite functions that pass data around properly. Then you can think about using threads. I would personally use from threading import Thread and manually spawn Threads to do each unit of I/O rather than using multiprocessing.
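A minimal sketch of the Thread-based approach, assuming a single writer thread fed through a bounded queue.Queue. Note that workers.map(fileMaker, ()) in the question maps over an empty tuple, so fileMaker is never called (and since fileMaker takes no arguments, Pool.map could not call it with items anyway). The names run and write_chunks are hypothetical. In CPython the GIL means the two threads do not run Python code in parallel, so the expected gain is that generation can continue while the writer waits on the disk, not a doubling of speed.

```python
import itertools
import os
import queue
import threading

def write_chunks(q, out_dir, n):
    """Writer thread: take chunks off the queue and write each one to its own file."""
    file_num = 0
    item_count = 0                      # keeps counting across files, like itemCount
    while True:
        chunk = q.get()
        if chunk is None:               # sentinel: the producer has finished
            break
        file_num += 1
        name = '{0} Permutations {1:03}.txt'.format(n, file_num)
        with open(os.path.join(out_dir, name), 'w') as f:
            for perm in chunk:
                item_count += 1
                f.write('{} : {} : {}\n'.format(sum(perm[:3]), list(perm), item_count))

def run(n, out_dir, r=5, chunk_size=200000):
    """Main thread generates permutations; one writer thread saves them in chunks."""
    q = queue.Queue(maxsize=4)          # bounded, so only a few chunks wait in memory
    writer = threading.Thread(target=write_chunks, args=(q, out_dir, n))
    writer.start()
    perms = itertools.permutations(range(1, n + 1), r)
    while True:
        chunk = list(itertools.islice(perms, chunk_size))
        if not chunk:
            break
        q.put(chunk)                    # blocks if the writer falls behind
    q.put(None)                         # tell the writer to stop
    writer.join()
```

For example, run(25, '.') would write files named like "25 Permutations 001.txt" to the current directory.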

Related

itertools.product for the full range of columns

As part of my code, I'm trying to build a full factorial matrix. That in itself is not a problem, since I already have working code for it. However, I would like to generalize it so that it works for any number of inputs. This would require modifying the line:
for combination in itertools.product(X[0,:],X[1,:],X[2,:],X[3,:],X[4,:],X[5,:],X[6,:]):
input_list = dfraw.columns[0:n_inputs]
output_list = dfraw.columns[n_inputs:len(dfraw.columns)]
fflvls = 4
lhspoints = 60000
X = np.zeros((n_inputs, fflvls),float)
ii=0
for entrada in input_list:
    X[ii] = np.linspace(min(dfraw[entrada]), max(dfraw[entrada]), fflvls)
    ii+=1
number=1
i=0
X_fact=np.zeros((int(fflvls**n_inputs),n_inputs),float)
for combination in itertools.product(X[0,:],X[1,:],X[2,:],X[3,:],X[4,:],X[5,:],X[6,:]):
    X_fact[i,:] = (combination)
    i +=1
    number+=1
I thought of building the argument list for itertools.product as a string in a loop and then evaluating it, but it doesn't work, and I've also seen that this is regarded as bad practice:
prodstring = ['X[0,:]']
for ii in range(n_inputs):
    prodstring.append(',X[%d,:]'%(ii))
in_products = ''.join(prodstring)
for combination in itertools.product(eval(in_products)):
    X_fact[i,:] = (combination)
    i +=1
    number+=1
What other way is there to pass the full range of columns to this function (or similar ones)?
Who said working harder is working better? I'm back from lunch, and I looked into *args and **kwargs as a form of procrastination, because I've sometimes seen them mentioned and was curious. It turns out they were just the tool I needed. In case this helps other code rookies like me in the future:
args = ()
for ii in range(n_inputs):
    b = (X[ii,:])
    args += (b,)
for combination in itertools.product(*args):
    X_fact[i,:] = (combination)
    i +=1
    number+=1
It seems to work properly. I solved in an hour of "not working" what I hadn't solved in about 4 hours of "working".
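The *args idea generalizes to a one-line call: unpack a sequence of per-input level lists straight into itertools.product. (The eval attempt fails because eval returns one tuple of arrays, which product then treats as a single iterable, and prodstring also lists X[0,:] twice.) This sketch uses plain lists so it runs without NumPy; the function name full_factorial and the sample levels are assumptions. With the NumPy array X from the question, itertools.product(*X) should behave the same way, since iterating over a 2-D array yields its rows, and np.array(list(itertools.product(*X))) would give X_fact directly, though that variant is not exercised here.

```python
import itertools

def full_factorial(levels):
    """One row per combination, taking one value from each input's level list.

    `levels` plays the role of the rows of X in the question.
    """
    # *levels passes however many inputs there are as separate arguments
    return [list(combo) for combo in itertools.product(*levels)]

levels = [[0.0, 1.0], [10.0, 20.0, 30.0], [5.0, 6.0]]
design = full_factorial(levels)
```

As with product in general, the last input varies fastest, so the rows come out in the same order as the hard-coded seven-argument call.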

Mime type optimisation in python

I want to solve the MIME Type puzzle on CodinGame. My code passes all the tests except the optimisation test.
I tried removing unnecessary operations such as string conversions, but I think the problem is in the way I approach it.
import sys
import math
# Auto-generated code below aims at helping you parse
# the standard input according to the problem statement.
n = int(input()) # Number of elements which make up the association table.
q = int(input())
# Number Q of file names to be analyzed.
dico = {}
# My function
def check(word):
    for item in dico:
        if(word[-len(item)-1:].upper() == "."+item.upper()):
            return(dico[item])
    return("UNKNOWN")
for i in range(n):
    # ext: file extension
    # mt: MIME type.
    ext, mt = input().split()
    dico[ext] = mt
for i in range(q):
    fname = input()
    fname = fname
    print(check(fname))
# Write an action using print
# To debug: print("Debug messages...", file=sys.stderr)
#print("Debug message...", file=sys.stderr)
Failure
Process has timed out. This may mean that your solution is not optimized enough to handle some cases.
This is the right idea, but one detail appears to be destroying the performance. The problem is the line for item in dico:, which unnecessarily loops over every entry in the dictionary. This is a linear search O(n), checking for the target item-by-item. But this pretty much defeats the purpose of the dictionary data structure, which is to offer constant-time O(1) lookups. "Constant time" means that no matter how big the dictionary gets, the time it takes to find an item is always the same (thanks to hashing).
To draw a metaphor, imagine you're looking for a spoon in your kitchen. If you know where all the utensils, appliances and cookware are ahead of time, you don't need to look in every drawer to find the utensils. Instead, you just go straight to the utensils drawer containing the spoon you want, and it's one-shot!
On the other hand, if you're in someone else's kitchen, it can be difficult to find a spoon. You have to start at one end of the cupboard and check every drawer until you find the utensils. In the worst-case, you might get unlucky and have to check every drawer before you find the utensil drawer.
Back to the code, the above snippet is using the latter approach, but we're dealing with trying to find something in 10k unfamiliar kitchens each with 10k drawers. Pretty slow, right?
If you can adjust the solution to check the dictionary in constant time, without a loop, then you can handle n = 10000 and q = 10000 without having to make q * n iterations (you can do it in q iterations instead--so much faster!).
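A minimal sketch of the constant-time version, assuming the rules described in this thread: extensions are case-insensitive, the extension is whatever follows the last dot, and a name without a dot is UNKNOWN. The helper names build_table, lookup and main are hypothetical; the key point is that each query does one dictionary lookup instead of looping over every entry.

```python
def build_table(pairs):
    """Map lower-cased extension -> MIME type, from (extension, mime) pairs."""
    return {ext.lower(): mime for ext, mime in pairs}

def lookup(table, fname):
    """One O(1) dict lookup per file name instead of scanning every entry."""
    _, dot, ext = fname.rpartition('.')   # split on the LAST dot only
    if not dot:                           # no dot at all: no extension
        return 'UNKNOWN'
    return table.get(ext.lower(), 'UNKNOWN')

def main():
    """Read the puzzle's input format from stdin and print one answer per file name."""
    n = int(input())
    q = int(input())
    table = build_table(input().split() for _ in range(n))
    for _ in range(q):
        print(lookup(table, input()))
```

On the puzzle page you would call main() at the bottom of the file; the total work is then roughly proportional to n + q rather than n * q.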
Thank you for your help,
I figured out the solution.
n = int(input()) # Number of elements which make up the association table.
q = int(input()) # Number Q of file names to be analyzed.
dico = {}
# My function
def check(word):
    if("." in word):
        n = len(word)-(word.rfind(".")+1)
        extension = word[-n:].lower()
        if(extension in dico):
            return(dico[extension])
    return("UNKNOWN")
for i in range(n):
    # ext: file extension
    # mt: MIME type.
    ext, mt = input().split()
    dico[ext.lower()] = mt
for i in range(q):
    fname = input()
    print(check(fname))
Your explanation was clear :D
Thank you

python code to generate password list [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am researching wireless security and trying to write a Python script to generate passwords: not random ones, but a dictionary of hex numbers. The letters need to be uppercase, and the passwords have to range from 12 to 20 characters. I went from 11 f's to 20 f's, which seems like it would meet the requirements. I then tried to write them to a text file. After creating the file, I chmod'ed it to 777 and clicked Run. It has been running for a few minutes, but I cannot tell whether it is working. I am running it in Kali on a 64-bit Core i3 with 8 GB of RAM. I'm not sure how long it should be expected to take, but this is my code; please let me know if it looks right:
# generate 10 to 32 character password list using hex numbers, 0-9 A-F
def gen_pwd(x):
    x = range(17592186044415 -295147905179352830000)
    def toHex(dec):
        x = (dec % 16)
        digits = "0123456789ABCDEF"
        rest = dec / 16
        if (rest == 0):
            return digits[x]
        return toHex(rest) + digits[x]
    for x in range(x):
        print toHex(x)
    f = open(/root/Home/sdnlnk_pwd.txt)
    print f
    value = x
    string = str(value)
    f.write(string)
gen_pwd
How about just:
password = hex(random.randint(1000000,100000000))[2:]
or
pw_len = 12
my_alphabet = "1234567890ABCDEF"
password = "".join(random.choice(my_alphabet) for _ in range(pw_len))
or, maybe closer to what you are trying to do:
struct.pack("Q",12365468987654).encode("hex").upper()
Basically, you are overcomplicating a very simple task.
To do exactly what you are asking, you can simplify it:
import itertools, struct
def int_to_chars(d):
    '''
    step 1: break into bytes
    '''
    while d > 0: # while we have not consumed
        yield struct.pack("B",d&0xFF) # decode char
        d>>=8 # shift right one byte
    yield "" # a terminator just in case its empty
def to_password(d):
    # this will convert an arbitrarily large number to a password
    return "".join(int_to_chars(d)).encode("hex").upper()
    # you could probably just get away with `return hex(d)[2:]`
def all_the_passwords(minimum,maximum):
    #: since our numbers are so big we need to resort to some trickery
    all_pw = itertools.takewhile(lambda x:x<maximum,
                                 itertools.count(minimum))
    for pw in all_pw:
        yield to_password(pw)
all_passwords = all_the_passwords( 0xfffffffffff ,0xffffffffffffffffffff)
#this next bit is gonna take a while ... go get some coffee or something
for pw in all_passwords:
    print pw
#you will be waiting for it to finish for a very long time ... but it will get there
#you will be waiting for it to finish for a very long time ... but it will get there
You can use time.time() to measure the execution time. Also, if you are using Python 2, use xrange() instead of range(), because xrange does not build the whole list in memory:
import time
def gen_pwd(x):
    def toHex(dec):
        x = (dec % 16)
        digits = "0123456789ABCDEF"
        rest = dec / 16
        if (rest == 0):
            return digits[x]
        return toHex(rest) + digits[x]
    for x in range(x):
        print toHex(x)
    f = open("/root/Home/sdnlnk_pwd.txt")
    print f
    value = x
    string = str(value)
    f.write(string)
start= time.time()
gen_pwd()
last=time.time()-start
print last
Note: you need () to call your function and quotes around the path in your open() call. Also, I think your first range line is unnecessary (and wrong), so you should remove it.
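Building on the timing suggestion, one way to answer "how long would this take" without waiting is to time a small sample and extrapolate. This is a sketch under stated assumptions: it measures only the string formatting (no disk writes), assumes the cost per line stays roughly constant, and uses time.perf_counter(), which is intended for measuring intervals. The names hex_lines and estimate_seconds are hypothetical.

```python
import time

def hex_lines(start, count):
    """Format `count` consecutive integers from `start` as uppercase hex lines."""
    return ["%X\n" % n for n in range(start, start + count)]

def estimate_seconds(total, start=0xFFFFFFFFFFF, sample=100000):
    """Time a small sample and extrapolate linearly to `total` lines.

    Only formatting is measured; writing to disk would add to this.
    """
    t0 = time.perf_counter()
    hex_lines(start, sample)
    per_line = (time.perf_counter() - t0) / sample
    return per_line * total
```

For instance, estimate_seconds(16**18 - 16**14) gives a rough lower bound for counting from 14 f's to 18 f's.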
Disclaimer
I'd like to comment on the OP's question, but I need to show some code and the output it produces, so I decided to present my comment in the form of an answer.
OTOH, I hope that this comment persuades the OP that her/his undertaking, while conceptually simple (see my previous answer, 6 lines of Python code), is not feasible with available resources (I mean, available on Planet Earth).
Code
import locale
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
pg = lambda n: locale.format("%d", n, grouping=True)
def count_bytes(low, hi):
    count = low+1
    for i in range(low+1,hi+1):
        nn = 15*16**(i-1)
        nc = i+1
        count = count + nn*nc
    return count
n_b = count_bytes(10,20)
n_d = n_b/4/10**12
dollars = 139.99*n_d
print "Total number of bytes to write on disk:", pg(n_b)
print """
Considering the use of
WD Green WD40EZRX 4TB IntelliPower 64MB Cache SATA 6.0Gb/s 3.5\" Internal Hard Drives,
that you can shop at $139.99 each
(see <http://www.newegg.com/Product/Product.aspx?Item=N82E16822236604>,
retrieved on December 29th, 2014)."""
print "\nNumber of 4TB hard disk drives necessary:", pg(n_d)
print "\nCost of said hard disks: $" + pg(dollars)
Output
Total number of bytes to write on disk: 25,306,847,157,254,216,063,385,611
Considering the use of
WD Green WD40EZRX 4TB IntelliPower 64MB Cache SATA 6.0Gb/s 3.5" Internal Hard Drives,
that you can shop at $139.99 each
(see <http://www.newegg.com/Product/Product.aspx?Item=N82E16822236604>,
retrieved on December 29th, 2014).
Number of 4TB hard disk drives necessary: 6,326,711,789,313
Cost of said hard disks: $885,676,383,385,926
My comment on what the OP wants to do
Quite a bit of disk storage (and money) is needed to accomplish your undertaking.
Perspective
The projected US federal debt at the end of fiscal year 2014 is $18.23 trillion; my estimated cost, not counting racks, power supplies and energy bills, is $886 trillion.
Recommended reading
Combinatorial_Explosion#SussexUniversity,
There is hope
If you are still convinced you should pursue your research project on wireless security in the direction you've described, it is possible that you can get a substantial volume discount on the drives' purchase.
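The estimate above is written for Python 2 (print statements, and locale.format, which was removed in Python 3.12). A minimal Python 3 sketch of the same arithmetic, using the format-spec thousands separator instead of the locale module; the 4 TB drive size and the $139.99 price are the figures quoted in the answer, and the variable names follow the original.

```python
def count_bytes(low, hi):
    """Bytes needed to write every hex number with low+1 .. hi digits, one per line.

    Same arithmetic as the Python 2 version: for each digit count i there are
    15 * 16**(i - 1) numbers without a leading zero, each taking i + 1 bytes
    (the digits plus a newline).
    """
    count = low + 1                  # same starting value as the original function
    for i in range(low + 1, hi + 1):
        numbers_with_i_digits = 15 * 16 ** (i - 1)
        bytes_per_line = i + 1
        count += numbers_with_i_digits * bytes_per_line
    return count

n_b = count_bytes(10, 20)
n_d = n_b // (4 * 10**12)            # 4 TB drives; // matches Python 2's integer division
dollars = 139.99 * n_d

print("Total number of bytes to write on disk:", f"{n_b:,}")
print("Number of 4TB hard disk drives necessary:", f"{n_d:,}")
print(f"Cost of said hard disks: ${dollars:,.0f}")
```

Because Python integers have arbitrary precision, the byte and drive counts are exact; only the dollar figure goes through floating point.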
characters=["a","b","c"]
for x,y in zip(range(5),characters):
    print (hex(x)+y)
Output:
>>>
0x0a
0x1b
0x2c
>>>
As you can see, this does the same thing in a shorter way. With a range as large as yours it is not feasible, so keep the range small and build up your result from there.
Also, for writing to a file, here is a better way:
with open("filepath/name","a+") as f:
    f.write("whateveryouwanttowrite")
I have worked on password generators; it is better to define a dict of complicated characters and combine them, like this:
passw={"h":"_*2ac","e":"=.kq","y":"%.hq1"}
x=input("Wanna make some passwords? Enter a sentence or word: ")
for i in x:
    print (passw[i],end="")
    with open("passwords.txt","a+") as f:
        f.write(passw[i])
Output:
>>>
Wanna make some passwords? Enter a sentence or word: hey
_*2ac=.kq%.hq1
>>>
So, just define a dict with keys = alphabet and values = complicated characters, and you can make very strong passwords from simple words or sentences. I only wrote it as an example; of course you can add more entries to the dict later. But I think this basic approach is the better one.
Preamble
I don't want to comment on what you want to do.
Code MkI
Your code can be trimmed (quite a bit) to the following
with open("myfile", "w") as f:
    for x in xrange(0xff,0xff*2+1): f.write("%X\n"%x)
Comments on my code
Please note that
You can write hex numbers in source code as, ehm, hex numbers and you can mix hex and decimal notation as well
The toHex function is redundant, as Python has (surprise!) a number of different ways to format your output as you please (here I've used so-called string interpolation).
Of course, you have to change the filename in the open statement and adjust the bounds of the interval generated by xrange (it seems you're using Python 2.x) to suit your needs.
Code MkII
Joran Beasley remarked that (at least in Python 2.7) xrange internally uses a C long and as such it cannot step up to the task of representing
0XFFFFFFFFFFFFFFFFFFFF. This alternative code may be a possibility:
f = open("myfile", "w")
cursor = 0XFFFFFFFFFF
end = 0XFFFFFFFFFFFFFFFFFFFF
while cursor <= end:
    f.write("%X\n"%cursor)
    cursor += 1
f.close()
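On Python 3 the C-long limitation mentioned for xrange does not apply: integers are unbounded and range() accepts arbitrarily large bounds for iteration (only len() of a huge range overflows). A minimal sketch of a lazy generator of sequential uppercase hex strings; the name hex_strings and the optional zero-padding width parameter are assumptions added for illustration.

```python
import itertools

def hex_strings(start, stop, width=0):
    """Yield uppercase hex strings for every integer in [start, stop), in order.

    `width` optionally zero-pads each string (e.g. width=16); 0 means no padding.
    """
    for n in range(start, stop):          # Python 3 range handles huge ints
        yield format(n, "X").zfill(width)

# Only the first few values are taken here; the full range is astronomically long.
first_three = list(itertools.islice(hex_strings(0xFFFFFFFFFF, 0xFFFFFFFFFFFFFFFFFFFF), 3))
```

To write to a file, loop over the generator inside a with open(...) block and write each string followed by "\n"; since nothing is built in memory, the limiting factors are time and disk space.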
All of this is well and good; however, none of it accomplishes my purpose. If Python cannot handle such large numbers, I will have to use something else. As I stated, I do not want to generate anything random; I need a list of sequential hex strings that are anywhere from 12 to 20 characters long. It is to make a dictionary of passwords that are nothing more than hex numbers about 16 characters long.
Does anyone have any suggestions on what I can use for this purpose? I think some C-family language should do the trick, but I know even less about C or C++ than Python. It sounds like this will take a while, but that's OK; it is just a research project.
I have come up with another possibility: counting in hex starting from 11 f's and going until I reach 20 f's. This would produce about 4.3 billion numbers, which should fit in a 79-million-page Word document. That sounds a little large, but if I go from 14 f's to 18 f's, it should be manageable. Here is the code I am proposing now:
x = 0xffffffffffffff
def gen_pwd(x):
    while x <= 0xffffffffffffffffff:
        return x
        string = str(x)
        f = open("root/Home/sdnlnk_pwd.txt")
        print f.upper(string, 'a')
        f.write(string)
        x = x + 0x1
gen_pwd()

Processing a sub-list of variable size within a larger list

I'm a biological engineering PhD student trying to teach myself Python programming to automate part of my research, but I've run into a problem with processing sub-lists within a bigger list that I can't seem to solve.
Basically, what I'm trying to do is write a small script that will process a CSV file containing a list of plasmid sequences that I'm building using various DNA assembly methods, and then output the primer sequences I need to order to build each plasmid.
Here's the scenario that I'm dealing with:
When I want to build a plasmid, I have to enter into my Excel spreadsheet the full sequence of that plasmid. I have to choose between two DNA assembly methods, called "Gibson" and "iPCR". Each "iPCR" assembly only requires one line in the list, so I know how to process those guys already, as I just have to put in one cell the full sequence of the plasmid I'm trying to build. "Gibson" assemblies, on the other hand, require that I have to split up the full DNA sequence into smaller chunks, so sometimes I need 2-5 lines within the Excel spreadsheet to fully describe one plasmid.
So I end up with a spreadsheet that sort of ends up looking like this:
Construct.....Strategy.....Name
1.....Gibson.....P(OmpC)-cI::P(cI)-LacZ controller
1.....Gibson.....P(OmpC)-cI::P(cI)-LacZ controller
1.....Gibson.....P(OmpC)-cI::P(cI)-LacZ controller
2.....iPCR.......P(cpcG2)-K1F controller with K1F pos. feedback
3.....Gibson.....P(cpcG2)-K1F controller with swapped promoter positions
3.....Gibson.....P(cpcG2)-K1F controller with swapped promoter positions
4.....iPCR.......P(cpcG2)-K1F controller with stronger K1F RBS library
I think the list at this length is representative enough.
So the problem I'm running into is, I'd like to be able to run through the list and process the Gibsons, but I can't seem to get the code to work the way I want. Here's the code I've written so far:
#import BioPython Tools
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC
#import csv tools
import csv
import sys
import os
with open('constructs-to-make.csv', 'rU') as constructs:
    construct_list = csv.reader(constructs, delimiter=',')
    construct_list.next()
    construct_number = 1
    primer_list = []
    temp_list = []
    counter = 2
    for row in construct_list:
        print('Current row is row number ' + str(counter))
        print('Current construct number is ' + str(construct_number))
        print('Current assembly type is ' + row[1])
        if row[1] == "Gibson": #here, we process the Gibson assemblies first
            print('Current construct number is: #' + row[0] + ' on row ' + str(counter) + ', which is a Gibson assembly')
##            print(int(row[0]))
##            print(row[3])
            if int(row[0]) == construct_number:
                print('Adding DNA sequence from row ' + str(counter) + ' for construct number ' + row[0])
                temp_list.append(str(row[3]))
                counter += 1
            if int(row[0]) > construct_number:
                print('Current construct number is ' + str(row[0]) + ', which is greater than the current construct number, ' + str(construct_number))
                print('Therefore, going to work on construct number ' + str(construct_number))
                for part in temp_list: #process the primer design work here
                    print('test')
##                    print(part)
                construct_number += 1
                temp_list = []
                print('Adding DNA from row #' + str(counter) + ' from construct number ' + str(construct_number))
                temp_list.append(row)
                print('Next construct number is number ' + str(construct_number))
                counter += 1
##                counter += 1
        if str(row[1]) == "iPCR":
            print('Current construct number is: ' + row[0] + ' on row ' + str(counter) + ', which is an iPCR assembly.')
            #process the primer design work here
            #get first 60 nucleotides from the sequence
            sequence = row[3]
            fw_primer = sequence[:60]
            print('Sequence of forward primer:')
            print(fw_primer)
            last_sixty = sequence[-60:]
##            print(last_sixty)
            re_primer = Seq(last_sixty).reverse_complement()
            print('Sequence of reverse primer:')
            print(re_primer)
            #ending code: add 1 to counter and construct number
            counter += 1
            construct_number += 1
##    if int(row[0]) == construct_number:
##    else:
##        counter += 1
##        construct_number += 1
##    print(temp_list)
##    for row in temp_list:
##        print(temp_list)
##        print(temp_list[-1])
#   fw_primer = temp_list[counter - 1].
(I know the code probably looks noob - I've never done any programming class beyond introductory Java.)
The problem with this code is that if I have n "constructs" (a.k.a. plasmids) that I'm trying to build by "Gibson" assembly, it will process the first n-1 plasmids, but not the last one. I can't think of a better way to write this code, but I can see that for the workflow I'm trying to implement, knowing how to process n things in a list, where each "thing" spans a variable number of rows, would come in really handy.
I'd really appreciate anybody's help here! Thanks a lot!
The problem with this code is that if I have n "constructs" (a.k.a. plasmids) that I'm trying to build by "Gibson" assembly, it will process the first n-1 plasmids, but not the last one.
This is actually a general problem, and the simplest way around it is to add a check after the loop, like this:
for row in construct_list:
    do all your existing code
if we have a current Gibson list:
    repeat the code to process it.
Of course you don't want to repeat yourself… so you move that work into a function, which you call in both places.
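A minimal sketch of that "check after the loop" pattern with the shared work moved into a function. Assumptions: rows arrive as (construct number, strategy, sequence) tuples, and process_gibson is a hypothetical placeholder that just joins the parts where the real primer design would go.

```python
def process_gibson(parts):
    """Hypothetical placeholder for the primer-design step on one construct."""
    return "+".join(parts)

def group_gibson(rows):
    """Collect consecutive Gibson rows per construct and process each construct once.

    A construct is processed when its number changes and, crucially,
    once more after the loop so the last construct is not lost.
    """
    results = []
    current_number = None
    temp_list = []
    for number, strategy, sequence in rows:
        if strategy != "Gibson":
            continue
        if temp_list and number != current_number:
            results.append(process_gibson(temp_list))   # previous construct is complete
            temp_list = []
        current_number = number
        temp_list.append(sequence)
    if temp_list:                                        # the check after the loop
        results.append(process_gibson(temp_list))
    return results
```

Dropping the final if temp_list block reproduces the "first n-1 plasmids only" symptom from the question.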
However, I'd probably write this differently, using groupby. I know this will probably seem "way too advanced" at first glance, but it's worth trying to see if you can understand it, because it makes things a lot simpler.
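def get_construct(row):
    return row[0]
for construct, rows in itertools.groupby(construct_list, key=get_construct):
    group = list(rows)
Now, groupby hands you each construct's key together with an iterator over its rows; wrapping that iterator in list() gives you each construct as a separate list, so you don't need the temp_list at all. For example, the first group will be: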
[[1, 'Gibson', 'P(OmpC)-cI::P(cI)-LacZ controller'],
[1, 'Gibson', 'P(OmpC)-cI::P(cI)-LacZ controller'],
[1, 'Gibson', 'P(OmpC)-cI::P(cI)-LacZ controller']]
The next will be:
[[2, 'iPCR', 'P(cpcG2)-K1F controller with K1F pos. feedback']]
And there won't be a left-over group at the end to worry about.
So:
for construct, rows in itertools.groupby(construct_list, key=lambda row: row[0]):
    group = list(rows)
    construct_strategy = group[0][1]
    if construct_strategy == "Gibson":
        # your existing code, using group instead of temp_list,
        # and no need to maintain temp_list at all
        ...
    elif construct_strategy == "iPCR":
        # your existing code, using group[0] instead of row
        ...
Once you get over the abstraction hurdle, it's a whole lot simpler to think about the problem this way.
In fact, once you start to grasp iterators intuitively, you'll start finding that itertools (and the recipes on its docs page, and the third-party library more_itertools, and similar code you can write yourself) turn a lot of complicated questions into very simple ones. The answer to "How do I keep track of the current group of matching rows within a list of rows?" is "Keep a temporary list, and remember to check it every time the group changes and then check again at the end for leftovers", but the answer to the equivalent question "How do I transform row iteration into row-group iteration?" is "Wrap the iterator in groupby."
You also might want to add an assert or other check that all(row[1] == construct_strategy for row in group[1:]), that len(group) == 1 in the iPCR case, that there is no unexpected third strategy, etc., so that when you inevitably run into an error, it'll be easier to tell whether it was bad data or bad code.
Meanwhile, instead of using a csv.reader, skipping the first row, and referring to the columns by meaningless numbers, it might be better to use a DictReader:
with open('constructs-to-make.csv', newline='') as constructs:
    primer_list = []
    def get_construct(row):
        return row["Construct"]
    for construct, rows in itertools.groupby(csv.DictReader(constructs), key=get_construct):
        group = list(rows)
        # same as before, but with
        # ... row["Construct"] instead of row[0]
        # ... row["Strategy"] instead of row[1]
        # ... row["Name"] instead of row[2]
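Putting the groupby and DictReader suggestions together, here is a runnable sketch on an in-memory CSV. The sample data, including the Sequence column (the question's code reads a sequence from row[3]), is made up for illustration, and the generator name constructs is hypothetical. Grouping is by the Construct column, since grouping by Strategy would merge two adjacent constructs that share a strategy.

```python
import csv
import io
import itertools

# Made-up sample data; the Sequence column is an assumption.
CSV_TEXT = """Construct,Strategy,Name,Sequence
1,Gibson,P(OmpC)-cI::P(cI)-LacZ controller,AAAA
1,Gibson,P(OmpC)-cI::P(cI)-LacZ controller,CCCC
2,iPCR,P(cpcG2)-K1F controller with K1F pos. feedback,GGGGTTTT
3,Gibson,P(cpcG2)-K1F controller with swapped promoter positions,TTTT
"""

def constructs(csv_file):
    """Yield (construct number, strategy, list of row dicts) for each construct."""
    reader = csv.DictReader(csv_file)
    for number, rows in itertools.groupby(reader, key=lambda row: row["Construct"]):
        group = list(rows)                  # groupby yields a lazy iterator; materialise it
        strategy = group[0]["Strategy"]
        # sanity check from the answer: every row of a construct agrees on the strategy
        assert all(row["Strategy"] == strategy for row in group), number
        yield number, strategy, group
```

With a real file, pass the object from open('constructs-to-make.csv', newline='') instead of the StringIO; no leftover group needs handling after the loop.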
Just some general Python coding help: if you haven't read PEP 8, do so.
To maintain clear code it can be helpful to assign variables to fields referenced in a record/row.
I would add something like this for any field referenced:
construct_idx = 0
Also, I would recommend using string formatting, it's cleaner.
So:
print('Current construct number is: #{} on row {}, which is a Gibson assembly'.format(row[construct_idx], counter))
Instead of:
print('Current construct number is: #' + row[0] + ' on row ' + str(counter) + ', which is a Gibson assembly')
If you're creating a csv reader object, naming its variable "*_list" can be misleading; calling it "*_reader" is more intuitive.
construct_reader = csv.reader(constructs, delimiter=',')
Instead of:
construct_list = csv.reader(constructs, delimiter=',')

make a global condition break

Allow me to preface this by saying that I am learning Python on my own out of curiosity, and I was recommended a free, publicly available online computer science course, so I apologize if I am using terms incorrectly.
I have seen questions about this particular problem here before, but I have a separate question and did not want to hijack those threads. The question:
"A substring is any consecutive sequence of characters inside another string. The same substring may occur several times inside the same string: for example "assesses" has the substring "sses" 2 times, and "trans-Panamanian banana" has the substring "an" 6 times. Write a program that takes two lines of input, we call the first needle and the second haystack. Print the number of times that needle occurs as a substring of haystack."
My solution (which works) is:
first = str(input())
second = str(input())
count = 0
location = 0
while location < len(second):
    if location == 0:
        location = str.find(second,first,0)
        if location < 0:
            break
        count = count + 1
    location = str.find(second,first,location +1)
    if location < 0:
        break
    count = count + 1
print(count)
If you notice, I have twice written the if statement that breaks when location is less than 0. Is there some way to make this a 'global' condition so I do not have repetitive code? I imagine efficiency becomes paramount as programs grow more sophisticated, so I am trying to develop good practice now.
How would Python gurus optimize this code, or am I just being too nitpicky?
I think Matthew and darshan have the best solution. I will just post a variation which is based on your solution:
first = str(input())
second = str(input())
def count_needle(first, second):
    location = str.find(second,first)
    if location == -1:
        return 0 # none whatsoever
    else:
        count = 1
        while location < len(second):
            location = str.find(second,first,location +1)
            if location < 0:
                break
            count = count + 1
        return count
print(count_needle(first, second))
Idea:
use function to structure the code when appropriate
initialising the variable location before entering the while loop saves you from checking location < 0 multiple times
Check out regular expressions, python's re module (http://docs.python.org/library/re.html). For example,
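import re
first = str(input())
second = str(input())
# a zero-width lookahead lets findall count overlapping matches;
# re.escape keeps characters such as '.' or '(' in the needle from acting as regex syntax
regex = '(?=' + re.escape(first) + ')'
print(len(re.findall(regex, second)))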
As mentioned by Matthew Adams, the best way to do it is with Python's re module.
For your case, the solution would look something like this (the lookahead makes overlapping occurrences, such as "sses" twice in "assesses", count correctly):
import re
def find_needle_in_haystack(needle, haystack):
    return len(re.findall('(?=' + re.escape(needle) + ')', haystack))
Since you are learning Python, the best approach is to follow the DRY (Don't Repeat Yourself) principle. There are lots of Python utilities you can use in many similar situations.
For a quick overview of a few very important Python modules, you can go through this class:
Google Python Class
which should only take you a day.
Even your approach could, IMO, be simplified, using the fact that find returns -1 when there are no further matches, including when you ask it to search from an offset past the end of the string:
>>> x = 'xoxoxo'
>>> start = x.find('o')
>>> indexes = []
>>> while start > -1:
...     indexes.append(start)
...     start = x.find('o',start+1)
...
>>> indexes
[1, 3, 5]
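Wrapping that simplified loop in a function gives a needle counter with a single "not found" test, which is what the question was after. A minimal sketch; the name count_occurrences is hypothetical. Because each search restarts at start + 1, overlapping matches are counted, unlike str.count, which counts non-overlapping ones.

```python
def count_occurrences(needle, haystack):
    """Count (possibly overlapping) occurrences of needle in haystack."""
    count = 0
    start = haystack.find(needle)
    while start != -1:                       # the only place we test for "not found"
        count += 1
        start = haystack.find(needle, start + 1)
    return count
```

For example, count_occurrences("sses", "assesses") gives 2, matching the problem statement, whereas "assesses".count("sses") gives 1.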
needle = "ss"
haystack = "ssi lass 2 vecess estan ss."
print 'needle occurs %d times in haystack.' % haystack.count(needle)
Here you go:
first = str(input())
second = str(input())
x=len(first)
counter=0
for i in range(0,len(second)):
    if first==second[i:(x+i)]:
        counter=counter+1
print(counter)
Answer
needle=input()
haystack=input()
counter=0
for i in range(0,len(haystack)):
    if(haystack[i:len(needle)+i]!=needle):
        continue
    counter=counter+1
print(counter)
