I am trying to compare two sequences using difflib.Differ(). However, I am observing some unwanted differences which I am not able to understand. Can someone please explain this behavior and how this can be resolved?
import difflib
a = "abc-123 Abcdef"
b = "abc-123 Abcdef-def"
a = a.strip("\n")
b = b.strip("\n")
a = a.split(" ")
b = b.split(" ")
d = difflib.Differ()
result = list(d.compare(a,b))
for s in result:
    if s[0] == ' ':
        continue
    print s
Output:
- Abcdef
+ Abcdef-def
?       ++++
Why is the ? line reported here? I would expect only the first two differences (the actual changes) to be reported.
From the documentation:
Lines beginning with '?' attempt to guide the eye to intraline
differences, and were not present in either input sequence.
In other words, it's just a way to mark where the difference is; it's not actually another difference.
https://docs.python.org/2/library/difflib.html
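If you only want the changed lines, one option is to filter on Differ's two-character line prefixes ('- ', '+ ', '  ' and '? ') and keep just '- ' and '+ '. A minimal Python 3 sketch using the question's inputs; `changed_lines` is a hypothetical helper name, not part of difflib:

```python
import difflib

def changed_lines(old, new):
    """Return only the '- ' and '+ ' lines from difflib.Differ output,
    dropping unchanged ('  ') lines and intraline hint ('? ') lines."""
    diff = difflib.Differ().compare(old, new)
    return [line for line in diff if line.startswith(("- ", "+ "))]

a = "abc-123 Abcdef".split(" ")
b = "abc-123 Abcdef-def".split(" ")
for line in changed_lines(a, b):
    print(line)
```

The '?' lines are still generated by Differ; this only hides them when printing.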
(Apologies, this is going to be a long question.)
I have a bug in my code that I have not been able to resolve for a very long time. I would really appreciate it if someone could help me find out what the problem is.
Context:
I have a long string of letters (let's call it subject) containing the letters A, G, T and C (like DNA), and the whole point of my algorithm is to correctly count how many of each of the following STRs are found within subject. The STRs are:
AGATC
TTTTTTCT
AATG
TCTAG
GATA
TATC
GAAA
TCTG
I must count how many of each are within subject. Counting works by going sequentially, letter by letter, until the start of one of the above STRs is found. If the rest of the STR follows, the program should update the counter of the respective STR, then boost the searching index to account for the length of the STR, and keep going. It should stop when it reaches the end of subject.
(Hope it makes sense).
My Code:
STRs = ['AGATC','TTTTTTCT','AATG','TCTAG','GATA','TATC','GAAA','TCTG']
subject = "GCTAAATTTGTTCAGCCAGATGTAGGCTTACAAATCAAGCTGTCCGCTCGGCACGGCCTACACACGTCGTGTAACTACAACAGCTAGTTAATCTGGATATCACCATGACCGAATCATAGATTTCGCCTTAAGGAGCTTTACCATGGCTTGGGATCCAATACTAAGGGCTCGACCTAGGCGAATGAGTTTCAGGTTGGCAATCAGCAACGCTCGCCATCCGGACGACGGCTTACAGTTAGTAGCATAGTACGCGATTTTCGGGAAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCCCGTCAACTCATTCACACCGCATCCTTTCCTGCCACTGTAACTAGTCGACTGGGGAACCTCATCATCCATACTCTCCCACATTATGCCTCCCAACCTTGTTAAGCGTGGCATGCTTGGGATTGCATTGATGCTTCTTGGAGAGGACGCTTTCGTTTTGGAGATTACAGGGATCCAATTTTATCATCGGTTCGACTCCCGTAACGACTTAGCAGTAAGGGTGCTAGTTCCTGGTTAGAATCTTAATAAATCACGTCGCTTGGAGCAAGACAAAGATCGTCGTAATGCCAAGTGCACGACCACCTTCAGACTTGCAGGACCCGTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTCGATAGCTATGCGGTTCAATACAATCTTAACGCAATGCAGCGATGTGGTTTCGTACACTTAGCATAAAACCCCCCACATTAAATCGATGTACCCGCCCTCTTAGACGCCAATTTCAATGCCGAACCTCCGGCGGGTATCTCTGCACTAGGAGAAGTAGCACGTCGCTGTAGCGAACTCCTATCGTGAGATAATTTGTAGAGCTGCTCTTATAATACAATAGCTCAGATGGATTATTCCATGGACATCCCCGTGCGTTGTTTCGAGGATGGTAGGTGGAAATTTTGCCAGACCTCTAGTCTTAAACATGGTTGACGTTATAGGCGCTATCTCTTGCGTCTGGAAGTGTTAATCCGTGAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAACACGCAACTCTGGAGGAGGGCACTGCACTGCAAACTTGCGTAATATCCTTCACCCACACTTGCCTGGCCTCCTTGCTTAAAGCTCTGGCGATGCGATTTTTCGGCCCAGTAGCTGAATAGGTCATGAAATGGGCACCGAACTGGAAAGACCCATATATTCGATACTCACAACTTAATGATAGCGCGATTAAGAGCGACACCAAAAACCAAATTACGTTCACGAACCTTTGAGAGTCAAGGAGACTTAGACCGAATTGAATGATCACTGATGCGCCCGCTGATACTGAGCCTCACCATTAATCGCCGACCAATACGGCGTGTACCGGGCGCGGCCTTGCCGCATAACGATAGA
TAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATATCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTACACAGCCCCGTCCTCATTGCTAAGTGCACTGGCAACTGGACCTAAAGATTTTTCGAGTATGGCCCTCGAATCAAGCGCCCACCCAGAAACCTACGAGCCAGTAACCCCAGTAAACAAGCATTAGTGCTATATGCTTGCTGCCCACTAGGACCCTTATGGTTCATACCAGGGTGACGTGTCTTGCGGGCCAAGGATGAACCAGAAGCAAGATCCTTAGATGGACGACTGTCTCATTGCTTAAACTCCACATACCAAAGGGCGCGGTAAACGATAGTTTTAGGTAATGTTAGTCGGATGGTTGTCTGCAGCTACCAATACAGCCTGGCACCCAGGGTCTGAACAATAACGCGTGAGAGCAGCTCTCCCGCGTGTGGTGGATTTGCCGTCTATGAAATTGAGGCTCTTGCAACTATTCGCACTCGGAATGCCCTCATATCTGGTGCCTAGCGGCCTTTGCCCCGTGCCGGTAGGACTAAACTCTACGGATCGTTGACGGATCTCGATGTGGAAGATGGTTATGAAAGATAACAACGCGTGTGCTAATTGATTTAGACAAGTATTGCGGCAGTAAAAGATAATCGGCTGCAGAGTTACGAAAGACTTCCATGCATGGATTCCATTCCTTCTAGTATAGGACCCACTCTGAATACACGTCTTGCGGGCCGATCATCTCCACCGCTGCGGAAGAAAGCAATTAAGAATCTATGCTCATTAAGAGTGCGACTATAATGCGGATCTTACAGTGCTAATGATCAGGACGTCGTCCAAGCAGGCTGCATGCCGAATTTAGCTTACGTCAGGATCAGGCGTTATAGCCTGGGAATCGGACTATGAGGACGCCACGACCTCTGGGAGAAAGCTATATACATTGAGGATCGCGCCATCTTTATGAGACTCAAATGAATCTAGATAGGTAGCATTGCGGACTTGAGTTAGCACATCGGTATTGGAAGGTGAGGGTCCTGCCGCTCGTTCTATGTTCGGTTTATAGTATACAAATAGGTCATCCCGAACGTTGAAGTTAAACTCATGACACGTTGTCGTAATGAAACGGGCCTGTTATTAGGGATACAGACAAAAGGCACAAGCTGGCTTGCACATTAAGGCGCACTAGAGATCCTCACAACCGTTGCCCGCACGGAGGTCGTGTCTAACAGACAGTGAACCAGCCGTATTGGGGTGGATGACCTGAGCTTCTTGGGGCCTGTTGTACACCGCGTGTGGTTCAACTGGTACACATACTACGAATATTCGAAATCATTGTACTGTGCTCTTCGGTGCTACTGACTGTGAGCGAATGCATCCCAATCCCAAACAATGCTTGTGGTAGGAGAATTGAAACTCTCGAAGCCTGGCCCAATGTCATCTACTTTTAACATGTCGGGCCAGGAGTTACGGGCATTGCTTACTTACTTTGCCCCCTTACACCACAGCAGCGCGATTCTTGTTGTAGTAGATTTTATACGACTCGCGAATTAAATGGAACTTGTCTGTCCCATATCGATCGTGTCCATCGTAAGATGAGATTGTAGGAGCATTCGGAAGTCTATGCGGCCCAGGGACTACTACGTTAAATCTGGTCAGACGTGGTTTACAAGGCGTCCCGATCTTCTCAGAACATATGGGAAAGCACTACCGTTCCTTCACGCATACAGTTGTTCGTGCCGAACGAGTAAGCTTGCGACCAGCCCACCCGCTAGGGCTATGCAGCGGGTCATGGCTGGCGCCATACTGTG
CGGACAACCCACGCTCTGGCAGAAAGCGTCTTGTGTTTTGTAGTAGCTCCAACGGTTAGACCTTCGATATCTATTCAGAGCGCGAGCGACCACTATTAGACGGCATGTAAACAATGTGTATTTGTTCGGCCCAACCGGTATATGGGTAAGACCGCGAAGGGCCTGCGCGAATACCAGCGTCCAAAAATTCCTCACCCGAGATATGCGGTTAGTACCCCTTGGGTAACGGTCCGCTACGGGTAGCGACGCGAGCCGGCCGCATCGGTTGGAGCCGAGTTGTCGGGCAGGCGAGTAACGTGTGCAATTTGATGGGCCCAAGCCTCCGGCACTATCCACCTCATACATCGACAAAAGCACCAAATATGGGGAAAAGCTGAGCGTCGATATGTACATCTACCCAGGAACCGGCCCGAACATTAGGCGGACGTGAATTTCCGACCTAGGTTCGGCTACATTTCTACGATCCAAGCACACGTGAAGGAGGAGGGGTGTTCCGACCGTAAATGAACGAGGTGCGCAGTGACCCGATGGCGTTTAGCGGATAGCCTTCCTATGCCGGCCTATGCTGTATGGTAGTTGGTTGGTGCCTCCAGAGCCACTGCACCCAATCATAGGGTCTACAGCAGCGTACTTATAAAATTGTACGGGTGACCCATATCCATTACGGGTTGCGACCAGTATAGGAGAGTATAACTGCGTGAACTAATGCGTTATGACGCTTCAGAGTTTGCTCGGGCCCGAGTTCTAGGGCTATAATGTGTTAGGGCGCAAGTATGCCAAGCTAAGATGTGGCGTGCACACTAGGAGTTGTGTTCCTCTGCAAGCAGACACGAGCACTCTGGCAGTAGTTTGACCACACCCGGGTATCACTGCTACTCCATTTCGAACAAGCTATTGGAGCGGACAAAATATGCTACTCAAGAGCATTAGTTATAGGTCTACGAGACAGAAGCAGTTACTGAGTCTGAATATTCGATATAAGTAGGCATGGAGGCGGAGCAAAACAACGTCTGCGATCAATCGTGTTGATGACGTATGGCGACTGGAAGGTAAGGACTATGGCCGGACGGAATGATTCATGTTCTGTTCAAAGCTATATTTCGAAGGGGTATATTAGCGGTCCTACACTTGGTTAGCACCCTCCCCCCTCTGGATCCTGCACTAATTCGAGCTGGCCTCCATCGGTATCAGTCCGGAAGCTCCACTCTCTATCGTAGTCCTAATCAACAGGGTGCCAGTTTGCTCACGTGGAAGTTTGAGGCCCTTTGTGCTCCATAGCCAATCACTAACCATGCACGCGCGACCCACTCTACGTCCAGATCGGCTATAATAGTTGCGCCCGGGACTGGCAGAGTAGACATGTAAGCTAGATAGAGCCCCGACATCGGCCAAGAGATCCTACGCTGCTTCCAGATAATGAGAGACATTCTAGCATTAGACATGCAAGTCGGCAGGGACTCCCCTTATCTAGTAATTTCGATGAATTGGTTTTTCGGCTAGCATCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGACCATGCCGACCTCATCATAGAAGGAATGCTCTAAACTTAGAGTGCTACTAGGAAAACTATTAATCAATGATCGTCCTGCTTACATAGCTGGACGGCGAAAGTTCTTATACTGCGGAGGTTGCTGACGTAGAGTGCGCTGGGTACAGCGGATAAGTTGATCAGGGTGGGGATAGGGTGGCTCACCGTTTATACTCATATAGATTCCTGGCGTCGACGCTGTGACAGGGTCGAGATCGAGGGGGAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGCGGAGCGGAGGGAAAATTATCACCAGAGGGTAGGGGCTCGCGACATTCTATTCAATGCATTTCAAGCTACTTACGTATTTCGG
CACAGTGACTACTGCCTGCGCGGCAGCCGTAAGGTTTCCCGTCAATAGGTGGCACGTATCATTGATGAAAGTGTCAGCTAATCATTCAGGCCTTA"
x = 0 # Searching index.
dataSTR = { # All the STRs to search for.
    "AGATC":0,
    "TTTTTTCT":0,
    "AATG":0,
    "TCTAG":0,
    "GATA":0,
    "TATC":0,
    "GAAA":0,
    "TCTG":0,
}
# This dict will hold all the count values of STRs in the text file.
# Scanning STRs from the txt file.
total = len(subject)
limit = 8
while x < total:
    currentString = subject[x:x+limit] # A temporary variable to hold the next few letters from the text file at index x.
    for STR in STRs:
        if STR in currentString: # Is the STR found within this set of letters?
            lSTR = len(STR) - 1
            if STR[0:lSTR] == currentString[0:lSTR]: # In order to minimise the risk of duplication...
                dataSTR[STR] += 1 # ...the STR must be at the start of currentString.
                #print(currentString, STR, x, dataSTR[STR])
                x += lSTR # The index must be boosted each time a new STR is read. In the event that an STR is at the end of a strand...
    x += 1 # The index counts up by 1 by default. (From above) ...so that no duplicates are added.
print(dataSTR.items())
print("The correct result is: AGATC - 22, TTTTTTCT - 33, AATG - 43, TCTAG - 12, GATA - 26, TATC - 18, GAAA - 47, TCTG - 41")
(Sorry it's very long; it might be helpful to copy it into a separate Python file.)
As you will see from running it, the result my program produces is incorrect. The correct results are in the final print statement of the program, but the program's output does not match them (yes, I know these results are 100% correct, since this is part of a problem set from an online computer science course).
However, I cannot seem to find the bug or logic error that seems to be causing my program to count wrong and I have been trying for quite a while now. Does anyone know what the solution is?
Please feel free to ask me anything about the program, thank you all.
Your problem statement doesn't agree with the "correct results" given in your example code. Either you've misunderstood the problem, or you've taken the correct results from a different problem. (The "correct results" appear to be for the problem of finding the maximum number of consecutive repeats of each query string.) [The latter possibility is the point that Chris Charley makes in a comment on the original post.]
You can convince yourself by doing the problem "by hand": look at the subject string in a text editor, pick a query string, do a search on it, and step through the occurrences.
E.g., for the query string "GAAA", you'll count ~67 occurrences, but most of them are in a block of 47 repeats in subject[1449:1637]. (This is more obvious if you use a text editor that highlights all occurrences of the search string, as 188 characters of consecutive highlighting should jump out at you.) And 47 agrees with the "correct result" for GAAA.
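If the goal really is the longest run of back-to-back repeats (which is what the "correct results" appear to measure), a brute-force sketch could look like the following. It illustrates that assumed interpretation and is not the course's reference solution; `longest_run` is a hypothetical name.

```python
def longest_run(sequence: str, unit: str) -> int:
    """Largest number of back-to-back copies of `unit` anywhere in `sequence`."""
    if not unit:
        raise ValueError("unit must be a non-empty string")
    k = len(unit)
    best = 0
    for start in range(len(sequence)):
        run = 0
        # Step forward one whole unit at a time while the next chunk still matches.
        while sequence[start + run * k:start + (run + 1) * k] == unit:
            run += 1
        best = max(best, run)
    return best

# Toy input, not the question's subject: GAAA three times in a row, then once more after a C.
print(longest_run("TTGAAAGAAAGAAACGAAA", "GAAA"))  # 3
```

It rescans from every index, which is fine for a string of a few thousand letters. Under the interpretation above, you would call it once per STR on subject and compare the results with the expected numbers.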
Does this help?
count_results = dict()
STRs = ['AGATC','TTTTTTCT','AATG','TCTAG','GATA','TATC','GAAA','TCTG']
subject = "loooong string..."
for search_string in STRs:
    count_results[search_string] = subject.count(search_string)
print(count_results)
{'AGATC': 28, 'TTTTTTCT': 33, 'AATG': 69, 'TCTAG': 18, 'GATA': 46, 'TATC': 36, 'GAAA': 67, 'TCTG': 60}
I realize the results sometimes differ from your expected counts. I didn't go through the intricacies of your search algorithm, so I wonder whether the expected output might be wrong. If not, check out the docs for the str.count() function to see how and why it gets different output, and adapt what it does to your needs.
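The str.count() documentation states that it returns the number of non-overlapping occurrences. When an STR can end with its own beginning (TTTTTTCT starts and ends with T), two copies can share letters, and str.count() sees only one of them. A small sketch contrasting it with a scan that tries every start index; `count_overlapping` is a hypothetical helper:

```python
def count_overlapping(text: str, pattern: str) -> int:
    """Count matches of `pattern` starting at every index of `text`,
    so overlapping matches are all included."""
    return sum(1 for i in range(len(text)) if text.startswith(pattern, i))

s = "TTTTTTCTTTTTTCT"  # two TTTTTTCT copies sharing one T
print(s.count("TTTTTTCT"))              # 1 (non-overlapping)
print(count_overlapping(s, "TTTTTTCT"))  # 2
```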
Try like this:
import re
# Define STRs and subject here
dic = {}
for x in STRs:
    # Count the (non-overlapping) regex matches of this STR in subject.
    tv = len([m.start() for m in re.finditer(x,subject)])
    dic[x] = tv
for y in dic.keys():
    print(y,dic[y])
The results in the last print statement are incorrect. I checked them with Python's built-in str.count() method. If you are allowed to use that method, just use it instead; if not, I would recommend the following:
x = 0 # Searching index, reset before scanning.
total = len(subject)
while x < total:
    for STR in STRs:
        limit = len(STR)
        currentString = subject[x:x+limit]
        if STR == currentString:
            dataSTR[STR] += 1
    x += 1
That way, the limit is set to each STR's length, so the slice either is exactly the STR or it isn't, and you don't have to check for duplicates. I don't know why your code didn't work, but I hope this helps.
I started learning Python last week on Codecademy, Google, etc., but got stuck and couldn't find the answer anywhere, so I signed up on stackoverflow.com looking for your support.
I'm trying to build a program that keeps only the first 5 letters of any name and shows the remaining letter(s) as dot(s), e.g.:
Adrian: "Adria."
Michael: "Micha.."
Alexander: "Alexa...." etc.
I tried to "fix" it with the "b" variable but that just prints three dots "..." regardless of how long the name is.
This is what I've got so far:
def namecheck():
    name = raw_input("Name?")
    if len(name) <=5:
        print name
    else:
        if len(name) >5:
            name = name[0:5]
            b = ("...")
            print name + b
namecheck()
I'm a total newbie so I apologise for any wrong spacing here, thank you for your support and patience.
As an alternative to sequence multiplication (one which is somewhat more self-documenting, and hopefully less confusing to maintainers), just use str.ljust to do your padding:
def namecheck():
    name = raw_input("Name?")
    # Reduce to first five (or fewer) characters, then pad with .s to original length
    # with str.ljust
    print name[:5].ljust(len(name), '.')
print name[:5] + '.' * (len(name) - 5) works fine; it's just a bit arcane (it also creates more temporary values, though in practice the lack of actual method calls makes it faster on CPython).
You can try using the replace() method.
name = 'abcdefg'
name.replace(name[5:], '.' * len(name[5:]))
output: 'abcde..'
name='randy12345'
name.replace(name[5:],'.' * len(name[5:]))
output: 'randy.....'
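name[5:] means all the characters from index 5 onward (the sixth character, because indexing starts at 0).
'.' * len(name[5:]) counts those characters and repeats '.' that many times.
name.replace(name[5:], '.' * len(name[5:])) then uses replace() to swap the excess characters for dots. Be aware that replace() substitutes every occurrence of name[5:], not just the one at the end, so it misbehaves when that tail also appears earlier in the name: 'annaanna' becomes 'a...a...' instead of 'annaa...'.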
The most concise way I can think of:
def namecheck():
    name = raw_input("Name?")
    print(name[0:5] + '.' * (len(name) - 5))
namecheck()
Try something like this:
def namecheck():
    name = raw_input("Name?")
    if len(name) <= 5:
        print name
    else:
        print name[0:5] + '.' * (len(name)-5)
namecheck()
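The answers above are written for Python 2 (raw_input and the print statement). A Python 3 sketch of the same str.ljust idea, with the truncation moved into its own function so it can be checked without typing input; `mask_name` and its `keep` parameter are hypothetical names:

```python
def mask_name(name: str, keep: int = 5) -> str:
    """Keep the first `keep` characters and replace the rest with dots,
    so the result has the same length as the original name."""
    return name[:keep].ljust(len(name), ".")

def namecheck() -> None:
    # Interactive version, equivalent to the Python 2 answers.
    name = input("Name? ")
    print(mask_name(name))

if __name__ == "__main__":
    for example in ["Adrian", "Michael", "Alexander", "Ann"]:
        print(mask_name(example))
```

Call namecheck() yourself for the interactive prompt; the demo loop only prints the example names from the question.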
I have a string in the form of "(" + m1 + "|" + m2 + ")" and "(" + m1 + "." + m2 + ")",
where m1 and m2 are strings made up of "6", "7", "8", "a".
These are some of the valid expressions:
"(6|7)"
"((8.7)|(6.7).(a.2))"
Now my question is: if I want to split at the "." that basically works as a divisor, how would I do that?
I tried finding a middle point, but the thing is it's not always in the middle.
I also tried s.rindex("."), s.index(".") and s.find("."), but they don't seem to work either.
I was thinking of starting at the outermost brackets and then working my way inside, but I don't think that's going to work.
I think there is some relation with the brackets, but I just can't figure out what it is.
Any suggestions on how to approach this problem, or hints about how I can find that splitting point?
Any help will be appreciated.
Thanks in advance
This is my interpretation of what you're asking. Without more information, an explicitly stated problem, expected output, or code showing your own attempts, I can't do much more.
test1 = "(6|7)"
test2 = "((8.7)|(6.7).(a.2))"
# Not really sure what your output is supposed to look
# like, so this is my interpretation
test1_split = test1.replace("(", "").replace(")", "").split("|")
print test1_split # output --> ['6', '7']
# creates a list containing the string 6 and 7
# You said the . works as a divisor. If you mean a delimiter then you can use the split(".") method
# If you mean that it is a symbol for a divide sign then see below
test2 = test2.replace(".", "/") # convert periods into dividing sign
test2_split = test2.split("|") # separates ((8/7) and (6/7)/(a/2))
print test2_split # output ['((8/7)', '(6/7)/(a/2))']
# now if this is an equation I would continue...Since I don't know the output
# I leave the rest to you
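On the bracket relation the question suspects: a common way to find the splitting point is to walk the string while tracking nesting depth, and split only on an operator that sits directly inside the outermost brackets. A sketch, assuming the whole expression is wrapped in one matching pair of outer parentheses; `split_top_level` is a hypothetical name, and it does not decide how '|' and '.' should bind relative to each other when both appear at the top level:

```python
def split_top_level(expr: str, op: str = ".") -> list:
    """Split a parenthesised expression at every `op` that is not nested
    inside any bracket other than the outermost pair."""
    if not (expr.startswith("(") and expr.endswith(")")):
        raise ValueError("expected an expression wrapped in parentheses")
    inner = expr[1:-1]  # drop the outermost pair
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(inner):
        if ch == "(":
            depth += 1  # entering a nested sub-expression
        elif ch == ")":
            depth -= 1  # leaving it again
            if depth < 0:
                raise ValueError("outer brackets do not enclose the whole expression")
        elif ch == op and depth == 0:
            parts.append(inner[start:i])  # top-level operator: cut here
            start = i + 1
    if depth != 0:
        raise ValueError("unbalanced brackets")
    parts.append(inner[start:])
    return parts

print(split_top_level("((8.7)|(6.7).(a.2))", "."))  # ['(8.7)|(6.7)', '(a.2)']
print(split_top_level("(6|7)", "|"))                # ['6', '7']
```

Each returned part can be passed back into the same function to work your way inside, one level of brackets at a time.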
Allow me to preface this by saying that I am learning Python on my own out of curiosity, and I was recommended a free online computer science course that is publicly available, so I apologize if I am using terms incorrectly.
I have seen questions regarding this particular problem on here before, but I have a separate question from them and did not want to hijack those threads. The question:
"A substring is any consecutive sequence of characters inside another string. The same substring may occur several times inside the same string: for example "assesses" has the substring "sses" 2 times, and "trans-Panamanian banana" has the substring "an" 6 times. Write a program that takes two lines of input, we call the first needle and the second haystack. Print the number of times that needle occurs as a substring of haystack."
My solution (which works) is:
first = str(input())
second = str(input())
count = 0
location = 0
while location < len(second):
    if location == 0:
        location = str.find(second,first,0)
        if location < 0:
            break
        count = count + 1
    location = str.find(second,first,location +1)
    if location < 0:
        break
    count = count + 1
print(count)
If you notice, I have on two separate occasions written an if statement that breaks when location is less than 0. Is there some way to make this a 'global' condition so I do not have repetitive code? I imagine efficiency becomes paramount with increasing program sophistication, so I am trying to develop good practice now.
How would Python gurus optimize this code, or am I just being too nitpicky?
I think Matthew and darshan have the best solution. I will just post a variation which is based on your solution:
first = str(input())
second = str(input())
def count_needle(first, second):
    location = str.find(second,first)
    if location == -1:
        return 0 # none whatsoever
    else:
        count = 1
        while location < len(second):
            location = str.find(second,first,location +1)
            if location < 0:
                break
            count = count + 1
        return count
print(count_needle(first, second))
Ideas:
* use functions to structure the code when appropriate
* initialising the variable location before entering the while loop saves you from checking location < 0 multiple times
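Taking the "initialise before the loop" idea one step further, the -1 check can become the loop condition itself, so it appears exactly once. A minimal Python 3 sketch; `count_occurrences` is a hypothetical name, and it counts overlapping matches as the problem statement requires:

```python
def count_occurrences(needle: str, haystack: str) -> int:
    """Count (possibly overlapping) occurrences of needle in haystack.
    str.find returns -1 once there are no more matches, so that single
    check is the loop's only exit condition."""
    count = 0
    location = haystack.find(needle)
    while location != -1:
        count += 1
        # Search again one character later so overlapping matches are found.
        location = haystack.find(needle, location + 1)
    return count

print(count_occurrences("sses", "assesses"))               # 2
print(count_occurrences("an", "trans-Panamanian banana"))  # 6
```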
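Check out regular expressions, Python's re module (http://docs.python.org/library/re.html). For example:
import re
first = str(input())
second = str(input())
# Put the whole (escaped) needle inside a zero-width lookahead: nothing is consumed,
# so overlapping matches such as "sses" twice in "assesses" are all counted.
regex = '(?=' + re.escape(first) + ')'
print(len(re.findall(regex, second)))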
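As mentioned by Matthew Adams, the best way to do it is using Python's re module.
For your case the solution would look something like this:
import re
def find_needle_in_haystack(needle, haystack):
    # re.findall(needle, haystack) alone only counts non-overlapping matches,
    # so match an empty lookahead at every position where needle starts.
    return len(re.findall('(?=' + re.escape(needle) + ')', haystack))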
Since you are learning Python, the best approach is to follow the 'DRY' [Don't Repeat Yourself] mantra. There are lots of Python utilities that you can use in many similar situations.
For a quick overview of a few very important Python modules, you can go through this class:
Google Python Class
which should only take you a day.
Even your approach could, IMO, be simplified. This version relies on the fact that find returns -1 when there is no further match, even when you ask it to search from an offset past the end of the string:
>>> x = 'xoxoxo'
>>> start = x.find('o')
>>> indexes = []
>>> while start > -1:
...     indexes.append(start)
...     start = x.find('o',start+1)
...
>>> indexes
[1, 3, 5]
needle = "ss"
haystack = "ssi lass 2 vecess estan ss."
print 'needle occurs %d times in haystack.' % haystack.count(needle)
Here you go:
first = str(input())
second = str(input())
x=len(first)
counter=0
for i in range(0,len(second)):
    if first==second[i:(x+i)]:
        counter=counter+1
print(counter)
Answer
needle=input()
haystack=input()
counter=0
for i in range(0,len(haystack)):
    if(haystack[i:len(needle)+i]!=needle):
        continue
    counter=counter+1
print(counter)
I am making a program for my own purposes (a naming program) that generates a completely random name. The problem is I cannot assign a number to a letter, such as a being 1 and z being 26, or a being 0 and z being 25; it gives me a SyntaxError. I need this because a random integer from 1 to 26 selects a letter (if the random integer is 1, select A), and the program then prints the name.
EDIT:
I have implemented your advice, and it works, and I am grateful for this, but I want my program to create readable, more procedurally generated names. Here is an example of a name after I tweaked my program: ddjau. That doesn't look like a name, so I want my program to work as if it were creating REAL names, like Samuel or other common names. Thanks!
EDIT (2):
Thanks, Adam, but I need a sort of 'seed' for the user to enter as the start of the name. (Seed = A, Name = Adam. Seed = G, Name = George.) Should I do this by searching the file line by line from the very beginning? If so, how do I do this?
Short Answer
Look into Python dictionaries to allow the 1 = 'a' type assignments. Below I have a working example that generates a random name based on gender and the starting letter(s).
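A minimal sketch of the dictionary idea, assuming the 1-based mapping from the question (1 is 'a', 26 is 'z'). `NUMBER_TO_LETTER`, `LETTER_TO_NUMBER` and `random_letter` are hypothetical names; `string.ascii_lowercase` supplies the 26 letters so they don't have to be typed out:

```python
import random
import string

# Map 1 -> 'a', 2 -> 'b', ..., 26 -> 'z' (1-based, as in the question).
NUMBER_TO_LETTER = {i: letter for i, letter in enumerate(string.ascii_lowercase, start=1)}
# The reverse lookup, letter -> number.
LETTER_TO_NUMBER = {letter: i for i, letter in NUMBER_TO_LETTER.items()}

def random_letter() -> str:
    """Pick a random integer in 1..26 (randint includes both ends) and look up its letter."""
    return NUMBER_TO_LETTER[random.randint(1, 26)]

print(NUMBER_TO_LETTER[1], NUMBER_TO_LETTER[26])  # a z
print(LETTER_TO_NUMBER["e"])  # 5
```

For a 0-based mapping, drop `start=1` and use `random.randint(0, 25)` instead.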
Disclaimer
I do not fully understand (via the code) what you're trying to accomplish with chr/ord and a random letter. Also note that, having no idea of your design goals or requirements, I have made the example more complex than it may need to be, for instructional purposes.
Additional Resources
* Python Docs for dictionary
* Using Python dictionary relationship to search both ways
In response to the last edit
If you are looking to build random 'real' names, I think your best bet will be to use a large list of names and just pick a random one. If I were you, I'd look into something linking to the census results: males and females. Note that male_names.txt and female_names.txt are a copy of the list found at the census website. As a disclaimer, I'm sure there is a more efficient way to load and read the file; just use this example as a proof of concept.
Update
Here's a quick and dirty way to seed the random values. Again I am not sure that this is the most pythonic way or most efficient way, but it works.
Example
import random
import time
def get_random_name(gender, seed):
    if(gender == 'male'):
        file = 'male_names.txt'
    elif(gender == 'female'):
        file = 'female_names.txt'
    names = []
    total_names = 0
    with open(file,'r') as fid:
        for line in fid:
            if(line.lower().startswith(seed)):
                names.append(line.strip()) # drop the trailing newline
                total_names = total_names + 1
    # randint includes both end points, so the last valid index is total_names - 1
    random_index = random.randint(0,total_names - 1)
    return names[random_index]
if (__name__ == "__main__"):
    print 'Welcome to Name Database 2.2\n'
    print '1. Boy'
    print '2. Girl'
    bog = raw_input('\nGender: ')
    print 'What should the name start with?'
    print 'A, Ab, Abc, B, Ba, Br, etc...'
    print ''
    l = raw_input('Letter(s): ').lower()
    new_name = ''
    if bog == '1': # Boy
        print get_random_name('male',l)
    elif bog == '2':
        print get_random_name('female',l)
Output
Welcome to Name Database 2.2
1. Boy
2. Girl
Gender: 2
What should the name start with?
A, Ab, Abc, B, Ba, Br, etc...
Letter(s): br
BRITTA
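Separately from the file handling, the same prefix filter can be sketched in Python 3 with `random.choice`, which picks from the matching names directly so there is no index arithmetic to get wrong. `pick_name` and the short `SAMPLE_NAMES` list are made up for illustration; in practice the list would come from the census files mentioned above.

```python
import random

def pick_name(names, prefix):
    """Return a random name starting with `prefix` (case-insensitive),
    or None if nothing matches."""
    prefix = prefix.lower()
    matches = [n.strip() for n in names if n.strip().lower().startswith(prefix)]
    return random.choice(matches) if matches else None

# Hypothetical sample data; in practice this would be read from a names file.
SAMPLE_NAMES = ["ADAM", "ALICE", "BRITTA", "BRUCE", "GEORGE"]
print(pick_name(SAMPLE_NAMES, "br"))
```

Returning None for an unmatched prefix avoids the crash you would otherwise get from picking out of an empty list.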
chr (see here) and ord (see here) are the two functions you're looking for (though you already seem to know about the latter). Follow those links for a more detailed explanation.
The first gives you a one-character string based on the integer; the second does the reverse operation (technically, ord handles Unicode as well, which chr doesn't, though you have unichr for that if you need it).
You can base your code on the following:
ch = "E"
print ord (ch) - ord ("A") + 1 # should give 5 for the fifth letter
val = 7
print chr (val + ord ("A") - 1) # should give G, the seventh letter
I'm not entirely sure what you're trying to do, but you can convert a number into a letter with the chr() function. chr() takes an ASCII code, so if you want to use the range [0, 25] instead you can adapt it like so:
chr(25 + ord('a')) # 'z'