Python, finding position of a character in a string

>>> s.index("r")
>>> s.find("r")
The above only finds the first occurrence of the character. For example:
>>> s = 'hello'
>>> s.find('l')
only outputs 2. What if we want the positions of both 'l' characters, so that the output is
[2, 3]

You can use a list comprehension with enumerate, like this:
>>> [index for index, char in enumerate(s) if char == "l"]
[2, 3]
The enumerate function gives the current index as well as the current item in the iterable. So, in each iteration, you get the index and the corresponding character in the string. We check whether the character is l and, if it is, include the index in the resulting list.
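The enumerate approach compares single characters, so it does not extend directly to longer substrings. As a rough sketch of one alternative, str.find accepts a start position and can be called repeatedly. The helper name find_all is hypothetical, and treating an empty pattern as "no matches" is an assumption:

```python
def find_all(s, sub):
    """Return the start index of every (possibly overlapping) occurrence of sub in s."""
    if not sub:
        return []  # assumption: an empty pattern is treated as having no matches
    positions = []
    start = s.find(sub)
    while start != -1:
        positions.append(start)
        # resume the search one character past the previous hit
        start = s.find(sub, start + 1)
    return positions

print(find_all('hello', 'l'))     # [2, 3]
print(find_all('banana', 'ana'))  # [1, 3]
```

For single characters this gives the same result as the comprehension above.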

This is thefourtheye's answer expanded into a normal function:
def count_chars(string, char):
    results = []
    for index, value in enumerate(string):
        if value == char:
            results.append(index)
    return results
However, thefourtheye's answer is the most Pythonic one. List comprehensions, and comprehensions in general, are a very important part of Python (and one of my favorite parts). I strongly suggest reading up on them. Nothing wrong with being a little ahead of the curve :-).

thefourtheye's solution is the best but if you want a different method:
s = raw_input("Enter a string")
c = raw_input("Enter a char to search")
indexes = []
i = 0
while i < len(s):
    if s[i] == c:
        indexes.append(i)
    i += 1
print "{} appears {} times at the following index/indexes {}".format(c,len(indexes),indexes)

Related

How to stop over counting of duplicate letters in a list of strings

I'm trying to count the number of times a duplicated letter shows up in each list element.
For example, given
arr = ['capps','hat','haaah']
I expect to get the list [1, 0, 1].
def myfunc(words):
    counter = 0  # counts duplicated letters in words
    len_ = len(words) - 1
    for i in range(len_):
        if words[i] == words[i+1]:  # if the next letter is the same, add one
            counter += 1
    return counter
def minimalOperations(arr):
    return [*map(myfunc, arr)]  # map applies myfunc to each element of arr
But my code outputs [1, 0, 2].
I'm not sure why I am overcounting.
Can anyone help me resolve this? Thank you in advance.

A more efficient solution using a regular expression:
import re
def myfunc(words):
    reg_str = r"(\w)\1{1,}"
    return len(re.findall(reg_str, words))
This function counts the maximal substrings of length 2 or more that consist of a single repeated letter. Thus 'aaa' in your example will only be counted once.
For a string like
'hhhhfafaahggaa'
the output will be 4, since there are 4 maximal substrings of the same letter occurring at least twice: 'hhhh', 'aa', 'gg', 'aa'
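To see which runs the pattern actually matches, a small sketch with re.finditer can help. The helper name repeated_runs is hypothetical. re.findall would return only the captured letter of each run, so the full match is read from each match object instead:

```python
import re

def repeated_runs(word):
    """Return each maximal run of two or more identical word characters."""
    # group(0) is the whole match; the capture group alone would be a single letter
    return [m.group(0) for m in re.finditer(r'(\w)\1+', word)]

print(repeated_runs('hhhhfafaahggaa'))  # ['hhhh', 'aa', 'gg', 'aa']
print(repeated_runs('haaah'))           # ['aaa']
```

Taking len() of the returned list gives the same count as the function above.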

You aren't accounting for situations where you have greater than 2 identical characters in succession. To do this, you can look back as well as forward:
if (words[i] == words[i+1]) and (words[i] != words[i-1] if i != 0 else True):
    # as before
The conditional expression handles the first iteration of the loop, so that the first letter is not compared with the last letter of the string (words[-1]).
Another solution is to use itertools.groupby and count the number of instances where a group has a length greater than 1:
arr = ['capps','hat','haaah']
from itertools import groupby
res = [sum(1 for _, j in groupby(el) if sum(1 for _ in j) > 1) for el in arr]
print(res)
[1, 0, 1]
The sum(1 for _ in j) part is used to count the number of items in a generator. It's also possible to use len(list(j)), though this requires list construction.

Well, your code counts the number of duplications, so what you observe is quite logical:
your input is arr = ['capps','hat','haaah']
in 'capps', the letter p is duplicated 1 time => myfunc() returns 1
in 'hat', there is no duplicated letter => myfunc() returns 0
in 'haaah', the letter a is duplicated 2 times => myfunc() returns 2
So finally you get [1,0,2].
For your purpose, I suggest using a regex to match and count the groups of duplicated letters in each word. I also replaced map() with a list comprehension, which I find more readable:
import re
def myfunc(words):
    return len(re.findall(r'(\w)\1+', words))
def minimalOperations(arr):
    return [myfunc(a) for a in arr]
arr = ['capps','hat','haaah']
print(minimalOperations(arr)) # [1,0,1]
arr = ['cappsuul','hatppprrrrtyyy','haaah']
print(minimalOperations(arr)) # [2,3,1]

You need to keep track of a little more state, specifically if you're looking at duplicates now.
def myfunc(words):
    counter = 0  # counts runs of duplicated letters in words
    seen = None
    len_ = len(words) - 1
    for i in range(len_):
        if words[i] == words[i+1]:
            if words[i] != seen:  # only count the first pair of a run
                counter += 1
                seen = words[i]
        else:
            seen = None  # the run ended, so a later run of the same letter counts again
    return counter
This gives you the following output
>>> arr = ['capps','hat','haaah']
>>> list(map(myfunc, arr))
[1, 0, 1]
As others have pointed out, you could use a regular expression and trade clarity for performance. The key is to find a regular expression that means "two or more repeated characters", and it may depend on what you consider to be characters (e.g. how do you treat duplicate punctuation?)
Note: the "regex" used for this is technically an extension of regular expressions, because the backreference requires memory.
The form will be len(re.findall(regex, words))

I would break this kind of problem into smaller chunks, starting by grouping duplicates.
The documentation for itertools has groupby and recipes for this kind of thing.
A slightly edited version of unique_justseen would look like this:
duplicates = (sum(1 for _ in group) for _key, group in itertools.groupby("haaah"))
and yields the values 1, 3, 1. Any value greater than 1 marks a duplicate, so just count them:
sum(n > 1 for n in duplicates)

Use re.findall to match runs of 2 or more identical characters:
>>> import re
>>> arr = ['capps','hat','haaah']
>>> [len(re.findall(r'(.)\1+', w)) for w in arr]
[1, 0, 1]

Finding regular expression with at least one repetition of each letter

From any *.fasta DNA sequence (only 'ACTG' characters) I must find all subsequences that contain at least one occurrence of each letter.
For example, from the sequence 'AAGTCCTAG' I should be able to find: 'AAGTC', 'AGTC', 'GTCCTA', 'TCCTAG', 'CCTAG' and 'CTAG' (iterating over each letter).
I have no clue how to do that in Python 2.7. I tried regular expressions, but they did not find every variant.
How can I achieve that?

You could find all substrings of length 4+, and then down select from those to find only the shortest possible combinations that contain one of each letter:
s = 'AAGTCCTAG'
def get_shortest(s):
    l, b = len(s), set('ATCG')
    options = [s[i:j+1] for i in range(l) for j in range(i,l) if (j+1)-i > 3]
    return [i for i in options if len(set(i) & b) == 4 and (set(i) != set(i[:-1]))]
print(get_shortest(s))
Output:
['AAGTC', 'AGTC', 'GTCCTA', 'TCCTAG', 'CCTAG', 'CTAG']

This is another way you can do it. It may not be as fast or as nice as chrisz's answer, but it may be a little simpler to read and understand for beginners.
DNA='AAGTCCTAG'
toSave=[]
for i in range(len(DNA)):
    letters=['A','G','T','C']
    j=i
    seq=[]
    while len(letters)>0 and j<(len(DNA)):
        seq.append(DNA[j])
        try:
            letters.remove(DNA[j])
        except ValueError:  # this letter was already removed
            pass
        j+=1
    if len(letters)==0:
        toSave.append(''.join(seq))  # store the match as a string
print(toSave)

Since the substring you are looking for may be of about any length, a FIFO queue seems to work. Append one letter at a time and check whether there is at least one of each letter. If so, yield the current string. Then remove letters from the front and keep checking until the string is no longer valid.
def find_agtc_seq(seq_in):
    chars = 'AGTC'
    cur_str = []
    for ch in seq_in:
        cur_str.append(ch)
        while all(map(cur_str.count,chars)):
            yield("".join(cur_str))
            cur_str.pop(0)
seq = 'AAGTCCTAG'
for substr in find_agtc_seq(seq):
    print(substr)
That seems to result in the substrings you are looking for:
AAGTC
AGTC
GTCCTA
TCCTAG
CCTAG
CTAG
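The generator above calls list.count for every letter on every step and removes from the front of a list, both of which get slower as the window grows. A possible variant, sketched here with collections.deque and collections.Counter, keeps running counts instead. The name find_agtc_seq_fast and the chars parameter are illustrative, not part of the original answer:

```python
from collections import Counter, deque

def find_agtc_seq_fast(seq_in, chars='AGTC'):
    """Yield the same substrings as find_agtc_seq, using running letter counts."""
    window = deque()
    counts = Counter()
    for ch in seq_in:
        window.append(ch)
        counts[ch] += 1
        # shrink from the left while every required letter is still present
        while all(counts[c] > 0 for c in chars):
            yield ''.join(window)
            counts[window.popleft()] -= 1

print(list(find_agtc_seq_fast('AAGTCCTAG')))
# ['AAGTC', 'AGTC', 'GTCCTA', 'TCCTAG', 'CCTAG', 'CTAG']
```

Joining the window to build each result still takes time proportional to its length, so the saving is in the membership checks rather than in producing the output.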

I really wanted to create a short answer for this, so this is what I came up with!
s = 'AAGTCCTAG'
d = 'ACGT'
c = len(d)
while c <= len(s):
    x,c = s[:c],c+1
    if all(l in x for l in d):
        print(x)
        s,c = s[1:],len(d)
It works as follows:
c is set to the length of the string of characters we are ensuring exist in the string (d = ACGT)
The while loop iterates over each prefix of s, for as long as c does not exceed the length of s.
This works by increasing c by 1 upon each iteration of the while loop.
If every character in our string d (ACGT) exists in the substring, we print the result, reset c to its default value and slice the string by 1 character from the start.
The loop continues until the string s is shorter than d
Result:
AAGTC
AGTC
GTCCTA
TCCTAG
CCTAG
CTAG
To get the output in a list instead:
s = 'AAGTCCTAG'
d = 'ACGT'
c,r = len(d),[]
while c <= len(s):
    x,c = s[:c],c+1
    if all(l in x for l in d):
        r.append(x)
        s,c = s[1:],len(d)
print(r)
Result:
['AAGTC', 'AGTC', 'GTCCTA', 'TCCTAG', 'CCTAG', 'CTAG']

If you can break the sequence into a list, e.g. of 5-letter sequences, you could then use this function to find repeated sequences.
from itertools import groupby
import numpy as np
def find_repeats(input_list, n_repeats):
    flagged_items = []
    for item in input_list:
        # Create itertools.groupby object
        groups = groupby(str(item))
        # Create list of tuples: (character, number of repeats)
        result = [(label, sum(1 for _ in group)) for label, group in groups]
        # Extract just number of repeats
        char_lens = np.array([x[1] for x in result])
        # Append to flagged items
        if any(char_lens >= n_repeats):
            flagged_items.append(item)
    # Return flagged items
    return flagged_items
#--------------------------------------
test_list = ['aatcg', 'ctagg', 'catcg']
find_repeats(test_list, n_repeats=2) # Returns ['aatcg', 'ctagg']

Python loops iteration to get a list of all indices of an element in a list

I am trying to write a function which consumes a string and a character and produces a list of indices for all occurrences of that character in that string.
So far this is what I have, but it always gives me [].
def list_of_indices(s,char):
    string_lowercase = s.lower()
    sorted_string = "".join(sorted(string_lowercase))
    char_list = list(sorted_string)
    for x in char_list:
        a = []
        if x == char:
            a.append(char_list.index(x))
        return a
I don't understand why this does not yield the answer. And it has to be a list of non-empty length.
Anyone aware of how to get the indices for all occurrences?

You're returning on the first iteration of your for-loop. Make sure the return statement is outside the scope of the loop.
Also, be sure to put a = [] before the for-loop. Otherwise, you're effectively resetting the list on each iteration of loop.
There is also a problem with char_list.index(x). This will always return the index of the first occurrence of x, which isn't what you want. You should keep track of an index as you are looping (e.g. with enumerate()).
And I'm not sure what you were trying to do with the sort; looping through the original string should be sufficient.
Lastly, note that you can loop over a string directly; you don't need to convert it to a list (i.e. char_list is unnecessary).
Note that your task can be accomplished with a simple list comprehension:
>>> s = 'abcaba'
>>> char = 'a'
>>>
>>> [i for i,c in enumerate(s) if c == char] # <--
[0, 3, 5]
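The question lowercases the string before searching, which suggests the match was meant to ignore case. If that is the intent (an assumption), one possible sketch uses re.finditer with re.IGNORECASE, which reports positions in the original string:

```python
import re

def list_of_indices(s, char):
    """Return every index at which char occurs in s, ignoring case."""
    # re.escape keeps characters such as '.' or '*' from being read as regex syntax
    pattern = re.escape(char)
    return [m.start() for m in re.finditer(pattern, s, flags=re.IGNORECASE)]

print(list_of_indices('Abcaba', 'a'))  # [0, 3, 5]
```

For a case-sensitive search, the plain comprehension above is simpler and needs no import.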

You could implement it using a quick list comprehension.
def list_of_indices(s, char):
    return [i for i, c in enumerate(s) if c == char]
or by using a for loop instead:
def list_of_indices(s, char):
    results = list()
    for i, c in enumerate(s):
        if c == char:
            results.append(i)
    return results

You are returning a on the first iteration of your for loop.
Change the function to this for starters:
def list_of_indices(s,char):
    string_lowercase = s.lower()
    a = []
    i = 0
    for x in string_lowercase:
        if x == char:
            a.append(i)
        i+=1
    return a

Reverse a string without using reversed() or [::-1]?

I came across a strange Codecademy exercise that required a function that would take a string as input and return it in reverse order. The only problem was you could not use the reversed function or the common answer here on Stack Overflow, [::-1].
Obviously in the real world of programming, one would most likely go with the extended slice method, or even using the reversed function but perhaps there is some case where this would not work?
I present a solution below in Q&A style, in case it is helpful for people in the future.

You can also do it with recursion:
def reverse(text):
    if len(text) <= 1:
        return text
    return reverse(text[1:]) + text[0]
And a simple example for the string hello:
reverse(hello)
= reverse(ello) + h # The recursive step
= reverse(llo) + e + h
= reverse(lo) + l + e + h
= reverse(o) + l + l + e + h # Base case
= o + l + l + e + h
= olleh

Just another option:
from collections import deque
def reverse(iterable):
    d = deque()
    d.extendleft(iterable)
    return ''.join(d)

Use reversed range:
def reverse(strs):
    for i in xrange(len(strs)-1, -1, -1):
        yield strs[i]
...
>>> ''.join(reverse('hello'))
'olleh'
xrange or range with -1 step would return items in reversed order, so we need to iterate from len(string)-1 to -1(exclusive) and fetch items from the string one by one.
>>> list(xrange(len(strs) -1, -1 , -1))
[4, 3, 2, 1, 0] #iterate over these indexes and fetch the items from the string
One-liner:
def reverse(strs):
    return ''.join([strs[i] for i in xrange(len(strs)-1, -1, -1)])
...
>>> reverse('hello')
'olleh'

EDIT
Recent activity on this question caused me to look back and change my solution to a quick one-liner using a list comprehension:
rev = ''.join([text[len(text) - count] for count in xrange(1,len(text)+1)])
Although obviously there are some better answers here like a negative step in the range or xrange function. The following is my original solution:
Here is my solution, I'll explain it step by step
def reverse(text):
    lst = []
    count = 1
    for i in range(0,len(text)):
        lst.append(text[len(text)-count])
        count += 1
    lst = ''.join(lst)
    return lst
print reverse('hello')
First, we have to pass a parameter to the function, in this case text.
Next, I set an empty list, named lst to use later. (I actually didn't know I'd need the list until I got to the for loop, you'll see why it's necessary in a second.)
The count variable will make sense once I get into the for loop
So let's take a look at a basic version of what we are trying to accomplish:
It makes sense that appending the last character to the list would start the reverse order. For example:
>>lst = []
>>word = 'foo'
>>lst.append(word[2])
>>print lst
['o']
But in order to continue reversing the order, we need to then append word[1] and then word[0]:
>>lst.append(word[2])
>>lst.append(word[1])
>>lst.append(word[0])
>>print lst
['o','o','f']
This is great, we now have a list that has our original word in reverse order and it can be converted back into a string by using .join(). But there's a problem. This works for the word foo, it even works for any word that has a length of 3 characters. But what about a word with 5 characters? Or 10 characters? Now it won't work. What if there was a way we could dynamically change the index we append so that any word will be returned in reverse order?
Enter for loop.
for i in range(0,len(text)):
    lst.append(text[len(text)-count])
    count += 1
First off, it is necessary to use in range() rather than just in, because we need to iterate through the characters in the word, but we also need to pull the index value of the word so that we change the order.
The first part of the body of our for loop should look familiar. It's very similar to
>>lst.append(word[..index..])
In fact, the base concept of it is exactly the same:
>>lst.append(text[..index..])
So what's all the stuff in the middle doing?
Well, we first need to append the last letter to our list. Its index is the length of the word, text, minus 1. From now on we'll refer to it as l(t) - 1
>>lst.append(text[len(text)-1])
That alone will always get the last letter of our word, and append it to lst, regardless of the length of the word. But now that we have the last letter, which is l(t) - 1, we need the second to last letter, which is l(t) - 2, and so on, until there are no more characters to append to the list. Remember our count variable from above? That will come in handy. By using a for loop, we can increment the value of count by 1 through each iteration, so that the value we subtract by increases, until the for loop has iterated through the entire word:
>>for i in range(0,len(text)):
..
..     lst.append(text[len(text)-count])
..     count += 1
Now that we have the heart of our function, let's look at what we have so far:
def reverse(text):
    lst = []
    count = 1
    for i in range(0,len(text)):
        lst.append(text[len(text)-count])
        count += 1
We're almost done! Right now, if we were to call our function with the word 'hello', we would get a list that looks like:
['o','l','l','e','h']
We don't want a list, we want a string. We can use .join for that:
def reverse(text):
    lst = []
    count = 1
    for i in range(0,len(text)):
        lst.append(text[len(text)-count])
        count += 1
    lst = ''.join(lst) # join the letters together without a space
    return lst
And that's it. If we call reverse() on the word 'hello', we get this:
>>print reverse('hello')
olleh
Obviously, this is way more code than is necessary in a real life situation. Using the reversed function or extended slice would be the optimal way to accomplish this task, but maybe there is some instance when it would not work, and you would need this. Either way, I figured I'd share it for anyone who would be interested.
If you guys have any other ideas, I'd love to hear them!

Only been coding Python for a few days, but I feel like this was a fairly clean solution. Create an empty list, loop through each letter in the string and append it to the front of the list, return the joined list as a string.
def reverse(text):
    backwardstext = []
    for letter in text:
        backwardstext.insert(0, letter)
    return ''.join(backwardstext)

I used this:
def reverse(text):
    s=""
    l=len(text)
    for i in range(l):
        s+=text[l-1-i]
    return s

Inspired by Jon's answer, how about this one
from collections import deque
word = 'hello'
q = deque(word)
''.join(q.pop() for _ in range(len(word)))

This is a very interesting question. I would like to offer a simple one-liner answer:
>>> S='abcdefg'
>>> ''.join(item[1] for item in sorted(enumerate(S), reverse=True))
'gfedcba'
Brief explanation:
enumerate() yields (0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f'), (6, 'g'): the indices and the values.
To reverse the values, just sort these pairs in reverse order with sorted().
Finally, join them back together into a str.

I created different versions of how to reverse a string in python in my repo:
https://github.com/fedmich/Python-Codes/tree/master/Reverse%20a%20String
You can do it with a list comprehension:
# Reverse a string without using reverse() function
s = 'Federico'
li = list( s ) #convert string to list
ret = [ li[i-1] for i in xrange(len(li),0,-1) ] #one-line list comprehension
print ( "".join( ret ) )
or by doing a backward for loop
# Reverse a string without using reverse() function
s = 'Federico'
r = []
length = len(s)
for i in xrange(length,0,-1):
    r.append( s[ i - 1] )
print ( "".join(r) )

reduce(lambda x, y : y + x, "hello world")

A golfed version: r=lambda x:"".join(x[i] for i in range(len(x)-1,-1,-1)).

I just solved this on Codecademy, was checking my answer, and ran across this list. With a very limited understanding of Python, I did this and it seemed to work.
def reverse(s):
    i = len(s) - 1
    sNew = ''
    while i >= 0:
        sNew = sNew + str(s[i])
        i = i - 1
    return sNew

def reverse(s):
    return "".join(s[i] for i in range(len(s)-1, -1, -1))

Blender's answer is lovely, but for a very long string, it will result in a whopping RuntimeError: maximum recursion depth exceeded. One might refactor the same code into a while loop, as one frequently must do with recursion in python. Obviously still bad due to time and memory issues, but at least will not error.
def reverse(text):
    answer = ""
    while text:
        answer = text[0] + answer
        text = text[1:]
    return answer
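One way to avoid the repeated slicing and concatenation, sketched here within the exercise's constraints (no reversed(), no [::-1]), is to collect the characters in a list and join them once at the end. This is an illustrative variant, not a measured benchmark:

```python
def reverse(text):
    """Reverse text with an index loop, joining the characters once at the end."""
    chars = []
    i = len(text) - 1
    while i >= 0:
        chars.append(text[i])  # appending to a list does not copy what was built so far
        i -= 1
    return ''.join(chars)

print(reverse('hello'))  # olleh
```

Because nothing is sliced, the input string is never copied, and there is no recursion depth to exceed.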

Today I was asked this same exercise on pen & paper, so I came up with this function for lists:
def rev(s):
    l = len(s)
    for i,j in zip(range(l-1, 0, -1), range(l//2)):
        s[i], s[j] = s[j], s[i]
    return s
which can be used with strings with "".join(rev(list("hello")))

This is a way to do it with a while loop:
def reverse(s):
    t = -1
    s2 = ''
    while abs(t) < len(s) + 1:
        s2 = s2 + s[t]
        t = t - 1
    return s2

I have also just solved the corresponding exercise on Codecademy and wanted to compare my approach to others. I have not found the solution I used so far, so I thought I would sign up here and share it. Maybe I will get a suggestion or a helpful comment on how to improve the code.
OK, here it goes. I did not use a list to store the string; instead I just accessed the string by index. It took me a bit at first to deal with len() and the index numbers, but in the end it worked :).
def reverse(x):
    reversestring = ""
    for n in range(len(str(x))-1,-1, -1):
        reversestring += x[n]
    return reversestring
I am still wondering whether the reversestring = "" could be handled in a more elegant way, or whether it is even "bad style", but I couldn't find an answer so far.

def reverse(text):
    a=""
    l=len(text)
    while(l>=1):
        a+=text[l-1]
        l-=1
    return a
I just concatenated to the string a the character at the highest remaining index of text (which decreases by 1 on each loop).

All I did to reverse the string was use the xrange function in a for loop, starting from the last index of the string and stepping backwards, as follows:
myString = "ABC"
result = ""
for index in xrange(len(myString) - 1, -1, -1):
    result += myString[index]
print result
My output is "CBA"

You can simply reverse iterate your string starting from the last character. With python you can use list comprehension to construct the list of characters in reverse order and then join them to get the reversed string in a one-liner:
def reverse(s):
    return "".join([s[-i-1] for i in xrange(len(s))])
If you are not even allowed to use negative indexing, you should replace s[-i-1] with s[len(s)-i-1]

You've received a lot of alternative answers, but just to add another simple solution, the first thing that came to mind was something like this:
def reverse(text):
    reversed_text = ""
    for n in range(len(text)):
        reversed_text += text[-1 - n]
    return reversed_text
It's not as fast as some of the other options people have mentioned (or the built-in methods), but it is easy to follow, as we're simply using the length of the text string to concatenate one character at a time by indexing from the end toward the front.

def reverseThatString(theString):
    reversedString = ""
    lenOfString = len(theString)
    for i,j in enumerate(theString):
        lenOfString -= 1
        reversedString += theString[lenOfString]
    return reversedString

This is my solution using the for i in range loop:
def reverse(string):
    tmp = ""
    for i in range(1,len(string)+1):
        tmp += string[len(string)-i]
    return tmp
It's pretty easy to understand. I start from 1 to avoid index out of bound.

Here's my contribution:
def rev(test):
    test = list(test)
    i = len(test)-1
    result = []
    print test
    while i >= 0:
        result.append(test.pop(i))
        i -= 1
    return "".join(result)

You can simply do it like this:
def rev(str):
    rev = ""
    for i in range(0,len(str)):
        rev = rev + str[(len(str)-1)-i]
    return rev

Here is one using a list as a stack:
def reverse(s):
    rev = [_t for _t in s]
    t = ''
    while len(rev) != 0:
        t+=rev.pop()
    return t

Try this simple and elegant code.
my_string= "sentence"
new_str = ""
for i in my_string:
    new_str = i + new_str
print(new_str)

You already have plenty of answers; I just want to share another way.
You can write a small reverse function and compare its output with the given string:
def reverse(data):
    var = ''
    for i in data:
        var = i + var
    return var
data = 'level'  # example input
if not reverse(data) == data:
    print("No palindrome")
else:
    print("Palindrome")

Not very clever, but tricky solution
def reverse(t):
    for j in range(len(t) // 2):
        t = t[:j] + t[- j - 1] + t[j + 1:- j - 1] + t[j] + t[len(t) - j:]
    return t

Pointfree:
from functools import partial, reduce
from operator import add
flip = lambda f: lambda x, y: f(y, x)
rev = partial(reduce, flip(add))
Test:
>>> rev('hello')
'olleh'

Determine prefix from a set of (similar) strings

I have a set of strings, e.g.
my_prefix_what_ever
my_prefix_what_so_ever
my_prefix_doesnt_matter
I simply want to find the longest common portion of these strings, here the prefix. In the above the result should be
my_prefix_
The strings
my_prefix_what_ever
my_prefix_what_so_ever
my_doesnt_matter
should result in the prefix
my_
Is there a relatively painless way in Python to determine the prefix (without having to iterate over each character manually)?
PS: I'm using Python 2.6.3.

Never rewrite what is provided to you: os.path.commonprefix does exactly this:
Return the longest path prefix (taken
character-by-character) that is a prefix of all paths in list. If list
is empty, return the empty string (''). Note that this may return
invalid paths because it works a character at a time.
For comparison to the other answers, here's the code:
# Return the longest prefix of all list elements.
def commonprefix(m):
    "Given a list of pathnames, returns the longest common leading component"
    if not m: return ''
    s1 = min(m)
    s2 = max(m)
    for i, c in enumerate(s1):
        if c != s2[i]:
            return s1[:i]
    return s1
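A short usage sketch with the strings from the question. Comparing only min(m) and max(m) is enough because strings are ordered lexicographically, so any prefix shared by the smallest and largest string is shared by everything in between:

```python
import os.path

strings = ['my_prefix_what_ever',
           'my_prefix_what_so_ever',
           'my_prefix_doesnt_matter']
print(os.path.commonprefix(strings))  # my_prefix_

other = ['my_prefix_what_ever',
         'my_prefix_what_so_ever',
         'my_doesnt_matter']
print(os.path.commonprefix(other))    # my_

# An empty list yields the empty string.
print(repr(os.path.commonprefix([])))  # ''
```

Despite living in os.path, the function works character by character and does not require its arguments to be valid paths.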

Ned Batchelder is probably right. But for the fun of it, here's a more efficient version of phimuemue's answer using itertools.
import itertools
strings = ['my_prefix_what_ever',
'my_prefix_what_so_ever',
'my_prefix_doesnt_matter']
def all_same(x):
    return all(x[0] == y for y in x)
char_tuples = itertools.izip(*strings)
prefix_tuples = itertools.takewhile(all_same, char_tuples)
''.join(x[0] for x in prefix_tuples)
As an affront to readability, here's a one-line version :)
>>> from itertools import takewhile, izip
>>> ''.join(c[0] for c in takewhile(lambda x: all(x[0] == y for y in x), izip(*strings)))
'my_prefix_'
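itertools.izip exists only in Python 2. A sketch of the same idea for Python 3, where the built-in zip is already lazy, might look like this. The function name common_prefix and the set-based column test are illustrative choices:

```python
from itertools import takewhile

def common_prefix(strings):
    """Return the longest prefix shared by every string in strings."""
    columns = zip(*strings)  # one tuple of characters per position
    # keep taking columns while all strings agree on the character
    shared = takewhile(lambda column: len(set(column)) == 1, columns)
    return ''.join(column[0] for column in shared)

strings = ['my_prefix_what_ever',
           'my_prefix_what_so_ever',
           'my_prefix_doesnt_matter']
print(common_prefix(strings))  # my_prefix_
```

zip stops at the shortest string, so no extra length check is needed, and an empty list gives an empty result.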

Here's my solution:
a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]
prefix_len = len(a[0])
for x in a[1 : ]:
    prefix_len = min(prefix_len, len(x))
    while not x.startswith(a[0][ : prefix_len]):
        prefix_len -= 1
prefix = a[0][ : prefix_len]

The following is a working, but probably quite inefficient, solution.
a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]
b = zip(*a)
c = [x[0] for x in b if x==(x[0],)*len(x)]
result = "".join(c)
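For small sets of strings, the above is no problem at all. But for larger sets, I personally would code another, manual solution that checks each character one after another and stops when there are differences. Note that the comprehension above also keeps characters from any later position where all strings happen to agree, so it is only correct when no such position follows the first mismatch.
Algorithmically, this yields the same procedure; however, one might be able to avoid constructing the list c.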
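The manual approach mentioned above could be sketched as follows. This is one possible reading of it, with the hypothetical name common_prefix; it stops at the first position where any string differs or runs out:

```python
def common_prefix(strings):
    """Return the longest common prefix, stopping at the first mismatch."""
    if not strings:
        return ''
    first = strings[0]
    for i, ch in enumerate(first):
        for other in strings[1:]:
            # stop as soon as another string is too short or disagrees
            if i >= len(other) or other[i] != ch:
                return first[:i]
    return first

a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]
print(common_prefix(a))  # my_prefix_
```

No intermediate list is built, and positions after the first mismatch are never inspected.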

Just out of curiosity I figured out yet another way to do this:
def common_prefix(strings):
    if len(strings) == 1:#rule out trivial case
        return strings[0]
    prefix = strings[0]
    for string in strings[1:]:
        while string[:len(prefix)] != prefix and prefix:
            prefix = prefix[:len(prefix)-1]
        if not prefix:
            break
    return prefix
strings = ["my_prefix_what_ever","my_prefix_what_so_ever","my_prefix_doesnt_matter"]
print common_prefix(strings)
#Prints "my_prefix_"
As Ned pointed out it's probably better to use os.path.commonprefix, which is a pretty elegant function.

The second line of this employs the reduce function on each character in the input strings. It returns a list of N+1 elements where N is length of the shortest input string.
Each element in lot is either (a) the input character, if all input strings match at that position, or (b) None. lot.index(None) is the position of the first None in lot: the length of the common prefix. out is that common prefix.
val = ["axc", "abc", "abc"]
lot = [reduce(lambda a, b: a if a == b else None, x) for x in zip(*val)] + [None]
out = val[0][:lot.index(None)]

Here's a simple clean solution. The idea is to use zip() function to line up all the characters by putting them in a list of 1st characters, list of 2nd characters,...list of nth characters. Then iterate each list to check if they contain only 1 value.
a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]
list = [all(x[i] == x[i+1] for i in range(len(x)-1)) for x in zip(*a)]
print a[0][:list.index(0) if list.count(0) > 0 else len(list)]
output: my_prefix_

Here is another way of doing this using OrderedDict with minimal code.
import collections
import itertools
def commonprefix(instrings):
    """ Common prefix of a list of input strings using OrderedDict """
    d = collections.OrderedDict()
    for instring in instrings:
        for idx,char in enumerate(instring):
            # Make sure index is added into key
            d[(char, idx)] = d.get((char,idx), 0) + 1
    # Return prefix of keys while value == length(instrings)
    return ''.join([k[0] for k in itertools.takewhile(lambda x: d[x] == len(instrings), d)])

I had a slight variation of the problem and Google sent me here, so I think it will be useful to document:
I have a list like:
my_prefix_what_ever
my_prefix_what_so_ever
my_prefix_doesnt_matter
some_noise
some_other_noise
So I would expect my_prefix to be returned. That can be done with:
from collections import Counter
def get_longest_common_prefix(values, min_length):
    substrings = [value[0: i-1] for value in values for i in range(min_length, len(value))]
    counter = Counter(substrings)
    # remove count of 1
    counter -= Counter(set(substrings))
    return max(counter, key=len)

In one line without using itertools, for no particular reason, although it does iterate through each character:
''.join([z[0] for z in zip(*(list(s) for s in strings)) if all(x==z[0] for x in z)])

Find the common prefix of all words in the given list of strings; if there is no common prefix, print -1:
stringList = ['my_prefix_what_ever', 'my_prefix_what_so_ever', 'my_prefix_doesnt_matter']
len2 = len( stringList )
if len2 != 0:
    # take the shortest word as the initial prefix
    prefix = min( stringList, key=len )
    for i in range( len2 ):
        word = stringList[ i ]
        len1 = len( prefix )
        # slice each word to the length of the prefix
        word = word[ 0:len1 ]
        for j in range( len1 ):
            # compare each letter of word and prefix
            if word[ j ] != prefix[ j ]:
                # if a letter does not match, slice the prefix
                prefix = prefix[ :j ]
                break # after getting the common prefix, move to the next word
    if len( prefix ) != 0:
        print("common prefix: ",prefix)
    else:
        print("-1")
else:
    print("string List is empty")
