Find the difference between two strings of uneven length in python

Find the difference between two strings of uneven length in python - python

a = 'abcdfjghij'
b = 'abcdfjghi'
Output : j
def diff(a, b):
string=''
for val in a:
if val not in b:
string=val
return string
a = 'abcdfjghij'
b = 'abcdfjghi'
print(diff(a,b))
This code returns an empty string.
Any solution for this?

collections.Counter from the standard library can be used to model multi-sets, so it keeps track of repeated elements. It's a subclass of dict which is performant and extends its functionality for counting purposes. To find differences between two strings you can mimic a symmetric difference between sets.
from collections import Counter
a = 'abcdfjghij'
b = 'abcdfjghi'
ca = Counter(a)
cb = Counter(b)
diff = (cb-ca)+(ca-cb) # symmetric difference
print(diff)
#Counter({'j': 1})

Its hard to know exactly what you want based on your question. Like should
'abc'
'efg'
return 'abc' or 'efg' or is there always just going to be one character added?
Here is a solution that accounts for multiple characters being different but still might not give your exact output.
def diff(a, b):
string = ''
if(len(a) >= len(b)):
longString = a
shortString = b
else:
longString = b
shortString = a
for i in range(len(longString)):
if(i >= len(shortString) or longString[i] != shortString[i]):
string += longString[i]
return string
a = 'abcdfjghij'
b = 'abcdfjghi'
print(diff(a,b))
if one string just has one character added and i could be anywhere in the string you could change
string += longString[i]
to
string = longString[i]

In your example, there are 2 differences between the 2 strings :
The letter g and j.
I tested your code and it returns g because all the other letters from are in b:
a = 'abcdfjghij'
b = 'abcdfjhi'
def diff(a, b):
string=''
for val in a:
if val not in b:
string=val
return string
print(diff(a,b))

updated
But you have j twice in a. So the first time it sees j it looks at b and sees a j, all good. For the second j it looks again and still sees a j, all good.
Are you wanting to check if each letter is the same as the other letter in the same sequence, then you should try this:
a = 'abcdfjghij'
b = 'abcdfjghi'
def diff(a, b):
if len(a)>len(b):
smallest_len = len(b)
for index, value in enumerate(a[:smallest_len]):
if a[index] != b[index]:
print(f'a value {a[index]} at index {index} does not match b value {b[index]}')
if len(a) == len(b):
pass
else:
print(f'Extra Values in A Are {a[smallest_len:]}')
else:
smallest_len = len(a)
for index, value in enumerate(b[:smallest_len]):
if a[index] != b[index]:
print(f'a value {a[index]} at index {index} does not match b value {b[index]}')
if len(a) == len(b):
pass
else:
print(f'Extra Values in B Are {b[smallest_len:]}')
diff(a, b)

if I understand correctyl your question is:
"given 2 strings of different length, how can I find the characters
that are different between them?"
So judging by your example, this implies you want either the characters that are only present in 1 of the strings and not on the other, or characters that might be repeated and which count is different in between the two strings.
Here's a simple solution (maybe not the most efficient one), but one that's short and does not require any extra packages:
**UPDATED: **
a = 'abcdfjghij'
b = 'abcdfjghi'
dict_a = dict( (char, a.count(char)) for char in a)
dict_b = dict( (char, b.count(char)) for char in b)
idx_longest = [dict_a, dict_b].index(max([dict_a, dict_b], key = len))
results = [ k for (k,v) in [dict_a, dict_b][idx_longest].items() if k not in [dict_a, dict_b][1-idx_longest].keys() or v!=[dict_a, dict_b][1-idx_longest][k] ]
print(results)
> ['j']
or you can try with other pair of strings such as
a = 'abcaa'
b = 'aaa'
print(results)
> ['b', 'c']
as 'a' is in both string an equal number of times.

Related

Print all the "a" of a string on python

I'm trying to print all the "a" or other characters from an input string. I'm trying it with the .find() function but just print one position of the characters. How do I print all the positions where is an specific character like "a"

You can use find with while loop
a = "aabababaa"
k = 0
while True:
k = a.find("a", k) # This is saying to start from where you left
if k == -1:
break
k += 1
print(k)

This is also possible with much less amount of code if you don't care where you want to start.
a = "aabababaa"
for i, c in enumerate(a): # i = Index, c = Character. Using the Enumerate()
if ("a" in c):
print(i)

Some first-get-all-matches-and-then-print versions:
With a list comprehension:
s = "aslkbasaaa"
CHAR = "a"
pos = [i for i, char in enumerate(s) if char == CHAR]
print(*pos, sep='\n')
or with itertools and composition of functions
from itertools import compress
s = "aslkbasaaa"
CHAR = "a"
pos = compress(range(len(s)), map(CHAR.__eq__, s))
print(*pos, sep='\n')

how to extract the max numeric substring from a string?

when given a string. how to extract the biggest numeric substring without regex?
if for example given a string: 24some555rrr444
555 will be the biggest substring
def maximum(s1)
sub=[]
max=0
for x in s1
if x.isnummeric() and x>max
sub.append(x)
max=x
return max
what to do in order this code to work?
Thank you in advance!

Replace all non digits to a space, split the resulting word based on spaces, convert each number to a int and then find the max of them
>>> s = '24some555rrr444'
>>> max(map(int, ''.join(c if c.isdigit() else ' ' for c in s).split()))
555

You could use itertools.groupby to pull out the digits in groups and find the max:
from itertools import groupby
s = "24some555rrr444"
max(int(''.join(g)) for k, g in groupby(s, key=str.isdigit) if k)
# 555

Not using regex is wierd, but ok
s = "24some555rrr444"
n = len(s)
m = 0
for i in range(n):
for len in range(i + 1, n + 1):
try:
v = int(s[i:len])
m = v if v > m else m
except:
pass
print(m)
Or if want really want to compress it to basically one line (except the convert function), you can use
s = "24some555rrr444"
n = len(s)
def convert(s):
try:
return int(s)
except:
return -1
m = max(convert(s[i:l]) for i in range(n) for l in range(i + 1, n + 1))
print(m)

In order to stay in your mindset, i propose this, it is quite close from the initial demand:
#Picked a random number+string stringchain
OriginalStringChain="123245hjkh2313k313j23b"
#Creation of the list which will contain the numbers extracted from the formerly chosen stringchain
resultstable=[]
#The b variable will contain the extracted numbers as a string chain
b=""
for i,j in zip(OriginalStringChain, range(len(OriginalStringChain))):
c= j+1
#the is.digit() fonction is your sql isnumeric equivalent in python
#this bloc of if will extract numbers one by one and concatenate them, if they are several in a row, in a string chain of numbers before adding them in the resultstable
if i.isdigit() == True:
b+=i
if j < len(OriginalStringChain)-1 and OriginalStringChain[c].isdigit() == False:
resutstable.append(int(b))
elif j== len(OriginalStringChain)-1 and OriginalStringChain[j].isdigit() == True:
resultstable.append(int(b))
else:
b=""
print(resultstable)
#At the end, you just obtain a list where every values are the numbers we extracted previously and u just use the max function
print(max(resultstable))
I hope i was clear.
Cheers

Code to output the first repeated character in given string?

I'm trying to find the first repeated character in my string and output that character using python. When checking my code, I can see I'm not index the last character of my code.
What am I doing wrong?
letters = 'acbdc'
for a in range (0,len(letters)-1):
#print(letters[a])
for b in range(0, len(letters)-1):
#print(letters[b])
if (letters[a]==letters[b]) and (a!=b):
print(b)
b=b+1
a=a+1

You can do this in an easier way:
letters = 'acbdc'
found_dict = {}
for i in letters:
if i in found_dict:
print(i)
break
else:
found_dict[i]= 1
Output:
c

Here's a solution with sets, it should be slightly faster than using dicts.
letters = 'acbdc'
seen = set()
for letter in letters:
if letter in seen:
print(letter)
break
else:
seen.add(letter)

Here is a solution that would stop iteration as soon as it finds a dup
>>> from itertools import dropwhile
>>> s=set(); next(dropwhile(lambda c: not (c in s or s.add(c)), letters))
'c'

You should use range(0, len(letters)) instead of range(0, len(letters) - 1) because range already stops counting at one less than the designated stop value. Subtracting 1 from the stop value simply makes you skip the last character of letters in this case.
Please read the documentation of range:
https://docs.python.org/3/library/stdtypes.html#range

There were a few issues with your code...
1.Remove -1 from len(letters)
2.Move back one indent and do b = b + 1 even if you don't go into the if statement
3.Indent and do a = a + 1 in the first for loop.
See below of how to fix your code...
letters = 'acbdc'
for a in range(0, len(letters)):
# print(letters[a])
for b in range(0, len(letters)):
# print(letters[b])
if (letters[a] == letters[b]) and (a != b):
print(b)
b = b + 1
a = a + 1

Nice one-liner generator:
l = 'acbdc'
next(e for e in l if l.count(e)>1)
Or following the rules in the comments to fit the "abba" case:
l = 'acbdc'
next(e for c,e in enumerate(l) if l[:c+1].count(e)>1)

If complexity is not an issue then this will work fine.
letters = 'acbdc'
found = False
for i in range(0, len(letters)-1):
for j in range(i+1, len(letters)):
if (letters[i] == letters[j]):
print (letters[j])
found = True
break
if (found):
break

The below code prints the first repeated character in a string. I used the functionality of the list to solve this problem.
def findChar(inputString):
list = []
for c in inputString:
if c in list:
return c
else:
list.append(c)
return 'None'
print (findChar('gotgogle'))
Working fine as well. It gives the result as 'g'.

def first_repeated_char(str1):
for index,c in enumerate(str1):
if str1[:index+1].count(c) > 1:
return c
return "None"
print(first_repeated_char("abcdabcd"))

str_24 = input("Enter the string:")
for i in range(0,len(str_24)):
first_repeated_count = str_24.count(str_24[i])
if(first_repeated_count > 1):
break
print("First repeated char is:{} and character is
{}".format(first_repeated_count,str_24[i]))

frequency of desired characters in the given string [duplicate]

This question already has answers here:
return the top n most frequently occurring chars and their respective counts in python
(3 answers)
Closed 6 years ago.
Can anyone help me with the Python code to return the most frequently occurring characters and their respective counts in a given string?
For example, "aaaaabbbbccc", 2 should return [('a', 5), ('b', 4)]
In case of a tie, the character which comes later in alphabet comes first
in the ordering. For example:
"cdbba",2 -> [('b',2), ('d',1)]
'cdbba',3 -> [('b',2), ('d',1), ('c',1)]

This could be a start for you:
from collections import Counter
import string
s = "aaaaabbbbccc"
counter = Counter(s)
for c in string.ascii_letters:
if counter[c] > 0:
print (c, counter[c])
You can adjust the code, and instead of printing, save the results in a list, sort it, and print top 2 results

First of all - What have you tried so far?
Please, read this before asking!
To solve that, first create a dictionary of frequencies, after that convert it to an array of tuples, and lastly sort.
def freq(st, n):
# Create the dictionary
dct = {}
for c in st:
dct[c] = dct.get(c, 0) + 1
# Convert to array of tuples
ar = [(key, val) for key, val in dct.iteritems()]
# Sort the array:
ar.sort(key=lambda x: x[1])
return ar[:n]
This will not solve your question completely - you still need to figure out how to break the ties. Here is one way: create another dictionary of first occurrences (just a dict with character as the key, and index as value), and before returning check iterate again while identifying the ties, and resolving them.

def freq(word, n):
l1 = list(word)
l3 = []
l2 = []
l4 = []
s = set(l1)
l2 = list(s)
def count(p):
c = 0
for a in word:
if a in word:
c = c + p.count(a)
l3.append(c)
return c
l2.sort(key=count)
l3.sort()
l4 = list(zip(l2, l3))
l4.reverse()
l4.sort(key=lambda l4: ((l4[1]), (l4[0])), reverse=True)
return l4[0:n]
pass

def freq(word, n):
res=[]
a = []
b = []
c=0
if type(word) != str or type(n) != int:
c = 1
raise TypeError
if n <= 0:
c = 1
raise ValueError
for x in word:
a.append(x)
b.append(word.count(x))
c = zip(a, b)
c = list(set(c))
c.sort(key=lambda t: t[0], reverse=True)
c.sort(key=lambda t: t[1], reverse=True)
count=0
for x in range(len(c)):
count+=1
if n>count:
n=count
for x in range(n):
res.append(c[x],)
return res

Counter built-in class provides a function called most_common.
Return a list of the n most common elements and their counts from the
most common to the least. If n is omitted or None, most_common()
returns all elements in the counter. Elements with equal counts are
ordered arbitrarily:
After getting characters and their frequency, we need to sort them first for his frequency and if there is a tie, we select the character with the highest ascii code. Then, we just pick up the N most common elements slicing by any number.
from collections import Counter
s = "cdbba"
sorted_counter = sorted(Counter(s).most_common(),
key = lambda x: (-x[1], -ord(x[0])))
slice_counter = sorted_counter[:2] # here you select n most common
>[('b',2), ('d',1)]
slice_counter = sorted_counter[:3] # here you select n most common
>[('b',2), ('d',1), ('c',1)]

Longest Common Subsequence of Three Sequences

I've implemented my version of Longest Common Subsequence of Three Sequences and cannot find a mistake. But Coursera grader says that I fail last test case.
It means that the algorithm is almost corrects but outputs a wrong answer in a specific case. But what case is it?
Constraints: length of the list is not less than one and not bigger than 100. Numbers in the lists are integers from -10^9 to 10^9.
import sys
def lcs3(a, b, c):
start_b = 0
start_c = 0
result = []
for i in range(len(a)):
for j in range(start_b, len(b)):
if a[i] == b[j]:
for k in range(start_c, len(c)):
if b[j] == c[k]:
start_b = j+1
start_c = k+1
result.append(a[i])
break
if b[j] == c[k]:
break
return len(result)
def lcs3_reversed(a,b, c):
# check reversed sequence which can be with different order
a_rev = a[::-1]
b_rev = b[::-1]
c_rev = c[::-1]
result = lcs3(a, b, c)
result_reversed = lcs3(a_rev, b_rev, c_rev)
if result == result_reversed:
return result
else:
return result_reversed
if __name__ == '__main__':
input = sys.stdin.read()
data = list(map(int, input.split()))
an = data[0]
data = data[1:]
a = data[:an]
data = data[an:]
bn = data[0]
data = data[1:]
b = data[:bn]
data = data[bn:]
cn = data[0]
data = data[1:]
c = data[:cn]
print(lcs3_reversed(a, b, c))
Update: added the function lcs3_reversed to solve the cases described by you. Anyway cannot pass the test case.
Output should contain the length of common subsequence. For example, for input:
3
1 2 3
3
2 1 3
3
1 3 5
output is 2 because the common part is (1, 3) for these 3 lists.
Runtime for failed case is 0.04 seconds and it looks like that the lists are rather long since most of my own tests worked much faster.
Thanks for your help!
Update2: I've tried another version. First we find the Longest Common Subsequence of 2 lists and then use it again on our result and the 3-rd list.
def lcs2(a, b):
start_b = 0
result = []
for i in range(len(a)):
for j in range(start_b, len(b)):
if a[i] == b[j]:
start_b = j+1
result.append(a[i])
break
return result
def lcs2_reversed(a, b):
# check reversed sequence which can be with different order
a_rev = a[::-1]
b_rev = b[::-1]
result_reversed = lcs2(a_rev, b_rev)[::-1]
return result_reversed
def lcs3_reversed(a, b, c):
lcs2_str = lcs2(a, b)
lcs2_rev = lcs2_reversed(a, b)
lcs3_str_str = lcs2(lcs2_str, c)
lcs3_rev_rev = lcs2_reversed(lcs2_rev, c)
lenghts = [len(lcs3_str_str), len(lcs3_rev_rev)]
return max(lenghts)
if __name__ == '__main__':
an = input()
a = input().split()
bn = input()
b = input().split()
cn = input()
c = input().split()
print(max(lcs3_reversed(a, b, c), lcs3_reversed(a, c, b), lcs3_reversed(b, a, c),
lcs3_reversed(b, c, a), lcs3_reversed(c, a, b), lcs3_reversed(c, b, a)))
Moreover, I tried all the combinations of orders but it did not help... Again I cannot pass this last test case.

Your example breaks with something like:
a = [1,2,7,3,7]
b = [2,1,2,3,7]
c = [1,2,3,1,7]
The sequence should be [1,2,3,7] (if I understand the exercise correctly), but the problem is that the last element of a gets matched to the last elements of b and c, which means that start_b and start_c are set to the last elements and therefore the loops are over.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Find the difference between two strings of uneven length in python - python

a = 'abcdfjghij' b = 'abcdfjghi' Output : j def diff(a, b): string='' for val in a: if val not in b: string=val return string a = 'abcdfjghij' b = 'abcdfjghi' print(diff(a,b)) This code returns an empty string. Any solution for this?

In your example, there are 2 differences between the 2 strings : The letter g and j. I tested your code and it returns g because all the other letters from are in b: a = 'abcdfjghij' b = 'abcdfjhi' def diff(a, b): string='' for val in a: if val not in b: string=val return string print(diff(a,b))

Related

Print all the "a" of a string on python

how to extract the max numeric substring from a string?

Code to output the first repeated character in given string?

frequency of desired characters in the given string [duplicate]

Longest Common Subsequence of Three Sequences

Categories

Resources