Separate duplicates in a string into a list - python

I have the following string:
"TAUXXTAUXXTAUXX"
I want to make a list containing the following:
lst = ["TAUXX", "TAUXX", "TAUXX"]
How can I do this, and is there a string library in Python for it?
Thanks in advance.
P.S.: I want it in Python.

Find the string in its double:
s = 'TAUXXTAUXXTAUXX'
i = (s * 2).find(s, 1)
lst = len(s) // i * [s[:i]]
print(lst)
Output:
['TAUXX', 'TAUXX', 'TAUXX']
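Why the doubling trick works: the first position after 0 where s reappears inside s + s is the length of the shortest repeating unit. Below is a minimal sketch of the same idea wrapped in a function; the name split_repeats and the empty-string guard are assumptions added for illustration, not part of the answer above.

```python
def split_repeats(s):
    """Split s into copies of its shortest repeating unit."""
    if not s:
        return []  # assumption: an empty string yields an empty list
    # Searching from index 1 skips the trivial match at position 0.
    period = (s + s).find(s, 1)
    return [s[:period]] * (len(s) // period)


print(split_repeats("TAUXXTAUXXTAUXX"))  # ['TAUXX', 'TAUXX', 'TAUXX']
print(split_repeats("abc"))              # ['abc']
```

A string with no repetition comes back as a one-element list, because the only later match in the doubled string is at index len(s).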

There are many ways to do this.
I recommend the built-in re module:
import re
test_str = "TAUXXTAUXXTAUXX"
def splitstring(string):
    # Lazily capture the shortest prefix whose repetitions fill the whole string
    match = re.match(r'(.*?)(?:\1)*$', string)
    word = match.group(1)
    return [word] * (len(string) // len(word))
splitstring(test_str)
Output:
['TAUXX', 'TAUXX', 'TAUXX']

Related

How to replace characters of string from a list entry in Python?

I have a string in which I want to replace certain characters with "*". But Python's replace() function doesn't replace the characters. I understand that strings in Python are immutable, and I am creating a new variable to store the replaced string. But the function still doesn't give me the replaced string.
This is the code I have written. I have tried two ways but still don't get the desired output:
1st way:
a = "AGGCFTFGADFADTRFCAGFADARTRADFACDGFLKLIAP"
rep = ['A','C','P']
for char in rep:
    new = a.replace(char, "*")
print(new)
Output:
AGGCFTFGADFADTRFCAGFADARTRADFACDGFLKLIA*
2nd way:
a = "AGGCFTFGADFADTRFCAGFADARTRADFACDGFLKLIAP"
rep = ['A','C','P']
for i in a:
    if(i in rep):
        new = a.replace(i, "*")
print(new)
Output:
AGGCFTFGADFADTRFCAGFADARTRADFACDGFLKLIA*
Any help would be much appreciated. Thanks

You assign the result of a.replace(char, "*") to new, but then on the next iteration of the for loop, you again replace parts of a, not new. Instead of assigning to new, just assign the result to a, replacing the original string.
a = "AGGCFTFGADFADTRFCAGFADARTRADFACDGFLKLIAP"
rep = ['A','C','P']
for char in rep:
    a = a.replace(char, "*")
print(a)

In addition to the answers offered, I would suggest that regular expressions make this perhaps more straightforward, accomplishing all of the substitutions with a single function call.
>>> import re
>>> a = "AGGCFTFGADFADTRFCAGFADARTRADFACDGFLKLIAP"
>>> rep = ['A','C','P']
>>> r = re.compile('|'.join(rep))
>>> r.sub('*', a)
'*GG*FTFG*DF*DTRF**GF*D*RTR*DF**DGFLKLI**'
Just in case someone decides to be clever and puts something regex significant in rep, you could escape those when compiling your regex.
r = re.compile('|'.join(re.escape(x) for x in rep))
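To illustrate why escaping matters, here is a small sketch in which the characters to replace are themselves regex metacharacters. The function name mask and the sample input are hypothetical, chosen only for this example.

```python
import re


def mask(text, targets, replacement="*"):
    """Replace every occurrence of any string in targets with replacement."""
    if not targets:
        return text  # an empty pattern would match between every character
    # re.escape makes '.', '+', '(' and similar characters match literally.
    pattern = re.compile("|".join(re.escape(t) for t in targets))
    return pattern.sub(replacement, text)


print(mask("a.b+c", [".", "+"]))  # a*b*c
```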

Others have explained errors in posted code. An alternative using generator expression:
new = ''.join("*" if char in ['A','C','P'] else char for char in a)
print(new)
Output:
*GG*FTFG*DF*DTRF**GF*D*RTR*DF**DGFLKLI**

A simple loop is easy to understand and efficient. The crucial part of the looping approach is to re-assign the string reference to the output of replace().
I've taken the liberty of plagiarising two pieces of code from other contributors in order to demonstrate the performance differences (in case that's important).
import re
from timeit import timeit
a = "AGGCFTFGADFADTRFCAGFADARTRADFACDGFLKLIAP"
rep = 'A', 'C', 'P'
p = re.compile('|'.join(rep))
def v1(s):
    for c in rep:
        s = s.replace(c, '*')
    return s
def v2(s):
    return p.sub('*', s)
def v3(s):
    return ''.join("*" if char in rep else char for char in s)
for func in v1, v2, v3:
    print(func.__name__, timeit(lambda: func(a)))
assert v1(a) == v2(a)
assert v1(a) == v3(a)
Output:
v1 0.3363962830003402
v2 1.8725565750000897
v3 3.3800653280000006
Platform:
macOS 13.0.1
Python 3.11.0
3 GHz 10-Core Intel Xeon W

As already mentioned, you should write a = a.replace(i, "*") because you are looping through rep and you want to do the replacement in the string a. Strings are immutable, and replace gives back a copy of the string.
The variable new only holds the replacement from the last iteration over rep, which is the character P. The result is AGGCFTFGADFADTRFCAGFADARTRADFACDGFLKLIA* because there is only a single P at the end of the string and you never actually change the value of a.
If you have single characters, you can use a character class [ACP] with a single call to re.sub
import re
a = "AGGCFTFGADFADTRFCAGFADARTRADFACDGFLKLIAP"
print(re.sub("[ACP]", "*", a))
Output
*GG*FTFG*DF*DTRF**GF*D*RTR*DF**DGFLKLI**
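For single characters there is also a regex-free option that none of the answers above use: str.translate with a table built by str.maketrans. This is a sketch of that alternative, not a claim about which approach is fastest; the variable names table and masked are illustrative.

```python
a = "AGGCFTFGADFADTRFCAGFADARTRADFACDGFLKLIAP"
rep = ['A', 'C', 'P']

# Map each character in rep to "*"; all other characters pass through unchanged.
table = str.maketrans({ch: "*" for ch in rep})
masked = a.translate(table)
print(masked)  # *GG*FTFG*DF*DTRF**GF*D*RTR*DF**DGFLKLI**
```

Like replace(), translate() returns a new string and leaves a untouched.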

Using a function in Python to return a substring

I have a feeling my question is pretty basic, as I am a first semester computer science student.
I have been asked to return the substring formed before a digit in a string similar to "abcd5efgh". The idea is to use a function to give me "abcd". I think I need to use .isdigit, but I'm not sure how to turn it into a function. Thank you in advance!

It could be done with regexp, but if you already discovered isdigit, why not use it in this case?
You can modify the last return s line to return something else if no digit is found:
def string_before_digit(s):
    for i, c in enumerate(s):
        if c.isdigit():
            return s[:i]
    return s # no digit found
print(string_before_digit("abcd5efgh"))

I am also currently a student, and this is how I would approach this problem:
*At my school we are not allowed to use built-in functions like that in Python :/
def parse(string):
    newstring = ""
    for i in string:
        if i >= "0" and i <= "9":
            break
        else:
            newstring += i
    print(newstring) # Use return instead if you need the result in another function
parse("abcd5efgh")
Hope this helps

A functional approach :)
>>> from itertools import compress, count
>>> text = "abcd5efgh"
>>> text[:next(compress(count(), map(str.isdigit, text)), len(text))]
'abcd'

The code below gives you the leading non-digit part using a regular expression.
import re
myPattern = re.compile(r'\D*') # \D matches any character that is not a digit
firstNonDigitPart = myPattern.match('abcd5efgh')
firstNonDigitPart.group()
>>> 'abcd'
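A variation on the regex idea, sketched as a function as the question asks: search for the first digit and slice up to it. The name before_first_digit is an assumption, and returning the whole string when there is no digit is one possible choice among several.

```python
import re


def before_first_digit(text):
    """Return the part of text that precedes its first digit."""
    match = re.search(r"\d", text)
    if match is None:
        return text  # assumption: no digit means the whole string is returned
    return text[:match.start()]


print(before_first_digit("abcd5efgh"))  # abcd
```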

If you are not allowed to use regexes, maybe because you were told to do it explicitly by hand, you can do it like this:
def digit_index(s):
    """Helper function: index of the first digit in s, or -1 if there is none."""
    # next(..., -1) asks the given iterator for the next value and returns -1 if there is none.
    # The generator yields the index n of every character c that is a digit, so its first value is the index of the first digit.
    return next(
        (n for n, c in enumerate(s) if c.isdigit()),
        -1)
def before_digit(s):
    di = digit_index(s)
    if di == -1: return s
    return s[:di]
This should give you the result you want.

A quite simple one-liner, using isdigit :)
>>> s = 'abcd5efgh'
>>> s[:[i for i, j in enumerate([_ for _ in s]) if j.isdigit()][0]]
'abcd'

An itertools approach:
>>> from itertools import takewhile
>>> s="abcd5efgh"
>>> ''.join(takewhile(lambda x: not x.isdigit(), s))
'abcd'

python string manipulation

I have a string s with nested brackets: s = "AX(p>q)&E((-p)Ur)"
I want to remove all characters between all pairs of brackets and store the result in a new string like this: new_string = "AX&E"
I tried doing this:
p = re.compile("\(.*?\)", re.DOTALL)
new_string = p.sub("", s)
It gives the output: AX&EUr)
Is there any way to correct this, rather than iterating over each element in the string?

Another simple option is removing the innermost parentheses at every stage, until there are no more parentheses:
p = re.compile(r"\([^()]*\)")
count = 1
while count:
    s, count = p.subn("", s)
Working example: http://ideone.com/WicDK

You can just use string manipulation without regular expression
>>> s = "AX(p>q)&E(qUr)"
>>> [ i.split("(")[0] for i in s.split(")") ]
['AX', '&E', '']
I leave it to you to join the strings up.

>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> re.compile("""\([^\)]*\)""").sub('', s)
'AX&E'

Yeah, it should be:
>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> p = re.compile("\(.*?\)", re.DOTALL)
>>> new_string = p.sub("", s)
>>> new_string
'AX&E'

Nested brackets (or tags, ...) cannot be handled in a general way using regex. See http://www.amazon.de/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124/ref=sr_1_1?ie=UTF8&s=gateway&qid=1304230523&sr=8-1-spell for details on why. You would need a real parser.
It's possible to construct a regex that handles two levels of nesting, but it is already ugly, and one for three levels is quite long. And you don't want to think about four levels. ;-)
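For this particular task a full parser is not required; tracking the nesting depth in a single pass is enough. The following is a minimal sketch under that assumption. The function name strip_parenthesized is hypothetical, and an unmatched ")" is simply dropped.

```python
def strip_parenthesized(text):
    """Remove every parenthesized group, including nested ones."""
    depth = 0
    kept = []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth > 0:
                depth -= 1
        elif depth == 0:
            # Only characters outside all brackets are kept.
            kept.append(ch)
    return "".join(kept)


print(strip_parenthesized("AX(p>q)&E((-p)Ur)"))  # AX&E
```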

You can use PyParsing to parse the string:
from pyparsing import nestedExpr
import sys
s = "AX(p>q)&E((-p)Ur)"
expr = nestedExpr('(', ')')
result = expr.parseString('(' + s + ')').asList()[0]
s = ''.join(filter(lambda x: isinstance(x, str), result))
print(s)
Most code is from: How can a recursive regexp be implemented in python?

You could use re.subn():
import re
s = 'AX(p>q)&E((-p)Ur)'
while True:
    s, n = re.subn(r'\([^)(]*\)', '', s)
    if n == 0:
        break
print(s)
Output
AX&E

this is just how you do it:
# strings
# double and single quotes use in Python
"hey there! welcome to CIP"
'hey there! welcome to CIP'
"you'll understand python"
'i said, "python is awesome!"'
'i can\'t live without python'
# use of 'r' before string
print(r"\new code", "\n")
first = "code in"
last = "python"
first + last #concatenation
# slicing of strings
user = "code in python!"
print(user)
print(user[5]) # print an element
print(user[-3]) # print an element from rear end
print(user[2:6]) # slicing the string
print(user[:6])
print(user[2:])
print(len(user)) # length of the string
print(user.upper()) # convert to uppercase
print(user.lstrip())
print(user.rstrip())
print(max(user)) # max alphabet from user string
print(min(user)) # min alphabet from user string
print(user.join(["1", "2", "3", "4"])) # join() needs a list of strings, not ints
input()

Find inside a string in Python

There is a string, it contains numbers and characters.
I need to find an entire number(s) (in that string) that contains number 467033.
e.g. 1.467033777777777
Thanks

Try this:
import re
RE_NUM = re.compile(r'(\d*\.\d+)', re.M)
text = 'eghwodugo83o135.13508yegn1.4670337777777773u87208t'
for num in RE_NUM.findall(text):
    if '467033' in num:
        print(num)
Prints:
1.4670337777777773
Generalized / optimized in response to comment:
def find(text, numbers):
    pattern = '|'.join(r'[\d.]*%s[\d.]*' % n for n in numbers)
    re_num = re.compile(pattern, re.M)
    return [m.group() for m in re_num.finditer(text)]
print(find(text, ['467033', '13']))
Prints:
['135.13508', '1.4670337777777773']

If you're just searching for a substring within another string, you can use in:
>>> sub_num = "467033"
>>> my_num = "1.467033777777777"
>>> sub_num in my_num
True
However, I suspect there's more to your problem than just searching strings, and that doing it this way might not be optimal. Can you be more specific about what you're trying to do?

import re
a = 'e.g. 1.467033777777777\nand also 576575567467033546.90 Thanks '
r = re.compile('[0-9.]*467033[0-9.]*')
r.findall(a)
['1.467033777777777', '576575567467033546.90']
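The character class [0-9.] also accepts runs with several dots. If that matters, one option is to extract well-formed number tokens first and filter them afterwards. This is a sketch under the assumption that a number is a run of digits with at most one decimal part; NUMBER and numbers_containing are illustrative names.

```python
import re

# Assumption: a number is one or more digits, optionally followed by ".digits".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def numbers_containing(text, digits):
    """Return every number token in text that contains the substring digits."""
    return [token for token in NUMBER.findall(text) if digits in token]


a = 'e.g. 1.467033777777777\nand also 576575567467033546.90 Thanks '
print(numbers_containing(a, "467033"))
```

As with the answers above, the digits are matched as plain text, so a sequence split by the decimal point (for example 46.7033) is not found.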

Python: find most frequent bytes?

I'm looking for a (preferably simple) way to find and order the most common bytes in a python stream element.
e.g.
>>> freq_bytes(b'hello world')
b'lohe wrd'
or even
>>> freq_bytes(b'hello world')
[108,111,104,101,32,119,114,100]
I currently have a function that returns a list in the form list[97] == occurrences of "a". I need that to be sorted.
I figure I basically need to flip the list so list[a] = b --> list[b] = a, at the same time removing the repeats.

Try the Counter class in the collections module.
from collections import Counter
string = "hello world"
print(''.join(char[0] for char in Counter(string).most_common()))
Note you need Python 2.7 or later.
Edit: Forgot the most_common() method returned a list of value/count tuples, and used a generator expression to get just the values.
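The question asks about bytes rather than text. Here is a sketch of the same Counter idea applied to a bytes object in Python 3, where iterating over bytes yields integers. The function name freq_bytes comes from the question; the tie order shown assumes Python 3.7 or later, where equal counts keep first-seen order.

```python
from collections import Counter


def freq_bytes(data):
    """Return the distinct bytes of data, most frequent first."""
    # most_common() yields (byte_value, count) pairs sorted by count.
    return bytes(value for value, _count in Counter(data).most_common())


print(freq_bytes(b"hello world"))        # b'lohe wrd'
print(list(freq_bytes(b"hello world")))  # [108, 111, 104, 101, 32, 119, 114, 100]
```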

def frequent_bytes(aStr):
    d = {}
    for char in aStr:
        d[char] = d.setdefault(char, 0) + 1
    myList = []
    for char, frequency in d.items():
        myList.append((frequency, char))
    myList.sort(reverse=True)
    # Keep only the characters; join() cannot take (frequency, char) tuples
    return ''.join(char for frequency, char in myList)
>>> frequent_bytes('hello world')
'lowrhed '
I just tried something obvious. @kindall's answer rocks, though. :)
