Reverse a string based on custom delimiter - python

I have a string:
txt = "Hello$JOHN$*How*Are*$You"
I want output like:
Output: "You*$Are*How$*JOHN$Hello"
If you look closely, the delimiter characters ($ and *) are NOT reversed in their order of occurrence. The string is reversed word by word, but each run of delimiters keeps its original order.
I have tried the following:
sep = ['$', '*']
txt_1 = ""
for ch in txt:
    if ch in sep:
        txt_1 = txt_1 + ch
I can't come up with the logic to capture the sequence of the delimiters and reverse the words of the string.

One approach using regex:
import re
s = "Hello$JOHN$*How*Are*$You"
splits = re.split('([$*]+)', s)
res = ''.join(reversed(splits))
print(res)
Output
You*$Are*How$*JOHN$Hello
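The same split-and-reverse idea can be wrapped in a reusable function. This is only a sketch: the name reverse_words_keep_delimiters and its delimiters parameter are hypothetical, and re.escape is used so that arbitrary delimiter characters can be placed safely inside the character class.

```python
import re

def reverse_words_keep_delimiters(text, delimiters="$*"):
    # Build a character class from the delimiters; re.escape guards against
    # characters such as "]", "^" or "-" that are special inside a class.
    pattern = "([" + "".join(re.escape(c) for c in delimiters) + "]+)"
    # The capturing group makes re.split keep each delimiter run as its own item.
    parts = re.split(pattern, text)
    # Reversing the list reverses the order of words and delimiter runs,
    # while each delimiter run keeps its internal character order.
    return "".join(reversed(parts))

print(reverse_words_keep_delimiters("Hello$JOHN$*How*Are*$You"))  # You*$Are*How$*JOHN$Hello
```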
A perhaps less elegant, but easier to understand, solution is to use itertools.groupby:
from itertools import groupby
s = "Hello$JOHN$*How*Are*$You"
splits = [''.join(g) for k, g in groupby(s, key=lambda x: x in ('$', '*'))]
res = ''.join(reversed(splits))
print(res)
The idea here is to split the string into contiguous runs of delimiter and non-delimiter characters, then reverse the order of those runs.
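For comparison, here is a plain-loop sketch with no imports that finishes the approach from the question: it walks the string once, starts a new run whenever it switches between delimiter and non-delimiter characters, and then reverses the runs. The function name reverse_runs is hypothetical.

```python
def reverse_runs(text, delimiters=("$", "*")):
    runs = []       # contiguous runs of delimiter / non-delimiter characters
    current = ""    # the run currently being built
    for ch in text:
        # A change of kind (delimiter vs. word character) closes the current run.
        if current and (ch in delimiters) != (current[-1] in delimiters):
            runs.append(current)
            current = ""
        current += ch
    if current:
        runs.append(current)
    # Reverse the order of the runs; characters inside each run stay in order.
    return "".join(reversed(runs))

print(reverse_runs("Hello$JOHN$*How*Are*$You"))  # You*$Are*How$*JOHN$Hello
```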

Related

How can I check a string for two letters or more?

I am pulling data from a table that changes often using Python, and the method I am using is not ideal. I would like a method that pulls all strings containing at most one letter and leaves out anything with two or more.
An example of data I might get:
115
19A6
HYS8
568
In this example, I would like to pull 115, 19A6, and 568.
Currently I am using the isdigit() method, which filters out every string that contains a letter, even those with only one. That works for some purposes, but is less than ideal.
Try this:
string_list = ["115", "19A6", "HYS8", "568"]
output_list = []
for item in string_list:  # go through the string list
    letter_counter = 0
    for letter in item:  # go through the characters of one string
        if not letter.isdigit():  # count every character that is not a digit
            letter_counter += 1
    if letter_counter < 2:  # strings with more than one letter are left out
        output_list.append(item)
print(output_list)
Output:
['115', '19A6', '568']
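The counting loop above can be condensed with sum() over a generator expression, since True counts as 1. A sketch, making the same assumption as the loop that every non-digit character counts as a letter:

```python
string_list = ["115", "19A6", "HYS8", "568"]

# sum() adds 1 for every character that is not a digit
output_list = [item for item in string_list
               if sum(not ch.isdigit() for ch in item) < 2]
print(output_list)  # ['115', '19A6', '568']
```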
Here is a one-liner with a regular expression:
import re
data = ["115", "19A6", "HYS8", "568"]
out = [string for string in data if len(re.sub(r"\d", "", string)) < 2]
print(out)
Output:
['115', '19A6', '568']
This is an excellent case for regular expressions (regex), which are available through the built-in re module.
The code below follows this logic:
Define the dataset. Two examples ('H' and 'HI') have been added to show that a lone letter is accepted, while a string containing two letters is rejected.
Compile a pattern to be matched: zero or more digits, followed by zero or one uppercase letter, ending with zero or more digits.
Use the filter function to keep the items of the data list that match, and convert the result to a list.
For example:
import re
data = ['115', '19A6', 'HYS8', '568', 'H', 'HI']
rexp = re.compile(r'^\d*[A-Z]{0,1}\d*$')
result = list(filter(rexp.match, data))
print(result)
Output:
['115', '19A6', '568', 'H']
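A variant of the same pattern, sketched with re.fullmatch so that the ^ and $ anchors are unnecessary, and with [A-Za-z] instead of [A-Z]. Accepting lowercase letters is an assumption beyond the answer above, which only admits uppercase; the extra '1b2' sample is hypothetical.

```python
import re

data = ['115', '19A6', 'HYS8', '568', 'H', 'HI', '1b2']
# fullmatch requires the whole string to match: digits, at most one letter, digits
pattern = re.compile(r'\d*[A-Za-z]?\d*')
result = [s for s in data if pattern.fullmatch(s)]
print(result)  # ['115', '19A6', '568', 'H', '1b2']
```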
Another solution, without re, using str.maketrans/str.translate:
lst = ["115", "19A6", "HYS8", "568"]
d = str.maketrans(dict.fromkeys(map(str, range(10)), ""))
out = [i for i in lst if len(i.translate(d)) < 2]
print(out)
Prints:
['115', '19A6', '568']
a = "19A6"  # the value to check (example input)
z = False
a = str(a)
for i in range(len(a)):
    if a[i].isdigit():
        z = True
        break
else:  # runs only if the loop finished without break
    z = "no digit"
print(z)
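The loop above checks whether a contains at least one digit. The same check can be sketched with any() over a generator expression; the name has_digit is hypothetical and, unlike z, it always returns a bool:

```python
def has_digit(value):
    # any() stops at the first character for which isdigit() is True
    return any(ch.isdigit() for ch in str(value))

print(has_digit("19A6"))  # True
print(has_digit("HYS"))   # False
```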

How to replace a character within a string in a list?

I have a list whose elements are strings. Each item contains characters that I want to remove. For example, I have the list = ["string1.", "string2."]. The unwanted character is "." and I don't want it in any element of the list. My desired list should look like list = ["string1", "string2"]. Any help? I have several special characters to remove, so the code must be reusable.
hola = ["holamundoh","holah","holish"]
print(hola[0])
print(hola[0][0])
for i in range(0,len(hola),1):
    for j in range(0,len(hola[i]),1):
        if (hola[i][j] == "h"):
            hola[i] = hola[i].translate({ord('h'): None})
print(hola)
However, I get an error in the if condition: "string index out of range". Any help? Thanks.
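Strings in Python are immutable, so every "modification" creates a new string. In your loop, each translate() call makes hola[i] shorter, but range(len(hola[i])) was computed from the original length, so j eventually runs past the end of the string. It is simpler to build the new strings directly: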
list_ = ["string1.", "string2."]
for i, s in enumerate(list_):
    list_[i] = s.replace('.', '')
Or, without a loop:
list_ = ["string1.", "string2."]
list_ = list(map(lambda s: s.replace('.', ''), list_))
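Applied to the hola loop from the question, the same fix means dropping the inner loop over characters: translate already removes every occurrence in the string, so only the list index is needed. A minimal sketch:

```python
hola = ["holamundoh", "holah", "holish"]

# Loop over list positions only; translate removes every "h" in one call,
# so there is no per-character index that can run past the end.
for i in range(len(hola)):
    hola[i] = hola[i].translate({ord('h'): None})

print(hola)  # ['olamundo', 'ola', 'olis']
```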
You can define a function that removes an unwanted character:
def remove_unwanted(original, unwanted):
    return [x.replace(unwanted, "") for x in original]
Then call the function like this to get the result:
print(remove_unwanted(hola, "."))
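Because several special characters have to be removed, one option is to delete them all in a single pass with str.maketrans, whose optional third argument lists characters to map to None. A sketch; the function name remove_chars is hypothetical:

```python
def remove_chars(items, chars):
    # The third argument of str.maketrans lists characters to delete.
    table = str.maketrans("", "", chars)
    return [s.translate(table) for s in items]

print(remove_chars(["string1.", "string2."], "."))  # ['string1', 'string2']
print(remove_chars(["a.b!c", "d?e"], ".!?"))        # ['abc', 'de']
```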
Use str.replace for simple replacements:
lst = [s.replace('.', '') for s in lst]
Or use re.sub for more powerful and more complex regular expression-based replacements:
import re
lst = [re.sub(r'[.]', '', s) for s in lst]
Here is an example of a more complex replacement that you may find useful, removing everything that is not a word character:
import re
lst = [re.sub(r'[\W]+', '', s) for s in lst]

Split by comma but exclude commas inside quotes in python

I am struggling to split this string on commas; commas inside double quotes should be ignored.
cStr = 'aaaa,bbbb,"ccc,ddd"'
expected result : ['aaaa','bbbb',"ccc,ddd" ]
Please help. I tried the different methods mentioned in the solution below, but couldn't resolve this issue. (I am not allowed to use the csv or pyparsing modules.)
A similar question was asked before for the input below:
cStr = '"aaaa","bbbb","ccc,ddd"'
Solution:
result = ['"aaaa"','"bbbb"','"ccc,ddd"']
The usual way I handle this is to use a regex alternation that eagerly matches double-quoted terms first, before non-quoted CSV terms:
import re
cStr = 'aaaa,bbbb,"ccc,ddd"'
matches = re.findall(r'(".*?"|[^,]+)', cStr)
print(matches) # ['aaaa', 'bbbb', '"ccc,ddd"']
You could use a list comprehension; no other libraries are needed:
cStr = 'aaaa,bbbb,"ccc,ddd"'
# split on ," and then split each piece on , unless it ends with a double quote
l = [
    item.split(',') if not item.endswith('"') else [item[:-1]]
    for item in cStr.split(',"')
]
print(sum(l, []))
Out:
['aaaa', 'bbbb', 'ccc,ddd']
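This can be achieved in three steps:
cstr = 'aaaa,bbbb,"ccc,ddd","eee,fff,ggg"'
Step 1: split on ," so that every quoted field starts a new piece.
X = cstr.split(',"')
Step 2: split the unquoted pieces on commas and keep the quoted pieces whole.
regular_list = [i if '"' in i else i.split(",") for i in X]
Step 3: flatten the result, restoring the opening quote that the split removed.
final_list = []
for i in regular_list:
    if isinstance(i, list):
        for j in i:
            final_list.append(j)
    else:
        final_list.append('"' + i)
Final output: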
['aaaa', 'bbbb', '"ccc,ddd"', '"eee,fff,ggg"']
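Since csv and pyparsing are off-limits, another option is a small character-by-character scanner that tracks whether it is inside double quotes. This is a sketch: the name split_outside_quotes is hypothetical, it keeps the quote characters (matching the expected output in the question), and it does not handle escaped quotes such as "" inside a field.

```python
def split_outside_quotes(text, sep=",", quote='"'):
    fields = []        # completed fields
    current = []       # characters of the field being built
    in_quotes = False  # True while between an opening and a closing quote
    for ch in text:
        if ch == quote:
            in_quotes = not in_quotes
            current.append(ch)  # keep the quote character in the field
        elif ch == sep and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))  # the last field has no trailing separator
    return fields

print(split_outside_quotes('aaaa,bbbb,"ccc,ddd"'))  # ['aaaa', 'bbbb', '"ccc,ddd"']
```

Unlike the regex and split-based answers, this keeps empty fields, so 'a,,b' gives ['a', '', 'b'].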

Possible occurrences of splitting a string by delimiter

I have a string: str = "Quote_Policy_Generalparty_NameInfo"
I split the string with str.split("_"), which gives me a list.
Any help getting the output below is appreciated.
[ Quote, Quote_Policy, Quote_Policy_Generalparty, Quote_Policy_Generalparty_NameInfo ]
You can use range(len(list)) to create the slices list[:1], list[:2], etc., and then "_".join(...) to concatenate each slice:
text = "Quote_Policy_Generalparty_NameInfo"
data = text.split('_')
result = []
for x in range(len(data)):
    part = data[:x+1]
    part = "_".join(part)
    result.append(part)
print(result)
input = "Quote_Policy_Generalparty_NameInfo"
tokenized = input.split("_")
combined = [
    "_".join(tokenized[:i])
    for i, token in enumerate(tokenized, 1)
]
The value of combined is then:
['Quote', 'Quote_Policy', 'Quote_Policy_Generalparty', 'Quote_Policy_Generalparty_NameInfo']
You could use accumulate from itertools, passing it a second argument: a function that decides how to combine two elements.
from itertools import accumulate
input = "Quote_Policy_Generalparty_NameInfo"
output = [*accumulate(input.split('_'), lambda str1, str2 : '_'.join([str1,str2])),]
which gives:
['Quote',
'Quote_Policy',
'Quote_Policy_Generalparty',
'Quote_Policy_Generalparty_NameInfo']
If you find the above answers too clean and satisfactory, you can also consider regular expressions:
>>> import regex as re # For `overlapped` support
>>> x = "Quote_Policy_Generalparty_NameInfo"
>>> list(map(lambda s: s[::-1], re.findall('(?<=_).*$', '_' + x[::-1], overlapped=True)))
['Quote_Policy_Generalparty_NameInfo', 'Quote_Policy_Generalparty', 'Quote_Policy', 'Quote']
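Yet another option avoids split and join altogether: every prefix ends just before an underscore, so slicing the original string at each underscore position yields the same list. A sketch, assuming a single-character separator; the function name prefixes is hypothetical.

```python
def prefixes(text, sep="_"):
    # text[:i] for each separator position i gives every prefix except the
    # full string, which is appended at the end.
    return [text[:i] for i, ch in enumerate(text) if ch == sep] + [text]

print(prefixes("Quote_Policy_Generalparty_NameInfo"))
# ['Quote', 'Quote_Policy', 'Quote_Policy_Generalparty', 'Quote_Policy_Generalparty_NameInfo']
```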

split and flatten tuple of tuples

What is the best way to split and flatten the tuple of tuples below?
I have this tuple of tuples:
(('aaaa_BBB_wacker* cccc',), ('aaaa_BBB_tttt*',), ('aaaa_BBB2_wacker,aaaa_BBB',), ('BBB_ffff',), ('aaaa_BBB2MM*\r\naaaa_BBB_cccc2MM*',), ('BBBMM\\r\\nBBB2MM BBB',), ('aaaa_BBB_cccc2MM_tttt',), ('aaaa_BBB_tttt, aaaa_BBB',))
I need to:
split by \r and \n (the real characters), the literal sequences \\r and \\n (a backslash followed by r or n), "," and " "
and flatten it. So the end result should look like this:
['aaaa_BBB_wacker*','cccc', 'aaaa_BBB_tttt*','aaaa_BBB2_wacker','aaaa_BBB','BBB_ffff','aaaa_BBB2MM*','naaaa_BBB_cccc2MM*','BBBMM','BBB2MM BBB','aaaa_BBB_cccc2MM_tttt','aaaa_BBB_tttt', 'aaaa_BBB']
I tried the following, and it eventually does the job, but I have to repeat it for each pattern:
patterns = [[i.split('\\r') for i in patterns]]
patterns = [item for sublist in patterns for item in sublist]
patterns = [item for sublist in patterns for item in sublist]
patterns = [[i.split('\\n') for i in patterns]]
You should use a regexp to split the strings:
import re
re.split(r'[\n\r, ]+', s)
It will be easier using a loop:
patterns = []
for tup in l:  # l is the tuple of tuples from the question
    for s in tup:
        patterns += re.split(r'[\n\r, ]+', s)
Given
tups = (('aaaa_BBB_wacker* cccc',), ('aaaa_BBB_tttt*',),
('aaaa_BBB2_wacker,aaaa_BBB',), ('BBB_ffff',),
('aaaa_BBB2MM*\r\naaaa_BBB_cccc2MM*',), ('BBBMM\\r\\nBBB2MM BBB',),
('aaaa_BBB_cccc2MM_tttt',), ('aaaa_BBB_tttt, aaaa_BBB',))
Do
import re
delimiters = ('\r', '\n', ',', ' ', '\\r', '\\n')
pat = '(?:{})+'.format('|'.join(map(re.escape, delimiters)))
result = [s for tup in tups for item in tup for s in re.split(pat, item)]
Notes: calling re.escape on your delimiters makes sure they are properly escaped for the regular expression. | makes them alternatives. ?: makes the delimiter group non-capturing, so it isn't returned by re.split. + matches the group one or more times, so a run of consecutive delimiters such as ", " or "\r\n" counts as a single split point.
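The approach in that answer can be packaged as a reusable function. This sketch (the name split_flatten is hypothetical) also drops the empty strings that re.split returns when an item starts or ends with a delimiter:

```python
import re

def split_flatten(tups, delimiters=('\r', '\n', ',', ' ', '\\r', '\\n')):
    # One alternation of the escaped delimiters, repeated one or more times.
    pat = '(?:{})+'.format('|'.join(map(re.escape, delimiters)))
    # Flatten the tuples and split each string; "if s" drops empty pieces.
    return [s for tup in tups for item in tup for s in re.split(pat, item) if s]

tups = (('aaaa_BBB_wacker* cccc',), ('aaaa_BBB_tttt*',),
        ('aaaa_BBB2_wacker,aaaa_BBB',), ('BBB_ffff',),
        ('aaaa_BBB2MM*\r\naaaa_BBB_cccc2MM*',), ('BBBMM\\r\\nBBB2MM BBB',),
        ('aaaa_BBB_cccc2MM_tttt',), ('aaaa_BBB_tttt, aaaa_BBB',))
print(split_flatten(tups))
```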
Here is a one-liner, though it's not simple. You can chain as many .replace() calls as you need:
start = (('aaaa_BBB_wacker* cccc',), ('aaaa_BBB_tttt*',), ('aaaa_BBB2_wacker,aaaa_BBB',), ('BBB_ffff',), ('aaaa_BBB2MM*\r\naaaa_BBB_cccc2MM*',), ('BBBMM\\r\\nBBB2MM BBB',), ('aaaa_BBB_cccc2MM_tttt',), ('aaaa_BBB_tttt, aaaa_BBB',))
output = [final_item for sublist in start for item in sublist for final_item in item.replace('\\r', ' ').replace('\\n', ' ').replace(',', ' ').split()]
