I have a string : str = "**Quote_Policy_Generalparty_NameInfo** "
I am splitting the string as str.split("_") which gives me a list in python.
Any help in getting the output as below is appreciated.
[ Quote, Quote_Policy, Quote_Policy_Generalparty, Quote_Policy_Generalparty_NameInfo ]
You can use range(len(list)) to create slices list[:1], list[:2], etc. and then "_".join(...) to concatenate every slice
text = "Quote_Policy_Generalparty_NameInfo "
data = text.split('_')
result = []
for x in range(len(data)):
part = data[:x+1]
part = "_".join(part)
result.append(part)
print(result)
input = "Quote_Policy_Generalparty_NameInfo"
tokenized = input.split("_")
combined = [
"_".join(tokenized[:i])
for i, token in enumerate(tokenized, 1)
]
The value of combined above will be
['Quote', 'Quote_Policy', 'Quote_Policy_Generalparty', 'Quote_Policy_Generalparty_NameInfo']
you could use accumulate from itertools, we basically give it one more argument, which decides how to accumulate two elements
from itertools import accumulate
input = "Quote_Policy_Generalparty_NameInfo"
output = [*accumulate(input.split('_'), lambda str1, str2 : '_'.join([str1,str2])),]
which gives :
Out[11]:
['Quote',
'Quote_Policy',
'Quote_Policy_Generalparty',
'Quote_Policy_Generalparty_NameInfo']
If you find the above answers too clean and satisfactory, you can also consider regular expressions:
>>> import regex as re # For `overlapped` support
>>> x = "Quote_Policy_Generalparty_NameInfo"
>>> list(map(lambda s: s[::-1], re.findall('(?<=_).*$', '_' + x[::-1], overlapped=True)))
['Quote_Policy_Generalparty_NameInfo', 'Quote_Policy_Generalparty', 'Quote_Policy', 'Quote']
Related
I am pulling data from a table that changes often using Python - and the method I am using is not ideal. What I would like to have is a method to pull all strings that contain only one letter and leave out anything that is 2 or more.
An example of data I might get:
115
19A6
HYS8
568
In this example, I would like to pull 115, 19A6, and 568.
Currently I am using the isdigit() method to determine if it is a digit and this filters out all numbers with one letter, which works for some purposes, but is less than ideal.
Try this:
string_list = ["115", "19A6", "HYS8", "568"]
output_list = []
for item in string_list: # goes through the string list
letter_counter = 0
for letter in item: # goes through the letters of one string
if not letter.isdigit(): # checks if the letter is a digt
letter_counter += 1
if letter_counter < 2: # if the string has more then 1 letter it wont be in output list
output_list.append(item)
print(output_list)
Output:
['115', '19A6', '568']
Here is a one-liner with a regular expression:
import re
data = ["115", "19A6", "HYS8", "568"]
out = [string for string in data if len(re.sub("\d", "", string))<2]
print(out)
Output:
['115', '19A6', '568']
This is an excellent case for regular expressions (regex), which is available as the built-in re library.
The code below follows the logic:
Define the dataset. Two examples have been added to show that a string containing two alpha-characters is rejected.
Compile a character pattern to be matched. In this case, zero or more digits, followed by zero or one upper case letter, ending with zero of more digits.
Use the filter function to detect matches in the data list and output as a list.
For example:
import re
data = ['115', '19A6', 'HYS8', '568', 'H', 'HI']
rexp = re.compile('^\d*[A-Z]{0,1}\d*$')
result = list(filter(rexp.match, data))
print(result)
Output:
['115', '19A6', '568', 'H']
Another solution, without re using str.maketrans/str.translate:
lst = ["115", "19A6", "HYS8", "568"]
d = str.maketrans(dict.fromkeys(map(str, range(10)), ""))
out = [i for i in lst if len(i.translate(d)) < 2]
print(out)
Prints:
['115', '19A6', '568']
z=False
a = str(a)
for I in range(len(a)):
if a[I].isdigit():
z = True
break
else:
z="no digit"
print(z)```
I have a list that has some elements of type string. Each item in the list has characters that are unwanted and want to be removed. For example, I have the list = ["string1.", "string2."]. The unwanted character is: ".". Therefore, I don't want that character in any element of the list. My desired list should look like list = ["string1", "string2"] Any help? I have to remove some special characters; therefore, the code must be used several times.
hola = ["holamundoh","holah","holish"]
print(hola[0])
print(hola[0][0])
for i in range(0,len(hola),1):
for j in range(0,len(hola[i]),1):
if (hola[i][j] == "h"):
hola[i] = hola[i].translate({ord('h'): None})
print(hola)
However, I have an error in the conditional if: "string index out of range". Any help? thanks
Modifying strings is not efficient in python because strings are immutable. And when you modify them, the indices may become out of range at the end of the day.
list_ = ["string1.", "string2."]
for i, s in enumerate(list_):
l[i] = s.replace('.', '')
Or, without a loop:
list_ = ["string1.", "string2."]
list_ = list(map(lambda s: s.replace('.', ''), list_))
You can define the function for removing an unwanted character.
def remove_unwanted(original, unwanted):
return [x.replace(unwanted, "") for x in original]
Then you can call this function like the following to get the result.
print(remove_unwanted(hola, "."))
Use str.replace for simple replacements:
lst = [s.replace('.', '') for s in lst]
Or use re.sub for more powerful and more complex regular expression-based replacements:
import re
lst = [re.sub(r'[.]', '', s) for s in lst]
Here are a few examples of more complex replacements that you may find useful, e.g., replace everything that is not a word character:
import re
lst = [re.sub(r'[\W]+', '', s) for s in lst]
I am struggling to split this string on the basis of comma but comma inside the double quotes should be ignored.
cStr = 'aaaa,bbbb,"ccc,ddd"'
expected result : ['aaaa','bbbb',"ccc,ddd" ]
please help me, I tried different methods as mentioned in below soln but couldn't resolve this issue [I am not allowed to use csv, pyparsing module]
there is already similar question asked before for the below input.
cStr = '"aaaa","bbbb","ccc,ddd"'
solution
result = ['"aaa"','"bbb"','"ccc,ddd"']
The usual way I handle this is to use a regex alternation which eagerly matches double quoted terms first, before non quoted CSV terms:
import re
cStr = 'aaaa,bbbb,"ccc,ddd"'
matches = re.findall(r'(".*?"|[^,]+)', cStr)
print(matches) # ['aaaa', 'bbbb', '"ccc,ddd"']
You could use list comprehension, no other libraries needed:
cStr = 'aaaa,bbbb,"ccc,ddd"'
# split by ," afterwards by , if item does not end with double quotes
l = [
item.split(',') if not item.endswith('"') else [item[:-1]]
for item in cStr.split(',"')
]
print(sum(l, []))
Out:
['aaaa', 'bbbb', 'ccc,ddd']
This can be achieved in three steps-
cstr = 'aaaa,bbbb,"ccc,ddd","eee,fff,ggg"'
Step 1-
X = cstr.split(',"')
Step 2-
regular_list = [i if '"' in i else i.split(",") for i in X ]
Step 3-
final_list = []
for i in regular_list:
if type(i) == list:
for j in i:
final_list.append(j)
else:
final_list.append('"'+i)
Final output -
['aaaa', 'bbbb', '"ccc,ddd"', '"eee,fff,ggg"']
I have a string;
txt = "Hello$JOHN$*How*Are*$You"
I want output like:
Output: "You*$Are*How$*JOHN$Hello"
If you see closely, the character delimiters ($ and *) are NOT reversed in their sequence of occurrence. The string is reversed word-wise, but the delimiters are kept sequential.
I have tried the following:
sep=['$','*']
txt_1 = ""
for ch in txt:
if ch in sep:
txt_1 = txt_1+ch
I can't come up with the logic to capture the sequence of the delimiters and reverse the words of the string.
One approach using regex:
import re
s = "Hello$JOHN$*How*Are*$You"
splits = re.split('([$*]+)', s)
res = ''.join(reversed(splits))
print(res)
Output
You*$Are*How$*JOHN$Hello
A (perhaps less elegant) solution (but easier to understand) is to use itertools.groupby:
from itertools import groupby
s = "Hello$JOHN$*How*Are*$You"
splits = [''.join(g) for k, g in groupby(s, key=lambda x: x in ('$', '*'))]
res = ''.join(reversed(splits))
print(res)
The idea here is to create contiguous sequence of delimiter, non-delimiter characters.
I have an array I want to iterate through. The array consists of strings consisting of numbers and signs.
like this: €110.5M
I want to loop over it and remove all Euro sign and also the M and return that array with the strings as ints.
How would I do this knowing that the array is a column in a table?
You could just strip the characters,
>>> x = '€110.5M'
>>> x.strip('€M')
'110.5'
def sanitize_string(ss):
ss = ss.replace('$', '').replace('€', '').lower()
if 'm' in ss:
res = float(ss.replace('m', '')) * 1000000
elif 'k' in ss:
res = float(ss.replace('k', '')) * 1000
return int(res)
This can be applied to a list as follows:
>>> ls = [sanitize_string(x) for x in ["€3.5M", "€15.7M" , "€167M"]]
>>> ls
[3500000, 15700000, 167000000]
If you want to apply it to the column of a table instead:
dataFrame = dataFrame.price.apply(sanitize_string) # Assuming you're using DataFrames and the column is called 'price'
You can use a string comprehension:
numbers = [float(p.replace('€','').replace('M','')) for p in a]
which gives:
[110.5, 210.5, 310.5]
You can use a list comprehension to construct one list from another:
foo = ["€13.5M", "€15M" , "€167M"]
foo_cleaned = [value.translate(None, "€M")]
str.translate replaces all occurrences of characters in the latter string with the first argument None.
Try this
arr = ["€110.5M","€110.5M","€110.5M","€110.5M","€110.5M","€110.5M","€110.5M"]
f = [x.replace("€","").replace("M","") for x in arr]
You can call .replace() on a string as often as you like. An initial solution could be something like this:
my_array = ['€110.5M', '€111.5M', '€112.5M']
my_cleaned_array = []
for elem in my_array:
my_cleaned_array.append(elem.replace('€', '').replace('M', ''))
At this point, you still have strings in your array. If you want to return them as ints, you can write int(elem.replace('€', '').replace('M', '')) instead. But be aware that you will then lose everything after the floating point, i.e. you will end up with [110, 111, 112].
You can use Regex to do that.
import re
str = "€110.5M"
x = re.findall("\-?\d+\.\d+", str )
print(x)
I didn't quite understand the second part of the question.