I want to delete multiple strings from a phrase in Python.
For example, I want to delete: apple, orange, tomato
How can I do that easily without writing 10 replaces like this:
str = str.replace('apple','').replace(....).replace(....)
Any time you are repeating yourself, think of a loop instead.
for word in ('apple','cherry','tomato','grape'):
    str = str.replace(word,'')
And, by the way, str is a poor name for a variable, since it's the name of a type.
You could also use re.sub and list the words in a group between word boundaries \b
import re
s = re.sub(r"\b(?:apple|orange|tomato)\b", "", s)
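If the words come from a list, the pattern can be built programmatically. Below is a minimal sketch; remove_words is a hypothetical helper name, and re.escape is used on the assumption that a word might contain regex metacharacters.

```python
import re

def remove_words(text, words):
    """Remove whole-word occurrences of each word in `words` from `text`."""
    # remove_words is a hypothetical helper, not part of any library.
    # re.escape keeps characters such as "." or "+" from being read as regex syntax.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    return re.sub(pattern, "", text)

s = remove_words("apple pie, orange juice and pineapple", ["apple", "orange", "tomato"])
print(repr(s))
```

Unlike the chained replace calls, the word boundaries leave "pineapple" untouched. The spaces around a removed word stay in place, so any doubled spaces would need a separate cleanup step.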
Very new to Python/programming, trying to create a "grocery list generator" as a practice project.
I created a bunch of meal variables with their ingredients in a list, then to organise that list in a specific (albeit probably super inefficient) way with vegetables at the top I've added a numerical value at the start of each string. It looks like this -
meal = ["07.ingredient1", "02.ingredient2", "05.ingredient3"]
It organises, prints, and writes how I want it to, but now I want to remove the first three characters (the numbers) from each string in the list before I write it to my text file.
So far my final bit of code looks like this -
Have tried a few different things between the '.sort' and 'with open' like replace, strip, range and some other things but can't get them to work.
My next stop was trying something like this, but can't figure it out -
for item in groceries[1:]
str(groceries(range99)).replace('')
Thanks heaps for your help!
for item in groceries:
    shopping_list.write(item[3:] + '\n')
Instead of replacing you can just take a substring.
groceries = [g[3:] for g in groceries]
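The slice assumes every prefix is exactly three characters long. If the number might be shorter or longer, one option is str.partition, which splits on the first "." only. A small sketch with made-up sample data:

```python
# Sample data is illustrative; the prefixes deliberately vary in length.
groceries = ["07.ingredient1", "102.ingredient2", "5.ingredient3"]

# partition(".") returns (before, ".", after); index 2 is the text after the first dot.
stripped = [g.partition(".")[2] for g in groceries]
print(stripped)
```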
Depending on your general programming knowledge, this solution may be a bit advanced, but regular expressions are another alternative.
import re
pattern = re.compile(r"\d+\.\s*(\w+)")
for item in groceries:
    ingredient = pattern.findall(item)[0]
\d means any digit (0-9), + means "at least one", \. matches a literal ".", \s is whitespace, * means "0 or more", and \w is any word character (a-z, A-Z, 0-9, and _).
This would also match things like
groceries = ["1. sugar", "0110.salt", "10. tomatoes"]
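A quick check of that pattern against those inputs, as a sketch with illustrative variable names. Note that \w+ stops at the first non-word character, so a two-word ingredient such as "olive oil" would be cut down to "olive".

```python
import re

pattern = re.compile(r"\d+\.\s*(\w+)")
groceries = ["1. sugar", "0110.salt", "10. tomatoes"]

# search() returns a match object; group(1) is the captured ingredient name.
ingredients = [pattern.search(item).group(1) for item in groceries]
print(ingredients)
```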
>>> meal = ["07.ingredient1", "02.ingredient2", "05.ingredient3"]
>>> myarr = [i[3:] for i in meal]
>>> print(myarr)
['ingredient1', 'ingredient2', 'ingredient3']
Suppose I have an expression
exp="\"OLS\".\"ORDER_ITEMS\".\"QUANTITY\" <50 and \"OLS\".\"PRODUCTS\".\"PRODUCT_NAME\" = 'Kingston' or \"OLS\".\"ORDER_ITEMS\".\"QUANTITY\" <20"
I want to split the expression on "and" and "or" so that my result will be
exp=['\"OLS\".\"ORDER_ITEMS\".\"QUANTITY\" <50','\"OLS\".\"PRODUCTS\".\"PRODUCT_NAME\" = 'Kingston'','\"OLS\".\"ORDER_ITEMS\".\"QUANTITY\" <20']
This is what I have tried:
import re
res=re.split('and|or|',exp)
but it splits at every character. How can I make it split by word?
import itertools
exp=itertools.chain(*[y.split('or') for y in exp.split('and')])
exp=[x.strip() for x in list(exp)]
Explanation: first split on 'and', then split each resulting element on 'or'. This creates a list of lists. Use itertools to flatten it into a single list, then strip the extra spaces from each element.
Your regex has three alternatives: "and", "or" or the empty string: and|or|
Omit the trailing | to split just by those two words.
import re
res=re.split('and|or', exp)
Note that this will not work reliably; it'll split on any instance of "and", even when it's in quotes or part of a word. You could make it split only on full words using \b, but that will still split on a product name like 'Black and Decker'. If you need it to be reliable and general, you'll have to parse the string using the full syntax (probably using an off-the-shelf parser, if it's standard SQL or similar).
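The whole-word variant and its limitation can be sketched as follows. The second input is a made-up condition added only to show the failure case.

```python
import re

exp = ('"OLS"."ORDER_ITEMS"."QUANTITY" <50 and "OLS"."PRODUCTS"."PRODUCT_NAME" '
       "= 'Kingston' or \"OLS\".\"ORDER_ITEMS\".\"QUANTITY\" <20")

# \b restricts the match to whole words; \s* swallows the surrounding spaces.
splitter = re.compile(r"\s*\b(?:and|or)\b\s*")
parts = splitter.split(exp)
for part in parts:
    print(part)

# The caveat: a quoted value containing the word "and" is split as well.
broken = splitter.split("name = 'Black and Decker'")
print(broken)
```

The first call yields the three conditions, while the second cuts the quoted product name in two, which is why a real parser is the safer choice for arbitrary input.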
You can do it in 2 steps: [ss for s in exp.split(" and ") for ss in s.split(' or ')]
I want to cut a string such as "0011165.jpg_Fish" to get only Fish, i.e. everything after the "_". How do I do that in Python?
Thank you very much!
Please use str.partition instead of str.split. It is more robust because it always returns exactly 3 items, unlike split, which can be tricky to handle if the input string doesn't contain the split character.
>>> word = '0011165.jpg_Fish'
>>> not_required, split_char, required = word.partition('_')
>>> required
'Fish'
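To illustrate the difference when the separator is missing, here is a short sketch. The second input and the choice of None as a fallback are assumptions made for the example.

```python
labels = []
for word in ("0011165.jpg_Fish", "0011165.jpg"):
    head, sep, tail = word.partition("_")
    # sep is "" when no underscore was found, and tail is "" as well.
    labels.append(tail if sep else None)
print(labels)

# By contrast, indexing the result of split() fails on the second input.
try:
    "0011165.jpg".split("_")[1]
except IndexError:
    print("split('_')[1] raised IndexError")
```

Note that partition splits at the first "_" only; str.rpartition does the same from the right if the label should be whatever follows the last underscore.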
Try
"0011165.jpg_Fish".split("_")[1]
And in case of a Dataframe
train['Label'] = train.Image_Labels.str.split("_").str[1]
I have to count the occurrence of a string(which can be 1 or more words) in another string (which is a sentence) and should not be case-sensitive.
For instance -
a = "Hi my name is Alex and hi to you as well. How high is the building? The highest floor is 18th. Highlights .... She said hi as well. Do you know highlights of the match ... hi."
b = "hi" #word/sentence to find count of
I tried -
a.lower().count(b)
which returns
>> 8
while the required answer should be
>> 4.
For multi-word, this method seems to work but I am not sure of the limiting cases. How can I fix this?
You can use re.findall to search for the substring with leading and trailing word boundaries:
import re
print(len(re.findall(r'\b{}\b'.format(b), a, re.I))) # -> 4
# \b ... \b : word boundaries around the search term
# re.I      : case-insensitive match
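If the search term might contain characters that are special in a regex (for example "." or "?"), escaping it first is a sensible precaution. A minimal sketch; count_phrase is a hypothetical helper name, and it assumes the term starts and ends with a word character so that \b behaves as intended.

```python
import re

def count_phrase(text, phrase):
    """Count case-insensitive whole-word occurrences of `phrase` in `text`."""
    # count_phrase is a hypothetical helper; re.escape neutralises regex metacharacters.
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, re.IGNORECASE))

a = ("Hi my name is Alex and hi to you as well. How high is the building? "
     "The highest floor is 18th. Highlights .... She said hi as well. "
     "Do you know highlights of the match ... hi.")

print(count_phrase(a, "hi"))
print(count_phrase(a, "as well"))
```

The same function handles multi-word terms, since the boundaries are only checked at the two ends of the phrase.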
The function works just fine: the sequence "hi" appears 8 times in the string. Since you want it only as words, you'll need to figure out how you can differentiate the word "hi" from the incidental appearance in other words, such as "chipper".
One common way is to use the re package (regular expressions), but that may be more learning than you want to do right now.
A better way at the moment would be to split the string into words before you check each:
word_list = a.lower().split()
b_count = word_list.count(b)
Note that this considers only whitespace when dividing words. It still won't find "hi" in "hi-performance", for example, nor the final "hi." in your sample, which keeps its trailing period. You'd need another split operation for other separators.
"Splitting" a sentence into words is not trivial.
There is a package in Python to do that: nltk.
First install this package using pip or a system-specific package manager.
Then run ipython and use nltk.download() to download the "punkt" data: type d, then type punkt, then type q to quit.
Then use
import nltk
tokens = nltk.word_tokenize(a)
len(list(filter(lambda x: x.lower() == b, tokens)))
It returns 4.
Use str.split() and filter out punctuation with regex:
import re
a = "Hi my name is Alex and hi to you as well. How high is the building? The highest floor is 18th. Highlights .... She said hi as well. Do you know highlights of the match ... hi."
b = "hi"
final_count = sum(re.sub(r"\W+", '', i.lower()) == b for i in a.split())
Output:
4
The input can be something like "some word".
I want to replace this input with "<strong>some</strong> <strong>word</strong>" in some other text that contains it.
I am trying this code:
input = "some word".split()
pattern = re.compile('(%s)' % input, re.IGNORECASE)
result = pattern.sub(r'<strong>\1</strong>',text)
but it is failing, and I know why. I am wondering how to pass all elements of the list input to compile() so that (%s) can catch each of them.
I'd appreciate any help.
The right approach, since you're already splitting the list, is to surround each item of the list directly (never using a regex at all):
sterm = "some word".split()
result = " ".join("<strong>%s</strong>" % w for w in sterm)
In case you're wondering, the pattern you were looking for was:
pattern = re.compile('(%s)' % '|'.join(sterm), re.IGNORECASE)
This works on your string because the regular expression would become
(some|word)
which means "matches some or matches word".
However, this is not a good approach as it does not work for all strings. For example, consider cases where one word contains another, such as
a banana and an apple
which becomes:
<strong>a</strong> <strong>banana</strong> <strong>a</strong>nd <strong>a</strong>n <strong>a</strong>pple
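One way to avoid those partial matches, if a regex is still wanted, is to add word boundaries and escape each term. This is a sketch under those assumptions, with highlight as a hypothetical helper name:

```python
import re

def highlight(terms, text):
    """Wrap whole-word occurrences of any term in <strong> tags."""
    # highlight is a hypothetical helper; \b keeps "a" from matching inside "and".
    alternation = "|".join(re.escape(t) for t in terms)
    pattern = re.compile(r"\b(%s)\b" % alternation, re.IGNORECASE)
    return pattern.sub(r"<strong>\1</strong>", text)

result = highlight("a banana".split(), "a banana and an apple")
print(result)
```

With the boundaries in place, only the standalone "a" and "banana" are wrapped, and "and", "an" and "apple" are left alone.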
It looks like you're wanting to search for multiple words - this word or that word. Which means you need to separate your searches by |, like the script below:
import re
text = "some word many other words"
input = '|'.join('some word'.split())
pattern = re.compile('(%s)' % input, flags=0)
print(pattern.sub(r'<strong>\1</strong>',text))
I'm not completely sure I know what you're asking, but if you want to pass all the elements of input as separate arguments in a function call, you can use *input instead of input; * unpacks the list into its elements. Note that re.compile takes a single pattern (plus optional flags), so unpacking will not work there. As an alternative, couldn't you just try joining the list with | and adding ( at the beginning and ) at the end?
Alternatively, you can use the join operator with a list comprehension to create the intended result.
text = "some word many other words".split()
result = ' '.join(['<strong>'+i+'</strong>' for i in text])