How to extract a number before certain words? - python

There is a sentence "I have 5 kg apples and 6 kg pears".
I just want to extract the weight of apples.
So I use:
import re
sentence = "I have 5 kg apples and 6 kg pears"
number = re.findall(r'(\d+) kg apples', sentence)
print(number)
However, it only works for integers. What should I do if the number I want to extract is 5.5?

You can try something like this:
import re
sentences = ["I have 5.5 kg apples and 6 kg pears",
             "I have 5 kg apples and 6 kg pears"]
for sen in sentences:
    # (?:\.\d+)? optionally matches a decimal part such as ".5"
    print(re.findall(r'(\d+(?:\.\d+)?) kg apples', sen))
Output:
['5.5']
['5']
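To reuse the same pattern for any item instead of hard-coding apples, you could wrap it in a small helper. This is only a sketch: the function name weight_of is hypothetical, and the \s* before "kg" (which also accepts "5kg apples") and the conversion to float are my own additions, not part of the answer above.

```python
import re

def weight_of(sentence, item):
    """Return the number in front of 'kg <item>' as a float, or None."""
    # re.escape guards against item names containing regex metacharacters.
    pattern = r'(\d+(?:\.\d+)?)\s*kg\s+' + re.escape(item) + r'\b'
    match = re.search(pattern, sentence)
    return float(match.group(1)) if match else None

sentence = "I have 5.5 kg apples and 6 kg pears"
print(weight_of(sentence, "apples"))  # 5.5
print(weight_of(sentence, "pears"))   # 6.0
print(weight_of(sentence, "plums"))   # None
```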

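? makes the preceding segment of a regex optional. Put the optional integer part in a non-capturing group (?:...) so that findall returns plain strings:
re.findall(r'((?:\d+\.)?\d+) kg apples', sentence)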
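Why the kind of group matters: when a pattern has more than one capturing group, re.findall returns one tuple of groups per match instead of the matched text. A quick check using the sentence from the question:

```python
import re

sentence = "I have 5.5 kg apples and 6 kg pears"

# Two capturing groups: findall returns a tuple of both groups per match.
print(re.findall(r'((\d+\.)?\d+)', sentence))  # [('5.5', '5.'), ('6', '')]
# (?:...) groups without capturing, so findall returns the whole match.
print(re.findall(r'(?:\d+\.)?\d+', sentence))  # ['5.5', '6']
```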

You can use number = re.findall(r'(\d+\.?\d*) kg apples', sentence)

You can change your regex to match it:
(\d+(?:\.\d+)?)
\.\d+ matches a dot followed by at least one digit. I made it optional because the number may have no decimal part; the leading \d+ still requires at least one digit.
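The pattern above and the \d+\.?\d* variant from the previous answer differ only on edge cases. A small sketch comparing them with re.fullmatch (the candidate strings are made up for illustration):

```python
import re

strict = r'\d+(?:\.\d+)?'  # digits, optionally followed by '.' and more digits
loose = r'\d+\.?\d*'       # digits, optional '.', optional digits

for candidate in ["5", "5.5", "5.", ".5"]:
    print(candidate,
          bool(re.fullmatch(strict, candidate)),
          bool(re.fullmatch(loose, candidate)))
# 5 True True
# 5.5 True True
# 5. False True
# .5 False False
```

So the looser pattern also accepts a trailing dot ("5."), while neither accepts a number without an integer part (".5").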

This returns every (optionally signed) number in the sentence, not only the weight of apples:
re.findall(r'[-+]?[0-9]*\.?[0-9]+', sentence)
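A short sketch converting the matches to floats, and showing why a stray trailing "." in this pattern is a problem: an unescaped dot matches any character, so the space after each number ends up in the result.

```python
import re

sentence = "I have 5.5 kg apples and 6 kg pears"
# Optional sign, optional integer part, optional dot, then at least one digit.
pattern = r'[-+]?[0-9]*\.?[0-9]+'
numbers = [float(x) for x in re.findall(pattern, sentence)]
print(numbers)  # [5.5, 6.0]

# With an extra '.' at the end, the following character is swallowed too.
print(re.findall(pattern + '.', sentence))  # ['5.5 ', '6 ']
```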

Non-regex solution
sentence = "I have 5.5 kg apples and 6 kg pears"
words = sentence.split(" ")
candidates = [words[idx-1] for idx, word in enumerate(words) if word == "kg" and idx > 0]
# => ['5.5', '6']
You can then check whether these are valid floats using:
for element in candidates:
    try:
        float(element)
    except ValueError:
        print("Not a float")
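Extending the same split-based idea, you could also keep the word after "kg" so each weight is paired with its item. A minimal sketch, assuming the text always has the shape "<number> kg <item>"; the function name weights_by_item is hypothetical:

```python
def weights_by_item(sentence):
    """Map each item to the float in front of 'kg', e.g. {'apples': 5.5}."""
    words = sentence.split()
    result = {}
    # Walk over three consecutive words at a time: <number> kg <item>
    for number, unit, item in zip(words, words[1:], words[2:]):
        if unit != "kg":
            continue
        try:
            result[item] = float(number)
        except ValueError:
            pass  # the word before "kg" was not a number
    return result

print(weights_by_item("I have 5.5 kg apples and 6 kg pears"))
# {'apples': 5.5, 'pears': 6.0}
```

Then weights_by_item(sentence)["apples"] gives just the weight of apples.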

The regex you need should look like this:
(\d+\.?\d*) kg apples
You can do as follows:
number = re.findall(r'(\d+\.?\d*) kg apples', sentence)
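The dot must be escaped as \. to mean a literal decimal point; an unescaped . matches any character and can silently return the wrong text. A small illustration (the comma-separated input is made up for the example):

```python
import re

unescaped = r'(\d+.?\d*) kg apples'  # '.' matches ANY character
escaped = r'(\d+\.?\d*) kg apples'   # '\.' matches only a literal dot

text = "I have 5,5 kg apples"  # comma instead of a decimal point
print(re.findall(unescaped, text))  # ['5,5']
print(re.findall(escaped, text))    # ['5']
```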

Related

How can I find an unknown word after a specific word?

I have this string: "Hello, I bought apples, and sold bananas"
How can I get the word after "bought" and the word after "sold" in Python?
You can do this:
string = "Hello, I bought apples, and sold bananas"
string = string.replace(",", "")
list_word = string.split(" ")
# Stop one word early so that list_word[i + 1] always exists
for i in range(len(list_word) - 1):
    if list_word[i] == "bought" or list_word[i] == "sold":
        print(list_word[i + 1])
output:
apples
bananas
There are several ways to do it. One way is to use regular expressions:
import re
s = "Hello, I bought apples, and sold bananas"
print(re.findall(r'\W(?:bought|sold)\W+(\w+)', s))  # ['apples', 'bananas']
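A third option combines both ideas: tokenize with a regex (which drops the commas) and then pair each word with its successor. A sketch; the helper name words_after is hypothetical, and the trigger words are passed in as a set:

```python
import re

def words_after(text, triggers):
    """Return the word that follows each trigger word, in order."""
    # \w+ keeps only runs of word characters, so commas are dropped.
    words = re.findall(r'\w+', text)
    # Pair every word with its successor; the last word has none, so no IndexError.
    return [nxt for word, nxt in zip(words, words[1:]) if word in triggers]

s = "Hello, I bought apples, and sold bananas"
print(words_after(s, {"bought", "sold"}))  # ['apples', 'bananas']
```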

How can I extract numbers based on the context of a sentence in Python?

I tried using regular expressions, but they don't take any context into account.
Examples:
"250 kg Oranges for Sale"
"I want to sell 100kg of Onions at 100 per kg"
You can do something like this.
First, split the text into words, then try to convert each word to a number.
If a word can be converted, it is a number. If you are sure that a quantity is always followed by the word "kg", you can then test whether the next word is "kg".
Depending on the result, add the value to the corresponding list.
In this particular case, you have to make sure the numbers are written on their own (e.g. "100 kg" and not "100kg"); otherwise they will not be converted.
string = "250 kg Oranges for Sale. I want to sell 100 kg of Onions at 100 per kg."
# Split the text
words_list = string.split(" ")
print(words_list)
# Find which words are numbers
quantity_array = []
price_array = []
for i in range(len(words_list)):
    try:
        number = int(words_list[i])
    except ValueError:
        print(f"'{words_list[i]}' is not a number")
        continue
    # Is it a price or a quantity? (guard against a number at the very end)
    if i + 1 < len(words_list) and words_list[i + 1] == 'kg':
        quantity_array.append(number)
    else:
        price_array.append(number)
# Get the results
print(quantity_array)
print(price_array)
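The split-based approach needs "100 kg" written with a space. A regex can relax that restriction: treat a number followed, with or without spaces, by "kg" as a quantity, and any other number as a price. This is a sketch under the same assumption as the answer (quantities are always given in kg); the sample text and variable names are illustrative.

```python
import re

text = "250 kg Oranges for Sale. I want to sell 100kg of Onions at 100 per kg"

# A number immediately followed (optionally after spaces) by "kg" is a quantity.
quantities = [float(n) for n in re.findall(r'(\d+(?:\.\d+)?)\s*kg\b', text)]
# Any other whole number: the negative lookahead rejects numbers followed by "kg".
prices = [float(n) for n in re.findall(r'\b(\d+(?:\.\d+)?)\b(?!\s*kg\b)', text)]

print(quantities)  # [250.0, 100.0]
print(prices)      # [100.0]
```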

How to split sentences in a list?

I am trying to create a function that counts the number of words and the mean word length in any given sentence or sentences. I can't seem to split the string into separate sentences in a list, assuming each sentence ends with a period.
Question marks and exclamation marks should be replaced by periods so that they are also recognized as sentence endings.
For example: "Haven't you eaten 8 oranges today? I don't know if you did." would be: ["Haven't you eaten 8 oranges today", "I don't know if you did"]
The mean word length for this example (with punctuation, including apostrophes, removed) would be 44/12 ≈ 3.67.
import string
def word_length_list(text):
    text = text.replace('--', ' ')
    for p in string.punctuation + "‘’”“":
        text = text.replace(p, '')
    text = text.lower()
    # Note: '.' was already removed above, so this split finds nothing to split on
    words = text.split(".")
    word_length = []
    print(words)
    for i in words:
        count = 0
        for j in i:
            count = count + 1
        word_length.append(count)
    return word_length
testing1 = word_length_list("Haven't you eaten 8 oranges today? I don't know if you did.")
print(sum(testing1) / len(testing1))
One option might use re.split:
import re
inp = "Haven't you eaten 8 oranges today? I don't know if you did."
sentences = re.split(r'(?<=[?.!])\s+', inp)
print(sentences)
This prints:
["Haven't you eaten 8 oranges today?", "I don't know if you did."]
We could also use re.findall:
inp = "Haven't you eaten 8 oranges today? I don't know if you did."
sentences = re.findall(r'.*?[?!.]', inp)
print(sentences) # prints same as above
Note that in both cases we assume that a period . only appears as a full stop and not as part of an abbreviation. If a period can appear in multiple contexts, it can be tricky to tell sentences apart. For example:
Jon L. Skeet earned more points than anyone. Gordon Linoff also earned a lot of points.
It is not clear here whether each period marks the end of a sentence or an abbreviation.
An example to split using regex:
import re
s = "Hello! How are you?"
# Split on runs of . ? ! and drop empty pieces and surrounding spaces
print([x.strip() for x in re.split(r"[.?!]+", s.strip()) if x.strip()])
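Putting the pieces together for the original goal (a list of sentences plus the mean word length), here is a sketch that reuses the re.split lookbehind from the first answer and removes punctuation with str.translate. The function name split_and_measure is hypothetical, and it inherits the same caveat about abbreviations.

```python
import re
import string

def split_and_measure(text):
    """Return (sentences, mean word length); a sketch, not a real tokenizer."""
    # Split wherever '.', '?' or '!' is followed by whitespace.
    pieces = re.split(r'(?<=[.?!])\s+', text.strip())
    # Drop the terminating punctuation and any empty pieces.
    sentences = [p.rstrip('.?!') for p in pieces if p.rstrip('.?!')]
    # Remove all ASCII punctuation (including apostrophes) from each word.
    table = str.maketrans('', '', string.punctuation)
    words = [w.translate(table) for s in sentences for w in s.split()]
    words = [w for w in words if w]
    mean = sum(len(w) for w in words) / len(words) if words else 0.0
    return sentences, mean

sentences, mean = split_and_measure(
    "Haven't you eaten 8 oranges today? I don't know if you did.")
print(sentences)
print(round(mean, 2))  # 3.67, i.e. 44 characters / 12 words
```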

Splitting by particular punctuation

Let's say I want to remove the commas from a sentence, but in this particular way:
I ate pineapples, grapes -> I ate pineapples I ate grapes
we know python 2.0, 3.0 well -> we know python 2.0 well we know python 3.0 well
Basically, I want to keep every part of the sentence that is not part of the comma-separated list, repeated once for each item. Is there an easy way to do it using the re library in Python?
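You're basically splitting the string by a comma and repeating the sentence once per item: the words before the first item and the words after the last item are shared by every copy.
s = "I ate pineapples, grapes"
s1 = "we know python 2.0, 3.0 well"
def my_split(string):
    parts = [p.strip() for p in string.split(',')]
    if len(parts) == 1:
        return string  # no comma, nothing to expand
    head = parts[0].split()
    tail = parts[-1].split()
    prefix = ' '.join(head[:-1])  # words shared before the list
    suffix = ' '.join(tail[1:])   # words shared after the list
    items = [head[-1], *parts[1:-1], tail[0]]
    return ' '.join(' '.join(filter(None, (prefix, w, suffix))) for w in items)
print(my_split(s))
print(my_split(s1))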
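Since the question asks about the re library specifically, here is a regex-based sketch of the same idea. It assumes the sentence contains a single comma-separated list whose items contain no spaces; the function name expand_commas is hypothetical.

```python
import re

def expand_commas(sentence):
    """Repeat the words around a comma-separated list once per list item."""
    # Group 1: words before the list (ends with a space), group 2: the list
    # itself, group 3: words after the list (starts with a space).
    m = re.fullmatch(r'(.*\s)?(\S+(?:,\s*\S+)+)(\s.*)?', sentence)
    if m is None:
        return sentence  # no comma-separated list found
    prefix, items, suffix = m.group(1) or '', m.group(2), m.group(3) or ''
    return ' '.join(prefix + item + suffix for item in re.split(r',\s*', items))

print(expand_commas("I ate pineapples, grapes"))
# I ate pineapples I ate grapes
print(expand_commas("we know python 2.0, 3.0 well"))
# we know python 2.0 well we know python 3.0 well
```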

Python 3: How to find the different combinations of a string

Given a string, is there a short way in Python 3 to find the different combinations (orderings) of the space-separated words in that string?
For example:
If the input string is 'Peaches Apples Bananas', I want the output to be:
'Peaches Apples Bananas'
'Peaches Bananas Apples'
'Apples Bananas Peaches'
'Apples Peaches Bananas'
'Bananas Peaches Apples'
'Bananas Apples Peaches'
import itertools
string = 'Peaches Apples Bananas'
word_list = string.split(' ')
output = [' '.join(permutation) for permutation in itertools.permutations(word_list)]
I think you are looking for itertools.permutations:
import itertools
for perm in itertools.permutations('Peaches Apples Bananas'.split(' ')):
    print(' '.join(perm))
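The question says "combinations", but what it describes are permutations: n distinct words have n! orderings. One caveat: if the string contains a repeated word, itertools.permutations treats the copies as distinct and yields duplicate strings. A small sketch (the input with a repeated word is made up for illustration):

```python
import itertools
import math

words = 'Peaches Apples Bananas'.split()
orderings = [' '.join(p) for p in itertools.permutations(words)]
# n distinct words have n! orderings: 3! = 6 here.
print(len(orderings), math.factorial(len(words)))  # 6 6

# Repeated words produce duplicate orderings; dict.fromkeys removes them
# while keeping the first-seen order.
repeated = 'Apples Apples Bananas'.split()
unique = list(dict.fromkeys(' '.join(p) for p in itertools.permutations(repeated)))
print(unique)
# ['Apples Apples Bananas', 'Apples Bananas Apples', 'Bananas Apples Apples']
```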
