animals = 'silly monkey small bee white cat'
text1 = 'brown dog'
text2 = 'white cat'
text3 = 'fat cow'
if text1 in animals or text2 in animals or text3 in animals:
    print(text2)  # because it was matched in the if/else statement!
I tried to simplify this, but the animals string will be updated every time.
What is the best and easiest way to achieve this without so many if/else statements in my code?
You can use regex.
import re
pattern = '|'.join([text1, text2, text3])
# pattern -> 'brown dog|white cat|fat cow'
res = re.findall(pattern, animals)
print(res)
# ['white cat']
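One caveat with joining the search strings directly: any regex metacharacters in them (such as `.` or `+`) would be read as pattern syntax. A minimal sketch, assuming the strings should be matched literally, escapes each one first. The list name `candidates` is only for illustration and is not from the question.

```python
import re

animals = 'silly monkey small bee white cat'
candidates = ['brown dog', 'white cat', 'fat cow']  # hypothetical list name

# re.escape makes every candidate match literally, even if it contains
# characters such as '.' or '+' that are special in a regex.
pattern = '|'.join(re.escape(c) for c in candidates)

found = re.findall(pattern, animals)
print(found)
```

With plain words like these the result is the same as before; the escaping only matters once a search string contains punctuation.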
ANY time you have a set of variables of the form xxx1, xxx2, and xxx3, you need to convert that to a list.
animals = 'silly monkey small bee white cat'
text = [
'brown dog',
'white cat',
'fat cow'
]
for t in text:
    if t in animals:
        print("Found", t)
Use a loop to check each case:
animals = 'silly monkey small bee white cat'
text1 = 'brown dog'
text2 = 'white cat'
text3 = 'fat cow'
for text in [text1, text2, text3]:
    if text in animals:
        print(text)
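If you only need to know whether anything matched, or want all matches at once, the same loop can probably be written more compactly. This is a sketch, assuming the candidates live in a list (called `texts` here, a name I made up):

```python
animals = 'silly monkey small bee white cat'
texts = ['brown dog', 'white cat', 'fat cow']  # hypothetical list name

# Every candidate that occurs in the string, in list order.
matches = [t for t in texts if t in animals]

# True as soon as one candidate is found; any() stops checking after that.
has_match = any(t in animals for t in texts)

# First matching candidate, or None when nothing matches.
first = next((t for t in texts if t in animals), None)

print(matches, has_match, first)
```

Because the list is the only thing that changes, adding a fourth or fifth search string needs no new `if` branch.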
I am just learning Python and the regular-expression "language", and I have run into a problem I can't answer. Any help or recommendations would be appreciated.
import re
m = " text1 \abs{foo} text2 text3 \L{E.F} text4 "
def separate_mycmd(cmd,m):
    math_cmd = re.compile(cmd)
    math = math_cmd.findall(m)
    text = math_cmd.split(m)
    return (math,text)
(math,text) = separate_mycmd(r'\abs',m)
print math # ['\x07bs']
print text # [' text1 ', '{foo} text2 text3 \\L{E.F} text4 ']
(math,text) = separate_mycmd(r'\L',m)
print math # **Question:** Just ['L'] and not ['\\L'] or ['\L']
print text # [' text1 \x07bs{foo} text2 text3 \\', '{E.F} text4 ']
# **Question:** why the \\ after text3 ?
I don't understand the output from the last call. My related questions are in the comments.
Thanks in advance,
Ulrich
You want to match on "\\L", not \L, giving you:
> (math,text) = separate_mycmd(r'\\L',m)
> print math
['\\L']
> print text
[' text1 \x07bs{foo} text2 text3 ', '{E.F} text4 ']
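You probably also wanted \\a rather than \a, because "\a" in a normal (non-raw) string literal is the bell character, which is where the \x07 in your output comes from. The same doubling applies to the string you're searching, giving: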
m = " text1 \\abs{foo} text2 text3 \\L{E.F} text4 "
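An alternative to doubling every backslash by hand is to write the input as a raw string and let `re.escape` build the pattern. This is only a sketch in Python 3 syntax, and it changes one thing compared with the question: `cmd` is treated as literal text rather than as a regex. As far as I know, current Python 3 versions reject an unknown escape such as `\L` in a pattern outright, so escaping is needed there anyway.

```python
import re

# Raw string: \a and \L stay as two characters each (backslash + letter).
m = r" text1 \abs{foo} text2 text3 \L{E.F} text4 "

def separate_mycmd(cmd, text):
    # re.escape turns the literal command into a safe pattern, so the
    # caller does not have to double the backslash manually.
    cmd_re = re.compile(re.escape(cmd))
    return cmd_re.findall(text), cmd_re.split(text)

math, text = separate_mycmd(r'\L', m)
print(math)
print(text)
```

The printed lists show each backslash doubled because `print` uses `repr` for list elements; the strings themselves contain a single backslash.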
I have some tweets which contain shorthand text like ur, bcz, etc. I am using a dictionary to map them to the correct words. I know we cannot mutate strings in Python, so after replacing a shorthand with the correct word, I store a copy in a new list. This works, but I run into a problem when a tweet has more than one shorthand word.
My code replaces only one word at a time. How can I replace multiple words in a single string?
Here is my code:
# some sample tweets
tweet = ['stats is gr8', 'india is grt bcz it is colourfull', 'i like you','your movie is grt', 'i hate ur book of hatred' ]
short_text={
"bcz" : "because",
"ur" : "your",
"grt" : "great",
"gr8" : "great",
"u" : "you"
}
import re
def find_word(text,search):
    result = re.findall('\\b'+search+'\\b',text,flags=re.IGNORECASE)
    if len(result) > 0:
        return True
    else:
        return False
corrected_tweets=list()
for i in tweet:
    tweettoken=i.split()
    for short_word in short_text:
        print("current iteration")
        for tok in tweettoken:
            if(find_word(tok,short_word)):
                print(tok)
                print(i)
                newi = i.replace(tok,short_text[short_word])
                corrected_tweets.append(newi)
                print(newi)
My output is:
['stats is great',
'india is grt because it is colourfull',
'india is great bcz it is colourfull',
'your movie is great',
'i hate your book of hatred']
What I need is for entries 2 and 3 to be a single entry, appended once with all corrections applied. I am new to Python. Any help would be great.
Use a regex substitution function on word boundaries that fetches the replacement from the dictionary, defaulting to the original word so that a word not in the dictionary is returned unchanged:
tweet = ['stats is gr8', 'india is grt bcz it is colourfull', 'i like you','your movie is grt', 'i hate ur book of hatred' ]
short_text={
"bcz" : "because",
"ur" : "your",
"grt" : "great",
"gr8" : "great",
"u" : "you"
}
import re
changed = [re.sub(r"\b(\w+)\b",lambda m:short_text.get(m.group(1),m.group(1)),x) for x in tweet]
result:
['stats is great', 'india is great because it is colourfull', 'i like you', 'your movie is great', 'i hate your book of hatred']
This approach is fast because each word costs a single O(1) dictionary lookup, independent of the size of the dictionary.
The advantage of re with word boundaries over str.split is that it also works when words are separated by punctuation.
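To illustrate the punctuation point, here is a small sketch wrapped in a helper. The function name `expand` is made up, and the lowercase lookup is my own assumption, based on the question using `re.IGNORECASE`:

```python
import re

short_text = {
    "bcz": "because",
    "ur": "your",
    "grt": "great",
    "gr8": "great",
    "u": "you",
}

def expand(tweet):
    # Look each word up in lowercase so 'Ur' and 'UR' are also replaced;
    # words that are not in the dictionary are returned unchanged.
    return re.sub(r"\b\w+\b",
                  lambda m: short_text.get(m.group(0).lower(), m.group(0)),
                  tweet)

print(expand("i hate ur book, bcz it is not gr8!"))
```

A plain `split()` would see `gr8!` as one token and miss it, while the word-boundary regex only hands `gr8` to the lookup.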
You can use a list comprehension for this:
[' '.join(short_text.get(s, s) for s in new_str.split()) for new_str in tweet]
result:
In [1]: tweet = ['stats is gr8', 'india is grt bcz it is colourfull', 'i like you','your movie is grt', 'i hate ur book of hatred' ]
...:
In [2]: short_text={
...: "bcz" : "because",
...: "ur" : "your",
...: "grt" : "great",
...: "gr8" : "great",
...: "u" : "you"
...: }
In [4]: [' '.join(short_text.get(s, s) for s in new_str.split()) for new_str in tweet]
Out[4]:
['stats is great',
'india is great because it is colourfull',
'i like you',
'your movie is great',
'i hate your book of hatred']
You can try this approach:
tweet = ['stats is gr8', 'india is grt bcz it is colourfull', 'i like you','your movie is grt', 'i hate ur book of hatred' ]
short_text={
"bcz" : "because",
"ur" : "your",
"grt" : "great",
"gr8" : "great",
"u" : "you"
}
for j,i in enumerate(tweet):
    data=i.split()
    for index_np,value in enumerate(data):
        if value in short_text:
            data[index_np]=short_text[value]
    tweet[j]=" ".join(data)
print(tweet)
output:
['stats is great', 'india is great because it is colourfull', 'i like you', 'your movie is great', 'i hate your book of hatred']
I am new to Python and need some help with a string I have that looks like this:
string='Starters\nSalad with Greens 14.00\nSalad Goat Cheese 12.75\nMains\nPizza 12.75\nPasta 12.75\n'
and need to transform it into a table that looks more like this:
Category  Dish               Price
Starters  Salad with Greens  14.00
Starters  Salad Goat Cheese  12.75
Mains     Pizza              12.75
Mains     Pasta              12.75
What would be the best way to achieve this?
I was trying to apply string.rsplit(" ",2) but couldn't figure out how to make it work per line. I also had no idea how to repeat the headers in a separate column.
Any help will be much appreciated.
Thanks in advance!
I suppose you have to decide how to differentiate a category from an item. I think an item should always have a price. This code checks whether a dot is present, but a regular expression would probably be more robust.
s = 'Starters\nSalad with Greens 14.00\nSalad Goat Cheese 12.75\nMains\nPizza 12.75\nPasta 12.75'
items = s.split('\n')
# ['Starters', 'Salad with Greens 14.00', 'Salad Goat Cheese 12.75', 'Mains', 'Pizza 12.75', 'Pasta 12.75']
category = ''
menu = {}
for item in items:
    print(item)
    if '.' in item:
        menu[category].append(item)
    else:
        category = item
        menu[category] = []
print(menu)
# {'Starters': ['Salad with Greens 14.00', 'Salad Goat Cheese 12.75'], 'Mains': ['Pizza 12.75', 'Pasta 12.75']}
UPD: You may replace
if '.' in item:
with
if re.match(r".*\d\.\d\d$", item):
(after import re). This matches strings that end like 1.11, which is useful if a category name contains an abbreviation with a dot.
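Building on the price check, the lines can also be turned directly into (category, dish, price) rows. This is a minimal sketch, assuming every dish line ends in a price with two decimals and any other non-empty line is a category header. The names `dish_re` and `rows` are made up.

```python
import re

s = ('Starters\nSalad with Greens 14.00\nSalad Goat Cheese 12.75\n'
     'Mains\nPizza 12.75\nPasta 12.75\n')

# A dish line is "<name> <price>", where the price looks like 14.00.
dish_re = re.compile(r'^(.*\S)\s+(\d+\.\d\d)$')

rows = []
category = ''
for line in s.splitlines():
    line = line.strip()
    if not line:
        continue  # ignore blank lines
    match = dish_re.match(line)
    if match:
        rows.append((category, match.group(1), match.group(2)))
    else:
        category = line  # no price: treat the line as a category header

for row in rows:
    print('{:<10}{:<20}{:>6}'.format(*row))
```

Keeping the rows as tuples means the category is repeated per dish, which is the table layout asked for in the question.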
Not that I would use it in a production environment but for the sake of academic challenge:
import re
string = """Starters
Salad with Greens 14.00
Salad Goat Cheese 12.75
Mains
Pizza 12.75
Pasta 12.75"""
rx = re.compile(r'^(Starters|Mains)', re.MULTILINE)
result = "\n".join(["{}\t{}".format(category, line)
                    for parts in [[part.strip() for part in rx.split(string) if part]]
                    for category, dish in zip(parts[0::2], parts[1::2])
                    for line in dish.split("\n")])
print(result)
This yields
Starters Salad with Greens 14.00
Starters Salad Goat Cheese 12.75
Mains Pizza 12.75
Mains Pasta 12.75
You can use a class-based solution in Python3 with operator overloading to gain additional accessibility over the data:
import re
import itertools
class MealPlan:
    def __init__(self, string, headers):
        self.headers = headers
        # Alternating groups of lines: [category], [dishes], [category], [dishes], ...
        self.grouped_data = [d for c, d in [(a, list(b)) for a, b in itertools.groupby(string.split('\n'), key=lambda x:x in ['Starters', 'Mains'])]]
        self.final_grouped_data = list(map(lambda x:[x[0][0], x[-1]], [self.grouped_data[i:i+2] for i in range(0, len(self.grouped_data), 2)]))
        self.final_data = [[[a, *list(filter(None, re.split(r'\s(?=\d)', i)))] for i in b] for a, b in self.final_grouped_data]
        # Drop rows without a dish and price (e.g. from a trailing newline).
        self.final_data = [list(filter(lambda x:len(x) > 1, i)) for i in self.final_data]
    def __getattr__(self, column):
        if column not in self.headers:
            raise KeyError("'{}' not found".format(column))
        transposed = [dict(zip(self.headers, i)) for i in itertools.chain.from_iterable(self.final_data)]
        yield from map(lambda x:x[column], transposed)
    def __getitem__(self, row):
        new_grouped_data = {a:dict(zip(self.headers[1:], zip(*[i[1:] for i in list(b)]))) for a, b in itertools.groupby(list(itertools.chain(*self.final_data)), key=lambda x:x[0])}
        return new_grouped_data[row]
    def __repr__(self):
        return ' '.join(self.headers)+'\n'+'\n'.join('\n'.join(' '.join(c) for c in i) for i in self.final_data)
string='Starters\nSalad with Greens 14.00\nSalad Goat Cheese 12.75\nMains\nPizza 12.75\nPasta 12.75\n'
meal = MealPlan(string, ['Category', 'Dish', 'Price'])
print(meal)
print([i for i in meal.Category])
print(meal['Starters'])
Output:
Category Dish Price
Starters Salad with Greens 14.00
Starters Salad Goat Cheese 12.75
Mains Pizza 12.75
Mains Pasta 12.75
['Starters', 'Starters', 'Mains', 'Mains']
{'Dish': ('Salad with Greens', 'Salad Goat Cheese'), 'Price': ('14.00', '12.75')}
Try this. Note: it assumes 'Starters' is listed before 'Mains'.
category = 'Starters'
for item in string.split('\n'):
    if item == 'Mains': category = 'Mains'
    if not item or item in ('Starters', 'Mains'): continue  # skip blank lines and headers
    price = item.split(' ')[-1]
    dish = ' '.join(item.split(' ')[:-1])
    print('{} {} {}'.format(category, dish, price))
I need to turn this file's content into a dictionary in which every key is the name of a movie and every value is a set containing the names of the actors who play in it.
Example of file content:
Brad Pitt, Sleepers, Troy, Meet Joe Black, Oceans Eleven, Seven, Mr & Mrs Smith
Tom Hanks, You have got mail, Apollo 13, Sleepless in Seattle, Catch Me If You Can
Meg Ryan, You have got mail, Sleepless in Seattle
Diane Kruger, Troy, National Treasure
Dustin Hoffman, Sleepers, The Lost City
Anthony Hopkins, Hannibal, The Edge, Meet Joe Black, Proof
This should get you started:
line = "a, b, c, d"
result = {}
names = line.split(", ")
actor = names[0]
movies = names[1:]
result[actor] = movies
Try the following:
res_dict = {}
with open('my_file.txt', 'r') as f:
    for line in f:
        my_list = [item.strip() for item in line.split(',')]
        res_dict[my_list[0]] = my_list[1:]  # To make it a set, use: set(my_list[1:])
Explanation:
split() splits each line into a list, using , as the separator.
strip() removes the spaces around each element of that list.
When you use a with statement, you do not need to close the file explicitly.
[item.strip() for item in line.split(',')] is called a list comprehension.
Output:
>>> res_dict
{'Diane Kruger': ['Troy', 'National Treasure'], 'Brad Pitt': ['Sleepers', 'Troy', 'Meet Joe Black', 'Oceans Eleven', 'Seven', 'Mr & Mrs Smith'], 'Meg Ryan': ['You have got mail', 'Sleepless in Seattle'], 'Tom Hanks': ['You have got mail', 'Apollo 13', 'Sleepless in Seattle', 'Catch Me If You Can'], 'Dustin Hoffman': ['Sleepers', 'The Lost City'], 'Anthony Hopkins': ['Hannibal', 'The Edge', 'Meet Joe Black', 'Proof']}
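Note that the output above maps each actor to their movies, whereas the question asks for the opposite direction (movie as key, set of actors as value). A possible sketch of that inversion uses `collections.defaultdict`. To keep it self-contained it reads from an in-memory list `lines` (an assumption; with a real file you would iterate over the file object instead), and `movie_actors` is a made-up name.

```python
from collections import defaultdict

# A few lines from the example file, inlined so the snippet runs on its own.
lines = [
    "Brad Pitt, Sleepers, Troy, Meet Joe Black, Oceans Eleven, Seven, Mr & Mrs Smith",
    "Tom Hanks, You have got mail, Apollo 13, Sleepless in Seattle, Catch Me If You Can",
    "Meg Ryan, You have got mail, Sleepless in Seattle",
    "Diane Kruger, Troy, National Treasure",
]

movie_actors = defaultdict(set)  # movie title -> set of actor names
for line in lines:
    # First field is the actor, the remaining fields are their movies.
    actor, *movies = [item.strip() for item in line.split(',')]
    for movie in movies:
        movie_actors[movie].add(actor)

print(movie_actors['Troy'])
```

Using a set as the value means an actor listed twice for the same movie is stored only once.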
I need to parse a series of short strings that are comprised of 3 parts: a question and 2 possible answers. The string will follow a consistent format:
This is the question "answer_option_1 is in quotes" "answer_option_2 is in quotes"
I need to identify the question part and the two possible answer choices that are in single or double quotes.
Ex.:
What color is the sky today? "blue" or "grey"
Who will win the game 'Michigan' 'Ohio State'
How do I do this in python?
>>> import re
>>> s = "Who will win the game 'Michigan' 'Ohio State'"
>>> re.match(r'(.+)\s+([\'"])(.+?)\2\s+([\'"])(.+?)\4', s).groups()
('Who will win the game', "'", 'Michigan', "'", 'Ohio State')
If your format is as simple as you say (i.e. not as in your examples), you don't need regex. Just split the line:
>>> line = 'What color is the sky today? "blue" "grey"'.strip('"')
>>> questions, answers = line.split('"', 1)
>>> answer1, answer2 = answers.split('" "')
>>> questions
'What color is the sky today? '
>>> answer1
'blue'
>>> answer2
'grey'
One possibility is that you can use regex.
import re
robj = re.compile(r'^(.*) [\"\'](.*)[\"\'].*[\"\'](.*)[\"\']')
str1 = "Who will win the game 'Michigan' 'Ohio State'"
r1 = robj.match(str1)
print(r1.groups())
str2 = 'What color is the sky today? "blue" or "grey"'
r2 = robj.match(str2)
print(r2.groups())
Output:
('Who will win the game', 'Michigan', 'Ohio State')
('What color is the sky today?', 'blue', 'grey')
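The two regex ideas above can be combined: backreferences make sure each answer closes with the quote character that opened it, and an optional "or" is allowed between the answers. The following is only a sketch. The names `qa_re` and `parse` are made up, and it assumes the question itself contains no quote characters (an apostrophe in the question would confuse it).

```python
import re

# Question, then two quoted answers. \2 and \4 refer back to the opening
# quote of each answer; "or" between the answers is optional.
qa_re = re.compile(r'''^(.+?)\s*(['"])(.+?)\2\s+(?:or\s+)?(['"])(.+?)\4\s*$''')

def parse(line):
    match = qa_re.match(line)
    if match is None:
        return None  # the line does not follow the expected format
    return match.group(1), match.group(3), match.group(5)

print(parse('What color is the sky today? "blue" or "grey"'))
print(parse("Who will win the game 'Michigan' 'Ohio State'"))
```

Returning `None` for lines that do not fit lets the caller decide how to handle malformed input instead of raising an `AttributeError` on `.groups()`.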
Pyparsing will give you a solution that will adapt to some variability in the input text:
questions = """\
What color is the sky today? "blue" or "grey"
Who will win the game 'Michigan' 'Ohio State'""".splitlines()
from pyparsing import *
quotedString.setParseAction(removeQuotes)
q_and_a = SkipTo(quotedString)("Q") + delimitedList(quotedString, Optional("or"))("A")
for qn in questions:
    print(qn)
    qa = q_and_a.parseString(qn)
    print("qa.Q", qa.Q)
    print("qa.A", qa.A)
    print()
Will print:
What color is the sky today? "blue" or "grey"
qa.Q What color is the sky today?
qa.A ['blue', 'grey']
Who will win the game 'Michigan' 'Ohio State'
qa.Q Who will win the game
qa.A ['Michigan', 'Ohio State']