I have some tweets that contain shorthand text like ur, bcz, etc. I am using a dictionary to map each shorthand to the correct word. I know we cannot mutate strings in Python, so after replacing a shorthand with the correct word, I store a copy in a new list. That works, but I run into a problem when a tweet contains more than one shorthand word.
My code replaces one word at a time. How can I replace several words in a single string?
Here is my code:
# some sample tweets
tweet = ['stats is gr8', 'india is grt bcz it is colourfull', 'i like you','your movie is grt', 'i hate ur book of hatred' ]
short_text={
"bcz" : "because",
"ur" : "your",
"grt" : "great",
"gr8" : "great",
"u" : "you"
}
import re
def find_word(text,search):
    result = re.findall('\\b'+search+'\\b',text,flags=re.IGNORECASE)
    if len(result) > 0:
        return True
    else:
        return False
corrected_tweets=list()
for i in tweet:
    tweettoken=i.split()
    for short_word in short_text:
        print("current iteration")
        for tok in tweettoken:
            if(find_word(tok,short_word)):
                print(tok)
                print(i)
                newi = i.replace(tok,short_text[short_word])
                corrected_tweets.append(newi)
                print(newi)
my output is
['stats is great',
'india is grt because it is colourfull',
'india is great bcz it is colourfull',
'your movie is great',
'i hate your book of hatred']
What I need is for entries 2 and 3 to be appended only once, as a single tweet with all corrections applied. I am new to Python. Any help would be great.
Use a regex substitution on word boundaries, fetching the replacement from the dictionary (with the original word as the default, so a word that is not found is returned unchanged):
tweet = ['stats is gr8', 'india is grt bcz it is colourfull', 'i like you','your movie is grt', 'i hate ur book of hatred' ]
short_text={
"bcz" : "because",
"ur" : "your",
"grt" : "great",
"gr8" : "great",
"u" : "you"
}
import re
changed = [re.sub(r"\b(\w+)\b",lambda m:short_text.get(m.group(1),m.group(1)),x) for x in tweet]
result:
['stats is great', 'india is great because it is colourfull', 'i like you', 'your movie is great', 'i hate your book of hatred']
This approach is fast because each word costs a single dictionary lookup, which is O(1) on average and does not depend on the size of the dictionary.
The advantage of re with word boundaries over str.split is that it also works when words are separated by punctuation.
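A minimal sketch of the punctuation point, assuming a made-up tweet and two hypothetical helper names (expand_regex and expand_split) that wrap the regex lookup and a plain whitespace split:

```python
import re

short_text = {"bcz": "because", "ur": "your", "grt": "great", "gr8": "great", "u": "you"}

def expand_regex(text):
    # Replace each whole word by its dictionary entry, or keep it unchanged.
    return re.sub(r"\b(\w+)\b", lambda m: short_text.get(m.group(1), m.group(1)), text)

def expand_split(text):
    # Same lookup, but tokens are whitespace-separated, so "grt," stays one token.
    return " ".join(short_text.get(tok, tok) for tok in text.split())

sample = "ur book is grt, bcz u wrote it!"  # invented example tweet
print(expand_regex(sample))  # your book is great, because you wrote it!
print(expand_split(sample))  # your book is grt, because you wrote it!
```

Both helpers are case-sensitive as written, so "UR" would be left alone; lowercasing the looked-up word is one possible extension.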
You can use a list comprehension for this:
[' '.join(short_text.get(s, s) for s in new_str.split()) for new_str in tweet]
result:
In [1]: tweet = ['stats is gr8', 'india is grt bcz it is colourfull', 'i like you','your movie is grt', 'i hate ur book of hatred' ]
...:
In [2]: short_text={
...: "bcz" : "because",
...: "ur" : "your",
...: "grt" : "great",
...: "gr8" : "great",
...: "u" : "you"
...: }
In [4]: [' '.join(short_text.get(s, s) for s in new_str.split()) for new_str in tweet]
Out[4]:
['stats is great',
'india is great because it is colourfull',
'i like you',
'your movie is great',
'i hate your book of hatred']
You can try this approach:
tweet = ['stats is gr8', 'india is grt bcz it is colourfull', 'i like you','your movie is grt', 'i hate ur book of hatred' ]
short_text={
"bcz" : "because",
"ur" : "your",
"grt" : "great",
"gr8" : "great",
"u" : "you"
}
for j,i in enumerate(tweet):
    data=i.split()
    for index_np,value in enumerate(data):
        if value in short_text:
            data[index_np]=short_text[value]
    tweet[j]=" ".join(data)
print(tweet)
output:
['stats is great', 'india is great because it is colourfull', 'i like you', 'your movie is great', 'i hate your book of hatred']
Related
animals = 'silly monkey small bee white cat'
text1 = 'brown dog'
text2 = 'white cat'
text3 = 'fat cow'
if(text1 in animals or text2 in animals or text3 in animals):
    print(text2) # because it was matched in the if statement!
I tried to simplify, but the animals string will be updated every time.
What is the best and easiest way to achieve this without so many if/else statements in my code?
You can use regex.
import re
pattern = '|'.join([text1, text2, text3])
# pattern -> 'brown dog|white cat|fat cow'
res = re.findall(pattern, animals)
print(res)
# ['white cat']
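One caveat with joining raw strings into a pattern: if a phrase contains regex metacharacters such as "." or "(", they are interpreted as regex syntax. A small sketch, assuming the phrases are meant to be matched literally, that escapes each one with re.escape (find_phrases is a hypothetical helper name):

```python
import re

def find_phrases(phrases, haystack):
    """Return every literal occurrence of any phrase, in order of appearance."""
    # re.escape makes characters like "." or "(" match themselves.
    pattern = '|'.join(re.escape(p) for p in phrases)
    return re.findall(pattern, haystack)

animals = 'silly monkey small bee white cat'
print(find_phrases(['brown dog', 'white cat', 'fat cow'], animals))
# ['white cat']
```

The helper assumes a non-empty list of phrases; an empty list would produce an empty pattern that matches at every position.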
ANY time you have a set of variables of the form xxx1, xxx2, and xxx3, you need to convert that to a list.
animals = 'silly monkey small bee white cat'
text = [
    'brown dog',
    'white cat',
    'fat cow'
]
for t in text:
    if t in animals:
        print("Found",t)
Use a loop to check each case:
animals = 'silly monkey small bee white cat'
text1 = 'brown dog'
text2 = 'white cat'
text3 = 'fat cow'
for text in [text1, text2, text3]:
    if text in animals:
        print(text)
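If the matches are needed as values rather than printed, the same membership test can be written as a comprehension. This is only a sketch; the names phrases, matches, and first_match are illustrative and not from the question:

```python
animals = 'silly monkey small bee white cat'
phrases = ['brown dog', 'white cat', 'fat cow']

# All phrases that occur somewhere in the string.
matches = [p for p in phrases if p in animals]

# Only the first hit, or None when nothing matches.
first_match = next((p for p in phrases if p in animals), None)

print(matches)      # ['white cat']
print(first_match)  # white cat
```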
I am trying to sort a list of "author,book" strings like the one below by the author's last name. Does anyone know how to get the index of the word right before the ',' delimiter, which is the last name?
I need to put that index value in the lambda x: x[here].
Also, if the author names are the same, how do I order the entries alphabetically by book title?
name_list= ["Dan Brown,The Da Vinci Code",
"Cornelia Funke,Inkheart",
"H G Wells,The War Of The Worlds",
"William Goldman,The Princess Bride",
"Harper Lee,To Kill a Mockingbird",
"Gary Paulsen,Hatchet",
"Jodi Picoult,My Sister's Keeper",
"Philip Pullman,The Golden Compass",
"J R R Tolkien,The Lord of the Rings",
"J R R Tolkien,The Hobbit",
"J.K. Rowling,Harry Potter Series",
"C S Lewis,The Lion the Witch and the Wardrobe",
"Louis Sachar,Holes",
"F. Scott Fitzgerald,The Great Gatsby",
"Eric Walters,Shattered",
"John Wyndham,The Chrysalids"]
def sorting(name):
    last_name =[]
    name_list = book_rec(name)
    for i in name_list:
        last_name.append(i.split())
    name_list = []
    for i in sorted(last_name, key=lambda x: x[]):
        name_list.append(' '.join(i))
    return name_list
Split on the comma and keep the first part, then split on whitespace and keep the last word:
name_list.sort(key=lambda x: x.split(',')[0].split()[-1])
If you also want to sort by book title when the author's last name is the same, it may be better to use a named key function that returns a tuple:
def sorting_key(author_title):
    author, title = author_title.split(',')
    # first by author last name, then by book title
    return author.split()[-1], title
name_list.sort(key=sorting_key)
print(name_list)
Output:
['Dan Brown,The Da Vinci Code',
'F. Scott Fitzgerald,The Great Gatsby',
'Cornelia Funke,Inkheart',
'William Goldman,The Princess Bride',
'Harper Lee,To Kill a Mockingbird',
'C S Lewis,The Lion the Witch and the Wardrobe',
'Gary Paulsen,Hatchet',
"Jodi Picoult,My Sister's Keeper",
'Philip Pullman,The Golden Compass',
'J.K. Rowling,Harry Potter Series',
'Louis Sachar,Holes',
'J R R Tolkien,The Hobbit',
'J R R Tolkien,The Lord of the Rings',
'Eric Walters,Shattered',
'H G Wells,The War Of The Worlds',
'John Wyndham,The Chrysalids']
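One thing to watch: unpacking split(',') into two names raises a ValueError as soon as a title contains a comma of its own. A hedged variant, assuming the first comma always separates author from title, uses str.partition instead. The function name and the short sample list are illustrative, and the Lewis title is spelled with a comma on purpose:

```python
def last_name_then_title(entry):
    # partition splits at the first comma only, so commas in the title survive.
    author, _, title = entry.partition(',')
    return author.split()[-1], title

books = [
    "J R R Tolkien,The Lord of the Rings",
    "Dan Brown,The Da Vinci Code",
    "J R R Tolkien,The Hobbit",
    "C S Lewis,The Lion, the Witch and the Wardrobe",
]

# sorted() returns a new list and leaves books untouched.
ordered = sorted(books, key=last_name_then_title)
for entry in ordered:
    print(entry)
```

Tuples compare element by element, so the title only matters when two last names are equal.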
I'm making a program that counts how many times a band has played a song from a webpage of all their setlists. I have grabbed the webpage and converted all the songs played into one big list so all I wanted to do was see if the song name was in the list and add to a counter but it isn't working and I can't seem to figure out why.
I've tried using the count function instead, and that didn't work either.
sugaree_counter = 0
link = 'https://www.cs.cmu.edu/~mleone/gdead/dead-sets/' + year + '/' + month+ '-' + day + '-' + year + '.txt'
page = requests.get(link)
page_text = page.text
page_list = [page_text.split('\n')]
print(page_list)
This code returns the list:
[['Winterland Arena, San Francisco, CA (1/2/72)', '', "Truckin'", 'Sugaree',
'Mr. Charlie', 'Beat it on Down the Line', 'Loser', 'Jack Straw',
'Chinatown Shuffle', 'Your Love At Home', 'Tennessee Jed', 'El Paso',
'You Win Again', 'Big Railroad Blues', 'Mexicali Blues',
'Playing in the Band', 'Next Time You See Me', 'Brown Eyed Women',
'Casey Jones', '', "Good Lovin'", 'China Cat Sunflower', 'I Know You Rider',
"Good Lovin'", 'Ramble On Rose', 'Sugar Magnolia', 'Not Fade Away',
"Goin' Down the Road Feeling Bad", 'Not Fade Away', '',
'One More Saturday Night', '', '']]
But when I do:
sugaree_counter = int(sugaree_counter)
if 'Sugaree' in page_list:
    sugaree_counter += 1
print(str(sugaree_counter))
It will always be zero.
It should add 1, because 'Sugaree' is in that list.
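Your page_list is a list of lists: the square brackets around page_text.split('\n') wrap the result in another list, so 'Sugaree' in page_list compares the string against the inner list and is never true. You need two for loops to reach the song titles, and a comparison inside them:
for page in page_list:
    for item in page:
        if item == 'Sugaree':
            sugaree_counter += 1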
Use sum() and a list comprehension:
sugaree_counter = sum([page.count('Sugaree') for page in page_list])
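Another option is to avoid the nesting altogether by dropping the extra square brackets, so that page_list is a flat list of lines. A minimal sketch, using a short hard-coded string in place of the downloaded page text (the sample content is abbreviated from the list shown in the question):

```python
# Stand-in for page.text; the real text would come from requests.get(link).
page_text = "Winterland Arena, San Francisco, CA (1/2/72)\n\nTruckin'\nSugaree\nMr. Charlie\n"

# No surrounding [...] here: split() already returns a list.
page_list = page_text.split('\n')

sugaree_counter = page_list.count('Sugaree')
print(sugaree_counter)  # 1
```

With a flat list, both 'Sugaree' in page_list and page_list.count('Sugaree') behave as the question expected.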
I am trying to write a function in Python that opens a file and parses it into a dictionary. Each group of lines in the file is collected in the list block. I want the first item of block to be the key in the dictionary data, and the value for that key to be the rest of block without the first item. For some reason, when I run the following function, it parses the file incorrectly. I have provided the output below. How can I parse it the way I described above? Any help would be greatly appreciated.
Function:
def parseData() :
    filename="testdata.txt"
    file=open(filename,"r+")
    block=[]
    for line in file:
        block.append(line)
        if line in ('\n', '\r\n'):
            album=block.pop(1)
            data[block[1]]=album
            block=[]
    print data
Input:
Bob Dylan
1966 Blonde on Blonde
-Rainy Day Women #12 & 35
-Pledging My Time
-Visions of Johanna
-One of Us Must Know (Sooner or Later)
-I Want You
-Stuck Inside of Mobile with the Memphis Blues Again
-Leopard-Skin Pill-Box Hat
-Just Like a Woman
-Most Likely You Go Your Way (And I'll Go Mine)
-Temporary Like Achilles
-Absolutely Sweet Marie
-4th Time Around
-Obviously 5 Believers
-Sad Eyed Lady of the Lowlands
Output:
{'-Rainy Day Women #12 & 35\n': '1966 Blonde on Blonde\n',
'-Whole Lotta Love\n': '1969 II\n', '-In the Evening\n': '1979 In Through the Outdoor\n'}
You can use groupby to group the lines, using the empty lines as delimiters. Use a defaultdict so that repeated keys accumulate: after extracting the key (the first line of each group that groupby returns), extend the list for that key with the remaining lines.
from itertools import groupby
from collections import defaultdict
d = defaultdict(list)
with open("file.txt") as f:
    for k, val in groupby(f, lambda x: x.strip() != ""):
        # if k is True we have a section
        if k:
            # get key "k" which is the first line
            # from each section, v will be the remaining lines
            k,*v = val
            # create the key or add to the existing key/value pairing
            d[k].extend(map(str.rstrip,v))
from pprint import pprint as pp
pp(d)
Output:
{'Bob Dylan\n': ['1966 Blonde on Blonde',
'-Rainy Day Women #12 & 35',
'-Pledging My Time',
'-Visions of Johanna',
'-One of Us Must Know (Sooner or Later)',
'-I Want You',
'-Stuck Inside of Mobile with the Memphis Blues Again',
'-Leopard-Skin Pill-Box Hat',
'-Just Like a Woman',
"-Most Likely You Go Your Way (And I'll Go Mine)",
'-Temporary Like Achilles',
'-Absolutely Sweet Marie',
'-4th Time Around',
'-Obviously 5 Believers',
'-Sad Eyed Lady of the Lowlands'],
'Led Zeppelin\n': ['1979 In Through the Outdoor',
'-In the Evening',
'-South Bound Saurez',
'-Fool in the Rain',
'-Hot Dog',
'-Carouselambra',
'-All My Love',
"-I'm Gonna Crawl",
'1969 II',
'-Whole Lotta Love',
'-What Is and What Should Never Be',
'-The Lemon Song',
'-Thank You',
'-Heartbreaker',
"-Living Loving Maid (She's Just a Woman)",
'-Ramble On',
'-Moby Dick',
'-Bring It on Home']}
For Python 2, which has no extended unpacking, take the first line with next() instead:
with open("file.txt") as f:
    for k, val in groupby(f, lambda x: x.strip() != ""):
        if k:
            k, v = next(val), val
            d[k].extend(map(str.rstrip, v))
If you want to keep the newlines, remove the map(str.rstrip, ...) call and extend with v directly.
If you want the album and songs separately for each artist:
from itertools import groupby
from collections import defaultdict
d = defaultdict(lambda: defaultdict(list))
with open("file.txt") as f:
    for k, val in groupby(f, lambda x: x.strip() != ""):
        if k:
            k, alb, songs = next(val),next(val), val
            d[k.rstrip()][alb.rstrip()] = list(map(str.rstrip, songs))
from pprint import pprint as pp
pp(d)
{'Bob Dylan': {'1966 Blonde on Blonde': ['-Rainy Day Women #12 & 35',
'-Pledging My Time',
'-Visions of Johanna',
'-One of Us Must Know (Sooner or '
'Later)',
'-I Want You',
'-Stuck Inside of Mobile with the '
'Memphis Blues Again',
'-Leopard-Skin Pill-Box Hat',
'-Just Like a Woman',
'-Most Likely You Go Your Way '
"(And I'll Go Mine)",
'-Temporary Like Achilles',
'-Absolutely Sweet Marie',
'-4th Time Around',
'-Obviously 5 Believers',
'-Sad Eyed Lady of the Lowlands']},
'Led Zeppelin': {'1969 II': ['-Whole Lotta Love',
'-What Is and What Should Never Be',
'-The Lemon Song',
'-Thank You',
'-Heartbreaker',
"-Living Loving Maid (She's Just a Woman)",
'-Ramble On',
'-Moby Dick',
'-Bring It on Home'],
'1979 In Through the Outdoor': ['-In the Evening',
'-South Bound Saurez',
'-Fool in the Rain',
'-Hot Dog',
'-Carouselambra',
'-All My Love',
"-I'm Gonna Crawl"]}}
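To experiment with the grouping without a file on disk, the same idea can be run on an in-memory list of lines. This is a sketch rather than the answer's exact code: parse_blocks is a hypothetical name, the sample lines are abbreviated from the input above, and every block is assumed to have at least an artist line and an album line.

```python
from collections import defaultdict
from itertools import groupby

lines = [
    "Bob Dylan\n",
    "1966 Blonde on Blonde\n",
    "-Rainy Day Women #12 & 35\n",
    "-Pledging My Time\n",
    "\n",
    "Led Zeppelin\n",
    "1969 II\n",
    "-Whole Lotta Love\n",
    "\n",
    "Led Zeppelin\n",
    "1979 In Through the Outdoor\n",
    "-In the Evening\n",
]

def parse_blocks(lines):
    """Map artist -> album -> list of songs, using blank lines as separators."""
    data = defaultdict(dict)
    for has_text, block in groupby(lines, key=lambda line: line.strip() != ""):
        if not has_text:
            continue  # this group is the run of blank separator lines
        # First line is the artist, second the album, the rest are songs.
        artist, album, *songs = (line.rstrip() for line in block)
        data[artist][album] = songs
    return data

parsed = parse_blocks(lines)
print(dict(parsed))
```

A block with fewer than two lines would raise a ValueError at the unpacking step, which may or may not be the behavior you want for malformed input.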
I guess this is what you want?
Even if this is not the format you wanted, there are a few things you might learn from the answer:
use with for file handling
nice to have:
PEP8 compliant code, see http://pep8online.com/
a shebang
numpydoc
if __name__ == '__main__'
And SE does not like a list being continued by code...
#!/usr/bin/env python
"""Parse text files with songs, grouped by album and artist."""
def add_to_data(data, block):
    """
    Parameters
    ----------
    data : dict
    block : list
    Returns
    -------
    dict
    """
    artist = block[0]
    album = block[1]
    songs = block[2:]
    if artist in data:
        data[artist][album] = songs
    else:
        data[artist] = {album: songs}
    return data
def parseData(filename='testdata.txt'):
    """
    Parameters
    ----------
    filename : string
        Path to a text file.
    Returns
    -------
    dict
    """
    data = {}
    with open(filename) as f:
        block = []
        for line in f:
            line = line.strip()
            if line == '':
                # A blank line ends the current block; ignore repeated blank lines.
                if block:
                    data = add_to_data(data, block)
                block = []
            else:
                block.append(line)
    # Flush the last block when the file does not end with a blank line.
    if block:
        data = add_to_data(data, block)
    return data
if __name__ == '__main__':
    data = parseData()
    import pprint
    pp = pprint.PrettyPrinter(indent=4)
    pp.pprint(data)
which gives:
{ 'Bob Dylan': { '1966 Blonde on Blonde': [ '-Rainy Day Women #12 & 35',
'-Pledging My Time',
'-Visions of Johanna',
'-One of Us Must Know (Sooner or Later)',
'-I Want You',
'-Stuck Inside of Mobile with the Memphis Blues Again',
'-Leopard-Skin Pill-Box Hat',
'-Just Like a Woman',
"-Most Likely You Go Your Way (And I'll Go Mine)",
'-Temporary Like Achilles',
'-Absolutely Sweet Marie',
'-4th Time Around',
'-Obviously 5 Believers',
'-Sad Eyed Lady of the Lowlands']},
'Led Zeppelin': { '1969 II': [ '-Whole Lotta Love',
'-What Is and What Should Never Be',
'-The Lemon Song',
'-Thank You',
'-Heartbreaker',
"-Living Loving Maid (She's Just a Woman)",
'-Ramble On',
'-Moby Dick',
'-Bring It on Home'],
'1979 In Through the Outdoor': [ '-In the Evening',
'-South Bound Saurez',
'-Fool in the Rain',
'-Hot Dog',
'-Carouselambra',
'-All My Love',
"-I'm Gonna Crawl"]}}
I am currently programming an Artificial Intelligence in Python, with some basic code from ELIZA. I will improve on the code once I get it working. My problem is that when I run the program and enter a query to the computer, there is no response. My code is below.
import string
# OSWALD v1.0
switch = [
    ["I need \(.*\)",
        [ "Why do you need %1?",
          "Would it REALLY help you to get %1?",
          "Are you sure you need %1?"]]
    #There is more code with responses.
]
gPats = {
    "am" : "are",
    "was" : "were",
    "i" : "you",
    "i'd" : "you would",
    "i've" : "you have",
    "i'll" : "you will",
    "my" : "your",
    "are" : "am",
    "you've": "I have",
    "you'll": "I will",
    "your" : "my",
    "yours" : "mine",
    "you" : "me",
    "me" : "you",
}
s = input
gKeys = map(lambda x:regex.compile(x[0]),gPats)
gValues = map(lambda x:x[1],gPats)
print ("Hello, mortal. My name is Oswald. What would you like to talk about?")
while s == input:
try: s = input(">")
def translate(str,dict):
words = string.split(string.lower(str))
keys = dict.keys();
for i in range(0,len(words)):
if words[i] in keys:
words[i] = dict[words[i]]
return print(switch)
def respond(str,keys,values):
for i in range(0,len(keys)):
if input == input:
respnum = whrandom.randint(0,len(values[word])-1)
resp = values[i][respnum]
pos = string.find(resp,'%')
print(string.find(resp,'%'))
while pos > -1:
num = string.atoi(resp[pos+1:pos+2])
resp = resp[:pos] + \
translate(keys[i].group(num),gReflections) + \
resp[pos+2:]
pos = string.find(resp,'%')
if resp[-2:] == '?.': resp = resp[:-2] + '.'
if resp[-2:] == '??': resp = resp[:-2] + '?'
print(string.find(resp,'%'))