Joining Strings on New Lines Error Python - python

Almost there with this one!
Taking user input and removing any trailing punctuation and non-hashed words to spot trends in tweets. Don't ask!
tweet = input('Tweet: ')
tweets = ''
while tweet != '':
    tweets += tweet
    tweet = input('Tweet: ')
print (tweets) # only using this to spot where things are going wrong!
listed_tweets = tweets.lower().rstrip('\'\"-,.:;!?').split(' ')
hashed = []
for entry in listed_tweets:
    if entry[0] == '#':
        hashed.append(entry)
from collections import Counter
trend = Counter(hashed)
for item in trend:
    print (item, trend[item])
Which works, apart from the fact that I get:
Tweet: #Python is #AWESOME!
Tweet: This is #So_much_fun #awesome
Tweet:
#Python is #AWESOME!This is #So_much_fun #awesome
#awesome!this 1
#python 1
#so_much_fun 1
#awesome 1
Instead of:
#so_much_fun 1
#awesome 2
#python 1
So I'm not getting a space at the end of each line of input and it's throwing my list!
It's probably very simple, but after 10hrs straight of self-teaching, my mind is mush!!

The problem is with this line:
tweets += tweet
You're taking each tweet and appending it to the previous one. Thus, the last word of the previous tweet gets joined with the first word of the current tweet.
There are various ways to solve this problem. One approach is to process the tweets one at a time. Start out with an empty list for your hashtags, then do the following in a loop:
1. read a line from the user
2. if the line is empty, break out of the loop
3. otherwise, extract the hashtags and add them to the list
4. return to step 1
The following code incorporates this idea and makes several other improvements. Notice how the interactive loop is written so that there's only one place in the code where we prompt the user for input.
hashtags = []
while True: # Read and clean each line of input.
    tweet = input('Tweet: ').lower().rstrip('\'\"-,.:;!?')
    if tweet == '': # Check for empty input.
        break
    print('cleaned tweet: '+tweet) # Review the cleaned tweet.
    for word in tweet.split(): # Extract hashtags.
        if word[0] == '#':
            hashtags.append(word)
from collections import Counter
trend = Counter(hashtags)
for item in trend:
    print (item, trend[item])
If you continue working on tweet processing, I suspect that you'll find that your tweet-cleaning process is inadequate. What if there is punctuation in the middle of a tweet, for example? You will probably want to embark on the study of regular expressions sooner or later.
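As a rough illustration of the regular-expression route, here is a minimal sketch. The pattern `#\w+` is an assumption about what counts as a hashtag (a `#` followed by letters, digits or underscores), and the name `count_hashtags` is made up for this example:

```python
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by letters, digits or underscores.
HASHTAG = re.compile(r"#\w+")

def count_hashtags(tweets):
    """Count hashtags case-insensitively across an iterable of tweet strings."""
    counts = Counter()
    for tweet in tweets:
        # findall() stops at punctuation, so '#AWESOME!' yields '#AWESOME'.
        counts.update(tag.lower() for tag in HASHTAG.findall(tweet))
    return counts

trend = count_hashtags(["#Python is #AWESOME!", "This is #So_much_fun #awesome"])
for tag, count in trend.items():
    print(tag, count)
```

Because the pattern picks hashtags out of the text directly, punctuation in the middle of a tweet no longer needs to be stripped first.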

Related

Identify lines of speech which contain words from a list using pandas

I have the following dataframe:
test = pd.DataFrame(columns = ['Line No.','Person','Speech'])
test['Person'] = ['A','B','A','B','A','B']
test['Line No.'] = [1,2,3,4,5,6]
test['Speech'] = ['hello. how was your assessment day? i heard it went very well.',
'The beginning was great and the rest of the day was kinda ok.',
'why did things go from great to ok?',
'i was positive at the beginning and went right with all my answers but then i was not feeling well.',
"that's very unfortunate. if there's anything i can help you with please let me know how.",
'Will do.']
And the following list which contains keywords:
keywords = ['hello','day','great','well','happy','right','ok','why','positive']
I would like to generate an output which shows both the speaker and the line no. for each line of speech that contains at least 3 words from the keywords list. I have tried iterating through each line in the dataframe to see if there were at least 3 keywords present; however, my code only returns the last line. Below is the code I used:
def identify_line_numebr(dataframe, keywords:list, thresh:int=3):
    is_person = False
    keyword_match_list = []
    for index, row in dataframe.iterrows():
        if is_person == False:
            # Pulling out the speech
            line = row['Speech']
            for token in line:
                # Checking if each line of speech contains key words
                if token in keywords:
                    keyword_match_list.append(token)
            print(index, is_person, row['Line No.'], row['Person'])
            print(len(keyword_match_list))
            if len(keyword_match_list) == thresh:
                is_person == True
        else:
            break
    return {row['Person'], row['Line No.']}
The expected output for this particular case should be in a similar format:
output = [{1, 'A'},{2, 'B'},{3, 'A'},{5, 'A'}]
whereby the first value is the Line No. which contains speech which has at least 3 keywords and the letter is the person.
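There are a few separate problems. The function returns a single set, built from whatever row the loop ended on, so it can only ever report one line. is_person == True is a comparison rather than an assignment, so the flag never changes. And for token in line iterates over the characters of the string rather than its words, so a multi-letter keyword is never matched. Instead, you should iterate over all lines and add the person and line number to a list if the threshold count is met: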
def identify_line_numbers(dataframe, keywords, thresh=3):
    person_line = [] # will contain sets of {Person, Line No.}
    for line_index, line in enumerate(dataframe.Speech):
        # check whether each keyword occurs in the current line (substring match)
        words_in_speech = [word in line for word in keywords]
        # add person and line number to our list if the threshold count is met
        if sum(words_in_speech) >= thresh:
            person_line.append(
                {dataframe.Person.iloc[line_index], dataframe['Line No.'].iloc[line_index]}
            )
    return person_line
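Note that `word in line` is a substring test, so 'day' would also match 'today'. If whole-word matching matters, something like the sketch below might help. It is written against plain tuples so it runs without pandas; the function name `lines_with_keywords` and the tokenising pattern are assumptions for illustration, not part of the original answer. With the dataframe above, the rows could come from `test[['Line No.', 'Person', 'Speech']].itertuples(index=False, name=None)`.

```python
import re

def lines_with_keywords(rows, keywords, thresh=3):
    """rows: iterable of (line_no, person, speech) tuples.

    Returns a list of (line_no, person) for every speech that contains
    at least `thresh` distinct keywords as whole words.
    """
    wanted = {k.lower() for k in keywords}
    hits = []
    for line_no, person, speech in rows:
        # Split into lowercase word tokens so 'day?' and 'Day' both count as 'day'.
        tokens = set(re.findall(r"[a-z']+", speech.lower()))
        if len(tokens & wanted) >= thresh:
            hits.append((line_no, person))
    return hits

rows = [
    (1, 'A', 'hello. how was your assessment day? i heard it went very well.'),
    (2, 'B', 'The beginning was great and the rest of the day was kinda ok.'),
    (3, 'A', 'why did things go from great to ok?'),
    (4, 'B', 'i was positive at the beginning and went right with all my answers but then i was not feeling well.'),
    (5, 'A', "that's very unfortunate. if there's anything i can help you with please let me know how."),
    (6, 'B', 'Will do.'),
]
keywords = ['hello', 'day', 'great', 'well', 'happy', 'right', 'ok', 'why', 'positive']
result = lines_with_keywords(rows, keywords)
print(result)
```

Tuples are used instead of sets here because a set such as {1, 'A'} has no guaranteed order, which makes it hard to tell the line number from the person when reading the result back.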

Input Function and returning number of words in dataset

I'm supposed to be writing a function that takes any word as input, searches for it in the song titles, and returns the number of songs that contain the word. If the word is not found, it should return a statement saying no songs were found. My output is running the elif statement and then my if statement. I'll post what my output is looking like.
import csv
word_count = 0
with open("billboard_songs.csv") as data:
    word = input("Enter any word: ")
    for line in data:
        line_strip = line.split(",")
        if word.casefold() in line_strip[1]:
            word_count += 1
            print(word_count, "songs were found to contain", word.casefold(), "in this data set")
        elif word_count == 1:
            print("No songs were found to contain the words: ", word.casefold())
Current output:
No songs were found to contain the words: war
No songs were found to contain the words: war
No songs were found to contain the words: war
No songs were found to contain the words: war
2 songs were found to contain war in this data set
3 songs were found to contain war in this data set
4 songs were found to contain war in this data set
5 songs were found to contain war in this data set
6 songs were found to contain war in this data set
7 songs were found to contain war in this data set
8 songs were found to contain war in this data set
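There are several issues with the code.
You should be using the csv library you've already imported, not splitting on commas (a quoted field that contains a comma would be split in the wrong place).
Your if/elif runs on every line of the file, so a message is printed for each row instead of once at the end, and elif word_count == 1 prints the "No songs" message precisely when one song has already been found.
You should do something similar to the following: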
import csv # Use it!
Store the word as a variable:
word = input("Enter any word: ").casefold()
Hopefully your CSV has headers in it... use csv.DictReader if it does:
reader = csv.DictReader(open('billboard_songs.csv', 'r'))
Iterate through each row in the CSV... from line_strip[1], it looks as if the field you want to search is the second one. So loop through the rows and read that field from each one. You should also set up a variable to store the count of songs containing the word at this stage. Iterate through the full CSV first, before printing output:
word_count = 0
for row in reader:
    lyrics = row['song_lyrics'] # replace 'song_lyrics' with your CSV header for the field to search
    # Check the word is present, ignoring case
    if word in lyrics.casefold():
        word_count += 1
Once that finishes, you can use an if/else statement to print any desired output:
if word_count == 0:
    print('No songs were found to contain the words: {}'.format(word))
else:
    # at least one set of lyrics had the word!
    print('{} song(s) were found to contain {} in this data set'.format(word_count, word))
Or, instead of the for loop and everything else below reader, you could use sum as follows (a DictReader yields one dict per row, so the field is looked up on each row):
word_count = sum(word in row['song_lyrics'].casefold() for row in reader)
Then you could just use a generic print statement:
print('There were {} songs that contained the word: {}'.format(word_count, word))
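Putting those pieces together, a self-contained sketch might look like this. The column name `song_title`, the helper name `count_matching_rows` and the sample rows are all made up for illustration; an `io.StringIO` stands in for the real billboard_songs.csv so the example runs on its own.

```python
import csv
import io

def count_matching_rows(csv_file, column, word):
    """Count rows whose `column` contains `word`, ignoring case."""
    word = word.casefold()
    reader = csv.DictReader(csv_file)  # uses the first row as field names
    return sum(word in row[column].casefold() for row in reader)

# Made-up sample data; with a real file you would pass open('billboard_songs.csv', newline='').
sample = io.StringIO(
    "rank,song_title\n"
    "1,War Song\n"
    '2,"Quiet, Calm Night"\n'
    "3,Star Wars Theme\n"
)

count = count_matching_rows(sample, "song_title", "war")
if count == 0:
    print("No songs were found to contain the word: war")
else:
    print("{} song(s) were found to contain war in this data set".format(count))
```

This is still a substring test, so 'war' also matches 'Wars' or 'award'; whether that is acceptable depends on what the assignment means by "contain the word".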

How can I pull out text snippets around specific words?

I have a large txt file and I'm trying to pull out every instance of a specific word, as well as the 15 words on either side. I'm running into a problem when there are two instances of that word within 15 words of each other, which I'm trying to get as one large snippet of text.
I'm trying to get chunks of text to analyze about a specific topic. So far, I have working code for all instances except the scenario mentioned above.
def occurs(word1, word2, filename):
    import os
    infile = open(filename,'r') #opens file, reads, splits into lines
    lines = infile.read().splitlines()
    infile.close()
    wordlist = [word1, word2] #this list allows for multiple words
    wordsString = ''.join(lines) #splits file into individual words
    words = wordsString.split()
    f = open(filename, 'w')
    f.write("start")
    f.write(os.linesep)
    for word in wordlist:
        matches = [i for i, w in enumerate(words) if w.lower().find(word) != -1]
        for m in matches:
            l = " ".join(words[m-15:m+16])
            f.write(f"...{l}...") #writes the data to the external file
            f.write(os.linesep)
    f.close
So far, when two of the same word are too close together, the program just doesn't run on one of them. Instead, I want to get a longer chunk of text that extends 15 words before the earliest occurrence and 15 words after the latest one.
This snippet collects up to a given number of words around each occurrence of the chosen keyword. If several occurrences are close together, it joins them into one snippet:
s = '''xxx I have a large txt file and I'm xxx trying to pull out every instance of a specific word, as well as the 15 words on either side. I'm running into a problem when there are two instances of that word within 15 words of each other, which I'm trying to get as one large snippet of text.
I'm trying to xxx get chunks of text to analyze about a specific topic. So far, I have working code for all instances except the scenario mentioned above. xxx'''
words = s.split()
from itertools import groupby, chain
word = 'xxx'
def get_snippets(words, word, l):
    snippets, current_snippet, cnt = [], [], 0
    for v, g in groupby(words, lambda w: w != word):
        w = [*g]
        if v:
            if len(w) < l:
                current_snippet += [w]
            else:
                current_snippet += [w[:l] if cnt % 2 else w[-l:]]
                snippets.append([*chain.from_iterable(current_snippet)])
                current_snippet = [w[-l:] if cnt % 2 else w[:l]]
                cnt = 0
            cnt += 1
        else:
            if current_snippet:
                current_snippet[-1].extend(w)
            else:
                current_snippet += [w]
    if current_snippet[-1][-1] == word or len(current_snippet) > 1:
        snippets.append([*chain.from_iterable(current_snippet)])
    return snippets
for snippet in get_snippets(words, word, 15):
    print(' '.join(snippet))
Prints:
xxx I have a large txt file and I'm xxx trying to pull out every instance of a specific word, as well as the 15
other, which I'm trying to get as one large snippet of text. I'm trying to xxx get chunks of text to analyze about a specific topic. So far, I have working
topic. So far, I have working code for all instances except the scenario mentioned above. xxx
With the same data and a different length:
for snippet in get_snippets(words, word, 2):
    print(' '.join(snippet))
Prints:
xxx and I'm
I have xxx trying to
trying to xxx get chunks
mentioned above. xxx
As always, a variety of solutions are available here. A fun one would be a recursive wordFind, where it searches the next 15 words and, if it finds the target word, calls itself.
A simpler, though perhaps not efficient, solution would be to add words one at a time:
for m in matches:
    snippet = words[max(0, m - 15):m + 1] # up to 15 words before, plus the match itself
    i, remaining = m + 1, 15
    while remaining > 0 and i < len(words):
        snippet.append(words[i])
        # another target word restarts the 15-word countdown
        remaining = 15 if word in words[i].lower() else remaining - 1
        i += 1
    f.write(f"...{' '.join(snippet)}...") #writes the data to the external file
    f.write(os.linesep)
Or if you're wanting the subsequent uses to be removed...
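covered_until = -1 # index of the last word already written
for m in matches:
    if m <= covered_until:
        continue # this match is already inside the previous snippet
    snippet = words[max(0, m - 15):m + 1]
    i, remaining = m + 1, 15
    while remaining > 0 and i < len(words):
        snippet.append(words[i])
        # another target word restarts the 15-word countdown
        remaining = 15 if word in words[i].lower() else remaining - 1
        i += 1
    covered_until = i - 1
    f.write(f"...{' '.join(snippet)}...")
    f.write(os.linesep)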
Note: I have not tested this, so it may require a bit of debugging. But the gist is clear: add words piecemeal and extend the addition process when a target word is encountered. This also lets you extend on target words other than the current one, with a small addition to the condition that checks for a target word.
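A different way to get the same effect is to treat each match as a window of word indices and merge windows that overlap. This is a swapped-in technique (interval merging), not the approach of either answer above; `merged_snippets` is a hypothetical name, and the substring test mirrors the `find()` check in the question.

```python
def merged_snippets(words, target, n=15):
    """Return text snippets of up to n words either side of each match.

    Windows that overlap or touch are merged into one longer snippet.
    """
    windows = []  # list of [start, end) index pairs, in ascending order
    for i, w in enumerate(words):
        if target in w.lower():
            start, end = max(0, i - n), min(len(words), i + n + 1)
            if windows and start <= windows[-1][1]:
                # Overlaps (or touches) the previous window: extend it.
                windows[-1][1] = max(windows[-1][1], end)
            else:
                windows.append([start, end])
    return [" ".join(words[s:e]) for s, e in windows]

demo = "a b xxx c d e xxx f g h i j k xxx l".split()
for snippet in merged_snippets(demo, "xxx", n=2):
    print(f"...{snippet}...")
```

With this rule two matches end up in the same snippet whenever their windows meet, that is, when they are at most 2n + 1 words apart. If only matches within n words of each other should merge, the comparison against the previous window would need adjusting.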

Reading a random line from a word doc and printing it

So I have a small project with python.
A random song name and artist are chosen.
The artist and the first letter of each word in the song title are displayed.
The user has two chances to guess the name of the song.
If the user guesses the answer correctly the first time, they score 3 points. If the user guesses the answer correctly the second time they score 1 point. The game repeats.
The game ends when a player guesses the song name incorrectly the second time.
So far I've created a text document and put a few lines of song titles.
In my code I have used the following:
random_lines = random.choice(open("songs.txt").readlines())
This randomly picks a line from the file and does nothing with it.
I am asking where I go from here. I need to display the first letter of each word on the line. I then need a counter of some sort to track chances. I also need to write something that will check whether they have it correct and add to a score counter.
OK, now just continue with your plan, it's good. Now you have to get the first letter from each word in line. You can do that with:
res = []
for i in line.split():
    res.append(i[0])
There you are, you have the first letter of every word in the list res. Now you need to check if the user entered the title correctly. Maybe the best idea would be to keep everything lower-cased (in your file and in the user input) for easier checking. Now you just have to transform the user input to lower-case. You can do it with:
# assumes `import sys` and `score = 0` earlier in the script
title = line.strip().lower() # readlines() keeps the trailing newline, so strip it
user_entry = input('Song title:')
if user_entry.strip().lower() == title:
    score += 3
else:
    user_entry_2 = input('Song title:')
    if user_entry_2.strip().lower() == title:
        score += 1
    else:
        print('Game over.')
        sys.exit()
You should make this into a function and call it in a loop until the user misses. The function could return the current score, which you could print out (in that case you should remove the sys.exit() call).
I hope this is clear enough. If not, write the question in the comments :)
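One way the "function called in a loop" idea could look is sketched below. The function names and the `ask`/`choose` parameters are hypothetical; they exist only so the input and the random pick can be swapped out when testing. In a real script the titles would come from `open("songs.txt").readlines()`.

```python
import random

def first_letters(title):
    """Return the first letter of each word in the title, separated by spaces."""
    return " ".join(word[0] for word in title.split())

def play_round(title, ask=input):
    """Return 3 for a correct first guess, 1 for a correct second guess, else 0."""
    for points in (3, 1):
        if ask("Song title: ").strip().lower() == title.strip().lower():
            return points
    return 0

def play(titles, ask=input, choose=random.choice):
    """Repeat rounds until the player misses twice; return the total score."""
    score = 0
    while True:
        title = choose(titles).strip()
        print(first_letters(title))
        points = play_round(title, ask)
        if points == 0:
            print("Game over. Final score:", score)
            return score
        score += points
```

Calling `play(open("songs.txt").readlines())` would then run the game interactively. The artist is left out here; it could be shown alongside the initials once the file format is settled.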
Assuming your random choice string contains the data in the format {songname} - {artist}
Then you first need to get the song name and the artist as separate strings.
Print the first letters and ask for input.
After which you need to compare the strings and do some logic with the points.
points = 0
while True:
    random_line = 'Song - artist' # change this with your random string
    song, artist = [part.strip() for part in random_line.split('-', 1)]
    initials = ' '.join(word[0] for word in song.split()) # first letter of each word
    print("{0} - {1}".format(initials, artist))
    for i in range(0, 3):
        if i == 2:
            print('You died with {} points'.format(points))
            exit(0)
        elif song.lower() == input('Guess the song: ').strip().lower():
            points += 3 if i == 0 else 1 # 3 points on the first guess, 1 on the second
            print('correct guess. points: ' + str(points))
            break
        else:
            print('Try again')

How do I place multiple searched tweets into string

I have a program set up so that it searches tweets based on the hashtag I give it, and I can edit how many tweets to search and display, but I can't figure out how to place the searched tweets into a string. This is the code I have so far:
while True:
    for status in tweepy.Cursor(api.search, q=hashtag).items(2):
        tweet = [status.text]
    print tweet
When this is run, it only outputs 1 tweet even though it is set to search 2.
Your code looks like there's nothing to break out of the while loop. One method that comes to mind is to set a variable to an empty list and then with each tweet, append that to the list.
foo = []
for status in tweepy.Cursor(api.search, q=hashtag).items(2):
    tweet = status.text
    foo.append(tweet)
print(foo)
Of course, this will print a list. If you want a string instead, use the string join() method. Adjust the last line of code to look like this:
bar = ' '.join(foo)
print(bar)
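The same idea can be tried without a Twitter connection. In the sketch below, `SimpleNamespace` objects stand in for the status objects the cursor yields (the only thing assumed about them is the `.text` attribute used above), and `join_tweet_texts` is a made-up helper name.

```python
from types import SimpleNamespace

def join_tweet_texts(statuses, separator=" "):
    """Join the .text of each status object into a single string."""
    return separator.join(status.text for status in statuses)

# Stand-ins for real statuses; anything with a .text attribute works.
fake_statuses = [
    SimpleNamespace(text="first tweet #tag"),
    SimpleNamespace(text="second tweet #tag"),
]
combined = join_tweet_texts(fake_statuses)
print(combined)
```

Passing `separator="\n"` instead would put each tweet on its own line in the resulting string.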
