Separate words into list, except for symbols - python

I'm creating a project where I'll receive a list of tweets (Twitter) and then check whether their words appear in a dictionary that maps words to certain values. I've gotten my code to split out the words, but I don't know how to eliminate the symbols like: , . ":
Here's the code:
def getTweet(tweet, dictionary):
    score = 0
    seperate = tweet.split(' ')
    print seperate
    print "------"
    if(len(tweet) > 0):
        for item in seperate:
            if item in dictionary:
                print item
                score = score + int(dictionary[item])
        print "here's the score: " + str(score)
        return score
    else:
        print "you haven't tweeted a tweet"
        return 0
Here's the parameter/tweet that will be checked:
getTweet("you are the best loyal friendly happy cool nice", scoresDict)
Any ideas?

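If you want to get rid of all the non-alphanumeric characters, you can try:
import re
re.sub(r'[^\w]', ' ', string)
The character class [^\w] matches every character that is not a letter, digit, or underscore, so each of those characters is replaced with a space.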

Before doing the split, replace the characters with spaces, and then split on the spaces.
import re
line = ' a.,b"c'
line = re.sub('[,."]', ' ', line)
print line # ' a  b c'
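Putting the two answers together, here is a minimal Python 3 sketch of the scoring function. The name score_tweet, the lowercasing step, and the example_scores dictionary are assumptions made for illustration; they are not part of the original code.

```python
import re

def score_tweet(tweet, scores):
    """Sum the scores of the words in `tweet` that appear in `scores`."""
    # Replace every non-word character with a space, then split on whitespace.
    # Lowercasing is an added assumption so that "Best" matches "best".
    words = re.sub(r"[^\w]", " ", tweet.lower()).split()
    return sum(int(scores[word]) for word in words if word in scores)

# Hypothetical scores dictionary, for illustration only.
example_scores = {"best": 3, "loyal": 2, "happy": 2}
print(score_tweet('You are the "best", loyal: happy!', example_scores))  # 7
```

Note that this approach also splits contractions at the apostrophe, so "haven't" becomes "haven" and "t".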

Related

Code to remove extraneous spaces in a string in Python and keep 1 space between words

I want to write code that will remove extraneous spaces in a string. Any more than 1 space in between words would be an extraneous space. I want to remove those spaces but keep 1 space in between words
I've written code that removes spaces at the beginning and the end, but I'm not sure how to make it remove the extra spaces in the middle while keeping one.
#Space Cull
def space_cull(str):
    result = str
    result = result.strip()
    return result
So this is what my code does right now
space_cull('    Cats   go    meow   ')
#It would return
'Cats   go    meow'
What I want it to do is this:
space_cull('    Cats   go    meow')
#It would return
'Cats go meow'
How should I do this?
You can do it like this:
sentence = '  Cats   go    meow  '
" ".join(sentence.split())
You can use re.sub to replace any run of whitespace with a single space:
>>> import re
>>> re.sub(r"\s+", " ", "foo    bar")
'foo bar'
You can do:
txt = '  Cats   go    meow  '
def space_cull(string):
    word = string.split(" ")
    result = ""
    for elem in word:
        if not elem == '':
            result += str(elem) + ' '
    return result.strip()
print(space_cull(txt))
Output:
Cats go meow
You can use built-in string methods:
x = " cats go meow "
print(*x.strip().split())
Output will be:
cats go meow
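As a rough side-by-side of the approaches above, the sketch below wraps the split/join idiom and a regex variant in functions. The name space_cull_spaces_only is hypothetical, and the regex variant is an assumption about what you might want if tabs and newlines should be left alone.

```python
import re

def space_cull(text):
    """Strip leading/trailing whitespace and collapse inner runs to one space."""
    return " ".join(text.split())

def space_cull_spaces_only(text):
    """Same idea, but only collapses literal spaces, leaving tabs/newlines alone."""
    return re.sub(r" +", " ", text).strip(" ")

print(repr(space_cull("   Cats   go \t meow  ")))            # 'Cats go meow'
print(repr(space_cull_spaces_only("   Cats   go\tmeow  ")))  # 'Cats go\tmeow'
```

The split/join version treats every kind of whitespace as a separator, which is usually what is wanted for plain sentences.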

Only print specific amount of Counter items, with decent formatting

I'm trying to print the top N most frequently used words in a text file. So far I have the file handling and the counter working; I just can't figure out how to print the number of entries I want in a readable way. Here is my code.
import re
from collections import Counter
def wordcount(user):
    """
    Docstring for word count.
    """
    file=input("Enter full file name w/ extension: ")
    num=int(input("Enter how many words you want displayed: "))
    with open(file) as f:
        text = f.read()
    words = re.findall(r'\w+', text)
    cap_words = [word.upper() for word in words]
    word_counts = Counter(cap_words)
    char, n = word_counts.most_common(num)[0]
    print ("WORD: %s \nOCCURENCE: %d " % (char, n) + '\n')
Basically, I just want a loop of some sort that prints the following.
For instance, with num=3
it would print the 3 most frequently used words and their counts:
WORD: Blah Occurrence: 3
Word: bloo Occurrence: 2
Word: blee Occurrence: 1
I would iterate "most common" as follows:
most_common = word_counts.most_common(num) # removed the [0] since we're not looking only at the first item!
for item in most_common:
    print("WORD: {} OCCURENCE: {}".format(item[0], item[1]))
Two comments:
1. Use format() to format strings instead of % - you'll thank me later for this advice!
2. This way you'll be able to iterate any number of "top N" results without hardcoding "3" into your code.
Save the most common elements and use a loop.
common = word_counts.most_common(num)
for i in range(len(common)):
    print("WORD: %s \nOCCURENCE: %d \n" % (common[i][0], common[i][1]))
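For aligned output, a format specification can pad the word column. This is a minimal sketch: the function name top_words, the column width of 10, and the sample text standing in for the file contents are all assumptions for illustration.

```python
from collections import Counter
import re

def top_words(text, num):
    """Return one formatted line per entry for the `num` most common words."""
    words = re.findall(r"\w+", text.upper())
    counts = Counter(words)
    return [
        # {:<10} left-aligns the word in a 10-character column.
        "WORD: {:<10} OCCURRENCE: {}".format(word, count)
        for word, count in counts.most_common(num)
    ]

sample = "blah bloo blah blee bloo blah"  # stand-in for the file contents
for line in top_words(sample, 3):
    print(line)
```

Because most_common(num) already returns at most num pairs, the loop needs no hardcoded count.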

Count the number of spaces between words in a string

I am doing this problem on HackerRank, and my idea is to split the input and join it again afterwards (see my implementation below). However, one of the test cases contains the input (hello<multiple spaces>world), which breaks my code because the input string has more than one space between words. Could anyone help me fix my code? I would also like to know how to count the spaces (especially runs of multiple spaces) in a string in Python. I found how to count spaces in Java, but not in Python. For the test case, I attached the pic.
Thanks in advance.
My implementation:
input_string = input()
splitter = input_string.split()
final = []
for i in range(0,len(splitter)):
    for j in range(0,len(splitter[i])):
        if(j==0):
            final.append(splitter[i][j].upper())
        else:
            final.append(splitter[i][j])
    # Assumed that there is one space btw each words
    final.append(' ')
print(''.join(final))
You can fix it by splitting on a single space, ' ', instead of on any whitespace. Runs of spaces then produce empty tokens, so joining with ' ' restores the original spacing:
splitter = input_string.split(' ')
You can also use the .capitalize() method instead of looping over each token's characters:
s = "hello   world 4lol"
a = s.split(' ')
new_string = ''
for i in range(0, len(a)) :
    new_string = a[i].capitalize() if i == 0 else new_string +' '+ a[i].capitalize()
print(new_string)
Output:
Hello   World 4lol
For counting the number of spaces between two words, you can use Python's regular expressions module.
import re
s = "hello       world  loL"
tokens = re.findall(r'\s+', s)
for i in range(0, len(tokens)) :
    print(len(tokens[i]))
Output :
7
2
What I suggest doing for the tutorial question is a quick simple solution.
s = input()
print(s.title())
str.title() will capitalise the starting letter of every word in a string.
Now, to answer the question about counting spaces, you can use str.count(), which takes a substring and returns the number of occurrences it finds.
s = 'Hello World'
s.count(' ')
There are various other methods as well, such as:
s = input()
print(len(s) - len(''.join(s.split())))
s2 = input()
print(len(s2) - len(s2.replace(' ', '')))
However count is easiest to implement and follow.
Note that count returns the total number. If you're after the number of spaces between each pair of words,
then something like this should suffice:
s = input()
spaces = []
counter = 0
for char in s:
    if char== ' ':
        counter += 1
    elif counter != 0:
        spaces.append(counter)
        counter = 0
print(spaces)
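The same per-gap count can be sketched with a regular expression. The function name gaps_between_words is hypothetical, and stripping the string first is an assumption that leading and trailing spaces should not count as gaps.

```python
import re

def gaps_between_words(text):
    """Return the length of each run of spaces between consecutive words."""
    # strip() drops leading/trailing whitespace so only inner gaps are counted.
    return [len(run) for run in re.findall(r" +", text.strip())]

print(gaps_between_words("  hello    world  lol "))  # [4, 2]
print("hello    world".count(" "))                   # 4
```

The last line shows str.count for comparison: it gives the total number of spaces rather than one number per gap.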
import re
line = "Hello    World  LoL"
total = 0
for spl in re.findall(r'\s+', line):
    print len(spl)
    total += len(spl) # 4, 2
print total # 6
>>> 4
>>> 2
>>> 6
For your problem with spaces:
my_string = "hello    world"
spaces = 0
for elem in my_string:
    if elem == " ":
        #space between quotes
        spaces += 1
print(spaces)
You can use the count() method to count the occurrences of a particular character:
string_name.count('character')
To count spaces, you could do:
input_string = input()
splitter = input_string.split()
final = []
for i in range(0, len(splitter)):
    for j in range(0, len(splitter[i])):
        if(j==0):
            final.append(splitter[i][j].upper())
        else:
            final.append(splitter[i][j])
    final.append(' ')
count = input_string.count(' ')
print(''.join(final))
print (count)
Good luck.
I solved that problem a while ago: just pass " " (a single space) to the split function and then print each element followed by a space. That's all.
for i in input().split(" "):
    print(i.capitalize(), end=" ")
The result of the split function with "hello    world    lol" is
>>> "hello    world    lol".split(" ")
['hello', '', '', '', 'world', '', '', '', 'lol']
Then print each element followed by a space.
Forget the spaces; they are not your problem.
You can reduce the string to just the words, without the extra spaces, using split(None), which gives you both a word count and your string, i.e.
>>> a = "  hello   world    lol"
>>> b = a.split(None)
>>> len(b)
3
>>> print(" ".join(b))
hello world lol
Edit: After following your link and reading the actual question (next time, include the relevant details in your question; it makes things easier all round),
your issue still isn't counting the number of spaces before, between, or after the words. The answer that solves the specific task has already been provided, in the form of:
>>> a= " hello world 42 lol"
>>> a.title()
' Hello World 42 Lol'
>>>
See the answer provided by @Steven Summers
Approach
Given a string, the task is to count the number of spaces between words in a string.
Example:
Input: "my name  is geeks for geeks"
Output: Spaces b/w "my" and "name": 1
Spaces b/w "name" and "is": 2
Spaces b/w "is" and "geeks": 1
Spaces b/w "geeks" and "for": 1
Spaces b/w "for" and "geeks": 1
Input: "heyall"
Output: No spaces
Steps to be performed
1. Read the input string from the user and strip it to remove leading and trailing spaces.
2. Split the string on single spaces and initialize an empty list.
3. In a first for loop over the split list, store all the words (the non-empty items).
4. In a second for loop, store the actual index of each word in the split list.
5. In a final loop, print the number of spaces b/w consecutive words.
Below is the implementation of the above approach:
# Function to find spaces b/w each words
def Spaces(Test_string):
    Test_list = []  # Empty list
    # Keep only the non-empty items (the words) in a list
    for i in range(len(Test_string)):
        if Test_string[i] != "":
            Test_list.append(Test_string[i])
    Test_list1=Test_list[:]
    # Record the exact position of each word in Test_string
    for j in range(len(Test_list)):
        position = Test_string.index(Test_list[j])
        Test_list[j] = position
        # Blank out the slot just found so a repeated word matches its next occurrence
        Test_string[position] = None
    # Finally loop for printing the spaces b/w each words.
    for i in range(len(Test_list)):
        if i+1 < len(Test_list):
            print(
                f"Spaces b/w \"{Test_list1[i]}\" and \"{Test_list1[i+1]}\": {Test_list[i+1]-Test_list[i]}")
# Driver function
if __name__ == "__main__":
    Test_string = input("Enter a String: ").strip()  # Taking string as input
    Test_string = Test_string.split(" ")  # Create string into list
    if len(Test_string)==1:
        print("No Spaces")
    else:
        Spaces(Test_string)  # Call function

Function vowel_count that takes a string as input, counts the number of vowel occurrences, and prints them

I need to print the number of occurrences of each vowel in the string. I am able to count and print them on one line, but I am having trouble printing the text 'a,e,i,o and u' alongside the respective counts. I am not allowed to use any built-in function. Can someone please guide me or let me know what I am missing? Below is my code.
vowels = 'aeiou'
def vowel_count(txt):
    for vowel in vowels:
        print (txt.count(vowel),end ='')
    return
It prints the counts, but I am not able to add anything in front of them. Let's say I pass 'le tour de france'; it should print
a,e,i,o and u appear , respectively ,1,3,0,1,1 times.
Please let me know if any thing is unclear, thanks.
Just print your top and tail text before and after the loop:
def vowel_count(txt):
    print('a,e,i,o and u appear , respectively ', end='')
    for vowel in vowels:
        print(',', txt.count(vowel), sep='', end='')
    print(' times')
>>> vowel_count('le tour de france')
a,e,i,o and u appear , respectively ,1,3,0,1,1 times
But isn't print a built in function? I'm not sure how you can complete this task without using any built in functions.
Using map and join, the following can be achieved:
vowels = 'aeiou'
def vowel_count(txt):
    counts = map(txt.count, vowels)
    return ", ".join(vowels) + " appear, respectively, " + ",".join(map(str, counts)) + " times"
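If the "no built-in function" restriction is read as "do not use str.count" (an assumption, since the question does not say exactly which built-ins are off limits), the counting can be sketched with a plain loop. The output wording below is also an assumption modeled on the requested sentence.

```python
VOWELS = "aeiou"

def vowel_count(txt):
    """Count each vowel with a plain loop instead of str.count."""
    counts = {vowel: 0 for vowel in VOWELS}
    for char in txt:
        if char in counts:
            counts[char] += 1
    numbers = ",".join(str(counts[vowel]) for vowel in VOWELS)
    return "a,e,i,o and u appear, respectively, " + numbers + " times."

print(vowel_count("le tour de france"))
# a,e,i,o and u appear, respectively, 1,3,0,1,1 times.
```

This version still relies on print, str, and join, so a stricter reading of the restriction would need clarification from whoever set the task.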

Grab a keyword and the text between keywords in Python

The first thing I'd like to say is that this place has helped me more than I could ever repay. I'd like to say thanks to all who have helped me in the past :).
I am trying to divide up some text from a specific style of message. It is formatted like this:
DATA|1|TEXT1|STUFF: some random text|||||
DATA|2|TEXT1|THINGS: some random text and|||||
DATA|3|TEXT1|some more random text and stuff|||||
DATA|4|TEXT1|JUNK: crazy randomness|||||
DATA|5|TEXT1|CRAP: such random stuff I cant believe how random|||||
I have code shown below that combines the text adding a space between words and adds it to a string named "TEXT" so it looks like this:
STUFF: some random text THINGS: some random text and some more random text and stuff JUNK: crazy randomness CRAP: such random stuff I cant believe how random
I need it formatted like this:
DATA|1|TEXT1|STUFF: |||||
DATA|2|TEXT1|some random text|||||
DATA|3|TEXT1|THINGS: |||||
DATA|4|TEXT1|some random text and|||||
DATA|5|TEXT1|some more random text and stuff|||||
DATA|6|TEXT1|JUNK: |||||
DATA|7|TEXT1|crazy randomness|||||
DATA|8|NEWTEXT|CRAP: |||||
DATA|9|NEWTEXT|such random stuff I cant believe how random|||||
The line numbers are easy; I have that done, as well as the carriage returns. What I need is to grab "CRAP" and change the part that says "TEXT1" to "NEWTEXT".
My code scans the string looking for keywords then adds them to their own line then adds text below them followed by the next keyword on its own line etc. Here is my code I have so far:
#this combines all text to one line and adds to a string
while current_segment.move_next('DATA'):
    TEXT = TEXT + " " + current_segment.field(4).value
KEYWORD_LIST = ['STUFF:', 'THINGS:', 'JUNK:']
KEYWORD_LIST1 = ['CRAP:']
#this splits the words up to search through
TEXT_list = TEXT.split(' ')
#this searches for the first few keywords then stops at the unwanted one
for word in TEXT_list:
    if word in KEYWORD_LIST:
        my_output = my_output + word
    elif word in KEYWORD_LIST1:
        break
    else:
        my_output = my_output + ' ' + word
#this searches for the unwanted keywords leaving the output blank until it reaches the wanted keyword
for word1 in TEXT_list:
    if word1 in KEYWORD_LIST:
        my_output1 = ''
    elif word1 in KEYWORD_LIST1:
        my_output1 = my_output1 + word1 + '\n'
    else:
        my_output1 = my_output1 + ' ' + word1
#my_output is formatted back the way I want, dividing up the text into 65 or less character lines
MAX_LENGTH = 65
my_wrapped_output = wrap(my_output,MAX_LENGTH)
my_wrapped_output1 = wrap(my_output1,MAX_LENGTH)
my_output_list = my_wrapped_output.split('\n')
my_output_list1 = my_wrapped_output1.split('\n')
for phrase in my_output_list:
    if phrase == "":
        SetID +=1
        output = output + "DATA|" + str(SetID) + "|TEXT| |||||"
    else:
        SetID +=1
        output = output + "DATA|" + str(SetID) + "|TEXT|" + phrase + "|||||"
for phrase2 in my_output_list1:
    if phrase2 == "":
        SetID +=1
        output = output + "DATA|" + str(SetID) + "|NEWTEXT| |||||"
    else:
        SetID +=1
        output = output + "DATA|" + str(SetID) + "|NEWTEXT|" + phrase2 + "|||||"
#this populates the fields I need
value = output
Then I format "my_output" and "my_output1", adding the word "NEWTEXT" where it goes. This code runs through each line looking for the keyword, then puts in that keyword and a carriage return. Once it reaches the other list, "KEYWORD_LIST1", it stops and drops the rest of the text, then starts the next loop. My problem is that the above code gives me this:
DATA|1|TEXT1|STUFF: |||||
DATA|2|TEXT1|some random text|||||
DATA|3|TEXT1|THINGS: |||||
DATA|4|TEXT1|some random text and|||||
DATA|5|TEXT1|some more random text and stuff|||||
DATA|6|TEXT1|JUNK: |||||
DATA|7|TEXT1|crazy randomness|||||
DATA|8|NEWTEXT|crazy randomness|||||
DATA|9|NEWTEXT|CRAP: |||||
DATA|10|NEWTEXT|such random stuff I cant believe how random|||||
It is grabbing the text from before "KEYWORD_LIST1" and adding it to the NEWTEXT section. I know there is a way to make groups from the keyword and the text after it, but I am unclear on how to implement it. Any help would be much appreciated.
Thanks.
This is what I had to do to get it to work for me:
KEYWORD_LIST = ['STUFF:', 'THINGS:', 'JUNK:']
KEYWORD_LIST1 = ['CRAP:']
def text_to_message(text):
    result=[]
    for word in text.split():
        if word in KEYWORD_LIST or word in KEYWORD_LIST1:
            if result:
                yield ' '.join(result)
                result=[]
            yield word
        else:
            result.append(word)
    if result:
        yield ' '.join(result)
def format_messages(messages):
    title='TEXT1'
    for message in messages:
        if message in KEYWORD_LIST:
            title='TEXT1'
        elif message in KEYWORD_LIST1:
            title='NEWTEXT'
        my_wrapped_output = wrap(message,MAX_LENGTH)
        my_output_list = my_wrapped_output.split('\n')
        for line in my_output_list:
            if line == '':
                yield title + '|'
            else:
                yield title + '|' + line
for line in format_messages(text_to_message(TEXT)):
    if line == '':
        SetID +=1
        output = "DATA|" + str(SetID) + "|"
    else:
        SetID +=1
        output = "DATA|" + str(SetID) + "|" + line
    #this is needed instead of print(line)
    value = output
General tip: Don't try to build up strings accretively like this:
my_output = my_output + ' ' + word
Instead, make my_output a list, append each word to the list, and then, at the very end, do a single join: my_output = ' '.join(my_output). (See the text_to_message code below for an example.)
Using join is the right way to build strings. Delaying the creation of the string is useful because processing lists of substrings is more pleasant than splitting and unsplitting strings, and having to add spaces and carriage returns here and there.
Study generators. They are easy to understand, and can help you a lot when processing text like this.
import textwrap
KEYWORD_LIST = ['STUFF:', 'THINGS:', 'JUNK:']
KEYWORD_LIST1 = ['CRAP:']
def text_to_message(text):
    result=[]
    for word in text.split():
        if word in KEYWORD_LIST or word in KEYWORD_LIST1:
            if result:
                yield ' '.join(result)
                result=[]
            yield word
        else:
            result.append(word)
    if result:
        yield ' '.join(result)
def format_messages(messages):
    title='TEXT1'
    num=1
    for message in messages:
        if message in KEYWORD_LIST:
            title='TEXT1'
        elif message in KEYWORD_LIST1:
            title='NEWTEXT'
        for line in textwrap.wrap(message,width=65):
            yield 'DATA|{n}|{t}|{l}'.format(n=num,t=title,l=line)
            num+=1
TEXT='''STUFF: some random text THINGS: some random text and some more random text and stuff JUNK: crazy randomness CRAP: such random stuff I cant believe how random'''
for line in format_messages(text_to_message(TEXT)):
    print(line)
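The general tip about joining can be sketched in isolation. Both function names here are hypothetical and the word list is illustrative; the two functions produce the same string, but the second builds it with a single join at the end.

```python
def join_words_slow(words):
    """Accretive concatenation: creates a new string on every step."""
    text = ""
    for word in words:
        text = text + " " + word
    return text.strip()

def join_words(words):
    """Collect pieces in a list and join once at the end."""
    pieces = []
    for word in words:
        pieces.append(word)
    return " ".join(pieces)

words = ["STUFF:", "some", "random", "text"]
print(join_words(words))  # STUFF: some random text
```

Keeping the pieces in a list also means no stray leading space has to be stripped afterwards.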
