Find word on given position in text - python

Is there a more elegant (Pythonic and efficient) way to find the word at a given position?
import re
FIRST_WORD = re.compile(r'^(\w+)', re.UNICODE)
LAST_WORD = re.compile(r'(\w+)$', re.UNICODE)
def _get_word(self, text, position):
    """
    Get word on given position
    """
    assert position >= 0
    assert position < len(text)
    # get second part of word
    # slice string and get first word
    match = FIRST_WORD.search(text[position:])
    assert match is not None
    postfix = match.group(1)
    # get first part of word, can be empty
    # slice text and get last word
    match2 = LAST_WORD.search(text[:position])
    if match2:
        prefix = match2.group(1)
    else:
        prefix = ''
    return prefix + postfix
#                                   | 21.
>>> _get_word("Hello, my name is Earl.", 21)
Earl
>>> _get_word("Hello, my name is Earl.", 20)
Earl
Thanks

Here's how I'd do it:
s = "Hello, my name is Earl."
def get_word(text, position):
    words = text.split()
    characters = -1
    for word in words:
        # +1 accounts for the single space that follows each word
        characters += len(word) + 1
        if characters >= position:
            return word
>>> get_word(s, 21)
Earl.
Stripping off the punctuation can be done with str.strip(), regular expressions, or something hacky like
final = ''
for c in word:
    if c.lower() in 'abcdefghijklmnopqrstuvwxyz':
        final += c

import string
s = "Hello, my name is Earl."
def get_word(text, position):
    _, _, start = text[:position].rpartition(' ')
    word, _, _ = text[position:].partition(' ')
    return start + word
print(get_word(s, 21).strip(string.punctuation))

The following solution collects the alphabetic characters around the given position:
def get_word(text, position):
    if position < 0 or position >= len(text):
        return ''
    str_list = []
    i = position
    # walk left; the bounds check stops i from wrapping around to negative indexes
    while i >= 0 and text[i].isalpha():
        str_list.insert(0, text[i])
        i -= 1
    i = position + 1
    # walk right; the bounds check avoids an IndexError at the end of the text
    while i < len(text) and text[i].isalpha():
        str_list.append(text[i])
        i += 1
    return ''.join(str_list)
The following is a test case:
get_word("Hello, my name is Earl.", 21) # 'Earl'
get_word("Hello, my name is Earl.", 20) # 'Earl'
I don't think splitting the text into words with split is a good idea here, because the position is essential to this problem. If the text contains consecutive spaces, split discards them, and the character offsets can no longer be recovered from the word lengths.
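Another way to keep the position information is to let the regex engine report where each word starts and ends. The sketch below is only an illustration: the name word_at is made up for this example, and it assumes a "word" is a run of \w characters, as in the question.

```python
import re

def word_at(text, position):
    """Return the run of word characters that covers `position`, or '' if there is none."""
    for match in re.finditer(r'\w+', text):
        # match.start() is inclusive, match.end() is exclusive
        if match.start() <= position < match.end():
            return match.group()
    return ''

print(word_at("Hello, my name is Earl.", 21))  # Earl
print(word_at("Hello, my name is Earl.", 20))  # Earl
```

Because the offsets come from the original text, consecutive spaces and punctuation do not shift the result; a position that falls on a space or punctuation mark simply returns an empty string.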

Related

Find words in a string of text (where letters aren't consecutive)

I'd like to write code to find specific instances of words in a long string of text, where the letters making up the word are not adjacent but appear in order.
The string I use will be thousands of characters long, but as a shorter example: I want to find instances of the word "chair" within the following string, where each letter is no more than 10 characters from the previous one.
djecskjwidhl;asdjakimcoperkldrlkadkj
To avoid the problem of finding many instances in a large string, I'd prefer to limit the distance between every two letters to 10. So the word chair in the string abcCabcabcHabcAabdIabcR would count. But the word chair in the string abcCabcabcabcabcabcabcabcabHjdkeAlcndInadhR would not count.
Can I do this with python code? If so I'd appreciate an example that I could work with.
Maybe paste the string of text or use an input file? Have it search for the word or words I want, and then identify if those words are there?
Thanks.
This code below will do what you want:
will_find = "aaaaaaaaaaaaaaaaaaaaaaaabcCabcabcHabcAabdIabcR"
wont_find = "abcCabcabcabcabcabcabcabcabHjdkeAlcndInadhR"
looking_for = "CHAIR"
max_look = 10
def find_word(characters, word):
    i = characters.find(word[0])
    if i == -1:
        print("I couldnt find the first character ...")
        return False
    for symbol in word:
        print(characters[i:i + max_look+1])
        if symbol in characters[i:i + max_look+1]:
            i += characters[i: i + max_look+1].find(symbol)
            print("{} is in the range of {} [{}]".format(symbol, characters[i:i+ max_look], i))
            continue
        else:
            print("Couldnt find {} in {}".format(symbol, characters[i: i + max_look]))
            return False
    return True
find_word(will_find, looking_for)
print("--------")
find_word(wont_find, looking_for)
An alternative, this may also work for you.
long_string = 'djecskjwidhl;asdjakimcoperkldrlkadkj'
check_word = 'chair'
def substringChecker(longString, substring):
    starting_index = []
    n, derived_word = 0, substring[0]
    for i, char in enumerate(longString[:-11]):
        if char == substring[n] and substring[n + 1] in longString[i : i + 11]:
            n += 1
            derived_word += substring[n]
            starting_index.append(i)
            if len(derived_word) == len(substring):
                return derived_word == substring, starting_index[0]
    return False
print(substringChecker(long_string, check_word))
(True, 3)
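To check if the word is there:
string = "abccabcabchabcaabdiabcr"
word = "chair"
# consume one letter of word per iteration, looking only at the next 10 characters
# (note that the first letter must also fall within the first 10 characters)
while word:
    index = string[:10].find(word[0])
    if index == -1:
        break
    string = string[index+1:]
    word = word[1:]
if not word:
    print("found")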
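The answers above pick the first matching letter inside each window, which can miss a match when a later occurrence of the same letter would have worked. A minimal sketch that tracks every reachable position instead is shown here; find_spaced_word and max_gap are names invented for this illustration, and the comparison is case-sensitive.

```python
def find_spaced_word(text, word, max_gap=10):
    """Return True if the letters of `word` appear in order in `text`,
    each at most `max_gap` characters after the previous one."""
    if not word:
        return True
    # every index where the first letter occurs is a possible start
    positions = [i for i, ch in enumerate(text) if ch == word[0]]
    for letter in word[1:]:
        # keep the indexes of `letter` reachable from any previous position
        positions = [
            j for j, ch in enumerate(text)
            if ch == letter and any(0 < j - i <= max_gap for i in positions)
        ]
        if not positions:
            return False
    return bool(positions)

print(find_spaced_word("abcCabcabcHabcAabdIabcR", "CHAIR"))                      # True
print(find_spaced_word("abcCabcabcabcabcabcabcabcabHjdkeAlcndInadhR", "CHAIR"))  # False
```

This is quadratic in the worst case, which should still be acceptable for strings of a few thousand characters; lower-case both arguments first if the search should ignore case.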

How many times a different word appear in a string - python

For example, I have GolDeNSanDyWateRyBeaChSand and I need to find how many times the word sand appears.
text = input()
text = text.lower()
count = 0
if "sand" in text:
    count += 1
print(count)
But the problem is that there are two occurrences of sand in this string, and the check stops after it finds the first one. I'm a beginner in programming.
You can simply use the str.count() method to count how many times a string appears in another string.
text = input()
text = text.lower()
count = text.count("sand")
To find every occurrence of a string pattern inside another string s, even overlapping occurrences, do the following:
s = "sandsandssands" # your string here
pattern = "sands" # your pattern here
pos = -1
end_of_string = False
while not end_of_string:
    pos = s.find(pattern, pos+1)
    print(pos)
    end_of_string = (pos == -1)
Output
0
4
9
-1
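The difference between str.count() and the find loop above is that count() skips past each match, so overlapping occurrences are not counted. As a rough sketch, a zero-width lookahead gives the overlapping count in one call; count_overlapping is a name chosen for this example only.

```python
import re

def count_overlapping(text, pattern):
    """Count occurrences of `pattern` in `text`, including overlapping ones."""
    # (?=...) matches at every index where pattern starts without consuming it
    return len(re.findall('(?=' + re.escape(pattern) + ')', text))

print("sandsandssands".count("sands"))               # 2 (non-overlapping)
print(count_overlapping("sandsandssands", "sands"))  # 3 (positions 0, 4 and 9)
```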
Extending the solution offered by the OP.
The idea is to use find and move towards the end of the string.
Clearly count can be used here; the solution below is for educational purposes.
text = 'GolDeNSanDyWateRyBeaChSand'
word = 'sand'
ltext = text.lower()
offset = 0
counter = 0
while True:
    idx = ltext.find(word, offset)
    if idx == -1:
        break
    else:
        counter += 1
        offset = idx + len(word)
print(f'The word {word} was found {counter} times')
output
The word sand was found 2 times

How to make a function to find duplicacy in a character string?

Here is the code I have written to find duplicate characters in a string, replacing them with ')' and the original (non-duplicated) characters with '('. It should ignore capitalization.
def duplicate_finder(word):
    word1 = word.lower()
    w = list(word1)
    w1 = ''
    for i in range(0, len(word1)):
        if ([v in word1.replace(w[i], '') for v in w[i]]==[True]):
            w1 += ')'
        else:
            w1 += '('
    return (w1)
But this function always returns '((((((...((' (one '(' per character of the input string). Can someone please point out the fault in my code?
Thanks in advance.
The condition in your loop is always False because word1.replace(w[i], '') removes all instances of w[i] from word1. So when you look for v in word1.replace(w[i], ''), it doesn't find any, as you replaced all of them. This calls w1 += '(' every time!
You can do:
>>> def duplicate_finder(word):
...     word1 = word.lower()
...     w = list(word1)
...     w1 = ''
...     for i in range(0, len(word1)):
...         if ([v in word1[:i]+word1[i+1:] for v in w[i]]==[True]):
...             w1 += ')'
...         else:
...             w1 += '('
...     return (w1)
...
>>> duplicate_finder('hello')
'(())('
I would do it another way, using a dictionary that keeps counts, to get a true O(n) algorithm.
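A minimal sketch of that counting idea, assuming the goal is ')' for every character that occurs more than once and '(' otherwise. The name mark_duplicates is made up here so it does not clash with the versions in the other answers.

```python
from collections import Counter

def mark_duplicates(word):
    """Map each character to ')' if it occurs more than once (ignoring case), else '('."""
    lowered = word.lower()
    counts = Counter(lowered)  # one pass to count every character
    # second pass: look up each character's count in O(1)
    return ''.join(')' if counts[c] > 1 else '(' for c in lowered)

print(mark_duplicates('hello'))  # (())(
```

Since the output is built from the counts rather than by replacing characters in place, input that already contains '(' or ')' is handled like any other character.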
Here's one way (assuming I understand the question):
def duplicate_finder(word):
    word1 = word.lower()
    # the loop iterates over the original lowered string, not the modified one
    for c in word1:
        # If more than one occurrence of c
        if word1.count(c) > 1:
            # Replace all c with )
            word1 = word1.replace(c, ')')
        # Only one occurrence
        elif word1.count(c) == 1:
            word1 = word1.replace(c, '(')
    return word1
def duplicate_finder(word):
    word = word.lower()
    l = len(word)
    for i in range(l):
        index = word.find(word[i], i+1)
        if index != -1 and word[i] != ')':
            word = word.replace(word[i], '(', 1)
            word = word.replace(word[index], ')', 1)
    return (word)
Test:
I gave input as "Sanjana"
It resulted in s((j))a
Note:
The word[i] != ')' check is necessary because a ) that was already placed in the not-yet-visited part of the string could otherwise be replaced again, which produces weird output.
Edit
def duplicate_finder(word):
    word = word.lower()
    l = len(word)
    for i in range(l):
        index = word.find(word[i], i+1)
        if word[i] not in [')', '(']:
            if index != -1:
                word = word.replace(word[i], ')')
            else:
                word = word.replace(word[i], '(')
    return (word)
def duplicate_finder(word):
    word1 = word.lower()
    w1 = ''
    length = len(word1)
    for i in range(0, length):
        w2 = word1[i]
        if (word1[i] != ")"):
            word1 = word1.replace(word1[i], "(", 1)
            for v in range(i+1, length):
                if (word1[v] != ")" and word1[v] != "("):
                    if (word1[v] == w2):
                        word1 = word1.replace(w2, ")")
    return (word1)
Here is a possible solution:
def duplicate_finder(word):
    word1 = word.lower()
    w1 = ''
    found_chars = set()
    for c in word1:
        if c in found_chars:
            w1 += ')'
        else:
            found_chars.add(c)
            w1 += '('
    print(w1)
@Satya, I have used the Counter container from Python's collections module to solve your problem.
A Counter is a subclass of dict that stores elements as keys and their counts as values. This is equivalent to a bag or multiset in other languages.
Note: Do not forget to check the references for Counter given at the very bottom of this answer, and comment if you run into any difficulty.
Have a look at the code below.
"""
StkOvrFlw link: https://stackoverflow.com/questions/50485559/how-to-make-a-function-to-find-duplicacy-in-a-character-string
Aim: [
    '1. Here, original character means the character which '
    'is first time appearing in the string'
    '2. Replacing original character with => ('
    '3. If there are more occurences of original character '
    'then replace them with => )'
]
References: http://www.pythonforbeginners.com/collection/python-collections-counter
"""
from collections import Counter
# Code
def duplicate_finder(word):
    word = word.lower()
    i = 1
    for ch, count in Counter(word).items():
        # print('(', i, ') Original character: \'', ch, '\'with', count - 1, 'more occurence(s)')
        if count == 1:
            word = word.replace(ch, '(') # Only 1 occurence of original character
        else:
            l = list(word)
            l[word.find(ch)] = '(' # Replace original character with (
            word = ''.join(l)
            word = word.replace(ch, ')') # Replace other occurences of original character with )
        # print(1, 'occurence of \'', ch, '\' replaced with \'(\' and remaining ', count - 1, ' occurence(s) with \')\'')
        # print('Iteration ', i, ' gives: ', word, '\n')
        i += 1
    return word
# Test case 1
print("I/P: abaccccsgfsyetgdggdh")
print("O/P: ", duplicate_finder('abaccccsgfsyetgdggdh'))
"""
I/P: abaccccsgfsyetgdggdh
O/P: (()()))((()((()()))(
"""
# Test case 2
print("\nI/P: AAABBBCCC34519543absd67das1729")
print("O/P: ", duplicate_finder('AAABBBCCC34519543absd67das1729'))
"""
I/P: AAABBBCCC34519543absd67das1729
O/p: ())())())((((()))))(((()))))()
"""
References: You can find nice articles on Counter container of Python at:
http://www.pythonforbeginners.com/collection/python-collections-counter and
https://www.geeksforgeeks.org/counters-in-python-set-1/
# Find duplicate characters in a string under the following conditions:
# - the first (original) character will be replaced by '('
# - all other matches will be replaced by ')'
# - capitalization is ignored
def duplicate_finder(word):
    s = word.lower()
    for ch in s:
        if s.count(ch) > 1: # are there more copies of this character?
            # the first match is replaced by '(', then all other matches by ')'
            s = s.replace(ch, '(', 1).replace(ch, ')')
    return s
print('result: ' + duplicate_finder('hello world')) # result: he()( w)r)d
New code (by comment from Satya - at 23:26):
# Find duplicate characters in the string under the following conditions:
# - all single characters will be replaced by '('
# - all multiple characters (duplicates) will be replaced by ')'
# - capitalization of the input string is ignored
def duplicate_finder(word):
    s = word.lower()
    for ch in s:
        if s.count(ch) > 1: # are there more copies of this character?
            s = s.replace(ch, ')') # replace all matched ch by ')'
        else:
            s = s.replace(ch, '(', 1) # replace this one matched ch by '(' - there's only one character
    return s
print('result: ' + duplicate_finder('hello world')) # result: (()))(()()(

Python: Delete letters in word which occur more than N times consecutively

Let's say that I have a sentence:
sentence = "Eveeery mondayyy I waaake upp"
I would like to create a function which deletes all letters which occur more than N times consecutively in a word.
So, if I say: N = 2
the result should be:
result = Eveery mondayy I waake upp
How can I do this in an efficient way ?
re.sub() solution:
import re
def remove_continued_char(s, n):
    pat = re.compile(r'([a-z])(\1{' + str(n) + '})')
    return pat.sub('\\2', s)
sentence = 'Eveeery mondayyy I waaake upp'
print(remove_continued_char(sentence, 2))
The output:
Eveery mondayy I waake upp
[a-z] - match only alphabetic characters(letters)
\1 - backreference to the 1st captured group i.e. ([a-z])
\\2 - points to the 2nd captured(parenthesized) group value
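One caveat with that pattern: it matches exactly n + 1 repeated letters at a time, so a longer run such as "eeee" with n = 2 is only shortened by one character. A hedged variant that trims runs of any length is sketched below; squeeze_runs is a hypothetical name, and matching upper-case letters as well is an assumption beyond the original answer.

```python
import re

def squeeze_runs(s, n):
    """Shorten every run of more than n identical letters to exactly n letters."""
    # a letter followed by at least n more copies of itself, i.e. a run longer than n
    pattern = re.compile(r'([A-Za-z])\1{%d,}' % n)
    return pattern.sub(lambda m: m.group(1) * n, s)

print(squeeze_runs("Eveeery mondayyy I waaake upp", 2))  # Eveery mondayy I waake upp
print(squeeze_runs("yeeeeees", 2))                       # yees
```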
Here is a sample that might give you a good start:
import re
regex = r"(.)\1+"
test_str = "Eveeery mondayyy I waaake upp"
# use \\1\\1 if you need to replace with two characters and so on
subst = "\\1"
# You can limit the number of replacements by changing the count argument
result = re.sub(regex, subst, test_str, count=0)
if result:
    print(result)
Output:
Every monday I wake up
Hope this helps
You must iterate over the letters of the sentence while keeping track of the previous letter and how many times it was seen.
def del_n(n, s):
    so_far = 1
    previous = s[0]
    res = [s[0]]
    for idx, c in enumerate(s[1:]):
        if c == previous:
            so_far += 1
            if so_far >= n+1:
                continue
        else:
            previous = c
            so_far = 1
        res.append(c)
    return ''.join(res)
sentence = "Eveeery mondayyy I waaake upp"
del_n(2, sentence)
output:
'Eveery mondayy I waake upp'
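The same run tracking can also be expressed with itertools.groupby, which groups consecutive equal characters. This is only a sketch of the idea; limit_runs is a name invented for the example.

```python
from itertools import groupby

def limit_runs(s, n):
    """Keep at most n characters from every run of identical consecutive characters."""
    # groupby yields one group per run of equal adjacent characters
    return ''.join(''.join(list(group)[:n]) for _, group in groupby(s))

print(limit_runs("Eveeery mondayyy I waaake upp", 2))  # Eveery mondayy I waake upp
```

Unlike del_n, this version also handles an empty string, because it never indexes s[0].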
You can try this with a function that does not import any external module:
sentence = "Eveeery mondayyy I waaake upp"
def no_dublicate(senten, N):
    final = []
    for word in senten.split():
        track = []
        for chara in word:
            track.append(chara)
            # note: count() looks at the whole word so far, not only consecutive
            # repeats, so a word like "banana" would also lose its third "a"
            if track.count(chara) > N:
                track.remove(chara)
        final.append(track)
    return ' '.join(["".join(item) for item in final])
print(no_dublicate(sentence, 2))
output:
Eveery mondayy I waake upp

Function that give the number of words

I'm writing a function that returns the number of words in a sentence.
Example: in " Hello L World " there are 3 "words" (a single letter counts as a word).
Here is my code:
def number_of_word(s):
    """
    str -> int
    """
    # i : int
    i = 0
    # nb_word : int
    nb_word = 0
    if s == "":
        return 0
    else:
        while i < len(s)-1:
            if ((s[i] != " ") and (s[i+1] == " ")):
                nb_word = nb_word + 1
                i = i + 1
            else:
                i = i + 1
        if s[len(s)-1] != " ":
            nb_word = nb_word + 1
            return nb_word
        else:
            return nb_word
I tried my function and I think it works. But I also think there is a simpler way to write a function that does the same thing.
Can you tell me if you know a better function? Or do you have any comments on mine?
I had to use:
if s == "":
    return 0
else:
    ...........
because if I didn't, my function didn't work for number_of_word("")
If you define words as character sequences separated by one or more whitespaces, then you can simply use the split method of strings to split to words,
and then len to get their count:
def number_of_word(s):
    return len(s.split())
From the documentation (emphasis mine):
split(...) method of builtins.str instance
S.split(sep=None, maxsplit=-1) -> list of strings
Return a list of the words in S, using sep as the delimiter string.
If maxsplit is given, at most maxsplit splits are done. If sep is not
specified or is None, any whitespace string is a separator and empty
strings are removed from the result.
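To illustrate the part of the documentation about empty strings, here is a small comparison between calling split with no separator and with an explicit single space. The sample string and variable names are made up for the example.

```python
s = "  Hello  L World "

# no separator: runs of whitespace are collapsed and empty strings are dropped
words_default = s.split()
# explicit ' ' separator: every single space splits, leaving empty strings
words_explicit = s.split(' ')

print(words_default)   # ['Hello', 'L', 'World']
print(words_explicit)  # ['', '', 'Hello', '', 'L', 'World', '']
print(len(words_default))  # 3
```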
If you want, you can use a regular expression:
import re
def number_of_word(s):
    pattern = r'\b\w+\b'
    return len(re.findall(pattern, s))
If you can't use split or regex, I think this is the right solution:
def count(sentence):
    wlist = []
    word = ""
    for c in sentence:
        if c == " ":
            # skip the empty "words" produced by leading or repeated spaces
            if word:
                wlist.append(word)
            word = ""
        else:
            word += c
    if word:
        wlist.append(word)
    return len(wlist)
You can use the split() method:
def count(string1):
    string1 = string1.split()
    return len(string1)
print(count(" Hello L World "))
output:
3
I can't use any Python functions other than "len".
So I can't use split or RegExp.
I want to write this function with just basic code and len.
Well, since the requirements were published, here's a way to do this without calling any library functions:
def count_words(sentence):
    count, white = 0, True
    for character in sentence:
        if character not in " \t\n\r":
            if white:
                count += 1
                white = False
        else:
            white = True
    return count
