If the verb ends in e, drop the e and add -ing.
I'm inputting a string (an English verb). My goal is to delete the last character of the word if it's "e", and then add three more characters: "i", "n" and "g".
I'd like to know how to delete the list object or if possible a string character. And how to switch a list into a string.
Currently I have:
if verb_list[-1] == "e":  # verb_list is the input string converted to a list
    verb_list[-1] = "i"
    verb_list.append("n")
    verb_list.append("g")
This isn't a proper solution for me. I'd like to know how to delete, for example, the [-1] element from a list or from a string. Also, here I'm left with a list, and I want my output to be a string.
Thanks for any help!
You can use re.sub:
re.sub('e$', 'ing', s)
The $ in the regex matches the pattern only if it's at the end of a string.
Example usage:
import re
data = ['date', 'today', 'done', 'cereal']
print([re.sub('e$', 'ing', s) for s in data])
#['dating', 'today', 'doning', 'cereal']
I know the words in data aren't verbs but those were words off the top of my head.
This should suffice
if verb[-1] == 'e':
    verb = verb[:-1] + "ing"
For more about slicing in Python, see "Understanding slice notation".
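As a minimal sketch, the slicing approach can be wrapped in a small function that returns a string. The name add_ing is hypothetical, and using str.endswith instead of verb[-1] is an assumption made here so that an empty string does not raise an IndexError:

```python
def add_ing(verb):
    """Replace a trailing 'e' with 'ing'; any other input is returned unchanged."""
    # endswith is safe on an empty string, unlike verb[-1]
    if verb.endswith("e"):
        return verb[:-1] + "ing"
    return verb

print(add_ing("date"))
print(add_ing("today"))
```

Like the answers here, this only changes verbs that end in "e"; verbs without a final "e" are left as they are.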
Try this:
li = list(verb)
if li[-1] == 'e':
    li[-1] = 'ing'
verb = ''.join(li)
I have tried things like this, but there is no change between the input and output:
def remove_al(text):
    if text.startswith('ال'):
        text.replace('ال','')
    return text
text.replace returns the updated string but doesn't change text itself; you should change the code to
text = text.replace(...)
Note that in Python strings are "immutable"; there's no way to change even a single character of a string; you can only create a new string with the value you want.
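A short sketch of what immutability means in practice, using the same word as the example further down; the variable names are only for illustration:

```python
text = 'الاستقلال'

# replace() builds and returns a new string; text itself is not modified
result = text.replace('ال', '')
print(text)
print(result)

# Assigning to a single character is not allowed on a str
try:
    text[0] = 'x'
    assignment_failed = False
except TypeError:
    assignment_failed = True
print(assignment_failed)
```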
If you only want to remove the prefix ال and not every occurrence of ال in the string, I'd suggest using:
def remove_prefix_al(text):
    if text.startswith('ال'):
        return text[2:]
    return text
If you simply use text.replace('ال',''), this will replace all ال combinations:
Example
text = 'الاستقلال'
text.replace('ال','')
Output:
'استقل'
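You could use the method str.lstrip instead of rolling your own in this case, but be careful: its argument is treated as a set of characters to strip, not as a prefix, so any further ا or ل that immediately follows the prefix is removed as well.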
example text (alrashid) in Arabic: 'الرَشِيد'
text = 'الرَشِيد'
clean_text = text.lstrip('ال')
print(clean_text)
Note that even though Arabic reads from right to left, lstrip strips the start of the string (which is visually on the right).
Also, as user 6502 noted, the issue in your code is that Python strings are immutable: text.replace returned a new string that was discarded, so the function returned its input unchanged.
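One caveat worth illustrating: str.lstrip treats its argument as a set of characters rather than as a prefix. The sketch below compares it with plain prefix slicing; the helper name remove_prefix and the sample word 'اللغة' are assumptions chosen here for illustration:

```python
def remove_prefix(text, prefix):
    """Remove prefix once from the start of text, if it is present."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text

word = 'اللغة'

# lstrip removes every leading ا or ل, so the stem's own ل is lost too
stripped = word.lstrip('ال')

# slicing removes exactly the two-character prefix
sliced = remove_prefix(word, 'ال')

print(stripped)
print(sliced)
```

On Python 3.9 and later, the built-in str.removeprefix does the same job as the helper.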
"ال" as prefix is quite complex in Arabic that you will need Regex to accurately separate it from its stem and other prefixes. The following code will help you isolate "ال" from most words:
import re
text = 'والشعر كالليل أسود'
words = text.split()
for word in words:
    alx = re.search(r'''^
        ([وف])?
        ([بك])?
        (لل)?
        (ال)?
        (.*)$''', word, re.X)
    groups = [alx.group(1), alx.group(2), alx.group(3), alx.group(4), alx.group(5)]
    groups = [x for x in groups if x]
    print(word, groups)
Running that (in Jupyter) you will get:
I want to split a big string on a word that is repeated throughout that string.
Example of what I expect:
We have tried the split below:
string.split("RFF+AAJ:")
So we need the set of lists that I described in the screenshot above.
You can get your result with the help of a regex:
import re
string = 'helloisworldisbyeishi'
re.split('(is)', string) # Splitting from 'is'
Output
['hello', 'is', 'world', 'is', 'bye', 'is', 'hi']
I hope it may help you.
split returns a single list containing the parts of the string. So the list here contains the part before the first "RFF+AAJ:", then the part between the two "RFF+AAJ:"s, and finally the part after the second "RFF+AAJ:". If you want to have three different lists, use:
parts = string.split("RFF+AAJ:")  # not named `all`, which would shadow the built-in
first = parts[0]
second = parts[1]
third = parts[2]
And the elements will be stored in first, second and third.
If you want to create lists, use first = list(first) # and so on.
Hope that helped.
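For comparison, here is a rough sketch of both approaches applied to the "RFF+AAJ:" delimiter. The sample string is hypothetical; only the delimiter comes from the question. Because "+" is a regex metacharacter, the delimiter is passed through re.escape before being used with re.split:

```python
import re

# Hypothetical sample data; only the delimiter is taken from the question
delimiter = "RFF+AAJ:"
string = "HEADERRFF+AAJ:firstRFF+AAJ:second"

# str.split drops the delimiter
parts = string.split(delimiter)

# re.split with a capturing group keeps the delimiter in the result
with_delims = re.split("(" + re.escape(delimiter) + ")", string)

print(parts)
print(with_delims)
```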
I am making a dictionary application using argparse in Python 3. I'm using difflib to find the closest matches to a given word. The result is a list, though, and each entry has a newline character at the end, like:
['hello\n', 'hallo\n', 'hell\n']
And when I put a word in, it gives output like this:
hellllok could be spelled as hello
hellos
hillock
Question:
I'm wondering if there is a reverse or inverse \n so I can counteract these \n's.
Any help is appreciated.
There's no "reverse newline" in the standard character set but, even if there was, you would have to apply it to each string in turn.
And, if you can do that, you can equally modify the strings to remove the newline. In other words, create a new list using the current one, with newlines removed. That would be something like:
>>> oldlist = ['hello\n', 'hallo\n', 'hell\n']
>>> oldlist
['hello\n', 'hallo\n', 'hell\n']
>>> newlist = [s.replace('\n','') for s in oldlist]
>>> newlist
['hello', 'hallo', 'hell']
That will remove all newlines from each of the strings. If you want to ensure you only replace a single newline at the end of the strings, you can instead use:
import re
newlist = [re.sub('\n$','',s) for s in oldlist]
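Another option, sketched here as an alternative to the two approaches above, is str.rstrip. Note that rstrip('\n') removes every trailing newline, not just one, which is usually what is wanted for lines read from a word list:

```python
oldlist = ['hello\n', 'hallo\n', 'hell\n']

# rstrip('\n') only touches the end of each string
newlist = [s.rstrip('\n') for s in oldlist]
print(newlist)
```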
In my homework there is a question about writing a function words_of_length(N, s) that picks the unique words of a certain length from a string, ignoring punctuation.
What I am trying to do is:
def words_of_length(N, s):  # N as integer, s as string
    # this line should remove punctuation inside the string, but I don't know how to do it
    return [x for x in s if len(x) == N]  # this line should return a list of unique words with a certain length
So my problem is that I don't know how to remove punctuation. I did look at "best way to remove punctuation from string" and related questions, but those look too difficult for my level, and my teacher requires that the function contain no more than 2 lines of code.
Sorry that I can't format the code in my question properly; it's the first time I've asked a question here and there's much I need to learn, but please help me with this one. Thanks.
Use str.strip([chars]) (string.strip(s[, chars]) in the Python 2 string module):
https://docs.python.org/2/library/string.html
In your function, replace x with x.strip('.,:;!?'); the characters to strip are passed as a single string.
Add more punctuation if needed.
First of all, you need to create a new string without the characters you want to ignore (take a look at the string library, particularly string.punctuation), and then split() the resulting string (sentence) into substrings (words). Besides that, I suggest using type annotations instead of comments like those.
def words_of_length(n: int, s: str) -> list:
    return [x for x in ''.join(char for char in s if char not in __import__('string').punctuation).split() if len(x) == n]
>>> words_of_length(3, 'Guido? van, rossum. is the best!')
['van', 'the']
Alternatively, instead of string.punctuation you can define a variable with the characters you want to ignore yourself.
You can remove punctuation by using string.punctuation.
>>> from string import punctuation
>>> text = "text,. has ;:some punctuation."
>>> text = ''.join(ch for ch in text if ch not in punctuation)
>>> text # with no punctuation
'text has some punctuation'
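Since the question asks for unique words, here is one possible sketch that also removes duplicates while keeping a two-line function body. It swaps in str.translate for the join-based filtering and uses dict.fromkeys to drop repeats while preserving first-seen order (dictionaries keep insertion order in Python 3.7+). The sample sentence is made up for illustration:

```python
from string import punctuation

def words_of_length(n, s):
    # Line 1: delete all punctuation characters from the string
    cleaned = s.translate(str.maketrans('', '', punctuation))
    # Line 2: keep words of length n, dropping duplicates in first-seen order
    return list(dict.fromkeys(w for w in cleaned.split() if len(w) == n))

print(words_of_length(3, 'The cat, the dog; the cat!'))
```

The comparison is case-sensitive, so 'The' and 'the' count as different words; lowercasing s first would merge them.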
I have code that extracts bigrams from a large corpus and concatenates them to get unigrams: 'may', 'be' --> maybe. The corpus contains, of course, a lot of punctuation, but I also discovered that it contains other characters such as emojis... My plan was to put the punctuation characters in a list and print a line only if none of them appear in it. Maybe I should change my approach and print only the lines that contain letters and nothing else, since I don't know what kinds of characters are in the corpus. How can this be done? I do need to keep these other characters for the first part of the code, so that bigrams that don't actually exist are printed. The last lines of my code are at the moment:
counted = collections.Counter(grams)
for gram, count in sorted(counted.items()):
    s = ''
    print(s.join(gram))
And the output I get is:
!aku
!bet
!brå
!båda
These lines won't be of any use for me... Would really appreciate some help! :)
If you want to check that each string contains only letters you can probably use the isalpha() method.
>>> '!båda'.isalpha()
False
>>> 'båda'.isalpha()
True
As you can see from the example, this method should recognize any Unicode letter, not just ASCII.
To filter out strings that contain a non-letter character, the code can check for the existence of a non-letter character in each string:
# coding=utf-8
import string
import unicodedata
source_strings = [u'aku', u'bet', u'brå', u'båda', u'!båda']
valid_chars = set(string.ascii_letters)
# Decompose accented letters, drop the non-ASCII parts, and decode back to text
# so that the characters can be compared with valid_chars on Python 3 as well.
valid_strings = [s for s in source_strings if
                 set(unicodedata.normalize('NFKD', s).encode('ascii', 'ignore').decode('ascii')) <= valid_chars]
# valid_strings == [u'aku', u'bet', u'brå', u'båda']
# "!båda" was not included.
You can use the unicodedata module to classify the characters:
import unicodedata
unigram= ''.join(gram)
if all(unicodedata.category(char) == 'Ll' for char in unigram):
    print(unigram)
If you only want to remove certain characters from your lines, you can filter them out with a simple replace before processing each line:
sourceList = ['!aku', '!bet', '!brå', '!båda']
newList = []
for word in sourceList:
    for special in ['!','&','å']:
        word = word.replace(special,'')
    newList.append(word)
Then you can do what is needed for your bigram exercise. Hope this helps.
Second query: if there are many possible characters, you can always use isalpha() on your strings:
sourceList = ['!aku', '!bet', 'nor mal alpha', '!brå', '!båda']
newList = [word for word in sourceList if word.isalpha()]
In this case you keep only the strings made up entirely of letters. Hope this clarifies the second query.
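Tying this back to the loop in the question, a minimal sketch might look like the following. The sample bigrams are hypothetical stand-ins for the real corpus data, and the names unigrams and letters_only are introduced here only for illustration:

```python
import collections

# Hypothetical sample bigrams; the real ones come from the corpus
grams = [('!', 'aku'), ('may', 'be'), ('brå', 'k'), ('may', 'be'), ('hi', '😀')]

counted = collections.Counter(grams)
unigrams = [''.join(gram) for gram, count in sorted(counted.items())]

# Keep only the merged words that consist entirely of letters
letters_only = [u for u in unigrams if u.isalpha()]
print(letters_only)
```

The counter still holds every bigram, so the punctuation and emoji entries stay available for the earlier part of the code; only the final printing step is filtered.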