Adding 2 dictionary entries from one string - python

Suppose I have the following code:
new_dict = {}
text = "Yes: No Maybe: So"
I want to split the string up into 2 dictionary elements like so:
new_dict = {'Yes':'No', 'Maybe':'So'}
I tried to split the string up into a list in the same fashion to get a brief idea on how to do it, but I haven't had much success.

text = "Yes: No Maybe: So"
words = [w.rstrip(':') for w in text.split()]
new_dict = dict(zip(words[::2], words[1::2]))

If each colon is followed by a space, str.split() will work fine for you:
tokens = (s.rstrip(":") for s in text.split())
new_dict = dict(zip(tokens, tokens))
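The zip(tokens, tokens) trick works because both arguments are the same generator, so zip draws a key and then a value from it on each step. A minimal sketch wrapping that in a function (pairs_to_dict is a made-up name, and it assumes every key and every value is a single token with no spaces inside):

```python
def pairs_to_dict(text):
    """Hypothetical helper: turn 'key: value key: value' text into a dict."""
    # One shared generator; the trailing colon is removed from each key.
    tokens = (s.rstrip(":") for s in text.split())
    # zip reads alternately from the same generator: key, value, key, value...
    return dict(zip(tokens, tokens))


new_dict = pairs_to_dict("Yes: No Maybe: So")
print(new_dict)  # {'Yes': 'No', 'Maybe': 'So'}
```

If a value can contain spaces, this pairing breaks and a regex on the colon is probably the safer route.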

>>> import re
>>> text = "Yes: No Maybe: So"
>>> dict(re.findall(r'(\w+): (\w+)', text))
{'Maybe': 'So', 'Yes': 'No'}
or, avoiding the intermediate list that findall builds:
>>> dict(m.groups() for m in re.finditer(r'(\w+): (\w+)', text))
{'Maybe': 'So', 'Yes': 'No'}

Related

split string into sentences every time there is punctuation, with punctuation?

I would like to split a string into separate sentences in a list.
example:
string = "Hey! How are you today? I am fine."
output should be:
["Hey!", "How are you today?", "I am fine."]
You can use a built-in regular expression library.
import re
string = "Hey! How are you today? I am fine."
output = re.findall(".*?[.!\?]", string)
output>> ['Hey!', ' How are you today?', ' I am fine.']
Update:
You may use split() method but it'll not return the character used for splitting.
import re
string = "Hey! How are you today? I am fine."
output = re.split(r"[!?]", string)
output>> ['Hey', ' How are you today', ' I am fine.']
If this works for you, you can use replace() and split().
string = "Hey! How are you today? I am fine."
output = string.replace("!", "?").split("?")
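If you do want the punctuation back from split(), one option is to wrap the pattern in a capturing group, since re.split then returns the delimiters as separate items. A rough sketch, assuming sentences end in ".", "!" or "?" (the variable names are only for illustration):

```python
import re

text = "Hey! How are you today? I am fine."

# A capturing group makes re.split keep the delimiters in the result.
parts = re.split(r"([.!?])", text)
# parts == ['Hey', '!', ' How are you today', '?', ' I am fine', '.', '']

# Glue each sentence body back onto the punctuation mark that followed it.
sentences = [(body + mark).strip() for body, mark in zip(parts[::2], parts[1::2])]
print(sentences)  # ['Hey!', 'How are you today?', 'I am fine.']
```

Any text after the last punctuation mark is dropped by the zip, so handle that separately if it matters.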
you can try
>>> a='Beautiful, is; better*than\nugly'
>>> import re
>>> re.split(r'; |, |\*|\n', a)
['Beautiful', 'is', 'better', 'than', 'ugly']
I found it here.
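You can use the re.split() method. Note that "?" is a regex metacharacter, so it has to go inside a character class (or be escaped):
import re
string = "Hey! How are you today? I am fine."
yourlist = re.split(r"[!?]", string)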
You don't need regex for this. Just create your own generator:
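def split_punc(text):
    punctuation = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
    # Alternatively, can use:
    # from string import punctuation
    j = 0
    for i, x in enumerate(text):
        if x in punctuation:
            # Emit everything since the last cut, punctuation included.
            yield text[j:i+1]
            j = i + 1
    # Emit any trailing text that has no closing punctuation.
    if j < len(text):
        yield text[j:]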
Usage:
list(split_punc(string))
# ['Hey!', ' How are you today?', ' I am fine.']

Removing punctuations and spaces in a string without using regex

I used import string and string.punctuation but I realized I still have '…' after conducting string.split(). I also get '', which I don't know why I would get it after doing strip(). As far as I understand, strip() removes the peripheral spaces, so if I have spaces between a string it would not matter:
>>> s = 'a dog barks meow! # … '
>>> s.strip()
'a dog barks meow! # …'
>>> import string
>>> k = []
>>> for item in s.split():
...     k.append(item.strip(string.punctuation))
...
>>> k
['a', 'dog', 'barks', 'meow', '', '…']
I would like to get rid of '', '…', the final output I'd like is ['a', 'dog', 'barks', 'meow'].
I would like to refrain from using regex, but if that's the only solution I will consider it .. for now I'm more interested in solving this without resorting to regex.
You can remove punctuation by retaining only alphanumeric characters and spaces:
s = 'a dog barks meow! # …'
print(''.join(c for c in s if c.isalnum() or c.isspace()).split())
This outputs:
['a', 'dog', 'barks', 'meow']
I used the following:
s = 'a dog barks Meow! # … '
import string
p = string.punctuation+'…'
k = []
for item in s.split():
    k.append(item.strip(p).lower())
k = [x for x in k if x]
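The empty strings appear because a token made up only of punctuation (like '#') strips down to '', and '…' survives because it is not in string.punctuation. The same idea as a function, as a sketch only: clean_words and the extra parameter are names I made up.

```python
import string


def clean_words(s, extra="…"):
    """Hypothetical helper: split on whitespace, strip punctuation, drop empties."""
    # string.punctuation is ASCII only, so add any other marks you expect.
    strip_chars = string.punctuation + extra
    stripped = (item.strip(strip_chars).lower() for item in s.split())
    # Tokens that were pure punctuation are now '' and get filtered out.
    return [w for w in stripped if w]


print(clean_words('a dog barks Meow! # … '))  # ['a', 'dog', 'barks', 'meow']
```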
Building on the accepted answer to this question:
import itertools
k = []
for ok, grp in itertools.groupby(s, lambda c: c.isalnum()):
    if ok:
        k.append(''.join(list(grp)))
or the same as a one-liner (except for the import):
k = [''.join(list(grp)) for ok, grp in itertools.groupby(s, lambda c: c.isalnum()) if ok]
itertools.groupby() scans the string s character by character, grouping consecutive characters (grp) by the value (ok) of the lambda expression. The if ok filters out the groups for which the lambda is false. Each group is an iterator of characters that is joined to get the word back; str.join() accepts any iterable, so the list() call is optional.
The meaning of isalnum() is essentially “is alphanumeric”. Depending on your use case, you might prefer isalpha(). In both cases, for this input:
s = 'a 狗 barks meow! # …'
the output is
['a', '狗', 'barks', 'meow']
(For experts: this is a reminder that not every language separates its words with non-word characters.)
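To see where isalnum() and isalpha() actually differ, try an input that contains digits. A small sketch (extract_words and its keep parameter are hypothetical names, not from the answer above):

```python
import itertools


def extract_words(s, keep=str.isalnum):
    """Hypothetical helper: return the runs of characters for which keep() is true."""
    # groupby yields (key, group) for each run of consecutive characters
    # that share the same keep() result; only the True runs are words.
    return [''.join(grp) for ok, grp in itertools.groupby(s, keep) if ok]


print(extract_words('room 101 is open!'))                    # ['room', '101', 'is', 'open']
print(extract_words('room 101 is open!', keep=str.isalpha))  # ['room', 'is', 'open']
```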

Write a for loop to remove punctuation

I've been tasked with writing a for loop to remove some punctuation in a list of strings, storing the answers in a new list. I know how to do this with one string, but not in a loop.
For example: phrases = ['hi there!', 'thanks!'] etc.
import string
new_phrases = []
for i in phrases:
    if i not in string.punctuation
Then I get a bit stuck at this point. Do I append? I've tried yield and return, but realised that's for functions.
You can either update your current list in place or append each new value to another list. Updating in place needs only constant extra space for the list itself, while appending builds a second list of O(n) entries.
phrases = ['hi there!', 'thanks!']
for i, el in enumerate(phrases):
    # Overwrite each entry with its punctuation-free version.
    phrases[i] = el.replace("!", "")
print(phrases)
This prints: ['hi there', 'thanks']
Give this a go:
import re
new_phrases = []
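for word in phrases:
    new_phrases.append(re.sub(r'[^\w\s]', '', word))
This uses the re module to replace every character that is neither a word character nor whitespace with an empty string, which effectively removes the punctuation. Note that \w also matches the underscore, so underscores are kept.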
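You can use the re module and a list comprehension to do it in a single line:
phrases = ['hi there!', 'thanks!']
import string
import re
# re.escape keeps characters such as ] and \ from being misread inside the character class.
new_phrases = [re.sub('[{}]'.format(re.escape(string.punctuation)), '', i) for i in phrases]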
new_phrases
#['hi there', 'thanks']
If a phrase contains any punctuation, replace it with "" and append the result to new_phrases:
import string
new_phrases = []
phrases = ['hi there!', 'thanks!']
for i in phrases:
    for pun in string.punctuation:
        if pun in i:
            i = i.replace(pun, "")
    new_phrases.append(i)
print(new_phrases)
OUTPUT
['hi there', 'thanks']
Following your line of thinking, I would do it like this:
for word in phrases:  # for each phrase
    for punct in string.punctuation:  # for each punctuation character
        word = word.replace(punct, '')  # remove that punctuation character
    new_phrases.append(word)  # add the punctuation-free text to your output
I suggest using the translate() method on each string of your input list, which fits this task well. It gives the following code, which iterates over the input list with a list comprehension and is short and easy to read:
import string
phrases = ['hi there!', 'thanks!']
translationRule = str.maketrans({k:"" for k in string.punctuation})
new_phrases = [phrase.translate(translationRule) for phrase in phrases]
print(new_phrases)
# ['hi there', 'thanks']
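A variation on the same idea, in case it is useful: str.maketrans also has a three-argument form whose third argument lists characters to delete, so the dict comprehension is not strictly needed. A minimal sketch (DELETE_PUNCT and strip_punctuation are names I picked for illustration):

```python
import string

# Third argument of str.maketrans: characters that translate() should delete.
DELETE_PUNCT = str.maketrans('', '', string.punctuation)


def strip_punctuation(phrases):
    """Hypothetical helper: return a new list with ASCII punctuation removed."""
    return [phrase.translate(DELETE_PUNCT) for phrase in phrases]


print(strip_punctuation(['hi there!', 'thanks!']))  # ['hi there', 'thanks']
```

The table is built once and reused for every phrase, which is the main attraction of translate() here.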
Or to only allow spaces and letters:
phrases=[''.join(x for x in i if x.isalpha() or x==' ') for i in phrases]
Now:
print(phrases)
Is:
['hi there', 'thanks']
You should use a list comprehension, where process stands for whatever function cleans a single phrase:
new_list = [process(phrase) for phrase in phrases]

how to find all occurrences of a word using regex in python

How can I find all occurrences of the word "good " in the following string using regex?
f = "good and obj is a \good to look for it is good"
import re
regex = re.compile("good")
regex.findall("good and obj is a \good to look for it is good")
['good', 'good', 'good']
f = r"good and obj is a \good to look for it is good"
m = re.findall('good ', f)
will give you:
['good ', 'good ']
You should use a raw string when defining f, unless the \ is supposed to go with g
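If you want to find every 'good', including the one in '\good':
re.findall('good', f)
will work.
Otherwise, to skip any 'good' that directly follows a non-whitespace character (such as the backslash), you can try this:
import re
s = r"good and obj is a \good to look for it is good"
# Remove each 'good' that is glued to a preceding non-space character.
s = re.sub(r'[^\/\s]good', '', s)
print(re.findall(r'good', s))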
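If the intent is specifically to ignore the 'good' that comes right after a backslash, a negative lookbehind can do that in one step without modifying the string. This is a sketch under that assumption about what the question wants:

```python
import re

f = r"good and obj is a \good to look for it is good"

# Every occurrence, including the one after the backslash.
all_hits = re.findall(r"good", f)

# (?<!\\) rejects a match when the character just before it is a backslash.
not_after_backslash = re.findall(r"(?<!\\)good", f)

print(len(all_hits))             # 3
print(len(not_after_backslash))  # 2
```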

Regex to separate Numeric from Alpha

I have a bunch of strings:
"10people"
"5cars"
..
How would I split this to?
['10','people']
['5','cars']
It can be any amount of numbers and text.
I'm thinking about writing some sort of regex - however I'm sure there's an easy way to do it in Python.
>>> re.findall(r'(\d+|[a-zA-Z]+)', '12fgsdfg234jhfq35rjg')
['12', 'fgsdfg', '234', 'jhfq', '35', 'rjg']
Use the regex (\d+)([a-zA-Z]+).
import re
a = ["10people", "5cars"]
[re.match('^(\\d+)([a-zA-Z]+)$', x).groups() for x in a]
Result:
[('10', 'people'), ('5', 'cars')]
>>> re.findall(r"\d+|[a-zA-Z]+", "10people")
['10', 'people']
>>> re.findall(r"\d+|[a-zA-Z]+", "10people5cars")
['10', 'people', '5', 'cars']
In general, a split on /(?<=[0-9])(?=[a-z])|(?<=[a-z])(?=[0-9])/i separates a string that way.
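In Python that lookaround split could look roughly like this. It relies on re.split accepting patterns that match the empty string, which needs Python 3.7 or later; BOUNDARY and split_alnum are names chosen for the example.

```python
import re

# Zero-width boundary between a digit and a letter, in either order.
BOUNDARY = re.compile(r"(?<=[0-9])(?=[a-z])|(?<=[a-z])(?=[0-9])", re.IGNORECASE)


def split_alnum(s):
    """Hypothetical helper: split s wherever digits meet letters."""
    return BOUNDARY.split(s)


print(split_alnum("10people"))  # ['10', 'people']
print(split_alnum("5cars10"))   # ['5', 'cars', '10']
```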
>>> import re
>>> s = '10cars'
>>> m = re.match(r'(\d+)([a-z]+)', s)
>>> print(m.group(1))
10
>>> print(m.group(2))
cars
If you are like me and go to great lengths to avoid regexes just because they are ugly, here is a non-regex approach:
data = "5people10cars"
# Replace every unwanted character with a newline, then split on whitespace.
numbers = "".join(ch if ch.isdigit() else "\n" for ch in data).split()
names = "".join(ch if not ch.isdigit() else "\n" for ch in data).split()
final = list(zip(numbers, names))  # [('5', 'people'), ('10', 'cars')]
Piggybacking on jsbueno's idea, using str.translate, followed by split:
allchars = ''.join(chr(i) for i in range(32, 256))
# str.maketrans replaces Python 2's string.maketrans; both arguments must have equal length.
digExtractTrans = str.maketrans(allchars, ''.join(ch if ch.isdigit() else ' ' for ch in allchars))
alpExtractTrans = str.maketrans(allchars, ''.join(ch if ch.isalpha() else ' ' for ch in allchars))
data = "5people10cars"
numbers = data.translate(digExtractTrans).split()
names = data.translate(alpExtractTrans).split()
You only need to create the translation tables once, then call translate and split as often as you want.
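Another regex-free option, not taken from the answers above, is itertools.groupby keyed on str.isdigit, which walks the string once and yields alternating digit and non-digit runs. A sketch, assuming the input starts with a number and alternates cleanly (split_runs is a made-up name):

```python
from itertools import groupby


def split_runs(s):
    """Hypothetical helper: split s into consecutive digit and non-digit runs."""
    # Each group is a run of characters sharing the same isdigit() result.
    return [''.join(grp) for _, grp in groupby(s, str.isdigit)]


runs = split_runs("5people10cars")
print(runs)  # ['5', 'people', '10', 'cars']

# Pair each number with the text that follows it.
pairs = list(zip(runs[::2], runs[1::2]))
print(pairs)  # [('5', 'people'), ('10', 'cars')]
```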
