I have a bunch of strings:
"10people"
"5cars"
..
How would I split these into the following?
['10','people']
['5','cars']
Each string can contain any number of digits and letters.
I'm thinking about writing some sort of regex; however, I'm sure there's an easy way to do it in Python.
>>> import re
>>> re.findall(r'(\d+|[a-zA-Z]+)', '12fgsdfg234jhfq35rjg')
['12', 'fgsdfg', '234', 'jhfq', '35', 'rjg']
Use the regex (\d+)([a-zA-Z]+).
import re
a = ["10people", "5cars"]
[re.match('^(\\d+)([a-zA-Z]+)$', x).groups() for x in a]
Result:
[('10', 'people'), ('5', 'cars')]
>>> re.findall(r"\d+|[a-zA-Z]+", "10people")
['10', 'people']
>>> re.findall(r"\d+|[a-zA-Z]+", "10people5cars")
['10', 'people', '5', 'cars']
In general, a split on /(?<=[0-9])(?=[a-z])|(?<=[a-z])(?=[0-9])/i separates a string that way.
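In Python, the lookaround split described above could look roughly like this. It is a minimal sketch, not the answerer's own code: the names BOUNDARY and split_digits_letters are made up for illustration, and splitting on an empty match assumes Python 3.7 or later.

```python
import re

# Zero-width boundary between a digit and a letter, in either order.
# Splitting on an empty match requires Python 3.7 or later.
BOUNDARY = re.compile(r'(?<=[0-9])(?=[a-z])|(?<=[a-z])(?=[0-9])', re.IGNORECASE)

def split_digits_letters(s):
    """Split s wherever a digit run meets a letter run (hypothetical helper)."""
    return BOUNDARY.split(s)

print(split_digits_letters('10people'))
print(split_digits_letters('10people5cars'))
```

Unlike findall, this keeps any other characters attached to their neighbours instead of silently dropping them.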
>>> import re
>>> s = '10cars'
>>> m = re.match(r'(\d+)([a-z]+)', s)
>>> print(m.group(1))
10
>>> print(m.group(2))
cars
If you are like me and go a long way around to avoid regexes just because they are ugly, here is a non-regex approach:
data = "5people10cars"
numbers = "".join(ch if ch.isdigit() else "\n" for ch in data).split()
names = "".join(ch if not ch.isdigit() else "\n" for ch in data).split()
final = list(zip(numbers, names))  # list() is needed on Python 3, where zip returns an iterator
Piggybacking on jsbueno's idea, using str.translate, followed by split:
# Python 3: str.maketrans replaces the Python 2 function string.maketrans
allchars = ''.join(chr(i) for i in range(32, 256))
digExtractTrans = str.maketrans(allchars, ''.join(ch if ch.isdigit() else ' ' for ch in allchars))
alpExtractTrans = str.maketrans(allchars, ''.join(ch if ch.isalpha() else ' ' for ch in allchars))
data = "5people10cars"
numbers = data.translate(digExtractTrans).split()
names = data.translate(alpExtractTrans).split()
You only need to create the translation tables once, then call translate and split as often as you want.
I'd like to split strings like these
'foofo21'
'bar432'
'foobar12345'
into
['foofo', '21']
['bar', '432']
['foobar', '12345']
Does anybody know a simple way to do this in Python?
I would approach this by using re.match in the following way:
import re
match = re.match(r"([a-z]+)([0-9]+)", 'foofo21', re.I)
if match:
    items = match.groups()
    print(items)
# prints ('foofo', '21')
def mysplit(s):
    head = s.rstrip('0123456789')
    tail = s[len(head):]
    return head, tail
>>> [mysplit(s) for s in ['foofo21', 'bar432', 'foobar12345']]
[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]
Yet Another Option:
>>> [re.split(r'(\d+)', s) for s in ('foofo21', 'bar432', 'foobar12345')]
[['foofo', '21', ''], ['bar', '432', ''], ['foobar', '12345', '']]
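The trailing empty strings in the output above appear because re.split also returns the (empty) text after the last match. A small sketch that drops the empty pieces, assuming you never want them; split_keep_nonempty is a made-up name:

```python
import re

def split_keep_nonempty(s):
    """Split s around digit runs, keeping the digits and dropping empty pieces."""
    # The capturing group makes re.split return the digit runs as well.
    return [part for part in re.split(r'(\d+)', s) if part]

for s in ('foofo21', 'bar432', 'foobar12345'):
    print(split_keep_nonempty(s))
```

This also works when the digits come first or when there are several runs.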
>>> r = re.compile("([a-zA-Z]+)([0-9]+)")
>>> m = r.match("foobar12345")
>>> m.group(1)
'foobar'
>>> m.group(2)
'12345'
So, if you have a list of strings with that format:
import re
r = re.compile("([a-zA-Z]+)([0-9]+)")
strings = ['foofo21', 'bar432', 'foobar12345']
print([r.match(string).groups() for string in strings])
Output:
[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]
I'm always the one to bring up findall() =)
>>> strings = ['foofo21', 'bar432', 'foobar12345']
>>> [re.findall(r'(\w+?)(\d+)', s)[0] for s in strings]
[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]
Note that I'm using a simpler (less to type) regex than most of the previous answers.
Here is a simple function to separate multiple words and numbers from a string of any length; the re.match approach only separates the first word and the first number. I think this will help others in the future.
def seperate_string_number(string):
    previous_character = string[0]
    groups = []
    newword = string[0]
    for x, i in enumerate(string[1:]):
        if i.isalpha() and previous_character.isalpha():
            newword += i
        elif i.isnumeric() and previous_character.isnumeric():
            newword += i
        else:
            groups.append(newword)
            newword = i
        previous_character = i
        if x == len(string) - 2:
            groups.append(newword)
            newword = ''
    return groups
print(seperate_string_number('10in20ft10400bg'))
# outputs : ['10', 'in', '20', 'ft', '10400', 'bg']
import re
s = input()  # Python 3; on Python 2 this was raw_input()
m = re.match(r"([a-zA-Z]+)([0-9]+)", s)
print(m.group(0))
print(m.group(1))
print(m.group(2))
Without regex, using the built-in str.isdigit() method. This only works if the first part is text and the second part is a number:
def text_num_split(item):
    for index, letter in enumerate(item, 0):
        if letter.isdigit():
            return [item[:index], item[index:]]
print(text_num_split("foobar12345"))
OUTPUT :
['foobar', '12345']
This is a little longer, but more versatile for cases where there are multiple, randomly placed, numbers in the string. Also, it requires no imports.
def getNumbers( input ):
    # Collect Info
    compile = ""
    complete = []
    for letter in input:
        # If compiled string
        if compile:
            # If compiled and letter are same type, append letter
            if compile.isdigit() == letter.isdigit():
                compile += letter
            # If compiled and letter are different types, append compiled string, and begin with letter
            else:
                complete.append( compile )
                compile = letter
        # If no compiled string, begin with letter
        else:
            compile = letter
    # Append leftover compiled string
    if compile:
        complete.append( compile )
    # Return numbers only
    numbers = [ word for word in complete if word.isdigit() ]
    return numbers
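The same run-by-run grouping can also be sketched with itertools.groupby from the standard library. This is a swapped-in technique, not the loop from the answer above (so it does need an import), and the names split_runs and get_numbers are invented for the example:

```python
from itertools import groupby

def split_runs(s):
    """Split s into maximal runs of digits and runs of non-digits."""
    # groupby starts a new group each time str.isdigit changes value.
    return [''.join(run) for _, run in groupby(s, key=str.isdigit)]

def get_numbers(s):
    """Return only the digit runs, like getNumbers above."""
    return [run for run in split_runs(s) if run.isdigit()]

print(split_runs('10in20ft10400bg'))
print(get_numbers('10in20ft10400bg'))
```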
Here is a simple solution for that problem, no need for regex:
user = input('Input: ')  # e.g. user = 'foobar12345'
int_list, str_list = [], []
for item in user:
    try:
        item = int(item)  # look for digits in the string
    except ValueError:
        str_list.append(item)
        string = ''.join(str_list)
    else:  # a digit: store it as str, because join only works with str
        int_list.append(str(item))
        integer = int(''.join(int_list))  # for a string instead, use ''.join(int_list)
final = [string, integer]  # or build a dictionary: d = {string: integer}
print(final)
In addition to Evan's answer:
If the incoming string follows the pattern 21foofo, the re.match pattern would look like this:
import re
match = re.match(r"([0-9]+)([a-z]+)", '21foofo', re.I)
if match:
    items = match.groups()
    print(items)
# prints ('21', 'foofo')
With the letters-first pattern, the match fails on such a string and items is never assigned, so referencing it later inside a function raises UnboundLocalError: local variable 'items' referenced before assignment.
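If you don't know in advance which order the string uses, one pattern with two alternatives can cover both. This is only a sketch under that assumption; split_pair is a hypothetical helper, and it returns None instead of leaving a variable unassigned when nothing matches:

```python
import re

# Letters then digits, or digits then letters; the whole string must match.
PAIR = re.compile(r'([a-z]+)([0-9]+)|([0-9]+)([a-z]+)', re.IGNORECASE)

def split_pair(s):
    """Return the two parts of s as a tuple, or None if s has another shape."""
    m = PAIR.fullmatch(s)
    if m is None:
        return None
    # Only the groups of the alternative that matched are not None.
    return tuple(g for g in m.groups() if g is not None)

print(split_pair('foofo21'))
print(split_pair('21foofo'))
print(split_pair('!!'))
```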
I have a Python string
string = 'aaa1bbb1ccc1ddd'
and I want to split it like this
re.split('[split at all occurrences of "1", unless the 1 is followed by a c]', string)
so that the result is
['aaa', 'bbb1ccc', 'ddd']
How do I do this?
Use negative-lookahead with regex and the re module:
>>> string = 'aaa1bbb1ccc1ddd'
>>> import re
>>> re.split(r"1(?!c)", string)
['aaa', 'bbb1ccc', 'ddd']
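The negative lookahead (?!c) only checks that the next character is not c; it does not consume it, so nothing is lost from the following piece. A generalised sketch, with split_unless_followed as a made-up helper name and re.escape used so the delimiter and the blocked text are treated literally:

```python
import re

def split_unless_followed(s, delim, blocked):
    """Split s on delim, except where delim is immediately followed by blocked."""
    pattern = re.escape(delim) + '(?!' + re.escape(blocked) + ')'
    return re.split(pattern, s)

print(split_unless_followed('aaa1bbb1ccc1ddd', '1', 'c'))
```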
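def split_by_delim_except(s, delim, bar):
    escape = '\b'  # placeholder character, assumed not to occur in s
    find = delim + bar
    # list() is needed on Python 3, where map returns an iterator
    return list(map(lambda part: part.replace(escape, find),
                    s.replace(find, escape).split(delim)))

split_by_delim_except('aaa1bbb1ccc1ddd', '1', 'c')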
Although not as pretty as regex, my following code returns the same result:
string = 'aaa1bbb1ccc1ddd'
# Split the string at all instances of '1'
p1 = string.split('1')
# Create a new empty list so we can append our desired items to it
new_result = []
count = 0
for j in p1:
    if j.startswith('c'):
        # This removes the previous element from the list and stores it in a variable.
        prev_element = new_result.pop(count-1)
        prev_one_plus_j = prev_element + '1' + j
        new_result.append(prev_one_plus_j)
    else:
        new_result.append(j)
        count += 1
print(new_result)
Output:
['aaa', 'bbb1ccc', 'ddd']
I'm trying to find characters that are repeated 3 times or more, for example I want to take the following strings:
('aaa', 'buuuuut', 'oddddddddd')
and replace all occurrences of three or more of a letter with only one:
('a', 'but', 'od').
I've tried following code
s=re.sub(r'(\w)\3*',r'(\w)',s)
but it results in a compile error.
What regex do I need to use?
Look at this:
>>> mystr = 'buuuuuttttt'
>>> re.sub(r'(.)\1{2,}', r'\1', mystr)
'but'
>>> mystr = 'buttt'
>>> re.sub(r'(.)\1{2,}', r'\1', mystr)
'but'
>>>
>>> s = 'abbcccdddd'
>>> s = re.sub(r'(\w)\1(\1+)',r'\1',s)
>>> s
'abbcd'
Maybe try something like this:
s = re.sub(r'(\w)\1\1+', r'\1', s)
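In these patterns \1 refers back to whatever the first group captured, so (\w)\1{2,} means "a character followed by at least two more copies of itself". A small sketch that makes the threshold a parameter; collapse_runs and min_run are names invented for this example:

```python
import re

def collapse_runs(s, min_run=3):
    """Replace every run of min_run or more identical word characters with one."""
    # One captured character plus at least (min_run - 1) repeats of it.
    pattern = r'(\w)\1{%d,}' % (min_run - 1)
    return re.sub(pattern, r'\1', s)

for word in ('aaa', 'buuuuut', 'oddddddddd', 'abbcccdddd'):
    print(collapse_runs(word))
```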
my_list = ['1\tMelkor\tMorgoth\tSauronAtDolGoldul','2\tThingols\tHeirIsDior\tSilmaril','3\tArkenstone\tIsProbablyA\tSilmaril']
I'm trying to split each string in this list on \t, to get a list of sublists:
output = [['1','Melkor','Morgoth','SauronAtDolGoldul'],['2','Thingols','HeirIsDior','Silmaril'],['3','Arkenstone','IsProbablyA','Silmaril']]
I was thinking something on the lines of
output = []
for k_string in my_list:
    temp = []
    for i in k_string:
        temp_s = ''
        if i != '\':
            temp_s = temp_s + i
        elif i == '\':
            break
    temp.append(temp_s)
It gets messed up with the t. I'm not sure how else to go about it. I've seen people use .join for similar things, but I don't really understand how to use .join.
You want to use str.split(); a list comprehension lets you apply this to all elements in one line:
output = [sub.split('\t') for sub in my_list]
There is no literal \ in the string; the \t is an escape code that signifies the tab character.
Demo:
>>> my_list = ['1\tMelkor\tMorgoth\tSauronAtDolGoldul','2\tThingols\tHeirIsDior\tSilmaril','3\tArkenstone\tIsProbablyA\tSilmaril']
>>> [sub.split('\t') for sub in my_list]
[['1', 'Melkor', 'Morgoth', 'SauronAtDolGoldul'], ['2', 'Thingols', 'HeirIsDior', 'Silmaril'], ['3', 'Arkenstone', 'IsProbablyA', 'Silmaril']]
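To see that '\t' really is a single tab character and not a backslash followed by a t, you can check the lengths yourself. A tiny sketch using one of the strings from the question:

```python
s = '1\tMelkor'

tab_length = len('\t')       # the escape sequence is one character
raw_length = len(r'\t')      # a raw string keeps the backslash and the t
has_backslash = '\\' in s    # no literal backslash in the string
fields = s.split('\t')

print(tab_length, raw_length, has_backslash, fields)
```

That is why comparing each character against a backslash never finds anything to split on.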
>>> import csv
>>> my_list = ['1\tMelkor\tMorgoth\tSauronAtDolGoldul','2\tThingols\tHeirIsDior\tSilmaril','3\tArkenstone\tIsProbablyA\tSilmaril']
>>> list(csv.reader(my_list, delimiter='\t'))
[['1', 'Melkor', 'Morgoth', 'SauronAtDolGoldul'], ['2', 'Thingols', 'HeirIsDior', 'Silmaril'], ['3', 'Arkenstone', 'IsProbablyA', 'Silmaril']]
Suppose I have the following code:
new_dict = {}
text = "Yes: No Maybe: So"
I want to split the string up into 2 dictionary elements like so:
new_dict = {'Yes':'No', 'Maybe':'So'}
I tried to split the string up into a list in the same fashion to get a brief idea on how to do it, but I haven't had much success.
text = "Yes: No Maybe: So"
words = [w.rstrip(':') for w in text.split()]
new_dict = dict(zip(words[::2], words[1::2]))
If each colon is followed by a space, str.split() will work fine for you:
tokens = (s.rstrip(":") for s in text.split())
new_dict = dict(zip(tokens, tokens))
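The zip(tokens, tokens) trick works because both arguments are the same iterator: each pair takes the next two items from it, one after the other. A short sketch that shows the intermediate pairs, assuming the text always has an even number of tokens:

```python
text = "Yes: No Maybe: So"

# A single generator; zip pulls from it twice per pair.
tokens = (w.rstrip(':') for w in text.split())
pairs = list(zip(tokens, tokens))
new_dict = dict(pairs)

print(pairs)
print(new_dict)
```

With a list instead of a generator this would pair each word with itself, so the one-shot iterator matters here.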
>>> import re
>>> text = "Yes: No Maybe: So"
>>> dict(re.findall(r'(\w+): (\w+)', text))
{'Yes': 'No', 'Maybe': 'So'}
or, to avoid building the intermediate list of matches:
>>> dict(m.groups() for m in re.finditer(r'(\w+): (\w+)', text))
{'Yes': 'No', 'Maybe': 'So'}