Say I have a string, string = 'a'
I do string.split() and I get ['a']
I don't want this; I only want a list when my string contains whitespace, à la string = 'a b c d'.
So far, I've tried all the following with no luck:
>>> a = 'a'
>>> a.split()
['a']
>>> a = 'a b'
>>> a.split(' ')
['a', 'b']
>>> a = 'a'
>>> a.split(' ')
['a']
>>> import re
>>> re.findall(r'\S+', a)
['a']
>>> re.findall(r'\S', a)
['a']
>>> a = 'a b'
>>> re.findall(r'\S+', a)
['a', 'b']
>>> re.split(r'\s+', a)
['a', 'b']
>>> a = 'a'
>>> re.split(r'\s+', a)
['a']
>>> a.split(" ")
['a']
>>> a = "a"
>>> a.split(" ")
['a']
>>> a.strip().split(" ")
['a']
>>> a = "a".strip()
>>> a.split(" ")
['a']
Am I crazy? I see no whitespace in the string "a".
>>> r"[^\S\n\t]+"
'[^\\S\\n\\t]+'
>>> print(re.findall(r'[^\S\n\t]+',a))
[]
What up?
EDIT
FWIW, this is how I got what I needed:
# test for linked array
if typename == 'org.apache.ctakes.typesystem.type.textsem.ProcedureMention':
    for f in AnnotationType.all_features:
        if 'Array' in f.rangeTypeName:
            if attributes.get(f.name) and typesystem.get_type(f.elementType):
                print([ int(i) for i in attributes[f.name].split() ])
and that is the end...
split will always return a list. Try this:
def split_it(s):
    if len(s.split()) > 1:
        return s.split()
    else:
        return s
The behavior of split makes sense, it always returns a list. Why not just check if the list length is 1?
def weird_split(a):
    words = a.split()
    if len(words) == 1:
        return words[0]
    return words
You could use the conditional expression to check for the presence of space, and use split only if a space is detected:
str1 = 'abc'
split_str1 = str1 if (' ' not in str1) else str1.split(' ')
print (split_str1)
str1 = 'ab c'
split_str1 = str1 if (' ' not in str1) else str1.split(' ')
print (split_str1)
This would give the output:
abc
['ab', 'c']
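One caveat with testing for ' ' directly: it only sees the space character, so tabs, newlines, and leading or trailing spaces behave differently from a bare split(). A minimal sketch, assuming any whitespace should count and a lone token should come back as a plain string (the name split_if_whitespace is made up for illustration):

```python
def split_if_whitespace(s):
    """Return the list of tokens when s holds several whitespace-separated
    tokens; otherwise return a string."""
    words = s.split()  # no argument: splits on any run of whitespace
    if len(words) > 1:
        return words
    # Zero or one token: hand back a string instead of a list.
    # Note that a lone token is returned without surrounding whitespace.
    return words[0] if words else s


print(split_if_whitespace('a'))        # a
print(split_if_whitespace('a b c d'))  # ['a', 'b', 'c', 'd']
```

For comparison, ' a '.split(' ') gives ['', 'a', ''], which is probably not what you want for a string with a single word in it.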
In Python, we can use str.format to construct a string like this:
string_format + value_of_keys = formatted_string
Eg:
FMT = '{name:} {age:} {gender}' # string_format
VoK = {'name':'Alice', 'age':10, 'gender':'F'} # value_of_keys
FoS = FMT.format(**VoK) # formatted_string
In this case, formatted_string = 'Alice 10 F'
I am just wondering whether there is a way to get value_of_keys back from formatted_string and string_format. It should be a function Fun such that:
VoK = Fun('{name:} {age:} {gender}', 'Alice 10 F')
# the value of Vok is expected as {'name':'Alice', 'age':10, 'gender':'F'}
Is there any way to get this function Fun?
ADDED :
Note that '{name:} {age:} {gender}' and 'Alice 10 F' are just the simplest example. Realistic cases can be harder; for instance, the space delimiter may not exist.
And mathematically speaking, most of the cases are not reversible, such as:
FMT = '{key1:}{key2:}'
FoS = 'HelloWorld'
VoK could be any of the following:
{'key1':'Hello','key2':'World'}
{'key1':'Hell','key2':'oWorld'}
....
So to make this question well defined, I would like to add two conditions:
1. There is always a delimiter between two keys.
2. No delimiter appears inside any value in value_of_keys.
In this case, the question is solvable (mathematically speaking) :)
Another example shown with input and expected output:
In '{k1:}+{k2:}={k3:}', '1+2=3' Out {'k1':1,'k2':2, 'k3':3}
In 'Hi, {k1:}, this is {k2:}', 'Hi, Alice, this is Bob' Out {'k1':'Alice', 'k2':'Bob'}
You can indeed do this, but with a slightly different kind of format string: a regular expression.
Here is how you do it:
import re
# this is how you write your "format"
regex = r"(?P<name>\w+) (?P<age>\d+) (?P<gender>[MF])"
test_str = "Alice 10 F"
groups = re.match(regex, test_str)
Now you can use groups to access all the components of the string:
>>> groups.group('name')
'Alice'
>>> groups.group('age')
'10'
>>> groups.group('gender')
'F'
Regex is a very cool thing. I suggest you learn more about it online.
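If you would rather not write the regex by hand, the format string itself can be translated into one. A rough sketch, assuming the two conditions from the question hold (a literal delimiter between every pair of fields, and no delimiter inside a value) and assuming every field has a plain name that is valid as a regex group name. format_to_regex and unformat are hypothetical helper names; string.Formatter().parse is the standard-library parser for format strings.

```python
import re
from string import Formatter


def format_to_regex(fmt):
    """Translate a str.format template into a compiled regex with one
    named group per replacement field."""
    parts = []
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion).
    for literal, field, _spec, _conversion in Formatter().parse(fmt):
        parts.append(re.escape(literal))  # delimiters must match literally
        if field is not None:
            # Assumption: field is a plain name such as 'age'.
            parts.append('(?P<{}>.+?)'.format(field))
    return re.compile(''.join(parts))


def unformat(fmt, text):
    """Return {field: matched substring}, or None if text does not fit fmt."""
    match = format_to_regex(fmt).fullmatch(text)
    return match.groupdict() if match else None


print(unformat('{name:} {age:} {gender}', 'Alice 10 F'))
print(unformat('Hi, {k1:}, this is {k2:}', 'Hi, Alice, this is Bob'))
```

All values come back as strings, and for a template with adjacent fields such as '{key1:}{key2:}' the result is just one of the many possible splits.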
I wrote a function and it seems to work:
import re
def Fun(fmt,res):
    # first pattern captures the key names, second matches whole '{...}' fields
    reg_keys = '{([^{}:]+)[^{}]*}'
    reg_fmts = '{[^{}:]+[^{}]*}'
    pat_keys = re.compile(reg_keys)
    pat_fmts = re.compile(reg_fmts)
    keys = pat_keys.findall(fmt)
    # splitting the format on its fields leaves the literal delimiters
    lmts = pat_fmts.split(fmt)
    temp = res
    values = []
    for lmt in lmts:
        if not len(lmt)==0:
            value,temp = temp.split(lmt,1)
            if len(value)>0:
                values.append(value)
    # whatever is left after the last delimiter is the last value
    if len(temp)>0:
        values.append(temp)
    return dict(zip(keys,values))
Usage:
eg1:
fmt = '{k1:}+{k2:}={k:3}'
res = '1+1=2'
print Fun(fmt,res)
>>>{'k2': '1', 'k1': '1', 'k': '2'}
eg2:
fmt = '{name:} {age:} {gender}'
res = 'Alice 10 F'
print Fun(fmt,res)
>>>
eg3:
fmt = 'Hi, {k1:}, this is {k2:}'
res = 'Hi, Alice, this is Bob'
print Fun(fmt,res)
>>>{'k2': 'Bob', 'k1': 'Alice'}
There is no way for python to determine how you created the formatted string once you get the new string.
For example: once you format "{something} {otherthing}" with values that contain spaces, you cannot tell from the resulting string whether a given word belonged to {something} or {otherthing}.
However, you may use some hacks if you know the format of the new string and the result is consistent.
For example, in your given example: if you are sure that you'll have a word followed by a space, then a number, then another space and then a word, you may use the regex below to extract the values:
>>> import re
>>> my_str = 'Alice 10 F'
>>> re.findall(r'(\w+)\s(\d+)\s(\w+)', my_str)
[('Alice', '10', 'F')]
In order to get the desired dict from this, you may update the logic as:
>>> my_keys = ['name', 'age', 'gender']
>>> dict(zip(my_keys, re.findall(r'(\w+)\s(\d+)\s(\w+)', my_str)[0]))
{'gender': 'F', 'age': '10', 'name': 'Alice'}
I suggest another approach to this problem using **kwargs, such as...
def fun(**kwargs):
    result = '{'
    for key, value in kwargs.iteritems():
        result += '{}:{} '.format(key, value)
    # stripping the last space
    result = result[:-1]
    result += '}'
    return result
print fun(name='Alice', age='10', gender='F')
# outputs : {gender:F age:10 name:Alice}
NOTE: kwargs only preserves the order of the keyword arguments from Python 3.6 onward; in earlier versions it is a plain, unordered dict. If you need to keep the order on older versions, it is easy to build a workaround.
This code produces strings for all the values, but it does split the string into its constituent components. It depends on the delimiter being a space, and none of the values containing a space. If any of the values contains a space this becomes a much harder problem.
>>> delimiters = ' '
>>> d = {k: v for k,v in zip(('name', 'age', 'gender'), 'Alice 10 F'.split(delimiters))}
>>> d
{'name': 'Alice', 'age': '10', 'gender': 'F'}
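If you want 10 rather than '10', as in the question's expected output, one option is to convert values that look like integers after splitting. A small sketch, assuming integer conversion is the only conversion wanted (coerce_int is a made-up helper name):

```python
def coerce_int(value):
    """Return int(value) when value parses as an integer, else value unchanged."""
    try:
        return int(value)
    except ValueError:
        return value


keys = ('name', 'age', 'gender')
d = {k: coerce_int(v) for k, v in zip(keys, 'Alice 10 F'.split(' '))}
print(d)
```

Whether '10' should become a number is something only the caller can know, which is part of why the original values cannot be recovered exactly.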
For your requirement, I have a solution.
The concept is:
1. change all delimiters to the same delimiter
2. split the input strings by that delimiter
3. get the keys
4. get the values
5. zip the keys and values into a dict
import re
from collections import OrderedDict
def Func(data, delimiters, delimiter):
    # change all delimiters to delimiter
    for d in delimiters:
        data[0] = data[0].replace(d, delimiter)
        data[1] = data[1].replace(d, delimiter)
    # get keys with '{}'
    keys = data[0].split(delimiter)
    # if string starts with delimiter remove first empty element
    if keys[0] == '':
        keys = keys[1:]
    # get keys without '{}'
    p = re.compile(r'{([\w\d_]+):*.*}')
    keys = [p.match(x).group(1) for x in keys]
    # get values
    vals = data[1].split(delimiter)
    # if string starts with delimiter remove first empty element
    if vals[0] == '':
        vals = vals[1:]
    # pack to a dict
    result_1 = dict(zip(keys, vals))
    # if you need Ordered Dict
    result_2 = OrderedDict(zip(keys, vals))
    return result_1, result_2
The usage:
In_1 = ['{k1}+{k2:}={k3:}', '1+2=3']
delimiters_1 = ['+', '=']
result = Func(In_1, delimiters_1, delimiters_1[0])
# Out_1 = {'k1':1,'k2':2, 'k3':3}
print(result)
In_2 = ['Hi, {k1:}, this is {k2:}', 'Hi, Alice, this is Bob']
delimiters_2 = ['Hi, ', ', this is ']
result = Func(In_2, delimiters_2, delimiters_2[0])
# Out_2 = {'k1':'Alice', 'k2':'Bob'}
print(result)
The output:
({'k3': '3', 'k2': '2', 'k1': '1'},
OrderedDict([('k1', '1'), ('k2', '2'), ('k3', '3')]))
({'k2': 'Bob', 'k1': 'Alice'},
OrderedDict([('k1', 'Alice'), ('k2', 'Bob')]))
Try this:
import re
def fun():
    k = 'Alice 10 F'
    c = '{name:} {age:} {gender}'
    # strip the braces and colons so only the key names remain
    l = re.sub('[:}{]', '', c)
    d={}
    for i,j in zip(k.split(), l.split()):
        d[j]=i
    print(d)
You can turn the two hard-coded strings into parameters of fun as you wish. It accepts the same strings you want to pass and produces a dict like this:
{'name': 'Alice', 'age': '10', 'gender': 'F'}
I think the only right answer is that what you are searching for isn't possible in general. You just don't have enough information. A good example is:
#python 3
a="12"
b="34"
c="56"
string=f"{a}{b}{c}"
dic = fun("{a}{b}{c}",string)
Now dic might be {"a":"12","b":"34","c":"56"} but it might as well be {"a":"1","b":"2","c":"3456"}. So any universal reversed format function would ultimately fail because of this ambiguity. You could obviously force a delimiter between each variable, but that would defeat the purpose of the function.
I know this was already stated in the comments, but it should also be added as an answer for future visitors.
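To make the ambiguity concrete, here is a small sketch that lists every way a string could have been produced by a two-field template with no delimiter, such as '{key1:}{key2:}' from the question. It assumes both values are non-empty strings; two_field_candidates is a hypothetical name.

```python
def two_field_candidates(text):
    """List every (a, b) with non-empty parts whose concatenation is text."""
    return [(text[:i], text[i:]) for i in range(1, len(text))]


candidates = two_field_candidates('HelloWorld')
# Every candidate formats back to exactly the same string.
assert all('{key1:}{key2:}'.format(key1=a, key2=b) == 'HelloWorld'
           for a, b in candidates)
print(len(candidates))  # 9
```

Nothing in the formatted string distinguishes ('Hello', 'World') from ('Hell', 'oWorld'), so any choice among the candidates is a guess.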
I do the following for replacing.
import fileinput
for line in fileinput.FileInput("input.txt",inplace=1):
    line = line.replace("A","A'")
    print line,
But I want to do many replacements.
Replace A with A' , B with BB, C with CX, D with KK, etc.
I can of course do this by repeating the above code many times.
But I guess that will consume a lot of time especially when input.txt is large.
How can I do this elegantly?
Emphasis added
My input is not just a str ABCD.
I need to use input.txt as input, and I want to replace every occurrence of A in input.txt with A', every occurrence of B with BB, every occurrence of C with CX, and every occurrence of D with KK.
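Whichever replacement technique is used, the file only needs one pass if every line goes through all the replacements at once. A rough sketch of that, assuming the replacements are fixed strings rather than patterns and the mapping is non-empty. make_replacer and replace_in_file are hypothetical names, and the file is read fully into memory before being rewritten, which may not suit very large inputs.

```python
import re


def make_replacer(mapping):
    """Build a function that applies every replacement in mapping in one pass."""
    # Longer keys first, so a short key cannot shadow a longer one that
    # starts with it (alternation tries alternatives left to right).
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(k) for k in keys))

    def replace(text):
        # Replaced text is not rescanned, so the "A'" produced for "A"
        # is never touched again by another rule.
        return pattern.sub(lambda m: mapping[m.group()], text)

    return replace


def replace_in_file(path, mapping):
    """Rewrite the file at path with all replacements applied to each line."""
    replace = make_replacer(mapping)
    with open(path) as src:
        lines = [replace(line) for line in src]
    with open(path, 'w') as dst:
        dst.writelines(lines)


replace = make_replacer({'A': "A'", 'B': 'BB', 'C': 'CX', 'D': 'KK'})
print(replace('ABCDEF'))  # A'BBCXKKEF
```

Calling replace_in_file('input.txt', {...}) would then apply the same mapping to the file from the question. Compared with chaining str.replace calls, a single substitution also avoids one rule rewriting the output of an earlier rule.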
Use a mapping dictionary:
>>> map_dict = {'A':"A'", 'B':'BB', 'C':'CX', 'D':'KK'}
>>> strs = 'ABCDEF'
>>> ''.join(map_dict.get(c,c) for c in strs)
"A'BBCXKKEF"
In Python3 use str.translate instead of str.join:
>>> map_dict = {ord('A'):"A'", ord('B'):'BB', ord('C'):'CX', ord('D'):'KK'}
>>> strs = 'ABCDEF'
>>> strs.translate(map_dict)
"A'BBCXKKEF"
Using regular expression:
>>> import re
>>>
>>> replace_map = {
... 'A': "A'",
... 'B': 'BB',
... 'C': 'CX',
... 'D': 'KK',
... 'EFG': '.',
... }
>>> pattern = '|'.join(map(re.escape, replace_map))
>>> re.sub(pattern, lambda m: replace_map[m.group()], 'ABCDEFG')
"A'BBCXKK."
I have the following string that I'm trying to parse into a dict of k,v
foo = "abc=foo.bazz; defg=6cab; rando=random; token=foobar"
I can achieve this with some really ugly code
foo_dict = {}
bar = foo.split(';')
for item in bar:
    x = item.split('=')
    foo_dict[x[0]] = x[1]
I'd much prefer this be a simple 1 line list comprehension.
What about using urlparse module:
In [1]: import urlparse
In [2]: foo = "abc=foo.bazz; defg=6cab; rando=random; token=foobar"
In [3]: urlparse.parse_qs('abc=foo.bazz; defg=6cab; rando=random; token=foobar')
Out[3]:
{' defg': ['6cab'],
' rando': ['random'],
' token': ['foobar'],
'abc': ['foo.bazz']}
In [4]: dict(urlparse.parse_qsl('abc=foo.bazz; defg=6cab; rando=random; token=foobar'))
Out[4]: {' defg': '6cab', ' rando': 'random', ' token': 'foobar', 'abc': 'foo.bazz'}
Not sure if you wanted those blank values in keys or not, but obviously easy to clean.
In [107]: foo = "abc=foo.bazz; defg=6cab; rando=random; token=foobar"
In [115]: dict(map(str.strip,x.split('=')) for x in foo.split(';'))
.....:
Out[115]: {'abc': 'foo.bazz', 'defg': '6cab', 'rando': 'random', 'token': 'foobar'}
dict(part.split("=") for part in foo.split(";"))
should work, I think:
>>> foo = "abc=foo.bazz; defg=6cab; rando=random; token=foobar"
>>> dict(part.split("=") for part in foo.split(";"))
{' token': 'foobar', 'abc': 'foo.bazz', ' defg': '6cab', ' rando': 'random'}
>>>
if you do part.strip().split("=") it may get rid of the extra spaces...
>>> dict(part.strip().split("=") for part in foo.split(";"))
{'token': 'foobar', 'abc': 'foo.bazz', 'defg': '6cab', 'rando': 'random'}
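One thing to watch with split("="): if a value itself contains '=', the part splits into three pieces and dict() raises ValueError. A small sketch that sidesteps this with str.partition, which splits only on the first separator, and that also skips empty parts such as the one left by a trailing ';'. parse_pairs and its parameter names are made up for illustration.

```python
def parse_pairs(text, item_sep=';', kv_sep='='):
    """Parse 'k=v; k2=v2' style text into a dict, trimming whitespace."""
    # partition returns (before, separator, after), splitting on the first kv_sep only.
    pairs = (part.partition(kv_sep)
             for part in text.split(item_sep) if part.strip())
    return {key.strip(): value.strip() for key, _, value in pairs}


foo = "abc=foo.bazz; defg=6cab; rando=random; token=foobar"
print(parse_pairs(foo))
```

A part with no '=' at all ends up as a key with an empty string value, which may or may not be what you want.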
I have a bunch of strings:
"10people"
"5cars"
..
How would I split these into the following?
['10','people']
['5','cars']
It can be any amount of numbers and text.
I'm thinking about writing some sort of regex - however I'm sure there's an easy way to do it in Python.
>>> re.findall('(\d+|[a-zA-Z]+)', '12fgsdfg234jhfq35rjg')
['12', 'fgsdfg', '234', 'jhfq', '35', 'rjg']
Use the regex (\d+)([a-zA-Z]+).
import re
a = ["10people", "5cars"]
[re.match('^(\\d+)([a-zA-Z]+)$', x).groups() for x in a]
Result:
[('10', 'people'), ('5', 'cars')]
>>> re.findall("\d+|[a-zA-Z]+","10people")
['10', 'people']
>>> re.findall("\d+|[a-zA-Z]+","10people5cars")
['10', 'people', '5', 'cars']
In general, a split on /(?<=[0-9])(?=[a-z])|(?<=[a-z])(?=[0-9])/i separates a string that way.
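In Python that pattern can be passed to re.split, with the caveat that splitting on an empty match is only supported from Python 3.7 onward. A minimal sketch under that assumption (BOUNDARY and split_digits_letters are illustrative names):

```python
import re

# Zero-width boundary between a digit and a letter, in either order.
BOUNDARY = re.compile(r'(?<=[0-9])(?=[a-z])|(?<=[a-z])(?=[0-9])', re.IGNORECASE)


def split_digits_letters(s):
    """Split s wherever a run of digits meets a run of letters."""
    return BOUNDARY.split(s)


print(split_digits_letters('10people'))       # ['10', 'people']
print(split_digits_letters('10people5cars'))  # ['10', 'people', '5', 'cars']
```

On older Python versions the re.findall approach is the safer choice.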
>>> import re
>>> s = '10cars'
>>> m = re.match(r'(\d+)([a-z]+)', s)
>>> print m.group(1)
10
>>> print m.group(2)
cars
If you are like me and go a long way around to avoid regexps just because they are ugly, here is a non-regex approach:
data = "5people10cars"
numbers = "".join(ch if ch.isdigit() else "\n" for ch in data).split()
names = "".join(ch if not ch.isdigit() else "\n" for ch in data).split()
final = zip (numbers, names)
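Another regex-free option is itertools.groupby, which groups consecutive characters by whether they are digits. A short sketch, assuming that anything which is not a digit counts as text (split_runs is a made-up name):

```python
from itertools import groupby


def split_runs(s):
    """Split s into alternating runs of digit and non-digit characters."""
    return [''.join(group) for _, group in groupby(s, key=str.isdigit)]


print(split_runs('10people'))       # ['10', 'people']
print(split_runs('5people10cars'))  # ['5', 'people', '10', 'cars']
```

This keeps the pieces in their original order, so it also works when a string starts with text rather than a number.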
Piggybacking on jsbueno's idea, using str.translate, followed by split:
import string
allchars = ''.join(chr(i) for i in range(32,256))
digExtractTrans = string.maketrans(allchars, ''.join(ch if ch.isdigit() else ' ' for ch in allchars))
alpExtractTrans = string.maketrans(allchars, ''.join(ch if ch.isalpha() else ' ' for ch in allchars))
data = "5people10cars"
numbers = data.translate(digExtractTrans).split()
names = data.translate(alpExtractTrans).split()
You only need to create the translation tables once, then call translate and split as often as you want.