Inverse of string format in python - python

In python, we can use str.format to construct string like this:
string_format + value_of_keys = formatted_string
Eg:
FMT = '{name:} {age:} {gender}' # string_format
VoK = {'name':'Alice', 'age':10, 'gender':'F'} # value_of_keys
FoS = FMT.format(**VoK) # formatted_string
In this case, formatted_string = 'Alice 10 F'
I am just wondering whether there is a way to get value_of_keys back from formatted_string and string_format. It should be a function Fun such that:
VoK = Fun('{name:} {age:} {gender}', 'Alice 10 F')
# the value of Vok is expected as {'name':'Alice', 'age':10, 'gender':'F'}
Is there any way to get this function Fun?
ADDED :
I would like to add that '{name:} {age:} {gender}' and 'Alice 10 F' are just the simplest example. Realistic cases could be harder; for instance, the space delimiter may not exist.
And mathematically speaking, most of the cases are not reversible, such as:
FMT = '{key1:}{key2:}'
FoS = 'HelloWorld'
The VoK could be any of the following:
{'key1':'Hello','key2':'World'}
{'key1':'Hell','key2':'oWorld'}
....
So to make this question well defined, I would like to add two conditions:
1. There are always delimiters between two keys
2. No delimiter appears inside any of the values in value_of_keys.
In this case, this question is solvable (Mathematically speaking) :)
Another example shown with input and expected output:
In '{k1:}+{k2:}={k3:}', '1+1=2' Out {'k1':1,'k2':1, 'k3':2}
In 'Hi, {k1:}, this is {k2:}', 'Hi, Alice, this is Bob' Out {'k1':'Alice', 'k2':'Bob'}

You can indeed do this, but with a slightly different kind of format string: a regular expression.
Here is how you do it:
import re
# this is how you write your "format"
regex = r"(?P<name>\w+) (?P<age>\d+) (?P<gender>[MF])"
test_str = "Alice 10 F"
groups = re.match(regex, test_str)
Now you can use groups to access all the components of the string:
>>> groups.group('name')
'Alice'
>>> groups.group('age')
'10'
>>> groups.group('gender')
'F'
Regex is a very cool thing. I suggest you learn more about it online.
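If you would rather keep the original str.format template, one possible sketch is to translate it into such a regex automatically. This assumes the two conditions from the question (a delimiter between every pair of fields, and no delimiter inside a value) and that every field is named with a valid identifier. The names format_to_regex and inverse_format are made up for this illustration, and all values come back as strings.

```python
import re
from string import Formatter

def format_to_regex(fmt):
    """Translate a str.format template into a regex with named groups."""
    parts = []
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion)
    for literal, field, _spec, _conv in Formatter().parse(fmt):
        parts.append(re.escape(literal))   # delimiters are matched literally
        if field is not None:              # None means trailing literal text only
            parts.append("(?P<{}>.+?)".format(field))
    return re.compile("".join(parts))

def inverse_format(fmt, text):
    """Return a dict of field values, or None if the text does not fit."""
    match = format_to_regex(fmt).fullmatch(text)
    return match.groupdict() if match else None

print(inverse_format('{name:} {age:} {gender}', 'Alice 10 F'))
print(inverse_format('Hi, {k1:}, this is {k2:}', 'Hi, Alice, this is Bob'))
```

The format specs themselves are ignored here, so converting '10' back to an int would be a separate step.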

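I wrote a function and it seems to work:
import re
def Fun(fmt,res):
    # capture the key name of every {key:spec} field
    reg_keys = '{([^{}:]+)[^{}]*}'
    # match whole fields, so splitting on them leaves the delimiters
    reg_fmts = '{[^{}:]+[^{}]*}'
    pat_keys = re.compile(reg_keys)
    pat_fmts = re.compile(reg_fmts)
    keys = pat_keys.findall(fmt)
    lmts = pat_fmts.split(fmt)
    temp = res
    values = []
    for lmt in lmts:
        if not len(lmt)==0:
            # everything before the next delimiter is a value
            value,temp = temp.split(lmt,1)
            if len(value)>0:
                values.append(value)
    # whatever is left after the last delimiter is the last value
    if len(temp)>0:
        values.append(temp)
    return dict(zip(keys,values))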
Usage:
eg1:
fmt = '{k1:}+{k2:}={k:3}'
res = '1+1=2'
print Fun(fmt,res)
>>>{'k2': '1', 'k1': '1', 'k': '2'}
eg2:
fmt = '{name:} {age:} {gender}'
res = 'Alice 10 F'
print Fun(fmt,res)
>>>
eg3:
fmt = 'Hi, {k1:}, this is {k2:}'
res = 'Hi, Alice, this is Bob'
print Fun(fmt,res)
>>>{'k2': 'Bob', 'k1': 'Alice'}

There is no way for python to determine how you created the formatted string once you get the new string.
For example: once you format "{something} {otherthing}" with values that contain a space and get the resulting string, you cannot tell whether a word next to a space was part of {something} or {otherthing}.
However you may use some hacks if you know about the format of the new string and there is consistency in the result.
For example, in your given example: if you are sure that you'll have word followed by space, then a number, then again a space and then a word, then you may use below regex to extract the values:
>>> import re
>>> my_str = 'Alice 10 F'
>>> re.findall(r'(\w+)\s(\d+)\s(\w+)', my_str)
[('Alice', '10', 'F')]
In order to get the desired dict from this, you may update the logic as:
>>> my_keys = ['name', 'age', 'gender']
>>> dict(zip(my_keys, re.findall(r'(\w+)\s(\d+)\s(\w+)', my_str)[0]))
{'gender': 'F', 'age': '10', 'name': 'Alice'}

I suggest another approach to this problem using **kwargs, such as...
def fun(**kwargs):
    result = '{'
    for key, value in kwargs.iteritems():
        result += '{}:{} '.format(key, value)
    # stripping the last space
    result = result[:-1]
    result += '}'
    return result
print fun(name='Alice', age='10', gender='F')
# outputs : {gender:F age:10 name:Alice}
NOTE: kwargs only preserves the order of the keyword arguments from Python 3.6 onward; in earlier versions it is a plain, unordered dict. If order is something you wish to keep there, it is easy to build a work-around.

This code produces strings for all the values, but it does split the string into its constituent components. It depends on the delimiter being a space, and none of the values containing a space. If any of the values contains a space this becomes a much harder problem.
>>> delimiters = ' '
>>> d = {k: v for k,v in zip(('name', 'age', 'gender'), 'Alice 10 F'.split(delimiters))}
>>> d
{'name': 'Alice', 'age': '10', 'gender': 'F'}
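If you want the age back as a number rather than a string, a small sketch like the following could be layered on top. The helper name coerce is made up for this example, and it only tries int; anything that does not parse as an integer is left as a string.

```python
def coerce(value):
    """Return int(value) when possible, otherwise the original string."""
    try:
        return int(value)
    except ValueError:
        return value

keys = ('name', 'age', 'gender')
d = {k: coerce(v) for k, v in zip(keys, 'Alice 10 F'.split(' '))}
print(d)
```

Note that this guesses the type from the text alone, so a name like '42' would also become an int.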

For your requirement, I have a solution.
The idea is:
1. Change all delimiters to the same delimiter
2. Split the input strings by that delimiter
3. Get the keys
4. Get the values
5. Zip the keys and values into a dict
import re
from collections import OrderedDict
def Func(data, delimiters, delimiter):
    # change all delimiters to delimiter
    for d in delimiters:
        data[0] = data[0].replace(d, delimiter)
        data[1] = data[1].replace(d, delimiter)
    # get keys with '{}'
    keys = data[0].split(delimiter)
    # if string starts with delimiter remove first empty element
    if keys[0] == '':
        keys = keys[1:]
    # get keys without '{}'
    p = re.compile(r'{([\w\d_]+):*.*}')
    keys = [p.match(x).group(1) for x in keys]
    # get values
    vals = data[1].split(delimiter)
    # if string starts with delimiter remove first empty element
    if vals[0] == '':
        vals = vals[1:]
    # pack to a dict
    result_1 = dict(zip(keys, vals))
    # if you need Ordered Dict
    result_2 = OrderedDict(zip(keys, vals))
    return result_1, result_2
The usage:
In_1 = ['{k1}+{k2:}={k3:}', '1+2=3']
delimiters_1 = ['+', '=']
result = Func(In_1, delimiters_1, delimiters_1[0])
# Out_1 = {'k1':1,'k2':2, 'k3':3}
print(result)
In_2 = ['Hi, {k1:}, this is {k2:}', 'Hi, Alice, this is Bob']
delimiters_2 = ['Hi, ', ', this is ']
result = Func(In_2, delimiters_2, delimiters_2[0])
# Out_2 = {'k1':'Alice', 'k2':'Bob'}
print(result)
The output:
({'k3': '3', 'k2': '2', 'k1': '1'},
OrderedDict([('k1', '1'), ('k2', '2'), ('k3', '3')]))
({'k2': 'Bob', 'k1': 'Alice'},
OrderedDict([('k1', 'Alice'), ('k2', 'Bob')]))

Try this:
import re
def fun():
    k = 'Alice 10 F'
    c = '{name:} {age:} {gender}'
    # strip the braces and colons so only the key names remain
    l = re.sub('[:}{]', '', c)
    d={}
    for i,j in zip(k.split(), l.split()):
        d[j]=i
    print(d)
You can turn the two hard-coded strings into parameters of fun as you wish. It accepts the same strings you want to give and produces a dict like this:
{'name': 'Alice', 'age': '10', 'gender': 'F'}

I think the only right answer is that, what you are searching for isn't really possible generally after all. You just don't have enough information. A good example is:
#python 3
a="12"
b="34"
c="56"
string=f"{a}{b}{c}"
dic = fun("{a}{b}{c}",string)
Now dic might be {"a":"12","b":"34","c":"56"} but it might as well be {"a":"1","b":"2","c":"3456"}. So any universal reversed format function would ultimately fail because of this ambiguity. You could obviously force a delimiter between each variable, but that would defeat the purpose of the function.
I know this was already stated in the comments, but it should also be added as an answer for future visitors.
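To make the ambiguity concrete, here is a small sketch that enumerates every way the string "123456" could be cut into three non-empty values. The function name all_splits is made up for this illustration.

```python
from itertools import combinations

def all_splits(text, n_fields):
    """List every way to cut text into n_fields non-empty pieces."""
    results = []
    # choose n_fields - 1 cut positions strictly inside the string
    for cuts in combinations(range(1, len(text)), n_fields - 1):
        bounds = (0,) + cuts + (len(text),)
        results.append([text[a:b] for a, b in zip(bounds, bounds[1:])])
    return results

candidates = all_splits("123456", 3)
print(len(candidates))
for pieces in candidates:
    print(dict(zip("abc", pieces)))
```

Every one of these candidates formats back to the same string, which is why the inverse cannot be unique without delimiters.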

Related

Define a function to generate a dictionary

def create_dictionary(params)
How do I generate a dictionary from the string passed in through the parameter named params?
The dictionary should be created with the zip function, which pairs up the keys and values extracted from params.
How do I make the output of this:
print(create_dictionary('name:= Jack ; grade:=3'))
To be like this:
{'name': 'Jack', 'grade': '3'}
def create_dictionary(string1):
    s = string1.split(';')
    dic = {}
    for i in s:
        key, value = i.split(':=')
        dic[key] = value
    return dic
If the string's syntax is always the same, you can split the string accordingly. In this case:
First split on ; to separate the dictionary entries.
Then split each entry on := to get the key and the value. You may want to trim the resulting key and value to remove the surrounding white space.
Note: If data itself contains the ; or :=, this solution will fail.
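One of the two failure cases can be softened: as a rough sketch, limiting the second split to the first := lets a value contain := itself. The name create_dictionary_safe is made up here, and a ; inside a value would still break it.

```python
def create_dictionary_safe(text):
    """Split on ';' first, then only on the first ':=' of each entry."""
    result = {}
    for item in text.split(';'):
        # maxsplit=1 keeps any later ':=' as part of the value
        key, value = item.split(':=', 1)
        result[key.strip()] = value.strip()
    return result

print(create_dictionary_safe('name:= Jack ; grade:=3'))
print(create_dictionary_safe('expr:= a:=b ; grade:=3'))
```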
Edit
In addition to the Daniel Paul's answer for removing the white spaces:
def create_dictionary(string1):
    s = string1.split(';')
    dic = {}
    for i in s:
        key, value = i.split(':=')
        dic[key.strip()] = value.strip()
    return dic
Before: {'name': ' Jack ', ' grade': '3'} after {'name': 'Jack', 'grade': '3'}
One-liner using python's dict comprehension:
def create_dictionary(string):
    return {a.strip():b.strip() for a,b in [i.split(':=') for i in string.split(';')]}
You can use the split method and split on :=, or you can follow my code:
def dict_fun(*args):
    dict = {}
    cnt = 0
    count = 0
    while cnt < len(args):
        # count characters until the first ':' to find the end of the key
        for i in args[cnt]:
            if i == ':':
                dict[args[cnt][0:count]] = args[cnt].strip().replace(' ','')[count+2:]
                break
            else:
                count += 1
        cnt += 1
        count = 0
    return(dict)
print(dict_fun('name:=Jack', 'grade:=3'))
OUTPUT
{'name': 'Jack', 'grade': '3'}
Now you can pass as many arguments as you want to dict_fun('name:=jack','grade:=3', ...): instead of taking a single string, the function takes multiple string arguments and returns a dict built from them.

Values in python dictionary getting converted to double quotes rather than single quotes

I have a dictionary with the below values :-
test_dict = {'a': ['a1', 'a2'], 'b': ['1.1.1.1:1111', '2.2.2.2:2222', '3.3.3.3:3333,4.4.4.4:4444', '5.5.5.5:5555']}
I need to replace the comma (,) between 3.3.3.3:3333 and 4.4.4.4:4444 with (',) which is (single quote comma space) like that of the others.
I tried the code below but the output is coming with double quotes (")
val = ','
valnew = '\', \'' # using escape characters - all are single quotes
for k, v in test_dict.items():
    for i, s in enumerate(v):
        if val in s:
            v[i] = s.replace(val, valnew)
print(test_dict)
Output:
{'a': ['a1', 'a2'], 'b': ['1.1.1.1:1111', '2.2.2.2:2222', "3.3.3.3:3333', '4.4.4.4:4444", '5.5.5.5:5555']}
Expected Output:
{'a': ['a1', 'a2'], 'b': ['1.1.1.1:1111', '2.2.2.2:2222', '3.3.3.3:3333', '4.4.4.4:4444', '5.5.5.5:5555']}
Please suggest.
print is displaying a representation of the dict, as if print(repr(test_dict)) was called.
[repr returns] a string containing a printable representation of an object. For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval() ..
Since the value is a string which contains a ' it is using a " instead during the representation of the string. Example:
print(repr("helloworld")) # -> 'helloworld'
print(repr("hello'world")) # -> "hello'world"
This representation should generally only be used for diagnostic purposes. If needing to write this special format, the dict has to be walked and the values printed explicitly "per requirements".
If wishing for a reliable output/encoding with well-defined serialization rules, use a common format like JSON, XML, YAML, etc..
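As a minimal sketch of that suggestion, the json module from the standard library writes the same data with well-defined quoting (always double quotes) and reads it back unchanged:

```python
import json

test_dict = {'a': ['a1', 'a2'],
             'b': ['1.1.1.1:1111', '2.2.2.2:2222', '3.3.3.3:3333', '4.4.4.4:4444', '5.5.5.5:5555']}

encoded = json.dumps(test_dict)   # a str with JSON quoting rules
decoded = json.loads(encoded)     # back to the same Python structure

print(encoded)
print(decoded == test_dict)
```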
You're confusing data with representation. The single quotes, space, and comma ', ' are part of the representation of strings inside a list, not the string itself.
What you're actually trying to do is split a string on a comma, e.g.
>>> '3,4'.split(',')
['3', '4']
You can do this within a list by splitting and flattening, like this:
[s1 for s0 in v for s1 in s0.split(',')]
So:
>>> b = ['1', '2', '3,4', '5'] # Using simpler data for example
>>> b = [s1 for s0 in b for s1 in s0.split(',')]
>>> print(b)
['1', '2', '3', '4', '5']
'3.3.3.3:3333,4.4.4.4:4444' is a single string and the outer quote marks are just python's way of showing that. The same thing for "3.3.3.3:3333', '4.4.4.4:4444" - it is a single string. The outer double quotes are just python's way of showing you the string. The internal single quotes and comma are literally those characters in the string.
Your problem seems to be that some values in the list have been merged. Likely the problem is whatever wrote the list in the first place. We can fix it by splitting the strings and extending the list. List items that don't have embedded commas split to a single item list so extend into our new list as a single item. No change. But items with a comma split into a 2 item list and extend the new list by 2.
test_dict = {'a': ['a1', 'a2'], 'b': ['1.1.1.1:1111', '2.2.2.2:2222', '3.3.3.3:3333,4.4.4.4:4444', '5.5.5.5:5555']}
def list_expander(alist):
    """Return list where values with a comma are expanded"""
    new_list = []
    for value in alist:
        new_list.extend(value.split(","))
    return new_list
new_dict = {key:list_expander(val) for key, val in test_dict.items()}
print(new_dict)
The result is
{'a': ['a1', 'a2'], 'b': ['1.1.1.1:1111', '2.2.2.2:2222', '3.3.3.3:3333', '4.4.4.4:4444', '5.5.5.5:5555']}
Try something like this:
test_dict["b"] = ",".join(test_dict["b"]).split(",")
Updated:
import re
# do this once for the entire list
do_joinsplit_regex = re.compile(
    r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,4}"
)
# sample_list is assumed to be a list of dicts shaped like test_dict
for d in sample_list:
    for k,v in d.items():
        if not isinstance(v, list) or len(v) < 1:
            continue
        d[k] = ",".join(v).split(",")

Splitting a string with numbers and letters [duplicate]

I'd like to split strings like these
'foofo21'
'bar432'
'foobar12345'
into
['foofo', '21']
['bar', '432']
['foobar', '12345']
Does somebody know an easy and simple way to do this in python?
I would approach this by using re.match in the following way:
import re
match = re.match(r"([a-z]+)([0-9]+)", 'foofo21', re.I)
if match:
    items = match.groups()
print(items)
>> ("foofo", "21")
def mysplit(s):
    head = s.rstrip('0123456789')
    tail = s[len(head):]
    return head, tail
>>> [mysplit(s) for s in ['foofo21', 'bar432', 'foobar12345']]
[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]
Yet Another Option:
>>> [re.split(r'(\d+)', s) for s in ('foofo21', 'bar432', 'foobar12345')]
[['foofo', '21', ''], ['bar', '432', ''], ['foobar', '12345', '']]
>>> r = re.compile("([a-zA-Z]+)([0-9]+)")
>>> m = r.match("foobar12345")
>>> m.group(1)
'foobar'
>>> m.group(2)
'12345'
So, if you have a list of strings with that format:
import re
r = re.compile("([a-zA-Z]+)([0-9]+)")
strings = ['foofo21', 'bar432', 'foobar12345']
print [r.match(string).groups() for string in strings]
Output:
[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]
I'm always the one to bring up findall() =)
>>> strings = ['foofo21', 'bar432', 'foobar12345']
>>> [re.findall(r'(\w+?)(\d+)', s)[0] for s in strings]
[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]
Note that I'm using a simpler (less to type) regex than most of the previous answers.
Here is a simple function to separate multiple words and numbers from a string of any length; the re method only separates the first two words and numbers. I think this will help everyone else in the future.
def seperate_string_number(string):
    previous_character = string[0]
    groups = []
    newword = string[0]
    for x, i in enumerate(string[1:]):
        if i.isalpha() and previous_character.isalpha():
            newword += i
        elif i.isnumeric() and previous_character.isnumeric():
            newword += i
        else:
            groups.append(newword)
            newword = i
        previous_character = i
        if x == len(string) - 2:
            groups.append(newword)
            newword = ''
    return groups
print(seperate_string_number('10in20ft10400bg'))
# outputs : ['10', 'in', '20', 'ft', '10400', 'bg']
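For comparison, a regex can also handle any number of alternating runs if you use findall with an alternation instead of a single match. This is only a sketch; the name split_runs is made up, and it assumes you only care about ASCII letters and digits (anything else is silently skipped).

```python
import re

def split_runs(text):
    """Return every maximal run of digits or of ASCII letters, in order."""
    return re.findall(r'\d+|[A-Za-z]+', text)

print(split_runs('10in20ft10400bg'))
print(split_runs('foofo21'))
```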
import re
s = raw_input()
m = re.match(r"([a-zA-Z]+)([0-9]+)",s)
print m.group(0)
print m.group(1)
print m.group(2)
Without using regex, using the isdigit() built-in method; this only works if the starting part is text and the latter part is a number:
def text_num_split(item):
    for index, letter in enumerate(item, 0):
        if letter.isdigit():
            return [item[:index],item[index:]]
print(text_num_split("foobar12345"))
OUTPUT :
['foobar', '12345']
This is a little longer, but more versatile for cases where there are multiple, randomly placed, numbers in the string. Also, it requires no imports.
def getNumbers( input ):
    # Collect Info
    compile = ""
    complete = []
    for letter in input:
        # If compiled string
        if compile:
            # If compiled and letter are same type, append letter
            if compile.isdigit() == letter.isdigit():
                compile += letter
            # If compiled and letter are different types, append compiled string, and begin with letter
            else:
                complete.append( compile )
                compile = letter
        # If no compiled string, begin with letter
        else:
            compile = letter
    # Append leftover compiled string
    if compile:
        complete.append( compile )
    # Return numbers only
    numbers = [ word for word in complete if word.isdigit() ]
    return numbers
Here is simple solution for that problem, no need for regex:
user = input('Input: ') # user = 'foobar12345'
int_list, str_list = [], []
for item in user:
    try:
        item = int(item) # searching for integers in your string
    except:
        str_list.append(item)
        string = ''.join(str_list)
    else: # if there are integers i will add it to int_list but as str, because join function only can work with str
        int_list.append(str(item))
        integer = int(''.join(int_list)) # if you want it to be string just do z = ''.join(int_list)
final = [string, integer] # you can also add it to dictionary d = {string: integer}
print(final)
In addition to the answer of @Evan:
If the incoming string follows the pattern 21foofo, then the re.match pattern would look like this.
import re
match = re.match(r"([0-9]+)([a-z]+)", '21foofo', re.I)
if match:
    items = match.groups()
print(items)
>> ("21", "foofo")
Otherwise the pattern does not match, items is never assigned, and you'll get an UnboundLocalError: local variable 'items' referenced before assignment.

Split list elements to key/val dictionary

I have this:
query='id=10&q=7&fly=none'
and I want to split it to create a dictionary like this:
d = { 'id':'10', 'q':'7', 'fly':'none'}
How can I do it with little code?
By splitting twice, once on '&' and then on '=' for every element resulting from the first split:
query='id=10&q=7&fly=none'
d = dict(i.split('=') for i in query.split('&'))
Now, d looks like:
{'fly': 'none', 'id': '10', 'q': '7'}
In your case, the more convenient way would be to use the urllib.parse module:
import urllib.parse as urlparse
query = 'id=10&q=7&fly=none'
d = {k:v[0] for k,v in urlparse.parse_qs(query).items()}
print(d)
The output:
{'id': '10', 'q': '7', 'fly': 'none'}
Note that the urlparse.parse_qs() function is more useful if there are multiple values for the same key in a query string. Here is an example:
query = 'id=10&q=7&fly=none&q=some_identifier&fly=flying_away'
d = urlparse.parse_qs(query)
print(d)
The output:
{'q': ['7', 'some_identifier'], 'id': ['10'], 'fly': ['none', 'flying_away']}
https://docs.python.org/3/library/urllib.parse.html#urllib.parse.parse_qs
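If every key is known to appear only once, a shorter sketch is to use parse_qsl from the same module, which returns a list of (key, value) pairs that dict() accepts directly. Unlike a plain split, it also decodes percent-escapes and + signs.

```python
from urllib.parse import parse_qsl

query = 'id=10&q=7&fly=none'
d = dict(parse_qsl(query))
print(d)

# percent-encoded values are decoded along the way
print(dict(parse_qsl('name=Alice%20Smith&city=New+York')))
```

With repeated keys, dict() keeps only the last value, so parse_qs remains the better fit for that case.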
This is what I came up with:
dict_query = {}
query='id=10&q=7&fly=none'
query_list = query.split("&")
for i in query_list:
    query_item = i.split("=")
    dict_query.update({query_item[0]: query_item[1]})
print(dict_query)
dict_query returns what you want. This code works by splitting the query up into the different parts, and then for each of the new parts, it splits it by the =. It then updates the dict_query with each new value. Hope this helps!

Python parse issue

I need to do a sort of reverse .format() to a string like
a = "01AA12345AB12345AABBCCDDEE".reverseformat({id:2d}{type:2s}{a:3d}{b:4s}{c:5d}{d:2s})
print a
>>>> {'id':1, 'type':'aa', 'a':'123', 'b':'45AB', 'c':'12345', 'd':'AA'}
I found this lib that does almost what I need; the problem is that it gives me this result
msg = parse.parse("{id:2d}{type:3S}{n:5S}", "01D1dddffffffff")
print msg.named
>>>>{'type': 'D1dddfffffff', 'id': 1, 'n': 'f'}
and not
{'id':1, 'type':'D1d', 'n':'ddfffff'}
Does another lib/method/whatever exist that can "unpack" a string to a dict?
EDIT: Just to clarify, I already tried the w and D format specifications for strings.
Is there any reason you can't just slice it like a normal string if your format is always the same?
s = "01D1dddffffffff"
id = s[:2]
type = s[2:5]
n = s[5:]
Which gives id, type, and n as:
01
D1d
ddffffffff
And it's trivial to convert this into a dictionary from there if that's your need. If your parsing doesn't need to be dynamic (which it doesn't seem to be from your question in its current state) then it's easy enough to wrap the slicing in a function which will extract all of the values.
This also has the advantage that from the slice it's clear how many characters and what position in the string you're extracting, but in the parse formatter the positions are all relative (i.e. finding which characters n extracts means counting how many characters id and type consume).
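Wrapping the slicing in a function could look roughly like this. It is a sketch only: the FIELDS table and the name slice_fields are made up for the example, and a width of None is used here to mean "the rest of the string".

```python
# (name, width, converter); width None takes everything that is left
FIELDS = [("id", 2, int), ("type", 3, str), ("n", None, str)]

def slice_fields(text, fields):
    """Cut text into consecutive fixed-width fields and convert each one."""
    result = {}
    pos = 0
    for name, width, convert in fields:
        end = len(text) if width is None else pos + width
        result[name] = convert(text[pos:end])
        pos = end
    return result

print(slice_fields("01D1dddffffffff", FIELDS))
```

Keeping the widths in one table makes it easy to see which characters each field consumes.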
You can use regular expressions to do what you want here.
import re
a = "01AA12345AB12345AABBCCDDEE"
expr = re.compile(r"""
(?P<id>.{2}) # id:2d
(?P<type>.{2}) # type:2s
(?P<a>.{3}) # a:3d
(?P<b>.{4}) # b:4s
(?P<c>.{5}) # c:5d
(?P<d>.{2}) # d:2s""", re.X)
expr.match(a).groupdict()
# {'id': '01', 'b': '45AB', 'c': '12345', 'd': 'AA', 'a': '123', 'type': 'AA'}
You could even make a function that does this.
def unformat(s, formatting_str):
    typingdict = {'s': str, 'f': float, 'd':int} # are there any more?
    name_to_type = {}
    groups = re.findall(r"{([^}]*)}", formatting_str)
    expr_str = ""
    for group in groups:
        name, formatspec = group.split(":")
        length, type_ = formatspec[:-1], typingdict.get(formatspec[-1], str)
        expr_str += "(?P<{name}>.{{{length}}})".format(name=name, length=length)
        name_to_type[name] = type_
    g = re.match(expr_str, s).groupdict()
    for k,v in g.items():
        g[k] = name_to_type[k](v)
    return g
Then calling like...
>>> a
'01AA12345AB12345AABBCCDDEE'
>>> result = unformat(a, "{id:2d}{type:2s}{a:3d}{b:4s}{c:5d}{d:2s}")
>>> result
{'id': 1, 'b': '45AB', 'c': 12345, 'd': 'AA', 'a': 123, 'type': 'AA'}
However I hope you can see how incredibly ugly this is. Don't do this -- just use string slicing.
