Python parse issue - python

I need to do a sort of reverse .format() on a string, like
a = "01AA12345AB12345AABBCCDDEE".reverseformat("{id:2d}{type:2s}{a:3d}{b:4s}{c:5d}{d:2s}")
print a
>>>> {'id':1, 'type':'AA', 'a':123, 'b':'45AB', 'c':12345, 'd':'AA'}
I found this lib (parse) that does almost what I need; the problem is that it gives me this result
msg = parse.parse("{id:2d}{type:3S}{n:5S}", "01D1dddffffffff")
print msg.named
>>>>{'type': 'D1dddfffffff', 'id': 1, 'n': 'f'}
and not
{'id':1, 'type':'D1d', 'n':'ddfffff'}
Does another lib/method/whatever exist that can "unpack" a string into a dict?
EDIT: Just to clarify, I already tried the w and D format specifications for strings.

Is there any reason you can't just slice it like a normal string if your format is always the same?
s = "01D1dddffffffff"
id = s[:2]
type = s[2:5]
n = s[5:]
Which gives id, type, and n as:
01
D1d
ddffffffff
And it's trivial to convert this into a dictionary from there if that's your need. If your parsing doesn't need to be dynamic (which it doesn't seem to be from your question in its current state), then it's easy enough to wrap the slicing in a function that extracts all of the values.
This also has the advantage that from the slice it's clear how many characters and what position in the string you're extracting, but in the parse formatter the positions are all relative (i.e. finding which characters n extracts means counting how many characters id and type consume).
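As a minimal sketch of wrapping the slicing in a function: the field table below (names, positions, converters) and the helper name unpack_fixed are assumptions made for illustration, taken from the layout in the question rather than from any library.

```python
# Hypothetical field layout: (name, start, stop, converter).
# stop=None means "slice to the end of the string".
FIELDS = [
    ("id", 0, 2, int),
    ("type", 2, 5, str),
    ("n", 5, None, str),
]


def unpack_fixed(s, fields=FIELDS):
    """Slice s at fixed positions and convert each piece to its type."""
    return {name: conv(s[start:stop]) for name, start, stop, conv in fields}


print(unpack_fixed("01D1dddffffffff"))
# {'id': 1, 'type': 'D1d', 'n': 'ddffffffff'}
```

Keeping the positions in one table means the layout can change without touching the slicing logic.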

You can use regular expressions to do what you want here.
import re
a = "01AA12345AB12345AABBCCDDEE"
expr = re.compile(r"""
(?P<id>.{2}) # id:2d
(?P<type>.{2}) # type:2s
(?P<a>.{3}) # a:3d
(?P<b>.{4}) # b:4s
(?P<c>.{5}) # c:5d
(?P<d>.{2}) # d:2s""", re.X)
expr.match(a).groupdict()
# {'id': '01', 'b': '45AB', 'c': '12345', 'd': 'AA', 'a': '123', 'type': 'AA'}
You could even make a function that does this.
def unformat(s, formatting_str):
    typingdict = {'s': str, 'f': float, 'd':int} # are there any more?
    name_to_type = {}
    groups = re.findall(r"{([^}]*)}", formatting_str)
    expr_str = ""
    for group in groups:
        name, formatspec = group.split(":")
        length, type_ = formatspec[:-1], typingdict.get(formatspec[-1], str)
        expr_str += "(?P<{name}>.{{{length}}})".format(name=name, length=length)
        name_to_type[name] = type_
    g = re.match(expr_str, s).groupdict()
    for k,v in g.items():
        g[k] = name_to_type[k](v)
    return g
Then calling like...
>>> a
'01AA12345AB12345AABBCCDDEE'
>>> result = unformat(a, "{id:2d}{type:2s}{a:3d}{b:4s}{c:5d}{d:2s}")
>>> result
{'id': 1, 'b': '45AB', 'c': 12345, 'd': 'AA', 'a': 123, 'type': 'AA'}
However I hope you can see how incredibly ugly this is. Don't do this -- just use string slicing.

Related

Split string from two pattern based on regex Python

Given two file paths
Z:\home\user\dfolder\NO,AG,GK.jpg
Z:\home\user\dfolder\NI,DG,BJ (1).jpg
The objective is to split each file name and store the parts in a dict.
Currently, I first split each path using os.path.split to get the list s
s=['NO,AG,GK.jpg','NI,DG,BJ (1).jpg']
and then iteratively split each string as below
all_dic=[]
for ds in s:
    k=ds.split(",")
    kk=k[-1].split('.jpg')[0].split("(")[0] if bool(re.search(r'\(\d+\)', ds)) else k[-1].split('.jpg')[0]
    nval={"f":k[0],"s":k[1],"t":kk}
    all_dic.append(nval)
But I am curious about a regex approach, or any one-liner.
A one-liner using a regex and a dict comprehension:
import re
s = ['NO,AG,GK.jpg', 'NI,DG,BJ (1).jpg']
keys = ['f', 's', 't']
all_dic = [{keys[k]: x for k, x in enumerate(
    re.sub(r"(\s\(\d+\))?(\.jpg)?", "", item).split(','))} for item in s]
print(all_dic)
->
[{'f': 'NO', 's': 'AG', 't': 'GK'}, {'f': 'NI', 's': 'DG', 't': 'BJ'}]
Well, I think this is the easiest way to get the same output without using the split() function.
The regular expression takes only the letters and puts them in a list, so we don't even have to split the string or remove the (1) from it.
import re
s=['NO,AG,GK.jpg','NI,DG,BJ (1).jpg']
all_dic = []
for ds in s:
    regex = '[a-zA-Z]+'
    k = re.findall(regex,ds) # We extract all the matches (as a list)
    nval={'f':k[0],'s':k[1],'t':k[2]} # We create the dictionary
    all_dic.append(nval) # We append the dictionary to the list
print(all_dic)
# Output: [{'f': 'NO', 's': 'AG', 't': 'GK'}, {'f': 'NI', 's': 'DG', 't': 'BJ'}]
Also, you have the file extension in k[3], just in case you need it.
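Another way to get a regex approach is to let named groups build the dictionary directly. This is only a sketch: the pattern below assumes every name has exactly three comma-separated parts, an optional " (N)" copy marker, and a .jpg extension, as in the two sample names.

```python
import re

# Three comma-separated fields, an optional " (N)" marker, then ".jpg".
# The last field is lazy so it stops before the optional marker.
PATTERN = re.compile(
    r"(?P<f>[^,]+),(?P<s>[^,]+),(?P<t>[^,]+?)(?:\s*\(\d+\))?\.jpg"
)

names = ['NO,AG,GK.jpg', 'NI,DG,BJ (1).jpg']

# Names that do not fit the assumed pattern are skipped rather than raising.
all_dic = [m.groupdict() for m in map(PATTERN.fullmatch, names) if m]
print(all_dic)
# [{'f': 'NO', 's': 'AG', 't': 'GK'}, {'f': 'NI', 's': 'DG', 't': 'BJ'}]
```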

Values in python dictionary getting converted to double quotes rather than single quotes

I have a dictionary with the below values :-
test_dict = {'a': ['a1', 'a2'], 'b': ['1.1.1.1:1111', '2.2.2.2:2222', '3.3.3.3:3333,4.4.4.4:4444', '5.5.5.5:5555']}
I need to replace the comma (,) between 3.3.3.3:3333 and 4.4.4.4:4444 with (',) which is (single quote comma space) like that of the others.
I tried the code below but the output is coming with double quotes (")
val = ','
valnew = '\', \'' # using escape characters - all are single quotes
for k, v in test_dict.items():
    for i, s in enumerate(v):
        if val in s:
            v[i] = s.replace(val, valnew)
print(test_dict)
Output:
{'a': ['a1', 'a2'], 'b': ['1.1.1.1:1111', '2.2.2.2:2222', "3.3.3.3:3333', '4.4.4.4:4444", '5.5.5.5:5555']}
Expected Output:
{'a': ['a1', 'a2'], 'b': ['1.1.1.1:1111', '2.2.2.2:2222', '3.3.3.3:3333', '4.4.4.4:4444', '5.5.5.5:5555']}
Please suggest.
print is displaying a representation of the dict, as if print(repr(test_dict)) was called.
[repr returns] a string containing a printable representation of an object. For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval() ..
Since the value is a string which contains a ' it is using a " instead during the representation of the string. Example:
print(repr("helloworld")) # -> 'helloworld'
print(repr("hello'world")) # -> "hello'world"
This representation should generally only be used for diagnostic purposes. If needing to write this special format, the dict has to be walked and the values printed explicitly "per requirements".
If you want a reliable output/encoding with well-defined serialization rules, use a common format like JSON, XML, or YAML.
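To illustrate the JSON suggestion, here is a small sketch using the standard json module. The sample value is the merged string from the question's output; JSON always quotes strings with double quotes, so an embedded single quote does not change the quoting style.

```python
import json

# The string below contains literal single quotes, as in the question's output.
test_dict = {'a': ['a1', 'a2'], 'b': ["3.3.3.3:3333', '4.4.4.4:4444"]}

encoded = json.dumps(test_dict)
print(encoded)
# {"a": ["a1", "a2"], "b": ["3.3.3.3:3333', '4.4.4.4:4444"]}

# Decoding gives back an equal dictionary, so the data itself is unchanged.
decoded = json.loads(encoded)
print(decoded == test_dict)  # True
```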
You're confusing data with representation. The single quotes, space, and comma ', ' are part of the representation of strings inside a list, not the string itself.
What you're actually trying to do is split a string on a comma, e.g.
>>> '3,4'.split(',')
['3', '4']
You can do this within a list by splitting and flattening, like this:
[s1 for s0 in v for s1 in s0.split(',')]
So:
>>> b = ['1', '2', '3,4', '5'] # Using simpler data for example
>>> b = [s1 for s0 in b for s1 in s0.split(',')]
>>> print(b)
['1', '2', '3', '4', '5']
'3.3.3.3:3333,4.4.4.4:4444' is a single string and the outer quote marks are just python's way of showing that. The same thing for "3.3.3.3:3333', '4.4.4.4:4444" - it is a single string. The outer double quotes are just python's way of showing you the string. The internal single quotes and comma are literally those characters in the string.
Your problem seems to be that some values in the list have been merged. Likely the problem is whatever wrote the list in the first place. We can fix it by splitting the strings and extending a new list. An item without an embedded comma splits into a one-item list, so it extends the new list by that single item, unchanged. An item with a comma splits into a two-item list and extends the new list by two.
test_dict = {'a': ['a1', 'a2'], 'b': ['1.1.1.1:1111', '2.2.2.2:2222', '3.3.3.3:3333,4.4.4.4:4444', '5.5.5.5:5555']}
def list_expander(alist):
    """Return list where values with a comma are expanded"""
    new_list = []
    for value in alist:
        new_list.extend(value.split(","))
    return new_list
new_dict = {key:list_expander(val) for key, val in test_dict.items()}
print(new_dict)
The result is
{'a': ['a1', 'a2'], 'b': ['1.1.1.1:1111', '2.2.2.2:2222', '3.3.3.3:3333', '4.4.4.4:4444', '5.5.5.5:5555']}
Try something like this:
test_dict["b"] = ",".join(test_dict["b"]).split(",")
Updated (sample_list is assumed here to be a list of dictionaries shaped like test_dict):
import re
# compiled once for the entire list; note that the loop below never uses it
do_joinsplit_regex = re.compile(
    r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,4}"
)
for d in sample_list:
    for k,v in d.items():
        if not isinstance(v, list) or len(v) < 1:
            continue
        d[k] = ",".join(v).split(",")

Input string. Associate index to each character in string for a dictionary

I am asking the user to enter a string. I am ultimately trying to pass the string to a dictionary, where the index of each character is associated with each character in the string.
Ex: Input = CSC120
What I have done so far is entered a string and passed it to a set. The issue is that when I pass it to a set, it passes in : {'1', '2', 'C', '0', 'S'}. It is out of order. I was thinking I would be able to correlate the string to an index once it was passed to the set, but it is out of order and does not duplicate the 'C'.
The plan was to have 2 sets and link them in a dictionary. I am stuck at trying to get the string to be correctly passed to the set.
d = {}
set1 = set()
string1 = input("Enter a string:").upper()
for i in string1:
    set1.add(i)
print(set1)
Ultimately the results I am trying to achieve is:
d = { 0:'C', 1:'S', 2:'C', 3:'1', 4:'2', 5:'0'}
It can be done with a dictionary display (aka comprehension):
Input = 'CSC120'
d = {i: c for i, c in enumerate(Input)}
print(d) # -> {0: 'C', 1: 'S', 2: 'C', 3: '1', 4: '2', 5: '0'}
However, it can be done with even less code (and likely more quickly) by passing the dict constructor an enumeration of the characters in the string (as helpfully pointed out by @coldspeed in a comment):
d = dict(enumerate(Input))
Here's the documentation for the built-in enumerate() function.
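A short sketch of why the set-based attempt loses information while the enumerate-based dictionary does not. The variable names are chosen for illustration only.

```python
text = "CSC120"

# A set keeps one copy of each character and has no defined order,
# so the second 'C' and the original positions are both lost.
unique_chars = set(text)
print(len(text), len(unique_chars))  # 6 5

# enumerate() pairs each character with its position, so nothing is lost.
index_to_char = dict(enumerate(text))
print(index_to_char[0], index_to_char[2])  # C C

# The original string can be rebuilt from the dictionary.
rebuilt = "".join(index_to_char[i] for i in range(len(index_to_char)))
print(rebuilt)  # CSC120
```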

Inverse of string format in python

In python, we can use str.format to construct string like this:
string_format + value_of_keys = formatted_string
Eg:
FMT = '{name:} {age:} {gender}' # string_format
VoK = {'name':'Alice', 'age':10, 'gender':'F'} # value_of_keys
FoS = FMT.format(**VoK) # formatted_string
In this case, formatted_string = 'Alice 10 F'
I am just wondering whether there is a way to get value_of_keys from formatted_string and string_format. It should be a function Fun with
VoK = Fun('{name:} {age:} {gender}', 'Alice 10 F')
# the value of Vok is expected as {'name':'Alice', 'age':10, 'gender':'F'}
Is there any way to get this function Fun?
ADDED :
I would like to say that '{name:} {age:} {gender}' and 'Alice 10 F' are just the simplest example. A realistic situation could be more difficult; the space delimiter may not exist.
And mathematically speaking, most of the cases are not reversible, such as:
FMT = '{key1:}{key2:}'
FoS = 'HelloWorld'
The VoK could be any one in below:
{'key1':'Hello','key2':'World'}
{'key1':'Hell','key2':'oWorld'}
....
So to make this question well defined, I would like to add two conditions:
1. There are always delimiters between two keys
2. No delimiter appears inside any value in value_of_keys.
In this case, this question is solvable (Mathematically speaking) :)
Another example shown with input and expected output:
In '{k1:}+{k2:}={k3:}', '1+2=3' Out {'k1':1,'k2':2, 'k3':3}
In 'Hi, {k1:}, this is {k2:}', 'Hi, Alice, this is Bob' Out {'k1':'Alice', 'k2':'Bob'}
You can indeed do this, but with a slightly different format string, called regular expressions.
Here is how you do it:
import re
# this is how you write your "format"
regex = r"(?P<name>\w+) (?P<age>\d+) (?P<gender>[MF])"
test_str = "Alice 10 F"
groups = re.match(regex, test_str)
Now you can use groups to access all the components of the string:
>>> groups.group('name')
'Alice'
>>> groups.group('age')
'10'
>>> groups.group('gender')
'F'
Regex is a very cool thing. I suggest you learn more about it online.
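To get the dictionary the question asks for, the match object's groupdict() method can be used. The sketch below wraps that in a hypothetical helper, parse_person, and converts age to int as an illustrative extra step.

```python
import re

PERSON = re.compile(r"(?P<name>\w+) (?P<age>\d+) (?P<gender>[MF])")


def parse_person(text):
    """Return a dict of the named groups, or None if text does not match."""
    match = PERSON.fullmatch(text)
    if match is None:
        return None
    values = match.groupdict()
    values["age"] = int(values["age"])  # regex groups are always strings
    return values


print(parse_person("Alice 10 F"))
# {'name': 'Alice', 'age': 10, 'gender': 'F'}
```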
I wrote a function and it seems to work:
import re
def Fun(fmt,res):
    reg_keys = '{([^{}:]+)[^{}]*}'
    reg_fmts = '{[^{}:]+[^{}]*}'
    pat_keys = re.compile(reg_keys)
    pat_fmts = re.compile(reg_fmts)
    keys = pat_keys.findall(fmt)
    lmts = pat_fmts.split(fmt)
    temp = res
    values = []
    for lmt in lmts:
        if not len(lmt)==0:
            value,temp = temp.split(lmt,1)
            if len(value)>0:
                values.append(value)
    if len(temp)>0:
        values.append(temp)
    return dict(zip(keys,values))
Usage:
eg1:
fmt = '{k1:}+{k2:}={k:3}'
res = '1+1=2'
print Fun(fmt,res)
>>>{'k2': '1', 'k1': '1', 'k': '2'}
eg2:
fmt = '{name:} {age:} {gender}'
res = 'Alice 10 F'
print Fun(fmt,res)
>>>
eg3:
fmt = 'Hi, {k1:}, this is {k2:}'
res = 'Hi, Alice, this is Bob'
print Fun(fmt,res)
>>>{'k2': 'Bob', 'k1': 'Alice'}
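A variant of the same idea, sketched with the standard library's string.Formatter().parse to split the format string into literal text and field names, then building a regex from the pieces. The function name reverse_format is hypothetical. The sketch assumes the two conditions from the question (a delimiter between every pair of keys, and no delimiter inside a value) and named fields that are valid identifiers. All values come back as strings.

```python
import re
from string import Formatter


def reverse_format(fmt, text):
    """Invert a simple format string; return a dict of strings or None."""
    pattern = ""
    for literal, field, _spec, _conversion in Formatter().parse(fmt):
        # Literal text between fields acts as the delimiter.
        pattern += re.escape(literal)
        if field is not None:
            # Non-greedy, so a value stops at the next literal delimiter.
            pattern += "(?P<{}>.+?)".format(field)
    match = re.fullmatch(pattern, text)
    return match.groupdict() if match else None


print(reverse_format('{k1:}+{k2:}={k3:}', '1+2=3'))
# {'k1': '1', 'k2': '2', 'k3': '3'}
print(reverse_format('Hi, {k1:}, this is {k2:}', 'Hi, Alice, this is Bob'))
# {'k1': 'Alice', 'k2': 'Bob'}
```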
There is no way for python to determine how you created the formatted string once you get the new string.
For example: once you format "{something} {otherthing}" with values that contain spaces and get the resulting string, you cannot tell whether a word next to a space was part of {something} or {otherthing}.
However you may use some hacks if you know about the format of the new string and there is consistency in the result.
For example, in your given example: if you are sure that you'll have word followed by space, then a number, then again a space and then a word, then you may use below regex to extract the values:
>>> import re
>>> my_str = 'Alice 10 F'
>>> re.findall(r'(\w+)\s(\d+)\s(\w+)', my_str)
[('Alice', '10', 'F')]
In order to get the desired dict from this, you may update the logic as:
>>> my_keys = ['name', 'age', 'gender']
>>> dict(zip(my_keys, re.findall(r'(\w+)\s(\d+)\s(\w+)', my_str)[0]))
{'gender': 'F', 'age': '10', 'name': 'Alice'}
I suggest another approach to this problem using **kwargs, such as...
def fun(**kwargs):
    result = '{'
    for key, value in kwargs.iteritems():
        result += '{}:{} '.format(key, value)
    # stripping the last space
    result = result[:-1]
    result += '}'
    return result
print fun(name='Alice', age='10', gender='F')
# outputs : {gender:F age:10 name:Alice}
NOTE: kwargs preserves the order of the keyword arguments only from Python 3.6 onward (PEP 468); in earlier versions it is a plain, unordered dict. If order is something you wish to keep on older versions, it is easy to build a workaround.
This code produces strings for all the values, but it does split the string into its constituent components. It depends on the delimiter being a space, and none of the values containing a space. If any of the values contains a space this becomes a much harder problem.
>>> delimiters = ' '
>>> d = {k: v for k,v in zip(('name', 'age', 'gender'), 'Alice 10 F'.split(delimiters))}
>>> d
{'name': 'Alice', 'age': '10', 'gender': 'F'}
For your requirement, I have a solution.
The concept is:
1. change all delimiters to the same delimiter
2. split the input string by that delimiter
3. get the keys
4. get the values
5. zip keys and values into a dict
import re
from collections import OrderedDict
def Func(data, delimiters, delimiter):
    # change all delimiters to delimiter
    for d in delimiters:
        data[0] = data[0].replace(d, delimiter)
        data[1] = data[1].replace(d, delimiter)
    # get keys with '{}'
    keys = data[0].split(delimiter)
    # if string starts with delimiter remove first empty element
    if keys[0] == '':
        keys = keys[1:]
    # get keys without '{}'
    p = re.compile(r'{([\w\d_]+):*.*}')
    keys = [p.match(x).group(1) for x in keys]
    # get values
    vals = data[1].split(delimiter)
    # if string starts with delimiter remove first empty element
    if vals[0] == '':
        vals = vals[1:]
    # pack to a dict
    result_1 = dict(zip(keys, vals))
    # if you need Ordered Dict
    result_2 = OrderedDict(zip(keys, vals))
    return result_1, result_2
The usage:
In_1 = ['{k1}+{k2:}={k3:}', '1+2=3']
delimiters_1 = ['+', '=']
result = Func(In_1, delimiters_1, delimiters_1[0])
# Out_1 = {'k1':1,'k2':2, 'k3':3}
print(result)
In_2 = ['Hi, {k1:}, this is {k2:}', 'Hi, Alice, this is Bob']
delimiters_2 = ['Hi, ', ', this is ']
result = Func(In_2, delimiters_2, delimiters_2[0])
# Out_2 = {'k1':'Alice', 'k2':'Bob'}
print(result)
The output:
({'k3': '3', 'k2': '2', 'k1': '1'},
OrderedDict([('k1', '1'), ('k2', '2'), ('k3', '3')]))
({'k2': 'Bob', 'k1': 'Alice'},
OrderedDict([('k1', 'Alice'), ('k2', 'Bob')]))
Try this:
import re
def fun():
    k = 'Alice 10 F'
    c = '{name:} {age:} {gender}'
    l = re.sub('[:}{]', '', c)
    d={}
    for i,j in zip(k.split(), l.split()):
        d[j]=i
    print(d)
You can turn k and c into parameters of fun as you wish. It takes the same format string you want to give and produces a dict like this:
{'name': 'Alice', 'age': '10', 'gender': 'F'}
I think the only right answer is that, what you are searching for isn't really possible generally after all. You just don't have enough information. A good example is:
#python 3
a="12"
b="34"
c="56"
string=f"{a}{b}{c}"
dic = fun("{a}{b}{c}",string)
Now dic might be {"a":"12","b":"34","c":"56"}, but it might just as well be {"a":"1","b":"2","c":"3456"}. So any universal reverse-format function would ultimately fail because of this ambiguity. You could of course force a delimiter between each variable, but that would defeat the purpose of the function.
I know this was already stated in the comments, but it should also be added as an answer for future visitors.
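The ambiguity can be made concrete by enumerating every way the string could have been produced. This is only an illustrative sketch; all_splits is a hypothetical helper, and it assumes every value is non-empty.

```python
from itertools import combinations


def all_splits(text, keys):
    """Yield every way to cut text into len(keys) non-empty pieces."""
    for cuts in combinations(range(1, len(text)), len(keys) - 1):
        bounds = (0,) + cuts + (len(text),)
        yield {k: text[a:b] for k, a, b in zip(keys, bounds, bounds[1:])}


candidates = list(all_splits("123456", ["a", "b", "c"]))
print(len(candidates))  # 10
print({"a": "12", "b": "34", "c": "56"} in candidates)   # True
print({"a": "1", "b": "2", "c": "3456"} in candidates)   # True
```

Each of the ten candidates formats back to the same string, so nothing in the output can single out the original values.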

How can I replace certain characters in a string?

I'm relatively new to Python (3.5.2) and I'd love some help with my assignment on lists and strings.
I am required to write a code that replaces: e and E with 3, a and A with 4, i and I with 1, o and O with 0 in any given string. Here is my attempt:
s = input("Enter a string: ")
leet = {'a':'4','e':'3','i':'1','o':'0','A':'4','E':'3','I':'1','O':'0'}
for character in s:
    if character == leet.keys():
        str.replace(leet.keys(),leet.values())
print(s)
This code does not yield any satisfying results for me, I'm wondering if I can use the str.replace method or is there any easier way of doing this?
Thanks!
You can do that in one line using a generator expression joined into a string with str.join (dict.get defaults to the input character if it is not found in the dictionary):
s = "a string Entered"
leet = {'a':'4','e':'3','i':'1','o':'0','A':'4','E':'3','I':'1','O':'0'}
crypted = "".join(leet.get(k,k) for k in s)
print(crypted)
result:
4 str1ng 3nt3r3d
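The replace() method is fine, but you are using it incorrectly. Remember that leet.keys() returns a view of all the keys in the dictionary, so a single character never compares equal to it. I suggest iterating over the dictionary's items instead: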
s = input("Enter a string: ")
leet = {'a': '4', 'e': '3', 'i': '1', 'o': '0', 'A': '4', 'E': '3', 'I': '1', 'O': '0'}
for k, v in leet.items(): #iterating through dictionary (not string)
    s = s.replace(k, v)
print(s)
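For single-character substitutions like these, the built-in str.maketrans and str.translate are another option. A minimal sketch, using a fixed sample string instead of input() so that it runs unattended:

```python
# Each character in the first string maps to the character at the
# same position in the second string.
leet_table = str.maketrans("aeioAEIO", "43104310")

s = "a string Entered"
print(s.translate(leet_table))  # 4 str1ng 3nt3r3d
```

The table is built once and the string is translated in a single pass, instead of calling replace() once per key.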
