Pythonic way to parse preflib Orders with Ties files - python

I'm working with data from preflib.org, especially with the "Orders with Ties" format. The format looks (somewhat) like this:
1,2,{3,4,5},6
2,{3,6,4},1,5
{2,3},{4,6},{1,5}
...
I need to parse every line of these files into a list of tuples, where every tuple contains one "equivalence class" of choices. In this example:
1,2,{3,4,5},6 -> [(1,), (2,), (3,4,5), (6,)]
2,{3,6,4},1,5 -> [(2,), (3,6,4), (1,), (5,)]
{2,3},{4,6},{1,5} -> [(2,3), (4,6), (1,5)]
Currently this is solved with ugly string manipulations etc. and I am pretty sure there is something more pythonic to solve this (preferably with builtins only).
EDIT: What I do currently (very hacky and ugly ...):
s = "1,2,{3,4,5},6"
classes = []
equiv_cls = None
for token in s.split(","):
    if token.startswith("{"):
        equiv_cls = [token[1:]]
    elif token.endswith("}"):
        equiv_cls.append(token[:-1])
        classes.append(tuple(equiv_cls))
        equiv_cls = None
    elif equiv_cls is not None:
        equiv_cls.append(token)
    else:
        # (token,) keeps multi-digit tokens whole; tuple(token) would split "12" into ("1", "2")
        classes.append((token,))

You can use ast.literal_eval with some str.replace calls:
>>> from ast import literal_eval
>>> s = '1,2,{3,4,5},6'
>>> [x if isinstance(x, tuple) else (x,) for x
...  in literal_eval(s.replace('{', '(').replace('}', ')'))]
[(1,), (2,), (3, 4, 5), (6,)]
As @Martijn Pieters suggested, you can replace the two str.replace calls with a single str.translate call (Python 2 shown; in Python 3, use str.maketrans instead of string.maketrans):
>>> from string import maketrans
>>> table = maketrans('{}', '()')
>>> [x if isinstance(x, tuple) else (x,) for x in literal_eval(s.translate(table))]
[(1,), (2,), (3, 4, 5), (6,)]
In Python 3 you won't need any str.replace or str.translate calls, because literal_eval accepts set literals there. The same expression fails in Python 2.7 because of a related bug:
>>> [tuple(x) if isinstance(x, set) else (x,) for x in literal_eval(s)]
[(1,), (2,), (3, 4, 5), (6,)]
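One caveat with the Python 3 variant: sets are unordered, and a line that consists of a single braced group evaluates to a bare set rather than to a tuple of items. A minimal sketch that sidesteps both, assuming Python 3 and that the order inside a tie class does not matter (parse_line is a hypothetical name, not part of the answer above):

```python
from ast import literal_eval

def parse_line(line):
    # Wrapping in brackets guarantees a list, even for a one-item line such as "{2,3}".
    items = literal_eval("[" + line + "]")
    # sorted() gives each tie class a deterministic order (sets have none).
    return [tuple(sorted(x)) if isinstance(x, set) else (x,) for x in items]

print(parse_line("1,2,{3,4,5},6"))  # [(1,), (2,), (3, 4, 5), (6,)]
print(parse_line("{2,3}"))          # [(2, 3)]
```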

This is a very crude and silly approach, but worth a look:
x = "2,{3,6,4},1,5"
y = x.replace("{", "(")
y = y.replace("}", ")")
y = "[" + y + "]"
j = []
y = eval(y)  # eval executes arbitrary code; only use it on trusted input
for i in y:
    if isinstance(i, int):
        j.append((i,))
    else:
        j.append(i)
print(j)

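Another regex approach:
import ast
import re
def parse_orders_with_ties(s):
    # Wrap every braced group and every bare number in "(...,)"
    s2 = re.sub(r"{([\d,]+)}|(\d+)", r"(\g<0>,)", s)
    # Drop the now redundant braces
    s2 = re.sub(r"[{}]", "", s2)
    v = ast.literal_eval("[" + s2 + "]")
    return v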

To convert this data into the required list, some string manipulation is necessary. Once that is done, the data can be converted to a list using only builtins.
The following function is one possible solution:
def convert(str_data):
    b = str_data.split(',')
    list_data = []
    flag = 0  # 1 while we are inside a braced group
    for each_elem in b:
        if flag == 0:
            next_str = ''
        if '{' in each_elem:
            next_str += each_elem.split('{')[1] + ','
            flag = 1
        elif flag == 1 and '}' not in each_elem:
            next_str += each_elem + ','
        elif flag == 1 and '}' in each_elem:
            next_str += each_elem.split('}')[0]
            list_data.append(next_str)
            flag = 0
        else:
            list_data.append(each_elem)
    return list_data
>>> z = convert("{2,3},{4,6},{1,5}")
>>> z
['2,3', '4,6', '1,5']
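The function above returns each equivalence class as a comma-separated string. To get the tuples of integers the question asks for, one option is to let re.findall do the tokenizing. This is only a sketch: parse_order is a hypothetical name, and it assumes well-formed lines with non-empty, non-nested groups.

```python
import re

def parse_order(line):
    # Each match is a (group, single) pair: exactly one of the two is non-empty.
    # "{3,4,5}" -> ("3,4,5", "") and "6" -> ("", "6").
    tokens = re.findall(r"\{([^}]*)\}|(\d+)", line)
    return [
        tuple(int(n) for n in group.split(",")) if group else (int(single),)
        for group, single in tokens
    ]

print(parse_order("2,{3,6,4},1,5"))  # [(2,), (3, 6, 4), (1,), (5,)]
```

Unlike the literal_eval variants, this keeps the order of the choices inside each group as written in the file.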

Related

A tuple that only accepts some group of letters

I was trying to create a function that only receives tuples whose elements contain only the letters C, B, E and D: arguments like CEE, DDBBB, ECDBE, or CCCCB. The input would be tup = ('CEE', 'DDBBB', 'ECDBE', 'CCCCB'), and other functions that I created should convert the elements into a number that represents a position.
def obter_pin(tup):
    pin=()
    posicao=5
    if not 4<=len(tup)<=10 or 'CBED' not in tup:
        raise ValueError('obter pin: argumento invalido')
    else:
        for ele in tup:
            dig=obter_digito(ele,posicao)
            posicao=dig
            pin+=(dig,)
        return pin
Using set comparison:
>>> allow = {'C', 'B', 'E', 'D'}
>>> tup1 = ('CEE', 'DDBBB', 'ECDBE', 'CCCCB')
>>> tup2 = ('CEE', 'DDBBB', 'ECDBE', 'CCCCBA')
>>> allow >= set().union(*tup1)
True
>>> allow >= set().union(*tup2)
False
You can use this one-line function.
Function definition:
def myfunction(t):
    return all(set(e).issubset('CBED') for e in t)
Function in use:
tup1 = ('CEE', 'DDBBB', 'ECDBE', 'CCCCB')
tup2 = ('CEE', 'DDBBB', 'ECDBE', 'GOO')
tup3 = ('CEE', 'DDBBB', 'ECDBE', 'ALPHA')
print(myfunction(tup1))  # True
print(myfunction(tup2))  # False
print(myfunction(tup3))  # False
Even though I like @Jab's solution much better, I just wanted to add another way of going about it:
invalid = any(s.replace('C', '').replace('B', '').replace('E', '').replace('D', '') for s in tup)
Much better:
invalid = any(s.strip('CBED') for s in tup)
Or:
if ''.join(tup).strip('CBED'):
    ...
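Putting the pieces together, the check from the question (4 to 10 elements, only the letters C, B, E, D) could look roughly like the sketch below. The name is_valid_sequence is hypothetical, and rejecting empty strings is an assumption the question does not state.

```python
ALLOWED = set("CBED")

def is_valid_sequence(tup):
    # The question requires a tuple with between 4 and 10 elements.
    if not isinstance(tup, tuple) or not 4 <= len(tup) <= 10:
        return False
    # Every element must be a non-empty string made only of allowed letters.
    return all(isinstance(e, str) and e != "" and set(e) <= ALLOWED for e in tup)

print(is_valid_sequence(('CEE', 'DDBBB', 'ECDBE', 'CCCCB')))   # True
print(is_valid_sequence(('CEE', 'DDBBB', 'ECDBE', 'CCCCBA')))  # False
```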

Python: how to replace substrings in a string given list of indices

I have a string:
"A XYZ B XYZ C"
and a list of index-tuples:
((2, 5), (8, 11))
I would like to replace each substring defined by a pair of indices with the sum of those indices:
A 7 B 19 C
I can't do it using string replace as it will match both instances of XYZ. Replacing using index information will break on the second and fourth iterations as indices are shifting throughout the process.
Is there a nice solution for the problem?
UPDATE. String is given for example. I don't know its contents a priori nor can I use them in the solution.
My dirty solution is:
text = "A XYZ B XYZ C"
replace_list = ((2, 5), (8, 11))
offset = 0
for rpl in replace_list:
    l = rpl[0] + offset
    r = rpl[1] + offset
    replacement = str(sum(rpl))  # sum the original indices, not the shifted ones
    text = text[0:l] + replacement + text[r:]
    offset += len(replacement) - (r - l)
This relies on the index tuples being in ascending order. Could it be done more nicely?
Imperative and stateful:
s = 'A XYZ B XYZ C'
indices = ((2, 5), (8, 11))
res = []
i = 0
for start, end in indices:
    res.append(s[i:start] + str(start + end))
    i = end
res.append(s[end:])
print(''.join(res))
Result:
A 7 B 19 C
You can use re.sub():
In [17]: s = "A XYZ B XYZ C"
In [18]: ind = ((2, 5), (8, 11))
In [19]: inds = map(sum, ind)
In [20]: re.sub(r'XYZ', lambda _: str(next(inds)), s)
Out[20]: 'A 7 B 19 C'
But note that if the number of matches is larger than the number of index pairs, it will raise a StopIteration error. In that case you can pass a default argument to next() to use as the replacement.
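A small sketch of that default, assuming Python 3 (where map returns an iterator) and choosing to leave surplus matches unchanged by falling back to the matched text itself:

```python
import re

s = "A XYZ B XYZ C XYZ"          # three matches, but only two index pairs
ind = ((2, 5), (8, 11))
sums = map(sum, ind)

# next(sums, default) returns the default once the iterator is exhausted,
# so the third XYZ is replaced by itself instead of raising StopIteration.
result = re.sub(r"XYZ", lambda m: str(next(sums, m.group(0))), s)
print(result)  # A 7 B 19 C XYZ
```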
If you want to use the tuples of indices for finding the sub strings, here is another solution:
In [81]: flat_ind = tuple(i for sub in ind for i in sub)
# Create all the pairs with respect to your intended indices.
In [82]: inds = [(0, ind[0][0]), *zip(flat_ind, flat_ind[1:]), (ind[-1][-1], len(s))]
# Replace the respective slice of the string with the sum of the indices if the pair exists in the intended pairs, otherwise keep the sub-string itself.
In [85]: ''.join([str(i+j) if (i, j) in ind else s[i:j] for i, j in inds])
Out[85]: 'A 7 B 19 C'
One way to do this using itertools.groupby.
from itertools import groupby
indices = ((2, 5), (8, 11))
data = list("A XYZ B XYZ C")
We start by replacing each range of matched items with an equal number of None values.
for a, b in indices:
    data[a:b] = [None] * (b - a)
print(data)
# ['A', ' ', None, None, None, ' ', 'B', ' ', None, None, None, ' ', 'C']
Then we loop over the grouped data and replace the None groups with the sum from the indices list.
it = iter(indices)
output = []
for k, g in groupby(data, lambda x: x is not None):
    if k:
        output.extend(g)
    else:
        output.append(str(sum(next(it))))
print(''.join(output))
# A 7 B 19 C
Here's a quick and slightly dirty solution using string formatting and tuple unpacking:
s = 'A XYZ B XYZ C'
reps = ((2, 5), (8, 11))
totals = (sum(r) for r in reps)
print(s.replace('XYZ','{}').format(*totals))
This prints:
A 7 B 19 C
First, we use a generator expression to find the totals for each of our replacements. Then, by replacing 'XYZ' with '{}' we can use string formatting - *totals will ensure we get the totals in the correct order.
Edit
I didn't realise the indices were actually string indices - my bad. To do this, we could use re.sub as follows:
import re
s = 'A XYZ B XYZ C'
reps = ((2, 5), (8, 11))
for a, b in reps:
    s = s[:a] + '~'*(b-a) + s[b:]
totals = (sum(r) for r in reps)
print(re.sub(r'(~+)', r'{}', s).format(*totals))
Assuming there are no tildes (~) used in your string - if there are, replace with a different character. This also assumes none of the "replacement" groups are consecutive.
Assuming there are no overlaps then you could do it in reverse order
text = "A XYZ B XYZ C"
replace_list = ((2, 5), (8, 11))
for start, end in reversed(replace_list):
    text = f'{text[:start]}{start + end}{text[end:]}'
# A 7 B 19 C
Here's a reversed-order list-slice assignment solution:
text = "A XYZ B XYZ C"
indices = ((2, 5), (8, 11))
chars = list(text)
for start, end in reversed(indices):
    chars[start:end] = str(start + end)
text = ''.join(chars) # A 7 B 19 C
There is also a solution which does exactly what you want.
I have not worked it out completely, but you may want to use:
re.sub() from the re library.
Look here, and look for the functions re.sub() or re.subn():
https://docs.python.org/2/library/re.html
If I have time, I will work out your example later today.
Yet another itertools solution
from itertools import *
s = "A XYZ B XYZ C"
inds = ((2, 5), (8, 11))
res = 'A 7 B 19 C'
inds = list(chain([0], *inds, [len(s)]))
res_ = ''.join(s[i:j] if k % 2 == 0 else str(i + j)
               for k, (i,j) in enumerate(zip(inds, inds[1:])))
assert res == res_
Anticipating that if these pairs-of-integer selections are useful here, they will also be useful in other places, I would probably do something like this:
def make_selections(data, selections):
    start = 0
    # sorted(selections) if you don't want to require the caller to provide them in order
    for selection in selections:
        yield None, data[start:selection[0]]
        yield selection, data[selection[0]:selection[1]]
        start = selection[1]
    yield None, data[start:]
def replace_selections_with_total(data, selections):
    return ''.join(
        str(selection[0] + selection[1]) if selection else value
        for selection, value in make_selections(data, selections)
    )
This still relies on the selections not overlapping, but I'm not sure what it would even mean for them to overlap.
You could then make the replacement itself more flexible too:
def replace_selections(data, selections, replacement):
    return ''.join(
        replacement(selection, value) if selection else value
        for selection, value in make_selections(data, selections)
    )
def replace_selections_with_total(data, selections):
    return replace_selections(data, selections, lambda s,_: str(s[0]+s[1]))
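Since these functions rely on the selections being ordered and non-overlapping, it may be worth checking that up front. The helper below is a hypothetical sketch, not part of the answer: it sorts the pairs and raises ValueError on out-of-range or overlapping ones (treating overlap as an error is an assumption).

```python
def validate_selections(selections, length):
    """Return the selections sorted, or raise ValueError if any are invalid."""
    ordered = sorted(selections)
    previous_end = 0
    for start, end in ordered:
        # Each pair must lie inside the data and have start <= end.
        if not 0 <= start <= end <= length:
            raise ValueError("selection out of range: %r" % ((start, end),))
        # A pair may begin where the previous one ended, but not before.
        if start < previous_end:
            raise ValueError("selection overlaps a previous one: %r" % ((start, end),))
        previous_end = end
    return ordered

print(validate_selections(((8, 11), (2, 5)), len("A XYZ B XYZ C")))
# [(2, 5), (8, 11)]
```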

Determining length and position of character repeats in string

Assume a string s that may contain several adjacent occurrences of dashes. For the sake of simplicity, let's call each of these occurrences a "repeat motive". For example, the following string s contains five repeat motives of dashes, namely of length 3,2,6,5 and 1.
s = "abcde---fghij--klmnopq------rstuvw-----xy-z"
I am trying to come up with Python code that returns the respective length and the respective position within the string of each of the repeat motives. Preferentially, the code returns a list of tuples, with each tuple being of format (length, position).
sought_function(s)
# [(3,5), (2,13), (6,22), (5,34), (1,41)]
Would you have any suggestions as to how to start this code?
You can use groupby:
s = "abcde---fghij--klmnopq------rstuvw-----xy-z"
from itertools import groupby
[(next(g)[0], sum(1 for _ in g) + 1) for k, g in groupby(enumerate(s), lambda x: x[1]) if k == "-"]
# [(5, 3), (13, 2), (22, 6), (34, 5), (41, 1)]
Or, as @Willem commented, replace the sum with len:
[(next(g)[0], len(list(g)) + 1) for k, g in groupby(enumerate(s), lambda x: x[1]) if k == "-"]
# [(5, 3), (13, 2), (22, 6), (34, 5), (41, 1)]
If you want to write your own function, simply iterate over the characters and keep track of the current run length; when the run is cut off, you record the element:
def find_sequences(s,to_find):
    result = []
    lng = 0
    for i,c in enumerate(s):
        if c == to_find:
            lng += 1
        else:
            if lng:
                result.append((lng,i-lng))
                lng = 0
    if lng:
        # a run that reaches the end of the string starts at len(s)-lng
        result.append((lng,len(s)-lng))
    return result
So s is the string and to_find is the character you are interested in (here '-').
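The same scan can also be written as a generator that yields each (length, position) pair as soon as a run ends. This is a sketch under the same assumptions; iter_runs is a hypothetical name.

```python
def iter_runs(s, to_find="-"):
    start = None  # index where the current run began, or None outside a run
    for i, c in enumerate(s):
        if c == to_find:
            if start is None:
                start = i
        elif start is not None:
            yield (i - start, start)
            start = None
    if start is not None:
        # The string ended while still inside a run.
        yield (len(s) - start, start)

s = "abcde---fghij--klmnopq------rstuvw-----xy-z"
print(list(iter_runs(s)))
# [(3, 5), (2, 13), (6, 22), (5, 34), (1, 41)]
```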
If using numpy is fine:
import numpy as np
a = "abcde---fghij--klmnopq------rstuvw-----xy-z"
bool_vec = np.array([letter == "-" for letter in a])
dots = np.where(np.diff(bool_vec)!=0)[0] + 1
number = np.diff(dots.reshape((-1,2)),1).ravel()
idx = dots[::2]
Here number and idx are two arrays that contain what you want (the run lengths and the start positions, respectively) :)
You could do re.split("(-+)", s) which will return a list of ["abcde", "---", ...], and then iterate over that.
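A sketch of that iteration: because the pattern is wrapped in a capturing group, re.split keeps the dash runs in the result, so a running offset gives the position of each piece. The function name dash_runs is hypothetical.

```python
import re

def dash_runs(s):
    result = []
    pos = 0  # offset of the current piece within s
    for part in re.split("(-+)", s):
        if part.startswith("-"):
            result.append((len(part), pos))
        pos += len(part)
    return result

s = "abcde---fghij--klmnopq------rstuvw-----xy-z"
print(dash_runs(s))
# [(3, 5), (2, 13), (6, 22), (5, 34), (1, 41)]
```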
Here would be my suggestion for this:
import re
s = "abcde---fghij--klmnopq------rstuvw-----xy-z"
list1 = []
for x in re.findall("[a-z]*-", s):
    temp = x.strip("-")
    if len(temp) > 0:
        list1.append(temp)
print(list1)

python3 string "abcd" print: aababcabcd?

If I have a string like abcd or 1234 etc., how can I print the first character, then the first two characters, then the first three etc., all together?
For example, for the string 1234 I would like to print/return 1121231234, and for abcd, aababcabcd.
I have this code so far:
def string_splosion(str):
    i = 0
    while i <= len(str):
        i += 1
        print(str[:i])
print(string_splosion('abcd'))
But it prints/returns it in separate lines. I could write it manually as print(str[0:1], str[1:2] <...>) but how do I make python do it as I don't know how long the string is going to be?
You shouldn't use str as a variable name, because it shadows the built-in str type. You could join the sliced strings together in your loop:
def string_splosion(string):
    i, result = 0, ''
    while i < len(string): # < instead of <=
        i += 1
        result += string[:i]
    return result
It's possible to shorten your code a little using str.join and range:
def string_splosion(string):
    return ''.join(string[:i] for i in range(1, len(string) + 1))
or using itertools.accumulate (Python 3.2+):
import itertools
def string_splosion(string):
    return ''.join(itertools.accumulate(string))
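To see why accumulate fits here: with no function argument it combines items with +, which for single characters means building up the running prefixes. A short illustration (the variable name prefixes is only for this example):

```python
from itertools import accumulate

# Each step concatenates the next character onto the previous result.
prefixes = list(accumulate("abcd"))
print(prefixes)            # ['a', 'ab', 'abc', 'abcd']
print("".join(prefixes))   # aababcabcd
```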
In the timings below, the itertools.accumulate approach appears to be roughly 2 times faster than both the str.join variants and the original loop-based solution:
string_splosion_loop(abcdef): 2.3944241080715223
string_splosion_join_gen(abcdef): 2.757582983268288
string_splosion_join_lc(abcdef): 2.2879220573578865
string_splosion_itertools(abcdef): 1.1873638161591886
The code I used to time the functions is
import itertools
from timeit import timeit
string = 'abcdef'
def string_splosion_loop():
    i, result = 0, ''
    while i < len(string):
        i += 1
        result += string[:i]
    return result
def string_splosion_join_gen():
    return ''.join(string[:i] for i in range(1, len(string) + 1))
def string_splosion_join_lc():
    # str.join performs faster when the argument is a list
    return ''.join([string[:i] for i in range(1, len(string) + 1)])
def string_splosion_itertools():
    return ''.join(itertools.accumulate(string))
funcs = (string_splosion_loop, string_splosion_join_gen,
         string_splosion_join_lc, string_splosion_itertools)
for f in funcs:
    print('{.__name__}({}): {}'.format(f, string, timeit(f)))
Just use:
"".join([s[:i] for i in range(len(s)+1)])
As @abc noted, don't use str as a variable name because it shadows the built-in type; see https://docs.python.org/2/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange
E.g.:
>>> s = "1234"
>>> "".join([s[:i] for i in range(len(s)+1)])
'1121231234'
>>> s = "abcd"
>>> "".join([s[:i] for i in range(len(s)+1)])
'aababcabcd'
range(len(s)+1) is because of slicing, see Explain Python's slice notation:
>>> s = "1234"
>>> len(s)
4
>>> range(len(s))
[0, 1, 2, 3]
>>> s[:3]
'123'
>>> range(len(s)+1)
[0, 1, 2, 3, 4]
>>> s[:4]
'1234'
Then:
>>> s[:0]
''
>>> s[:1]
'1'
>>> s[:2]
'12'
>>> s[:3]
'123'
>>> s[:4]
'1234'
Lastly, join list([s[:1], s[:2], s[:3], s[:4]]) using "".join(list), see https://docs.python.org/2/library/string.html#string.join:
>>> list([s[:1], s[:2], s[:3], s[:4]])
['1', '12', '123', '1234']
>>> x = list([s[:1], s[:2], s[:3], s[:4]])
>>> "".join(x)
'1121231234'
>>> "-".join(x)
'1-12-123-1234'
>>> " ".join(x)
'1 12 123 1234'
To avoid an extra iteration in the loop, you can use range(1, len(s)+1), since s[:0] returns an empty string:
>>> s = "1234"
>>> "".join([s[:i] for i in range(1,len(s)+1)])
'1121231234'
>>> "".join([s[:i] for i in range(len(s)+1)])
'1121231234'
If you are using python 3 you can use this to print without a newline:
print(yourString, end="")
So your function could be:
def string_splosion(s):
    # start at 1 and go up to len(s) so the full string is included
    for i in range(1, len(s) + 1):
        print(s[:i], end="")
    print()  # finish the line
string_splosion('abcd')

How to join array based on position and datatype in Python?

I have a few arrays containing integer and strings. For example:
myarray1 = [1,2,3,"ab","cd",4]
myarray2 = [1,"a",2,3,"bc","cd","e",4]
I'm trying to combine only the strings in an array that are next to each other. So I want the result to be:
newarray1= [1,2,3,"abcd",4]
newarray2= [1,"a",2,3,"bccde",4]
Does anyone know how to do this? Thank you!
The groupby breaks the list up into runs of strings and runs of integers. The ternary operation joins the groups of strings and puts them into a temporary sequence. The chain re-joins the strings and the runs of integers.
from itertools import groupby, chain
def joinstrings(iterable):
    return list(chain.from_iterable(
        (''.join(group),) if key else group
        for key, group in
        groupby(iterable, key=lambda elem: isinstance(elem, basestring))))
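The function above uses basestring, which exists only in Python 2. A Python 3 sketch of the same groupby idea, written as an explicit loop so the runs are easier to follow (join_adjacent_strings is a hypothetical name):

```python
from itertools import groupby

def join_adjacent_strings(items):
    result = []
    # groupby yields consecutive runs that share the same key:
    # True for a run of strings, False for a run of anything else.
    for is_text, run in groupby(items, key=lambda x: isinstance(x, str)):
        if is_text:
            result.append("".join(run))  # collapse the run into one string
        else:
            result.extend(run)           # keep non-string items as they are
    return result

print(join_adjacent_strings([1, 2, 3, "ab", "cd", 4]))
# [1, 2, 3, 'abcd', 4]
print(join_adjacent_strings([1, "a", 2, 3, "bc", "cd", "e", 4]))
# [1, 'a', 2, 3, 'bccde', 4]
```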
>>> myarray1 = [1,2,3,"ab","cd",4]
>>> newarray1 = [myarray1[0]]
>>> for item in myarray1[1:]:
...     if isinstance(item, str) and isinstance(newarray1[-1], str):
...         newarray1[-1] = newarray1[-1] + item
...     else:
...         newarray1.append(item)
...
>>> newarray1
[1, 2, 3, 'abcd', 4]
reduce(lambda x, (tp, it): tp and x + ["".join(it)] or x+list(it), itertools.groupby( myarray1, lambda x: isinstance(x, basestring) ), [])
a = [1,2,3,"ab","cd",4]
b = [1,"a",2,3,"bc","cd","e",4]
def func(a):
    ret = []
    s = ""
    for x in a:
        if isinstance(x, basestring):  # Python 2; use str in Python 3
            s = s + x
        else:
            if s:
                ret.append(s)
                s = ""
            ret.append(x)
    if s:  # flush a trailing run of strings
        ret.append(s)
    return ret
print func(a)
print func(b)
