How to extract colon separated values from the same line? - python

I am using Python regular expressions. I want all colon-separated values in a line.
e.g.
input = 'a:b c:d e:f'
expected_output = [('a','b'), ('c', 'd'), ('e', 'f')]
But when I do
>>> re.findall('(.*)\s?:\s?(.*)','a:b c:d')
I get
[('a:b c', 'd')]
I have also tried
>>> re.findall('(.*)\s?:\s?(.*)[\s$]','a:b c:d')
[('a', 'b')]

The following code works for me:
import re
inpt = 'a:b c:d e:f'
re.findall(r'(\S+):(\S+)', inpt)
Output:
[('a', 'b'), ('c', 'd'), ('e', 'f')]
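To see why the first pattern in the question misbehaves: .* is greedy and also matches spaces and colons, so the first group swallows everything up to the last colon. A minimal sketch comparing the patterns follows; the stricter [^\s:]+ variant is an illustrative addition, not part of the answer above.

```python
import re

text = 'a:b c:d e:f'

# Greedy '.*' matches spaces and colons too, so group 1 grabs
# everything up to the last colon on the line.
greedy = re.findall(r'(.*)\s?:\s?(.*)', text)

# '\S+' cannot cross whitespace, so each match stays inside one token.
per_token = re.findall(r'(\S+):(\S+)', text)

# Stricter variant: also forbid ':' inside a key or a value.
strict = re.findall(r'([^\s:]+):([^\s:]+)', text)

print(greedy)
print(per_token)
print(strict)
```

The strict variant only matters if a token may contain more than one colon; for the sample input both working patterns give the same result.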

Use split instead of a regex, and avoid variable names that shadow built-ins such as input:
inpt = 'a:b c:d e:f'
k= [tuple(i.split(':')) for i in inpt.split()]
print(k)
# [('a', 'b'), ('c', 'd'), ('e', 'f')]
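One caveat with plain split(':'): a token with more than one colon yields a tuple longer than two, and a token with no colon yields a one-element tuple. A hedged sketch using str.partition, which splits on the first colon only, is shown below. The helper name parse_pairs and the decision to skip colon-less tokens are assumptions made for illustration.

```python
def parse_pairs(line):
    """Return (key, value) tuples for whitespace-separated 'key:value' tokens."""
    pairs = []
    for token in line.split():
        key, sep, value = token.partition(':')
        if sep:  # skip tokens that contain no colon at all
            pairs.append((key, value))
    return pairs


print(parse_pairs('a:b c:d e:f'))
print(parse_pairs('a:b url:http://x junk'))
```

Whether colon-less tokens should be skipped, kept, or treated as an error depends on the data, so that branch is the part most likely to need changing.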

The easiest way is a list comprehension with split:
[tuple(ele.split(':')) for ele in input.split(' ')]
#driver values :
IN : input = 'a:b c:d e:f'
OUT : [('a', 'b'), ('c', 'd'), ('e', 'f')]

You may use
list(map(lambda x: tuple(x.split(':')), input.split()))
where
input.split() is
>>> input.split()
['a:b', 'c:d', 'e:f']
lambda x: tuple(x.split(':')) is a function that converts a string to a tuple: 'a:b' => ('a', 'b')
map applies that function to every list element and returns a map object (in Python 3), which list then converts to a list.
Result
>>> list(map(lambda x: tuple(x.split(':')), input.split()))
[('a', 'b'), ('c', 'd'), ('e', 'f')]

Related

Why tuples in a set won't convert to any other type in a loop?

I'm trying to remove a certain item from a set of tuples. To do so, I must convert the tuples to a list or a set (i.e. a mutable object). I'm trying to do this in a for loop, but the tuples won't convert and my item is not removed.
a = [('A', 'C'), ('B', 'C'), ('B', 'C')]
for i in a:
    i = list(i)
    if 'C' in i:
        i.remove('C')
print(a)
This is the output:
[('A', 'C'), ('B', 'C'), ('B', 'C')]
You got the right intuition. As your tuples are immutable, you need to create new ones.
However, in your code, you create lists, modify them, but fail to save them back in the original list.
You could use a list comprehension.
[tuple(e for e in t if e != 'C') for t in a]
Output:
[('A',), ('B',), ('B',)]
You are modifying a temporary list inside the loop but never storing the result in a new list.
Try this:
a = [('A', 'C'), ('B', 'C'), ('B', 'C')]
b = []
for i in a:
    i = list(i)
    if 'C' in i:
        i.remove('C')
    b.append(i)
print(b)
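The reason the loop in the question has no effect is that i = list(i) only rebinds the loop variable to a new list; the tuple stored in a is untouched. If the list should be updated in place rather than rebuilt, one possible sketch writes each new tuple back by index. The use of enumerate here is an illustrative choice, not something the answers above prescribe.

```python
a = [('A', 'C'), ('B', 'C'), ('B', 'C')]

for index, pair in enumerate(a):
    # Build a new tuple without 'C' and store it back into the list.
    a[index] = tuple(item for item in pair if item != 'C')

print(a)
```

Unlike the remove-based loop, this drops every 'C' in a tuple, not just the first one, and it keeps the elements as tuples.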

how to pair each 2 elements of a list?

I have a list like this
attach=['a','b','c','d','e','f','g','k']
I want to pair each two consecutive elements:
lis2 = [('a', 'b'), ('c', 'd'), ('e', 'f'), ('g', 'k')]
I did the following:
Category=[]
for i in range(len(attach)):
    if i+1< len(attach):
        Category.append(f'{attach[i]},{attach[i+1]}')
but then I have to remove half of the rows because it also gives 'b','c' and so on. I thought maybe there is a better way.
You can use zip() to achieve this as:
my_list = ['a','b','c','d','e','f','g','k']
new_list = list(zip(my_list[::2], my_list[1::2]))
where new_list will hold:
[('a', 'b'), ('c', 'd'), ('e', 'f'), ('g', 'k')]
This will only produce complete pairs, i.e. if the number of elements in the list is odd, you'll lose the last element, which is not part of any pair.
If you want to preserve the last odd element from list as single element tuple in the final list, then you can use itertools.zip_longest() (in Python 3.x, or itertools.izip_longest() in Python 2.x) with list comprehension as:
from itertools import zip_longest # In Python 3.x
# from itertools import izip_longest ## In Python 2.x
my_list = ['a','b','c','d','e','f','g','h', 'k']
new_list = [(i, j) if j is not None else (i,) for i, j in zip_longest(my_list[::2], my_list[1::2])]
where new_list will hold:
[('a', 'b'), ('c', 'd'), ('e', 'f'), ('g', 'h'), ('k',)]
# last odd number as single element in the tuple ^
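The slicing approach builds two extra copies of half the list. As an alternative sketch, the same pairing can be done by pulling two items at a time from a single iterator. The helper name pairwise_chunks is hypothetical, and the trailing odd element is kept as a one-element tuple to match the zip_longest example.

```python
def pairwise_chunks(items):
    """Group items into consecutive pairs; a leftover item becomes a 1-tuple."""
    iterator = iter(items)
    result = []
    for first in iterator:
        try:
            # Take the partner of 'first' from the same iterator.
            second = next(iterator)
        except StopIteration:
            result.append((first,))
        else:
            result.append((first, second))
    return result


print(pairwise_chunks(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'k']))
print(pairwise_chunks(['a', 'b', 'c']))
```

Because it only relies on iteration, this also works for generators and other inputs that cannot be sliced.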
You have to step the index i by 2 on each iteration:
Category=[]
for i in range(0, len(attach), 2):
    Category.append(f'{attach[i]},{attach[i+1]}')
Also, you don't need the if condition if len(attach) is always even.
You can also use a list comprehension:
lis2 = [(attach[i], attach[i+1]) for i in range(0, len(attach), 2)]

Creating a new list based on lists of tuples

Let's assume there is a list of tuples:
for something in x.something():
    print(something)
and it returns
('a', 'b')
('c', 'd')
('e', 'f')
('g', 'h')
('i', 'j')
And I have created two other lists containing certain elements from the x.something():
y = [('a', 'b'), ('c', 'd')]
z = [('e', 'f'), ('g', 'h')]
So I want to assign the tuples from x.something() to a new list based on y and z by
newlist = []
for something in x.something():
    if something in 'y':
        newlist.append('color1')
    elif something in 'z':
        newlist.append('color2')
    else:
        newlist.append('color3')
What I would like to have is the newlist looks like:
['color1', 'color1', 'color2', 'color2', 'color3']
But I've got
TypeError: 'in <string>' requires string as left operand, not tuple
What went wrong and how to fix it?
I think you want if something in y instead of if something in 'y', because y and z are lists, not strings:
newlist = []
for something in x.something():
    if something in y:
        newlist.append('color1')
    elif something in z:
        newlist.append('color2')
    else:
        newlist.append('color3')
You should remove the quotes from if something in 'y' because it assumes that you're checking if something is in the string 'y'. Same for z.
Try this:
t = [('a', 'b'),
     ('c', 'd'),
     ('e', 'f'),
     ('g', 'h'),
     ('i', 'j')]
y = [('a', 'b'), ('c', 'd')]
z = [('e', 'f'), ('g', 'h')]
new_list = []
for x in t:
    if x in y:
        new_list.append('color1')
    elif x in z:
        new_list.append('color2')
    else:
        new_list.append('color3')
print(new_list)
Output:
['color1', 'color1', 'color2', 'color2', 'color3']
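For longer inputs, each membership test against a list scans that list again. A possible variation, assuming the tuples are hashable (as they are here), builds one dict from tuple to color and falls back to 'color3'. The name color_of is illustrative only.

```python
t = [('a', 'b'), ('c', 'd'), ('e', 'f'), ('g', 'h'), ('i', 'j')]
y = [('a', 'b'), ('c', 'd')]
z = [('e', 'f'), ('g', 'h')]

# Map each known tuple to its color. z is applied first so that y
# wins for any tuple present in both, matching the if/elif order.
color_of = {pair: 'color2' for pair in z}
color_of.update({pair: 'color1' for pair in y})

# Tuples found in neither list fall back to 'color3'.
new_list = [color_of.get(pair, 'color3') for pair in t]
print(new_list)
```

For a handful of tuples the if/elif loop is just as good; the dict mostly pays off when y and z grow large.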

Select first item in each list

Here is my list:
[(('A', 'B'), ('C', 'D')), (('E', 'F'), ('G', 'H'))]
Basically, I'd like to get:
[('A', 'C'), ('E', 'G')]
So, I'd like to select first elements from the lowest-level lists and build mid-level lists with them.
====================================================
Additional explanation below:
I could just zip them by
list(zip([w[0][0] for w in list1], [w[1][0] for w in list1]))
But later I'd like to add a condition: the second elements in the lowest level lists must be 'B' and 'D' respectively, so the final outcome should be:
[('A', 'C')] # ('E', 'G') must be sorted out
I'm a beginner, but can't find the case anywhere... Would be grateful for help.
I'd do it the following way
list1 = [(('A', 'B'), ('C', 'D')), (('E', 'F'), ('G', 'H'))]
out = []
for i in list1:
    listAux = []
    for j in i:
        listAux.append(j[0])  # first element of each inner tuple
    out.append((listAux[0], listAux[1]))
print(out)
I hope that's what you're looking for.
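The answer above does not cover the extra condition from the question (the second elements must be 'B' and 'D' respectively). One way this might be sketched is a list comprehension with a filter, assuming each item holds exactly two inner pairs. The name wanted is hypothetical.

```python
list1 = [(('A', 'B'), ('C', 'D')), (('E', 'F'), ('G', 'H'))]
wanted = ('B', 'D')  # required second elements, in order

out = [
    (first[0], second[0])
    for first, second in list1
    # Keep an item only if its second elements match 'wanted'.
    if (first[1], second[1]) == wanted
]
print(out)
```

Dropping the if clause gives the unfiltered result, [('A', 'C'), ('E', 'G')].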

write the elements of list to file

Bigram is a list which looks like-
[('a', 'b'), ('b', 'b'), ('b', 'b'), ('b', 'c'), ('c', 'c'), ('c', 'c'), ('c', 'd'), ('d', 'd'), ('d', 'e')]
Now I am trying to write each element of the list as a separate line in a file with this code-
bigram = list(nltk.bigrams(s.split()))
outfile1.write("%s" % ''.join(ele) for ele in bigram)
but I am getting this error :
TypeError: write() argument must be str, not generator
I want the result as in file-
('a', 'b')
('b', 'b')
('b', 'b')
('b', 'c')
('c', 'c')
......
You're passing a generator expression to write, which needs a string.
If I understand correctly, you want to write one tuple representation per line.
You can achieve that with:
outfile1.write("".join('{}\n'.format(ele) for ele in bigram))
or
outfile1.writelines('{}\n'.format(ele) for ele in bigram)
The second version passes a generator expression to writelines, which avoids building one big string in memory before writing (and looks more like your attempt).
It produces a file with this content:
('a', 'b')
('b', 'b')
('b', 'b')
('b', 'c')
('c', 'c')
('c', 'c')
('c', 'd')
('d', 'd')
('d', 'e')
Try this:
outfile1.writelines("{}\n".format(ele) for ele in bigram)
This is a parsing issue. The whole argument is interpreted as a single generator expression, as if it were written:
("%s" % ''.join(ele)) for ele in bigram
It is not interpreted as a string formatted from a generator, which would need explicit parentheses:
"%s" % (''.join(ele) for ele in bigram)
That second form would not help either, since it only formats the generator object itself rather than its elements.
So write receives a generator instead of a string. You need to call write on each element the generator produces.
If you want to write each pair in a separate line, you have to add line separators explicitly. The easiest, to my mind, is an explicit loop:
for pair in bigram:
    outfile1.write("(%r, %r)\n" % pair)
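Putting the pieces together, a self-contained sketch might look like the following. It assumes the bigrams are already available as a list of tuples and writes to a temporary file; the path and the sample data are placeholders. The with block closes the file even if a write fails.

```python
import os
import tempfile

bigram = [('a', 'b'), ('b', 'b'), ('b', 'c')]  # sample data

# Placeholder location: a file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'bigrams.txt')

with open(path, 'w') as outfile:
    # '{}'.format(pair) renders the tuple via str(), e.g. ('a', 'b')
    outfile.writelines('{}\n'.format(pair) for pair in bigram)

with open(path) as infile:
    content = infile.read()

print(content)
```

Reading the file back is only there to check the result; in real code the second with block can be dropped.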
