Converting a string to a list of 2-tuples - python

I have strings of this shape:
d="M 997.14282,452.3622 877.54125,539.83678 757.38907,453.12006 802.7325,312.0516 950.90847,311.58322 Z"
which are (x, y) coordinates of a pentagon (the first and last letters are metadata and to be ignored). What I want is a list of 2-tuples that would represent the coordinates in floating points without all the cruft:
d = [(997.14282, 452.3622), (877.54125, 539.83678), (757.38907, 453.12006), (802.7325,312.0516), (950.90847, 311.58322)]
Trimming the string was easy:
>>> d.split()[1:-2]
['997.14282,452.3622', '877.54125,539.83678', '757.38907,453.12006', '802.7325,312.0516']
but now I want to create the tuples in a succinct way. This obviously didn't work:
>>> tuple('997.14282,452.3622')
('9', '9', '7', '.', '1', '4', '2', '8', '2', ',', '4', '5', '2', '.', '3', '6', '2', '2')
Taking the original string, I could write something like this:
def coordinates(d):
    list_of_coordinates = []
    d = d.split()[1:-2]
    for elem in d:
        l = elem.split(',')
        list_of_coordinates.append((float(l[0]), float(l[1])))
    return list_of_coordinates
which works fine:
>>> coordinates("M 997.14282,452.3622 877.54125,539.83678 757.38907,453.12006 802.7325,312.0516 950.90847,311.58322 Z")
[(997.14282, 452.3622), (877.54125, 539.83678), (757.38907, 453.12006), (802.7325, 312.0516)]
However this processing is a small and trivial part of a bigger program and I'd rather keep it as short and succinct as possible. Can anyone please show me a less verbose way to convert the string to the list of 2-tuples?

Note that d.split()[1:-2] drops the last coordinate, which may not be what you intended. Assuming it is not intentional, a one-liner for this would be:
def coordinates1(d):
    return [tuple(map(float, coords.split(','))) for coords in d.split()[1:-1]]
If losing the last coordinate is intentional, use [1:-2] in the above code.

You can do this in one line using list comprehension.
x = [tuple(float(j) for j in i.split(",")) for i in d.split()[1:-2]]
This goes through d.split()[1:-2], takes each pair that should be grouped together, splits it on the comma, converts each item to a float, and groups the results into a tuple.
Also, you might want to use d.split()[1:-1] because using -2 cuts out the last pair of coordinates.
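A minimal sketch tying the two answers above together, assuming the last pair is wanted (so the slice is [1:-1] rather than [1:-2]). The function name parse_path is hypothetical and not from the question:

```python
def parse_path(d):
    """Return the (x, y) pairs of an "M x,y x,y ... Z" string as float tuples."""
    # [1:-1] drops only the leading "M" and the trailing "Z" (assumed metadata).
    return [tuple(map(float, pair.split(","))) for pair in d.split()[1:-1]]


d = "M 997.14282,452.3622 877.54125,539.83678 757.38907,453.12006 802.7325,312.0516 950.90847,311.58322 Z"
points = parse_path(d)
print(len(points))   # 5
print(points[-1])    # (950.90847, 311.58322)
```

With [1:-2] the same comprehension would return four pairs, which is the behaviour shown in the question.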

What you have works, but it can be compressed somewhat using a list comprehension or some functional tools (I mean map):
def coordinates(d):
    d = d[2:-2].split()                    # drop "M " and " Z", then split into pairs
    d = map(str.split, d, "," * len(d))    # another split, into tokens; the string is
                                           # multiplied to get an iterable of the right size
    d = (map(float, pair) for pair in d)   # convert the tokens to floats (they are still strings here)
    return list(map(tuple, d))             # last map, creating a list from the "map object"
Of course it can be reduced to a one-liner with a list comprehension, but readability would suffer too (and the code is already hard to read as it is). And although Guido hates functional programming, I find this more logical... after some practice. Good luck!

Related

How to call for information from a list?

I'm quite new to Python and I would like some help. I'm currently trying to store information from the second line of a txt file in a tuple and I am having some trouble. The txt file's second line is:
Water: 0 0 4 2 1 3
I want to store the numbers only so my current code is:
water = []
with open(file_name) as f:
    lines = f.readlines()
    water_values = lines[1].strip()
    splitted = water_values.split(" ")
    splitted.remove("Water:")
    water.append(splitted)
However, when I call for water[1], expecting to receive 0, I find that the index is out of range and that the len(water) is only 1. When I print it, it says:
[['0', '0', '4', '2', '1', '3']]
How can I change it so that I can call for each element?
When you call water.append(splitted) you are adding a single new element to the end of the list, and since splitted is itself a list, you get a list of lists.
If you want to combine two lists, you should instead call water += splitted. The += operator adds whatever is on the right side to the value on the left side, and is analogous to water = water + splitted.
You should use .extend rather than .append, i.e. instead of
water.append(splitted)
do
water.extend(splitted)
Simple example to show difference:
a = []
b = []
a.append([1,2,3])
b.extend([1,2,3])
print(a)
print(b)
output:
[[1, 2, 3]]
[1, 2, 3]
If you want to know more about handling lists in Python, read More on Lists in the docs.
Your code water.append(splitted) just adds splitted (which is a list) as the first element of the water list. To add the values from splitted, you could just do the following:
water += splitted
instead of
water.append(splitted)
Doing so - you will get water = ['0', '0', '4', '2', '1', '3'].
You can read more here How do I concatenate two lists in Python?
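A minimal sketch of the extend approach from the answers above, using a literal string in place of lines[1] so that it runs without a file. Converting the values to int is an assumption about what the asker ultimately wants:

```python
line = "Water: 0 0 4 2 1 3"   # stands in for lines[1] read from the file

water = []
# split() with no argument also copes with repeated spaces;
# the starred name collects everything after the "Water:" label.
label, *values = line.split()
water.extend(int(v) for v in values)   # extend adds each value, not one nested list

print(water)      # [0, 0, 4, 2, 1, 3]
print(water[1])   # 0
```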

List comprehension with duplicated function call [duplicate]

This question already has answers here:
Python list comprehension - want to avoid repeated evaluation
(12 answers)
Closed 5 years ago.
I want to transform a string such as following:
' 1 , 2 , , , 3 '
into a list of non-empty elements:
['1', '2', '3']
My solution is this list comprehension:
print [el.strip() for el in mystring.split(",") if el.strip()]
Just wonder, is there a nice, pythonic way to write this comprehension without calling el.strip() twice?
You can use a generator inside the list comprehension:
[x for x in (el.strip() for el in mystring.split(",")) if x]
#           \___________________ ____________________/
#                               v
#                      internal generator
The generator thus will provide stripped elements, and we iterate over the generator, and only check the truthiness. We thus save on el.strip() calls.
You can also use map(..) for this (making it more functional):
[x for x in map(str.strip, mystring.split(",")) if x]
#           \________________ ________________/
#                            v
#                           map
But this is basically the same (although the logic of the generator is - in my opinion - better encapsulated).
As a simple alternative to get a list of non-empty elements (in addition to previous good answers):
import re
s = ' 1 , 2 , , , 3 '
print(re.findall(r'[^\s,]+', s))
The output:
['1', '2', '3']
How about some regex to extract all the numbers from the string
import re
a = ' 1 , 2 , , , 3 '
print(re.findall(r'\d+', a))
Output:
['1', '2', '3']
In just one line of code, that's about as terse as you're going to get. Of course, if you want to get fancy you can try the functional approach:
filter(lambda x: x, map(lambda x: x.strip(), mystring.split(',')))
But this gets you terseness in exchange for readability.
Go full functional with map and filter by using:
s = ' 1 , 2 , , , 3 '
res = filter(None, map(str.strip, s.split(',')))
Though similar to @omu_negru's answer, this avoids using lambdas, which are arguably pretty ugly and also slow things down.
The argument None to filter translates to: filter on truthiness, essentially x for x in iterable if x, while the map just maps the method str.strip (which strips whitespace by default) onto the iterable obtained from s.split(',').
On Python 2, where filter still returns a list, this approach should easily edge out the other approaches in speed.
In Python 3 one would have to use:
res = [*filter(None, map(str.strip, s.split(',')))]
in order to get the list back.
If you have imported "re", then re.split() will work:
import re
s=' 1 , 2 , , , 3 '
print ([el for el in re.split(r"[, ]+",s) if el])
['1', '2', '3']
If strings separated by only spaces (with no intervening comma) should not be separated, then this will work:
import re
s=' ,,,,, ,,,, 1 , 2 , , , 3,,,,,4 5, 6 '
print ([el for el in re.split(r"\s*,\s*",s.strip()) if el])
['1', '2', '3', '4 5', '6']
List comprehensions are wonderful, but it's not illegal to use more than one line of code! You could even - heaven forbid - use a for loop!
result = []
for el in mystring.split(","):
    x = el.strip()
    if x:
        result.append(x)
Here's a two-line version. It's actually the same as the accepted answer by Willem Van Onsem, but with a name given to a subexpression (and a generator changed to a list but it makes essentially no difference for a problem this small). In my view, this makes it a lot easier to read, despite taking fractionally more code.
all_terms = [el.strip() for el in mystring.split(",")]
non_empty_terms = [x for x in all_terms if x]
Some of the other answers are certainly shorter, but I'm not convinced any of them are simpler/easier to understand. Actually, I think the best answer is just the one in your question, because the repetition in this case is quite minor.
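On Python 3.8 or later, an assignment expression (the := operator from PEP 572) is another way to call strip() only once per element. This is a sketch under that version assumption, not something proposed in the answers above:

```python
mystring = " 1 , 2 , , , 3 "

# (x := el.strip()) binds the stripped value to x and also serves as the
# filter condition, so strip() runs exactly once per element.
non_empty = [x for el in mystring.split(",") if (x := el.strip())]

print(non_empty)  # ['1', '2', '3']
```

Whether this reads better than the nested generator is a matter of taste; it trades one extra name for one fewer loop.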

python add specific lists within a list

For this problem I am dealing with a big list that was imported from a CSV file, but let's say I have a list like this:
[['name','score1','score2','score3','score4'],
 ['Mike','5','1','6','2'],
 ['Mike','1','1','1','1'],
 ['Mike','3','0','3','0'],
 ['jose','0','1','2','3'],
 ['jose','2','3','4','5'],
 ['lisa','4','4','4','4']]
and I want to have another list of this form (the sum of all scores for each student):
[['Mike','9','2','10','3'],
 ['jose','2','4','6','8'],
 ['lisa','4','4','4','4']]
Any ideas how this can be done?
I've been trying many ways, and I could not make it work.
I got stuck when there were more than 2 rows with the same name; my solution only kept the last 2 lines to add.
I am new to Python and programming in general.
If you are just learning Python, I always recommend trying to implement things without relying on external libraries. A good first step is to break the problem up into smaller components:
1. Remove the first entry (the column titles) from the input list. You don't need it for your result.
2. For each remaining entry:
   - Convert every value except the first to an integer (so you can add them).
   - Determine if you have already encountered an entry with the same name (first column value). If not, add the entry to the output list. Otherwise, merge the entry with the one already in the output list (by adding the values in the columns).
One possible implementation follows (untested):
input_list = [['name','score1','score2','score3','score4'],
              ['Mike','5','1','6','2'],
              ['Mike','1','1','1','1'],
              ['Mike','3','0','3','0'],
              ['jose','0','1','2','3'],
              ['jose','2','3','4','5'],
              ['lisa','4','4','4','4']]
print(input_list)
# Remove the first element
input_list = input_list[1:]
# Initialize an empty output list
output_list = []
# Iterate through each entry in the input
for val in input_list:
    # Determine if key is already in output list
    for ent in output_list:
        if ent[0] == val[0]:
            # The value is already in the output list (so merge them)
            for i in range(1, len(ent)):
                # We convert to int and back to str
                # This could be done elsewhere (or not at all...)
                ent[i] = str(int(ent[i]) + int(val[i]))
            break
    else:
        # The value wasn't in the output list (so add it)
        # This is a useful feature of the for loop, the following
        # is only executed if the break command wasn't reached above
        output_list.append(val)
#print(input_list)
print(output_list)
The above is not as efficient as using a dictionary or importing a library that can perform the same operation in a couple of lines; however, it demonstrates a few features of the language. Be careful when working with lists, though: the above modifies the input list (try un-commenting the print statement for the input list at the end).
Let us say you have
In [45]: temp
Out[45]:
[['Mike', '5', '1', '6', '2'],
['Mike', '1', '1', '1', '1'],
['Mike', '3', '0', '3', '0'],
['jose', '0', '1', '2', '3'],
['jose', '2', '3', '4', '5'],
['lisa', '4', '4', '4', '4']]
Then, you can use Pandas ...
import pandas as pd
temp = pd.DataFrame(temp)
def test(m):
    try: return int(m)
    except ValueError: return m
temp = temp.applymap(test)
print(temp.groupby(0).agg(sum))
If you are importing it from a CSV file, you can read the file directly using pd.read_csv.
You could use a better solution as suggested, but if you'd like to implement it yourself and learn, you can follow along; I explain in the comments:
# utilities for iteration. groupby makes groups from a collection
from itertools import groupby
# implementation of common, simple operations such as
# multiplication, getting an item from a list
from operator import itemgetter
def my_sum(groups):
    return [
        ls[0] if i == 0 else str(sum(map(int, ls)))  # keep the first one since it's the name, sum otherwise
        for i, ls in enumerate(zip(*groups))          # transpose elements and number each column
    ]
# list comprehension to make a list from another list
# group lists according to first element and apply our function on grouped elements
# groupby yields the group key and its elements, but the key isn't needed so it's bound to underscore
result = [my_sum(g) for _, g in groupby(ls, key=itemgetter(0))]
To understand this code, you need to know about list comprehensions, the * operator, the built-ins int, enumerate, map, str and zip, and two handy modules, itertools and operator.
You edited the question to add a header row, which will break this code, so pass ls[1:] to groupby instead of ls. Hope it helps.
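One caveat worth illustrating: itertools.groupby only groups consecutive items with equal keys, so rows for the same student must be adjacent. The sketch below assumes the rows may arrive in any order and sorts them first; the names rows and sum_group are hypothetical:

```python
from itertools import groupby
from operator import itemgetter

rows = [
    ["name", "score1", "score2", "score3", "score4"],
    ["Mike", "5", "1", "6", "2"],
    ["jose", "0", "1", "2", "3"],
    ["Mike", "1", "1", "1", "1"],   # deliberately not next to the other "Mike" row
    ["jose", "2", "3", "4", "5"],
]

def sum_group(group):
    # zip(*group) transposes the rows into columns; column 0 holds the name.
    names, *score_columns = zip(*group)
    return [names[0]] + [str(sum(map(int, col))) for col in score_columns]

# Skip the header, then sort by name so that equal names become adjacent.
body = sorted(rows[1:], key=itemgetter(0))
result = [sum_group(g) for _, g in groupby(body, key=itemgetter(0))]
print(result)  # [['Mike', '6', '2', '7', '3'], ['jose', '2', '4', '6', '8']]
```

Sorting changes the order of the output groups; if the original order of first appearance matters, a dictionary keyed by name is the simpler tool.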
As a beginner I would consider turning your data into a simpler structure like a dictionary, so that you are just summing a list of lists. Assuming you get rid of the header row, you can turn this into a dictionary:
>>> data_dict = {}
>>> for row in data:
...     data_dict.setdefault(row[0], []).append([int(i) for i in row[1:]])
...
>>> data_dict
{'Mike': [[5, 1, 6, 2], [1, 1, 1, 1], [3, 0, 3, 0]],
 'jose': [[0, 1, 2, 3], [2, 3, 4, 5]],
 'lisa': [[4, 4, 4, 4]]}
Now it should be relatively easy to loop over the dict and sum up the lists (you may want to look at sum and zip as a way to do that).
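One way the remaining step could look, as a sketch that assumes the header row has already been removed and that plain integer totals are acceptable. The name totals is hypothetical:

```python
data = [
    ["Mike", "5", "1", "6", "2"],
    ["Mike", "1", "1", "1", "1"],
    ["Mike", "3", "0", "3", "0"],
    ["jose", "0", "1", "2", "3"],
    ["jose", "2", "3", "4", "5"],
    ["lisa", "4", "4", "4", "4"],
]

# Same grouping step as in the answer above.
data_dict = {}
for row in data:
    data_dict.setdefault(row[0], []).append([int(i) for i in row[1:]])

# zip(*score_rows) yields one tuple per score column; sum() adds each column up.
totals = {name: [sum(col) for col in zip(*score_rows)]
          for name, score_rows in data_dict.items()}

print(totals)
# {'Mike': [9, 2, 10, 3], 'jose': [2, 4, 6, 8], 'lisa': [4, 4, 4, 4]}
```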
This is well suited for collections.Counter
from collections import Counter, defaultdict
csvdata = [['name','score1','score2','score3','score4'],
['Mike','5','1','6','2'],
['Mike','1','1','1','1'],
['Mike','3','0','3','0'],
['jose','0','1','2','3'],
['jose','2','3','4','5'],
['lisa','4','4','4','4']]
student_scores = defaultdict(Counter)
score_titles = csvdata[0][1:]
for row in csvdata[1:]:
    student = row[0]
    scores = dict(zip(score_titles, map(int, row[1:])))
    student_scores[student] += Counter(scores)
print(student_scores["Mike"])
# >>> Counter({'score3':10, 'score1':9, 'score4':3, 'score2':2})
collections.defaultdict

Expanding a block of numbers in Python

Before I asked, I did some googling, and was unable to find an answer.
The scenario I have is this:
A list of numbers are passed to the script, either \n-delimited via a file, or comma-delimited via a command line arg. The numbers can be singular, or in blocks, like so:
File:
1
2
3
7-10
15
20-25
Command Line Arg:
1, 2, 3, 7-10, 15, 20-25
Both end up in the same list[]. I would like to expand the 7-10 or 20-25 blocks (obviously in the actual script these numbers will vary) and append them onto a new list with the final list looking like this:
['1','2','3','7','8','9','10','15','20','21','22','23','24','25']
I understand that something like .append(range(7,10)) could help me here, but I can't seem to be able to find out which elements of the original list[] have the need for expansion.
So, my question is this:
Given a list[]:
['1','2','3','7-10','15','20-25'],
how can I get a list[]:
['1','2','3','7','8','9','10','15','20','21','22','23','24','25']
So let's say you're given the list:
L = ['1','2','3','7-10','15','20-25']
and you want to expand out all the ranges contained therein:
answer = []
for elem in L:
    if '-' not in elem:
        answer.append(elem)
        continue
    start, end = elem.split('-')
    answer.extend(map(str, range(int(start), int(end)+1)))
Of course, there's a handy one-liner for this:
answer = list(itertools.chain.from_iterable([[e] if '-' not in e else map(str, range(*[int(i) for i in e.split('-')]) + [int(i)]) for e in L]))
But this exploits the nature of leaky variables in python2.7, which I don't think will work in python3. Also, it's not exactly the most readable line of code. So I wouldn't really use it in production, if I were you... unless you really hate your manager.
References:  append()  continue  split()  extend()  map()  range()  list()  itertools.chain.from_iterable()  int()
Input:
arg = ['1','2','3','7-10','15','20-25']
Output:
out = []
for s in arg:
    a, b, *_ = map(int, s.split('-') * 2)
    out.extend(map(str, range(a, b+1)))
Or (in Python 2):
out = []
for s in arg:
    r = map(int, s.split('-'))
    out.extend(map(str, range(r[0], r[-1]+1)))
Good old map + reduce will come handy:
>>> elements = ['1','2','3','7-10','15','20-25']
>>> reduce(lambda original_list, element_list: original_list + map(str, element_list), [[element] if '-' not in element else range(*map(int, element.split('-'))) for element in elements])
['1', '2', '3', '7', '8', '9', '15', '20', '21', '22', '23', '24']
Well that would do the trick except that you want 20-25 to also contain 25... so here comes even more soup:
reduce(
    lambda original_list, element_list: original_list + map(str, element_list),
    [[element] if '-' not in element
     else range(int(element.split('-')[0]), int(element.split('-')[1]) + 1)
     for element in elements])
Now even though this works (in Python 2, where map returns a list), you are probably better off with a plain for loop. That is one reason reduce was moved out of the built-ins (into functools) in Python 3.
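For comparison, a plain for-loop version that runs on Python 3. It is a sketch that assumes every entry is either a single non-negative number or a block of the form 'a-b' with a <= b; the function name expand is hypothetical:

```python
def expand(items):
    """Expand 'a-b' entries into every number from a to b inclusive, as strings."""
    out = []
    for item in items:
        # partition returns (before, separator, after); separator is '' if no dash.
        start, sep, end = item.partition("-")
        if sep:   # a dash was found, so this entry is a block such as '7-10'
            out.extend(str(n) for n in range(int(start), int(end) + 1))
        else:
            out.append(item)
    return out

print(expand(['1', '2', '3', '7-10', '15', '20-25']))
# ['1', '2', '3', '7', '8', '9', '10', '15', '20', '21', '22', '23', '24', '25']
```

Negative numbers would need a different separator or a regular expression, since a leading minus sign would be mistaken for the block dash.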

How to convert strings to ints in a nested list? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to convert strings into integers in python?
listy = [['1', '2', '3', '4', '5'], "abc"]
for item in listy[0]:
    int(item)
print listy
In a nested list, how can I change all those strings to ints? What's above gives me an output of:
[['1', '2', '3', '4', '5'], 'abc']
Why is that?
Thanks in advance!
You need to assign the converted items back to the sub-list (listy[0]):
listy[0][:] = [int(x) for x in listy[0]]
Explanation:
for item in listy[0]:
    int(item)
The above iterates over the items in the sub-list and converts them to integers, but it does not assign the result of the expression int(item) to anything. Therefore the result is lost.
[int(x) for x in listy[0]] is a list comprehension (a kind of shorthand for your for loop) that iterates over the list, converts each item to an integer, and returns a new list. The new list is then assigned back to the sub-list; the slice assignment listy[0][:] = ... replaces its contents in place, which is optional (plain listy[0] = ... would also work).
This is a very custom solution for your specific question. A more general solution involves recursion to get at the sub-lists, and some way of detecting the candidates for numeric conversion.
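A rough sketch of what such a recursive version might look like. It assumes that nesting uses only lists and that "candidate for conversion" means any string int() accepts; the function name to_ints is hypothetical:

```python
def to_ints(value):
    """Recursively convert integer-like strings inside nested lists to ints."""
    if isinstance(value, list):
        # Recurse into sub-lists and build a new list (the input is not modified).
        return [to_ints(v) for v in value]
    if isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            return value   # not a number (e.g. "abc"), so leave it unchanged
    return value           # any other type passes through untouched

listy = [['1', '2', '3', '4', '5'], "abc"]
print(to_ints(listy))  # [[1, 2, 3, 4, 5], 'abc']
```

Unlike the slice assignment above, this returns a new structure rather than changing listy in place.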
