Expanding a block of numbers in Python - python

Before I asked, I did some googling, and was unable to find an answer.
The scenario I have is this:
A list of numbers is passed to the script, either \n-delimited via a file or comma-delimited via a command-line argument. The numbers can be single values or blocks (ranges), like so:
File:
1
2
3
7-10
15
20-25
Command Line Arg:
1, 2, 3, 7-10, 15, 20-25
Both end up in the same list[]. I would like to expand the 7-10 or 20-25 blocks (obviously in the actual script these numbers will vary) and append them onto a new list with the final list looking like this:
['1','2','3','7','8','9','10','15','20','21','22','23','24','25']
I understand that something like .append(range(7,10)) could help me here, but I can't work out how to tell which elements of the original list[] need expanding.
So, my question is this:
Given a list[]:
['1','2','3','7-10','15','20-25'],
how can I get a list[]:
['1','2','3','7','8','9','10','15','20','21','22','23','24','25']

So let's say you're given the list:
L = ['1','2','3','7-10','15','20-25']
and you want to expand out all the ranges contained therein:
answer = []
for elem in L:
    if '-' not in elem:
        answer.append(elem)
        continue
    start, end = elem.split('-')
    answer.extend(map(str, range(int(start), int(end)+1)))
Of course, there's a handy one-liner for this:
answer = list(itertools.chain.from_iterable([[e] if '-' not in e else map(str, range(*[int(i) for i in e.split('-')]) + [int(i)]) for e in L]))
But this relies on Python 2.7 leaking the list-comprehension variable i into the enclosing scope, so it will not work in Python 3, where comprehension variables are local and range() no longer returns a list. Also, it's not exactly the most readable line of code. So I wouldn't really use it in production, if I were you... unless you really hate your manager.
References:  append()  continue  split()  extend()  map()  range()  list()  itertools.chain.from_iterable()  int()
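For readers on Python 3, the loop above can be wrapped in a small function. This is only a sketch: the name expand_ranges is made up for illustration, and it assumes every entry is either a single non-negative integer or a "start-end" block, possibly with surrounding spaces (as left over from splitting a command-line argument on commas).

```python
def expand_ranges(items):
    """Expand entries such as '7-10' into '7', '8', '9', '10'."""
    expanded = []
    for item in items:
        # partition() returns (start, '-', end) when a dash is present,
        # otherwise (item, '', '').
        start, dash, end = item.strip().partition('-')
        if dash:
            # +1 because range() excludes its upper bound.
            expanded.extend(str(n) for n in range(int(start), int(end) + 1))
        else:
            expanded.append(start)
    return expanded


print(expand_ranges(['1', '2', '3', '7-10', '15', '20-25']))
# ['1', '2', '3', '7', '8', '9', '10', '15', '20', '21', '22', '23', '24', '25']
```

Negative numbers would need a different separator or a regex, since the dash is doing double duty there.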

Input:
arg = ['1','2','3','7-10','15','20-25']
Output:
out = []
for s in arg:
    a, b, *_ = map(int, s.split('-') * 2)
    out.extend(map(str, range(a, b+1)))
Or (in Python 2):
out = []
for s in arg:
    r = map(int, s.split('-'))
    out.extend(map(str, range(r[0], r[-1]+1)))

Good old map + reduce will come handy:
>>> elements = ['1','2','3','7-10','15','20-25']
>>> reduce(lambda original_list, element_list: original_list + map(str, element_list), [[element] if '-' not in element else range(*map(int, element.split('-'))) for element in elements])
['1', '2', '3', '7', '8', '9', '15', '20', '21', '22', '23', '24']
Well that would do the trick except that you want 20-25 to also contain 25... so here comes even more soup:
reduce(
    lambda original_list, element_list: original_list + map(str, element_list),
    [[element] if '-' not in element
     else range(int(element.split('-')[0]), int(element.split('-')[1]) + 1)
     for element in elements])
Even though this works, you are probably better off with a plain for loop. Readability is one reason why reduce was moved out of the builtins and into the functools module in Python 3.
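In Python 3, reduce is available as functools.reduce and map() returns an iterator, so the snippets above need small changes. A possible adaptation, with a hypothetical helper to_strings that turns each element into a list of strings before the chunks are concatenated:

```python
from functools import reduce

elements = ['1', '2', '3', '7-10', '15', '20-25']


def to_strings(element):
    """Return a list of strings for a single number or an inclusive block."""
    if '-' not in element:
        return [element]
    start, end = map(int, element.split('-'))
    return [str(n) for n in range(start, end + 1)]


# Start from an empty list and concatenate each chunk onto the accumulator.
expanded = reduce(lambda acc, chunk: acc + chunk, map(to_strings, elements), [])
print(expanded)
```

The explicit initial value [] also keeps the call from failing on an empty input list.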

Related

sort list elements in python in ascending order [duplicate]

I have dict in Python with keys of the following form:
mydict = {'0' : 10,
'1' : 23,
'2.0' : 321,
'2.1' : 3231,
'3' : 3,
'4.0.0' : 1,
'4.0.1' : 10,
'5' : 11,
# ... etc
'10' : 32,
'11.0' : 3,
'11.1' : 243,
'12.0' : 3,
'12.1.0': 1,
'12.1.1': 2,
}
Some of the indices have no sub-values, some have one level of sub-values and some have two. If I only had one sub-level I could treat them all as numbers and sort numerically. The second sub-level forces me to handle them all as strings. However, if I sort them like strings I'll have 10 following 1 and 20 following 2.
How can I sort the indices correctly?
Note: What I really want to do is print out the dict sorted by index. If there's a better way to do it than sorting it somehow that's fine with me.
You can sort the keys the way that you want, by splitting them on '.' and then converting each of the components into an integer, like this:
sorted(mydict.keys(), key=lambda a: list(map(int, a.split('.'))))
which returns this:
['0',
'1',
'2.0',
'2.1',
'3',
'4.0.0',
'4.0.1',
'5',
'10',
'11.0',
'11.1',
'12.0',
'12.1.0',
'12.1.1']
You can iterate over that list of keys, and pull the values out of your dictionary as needed.
You could also sort the result of mydict.items(), very similarly:
sorted(mydict.items(), key=lambda a: list(map(int, a[0].split('.'))))
This gives you a sorted list of (key, value) pairs, like this:
[('0', 10),
('1', 23),
('2.0', 321),
('2.1', 3231),
('3', 3),
# ...
('12.1.1', 2)]
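Since the stated goal is to print the dict sorted by index, here is a short Python 3 sketch of the whole thing. The helper name index_key and the shortened dict are only for illustration; the key is converted to a tuple of integers so that it can be compared.

```python
mydict = {'0': 10, '1': 23, '2.0': 321, '10': 32, '12.1.0': 1, '4.0.1': 10}


def index_key(key):
    """Turn '12.1.0' into (12, 1, 0) so keys compare numerically."""
    return tuple(int(part) for part in key.split('.'))


# Iterating over a dict yields its keys, so sorted(mydict, ...) sorts the keys.
for key in sorted(mydict, key=index_key):
    print(key, mydict[key])
```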
Python's sorting functions can take a custom compare function, so you just need to define a function that compares keys the way you like:
def version_cmp(a, b):
    '''These keys just look like version numbers to me....'''
    ai = map(int, a.split('.'))
    bi = map(int, b.split('.'))
    return cmp(ai, bi)
for k in sorted(mydict.keys(), version_cmp):
    print k, mydict[k]
In this case you would do better to use the key parameter to sorted(), though. See Ian Clelland's answer for an example of that.
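The compare-function version above is Python 2 only, because cmp() and the positional compare argument to sorted() no longer exist in Python 3. If a compare function is still wanted there, functools.cmp_to_key can wrap it. A sketch, with a sample list of keys made up for the example:

```python
from functools import cmp_to_key


def version_cmp(a, b):
    """Return -1, 0 or 1, comparing dotted keys component by component."""
    ai = [int(part) for part in a.split('.')]
    bi = [int(part) for part in b.split('.')]
    # (x > y) - (x < y) is the usual replacement for Python 2's cmp(x, y).
    return (ai > bi) - (ai < bi)


keys = ['10', '2.1', '2.0', '1', '12.1.0', '4.0.1']
ordered = sorted(keys, key=cmp_to_key(version_cmp))
print(ordered)
```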
As an addendum to Ian Clelland's answer, the map() call can be replaced with a list comprehension... if you prefer that style. It may also be more efficient (though negligibly in this case I suspect).
sorted(mydict.keys(), key=lambda a: [int(i) for i in a.split('.')])
For fun & usefulness (for googling ppl, mostly):
import re
f = lambda i: [int(j) if re.match(r"[0-9]+", j) else j for j in re.findall(r"([0-9]+|[^0-9]+)", i)]
cmpg = lambda x, y: cmp(f(x), f(y))
Use as sorted(list, cmp=cmpg) (Python 2 only, since cmp() and the cmp= argument were removed in Python 3).
Additionally, regexes might be pre-compiled (rarely necessary though, actually, with re module's caching).
And, it may be (easily) modified, for example, to include negative values (add -? to num regex, probably) and/or to use float values.
It might be not very efficient, but even with that it's quite useful.
And, uhm, f can be used directly as key= for sorted() too.
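A Python 3 take on the same idea, as a key function. Python 3 refuses to compare int with str, so this sketch uses re.split with a capturing group: the result then always alternates text, number, text, and so on, and the same positions hold the same types for every input. The name natural_key is just illustrative.

```python
import re

_DIGITS = re.compile(r'(\d+)')


def natural_key(text):
    """Split text into alternating non-digit and digit chunks for sorting."""
    parts = _DIGITS.split(text)
    # With one capturing group, odd positions are always the digit runs.
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


print(sorted(['file10', 'file2', 'file1'], key=natural_key))
# ['file1', 'file2', 'file10']
```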
I would do a search on "sorting a python dictionary" and take a look at the answers. I would give PEP-265 a read as well. The sorted() function is what you are looking for.
There is a nice sorting HOWTO on the python web site: http://wiki.python.org/moin/HowTo/Sorting .
It makes a good introduction to sorting, and discusses different techniques to adapt the sorting result to your needs.

List comprehension with duplicated function call [duplicate]

I want to transform a string such as following:
' 1 , 2 , , , 3 '
into a list of non-empty elements:
['1', '2', '3']
My solution is this list comprehension:
print [el.strip() for el in mystring.split(",") if el.strip()]
Just wonder, is there a nice, pythonic way to write this comprehension without calling el.strip() twice?
You can use a generator inside the list comprehension:
[x for x in (el.strip() for el in mystring.split(",")) if x]
#           \___________________ ____________________/
#                               v
#                      internal generator
The generator thus will provide stripped elements, and we iterate over the generator, and only check the truthiness. We thus save on el.strip() calls.
You can also use map(..) for this (making it more functional):
[x for x in map(str.strip, mystring.split(",")) if x]
#           \________________ ________________/
#                            v
#                           map
But this is basically the same (although the logic of the generator is - in my opinion - better encapsulated).
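On Python 3.8 or later there is one more option that none of the answers here could use at the time: an assignment expression (the := operator) binds the stripped value inside the condition, so strip() runs once per element. A minimal sketch:

```python
mystring = ' 1 , 2 , , , 3 '

# x is assigned in the filter clause and reused as the output expression.
non_empty = [x for el in mystring.split(',') if (x := el.strip())]
print(non_empty)  # ['1', '2', '3']
```

Whether this reads better than the nested generator is a matter of taste.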
As a simple alternative to get a list of non-empty elements (in addition to previous good answers):
import re
s = ' 1 , 2 , , , 3 '
print(re.findall(r'[^\s,]+', s))
The output:
['1', '2', '3']
How about some regex to extract all the numbers from the string
import re
a = ' 1 , 2 , , , 3 '
print(re.findall(r'\d+', a))
Output:
['1', '2', '3']
In just one line of code, that's about as terse as you're going to get. Of course, if you want to get fancy you can try the functional approach:
filter(lambda x: x, map(lambda x: x.strip(), mystring.split(',')))
But this gets you terseness at the expense of readability.
Go full functional with map and filter by using:
s = ' 1 , 2 , , , 3 '
res = filter(None, map(str.strip, s.split(',')))
Though similar to @omu_negru's answer, this avoids lambdas, which are arguably pretty ugly and also slow things down.
Passing None to filter means filtering on truthiness, essentially x for x in iterable if x, while map applies the method str.strip (which strips whitespace by default) to each item of the iterable obtained from s.split(',').
On Python 2, where filter still returns a list, this approach should easily edge out the other approaches in speed.
In Python 3 one would have to use:
res = [*filter(None, map(str.strip, s.split(',')))]
in order to get the list back.
If you have imported "re", then re.split() will work:
import re
s=' 1 , 2 , , , 3 '
print ([el for el in re.split(r"[, ]+",s) if el])
['1', '2', '3']
If strings separated by only spaces (with no intervening comma) should not be separated, then this will work:
import re
s=' ,,,,, ,,,, 1 , 2 , , , 3,,,,,4 5, 6 '
print ([el for el in re.split(r"\s*,\s*",s.strip()) if el])
['1', '2', '3', '4 5', '6']
List comprehensions are wonderful, but it's not illegal to use more than one line of code! You could even - heaven forbid - use a for loop!
result = []
for el in mystring.split(","):
    x = el.strip()
    if x:
        result.append(x)
Here's a two-line version. It's actually the same as the accepted answer by Willem Van Onsem, but with a name given to a subexpression (and a generator changed to a list but it makes essentially no difference for a problem this small). In my view, this makes it a lot easier to read, despite taking fractionally more code.
all_terms = [el.strip() for el in mystring.split(",")]
non_empty_terms = [x for x in all_terms if x]
Some of the other answers are certainly shorter, but I'm not convinced any of them are simpler/easier to understand. Actually, I think the best answer is just the one in your question, because the repetition in this case is quite minor.

Python issue with list and join function

How do I append two-digit integers to a list using a for loop without splitting them? For example, I give the computer 10,14,13,15 and I get something like 1,0,1,4,1,3,1,5. I tried to work around this, but I ended up with a new issue: TypeError: sequence item 0: expected string, int found
def GetNumbers(List):
    q=[]
    Numberlist = []
    for i in List:
        if i.isdigit():
            q.append(int(i))
        else:
            Numberlist.append(''.join(q[:]))
            del q[:]
    return Numberlist
The ideal way is to use the str.split() function:
>>> my_num_string = "10,14,13,15"
>>> my_num_string.split(',')
['10', '14', '13', '15']
But, since you mentioned you cannot use split(), you may use a regular expression to extract the numbers from the string:
>>> import re
>>> re.findall(r'\d+', my_num_string)
['10', '14', '13', '15']
Otherwise, if you do not want to use anything fancy, you can do it with a simple for loop:
num_str, num_list = '', []
#  ^ Needed for storing the state of number while iterating over
#    the string character by character
for c in my_num_string:
    if c.isdigit():
        num_str += c
    elif num_str:
        num_list.append(num_str)
        num_str = ''
# Append the last number, which is not followed by a separator
if num_str:
    num_list.append(num_str)
The numbers in num_list will be in the form of str. In order to convert them to int, you may explicitly convert them as:
num_list = [int(i) for i in num_list] # OR, list(map(int, num_list))
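The same character-by-character grouping can also be written with itertools.groupby, which collects consecutive characters that share the same str.isdigit() result. This is a sketch only; get_numbers is an illustrative name, and it assumes anything that is not a digit acts as a separator.

```python
from itertools import groupby


def get_numbers(text):
    """Return the runs of digits in text as a list of ints."""
    return [
        int(''.join(run))
        for is_digit, run in groupby(text, key=str.isdigit)
        if is_digit  # skip the runs of separator characters
    ]


print(get_numbers("10,14,13,15"))  # [10, 14, 13, 15]
```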

python add specific lists within a list

For this problem I am dealing with a big list that was imported from a CSV file, but let's say
I have a list like this:
[['name','score1','score2','score3','score4'],
 ['Mike','5','1','6','2'],
 ['Mike','1','1','1','1'],
 ['Mike','3','0','3','0'],
 ['jose','0','1','2','3'],
 ['jose','2','3','4','5'],
 ['lisa','4','4','4','4']]
and I want another list of this form (the sum of all scores for each student):
[['Mike','9','2','10','3'],
 ['jose','2','4','6','8'],
 ['lisa','4','4','4','4']]
Any ideas how this can be done?
I've been trying many ways, and I could not make it work.
I got stuck when there were more than two rows with the same name: my solution only kept the last two rows to add.
I am new to Python and programming in general.
If you are just learning Python, I always recommend trying to implement things without relying on external libraries. A good first step is to break the problem up into smaller components:
1. Remove the first entry (the column titles) from the input list. You don't need it for your result.
2. For each remaining entry:
   a. Convert every value except the first to an integer (so you can add them).
   b. Determine if you have already encountered an entry with the same name (first column value). If not, add the entry to the output list. Otherwise, merge the entry with the one already in the output list (by adding the values in each column).
One possible implementation follows (untested):
input_list = [['name','score1','score2','score3','score4'],
              ['Mike','5','1','6','2'],
              ['Mike','1','1','1','1'],
              ['Mike','3','0','3','0'],
              ['jose','0','1','2','3'],
              ['jose','2','3','4','5'],
              ['lisa','4','4','4','4']]
print(input_list)
# Remove the first element
input_list = input_list[1:]
# Initialize an empty output list
output_list = []
# Iterate through each entry in the input
for val in input_list:
    # Determine if key is already in output list
    for ent in output_list:
        if ent[0] == val[0]:
            # The value is already in the output list (so merge them)
            for i in range(1, len(ent)):
                # We convert to int and back to str
                # This could be done elsewhere (or not at all...)
                ent[i] = str(int(ent[i]) + int(val[i]))
            break
    else:
        # The value wasn't in the output list (so add it)
        # This is a useful feature of the for loop, the following
        # is only executed if the break command wasn't reached above
        output_list.append(val)
#print(input_list)
print(output_list)
The above is not as efficient as using a dictionary or importing a library that can perform the same operation in a couple of lines, however it demonstrates a few features of the language. Be careful when working with lists though, the above modifies the input list (try un-commenting the print statement for the input list at the end).
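For comparison, a dictionary-based sketch of the same steps (Python 3; variable names are illustrative). It builds new lists instead of merging into the input rows, so the input is left untouched, and it relies on dicts preserving insertion order (Python 3.7+) to keep the students in their original order.

```python
rows = [['name', 'score1', 'score2', 'score3', 'score4'],
        ['Mike', '5', '1', '6', '2'],
        ['Mike', '1', '1', '1', '1'],
        ['Mike', '3', '0', '3', '0'],
        ['jose', '0', '1', '2', '3'],
        ['jose', '2', '3', '4', '5'],
        ['lisa', '4', '4', '4', '4']]

totals = {}
for name, *scores in rows[1:]:  # rows[1:] skips the header row
    values = [int(score) for score in scores]
    if name in totals:
        # Add column by column to the running totals for this student.
        totals[name] = [a + b for a, b in zip(totals[name], values)]
    else:
        totals[name] = values

result = [[name] + [str(total) for total in sums] for name, sums in totals.items()]
print(result)
```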
Let us say you have
In [45]: temp
Out[45]:
[['Mike', '5', '1', '6', '2'],
['Mike', '1', '1', '1', '1'],
['Mike', '3', '0', '3', '0'],
['jose', '0', '1', '2', '3'],
['jose', '2', '3', '4', '5'],
['lisa', '4', '4', '4', '4']]
Then, you can use Pandas ...
import pandas as pd
temp = pd.DataFrame(temp)
def test(m):
    try:
        return int(m)
    except ValueError:
        return m
temp = temp.applymap(test)
print(temp.groupby(0).agg(sum))
If you are importing it from a CSV file, you can read the file directly using pd.read_csv.
You could use a better solution, as suggested, but if you'd like to implement it yourself and learn, you can follow along; I will explain in the comments:
# utilities for iteration. groupby makes groups from a collection
from itertools import groupby
# implementation of common, simple operations such as
# multiplication, getting an item from a list
from operator import itemgetter
def my_sum(groups):
    return [
        ls[0] if i == 0 else str(sum(map(int, ls)))  # keep first one since it's name, sum otherwise
        for i, ls in enumerate(zip(*groups))  # transpose elements and give number to each
    ]
# list comprehension to make a list from another list
# group lists according to first element and apply our function on grouped elements
# groupby reveals group key and elements but key isn't needed so it's set to underscore
result = [my_sum(g) for _, g in groupby(ls, key=itemgetter(0))]
To understand this code, you need to know about list comprehension, * operator, (int, enumerate, map, str, zip) built-ins and some handy modules, itertools and operator.
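You edited the question to add a header row, which will break this code, so we need to skip it by passing ls[1:] to groupby instead of ls. Note that groupby only merges adjacent rows, so rows with the same name must be next to each other. Hope it helps.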
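itertools.groupby only groups consecutive items, so if rows for the same student might be scattered through the file, sorting by name first is one way to handle it. A sketch with made-up, deliberately interleaved rows; sum_group is a hypothetical variant of my_sum:

```python
from itertools import groupby
from operator import itemgetter

rows = [['jose', '0', '1'],
        ['Mike', '5', '1'],
        ['jose', '2', '3'],
        ['Mike', '1', '1']]


def sum_group(group):
    """Keep the name and sum every remaining column of the grouped rows."""
    columns = list(zip(*group))  # transpose rows into columns
    return [columns[0][0]] + [str(sum(map(int, col))) for col in columns[1:]]


# sorted() is stable, so rows keep their relative order within each name.
ordered = sorted(rows, key=itemgetter(0))
result = [sum_group(g) for _, g in groupby(ordered, key=itemgetter(0))]
print(result)
```

Sorting changes the output order to plain string order (uppercase names before lowercase ones), which may or may not matter for the report.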
As a beginner, I would consider turning your data into a simpler structure like a dictionary, so that you are just summing a list of lists. Assuming you get rid of the header row, you can turn the data into a dictionary:
>>> data_dict = {}
>>> for row in data:
...     data_dict.setdefault(row[0], []).append([int(i) for i in row[1:]])
>>> data_dict
{'Mike': [[5, 1, 6, 2], [1, 1, 1, 1], [3, 0, 3, 0]],
'jose': [[0, 1, 2, 3], [2, 3, 4, 5]],
'lisa': [[4, 4, 4, 4]]}
Now it should be relatively easy to loop over the dict and sum up the lists (you may want to look at sum and zip as a way to do that).
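One way that last step could look, as a sketch: zip(*rows) transposes each student's rows into columns, and sum adds up each column.

```python
data_dict = {'Mike': [[5, 1, 6, 2], [1, 1, 1, 1], [3, 0, 3, 0]],
             'jose': [[0, 1, 2, 3], [2, 3, 4, 5]],
             'lisa': [[4, 4, 4, 4]]}

# For each student, zip(*rows) yields one tuple per score column.
totals = {name: [sum(column) for column in zip(*rows)]
          for name, rows in data_dict.items()}
print(totals)
```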
This is well suited for collections.Counter
from collections import Counter, defaultdict
csvdata = [['name','score1','score2','score3','score4'],
['Mike','5','1','6','2'],
['Mike','1','1','1','1'],
['Mike','3','0','3','0'],
['jose','0','1','2','3'],
['jose','2','3','4','5'],
['lisa','4','4','4','4']]
student_scores = defaultdict(Counter)
score_titles = csvdata[0][1:]
for row in csvdata[1:]:
    student = row[0]
    scores = dict(zip(score_titles, map(int, row[1:])))
    student_scores[student] += Counter(scores)
print(student_scores["Mike"])
# >>> Counter({'score3':10, 'score1':9, 'score4':3, 'score2':2})
collections.defaultdict

Converting a string to a list of 2-tuples

I have strings of this shape:
d="M 997.14282,452.3622 877.54125,539.83678 757.38907,453.12006 802.7325,312.0516 950.90847,311.58322 Z"
which are (x, y) coordinates of a pentagon (the first and last letters are metadata and to be ignored). What I want is a list of 2-tuples that would represent the coordinates in floating points without all the cruft:
d = [(997.14282, 452.3622), (877.54125, 539.83678), (757.38907, 453.12006), (802.7325,312.0516), (950.90847, 311.58322)]
Trimming the string was easy:
>>> d.split()[1:-2]
['997.14282,452.3622', '877.54125,539.83678', '757.38907,453.12006', '802.7325,312.0516']
but now I want to create the tuples in a succinct way. This obviously didn't work:
>>> tuple('997.14282,452.3622')
('9', '9', '7', '.', '1', '4', '2', '8', '2', ',', '4', '5', '2', '.', '3', '6', '2', '2')
Taking the original string, I could write something like this:
def coordinates(d):
    list_of_coordinates = []
    d = d.split()[1:-2]
    for elem in d:
        l = elem.split(',')
        list_of_coordinates.append((float(l[0]), float(l[1])))
    return list_of_coordinates
which works fine:
>>> coordinates("M 997.14282,452.3622 877.54125,539.83678 757.38907,453.12006 802.7325,312.0516 950.90847,311.58322 Z")
[(997.14282, 452.3622), (877.54125, 539.83678), (757.38907, 453.12006), (802.7325, 312.0516)]
However this processing is a small and trivial part of a bigger program and I'd rather keep it as short and succinct as possible. Can anyone please show me a less verbose way to convert the string to the list of 2-tuples?
A note, in case this is not intended: when you do d.split()[1:-2], you lose the last coordinate. Assuming that is not intentional, a one-liner for this would be:
def coordinates1(d):
    return [tuple(map(float, coords.split(','))) for coords in d.split()[1:-1]]
If losing the last coordinate is intentional, use [1:-2] in the above code.
You can do this in one line using list comprehension.
x = [tuple(float(j) for j in i.split(",")) for i in d.split()[1:-2]]
This goes through d.split()[1:-2] (each pair that should be grouped together), splits each pair on the comma, converts each item to a float, and groups the results into a tuple.
Also, you might want to use d.split()[1:-1], because using -2 cuts off the last pair of coordinates.
What you have works, but it could be compressed a little using a list comprehension or some functional tools (I mean map):
def coordinates(d):
    pairs = d[2:-2].split()  # yeah, split here into pairs
    tokens = map(str.split, pairs, "," * len(pairs))  # another split, into tokens
    # here we multiplied the "," string to get an iterable of the right size
    return [tuple(map(float, t)) for t in tokens]  # convert the tokens to floats
                                                   # and build the tuples
Of course it can be reduced to a one-liner with a list comprehension, but readability would suffer (and the code is already hard to read as it is). And although Guido hates functional programming, I find this more logical... after some practice. Good luck!
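Another possible approach, not taken by any of the answers above, is to pull every number out with a regular expression and pair them up. This sketch assumes plain decimal numbers (no exponent notation) and that the numbers always come in x,y pairs; the leading and trailing letters are simply never matched.

```python
import re

d = "M 997.14282,452.3622 877.54125,539.83678 757.38907,453.12006 802.7325,312.0516 950.90847,311.58322 Z"


def coordinates(path):
    """Return the (x, y) float pairs found in the path string."""
    numbers = [float(n) for n in re.findall(r'-?\d+(?:\.\d+)?', path)]
    # Even positions are x values, odd positions are y values.
    return list(zip(numbers[0::2], numbers[1::2]))


points = coordinates(d)
print(points)
```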
