Creating a diff array using lambda functions in python - python

I wish to create a diff array in python as follows
>>> a = [1,5,3,8,2,4,7,6]
>>> diff = []
>>> a = sorted(a,reverse=True)
>>> for i in xrange(len(a)-1):
...     diff.append(a[i]-a[i+1])
I wanted to refactor the above code and tried to do it with a lambda function, but failed to get the result.
>>> [i for i in lambda x,y:y-x,sorted(a,reverse=True)]
The above code returns
[<function <lambda> at 0x00000000023B9C18>, [1, 2, 3, 4, 5, 6, 7, 8]]
Can the required functionality be achieved using lambda functions or some other technique?
Thanks in advance for any help!!
NOTES:
1) Array 'a' can be huge. Just for the sake of example I have taken a small array.
2) The result must be achieved in minimum time.

If you can use numpy:
import numpy as np
a = [1,5,3,8,2,4,7,6]
j = np.diff(np.sort(a)) # array([1, 1, 1, 1, 1, 1, 1])
print list(j)
# [1, 1, 1, 1, 1, 1, 1]
k = np.diff(a) # array([ 4, -2, 5, -6, 2, 3, -1])
print list(k)
# [4, -2, 5, -6, 2, 3, -1]
Timing comparisons with one-hundred-thousand random ints - numpy is faster if the data needs to be sorted:
import random
from timeit import Timer
a = [random.randint(0, 1000000) for _ in xrange(100000)]
##print a[:100]
def foo(a):
    a = sorted(a, reverse=True)
    return [a[i]-a[i+1] for i in xrange(len(a)-1)]
def bar(a):
    return np.diff(np.sort(a))
t = Timer('foo(a)', 'from __main__ import foo, bar, np, a')
print t.timeit(10)
# 0.86916993838
t = Timer('bar(a)', 'from __main__ import foo, bar, np, a')
print t.timeit(10)
# 0.28586356791

You can use list comprehension, as follows:
>>> a = sorted([1,5,3,8,2,4,7,6], reverse=True)
>>> diff = [a[i]-a[i+1] for i in xrange(len(a)-1)]
>>> diff
[1, 1, 1, 1, 1, 1, 1]
>>>
You said or any other technique, so I take this to be valid. However, I haven't found a working lambda solution yet :)
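One way a lambda could be made to work is to map it over adjacent pairs. A minimal sketch, written for Python 3 (where map accepts several iterables); the name diff_lambda is only illustrative:

```python
a = sorted([1, 5, 3, 8, 2, 4, 7, 6], reverse=True)

# map() walks a and a[1:] in parallel and stops at the shorter one,
# so the lambda receives each adjacent pair (a[i], a[i+1]).
diff_lambda = list(map(lambda x, y: x - y, a, a[1:]))
print(diff_lambda)  # [1, 1, 1, 1, 1, 1, 1]
```

The slice a[1:] copies the list, so for a huge input the itertools or numpy approaches may be the better fit.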
Comparing the time of this answer with all of the below:
Mine:
1.59740447998e-05 seconds
#Marcin's
0.00110197067261 seconds
#roippi's
0.000382900238037
#wwii's
0.00154685974121
Therefore, mine was clearly the fastest, by more than a factor of twenty, followed by #roippi, then #Marcin, then #wwii.
P.S. I was completely unbiased here, my timing method was using current time.time() minus previous time.time().
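A single time.time() difference on an eight-element list is dominated by noise, so the numbers above should be read with care. A hedged sketch of a steadier measurement with the standard timeit module (Python 3; list_comp_diff is a hypothetical wrapper around the list comprehension, and the repeat counts are arbitrary):

```python
import timeit

a = [1, 5, 3, 8, 2, 4, 7, 6]

def list_comp_diff(values):
    # Sort descending, then subtract each element's right-hand neighbour.
    s = sorted(values, reverse=True)
    return [s[i] - s[i + 1] for i in range(len(s) - 1)]

# Run the call 10,000 times, repeat that 5 times, and keep the best total.
best = min(timeit.repeat(lambda: list_comp_diff(a), number=10000, repeat=5))
print(best)  # total seconds for 10,000 calls; varies by machine
```

Taking the minimum over several repeats reduces the influence of whatever else the machine happens to be doing.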

a = [1,5,3,8,2,4,7,6]
a = sorted(a,reverse=True)
Can't really improve these lines. You need to transform your data by sorting it, no sense changing what you've done.
from itertools import islice, izip, starmap
from operator import sub
list(starmap(sub,izip(a,a[1:])))
Out[12]: [1, 1, 1, 1, 1, 1, 1]
If a is really massive, you can replace the a[1:] slice with islice to save on memory overhead:
list(starmap(sub,izip(a,islice(a,1,None))))
Though if it is really that massive, you should probably be using numpy anyway.
np.diff(a) * -1
Out[24]: array([1, 1, 1, 1, 1, 1, 1])
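np.diff computes a[i+1] - a[i], which is why the result is negated for a descending array. A small self-contained sketch of the whole numpy route (Python 3; the names desc and diff are just for illustration):

```python
import numpy as np

a = np.array([1, 5, 3, 8, 2, 4, 7, 6])

# np.sort sorts ascending; reversing the result gives descending order.
desc = np.sort(a)[::-1]

# np.diff yields desc[i+1] - desc[i], so negate it to get desc[i] - desc[i+1].
diff = -np.diff(desc)
print(diff)  # [1 1 1 1 1 1 1]
```

Writing desc[:-1] - desc[1:] gives the same values without the negation.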

You could do as follows:
diff = [v[0] - v[1] for v in zip(sorted(a,reverse=True)[0:-1], sorted(a,reverse=True)[1:])]
#gives: diff = [1, 1, 1, 1, 1, 1, 1]
Though here you use sorting twice. Not sure if this matters to you or not.
As #aj8uppal suggested, it's better to sort a once beforehand, so in this case you do:
a = sorted([1,5,3,8,2,4,7,6], reverse=True)
diff = [v[0] - v[1] for v in zip(a[0:-1], a[1:])]
#gives: diff = [1, 1, 1, 1, 1, 1, 1]

Related

how to make sure that two numbers next to each other in a list are different

I have a simple code that generates a list of random numbers.
x = [random.randrange(0,11) for i in range(10)]
The problem I'm having is that, since it's random, it sometimes produces duplicate numbers right next to each other. How do I change the code so that it never happens? I'm looking for something like this:
[1, 7, 2, 8, 7, 2, 8, 2, 6, 5]
So that every time I run the code, all the numbers that are next to each other are different.
x = []
while len(x) < 10:
    r = random.randrange(0,11)
    if not x or x[-1] != r:
        x.append(r)
x[-1] contains the last inserted element, which we check is not the same as the new random number. With not x we check whether the list is still empty, since x[-1] would otherwise raise an IndexError during the first iteration of the loop.
Here's an approach that doesn't rely on retrying:
>>> import random
>>> x = [random.choice(range(12))]
>>> for _ in range(9):
...     x.append(random.choice([*range(x[-1]), *range(x[-1]+1, 12)]))
...
>>> x
[6, 2, 5, 8, 1, 8, 0, 4, 6, 0]
The idea is to choose each new number by picking from a list that excludes the previously picked number.
Note that having to re-generate a new list to pick from each time keeps this from actually being an efficiency improvement. If you were generating a very long list from a relatively short range, though, it might be worthwhile to generate different pools of numbers up front so that you could then select from the appropriate one in constant time:
>>> pool = [[*range(i), *range(i+1, 3)] for i in range(3)]
>>> x = [random.choice(random.choice(pool))]
>>> for _ in range(10000):
...     x.append(random.choice(pool[x[-1]]))
...
>>> x
[0, 2, 0, 2, 0, 2, 1, 0, 1, 2, 0, 1, 2, 1, 0, ...]
O(n) solution: add a random offset from [1, stop) to the last element, modulo stop
import random
x = [random.randrange(0,11)]
x.extend((x[-1]+random.randrange(1,11)) % 11 for i in range(9))
x
Output
[0, 10, 4, 5, 10, 1, 4, 8, 0, 9]
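Why the modulo trick cannot produce equal neighbours: an offset in [1, stop) is never a multiple of stop, so the new value always differs from the previous one. A sketch (Python 3) that spells this out with an explicit loop, so it does not depend on how extend consumes a generator; the variable stop and the length 1000 are chosen only for the demonstration:

```python
import random

stop = 11  # values are drawn from range(0, stop)
x = [random.randrange(stop)]
for _ in range(999):
    # An offset in [1, stop) can never map the previous value onto itself.
    x.append((x[-1] + random.randrange(1, stop)) % stop)

# No two neighbours are equal, and every value stays inside the range.
assert all(p != q for p, q in zip(x, x[1:]))
assert all(0 <= v < stop for v in x)
```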
from random import randrange
from itertools import islice, groupby
# Make an infinite amount of randrange's results available
pool = iter(lambda: randrange(0, 11), None)
# Use groupby to squash consecutive values into one and islice to at most 10 in total
result = [v for v, _ in islice(groupby(pool), 10)]
Function solution that doesn't iterate to check for repeats, just checks each add against the last number in the list:
import random
def get_random_list_without_neighbors(lower_limit, upper_limit, length):
    res = []
    # add the first number
    res.append(random.randrange(lower_limit, upper_limit))
    while len(res) < length:
        x = random.randrange(lower_limit, upper_limit)
        # check that the new number x doesn't match the last number in the list
        if x != res[-1]:
            res.append(x)
    return res
>>> print(get_random_list_without_neighbors(0, 11, 10))
[10, 1, 2, 3, 1, 8, 6, 5, 6, 2]
import random
def random_sequence_without_same_neighbours(n, min, max):
    x = [random.randrange(min, max + 1)]
    uniq_value_count = max - min + 1
    next_choices_count = uniq_value_count - 1
    for i in range(n - 1):
        circular_shift = random.randrange(0, next_choices_count)
        # Shift relative to min so the wrap-around also works when min != 0
        x.append(min + (x[-1] - min + circular_shift + 1) % uniq_value_count)
    return x
random_sequence_without_same_neighbours(n=10, min=0, max=10)
It's not very pythonic, but you can do something like this:
import random
def random_numbers_generator(n):
    "Generate a list of n random numbers without two equal numbers in a row."
    result = []
    # Loop until n numbers are collected, so a rejected draw does not shorten the list
    while len(result) < n:
        number = random.randint(1, n)
        if result and number == result[-1]:
            continue
        result.append(number)
    return result
print(random_numbers_generator(10))
Result:
[3, 6, 2, 4, 2, 6, 2, 1, 4, 7]

Reading a text document containing python list into a python program

I have a text file(dummy.txt) which reads as below:
['abc',1,1,3,3,0,0]
['sdf',3,2,5,1,3,1]
['xyz',0,3,4,1,1,1]
I expect this to be in lists in python as below:
article1 = ['abc',1,1,3,3,0,0]
article2 = ['sdf',3,2,5,1,3,1]
article3 = ['xyz',0,3,4,1,1,1]
As many articles have to be created as there are lines in dummy.txt.
I was trying the following things:
I opened the file, split it by '\n' and appended the pieces to an empty list in python. The result had extra quotes and square brackets, so I tried 'ast.literal_eval', which did not work either.
my_list = []
fvt = open("dummy.txt","r")
for line in fvt.read():
    my_list.append(line.split('\n'))
my_list = ast.literal_eval(my_list)
I also tried to manually remove additional quotes and extra square brackets using replace, that did not help me either. Any leads much appreciated.
This should help.
import ast
myLists = []
with open(filename) as infile:
    for line in infile: #Iterate Each line
        myLists.append(ast.literal_eval(line)) #Convert to python object and append.
print(myLists)
Output:
[['abc', 1, 1, 3, 3, 0, 0], ['sdf', 3, 2, 5, 1, 3, 1], ['xyz', 0, 3, 4, 1, 1, 1]]
fvt.read() returns the entire file as one string, so iterating over it makes line a single-character string, which will not work. You also call literal_eval(..) on the whole list of strings rather than on a single string.
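To see the difference concretely, here is a small sketch (Python 3) that uses io.StringIO as a stand-in for the real file; the sample text is made up for the demonstration:

```python
from io import StringIO

text = "['abc',1,1]\n['sdf',3,2]\n"

# Iterating over the string returned by read() yields single characters.
chars = [piece for piece in StringIO(text).read()]
# Iterating over the file object itself yields whole lines.
lines = [line for line in StringIO(text)]

print(chars[:3])  # ['[', "'", 'a']
print(lines)      # ["['abc',1,1]\n", "['sdf',3,2]\n"]
```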
You can obtain the results by iterating over the file handler, and each time call literal_eval(..) on a single line:
from ast import literal_eval
with open("dummy.txt","r") as f:
    my_list = [literal_eval(line) for line in f]
or by using map:
from ast import literal_eval
with open("dummy.txt","r") as f:
    my_list = list(map(literal_eval, f))
We then obtain:
>>> my_list
[['abc', 1, 1, 3, 3, 0, 0], ['sdf', 3, 2, 5, 1, 3, 1], ['xyz', 0, 3, 4, 1, 1, 1]]
ast.literal_eval is the right approach. Note that creating a variable number of variables like article1, article2, ... is not a good idea. Use a dictionary instead if your names are meaningful, a list otherwise.
As Willem mentioned in his answer fvt.read() will give you the whole file as one string. It is much easier to exploit the fact that files are iterable line-by-line. Keep the for loop, but get rid of the call to read.
Additionally,
my_list = ast.literal_eval(my_list)
is problematic because a) you evaluate the wrong data structure - you want to evaluate the line, not the list my_list to which you append and b) because you reassign the name my_list, at this point the old my_list is gone.
Consider the following demo. (Replace fake_file with the actual file you are opening.)
>>> from io import StringIO
>>> from ast import literal_eval
>>>
>>> fake_file = StringIO('''['abc',1,1,3,3,0,0]
... ['sdf',3,2,5,1,3,1]
... ['xyz',0,3,4,1,1,1]''')
>>> result = [literal_eval(line) for line in fake_file]
>>> result
[['abc', 1, 1, 3, 3, 0, 0], ['sdf', 3, 2, 5, 1, 3, 1], ['xyz', 0, 3, 4, 1, 1, 1]]
Of course, you could also use a dictionary to hold the evaluated lines:
>>> result = {'article{}'.format(i):literal_eval(line) for i, line in enumerate(fake_file, 1)}
>>> result
{'article2': ['sdf', 3, 2, 5, 1, 3, 1], 'article1': ['abc', 1, 1, 3, 3, 0, 0], 'article3': ['xyz', 0, 3, 4, 1, 1, 1]}
where now you can issue
>>> result['article2']
['sdf', 3, 2, 5, 1, 3, 1]
... but as these names are not very meaningful, I'd just go for the list instead which you can index with 0, 1, 2, ...
When I do this:
import ast
x = '[ "A", 1]'
x = ast.literal_eval(x)
print(x)
I get:
["A", 1]
So, your code should be:
for line in fvt:  # iterate over the file itself, not over fvt.read()
    my_list.append(ast.literal_eval(line))
Try this split (no imports needed) (I recommend):
with open('dummy.txt','r') as f:
    l=[i.strip()[1:-1].replace("'",'').split(',') for i in f]
Now:
print(l)
Is:
[['abc', '1', '1', '3', '3', '0', '0'], ['sdf', '3', '2', '5', '1', '3', '1'], ['xyz', '0', '3', '4', '1', '1', '1']]
Note that every element, including the numbers, comes back as a string.
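If the numbers are needed as ints without importing ast, a hand-rolled parser has to convert the fields itself. A rough sketch (Python 3), assuming no string field contains a comma or a quote character; parse_line is a hypothetical helper name and the sample lines are inlined instead of read from dummy.txt:

```python
def parse_line(line):
    # Drop the newline and the surrounding brackets, then split on commas.
    fields = line.strip()[1:-1].split(',')
    # Quoted fields stay strings (quotes removed); the rest become ints.
    return [f.strip("'") if f.startswith("'") else int(f) for f in fields]

sample = ["['abc',1,1,3,3,0,0]\n", "['xyz',0,3,4,1,1,1]"]
rows = [parse_line(line) for line in sample]
print(rows)  # [['abc', 1, 1, 3, 3, 0, 0], ['xyz', 0, 3, 4, 1, 1, 1]]
```

ast.literal_eval handles the awkward cases (commas inside strings, nested lists) that this sketch does not.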

numpy.searchsorted for multiple instances of the same entry - python

I have the following variables:
import numpy as np
gens = np.array([2, 1, 2, 1, 0, 1, 2, 1, 2])
p = [0,1]
I want to return the entries of gens that match each element of p.
So ideally I would like it to return:
result = [[4],[1,3,5,7],[0,2,6,8]]
#[[where matched 0], [where matched 1], [the rest]]
--
My attempts so far only work with one variable:
indx = gens.argsort()
res = np.searchsorted(gens[indx], [0])
indx[res] #gives 4, which is the position of 0
But when I try with
indx = gens.argsort()
res = np.searchsorted(gens[indx], [1])
indx[res] #gives 1, which is the position of the first 1.
So:
how can I search for an entry that has multiple occurrences
how can I search for multiple entries each of which have multiple occurrences?
You can use np.where
>>> np.where(gens == p[0])[0]
array([4])
>>> np.where(gens == p[1])[0]
array([1, 3, 5, 7])
>>> np.where((gens != p[0]) & (gens != p[1]))[0]
array([0, 2, 6, 8])
Or np.in1d and np.nonzero
>>> np.nonzero(np.in1d(gens, p[0]))[0]
>>> np.nonzero(np.in1d(gens, p[1]))[0]
>>> np.nonzero(~np.in1d(gens, p))[0]
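Putting the pieces together for an arbitrary list p, a sketch that builds the nested result from the question (Python 3). It uses np.isin, the newer spelling of np.in1d, and np.flatnonzero as shorthand for np.nonzero(...)[0]:

```python
import numpy as np

gens = np.array([2, 1, 2, 1, 0, 1, 2, 1, 2])
p = [0, 1]

# One index list per value in p ...
result = [np.flatnonzero(gens == value).tolist() for value in p]
# ... plus one for every entry that matches none of them.
result.append(np.flatnonzero(~np.isin(gens, p)).tolist())
print(result)  # [[4], [1, 3, 5, 7], [0, 2, 6, 8]]
```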

Extract a larger slice than the numpy array's size

I want to extract a slice of length 10, beginning at index 2, of a numpy array A:
import numpy
A = numpy.array([1,3,5,3,9])
def bigslice(A, begin_at, length):
    a = A[begin_at:begin_at + length]
    while len(a) + len(A) < length:
        a = numpy.concatenate((a,A))
    return numpy.concatenate((a, A[:length-len(a)]))
print bigslice(A, begin_at = 2, length = 10)
#[5,3,9,1,3,5,3,9,1,3]
This is correct, but I'm looking for a more efficient way to do it (especially since I'll end up with arrays of thousands of elements): I suspect that the concatenate used here creates lots of new temporary arrays, which would be inefficient.
How can I do the same thing more efficiently?
Since the middle part of the array is already known to you (i.e. n repetitions of the full array), you can simply construct the middle portion using np.tile:
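import numpy as np
def cyclical_slice(A, start, length):
    arr_l = len(A)
    head = A[start:start + length]              # from start up to the end of A
    remaining = length - len(head)              # elements still needed after the head
    middle = np.tile(A, remaining // arr_l)     # whole repetitions of A
    tail = A[:remaining % arr_l]                # leftover partial repetition
    return np.concatenate((head, middle, tail))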
Your code doesn't seem to guarantee that you get a slice of length length, e.g.
>>> A = numpy.array([1,3,5,3,9])
>>> bigslice(A, 0, 3)
array([1, 3, 5, 3, 9, 1, 3, 5])
Assuming that this is an oversight, maybe you could use np.pad, e.g.
def wpad(A, begin_at, length):
    to_pad = max(length + begin_at - len(A), 0)
    return np.pad(A, (0, to_pad), mode='wrap')[begin_at:begin_at+length]
which gives
>>> wpad(A, 0, 3)
array([1, 3, 5])
>>> wpad(A, 0, 10)
array([1, 3, 5, 3, 9, 1, 3, 5, 3, 9])
>>> wpad(A, 2, 10)
array([5, 3, 9, 1, 3, 5, 3, 9, 1, 3])
and so on.
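Another option, offered as a sketch rather than a benchmarked answer, is np.take with mode='wrap', which wraps out-of-range indices around the array (Python 3; wrap_slice is a hypothetical name):

```python
import numpy as np

def wrap_slice(A, begin_at, length):
    # Indices past the end wrap around to the start of A.
    idx = np.arange(begin_at, begin_at + length)
    return np.take(A, idx, mode='wrap')

A = np.array([1, 3, 5, 3, 9])
print(wrap_slice(A, 2, 10))  # [5 3 9 1 3 5 3 9 1 3]
```

This allocates one index array of the requested length and one output array, with no repeated concatenation.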

counting up and then down a range in python

I am trying to program a standard snake draft, where team A picks, then team B, team C, team C, team B, team A, ad nauseam.
If pick number 13 (or pick number x) just happened, how can I figure out which team picks next for n teams?
I have something like:
def slot(n,x):
    direction = 'down' if (int(x/n) & 1) else 'up'
    spot = (x % n) + 1
    slot = spot if direction == 'up' else ((n+1) - spot)
    return slot
I have a feeling there is a simpler, more pythonic way than this solution. Anyone care to take a hack at it?
So I played around a little more. I am looking for the return of a single value, rather than the best way to count over a looped list. The most literal answer might be:
def slot(n, x): # 0.15757 sec for 100,000x
    number_range = range(1, n+1) + range(n,0, -1)
    index = x % (n*2)
    return number_range[index]
This creates a list [1,2,3,4,4,3,2,1], figures out the index (e.g. 13 % (4*2) = 5), and then returns the value at that index in the list (e.g. 3). The longer the list, the slower the function.
We can use some logic to cut the list making in half. If we are counting up (i.e. (int(x/n) & 1) is 0), we get the obvious index value (x % n), else we subtract that value from n-1:
def slot(n, x): # 0.11982 sec for 100,000x
    number_range = range(1, n+1)
    index = ((n-1) - (x % n)) if (int(x/n) & 1) else (x % n)
    return number_range[index]
Still avoiding a list altogether is fastest:
def slot(n, x): # 0.07275 sec for 100,000x
    spot = (x % n) + 1
    slot = ((n+1) - spot) if (int(x/n) & 1) else spot
    return slot
And if I hold the list as variable rather than spawning one:
number_list = [1,2,3,4,5,6,7,8,9,10,11,12,12,11,10,9,8,7,6,5,4,3,2,1]
def slot(n, x): # 0.03638 sec for 100,000x
    return number_list[x % (n*2)]
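For reference, the list-free version can be written a little more compactly with divmod. A sketch in Python 3, assuming (as in the functions above) that x is the number of picks already made:

```python
def slot(n, x):
    # round_index is the completed-round count, offset the position inside the round.
    round_index, offset = divmod(x, n)
    # Even rounds run 1..n, odd rounds run n..1.
    return n - offset if round_index % 2 else offset + 1

print([slot(4, x) for x in range(10)])  # [1, 2, 3, 4, 4, 3, 2, 1, 1, 2]
print(slot(4, 13))                      # 3
```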
Why not use itertools cycle function:
from itertools import cycle
li = range(1, n+1) + range(n, 0, -1) # e.g. [1, 2, 3, 4, 4, 3, 2, 1]
it = cycle(li)
[next(it) for _ in xrange(10)] # [1, 2, 3, 4, 4, 3, 2, 1, 1, 2]
Note: previously I had answered how to run up and down without repeating the end points, as follows:
it = cycle(range(1, n+1) + range(n-1, 1, -1)) #e.g. [1, 2, 3, 4, 3, 2, 1, 2, 3, ...]
Here's a generator that will fulfill what you want.
def draft(n):
    while True:
        for i in xrange(1,n+1):
            yield i
        for i in xrange(n,0,-1):
            yield i
>>> d = draft(3)
>>> [d.next() for _ in xrange(12)]
[1, 2, 3, 3, 2, 1, 1, 2, 3, 3, 2, 1]
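The session above is Python 2 (xrange and d.next()). A sketch of the same generator for Python 3, using yield from and itertools.islice to take the first picks:

```python
from itertools import islice

def draft(n):
    while True:
        yield from range(1, n + 1)   # picks 1..n
        yield from range(n, 0, -1)   # then n..1

print(list(islice(draft(3), 12)))  # [1, 2, 3, 3, 2, 1, 1, 2, 3, 3, 2, 1]
```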
from itertools import chain, cycle
def cycle_up_and_down(first, last):
    up = xrange(first, last+1, 1)
    down = xrange(last, first-1, -1)
    return cycle(chain(up, down))
turns = cycle_up_and_down(1, 4)
print [next(turns) for n in xrange(10)] # [1, 2, 3, 4, 4, 3, 2, 1, 1, 2]
Here is a list of numbers that counts up, then down:
>>> [ -abs(5-i)+5 for i in range(0,10) ]
[0, 1, 2, 3, 4, 5, 4, 3, 2, 1]
Written out:
count_up_to = 5
for i in range( 0, count_up_to*2 ):
    the_number_you_care_about = -abs(count_up_to-i) + count_up_to
    # do stuff with the_number_you_care_about
Easier to read:
>>> list( range(0,5) ) + list( range( 5, 0, -1 ) )
[0, 1, 2, 3, 4, 5, 4, 3, 2, 1]
Written out:
count_up_to = 5
for i in list( range(0, count_up_to) ) + list( range(count_up_to, 0, -1) ):
    print(i)  # i is the number you care about
Another way:
from itertools import chain
for i in chain( range(0,5), range(5,0,-1) ):
    print(i)  # i is the number you care about
