Replace entry in specific numpy array stored in dictionary - python

I have a dictionary containing a variable number of numpy arrays (all same length), each array is stored in its respective key.
For each index I want to replace the value in one of the arrays with a newly calculated value. (This is a very simplified version of what I'm actually doing.)
The problem is that when I try this as shown below, the value at the current index of every array in the dictionary is replaced, not just the one I specify.
Sorry if the formatting of the example code is confusing, it's my first question here (Don't quite get how to show the line example_dict["key1"][idx] = idx+10 properly indented in the next line of the for loop...).
>>> import numpy as np
>>> example_dict = dict.fromkeys(["key1", "key2"], np.array(range(10)))
>>> example_dict["key1"]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> example_dict["key2"]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> for idx in range(10):
...     example_dict["key1"][idx] = idx+10
...
>>> example_dict["key1"]
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
>>> example_dict["key2"]
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
I expected the loop to only access the array in example_dict["key1"], but somehow the same operation is applied to the array stored in example_dict["key2"] as well.

>>> hex(id(example_dict["key1"]))
'0x26a543ea990'
>>> hex(id(example_dict["key2"]))
'0x26a543ea990'
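example_dict["key1"] and example_dict["key2"] point to the same address: dict.fromkeys stores the very same array object under every key, so a change made through one key is visible through the other. To fix this, build the dictionary with a dict comprehension, which creates a new array for each key.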
import numpy
keys = ["key1", "key2"]
example_dict = {key: numpy.array(range(10)) for key in keys}
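The aliasing is not specific to numpy; it happens with any mutable value passed to dict.fromkeys. Here is a minimal sketch using plain lists (chosen only so it runs without numpy; the names shared and separate are made up for this illustration):

```python
keys = ["key1", "key2"]

# dict.fromkeys evaluates the value once and stores that one object under every key
shared = dict.fromkeys(keys, [0, 1, 2])
shared["key1"][0] = 99
print(shared["key2"])                    # the change shows up here as well
print(shared["key1"] is shared["key2"])  # same object

# the comprehension evaluates the value expression once per key
separate = {key: [0, 1, 2] for key in keys}
separate["key1"][0] = 99
print(separate["key2"])                  # unchanged
```

The same reasoning should carry over to arrays, since the comprehension calls numpy.array(...) once per key.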

Related

This particular way of using .map() in python

I was reading an article and I came across this below-given piece of code. I ran it and it worked for me:
x = df.columns
x_labels = [v for v in sorted(x.unique())]
x_to_num = {p[1]:p[0] for p in enumerate(x_labels)}
#till here it is okay. But I don't understand what is going with this map.
x.map(x_to_num)
The final result from the map is given below:
Int64Index([ 0, 3, 28, 1, 26, 23, 27, 22, 20, 21, 24, 18, 10, 7, 8, 15, 19,
13, 14, 17, 25, 16, 9, 11, 6, 12, 5, 2, 4],
dtype='int64')
Can someone please explain to me how the .map() worked here. I searched online, but could not find anything related.
ps: df is a pandas dataframe.
Let's look at what the map() function does in Python in general.
>>> l = [1, 2, 3]
>>> list(map(str, l))
# ['1', '2', '3']
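Here a list of numbers is converted to a list of strings.
So map applies a function to every element of an iterable.
You were probably confused because the general syntax map(MappingFunction, IterableObject) is not used here, yet things still work. That is because x.map is not the built-in map: x is a pandas Index, and its .map() method accepts a dictionary (or a function) as the mapper.
The variable x plays the role of IterableObject, while the dictionary x_to_num holds the mapping and plays the role of MappingFunction: each label in x is replaced by the number stored under that label.
Edit: the idea itself has nothing to do with pandas as such; any iterable can be mapped through a dictionary in the same way, although only pandas objects such as Index and Series offer it as a .map() method.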
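To make the dictionary lookup concrete, here is a rough plain-Python equivalent. The labels in x are invented stand-ins for df.columns, and the snippet only imitates what x.map(x_to_num) does; it does not use pandas:

```python
# Hypothetical column labels standing in for df.columns
x = ["b", "a", "c", "a"]

x_labels = sorted(set(x))
x_to_num = {label: number for number, label in enumerate(x_labels)}

# Look every label up in the dictionary, which is what mapping through a dict amounts to
mapped = [x_to_num[label] for label in x]

# The built-in map gives the same result when handed the dictionary's lookup method
same_with_map = list(map(x_to_num.get, x))

print(mapped)  # [1, 0, 2, 0]
```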

Finding where a given number falls in a partition

Suppose I have a sorted array of integers say
partition = [0, 3, 7, 12, 18, 23, 27]
and then given a value
value = 9
I would like to return the interval on which my value sits. For example
bounds = function(partition, value)
print(bounds)
>>>[7,12]
Is there a function out there that might be able to help me or do I have to build this from scratch?
Try numpy.searchsorted(). From the documentation:
Find indices where elements should be inserted to maintain order.
import numpy as np
partition = np.array( [0, 3, 7, 12, 18, 23, 27] )
value = 9
idx = np.searchsorted(partition,value)
bound = (partition[idx-1],partition[idx])
print(bound)
# (7, 12)
The advantage of searchsorted is that it can give you the index for multiple values at once.
The bisect module is nice for doing this efficiently. It will return the index of the higher bound.
You'll need to do some error checking if the value can fall outside the bounds:
from bisect import bisect
partition = [0, 3, 7, 12, 18, 23, 27]
value = 9
top = bisect(partition, value)
print(partition[top-1], partition[top])
# 7 12
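The error checking mentioned above could look roughly like this. It is only a sketch: find_bounds is a made-up name, and returning None for values outside the partition is an assumption (raising an exception would work just as well):

```python
from bisect import bisect_right

def find_bounds(partition, value):
    """Return (lower, upper) with lower <= value < upper, or None if value lies outside."""
    top = bisect_right(partition, value)
    # top == 0: value is below the first point
    # top == len(partition): value is at or above the last point
    if top == 0 or top == len(partition):
        return None
    return partition[top - 1], partition[top]

partition = [0, 3, 7, 12, 18, 23, 27]
print(find_bounds(partition, 9))   # (7, 12)
print(find_bounds(partition, -1))  # None
print(find_bounds(partition, 27))  # None
```

Without the check, a value below the first point would silently index partition[-1], and a value at or above the last point would raise an IndexError.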
def function(partition, value):
    # Scan adjacent pairs; stop one short so partition[i + 1] stays in range
    for i in range(len(partition) - 1):
        if partition[i] < value < partition[i + 1]:
            print([partition[i], partition[i + 1]])

partition = [0, 3, 7, 12, 18, 23, 27, 5, 10]
value = 9
function(partition, value)

Save list values in csv file with Python

I have a simple Python list like this one:
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
I want to save it in a csv file that looks like this:
1,2,3,4,5,6,7,8
9,10,11,12,13,14,15,16
How can I do that? I tried:
np.savetxt('fname.csv', bbox_form, fmt='%d')
But I don't know how to write a new line only after 8 values.
import numpy as np
col_num = 8
row_num = len(a) // col_num  # integer division: reshape needs integer dimensions
b = np.reshape(a, [row_num, col_num])
d = [','.join(map(str, c)) for c in b]
np.savetxt('fname.csv', d, fmt='%s')
This should work.
Numpy will save each "row" in your array to a line. If you want to save it to multiple lines, then you'll need to reshape your array.
Have a look here:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html
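If numpy is not a requirement, the same layout can be produced with the standard csv module. This is just a sketch: chunk is a hypothetical helper name, and an in-memory buffer stands in for the file so the result is easy to inspect:

```python
import csv
import io

def chunk(values, size):
    """Split values into consecutive rows of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
rows = chunk(a, 8)

# Write to an in-memory buffer; a real file opened with newline="" works the same way
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
text = buffer.getvalue()
print(text)
```

To write an actual file, replace the buffer with something like open('fname.csv', 'w', newline='') inside a with block.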

Working around evaluation time discrepancy in generators

I found myself running into the gotcha under 'evaluation time discrepancy' from this list today, and am having a hard time working around it.
As a short demonstration of my problem, I make infinite generators that skip every nth number, with n going from 2 to 4:
from itertools import count
skip_lists = []
for idx in range(2, 5):
    # skip every 2nd, 3rd, 4th.. number
    skip_lists.append(x for x in count() if (x % idx) != 0)
# print first 10 numbers of every skip_list
for skip_list in skip_lists:
    for _, num in zip(range(10), skip_list):
        print("{}, ".format(num), end="")
    print()
Expected output:
1, 3, 5, 7, 9, 11, 13, 15, 17, 19,
1, 2, 4, 5, 7, 8, 10, 11, 13, 14,
1, 2, 3, 5, 6, 7, 9, 10, 11, 13,
Actual output:
1, 2, 3, 5, 6, 7, 9, 10, 11, 13,
1, 2, 3, 5, 6, 7, 9, 10, 11, 13,
1, 2, 3, 5, 6, 7, 9, 10, 11, 13,
Once I remembered that great feature, I tried to "solve" it by binding the if clause variable to a constant that would be part of the skip_list:
from itertools import count
skip_lists = []
for idx in range(2, 5):
    # bind the skip distance
    skip_lists.append([idx])
    # same as in the first try, but use bound value instead of 'idx'
    skip_lists[-1].append(x for x in count() if (x % skip_lists[-1][0]) != 0)
# print first 10 numbers of every skip_list
for skip_list in (entry[1] for entry in skip_lists):
    for _, num in zip(range(10), skip_list):
        print("{}, ".format(num), end="")
    print()
But again:
1, 2, 3, 5, 6, 7, 9, 10, 11, 13,
1, 2, 3, 5, 6, 7, 9, 10, 11, 13,
1, 2, 3, 5, 6, 7, 9, 10, 11, 13,
Apart from an actual solution, I would also love to learn why my hack didn't work.
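The value of idx is not looked up until you start iterating over the generators (generators are evaluated lazily). By then the loop has finished, so idx = 4, the last value it took, is what is found in the module scope. Your hack fails for the same reason: the expression skip_lists[-1][0] is also evaluated lazily, and by then skip_lists[-1] is the last entry, whose first element is 4.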
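The late lookup can be seen in isolation with a tiny sketch (n, gen and result are names made up for this illustration):

```python
n = 1
gen = (x * n for x in range(3))  # n is not read yet
n = 10                           # rebinding n before the generator is consumed
result = list(gen)               # n is looked up now, so 10 is used
print(result)                    # [0, 10, 20]
```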
You can give each appended generator its own value of idx by passing idx to a function and reading the value from that function's scope when the generator is evaluated. This exploits the fact that the iterable source of a generator expression is evaluated at the generator expression's creation time, so the function is called at each iteration of the loop, and idx is safely stored away in the function scope:
from itertools import count
skip_lists = []
def skip_count(skip):
    return (x for x in count() if (x % skip) != 0)
for idx in range(2, 5):
    # skip every 2nd, 3rd, 4th.. number
    skip_lists.append(skip_count(idx))
Illustration of generator expression's iterable source evaluation at gen. exp's creation:
>>> (i for i in 5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable
Your case is a bit trickier since the exclusions are actually done in a filter which is not evaluated at the gen exp's creation time:
>>> (i for i in range(2) if i in 5)
<generator object <genexpr> at 0x109a0da50>
All the more reason why the for loop and the filter both need to be moved into a scope that stores idx, not just the filter.
On a different note, you can use itertools.islice instead of the inefficient logic you're using to print a slice of the generator expressions:
from itertools import islice
for skip_list in skip_lists:
    for num in islice(skip_list, 10):
        print("{}, ".format(num), end="")
    print()
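Putting the two pieces together, a complete run might look like the sketch below. It collects the numbers into lists instead of printing them with trailing commas, purely to make the result easy to check; first_ten is a made-up name:

```python
from itertools import count, islice

def skip_count(skip):
    # 'skip' is local to this call, so every generator keeps its own value
    return (x for x in count() if x % skip != 0)

skip_lists = [skip_count(idx) for idx in range(2, 5)]

# Take the first 10 numbers of every generator
first_ten = [list(islice(gen, 10)) for gen in skip_lists]
for row in first_ten:
    print(row)
```

This should reproduce the three rows listed as the expected output in the question.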

Find lists which together contain all values from 0-23 in list of lists python

I have a list of lists. The lists within this list look like the following:
[0,2,5,8,7,12,16,18], [0,9,18,23,5,8,15,16], [1,3,4,17,19,6,13,23],
[9,22,21,10,11,20,14,15], [2,8,23,0,7,16,9,15], [0,5,8,7,9,11,20,16]
Every small list has 8 values from 0-23 and there are no value repeats within a small list.
What I need now are the three lists which together contain all the values 0-23. There may be several combinations that accomplish this, but I only need one.
In this particular case the output would be:
[0,2,5,8,7,12,16,18], [1,3,4,17,19,6,13,23], [9,22,21,10,11,20,14,15]
I thought to do something with the order but I'm not a python pro so it is hard for me to handle all the lists within the list (to compare all).
Thanks for your help.
The following appears to work:
from itertools import combinations, chain
lol = [[0,2,5,8,7,12,16,18], [0,9,18,23,5,8,15,16], [1,3,4,17,19,6,13,23], [9,22,21,10,11,20,14,15], [2,8,23,0,7,16,9,15], [0,5,8,7,9,11,20,16]]
for p in combinations(lol, 3):
    if len(set(chain.from_iterable(p))) == 24:
        print(p)
        break # if only one is required
This displays the following:
([0, 2, 5, 8, 7, 12, 16, 18], [1, 3, 4, 17, 19, 6, 13, 23], [9, 22, 21, 10, 11, 20, 14, 15])
If it is guaranteed that some 3 lists together contain all the numbers 0-23, and you only want the first such combination, you can generate the combinations of length 3 and use set intersection to check that no two lists in a combination share a value:
>>> li = [[0,2,5,8,7,12,16,18], [0,9,18,23,5,8,15,16], [1,3,4,17,19,6,13,23], [9,22,21,10,11,20,14,15], [2,8,23,0,7,16,9,15], [0,5,8,7,9,11,20,16]]
>>> import itertools
>>> for t in itertools.combinations(li, 3):
...     if not set(t[0]) & set(t[1]) and not set(t[0]) & set(t[2]) and not set(t[1]) & set(t[2]):
...         print(t)
...         break
...
([0, 2, 5, 8, 7, 12, 16, 18], [1, 3, 4, 17, 19, 6, 13, 23], [9, 22, 21, 10, 11, 20, 14, 15])
Let's do a recursive solution.
We need a list of lists that contain these values:
target_set = set(range(24))
This is a function that recursively tries to find a list of lists that match exactly that set:
def find_covering_lists(target_set, list_of_lists):
    if not target_set:
        # Done
        return []
    if not list_of_lists:
        # Failed
        raise ValueError()
    # Two cases -- either the first element works, or it doesn't
    try:
        first_as_set = set(list_of_lists[0])
        if first_as_set <= target_set:
            # If it's a subset, call this recursively for the rest
            return [list_of_lists[0]] + find_covering_lists(
                target_set - first_as_set, list_of_lists[1:])
    except ValueError:
        pass # The recursive call failed to find a solution
    # If we get here, the first element failed.
    return find_covering_lists(target_set, list_of_lists[1:])
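For completeness, here is how the function might be called on the lists from the question. The function is restated so that the snippet runs on its own; the names lol and result are only illustrative:

```python
def find_covering_lists(target_set, list_of_lists):
    if not target_set:
        return []             # everything is covered
    if not list_of_lists:
        raise ValueError()    # nothing left to try
    try:
        first_as_set = set(list_of_lists[0])
        if first_as_set <= target_set:
            # Use the first list and cover the remainder with the rest
            return [list_of_lists[0]] + find_covering_lists(
                target_set - first_as_set, list_of_lists[1:])
    except ValueError:
        pass                  # using the first list led to a dead end
    # Skip the first list and try without it
    return find_covering_lists(target_set, list_of_lists[1:])

lol = [[0, 2, 5, 8, 7, 12, 16, 18], [0, 9, 18, 23, 5, 8, 15, 16],
       [1, 3, 4, 17, 19, 6, 13, 23], [9, 22, 21, 10, 11, 20, 14, 15],
       [2, 8, 23, 0, 7, 16, 9, 15], [0, 5, 8, 7, 9, 11, 20, 16]]

result = find_covering_lists(set(range(24)), lol)
print(result)
```

If no combination covers the target set, the call raises ValueError, so a caller would probably want to wrap it in try/except.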
