How to get each element of numpy array? - python

I have a numpy array, Keys, which stores some values. For example:
Keys = [2,3,4,7,8]
How do I get the index of 4 and store that index in an int variable?
For example, the index of the value 4 is 2, so 2 should be stored in an int variable.
I have tried the following code segment:
for i in np.nditer(Keys):
    print(keys[i]);
I am using python 3.5
Spyder 3.5.2
Anaconda 4.2.0

Is keys a list or a numpy array?
keys = [2,3,4,7,8]  # or
keys = np.array([2,3,4,7,8])
You don't need to iterate to see the elements of either. But you can do
for i in keys:
    print(i)
for i in range(len(keys)):
    print(keys[i])
[i for i in keys]
these work for either.
If you want the index of the value 4, the list has a method:
keys.index(4)
for the array
np.where(keys==4)
is a useful bit of code. Also
np.in1d(keys, 4)
np.where(np.in1d(keys, 4))
Forget about np.nditer. That's for advanced programming, not routine iteration.

There are several ways. If the list is not too large, then:
where_is_4 = [i for i,e in enumerate(Keys) if e==4][0]
This loops over the list with enumerate and builds a list containing the index i every time the value 4 occurs; [0] then takes the first such index.
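If only the first match is needed, a generator passed to next() avoids building the whole list and lets you decide what comes back when the value is absent. A minimal sketch, assuming a plain Python list; the names first_index and missing_index are made up for illustration:

```python
keys = [2, 3, 4, 7, 8]

# next() stops at the first matching index; the second argument is the
# fallback returned when no element equals the target.
first_index = next((i for i, e in enumerate(keys) if e == 4), None)
missing_index = next((i for i, e in enumerate(keys) if e == 5), None)

print(first_index)    # 2
print(missing_index)  # None
```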

Why not just do:
for i in range(len(Keys)):
    if Keys[i] == 4:
        print(i)

You can find all indices where the value is 4 using:
>>> keys = np.array([2,3,4,7,8])
>>> np.flatnonzero(keys == 4)
array([2])

There is a native numpy function for this called where.
It returns the indices where a given condition is true. So you can just pick the first entry, if the result isn't empty:
N = 4
indices = np.where(x==N)[0]  # x is the numpy array
index = None
if indices.size:  # a bare "if indices:" fails for empty or multi-element arrays
    index = indices[0]

Using numpy.where(condition) is a good choice here. The code below gives the location of 4:
import numpy as np
keys = np.array([2,3,4,7,8])
result = np.where(keys==4)
result[0][0]
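Since the question asks for the index as an int variable, the numpy answers above could be wrapped roughly as follows. This is only a sketch: the helper name first_index_of is hypothetical, and it assumes a 1-D array and that None is an acceptable result when the value is absent.

```python
import numpy as np

def first_index_of(arr, value):
    """Return the first position of value in a 1-D array as an int, or None."""
    matches = np.flatnonzero(arr == value)  # all positions where arr == value
    return int(matches[0]) if matches.size else None

keys = np.array([2, 3, 4, 7, 8])
index = first_index_of(keys, 4)
print(index, type(index).__name__)  # 2 int
```

The int() call converts numpy's integer type into a plain Python int.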

Related

Find the index of a value in a 2D array

Here is my code:
test_list = [
    ["Romeo and Juliet","Shakespeare"],
    ["Othello","Play"],
    ["Macbeth","Tragedy"]
]
value = "Tragedy"
print(test_list.index(value))
As a result I get "ValueError: 'Tragedy' is not in list".
I've come to the conclusion that .index only works for 1D arrays? But then how do I do it for 2D arrays? This code works fine if I make the array 1D. Please help, but in simple terms as I am a beginner.
Apologies for formatting issues on mobile. The array is set out correctly for me.
Loop through your list and search each sublist for the string.
Testlist = [
    ["Romeo and Juliet","Shakespeare"],
    ["Othello","Play"],
    ["Macbeth","Tragedy"]
]
Value = "Tragedy"
for index, lst in enumerate(Testlist):
    if Value in lst:
        print(index, lst.index(Value))
You can also use the map function (in Python 3 it returns an iterator, so wrap it in list() before calling .index):
# Get a list of booleans - True if the sublist contains the lookup value
value_in_sublist = list(map(lambda x: value in x, test_list))
# Get the index of the first True
print(value_in_sublist.index(True))
You can also use numpy:
import numpy as np
test_list = np.array(test_list)
value = 'Tragedy'
print(np.where(test_list == value))
Output:
(array([2]), array([1]))
If you have multiple occurrences of an element, then np.where will give you arrays of indices covering all the occurrences.
numpy arrays may help in your specific case
import numpy
test_array = numpy.array(Testlist)
value = "Tragedy"
numpy.where(test_array==value)
# you will get (array([2]), array([1]))
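For a beginner who would rather stay with plain lists, the loop-based answer can be turned into a small reusable function that returns both positions. A sketch only; the name find_in_nested is hypothetical, and it returns None instead of raising when the value is missing:

```python
def find_in_nested(rows, value):
    """Return (row_index, column_index) of the first match, or None."""
    for row_index, row in enumerate(rows):
        if value in row:
            # list.index is safe here because we just checked membership
            return row_index, row.index(value)
    return None

test_list = [
    ["Romeo and Juliet", "Shakespeare"],
    ["Othello", "Play"],
    ["Macbeth", "Tragedy"],
]
position = find_in_nested(test_list, "Tragedy")
print(position)  # (2, 1)
```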

Unknown error on PySpark map + broadcast

I have a big group of tuples with tuple[0] = integer and tuple[1] = list of integers (resulting from a groupBy). I call the value tuple[0] key for simplicity.
The values inside the lists tuple[1] may themselves be other keys.
If key = n, all elements of its list are greater than n, sorted and distinct.
In the problem I am working on, I need to find the number of common elements in the following way:
0, [1,2]
1, [3,4,5]
2, [3,7,8]
.....
list of values of key 0:
1: [3,4,5]
2: [3,7,8]
common_elements between list of 1 and list of 2: 3 -> len(common_elements) = 1
Then I apply the same for keys 1, 2 etc, so:
list of values of 1:
3: ....
4: ....
5: ....
The sequential script I wrote is based on pandas DataFrame df, with the first column v as list of 'keys' (as index = True) and the second column n as list of list of values:
for i in df.v: #iterate each value
    for j in df.n[i]: #iterate within the list
        common_values = set(df.n[i]).intersection(df.n[j])
        if len(common_values) > 0:
            return len(common_values)
Since it is a big dataset, I'm trying to write a parallelized version with PySpark.
df.A #column of integers
df.B #column of integers
val_colA = sc.parallelize(df.A)
val_colB = sc.parallelize(df.B)
n_values = val_colA.zip(val_colB).groupByKey().mapValues(sorted) # RDD -> n_values[0] will be the key, n_values[1] is the list of values
n_values_broadcast = sc.broadcast(n_values.collectAsMap()) #read only dictionary
def f(element):
    for i in element[1]: #iterating the values of "key" element[0]
        common_values = set(element[1]).intersection(n_values_broadcast.value[i])
        if len(common_values) > 0:
            return len(common_values)
collection = n_values.map(f).collect()
The program fails after a few seconds with an error like KeyError: 665, but does not give any specific reason for the failure.
I'm a Spark beginner, so I'm not sure whether this is the correct approach (should I consider foreach or mapPartitions instead?) and, especially, where the error is.
Thanks for the help.
The error is actually pretty clear and not Spark specific. You are accessing a Python dict with __getitem__ ([]):
n_values_broadcast.value[i]
and if the key is missing from the dictionary you'll get a KeyError. Use the get method instead:
n_values_broadcast.value.get(i, [])
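The difference can be reproduced without Spark. In this sketch the plain dict lookup stands in for n_values_broadcast.value and is filled with the sample data from the question; note that 3 appears inside a list but is not itself a key, which is the kind of situation that produces the KeyError:

```python
# Stand-in for n_values_broadcast.value (an ordinary dict on the workers)
lookup = {0: [1, 2], 1: [3, 4, 5], 2: [3, 7, 8]}

try:
    lookup[3]                      # __getitem__ raises for a missing key
except KeyError as exc:
    missing_key = exc.args[0]

fallback = lookup.get(3, [])       # get() returns the default instead
common = set(lookup[1]).intersection(lookup.get(3, []))

print(missing_key)  # 3
print(fallback)     # []
print(common)       # set()
```

Intersecting with the empty default simply yields an empty set, so the rest of the function keeps working for values that have no entry of their own.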

Compare 1 column of 2D array and remove duplicates Python

Say I have a 2D array like:
array = [['abc',2,3],
         ['abc',2,3],
         ['bb',5,5],
         ['bb',4,6],
         ['sa',3,5],
         ['tt',2,1]]
I want to remove every row whose first-column value appears more than once,
i.e. compare the first element of each row and return only:
removeDups = [['sa',3,5],
              ['tt',2,1]]
I think it should be something like:
(set first col as tmp variable, compare tmp with remaining and #set array as returned from compare)
for x in range(len(array)):
    tmpCol = array[x][0]
    del array[x]
    removed = compare(array, tmpCol)
    array = copy.deepcopy(removed)
    print(repr(len(removed)))  # testing
where compare is:
(compare first col of each remaining array items with tmp, if match remove else return original array)
def compare(valid, tmpCol):
    for x in range(len(valid)):
        if valid[x][0] != tmpCol:
            del valid[x]
            return valid
        else:
            return valid
I keep getting 'index out of range' error. I've tried other ways of doing this, but I would really appreciate some help!
Similar to other answers, but using a dictionary instead of importing counter:
counts = {}
for elem in array:
    # add 1 to counts for this string, creating new element at this key
    # with initial value of 0 if needed
    counts[elem[0]] = counts.get(elem[0], 0) + 1
new_array = []
for elem in array:
    # check that there's only 1 instance of this element.
    if counts[elem[0]] == 1:
        new_array.append(elem)
One option is to create a counter for the first column of your array beforehand and then filter the list based on the count, i.e. keep a row only if its first element appears exactly once:
from collections import Counter
count = Counter(a[0] for a in array)
[a for a in array if count[a[0]] == 1]
# [['sa', 3, 5], ['tt', 2, 1]]
You can use a dictionary and count the occurrences of each key.
You can also use Counter from the collections library, which does exactly this.
Do as follows:
from collections import Counter
counts = Counter(k for k, _, _ in array)  # count each first-column value once
removed = []
for k, val1, val2 in array:
    if counts[k] == 1:
        removed.append([k, val1, val2])
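None of the answers spell out where the 'index out of range' comes from. A likely cause, sketched here on a shortened copy of the data, is that range(len(...)) is evaluated once up front while del keeps shrinking the list, so later indices no longer exist:

```python
rows = [['abc', 2, 3], ['abc', 2, 3], ['bb', 5, 5]]

items = list(rows)          # work on a copy so rows stays intact
hit_index_error = False
try:
    for x in range(len(items)):   # range(3) is fixed before the loop starts
        del items[x]              # the list shrinks on every pass
except IndexError:
    hit_index_error = True        # x == 2, but only one item is left

print(hit_index_error)  # True
print(len(items))       # 1
```

Building a new list, as the answers above do, sidesteps the problem because the list being iterated is never modified.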

how to convert a set in python into a dictionary

I am new to python and trying to convert a Set into a Dictionary. I am struggling to find a way to make this possible. Any inputs are highly appreciated. Thanks.
Input : {'1438789225', '1438789230'}
Output : {'1438789225':1, '1438789230':2}
Use enumerate() to generate a value starting from 0 and counting upward for each item in the set, and then assign it in a comprehension:
input_set = {'1438789225', '1438789230'}
output_dict = {item:val for val,item in enumerate(input_set)}
Or a traditional loop:
output_dict = {}
for val,item in enumerate(input_set):
    output_dict[item] = val
If you want it to start from 1 instead of 0, use item:val+1 for the first snippet and output_dict[item] = val+1 for the second snippet.
That said, this dictionary would be pretty much the same as a list:
output = list(input_set)
My one-liner:
output = dict(zip(input_set, range(1, len(input_set) + 1)))
zip pairs up two iterables element by element: (l1[0], l2[0]), (l1[1], l2[1]), ...
We're feeding it two things:
the input_set
a range from 1 up to the length of the set (the + 1 is there because range excludes its end point, and you specified you wanted to count from 1 onwards, not from 0)
The output is a sequence of tuples like [('1438789225', 1), ('1438789230', 2)] which can be turned into a dict simply by feeding it to the dict constructor... dict.
But like TigerhawkT3 said, I can hardly find a use for such a dictionary. But if you have your motives there you have another way of doing it. If you take away anything from this post let it be the existence of zip.
An easy way of doing this is to iterate over the set and populate the result dictionary element by element, using a counter as the dictionary value:
def setToIndexedDict(s):
    counter = 1
    result = dict()
    for element in s:
        result[element] = counter  # adding new element to dictionary
        counter += 1  # incrementing the value for the next element
    return result
My Python is pretty rusty, but this should do it:
def indexedDict(oldSet):
    dic = {}
    for elem,n in zip(oldSet, range(len(oldSet))):
        dic[elem] = n
    return dic
If I wrote anything illegal, tell me and I'll fix it. I don't have an interpreter handy.
Basically, I'm just zipping the list with a range object (basically a continuous list of numbers, but more efficient), then using the resulting tuples.
I'd go with Tiger's answer; this is basically a more naive version of his.
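One thing the snippets above leave open is that a set has no guaranteed iteration order, so which item receives 1 and which receives 2 can vary. If the numbering should be reproducible, one option is to sort first. This sketch assumes sorted order is what is wanted, which the question does not actually state:

```python
input_set = {'1438789225', '1438789230'}

# sorted() fixes the order; start=1 makes enumerate count from 1
output_dict = {item: n for n, item in enumerate(sorted(input_set), start=1)}

print(output_dict)  # {'1438789225': 1, '1438789230': 2}
```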

python: how to know the index when you randomly select an element from a sequence with random.choice(seq)

I know very well how to select a random item from a list with random.choice(seq) but how do I know the index of that element?
import random
l = ['a','b','c','d','e']
i = random.choice(range(len(l)))
print(i, l[i])
You could first choose a random index, then get the list element at that location to have both the index and value.
>>> import random
>>> a = [1, 2, 3, 4, 5]
>>> index = random.randint(0,len(a)-1)
>>> index
0
>>> a[index]
1
You can do it using randrange function from random module
import random
l = ['a','b','c','d','e']
i = random.randrange(len(l))
print(i, l[i])
The most elegant way to do so is random.randrange:
index = random.randrange(len(MY_LIST))
value = MY_LIST[index]
One can also do this in python3, less elegantly (but still better than .index) with random.choice on a range object:
index = random.choice(range(len(MY_LIST)))
value = MY_LIST[index]
The only valid solutions are this solution and the random.randint solutions.
The ones which use list.index not only are slow (O(N) per lookup rather than O(1); gets really bad if you do this for each element, you'll have to do O(N^2) comparisons) but ALSO you will have skewed/incorrect results if the list elements are not unique.
One would think that this is slow, but it turns out to only be slightly slower than the other correct solution random.randint, and may be more readable. I personally consider it more elegant because one doesn't have to do numerical index fiddling and use unnecessary parameters as one has to do with randint(0,len(...)-1), but some may consider this a feature, though one needs to know the randint convention of an inclusive range [start, stop].
Proof of speed for random.choice: The only reason this works is that the range object is OPTIMIZED for indexing. As proof, you can do random.choice(range(10**12)); if it iterated through the entire list your machine would be slowed to a crawl.
edit: I had overlooked randrange because the docs seemed to say "don't use this function" (but actually meant "this function is pythonic, use it"). Thanks to martineau for pointing this out.
You could of course abstract this into a function:
def randomElement(sequence):
    index = random.randrange(len(sequence))
    return index,sequence[index]
i,value = randomElement(range(10**15)) # try THAT with .index, heh
                                       # (don't, your machine will die)
                                       # use xrange if using python2
# i,value = (268840440712786, 268840440712786)
If the values are unique in the sequence, you can always say: list.index(value)
Using randrange() as has been suggested is a great way to get the index. By building a dictionary with a comprehension you can reduce this code to one line, as shown below. Note that since this dictionary only has one element, when you call popitem() you get the combined index and value in a tuple.
import random
letters = "abcdefghijklmnopqrstuvwxyz"
# dictionary created via comprehension
idx, val = {i: letters[i] for i in [random.randrange(len(letters))]}.popitem()
print("index {} value {}".format(idx, val))
We can also use the sample() method.
If you want to randomly select n elements from a list:
import random
l, n = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 2
index_list = random.sample(range(len(l)), n)
index_list will contain unique indexes.
I prefer sample() over choices() because sample() draws without replacement, so the same index is never picked twice.
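To get the values together with their indexes, the sampled indexes can be paired with the list entries. A short sketch; the names items and picked are illustrative, and the actual output changes from run to run:

```python
import random

items = ['a', 'b', 'c', 'd', 'e']
n = 2

# Sample n distinct positions, then look up the value at each one
picked = [(i, items[i]) for i in random.sample(range(len(items)), n)]

for index, value in picked:
    print(index, value)
```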
