How do I fill a 2D array with values from an updated 1D list?
For example, I have a list that I get from this code:
a = []
for k, v in data.items():
    b = v / sumcount
    a.append(b)
What I want to do is produce several 'a' lists and put their values into a 2D array, each in a different column. Or, alternatively, put the b values directly into a 2D array, where each column represents one pass of the loop over the keys k.
*My difficulty here is that k is not an integer; the keys are dict keys (str), and there are 9 of them.
I have tried this, but it does not work:
row = len(data.items())
matrix = np.zeros((9, 2))
for i in range(1, 3):
    a = []
    for k, v in data.items():
        b = v / sumcount
        matrix[x][i].fill(b), for x in range(1, 10)
The a list is:
1
2
3
4
5
6
7
8
9
and when I run the outer loop, this is what I expect:
*For example, with the outer loop running from 1 to 2, I expect there to be 2 columns and 9 rows.
1 6
2 7
3 8
4 9
5 14
6 15
7 16
8 17
9 18
I want to fill the matrix values with b.
import numpy as np
import pandas as pd

matrix = np.zeros((9, 2))
df = pd.DataFrame({'aaa': [1, 2, 3, 4, 5, 6, 7, 8, 9]})
sumcount = [1, 2]
for i in range(len(sumcount)):
    matrix[:, i] = df['aaa'] / sumcount[i]
print(matrix)
As far as I understand: you need to take the result of a column operation on the dataframe and place it in a numpy array. There is no need to iterate over each row, since each sumcount entry is a single number per column; a row-by-row loop would be slow. In general, loops are a last resort in numpy, used only when there is no other possibility. Slicing is used to set values in numpy.
Or do without the explicit loop at all, using a list comprehension, wrapping the result in np.array, and applying transpose:
bbb = np.array([df['aaa'] / sumcount[i] for i in range(len(sumcount))]).transpose()
print(bbb)
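Both versions produce the same 9x2 result; with a recent numpy, the printed matrix looks like this:
[[1.  0.5]
 [2.  1. ]
 [3.  1.5]
 [4.  2. ]
 [5.  2.5]
 [6.  3. ]
 [7.  3.5]
 [8.  4. ]
 [9.  4.5]]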
Say I have a dictionary, test_dict, and it contains 81 entries in the current correct order.
How would I convert just the dictionary's 81 values into a 9x9 2D array? The first 9 values make up the first row, the second 9 values make up the second, and so on. Is it possible with numpy? I feel as though I'm missing something simple.
You could try this. Here I have taken a dictionary with 4 elements, extracted its values into a numpy array, and reshaped it to 2x2. You can reshape yours to 9x9 the same way.
import numpy as np

values = {1: 1, 2: 2, 3: 2, 4: 5}
vals = np.fromiter(values.values(), dtype=int)
print(vals.reshape(2, 2))
Output:
[[1 2]
[2 5]]
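For the 81-entry case, the same pattern applies directly. A minimal sketch, with a hypothetical stand-in for test_dict (this relies on the dict preserving insertion order, as Python 3.7+ dicts do):
import numpy as np

# Hypothetical 81-entry dictionary standing in for the asker's test_dict
test_dict = {i: i for i in range(81)}

# Pull the values out in insertion order and reshape to 9x9
arr = np.fromiter(test_dict.values(), dtype=int).reshape(9, 9)
print(arr.shape)  # (9, 9)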
Based on a large dataset of daily observations from 20 assets, I created a dictionary which comprises (rolling) correlation matrices. I am using the date index as a key for the dictionary.
What I want to do now (in an efficient manner) is to compare all correlation matrices within the dictionary and save the results in a new matrix. The idea is to compare correlation structures over time.
import pandas as pd
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.cluster.hierarchy import cophenet

key_list = dict_corr.keys()

# Create empty matrix
X = np.empty(shape=[len(key_list), len(key_list)])

key1_index = 0
for key1 in key_list:
    # Extract correlation matrix from dictionary
    corr1_temp = dict_corr[key1]
    # Transform correlation matrix into distance matrix
    dist1_temp = ((1 - corr1_temp) / 2.)**.5
    # Extract hierarchical structure from distance matrix
    link1_temp = linkage(dist1_temp, 'single')
    key2_index = 0
    for key2 in key_list:
        corr2_temp = dict_corr[key2]
        dist2_temp = ((1 - corr2_temp) / 2.)**.5
        link2_temp = linkage(dist2_temp, 'single')
        # Compare the hierarchical structures of the two correlation
        # matrices -> results in a 2x2 correlation matrix
        temp = np.corrcoef(cophenet(link1_temp), cophenet(link2_temp))
        # Extract the correlation from the resulting 2x2 matrix
        X[key1_index, key2_index] = temp[1, 0]
        key2_index += 1
    key1_index += 1
I'm well aware of the fact that using two for loops is probably the least efficient way to do it, so I'm grateful for any helpful comment on how to speed up the calculations!
Best
You can look at itertools and then insert your code to compute the correlation within a function (compute_corr) called in the single for loop:
import itertools

for key_1, key_2 in itertools.combinations(dict_corr, 2):
    correlation = compute_corr(key_1, key_2, dict_corr)
    # now store correlation in a list
If you care about the order use itertools.permutations(dict_corr, 2) instead of combinations.
EDIT
Since you want all possible combinations of keys (including a key with itself), you should use itertools.product.
l_corr = []  # list to store all the outputs from the function
for key_1, key_2 in itertools.product(key_list, repeat=2):
    l_corr.append(compute_corr(key_1, key_2, dict_corr))
Now l_corr will have length len(key_list)*len(key_list).
You can convert this list to a matrix in this way:
np.array(l_corr).reshape(len(key_list),len(key_list))
Dummy example:
def compute_corr(key_1, key_2, dict_corr):
    return key_1 * key_2  # dummy result from the function

dict_corr = {1: "a", 2: "b", 3: "c", 4: "d", 5: "f"}
key_list = dict_corr.keys()

l_corr = []
for key_1, key_2 in itertools.product(key_list, repeat=2):
    print(key_1, key_2)
    l_corr.append(compute_corr(key_1, key_2, dict_corr))
Combinations:
1 1
1 2
1 3
1 4
1 5
2 1
2 2
2 3
2 4
2 5
3 1
3 2
3 3
3 4
3 5
4 1
4 2
4 3
4 4
4 5
5 1
5 2
5 3
5 4
5 5
Create the final matrix:
np.array(l_corr).reshape(len(key_list),len(key_list))
array([[ 1, 2, 3, 4, 5],
[ 2, 4, 6, 8, 10],
[ 3, 6, 9, 12, 15],
[ 4, 8, 12, 16, 20],
[ 5, 10, 15, 20, 25]])
Let me know in case I missed something. Hope this helps!
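As a further speed-up, note that the expensive linkage/cophenet step depends on only one key at a time, so it can be computed once per key instead of once per pair. A minimal sketch of that idea, assuming (as in the question) that dict_corr maps dates to correlation matrices:
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet

# Precompute the cophenetic distance vector once per key,
# instead of recomputing the linkage inside the inner loop.
coph = {}
for key, corr in dict_corr.items():
    dist = ((1 - corr) / 2.)**.5          # correlation -> distance
    coph[key] = cophenet(linkage(dist, 'single'))

keys = list(dict_corr)
X = np.empty((len(keys), len(keys)))
for i, key1 in enumerate(keys):
    for j, key2 in enumerate(keys):
        X[i, j] = np.corrcoef(coph[key1], coph[key2])[1, 0]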
Sorry, I know there are so many questions relating to indexing, and the answer is probably staring me in the face, but I'm having a little trouble with this. I am familiar with the .loc, .iloc, and .index methods and with slicing in general. The method .reset_index may not have been (and may not be able to be) called on our dataframe, and therefore the index labels may not be in order. The dataframe and numpy array(s) are actually different-length subsets of the dataframe, but for this example I'll keep them the same size (I can handle offsetting once I have an example).
Here is a picture that shows what I'm looking for:
I can pull columns of selected rows from the dataframe based on some search criteria:
idxlbls = df.index[df['timestamp'] == dt]
stuff = df.loc[idxlbls, 'col3':'col5']
But how do I map those label indices to row numbers (positional array indices, not labels) to be used as indices into a numpy array (assuming the same row order)?
stuffprime = array[?, ?]
The reason I need it is that the dataframe is much larger and more complete and contains the column search criteria, but the numpy arrays are subsets that were extracted and modified earlier in the pipeline (and do not contain the search criteria themselves). I need to search the dataframe and pull the equivalent data from the numpy arrays. Basically, I need to correlate specific rows of the dataframe with the corresponding rows of a numpy array.
I would map pandas indices to numpy indices:
keys_dict = dict(zip(idxlbls, range(len(idxlbls))))
Then you may use the dictionary keys_dict to address the array elements by a pandas index: array[keys_dict[some_df_index], :]
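A self-contained sketch of that idea (the data here is hypothetical; it assumes the numpy array shares the dataframe's row order, so the mapping is built from the full index and the resulting positions refer to rows of the full array):
import numpy as np
import pandas as pd

# Hypothetical dataframe and a numpy array with the same row order
df = pd.DataFrame({'timestamp': ['a', 'b', 'a', 'c']}, index=list('WXYZ'))
arr = np.arange(12).reshape(4, 3)

# Map every row label to its positional index
keys_dict = dict(zip(df.index, range(len(df.index))))

idxlbls = df.index[df['timestamp'] == 'a']   # Index(['W', 'Y'])
rows = [keys_dict[lbl] for lbl in idxlbls]   # [0, 2]
print(arr[rows, :])                          # rows 0 and 2 of arr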
I believe you need get_indexer to get positions from the filtered column names; for the index you can use the same approach, or numpy.where to get positions from the boolean mask:
df = pd.DataFrame({'timestamp': list('abadef'),
                   'B': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0],
                   'E': [5, 3, 6, 9, 2, 4]}, index=list('ABCDEF'))
print(df)
timestamp B C D E
A a 4 7 1 5
B b 5 8 3 3
C a 4 9 5 6
D d 5 4 7 9
E e 5 2 1 2
F f 4 3 0 4
idxlbls = df.index[df['timestamp'] == 'a']
stuff = df.loc[idxlbls, 'C':'E']
print(stuff)
C D E
A 7 1 5
C 9 5 6
a = df.index.get_indexer(stuff.index)
Or get positions by boolean mask:
a = np.where(df['timestamp'] == 'a')[0]
print(a)
[0 2]
b = df.columns.get_indexer(stuff.columns)
print(b)
[2 3 4]
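To actually pull the corresponding block out of a numpy array with those positions, you can cross the row and column positions with np.ix_. A small sketch, assuming a hypothetical array arr that mirrors df's 6x5 layout:
import numpy as np

# Hypothetical numpy counterpart of df: same 6x5 shape and ordering
arr = np.arange(30).reshape(6, 5)

# a = [0 2] (row positions) and b = [2 3 4] (column positions) from above
print(arr[np.ix_(a, b)])   # [[ 2  3  4]
                           #  [12 13 14]]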
Consider two numpy arrays:
array1 = np.arange(0,6)
array2 = np.arange(0,12)
I want to run a loop (preferably a list comprehension) where the desired output for a single round is
print(array1[0])
print(array2[0], array2[1])
or
print(array1[1])
print(array2[2], array2[3])
i.e., the loop runs six times, but for every element of array1 it selects the next two consecutive elements from array2.
I have tried something like
for i in xrange(array1):
    for v in xrange(array2):
but this evidently runs the second loop inside the first one. How can I run them simultaneously but select a different number of elements from each array in one round?
I have also tried making the loops equal in length, such as
array1 = np.repeat(np.arange(0, 6), 2).ravel()
which gives array1 = [0, 0, 1, 1, 2, 2, ..., 5, 5]. However, while this makes the lengths of the two arrays equal, I still cannot get the desired output.
(In actual case, the elements of the array are pandas Series objects)
There are a bunch of different ways of going about this. One thing you can do is use the indices:
for ind, item in enumerate(array1):
    print(item, array2[2*ind:2*ind + 2])
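For the question's arrays this prints 0 [0 1], 1 [2 3], 2 [4 5], 3 [6 7], 4 [8 9], 5 [10 11].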
This does not use the full power of numpy, however. The easiest thing I can think of is to concatenate your arrays into a single array containing the desired sequence. You can make it a 2D array for easy iteration, where each row will be the sequence of three elements you want:
array1 = np.arange(6)
array2 = np.arange(12)

combo = np.concatenate((array1.reshape(-1, 1), array2.reshape(-1, 2)), axis=1)
for row in combo:
    print(row)
Results in
[0 0 1]
[1 2 3]
[2 4 5]
[3 6 7]
[4 8 9]
[ 5 10 11]
In this case, the explicit reshape of array1 is necessary because array1.T will result in a 1D array.
You can use a hybrid of the two approaches, as @Divakar suggests, where you reshape array2 but iterate using the index:
array3 = array2.reshape(-1, 2)
for ind, item in enumerate(array1):
    print(item, array3[ind])
Yes, as @MadPhysicist mentioned, there are a lot of ways to do this... but the simplest is
>>> for x,y,z in zip(array1,array2[:-1:2],array2[1::2]):
... print x,y,z
...
0 0 1
1 2 3
2 4 5
3 6 7
4 8 9
5 10 11
for i in range(len(array1)):
    print(array1[i])
    print(array2[2*i], array2[2*i + 1])
I have a numpy array and want to "telescope" the values based on the top row. An example is the best way to describe it.
Start array:
9 9 8 7 7 7 6
1 2 3 4 5 6 3
3 4 5 6 7 6 3
5 6 7 8 9 6 4
desired output array:
9 8 7 6
3 3 15 3
7 5 19 3
11 7 23 4
The idea is to unique-ify the top row and sum the values in the subsequent rows, grouped by the value in the top row. The top row will be sorted, and the array will be about 2,000 cells wide and 200,000 cells long. There can be any number of consecutive identical numbers in the top row. My current hack is below (with slightly different top-row labels than the example; I am printing to screen rather than building the final array, to check the output, and the plan is to stack the output to generate the output array):
import numpy as N

kk = N.array([[90, 90, 85, 80, 80, 80, 70],
              [1, 2, 3, 4, 5, 6, 3],
              [3, 4, 5, 6, 7, 6, 3],
              [5, 6, 7, 8, 9, 6, 4]])
ll = kk[:, 0]
for i in range(1, len(kk[0])):
    if kk[0][i] == kk[0][i-1]:
        ll = ll + kk[:, i]
    elif kk[0][i] != kk[0][i-1]:
        print "sum=", ll, i, kk[0][i], kk[0][i-1]
        ll = kk[:, i]
There are two defects. The major one is that it isn't dealing with the final column, and I don't see why. The minor one is that it is summing the top row too; it's obvious why that is happening, and I suspect I can kludge my way around it. But the failure to deal with the final column has been frustrating me for a while, and I'd really appreciate any suggestions for dealing with it.
thanks for any help
The reason your loop never deals with the final column is that ll is only printed when the top-row value changes, so the last accumulated group is never flushed once the loop ends. In any case, if you have 200,000 rows, a Python loop is likely going to be very slow. With NumPy you can vectorize that operation using np.add.reduceat, but you first need to create an array with the indices of the first item of each group of repeated entries in the first row:
mask = np.concatenate(([True], kk[0, 1:] != kk[0, :-1]))
indices, = np.nonzero(mask)
You can then get your first row by indexing it with the mask boolean array:
>>> kk[0, mask]
array([90, 85, 80, 70])
and the rest of the array using reduceat with indices:
>>> np.add.reduceat(kk[1:], indices, axis=1)
array([[ 3, 3, 15, 3],
[ 7, 5, 19, 3],
[11, 7, 23, 4]])
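For increasing indices, np.add.reduceat sums each slice kk[1:, indices[i]:indices[i+1]] (and from the last index through the end of the row), which is exactly the per-group column sum needed here.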
Assuming that your original array is of the default integer type, you could assemble your array by doing something like:
out = np.empty((kk.shape[0], len(indices)), dtype=kk.dtype)
out[0] = kk[0, mask]
np.add.reduceat(kk[1:], indices, axis=1, out=out[1:])
>>> out
array([[90, 85, 80, 70],
[ 3, 3, 15, 3],
[ 7, 5, 19, 3],
[11, 7, 23, 4]])
You should use the unique function from numpy:
import numpy as np

a = np.array([[90, 90, 85, 80, 80, 80, 70],
              [1, 2, 3, 4, 5, 6, 3],
              [3, 4, 5, 6, 7, 6, 3],
              [5, 6, 7, 8, 9, 6, 4]])
u, v = np.unique(a[0], return_inverse=True)
output = np.zeros((a.shape[0], u.shape[0]))
output[0] = u.copy()
for i in range(u.shape[0]):
    pos = np.where(v == i)[0]
    output[1:, i] = np.sum(a[1:, pos], axis=1)
You should notice that u is going to be sorted from lowest to highest. If you want it from highest to lowest you have to do
output = output[:,::-1]
at the end.
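For the sample array above, output[:, ::-1] then reproduces the telescoped result from the question (with the 90, 85, 80, 70 labels), though as floats, since np.zeros defaults to float64; pass dtype=a.dtype to np.zeros if you want to keep integers.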
You can make use of groupby:
from itertools import groupby
import numpy as N

kk = N.array([[90, 90, 85, 80, 80, 80, 70],
              [1, 2, 3, 4, 5, 6, 3],
              [3, 4, 5, 6, 7, 6, 3],
              [5, 6, 7, 8, 9, 6, 4]])
keys = kk[0]
vals = kk[1:]

uniq = map(lambda x: x[0], groupby(keys))
new = [uniq]
for row in vals:
    new.append([sum(map(lambda x: x[1], group))
                for _, group in groupby(zip(keys, row), lambda x: x[0])])
print N.array(new)
Provides the output:
[[90 85 80 70]
[ 3 3 15 3]
[ 7 5 19 3]
[11 7 23 4]]
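Note that this snippet is written for Python 2. Under Python 3, map returns a lazy iterator, so you would need uniq = [k for k, _ in groupby(keys)] and print(N.array(new)) to get the same result.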