Hi, I'm trying to build a NumPy matrix from two dictionaries.
The first dict has integer keys and float64 values; the other has coordinates as keys and integer values that reference keys of the first dict.
The goal is to build a NumPy matrix whose coordinates are the keys of the second dict and whose values are the floats that the corresponding integer keys map to.
dict_coord = {(0,0): 1, (0,1): 0, (0,2): 2,
              (1,0): 1, (1,1): 1, (1,2): 0,
              (2,0): 1, (2,1): 2, (2,2): 0}
dict_values = {0: 1.1232.., 1: 0.3523.., 2: -1.2421..}
result = np.array([[ 0.3523,  1.1232, -1.2421],
                   [ 0.3523,  0.3523,  1.1232],
                   [ 0.3523, -1.2421,  1.1232]])
I found the simplest solution, but it's too slow: I'm working with a 300 x 784 matrix, and this algorithm takes ~110 ms to complete.
import numpy as np
def build_matrix(dict_index, dict_values):
    mat_ret = np.zeros([300, 784])
    for k, v in dict_index.items():
        mat_ret[k] = dict_values[v]
    return mat_ret
If you can help me find a better yet still plain solution to this problem, I'll be grateful!
Given your dict_coord keys are always sorted in that way, you can simply transform both dicts to arrays and then index one with the other:
coord_array = np.asarray(list(dict_coord.values()))
values_array = np.asarray(list(dict_values.values()))
values_array[coord_array].reshape(3, 3)
# array([[ 0.3523, 1.1232, -1.2421],
# [ 0.3523, 0.3523, 1.1232],
# [ 0.3523, -1.2421, 1.1232]])
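For the 300 x 784 case from the question, the same idea should apply, assuming dict_coord's keys iterate in row-major order as in the example (a sketch, not tested against your data):

import numpy as np

# Index the lookup table of floats with the flattened index array, then
# reshape to the target dimensions (assumes row-major key order).
coord_array = np.asarray(list(dict_coord.values()))
values_array = np.asarray(list(dict_values.values()))
result = values_array[coord_array].reshape(300, 784)

This replaces the Python-level loop with a single vectorized gather, which is where the speedup comes from.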
Problem
Given a sequence (list or numpy array) of 1's and 0's, how can I find the number of contiguous sub-sequences of values? I want to return a JSON-like dictionary of dictionaries.
Example
[0, 0, 1, 1, 0, 1, 1, 1, 0, 0] would return
{
    0: {
        1: 1,
        2: 2
    },
    1: {
        2: 1,
        3: 1
    }
}
Tried
This is the function I have so far
def foo(arr):
    prev = arr[0]
    count = 1
    lengths = dict.fromkeys(arr, {})
    for i in arr[1:]:
        if i == prev:
            count += 1
        else:
            if count in lengths[prev].keys():
                lengths[prev][count] += 1
            else:
                lengths[prev][count] = 1
            prev = i
            count = 1
    return lengths
It is outputting identical dictionaries for 0 and 1 even though their runs in the list are different, and it isn't picking up the last run. How can I improve and fix it? Also, does numpy offer any quicker way to solve my problem if my data is in a numpy array? (maybe using np.where(...))
You're suffering from Ye Olde Replication Error. Let's instrument your function to show the problem, adding one line to check the object ID of each dict in the list:
lengths = dict.fromkeys(arr, {})
print(id(lengths[0]), id(lengths[1]))
Output:
140130522360928 140130522360928
{0: {2: 2, 1: 1, 3: 1}, 1: {2: 2, 1: 1, 3: 1}}
The problem is that you gave the same dict as initial value for each key. When you update either of them, you're changing the one object to which they both refer.
Replace it with an explicit loop -- not a mutable function argument -- that will create a new object for each dict entry:
for key in lengths:
    lengths[key] = {}
print(id(lengths[0]), id(lengths[1]))
Output:
139872021765576 139872021765288
{0: {2: 1, 1: 1}, 1: {2: 1, 3: 1}}
Now you have separate objects.
If you want a one-liner, use a dict comprehension:
lengths = {key: {} for key in lengths}
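As for the numpy part of your question: here's a sketch using np.diff and np.where to find the run boundaries in one pass (the helper name run_length_counts is mine); it also counts the final run, which fixes the other bug you mentioned:

import numpy as np

def run_length_counts(arr):
    # Boundaries where the value changes; each run is [start, end).
    arr = np.asarray(arr)
    change = np.where(np.diff(arr) != 0)[0] + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [arr.size]))
    lengths = ends - starts
    out = {}
    for value, length in zip(arr[starts].tolist(), lengths.tolist()):
        out.setdefault(value, {})
        out[value][length] = out[value].get(length, 0) + 1
    return out

print(run_length_counts([0, 0, 1, 1, 0, 1, 1, 1, 0, 0]))
# {0: {2: 2, 1: 1}, 1: {2: 1, 3: 1}}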
I have a dictionary like this
{0: array([-6139.66579119, -8102.82498701, -8424.43378713, -8699.96492463,
-9411.35741859]),
1: array([ -7679.11144698, -16699.49166421, -3057.05148494, -10657.0539235 ,
-3091.04936367]),
2: array([ -7316.47405724, -15367.98445067, -6660.88963907, -9634.54357714,
-6667.05832509]),
3: array([-7609.14675848, -9894.14708559, -4040.51364199, -8661.16152946,
-4363.71589143]),
4: array([-5068.85919923, -6691.36104136, -6659.66791024, -6666.66570889,
-5365.35153533]),
5: array([ -8341.96211464, -13495.42783124, -4782.52084352, -10355.98002 ,
-5424.48813488]),
6: array([ -7740.36341878, -16165.48430318, -5169.42471878, -12369.79859385,
-5807.66380805]),
7: array([-10645.12432969, -5465.30533986, -6756.65159092, -4146.34937333,
-6765.69595854]),
8: array([ -7765.04423986, -11679.3889257 , -4218.9629257 , -6565.64225892,
-4538.09199979]),
9: array([-5869.18259848, -7809.21110907, -3272.33611955, -3881.64743889,
-3275.54657818])}
What I want to do is:
compare the first value in each array, in this case -6139, -7679, ..., and find the max value (-5068), then return its key, 4, in a list.
compare the second value in each array, -8102, -16699, ..., find the max, and append its key to the list.
How can I do that?
My code is like this:
def predict(trainingData, testData):
    pred = {}
    maxLabel = None
    prediction = []
    maxValue = -9999999999
    pred = postProb(trainingData, testData)
    for key, value in pred.items():
        for i in range(value.shape[0]):
            for j in range(10):
                if pred[j][i] > maxValue:
                    maxValue = pred[key][i]
                    maxLabel = key
            prediction.append(maxLabel)
    return prediction
pred is the dictionary. It seems that the first loop shouldn't be necessary, but I need it to iterate over the elements of the dictionary.
You can use numpy array's argmax method to get what you want.
np.array(list(abc.values())).argmax(axis=0)
Out: array([4, 7, 1, 9, 1])
This works only if your keys are consecutive integers starting from 0, as in your example. If you want a more foolproof method, you could use pandas.
import pandas as pd
df = pd.DataFrame(my_dict)
my_list = list(df.idxmax(axis=1))
print(my_list)
Out: [4, 7, 1, 9, 1]
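If you'd rather stay in numpy but not rely on the keys being consecutive integers, you can make the key order explicit; a sketch (my_dict stands for the dictionary above):

import numpy as np

keys = np.array(sorted(my_dict))                # explicit key order
stacked = np.stack([my_dict[k] for k in keys])  # shape (n_keys, n_values)
my_list = keys[stacked.argmax(axis=0)].tolist()
print(my_list)
# [4, 7, 1, 9, 1]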
I am trying to create a Matlab file (*.mat) from Python that contains a Matlab data structure that would look like:
s.key1 where key1 is an array of values
s.key2 where key2 is an array of 1D arrays
s.key3 where key3 is an array of 2D arrays
If I use savemat and a dictionary, the Matlab output is a cell array rather than a Matlab data structure.
I have tried using
np.core.records.fromarrays(data_list, names=q_keys)
but this doesn't seem to work for keys with 2D arrays. I have both 2D and 3D arrays that need to be in a Matlab structure for compatibility with an existing file format. Is there a way to do this in Python?
Thanks
Here's a stab at the task:
In [292]: dt = np.dtype([('key1',int),('key2',int, (3,)),('key3',object)])
In [293]: arr = np.zeros((5,), dt)
In [294]: arr
Out[294]:
array([(0, [0, 0, 0], 0), (0, [0, 0, 0], 0), (0, [0, 0, 0], 0),
(0, [0, 0, 0], 0), (0, [0, 0, 0], 0)],
dtype=[('key1', '<i8'), ('key2', '<i8', (3,)), ('key3', 'O')])
In [295]: arr['key1']=np.arange(5)
In [296]: arr['key2']=np.arange(15).reshape(5,3)
In [302]: arr['key3']=[1,np.arange(5),np.ones((2,3),int),'astring',[['a','b']]]
In [303]: io.savemat('test.mat', {'astruct':arr})
In Octave:
>> load test.mat
>> format compact
>> astruct
astruct =
1x5 struct array containing the fields:
key1
key2
key3
>> astruct.key1
ans = 0
ans = 1
ans = 2
ans = 3
ans = 4
>> astruct.key2
ans =
0 1 2
ans =
3 4 5
ans =
6 7 8
ans =
9 10 11
ans =
12 13 14
>> astruct.key3
ans = 1
ans =
0 1 2 3 4
ans =
1 1 1
1 1 1
ans = astring
ans = ab
Back in ipython:
In [304]: d = io.loadmat('test.mat')
In [305]: d
Out[305]:
{'__header__': b'MATLAB 5.0 MAT-file Platform: posix, Created on: Wed Jun 6 15:36:23 2018',
'__version__': '1.0',
'__globals__': [],
'astruct': array([[(array([[0]]), array([[0, 1, 2]]), array([[1]])),
(array([[1]]), array([[3, 4, 5]]), array([[0, 1, 2, 3, 4]])),
(array([[2]]), array([[6, 7, 8]]), array([[1, 1, 1],
[1, 1, 1]])),
(array([[3]]), array([[ 9, 10, 11]]), array(['astring'], dtype='<U7')),
(array([[4]]), array([[12, 13, 14]]), array([['a', 'b']], dtype='<U1'))]],
dtype=[('key1', 'O'), ('key2', 'O'), ('key3', 'O')])}
So while I created a numpy structured array with concrete dtypes like int and int with shape (3,), the loaded array has object dtype for all fields. loadmat makes heavy use of object dtype arrays to handle the generality of MATLAB cells and structs. loadmat has various loading parameters, which we can play with.
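For example, two of those parameters (a sketch; the exact shapes you get back depend on the file):

# squeeze_me=True drops the singleton MATLAB dimensions (the [[...]] wrappers);
# struct_as_record=False returns mat_struct objects with attribute access
# instead of a structured array.
d = io.loadmat('test.mat', squeeze_me=True, struct_as_record=False)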
This was just a guess based on previous experience loading MATLAB files. If this isn't what you want, I'd suggest constructing sample data in MATLAB, saving it, and then loading it in Python to see how loadmat constructs it. You may have to go back and forth a few times to work out the bugs.
Given the direction provided by hpaulj, I developed the following function, which creates a structured array from a list of objects.
def listobj2struct(list_in):
    """Converts a list of objects to a structured array.

    Parameters
    ----------
    list_in: list
        List of objects

    Returns
    -------
    struct: np.array
        Structured array
    """

    # Create data type for each variable in object
    keys = list(vars(list_in[0]).keys())
    data_type = []
    for key in keys:
        data_type.append((key, list))

    # Create structured array based on data type and length of list
    dt = np.dtype(data_type)
    struct = np.zeros((len(list_in),), dt)

    # Populate the structure with data from the objects
    for n, item in enumerate(list_in):
        new_dict = vars(item)
        for key in new_dict:
            struct[key][n] = new_dict[key]
    return struct
To complete what I needed to do to create a Matlab file from a complex nesting of objects, I also wrote the following functions. Perhaps they will help others facing similar tasks. There may be better ways, but this worked for me. A usage sketch follows after the last function.
def obj2dict(obj):
    """Converts object variables to dictionaries. Works recursively to all levels of objects.

    Parameters
    ----------
    obj: object
        Object of some class

    Returns
    -------
    obj_dict: dict
        Dictionary of all object variables
    """
    obj_dict = vars(obj)
    for key in obj_dict:
        # Clean out NoneTypes
        if obj_dict[key] is None:
            obj_dict[key] = []
        # If variable is another object, convert to dictionary recursively
        elif str(type(obj_dict[key]))[8:13] == 'Class':
            obj_dict[key] = obj2dict(obj_dict[key])
    return obj_dict
def listobj2dict(list_in):
    """Converts a list of objects to a list of dictionaries. Works recursively to all levels of objects.

    Parameters
    ----------
    list_in: list
        List of objects

    Returns
    -------
    new_list: list
        List of dictionaries
    """
    new_list = []
    for obj in list_in:
        new_list.append(obj2dict(obj))
    return new_list
def listdict2struct(list_in):
    """Converts a list of dictionaries to a structured array.

    Parameters
    ----------
    list_in: list
        List of dictionaries

    Returns
    -------
    struct: np.array
        Structured array
    """

    # Create data type for each variable in the dictionaries
    keys = list(list_in[0].keys())
    data_type = []
    for key in keys:
        data_type.append((key, list))

    # Create structured array based on data type and length of list
    dt = np.dtype(data_type)
    struct = np.zeros((len(list_in),), dt)

    # Populate the structure with data from the dictionaries
    for n, item in enumerate(list_in):
        new_dict = item
        for key in new_dict:
            struct[key][n] = new_dict[key]
    return struct
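A minimal end-to-end sketch of how these might be used together (the Sample class and its field values are hypothetical, just for illustration):

import numpy as np
from scipy import io

class Sample:  # hypothetical class, for demonstration only
    def __init__(self, key1, key2):
        self.key1 = key1
        self.key2 = key2

objs = [Sample(1, np.arange(3)), Sample(2, 2.0 * np.arange(3))]
struct = listobj2struct(objs)                # one record per object
io.savemat('test.mat', {'astruct': struct})  # loads in Matlab as a 1x2 struct array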
I want to create a numpy matrix with three columns, in which the first two columns contain integers and the third contains floats. I want to start with an empty matrix and add a single row on each iteration of a for loop. However, I cannot manage to append a row to a numpy matrix that has a specific data type. This is the code I started with:
import numpy as np

def grow_table():
    dat_dtype = {
        'names': ['A', 'B', 'C'],
        'formats': ['i', 'i', 'd']}
    S = np.zeros(0, dat_dtype)
    X = np.array([1, 2, 3.5], dat_dtype)
    S = np.vstack((S, X))

if __name__ == '__main__':
    grow_table()
However, this gives a TypeError: expected a readable buffer object.
I then change the line in which I define the row as follows:
X = np.array((1, 2, 3.5), dat_dtype)
This line is accepted. However, now X is a tuple. If I try to print X[0], I end up with an IndexError: 0-d arrays can't be indexed. Furthermore, I can't add X to S, it will give me a ValueError: all the input array dimensions except for the concatenation axis must match exactly.
Next, I remove the names from the data type; in this case I end up with a ValueError: entry not a 2- or 3- tuple.
Am I on the right track of tackling this problem, or should I try it completely different?
I'm not a huge fan of hybrid dtypes; you could instead use separate arrays, arrays in a dictionary, or pandas data-frames. Anyway, here is how you can do it:
X = np.array([(1, 2, 3.5)], dat_dtype)
S = np.vstack((S[:,None], X, X, X))
Restacking each iteration is generally slow, and you may be better off making a list of the 1-row arrays and vstack-ing them at the end, or creating the array with known size and assigning to the elements.
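For instance, a sketch of that collect-then-stack idea, reusing dat_dtype from the question (the values here are made up):

# Build all the 1-row arrays first, then stack once instead of once per iteration.
rows = [np.array([(i, 2 * i, i + 0.5)], dat_dtype) for i in range(5)]
S = np.concatenate(rows)  # shape (5,) structured array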
I'm not a fan of growing arrays incrementally, but here's a way to do it:
import numpy as np

def grow_table():
    dt = np.dtype({'names': ['A', 'B', 'C'], 'formats': ['i', 'i', 'd']})
    S = np.zeros(0, dtype=dt)
    for i in range(5):
        X = np.array((i, 2*i, i + .5), dtype=dt)
        S = np.hstack((S, X))
    return S

if __name__ == '__main__':
    S = grow_table()
    print(S)
    print(S['A'])
producing:
[(0, 0, 0.5) (1, 2, 1.5) (2, 4, 2.5) (3, 6, 3.5) (4, 8, 4.5)]
[0 1 2 3 4]
S starts with shape (0,). X has shape (); it is 0d. In the end, S has shape (5,). We have to use hstack because we are creating a 1d array -- an array of tuples; that's what you get with a dtype like this. Also, when assigning values to such arrays, the values need to be in a tuple, not a list.
A better incremental build is:
def make_table(N=5):
    dt = np.dtype({'names': ['A', 'B', 'C'], 'formats': ['i', 'i', 'd']})
    S = np.zeros(N, dtype=dt)
    for i in range(N):
        S[i] = (i, 2*i, i + .5)
    return S
or even using a list of tuples:
def better(N=5):
    dt = np.dtype({'names': ['A', 'B', 'C'], 'formats': ['i', 'i', 'd']})
    L = [(i, 2*i, i + .5) for i in range(N)]
    return np.array(L, dtype=dt)
For csv output:
S = better()
np.savetxt('S.txt', S, fmt='%d, %d, %f')
produces:
0, 0, 0.500000
1, 2, 1.500000
...
Trying to savetxt an (N,1) array produces one or more errors.
savetxt attempts to write

for row in S:
    write(fmt % row)

With the (N,) array, a row is (0, 0, 0.5), but for (N,1) it is [(0, 0, 0.5)].
np.savetxt('S.txt', S, fmt='%s')
works, producing
(0, 0, 0.5)
(1, 2, 1.5)
...
But you don't need this dtype if you just want to save 2 columns of ints and one float. Just let the fmt do all the work:
def simple(N=5):
    return np.array([(i, 2*i, i + .5) for i in range(N)])

S = simple()
np.savetxt('S.txt', S, fmt='%d, %d, %f')
I'm looking to quickly (hopefully without a for loop) generate a Numpy array of the form:
array([a,a,a,a,0,0,0,0,0,b,b,b,0,0,0, c,c,0,0....])
Where a, b, c and other values are repeated at different points for different ranges. I'm really thinking of something like this:
import numpy as np
a = np.zeros(100)
a[0:3,9:11,15:16] = np.array([a,b,c])
Which obviously doesn't work. Any suggestions?
Edit (jterrace answered the original question):
The data is coming in the form of an N*M Numpy array. Each row is mostly zeros, occasionally interspersed with sequences of non-zero numbers. I want to replace all elements of each such sequence with the last value of the sequence. I'll take any fast method to do this! Using where and diff a few times, we can get the start and stop indices of each run.
raw_data = array([.....][....])
starts = array([0,0,0,1,1,1,1...][3, 9, 32, 7, 22, 45, 57,....])
stops = array([0,0,0,1,1,1,1...][5, 12, 50, 10, 30, 51, 65,....])
last_values = raw_data[stops]
length_to_repeat = stops[1]-starts[1]
Note that starts[0] and stops[0] carry the same information (which row the run occurs on). At this point, since the only route I know of is what jterrace suggests, we'll need to go through some contortions to get similar start/stop positions for the zeros, then interleave the zero starts/stops with the value starts/stops, and interleave the number 0 with the last_values array. Then we loop over each row, doing something like:
for i in range(N):
    values_in_this_row = where(starts[0] == i)[0]
    output[i] = numpy.repeat(last_values[values_in_this_row], length_to_repeat[values_in_this_row])
Does that make sense, or should I explain some more?
If you have the values and repeat counts fully specified, you can do it this way:
>>> import numpy
>>> values = numpy.array([1,0,2,0,3,0])
>>> counts = numpy.array([4,5,3,3,2,2])
>>> numpy.repeat(values, counts)
array([1, 1, 1, 1, 0, 0, 0, 0, 0, 2, 2, 2, 0, 0, 0, 3, 3, 0, 0])
You can use numpy.r_:
>>> np.r_[[a]*4,[b]*3,[c]*2]
array([1, 1, 1, 1, 2, 2, 2, 3, 3])
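And for the edited question (replacing each nonzero run with its last value), here's a per-row sketch that gets the run boundaries from one diff/where pass; the helper name is mine, and it still loops over runs (though not over elements):

import numpy as np

def fill_runs_with_last(row):
    # Mark nonzero positions; pad with zeros so diff catches runs at the edges.
    row = np.asarray(row)
    nz = (row != 0).astype(np.int8)
    edges = np.diff(np.concatenate(([0], nz, [0])))
    starts = np.where(edges == 1)[0]
    stops = np.where(edges == -1)[0]   # one past the end of each run
    out = row.copy()
    for s, e in zip(starts, stops):
        out[s:e] = row[e - 1]          # overwrite the run with its last value
    return out

print(fill_runs_with_last([0, 1, 2, 0, 0, 5, 7, 9, 0]))
# [0 2 2 0 0 9 9 9 0]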