I need the values from a CSV to have a comma after each individual value, as well as at the end of each row/array.
I used tolist() before making these changes. Converting the numerical values to strings is not wanted.
The code below is what I currently have.
import numpy as np
dataset = open("Dataset.csv")
next(dataset) # Skips first line of dataset
games = np.loadtxt(dataset, delimiter=",")
dataset.close()
print(games)
This is what the code outputs:
[[ 0.228 0.5 0.685 0.378 0.439 0.183 0.387 0.25 0.169]
[ 0.206 0.125 0.686 0.069 0.131 0.778 2.71 0.75 -0.092]]
I am looking for the code to output this:
[[0.228,0.5,0.685,0.378,0.439,0.183,0.387,0.25,0.169],
[0.206,0.125,0.686,0.069,0.131,0.778,2.71,0.75,-0.092]]
You can set any formatter you want for printing your output via np.set_printoptions. This does not change your original array's type, only the printing format, which I think is what you are looking for. If it is not, you can define your desired format through the formatter:
# Be mindful: this puts a comma after each float, including the last number in each sub-array
float_formatter = "{:},".format
np.set_printoptions(formatter={'float_kind':float_formatter})
print(games)
output:
[[0.228, 0.5, 0.685, 0.378, 0.439, 0.183, 0.387, 0.25, 0.169,]
[0.206, 0.125, 0.686, 0.069, 0.131, 0.778, 2.71, 0.75, -0.092,]]
and the dtype is still float:
print(games.dtype)
float64
A better option, mentioned by @David Buck in the comments, is to use repr:
print(repr(games))
output:
array([[ 0.228, 0.5 , 0.685, 0.378, 0.439, 0.183, 0.387, 0.25 ,
0.169],
[ 0.206, 0.125, 0.686, 0.069, 0.131, 0.778, 2.71 , 0.75 ,
-0.092]])
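If you want commas without touching the global print options, np.array2string accepts a separator argument and returns only the formatted text, while the array itself stays float64. A minimal sketch; the inline CSV text (with made-up header names a..i) is a stand-in for Dataset.csv, and skiprows=1 replaces the manual next(dataset) call:

```python
import io

import numpy as np

# Hypothetical stand-in for Dataset.csv: one header line plus two data rows
csv_text = """a,b,c,d,e,f,g,h,i
0.228,0.5,0.685,0.378,0.439,0.183,0.387,0.25,0.169
0.206,0.125,0.686,0.069,0.131,0.778,2.71,0.75,-0.092
"""

# skiprows=1 skips the header line, so no manual next() is needed
games = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

# array2string only builds display text; games itself is unchanged
text = np.array2string(games, separator=",")
print(text)
print(games.dtype)
```

Unlike np.set_printoptions, this affects only the one string you build, so other arrays keep printing in the default style.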
Make sure you understand what Python object you have, and what the commas (or their absence) mean.
With loadtxt you created a numpy array. A simpler way of doing the same:
In [212]: arr = np.arange(12).reshape(2,6)
The repr display for an array is:
In [213]: arr
Out[213]:
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11]])
The str display omits the commas. That's intentional; it helps distinguish an array from a list:
In [214]: print(arr)
[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]]
In [215]: type(arr)
Out[215]: numpy.ndarray
The print display of a list has commas:
In [216]: print(arr.tolist())
[[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]]
The distinction between a list (or list of lists) and an array is important. Whether the display uses commas or not is superficial.
I am translating J language code into Python, but the way Python's apply function works seems a little unclear to me...
I currently have a (3, 3, 2) matrix A, and a (3, 3) matrix B.
I want to divide each matrix in A by the corresponding row of B:
A = np.arange(1,19).reshape(3,3,2)
array([[[ 1, 2],
[ 3, 4],
[ 5, 6]],
[[ 7, 8],
[ 9, 10],
[11, 12]],
[[13, 14],
[15, 16],
[17, 18]]])
B = np.arange(1,10).reshape(3,3)
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
That is, the result would be:
1 2
1.5 2
1.66667 2
1.75 2
1.8 2
1.83333 2
1.85714 2
1.875 2
1.88889 2
For the first matrix of the result, the computation I want is:
1/1 2/1
3/2 4/2
5/3 6/3
I have tried
np.apply_along_axis(np.divide,1,A,B)
but it says
operands could not be broadcast together with shapes (10,) (10,10,2)
Any advice?
Thank you in advance = ]
P.S. The J code is
A %"2 1 B
This means "divide each matrix ("2) of A by each row ("1) of B",
or just simply
A % B
Broadcasting works if, comparing the shapes from the trailing dimension backward, each pair of dimensions matches or one of them is 1. So we can simply add a dummy dimension to B!
import numpy as np
A = np.arange(1,19).reshape(3,3,2)
B = np.arange(1,10).reshape(3,3)
B = B[...,np.newaxis] # This adds new dummy dimension in the end, B's new shape is (3,3,1)
A/B
array([[[1. , 2. ],
[1.5 , 2. ],
[1.66666667, 2. ]],
[[1.75 , 2. ],
[1.8 , 2. ],
[1.83333333, 2. ]],
[[1.85714286, 2. ],
[1.875 , 2. ],
[1.88888889, 2. ]]])
What I need is to divide/spread the interval 0 to 1 according to a single number greater than 2.
For example, with the number 5, 0 to 1 would be divided like this:
0.00
0.25
0.50
0.75
1.00
5 values in a list
My other question: what do I do to get a sequence like this, where the middle numbers are 1 and the first and last numbers are 0, if the number is 10?
0.00
0.25
0.50
0.75
1.00
1.00
0.75
0.50
0.25
0.00
The upper bound of the range(..) is exclusive (meaning it is not enumerated), so you need to add one step to the range(..) function:
for i in range(0, 11):
    b = i * (1.0 / 10)
    print(b)
That being said, if you want to create such an array, you can use numpy.arange(..):
>>> import numpy as np
>>> np.arange(0, 1.1, 0.1)
array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
This allows you to specify floats for the start, stop, and step parameters.
As for your second question, you can chain iterables together with itertools.chain:
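For the first question itself (n evenly spaced values from 0 to 1, endpoints included), np.linspace takes the number of points directly. The NumPy documentation recommends linspace over arange when the step is not an integer, since floating-point rounding can make arange's end point unpredictable. A sketch; spread is a hypothetical helper name:

```python
import numpy as np

def spread(n):
    # n evenly spaced values from 0 to 1, both endpoints included
    return np.linspace(0.0, 1.0, n)

print(spread(5))
```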
from itertools import chain
for i in chain(range(0, 11), range(10, -1, -1)):
    print(i / 10.0)
Here one range(..) iterates from 0 up to 10 and the other from 10 down to 0, both inclusive.
You should use range(0,11) to get all the numbers from 0 to 10.
range(0, 10) gives you the numbers from 0 to 9. Here is a practical example:
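The second sequence from the question can also be built in NumPy by mirroring a rising half. A sketch, assuming n is even so the sequence splits into two halves of n // 2 values each (up_down is a hypothetical name):

```python
import numpy as np

def up_down(n):
    # Assumes n is even: the first half rises from 0 to 1, the second half mirrors it
    if n % 2:
        raise ValueError("n must be even")
    half = np.linspace(0.0, 1.0, n // 2)
    return np.concatenate([half, half[::-1]])

print(up_down(10))
```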
>>> list(range(0,10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(range(0,11))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>>
>>> list(range(0,1))
[0]
>>>
For a numpy array X, the location of its element X[k[0], ..., k[d-1]] is offset from the location of X[0,..., 0] by k[0]*s[0] + ... + k[d-1]*s[d-1], where (s[0],...,s[d-1]) is the tuple representing X.strides.
As far as I understand, nothing in the numpy array specs requires that distinct indexes of array X correspond to distinct addresses in memory, the simplest instance being a zero stride (see, e.g., the Advanced NumPy section of the SciPy Lecture Notes).
Does NumPy have a built-in predicate to test whether the strides and the shape are such that distinct indexes map to distinct memory addresses?
If not, how does one write one, preferably so as to avoid sorting of the strides?
Edit: It took me a while to figure out what you are asking. With striding tricks it's possible to index the same element in a data buffer in different ways, and broadcasting actually does this under the covers. Normally we don't worry about it because it is either hidden or intentional.
Recreating the strided mapping and looking for duplicates may be the only way to test this. I'm not aware of any existing function that checks it.
==================
I'm not quite sure what you are concerned with, but let me illustrate how shape and strides work.
Define a 3x4 array:
In [453]: X=np.arange(12).reshape(3,4)
In [454]: X.shape
Out[454]: (3, 4)
In [455]: X.strides
Out[455]: (16, 4)
Index an item
In [456]: X[1,2]
Out[456]: 6
I can get its index in a flattened version of the array (e.g. the original arange) with ravel_multi_index:
In [457]: np.ravel_multi_index((1,2),X.shape)
Out[457]: 6
I can also get this location using strides, keeping in mind that strides are in bytes (here 4 bytes per item):
In [458]: 1*16+2*4
Out[458]: 24
In [459]: (1*16+2*4)/4
Out[459]: 6.0
All these numbers are relative to the start of the data buffer. We can get the data buffer address from X.data or X.__array_interface__['data'], but usually don't need to.
So these strides tell us that to go from one entry to the next we step 4 bytes, and to go from one row to the next we step 16. The value 6 is located one row down and 2 columns over, i.e. 24 bytes into the buffer.
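The offset formula from the question, k[0]*s[0] + ... + k[d-1]*s[d-1], can be checked directly against ravel_multi_index. A sketch; the exact stride values depend on the platform's default integer size:

```python
import numpy as np

X = np.arange(12).reshape(3, 4)
index = (1, 2)

# Byte offset of X[1, 2] from X[0, 0]: k[0]*s[0] + k[1]*s[1]
byte_offset = sum(k * s for k, s in zip(index, X.strides))

# Dividing by the item size turns bytes into a position in the flat C-ordered buffer
flat_pos = byte_offset // X.itemsize

print(X.strides, byte_offset, flat_pos)
print(np.ravel_multi_index(index, X.shape))
```

With 8-byte integers, X.strides is (32, 8) and byte_offset is 48, but flat_pos is still 6.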
In the as_strided example of your link, strides=(1*2, 0) produces repeated indexing of specific values.
With my X:
In [460]: y=np.lib.stride_tricks.as_strided(X,strides=(16,0), shape=(3,4))
In [461]: y
Out[461]:
array([[0, 0, 0, 0],
[4, 4, 4, 4],
[8, 8, 8, 8]])
y is a 3x4 array that repeatedly indexes the first column of X.
Changing one item in y ends up changing one value in X but a whole row in y:
In [462]: y[1,2]=10
In [463]: y
Out[463]:
array([[ 0, 0, 0, 0],
[10, 10, 10, 10],
[ 8, 8, 8, 8]])
In [464]: X
Out[464]:
array([[ 0, 1, 2, 3],
[10, 5, 6, 7],
[ 8, 9, 10, 11]])
as_strided can produce some weird effects if you aren't careful.
OK, maybe I've figured out what's bothering you: can I identify a situation like this, where two different index tuples point to the same location in the data buffer? Not that I'm aware of, though a 0 in y.strides is a pretty good indicator.
as_strided is often used to create overlapping windows:
In [465]: y=np.lib.stride_tricks.as_strided(X,strides=(8,4), shape=(3,4))
In [466]: y
Out[466]:
array([[ 0, 1, 2, 3],
[ 2, 3, 10, 5],
[10, 5, 6, 7]])
In [467]: y[1,2]=20
In [469]: y
Out[469]:
array([[ 0, 1, 2, 3],
[ 2, 3, 20, 5],
[20, 5, 6, 7]])
Again changing 1 item in y ends up changing 2 values in y, but only 1 in X.
Ordinary array creation and indexing does not have this duplicate indexing issue. Broadcasting may do something like this under the covers, where a (4,) array is changed to (1,4) and then to (3,4), effectively replicating rows. I think there's another stride_tricks function that does this explicitly:
In [475]: x,y=np.lib.stride_tricks.broadcast_arrays(X,np.array([.1,.2,.3,.4]))
In [476]: x
Out[476]:
array([[ 0, 1, 2, 3],
[20, 5, 6, 7],
[ 8, 9, 10, 11]])
In [477]: y
Out[477]:
array([[ 0.1, 0.2, 0.3, 0.4],
[ 0.1, 0.2, 0.3, 0.4],
[ 0.1, 0.2, 0.3, 0.4]])
In [478]: y.strides
Out[478]: (0, 8)
In any case, in normal array use we don't have to worry about this ambiguity. We get it only with intentional actions, not accidental ones.
==============
How about this for a test:
def dupstrides(x):
    # Byte offset of every index tuple; a repeated offset means two indexes share memory
    uniq = {sum(s*j for s, j in zip(x.strides, i)) for i in np.ndindex(x.shape)}
    print(uniq)
    print(len(uniq))
    print(x.size)
    return len(uniq) < x.size
In [508]: dupstrides(X)
{0, 32, 4, 36, 8, 40, 12, 44, 16, 20, 24, 28}
12
12
Out[508]: False
In [509]: dupstrides(y)
{0, 4, 8, 12, 16, 20, 24, 28}
8
12
Out[509]: True
It turns out this test is already implemented in NumPy (see mem_overlap.c:842).
It is exposed through a private test module as numpy.core.multiarray_tests.internal_overlap(x).
Example:
>>> import numpy as np
>>> from numpy.core.multiarray_tests import internal_overlap
>>> from numpy.lib.stride_tricks import as_strided
Now, create a contiguous array, use as_strided to create an array with internal overlap, and confirm this with the test:
>>> x = np.arange(3*4, dtype=np.float64).reshape((3,4))
>>> y = as_strided(x, shape=(5,4), strides=(16, 8))
>>> y
array([[ 0., 1., 2., 3.],
[ 2., 3., 4., 5.],
[ 4., 5., 6., 7.],
[ 6., 7., 8., 9.],
[ 8., 9., 10., 11.]])
>>> internal_overlap(x)
False
>>> internal_overlap(y)
True
The function is optimized to quickly return False for Fortran- or C-contiguous arrays.
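Because internal_overlap lives in a private test module whose import path has changed between NumPy versions, a pure-NumPy version of the brute-force dupstrides test can be a useful fallback. A vectorized sketch; has_internal_overlap is a hypothetical name, and unlike internal_overlap it has no contiguity shortcut, so it costs O(x.size) time and memory:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def has_internal_overlap(x):
    """Return True if two distinct index tuples of x map to the same byte offset."""
    if x.size <= 1:
        return False
    # np.indices gives the index of every element along each axis: shape (ndim, size)
    idx = np.indices(x.shape).reshape(x.ndim, -1)
    # Byte offset of each element relative to x[0, ..., 0]
    offsets = np.asarray(x.strides, dtype=np.int64) @ idx
    return np.unique(offsets).size < x.size

X = np.arange(12).reshape(3, 4)
# Overlapping windows: each row starts two items after the previous one
window = as_strided(X, shape=(3, 4), strides=(2 * X.itemsize, X.itemsize))
print(has_internal_overlap(X), has_internal_overlap(window))
```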
How can I read a Numpy array from a string? Take a string like:
"[[ 0.5544 0.4456], [ 0.8811 0.1189]]"
and convert it to an array:
a = from_string("[[ 0.5544 0.4456], [ 0.8811 0.1189]]")
where a becomes the object: np.array([[0.5544, 0.4456], [0.8811, 0.1189]]).
I'm looking for a very simple interface. A way to convert 2D arrays (of floats) to a string and then a way to read them back to reconstruct the array:
arr_to_string(array([[0.5544, 0.4456], [0.8811, 0.1189]])) should return "[[ 0.5544 0.4456], [ 0.8811 0.1189]]".
string_to_arr("[[ 0.5544 0.4456], [ 0.8811 0.1189]]") should return the object array([[0.5544, 0.4456], [0.8811, 0.1189]]).
Ideally arr_to_string would have a precision parameter that controlled the precision of floating points converted to strings, so that you wouldn't get entries like 0.4444444999999999999999999.
There's nothing I can find in the NumPy docs that does this both ways. np.save lets you make a string but then there's no way to load it back in (np.load only works for files).
The challenge is to save not only the data buffer, but also the shape and dtype. np.fromstring reads the data buffer, but as a 1d array; you have to get the dtype and shape from elsewhere.
In [184]: a=np.arange(12).reshape(3,4)
In [185]: np.fromstring(a.tostring(),int)
Out[185]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
In [186]: np.fromstring(a.tostring(),a.dtype).reshape(a.shape)
Out[186]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
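On current NumPy, tostring() and the binary mode of np.fromstring are deprecated in favour of tobytes() and np.frombuffer(). A minimal sketch of the same round trip with the newer names:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# tobytes() returns only the raw data buffer (C order by default) ...
raw = a.tobytes()

# ... so dtype and shape must be supplied separately when rebuilding
b = np.frombuffer(raw, dtype=a.dtype).reshape(a.shape)
print(b)
```

np.frombuffer shares memory with the immutable bytes object, so the result is read-only; call .copy() if you need to modify it.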
A time-honored mechanism to save Python objects is pickle, and numpy arrays are picklable:
In [169]: import pickle
In [170]: a=np.arange(12).reshape(3,4)
In [171]: s=pickle.dumps(a*2)
In [172]: s
Out[172]: "cnumpy.core.multiarray\n_reconstruct\np0\n(cnumpy\nndarray\np1\n(I0\ntp2\nS'b'\np3\ntp4\nRp5\n(I1\n(I3\nI4\ntp6\ncnumpy\ndtype\np7\n(S'i4'\np8\nI0\nI1\ntp9\nRp10\n(I3\nS'<'\np11\nNNNI-1\nI-1\nI0\ntp12\nbI00\nS'\\x00\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x04\\x00\\x00\\x00\\x06\\x00\\x00\\x00\\x08\\x00\\x00\\x00\\n\\x00\\x00\\x00\\x0c\\x00\\x00\\x00\\x0e\\x00\\x00\\x00\\x10\\x00\\x00\\x00\\x12\\x00\\x00\\x00\\x14\\x00\\x00\\x00\\x16\\x00\\x00\\x00'\np13\ntp14\nb."
In [173]: pickle.loads(s)
Out[173]:
array([[ 0, 2, 4, 6],
[ 8, 10, 12, 14],
[16, 18, 20, 22]])
Older NumPy versions also had np.loads, an alias of pickle.loads, that can read the pickle string (it has since been deprecated):
In [181]: np.loads(s)
Out[181]:
array([[ 0, 2, 4, 6],
[ 8, 10, 12, 14],
[16, 18, 20, 22]])
You mentioned np.save to a string, but that you can't use np.load. A way around that is to step further into the code, and use np.lib.npyio.format.
In [174]: import StringIO   # Python 2; on Python 3 use io.BytesIO instead
In [175]: S=StringIO.StringIO() # a file-like string buffer
In [176]: np.lib.npyio.format.write_array(S,a*3.3)
In [177]: S.seek(0) # rewind the string
In [178]: np.lib.npyio.format.read_array(S)
Out[178]:
array([[ 0. , 3.3, 6.6, 9.9],
[ 13.2, 16.5, 19.8, 23.1],
[ 26.4, 29.7, 33. , 36.3]])
The saved string has a header with dtype and shape info:
In [179]: S.seek(0)
In [180]: S.readlines()
Out[180]:
["\x93NUMPY\x01\x00F\x00{'descr': '<f8', 'fortran_order': False, 'shape': (3, 4), } \n",
'\x00\x00\x00\x00\x00\x00\x00\x00ffffff\n',
'#ffffff\x1a#\xcc\xcc\xcc\xcc\xcc\xcc##ffffff*#\x00\x00\x00\x00\x00\x800#\xcc\xcc\xcc\xcc\xcc\xcc3#\x99\x99\x99\x99\x99\x197#ffffff:#33333\xb3=#\x00\x00\x00\x00\x00\x80##fffff&B#']
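np.load accepts file-like objects, not just file names, so np.save plus an in-memory io.BytesIO buffer gives a simpler round trip than calling the format module directly. A sketch for Python 3:

```python
import io

import numpy as np

a = np.arange(12).reshape(3, 4) * 3.3

buf = io.BytesIO()          # in-memory binary "file"
np.save(buf, a)             # writes the .npy header (dtype, shape) plus the data
data = buf.getvalue()       # the whole .npy content as a bytes object

# Wrap the bytes in a new buffer and read them back
restored = np.load(io.BytesIO(data))
print(restored)
```

The bytes are not human readable, but they preserve dtype and shape exactly.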
If you want a human readable string, you might try json.
In [196]: import json
In [197]: js=json.dumps(a.tolist())
In [198]: js
Out[198]: '[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]'
In [199]: np.array(json.loads(js))
Out[199]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
Going to/from the list representation of the array is the most obvious use of json. Someone may have written a more elaborate json representation of arrays.
You could also go the csv format route - there have been lots of questions about reading/writing csv arrays.
'[[ 0.5544 0.4456], [ 0.8811 0.1189]]'
is a poor string representation for this purpose. It looks a lot like the str() of an array, but with ',' instead of '\n' between rows. There isn't a clean way of parsing the nested [], and the missing delimiter between numbers is a pain. If it consistently used ',' between numbers, json could convert it to a list.
np.matrix accepts a MATLAB-like string:
In [207]: np.matrix(' 0.5544, 0.4456;0.8811, 0.1189')
Out[207]:
matrix([[ 0.5544, 0.4456],
[ 0.8811, 0.1189]])
In [208]: str(np.matrix(' 0.5544, 0.4456;0.8811, 0.1189'))
Out[208]: '[[ 0.5544 0.4456]\n [ 0.8811 0.1189]]'
Forward to string:
import numpy as np
def array2str(arr, precision=None):
    s = np.array_str(arr, precision=precision)
    return s.replace('\n', ',')
Backward to array:
import re
import ast
import numpy as np
def str2array(s):
    # Remove spaces after [
    s = re.sub(r'\[ +', '[', s.strip())
    # Replace runs of commas and whitespace with a single ', '
    s = re.sub(r'[,\s]+', ', ', s)
    return np.array(ast.literal_eval(s))
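If you control both directions, np.array2string with separator=', ' produces text that is already a valid Python list literal, and its precision parameter covers the precision requirement from the question. A sketch using the question's helper names arr_to_string and string_to_arr; it assumes finite values, since nan and inf are not Python literals:

```python
import ast
import sys

import numpy as np

def arr_to_string(arr, precision=4):
    # separator=", " makes the text a nested Python list literal;
    # threshold=sys.maxsize stops NumPy from abbreviating large arrays with "..."
    return np.array2string(arr, precision=precision, separator=", ",
                           threshold=sys.maxsize)

def string_to_arr(s):
    # literal_eval only evaluates Python literals, so it is safer than eval
    return np.array(ast.literal_eval(s))

s = arr_to_string(np.array([[0.5544, 0.4456], [0.8811, 0.1189]]))
print(s)
print(string_to_arr(s))
```

Rows are separated by newlines; add .replace("\n", "") if you need a single-line string.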
If you use repr() to convert array to string, the conversion will be trivial.
I'm not sure there's an easy way to do this if you don't have commas between the numbers in your inner lists, but if you do, then you can use ast.literal_eval:
import ast
import numpy as np
s = '[[ 0.5544, 0.4456], [ 0.8811, 0.1189]]'
np.array(ast.literal_eval(s))
array([[ 0.5544, 0.4456],
[ 0.8811, 0.1189]])
EDIT: I haven't tested it very much, but you could use re to insert commas where you need them:
import re
s1 = '[[ 0.5544 0.4456], [ 0.8811 -0.1189]]'
# Replace spaces between numbers with commas:
s2 = re.sub(r'(\d) +(-|\d)', r'\1,\2', s1)
s2
'[[ 0.5544,0.4456], [ 0.8811,-0.1189]]'
and then hand on to ast.literal_eval:
np.array(ast.literal_eval(s2))
array([[ 0.5544, 0.4456],
[ 0.8811, -0.1189]])
(you need to be careful to match spaces between digits, but also spaces between a digit and a minus sign).
In my case I found the following command helpful for dumping:
string = str(array.tolist())
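And for reloading (ast.literal_eval is a safer choice than eval here, because it only evaluates Python literals):
import ast
array = np.array(ast.literal_eval(string))
This should work for a numpy array of any dimensionality.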
numpy.fromstring() allows you to easily create 1D arrays from a string. Here's a simple function to create a 2D numpy array from a string:
import numpy as np
def str2np(strArray):
    lItems = []
    width = None
    for line in strArray.split("\n"):
        lParts = line.split()
        n = len(lParts)
        if n == 0:
            continue
        if width is None:
            width = n
        else:
            assert n == width, "invalid array spec"
        lItems.append([float(item) for item in lParts])
    return np.array(lItems)
Usage:
X = str2np("""
-2 2
-1 3
0 1
1 1
2 -1
""")
print(f"X = {X}")
Output:
X = [[-2. 2.]
[-1. 3.]
[ 0. 1.]
[ 1. 1.]
[ 2. -1.]]
In Python, given an n x p matrix, e.g. 4 x 4, how can I return a 4 x 2 matrix that simply averages the first two columns and the last two columns for all 4 rows?
e.g. given:
a = array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16]])
return a matrix that has the average of a[:, 0] and a[:, 1] and the average of a[:, 2] and a[:, 3].
I want this to work for an arbitrary n x p matrix, assuming that the number of columns p is evenly divisible by the number of columns being averaged together.
let me clarify: for each row, I want to take the average of the first two columns, then the average of the last two columns. So it would be:
(1 + 2) / 2, (3 + 4) / 2 <- row 1 of new matrix
(5 + 6) / 2, (7 + 8) / 2 <- row 2 of new matrix, etc.
which should yield a 4 x 2 matrix rather than a 4 x 4 one.
thanks.
How about using some math? You can define a matrix M = [[0.5,0],[0.5,0],[0,0.5],[0,0.5]] so that the matrix product of A and M is what you want.
import numpy as np
A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12],
              [13, 14, 15, 16]])
M = np.array([[0.5, 0],
              [0.5, 0],
              [0, 0.5],
              [0, 0.5]])
print(A @ M)  # matrix product (np.matrix with A*M does the same on older code)
Generating M is simple too: each entry is either 1/k, where k is the number of columns per group, or zero.
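A sketch of building M for any group size (averaging_matrix is a hypothetical helper; it assumes the column count p is divisible by the group size k). np.repeat stacks each row of a (p // k) x (p // k) identity matrix k times, and dividing by k puts 1/k in the right group column:

```python
import numpy as np

def averaging_matrix(p, k):
    # Assumes p (number of columns) is divisible by k (columns per group)
    if p % k:
        raise ValueError("p must be divisible by k")
    # Row j of M has 1/k in the column of the group that column j belongs to
    return np.repeat(np.eye(p // k), k, axis=0) / k

A = np.arange(1, 17).reshape(4, 4)
M = averaging_matrix(A.shape[1], 2)
print(M)
print(A @ M)
```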
reshape - get mean - reshape
>>> a.reshape(-1, a.shape[1]//2).mean(1).reshape(a.shape[0],-1)
array([[ 1.5, 3.5],
[ 5.5, 7.5],
[ 9.5, 11.5],
[ 13.5, 15.5]])
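This works for any array with an even number of columns (it averages the left and right halves of each row), and for a contiguous array reshape returns a view rather than a copy.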
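The same reshape idea generalises to an arbitrary group size k by giving the groups their own axis instead of assuming two halves. A sketch (group_means is a hypothetical name; it assumes the number of columns is divisible by k):

```python
import numpy as np

def group_means(a, k):
    # Average each run of k adjacent columns; assumes a.shape[1] % k == 0
    n, p = a.shape
    if p % k:
        raise ValueError("number of columns must be divisible by k")
    # (n, p) -> (n, p // k, k): the last axis holds one group of k columns
    return a.reshape(n, p // k, k).mean(axis=2)

a = np.arange(1, 17).reshape(4, 4)
print(group_means(a, 2))
```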
It's a bit unclear what should happen for matrices with n > 4, but this code will do what you want:
import numpy as N
a = N.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]], dtype=float)
avg = N.vstack((N.average(a[:,0:2], axis=1), N.average(a[:,2:4], axis=1))).T
This yields avg =
array([[ 1.5, 3.5],
[ 5.5, 7.5],
[ 9.5, 11.5],
[ 13.5, 15.5]])
Here's a way to do it. Note that np.hsplit(a, groupsize) splits a into groupsize equal blocks of columns, so groupsize is really the number of groups (here 2 groups of 2 columns). Change it to make this work with other sizes, though I'm not fully sure what you want.
groupsize = 2
out = np.hstack([np.mean(x, axis=1, keepdims=True) for x in np.hsplit(a, groupsize)])
yields
array([[ 1.5, 3.5],
[ 5.5, 7.5],
[ 9.5, 11.5],
[ 13.5, 15.5]])
for out. Hopefully it gives you some ideas on how to do exactly what it is that you want to do. You can make groupsize dependent on the dimensions of a for instance.