Matplotlib scatter plot with array of y values for each x - python

This is in a similar vein to Python Scatter Plot with Multiple Y values for each X; that is, I have data like this:
data = [
[1, [15, 16, 17, 18, 19, 20]],
[2, [21, 22, 23, 24, 25, 26]],
[3, [27, 28, 29, 30, 31, 32]],
]
... so the first column (0) holds the x-coordinates, and the second column (1) contains arrays of y values corresponding to a single x coordinate. I want to plot this as a scatter plot, and the best I could do is this (code below):
As in the linked post, I've had to use three ax.scatter calls, and hence we have three colours, one for each column.
So my question is:
Can I issue a single ax.scatter command to get a plot like the above (but with a single color/marker) from the data I have (instead of having to issue three commands)?
Alternatively, can I somehow transform the data I have, so that I get a plot like the above (but with a single color/marker) with a single ax.scatter command?
Here is the code:
#!/usr/bin/env python3
import sys
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
print("versions: Python {} matplotlib {} numpy {}".format(sys.version.replace('\n', ''), matplotlib.__version__, np.version.version))
data = [
[1, [15, 16, 17, 18, 19, 20]],
[2, [21, 22, 23, 24, 25, 26]],
[3, [27, 28, 29, 30, 31, 32]],
]
ndata = np.asarray(data, dtype=object)
fig = plt.figure()
# Null formatter
ax = fig.add_subplot(1, 1, 1)
print()
print(ndata[1])
print(ndata[:,0].astype(float))
print(ndata[:,1])
datay_2D = np.stack(ndata[:,1], axis=0) # convert numpy array of lists to numpy 2D array
print()
print(datay_2D[:,0])
print(datay_2D[0])
print([ndata[:,0][0]]*len(datay_2D[0]))
ax.scatter([ndata[:,0][0]]*len(datay_2D[0]), datay_2D[0], marker="x")
ax.scatter([ndata[:,0][1]]*len(datay_2D[1]), datay_2D[1], marker="x")
ax.scatter([ndata[:,0][2]]*len(datay_2D[2]), datay_2D[2], marker="x")
plt.show()
Printout:
versions: Python 3.6.8 (default, Oct 7 2019, 12:59:55) [GCC 8.3.0] matplotlib 2.1.1 numpy 1.13.3
[2 list([21, 22, 23, 24, 25, 26])]
[ 1. 2. 3.]
[list([15, 16, 17, 18, 19, 20]) list([21, 22, 23, 24, 25, 26])
list([27, 28, 29, 30, 31, 32])]
[15 21 27]
[15 16 17 18 19 20]
[1, 1, 1, 1, 1, 1]

I suppose all lists of y values have the same length? In that case
import numpy as np
import matplotlib.pyplot as plt
data = [
[1, [15, 16, 17, 18, 19, 20]],
[2, [21, 22, 23, 24, 25, 26]],
[3, [27, 28, 29, 30, 31, 32]],
]
x, y = zip(*data)
y = np.array(y)
plt.scatter(np.repeat(x, y.shape[1]), y.flat)
plt.show()
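If the lists of y values do not all have the same length, np.array(y) no longer gives a 2D array and the approach above breaks. A minimal sketch for that case, assuming ragged lists (the shortened data below is made up for illustration), passes per-row counts to np.repeat instead:

```python
import numpy as np

# Illustrative ragged data: each x has a different number of y values
data = [
    [1, [15, 16, 17]],
    [2, [21, 22, 23, 24, 25, 26]],
    [3, [27]],
]

x_vals = [x for x, _ in data]
counts = [len(ys) for _, ys in data]

# Repeat each x once per y value belonging to it
x_flat = np.repeat(x_vals, counts)
# Join all y lists into one flat array, in the same order
y_flat = np.concatenate([ys for _, ys in data])
```

The two flat arrays have equal length, so they can be passed to a single ax.scatter(x_flat, y_flat, marker="x") call.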


How can I extract a set of 2D slices from a larger 2D numpy array?

If I have a large 2D numpy array and 2 arrays which correspond to the x and y indices I want to extract, it's easy enough:
h = np.arange(49).reshape(7,7)
# h = [[0, 1, 2, 3, 4, 5, 6],
# [7, 8, 9, 10, 11, 12, 13],
# [14, 15, 16, 17, 18, 19, 20],
# [21, 22, 23, 24, 25, 26, 27],
# [28, 29, 30, 31, 32, 33, 34],
# [35, 36, 37, 38, 39, 40, 41],
# [42, 43, 44, 45, 46, 47, 48]]
x_indices = np.array([1,3,4])
y_indices = np.array([2,3,5])
reduced_h = h[x_indices, y_indices]
#reduced_h = [ 9, 24, 33]
However, I would like to, for each x,y pair cut out a square (denoted by 'a' - the number of indices in each direction from the centre) surrounding this 'coordinate' and return an array of these little 2D arrays.
For example, for h, x,y_indices as above and a=1:
reduced_h = [[[1,2,3],[8,9,10],[15,16,17]], [[16,17,18],[23,24,25],[30,31,32]], [[25,26,27],[32,33,34],[39,40,41]]]
i.e. one 3x3 array for each x-y index pair, corresponding to the 3x3 square of elements centred on the x-y index. In general, this should return a numpy array which has shape (len(x_indices), 2a+1, 2a+1).
By analogy to reduced_h[0] = h[x_indices[0]-1:x_indices[0]+2 , y_indices[0]-1:y_indices[0]+2] = h[1-1:1+2 , 2-1:2+2] = h[0:3, 1:4] (the slice end is exclusive, hence +2 rather than +1), my first try was the following:
h[x_indices-a : x_indices+a+1, y_indices-a : y_indices+a+1]
However, perhaps unsurprisingly, slicing between the arrays fails.
So the obvious next thing to try is to create this slice manually. np.arange seems to struggle with this but linspace works:
a=1
xrange = np.linspace(x_indices-a, x_indices+a, 2*a+1, dtype=int)
# xrange = [ [0, 2, 3], [1, 3, 4], [2, 4, 5] ]
yrange = np.linspace(y_indices-a, y_indices+a, 2*a+1, dtype=int)
Now I can try h[xrange,yrange], but this unsurprisingly indexes element-wise, meaning I get only one (2a+1)x(2a+1) array (the same dimensions as xrange and yrange). Is there a way to, for every index, take the right slices from these ranges (without loops)? Or is there a way to make the broadcast work initially without having to set up linspace explicitly? Thanks
You can index np.lib.stride_tricks.sliding_window_view using your x and y indices:
import numpy as np
h = np.arange(49).reshape(7,7)
x_indices = np.array([1,3,4])
y_indices = np.array([2,3,5])
a = 1
window = (2*a+1, 2*a+1)
out = np.lib.stride_tricks.sliding_window_view(h, window)[x_indices-a, y_indices-a]
out:
array([[[ 1, 2, 3],
[ 8, 9, 10],
[15, 16, 17]],
[[16, 17, 18],
[23, 24, 25],
[30, 31, 32]],
[[25, 26, 27],
[32, 33, 34],
[39, 40, 41]]])
Note that you may need to pad h first to handle windows around your coordinates that reach "outside" h.
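A minimal sketch of the padding step, assuming np.pad with a constant fill is acceptable (the fill value -1 and the border coordinates below are arbitrary choices for illustration). Padding by a on every side shifts each original index by a, so the window's top-left corner in the padded array is simply (x, y):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

h = np.arange(49).reshape(7, 7)
# Coordinates chosen to include corners, where an unpadded window would not fit
x_indices = np.array([0, 3, 6])
y_indices = np.array([0, 3, 6])
a = 1

# Pad `a` cells on every side; -1 is an arbitrary marker for "outside h"
padded = np.pad(h, a, mode="constant", constant_values=-1)

# Original (x, y) now sits at (x + a, y + a), so the window centred on it
# starts at (x, y) in the padded array
windows = sliding_window_view(padded, (2 * a + 1, 2 * a + 1))
out = windows[x_indices, y_indices]
```

Here out has shape (3, 3, 3), and cells that fall outside h hold the fill value.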

Table in Matplotlib, can't get two columns?

I'm struggling with tables for matplotlib (blume). The table is for an automation project that will produce 22 different maps. The code below produces a table with 49 rows. Some figures will only have 6 rows. When the number of rows exceeds 25 I would like to use two columns.
import pandas as pd
import matplotlib.pyplot as plt
from blume.table import table
# Dataframe
df=pd.DataFrame({'nr': [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
'KnNamn': ['Härryda', 'Partille', 'Öckerö', 'Stenungsund', 'Tjörn', 'Orust',
'Sotenäs', 'Munkedal', 'Tanum', 'Dals-Ed', 'Färgelanda', 'Ale',
'Lerum', 'Vårgårda', 'Bollebygd', 'Grästorp', 'Essunga',
'Karlsborg', 'Gullspång', 'Tranemo', 'Bengtsfors', 'Mellerud',
'Lilla Edet', 'Mark', 'Svenljunga', 'Herrljunga', 'Vara', 'Götene',
'Tibro', 'Töreboda', 'Göteborg', 'Mölndal', 'Kungälv', 'Lysekil',
'Uddevalla', 'Strömstad', 'Vänersborg', 'Trollhättan', 'Alingsås',
'Borås', 'Ulricehamn', 'Åmål', 'Mariestad', 'Lidköping', 'Skara',
'Skövde', 'Hjo', 'Tidaholm', 'Falköping'],
'rel': [0.03650425, 0.05022105, 0.03009109, 0.03966735, 0.02793296,
0.03690838, 0.04757161, 0.05607283, 0.0546372 , 0.05452821,
0.06640368, 0.04252673, 0.03677577, 0.05385784, 0.0407173 ,
0.04024881, 0.05613226, 0.04476127, 0.08543165, 0.04070175,
0.09281077, 0.08711656, 0.06111578, 0.04564958, 0.05058988,
0.04618078, 0.04640402, 0.04826498, 0.08514253, 0.07799246,
0.07829886, 0.04249149, 0.03909206, 0.06835601, 0.08027622,
0.07087295, 0.09013876, 0.1040369 , 0.05004451, 0.06584845,
0.04338739, 0.10570863, 0.0553109 , 0.05024871, 0.06531729,
0.05565605, 0.05041816, 0.04885198, 0.07954831]})
# Table
fig,ax = plt.subplots(1, figsize=(10, 7))
val =[]
ax.axis('off')
for i, j, k in zip(df.nr, df.KnNamn, df.rel):
    k = k*100
    k = round(k,2)
    k= (str(k) + ' %')
    temp=str(i)+'. ' +str(j)+': ' + str(k)
    val.append(temp)
val=[[el] for el in val]
#val=val[0] + val[1]
tab=table(ax,cellText=val,
#rowLabels=row_lab,
colLabels=['Relativ arbetslöshet'], loc='left', colWidths=[0.3], cellLoc='left')
plt.show()
As I understand it, if I want a table with two columns, my val object needs to be structured differently. In the case above, val is a nested list with 49 lists inside. I figure I need to merge lists. I tried the pairwise for loop below, but that didn't work with range.
I'm sure there is a simple solution to this problem. Help would be much appreciated.
for i, j in zip(range(len(val)), range(len(val))[1:] + range(len(val))[:1]):
    print(i, j)
I don't know if it is what you need but you could use zip() or better itertools.zip_longest() with val[:25], val[25:]
two_columns = []
for col1, col2 in itertools.zip_longest(values[:25], values[25:], fillvalue=''):
    #print(f'{col1:25} | {col2}')
    two_columns.append([col1, col2])
Full working example
import pandas as pd
import matplotlib.pyplot as plt
from blume.table import table
import itertools
df = pd.DataFrame({
'nr': [
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49
],
'KnNamn': [
'Härryda', 'Partille', 'Öckerö', 'Stenungsund', 'Tjörn', 'Orust',
'Sotenäs', 'Munkedal', 'Tanum', 'Dals-Ed', 'Färgelanda', 'Ale',
'Lerum', 'Vårgårda', 'Bollebygd', 'Grästorp', 'Essunga',
'Karlsborg', 'Gullspång', 'Tranemo', 'Bengtsfors', 'Mellerud',
'Lilla Edet', 'Mark', 'Svenljunga', 'Herrljunga', 'Vara', 'Götene',
'Tibro', 'Töreboda', 'Göteborg', 'Mölndal', 'Kungälv', 'Lysekil',
'Uddevalla', 'Strömstad', 'Vänersborg', 'Trollhättan', 'Alingsås',
'Borås', 'Ulricehamn', 'Åmål', 'Mariestad', 'Lidköping', 'Skara',
'Skövde', 'Hjo', 'Tidaholm', 'Falköping'
],
'rel': [
0.03650425, 0.05022105, 0.03009109, 0.03966735, 0.02793296,
0.03690838, 0.04757161, 0.05607283, 0.0546372 , 0.05452821,
0.06640368, 0.04252673, 0.03677577, 0.05385784, 0.0407173 ,
0.04024881, 0.05613226, 0.04476127, 0.08543165, 0.04070175,
0.09281077, 0.08711656, 0.06111578, 0.04564958, 0.05058988,
0.04618078, 0.04640402, 0.04826498, 0.08514253, 0.07799246,
0.07829886, 0.04249149, 0.03909206, 0.06835601, 0.08027622,
0.07087295, 0.09013876, 0.1040369 , 0.05004451, 0.06584845,
0.04338739, 0.10570863, 0.0553109 , 0.05024871, 0.06531729,
0.05565605, 0.05041816, 0.04885198, 0.07954831
]
})
# df = df[:25] # test for 25 rows
# ---
fig, ax = plt.subplots(1, figsize=(10, 7))
ax.axis('off')
# --- values ---
#values = []
#for number, name, rel in zip(df.nr, df.KnNamn, df.rel):
#    text = f'{number}. {name}: {rel*100:.2f} %'
#    values.append(text)
# note: ':.2f' gives two decimals; a bare ':.2' would give two significant digits (e.g. '1e+01')
values = df.apply(lambda row: f'{row["nr"]}. {row["KnNamn"]}: {row["rel"]*100:.2f} %', axis=1).values
# --- columns ---
if len(values) > 25:
    two_columns = []
    for col1, col2 in itertools.zip_longest(values[:25], values[25:], fillvalue=''):
        #print(f'{col1:25} | {col2}')
        two_columns.append([col1, col2])
    tab = table(ax, cellText=two_columns,
                #rowLabels=row_lab,
                colLabels=['Col1', 'Col2'], colWidths=[0.3, 0.3], loc=-100, cellLoc='left')
else:
    one_column = [[item] for item in values]
    tab = table(ax, cellText=one_column,
                #rowLabels=row_lab,
                colLabels=['Col1'], colWidths=[0.3], loc=-100, cellLoc='left')
# --- plot ---
plt.show()
EDIT:
More universal version which can create many columns.
This example automatically creates 3 columns for ROWS = 20.
import pandas as pd
import matplotlib.pyplot as plt
from blume.table import table
import itertools
df = pd.DataFrame({
'nr': [
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49
],
'KnNamn': [
'Härryda', 'Partille', 'Öckerö', 'Stenungsund', 'Tjörn', 'Orust',
'Sotenäs', 'Munkedal', 'Tanum', 'Dals-Ed', 'Färgelanda', 'Ale',
'Lerum', 'Vårgårda', 'Bollebygd', 'Grästorp', 'Essunga',
'Karlsborg', 'Gullspång', 'Tranemo', 'Bengtsfors', 'Mellerud',
'Lilla Edet', 'Mark', 'Svenljunga', 'Herrljunga', 'Vara', 'Götene',
'Tibro', 'Töreboda', 'Göteborg', 'Mölndal', 'Kungälv', 'Lysekil',
'Uddevalla', 'Strömstad', 'Vänersborg', 'Trollhättan', 'Alingsås',
'Borås', 'Ulricehamn', 'Åmål', 'Mariestad', 'Lidköping', 'Skara',
'Skövde', 'Hjo', 'Tidaholm', 'Falköping'
],
'rel': [
0.03650425, 0.05022105, 0.03009109, 0.03966735, 0.02793296,
0.03690838, 0.04757161, 0.05607283, 0.0546372 , 0.05452821,
0.06640368, 0.04252673, 0.03677577, 0.05385784, 0.0407173 ,
0.04024881, 0.05613226, 0.04476127, 0.08543165, 0.04070175,
0.09281077, 0.08711656, 0.06111578, 0.04564958, 0.05058988,
0.04618078, 0.04640402, 0.04826498, 0.08514253, 0.07799246,
0.07829886, 0.04249149, 0.03909206, 0.06835601, 0.08027622,
0.07087295, 0.09013876, 0.1040369 , 0.05004451, 0.06584845,
0.04338739, 0.10570863, 0.0553109 , 0.05024871, 0.06531729,
0.05565605, 0.05041816, 0.04885198, 0.07954831
]
})
#df = df[:25] # test for 25 rows
# ---
fig, ax = plt.subplots(1, figsize=(10, 7))
ax.axis('off')
# --- values ---
def convert(row):
    # ':.2f' gives two decimals; a bare ':.2' would give two significant digits
    return f'{row["nr"]}. {row["KnNamn"]}: {row["rel"]*100:.2f} %'
values = df.apply(convert, axis=1).values
# --- columns ---
ROWS = 20
#ROWS = 25
columns = []
for idx in range(0, len(values), ROWS):
    columns.append(values[idx:idx+ROWS])
columns_widths = [0.3] * len(columns)
columns_labels = [f'Col{i}' for i in range(1, len(columns)+1)]
rows = list(itertools.zip_longest(*columns, fillvalue=''))
# --- plot ---
tab = table(ax,
cellText=rows,
#rowLabels=row_lab,
colLabels=columns_labels,
colWidths=columns_widths,
loc=-100,
cellLoc='left')
plt.show()
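The chunk-and-zip step can also be checked on its own, without pandas or a figure. A minimal sketch, where split_into_columns is a hypothetical helper name (not part of blume or matplotlib) and the labels are made up:

```python
import itertools

def split_into_columns(items, rows_per_column, fillvalue=""):
    """Hypothetical helper: lay out `items` column by column, returning table rows."""
    # Cut the flat list into consecutive chunks, one chunk per table column
    chunks = [items[i:i + rows_per_column]
              for i in range(0, len(items), rows_per_column)]
    # Transpose chunks into rows, padding the last (shorter) column
    return [list(row) for row in itertools.zip_longest(*chunks, fillvalue=fillvalue)]

labels = [f"item {n}" for n in range(1, 8)]   # 7 made-up entries
rows = split_into_columns(labels, 3)          # 3 rows -> 3 columns
```

The returned list has one inner list per table row, which is the shape that cellText expects in the code above.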

Python Create vertical Numpy array

I have written code that creates an array from my lists. The array must be vertical, like a column vector, but using the reshape method doesn't give me anything.
import numpy as np
data = [[ 28, 29, 30, 19, 20, 21],
[ 31, 32, 33, 22, 23, 24],
[ 1, 34, 35, 36, 25, 26],
[ 2, 19, 20, 21, 10, 11],
[ 3, 4, 5, 6, 7, 8 ]]
index = []
for i in range(len(data)):
    index.append([data[i][0], data[i][1], data[i][2],
                  data[i][3], data[i][4], data[i][5]])
y = np.array([index[i]])
# y.reshape(6,1)
Is there any solution for these cases? Thank you.
I'm looking for something like this to remain:
If you want to view each row as a column, first convert the list to an array with data = np.array(data), then transpose it in any one of the following ways:
index = data.T
index = np.transpose(data)
index = data.transpose()
index = np.swapaxes(data, 0, 1)
index = np.moveaxis(data, 1, 0)
...
Each column of index will be a row of data. If you just want to access one column at a time, you can do that too. For example, to get row 3 (4th row) of the original array, any of the following would work:
y = data[3, :]
y = data[3]
y = index[:, 3]
You can get a column vector from the result by explicitly reshaping it to one:
y = y.reshape(-1, 1)
y = np.reshape(y, (-1, 1))
y = np.expand_dims(y, 1)
Remember that reshaping creates a new array object which views the same data as the original. The only way I know to reshape an array in-place is to assign to its shape attribute:
y.shape = (y.size, 1)
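A short sketch of the point about views, assuming data has first been converted to a NumPy array (the question builds it as a plain list). Writing through the reshaped column changes the original array, because both share the same memory:

```python
import numpy as np

data = np.array([[28, 29, 30, 19, 20, 21],
                 [31, 32, 33, 22, 23, 24],
                 [ 1, 34, 35, 36, 25, 26],
                 [ 2, 19, 20, 21, 10, 11],
                 [ 3,  4,  5,  6,  7,  8]])

row = data[3]             # 1-D view of the 4th row, shape (6,)
col = row.reshape(-1, 1)  # column vector, shape (6, 1), still a view

col[0, 0] = -1            # writes through to data[3, 0]
```

If an independent column vector is needed, calling .copy() on the result avoids this coupling.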
You can use flatten() from numpy https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.flatten.html
(if you want a copy of the original array without modifying the original)
import numpy as np
data = [[ 28, 29, 30, 19, 20, 21],
[ 31, 32, 33, 22, 23, 24],
[ 1, 34, 35, 36, 25, 26],
[ 2, 19, 20, 21, 10, 11],
[ 3, 4, 5, 6, 7, 8 ]]
data = np.array(data).flatten()
print(data.shape)
(30,)
You can also use ravel()
(if you don't want a copy)
data = np.array(data).ravel()
If your array is always 2-D, this also works:
data = data.reshape(-1)
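The copy-versus-view difference between flatten() and ravel() can be seen in a small sketch (the 2x3 array is made up for illustration; ravel() returns a view here because the array is contiguous):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

flat_copy = a.flatten()   # always an independent copy
flat_view = a.ravel()     # a view when the memory layout allows it

flat_copy[0] = 100        # does not touch `a`
flat_view[1] = 200        # changes a[0, 1]
```

After these two writes, a[0, 0] is still 0 while a[0, 1] is 200.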

How to fill a matrix in Python using iteration over rows and columns

So I have an array of 5 integers v and another of 10 integers u.
I have a 5 by 10 matrix P that I would want to fill so that (P)ij = v[i] + u[j]
I tried:
P = np.empty((len(asset_grid),len(asset_grid)))
for i in range(asset_grid):
    for j in range(asset_grid):
        P[i,j] = asset_grid[i] + asset_grid[j]
but it gives me an error
TypeError: only integer arrays with one element can be converted to an index
How can I do this in Python? I apologize if my approach is too naive; I am used to Matlab and am now slowly learning Python. Any help is appreciated.
Broadcasting is what you want to do. Although for small arrays such as yours, it doesn't make a difference, it makes a significant difference with larger arrays:
>>> arr1 = np.arange(5)
>>> arr2 = np.arange(10,20)
>>> arr1[:,None] + arr2
array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
[12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
[13, 14, 15, 16, 17, 18, 19, 20, 21, 22],
[14, 15, 16, 17, 18, 19, 20, 21, 22, 23]])
Generally with numpy you want to avoid iteration over rows and columns and use vectorized/broadcasted operations. This is where speed improvements actually come from.
So, elaborating based on your comment:
Say P_ij is ith element of x raised to the 4th power minus jth element of y raised to 2nd power
In general, NumPy supports most arithmetic operations you would want in a vectorized way, using the usual Python operators:
>>> arr1[:, None]**4 - arr2**2
array([[-100, -121, -144, -169, -196, -225, -256, -289, -324, -361],
[ -99, -120, -143, -168, -195, -224, -255, -288, -323, -360],
[ -84, -105, -128, -153, -180, -209, -240, -273, -308, -345],
[ -19, -40, -63, -88, -115, -144, -175, -208, -243, -280],
[ 156, 135, 112, 87, 60, 31, 0, -33, -68, -105]])
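For comparison, here is a sketch of the explicit loop next to the broadcast form. The TypeError in the question comes from calling range() on an array rather than on an integer length, so the loop below uses range(len(...)). The names v and u follow the question's description; their values are made up:

```python
import numpy as np

v = np.arange(5)        # 5 integers
u = np.arange(10, 20)   # 10 integers

# Explicit double loop: range() needs an int, hence len(v) and len(u)
P_loop = np.empty((len(v), len(u)), dtype=v.dtype)
for i in range(len(v)):
    for j in range(len(u)):
        P_loop[i, j] = v[i] + u[j]

# Broadcasting: (5, 1) + (10,) -> (5, 10)
P_broadcast = v[:, None] + u

# Same result via the ufunc's outer method
P_outer = np.add.outer(v, u)
```

All three arrays are equal; the broadcast and outer forms avoid the Python-level loop.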

Python numpy array multiplication [duplicate]

If I have two arrays X (X has n rows and k columns) and Y (Y has n rows and q columns), how do I multiply the two in vectorized form, such that I obtain an array Z with the following characteristics:
Z[0]=X[:,0]*Y
Z[1]=X[:,1]*Y
Z[2]=X[:,2]*Y
...
Z[k-1]=X[:,k-1]*Y
for c in range(X.shape[1]):
    Z[c]=X[:,c].dot(Y)
From your description, and almost no thinking:
Z=np.einsum('nk,nq->kq',X,Y)
I could also write it with np.dot and a transpose. np.dot sums over the last axis of its first argument and the second-to-last axis of its second:
Z = np.dot(X.T, Y)
=================
In [566]: n,k,q=2,3,4
In [567]: X=np.arange(n*k).reshape(n,k)
In [568]: Y=np.arange(n*q).reshape(n,q)
In [569]: Z=np.einsum('nk,nq->kq',X,Y)
In [570]: Z
Out[570]:
array([[12, 15, 18, 21],
[16, 21, 26, 31],
[20, 27, 34, 41]])
In [571]: Z1=np.empty((k,q))
In [572]: Z1=np.array([X[:,c].dot(Y) for c in range(k)])
In [573]: Z1
Out[573]:
array([[12, 15, 18, 21],
[16, 21, 26, 31],
[20, 27, 34, 41]])
In [574]: X.T.dot(Y)
Out[574]:
array([[12, 15, 18, 21],
[16, 21, 26, 31],
[20, 27, 34, 41]])
