I have a list with the x- and y-coordinates of the start and end points of some lines. Lines as CSV:
331,178,486,232
185,215,386,308
172,343,334,419
406,128,570,165
306,106,569,166
159,210,379,299
236,143,526,248
303,83,516,178
409,62,572,106
26,287,372,427
31,288,271,381
193,228,432,330
120,196,432,329
136,200,374,297
111,189,336,289
284,186,560,249
333,202,577,254
229,194,522,219
349,111,553,165
121,322,342,416
78,303,285,391
103,315,340,415
The lines look like this on my example image ("Lines plotted").
I want to group lines which are close to each other into clusters and create one line for each cluster. For this example I would like to have 5 clusters. After that I want to calculate the distance from each cluster line to the next.
import csv, math

file = open("lines.csv")
csvreader = csv.reader(file)
lines = []
for data in csvreader:
    lines.append({'x1': int(data[0]), 'y1': int(data[1]), 'x2': int(data[2]), 'y2': int(data[3])})

def point_delta(p1, p2):
    return abs(p1 - p2)

for line in lines[:2]:
    for line_rev in lines:
        #x_start_delta = abs(line['x1'] - line_rev['x1'])
        x_start_delta = point_delta(line['x1'], line_rev['x1'])
        y_start_delta = abs(line['y1'] - line_rev['y1'])
        start_distance = math.sqrt(x_start_delta**2 + y_start_delta**2)

        x_end_delta = abs(line['x2'] - line_rev['x2'])
        y_end_delta = abs(line['y2'] - line_rev['y2'])
        end_distance = math.sqrt(x_end_delta**2 + y_end_delta**2)

        avg_distance = (start_distance + end_distance)/2

        cluster = 0
        if avg_distance < 100:
            print(f"distance: {avg_distance}")
    print("############## next line ##############")
I have written some code to calculate the distance between each pair of lines, but I can't find a way to save the lines which are near each other into separate lists.
Does somebody know how to do this, or is there another way to create the clusters? I'm also thinking about using the midpoint instead of the start/end points.
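For reference, a minimal sketch (not part of the original post) of one way to collect nearby lines into separate lists, reusing the lines list, the math import and the 100-pixel average-distance threshold from the code above; the greedy first-match assignment is an assumption, not the only possible rule:
# Greedy grouping: put a line into the first cluster whose first member is
# within the threshold, otherwise start a new cluster.
def line_distance(a, b):
    start = math.hypot(a['x1'] - b['x1'], a['y1'] - b['y1'])
    end = math.hypot(a['x2'] - b['x2'], a['y2'] - b['y2'])
    return (start + end) / 2

clusters = []  # each entry is a list of line dicts that belong together
for line in lines:
    for cluster in clusters:
        if line_distance(line, cluster[0]) < 100:
            cluster.append(line)
            break
    else:
        clusters.append([line])

print(f"found {len(clusters)} clusters")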
You could throw a clustering algorithm at it, but it has trouble with the lonely line at the end.
data = [[331,178,486,232],
[185,215,386,308],
[172,343,334,419],
[406,128,570,165],
[306,106,569,166],
[159,210,379,299],
[236,143,526,248],
[303,83,516,178],
[409,62,572,106],
[26,287,372,427],
[31,288,271,381],
[193,228,432,330],
[120,196,432,329],
[136,200,374,297],
[111,189,336,289],
[284,186,560,249],
[333,202,577,254],
[229,194,522,219],
[349,111,553,165],
[121,322,342,416],
[78,303,285,391],
[103,315,340,415]]
import pandas as pd
import sklearn
from sklearn.cluster import MiniBatchKMeans
import numpy as np
lines = pd.DataFrame(data)
CLUSTERS = 5
X = lines.values
kmeans = MiniBatchKMeans(n_clusters=CLUSTERS,max_no_improvement=100).fit(X)
import numpy as np
import pylab as pl
from matplotlib import collections as mc
lines_segments = [[(l[0], l[1]), (l[2], l[3])] for l in lines.values]
center_segments = [[(l[0], l[1]), (l[2], l[3])] for l in kmeans.cluster_centers_]
line_collection = mc.LineCollection(lines_segments, linewidths=2)
centers = mc.LineCollection(center_segments, colors='red', linewidths=4, alpha=1)
fig, ax = pl.subplots()
ax.add_collection(line_collection)
ax.add_collection(centers)
ax.autoscale()
ax.margins(0.1)
You can see the centers with
kmeans.cluster_centers_
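If you also want the original lines collected into separate lists per cluster (as asked above), here is a small sketch using the labels_ attribute of the kmeans object fitted above:
from collections import defaultdict

# Group the original rows by the cluster label kmeans assigned to them.
groups = defaultdict(list)
for row, label in zip(data, kmeans.labels_):
    groups[label].append(row)

for label, members in groups.items():
    print(f"cluster {label}: {len(members)} lines")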
I am trying to generate a 3D contour plot using data stored as lists of two angles, phi2 and theta, in degrees. I have 88 data points in total. I am trying to generate the joint multivariate normal PDF using scipy.stats.multivariate_normal and then plot the graph, but the attached code does not work: it gives me errors saying that z is 1D and has to be 2D.
Could anybody be so kind as to direct me on how to get a decent density surface and/or contour with the data I have, and fix this code? Thank you in advance.
This is my code:
phi2 = [68.74428813, 73.81435267, 66.13791645, 178.54309657, 179.52273055, 161.05564169,
157.29079181, 191.92405566, 91.49774385, 96.19566795, 70.59561195, 119.9603657,
120.22305924, 98.52577754, 102.37894512, 100.12088791, 150.21004667, 139.18249739,
139.09246089, 89.51031839, 88.39689092, 136.47397506, 286.26056406, 283.74464006,
290.17913953, 286.74459786, 284.86706369, 328.13937238, 275.44219073, 303.47499211,
260.52134486, 259.35788745, 306.90146741, 11.20622691, 10.78220574, 19.15446087,
12.15462016, 13.58160662, 3.83673279, 0.12494051, 17.73139875, 8.53784067, 16.50118845,
2.53838974, 233.88019465, 234.93195189, 229.57996459, 233.07447083, 233.59862002,
231.18392245, 207.88397566, 237.31741345, 183.95293031, 179.42872881, 213.32271268,
140.7533708, 150.16895446, 130.61256041, 130.89734197, 128.63260154, 12.06830893,
200.28087782, 189.90378416, 62.39275508, 58.30936802, 205.64840358, 277.30394295,
287.76441089, 284.93518941, 265.89041707, 265.04884345, 343.86712163, 9.14315768,
341.43239609, 259.68283323, 260.00152679, 319.65245694, 341.08153124, 328.45596486,
336.02665804, 334.51276135, 334.8480636, 14.23480894, 12.53212715, 326.89899848,
42.62591188, 45.9396189, 335.39967741]
theta = [162.30883091334002, 162.38681345640427, 159.9615814653753, 174.16782637794842,
174.2151437560705, 176.40150466571572, 172.99139904772483, 175.92043493594562,
170.54952038009057, 172.72436078059172, 157.8929621077973, 168.98842698365024,
171.98480108104968, 157.1025039563731, 158.00939405227624, 157.85195861050553,
171.7970456599733, 173.88542939027778, 174.13060483554227, 157.06302225640127,
156.68490146086768, 174.10583004433656, 12.057892850177469, 22.707446760473047,
10.351988334104147, 10.029845365897357, 9.685493520484972, 7.903767103756965,
2.4881977395826027, 5.95349444674959, 30.507921155263, 30.63344201861564,
12.408566633469452, 3.9720259901877712, 4.65662142520097, 4.638183341072918,
4.106777084823232, 4.080743212101051, 4.747614837690929, 5.50356343278645,
3.5832926179292923, 3.495358074328152, 2.980060059242138, 5.785575733164003,
172.46901133841854, 172.2062576963548, 173.0410300278859, 174.06303865166896,
174.21162725364357, 170.0470319897294, 174.10752252682713, 171.23903792872886,
172.86412623832285, 174.4850965754363, 172.82274147050111, 176.9008741019669,
177.0080169547876, 171.90883294152246, 173.22247813491, 173.4304905772758,
89.63634206258786, 175.70086864635368, 175.71009499829492, 162.5980851129683,
162.16583875715634, 175.35616287818408, 4.416907543506939, 4.249480386717373,
5.265265803392446, 21.091392446454336, 21.573883985068303, 7.135649687137961,
5.332884425609576, 1.4184699545284118, 24.487533963462965, 25.63021267148377,
5.005913657707176, 7.562769691801299, 7.52664594699765, 7.898159135060811,
7.167861631741688, 7.018092266267269, 5.939275995893341, 5.975608665369072,
7.138904478798905, 9.93153808410636, 9.415946863231648, 7.154298332687937]
import sys, os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from numpy import loadtxt
import matplotlib
from matplotlib.mlab import bivariate_normal
import math
from scipy.stats import multivariate_normal
from astropy.stats import circcorrcoef
from astropy import units as u
from scipy.stats import circvar
from scipy.stats import circmean
phi2 = np.array(phi2)
theta = np.array(theta)
angle1 = np.radians(phi2)
angle2 = np.radians(theta)
# Obtain the circular variance
var_angle1 = circvar(angle1)
var_angle2 = circvar(angle2)
# Obtain circular mean from scipy
mean_angle1 = circmean(angle1)
mean_angle2 = circmean(angle2)
# Obtain circular covar between both angles in degrees
corr = circcorrcoef(angle1, angle2)
covar = corr * np.sqrt(var_angle1*var_angle2)
# Create the covar matrix
covar_matrix = np.array([[var_angle1, covar], [covar, var_angle2]])
# Obtain circular prob
delta = covar / (var_angle1 * var_angle2)
S = ((angle1-mean_angle1)/var_angle1) + ((angle2-mean_angle2)/var_angle2) - ((2*delta*
(angle1-mean_angle1)*(angle2-mean_angle2))/(var_angle1*var_angle2))
# Obtain exponential of PDF
exp = -1 * S / (2 * (1 - delta**2))
# Calculate the PDF
#prob = (1/(2*np.pi*var_angle1*var_angle2*np.sqrt(1-(delta**2)))) * np.e**exp
prob = multivariate_normal([mean_angle1, mean_angle2], covar_matrix)
# Create the stacking
pos = np.dstack((angle1, angle2))
fig2 = plt.figure()
ax2 = fig2.add_subplot(111)
ax2.contourf(angle1, angle2, prob.pdf(pos))
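For what it's worth, the usual cause of the "z is 1D and has to be 2D" error is that contourf expects the density evaluated on a regular 2D grid rather than at the sample points themselves. A minimal sketch of that fix, reusing prob, angle1 and angle2 from above (the 200x200 grid size is an arbitrary assumption):
# Evaluate the fitted 2D normal on a regular grid so contourf gets a 2D z.
grid_a1, grid_a2 = np.meshgrid(np.linspace(angle1.min(), angle1.max(), 200),
                               np.linspace(angle2.min(), angle2.max(), 200))
grid_pos = np.dstack((grid_a1, grid_a2))      # shape (200, 200, 2)

fig2 = plt.figure()
ax2 = fig2.add_subplot(111)
cf = ax2.contourf(grid_a1, grid_a2, prob.pdf(grid_pos))
ax2.scatter(angle1, angle2, s=5, c='k')       # overlay the raw angle samples
fig2.colorbar(cf)
plt.show()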
I have been trying to apply a SOM to my dataframe. The dataframe has 25 columns, where each column represents a house and holds that house's power consumption values over two years, and I want to cluster the data with the number of clusters = 3.
I have done the following:
import sys
sys.path.insert(0, '../')
%load_ext autoreload
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pylab import plot,axis,show,pcolor,colorbar,bone
from matplotlib.patches import Patch
%matplotlib inline
from minisom import MiniSom
from sklearn.preprocessing import minmax_scale, scale
%autoreload 2
data1 = pd.read_excel(r"C:\Users\user\Desktop\Thesis\Tarek\Consumption.xlsx")
# Keep the third ';'-separated field of each house column h1..h25 and convert it to float
for i in range(1, 26):
    col = f'h{i}'
    data1[col] = data1[col].str.split(';').str[2].astype('float')
data1.fillna(0,inplace=True)
data1=data1.round(decimals=2)
X=data1.values
som =MiniSom(x=3,y=3,input_len=25,sigma=1.0, learning_rate=0.5)
som.random_weights_init(X)
som.train_batch(data=X ,num_iteration=1000,verbose=True)
bone()
pcolor(som.distance_map().T)
colorbar()
markers = ['o' , 's','v']
colors = ['r', 'g','y']
for i, x in enumerate(X):
    w = som.winner(x)
    plot(w[0] + 0.5,
         w[1] + 0.5,
         markers[i],
         markeredgecolor = colors[i],
         markerfacecolor = 'None',
         markersize = 10,
         markeredgewidth = 2)
show()
When I run the code, I get this error:
IndexError: list index out of range
Any tips on how to add the markers and colors in the right way would be appreciated. I am a bit new to Python and tried to find a solution, but I couldn't find any.
The problem seems to be that the length of your X = data1.values is around 25, but the length of your markers and colors is only 3. So in the following for loop, when i reaches 3, you try to access markers[3] and colors[3], which throws an IndexError because both markers and colors only go up to index 2 (indexing starts from 0 in Python):
for i, x in enumerate(X):
One solution is to define custom lists of 25 markers and 25 colors. While you might want to define your own markers, you can leave the colors out and let matplotlib choose the markeredgecolor automatically.
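As a minimal sketch of that idea (reusing X, som, plot and show from the question; the wrap-around indexing is just one option, not the original poster's code), you can also reuse the three markers and colors cyclically so the index never runs past the end of the lists:
markers = ['o', 's', 'v']
colors = ['r', 'g', 'y']
for i, x in enumerate(X):
    w = som.winner(x)
    # i % len(...) wraps the index, so samples 0, 3, 6, ... share a marker/color
    plot(w[0] + 0.5,
         w[1] + 0.5,
         markers[i % len(markers)],
         markeredgecolor=colors[i % len(colors)],
         markerfacecolor='None',
         markersize=10,
         markeredgewidth=2)
show()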
I am using skimage. I need to create a mask equal in area to an image. The mask will have a region which will hide part of the image. I am building it as in the sample below, but this is very slow and I am sure there is a more Pythonic way of doing it. Could anyone point it out, please?
Code I am using presently:
import matplotlib.pyplot as plt
import matplotlib
import numpy as np
import skimage as sk
from skimage import io

sourceimage = './sample.jpg'
img = np.copy(io.imread(sourceimage, as_gray=True))
mask = np.full(img.shape, 1)
maskpolygon = [(1,200),(300,644),(625,490),(625,1)]

from shapely.geometry import Point
from shapely.geometry.polygon import Polygon

pgon = Polygon(maskpolygon)
for r in range(mask.shape[0]):
    for c in range(mask.shape[1]):
        p = Point(r, c)
        if pgon.contains(p):
            mask[r, c] = 0
The expected result looks like this (for a 9x9 image, but I am working on 700x700):
[1,1,1,1,1,1,1,1,1]
[1,1,1,1,1,1,1,1,1]
[1,1,0,0,1,1,1,1,1]
[1,1,0,0,1,1,1,1,1]
[1,1,0,0,0,0,1,1,1]
[1,1,0,0,0,0,0,1,1]
[1,1,1,0,0,0,0,1,1]
[1,1,1,1,0,0,1,1,1]
[1,1,1,1,1,1,1,1,1]
I can invert 1's and 0's to show/hide region.
Thank you.
I have been able to resolve this thanks to @HansHirse.
Below is how I worked it out:
sourceimage = './sample.jpg'
figuresize = (100, 100)
from skimage.draw import polygon
#open source and create a copy
img = np.copy(io.imread(sourceimage, as_gray=True))
mask = np.full(img.shape, 0)
maskpolygon = [(1,1), (280,1),(625, 280),(460, 621),(15, 625)]
maskpolygonr = [x[0] for x in maskpolygon]
maskpolygonc = [x[1] for x in maskpolygon]
rr, cc = polygon(maskpolygonr, maskpolygonc)
mask[rr ,cc] = 1
masked_image = img * mask
# show step by step what is happening
fig, axs = plt.subplots(nrows = 3, ncols = 1, sharex=True, sharey = True, figsize=figuresize )
ax = axs.ravel()
ax[0].imshow(img)#, cmap=plt.cm.gray)
ax[1].imshow(mask)#, cmap=plt.cm.gray)
ax[2].imshow(masked_image)#, cmap=plt.cm.gray)
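One small, hedged note: skimage.draw.polygon also accepts an optional shape argument, which clips the returned indices to the image, so polygon vertices that fall outside the image cannot cause an out-of-bounds write:
# Same rasterisation as above, but clipped to the image bounds.
rr, cc = polygon(maskpolygonr, maskpolygonc, shape=img.shape)
mask[rr, cc] = 1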
I am trying to plot a very big file (~5 GB) using Python and matplotlib. I am able to load the whole file in memory (the total available in the machine is 16 GB), but when I plot it using a simple imshow I get a segmentation fault. This is most probably due to the ulimit, which I have set to 15000 but cannot set higher. I have come to the conclusion that I need to plot my array in batches and therefore made a simple code to do that. My main issue is that when I plot a batch of the big array, the x coordinates always start from 0 and there is no way I can overlay the images to create a final big one. If you have any suggestion, please let me know. Also, I am not able to install new packages like "Image" on this machine due to administrative rights. Here is a sample of the code that reads the first 12 lines of my array and makes 3 plots.
import os
import sys
import scipy
import numpy as np
import pylab as pl
import matplotlib as mpl
import matplotlib.cm as cm
from optparse import OptionParser
from scipy import fftpack
from scipy.fftpack import *
from cmath import *
from pylab import *
import pp
import fileinput
import matplotlib.pylab as plt
import pickle
def readalllines(file1, rows, freqs):
    file = open(file1, 'r')
    sizer = int(rows*freqs)
    i = 0
    q = np.zeros(sizer, 'float')
    for i in range(rows*freqs):
        s = file.readline()
        s = s.split()
        #print s[4],q[i]
        q[i] = float(s[4])
        if i%262144 == 0:
            print '\r ', int(i*100.0/(337*262144)), ' percent complete',
        i += 1
    file.close()
    return q
parser = OptionParser()
parser.add_option('-f',dest="filename",help="Read dynamic spectrum from FILE",metavar="FILE")
parser.add_option('-t',dest="dtime",help="The time integration used in seconds, default 10",default=10)
parser.add_option('-n',dest="dfreq",help="The bandwidth of each frequency channel in Hz",default=11.92092896)
parser.add_option('-w',dest="reduce",help="The chuncker divider in frequency channels, integer default 16",default=16)
(opts,args) = parser.parse_args()
rows=12
freqs = 262144
file1 = opts.filename
s = readalllines(file1,rows,freqs)
s = np.reshape(s,(rows,freqs))
s = s.T
print s.shape
#raw_input()
#s_shift = scipy.fftpack.fftshift(s)
#fig = plt.figure()
#fig.patch.set_alpha(0.0)
#axes = plt.axes()
#axes.patch.set_alpha(0.0)
###plt.ylim(0,8)
plt.ion()
i = 0
for o in range(0, rows, 4):
    fig = plt.figure()
    #plt.clf()
    plt.imshow(s[:, o:o+4], interpolation='nearest', aspect='auto', cmap=cm.gray_r, origin='lower')
    if o == 0:
        axis([0, rows, 0, freqs])
        fdf, fdff = xticks()
        print fdf
        xticks(fdf+o)
        print xticks()
    #axis([o,o+4,0,freqs])
    plt.draw()
    #w, h = fig.canvas.get_width_height()
    #buf = np.fromstring(fig.canvas.tostring_argb(), dtype=np.uint8)
    #buf.shape = (w,h,4)
    #buf = np.rol(buf, 3, axis=2)
    #w,h,_ = buf.shape
    #img = Image.fromstring("RGBA", (w,h),buf.tostring())
    #if prev:
    #    prev.paste(img)
    #    del prev
    #prev = img
    i += 1
pl.colorbar()
pl.show()
If you plot any array with more than ~2k pixels across, something in your graphics chain will downsample the image in some way to display it on your monitor. I would recommend downsampling in a controlled way, something like:
data = convert_raw_data_to_fft(args)  # make sure data is row major

def ds_decimate(row, step=100):
    return row[::step]

def ds_sum(row, step=100):
    return np.sum(row[:step*(len(row)//step)].reshape(-1, step), 1)

# as per suggestion from tom10 in comments
def ds_max(row, step=100):
    return np.max(row[:step*(len(row)//step)].reshape(-1, step), 1)

data_plotable = [ds_sum(d) for d in data]  # plug in whichever function you want
or interpolation.
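A short usage sketch of the reduced data (the step of 100 and the plain imshow call are assumptions, not part of the original answer); extent keeps the x axis in the original sample coordinates so the batches line up:
import numpy as np
import matplotlib.pyplot as plt

step = 100
reduced = np.array(data_plotable)
# Each reduced pixel covers `step` original samples, so stretch the x axis
# back to the original coordinates with extent.
plt.imshow(reduced, aspect='auto', origin='lower',
           extent=(0, reduced.shape[1] * step, 0, reduced.shape[0]))
plt.colorbar()
plt.show()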
Matplotlib is pretty memory-inefficient when plotting images. It creates several full-resolution intermediate arrays, which is probably why your program is crashing.
One solution is to downsample the image before feeding it into matplotlib, as @tcaswell suggests.
I also wrote some wrapper code to do this downsampling automatically, based on your screen resolution. It's at https://github.com/ChrisBeaumont/mpl-modest-image, if it's useful. It also has the advantage that the image is resampled on the fly, so you can still pan and zoom without sacrificing resolution where you need it.
I think you're just missing the extent=(left, right, bottom, top) keyword argument in plt.imshow.
x = np.random.randn(2, 10)
y = np.ones((4, 10))
x[0] = 0 # To make it clear which side is up, etc
y[0] = -1
plt.imshow(x, extent=(0, 10, 0, 2))
plt.imshow(y, extent=(0, 10, 2, 6))
# This is necessary, else the plot gets scaled and only shows the last array
plt.ylim(0, 6)
plt.colorbar()
plt.show()