How to speed up matplotlib animation made from database results?


I'm using matplotlib to draw contour plots over some maps I have stored in a database, but the process takes several hours to produce each movie. The code I'm using to make the movie is:
def load_img(option, obs_id, columnshape):
    '''
    This function will load the image
    from the database and make the
    conversion from string to nparray
    '''
    #starting a db session
    session = makesession()
    #defining the cases to query the db
    case = {'Bz': Observations.mean_bz,
            'Es': Observations.poyn_Es,
            'En': Observations.poyn_En,
            'Et': Observations.poyn_Et}
    #checking if the option selected was
    #a valid one
    while option not in case.keys():
        #feedback
        print('Invalid option. Select a valid one: ', case.keys())
        option = str(input())
    #querying the db
    s = sql.select([case[option]]).where(Observations.id == obs_id)
    #fetching the result
    rp = session.execute(s)
    result = rp.fetchone()
    #restoring the image
    img = ajuste(result[0],columnshape)
    return(img)
def db_animation(ar_id, option):
    '''
    Description
    '''
    vmin = -1e17
    vmax = 1e17
    levels = [vmin, 0.8*vmin, 0.6*vmin, 0.4*vmin, 0.2*vmin,
              0.2*vmax, 0.4*vmax, 0.6*vmax,0.8*vmax,vmax]
    #getting the observation ids
    obs_ids = scout_obs_ids(ar_id)
    #obs_ids = [x for x in range(400,450)]
    #getting the columnshape
    columnshape = scout_colshape(obs_ids[0])
    #creating the figure objects
    fig, ax = plt.subplots(figsize = (12,8))
    #loading the first data
    data_bz = load_img('Bz', obs_ids[0], columnshape)
    data_E = load_img(option, obs_ids[0], columnshape)
    #making the image objects
    img1 = ax.imshow(data_bz, origin = 'lower', cmap = plt.cm.gray,
                     animated = True)
    img2 = [ax.contourf(data_E, alpha = 0.35,
                        #vmax = 1e17, vmin = -1e17,
                        levels = levels,
                        origin = 'lower',
                        cmap = 'PiYG')]
    #adding a colorbar
    fig.colorbar(img2[0], shrink = 0.75, label = 'W')
    def refresher(frame_number, img1,img2):
        '''
        description
        '''
        #taking the new data
        new_data_bz = load_img('Bz', obs_id = obs_ids[frame_number+1],
                               columnshape = columnshape)
        new_data_E = load_img(option, obs_id = obs_ids[frame_number+1],
                              columnshape = columnshape)
        #setting the new data
        img1.set_data(new_data_bz)
        #removing the contours to start anew
        for tp in img2[0].collections:
            tp.remove()
        img2[0] = ax.contourf(new_data_E, alpha = 0.35,
                              levels = levels,
                              origin = 'lower', cmap = 'PiYG')
        return(img1, img2[0].collections,)
    #using the animation function
    ani = FuncAnimation(fig, refresher,
                        frames=range(len(obs_ids)-1),
                        interval = 100,
                        #blit = True,
                        fargs = [img1,img2])
    #saving
    ani.save("test.mp4")
    return
On average, each movie draws about 1200 images per img object (around 2400 in total) from the database. Each pair of images is loaded and restored individually to make the background image and the contour plot.
I have been trying to work out why the processing time escalates so quickly as I increase the number of images in the movie, but could not reach a conclusion on my own. I find it particularly intriguing that when I set blit to True (which, according to the documentation, should improve performance) I get the following error:
AttributeError: 'silent_list' object has no attribute 'set_animated'
I imagine that either my queries or the way I constructed my animation function is highly inefficient. I suspect the latter, since the results load in what seems a reasonable time when I use the DB normally.
Can someone shed some light on this for me?

Related

store values and continue running function

I am using astropy to detect sources in an image. I am trying to write a function that will detect my sources, with an option of storing their coordinates in an array, and another option to plot the sources. This is my code:
def SourceDetect(file, SIG, FWHM, THRESH, store = False, PlotStars = False):
    image = pf.open(file)
    im = image[0].data
    mean, median, std = sigma_clipped_stats(im, sigma=SIG)
    daofind = DAOStarFinder(fwhm = FWHM, threshold = THRESH * std)
    sources = daofind(im - median)
    positions = np.transpose((sources['xcentroid'], sources['ycentroid']))
    if(store):
        return positions
    if(PlotStars):
        apertures = CircularAperture(positions, r = 6.)
        norm = ImageNormalize(stretch = SqrtStretch())
        plt.imshow(im, cmap = 'Greys', origin = 'lower', norm = norm,
                   interpolation = 'nearest')
        apertures.plot(color = 'blue', lw = 1.5, alpha = 1)
        for i in range(0, len(sources)):
            plt.text(sources[i]['xcentroid'], sources[i]['ycentroid'], i, color='black');
However when I run this code with both store and plot set to True only the store part of the function runs and I can't get it to plot unless I make store False. Is there a way to write this code where I will be able to have my coordinates stored and plotted?

Simply change the order: handle PlotStars first, then store.
if PlotStars:
    # ... code ...
if store:
    return positions
This first displays the plot and then exits the function with the value.
If you want to get the value first and plot later, run the function twice: first with only store=True, then with only PlotStars=True.
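The reason the order matters is that return ends the function immediately, so nothing placed after it runs. A minimal sketch of the two orderings, using toy functions with made-up coordinates (the names return_then_plot and plot_then_return are hypothetical and stand in for SourceDetect):

```python
def return_then_plot(store=False, plot=False):
    """Original ordering: the return comes before the plotting branch."""
    steps = []                              # records which branches ran
    positions = [(1.0, 2.0), (3.0, 4.0)]    # made-up coordinates
    if store:
        return positions, steps             # exits here; the plot branch is never reached
    if plot:
        steps.append("plotted")             # plotting code would go here
    return None, steps


def plot_then_return(store=False, plot=False):
    """Reordered: plot first, return last."""
    steps = []
    positions = [(1.0, 2.0), (3.0, 4.0)]    # made-up coordinates
    if plot:
        steps.append("plotted")             # plotting code would go here
    if store:
        return positions, steps
    return None, steps
```

With both flags set to True, return_then_plot never reaches its plotting branch, while plot_then_return runs it and still returns the positions.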

Grouping data in Python using Bokeh and visualizing it

I am trying to build a visual that tracks widget counts by category using hbar. The source data is not aggregated. This is what it looks like:
This data is aggregated at the MktCatKey level, but I want to group by category and then perform a calculation on the counts. Let's say that if the category is Category_A, I want to add 10 to the counts. Finally, I want to display both current and projected counts on a visual.
This is how far I have gotten:
query = open('workingsql.sql')
dataset = pd.read_sql_query(query.read(), cnxn)
query.close()
p = figure()
CurrentCount = dataset.Current
ProjCount = dataset.Projected
Cat = dataset.Category
grouped = dataset.groupby('Category')['Current','Projected'].sum()
source = ColumnDataSource(grouped)
p = figure(y_range=Cat)
p.hbar(y=Cat, right = CurrentCount, left = 0, height = 0.5,source=source, fill_color="#D7D7D7")
p.hbar(y=Cat, right = ProjCount, left = 0, height = 0.5,source=source, fill_color="#E21150")
hover = HoverTool()
hover.tooltips = [("Totals", "@Current Current Count")]
hover.mode = 'hline'
p.add_tools(hover)
show(p)
I was able to get this to work if I source directly from the dataset. But since I'm trying to perform a calculation, I can't use the source directly. I'm not sure how to write an if statement on CurrentCount that checks whether a row belongs to Category_A, and that's where I'm stuck.
I have additional things I want to do on this dataset (like bring in a goals dataset and plot against that), but taking small steps for now. Any help is appreciated.
Working code below:
import pyodbc
import pandas as pd
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource, Div, Select, Slider, TextInput
from bokeh.embed import components
from bokeh.models.tools import HoverTool
query = open('workingsql.sql')
dataset = pd.read_sql_query(query.read(), cnxn)
query.close()
p = figure()
CurrentCount = dataset.Current
ProjCount = dataset.Projected
Cat = dataset.Category
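# select several columns with a list; tuple-style ['Current','Projected'] is rejected by recent pandas
grouped = dataset.groupby('Category')[['Current','Projected']].sum()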
source = ColumnDataSource(pd.DataFrame(grouped))
Category = source.data['Category'].tolist()
p = figure(y_range=Category)
p.hbar(y='Category', right = 'Current', left = 0, height = 0.5,source=source, fill_color="#D7D7D7")
p.hbar(y='Category', right = 'Projected', left = 0, height = 0.5,source=source, fill_color="#E21150")
hover = HoverTool()
hover.tooltips = [("Totals", "@Current Current Count")]
hover.mode = 'hline'
p.add_tools(hover)
show(p)
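The working code groups and plots the data but does not show the conditional adjustment for Category_A that the question asks about. A minimal sketch of that step, assuming a small made-up DataFrame in place of the SQL result and assuming the +10 applies to the Current column (both are assumptions, not taken from the original data):

```python
import pandas as pd

# Hypothetical stand-in for the SQL result: one row per MktCatKey.
dataset = pd.DataFrame({
    "MktCatKey": [1, 2, 3, 4],
    "Category": ["Category_A", "Category_A", "Category_B", "Category_C"],
    "Current": [5, 7, 3, 8],
    "Projected": [6, 9, 4, 10],
})

# Sum both count columns per category; the result is indexed by Category.
grouped = dataset.groupby("Category")[["Current", "Projected"]].sum()

# Conditional adjustment: add 10 only to the Category_A row.
# No explicit if statement is needed; .loc selects the row by its label.
grouped.loc["Category_A", "Current"] += 10

# Turn Category back into an ordinary column, ready for a ColumnDataSource.
grouped = grouped.reset_index()
```

The resulting grouped frame has one row per category and can be passed to ColumnDataSource in the same way as in the working code.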

Openpyxl minor gridlines

I am working on a Python application where I am collecting data from a device, and attempting to plot it in an excel file by using the Openpyxl library. I am successfully able to do everything including plotting the data, and formatting the scatter plot that I made, but I am having some trouble in adding minor gridlines to the plot.
I feel like this is definitely possible because in the API, I can see under the openpyxl.chart.axis module, there is a “minorGridlines” attribute, but it is not a boolean input (ON/OFF), rather it takes a Chartlines class. I tried going a bit down the rabbit-hole of seeing how I would do this, but I am wondering what the most straightforward way of adding the minor-gridlines would be? Do you have to construct chart lines manually, or is there a simple way of doing this?
I would really appreciate any help!

I think I answered my own question, but I will post it here if anybody else needs this (as I don’t see any other answers to this question on the forum).
Example code (the relevant lines are the ChartLines import and the minorGridlines assignment):
# Imports for script
from openpyxl import Workbook # For plotting things in excel
from openpyxl.chart import ScatterChart, Reference, Series
from openpyxl.chart.axis import ChartLines
from math import log10
# Variables for script
fileName = 'testFile.xlsx'
dataPoints = 100
# Generating a workbook to test with
wb = Workbook()
ws = wb.active # Fill data into the first sheet
ws_name = ws.title
# We will just generate a logarithmic plot, and scale the axis logarithmically (will look linear)
x_data = []
y_data = []
for i in range(dataPoints):
    x_data.append(i + 1)
    y_data.append(log10(i + 1))
# Go back through the data, and place the data into the sheet
ws['A1'] = 'x_data'
ws['B1'] = 'y_data'
for i in range(dataPoints):
    ws['A%d' % (i + 2)] = x_data[i]
    ws['B%d' % (i + 2)] = y_data[i]
# Generate a reference to the cells that we can plot
x_axis = Reference(ws, range_string='%s!A2:A%d' % (ws_name, dataPoints + 1))
y_axis = Reference(ws, range_string='%s!B2:B%d' % (ws_name, dataPoints + 1))
function = Series(xvalues=x_axis, values=y_axis)
# Actually create the scatter plot, and append all of the plots to it
ScatterPlot = ScatterChart()
ScatterPlot.x_axis.minorGridlines = ChartLines()
ScatterPlot.x_axis.scaling.logBase = 10
ScatterPlot.series.append(function)
ScatterPlot.x_axis.title = 'X_Data'
ScatterPlot.y_axis.title = 'Y_Data'
ScatterPlot.title = 'Openpyxl Plotting Test'
ws.add_chart(ScatterPlot, 'D2')
# Save the file at the end to output it
wb.save(fileName)
Background on solution:
I looked at how the code for Openpyxl generates the Major axis gridlines, which seems to follow a similar convention as the Minor axis gridlines, and I found that in the ‘NumericAxis’ class, they generated the major gridlines with the following line (labeled ‘##### This Line #####’ which is originally copied from the ‘openpyxl->chart->axis’ file):
class NumericAxis(_BaseAxis):
    tagname = "valAx"
    axId = _BaseAxis.axId
    scaling = _BaseAxis.scaling
    delete = _BaseAxis.delete
    axPos = _BaseAxis.axPos
    majorGridlines = _BaseAxis.majorGridlines
    minorGridlines = _BaseAxis.minorGridlines
    title = _BaseAxis.title
    numFmt = _BaseAxis.numFmt
    majorTickMark = _BaseAxis.majorTickMark
    minorTickMark = _BaseAxis.minorTickMark
    tickLblPos = _BaseAxis.tickLblPos
    spPr = _BaseAxis.spPr
    txPr = _BaseAxis.txPr
    crossAx = _BaseAxis.crossAx
    crosses = _BaseAxis.crosses
    crossesAt = _BaseAxis.crossesAt
    crossBetween = NestedNoneSet(values=(['between', 'midCat']))
    majorUnit = NestedFloat(allow_none=True)
    minorUnit = NestedFloat(allow_none=True)
    dispUnits = Typed(expected_type=DisplayUnitsLabelList, allow_none=True)
    extLst = Typed(expected_type=ExtensionList, allow_none=True)
    __elements__ = _BaseAxis.__elements__ + ('crossBetween', 'majorUnit',
                                             'minorUnit', 'dispUnits',)
    def __init__(self,
                 crossBetween=None,
                 majorUnit=None,
                 minorUnit=None,
                 dispUnits=None,
                 extLst=None,
                 **kw
                ):
        self.crossBetween = crossBetween
        self.majorUnit = majorUnit
        self.minorUnit = minorUnit
        self.dispUnits = dispUnits
        kw.setdefault('majorGridlines', ChartLines()) ######## THIS Line #######
        kw.setdefault('axId', 100)
        kw.setdefault('crossAx', 10)
        super(NumericAxis, self).__init__(**kw)
    @classmethod
    def from_tree(cls, node):
        """
        Special case value axes with no gridlines
        """
        self = super(NumericAxis, cls).from_tree(node)
        gridlines = node.find("{%s}majorGridlines" % CHART_NS)
        if gridlines is None:
            self.majorGridlines = None
        return self
I took a stab at it. After importing the 'ChartLines' class like so:
from openpyxl.chart.axis import ChartLines
I was able to add minor gridlines to the x-axis like so:
ScatterPlot.x_axis.minorGridlines = ChartLines()
As far as formatting the minor gridlines, I’m at a bit of a loss, and personally have no need, but this at least is a good start.

gee 'sampleRectangle()' returning 1x1 array

I'm facing an issue when trying to use the 'sampleRectangle()' function in GEE: it returns 1x1 arrays and I can't find a workaround. Below is Python code in which I'm using an approach posted by Justin Braaten. I suspect there's something wrong with the geometry object I'm passing to the function, but I've tried several ways of checking how this argument behaves and couldn't spot any major issue.
Can anyone give me a hand trying to understand what is happening?
Thanks!
import json
import ee
import numpy as np
import matplotlib.pyplot as plt
ee.Initialize()
point = ee.Geometry.Point([-55.8571, -9.7864])
box_l8sr = ee.Geometry(point.buffer(50).bounds())
box_l8sr2 = ee.Geometry.Polygon(box_l8sr.coordinates())
# print(box_l8sr2)
# Define an image.
# l8sr_y = ee.Image('LANDSAT/LC08/C01/T1_SR/LC08_038029_20180810')
oli_sr_coll = ee.ImageCollection('LANDSAT/LC08/C01/T1_SR')
## Function to mask out clouds and cloud-shadows present in Landsat images
def maskL8sr(image):
    ## Bits 3 and 5 are cloud shadow and cloud, respectively.
    cloudShadowBitMask = (1 << 3)
    cloudsBitMask = (1 << 5)
    ## Get the pixel QA band.
    qa = image.select('pixel_qa')
    ## Both flags should be set to zero, indicating clear conditions.
    ## Combine the two tests with And so the shadow test is not overwritten.
    mask = qa.bitwiseAnd(cloudShadowBitMask).eq(0).And(
        qa.bitwiseAnd(cloudsBitMask).eq(0))
    return image.updateMask(mask)
l8sr_y = oli_sr_coll.filterDate('2019-01-01', '2019-12-31').map(maskL8sr).mean()
l8sr_bands = l8sr_y.select(['B2', 'B3', 'B4']).sampleRectangle(box_l8sr2)
print(type(l8sr_bands))
# Get individual band arrays.
band_arr_b4 = l8sr_bands.get('B4')
band_arr_b3 = l8sr_bands.get('B3')
band_arr_b2 = l8sr_bands.get('B2')
# Transfer the arrays from server to client and cast as np array.
np_arr_b4 = np.array(band_arr_b4.getInfo())
np_arr_b3 = np.array(band_arr_b3.getInfo())
np_arr_b2 = np.array(band_arr_b2.getInfo())
print(np_arr_b4.shape)
print(np_arr_b3.shape)
print(np_arr_b2.shape)
# Expand the dimensions of the images so they can be concatenated into 3-D.
np_arr_b4 = np.expand_dims(np_arr_b4, 2)
np_arr_b3 = np.expand_dims(np_arr_b3, 2)
np_arr_b2 = np.expand_dims(np_arr_b2, 2)
# # print(np_arr_b4.shape)
# # print(np_arr_b5.shape)
# # print(np_arr_b6.shape)
# # Stack the individual bands to make a 3-D array.
rgb_img = np.concatenate((np_arr_b2, np_arr_b3, np_arr_b4), 2)
# print(rgb_img.shape)
# # Scale the data to [0, 255] to show as an RGB image.
rgb_img_test = (255*((rgb_img - 100)/3500)).astype('uint8')
# plt.imshow(rgb_img)
plt.show()
# # # create L8OLI plot
# fig, ax = plt.subplots()
# ax.set(title = "Satellite Image")
# ax.set_axis_off()
# plt.plot(42, 42, 'ko')
# img = ax.imshow(rgb_img_test, interpolation='nearest')

I have the same issue. It seems to have something to do with .mean(), or any reduction of image collections for that matter.
One solution is to reproject after the reduction. For example, you could try adding "reproject" at the end:
l8sr_y = oli_sr_coll.filterDate('2019-01-01', '2019-12-31').map(maskL8sr).mean().reproject(crs = ee.Projection('EPSG:4326'), scale=30)
It should work.

Matplotlib connects end of data after animation refreshes

I'm having an issue exactly like this post, but slightly more frustrating.
I'm using matplotlib to read from a file that is being fed data from another application. For some reason, the ends of the data only connect after the animation (animation.FuncAnimation) has completed its first refresh. Here are some images:
This is before the refresh:
And this is after the refresh:
Any ideas as to why this could be happening? Here is my code:
import json
import itertools
import dateutil.parser
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from matplotlib import style
import scipy.signal as sig
import numpy as np
import pylab as plt
sensors = {}
data = []
lastLineReadNum = 0
class Sensor:
    def __init__(self, name, points = 0, lastReading = 0):
        self.points = points
        self.lastReading = lastReading
        self.name = name
        self.x = []
        self.y = []
class ScanResult:
    def __init__(self, name, id, rssi, macs, time):
        self.name = name
        self.id = id
        self.rssi = rssi
        self.macs = macs
        # Is not an integer, but a datetime.datetime
        self.time = time
def readJSONFile(filepath):
    with open(filepath, "r") as read_file:
        global lastLineReadNum
        # Load json results into an object that holds important info
        for line in itertools.islice(read_file, lastLineReadNum, None):
            temp = json.loads(line)
            # Only reads the most recent points...
            data.append(ScanResult(name = temp["dev_id"],
                                   id = temp["hardware_serial"],
                                   rssi = temp["payload_fields"]["rssis"],
                                   macs = temp["payload_fields"]["macs"],
                                   time = dateutil.parser.parse(temp["metadata"]["time"])))
            lastLineReadNum += 1
    return data
style.use('fivethirtyeight')
fig = plt.figure()
ax1 = fig.add_subplot(1, 1, 1)
def smooth(y, box_pts):
    box = np.ones(box_pts)/box_pts
    y_smooth = np.convolve(y, box, mode='same')
    return y_smooth
def determineClosestSensor():
    global sensors
    #sensors.append(Sensor(time = xs3, rssi = ys3))
def determineXAxisTime(scanresult):
    return ((scanresult.time.hour * 3600) + (scanresult.time.minute * 60) + (scanresult.time.second)) / 1000.0
def animate(i):
    data = readJSONFile(filepath = "C:/python_testing/rssi_logging_json.json")
    for scan in data:
        sensor = sensors.get(scan.name)
        # First time seeing the sensor
        if(sensor == None):
            sensors[scan.name] = Sensor(scan.name)
            sensor = sensors.get(scan.name)
            sensor.name = scan.name
            sensor.x.append(determineXAxisTime(scan))
            sensor.y.append(scan.rssi)
        else:
            sensor.x.append(determineXAxisTime(scan))
            sensor.y.append(scan.rssi)
    ax1.clear()
    #basic smoothing using nearby averages
    #y_smooth3 = smooth(np.ndarray.flatten(np.asarray(sensors.get("sentrius_sensor_3").y)), 1)
    for graphItem in sensors.itervalues():
        smoothed = smooth(np.ndarray.flatten(np.asarray(graphItem.y)), 1)
        ax1.plot(graphItem.x, smoothed, label = graphItem.name, linewidth = 2.0)
    ax1.legend()
    determineClosestSensor()
    fig.suptitle("Live RSSI Graph from Sentrius Sensors", fontsize = 14)
def main():
    ani = animation.FuncAnimation(fig, animate, interval = 15000)
    plt.show()
if __name__ == "__main__":
    main()

As far as I can tell you are regenerating your data in each animation step by appending to the existing datasets, but then this means that your last x point from the first step is followed by the first x point in the second step, leading to a rewind in the plot. This appears as the line connecting the last datapoint with the first one; the rest of the data is unchanged.
The relevant part of animate:
def animate(i):
    data = readJSONFile(filepath = "C:/python_testing/rssi_logging_json.json")
    for scan in data:
        sensor = sensors.get(scan.name)
        # First time seeing the sensor
        if(sensor is None): # always check for None with `is`!
            ... # stuff here
        else:
            sensor.x.append(determineXAxisTime(scan)) # always append!
            sensor.y.append(scan.rssi) # always append!
    ... # rest of the stuff here
So, in each animation step you
1. load the same JSON file
2. append the same data to an existing sensor identified by sensors.get(scan.name)
3. plot stuff without ever using i.
Firstly, your animate should naturally make use of the index i: you're trying to do something concerning step i. I can't see i being used anywhere.
Secondly, your animate should be as lightweight as possible in order to get a smooth animation. Load your data once before plotting, and only handle the drawing differences in animate. This will involve slicing or manipulating your data as a function of i.
Of course, if the file really does change from step to step, and this is the actual dynamics in the animation (i.e. i is a dummy variable that is never used), all you need to do is zero-initialize all the plotting data in each step. Start with a clean slate. Then you'll stop seeing the lines corresponding to these artificial jumps in the data. But again, if you want a lightweight animate, you should look into manipulating the underlying data of existing plots rather than replotting everything all the time (especially since calls to ax1.plot will keep earlier points on the canvas, which is not what you usually want in an animation).
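An alternative to resetting the plotting data on every step is to hand animate only the records that arrived since the previous call, so each point is appended exactly once. A minimal sketch of that idea without any plotting; the IncrementalReader class, the update_series function and the simplified record fields "t" and "rssi" are hypothetical and only stand in for the question's readJSONFile and Sensor objects:

```python
import itertools
import json


class IncrementalReader:
    """Hypothetical helper: returns only the lines added since the last call."""

    def __init__(self, filepath):
        self.filepath = filepath
        self.lines_read = 0  # number of lines already consumed

    def read_new(self):
        new_records = []  # fresh list on every call, never accumulated
        with open(self.filepath, "r") as read_file:
            # Skip the lines that earlier calls already handled.
            for line in itertools.islice(read_file, self.lines_read, None):
                new_records.append(json.loads(line))
                self.lines_read += 1
        return new_records


def update_series(series, records):
    """Append each record to its sensor's (x, y) lists."""
    for rec in records:
        xs, ys = series.setdefault(rec["dev_id"], ([], []))
        xs.append(rec["t"])      # simplified time field (assumption)
        ys.append(rec["rssi"])   # simplified signal field (assumption)
    return series
```

Inside animate, one would call read_new() and pass the result to update_series before drawing. Because read_new returns a fresh list each time, a refresh with no new lines leaves the series unchanged instead of appending the earlier points a second time.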

try changing :
ani = animation.FuncAnimation(fig, animate, interval = 15000)
to :
ani = animation.FuncAnimation(fig, animate, interval = 15000, repeat = False)
