I want to visualize data in a LinePlot using reportlab. The data has x-axis values (timestamps) of the form YYYYMMDDHHMMSS. I know that the reportlab x-axis class NormalDateXValueAxis exists, but it only takes dates (YYYYMMDD) and does not allow a time component.
My first question: does reportlab already support this with a class that I have not found yet?
A different approach I am trying is to simply use the timestamp string as x-axis values and define a formatter for these values. An example is:
from reportlab.graphics.charts.lineplots import LinePlot
from reportlab.graphics.shapes import Drawing, _DrawingEditorMixin
from datetime import datetime
def formatter(val):
    dtstr = str(int(val))
    print(dtstr)
    dt = (datetime.strptime(str(int(val)), "%Y%m%d%H%M%S")).strftime("%d.%m.%Y %H:%M:%S")
    return dt
class Test(_DrawingEditorMixin, Drawing):
    def __init__(self,width=258,height=150,*args,**kw):
        Drawing.__init__(self,width,height,*args,**kw)
        # font
        fontSize = 7
        # chart
        self._add(self,LinePlot(),name='chart',validate=None,desc=None)
        self.chart.y = 16
        self.chart.x = 32
        self.chart.width = 212
        self.chart.height = 90
        # x axis
        self.chart.xValueAxis.labels.fontSize = fontSize-1
        self.chart.xValueAxis.labelTextFormat = formatter
        # y axis
        self.chart.yValueAxis.labels.fontSize = fontSize -1
        # sample data
        self.chart.data = [
            [
                (20200225130120, 100),
                (20200225130125, 0),
                (20200225130130, 300),
                (20200225130135, 0),
                (20200225130140, 500),
                (20200225130145, 0),
                (20200225130150, 700),
                (20200225130155, 0),
                (20200225130315, 900)
            ]
        ]
if __name__=="__main__": #NORUNTESTS
    Test().save(formats=['pdf'],outDir='.',fnRoot=None)
But I have two problems with this approach.
The values given to the formatter are unpredictable (at least for me). Reportlab seems to adjust the ticks in whatever way it deems best. As a result, some tick values are not valid timestamps and cannot be parsed by datetime. I sometimes got the exception that seconds must be between 0 and 59, for example when reportlab created a tick with value 20200225136000.
Since the x axis does not know that these values are timestamps, it still leaves room for 20200225135961, 20200225135965, etc. The result is a gap in the graph.
One question is does reportlab already support this with any class
that I have not found yet?
Not that I know of, but I think what you want can be achieved with ValueAxis. If you are able to switch libraries, I suggest using matplotlib, as I have seen working examples of this there. You could also check whether PyX (another good alternative to ReportLab) handles such cases, but I did not find anything.
The lineplots.py file also contains a class called SimpleTimeSeriesPlot(LinePlot).
Looking at it, its .xValueAxis reads your x values as dates (I am not sure how flexible this is, as I haven't tested it, but it is worth trying out).
Instead of calling LinePlot() you would call SimpleTimeSeriesPlot(), keep the rest of that line of code the same, and configure .xValueAxis in your code.
You can also specify the minimum and maximum dates by setting .xValueAxis.valueMin and .xValueAxis.valueMax.
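A sketch of one way to avoid both problems from the question while staying with LinePlot: convert each YYYYMMDDHHMMSS value to seconds since the Unix epoch before plotting, so the axis is linear in time and every tick value is a valid point in time. The names to_epoch and format_tick are hypothetical helpers, and treating the timestamps as UTC is an assumption.

```python
from datetime import datetime, timezone

FMT_IN = "%Y%m%d%H%M%S"

def to_epoch(stamp):
    """Convert an integer such as 20200225130120 to epoch seconds (UTC assumed)."""
    parsed = datetime.strptime(str(int(stamp)), FMT_IN)
    return parsed.replace(tzinfo=timezone.utc).timestamp()

def format_tick(val):
    """Format an epoch-seconds tick value; any number maps to a valid time."""
    moment = datetime.fromtimestamp(val, tz=timezone.utc)
    return moment.strftime("%d.%m.%Y %H:%M:%S")

# A few points from the question's sample data.
raw = [(20200225130120, 100), (20200225130155, 0), (20200225130315, 900)]
data = [(to_epoch(x), y) for x, y in raw]
```

In the question's code this would mean assigning [data] to self.chart.data and format_tick to self.chart.xValueAxis.labelTextFormat. The step from 13:01:55 to 13:03:15 is then 80 seconds on the axis rather than a jump of 160 in the raw integers.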
I am trying to convert data obtained using the MetaTrader module for Python found on the official MQL5 website. I am trying to use tick data rather than importing candlestick data. Tick bars or tick candlesticks sample a set number of ticks rather than a set amount of time in order to calculate OHLC. For example, 100 ticks create one candle instead of 1 minute. Using the functions to copy ticks from MetaTrader 5
copy_ticks_from
or
copy_ticks_range
results in a dataframe called copy_ticks_from or copy_ticks_range, but the output data has the same format either way.
time bid ask last volume time_msc flags volume_real
I've watched videos and searched and searched, and will continue to, but any help is greatly appreciated.
An example of code input and output can be found at https://www.mql5.com/en/docs/integration/python_metatrader5/mt5copyticksfrom_py
edit426221500
I was inspired by this article https://towardsdatascience.com/advanced-candlesticks-for-machine-learning-i-tick-bars-a8b93728b4c5
I think I understand a bit more after reading this. I believe I need to use similar code to get the desired output. I'm working on converting my dataframe to a numpy array. Afterwards I will modify the code found in the reference above to be
something like
def generate_tickbars(ticks, frequency=1000):
    times = ticks[:,0]
    time = ticks[:,1]
    prices = ticks[:,2,3]
Not sure about volume or the preceding lines, but I think I'm on the right track, or this may at least be one way of doing it.
Researching from https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_numpy.html and then going to try to get the conversion working.
Edit426221640
using
ticks_frame.to_numpy(dtype=None, copy=True,)
I get a numpy array as an output.
array([[Timestamp('2020-01-10 01:05:00'), 1552.91, 1553.16, ...,
1578618300331, 134, 0.0],
[Timestamp('2020-01-10 01:05:00'), 1552.83, 1553.32, ...,
1578618300634, 134, 0.0],
[Timestamp('2020-01-10 01:05:01'), 1552.87, 1553.32, ...,
1578618301834, 130, 0.0],
...,etc
I am now stuck at the code referenced in the link above from the previous edit.
import numpy as np
# expects a numpy array with trades
# each trade is composed of: [time, price, quantity]
def generate_tickbars(ticks, frequency=1000):
    times = ticks[:,0]
    prices = ticks[:,1]
    volumes = ticks[:,2]
    res = np.zeros(shape=(len(range(frequency, len(prices), frequency)), 6))
    it = 0
    for i in range(frequency, len(prices), frequency):
        res[it][0] = times[i-1] # time
        res[it][1] = prices[i-frequency] # open
        res[it][2] = np.max(prices[i-frequency:i]) # high
        res[it][3] = np.min(prices[i-frequency:i]) # low
        res[it][4] = prices[i-1] # close
        res[it][5] = np.sum(volumes[i-frequency:i]) # volume
        it += 1
    return res
How do I make this work for my data? Is there a simpler way to accomplish this?
Edit426221745
I believe I have resampled the data correctly using a different approach.
def bar(xs, y): return np.int64(xs / y) * y
ticks_frame.groupby(bar(np.arange(len(ticks_frame)),
1000)).agg({'bid': 'ohlc', 'volume': 'sum'})
Now onto plotting bars or candlesticks.
Edit426222250
Still stuck at the point of the last edit. Although I can use bid for OHLC, group the ticks, and view them that way, it seems my issue is that I need to reshape the dataframe, or create a new dataframe from ticks_frame, that uses bid to calculate the OHLC values. Any and all help is greatly appreciated.
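A minimal sketch of the grouping step in plain Python, in case it helps to see the logic without the array indexing. It assumes each row is a (time, price, volume) tuple with the bid used as the price; the function name generate_tick_bars and the dict layout are hypothetical, and a trailing group with fewer than frequency ticks is dropped.

```python
def generate_tick_bars(rows, frequency=1000):
    """Group consecutive ticks into bars of `frequency` ticks each.

    rows: iterable of (time, price, volume) tuples, oldest first.
    Returns a list of dicts with time, open, high, low, close, volume.
    """
    rows = list(rows)
    bars = []
    # Step through the ticks one complete chunk at a time.
    for start in range(0, len(rows) - frequency + 1, frequency):
        chunk = rows[start:start + frequency]
        prices = [price for _, price, _ in chunk]
        bars.append({
            "time": chunk[-1][0],   # timestamp of the last tick in the bar
            "open": prices[0],
            "high": max(prices),
            "low": min(prices),
            "close": prices[-1],
            "volume": sum(volume for _, _, volume in chunk),
        })
    return bars

# Tiny synthetic example: 7 ticks grouped into bars of 3 ticks.
sample = [(t, 100 + (t % 3), 1) for t in range(7)]
bars = generate_tick_bars(sample, frequency=3)
```

With the dataframe from the question, the rows could presumably come from ticks_frame[["time", "bid", "volume"]].itertuples(index=False, name=None), and pd.DataFrame(bars) would turn the result back into a dataframe with one row per bar.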
I've seen many different answers, but I need one specifically for Plotly in Python. My code is below, but the y axis doesn't come back in plain decimal notation (I believe it comes back in some micro format where, instead of .000258, it shows 258.XXX).
floki_ohlc = df.iplot(kind = "ohlc",fill = True, colorscale = "rdylbu", theme = "solar",
title = "FLOKI INU/USDT", xTitle = "Time", yTitle = "Price (USD)")
And I can't find anything in the documentation about changing the values, only the titles.
Thanks in advance.
I'm guessing you're using express. I used the diamonds dataset from the R library ggplot2 for the plot.
It can be done in Python when you create the graph. Since I don't have your data, I've made a few examples that show what each format changes and how.
Plotly formatting uses D3. You can read more about that here.
import pandas as pd
pd.options.plotting.backend = "plotly"
diam_py = r.diamonds
df = pd.DataFrame(diam_py)
fig = df.plot.bar(x = "color",
y = "price",
color = "cut",
template = "simple_white",
barmode = "group")
fig.show()
fig.update_yaxes(tickformat=",.0f").show() # use thousand comma; round to whole number
fig.update_yaxes(tickformat="none").show() # show number as is
fig.update_yaxes(tickformat=",.0s").show() # SI prefix with 0 significant digits
fig.update_yaxes(tickformat="#x").show() # prefixed lowercase hexadecimal
# more details at https://github.com/d3/d3-format
The figures are in the order in which they appear in the code.
Default y-axis formatting
Use commas at the thousands & round to the whole number
Remove all formatting—show as is in the data
SI prefix with no significant digits (M for millions here)
Hexadecimal formatting...in case binary is of interest
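The d3-format documentation describes its specifiers as modeled on Python's format specification mini-language, so Python's built-in format() gives a rough preview of what many tickformat strings produce. This is only a sketch under that assumption: the SI-prefix type "s" is specific to D3 and has no Python equivalent, and the sample values are made up.

```python
# Made-up sample values, including one small price like the one in the question.
y_values = [0.000258, 0.00031, 1234567.891]

# Fixed-point notation with six decimals: no SI prefix such as "µ".
fixed_six = [format(v, ".6f") for v in y_values]

# Thousands separator, rounded to whole numbers.
thousands = [format(v, ",.0f") for v in y_values]

# Prefixed lowercase hexadecimal (integers only).
hex_value = format(255, "#x")
```

For the small prices in the question, a fixed-point specifier such as tickformat=".6f" passed to update_yaxes is therefore a reasonable thing to try.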
I am pretty new to Python programming and have already been confronted with a problem that is driving me insane. I kept on searching for a solution, even here at Stack Overflow. However, I didn't find anything that solved my problem, which made me sign up for this site.
Nevertheless, this is my problem:
I have several txt files that contain 3 columns. The first one can be neglected, the second one contains a combined date and time, separated by the letter "T", and the third column contains the value (pressure, temperature, whatever).
Now, what I want to do is, to plot the second column (time and date) on the x axis and the value on the y axis.
I've tried MANY codes - also some are described here at stack overflow - but none of them was the one I was searching for and brought the right results.
In more detail, this is what my txt files look like:
# MagPy ASCII
234536326.456,2014-06-17T14:23:00.000000,459.7463393940044
674346235.235,2014-06-17T14:28:00.000000,462.8783040474751
and so on.
Forget about the first column. Only the second and third one are relevant. So here, I guess, I have to skip the first line (and the first column), right?
HOWEVER - and here comes the part I cannot solve - with this "T" inside the second column, this becomes a string format.
One of my many errors I get is: could not convert string to float
Well, I searched stack overflow and came across the following code:
x, y = np.loadtxt('example.txt', dtype=int, delimiter=',',
unpack=True, usecols=(1,2))
plt.plot(x, y)
plt.title('example 1')
plt.xlabel('D')
plt.ylabel('Frequency')
plt.show()
I edited the "usecols" to 1 and 2, but with this code, I get the error: list index out of range
So, it doesn't matter what I do, I get an error any time. And the only thing I want is a plot (with matplotlib), that contains time and date on the x axis and the value (e.g. 459.7463393940044 from above) on the y axis.
And talking about what I need: At the end, I have to put several diagrams (about 4-6), that were generated with MANY txt file data, in one figure.
Please, can anyone help me with this? I'd appreciate your help a lot!
This is numpy datetime format. You need to add an explicit converter for the datetime field. The documentation contains additional format details. This question shows how to fill in the converters argument.
date, closep, highp, lowp, openp, volume = np.loadtxt(
    f, delimiter=',', unpack=True,
    converters={0: mdates.strpdate2num('%d-%b-%y')})
Is that enough to lead you to a full solution?
First, thanks for your response! Unfortunately, this doesn't work at all. I tried it with your converter and combined it with my code, but it didn't work out well. I tried then this code:
# Converter function
datefunc = lambda x: mdates.date2num(datetime.strptime(x, '%d %m %Y %H %M %S'))
# Read data from 'file.dat'
dates, levels = np.genfromtxt('BMP085_10085001_0001_201508.txt', # Data to be read
delimiter=19, # First column is 19 characters wide
converters={1: datefunc}, # Formatting of column 0
dtype=float, # All values are floats
unpack=True) # Unpack to several variables
fig = plt.figure()
ax = fig.add_subplot(111)
# Configure x-ticks
ax.set_xticks(dates) # Tickmark + label at every plotted point
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d/%m/%Y %H:%M'))
ax.plot_date(dates, levels, ls='-', marker='o')
ax.set_title('title')
ax.set_ylabel('Waterlevel (m)')
ax.grid(True)
# Format the x-axis for dates (label formatting, rotation)
fig.autofmt_xdate(rotation=45)
fig.tight_layout()
fig.show()
And I get a list index out of range error again. Seriously, I do not know any possible solution to make it look like that in the end: {Plot date and time (x axis) versus a value (y axis) using data from file} - see diagram on the bottom of the page.
I've had success doing the following. First, it's ok to read in your ISO8601 date as a string (so a basic read a line and split on comma will work). To convert the date string to a datetime object you can use
import dateutil.parser
import matplotlib.pyplot as plt
# code to read in date_strings and my_values as lists goes here ...
# Here's the magic to parse the ISO8601 strings and make them into datetime objects
# (parse() takes one string at a time, so apply it to each element)
my_dates = [dateutil.parser.parse(s) for s in date_strings]
# Now plot the results
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.plot_date(x=my_dates, y=my_values, marker='o', linestyle='')
ax.set_xlabel('Time')
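For the "code to read in" step, here is a minimal sketch using only the standard library. It assumes every data line has exactly three comma-separated fields in the layout shown in the question and that header lines start with "#"; the function name read_series is hypothetical.

```python
import io
from datetime import datetime

def read_series(lines):
    """Return (dates, values) from 'index,ISO-timestamp,value' lines."""
    dates, values = [], []
    for line in lines:
        line = line.strip()
        # Skip blank lines and header lines such as "# MagPy ASCII".
        if not line or line.startswith("#"):
            continue
        _, stamp, value = line.split(",")  # the first column is ignored
        dates.append(datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f"))
        values.append(float(value))
    return dates, values

# The two sample lines from the question, read from an in-memory file.
sample = io.StringIO(
    "# MagPy ASCII\n"
    "234536326.456,2014-06-17T14:23:00.000000,459.7463393940044\n"
    "674346235.235,2014-06-17T14:28:00.000000,462.8783040474751\n"
)
dates, values = read_series(sample)
```

With a real file you would pass the open file object instead of the StringIO. The resulting lists of datetime objects and floats can be handed straight to matplotlib's plot functions, once per file or per subplot.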
I am analysing race results from a CSV which looks like this:
Position,Time,Race #,Batch,Name,Surname,Category,Sex,Age
1,00:25:04,58,E,Luke,Schlebusch,Junior,Male,17
2,00:25:16,92,E,Anrich,Zimmermann,Junior,Male,17
3,00:26:27,147,E,Ryan,Mathaba,Open,Male,33
4,00:26:58,53,E,Daniel,Rademan,Junior,Male,16
5,00:27:17,19,E,Werner,Du Preez,Open,Male,29
6,00:27:44,148,E,Mazu,Ndandani,Open,Male,37
7,00:27:45,42,E,Dakota,Murphy,Open,Male,20
8,00:28:29,56,E,David,Schlebusch,Master,Male,51
9,00:28:32,52,E,Caleb,Rademan,Minimee,Male,12
I am using the following call to read_csv to parse this into a Pandas dataframe:
race1 = pandas.read_csv('data.csv', parse_dates='Time', index_col='Time')
This enables me to plot a cumulative distribution of race times very easily by just doing:
race1.Position.plot()
Pandas handles all the intricacies of the date data type and makes a nice x axis with proper formatting of the times.
Is there an elegant way of getting a histogram of times which is similarly straightforward? Ideally, I would like to be able to do race1.index.hist() or race1.index.to_series().hist(), but I know that doesn't work.
I've been able to coerce the time to a timedelta and get a working result with
times = race1.index.to_series()
((times - times[0]).dt.seconds/60).hist()
This produces a histogram of the correct shape, but obviously with wrong x values (they are off by the fastest time).
Is there an elegant way to read the column as a timedelta to begin with, and is there a better way of creating the histogram, including proper ticks? Proper ticks here means that they use the correct locator and update properly.
This appears to work pretty well, although I would be happier with it if it didn't go through the Matplotlib date specifics regarding ordinal dates.
times = race1.index.to_series()
today = pandas.Timestamp('00:00:00')
timedelta = times - today
times_ordinal = timedelta.dt.seconds/(24*60*60) + today.toordinal()
ax = times_ordinal.hist()
ax.xaxis_date()
plt.gcf().autofmt_xdate()
plt.ylabel('Number of finishers')
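To check the bin counts independently of the Matplotlib date handling, here is a small standard-library sketch that parses the HH:MM:SS strings as durations and counts finishers per minute. The helper names and the one-minute default bin width are assumptions for illustration only.

```python
from collections import Counter
from datetime import timedelta

def parse_duration(text):
    """Turn 'HH:MM:SS' into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

def histogram_by_minute(durations, bin_minutes=1):
    """Count durations per bin; keys are the bin start in whole minutes."""
    counts = Counter()
    for duration in durations:
        minute = int(duration.total_seconds() // 60)
        counts[minute - minute % bin_minutes] += 1
    return dict(sorted(counts.items()))

# The nine finishing times from the CSV in the question.
times = ["00:25:04", "00:25:16", "00:26:27", "00:26:58", "00:27:17",
         "00:27:44", "00:27:45", "00:28:29", "00:28:32"]
counts = histogram_by_minute(parse_duration(t) for t in times)
```

Because the keys are minutes measured from zero rather than from the fastest finisher, the x values are not offset by the winning time.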
I've got data like this:
col1 ;col2
2001-01-01;1
2001-01-01;2
2001-01-02;3
2001-01-03;4
2001-01-03;2
2001-01-04;2
I'm reading it in Python/Pandas using pd.read_csv(...) into a DataFrame.
Now I want to plot col2 on the y-axis and col1 on the x-axis, day by day. I searched a lot but couldn't find many useful pages describing this in detail. I found that matplotlib currently does NOT support the data format in which the dates are stored (datetime64).
I tried converting it like this:
fig, ax = plt.subplots()
X = np.asarray(df['col1']).astype(DT.datetime)
xfmt = mdates.DateFormatter('%b %d')
ax.xaxis.set_major_formatter(xfmt)
ax.plot(X, df['col2'])
plt.show()
but this does NOT work.
What is the best way?
I can only find bits here and there, but nothing complete that really works and, more importantly, no up-to-date resources on this functionality for the latest versions of pandas/numpy/matplotlib.
I'd also be interested in converting these absolute dates to consecutive day indices, i.e.:
The starting day 2001-01-01 is Day 1, thus the data would look like this:
col1 ;col2 ; col3
2001-01-01;1;1
2001-01-01;2;1
2001-01-02;3;2
2001-01-03;4;3
2001-01-03;2;3
2001-01-04;2;4
.....
2001-02-01;2;32
Thank you very much in advance.
Pandas.read_csv supports parse_dates=True (the default, of course, is False). That would save you from converting the dates separately.
Also for a simple dataframe like this, pandas plot() function works perfectly well.
Example:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
dates = pd.date_range('20160601',periods=4)
dt = pd.DataFrame(np.random.randn(4,1),index=dates,columns=['col1'])
dt.plot()
plt.show()
OK, as far as I can see there's no longer any need to use matplotlib directly. Instead, pandas itself already offers plotting functions that can be used as methods of the dataframe objects, see http://pandas.pydata.org/pandas-docs/stable/visualization.html. These functions use matplotlib themselves, but are easier to use because they handle the datatypes correctly :-)
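The second part of the question (turning the absolute dates into consecutive day indices) is not covered above. A minimal standard-library sketch follows, assuming the dates are ISO strings as in the sample and that the earliest date counts as Day 1; day_indices is a hypothetical helper name.

```python
from datetime import date

def day_indices(date_strings):
    """Map ISO date strings to 1-based day numbers relative to the earliest date."""
    days = [date.fromisoformat(text) for text in date_strings]
    first = min(days)
    return [(day - first).days + 1 for day in days]

# col1 values from the question, including the later 2001-02-01 row.
col1 = ["2001-01-01", "2001-01-01", "2001-01-02", "2001-01-03",
        "2001-01-03", "2001-01-04", "2001-02-01"]
col3 = day_indices(col1)
```

The resulting list matches the col3 values shown in the question and could be assigned to a new dataframe column.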