At present I'm using gnuplot to plot data against a timeline. However, my timestamps have millisecond precision, and gnuplot only seems to be able to handle whole seconds.
I've looked at a couple of alternatives, but really I just need something like gnuplot that can cope with fractions of a second.
The programming language used for the main script is Python and whilst I've looked at matplotlib, it seems to be a lot more 'heavy duty' than gnuplot. As I won't always be the one updating the graphing side of things, I want to keep it as easy as possible.
Any suggestions?
Update
I'm using this with gnuplot:
set xdata time
set timefmt "%Y-%m-%d-%H:%M:%S"
However there is no %f to get milliseconds. For example, this works:
2011-01-01-09:00:01
but I need:
2011-01-01-09:00:01.123456
The gnuplot 4.6 manual states, under "Time/date specifiers" (page 114 of the PDF):
%S - second, integer 0–60 on output, (double) on input
This means that when reading a timestamp such as 2013-09-16 09:56:59.412, the fractional part is consumed by the %S specifier. Such a timestamp is handled correctly with:
set timefmt "%Y-%m-%d %H:%M:%S"
set datafile separator ","
plot "timed_results.data" using 1:2 title 'Results' with lines
and fed with data like:
2013-09-16 09:56:53.405,10.947
2013-09-16 09:56:54.392,10.827
2013-09-16 09:56:55.400,10.589
2013-09-16 09:56:56.394,9.913
2013-09-16 09:56:58.050,11.04
You can set the tick format with
set format x '%.6f'
or (maybe, I have not tried it, as I now prefer to use Matplotlib and do not have gnuplot installed on my machines):
set timefmt "%Y-%m-%d-%H:%M:%.6S"
(note the number of digits specified along with the %S format string).
More details can be found in the excellent "not so Frequently Asked Questions" pages.
I'm using gnuplot for the same purpose; my input looks like:
35010.59199,100,101
35010.76560,100,110
35011.05703,100,200
35011.08119,100,110
35011.08154,100,200
35011.08158,100,200
35011.08169,100,200
35011.10814,100,200
35011.16955,100,110
35011.16985,100,200
35011.17059,100,200
The first column is seconds since midnight, with the fractional part after the decimal point giving sub-second precision. You can save this in a CSV file and in gnuplot do:
set datafile separator ','
plot "test.csv" using 1:3 with lines
I originally misunderstood your problem. I think finer resolution in the time format is a real gap in gnuplot, and one that to my knowledge has not been implemented.
One possible work-around would be to use awk to convert your date into the number of seconds with something like
plot "<awk 'your_awk_one_liner' file1.dat" with lines
and then just do a regular double-versus-double plot and forget that it was ever a time at all (a bit like Martin's solution).
I'm afraid I am not very good with awk and so I cannot help with this bit, but these pages might help:
http://www.gnu.org/manual/gawk/html_node/Time-Functions.html and http://www.computing.net/answers/unix/script-to-convert-datetime-to-seco/3795.html.
The use of awk with gnuplot is described here: http://t16web.lanl.gov/Kawano/gnuplot/datafile3-e.html.
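Since the main script here is Python anyway, the same preprocessing could be done there instead of awk. A rough sketch, assuming the question's "%Y-%m-%d-%H:%M:%S.%f" timestamp format and hypothetical file names:
from datetime import datetime

# Convert "2011-01-01-09:00:01.123456 <data...>" lines into
# "<seconds-since-midnight> <data...>" lines for a plain numeric plot.
with open('file1.dat') as src, open('file1_seconds.dat', 'w') as dst:
    for line in src:
        stamp, rest = line.split(None, 1)
        t = datetime.strptime(stamp, '%Y-%m-%d-%H:%M:%S.%f')
        seconds = t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6
        dst.write('%f %s' % (seconds, rest))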
You could then plot a second axis (and not the data) with the correct times - something like the method used here: Is there a way to plot change of day on an hourly timescale on the x axis?
I'm afraid I don't have time to try and write a complete solution - but something reasonable should be possible.
Good luck - keep us updated if you get something working - I would be interested.
Working with Jupyter Lab!
I loaded a simple ASCII file (as I've done 100s of times before...), with three columns and 2000+ lines. I did it with
with open(file) as f:
    d = f.readlines()
and also with NumPy d = np.loadtxt(file, delimiter=',') to see if something would change.
All values in the 2nd column (Latitude) are -32.something and all values in the 3rd column (Longitude) are -52.something. However, the variation in the data only appears around the 5th decimal place... (I think this is what is making things weird!)
When I print the data to the screen, it looks OK! But when I try to plot it, I get pretty weird stuff... the numbers on the X and Y axes are nonsense, especially the one scaling the x-axis: 1e-5 -5.2103e1
I opened the data in a spreadsheet (LibreOffice Calc), and there the plot looks fine. Then I saved it under another name, loaded it in Jupyter again, and got the same weird result.
I also tried it on a different computer... same result!
Tried a script using Atom... same result!
Can someone give a clue about what is going on?
The file is shared at:
https://drive.google.com/file/d/1eDwlijQ7y3KoIRafoE00eqK3UYsIcMvf/view?usp=sharing
First lines of the file...
9738,-32.13689233,-52.10339483
9739,-32.13689233,-52.10339483
9740,-32.13689233,-52.10339483
9741,-32.13689233,-52.10339483
9742,-32.13689233,-52.10339483
9743,-32.13689233,-52.10339483
9744,-32.13689233,-52.10339483
9745,-32.13689233,-52.10339483
9746,-32.13689233,-52.10339483
9747,-32.13689233,-52.10339483
9748,-32.13689233,-52.10339483
9749,-32.13689433,-52.10339417
9750,-32.13689433,-52.10339417
9751,-32.13689433,-52.10339417
9752,-32.13689433,-52.10339417
9753,-32.13689433,-52.10339417
9754,-32.13689433,-52.10339417
9755,-32.13689433,-52.10339417
9756,-32.13689433,-52.10339417
9757,-32.13689433,-52.10339417
9758,-32.13689433,-52.10339417
9759,-32.13688733,-52.10339367
It looks like your plot is correct, even though the scaling of the axes is quite strange (likely due to the very small range). I plotted the same data using Altair, which was able to handle the axes ranges much better:
import pandas as pd
import altair as alt
with open("lat_long.csv", newline="") as f:
frame = pd.read_csv(f, delimiter=",", header=None, names=["index", "longitude", "latitude"])
alt.Chart(frame).mark_circle(size=10).encode(
alt.X('latitude',
scale=alt.Scale(zero=False)
),
alt.Y('longitude',
scale=alt.Scale(zero=False)
),
).interactive()
The result matches yours, except for the more sensible axes.
My guess is that the pandas plotter is just doing something strange with displaying the axes, but the data is being read in correctly.
This is how matplotlib displays ticks whose values differ only far past the leading digits: it splits the labels into an offset plus a scale factor. To read the y-axis, you take the offset printed at the top and add each tick value to it. I'll admit the x-axis label looks weird, but you can see a recognizable offset in there: -5.2103e1, i.e. -52.103; the 1e-5 is the multiplier applied to each tick value before the offset is added. Read that way, both the plot and the labels make sense given the file.
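If the offset notation is unwanted, it can also just be turned off. A minimal sketch using a few sample values from the file:
import matplotlib.pyplot as plt

lats = [-32.13689233, -32.13689433, -32.13688733]  # sample values from the file
lons = [-52.10339483, -52.10339417, -52.10339367]

fig, ax = plt.subplots()
ax.plot(lons, lats, '.')
ax.ticklabel_format(useOffset=False, style='plain')  # full values, no offset/scale
plt.show()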
I have a dataframe with dates in string format. I convert those dates to timestamps so that I can use this date column later in the code. Everything is fine with calculations/comparisons etc., but I would like the timestamp to appear in %d.%m.%Y format, as opposed to the default %Y-%m-%d. Let me illustrate it -
dt=pd.DataFrame({'date':['09.12.1998','07.04.2014']},index=[1,2])
dt
Out[4]:
date
1 09.12.1998
2 07.04.2014
dt['date_1']=pd.to_datetime(dt['date'],format='%d.%m.%Y')
dt
Out[7]:
date date_1
1 09.12.1998 1998-12-09
2 07.04.2014 2014-04-07
I would like dt['date_1'] to be displayed in the same format as dt['date']. I don't wish to use the .strftime() function because it converts the datatype from timestamp to string.
In a nutshell: how can I get Python to display the timestamp in the format of my choice (months could be like APR, MAY, etc.) rather than the default format (like 1998-12-09), keeping in mind that the data type remains a timestamp rather than a string?
It seems pandas hasn't implemented this option yet:
https://github.com/pandas-dev/pandas/issues/11501
Having a look at https://pandas.pydata.org/pandas-docs/stable/options.html, it looks like you can set display options to achieve some of this, although not all:
display.date_dayfirst When True, prints and parses dates with the day first, eg 20/01/2005
display.date_yearfirst When True, prints and parses dates with the year first, eg 2005/01/20
So you can have day-first display, but they haven't included names for months.
On a more fundamental level, whenever you're displaying something it is a string, right? I'm not sure why you wouldn't be able to convert it when you're displaying it without having to change the original dataframe.
Your code would be:
pd.set_option("display.date_dayfirst", True)
Except that this doesn't actually work:
https://github.com/pandas-dev/pandas/issues/11501
The options have been implemented for parsing, but not for display.
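As a fallback, the display-time conversion suggested above does work without touching the stored column. A minimal sketch using the question's data:
import pandas as pd

dt = pd.DataFrame({'date': ['09.12.1998', '07.04.2014']}, index=[1, 2])
dt['date_1'] = pd.to_datetime(dt['date'], format='%d.%m.%Y')

# Format only the displayed copy; dt['date_1'] itself stays datetime64
print(dt['date_1'].dt.strftime('%d.%b.%Y'))  # e.g. 09.Dec.1998
print(dt['date_1'].dtype)                    # datetime64[ns], unchanged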
Hello Stael/Cezar/Droravr, thank you all for providing your inputs. I value your time and appreciate your help a lot. Thanks for sharing the link https://github.com/pandas-dev/pandas/issues/11501 as well. I went through it and understood that this can ultimately be broken down to a display problem, as also expounded by jreback. The issue of having dates displayed in a desired format has been marked as an Enhancement, so it will probably be added in future versions.
All I wanted was to have the dates exported as dd-mm-yyyy, and by just formatting the string while exporting, we could solve this problem.
So, I sorted this issue by exporting the file as -
dt.to_csv(filename, date_format='%d-%m-%Y',index=False).
date date_1
09.12.1998 09-12-1998
07.04.2014 07-04-2014
Thus, this issue stands SOLVED.
Once again, thank you all for your kind help and the precious hours you spent with this issue. Deeply appreciated.
I am trying to generate all 16^16 values,
but there are a few problems, mainly memory.
I tried to generate them in Python like this:
for y in range(0, 16**16):
    print '0x%0*X' % (16, y)
This gives me:
OverflowError: range() result has too many items
If I use sys.maxint I get a MemoryError.
To be more precise, I want to generate all hex strings of length 16, i.e.:
0000000000000000
0000000000000001
0000000000000002
...
FFFFFFFFFFFFFFFF
Also, how do I calculate the approximate time it will take me to generate them?
I am open to the use of any programming language as long as I can save them to an output file.
Well... 16^16 = 1.8446744e+19, so let's say you could compute 10 values per nanosecond (that's a 10 GHz rate, btw). Then it would take you 16^16 / 10 nanoseconds to compute them all, or 58.4 years. Also, even if you could somehow compress each value into 1 bit (which is impossible), it would require 2 exabytes of memory to hold them all (16^16/8/2^60).
This seems like a very artificial exercise. Is it homework, or is there a reason for generating this list? It will be very long (see other answers)!
Having said that, you should ask yourself: why is this happening? The answer is that in Python 2.x, range produces an actual list. If you want to avoid that, you can:
Use Python 3.x, in which range does not actually make a list, but a special generator-like object.
Use xrange, which also doesn't actually make a list, but again produces a lazy object. (Note, though, that xrange is limited to values that fit in a C long, so xrange(16**16) will itself overflow on most builds.)
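A minimal Python 3 sketch of the lazy approach (the output file name is hypothetical, and a small limit stands in for 16**16, since the full run is hopeless):
# range() in Python 3 is lazy, so no huge list is materialized in memory
limit = 16  # replace with 16**16 to (in principle) generate everything
with open('hex_values.txt', 'w') as out:
    for y in range(limit):
        out.write('%016X\n' % y)  # 16-digit zero-padded hex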
As for timing, all of the time will be in writing to the file or screen. You can get an estimate by making a somewhat smaller list and then doing some math, but you have to be careful that it's big enough that the time is dominated by writing the lines, and not opening and closing the file.
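For instance, a rough sketch of that estimate (the sample size and file name are arbitrary):
import time

n_sample = 10**6  # write a million lines, then extrapolate
start = time.time()
with open('sample.txt', 'w') as out:
    for y in range(n_sample):
        out.write('%016X\n' % y)
elapsed = time.time() - start
years = elapsed * 16**16 / n_sample / (3600 * 24 * 365)
print('estimated total: %.1f years' % years)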
But you should also ask yourself how big the resultant file will be... You may not like what you find. Perhaps you mean 2^16?
I've been working with matplotlib.pyplot to plot some data over date ranges, but have been running across some weird behavior, not too different from this question.
The primary difference between my issue and that one (aside from the suggested fix not working) is they refer to different locators (WeekdayLocator() in my case, AutoDateLocator() in theirs.) As some background, here's what I'm getting:
The expected and typical result, where my data is displayed with a reasonable date range:
And the very occasional result, where the data is given some ridiculous range of about 5 years (from what I can see):
I did some additional testing with a generic matplotlib.pyplot.plot and it seemed to be unrelated to using a subplot, or just creating the plot using the module directly.
plt.plot(some plot)
vs.
fig = plt.figure(...)
sub = fig.add_subplot(...)
sub.plot(some plot)
From what I could find, the odd behavior only happens when the data set has just one point (and therefore only a single date to plot). The outrageous number of ticks is caused by the WeekdayLocator(), which for some reason attempts to generate 1635 ticks for the x-axis date range (from about 2013 to 2018), based on this error output:
RuntimeError: RRuleLocator estimated to generate 1635 ticks from
2013-07-11 19:23:39+00:00 to 2018-01-02 00:11:39+00:00: exceeds Locator.MAXTICKS * 2 (20)
(This was from some experimenting with the WeekdayLocator().MAXTICKS member set to 10)
I then tried changing the Locator based on how many date points I had to plot:
# If all the entries in the plot dictionary have <= 1 data point to plot
if all(len(times[comp]) <= 1 for comp in times.keys()):
    sub.xaxis.set_major_locator(md.DayLocator())
else:
    sub.xaxis.set_major_locator(md.WeekdayLocator())
This worked for the edge cases where I'd have a line with 2 points and a line with 1 (or just a point) and wanted the normal ticking, since that didn't get messed up, but it only sort of fixed my problem:
Now I don't have a silly amount of tick marks, but my date range is still 5 years! (Side Note: I also tried using an HourLocator(), but it attempted to generate almost 40,000 tick marks...)
So I guess my question is this: is there some way to rein in the date range explosion when only having one date to plot, or am I at the mercy of a strange bug with Matplotlib's date plotting methods?
What I would like to have is something similar to the first picture, where the date range goes from a little before the first date and a little after the last date. Even if Matplotlib were to fill up the axis range to about match the frequency of ticks in the first image, I would expect it to only span the course of a month or so, not five whole years.
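Presumably I could force the range I want by pinning the x-limits around the lone point myself; a sketch of the kind of workaround I mean (the date and values are made up):
from datetime import datetime, timedelta
import matplotlib.dates as md
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
point = datetime(2013, 7, 11, 19, 23)  # hypothetical lone data point
ax.plot([point], [1.0], 'o')
ax.set_xlim(point - timedelta(days=3), point + timedelta(days=3))
ax.xaxis.set_major_locator(md.DayLocator())  # sensible ticks for a small window
plt.show()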
Edit:
Forgot to mention that the range explosion also appears to occur regardless of which Locator I use. Plotting with zero points just results in a blank x-axis (due to no date range at all), a single point gives me the described huge date range, and multiple points/lines gives the expected date ranges.
I am trying to graph alarm counts in Python, to give an idea of the peak number of network elements down between two points in time. Our alarm report provides the data in CSV like this:
Name,Alarm Start,Alarm Clear
NE1,15:42 08/09/11,15:56 08/09/11
NE2,15:42 08/09/11,15:57 08/09/11
NE3,15:42 08/09/11,16:31 08/09/11
NE4,15:42 08/09/11,15:59 08/09/11
I am trying to graph, between the start and clear times, how many NEs were down at any given moment, including the maximum count and when it crossed above or below a certain threshold. An example is below:
15:42 08/09/11 - 4 Down
15:56 08/09/11 - 3 Down
etc.
Any advice where to start on this would be great. Thanks in advance, you guys and gals have been a big help in the past.
I'd start by parsing your input data into a map indexed by timestamps, with counts as values: just increase the count for each row with the same timestamp you encounter.
After that, use some plotting module, for instance matplotlib to plot the keys of the map versus the values. That should cover it!
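For instance, a rough sketch of one way to do this, using +1/-1 events and a running sum rather than a literal per-date map (the file name is assumed, and the two-digit-year timestamps are guessed to be %H:%M %d/%m/%y):
import csv
from datetime import datetime
import matplotlib.pyplot as plt

FMT = '%H:%M %d/%m/%y'  # assumed day/month order for "15:42 08/09/11"

events = []  # (time, +1 for an alarm start, -1 for an alarm clear)
with open('alarms.csv') as f:
    for row in csv.DictReader(f):
        events.append((datetime.strptime(row['Alarm Start'], FMT), +1))
        events.append((datetime.strptime(row['Alarm Clear'], FMT), -1))

events.sort()
times, counts, down = [], [], 0
for t, delta in events:
    down += delta
    times.append(t)
    counts.append(down)  # NEs down immediately after this event

plt.step(times, counts, where='post')
plt.ylabel('NEs down')
plt.show()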
Do you need any more detailed ideas?