I am trying to visualize measured data using Pyplot.
The data is stored in a dataframe from Pandas like this:
[Image: the DataFrame]
So I want to plot the outputs over the respective date and time.
Plotting it works fine; however, there is one issue that I cannot fix.
Sometimes the measurements are stopped and resumed at a later time. That means that during some measurements there is a gap in the dates (e.g. 1 or 2 days skipped).
When I plot the above DataFrame, there is a large empty region where no data is present.
[Image: graph with gap]
Is there any way to change that, so the gap is closed and all the data is presented in a better way?
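One common way to close such gaps is to plot against the row position instead of the timestamp, and then label the ticks with the real timestamps. A minimal sketch, assuming hypothetical column names `time` and `output`:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this when plotting on screen
import matplotlib.pyplot as plt

# Hypothetical data with a two-day gap in the middle
times = pd.to_datetime(
    ["2021-01-01 10:00", "2021-01-01 11:00",
     "2021-01-04 10:00", "2021-01-04 11:00"])
df = pd.DataFrame({"time": times, "output": [1.0, 1.2, 0.9, 1.1]})

# Plot against the positional index instead of the datetime axis;
# matplotlib no longer reserves space for the missing days
fig, ax = plt.subplots()
ax.plot(range(len(df)), df["output"])
ax.set_xticks(range(len(df)))
ax.set_xticklabels(df["time"].dt.strftime("%Y-%m-%d %H:%M"), rotation=45)
fig.tight_layout()
```

The trade-off is that the x-axis is no longer linear in time, so equal distances on the plot no longer mean equal time intervals.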
I'm working on developing a charting package that is able to take in tick data and, given a predefined time interval (timeframe), plot price movements on a candlestick chart (open, high, low, and close) in simulated real-time, allowing one to change the speed and direction of the simulation, as well as pause it. Here is an example of the desired behavior (updating the data in real-time) represented graphically:
[Image: real-time price data plotting using a candlestick chart]
The UI aspect of the plotting is not a concern, as the graphical part of the project has already been created (I am using a function that takes in all the aggregated OHLC data and plots it according to timeframe). I am, however, having difficulty deciding how to achieve the above-described behavior in a performance-friendly manner (e.g. not re-creating the entire OHLC dataframe from the tick data every time a new tick is read into the method), as the dataset I am working with is quite large.
I've thought of some preliminary solutions; however, all of them have flaws, and I am at a loss for how to tackle this problem. The following are the ideas I've had:
Deriving OHLC data directly from the tick data with each new tick.
The idea here is that we'll have a sliding timestamp that will allow us to read in all the tick data up to the "current" time in the simulation, and then aggregate it into OHLC format using a dataframe.
Although this achieves our desired behavior, this requires us to recreate the dataframe containing the OHLC data every tick, thus the overhead for a large set of data would be unacceptable in this case.
Pre-processing the ticks into OHLC and reading directly from the dataframe.
Similar to the above, using a sliding timestamp to read data up to the "current" time in the simulation. The difference here is that we'd be reading data from the OHLC dataframe directly, after pre-processing all the tick data into it.
This allows us to easily go through the data, as it would exist in a format that the method which plots it would understand (that being OHLC), but it doesn't achieve our required behavior of simulating movements that happen during a candle's formation.
Performing the first solution every n milliseconds.
Although this would reduce the overhead, it would still recreate the dataframe every n milliseconds, which would still make it unfeasible for large datasets.
Only modifying the current candle.
The idea here is that all the historical candles will be stored in a dataframe, while the current candle is updated with every new tick until it closes (and so on with each successive candle).
While this achieves the desired simulation behavior in a more performance-friendly manner, I'm not sure how we could get rewinding to work in this context, as moving backward on the historical data wouldn't be possible since it would be in OHLC format.
I am also unsure of how we could change the timeframe using this method as it would need to determine how much time to look back and gather ticks to create a new "current" candle.
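The fourth idea (only mutating the current candle) can be sketched roughly as below. This is a hypothetical implementation, not the asker's code; the class name, the `timeframe` offset alias, and the dict-based candle layout are all assumptions. Rewinding and timeframe changes would still need the raw ticks kept around so candles can be rebuilt:

```python
import pandas as pd

class CandleBuilder:
    """Aggregates ticks into OHLC candles, only mutating the current candle.

    Hypothetical sketch: `timeframe` is a pandas offset alias such as "1min".
    """
    def __init__(self, timeframe="1min"):
        self.timeframe = timeframe
        self.history = []    # closed candles, stored as dicts
        self.current = None  # the candle currently being formed

    def add_tick(self, ts, price):
        # Floor the tick's timestamp to its candle bucket
        bucket = pd.Timestamp(ts).floor(self.timeframe)
        if self.current is None or bucket != self.current["time"]:
            if self.current is not None:
                self.history.append(self.current)  # close the previous candle
            self.current = {"time": bucket, "open": price, "high": price,
                            "low": price, "close": price}
        else:
            # Same bucket: update only the current candle, O(1) per tick
            c = self.current
            c["high"] = max(c["high"], price)
            c["low"] = min(c["low"], price)
            c["close"] = price

    def frame(self):
        # Materialize a DataFrame only when the plotter asks for one
        rows = self.history + ([self.current] if self.current else [])
        return pd.DataFrame(rows)

# Usage: feed ticks in, read the aggregated OHLC frame out
cb = CandleBuilder("1min")
for ts, price in [("2021-01-01 09:00:01", 10.0),
                  ("2021-01-01 09:00:30", 12.0),
                  ("2021-01-01 09:01:05", 11.0)]:
    cb.add_tick(ts, price)
ohlc = cb.frame()
```

Per tick, only a single dict is touched, which avoids rebuilding the whole dataframe; the `frame()` call is the only O(n) step and can be done once per redraw rather than once per tick.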
I'm working with Python, however I think this is more of a conceptual problem than a language-specific one.
After simply concatenating two CSV files, I have created a single DataFrame so that I can plot the data using the matplotlib library. But the problem is that the chart which shows up is a complete mess.
The chart looks like this
On the left (vertical axis) are the prices, whereas the x-axis shows the time period, which runs from 9:15 A.M. to 3:30 P.M. (IST).
What I actually want is as below:
Please pay close attention to the x-axis, where the time period is mentioned.
Observe the first data point on the x-axis, which is 9.15.25, and the other data point along the same axis, which is 9.25.04.
How can I create a chart which would look very similar to the other chart, which I have created in Excel?
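A messy, zig-zagging line after concatenating CSVs usually means the time column is still a string (so matplotlib plots label order, not time order) or the rows are unsorted. A minimal sketch with hypothetical column names `time` and `price`:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop when plotting on screen
import matplotlib.pyplot as plt

# Hypothetical stand-in for the concatenated CSVs, out of order on purpose
df = pd.DataFrame({
    "time": ["09:25:04", "09:15:25", "09:20:10"],
    "price": [101.5, 100.0, 100.8],
})

# 1) Parse the time column so matplotlib treats it as datetimes, not strings
df["time"] = pd.to_datetime(df["time"], format="%H:%M:%S")
# 2) Sort chronologically, otherwise the line doubles back on itself
df = df.sort_values("time")

fig, ax = plt.subplots()
ax.plot(df["time"], df["price"])
```

With datetimes on the x-axis, matplotlib spaces and formats the ticks itself, much like the Excel chart does.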
I am trying to plot a large set of values against time. My dataset spans 46 days and includes data for every second of the day. Since the plots are incomprehensible when plotted directly, I tried to group them. The groupby function in pandas works fine as long as one needs to find some aggregates or summary statistics. I tried the following command, but it just gives a blob on the plot and does not do what I want it to:
df1 = df.groupby(pd.Grouper(key='time', freq='7D'))['values']
Is there a way to group the data according to a column and then add it in a new column?
I also tried plotting after making time the index, but that does not help either.
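The `groupby` call above returns a lazy GroupBy object rather than data; an aggregation such as `.mean()` is what turns it into a plottable Series, and `.transform()` is what writes the group result back into a new column. A sketch using a hypothetical, much smaller hourly dataset with the same column names `time` and `values`:

```python
import pandas as pd
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop when plotting on screen
import matplotlib.pyplot as plt

# Hypothetical data: 21 days of hourly values (the real set is per-second)
rng = pd.date_range("2021-01-01", periods=21 * 24, freq="h")
df = pd.DataFrame({"time": rng,
                   "values": np.random.default_rng(0).normal(size=len(rng))})

# Aggregate each 7-day bucket to one number, then plot the result
weekly = df.groupby(pd.Grouper(key="time", freq="7D"))["values"].mean()
weekly.plot()

# To add the group result as a new column instead, use transform:
df["weekly_mean"] = (df.groupby(pd.Grouper(key="time", freq="7D"))["values"]
                       .transform("mean"))
```

`transform` broadcasts each group's mean back onto every row of that group, so `weekly_mean` has the same length as the original frame.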
I am working on a data visualization problem, where I am plotting daily active users on about 15k pages against time/dates in Python. On some days, I have peaks on a specific page, but those peaks are artificially created and affect the cumulative results. I want to show overall trends, either by suppressing the peaks or by adjusting the data in some other way.
I am plotting using pandas with Python, in a Jupyter notebook.
Question: Is there any efficient way to solve this problem?
A sample graph is attached: the red line is the original graph, while the blue line is my attempt to suppress the peaks. On the x-axis, the date is shown; on the y-axis, the sum of daily traffic is plotted.
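Two common approaches for suppressing isolated spikes are a rolling median (robust to outliers, unlike a rolling mean) and clipping at a high quantile. A hedged sketch on synthetic daily-traffic data; the window length, quantile, and series shape are all assumptions to adjust for the real 15k-page data:

```python
import pandas as pd
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop when plotting on screen
import matplotlib.pyplot as plt

# Hypothetical daily traffic with two artificial spikes
rng = pd.date_range("2021-01-01", periods=60, freq="D")
traffic = pd.Series(1000.0 + 10 * np.arange(60), index=rng)
traffic.iloc[[20, 40]] = 50_000  # the artificial peaks

# Option 1: a centered rolling median ignores isolated spikes entirely
smoothed = traffic.rolling(7, center=True, min_periods=1).median()

# Option 2: clip anything above e.g. the 95th percentile
clipped = traffic.clip(upper=traffic.quantile(0.95))

fig, ax = plt.subplots()
traffic.plot(ax=ax, color="red", label="original")
smoothed.plot(ax=ax, color="blue", label="rolling median")
ax.legend()
```

The rolling median preserves the trend while the spike days blend into their neighbors; clipping keeps the days visible but caps their influence on the cumulative totals.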
Currently, I am working on a project and would like to plot data from a logger on a daily basis. The written output is a .csv file that contains a Date/Time stamp in one column,
e.g. 2018-10-15 10:00. The other columns contain just data in float format. The stamp is written automatically at 10-minute intervals from 00:00 until 23:50.
I am looking to analyze the data and group it by day using groupby(), and then compute the mean and standard deviation for each day. I want to plot the mean and standard deviation data for several years as a scatter or line graph, with years or months as major ticks and days as minor ticks.
On a daily basis, I want to compare the variation of the mean within a certain month and plot it against the entire time interval, with hours as major ticks and the 10-minute intervals as minor ticks. I want to be able to put this in a for loop if possible.
To be honest, I've tried a lot of different approaches, but I can't achieve everything with only one. If I could, I would prefer not to use set_index() on the Date/Time column, so it is easier to apply the grouping. I am using the pandas module for my whole analysis, for convenience.
I would be really happy for any guidance.
Thank you very much!
Just a couple of pointers:
When reading the csv with pd.read_csv (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) you can specify which columns contain date/times:
df = pd.read_csv('myfile.csv', parse_dates=['date'])
Then you can use .dt to access date/time specific features, see: https://pandas.pydata.org/pandas-docs/stable/api.html#datetimelike-properties
So you can add a column with only day numbers, like:
df['day'] = df['date'].dt.dayofyear
Then you can group by this new column.
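Putting those pointers together, a minimal end-to-end sketch (with synthetic data standing in for the CSV, and hypothetical column names `date` and `value`) might look like this:

```python
import pandas as pd
import numpy as np

# Hypothetical stand-in for the logger CSV: readings every 10 minutes
rng = pd.date_range("2018-10-15 00:00", "2018-10-16 23:50", freq="10min")
df = pd.DataFrame({"date": rng,
                   "value": np.random.default_rng(0).normal(size=len(rng))})

# Add a day-number column via the .dt accessor, then aggregate per day
df["day"] = df["date"].dt.dayofyear
daily = df.groupby("day")["value"].agg(["mean", "std"])
```

`daily` then holds one row per day with `mean` and `std` columns, ready to plot; grouping by `df["date"].dt.date` instead of `dayofyear` would keep the grouping unambiguous across multiple years.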