Plot tuple and string using pyplot - python

I've got a list of tuples with timestamps and usernames, and I'm trying to plot the frequency of the timestamps.
That means the x-axis will run from the earliest timestamp to the latest, and the y-axis will show the frequency within a time period. I'm also looking for a way to mark which username each timestamp belongs to (with different colors etc.).
I've been googling for hours but can't find any examples that do what I'm looking for. Could anyone here point me in the right direction?
The timestamp is epoch time. Example:
lst= [('john', '1446302675'), ('elvis', '1446300605'),('peter','1446300622'), ...]
Thanks

You could create the histogram by simply counting the users per date, and then show it with a bar chart:
import numpy as np
import matplotlib.pyplot as plt
# some data
data=[('2013-05-15','test'),('2013-05-16','test'),('2013-05-14','user2'),('2013-05-14', 'user')]
# to create the histogram I use a dict()
hist = dict()
for x in data:
    if x[0] in hist:
        hist[x[0]] += 1
    else:
        hist[x[0]] = 1
# extract the labels and sort them
labels = [x for x in hist]
labels = sorted(labels)
# extract the values
values = [hist[x] for x in labels]
num = len(labels)
# now plot the values as bar chart
fig, ax = plt.subplots()
barwidth = 0.3
ax.bar(np.arange(num),values,barwidth)
# bars are centered on their x positions by default (matplotlib 2.0+), so put the ticks there
ax.set_xticks(np.arange(num))
ax.set_xticklabels(labels)
plt.show()
This results in:
For more details on creating a bar chart see [an example](http://matplotlib.org/examples/api/barchart_demo.html).
Edit 1: with the data format you added you can use my example by converting the timestamp using this method:
>>> from datetime import datetime
>>> datetime.fromtimestamp(float('1446302675')).strftime('%Y-%m-%d %H:%M:%S')
'2015-10-31 15:44:35'
If you only want to use year-month-date, then you can use:
>>> datetime.fromtimestamp(float('1446302675')).strftime('%Y-%m-%d')
'2015-10-31'
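To tie the conversion back to the question's data, here is a minimal sketch (not part of the original answer) that counts timestamps per user and per hour, then draws one stacked bar segment per user. The helper names `count_per_bin` and `plot_counts`, the hourly bin format, the choice of UTC and the fourth sample tuple are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone

# Sample data in the question's format: (username, epoch seconds as a string).
# The last tuple is made up so that one user has more than one entry.
lst = [('john', '1446302675'), ('elvis', '1446300605'),
       ('peter', '1446300622'), ('john', '1446302700')]


def count_per_bin(records, fmt='%Y-%m-%d %H:00'):
    """Return {username: Counter({bin_label: count})}, binning with a strftime format."""
    counts = defaultdict(Counter)
    for user, stamp in records:
        # UTC is an assumption; drop tz=... to bin in the local time zone instead
        when = datetime.fromtimestamp(float(stamp), tz=timezone.utc)
        counts[user][when.strftime(fmt)] += 1
    return counts


def plot_counts(counts):
    """Draw a stacked bar chart: one bar per time bin, one color per user."""
    import matplotlib.pyplot as plt  # imported here so the counting helper has no dependencies

    labels = sorted({label for c in counts.values() for label in c})
    positions = list(range(len(labels)))
    bottom = [0] * len(labels)
    fig, ax = plt.subplots()
    for user in sorted(counts):
        values = [counts[user][label] for label in labels]  # a Counter returns 0 for missing bins
        ax.bar(positions, values, bottom=bottom, label=user)
        bottom = [b + v for b, v in zip(bottom, values)]
    ax.set_xticks(positions)
    ax.set_xticklabels(labels, rotation=45, ha='right')
    ax.legend()
    fig.tight_layout()
    plt.show()
```

Calling `plot_counts(count_per_bin(lst))` shows the chart; passing `fmt='%Y-%m-%d'` to `count_per_bin` bins by day instead of by hour.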

Related

How to plot large dataset of date vs time using matplot lib

I want to plot a date vs. time graph using matplotlib. The issue I am facing is that, due to the excess of data, many lines are showing on the x-axis and I can't find a way to plot my times on the x-axis cleanly with a one-hour gap. Say I have data in my list as strings, such as ['6:01','6:30','7:20','7:25']. I want to divide my x-axis from 6:00 to 7:00, and the time points between them should be placed according to their time.
Note: the time list is just an example; I want to do this for the whole 24 hours.
I tried to use ticks and many other options to complete my task, but unfortunately I am stuck on this problem. My data is in a CSV file.
Below is my code:
def arrivalGraph():
    from datetime import datetime, timedelta
    from matplotlib import pyplot as plt
    from matplotlib import dates as mpl_dates
    with open("Timetable2021.csv","r") as f:
        fileData = f.readlines()
    del fileData[0]
    date = []
    train1 = []
    for data in fileData:
        ind = data.split(",")
        date.append(datetime.strptime(ind[0],"%d/%m/%Y").date())
        train1Time = datetime.strptime(ind[1],"%H:%M").time()
        train1.append(train1Time.strftime("%H:%M"))
    plt.style.use("seaborn")
    plt.figure(figsize = (10,10))
    plt.plot_date(train1,date)
    plt.gcf().autofmt_xdate()#gcf is get current figure - autofmt is auto format
    dateformater = mpl_dates.DateFormatter("%b ,%d %Y")
    plt.gca().xaxis.set_major_formatter(dateformater) # to format the xaxis
    plt.xlabel("Date")
    plt.ylabel("Time")
    plt.title("Train Time vs Date Schedule")
    plt.tight_layout()
    plt.show()
When I run the code I get the following output:
output of above code
Assuming that every single minute is present in train1 (i.e. train1 = ["00:00", "00:01", "00:02", "00:03", ... , "23:59"]), you can use plt.xticks() with an array of tick labels that holds an empty string for every minute that is not on the hour.
unique_times = sorted(set(train1))
xticks = ['' if time[-2:]!='00' else time for time in unique_times]
plt.style.use("seaborn")
plt.figure(figsize = (10,10))
plt.plot_date(train1,date)
plt.gcf().autofmt_xdate()#gcf is get current figure - autofmt is auto format
dateformater = mpl_dates.DateFormatter("%b ,%d %Y")
# I think you wanted to format the yaxis instead of xaxis
plt.gca().yaxis.set_major_formatter(dateformater) # to format the yaxis
plt.ylabel("Date")
plt.xlabel("Time")
plt.title("Train Time vs Date Schedule")
plt.xticks(range(len(xticks)), xticks)
plt.tight_layout()
plt.show()
If not every minute is present in the train1 array, you have to keep the train1 data as datetime objects and generate arrays of xtick locations and labels to pass to plt.xticks().
date = []
train1 = []
for data in fileData:
    ind = data.split(",")
    date.append(datetime.strptime(ind[0],"%d/%m/%Y").date())
    train1Time = datetime.strptime(ind[1],"%H:%M")
    train1.append(train1Time)
plt.style.use("seaborn")
plt.figure(figsize = (10,10))
plt.plot_date(train1,date)
plt.gcf().autofmt_xdate()#gcf is get current figure - autofmt is auto format
dateformater = mpl_dates.DateFormatter("%b ,%d %Y")
# I think you wanted to format the y axis instead of xaxis
plt.gca().yaxis.set_major_formatter(dateformater) # to format the yaxis
plt.ylabel("Date")
plt.xlabel("Time")
plt.title("Train Time vs Date Schedule")
ax = plt.gca()
xticks_val = []
xticks_loc = []
distance = (ax.get_xticks()[-1] - ax.get_xticks()[0]) / 24
def to_hour_str(x):
    x = str(x)
    if len(x) < 2:
        x = '0' + x
    return x + ':00'
for h in range(25):
    xticks_val.append(to_hour_str(h))
    xticks_loc.append(ax.get_xticks()[0] + h * distance)
plt.xticks(xticks_loc, xticks_val, rotation=90, ha='left')
plt.tight_layout()
plt.show()
Here's the code output using dummy data I generated myself.
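As an alternative to computing tick locations by hand, matplotlib's own date locators can place one tick per hour. The following is only a sketch under stated assumptions, not the answer's method: `parse_times`, `plot_arrivals` and the `ANCHOR` day are hypothetical names introduced here, and every time is pinned to the same arbitrary day so that only the time of day matters.

```python
from datetime import datetime, timedelta

ANCHOR = datetime(2000, 1, 1)  # arbitrary day (an assumption); only the time of day is used


def parse_times(time_strings):
    """Convert 'H:MM' strings to datetimes that all fall on the ANCHOR day."""
    parsed = []
    for text in time_strings:
        t = datetime.strptime(text.strip(), "%H:%M")
        parsed.append(ANCHOR.replace(hour=t.hour, minute=t.minute))
    return parsed


def plot_arrivals(times, days):
    """Plot arrival times (x) against calendar days (y) with one x tick per hour."""
    import matplotlib.pyplot as plt
    from matplotlib import dates as mpl_dates

    fig, ax = plt.subplots(figsize=(10, 10))
    ax.plot(times, days, "o")
    ax.set_xlim(ANCHOR, ANCHOR + timedelta(days=1))  # show the full 24 hours
    ax.xaxis.set_major_locator(mpl_dates.HourLocator(interval=1))
    ax.xaxis.set_major_formatter(mpl_dates.DateFormatter("%H:%M"))
    ax.yaxis.set_major_formatter(mpl_dates.DateFormatter("%b %d, %Y"))
    ax.set_xlabel("Time")
    ax.set_ylabel("Date")
    ax.set_title("Train Time vs Date Schedule")
    fig.autofmt_xdate(rotation=90)
    fig.tight_layout()
    plt.show()
```

With the lists from the question this would be called as `plot_arrivals(parse_times(train1), date)`, where `train1` holds the "%H:%M" strings and `date` holds the parsed `datetime.date` objects.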

Wanted to partially remove items on x axis while using matplotlib.pyplot

I am designing a currency converter app and had the idea of adding graphical currency analysis to it.
For this I've started using matplotlib.pyplot. I take a from-date (i.e. the date from which the graph compares data) as input from the user, and using it I fetch real-time currency data from certain sources.
But here is the main issue: when I draw the graph, the x-axis looks really bad 😫.
I'll insert the output I am getting (graph) and a rough version of my code. The main issue I want to eliminate is that I want only certain parts of the x-axis to be visible.
import matplotlib.pyplot as plt
import requests
x = []
y = []
for i in range(fyear,tyear):
    for j in range(fmonth,tmonth):
        for k in range(fday,tday):
            response = requests.get("https://api.ratesapi.io/api/{}-{}-{}?base={}&symbols={}".format(i,j,k,inp_curr,out_curr))
            data = response.json()
            rate = data['rates'][out_curr]
            y.append(rate)
            x.append("{}/{}/{}".format(j,i,k))
plt.plot(x,y)
OBTAINED OUTPUT:
need answer quickly.....
If by parts you mean showing only a few labels along the x-axis, you could use xticks and locator_params. See the docs here: https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.xticks.html
import matplotlib.pyplot as plt
import numpy as np
import requests
# use some fake data for testing - use your params
fyear = 2019
tyear = 2020
fmonth = 1
tmonth = 13
fday=1
tday=28
inp_curr = "EUR"
out_curr = "GBP"
# init lists
x = []
y = []
for i in range(fyear,tyear):
    for j in range(fmonth,tmonth):
        for k in range(fday,tday):
            response = requests.get("https://api.ratesapi.io/api/{}-{}-{}?base={}&symbols={}".format(i,j,k,inp_curr,out_curr))
            data = response.json()
            rate = data['rates'][out_curr]
            y.append(rate)
            x.append("{}/{}/{}".format(j,i,k))
# create subplot
fig, ax = plt.subplots(1,1, figsize=(20, 11))
# plot image
img = ax.plot(x, y)
# set the total number of x_ticks (the ticks on the x label)
ax.set_xticks(np.arange(len(x)))
# set the labels for each x_tick (actually is x list)
ax.set_xticklabels(x)
# set the number of ticks you want to visualize
# you can just pick a number, e.g. 10, and only 10 ticks will be shown
# to show, say, the first day of each month, set this
n = round(len(x)/(tday-fday))
plt.locator_params(axis='x', nbins=n)
# change labels position to oblique
ax.get_figure().autofmt_xdate()
fig.tight_layout()
Remember to import numpy! Hope it helps you. Here you can see my output.
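A different route, sketched here as an assumption rather than as the answer's approach, is to give matplotlib real date objects instead of strings, so that its automatic date locator chooses a readable set of ticks. The nested range loops above visit the same day numbers in every month; the hypothetical helper `daily_dates` walks the actual calendar instead. `plot_rates` is likewise a made-up name, and the rates are expected to be fetched separately, one per date.

```python
from datetime import date, timedelta


def daily_dates(start, end):
    """Return every calendar date from start to end, both inclusive."""
    return [start + timedelta(days=n) for n in range((end - start).days + 1)]


def plot_rates(days, rates):
    """Plot rates against date objects; matplotlib then picks sparse date ticks itself."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(20, 11))
    ax.plot(days, rates)
    fig.autofmt_xdate()  # slant the date labels so they do not overlap
    fig.tight_layout()
    plt.show()
```

Each date can be turned into the URL fragment with `d.isoformat()`, which yields the `YYYY-MM-DD` form, and the list returned by `daily_dates` can be passed to `plot_rates` unchanged.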

How to plot multiple graphs with Plotly, where each plot is for a different (next) day?

I want to plot machine observation data separately by day,
so changes in Current, Temperature etc. can be seen by hour.
Basically I want one plot for each day. The thing is, when I make too many of these, Jupyter Notebook can't display all of them and plotly gives an error.
f_day --> first day
n_day --> next day
I am thinking of using subplots with a shared y-axis, but then I don't know how to put different dates on the x-axis.
How can I do this with graph objects and subplots, using only one figure object so the plots don't crash?
Data looks like this
,ID,IOT_ID,DATE,Voltage,Current,Temperature,Noise,Humidity,Vibration,Open,Close
0,9466,5d36edfe125b874a36c6a210,2020-08-06 09:02:00,228.893,4.17,39.9817,73.1167,33.3133,2.05,T,F
1,9467,5d36edfe125b874a36c6a210,2020-08-06 09:03:00,228.168,4.13167,40.0317,69.65,33.265,2.03333,T,F
2,9468,5d36edfe125b874a36c6a210,2020-08-06 09:04:00,228.535,4.13,40.11,71.7,33.1717,2.08333,T,F
3,9469,5d36edfe125b874a36c6a210,2020-08-06 09:05:00,228.597,4.14,40.1683,71.95,33.0417,2.0666700000000002,T,F
4,9470,5d36edfe125b874a36c6a210,2020-08-06 09:06:00,228.405,4.13333,40.2317,71.2167,32.9933,2.0,T,F
Code with display error is this
f_day = pd.Timestamp('2020-08-06 00:00:00')
for day in range(days_between.days):
    n_day = f_day + pd.Timedelta('1 days')
    fig_df = df[(df["DATE"] >= f_day) & (df["DATE"] <= n_day) & (df["IOT_ID"] == iot_id)]
    fig_cn = px.scatter(
        fig_df, x="DATE", y="Current", color="Noise", color_continuous_scale= "Sunset",
        title= ("IoT " + iot_id + " " + str(f_day.date())),
        range_color= (min_noise,max_noise)
    )
    f_day = n_day
    fig_cn.show()
updated
The question was with respect to plotly not matplotlib. Same approach works. Clearly axis and titles need some beautification
import pandas as pd
import plotly.subplots
import plotly.express as px
import datetime as dt
import random
df = pd.DataFrame([{"DATE":d, "IOT_ID":random.randint(1,5), "Noise":random.uniform(0,1), "Current":random.uniform(15,25)}
                   for d in pd.date_range(dt.datetime(2020,9,1), dt.datetime(2020,9,4,23,59), freq="15min")])
# get days to plot
days = df["DATE"].dt.floor("D").unique()
# create axis for each day
fig = plotly.subplots.make_subplots(len(days))
iot_id=3
for i,d in enumerate(days):
    # filter data and plot ....
    mask = (df["DATE"].dt.floor("D")==d)&(df["IOT_ID"]==iot_id)
    splt = px.scatter(df.loc[mask], x="DATE", y="Current", color="Noise", color_continuous_scale= "Sunset",
                      title= f"IoT ({iot_id}) Date:{pd.to_datetime(d).strftime('%d %b')}")
    # select_traces() returns a generator so turn it into a list and take first one
    fig.add_trace(list(splt.select_traces())[0], row=i+1, col=1)
fig.show()
It's simple - create the axis that you want to plot on first. Then plot. I've simulated your data as you didn't provide in your question.
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt
import random
df = pd.DataFrame([{"DATE":d, "IOT_ID":random.randint(1,5), "Noise":random.uniform(0,1), "Current":random.uniform(15,25)}
                   for d in pd.date_range(dt.datetime(2020,9,1), dt.datetime(2020,9,4,23,59), freq="15min")])
# get days to plot
days = df["DATE"].dt.floor("D").unique()
# create axis for each day
fig, ax = plt.subplots(len(days), figsize=[20,10],
                       sharey=True, sharex=False, gridspec_kw={"hspace":0.4})
iot_id=3
for i,d in enumerate(days):
    # filter data and plot ....
    df.loc[(df["DATE"].dt.floor("D")==d)&(df["IOT_ID"]==iot_id),].plot(kind="scatter", ax=ax[i], x="DATE", y="Current", c="Noise",
        colormap= "turbo", title=f"IoT ({iot_id}) Date:{pd.to_datetime(d).strftime('%d %b')}")
    ax[i].set_xlabel("") # it's in the titles...
output

Adjusting x-axis in matplotlib

I have a range of values for every hour of year. Which means there are 24 x 365 = 8760 values. I want to plot this information neatly with matplotlib, with x-axis showing January, February......
Here is my current code:
from matplotlib import pyplot as plt
plt.plot(x_data,y_data,label=str("Plot"))
plt.xticks(rotation=45)
plt.xlabel("Time")
plt.ylabel("Y axis values")
plt.title("Y axis values vs Time")
plt.legend(loc='upper right')
axes = plt.gca()
axes.set_ylim([0,some_value * 3])
plt.show()
x_data is a list containing dates in datetime format. y_data contains values corresponding to the values in x_data. How can I get the plot neatly done with months on the X axis? An example:
You could create a scatter plot with horizontal lines as markers. The month is extracted by using the datetime module. In case the dates are not ordered, the plot sorts both lists first according to the date:
#creating a toy dataset for one year, random data points within month-specific limits
from datetime import date, timedelta
import random
x_data = [date(2017, 1, 1) + timedelta(days = i) for i in range(365)]
random.shuffle(x_data)
y_data = [random.randint(50 * (i.month - 1), 50 * i.month) for i in x_data]
#the actual plot starts here
from matplotlib import pyplot as plt
#get a scatter plot with horizontal markers for each data point
#in case the dates are not ordered, sort first the dates and the y values accordingly
plt.scatter([day.strftime("%b") for day in sorted(x_data)], [y for _xsorted, y in sorted(zip(x_data, y_data))], marker = "_", s = 900)
plt.show()
Output
The disadvantage is obviously that the lines have a fixed length. Also, if a month doesn't have a data point, it will not appear in the graph.
Edit 1:
You could also use Axes.hlines, as seen here.
This has the advantage, that the line length changes with the window size. And you don't have to pre-sort the lists, because each start and end point is calculated separately.
The toy dataset is created as above.
from matplotlib import pyplot as plt
#prepare the axis with categories Jan to Dec
x_ax = [date(2017, 1, 1) + timedelta(days = 31 * i) for i in range(12)]
#create invisible bar chart to retrieve start and end points from automatically generated bars
Bars = plt.bar([month.strftime("%b") for month in x_ax], [month.month for month in x_ax], align = "center", alpha = 0)
start_1_12 = [plt.getp(item, "x") for item in Bars]
end_1_12 = [plt.getp(item, "x") + plt.getp(item, "width") for item in Bars]
#retrieve start and end point for each data point line according to its month
x_start = [start_1_12[day.month - 1] for day in x_data]
x_end = [end_1_12[day.month - 1] for day in x_data]
#plot hlines for all data points
plt.hlines(y_data, x_start, x_end, colors = "blue")
plt.show()
Output
Edit 2:
Now your description of the problem is totally different from what you show in your question. You want a simple line plot with specific axis formatting. This can be found easily in the matplotlib documentation and all over SO. An example, how to achieve this with the above created toy dataset would be:
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, MonthLocator
ax = plt.subplot(111)
ax.plot([day for day in sorted(x_data)], [y for _xsorted, y in sorted(zip(x_data, y_data))], "r.-")
ax.xaxis.set_major_locator(MonthLocator(bymonthday=15))
ax.xaxis.set_minor_locator(MonthLocator())
ax.xaxis.set_major_formatter(DateFormatter("%B"))
plt.show()
Output
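For the data described in the question (one value per hour of a year), the same locator and formatter idea can be sketched as follows. `hourly_range` and `plot_year` are hypothetical helper names, and the abbreviated "%b" month labels are an assumption; the answer above uses the full month name via "%B".

```python
from datetime import datetime, timedelta


def hourly_range(year):
    """Return one datetime per hour of the given year (8760 entries for a non-leap year)."""
    start = datetime(year, 1, 1)
    hours = int((datetime(year + 1, 1, 1) - start).total_seconds() // 3600)
    return [start + timedelta(hours=h) for h in range(hours)]


def plot_year(x_data, y_data):
    """Line plot of hourly values with one labelled tick at the start of each month."""
    import matplotlib.pyplot as plt
    from matplotlib.dates import DateFormatter, MonthLocator

    fig, ax = plt.subplots()
    ax.plot(x_data, y_data)
    ax.xaxis.set_major_locator(MonthLocator())         # a tick on the first day of every month
    ax.xaxis.set_major_formatter(DateFormatter("%b"))  # Jan, Feb, ...
    ax.set_xlim(x_data[0], x_data[-1])
    ax.set_xlabel("Time")
    plt.show()
```

`plot_year(hourly_range(2017), y_data)` would then draw the asker's 8760 values with month names on the x-axis, provided `y_data` has the same length.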

Formatting datetime for plotting time-series with pandas

I am plotting some time-series with pandas dataframe and I ran into the problem of gaps on weekends. What can I do to remove gaps in the time-series plot?
date_concat = pd.to_datetime(pd.Series(df.index),infer_datetime_format=True)
pca_factors.index = date_concat
pca_colnames = ['Outright', 'Curve', 'Convexity']
pca_factors.columns = pca_colnames
fig,axes = plt.subplots(2)
pca_factors.Curve.plot(ax=axes[0]); axes[0].set_title('Curve')
pca_factors.Convexity.plot(ax=axes[1]); axes[1].set_title('Convexity'); plt.axhline(linewidth=2, color = 'g')
fig.tight_layout()
fig.savefig('convexity.png')
Partial plot below:
Ideally, I would like the time-series to only show the weekdays and ignore weekends.
To make MaxU's suggestion more explicit:
convert to datetime as you have done, but drop the weekends
reset the index and plot the data via this default Int64Index
change the x tick labels
Code:
date_concat = date_concat[date_concat.dt.weekday < 5] # drop weekends (date_concat is a Series, hence .dt)
pca_factors = pca_factors.reset_index() # from MaxU's comment
pca_factors['Convexity'].plot() # ^^^
plt.xticks(pca_factors.index, date_concat) # change the x tick labels
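Labelling every position can crowd the axis for a long series, so here is a small sketch of the same idea (plot against integer positions, then relabel) that labels only every n-th weekday. It uses plain lists rather than the asker's DataFrame, and the names `drop_weekends`, `tick_labels` and `plot_without_gaps`, as well as the default of one label per five points, are assumptions made for illustration.

```python
from datetime import date, timedelta


def drop_weekends(dates, values):
    """Keep only the (date, value) pairs that fall on Monday to Friday."""
    pairs = [(d, v) for d, v in zip(dates, values) if d.weekday() < 5]
    return [d for d, _ in pairs], [v for _, v in pairs]


def tick_labels(dates, every=5, fmt="%Y-%m-%d"):
    """Return (positions, labels) for every `every`-th entry of dates."""
    positions = list(range(0, len(dates), every))
    return positions, [dates[p].strftime(fmt) for p in positions]


def plot_without_gaps(dates, values, every=5):
    """Plot values against 0..n-1 so weekends leave no gap, then relabel the ticks with dates."""
    import matplotlib.pyplot as plt

    kept_dates, kept_values = drop_weekends(dates, values)
    positions, labels = tick_labels(kept_dates, every)
    fig, ax = plt.subplots()
    ax.plot(range(len(kept_values)), kept_values)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels, rotation=45, ha="right")
    fig.tight_layout()
    plt.show()
```

Because the x positions are consecutive integers, Friday and the following Monday sit next to each other and the line has no weekend gap; the trade-off is that the axis is no longer a true time scale.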
