I'm having trouble arranging the date labels on the X axis of my plot correctly. I've included the graph image so you can better understand the problem.
Here is the code that creates the graph:
import datetime
import json

import matplotlib.pyplot as plt
from matplotlib import dates

plt.figure()
x_axis = []  # contains the date value
y_axis = []  # an integer value
x_values_str = []  # contains the date representation
for data in json.loads(trend.trend_values):
    date_obj = datetime.datetime.strptime(str(data[3]), '%m-%d-%Y %H:%M')
    x_axis.append(dates.date2num(date_obj))
    y_axis.append(int(data[2]))  # int, not str, so the y-axis is numeric
    x_values_str.append(str(data[3]))
plt.xticks(x_axis, x_values_str, rotation=45)
plt.plot_date(x_axis, y_axis, tz=None, xdate=True, ydate=False, linestyle='-', marker='D', color='g')
plt.title("Vowel: " + trend.vowel)
plt.margins(0.1)
plt.subplots_adjust(bottom=0.1)
So the problem is that the X values are not equally spaced. How can I adjust the interval?
Thanks
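Your x_axis should be an integer range instead of a list of dates; then all data points are spaced a distance of 1 apart, but still labelled with your dates from x_values_str. Build it once, after the loop, and plot it with plt.plot rather than plt.plot_date:
x_axis = list(range(1, len(x_values_str) + 1))  # one evenly spaced position per data point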
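A minimal, self-contained sketch of this approach, using a few hypothetical date strings in the question's '%m-%d-%Y %H:%M' format and made-up integer values in place of the JSON data (trend.trend_values is not reproduced). The points sit at positions 1, 2, 3, ... and plt.xticks() labels those positions with the original strings, so they are evenly spaced however far apart the dates are:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical sample data in the question's '%m-%d-%Y %H:%M' format
x_values_str = ["01-02-2017 10:00", "01-05-2017 12:30",
                "03-20-2017 09:15", "03-21-2017 18:45"]
y_axis = [3, 7, 5, 9]  # made-up integer values

# One evenly spaced integer position per data point
x_axis = list(range(1, len(x_values_str) + 1))

fig, ax = plt.subplots()
ax.plot(x_axis, y_axis, linestyle="-", marker="D", color="g")
# Label each integer position with its original date string
plt.xticks(x_axis, x_values_str, rotation=45, ha="right")
plt.margins(0.1)
fig.tight_layout()
```

The trade-off is that the x distance no longer reflects elapsed time: two points a minute apart and two points a month apart are drawn equally far apart.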
Related
I want to plot a date vs. time graph using matplotlib. The issue I am facing is that, because of the amount of data, too many labels crowd the x-axis, and I can't find a way to show the time on the x-axis cleanly with a one-hour gap. Say my list holds strings such as ['6:01','6:30','7:20','7:25']. I want the x-axis divided into hourly intervals (6:00, 7:00, ...), with each time point placed according to its actual time.
Note: the time list is just an example; I want to do this for the whole 24 hours.
I have tried using ticks and many other options, but unfortunately I am stuck on this problem. My data is in a CSV file.
Below is my code:
def arrivalGraph():
    from datetime import datetime, timedelta
    from matplotlib import pyplot as plt
    from matplotlib import dates as mpl_dates
    with open("Timetable2021.csv","r") as f:
        fileData = f.readlines()
    del fileData[0]
    date = []
    train1 = []
    for data in fileData:
        ind = data.split(",")
        date.append(datetime.strptime(ind[0],"%d/%m/%Y").date())
        train1Time = datetime.strptime(ind[1],"%H:%M").time()
        train1.append(train1Time.strftime("%H:%M"))
    plt.style.use("seaborn")
    plt.figure(figsize = (10,10))
    plt.plot_date(train1,date)
    plt.gcf().autofmt_xdate()#gcf is get current figure - autofmt is auto format
    dateformater = mpl_dates.DateFormatter("%b ,%d %Y")
    plt.gca().xaxis.set_major_formatter(dateformater) # to format the xaxis
    plt.xlabel("Date")
    plt.ylabel("Time")
    plt.title("Train Time vs Date Schedule")
    plt.tight_layout()
    plt.show()
When I run the code I get the following output:
[Image: output of the above code]
Assuming that every single minute is present in train1 (i.e. train1 = ["00:00", "00:01", "00:02", "00:03", ... , "23:59"]), you can use plt.xticks() with an array of tick labels that holds an empty string for every minute that is not on the hour.
unique_times = sorted(set(train1))
xticks = ['' if time[-2:]!='00' else time for time in unique_times]
plt.style.use("seaborn")
plt.figure(figsize = (10,10))
plt.plot_date(train1,date)
plt.gcf().autofmt_xdate()#gcf is get current figure - autofmt is auto format
dateformater = mpl_dates.DateFormatter("%b ,%d %Y")
# I think you wanted to format the yaxis instead of xaxis
plt.gca().yaxis.set_major_formatter(dateformater) # to format the yaxis
plt.ylabel("Date")
plt.xlabel("Time")
plt.title("Train Time vs Date Schedule")
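plt.xticks(range(len(xticks)), xticks)  # assumes the times first appear in sorted order, since the categorical axis positions strings by first appearance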
plt.tight_layout()
plt.show()
If not every minute is present in train1, keep the train1 entries as datetime objects and generate arrays of tick locations and tick labels to pass to plt.xticks().
date = []
train1 = []
for data in fileData:
    ind = data.split(",")
    date.append(datetime.strptime(ind[0],"%d/%m/%Y").date())
    train1Time = datetime.strptime(ind[1],"%H:%M")
    train1.append(train1Time)
plt.style.use("seaborn")
plt.figure(figsize = (10,10))
plt.plot_date(train1,date)
plt.gcf().autofmt_xdate()#gcf is get current figure - autofmt is auto format
dateformater = mpl_dates.DateFormatter("%b ,%d %Y")
# I think you wanted to format the y axis instead of xaxis
plt.gca().yaxis.set_major_formatter(dateformater) # to format the yaxis
plt.ylabel("Date")
plt.xlabel("Time")
plt.title("Train Time vs Date Schedule")
ax = plt.gca()
xticks_val = []
xticks_loc = []
distance = (ax.get_xticks()[-1] - ax.get_xticks()[0]) / 24
def to_hour_str(x):
    x = str(x)
    if len(x) < 2:
        x = '0' + x
    return x + ':00'
for h in range(25):
    xticks_val.append(to_hour_str(h))
    xticks_loc.append(ax.get_xticks()[0] + h * distance)
plt.xticks(xticks_loc, xticks_val, rotation=90, ha='left')
plt.tight_layout()
plt.show()
Here's the code output using dummy data I generated myself.
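As an alternative to computing tick positions by hand, matplotlib's date locators can put a tick on every hour. A minimal sketch, assuming the times are parsed into datetime objects with strptime("%H:%M") (which places them all on 1900-01-01) and using a few hypothetical (date, time) rows in place of Timetable2021.csv:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
from datetime import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Hypothetical rows standing in for Timetable2021.csv: (date, time) pairs
rows = [("01/01/2021", "6:01"), ("02/01/2021", "6:30"),
        ("03/01/2021", "7:20"), ("04/01/2021", "7:25")]

# Times parsed with only %H:%M land on 1900-01-01, so they share one x-axis day
times = [datetime.strptime(t, "%H:%M") for _, t in rows]
days = [datetime.strptime(d, "%d/%m/%Y") for d, _ in rows]

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(times, days, "o")

# Show the whole day on the x-axis, with one labelled tick per hour
ax.set_xlim(datetime(1900, 1, 1, 0, 0), datetime(1900, 1, 2, 0, 0))
ax.xaxis.set_major_locator(mdates.HourLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax.yaxis.set_major_formatter(mdates.DateFormatter("%b %d %Y"))
ax.set_xlabel("Time")
ax.set_ylabel("Date")
fig.autofmt_xdate(rotation=90)
fig.tight_layout()
```

Because the ticks come from HourLocator, each point is placed at its actual time (6:01 sits just right of the 06:00 tick), and it does not matter which minutes are present in the data.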
I have parsed data from a .json file and plotted it, but I only want a certain range of it, e.g. year-month = 2014-12 to 2020-03.
The code is:
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_json("observed-solar-cycle-indices.json", orient='records')
data = pd.DataFrame(data)
print(data)
x = data['time-tag']
y = data['ssn']
plt.plot(x, y, 'o')
plt.xlabel('Year-day'), plt.ylabel('SSN')
plt.show()
Here is the result; as you can see, there are far too many data points.
here is the json file: https://services.swpc.noaa.gov/json/solar-cycle/observed-solar-cycle-indices.json
How can I either extract only certain values from the JSON file or plot only a certain range?
The following should work:
Select the data using a start and end date (the zero-padded 'YYYY-MM' strings compare correctly as text):
ndata = data[(data['time-tag'] >= '2014-12') & (data['time-tag'] <= '2020-03')]
Plot the data. The x-axis labelling is adapted to display only every 12th label:
x = ndata['time-tag']
y = ndata['ssn']
fig, ax = plt.subplots()
plt.plot(x, y, 'o')
every_nth = 12
for n, label in enumerate(ax.xaxis.get_ticklabels()):
    if n % every_nth != 0:
        label.set_visible(False)
plt.xlabel('Year-Month')
plt.xticks(rotation='vertical')
plt.ylabel('SSN')
plt.show()
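A variant of the same idea converts the 'time-tag' strings to real datetimes first, so matplotlib spaces and thins the labels itself instead of hiding every nth one. This is a sketch under assumptions: a small synthetic DataFrame with the same two column names and made-up SSN values stands in for the downloaded JSON, and the range is the one asked for, 2014-12 to 2020-03 inclusive:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# Synthetic stand-in for the NOAA file: monthly 'YYYY-MM' tags, made-up SSN values
tags = pd.period_range("2010-01", "2022-12", freq="M").strftime("%Y-%m")
data = pd.DataFrame({"time-tag": tags, "ssn": range(len(tags))})

# Convert the month strings to datetimes so filtering and the axis are date-aware
data["time-tag"] = pd.to_datetime(data["time-tag"], format="%Y-%m")

# Keep 2014-12 through 2020-03, both ends inclusive
mask = data["time-tag"].between(pd.Timestamp("2014-12-01"), pd.Timestamp("2020-03-01"))
ndata = data[mask]

fig, ax = plt.subplots()
ax.plot(ndata["time-tag"], ndata["ssn"], "o")
# One labelled tick per year instead of one per month
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
ax.set_xlabel("Year")
ax.set_ylabel("SSN")
```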
You could also look up the positions of your start and end dates and use them to slice both the x and y values into smaller lists that you can plot.
For example, it might be something like this:
x = list(data['time-tag'])  # plain lists support .index()
y = list(data['ssn'])
start_index = x.index('2014-12')
end_index = x.index('2020-03') + 1  # +1 so the end month is included in the slice
x_subsection = x[start_index : end_index]
y_subsection = y[start_index : end_index]
plt.plot(x_subsection, y_subsection, 'o')
plt.xlabel('Year-month'), plt.ylabel('SSN')
plt.show()
The columns are converted with list() because a pandas Series has no .index() search method (its .index attribute is the row index).
I have a value for every hour of the year, which means there are 24 x 365 = 8760 values. I want to plot this neatly with matplotlib, with the x-axis showing January, February, and so on.
Here is my current code:
from matplotlib import pyplot as plt
plt.plot(x_data,y_data,label=str("Plot"))
plt.xticks(rotation=45)
plt.xlabel("Time")
plt.ylabel("Y axis values")
plt.title("Y axis values vs Time")
plt.legend(loc='upper right')
axes = plt.gca()
axes.set_ylim([0,some_value * 3])
plt.show()
x_data is a list of dates in datetime format, and y_data contains the values corresponding to x_data. How can I plot this neatly with months on the X axis? An example:
You could create a scatter plot with horizontal lines as markers. The month is extracted from each date with strftime("%b"). In case the dates are not in order, both lists are first sorted by date:
#creating a toy dataset for one year, random data points within month-specific limits
from datetime import date, timedelta
import random
x_data = [date(2017, 1, 1) + timedelta(days = i) for i in range(365)]
random.shuffle(x_data)
y_data = [random.randint(50 * (i.month - 1), 50 * i.month) for i in x_data]
#the actual plot starts here
from matplotlib import pyplot as plt
#get a scatter plot with horizontal markers for each data point
#in case the dates are not ordered, sort first the dates and the y values accordingly
plt.scatter([day.strftime("%b") for day in sorted(x_data)], [y for _xsorted, y in sorted(zip(x_data, y_data))], marker = "_", s = 900)
plt.show()
Output
The disadvantage is obviously that the lines have a fixed length. Also, if a month doesn't have a data point, it will not appear in the graph.
Edit 1:
You could also use Axes.hlines, as seen here.
This has the advantage that the line length scales with the window size, and you don't have to pre-sort the lists, because each start and end point is calculated separately.
The toy dataset is created as above.
from matplotlib import pyplot as plt
#prepare the axis with categories Jan to Dec
x_ax = [date(2017, 1, 1) + timedelta(days = 31 * i) for i in range(12)]
#create invisible bar chart to retrieve start and end points from automatically generated bars
Bars = plt.bar([month.strftime("%b") for month in x_ax], [month.month for month in x_ax], align = "center", alpha = 0)
start_1_12 = [plt.getp(item, "x") for item in Bars]
end_1_12 = [plt.getp(item, "x") + plt.getp(item, "width") for item in Bars]
#retrieve start and end point for each data point line according to its month
x_start = [start_1_12[day.month - 1] for day in x_data]
x_end = [end_1_12[day.month - 1] for day in x_data]
#plot hlines for all data points
plt.hlines(y_data, x_start, x_end, colors = "blue")
plt.show()
Output
Edit 2:
Now your description of the problem is quite different from what you show in your question: you want a simple line plot with specific axis formatting. This is well covered in the matplotlib documentation and all over SO. An example of how to achieve this with the toy dataset created above:
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, MonthLocator
ax = plt.subplot(111)
ax.plot([day for day in sorted(x_data)], [y for _xsorted, y in sorted(zip(x_data, y_data))], "r.-")
ax.xaxis.set_major_locator(MonthLocator(bymonthday=15))
ax.xaxis.set_minor_locator(MonthLocator())
ax.xaxis.set_major_formatter(DateFormatter("%B"))
plt.show()
Output
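For the hourly case in the question (8760 values), the same MonthLocator/DateFormatter approach works directly on the datetime x values, with no sorting or categorising. A minimal sketch, assuming a hypothetical hourly series for 2017 with made-up y values standing in for x_data and y_data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, MonthLocator

# Hypothetical hourly series: one value per hour of 2017 (24 * 365 = 8760 points)
start = datetime(2017, 1, 1)
x_data = [start + timedelta(hours=h) for h in range(24 * 365)]
y_data = [h % 24 for h in range(24 * 365)]  # made-up values: hour of the day

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(x_data, y_data, linewidth=0.5)
# Major ticks on the 1st of each month, labelled with the abbreviated month name
ax.xaxis.set_major_locator(MonthLocator())
ax.xaxis.set_major_formatter(DateFormatter("%b"))
ax.set_xlim(start, datetime(2018, 1, 1))
ax.set_xlabel("Time")
ax.set_ylabel("Y axis values")
```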
I am trying to use matplotlib/pandas to plot a variable (y-axis) as a function of date (x-axis). The simple code snippet below works well for one trace/set of data:
import datetime as dt
fig = plt.figure()
dates = grouping_data_for_profit_trace["date"]
x = [dt.datetime.strptime(d,'%Y-%m-%d').date() for d in dates]
y = grouping_data_for_profit_trace["variable"]
plt.plot(x,y)
plt.xticks(rotation=30)
plt.show()
However, I want to plot several datasets on the same plot. If I put the code above into a loop and simply recalculate and plot x and y for each loop iteration/dataset i.e.
loop...
    #fig = plt.figure() - now commented out
    x = [dt.datetime.strptime(d,'%Y-%m-%d').date() for d in dates]
    y = grouping_data_for_profit_trace["variable"]
    plt.plot(x,y)
    plt.xticks(rotation=30)
I start to see strange errors that I don't understand, e.g.
---> 53 plt.xticks(rotation=30)
.....
ValueError: ordinal must be >= 1
Can anyone comment on why this is happening or offer a suggestion as to how 2 sets of data can be plotted against x axis dates on the same plot?
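One common way to overlay several date series is to create the figure and axes once, before the loop, and call plot on that same Axes for every dataset. The sketch below does not claim to diagnose the ValueError above; it assumes two hypothetical datasets held in a dict of pandas DataFrames with the question's 'date' ('%Y-%m-%d' strings) and 'variable' columns:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import datetime as dt
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical datasets standing in for grouping_data_for_profit_trace
datasets = {
    "trace A": pd.DataFrame({"date": ["2021-01-01", "2021-01-05", "2021-01-09"],
                             "variable": [1.0, 2.5, 2.0]}),
    "trace B": pd.DataFrame({"date": ["2021-01-02", "2021-01-06", "2021-01-10"],
                             "variable": [0.5, 1.5, 3.0]}),
}

# Create the figure and axes once, outside the loop
fig, ax = plt.subplots()
for name, frame in datasets.items():
    x = [dt.datetime.strptime(d, "%Y-%m-%d").date() for d in frame["date"]]
    y = frame["variable"]
    ax.plot(x, y, label=name)  # every trace goes onto the same Axes

# Rotate the date labels once, after all traces have been added
ax.tick_params(axis="x", labelrotation=30)
ax.legend()
```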
I am trying to make a 3D scatter plot with the following axes: Price, Time and Date. Essentially, I want to plot the price vs. the time for a single day, then do that for multiple days and stack them along the third axis.
Here is the code I have so far:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib

df=pd.read_csv("C:\\Desktop\\dat.csv")
date= df['date'].unique()
dfd = pd.DataFrame(date)
days= dfd.sort_values(0)
fig= plt.figure()
ax= fig.add_subplot(111, projection= '3d')
xx = np.arange(len(days[0]))
ys = [i+xx+(i*xx)**2 for i in range(len(days[0]))]
colors = cm.rainbow(np.linspace(0, 1, len(ys)))
for d,cc in zip(days[0],colors):
    td= df.loc[df['date'] == d]
    st= td[['time','price']]
    x = st['time']
    y = st['price']
    x=list(x)
    y=list(y)
    ax.scatter(x,y,d,color=cc)
plt.show()
The main problem is that x is a list of times and d is a date. When I run the code, matplotlib throws an error saying
ValueError: invalid literal for float(): 9:30:01
The dates are in the following format : 2017.04.12, time format : 9:30:01
I am aware that I can set up tick labels for the date, but how can I handle the time?
How can I plot this 3d plot if the axis indexes contain dates and times?
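matplotlib's 3D axes need numbers on every axis, so one approach is to convert each time of day to seconds since midnight and each date to a matplotlib date number with matplotlib.dates.date2num, then format the date axis back into readable labels. A minimal sketch, assuming the question's formats ('2017.04.12' dates, '9:30:01' times) and a small hypothetical DataFrame with 'date', 'time' and 'price' columns in place of dat.csv; the helper time_to_seconds is illustrative, not part of any library:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
from datetime import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, registers '3d' on older matplotlib

# Hypothetical stand-in for dat.csv
df = pd.DataFrame({
    "date": ["2017.04.12", "2017.04.12", "2017.04.13", "2017.04.13"],
    "time": ["9:30:01", "9:45:10", "9:30:05", "10:00:00"],
    "price": [100.0, 100.5, 101.2, 100.9],
})

def time_to_seconds(t):
    # Hypothetical helper: '9:30:01' -> seconds since midnight (a plain number)
    parsed = datetime.strptime(t, "%H:%M:%S")
    return parsed.hour * 3600 + parsed.minute * 60 + parsed.second

df["seconds"] = df["time"].map(time_to_seconds)
# '2017.04.12' -> matplotlib date number (days as a float)
df["datenum"] = mdates.date2num(pd.to_datetime(df["date"], format="%Y.%m.%d").to_numpy())

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
for d, day in df.groupby("date"):
    # One scatter per day: x = time of day, y = price, z = date number
    ax.scatter(day["seconds"], day["price"], day["datenum"], label=d)

# Show readable dates on the z-axis instead of raw day numbers
ax.zaxis.set_major_formatter(mdates.DateFormatter("%Y.%m.%d"))
ax.set_xlabel("Time (seconds since midnight)")
ax.set_ylabel("Price")
ax.set_zlabel("Date")
```

The time axis could be formatted back into H:M:S labels the same way, for example with a FuncFormatter, if raw seconds are not readable enough.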