plotting a scatter plot for list/array in matplotlib - python

I have wasted so much time looking through the examples given on the internet, but I just can't figure out why I can't plot a simple graph of a list of datetime.date objects against a list of integers.
appleProd1 appleProd2 appleProd3 appleProd4 ..... 70 Products
apple.com 2010-09-12 2008-01-01 2009-03-02 .......................
I wanted to plot a scatter plot for the launch dates with x axis as dates and y axis as the products. And I do this
plt.subplot(1, 1, 1)
product_list = []
label_ticks = []
date_list = []
yval = 1
for prod, date in df.iteritems():  # date is a datetime.date object
    date = pd.to_datetime(date)
    product = prod
    date_list.append(date)
    product_list.append(prod)
    label_ticks.append(yval)
    yval += 1
plt.plot_date(date_list, label_ticks)
The last line, plt.plot_date, gives me an error saying TypeError: float() argument must be a string or a number. I have also tried converting both lists to numpy arrays and using plt.scatter, with the same error. Both lists have the same length. I also referred to this page:
http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.scatter
Some dates are NaN as well, so converting with date2num raises an error there.

I figured out what the problem was. There were indeed some string values in date_list, so plt.plot complained. plt.plot works just fine with a list of datetime.date values; there is no need to convert datetime.date to any other format.
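One way to apply the fix described above is to drop every entry that is not a real date before plotting. The following is only a minimal sketch under assumptions: raw_dates is made-up sample data, and the isinstance filter is just one possible way to remove stray strings and NaN values.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical launch dates; note the stray string and the missing value.
raw_dates = [
    datetime.date(2010, 9, 12),
    "unknown",
    datetime.date(2008, 1, 1),
    float("nan"),
    datetime.date(2009, 3, 2),
]

# Keep only genuine datetime.date entries, remembering their original position
# so each surviving date still lines up with its product number.
pairs = [
    (d, pos)
    for pos, d in enumerate(raw_dates, start=1)
    if isinstance(d, datetime.date)
]
clean_dates = [d for d, _ in pairs]
label_ticks = [pos for _, pos in pairs]

fig, ax = plt.subplots()
ax.scatter(clean_dates, label_ticks)  # dates on x, product number on y
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

Filtering both lists together keeps them the same length, which the plotting call requires.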

Related

How do I access the integers given by nunique in Pandas?

I am trying to access the items in each column produced by the following code. It outputs two columns: the 'Accurate_Episode_Date' values and the count (the frequency of each date). My goal is to plot the date on the x axis and the count on the y axis using a scatterplot, but first I need to be able to access the actual count values.
data = pd.read_csv('CovidDataset.csv')
Barrie = data.loc[data['Reporting_PHU_City'] == 'Barrie']
dates_barrie = Barrie[['Accurate_Episode_Date']]
num = data.groupby('Accurate_Episode_Date')['_id'].nunique()
print(num.tail(5))
The code above outputs the following:
2021-01-10T00:00:00 1326
2021-01-11T00:00:00 1875
2021-01-12T00:00:00 1274
2021-01-13T00:00:00 492
2021-01-14T00:00:00 8
Again, I want to plot the dates on the x axis, and the counts on the y axis in scatterplot form. How do I access the count and date values?
EDIT: I just want a way to plot dates like 2021-01-10T00:00:00 and so on on the x axis, and the corresponding count: 1326 on the Y-axis.
Turns out this was mainly a data type issue. Basically all that was needed was accessing the datetime index and typecasting it to string with num.index.astype(str).
You could probably change it "in-place" and use the plot like below.
num.index = num.index.astype(str)
num.plot()
If you only want to access the values of a DataFrame or Series you just need to access them like this: num.values
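To make the two pieces concrete: the result of groupby(...).nunique() is a Series, so the dates live in its index and the counts in its values. This is a small sketch with made-up rows standing in for CovidDataset.csv; the column names are taken from the question, everything else is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows standing in for the real CSV file.
data = pd.DataFrame({
    "Accurate_Episode_Date": [
        "2021-01-13T00:00:00",
        "2021-01-13T00:00:00",
        "2021-01-14T00:00:00",
    ],
    "_id": [1, 2, 3],
})
num = data.groupby("Accurate_Episode_Date")["_id"].nunique()

dates = pd.to_datetime(num.index)  # the group keys, parsed into timestamps
counts = num.to_numpy()            # the unique-id count for each date

fig, ax = plt.subplots()
ax.scatter(dates, counts)
fig.autofmt_xdate()
```

Parsing the index with pd.to_datetime is optional here; it is used so the x axis is spaced by real time rather than by label.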
If you want to plot the date column on X, you don't need to access that column separately, just use pandas internals:
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

# some dummy dates + counts
dates = [datetime.now() + timedelta(hours=i) for i in range(1, 6)]
values = np.random.randint(1, 10, 5)
df = pd.DataFrame({
    "Date": dates,
    "Values": values,
})
# if you only have 1 other column you can skip `y`
df.plot(x="Date", y="Values")
You need to convert the date column using pd.to_datetime(df['dates']); then you can plot.
Updated answer:
Here there is no need to convert with pd.to_datetime(df['dates']):
ax = df[['count']].plot()
# df.count is a DataFrame method, so select the column with brackets
ax.set_xticks(df['count'].index)
ax.set_xticklabels(df.date)

python plotting from multi-index dataframe

I am having trouble plotting the multi-index dataframe below, since I can't use the m04hour.index range value for some reason.
output from m04hour.head()
However, this plot command works fine:
m04hour['consumption (kWh)'].plot(figsize=(12,2))
But this one doesn't:
fig,ax = plt.subplots(figsize=(8,4))
ax.plot(m04hour.index, m04hour['consumption(kWh)'],c='red',lw=1,label='queens')
Because the "m04hour.index" is returning the error:
ValueError: setting an array element with a sequence.
So the question is how to refer to the m04hour.index value for setting the plotting x-axis?
Your index in m04hour is not a pd.MultiIndex. It is an index of tuples.
First, let's convert that index of tuples into a pd.MultiIndex.
m04hour.index = pd.MultiIndex.from_tuples(m04hour.index)
fig, ax = plt.subplots(figsize=(8,4))
ax.plot(m04hour.index.get_level_values(1), m04hour['consumption(kWh)'], c='red', lw=1, label='queens')
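The conversion can be tried on a small stand-in frame. This is a sketch only: frame, its (meter, timestamp) tuples, and the consumption numbers are invented for illustration, since the real m04hour data is not shown.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for m04hour: a flat Index whose elements are
# (meter_id, timestamp) tuples rather than a real MultiIndex.
timestamps = pd.date_range("2021-01-01", periods=4, freq="D")
tuples = [("m04", ts) for ts in timestamps]
frame = pd.DataFrame(
    {"consumption (kWh)": [1.2, 0.9, 1.5, 1.1]},
    index=pd.Index(tuples, tupleize_cols=False),
)
was_multi_before = isinstance(frame.index, pd.MultiIndex)

# Turn the tuples into a proper two-level MultiIndex.
frame.index = pd.MultiIndex.from_tuples(frame.index)

# Level 1 holds the timestamps, which matplotlib can use as x values.
x_values = frame.index.get_level_values(1)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x_values, frame["consumption (kWh)"], c="red", lw=1, label="queens")
ax.legend()
```

Which level to pass to get_level_values depends on where the timestamps sit in the tuples; level 1 is assumed here.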

Seaborn stripplot of datetime objects not working

Following the first example from URL:
http://seaborn.pydata.org/tutorial/categorical.html
I am able to load the dataset called 'tips' and reproduce the stripplot shown. However, this plot is not shown when applied to my pandas dataframe (called df) consisting of datetime objects. My df consists of 19300 rows and 7 columns, of which 2 columns are in the form of datetime objects (dates and times respectively). I would like to use the Python Seaborn package's stripplot function to visualize these two df columns together. My code reads as follows:
sns.stripplot(x=df['DATE'], y=df['TIME'], data=df);
And the output error reads as follows:
TypeError: float() argument must be a string or a number
I have made sure to remove the header from the data columns before applying the plotting command.
Other failed attempts include (but not limited to)
sns.stripplot(x=df['DATE'], y=df['TIME']);
It is my guess that this error might be due to the datetime object nature of the column data types, and that this type must somehow be changed into either strings or integer values. Is this correct? And how might one proceed to accomplish this task?
To illustrate the df data, here is a working code which uses matplotlib.pyplot (as plt)
ax1.plot(x, y, 'o', label='Events')
Any help is much appreciated.
One can also try to convert dates/times into seconds to plot them as numeric values:
dates = df.DATE
times = df.TIME
start_date = dates.min()
dates_as_seconds = dates.map(lambda d: (d - start_date).total_seconds())
times_as_seconds = times.map(lambda t: t.second + t.minute*60 + t.hour*3600)
ax = sns.stripplot(x=dates_as_seconds, y=times_as_seconds)
ax.set_xticklabels(dates)
ax.set_yticklabels(times)
Of course, data frame should be sorted by dates and times to match ticks and values.
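The sorting remark can be checked on a tiny example before any plotting. This is a hedged sketch: the three rows are made up, and only the DATE and TIME column names come from the question.

```python
import datetime

import pandas as pd

# Hypothetical, deliberately unsorted sample with the question's column names.
df = pd.DataFrame({
    "DATE": [
        datetime.date(2020, 1, 1),
        datetime.date(2020, 1, 3),
        datetime.date(2020, 1, 2),
    ],
    "TIME": [
        datetime.time(0, 0, 30),
        datetime.time(12, 0, 0),
        datetime.time(1, 1, 1),
    ],
})

# Sort first so numeric values and tick labels stay in the same order.
df = df.sort_values(["DATE", "TIME"]).reset_index(drop=True)

start_date = df["DATE"].min()
# Seconds elapsed since the earliest date.
dates_as_seconds = df["DATE"].map(lambda d: (d - start_date).total_seconds())
# Seconds elapsed since midnight.
times_as_seconds = df["TIME"].map(lambda t: t.second + t.minute * 60 + t.hour * 3600)
```

The two resulting Series are plain numbers, so they can be passed as x and y to a numeric plotting call.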
After applying the following code to the previous script:
x = df['DATE']
data = df['TIME']
y = data[1:len(x)]
x = x[1:len(x)]
s = []
for time in y:
    # turn e.g. 12:34:56 into the integer 123456
    a = int(str(time).replace(':',''))
    s.append(a)
k = []
for date in x:
    a = str(date)
    k.append(a)
x = k
y = s
stripplot worked:
sns.stripplot(x=x, y=y)
You just need to pass the variable names as the x and y inputs, not the data themselves. For example:
sns.stripplot(x="value", y="measurement", hue="species",
              data=iris, dodge=True, alpha=.25, zorder=1)
https://seaborn.pydata.org/examples/jitter_stripplot.html

Plot list of lists (with different lengths) in a plot in python

I have a list of lists, each of these internal lists has a different length, and I would like to show them in a graph.
They would look like this:
data = [[4,3,4],[2,3],[5,6,4,5]]
For each of these, I would like to plot them against their index (x-axis), so for instance, for the first list: (0,4),(1,3),(2,4)
If my lists would have been the same length, I would have converted them to a numpy array and just plotted them:
data_np = np.vstack(data)
plot_data_np = np.transpose(data_np)
plt.plot(plot_data_np)
However, there is this length issue... In a hopeful attempt I tried:
plt.plot(data)
But alas.
What about just doing
data = [[4,3,4],[2,3],[5,6,4,5]]
for d in data:
    plt.plot(d)
?
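A slightly more explicit version of the same loop, as a sketch: each inner list is plotted against its own indices, so the differing lengths never need to be reconciled. The marker and label choices are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

data = [[4, 3, 4], [2, 3], [5, 6, 4, 5]]

fig, ax = plt.subplots()
for i, series in enumerate(data):
    # x is simply 0..len(series)-1, so every list gets its own x range
    ax.plot(range(len(series)), series, marker="o", label=f"list {i}")
ax.legend()
```

Passing the x values explicitly is equivalent to plt.plot(d), which uses the indices by default.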

Plot dates vs values in Python

I have a very standard dataset with 2 columns, 1 for dates and 1 for values. I've put them into two arrays:
dates
['1/1/2014', '1/2/2014', '1/3/2014', ...]
values
[1423, 4321, 1234, ...]
How can I create a simple line graph with "values" on the y-axis and "dates" on the x-axis?
What I've tried:
I can do a "Hello world" line plot with only "values" in just 1 line (awesome):
import numpy as np
import matplotlib.pyplot as plt
plt.plot(values)
Next, let's add "dates" as the x-axis. At this point I'm stuck. How can I transform my "dates" array, which is strings, into an array that is plottable?
Looking at examples, I believe we are supposed to cast the strings into Python Date objects. Let's import those libraries:
import datetime
import matplotlib.dates
OK, so now I can transform a string into a date: time.strptime(dates[1], '%m/%d/%Y'). How can I transform the entire array? I could write a loop, but I assume there is a better way.
I'm not 100% sure I'm even on the right path to making something usable for "Dates" vs "values". If you know the code to make this graph (I'm assuming it's very basic once you know Python + the libraries), please let me know.
Ok, knew it was a 1 liner. Here is how to do it:
So you start with your value and dates (as strings) array:
dates = ['1/1/2014', '1/2/2014', '1/3/2014', ...]
values = [1423, 4321, 1234, ...]
Then turn your dates, which are strings, into date objects (this needs from datetime import datetime):
date_objects = [datetime.strptime(date, '%m/%d/%Y').date() for date in dates]
Then just plot.
plt.plot(date_objects, values)
Remember you can use xticks and use whatever the hell you want in place of the values. So, you can do:
xVals = range(len(values))
plt.plot(xVals, values)
plt.xticks(xVals, dates)
This is OK if your dates are evenly spaced, like yours. Otherwise, you can always get a number that quantifies each date (for example, the number of days from the first day; see the datetime module) and use that for xVals.
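For unevenly spaced dates, the "number of days from the first day" idea might look like the sketch below. The sample dates are made up (note the gap before the last one), and x_vals is an illustrative name.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data with an uneven gap between the second and third date.
dates = ["1/1/2014", "1/2/2014", "1/5/2014"]
values = [1423, 4321, 1234]

parsed = [datetime.strptime(d, "%m/%d/%Y").date() for d in dates]
# Days elapsed since the first date; subtracting two dates gives a timedelta.
x_vals = [(d - parsed[0]).days for d in parsed]

fig, ax = plt.subplots()
ax.plot(x_vals, values)
ax.set_xticks(x_vals)
ax.set_xticklabels(dates, rotation=45)  # show the original strings as labels
```

The points are then spaced by real elapsed time while the axis still shows the original date strings.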
So here's my code:
import time
from datetime import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# `value` holds the date strings and `value_list` the y values
date = []
for time_val in value:
    date.append(datetime(*time.strptime(time_val, "%m/%d/%Y")[:3]))
fig, ax = plt.subplots()
ax.plot_date(date, value_list)
date_format = mdates.DateFormatter("%m/%d/%Y")
ax.xaxis.set_major_formatter(date_format)
ax.autoscale_view()
title_name = 'plot based on time'
ax.set_title(title_name)
y_label = 'values'
ax.set_ylabel(y_label)
ax.grid(True)
fig.autofmt_xdate()
plt.show()
