I've got two dataframes (both indexed on time), and I'd like to plot columns from both dataframes together on the same plot, with a legend as if they were two columns in the same dataframe.
If I turn on legend with one column, it works fine, but if I try to do both, the 2nd one overwrites the first one.
import pandas as pd
# Use ERDDAP's built-in relative time functionality to get the last 7 days:
start='now-7days'
stop='now'
# URL for wind data
url='http://www.neracoos.org/erddap/tabledap/E01_met_all.csv?\
station,time,air_temperature,barometric_pressure,wind_gust,wind_speed,\
wind_direction,visibility\
&time>=%s&time<=%s' % (start,stop)
# load CSV data into Pandas
df_met = pd.read_csv(url,index_col='time',parse_dates=True,skiprows=[1]) # skip the units row
# URL for wave data
url='http://www.neracoos.org/erddap/tabledap/E01_accelerometer_all.csv?\
station,time,mooring_site_desc,significant_wave_height,dominant_wave_period&\
time>=%s&time<=%s' % (start,stop)
# Load the CSV data into Pandas
df_wave = pd.read_csv(url,index_col='time',parse_dates=True,skiprows=[1]) # skip the units row
Plotting one works fine:
df_met['wind_speed'].plot(figsize=(12,4),legend=True);
but if I try to plot both, the first legend disappears:
df_met['wind_speed'].plot(figsize=(12,4),legend=True)
df_wave['significant_wave_height'].plot(secondary_y=True,legend=True);
Okay, thanks to the comment by unutbu pointing me to essentially the same question (which I searched for but didn't find), I just need to modify my plot command to:
df_met['wind_speed'].plot(figsize=(12,4))
df_wave['significant_wave_height'].plot(secondary_y=True);
ax = plt.gca()  # bare gca() only exists in pylab mode; otherwise needs "import matplotlib.pyplot as plt"
lines = ax.left_ax.get_lines() + ax.right_ax.get_lines()
ax.legend(lines, [l.get_label() for l in lines])
and now I get this, which is what I was looking for:
Well. Almost. It would be nice to get the (right) and (left) on the legend to make it clear which scale was for which line. unutbu to the rescue again:
df_met['wind_speed'].plot(figsize=(12,4))
df_wave['significant_wave_height'].plot(secondary_y=True);
ax = plt.gca()  # bare gca() only exists in pylab mode; otherwise needs "import matplotlib.pyplot as plt"
lines = ax.left_ax.get_lines() + ax.right_ax.get_lines()
ax.legend(lines, ['{} ({})'.format(l.get_label(), side) for l, side in zip(lines, ('left', 'right'))]);
produces:
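For readers who cannot reach the ERDDAP server, the legend-merging idea can be sketched with plain Matplotlib and synthetic data. This is only an illustration under stated assumptions: the series values are made up, and `twinx()` is used here in place of pandas' `secondary_y=True`, so the axes are named `ax_left` and `ax_right` rather than accessed through `left_ax`/`right_ax`.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-ins for the two time-indexed columns (values are invented)
idx = pd.date_range("2024-01-01", periods=48, freq="D")
wind = pd.Series(np.linspace(2.0, 10.0, 48), index=idx, name="wind_speed")
wave = pd.Series(np.linspace(0.5, 2.0, 48), index=idx, name="significant_wave_height")

fig, ax_left = plt.subplots(figsize=(12, 4))
ax_right = ax_left.twinx()  # second y-axis sharing the same x-axis

ax_left.plot(wind.index, wind.values, color="C0", label=wind.name)
ax_right.plot(wave.index, wave.values, color="C1", label=wave.name)

# Collect the line handles from both axes and build one legend from them
lines = list(ax_left.get_lines()) + list(ax_right.get_lines())
labels = ["{} ({})".format(line.get_label(), side)
          for line, side in zip(lines, ("left", "right"))]
legend = ax_left.legend(lines, labels)
```

Passing the handles and labels explicitly is what keeps the second `legend` call from replacing the first: there is only one legend, and it knows about the lines on both axes.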
Related
I have a dataframe, where I would like to make a time series plot with three different lines that each show the daily occurrences (the number of rows per day) for each of the values in another column.
To give an example, for the following dataframe, I would like to see the development for how many a's, b's and c's there have been each day.
df = pd.DataFrame({'date':pd.to_datetime(['2019-10-10','2019-10-14','2019-10-09','2019-10-10','2019-10-08','2019-10-14','2019-10-10','2019-10-08','2019-10-08','2019-10-13','2019-10-08','2019-10-12','2019-10-11','2019-10-09','2019-10-08']),
'letter':['a','b','c','a','b','b','b','b','c','b','b','a','b','a','c']})
When I try the command below (my best guess so far), however, it does not filter for the different dates (I would like three lines, one for each of the letters).
Any ideas on how to solve this?
df.groupby(['date']).count().plot()['letter']
I have also tried a solution in Matplotlib, though this one gives an error:
fig, ax = plt.subplots()
ax.plot(df['date'], df['letter'].count())
Based on your question, I believe you are looking for a line plot with dates on the x-axis and the counts of each letter on the y-axis. To achieve this, follow these steps:
1. Group the dataframe by date and then by letter, and get the number of rows in each group using size().
2. Flatten the grouped dataframe using reset_index(), rename the new column to Counts, and sort by the letter column (so that the legend lists the letters alphabetically). These steps mostly keep the new dataframe and the graph clean and presentable. I would suggest running each step separately and printing the result, so that you can see what each one does.
3. Plot each line separately by filtering the dataframe on each specific letter.
4. Show the legend and rotate the date labels so that they are easier to read.
The code is shown below:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'date':pd.to_datetime(['2019-10-10','2019-10-14','2019-10-09','2019-10-10','2019-10-08','2019-10-14','2019-10-10','2019-10-08','2019-10-08','2019-10-13','2019-10-08','2019-10-12','2019-10-11','2019-10-09','2019-10-08']),
'letter':['a','b','c','a','b','b','b','b','c','b','b','a','b','a','c']})
df_grouped = df.groupby(by=['date', 'letter']).size().reset_index() ## New DF for grouped data
df_grouped.rename(columns = {0 : 'Counts'}, inplace = True)
df_grouped.sort_values(['letter'], inplace=True)
colors = ['r', 'g', 'b'] ## New list for each color, change as per your preference
for i, ltr in enumerate(df_grouped.letter.unique()):
    plt.plot(df_grouped[df_grouped.letter == ltr].date, df_grouped[df_grouped.letter == ltr].Counts, '-o', label=ltr, c=colors[i])
plt.gcf().autofmt_xdate() ## Rotate X-axis so you can see dates clearly without overlap
plt.legend() ## Show legend
Output graph
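As an alternative to filtering per letter, the grouped counts can be pivoted into one column per letter, which also fills in zeros for days on which a letter does not occur. This is a sketch of a different technique from the answer above (it uses `unstack` rather than `reset_index`), built on the same example data; the variable name `counts` is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime([
        '2019-10-10', '2019-10-14', '2019-10-09', '2019-10-10', '2019-10-08',
        '2019-10-14', '2019-10-10', '2019-10-08', '2019-10-08', '2019-10-13',
        '2019-10-08', '2019-10-12', '2019-10-11', '2019-10-09', '2019-10-08']),
    'letter': ['a', 'b', 'c', 'a', 'b', 'b', 'b', 'b', 'c', 'b', 'b', 'a', 'b', 'a', 'c'],
})

# Rows per (date, letter), then one column per letter; missing pairs become 0
counts = df.groupby(['date', 'letter']).size().unstack(fill_value=0)
print(counts)

# With Matplotlib installed, one line per letter can then be drawn with:
# counts.plot(marker='o')
```

Because the result is a wide dataframe indexed by date, a single `.plot()` call draws all three lines and builds the legend from the column names.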
country = str(input())
import matplotlib.pyplot as plt
lines = f.readlines ()
x = []
y = []
results = []
for line in lines:
    words = line.split(',')
f.close()
plt.plot(x,y)
plt.show()
First problem is in the title of the plot. It is giving Population inCountryI instead of Population in Country I.
Second problem is in the graph.
While my answer could point out the mistakes in your code, I think it might also be enlightening to show another, perhaps more standard way, of doing this. This is particularly useful if you're going to do this more often, or with large datasets.
Handling CSV files and creating subgroups out of them by yourself is nice, but can become very tricky. Python already has a built-in csv module, but the Pandas library is nowadays basically the default (there are other options as well) for handling tabular data. Which means it is widely available, and/or easy to install. Plus it goes well with Matplotlib. (Read some of Pandas' user's guide for a good overview.)
With Pandas, you can use the following (I've put comments on the code in between the actual code):
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rcParams['figure.figsize'] = (8, 8)
# Read the CSV file into a Pandas dataframe
# For a normal CSV, this will work fine without tweaks
df = pd.read_csv('population.csv')
# Convert the month and year columns to a datetime
# Years have to be converted to string type for that
# '%b%Y' is the format for month abbreviation (English) and 4-digit year;
# see e.g. https://strftime.org/
# Instead of creating a new column, we set the date as the index ("row-indices")
# of the dataframe
df.index = pd.to_datetime(df['Month'] + df['Year'].astype(str), format='%b%Y')
# We can remove the month and year columns now
df = df.drop(columns=['Month', 'Year'])
# For nicety, replace the dot in the country name with a space
df['Country'] = df['Country'].str.replace('.', ' ', regex=False)
# Group the dataframe by country, and loop over the groups
# The resulting grouped dataframes, `grouped`, will have just
# their index (date) values and population values
# The .plot() method will therefore automatically use
# the index/dates as x-axis, and the population as
# y-axis.
for country, grouped in df.groupby('Country'):
    # Use the convenience .plot() method
    grouped.plot()
    # Standard Matplotlib functions are still available
    plt.title(country)
The resulting plots are shown below (2, given the example data).
If you don't want a legend (since there is only one line), use grouped.plot(legend=False) instead.
If you want to pick one specific country, replace the whole for-loop with the following:
country = "Country II"
df[df['Country'] == country].plot()
If you want to do even more, also have a look at the Seaborn library.
Resulting plots:
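The date-handling steps above can be tried without a file on disk. The sketch below assumes a CSV layout with `Country`, `Month`, `Year` and `Population` columns; the original file is not shown in the question, so both the `Population` column name and the numbers are hypothetical.

```python
import io
import pandas as pd

# Hypothetical stand-in for population.csv (layout and values are assumptions)
csv_text = """Country,Month,Year,Population
Country.I,Jan,2020,100
Country.I,Feb,2020,110
Country.II,Jan,2020,200
Country.II,Feb,2020,190
"""

df = pd.read_csv(io.StringIO(csv_text))

# Combine month abbreviation and year into a datetime index
df.index = pd.to_datetime(df['Month'] + df['Year'].astype(str), format='%b%Y')
df = df.drop(columns=['Month', 'Year'])

# Replace the dot in the country name with a space
df['Country'] = df['Country'].str.replace('.', ' ', regex=False)

# One date-indexed population series per country, ready for .plot()
series_by_country = {name: group['Population'] for name, group in df.groupby('Country')}
print(series_by_country['Country I'])
```

Note that `%b` relies on English month abbreviations, so this assumes an English (or C) locale.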
I have 3 basic problems when displaying things in IPython / JupyterLab.
(1) I have a pandas dataframe with many columns. First, I make sure I can see a decent portion of it:
import numpy as np
import pandas as pd
np.set_printoptions(linewidth=240,edgeitems=5,precision=3)
pd.set_option('display.width',1800) #width of the display in characters
pd.set_option('display.max_columns',100) #replace the number with None to print all columns
pd.set_option('display.max_rows',10) #max_columns/max_rows sets the maximum number of columns/rows displayed when a frame is pretty-printed
pd.set_option('display.min_rows',9) #once max_rows is exceeded, min_rows determines how many rows are shown in the truncated representation
pd.set_option('display.precision',3) #number of digits in the printed float number
If I print it, everything gets mushed together:
Is it possible to print text wide, i.e. in a way that each line (even if longer) is printed on only 1 line in output, with a slider when lines are wider than the window?
(2) If I display the mentioned dataframe, it looks really nice (has a slider), but some string entries are displayed over 4 rows:
How can I make sure every entry is displayed in 1 row?
(3) The code below produces the output, which works fine:
import numpy as np
import matplotlib.pyplot as plt
x=np.linspace(0.1,30,1000);
fig,ax=plt.subplots(1, 4, constrained_layout=True, figsize=[15,2])
ax=ax.ravel()
ax[0].plot( x, np.sin(x))
ax[1].plot( x, np.log(1+x))
ax[2].plot( x, np.sin(30/x))
ax[3].plot( x, np.sin(x)*np.sin(2*x))
plt.show()
However, when I change [15,2] to [35,2], the figure will only be as wide as the window. How can I achieve that larger widths produce a slider (like display of a dataframe) so that I can make images as wide as I wish?
You solved (1) already by deciding to display the dataframe with the method in (2). Using print to display a dataframe is not very useful in my opinion.
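If plain-text output is still wanted for (1), one option worth trying is `DataFrame.to_string()`, which by default does not wrap wide frames across several blocks. The following is a small sketch with made-up column names and values; whether a horizontal slider appears for long lines depends on the front end's own settings, which is not something pandas controls.

```python
import pandas as pd

# A wide frame with invented contents: 30 columns, 4 rows
df = pd.DataFrame({f"col_{i}": [f"v{i}_{r}" for r in range(4)] for i in range(30)})

# to_string() renders every row on a single line (no column-block wrapping)
text = df.to_string()
print(text)
```

The rendered text has exactly one header line plus one line per row, however many columns there are.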
(2): The display(df) automatically utilizes white spaces to wrap cell content. I did not find a pandas option to disable this behavior. Luckily, someone else had the same problem already and another person provided a solution.
You have to change the style properties of your dataframe. For this you use the Styler, which holds the styled dataframe. I made a short example from which you can copy the line:
import pandas as pd
# Construct data frame content
long_text = 'abcabcabc ' * 10
long_row = [long_text for _ in range(45)]
long_table = [long_row for _ in range(15)]
# Create dataframe
df = pd.DataFrame(long_table)
# Select data you want to output
# df_selected = df.head(5) # First five rows
# df_selected = df.tail(5) # Last five rows
df_selected = df.iloc[[1,3, 7]] # Select row 1,3 and 7
# Create styler of df_selected with changed style of white spaces
# Note: pandas 2.1 renamed Styler.applymap to Styler.map; use .map on newer versions
dataframe_styler = df_selected.style.applymap(lambda x:'white-space:nowrap')
# Display styler
display(dataframe_styler)
Output:
(3) As I already mentioned in the comment, you simply have to double click onto the diagram and it is displayed with a slider.
I am pretty new to Python programming and have already run into a problem that is driving me insane. I kept searching for a solution, even here on Stack Overflow, but found nothing that solved it, which made me sign up for this site.
Nevertheless, this is my problem:
I have several txt files that contain 3 columns. The first one can be ignored, the second one contains a date and a time separated by the letter "T", and the third column contains the value (pressure, temperature, whatever).
Now, what I want to do is, to plot the second column (time and date) on the x axis and the value on the y axis.
I've tried MANY code snippets, including some described here on Stack Overflow, but none of them was what I was looking for or gave the right results.
In more detail, this is what my txt files look like:
# MagPy ASCII
234536326.456,2014-06-17T14:23:00.000000,459.7463393940044
674346235.235,2014-06-17T14:28:00.000000,462.8783040474751
and so on.
Forget about the first column. Only the second and third one are relevant. So here, I guess, I have to skip the first line (and the first column), right?
HOWEVER - and here comes the part I cannot solve - with this "T" inside the second column, this becomes a string format.
One of my many errors I get is: could not convert string to float
Well, I searched stack overflow and came across the following code:
x, y = np.loadtxt('example.txt', dtype=int, delimiter=',',
unpack=True, usecols=(1,2))
plt.plot(x, y)
plt.title('example 1')
plt.xlabel('D')
plt.ylabel('Frequency')
plt.show()
I edited the "usecols" to 1 and 2, but with this code, I get the error: list index out of range
So, it doesn't matter what I do, I get an error any time. And the only thing I want is a plot (with matplotlib), that contains time and date on the x axis and the value (e.g. 459.7463393940044 from above) on the y axis.
And talking about what I need: At the end, I have to put several diagrams (about 4-6), that were generated with MANY txt file data, in one figure.
Please, can anyone help me with this? I'd appreciate your help a lot!
This is numpy datetime format. You need to add an explicit converter for the datetime field. The documentation contains additional format details. This question shows how to fill in the converters argument.
date, closep, highp, lowp, openp, volume = np.loadtxt(
    f, delimiter=',', unpack=True,
    # strpdate2num is deprecated in newer Matplotlib releases
    converters={0: mdates.strpdate2num('%d-%b-%y')})
Is that enough to lead you to a full solution?
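As an alternative to writing a converter by hand, pandas can parse this file layout directly. This swaps in a different technique (`pandas.read_csv` instead of `np.loadtxt`), and the column names `id`, `time` and `value` are assumptions chosen for illustration, since the file itself has no header row.

```python
import io
import pandas as pd

# The two sample rows from the question, including the "# MagPy ASCII" line
sample = """# MagPy ASCII
234536326.456,2014-06-17T14:23:00.000000,459.7463393940044
674346235.235,2014-06-17T14:28:00.000000,462.8783040474751
"""

df = pd.read_csv(
    io.StringIO(sample),       # replace with a file path for real data
    comment="#",               # skip the "# MagPy ASCII" line
    header=None,
    names=["id", "time", "value"],
    usecols=["time", "value"],  # drop the first column
    parse_dates=["time"],       # ISO 8601 strings become timestamps
    index_col="time",
)
print(df)

# With Matplotlib installed, df["value"].plot() puts the timestamps on the x-axis.
```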
First, thanks for your response! Unfortunately, this doesn't work at all. I tried it with your converter and combined it with my code, but it didn't work out well. I tried then this code:
# Converter function
datefunc = lambda x: mdates.date2num(datetime.strptime(x, '%d %m %Y %H %M %S'))
# Read data from 'file.dat'
dates, levels = np.genfromtxt('BMP085_10085001_0001_201508.txt', # Data to be read
delimiter=19, # First column is 19 characters wide
converters={1: datefunc}, # Formatting of column 0
dtype=float, # All values are floats
unpack=True) # Unpack to several variables
fig = plt.figure()
ax = fig.add_subplot(111)
# Configure x-ticks
ax.set_xticks(dates) # Tickmark + label at every plotted point
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d/%m/%Y %H:%M'))
ax.plot_date(dates, levels, ls='-', marker='o')
ax.set_title('title')
ax.set_ylabel('Waterlevel (m)')
ax.grid(True)
# Format the x-axis for dates (label formatting, rotation)
fig.autofmt_xdate(rotation=45)
fig.tight_layout()
fig.show()
And I get a list index out of range error again. Seriously, I do not know any possible solution to make it look like that in the end: {Plot date and time (x axis) versus a value (y axis) using data from file} - see diagram on the bottom of the page.
I've had success doing the following. First, it's ok to read in your ISO8601 date as a string (so a basic read a line and split on comma will work). To convert the date string to a datetime object you can use
import dateutil.parser
# code to read in date_strings and my_values as lists goes here ...
# Here's the magic to parse the ISO8601 strings and make them into datetime objects
# (parse() takes one string at a time, so apply it to each element of the list)
my_dates = [dateutil.parser.parse(s) for s in date_strings]
# Now plot the results
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.plot_date(x=my_dates, y=my_values, marker='o', linestyle='')
ax.set_xlabel('Time')
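The "read a line and split on comma" step mentioned above could look roughly like the following, using only the standard library. The helper name `read_series` is hypothetical, and the sample text is the two rows from the question; for real data, pass an open file handle instead of the `StringIO` object.

```python
import csv
import io
from datetime import datetime

sample = """# MagPy ASCII
234536326.456,2014-06-17T14:23:00.000000,459.7463393940044
674346235.235,2014-06-17T14:28:00.000000,462.8783040474751
"""

def read_series(handle):
    """Return (datetimes, values) from a comma-separated MagPy-style file."""
    times, values = [], []
    for row in csv.reader(handle):
        # Skip blank lines and the "# MagPy ASCII" comment line
        if not row or row[0].startswith("#"):
            continue
        # Column 0 is ignored; column 1 is the ISO 8601 timestamp, column 2 the value
        times.append(datetime.strptime(row[1], "%Y-%m-%dT%H:%M:%S.%f"))
        values.append(float(row[2]))
    return times, values

times, values = read_series(io.StringIO(sample))
print(times[0], values[0])
```

The resulting lists can be passed straight to `ax.plot(times, values)`, and calling the helper once per file and once per subplot is one way to put several diagrams into a single figure.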
I'm trying to plot two data series with different y-axes in the same plot in Excel 2003 using Python and win32com.client. I started with VBA to try to get the code I needed. Here's what it looks like so far:
chart = xlApp.Charts.Add()
# This part successfully creates the first series I want
series = chart.SeriesCollection(1)
series.XValues = xlSheet.Range("L13:L200")
series.Values = xlSheet.Range("M13:M200")
# This is what I added to try to plot the second series
series.AxisGroup = xlPrimary
series2 = chart.SeriesCollection(2)
series2.XValues = xlSheet.Range("L13:L200")
series2.Values = xlSheet.Range("N13:N200")
series2.AxisGroup = xlSecondary
# The rest is for formatting it the way I want, but it doesn't work now that I'm
# trying to plot the second series. (It stops working when I add the last five lines of code).
chart.Legend.Delete() # Delete legend; MUST BE DONE BEFORE CHART IS MOVED
series.Name = file
chart.Location(2, xlSheet.Name) # Copy chart to active worksheet
chart = xlSheet.Shapes(1)
chart.Top = 51
chart.Left = 240
chart.Width = 500
chart.Height = 350
This plots the first series, but as noted in the comments, no longer adds the title, moves the chart, deletes the legend or resizes the chart. It does nothing with the second series. It is also not generating an error.