Overlaying bar charts in python

Can I overlay 3 bar charts in Python? The code I used to produce the three bar charts is below:
fig3.set_title('Sample 2(2019-10-05)- Average bikes used per hour')
fig3.set_xlabel('Hour')
fig3.set_ylabel('Average Percentage')
fig3.set_ylim(ymin=70)
fig4=average_bikes_used_hours3.plot.bar(y='Average bikes used in a hour', x='hour',figsize=(20,10))
fig4.set_title('Sample 3(2019-08-31)- Average bikes used per hour')
fig4.set_xlabel('Hour')
fig4.set_ylabel('Average Percentage')
fig4.set_ylim(ymin=70)
fig5=average_bikes_used_hours4.plot.bar(y='Average bikes used in a hour', x='hour',figsize=(20,10))
fig5.set_title('Sample 4(2019-08-31)- Average bikes used per hour')
fig5.set_xlabel('Hour')
fig5.set_ylabel('Average Percentage')
fig5.set_ylim(ymin=70)

The most intuitive way is to create a single DataFrame with consecutive hours as the index and a separate column for each sample, something like:
      Sample 2  Sample 3  Sample 4
Hour
8           20        25        21
9           22        27        27
10          23        34        29
11          21        30        22
12          19        22        24
Then just plot:
df.plot.bar();
and you will have all samples in a single picture.
For the above data, this gives one group of three bars (one per sample) for each hour.
If you want some extra space for the legend, pass the ylim parameter, e.g.:
df.plot.bar(ylim=(0,40));
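As a fuller sketch, here is one way to build that combined DataFrame from separate per-sample frames like the ones in the question. The frames s2, s3 and s4 are hypothetical stand-ins (filled with the numbers from the table above) for frames such as average_bikes_used_hours3, and the sketch assumes each has an 'hour' column and an 'Average bikes used in a hour' column, as in the question's code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

col = 'Average bikes used in a hour'

# Hypothetical stand-ins for the question's per-sample frames.
s2 = pd.DataFrame({'hour': [8, 9, 10, 11, 12], col: [20, 22, 23, 21, 19]})
s3 = pd.DataFrame({'hour': [8, 9, 10, 11, 12], col: [25, 27, 34, 30, 22]})
s4 = pd.DataFrame({'hour': [8, 9, 10, 11, 12], col: [21, 27, 29, 22, 24]})

# Index each frame by hour and put the value columns side by side;
# the dict keys become the column names (and the legend labels).
combined = pd.concat(
    {name: frame.set_index('hour')[col]
     for name, frame in [('Sample 2', s2), ('Sample 3', s3), ('Sample 4', s4)]},
    axis=1,
)
combined.index.name = 'Hour'

# One call draws grouped bars: one group per hour, one bar per sample.
ax = combined.plot.bar(figsize=(20, 10), ylim=(0, 40))
ax.set_title('Average bikes used per hour')
ax.set_xlabel('Hour')
# When running interactively, call matplotlib.pyplot.show() to display the figure.
```

pd.concat aligns the three series on their hour index, so an hour missing from one sample shows up as NaN in the combined frame rather than shifting that sample's other bars.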

Related

Average for similar looking data in a column using Pandas

I'm working with a large dataset of more than 60K rows.
I have a continuous measurement of current in one column. Each code is measured for one second, during which the equipment takes 14, 15, 16 or 17 readings depending on its speed; the measurement then moves to the next code, again takes 14-17 readings, and so forth.
Every time the measurement moves from one code to the next, the current jumps by more than 0.15.
The first 48 rows are as follows:
Index  Curr(mA)
0      1.362476
1      1.341721
2      1.362477
3      1.362477
4      1.355560
5      1.348642
6      1.327886
7      1.341721
8      1.334804
9      1.334804
10     1.348641
11     1.362474
12     1.348644
13     1.355558
14     1.334805
15     1.362477
16     1.556172
17     1.542336
18     1.549252
19     1.528503
20     1.549254
21     1.528501
22     1.556173
23     1.556172
24     1.542334
25     1.556172
26     1.542336
27     1.542334
28     1.556170
29     1.535415
30     1.542334
31     1.729109
32     1.749863
33     1.749861
34     1.749861
35     1.736024
36     1.770619
37     1.742946
38     1.763699
39     1.749861
40     1.749861
41     1.763703
42     1.756781
43     1.742946
44     1.736026
45     1.756781
46     1.964308
47     1.957395
I want to write a script that averages each run of 14/15/16/17 similar readings, giving one value per code measurement in a separate column. I have been thinking of doing this with pandas.
I want the output to look like:
Index  Curr(mA)
0      1.34907
1      1.54556
2      1.74986
Need some help to get this done. Please help

First get the indexes of every row where there's a jump. Use Pandas' DataFrame.diff() to get the difference between the value in each row and the previous row, then check to see if it's greater than 0.15 with >. Use that to filter the dataframe index, and save the resulting indices (in the case of your sample data, three) in a variable.
import numpy as np
import pandas as pd
indices = df.index[df['Curr(mA)'].diff() > 0.15]
The next steps depend on whether there are more columns in the source dataframe that you want in the output, or if it's really just Curr(mA) and the index. In the latter case, you can use np.split() to cut the dataframe into a list of dataframes based on the indices you just pulled, then average each one in a list comprehension.
[chunk['Curr(mA)'].mean() for chunk in np.split(df, indices)]
> [1.3490729374999997, 1.5455638666666667, 1.7498627333333332, 1.9608515]
To get it to match your desired output above (the same values, but as a one-column DataFrame rather than a list), convert the list to a pd.Series and then to a DataFrame with to_frame():
pd.Series(
    [chunk['Curr(mA)'].mean() for chunk in np.split(df, indices)],
    name='Curr(mA)',
).to_frame()
   Curr(mA)
0  1.349073
1  1.545564
2  1.749863
3  1.960851
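An alternative that skips np.split entirely (a different technique from the answer above) is to turn the jump flags into group numbers with cumsum() and let groupby do the averaging. It also works if the DataFrame has extra numeric columns, and it avoids calling np.split on a DataFrame, which in recent pandas versions can emit a FutureWarning about the deprecated DataFrame.swapaxes. This is a sketch assuming the column is named 'Curr(mA)' and that every new code starts with a rise of more than 0.15; the names jump, code_id and averaged are illustrative.

```python
import pandas as pd

# The 48 sample readings from the question (index 0..47).
readings = [
    1.362476, 1.341721, 1.362477, 1.362477, 1.355560, 1.348642, 1.327886, 1.341721,
    1.334804, 1.334804, 1.348641, 1.362474, 1.348644, 1.355558, 1.334805, 1.362477,
    1.556172, 1.542336, 1.549252, 1.528503, 1.549254, 1.528501, 1.556173, 1.556172,
    1.542334, 1.556172, 1.542336, 1.542334, 1.556170, 1.535415, 1.542334, 1.729109,
    1.749863, 1.749861, 1.749861, 1.736024, 1.770619, 1.742946, 1.763699, 1.749861,
    1.749861, 1.763703, 1.756781, 1.742946, 1.736026, 1.756781, 1.964308, 1.957395,
]
df = pd.DataFrame({'Curr(mA)': readings})

# True on every row where the current rises by more than 0.15 vs the previous row
# (the first row's diff is NaN, which compares as False).
jump = df['Curr(mA)'].diff() > 0.15
# Running count of jumps: all readings of the same code share one group number.
code_id = jump.cumsum().rename('code')
# Average each group and renumber the result 0, 1, 2, ...
averaged = df.groupby(code_id).mean().reset_index(drop=True)
print(averaged)
```

To drop an incomplete final group (rows 46-47 in the sample), one option is to compare against df.groupby(code_id).size() and keep only groups with at least 14 readings.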

How to plot straight lines in correspondence of highest values?

I have the following data:
Date
01/27/2020 55
03/03/2020 44
02/25/2020 39
03/11/2020 39
01/28/2020 39
02/05/2020 38
03/17/2020 37
03/16/2020 37
03/19/2020 37
03/14/2020 35
03/09/2020 35
03/26/2020 33
03/06/2020 33
01/29/2020 33
03/23/2020 27
03/15/2020 27
02/26/2020 27
03/27/2020 26
03/02/2020 25
02/28/2020 25
03/24/2020 24
03/04/2020 24
01/21/2020 23
03/01/2020 21
02/27/2020 21
01/22/2020 21
02/18/2020 18
01/31/2020 18
03/22/2020 18
01/26/2020 18
03/31/2020 18
02/24/2020 17
01/20/2020 16
01/23/2020 16
03/12/2020 16
03/21/2020 15
02/29/2020 14
03/28/2020 13
02/19/2020 13
03/08/2020 13
02/04/2020 13
02/12/2020 12
02/01/2020 12
02/07/2020 12
03/30/2020 12
02/20/2020 11
03/07/2020 11
03/29/2020 11
02/09/2020 11
02/06/2020 11
I got these counts using groupby; the right-hand column is the frequency of values for each date.
The plot is generated by:
fig, ax = plt.subplots(figsize=(15,7))
df.groupby(['Date']).count()['NN'].plot(ax=ax)
I would like to draw vertical lines at the dates with the highest values, i.e.:
01/27/2020 55
03/03/2020 44
02/25/2020 39
03/11/2020 39
01/28/2020 39
How could I add these lines in my plot?

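The .axvline method should do the trick for the vertical lines. However, if you plot a pandas DataFrame/Series whose index is a set of strings, pandas does some fancy footwork in the background: it plots the points at integer positions and uses the strings only as tick labels, so you can't simply hand axvline a date string.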
You could mess around with the xticks and all sorts, but the easiest thing to do is to convert your column to datetime64.
First, let's make some fluff data:
import random
import matplotlib.pyplot as plt
import pandas as pd
from string import ascii_lowercase
# Make some fluff
dates = [f'01/{random.randint(1,28)}/1901' for _ in range(100)]
fluff = [ascii_lowercase[random.randint(1,26):random.randint(1,26)]
         for _ in range(100)]
# Pack into a DataFrame
df = pd.DataFrame({'Date': dates, 'NN': fluff})
# Aggregate
counted = df.groupby('Date').count()
Taking a quick peek:
>>> counted
NN
Date
01/10/1901 2
01/11/1901 6
01/12/1901 2
... ...
You can substitute your own data for this. It's probably easiest to convert your column before doing the groupby, so:
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
agg_df = df.groupby(['Date']).count()
fig, ax = plt.subplots(figsize=(8,6))
agg_df['NN'].plot(ax=ax)
The plot is similar to yours. Note that I'm using 8 by 6 for the figsize so that the figure fits more easily on the StackOverflow page; change it back to 15 by 7 when running your code.
I've used the %m/%d/%Y format, as that appears to be what you are using. See the official datetime documentation (strftime/strptime format codes) for more on date formatting.
Finally, get the vertical lines by using a datetime directly:
import datetime
# Month and day are plain ints: a literal like 01 is a SyntaxError in Python 3
ax.axvline(datetime.datetime(1901, 1, 10), color='k')
If you want to get vertical straight lines for the highest values, sort your aggregated DataFrame, then whack it in a for-loop.
for d in agg_df.sort_values('NN', ascending=False).index[:5]:
    ax.axvline(d, color='k')
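Putting the pieces together, a self-contained sketch using a subset of the counts from the question might look like this. Instead of grouping raw rows, it starts from a hypothetical Series of per-date counts (already aggregated), parses the dates with pd.to_datetime, and uses Series.nlargest() in place of sort_values(...).index[:5] to pick the five highest dates.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# A subset of the per-date counts from the question (date string -> count).
counts = {
    '01/27/2020': 55, '03/03/2020': 44, '02/25/2020': 39, '03/11/2020': 39,
    '01/28/2020': 39, '02/05/2020': 38, '03/17/2020': 37, '01/21/2020': 23,
    '02/18/2020': 18, '03/30/2020': 12,
}
series = pd.Series(counts, name='NN')

# Parse the string index into real datetimes so axvline can take dates directly.
series.index = pd.to_datetime(series.index, format='%m/%d/%Y')
series = series.sort_index()

fig, ax = plt.subplots(figsize=(15, 7))
series.plot(ax=ax)

# Draw a dashed vertical line at each of the five largest counts.
top_dates = series.nlargest(5).index
for d in top_dates:
    ax.axvline(d, color='k', linestyle='--')
```

With ties (three dates at 39 here), nlargest keeps the first occurrences by default; pass keep='all' if you want every tied date to get a line.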

python pandas averaging columns to produce new ones

I have a Pandas DataFrame with the following data, displaying the hours worked per week for employees at a company:
name week 1 week 2 week 3 week 4...
joey 20 15 35 10
thomas 20 10 25 15
mark 30 20 25 10
sal 25 25 15 20
amy 25 30 20 10
Assume the data carries on in the same way for 100+ weeks.
What I want to produce is a biweekly average of hours for each employee, i.e. the average hours worked over each two-week period, as shown in the following DataFrame:
name weeks 1-2 weeks 3-4...
joey 17.5 22.5
thomas 15 20
mark 25 17.5
sal 25 17.5
amy 27.5 15
How could I make this work? Trying out iterating right now but I'm stuck.

You can achieve that with the following:
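for i in range(0, len(df.columns), 2):
    # i:i+2 selects column i and column i+1 (the slice end is exclusive)
    df[f'weeks {i+1}-{i+2}'] = df.iloc[:, i:i+2].mean(axis=1)
This code iterates over the column positions in steps of 2. For each position i, it selects that column and the following one (i+1), averages the two row-wise, and stores the result in a new column. Because range() is evaluated once, the newly added columns are not included in the loop.
It assumes the columns are properly ordered and that name is the index rather than a regular column (e.g. after df = df.set_index('name')), so that only week columns are averaged.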
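If you'd rather avoid the Python loop, a vectorized alternative (not the approach above) is to label each column with its pair number and average the pairs with groupby on the transposed frame. This sketch uses the first four weeks from the question and, like the loop above, assumes name is the index and the week columns are in order; pair_id and biweekly are illustrative names.

```python
import numpy as np
import pandas as pd

# The first four weeks from the question, with 'name' as the index.
df = pd.DataFrame(
    {'week 1': [20, 20, 30, 25, 25],
     'week 2': [15, 10, 20, 25, 30],
     'week 3': [35, 25, 25, 15, 20],
     'week 4': [10, 15, 10, 20, 10]},
    index=pd.Index(['joey', 'thomas', 'mark', 'sal', 'amy'], name='name'),
)

# Pair up consecutive columns: positions 0,1 -> 0; 2,3 -> 1; ...
pair_id = np.arange(df.shape[1]) // 2

# Transpose so weeks become rows, average each pair of rows, transpose back.
biweekly = df.T.groupby(pair_id).mean().T
biweekly.columns = [f'weeks {2 * k + 1}-{2 * k + 2}' for k in biweekly.columns]
print(biweekly)
```

With an odd number of weeks, the last "pair" contains a single week, so its average is just that week's value (and its generated label names a week that doesn't exist).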

Missing xticks on chart for matplotlib on Python 3

I am following this section of a tutorial (linked below). I realize this code was written for Python 2, but the tutorial shows xticks on the 'Start Date' axis and I do not: my chart only shows the 'Start Date' label, with no dates.
# Set as_index=False to keep the 0,1,2,... index. Then we'll take the mean of the polls on that day.
poll_df = poll_df.groupby(['Start Date'],as_index=False).mean()
# Let's go ahead and see what this looks like
poll_df.head()
Start Date Number of Observations Obama Romney Undecided Difference
0 2009-03-13 1403 44 44 12 0.00
1 2009-04-17 686 50 39 11 0.11
2 2009-05-14 1000 53 35 12 0.18
3 2009-06-12 638 48 40 12 0.08
4 2009-07-15 577 49 40 11 0.09
Great! Now plotting the Difference versus time should be straightforward.
# Plotting the difference in polls between Obama and Romney
fig = poll_df.plot('Start Date','Difference',figsize=(12,4),marker='o',linestyle='-',color='purple')
https://nbviewer.jupyter.org/github/jmportilla/Udemy-notes/blob/master/Data%20Project%20-%20Election%20Analysis.ipynb

Plotting multiple lines in one graph with pandas and matplotlib, using climate data

I'm trying to create a graph that shows whether or not average temperatures in my city are increasing. I'm using data provided by NOAA and have a DataFrame that looks like this:
DATE TAVG MONTH YEAR
0 1939-07 86.0 07 1939
1 1939-08 84.8 08 1939
2 1939-09 82.2 09 1939
3 1939-10 68.0 10 1939
4 1939-11 53.1 11 1939
5 1939-12 52.5 12 1939
This is saved in a variable called "avgs", and I then use groupby and plot functions like so:
avgs.groupby(["YEAR"]).plot(kind='line',x='MONTH', y='TAVG')
This produces a separate line graph for each year, showing the average temperature for each month. That's great stuff, but I'd like to be able to put all of the yearly line graphs into one graph for visual comparison (to see if the monthly averages are increasing).
I'm a total noob with matplotlib and pandas, so I don't know the best way to do this. Am I going wrong somewhere and just don't realize it? And if I'm on the right track, where should I go from here?

Very similar to the other answer (by Anake), but here you get control over the legend (with the other answer, the legend entry for every year will be "TAVG"). I added entries for a new year to your data just to show this.
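import io
import matplotlib.pyplot as plt
import pandas as pd

data = '''
DATE TAVG MONTH YEAR
0 1939-07 86.0 07 1939
1 1939-08 84.8 08 1939
2 1939-09 82.2 09 1939
3 1939-10 68.0 10 1939
4 1940-11 53.1 11 1940
5 1940-12 52.5 12 1940
'''
# Parse the whitespace-separated text into a DataFrame (the unnamed first column becomes the index)
avgs = pd.read_csv(io.StringIO(data), sep=r'\s+')
ax = plt.subplot()
for key, group in avgs.groupby("YEAR"):
    # One line per year, labelled with the year for the legend
    ax.plot(group.MONTH, group.TAVG, label=key)
ax.set_xlabel('Month')
ax.set_ylabel('TAVG')
plt.legend()
plt.show()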
This produces a single plot with one line per year, each labelled with its year in the legend.

You can do:
ax = None
for group in avgs.groupby("YEAR"):
    # group is a (year, sub-DataFrame) tuple; draw each one on the same Axes
    ax = group[1].plot(x="MONTH", y="TAVG", ax=ax)
plt.show()
Each plot() returns the matplotlib Axes instance where it drew the plot. So by feeding that back in each time, you can repeatedly draw on the same set of axes.
Unfortunately, I don't think you can do that directly in the functional style you tried.
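Another option (a different technique from both answers) is to reshape the data so that each year is its own column with DataFrame.pivot; a single plot() call then draws one line per year and labels the legend by year. The sketch below reuses the six rows from the first answer, typed in directly as a DataFrame named avgs to match the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd

# Rows from the first answer: 1939 covers Jul-Oct, 1940 covers Nov-Dec.
avgs = pd.DataFrame({
    'DATE': ['1939-07', '1939-08', '1939-09', '1939-10', '1940-11', '1940-12'],
    'TAVG': [86.0, 84.8, 82.2, 68.0, 53.1, 52.5],
    'MONTH': [7, 8, 9, 10, 11, 12],
    'YEAR': [1939, 1939, 1939, 1939, 1940, 1940],
})

# Reshape to one row per month and one column per year.
wide = avgs.pivot(index='MONTH', columns='YEAR', values='TAVG')

# Each column becomes its own line; the legend is labelled by year.
ax = wide.plot(xlabel='Month', ylabel='TAVG')
```

Months for which a year has no data become NaN after the pivot, so that year's line simply has a gap there.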

We Keep Coding
