How to display X axis from Pandas Dataframe Object to Matplotlib barchart - python

I have created a pandas dataframe by reading a CSV file. The CSV is very short and includes the following data:
Group Title, Hosts ,Soccer 5, Soccer 4, Soccer 3 , Soccer 2, Soccer 1, Soccer X ,Soccer Y, Soccer Total
Units,11,1,3,4,4,5,
[1 rows x 8 columns]
I have successfully displayed the data on a bar chart; however, I want the x axis to be labelled with each group title (Hosts, Soccer 5, Soccer 4 and so on) and plotted in alignment with the data.
Please see a picture of my current graph below to get a better understanding
I know I can do this manually, but I want the labels to be read from the CSV file, i.e. from the dataframe object. I have tried different methods to do this, such as adding the following code:
dataframe.plot.bar(dataframe['Group Title'], title="Soccer", ylabel="Quantity", xlabel="Devices")
My full code is below
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
dataframe = pd.read_csv(r'C:\Scripts\custom.csv', delimiter=",")
print(dataframe)
dataframe.plot()
#plt.rcParams['figure.figsize'] = (15,8)
matplotlib.style.use('ggplot')
dataframe.plot.bar(title="Devices", ylabel="Quantity", xlabel="Devices")
#Show Graphs
plt.show()
Any help or guidance will be appreciated.
Thank you

You can use seaborn's barplot:
import seaborn as sns
import matplotlib.pyplot as plt
sns.barplot(data=df)
plt.xticks(rotation=45)
Alternatively, with pandas only:
df.set_index('Group Title').T.plot.bar()
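A minimal sketch of the pandas-only approach, in case it helps to see it end to end. The inline CSV is an assumption: it is trimmed to the first seven columns of the header shown in the question so that it matches the seven values in the data row, and the stray spaces around the column names are stripped before plotting.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Assumed sample: first seven columns of the CSV from the question
csv_text = (
    "Group Title, Hosts ,Soccer 5, Soccer 4, Soccer 3 , Soccer 2, Soccer 1\n"
    "Units,11,1,3,4,4,5\n"
)

df = pd.read_csv(io.StringIO(csv_text))
df.columns = df.columns.str.strip()  # remove stray spaces around header names

# Transpose so that each former column name becomes one x-axis category
wide = df.set_index("Group Title").T

ax = wide.plot.bar(title="Soccer", xlabel="Devices", ylabel="Quantity", legend=False)
ax.tick_params(axis="x", rotation=45)
plt.tight_layout()
```

After the transpose, the index of `wide` holds Hosts, Soccer 5, Soccer 4 and so on, which is what pandas uses for the bar labels.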

Related

Plotting complex graph in pandas

I have the following dataset
ids count
1 2000210
2 -23123
3 100
4 500
5 102300120
...
1 million 123213
I want a graph with groups of ids (all ids are unique) on the x axis and count on the y axis, as a distribution chart that looks like the following:
How can I achieve this with a pandas dataframe in Python?
I tried different approaches, but I only get a basic plot, nothing as complex as the drawing.
What I tried
df = pd.DataFrame(np.random.randn(1000000, 2), columns=["count", "ids"]).cumsum()
df["range"] = pd.Series(list(range(len(df))))
df.plot(x="range", y="count");
But the plots don't make any sense. I am also new to plotting in pandas. I searched the internet for a long time for charts like this and could really use some help with such graphs.
From what I understood from your question and comments, here is what you can do:
1) Import the libraries and set the default theme:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme()
2) Create your dataframe:
df = pd.DataFrame(np.random.randn(1000000, 2), columns=["count", "ids"]).cumsum()
df["range"] = pd.Series(list(range(len(df))))
3) Plot your data
3.1) Simple take using only the seaborn library:
sns.kdeplot(data=df, x="count", weights="range")
Output:
3.2) More complex take using seaborn and matplotlib libraries:
sns.histplot(x=df["count"], weights=df["range"], discrete=True,
             color='darkblue', edgecolor='black',
             kde=True, kde_kws={'cut': 2}, line_kws={'linewidth': 4})
plt.ylabel("range")
plt.show()
Output:
Personal note: please make sure to check all the solutions. If they
are not enough, leave a comment and we will work together to find you
a solution.
For a distribution plot of ids you can use:
import numpy as np
import pandas as pd
np.random.seed(seed=123)
df = pd.DataFrame(np.random.randn(1000000), columns=["ids"])
df['ids'].plot(kind='kde')
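If a kernel density estimate is slow on a million rows, a plain histogram of the count column may be enough to see the distribution. The following is only a sketch under assumptions: the data is randomly generated, the sample is cut to 10,000 rows, and the bin count of 50 is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import numpy as np
import pandas as pd

# Assumed stand-in data: unique ids with a random count for each id
rng = np.random.default_rng(123)
df = pd.DataFrame({
    "ids": np.arange(1, 10_001),
    "count": rng.normal(size=10_000),
})

# Histogram of the count values: bin ranges on x, number of ids per bin on y
ax = df["count"].plot(kind="hist", bins=50, title="Distribution of count")
ax.set_xlabel("count")
ax.set_ylabel("number of ids")
```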

How can one create histograms with subplots according to grouped variables in seaborn?

I am attempting to create a histogram using seaborn and census data that displays 3 subplots for age composition, and I have the data grouped the way that I would like it, but I am struggling to turn that into a histogram.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
filename = "/scratch/%s_class_root/%s_class/materials/data/pums_short.csv.gz"
acs = pd.read_csv(filename)
R65_agg = acs.groupby(["R65", "PUMA"])["HINCP"]
R65_meds = R65_agg.agg(np.median).unstack()
R65_f = R65_meds.dropna()
R65_f = R65_meds.reset_index(drop = True)
I was expecting this code to give me data that I could plug into a histogram, but instead of forming distinct subplots, the "0.0, 1.0, 2.0" groups in the final variable just get added together when I apply the .describe() function. Any advice on how I can convert this into a form that is readable by the sns.histplot() function?
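One possible direction, sketched under assumptions: keep the medians in long form (one row per R65 and PUMA pair) instead of unstacking, then draw one subplot per R65 value. The data below is randomly generated as a stand-in for the census file, and the sketch uses matplotlib's `Axes.hist` for the drawing; `sns.histplot` also accepts an `ax` argument, so it could be swapped in inside the loop.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the census extract (same column names as the question)
rng = np.random.default_rng(0)
n = 3000
acs = pd.DataFrame({
    "R65": rng.choice([0.0, 1.0, 2.0], size=n),
    "PUMA": rng.integers(100, 150, size=n),
    "HINCP": rng.lognormal(mean=11, sigma=0.6, size=n),
})

# Long form: one row per (R65, PUMA) pair with the median household income
meds = acs.groupby(["R65", "PUMA"])["HINCP"].median().reset_index()

# One histogram per R65 group, sharing axes so the panels are comparable
groups = sorted(meds["R65"].unique())
fig, axes = plt.subplots(1, len(groups), figsize=(12, 4), sharex=True, sharey=True)
for ax, r65 in zip(axes, groups):
    subset = meds.loc[meds["R65"] == r65, "HINCP"]
    ax.hist(subset, bins=15)
    ax.set_title(f"R65 = {r65}")
    ax.set_xlabel("Median HINCP per PUMA")
axes[0].set_ylabel("Number of PUMAs")
fig.tight_layout()
```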

Plotting top 10 Values in Big Data

I need help plotting some categorical and numerical values in Python. The code is given below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv('train_feature_store.csv')
df.info()
df.head()
df.columns
plt.figure(figsize=(20,6))
sns.countplot(x='Store', data=df)
plt.show()
Size = df[['Size','Store']].groupby(['Store'], as_index=False).sum()
Size.sort_values(by=['Size'],ascending=False).head(10)
However, the dataset is so large that I cannot make a meaningful plot in Python. Basically, I just want to take the top 5 or top 10 values and plot them, as shown below:
In an attempt to plot this, I am trying to put the result of the code below into a dataframe and plot it, but I have not been able to. Can anyone help me out with this?
Size = df[['Size','Store']].groupby(['Store'], as_index=False).sum()
Size.sort_values(by=['Size'],ascending=False).head(10)
Below is a link to the sample dataset. However, that dataset is only a representation; the original one on which I am doing the EDA has around 3 thousand unique stores and 60 thousand rows of data. Please help! Thanks!
https://drive.google.com/drive/folders/1PdXaKXKiQXX0wrHYT3ZABjfT3QLIYzQ0?usp=sharing
You were pretty close.
import pandas as pd
import seaborn as sns
df = pd.read_csv('train_feature_store.csv')
sns.set(rc={'figure.figsize':(16,9)})
g = df.groupby('Store', as_index=False)['Size'].sum().sort_values(by='Size', ascending=False).head(10)
sns.barplot(data=g, x='Store', y='Size', hue='Store', dodge=False).set(xticklabels=[]);
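As a variation on the same idea, here is a pandas-only sketch that uses `nlargest` instead of sorting and taking the head. The data is randomly generated as an assumption, sized roughly like the description in the question (about 3 thousand stores and 60 thousand rows); only the column names `Store` and `Size` come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import numpy as np
import pandas as pd

# Assumed stand-in data with the column names from the question
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "Store": rng.integers(1, 3001, size=60_000),
    "Size": rng.integers(1_000, 200_000, size=60_000),
})

# Total Size per store, then keep only the ten largest totals
top10 = df.groupby("Store")["Size"].sum().nlargest(10)

ax = top10.plot.bar(figsize=(16, 9), title="Top 10 stores by total Size")
ax.set_xlabel("Store")
ax.set_ylabel("Total Size")
```

`nlargest(10)` returns the rows already sorted from largest to smallest, so the bars come out in descending order.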
First of all, looking at the data, it seems to cover locations from Scotland to Kolkata.
Categorize the data by geography first and then visualize it.
Regards
Maitryee

How to plot data from a .csv file that holds data from CAN communication (it receives data in 4 packets of 4 data points)

The format of the .csv file is shown below. It has gaps in between because the data arrives in packets. I want to plot the data with the timestamp on the x-axis and sensor1 on the y-axis using matplotlib in Python. Is that possible?
This is the data in the CSV file: you can see 4 data points received 4 times, read at different timestamps. I tried the usual approach, but it shows a blank plot.
This is the link to the CSV file.
https://docs.google.com/spreadsheets/d/17SIabIYYmSogOdeYTzpEwy9s2pZuVO3ghoChSSgGwAg/edit?usp=sharing
thanks in advance.
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("cantype.csv")
X = data.time_stamp
Y = data.sensor1
# plt.plot(data.time_stamp,data.BMS_01_CellVolt01)
plt.plot(X,Y)
plt.show()
Data
time_stamp,sensor1,sensor2,sensor3,sensor4,sensor5,sensor6,sensor7,sensor8,sensor9,sensor10,sensor11,sensor12,sensor13,sensor14,sensor15,sensor16
1.37E+12,1.50465,1.50405,1.50435,1.5042,,,,,,,,,,,,
1.37E+12,,,,,1.47105,1.5042,1.5045,1.50435,,,,,,,,
1.37E+12,,,,,,,,,1.49115,1.49205,1.4961,1.49865,,,,
1.37E+12,,,,,,,,,,,,,1.50405,1.5042,1.50405,1.50435
1.37E+12,1.50465,1.50405,1.50435,1.5042,,,,,,,,,,,,
1.37E+12,,,,,1.47105,1.5042,1.5045,1.50435,,,,,,,,
1.37E+12,,,,,,,,,1.49115,1.49205,1.4961,1.49865,,,,
1.37E+12,,,,,,,,,,,,,1.50405,1.5042,1.50405,1.50435
Load the csv with pandas:
import pandas as pd
df = pd.read_csv('cantype.csv')
Then either use pandas plotting:
df.plot(x='time_stamp', y='sensor1', marker='.')
Or pure matplotlib:
import matplotlib.pyplot as plt
plt.plot(df.time_stamp, df.sensor1, marker='.')
With your sample data, the plot does not look like a meaningful time series because there are only two (timestamp, sensor1) points and both are located at (1.37E+12, 1.50465).
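A likely reason the original line plot looked blank: matplotlib breaks a line at every NaN, so sensor1 values that are separated by empty rows never form a visible segment unless a marker is used. A minimal sketch of one way around this is to drop the rows where sensor1 is empty before plotting. The inline CSV is an assumption: it is an excerpt of the data from the question, cut down to the first five sensor columns and four rows.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Excerpt of the question's data (fewer columns and rows, assumed for brevity)
csv_text = """time_stamp,sensor1,sensor2,sensor3,sensor4,sensor5
1.37E+12,1.50465,1.50405,1.50435,1.5042,
1.37E+12,,,,,1.47105
1.37E+12,1.50465,1.50405,1.50435,1.5042,
1.37E+12,,,,,1.47105
"""

df = pd.read_csv(io.StringIO(csv_text))

# Keep only the packets that actually carry a sensor1 reading
sensor1 = df.dropna(subset=["sensor1"])

fig, ax = plt.subplots()
ax.plot(sensor1["time_stamp"], sensor1["sensor1"], marker=".")
ax.set_xlabel("time_stamp")
ax.set_ylabel("sensor1")
```

With timestamps rounded to 1.37E+12 in the file, the remaining points still share the same x value, so a higher-precision timestamp column would be needed for a real time series.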

Plot Overlapping Histograms Using Python

I have a .csv file (csv_test_1.csv) that is in this format:
durum_before_length,durum_before_reads,durum_after_length,durum_after_reads
0,0,0,0
10,0,10,0
20,0,20,0
30,0,30,1
40,0,40,4
50,0,50,5
60,0,60,0
70,0,70,1
80,0,80,4
90,0,90,1
100,4840,100,4704
110,4817,110,4706
120,4983,120,4860
130,4997,130,4851
140,5142,140,4980
150,5363,150,5192
160,5756,160,5530
170,6054,170,5725
180,6335,180,5989
190,7051,190,6651
200,9003,200,7157
210,8446,210,7812
220,9088,220,8314
230,9761,230,8955
240,10637,240,9660
250,11659,250,10408
260,12572,260,11178
270,13139,270,11538
280,13985,280,11950
290,113552,290,14304
300,954175,300,16383
,,310,17230
,,320,18368
,,330,19158
,,340,19733
,,350,20754
,,360,21698
,,370,21991
,,380,21937
,,390,22473
,,400,22655
,,410,22497
,,420,22460
,,430,22488
,,440,21941
,,450,21884
,,460,21350
,,470,21066
,,480,20812
,,490,19901
,,500,19716
,,510,19374
,,520,19000
,,530,18245
,,540,17220
,,550,15713
,,560,14042
,,570,11932
,,580,7204
,,590,29
You can see that the second two columns are longer than the first two columns. I would like to plot two overlapping histograms: the first histogram will be the first column as the x values plotted against the second column as the y-values, and the second histogram will be the third column as the x values plotted against the fourth column as the y-values.
I am thinking of using seaborn because it makes nice looking plots. The code I have thus far is as shown below. From here, I have no idea how to specify the x and y values and how to generate two overlapping histograms on the same plot. Any advice would be greatly appreciated.
import numpy as np
import pandas as pd
from pandas import read_csv
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
read_data = read_csv("csv_test_1.csv")
sns.set(style="white", palette="muted")
sns.despine()
plt.hist(read_data, density=False)  # `normed` was removed from Matplotlib; `density` replaces it
plt.xlabel("Read Length")
plt.ylabel("Number of Reads")
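Since the file already contains counts per length bin, one option is to draw the two series as semi-transparent bars on the same axes rather than letting a histogram function re-bin them. This is only a sketch: the inline CSV is a five-row excerpt of the data above, and the bar width of 10 is an assumption based on the spacing of the length values.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Five rows taken from the question's file, including rows with empty "before" cells
csv_text = """durum_before_length,durum_before_reads,durum_after_length,durum_after_reads
280,13985,280,11950
290,113552,290,14304
300,954175,300,16383
,,310,17230
,,320,18368
"""

data = pd.read_csv(io.StringIO(csv_text))

# Split into the two series and drop the padding rows of the shorter one
before = data[["durum_before_length", "durum_before_reads"]].dropna()
after = data[["durum_after_length", "durum_after_reads"]].dropna()

fig, ax = plt.subplots()
ax.bar(before["durum_before_length"], before["durum_before_reads"],
       width=10, alpha=0.5, label="before")
ax.bar(after["durum_after_length"], after["durum_after_reads"],
       width=10, alpha=0.5, label="after")
ax.set_xlabel("Read Length")
ax.set_ylabel("Number of Reads")
ax.legend()
```

Because a few "before" bins are much larger than the rest, calling `ax.set_yscale("log")` may make the overlap easier to read.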
