Plot data from CSV and group values in a column - Python

I am fairly new to Python and am trying to understand how to do the following:
I want to plot data from a CSV file that contains values for groups A, B and C. How can I group the rows by Valuegroup and plot the Value column for each group?
import pandas as pd
import matplotlib.pyplot as plt
csv_loader = pd.read_csv('C:/Test.csv', encoding='cp1252', sep=';', index_col=0).dropna()
#csv_loader.plot()
print(csv_loader)
fig, ax = plt.subplots()
csv_loader.groupby('Valuegroup').plot(x='Date', y='Value', ax=ax, legend=False, kind='line')
The data looks like the following:
Calcgroup;Valuegroup;id;Date;Value
Group1;A;1;20080103;0.1
Group1;A;1;20080104;0.3
Group1;A;1;20080107;0.5
Group1;A;1;20080108;0.9
Group1;B;1;20080103;0.5
Group1;B;1;20080104;1.3
Group1;B;1;20080107;2.0
Group1;B;1;20080108;0.15
Group1;C;1;20080103;1.9
Group1;C;1;20080104;2.1
Group1;C;1;20080107;2.9
Group1;C;1;20080108;0.45

If you want to take the mean of Value for each Valuegroup and show the result as a line chart, use:
csv_loader.groupby('Valuegroup')['Value'].mean().plot()
Other chart types are available; see the pandas documentation on plot (the kind parameter).
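
If the goal is one line per Valuegroup rather than a single mean per group, one possible approach (a sketch, not part of the answer above) is to pivot the data so that each group becomes its own column. The sample rows are copied from the question and embedded as a string so the snippet runs on its own; the names `csv_text` and `wide` are illustrative only.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question, embedded so the sketch is self-contained
csv_text = """Calcgroup;Valuegroup;id;Date;Value
Group1;A;1;20080103;0.1
Group1;A;1;20080104;0.3
Group1;A;1;20080107;0.5
Group1;A;1;20080108;0.9
Group1;B;1;20080103;0.5
Group1;B;1;20080104;1.3
Group1;B;1;20080107;2.0
Group1;B;1;20080108;0.15
Group1;C;1;20080103;1.9
Group1;C;1;20080104;2.1
Group1;C;1;20080107;2.9
Group1;C;1;20080108;0.45
"""

df = pd.read_csv(io.StringIO(csv_text), sep=";")

# The Date column holds integers such as 20080103; parse them into real dates
df["Date"] = pd.to_datetime(df["Date"].astype(str), format="%Y%m%d")

# One row per date, one column per Valuegroup (A, B, C)
wide = df.pivot(index="Date", columns="Valuegroup", values="Value")

# DataFrame.plot draws one line per column and labels the legend with the column names
ax = wide.plot(kind="line", marker="o")
ax.set_ylabel("Value")

# In an interactive session, call plt.show() here to display the figure
```

With real data, replace the `io.StringIO(csv_text)` argument with the path to the CSV file.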


Multiple boxplot in a single Graphic in Python

I'm a beginner in Python.
In my internship project I am trying to plot boxplots from data contained in a CSV file.
I need to plot boxplots for each of the 4 (four) variables shown above (AAG, DENS, SRG and RCG). Since each variable has values in columns ranging from [001] to [100], there will be 100 boxplots for each variable, all of which need to be plotted in a single graph as shown in the image.
This is the graph I need to plot, but for each variable there will be 100 boxplots, as each one has 100 columns of values:
The x-axis is the "Year", which ranges from 2025 to 2030, so I need a graph like the one shown in figure 2 for each year, and the y-axis holds the sets of values for each variable.
Using the pandas melt function and the seaborn library I was able to plot only the boxplots of a single column, but that's not what I need:
import pandas as pd
import seaborn as sns
df = pd.read_csv("2DBM_50x50_Central_Aug21_Sim.cliped.csv")
mdf= df.melt(id_vars=['Year'], value_vars='AAG[001]')
print(mdf)
ax=sns.boxplot(x='Year', y='value',width = 0.2, data=mdf)
Result of the code above:
What can I try to resolve this?
The following code gives you five subplots, where each subplot only contains the data of one variable. Then a boxplot is generated for each year. To change the range of columns used for each variable, change the upper limit in var_range = range(1, 101), and to see the outliers change showfliers to True.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
df = pd.read_csv("2DBM_50x50_Central_Aug21_Sim.cliped.csv")
variables = ["AAG", "DENS", "SRG", "RCG", "Thick"]
period = range(2025, 2031)
var_range = range(1, 101)
fig, axes = plt.subplots(2, 3)
flattened_axes = fig.axes
flattened_axes[-1].set_visible(False)
for i, var in enumerate(variables):
    var_columns = [f"TB_acc_{var}[{j:05}]" for j in var_range]
    data = df.melt(id_vars=["Period"], value_vars=var_columns, value_name=var)
    ax = flattened_axes[i]
    sns.boxplot(x="Period", y=var, width=0.2, data=data, ax=ax, showfliers=False)
plt.tight_layout()
plt.show()
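
To see what melt does to the wide table before seaborn receives it, here is a small sketch on made-up data. The column names `Year`, `AAG[001]` and `AAG[002]` follow the question; the numbers are invented purely for illustration, and `long_df` is a hypothetical name.

```python
import pandas as pd

# Tiny invented table: two years, two simulation columns for one variable
df = pd.DataFrame({
    "Year": [2025, 2025, 2026, 2026],
    "AAG[001]": [1.0, 2.0, 3.0, 4.0],
    "AAG[002]": [1.5, 2.5, 3.5, 4.5],
})

# Build the column names with a format string, here with 3-digit zero padding
var_columns = [f"AAG[{j:03}]" for j in range(1, 3)]

# Long format: one row per (Year, column) pair, all values in a single "AAG" column
long_df = df.melt(id_vars=["Year"], value_vars=var_columns, value_name="AAG")
print(long_df)

# Each year now pools the values of every AAG column, which is what one box summarises
print(long_df.groupby("Year")["AAG"].describe())
```

The resulting `long_df` can be passed to `sns.boxplot(x="Year", y="AAG", data=long_df)` to get one box per year.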

How to eliminate the diagonal line that appears when I run my script using pandas and matplotlib?

I came across a similar question, "matplotlib plot shows an unnecessary diagonal line", but it didn't help me because I produce my plot from a CSV file, not from a data array. Here's my code:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv(r"C:\Users\HP\OneDrive\Desktop\MPGES-1\Computational_work\MD\rmsd-yeni-utf-modifiye.csv", sep=";")
df.plot(kind="line", color="red", title="rmsd", xlabel="time", ylabel="freq")
plt.plot(df)
plt.show()
I attach the graph and the screenshot of the beginning of the csv file.
graph
screenshot-csv file
In short, I only want to eliminate the diagonal line seen in the picture; the rest of the graph seems to be fine.
Thanks,
pandas read_csv infers a header row from your CSV file, so the first data row is treated as column names and the x/y columns are plotted incorrectly in your example.
As your CSV file does not have a header row, you can pass names, as below, to define the column names during import and then plot with x='time', y='freq':
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv(r"C:\Users\HP\OneDrive\Desktop\MPGES-1\Computational_work\MD\rmsd-yeni-utf-modifiye.csv", sep=";", names=['time', 'freq'])
df.plot(kind="line", color="red", title="rmsd", xlabel="time", ylabel="freq", x='time', y='freq')
plt.show()
Alternatively, you can set header=None when reading the CSV and use the column indexes for plotting, x=0, y=1:
df = pd.read_csv(r"C:\Users\HP\OneDrive\Desktop\MPGES-1\Computational_work\MD\rmsd-yeni-utf-modifiye.csv", sep=";", header=None)
df.plot(kind="line", color="red", title="rmsd", xlabel="time", ylabel="freq",x=0,y=1)
plt.show()
Or, as a third fix, you could add a header row to your dataset and let pandas use the first row as the column names when plotting:
df = pd.read_csv(r"C:\Users\HP\OneDrive\Desktop\MPGES-1\Computational_work\MD\rmsd-yeni-utf-modifiye.csv", sep=";")
df.plot(kind="line", color="red", title="rmsd", xlabel="time", ylabel="freq",x="time",y="freq")
plt.show()
example plot using different data
https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
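
The effect of header inference can be checked without plotting anything. The sketch below uses three invented, semicolon-separated rows (not the asker's file) and compares the default call with one that passes `names`; `raw`, `inferred` and `named` are illustrative names.

```python
import io

import pandas as pd

# Invented two-column data without a header row, semicolon-separated
raw = "0;0.10\n1;0.15\n2;0.12\n"

# Default behaviour: the first line is consumed as column names
inferred = pd.read_csv(io.StringIO(raw), sep=";")
print(inferred.columns.tolist())  # ['0', '0.10']
print(len(inferred))              # 2

# With names=..., all three lines are kept as data
named = pd.read_csv(io.StringIO(raw), sep=";", names=["time", "freq"])
print(named.columns.tolist())     # ['time', 'freq']
print(len(named))                 # 3
```

Passing `header=None` instead of `names` keeps all three rows as well, with the columns labelled 0 and 1.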

Python pandas and matplotlib automatically filling in missing data

I'm very new to Python and trying to figure out how to graph some data that can have missing values for any given date.
The data is the number of jobs completed (y), their rating (secondary y), and the date (x).
The graph looks how I'd like; however, jobs don't get completed every day, so there are days with no data and the line on the graph just stops.
Is there a way to have it automatically connect the dots on the graph?
import matplotlib.pyplot as plt
import pandas as pd
import database
df = pd.DataFrame(database.getTasks("Pete"), columns=['date', 'rating', 'jobs']).set_index('date')
fig, ax = plt.subplots()
ax3 = ax.twinx()
rspine = ax3.spines['right']
rspine.set_position(('axes', 1.15))
ax3.set_frame_on(True)
ax3.patch.set_visible(False)
df.jobs.plot(ax=ax, style='b-')
df.rating.plot(ax=ax, style='r-', secondary_y=True)
plt.show()
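I think you are looking for forward filling, i.e. DataFrame.fillna() with method='ffill', which newer pandas versions spell DataFrame.ffill():
df = df.ffill()
Forward fill ('ffill') uses the last valid observation in place of a missing value. The call returns a new DataFrame, so assign the result back.
To fill your data you can use pandas fillna with method='ffill' to propagate the last valid value (recent pandas versions deprecate this argument in favour of ffill()). Check the documentation to see which method fits best.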
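
As a rough comparison on invented data, the sketch below shows forward fill next to linear interpolation. Using `interpolate()` is an assumption on my part, not something the answers above propose; it may be closer to "connecting the dots", since it draws a straight line between the known points instead of a flat step.

```python
import numpy as np
import pandas as pd

# Invented data: no jobs were recorded on 2 and 3 January
dates = pd.date_range("2023-01-01", periods=5, freq="D")
df = pd.DataFrame(
    {
        "jobs": [4.0, np.nan, np.nan, 10.0, 8.0],
        "rating": [3.0, np.nan, np.nan, 4.5, 4.0],
    },
    index=dates,
)

filled = df.ffill()         # repeat the last known value: flat steps
bridged = df.interpolate()  # linear interpolation between known points

print(filled["jobs"].tolist())   # [4.0, 4.0, 4.0, 10.0, 8.0]
print(bridged["jobs"].tolist())  # [4.0, 6.0, 8.0, 10.0, 8.0]
```

Either result can be plotted in place of the original df, for example `bridged.jobs.plot(ax=ax, style='b-')`.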

How do I add a legend to a CSV data plot with no header in Python?

I have a csv file with values that I'd like to plot. The file has no headers as shown below.
0.95744324 0.09625244 7.9512634
0 0.840118 0.153717 7.841126
1 0.646194 0.292572 7.754929
2 0.492966 0.452988 7.829147
3 0.291855 0.646912 7.991959
4 0.279877 0.716354 8.039841
... ... ... ...
I was able to plot each column as separate lines on a graph with the code below, but I'd like to add a legend for x,y,z dimensions for the corresponding column/line. I am not sure how exactly to go about this as what I have now makes all the keys in the legend 'x'. I cannot modify the csv file, so should I add headers in my code and then plot each column individually?
import matplotlib.pyplot as plt
import pandas as pd
aPlot = pd.read_csv('accl/a.csv')
plt.figure()
plt.plot(aPlot, label = "x")
plt.xlabel("time")
plt.ylabel("acceleration [m/s^2]")
plt.legend(loc="upper left")
plt.show()
As your CSV file does not have a header, you can specify the column names by passing the names parameter.
You can then plot directly from the dataframe, and the legend will use those column names:
import matplotlib.pyplot as plt
import pandas as pd
aPlot = pd.read_csv('input.csv', names=['col1', 'col2', 'col3'])
aPlot.plot()
plt.xlabel("time")
plt.ylabel("acceleration [m/s^2]")
plt.legend(loc="upper left")
plt.show()
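
To get the x, y, z legend entries the question asks for, the same idea can be sketched with those names passed directly. The rows below are the first few values from the question, and the sketch assumes the file is comma-separated, which the question does not actually show; `accel` and `labels` are illustrative names.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# First rows from the question; assumed to be comma-separated in the real file
raw = """0.95744324,0.09625244,7.9512634
0.840118,0.153717,7.841126
0.646194,0.292572,7.754929
0.492966,0.452988,7.829147
"""

# names= supplies the missing header, so no data row is lost
accel = pd.read_csv(io.StringIO(raw), names=["x", "y", "z"])

# DataFrame.plot labels each line with its column name
ax = accel.plot()
ax.set_xlabel("time")
ax.set_ylabel("acceleration [m/s^2]")
ax.legend(loc="upper left")

# Collect the legend entries to confirm they match the column names
labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(labels)  # ['x', 'y', 'z']

# In an interactive session, call plt.show() here to display the figure
```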

How to add a marker from a different column to a seaborn pandas barplot

I have the following dataset, code and plot:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
data = [['tom', 10,15], ['matt', 13,10]]
df3 = pd.DataFrame(data, columns = ['Name', 'Attempts','L4AverageAttempts'])
f,ax = plt.subplots(nrows=1,figsize=(16,9))
sns.barplot(x='Attempts',y='Name',data=df3)
plt.show()
How can I get a marker of some description (dot, *, shape, etc.) to show that tom has averaged 15 (so he is below his average) and matt has averaged 10 (so he is above his average)? In other words, a marker based on the L4AverageAttempts value for each person.
I have looked into axvline but that seems to be only a set number rather than a specific value for each y axis category. Any help would be much appreciated! thanks!
You can simply draw a scatter plot on top of your bar plot, using L4AverageAttempts as the x value.
seaborn.scatterplot works for this. Make sure to set the zorder parameter so that the markers appear on top of the bars.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
data = [['tom', 10,15], ['matt', 13,10]]
df3 = pd.DataFrame(data, columns = ['Name', 'Attempts','L4AverageAttempts'])
f,ax = plt.subplots(nrows=1,figsize=(16,9))
sns.barplot(x='Attempts',y='Name',data=df3)
sns.scatterplot(x='L4AverageAttempts', y="Name", data=df3, zorder=10, color='k', edgecolor='k')
plt.show()
