Make Matplotlib Ignore CSV Headings - python

I am trying to create a bar chart from a CSV file using Matplotlib. However, the file has two headings (COUNTRY and COST), which means that the code isn't able to run properly and produce the bar chart. How do I edit the code so that it ignores the headings? The first image is what the CSV file actually looks like and the second image is what the code is able to understand and run.
EDIT: the Python assistant tells me that the error seems to be occurring on line 14 of the code: price.append(float(row[1]))
import matplotlib.pyplot as plt
import csv
price = []
countries = []
with open("Europe.csv", "r") as csvfile:
    plot = csv.reader(csvfile)
    for idx, row in enumerate(plot):
        if idx == 0:
            continue
        price.append(float(row[1]))
        countries.append(str(row[0]))
plt.style.use('grayscale')
plt.bar( countries, price, label='Europe', color='red')
plt.ylabel('Price in US$')
plt.title('Cost of spotify premium per country')
plt.xticks(rotation=90)
plt.legend(loc='best')
plt.show()

I would use pandas for this. With it you can create the bar plot more easily using DataFrame.plot.bar.
Example using your variables countries and price:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({"country": countries, "price": price})
df.plot.bar(x="country", y="price")
plt.show()

Just use pandas.read_csv with skiprows=[0] and header=None, like this:
import pandas as pd
df = pd.read_csv('data.csv',sep=';',skiprows=[0],header=None)
I am using ';' as the separator because I assume your CSV file was created in MS Excel.
But I think it is simpler to read the CSV file without skiprows and let pandas use the first row as column names, like this:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv',sep=';')
price = df['cost']
countries = df['country']
plt.style.use('grayscale')
plt.bar( countries, price, label='Europe', color='red')
plt.ylabel('Price in US$')
plt.title('Cost of spotify premium per country')
plt.xticks(rotation=90)
plt.legend(loc='best')
plt.show()
for data like this :
and the result like this :
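For completeness, the header can also be skipped with the standard csv module alone. The sketch below is only an illustration: the sample rows and the read_costs helper are made up, and the ';' delimiter is an assumption (the thread does not confirm which separator Europe.csv uses). If the file really is semicolon-separated, reading it with the default ',' delimiter would leave each row with a single field, which is one possible reason row[1] fails.

```python
import csv
import io

# Hypothetical sample standing in for Europe.csv; values are invented.
SAMPLE = "COUNTRY;COST\nNorway;12.5\nDenmark;14.0\n"


def read_costs(csvfile, delimiter=";"):
    """Return (countries, prices), skipping the single header row."""
    reader = csv.reader(csvfile, delimiter=delimiter)
    next(reader, None)  # discard the header row (COUNTRY, COST)
    countries, prices = [], []
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        countries.append(row[0])
        prices.append(float(row[1]))
    return countries, prices


countries, prices = read_costs(io.StringIO(SAMPLE))
print(countries, prices)
```

With a real file, pass the object returned by open("Europe.csv", newline="") instead of the StringIO, and then call plt.bar(countries, prices) as in the question.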

Related

Update legend when adding data to existing plot (Pandas)

I've written a small Python script to read Covid statistics from ourworldindata.org and plot a certain data series for a certain country.
from pandas import read_csv
import pandas as pd
import matplotlib.pyplot as plt
filename = "https://covid.ourworldindata.org/data/owid-covid-data.csv"
dataset = read_csv(filename)
dataset["date"] = pd.to_datetime(dataset["date"])
country = "Norway"
data = "new_cases"
mask = dataset["location"] == country
dataset.loc[mask].set_index("date")[data].plot()
plt.ylabel(data)
plt.legend([country])
plt.show()
It works as intended and plots the number of new cases in Norway as a function of date in the example above. If I change "country" and rerun it, it will plot a new curve for the new country with a different color in the same plot, which is what I want. But there's a problem with the legend. It shows the name of the last plotted country but the color of the first plotted country. I would like it to show both with the correct name and color. How can I do that?
The link shows a figure with the result when first plotting Norway (blue curve) and then Denmark (yellow curve):
Plot of new cases in Norway and Denmark
I'm not quite sure how exactly you "rerun" the code, but you can define your countries in a list and plot them in a loop:
import pandas as pd
import matplotlib.pyplot as plt
filename = "owid-covid-data.csv"
dataset = pd.read_csv(filename)
dataset["date"] = pd.to_datetime(dataset["date"])
countries = ["Denmark", "Norway"]
data = "new_cases"
for country in countries:
    mask = dataset["location"] == country
    dataset.loc[mask].set_index("date")[data].plot()
plt.ylabel(data)
plt.legend(countries)
plt.show()
Or you can use seaborn instead of the loop:
import seaborn as sns
df = dataset[dataset["location"].isin(countries)][["date", "location", data]]
sns.lineplot(data=df, x="date", y=data, hue="location")
plt.show()
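Another way to keep names and colors in sync, sketched here with invented numbers rather than the real OWID file, is to give each line its own label when it is plotted and then call legend() without arguments. The Agg backend is selected only so the sketch runs without a display; for interactive use, drop that line and call plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for the OWID dataset.
dataset = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-02"] * 2),
    "location": ["Norway", "Norway", "Denmark", "Denmark"],
    "new_cases": [10, 12, 20, 18],
})

fig, ax = plt.subplots()
for country in ["Norway", "Denmark"]:
    mask = dataset["location"] == country
    # label= ties the legend entry to this specific line and its color
    dataset.loc[mask].set_index("date")["new_cases"].plot(ax=ax, label=country)
ax.set_ylabel("new_cases")
ax.legend()  # builds the legend from the labels attached above

labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(labels)
```

Because the legend is built from the labels on the lines themselves, adding another country later only requires another plot call with its own label followed by ax.legend() again.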

How to eliminate the diagonal line that appears when I run my script using pandas and matplotlib?

I have come across this similar situation: matplotlib plot shows an unnecessary diagonal line. However, it didn't help me, since I produce my plot from a CSV file, not a data array. Here's my code:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv(r"C:\Users\HP\OneDrive\Desktop\MPGES-1\Computational_work\MD\rmsd-yeni-utf-modifiye.csv", sep=";")
df.plot(kind="line", color="red", title="rmsd", xlabel="time", ylabel="freq")
plt.plot(df)
plt.show()
I attach the graph and the screenshot of the beginning of the csv file.
graph
screenshot-csv file
In short, I only want to eliminate the diagonal line seen in the picture, as the rest of the graph seems to be fine.
Thanks,
Pandas read_csv is inferring a header row from your csv file and is plotting x/y columns incorrectly in your example.
As your CSV file does not have a header row, you can pass names as below to define one during import, and then use the column names to plot with x='time', y='freq':
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv(r"C:\Users\HP\OneDrive\Desktop\MPGES-1\Computational_work\MD\rmsd-yeni-utf-modifiye.csv", sep=";", names=['time', 'freq'])
df.plot(kind="line", color="red", title="rmsd", xlabel="time", ylabel="freq", x='time', y='freq')
plt.show()
Alternatively, you can set header=None when reading the CSV and use the column indexes for plotting (x=0, y=1):
df = pd.read_csv(r"C:\Users\HP\OneDrive\Desktop\MPGES-1\Computational_work\MD\rmsd-yeni-utf-modifiye.csv", sep=";", header=None)
df.plot(kind="line", color="red", title="rmsd", xlabel="time", ylabel="freq",x=0,y=1)
plt.show()
Or, as a third fix, you could add a header row to your dataset and let pandas use that first row as the column names when plotting:
df = pd.read_csv(r"C:\Users\HP\OneDrive\Desktop\MPGES-1\Computational_work\MD\rmsd-yeni-utf-modifiye.csv", sep=";")
df.plot(kind="line", color="red", title="rmsd", xlabel="time", ylabel="freq",x="time",y="freq")
plt.show()
example plot using different data
https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
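The header inference described above can be seen on a small in-memory example. The three rows below are invented and only stand in for the headerless rmsd file.

```python
import io
import pandas as pd

# Made-up headerless data in the same "time;freq" layout.
RAW = "0;0.10\n1;0.15\n2;0.12\n"

# Default behaviour: the first data row is consumed as the header.
inferred = pd.read_csv(io.StringIO(RAW), sep=";")

# Supplying names keeps every row as data.
named = pd.read_csv(io.StringIO(RAW), sep=";", names=["time", "freq"])

print(len(inferred), list(inferred.columns))
print(len(named), list(named.columns))
```

The first frame loses a row and ends up with columns named after the first data values, while the second keeps all three rows under the names time and freq.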

Matplotlib: Generating Subplots for Multiple Time Series

I have the following dataset that was randomly generated through a simulation I am building:
https://drive.google.com/drive/folders/1JF5QrliE9s8VPMaGc8Z-mwpFhNWkeYtk?usp=sharing
For debugging purposes, I would like to be able to view this data in a series of small multiples. Like this:
I am attempting to do this using matplotlib and pandas. Here is my code for that:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
def graph_file(f: str):
    """
    Graphs a single file of data
    and exports it as a pdf of separate charts.
    """
    data = pd.read_csv(f)
    header = data.columns
    fname = f[:-4] + '.pdf'
    with PdfPages(fname) as pdf:
        n = len(header)
        time: str = header[0]
        # Multiple charts on one page
        fig = plt.figure()
        for i in range(1, n):
            y: str = header[i]
            ax = fig.add_subplot()
            data.plot(x=time, y=y)
        pdf.savefig(bbox_inches='tight')
When I open up the .csv file and try to run the function using a Jupyter notebook, I get the same deprecation warning over and over again:
<ipython-input-5-0563709f3c08>:24: MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance. In a future version, a new instance will always be created and returned. Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.
ax = fig.add_subplot()
The resulting pdf file does not contain a single page with multiple graphs (which is what I want like in the first image) but just a single page with a single graph:
What exactly am I doing wrong? I greatly appreciate any feedback you can give.
Here is a solution that should meet your needs. It reads the csv file into a dataframe and iterates through the columns of the dataframe to plot corresponding subplots.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
def graph_file(f: str):
    df = pd.read_csv(f)
    fig, axs = plt.subplots(nrows=3, ncols=3)
    fig.set_size_inches(20, 10)
    fig.subplots_adjust(wspace=0.5)
    fig.subplots_adjust(hspace=0.5)
    fname = f[:-4] + '.pdf'
    with PdfPages(fname) as pdf:
        for col, ax in zip(df.columns[1:], axs.flatten()):
            ax.plot(df['time (days)'], df[col])
            ax.set(xlabel='time (days)', ylabel=col)
            ax.tick_params(axis='x', labelrotation=30)
        pdf.savefig(bbox_inches='tight')
    plt.show()
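The solution above hardcodes a 3x3 grid and the column name 'time (days)'. If the number of series varies, the grid can be sized from the data instead. The following is a rough sketch under stated assumptions: plot_small_multiples is a hypothetical helper, the DataFrame is invented, and the first column is assumed to hold the time values. The Agg backend is chosen only so it runs without a display.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd


def plot_small_multiples(df, ncols=3):
    """Plot every column after the first against the first, one subplot each."""
    x_name = df.columns[0]
    y_names = list(df.columns[1:])
    nrows = max(1, math.ceil(len(y_names) / ncols))
    # squeeze=False keeps axs two-dimensional even when there is one row
    fig, axs = plt.subplots(nrows=nrows, ncols=ncols, squeeze=False)
    axes = axs.flatten()
    for name, ax in zip(y_names, axes):
        ax.plot(df[x_name], df[name])
        ax.set(xlabel=x_name, ylabel=name)
    for ax in axes[len(y_names):]:
        ax.set_visible(False)  # hide grid cells that have no series
    return fig


# Invented data: one time column and four series.
df = pd.DataFrame({"time (days)": [0, 1, 2], "a": [1, 2, 3], "b": [3, 2, 1],
                   "c": [0, 1, 0], "d": [2, 2, 2]})
fig = plot_small_multiples(df)
visible = [ax for ax in fig.axes if ax.get_visible()]
print(len(fig.axes), len(visible))
```

The returned figure can be passed to pdf.savefig(fig, bbox_inches='tight') inside a PdfPages block, as in the solution above.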

Seaborn: Include multiple 'HUE' in one graph?

I have a simple Python script for reading a CSV file and plotting graphs using seaborn, and it works perfectly!
First, here is a screenshot of my CSV file showing the frames and the statistics for stations:
Here is my simple code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
read_CSV_stats_per_day_for_KS = pd.read_csv('results_per_day/ALL_stations_together.csv', sep=";", encoding ="ISO-8859-1")
read_day_column = read_CSV_stats_per_day_for_KS[read_CSV_stats_per_day_for_KS['day'] == 0]
def plot_results_about_south_east_stations():
    south_east = read_day_column[read_day_column['Region'] == 'south_east']
    top4_visited_stations = south_east.nlargest(4, 'total_visited_cars')
    dataframes_for_south_east_stations = read_CSV_stats_per_day_for_KS[read_CSV_stats_per_day_for_KS['name'].isin(top4_visited_stations['name'])]
    sns.relplot(x='day', y='avg_queue_length', data=dataframes_for_south_east_stations, hue='name', kind='line')
    plt.suptitle("South-east Oslo")
    sns.relplot(x='day', y='avg_total_EV_in_station', data=dataframes_for_south_east_stations, hue='name', kind='line')
    plt.suptitle("South-east Oslo")
    sns.relplot(x='day', y='total_rejected_cars', data=dataframes_for_south_east_stations, hue='name', kind='line')
    plt.suptitle("South-east Oslo")
    sns.relplot(x='day', y='total_exhausted_cars', data=dataframes_for_south_east_stations, hue='name', kind='line')
    plt.suptitle("South-east Oslo")
    plt.show()
When I plot, I get these beautiful graphs:
https://imgur.com/a/YlGTCgH
In the graphs below (otherwise exactly the same), I am trying to replace the station names with the total_amount_of_chargers that each station has:
https://imgur.com/a/5l5cP4x
Question 1: The numbers are wrong. I want to show the actual number of chargers, not 20, 40, 60, 80. How can I do that?
Question 2: I still want the names of the stations to stay on the Y label instead of numbers. How can I do that?
Question 3: Finally, is it possible to combine the station names and the total amount of chargers and show them on the right, like this:
https://imgur.com/a/3RidCDv
If this were possible, it would be awesome and I would really appreciate it!
Thank you.

Plot Data from CSV and group values in column

I am pretty new to Python and am trying to understand how to do the following:
I am trying to plot data from a CSV file where I have values for A, values for B, and values for C. How can I group the data by the Valuegroup column and plot the numbers in the Value column for each group?
import pandas as pd
import matplotlib.pyplot as plt
csv_loader = pd.read_csv('C:/Test.csv', encoding='cp1252', sep=';', index_col=0).dropna()
#csv_loader.plot()
print(csv_loader)
fig, ax = plt.subplots()
csv_loader.groupby('Valuegroup').plot(x='Date', y='Value', ax=ax, legend=False, kind='line')
The data looks like the following:
Calcgroup;Valuegroup;id;Date;Value
Group1;A;1;20080103;0.1
Group1;A;1;20080104;0.3
Group1;A;1;20080107;0.5
Group1;A;1;20080108;0.9
Group1;B;1;20080103;0.5
Group1;B;1;20080104;1.3
Group1;B;1;20080107;2.0
Group1;B;1;20080108;0.15
Group1;C;1;20080103;1.9
Group1;C;1;20080104;2.1
Group1;C;1;20080107;2.9
Group1;C;1;20080108;0.45
If you want to take the mean of Value for each Valuegroup and show it as a line chart, use:
csv_loader.groupby('Valuegroup')['Value'].mean().plot()
There are various chart types available; please refer to the pandas documentation on plot.
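If the goal is instead one line per Valuegroup over time, one option (not part of the answer above) is to pivot the table so that each group becomes its own column. The sketch below uses a shortened copy of the sample data from the question and assumes the Date column is in YYYYMMDD form, as it appears there.

```python
import io
import pandas as pd

# Shortened copy of the sample rows from the question.
RAW = """Calcgroup;Valuegroup;id;Date;Value
Group1;A;1;20080103;0.1
Group1;A;1;20080104;0.3
Group1;B;1;20080103;0.5
Group1;B;1;20080104;1.3
"""

df = pd.read_csv(io.StringIO(RAW), sep=";")
# Parse the integer-looking dates (assumed YYYYMMDD) into real timestamps.
df["Date"] = pd.to_datetime(df["Date"].astype(str), format="%Y%m%d")

# One row per date, one column per Valuegroup.
wide = df.pivot(index="Date", columns="Valuegroup", values="Value")
print(wide)
```

Calling wide.plot() on the result then draws one line per Valuegroup with a matching legend entry. Note that pivot requires each Date and Valuegroup pair to be unique.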
