I have code that plots stacked bar charts from data it imports from a .txt file.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_table("tiempo.txt", header=None, names=['Tiempos no Contributivos', 'Tiempos Contributivos', 'Tiempos Productivos'])
df['bar'] = 1
dfgroup = df.groupby('bar').sum()
ax = dfgroup[['Tiempos no Contributivos', 'Tiempos Contributivos', 'Tiempos Productivos']].plot(kind='bar', title="Tiempos de obra",
    figsize=(15, 10), legend=True, fontsize=12)
ax.set_xlabel(" ", fontsize=12)
ax.set_ylabel("Tiempo(segundos)", fontsize=12)
plt.show()
The graph this code produces looks like this:
Graph Image
What I need is for this code to keep working as it has been, but reading a .csv file instead of a .txt file.
When I swapped the .txt file for the .csv one in the reading step, this happened:
Error
The data used in this code looks like this (I won't post all of it because there are about 500 rows):
Example of the data used
I changed the reading function to read_csv. Your file is not comma-delimited but tab-delimited, so I added delimiter='\t' to the call. I also set the axis labels with plt.xlabel and plt.ylabel, and passed your column names to plt.xticks so that the bars are properly labeled. The example below still reads tiempo.txt; substitute the name of your .csv file.
import pandas as pd
import matplotlib.pyplot as plt; plt.rcdefaults()
import numpy as np
df = pd.read_csv("tiempo.txt",header=None,delimiter='\t',names = ['Tiempos no Contributivos','Tiempos Contributivos', 'Tiempos Productivos'])
y_pos = np.arange(len(df.columns))
bars = df.sum(axis=0)
plt.bar(y_pos, bars, align='center', alpha=0.5)
plt.xticks(y_pos, df.columns)
plt.ylabel("Tiempo(segundos)", fontsize=12)
plt.xlabel(" ", fontsize=12)
plt.title('Tiempo')
plt.show()
Output:
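The answer above draws the three totals as separate bars. If a single stacked bar is what is wanted, as the question describes, a minimal sketch could look like the following. It assumes the file is tab-delimited with three numeric columns and no header row; the sample values are made up and read from memory so the snippet runs without the original file.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample standing in for the tab-delimited file (assumption, not the real data).
sample = "10\t20\t30\n5\t15\t25\n1\t2\t3\n"
names = ['Tiempos no Contributivos', 'Tiempos Contributivos', 'Tiempos Productivos']

# sep='\t' tells read_csv that the fields are separated by tabs, not commas.
df = pd.read_csv(io.StringIO(sample), header=None, sep='\t', names=names)

# Collapse all rows into one row of column totals, then stack them in a single bar.
totals = df.sum().to_frame().T
ax = totals.plot(kind='bar', stacked=True, title="Tiempos de obra",
                 figsize=(15, 10), legend=True, fontsize=12)
ax.set_xlabel("")
ax.set_ylabel("Tiempo(segundos)", fontsize=12)
# In an interactive session, call plt.show() here.
```

To use a real file, replace io.StringIO(sample) with the path to the .csv file.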
I have a csv file with values that I'd like to plot. The file has no headers as shown below.
0.95744324 0.09625244 7.9512634
0 0.840118 0.153717 7.841126
1 0.646194 0.292572 7.754929
2 0.492966 0.452988 7.829147
3 0.291855 0.646912 7.991959
4 0.279877 0.716354 8.039841
... ... ... ...
I was able to plot each column as separate lines on a graph with the code below, but I'd like to add a legend for x,y,z dimensions for the corresponding column/line. I am not sure how exactly to go about this as what I have now makes all the keys in the legend 'x'. I cannot modify the csv file, so should I add headers in my code and then plot each column individually?
aPlot = pd.read_csv('accl/a.csv')
plt.figure()
plt.plot(aPlot, label = "x")
plt.xlabel("time")
plt.ylabel("acceleration [m/s^2]")
plt.legend(loc="upper left")
plt.show()
As your CSV file does not have a header, you can specify the column names by passing the names parameter.
You can then plot directly from the DataFrame, and the legend will be correct:
import matplotlib.pyplot as plt
import pandas as pd
aPlot = pd.read_csv('input.csv', names=['col1', 'col2', 'col3'])
aPlot.plot()
plt.xlabel("time")
plt.ylabel("acceleration [m/s^2]")
plt.legend(loc="upper left")
plt.show()
Giving you:
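For the x, y, z legend the question asks for, the same names parameter can carry those labels. This is a small sketch on made-up, comma-separated values held in memory; the real file name and delimiter are assumptions to adjust.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample with no header row (assumption: the real file is comma-separated).
sample = "0.957,0.096,7.951\n0.840,0.154,7.841\n0.646,0.293,7.755\n"

# names supplies the column labels, so the first row is kept as data.
accel = pd.read_csv(io.StringIO(sample), header=None, names=['x', 'y', 'z'])

ax = accel.plot()  # one line per column, each labelled with its column name
ax.set_xlabel("time")
ax.set_ylabel("acceleration [m/s^2]")
legend = ax.legend(loc="upper left")
labels = [text.get_text() for text in legend.get_texts()]
print(labels)
```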
I am working with more than 100 CSV files, which I open and plot in a loop. My aim is to save each plot on its own PDF page and generate one large PDF file, with each page containing the plot from a single file. I have looked at these examples, (1) and (2), but after trying combinations with matplotlib.backends.backend_pdf I am unable to get the required result.
Here I re-create my code and the approach I am using:
pdf = PdfPages('alltogther.pdf')
fig, ax = plt.subplots(figsize=(20,10))
for file in glob.glob('path*'):
    df_in=pd.read_csv(file)
    df_d = df_in.resample('d')
    df_m = df_in.resample('m')
    y1=df_d['column1']
    y2=df_m['column2']
    plt.plot(y1,linewidth='2.5')
    plt.plot(y2,linewidth='2.5')
    pdf.savefig(fig)
With this, all the plots are superimposed on the same figure and the generated PDF is empty.
You need to move the line
fig, ax = plt.subplots(figsize=(20,10))
inside the loop; otherwise each iteration reuses the same figure instance instead of creating a new one. Also note that you need to close the PDF when you are done with it. So the code should be:
pdf = PdfPages('alltogther.pdf')
for file in glob.glob('path*'):
    fig, ax = plt.subplots(figsize=(20,10))
    df_in=pd.read_csv(file)
    df_d = df_in.resample('d')
    df_m = df_in.resample('m')
    y1=df_d['column1']
    y2=df_m['column2']
    plt.plot(y1,linewidth='2.5')
    plt.plot(y2,linewidth='2.5')
    pdf.savefig(fig)
pdf.close()
Edit
Complete, self-contained example:
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
import numpy as np
pdf = PdfPages('out.pdf')
for i in range(5):
    fig, ax = plt.subplots(figsize=(20, 10))
    plt.plot(np.random.random(10), linestyle=None, marker='.')
    pdf.savefig(fig)
pdf.close()
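A variation that may suit the 100+ file case: PdfPages can be used as a context manager, which closes the file automatically, and closing each figure after saving keeps figures from piling up in memory. This is a sketch with random data; the temporary output path is an assumption made so the snippet can run anywhere.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages

# Hypothetical output location; replace with the path you actually want.
out_path = os.path.join(tempfile.mkdtemp(), "out.pdf")

with PdfPages(out_path) as pdf:  # the file is closed when the block exits
    for i in range(5):
        fig, ax = plt.subplots(figsize=(20, 10))  # a new figure for every page
        ax.plot(np.random.random(10), marker='.')
        ax.set_title(f"page {i + 1}")
        pdf.savefig(fig)  # appends this figure as one page
        plt.close(fig)    # release the figure once it is written
    page_count = pdf.get_pagecount()

print(page_count)
```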
I have a simple Python script that reads a CSV file and plots graphs using seaborn, and it works perfectly!
First, here is a screenshot of my CSV file showing the frames and the statistics for the stations:
Here is my simple code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
read_CSV_stats_per_day_for_KS = pd.read_csv('results_per_day/ALL_stations_together.csv', sep=";", encoding ="ISO-8859-1")
read_day_column = read_CSV_stats_per_day_for_KS[read_CSV_stats_per_day_for_KS['day'] == 0]
def plot_results_about_south_east_stations():
    south_east = read_day_column[read_day_column['Region'] == 'south_east']
    top4_visited_stations = south_east.nlargest(4, 'total_visited_cars')
    dataframes_for_south_east_stations = read_CSV_stats_per_day_for_KS[read_CSV_stats_per_day_for_KS['name'].isin(top4_visited_stations['name'])]
    sns.relplot(x='day', y='avg_queue_length', data=dataframes_for_south_east_stations, hue='name', kind='line')
    plt.suptitle("South-east Oslo")
    sns.relplot(x='day', y='avg_total_EV_in_station', data=dataframes_for_south_east_stations, hue='name', kind='line')
    plt.suptitle("South-east Oslo")
    sns.relplot(x='day', y='total_rejected_cars', data=dataframes_for_south_east_stations, hue='name', kind='line')
    plt.suptitle("South-east Oslo")
    sns.relplot(x='day', y='total_exhausted_cars', data=dataframes_for_south_east_stations, hue='name', kind='line')
    plt.suptitle("South-east Oslo")
    plt.show()
When I plot, I get these beautiful graphs:
https://imgur.com/a/YlGTCgH
In the graphs below (otherwise exactly the same), I am trying to replace the station names with the total_amount_of_chargers that each station has:
https://imgur.com/a/5l5cP4x
Question 1: First, the numbers are wrong. I want to show the actual number of chargers, not 20, 40, 60, 80. How can I do that?
Question 2: I still want the station names to stay on the Y label instead of numbers. How can I do that?
Question 3: Finally, is it possible to combine the station names and the total number of chargers and show both on the right, like this:
https://imgur.com/a/3RidCDv
If this were possible, it would be awesome and I would be very grateful!
Thank you.
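One possible approach to questions 1 and 3, offered as a sketch rather than a confirmed fix: build a text column that combines the station name with its charger count, and use that column for the legend. The column names below follow the question; the rows and the station_label column are made up for illustration.

```python
import pandas as pd

# Made-up rows; only the column names come from the question.
stations = pd.DataFrame({
    'name': ['Station A', 'Station A', 'Station B', 'Station B'],
    'day': [0, 1, 0, 1],
    'total_amount_of_chargers': [12, 12, 30, 30],
    'avg_queue_length': [1.5, 2.0, 0.5, 0.7],
})

# Hypothetical helper column: "name (chargers)" stored as plain text.
stations['station_label'] = (
    stations['name'] + ' (' + stations['total_amount_of_chargers'].astype(str) + ')'
)

labels = sorted(stations['station_label'].unique())
print(labels)
```

Passing hue='station_label' to sns.relplot in place of hue='name' should then give one legend entry per station with its charger count. The 20, 40, 60, 80 entries probably appear because a numeric hue column is treated as a continuous scale, so a text column avoids that; this is my understanding of seaborn's behaviour, not something verified against the original data.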
Hello, I am having a problem plotting data from pandas DataFrames. Within nested for loops I would like to build one large scatter plot (multiPlots.png), to which new data is added on every iteration, while also creating separate plots that are drawn and saved on every pass of the j loop (plot_i_j.png).
In my code the plot_i_j.png figures are produced correctly, but multiPlots.png always ends up identical to the last plot_i_j.png figure. As you can see, I am trying to draw multiPlots.png on axComb, while the plot_i_j.png figures are drawn on ax. Can anyone help me with this, please?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
columnNames = ['a','b']
scatterColors = ['red','blue','green','black']
figComb, axComb = plt.subplots(figsize=(8,6))
for i in range(4): # this is turbine number
    df1 = pd.DataFrame(np.random.randn(5, 2), columns=columnNames)
    df2 = pd.DataFrame(np.random.randn(5, 2), columns=columnNames)
    print(df1)
    for j in range(2):
        fig, ax = plt.subplots(figsize=(8,6))
        fig.suptitle(str(i)+'_'+str(j), fontsize=16)
        df1.plot(columnNames[j], ax=ax, color='blue', ls="--")
        plt.savefig('plot_'+str(i)+'_'+str(j)+'.png')
        df1.reset_index().plot.scatter('index',columnNames[j],3,ax=axComb,color=scatterColors[j])
        df2.reset_index().plot.scatter('index',columnNames[j],100,ax=axComb,color=scatterColors[j])
plt.savefig('multiPlots.png')
It is a small error. plt.savefig saves the current figure, which is the one created most recently, so the final call saves the last fig rather than figComb.
Replace the plt.savefig('plot_'+str(i)+'_'+str(j)+'.png') with fig.savefig('plot_'+str(i)+'_'+str(j)+'.png').
And replace plt.savefig('multiPlots.png') by figComb.savefig('multiPlots.png').
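Put together, the two replacements look roughly like this. It is a trimmed sketch (two turbines, random data, files written to a temporary directory chosen for illustration), not the asker's full script.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

out_dir = tempfile.mkdtemp()  # hypothetical output folder
column_names = ['a', 'b']
colors = ['red', 'blue']

# The combined figure is created once, outside the loops.
fig_comb, ax_comb = plt.subplots(figsize=(8, 6))

for i in range(2):
    df = pd.DataFrame(np.random.randn(5, 2), columns=column_names)
    for j, col in enumerate(column_names):
        # Individual figure: saved through its own handle, then closed.
        fig, ax = plt.subplots(figsize=(8, 6))
        fig.suptitle(f"{i}_{j}", fontsize=16)
        df.plot(y=col, ax=ax, color='blue', ls="--")
        fig.savefig(os.path.join(out_dir, f"plot_{i}_{j}.png"))
        plt.close(fig)
        # Combined figure: every iteration adds one scatter to the same axes.
        df.reset_index().plot.scatter(x='index', y=col, ax=ax_comb, color=colors[j])

# Saving through the figure handle does not depend on which figure is current.
fig_comb.savefig(os.path.join(out_dir, "multiPlots.png"))
```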
I am pretty new to Python and to coding in general. I have this code so far:
import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt('data.csv', delimiter=',', skiprows=1)
mSec = data[:,0]
Airspeed = data[:,10]
AS_Cmd = data[:,25]
airspeed = data[:,3]
plt.rc('xtick', labelsize=25) #increase xaxis tick size
plt.rc('ytick', labelsize=25) #increase yaxis tick size
fig, ax = plt.subplots(figsize=(40,40), edgecolor='b')
ax.patch.set_facecolor('white')
ax.plot(mSec, Airspeed, label='Ground speed [m/s]')
ax.plot(mSec, AS_Cmd, label='Voltage [V]')
plt.legend(loc='best',prop={'size':20})
fig.savefig('trans2.png', dpi=(200), bbox_inches='tight') #borderless on save
However, I don't want to read every data column individually. I want to load a CSV file, have the script read out all the column names, ask the user which columns to use for the x-axis and y-axis, and then plot that graph. The CSV file format is:
time(s),speed(mph),heading,bvoltage(v)
20,30,50,10
25,45,50,10
30,50,55,9
Here is my attempt at the code but I am missing a lot of information:
import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt('data.csv', delimiter=',')
## names = where I will store the column names
while True:
    ## display names to user
    print ('Pick your x-axis')
    xaxis = input()
    print ('Pick your y-axis')
    yaxis1 = input()
    print('pick a 2nd y-axis or enter none')
    yaxis2 = input()
    if input()= 'none'
    break;
    else continue
#plot xaxis vs yaxis vs 2nd yaxis
I understand the loop is not correct. I don't want anyone to correct it for me; I will figure that out myself. However, I would like a way to access the values from the CSV file so that I can use them in this approach.
Using pandas you can do:
import pandas as pd
data = pd.read_csv("yourFile.csv", delimiter=",")
and plot the columns named Col1 and Col2 against each other with:
data.plot(x='Col1', y='Col2')
If the first line of the CSV file contains the desired column names, pandas will pick them up automatically; otherwise you can adjust the header argument of read_csv.
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
If you don't mind using/installing another module then pandas should do it.
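Applied to the sample file from the question, reading the column names and plotting a chosen pair might look like the sketch below. The helper plot_columns is a hypothetical name introduced here, and the column choices are hard-coded where the real script would take them from input().

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The sample rows from the question, held in memory instead of data.csv.
csv_text = """time(s),speed(mph),heading,bvoltage(v)
20,30,50,10
25,45,50,10
30,50,55,9
"""

data = pd.read_csv(io.StringIO(csv_text))  # the first line becomes the column names
names = list(data.columns)                 # what you would show to the user
print(names)


def plot_columns(frame, x_name, y_names):
    """Plot the columns in y_names against x_name (hypothetical helper)."""
    missing = [n for n in [x_name, *y_names] if n not in frame.columns]
    if missing:
        raise KeyError(f"unknown column(s): {missing}")
    return frame.plot(x=x_name, y=y_names)


# In the real script these names would come from input().
ax = plot_columns(data, 'time(s)', ['speed(mph)', 'bvoltage(v)'])
```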