Problem plotting single and double column data with a boxplot - python

I am trying to plot columns of data from a .csv file in a boxplot/violin plot using matplotlib.pyplot.
When I set the DataFrame [df] to one column of data, the plot works fine. However, once I try to plot two columns, no plot is generated and the code just keeps running, so I suspect something is wrong with how I am passing the data. Each column is 54,500 rows long.
import os
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from pandas import read_csv
os.chdir(r"some_directory//")
df = read_csv(r"csv_file.csv")
# the csv file is 7 columns x 54500 rows; only two columns are of interest
# reduce the DataFrame to just those two columns
df = df[['surge', 'sway']]
data = df[['surge', 'sway']]
# print the data just to confirm
print(data)
plt.violinplot(data, vert=True, showmeans=True, showmedians=True)
plt.show()
If I change the data line to data = df['surge'], I get a perfect plot with the 54501 surge values.
The program only hangs once I introduce the second variable, as in data = df[['surge', 'sway']]. I should note that the same problem occurs if I set data = df[['surge']], so I think it has something to do with the double brackets and going from a list to an array, perhaps?
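One likely cause, depending on the matplotlib version: some older releases convert any input that is not already a NumPy array with np.asarray and then iterate over its rows, so a 54,500 × 2 DataFrame is treated as 54,500 separate small datasets, each needing its own kernel density estimate. df[['surge']] is split the same way, while a Series stays one-dimensional and gives a single violin. Passing an explicit list of 1-D arrays (or data.to_numpy(), whose columns are used) avoids the ambiguity. The sketch below is an assumption-based workaround, not a confirmed diagnosis; it reuses the column names from the question, and the helper name violin_columns and the random data are hypothetical stand-ins for the real CSV.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop it and call plt.show() interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def violin_columns(df, cols):
    """Draw one violin per column by passing a list of 1-D arrays (hypothetical helper)."""
    # Each list element is one dataset, so there is exactly one violin per column,
    # regardless of how a given matplotlib version treats a DataFrame.
    # dropna() is a precaution: NaN values would break the kernel density estimate.
    datasets = [df[c].dropna().to_numpy() for c in cols]
    fig, ax = plt.subplots()
    parts = ax.violinplot(datasets, showmeans=True, showmedians=True)
    # Violins are placed at x = 1..n by default; label those positions with the column names.
    ax.set_xticks(range(1, len(cols) + 1))
    ax.set_xticklabels(cols)
    return fig, parts


# Synthetic stand-in for the 54,500-row CSV (the real file is not available here).
rng = np.random.default_rng(0)
df = pd.DataFrame({"surge": rng.normal(0, 1, 54500), "sway": rng.normal(0, 2, 54500)})
fig, parts = violin_columns(df, ["surge", "sway"])
print(len(parts["bodies"]))  # one PolyCollection per violin
```

With the real file, the call would be violin_columns(df, ['surge', 'sway']) after read_csv, followed by plt.show().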

Related

Using Python to read CSV and plot 5 series over time based on column condition

I am trying to graph a series of 5 lines onto one plot with a dataset from a "data.csv" file in python. There are hundreds of columns in the dataset, but I am only interested in "Time", "Value", and "Machine" columns.
I've got far enough to plot value over time all together, but I would like to have a line graph of the "Value" over "Time" with each "Machine" as a separate series.
My dataset looks something like this:
Table of Time, Value, Machine
What is the best method to graph the "value" in each series conditionally based on "machine" numbers 1->5?
Here is the basic code I have so far to get a straightforward plot:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv('data.csv')
value = df['Conductivity']
time = df['Local Timestamp']
machine = df['CWE Requestor']
plt.plot(time, value)
plt.ylabel('value')
plt.xlabel('time')
plt.show()
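A common way to get one line per machine is to group the rows by the machine column and plot each group on the same axes. The sketch below assumes the column names used in the code above ('Local Timestamp', 'Conductivity', 'CWE Requestor'); the helper name plot_by_machine and the tiny dataset are hypothetical. With real data the timestamp column may first need pd.to_datetime.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch; use plt.show() interactively
import matplotlib.pyplot as plt
import pandas as pd


def plot_by_machine(df, time_col="Local Timestamp", value_col="Conductivity",
                    group_col="CWE Requestor"):
    """Plot value over time with one line per machine (hypothetical helper)."""
    fig, ax = plt.subplots()
    # groupby yields (machine id, sub-DataFrame) pairs; each sub-frame becomes one line.
    for machine_id, group in df.groupby(group_col):
        group = group.sort_values(time_col)  # keep each line ordered in time
        ax.plot(group[time_col], group[value_col], label=f"Machine {machine_id}")
    ax.set_xlabel("time")
    ax.set_ylabel("value")
    ax.legend()
    return fig, ax


# Hypothetical data: machines 1-5, four timestamps each.
df = pd.DataFrame({
    "Local Timestamp": pd.to_datetime(["2020-01-01", "2020-01-02",
                                       "2020-01-03", "2020-01-04"] * 5),
    "Conductivity": range(20),
    "CWE Requestor": [m for m in range(1, 6) for _ in range(4)],
})
fig, ax = plot_by_machine(df)
```

If each timestamp appears at most once per machine, df.pivot(index='Local Timestamp', columns='CWE Requestor', values='Conductivity').plot() gives the same picture in one line; pivot raises an error when there are duplicate timestamp/machine pairs.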

Python plotting graph from csv problems

csv file
Hello, so I have this csv file that I want to convert to a graph. Basically, I want to graph the number of jobs in each region by city. The csv file has columns for both cities and countries; I want to drop the creation date and keep just the city and the number of job offers.
Here is the code I tried to use and it didn't work:
import pandas as pd
from matplotlib.pyplot import pie, axis, show
%matplotlib inline
df = pd.read_csv ('compuTrabajo_business_summary_by_industry.csv')
sums = df.groupby(df["country;"])["business count"].sum()
axis('equal');
pie(sums, labels=sums.index);
show()
Thanks for the help
As Abhinav Kinagi already answered, pandas assumes by default that your values are separated by commas. You can either change your csv file or simply pass sep='|' to pd.read_csv. Your code should be:
%matplotlib inline
import pandas as pd
from matplotlib.pyplot import pie, axis, show
df = pd.read_csv ('compuTrabajo_business_summary_by_industry.csv', sep='|')
sums = df.groupby(df["country"])["business count"].sum()
axis('equal');
pie(sums, labels=sums.index);
show()
I also removed the ; after country.
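To see why the separator matters, the sketch below reads the same text twice. The pipe-delimited content, the city/country values and the counts are hypothetical; if the real file uses semicolons instead, sep=';' would be the right choice. With the default comma separator, the whole header collapses into a single column name, which is why column lookups fail.

```python
import io

import pandas as pd

# Hypothetical pipe-delimited content standing in for the real CSV file.
raw = ("country|city|business count\n"
       "Mexico|Monterrey|10\n"
       "Mexico|Puebla|5\n"
       "Peru|Lima|7\n")

wrong = pd.read_csv(io.StringIO(raw))           # default sep=','
right = pd.read_csv(io.StringIO(raw), sep="|")  # matches the file's delimiter

print(list(wrong.columns))  # a single column named 'country|city|business count'
print(list(right.columns))  # ['country', 'city', 'business count']

# Totals per country (as in the answer) and per city (as the question asks).
sums = right.groupby("country")["business count"].sum()
by_city = right.groupby("city")["business count"].sum()
```

Either sums or by_city can then be passed to pie(..., labels=....index) exactly as in the code above.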

Getting data on matplotlib in bar chart

I'm currently trying out matplotlib using the object-oriented interface. I'm still new to this tool.
This is the end result (made in Excel) that I want to create using matplotlib.
I have loaded the table into a DataFrame, which looks like this.
Below is the code I wrote.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
loaddf = pd.read_excel("C:\\SampleRevenue.xlsx")
#to get the row on number of tickets
count = loaddf.iloc[0]
#to get the total proceeds from selling the tickets
vol = loaddf.iloc[1]
#The profit from tickets after deducting costs
profit = loaddf.iloc[2]
fig, ax = plt.subplots(figsize=(8, 4))
ax.barh(str(count), list(loaddf.columns.values))
Somehow this is the graph I got. How do I display the number of tickets as bars for each month? The intention is to have the number of tickets on the y axis and the months on the x axis.
This is the count, vol and profit series after using iloc to extract the rows.
Do I need to remove the series before I use them for plotting?
What's happening is that read_excel gets really confused when the data is transposed. It expects the first row to hold the column titles and each subsequent row to be the next entry. Optionally, the first column contains the row labels; in that case, you have to add index_col=0 to the parameters of read_excel. If you copy and paste-transpose everything in Excel, it could work like this:
import pandas as pd
import matplotlib.pyplot as plt
loaddf = pd.read_excel("C:\\SampleRevenue_transposed.xlsx", index_col=0)
loaddf[["Vol '000"]].plot(kind='bar', title ="Bar plot of Vol '000")
plt.show()
If you don't transpose the Excel, the header row gets part of the data, which causes the "no numeric data to plot" message.
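Alternatively, the transpose can be done in pandas instead of in Excel. The sketch below assumes the first column of the original sheet holds the row labels, so pd.read_excel("C:\\SampleRevenue.xlsx", index_col=0) would give metrics as rows and months as columns; to keep it runnable without the file, it builds a hypothetical frame of that shape. The row labels "Tickets" and "Profit '000" and all numbers are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch; use plt.show() interactively
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical frame shaped like the untransposed sheet: metrics as rows, months as columns.
# With the real file: wide = pd.read_excel("C:\\SampleRevenue.xlsx", index_col=0)
wide = pd.DataFrame(
    {"Jan": [120, 3.6, 1.2], "Feb": [150, 4.5, 1.5], "Mar": [90, 2.7, 0.9]},
    index=["Tickets", "Vol '000", "Profit '000"],
)

# Transpose so each month is a row and each metric a column;
# to_numeric guards against values that were read as text.
monthly = wide.T.apply(pd.to_numeric)

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.bar(monthly.index, monthly["Tickets"])  # x: months, y: number of tickets
ax.set_xlabel("Month")
ax.set_ylabel("Number of tickets")
```

ax.bar draws vertical bars, matching the "months on x, tickets on y" goal; ax.barh would put the months on the y axis instead.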

pandas scatter plot not showing all data

I am new to pandas data visualizations and I'm having some trouble with a simple scatter plot. I have a DataFrame loaded from a csv, with 6 columns and 137 rows. But when I scatter the data from two columns, I only see 20 data points in the generated graph. I expected to see all 137. Any suggestions?
Here is a tidbit of code:
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
df = pd.read_csv(file, sep=',', header=0)
df.plot.scatter(x="Parte_aerea_peso_fresco", y="APCEi", marker=".")
And here is the output.
Possibility 1)
Many points are on exactly the same spot. You can check this manually in your file.csv.
Possibility 2)
Some values are not valid, e.g. NaN (not a number) or strings.
Your DataFrame is small, so you can check this possibility by printing it:
print (df)
print (df[40:60])
df.describe()
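Both possibilities can also be counted directly instead of by eye. The sketch below is one way to do it, assuming the two column names from the question; the helper name scatter_diagnostics and the small dataset are hypothetical. Values that cannot be parsed as numbers are coerced to NaN, and exact duplicate (x, y) pairs are counted once, since they land on the same spot.

```python
import numpy as np
import pandas as pd


def scatter_diagnostics(df, x, y):
    """Count rows, plottable rows and distinct points (hypothetical helper)."""
    # Coerce to numbers: unparseable strings become NaN instead of being plotted.
    xs = pd.to_numeric(df[x], errors="coerce")
    ys = pd.to_numeric(df[y], errors="coerce")
    valid = xs.notna() & ys.notna()
    pairs = pd.DataFrame({x: xs, y: ys})[valid]
    return {
        "rows": len(df),
        "plottable": int(valid.sum()),
        "distinct_points": len(pairs.drop_duplicates()),  # overlapping points count once
    }


# Hypothetical data: one string, one NaN, and one exact duplicate pair.
df = pd.DataFrame({
    "Parte_aerea_peso_fresco": ["1.0", "2.0", "2.0", "abc", "3.0"],
    "APCEi": [10, 20, 20, 30, np.nan],
})
print(scatter_diagnostics(df, "Parte_aerea_peso_fresco", "APCEi"))
# {'rows': 5, 'plottable': 3, 'distinct_points': 2}
```

If "distinct_points" on the real file comes out near 20, possibility 1 explains the plot; if "plottable" does, possibility 2 does.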

How to create five different figures with the same format in one script from a single DataFrame based on a single Excel file?

I am still very new to Python, so this is likely an easy question, but I have yet to find a satisfactory answer. I have data from five different sources, which I am trying to plot in one script after loading the data from an Excel file into a single DataFrame. As it is now, I only know how to graph one source at a time, or all five in a single figure (or anywhere between one and five). Here is my code, the entire script. It may not all be necessary, but I have included it all just in case.
import pandas as pd
import matplotlib.pyplot as plot
#Import data from Excel File (raw string so the backslashes in the path are not read as escapes)
data2007 = pd.ExcelFile(r'f:\Python\Learning 19-4-2013\Data 2007.xls')
table2007 = data2007.parse('Sheet1', skiprows=[0,1,2,3,4,5])
#Plot data for first meter
ax = plot.figure(figsize=(7,4), dpi=100).add_subplot(111)
FirstMeter = table2007.columns[0]
Meter1 = table2007[FirstMeter]
Meter1.plot(ax=ax, style='-v')
#Plot data for second meter
SecondMeter = table2007.columns[1]
Meter2 = table2007[SecondMeter]
Meter2.plot(ax=ax, style='-v')
#Plot data for third meter
ThirdMeter = table2007.columns[2]
Meter3 = table2007[ThirdMeter]
Meter3.plot(ax=ax, style='v-')
#Plot data for fourth meter
FourthMeter = table2007.columns[3]
Meter4 = table2007[FourthMeter]
Meter4.plot(ax=ax, style='v-')
#Plot data for fifth meter
FifthMeter = table2007.columns[4]
Meter5 = table2007[FifthMeter]
Meter5.plot(ax=ax, style='v-')
#Command to show plots
plot.show()
I see you are making a new Series (e.g., Meter1) out of each column of your DataFrame and then plotting them individually on the same axes. Instead, you can plot the DataFrame itself. Pandas assumes you want to plot each column as a separate line on the same plot, which is exactly what you seem to be doing here.
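table2007.plot(style='v-')
or perhaps table2007.iloc[:, 0:5].plot(style='v-') if there are other columns that you need to leave out (iloc[:, 0:5] selects the first five columns; a plain table2007[0:5] would slice the first five rows instead).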
By default, it also generates a legend, which you can suppress with the keyword argument legend=False.
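If you want separate plots, as the title of your question suggests, the subplots=True argument might get the job done. Note that it draws each column on its own axes within a single figure rather than in five separate figures.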
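For five truly separate figures that share one format, a loop over the columns is enough. The sketch below assumes the meters are the first five columns, as in the question's script; the helper name plot_each_column and the numeric data are hypothetical stand-ins for table2007.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch; use plt.show() interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_each_column(table, n_cols=5, style="v-", figsize=(7, 4), dpi=100):
    """Create one figure per column, all with the same formatting (hypothetical helper)."""
    figures = []
    for name in table.columns[:n_cols]:
        # A fresh figure and axes for every meter, with identical size, dpi and style.
        fig, ax = plt.subplots(figsize=figsize, dpi=dpi)
        table[name].plot(ax=ax, style=style, title=str(name))
        figures.append(fig)
    return figures


# Hypothetical stand-in for table2007: five meter columns of readings.
table = pd.DataFrame(np.arange(50).reshape(10, 5),
                     columns=[f"Meter{i}" for i in range(1, 6)])
figs = plot_each_column(table)
```

By contrast, table.iloc[:, :5].plot(subplots=True, style='v-') returns an array of five axes stacked inside one figure.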
