Plotting 2 columns of a csv with matplotlib error - python

I am trying to make a simple bar graph out of a 2 column CSV file. One column is the x axis names, the other column is the actual data which will be used for the bars. The CSV looks like this:
count,team
21,group1
15,group2
63,group3
22,group4
42,group5
72,group6
21,group7
23,group8
24,group9
31,group10
32,group11
I am using this code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.read_csv("sampleData.csv",sep=",").set_index('count')
d = dict(zip(df.index,df.values.tolist()))
df.plot.bar(x = 'count', y = 'team')
print(d)
However, I get an error
KeyError: 'count' from this line :
df.plot.bar(x = 'count', y = 'team')
I don't understand how there is an error for something that exists.

When you set the count as index, you just have a single column left in your DataFrame, i.e., team. Don't set the count as index and switch the order of x and y values for plotting the bar chart
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.read_csv("sampleData.csv", sep=",")
df.plot.bar(x = 'team', y = 'count')
Matplotlib solution
plt.bar(df['team'], df['count'])
plt.xticks(rotation=45) # Just rotating for better visualizaton

Related

Box and whisker plot on multiple columns

I am trying to make a Box and Whisker plot on my dataset that looks something like this -
& the chart I'm trying to make
My current lines of code are below -
import seaborn as sns
import matplotlib.pyplot as plt
d = df3.boxplot(column = ['Northern California','New York','Kansas','Texas'], by = 'Banner')
d
Thank you
I've recreated a dummy version of your dataset:
import numpy as np
import pandas as pd
dictionary = {'Banner':['Type1']*10+['Type2']*10,
'Northen_californina':np.random.rand(20),
'Texas':np.random.rand(20)}
df = pd.DataFrame(dictionary)
What you need is to melt your dataframe (unpivot) in orther to have the information of geographical zone stored in a column and not as column name. You can use pandas.melt method and specify all the columns you want to put in your boxplot in the value_vars argument.
With my dummy dataset you can do this:
df = pd.melt(df,id_vars=['Banner'],value_vars=['Northen_californina','Texas'],
var_name='zone', value_name='amount')
Now you can apply a boxplot using the hue argument:
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(9,9)) #for a bigger image
sns.boxplot(x="Banner", y="amount", hue="zone", data=df, palette="Set1")

Count Rows in Dictionary of Dataframes

I have a dictionary of dataframes. I am trying to count the rows in each dataframe. For the real data, my code is counting just over ten thousand rows for a dataframe that has only has a few rows.
I have tried to reproduce the error using dummy data. Unfortunately, the code works fine with the dummy data!
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Dataframe
Df = pd.DataFrame(np.random.randint(0,10,size=(100, 4)), columns=list('ABCD'))
# Map
Ma = Df.groupby('D')
# Dictionary of Dataframes
Di = {}
for name, group in Ma:
Di[str(name)] = group
# Count the Rows in each Dataframe
Li = []
for k in Di:
Count = Di[k].shape[0]
Li.append([Count])
# Flatten
Li_1 = []
for sublist in Li:
for item in sublist:
Li_1.append(item)
# Histogram
plt.hist(Li_1, bins=10)
plt.xlabel("Rows / Dataframe")
plt.ylabel("Frequency")
fig = plt.gcf()
To get the number of rows corresponding to each category in 'D', you can simply use .size when you do your groupby:
Df.groupby('D').size()
pandas also allows you to directly plot graphs, so your code can be reduced to:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Df = pd.DataFrame(np.random.randint(0,10,size=(100, 4)), columns=list('ABCD'))
Df.groupby('D').size().plot.hist()
plt.xlabel("Rows / Dataframe")
plt.ylabel("Frequency")
fig = plt.gcf()
Assuming that, the data in column D is a categorical variable. You can get the count for each category using seaborn countplot.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Dataframe
df = pd.DataFrame(np.random.randint(0,10,size=(100, 4)), columns=list('ABCD'))
# easy count plot in sns
sns.countplot(x='D',data=df)
plt.xlabel("category")
plt.ylabel("frequency")
But if you are looking for distribution plot but not categorical count plot then you can use the folowing part of the code to have distribution plot.
# for distribution plot
sns.distplot(df['D'],kde=False,bins=10)
plt.xlabel("Spread")
plt.ylabel("frequency")
But if you want distribution plot after group by the elements which does not make any sense to me but you can use the following:
# for distribution plot after group by
sns.distplot(df.groupby('D').size() ,kde=False,bins=10)
plt.xlabel("Spread")
plt.ylabel("frequency")

matplotlib dataframe x axis date issue

import sys
import ConfigParser
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as DT
import bokeh
sys.path.extend(['..\..\myProj\SOURCE'])
fullfilepath = "../../myProj/SOURCE/" + 'myparts.txt'
ohg_df = pd.read_csv(fullfilepath, sep="\t" )
temp_df = temp_df[['as_on_date', 'ohg_qty']]
temp_df = temp_df.sort(['as_on_date'], ascending=[1])
temp_df.set_index('as_on_date')
plt.plot(temp_df.index, temp_df.ohg_qty)
plt.show()
This is my dataframe after importing.
I am trying to plot the line graph with x axis as date mentioned in the dataframe.
Can someone guide me... I am new to pandas.
dataframe picture
output pitcure
Easier:
# Set index directly
ohg_df = pd.read_csv(fullfilepath, sep="\t", index='as_on_date')
# Convert string index to dates
ohg_df.index = pd.to_datetime(ohg_df.index)
# Get a column and plot it (taking a column keeps the index)
plt.plot(ohg_df.ohg_qty)

plot graph from python dataframe

i want to convert that dataframe
into this dataframe and plot a matplotlib graph using date along x axis
changed dataframe
Use df.T.plot(kind='bar'):
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame.from_csv('./housing_price_index_2010-11_100.csv')
df.T.plot(kind='bar')
plt.show()
you can also assign the transpose to a new variable and plot that (what you asked in the comment):
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame.from_csv('./housing_price_index_2010-11_100.csv')
df_transposed = df.T
df_transposed.plot(kind='bar')
plt.show()
both result the same:

Pandas index column for a dataframe

Hi all so I'm trying to work with this set of data that has two columns, one is names and the other is the number of births for each name. What I want to do is import a csv file, perform some basic functions on it such as finding the baby name with the maximum number of births, and then plotting the data in a bar graph. But, when I have an index value for the dataframe, the bar graph prints that as the x axis instead of the names. So I removed the index and now I get all kinds of errors. Below is my code, first the one with the index and then the one without. Thanks in advance. This is really driving me crazy
import pandas as pd
import matplotlib.pyplot as plt
import pdb
import matplotlib as p
import os
from pandas import DataFrame
Location = os.path.join(os.path.sep,'Users', 'Mark\'s Computer','Desktop','projects','data','births1880.csv')
a = pd.read_csv(Location, index_col = False)
print(a) #print the dataframe just to see what I'm getting.
MaxValue = a['Births'].max()
MaxName = a['Names'][a['Births'] == MaxValue].values
print(MaxValue, ' ', MaxName)
a.plot(kind ='bar')
plt.show()
This code works but spits out a bar graph with the index as the x axis instead of the names?
import pandas as pd
import matplotlib.pyplot as plt
import pdb
import matplotlib as p
import os
from pandas import DataFrame
Location = os.path.join(os.path.sep,'Users', 'Mark\'s Computer','Desktop','projects','data','births1880.csv')
a = pd.read_csv(Location, index_col = True) #why is setting the index column to true removing it?
print(a) #print the dataframe just to see what I'm getting.
MaxValue = a['Births'].max()
MaxName = a['Names'][a['Births'] == MaxValue].values
print(MaxValue, ' ', MaxName)
a.plot(kind ='bar', x='Names', y = 'Births' )
plt.show()
edited for solution.
It would be nice if you'd provided a sample csv file, so I made one up, took me a while to figure out what format pandas expects.
I used a test.csv that looked like:
names,briths
mike,3
mark,4
Then my python code:
import pandas
import numpy
import matplotlib.pyplot as plt
a = pandas.read_csv('test.csv', index_col = False)
a.plot(kind='bar')
indices = numpy.arange(len(a['names']))
plt.xticks( indices+0.5, a['names'].values)
plt.show()

Categories