name 'df' is not defined in box plot - python

I am trying to plot box plot with some data, but I am getting name 'df' is not defined error. Below is the code:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import os
% matplotlib inline
# load the dataset
df = pd.read_csv("C:/Users/Kumar Chandan/Downloads/indiavix.csv")
# display 5 rows of dataset
df.head()
df=df.boxplot(column =['close'], grid = False)

You need to use
df.boxplot(column =['close'], grid = False)
By putting df = at the front of that line you are assigning the output of the line (the plot) back to df which erases your dataframe.

Don't assign value again to data frame.
In place of assigning use
df.boxplot(column =['close'], grid = False)

Related

How can one create histograms with subplots according to grouped variables in seaborn?

I am attempting to create a histogram using seaborn and census data that displays 3 subplots for age composition, and I have the data grouped the way that I would like it, but I am struggling to turn that into a histogram.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
filename = "/scratch/%s_class_root/%s_class/materials/data/pums_short.csv.gz"
acs = pd.read_csv(filename)
R65_agg = acs.groupby(["R65", "PUMA"])["HINCP"]
R65_meds = R65_agg.agg(np.median).unstack()
R65_f = R65_meds.dropna()
R65_f = R65_meds.reset_index(drop = True)
I was expecting this code to give me data that I could plug into a histogram but instead of being distinct subplots, the "0.0, 1.0, 2,0" in the final variable just get added together when I apply the .describe() function. Any advice for how I can convert this into a form that's readable with the sns.histplot() function?

Box and whisker plot on multiple columns

I am trying to make a Box and Whisker plot on my dataset that looks something like this -
& the chart I'm trying to make
My current lines of code are below -
import seaborn as sns
import matplotlib.pyplot as plt
d = df3.boxplot(column = ['Northern California','New York','Kansas','Texas'], by = 'Banner')
d
Thank you
I've recreated a dummy version of your dataset:
import numpy as np
import pandas as pd
dictionary = {'Banner':['Type1']*10+['Type2']*10,
'Northen_californina':np.random.rand(20),
'Texas':np.random.rand(20)}
df = pd.DataFrame(dictionary)
What you need is to melt your dataframe (unpivot) in orther to have the information of geographical zone stored in a column and not as column name. You can use pandas.melt method and specify all the columns you want to put in your boxplot in the value_vars argument.
With my dummy dataset you can do this:
df = pd.melt(df,id_vars=['Banner'],value_vars=['Northen_californina','Texas'],
var_name='zone', value_name='amount')
Now you can apply a boxplot using the hue argument:
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(9,9)) #for a bigger image
sns.boxplot(x="Banner", y="amount", hue="zone", data=df, palette="Set1")

Plotting 2 columns of a csv with matplotlib error

I am trying to make a simple bar graph out of a 2 column CSV file. One column is the x axis names, the other column is the actual data which will be used for the bars. The CSV looks like this:
count,team
21,group1
15,group2
63,group3
22,group4
42,group5
72,group6
21,group7
23,group8
24,group9
31,group10
32,group11
I am using this code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.read_csv("sampleData.csv",sep=",").set_index('count')
d = dict(zip(df.index,df.values.tolist()))
df.plot.bar(x = 'count', y = 'team')
print(d)
However, I get an error
KeyError: 'count' from this line :
df.plot.bar(x = 'count', y = 'team')
I don't understand how there is an error for something that exists.
When you set the count as index, you just have a single column left in your DataFrame, i.e., team. Don't set the count as index and switch the order of x and y values for plotting the bar chart
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.read_csv("sampleData.csv", sep=",")
df.plot.bar(x = 'team', y = 'count')
Matplotlib solution
plt.bar(df['team'], df['count'])
plt.xticks(rotation=45) # Just rotating for better visualizaton

Clustermapping in Python using Seaborn

I am trying to create a heatmap with dendrograms on Python using Seaborn and I have a csv file with about 900 rows. I'm importing the file as a pandas dataframe and attempting to plot that but a large number of the rows are not being represented in the heatmap. What am I doing wrong?
This is the code I have right now. But the heatmap only represents about 49 rows.
Here is an image of the clustermap I've obtained but it is not displaying all of my data.
import seaborn as sns
import pandas as pd
from matplotlib import pyplot as plt
# Data set
df = pd.read_csv('diff_exp_gene.csv', index_col = 0)
# Default plot
sns.clustermap(df, cmap = 'RdBu', row_cluster=True, col_cluster=True)
plt.show()
Thank you.
An alternative approach would be to use imshow in matpltlib. I'm not exactly sure what your question is but I demonstrate a way to graph points on a plane from csv file
import numpy as np
import matplotlib.pyplot as plt
import csv
infile = open('diff_exp_gene.csv')
df = csv.DictReader(in_file)
temp = np.zeros((128,128), dtype = int)
for row in data:
if row['TYPE'] == types:
temp[int(row['Y'])][int(row['X'])] = temp[int(row['Y'])][int(row['X'])] + 1
plt.imshow(temp, cmap = 'hot', origin = 'lower')
plt.show()
As far as I know, keywords that apply to seaborn heatmaps also apply to clustermap, as the sns.clustermap passes to the sns.heatmap. In that case, all you need to do in your example is to set yticklabels=True as a keyword argument in sns.clustermap(). That will make all of the 900 rows appear.
By default, it is set as "auto" to avoid overlap. The same applies to the xticklabels. See more here: https://seaborn.pydata.org/generated/seaborn.heatmap.html

Pandas index column for a dataframe

Hi all so I'm trying to work with this set of data that has two columns, one is names and the other is the number of births for each name. What I want to do is import a csv file, perform some basic functions on it such as finding the baby name with the maximum number of births, and then plotting the data in a bar graph. But, when I have an index value for the dataframe, the bar graph prints that as the x axis instead of the names. So I removed the index and now I get all kinds of errors. Below is my code, first the one with the index and then the one without. Thanks in advance. This is really driving me crazy
import pandas as pd
import matplotlib.pyplot as plt
import pdb
import matplotlib as p
import os
from pandas import DataFrame
Location = os.path.join(os.path.sep,'Users', 'Mark\'s Computer','Desktop','projects','data','births1880.csv')
a = pd.read_csv(Location, index_col = False)
print(a) #print the dataframe just to see what I'm getting.
MaxValue = a['Births'].max()
MaxName = a['Names'][a['Births'] == MaxValue].values
print(MaxValue, ' ', MaxName)
a.plot(kind ='bar')
plt.show()
This code works but spits out a bar graph with the index as the x axis instead of the names?
import pandas as pd
import matplotlib.pyplot as plt
import pdb
import matplotlib as p
import os
from pandas import DataFrame
Location = os.path.join(os.path.sep,'Users', 'Mark\'s Computer','Desktop','projects','data','births1880.csv')
a = pd.read_csv(Location, index_col = True) #why is setting the index column to true removing it?
print(a) #print the dataframe just to see what I'm getting.
MaxValue = a['Births'].max()
MaxName = a['Names'][a['Births'] == MaxValue].values
print(MaxValue, ' ', MaxName)
a.plot(kind ='bar', x='Names', y = 'Births' )
plt.show()
edited for solution.
It would be nice if you'd provided a sample csv file, so I made one up, took me a while to figure out what format pandas expects.
I used a test.csv that looked like:
names,briths
mike,3
mark,4
Then my python code:
import pandas
import numpy
import matplotlib.pyplot as plt
a = pandas.read_csv('test.csv', index_col = False)
a.plot(kind='bar')
indices = numpy.arange(len(a['names']))
plt.xticks( indices+0.5, a['names'].values)
plt.show()

Categories