Pandas index column for a dataframe - python

Hi all so I'm trying to work with this set of data that has two columns, one is names and the other is the number of births for each name. What I want to do is import a csv file, perform some basic functions on it such as finding the baby name with the maximum number of births, and then plotting the data in a bar graph. But, when I have an index value for the dataframe, the bar graph prints that as the x axis instead of the names. So I removed the index and now I get all kinds of errors. Below is my code, first the one with the index and then the one without. Thanks in advance. This is really driving me crazy
import pandas as pd
import matplotlib.pyplot as plt
import pdb
import matplotlib as p
import os
from pandas import DataFrame
Location = os.path.join(os.path.sep,'Users', 'Mark\'s Computer','Desktop','projects','data','births1880.csv')
a = pd.read_csv(Location, index_col = False)
print(a) #print the dataframe just to see what I'm getting.
MaxValue = a['Births'].max()
MaxName = a['Names'][a['Births'] == MaxValue].values
print(MaxValue, ' ', MaxName)
a.plot(kind ='bar')
plt.show()
This code works but spits out a bar graph with the index as the x axis instead of the names?
import pandas as pd
import matplotlib.pyplot as plt
import pdb
import matplotlib as p
import os
from pandas import DataFrame
Location = os.path.join(os.path.sep,'Users', 'Mark\'s Computer','Desktop','projects','data','births1880.csv')
a = pd.read_csv(Location, index_col = True) #why is setting the index column to true removing it?
print(a) #print the dataframe just to see what I'm getting.
MaxValue = a['Births'].max()
MaxName = a['Names'][a['Births'] == MaxValue].values
print(MaxValue, ' ', MaxName)
a.plot(kind ='bar', x='Names', y = 'Births' )
plt.show()
edited for solution.

It would be nice if you'd provided a sample csv file, so I made one up, took me a while to figure out what format pandas expects.
I used a test.csv that looked like:
names,briths
mike,3
mark,4
Then my python code:
import pandas
import numpy
import matplotlib.pyplot as plt
a = pandas.read_csv('test.csv', index_col = False)
a.plot(kind='bar')
indices = numpy.arange(len(a['names']))
plt.xticks( indices+0.5, a['names'].values)
plt.show()

Related

name 'df' is not defined in box plot

I am trying to plot box plot with some data, but I am getting name 'df' is not defined error. Below is the code:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import os
% matplotlib inline
# load the dataset
df = pd.read_csv("C:/Users/Kumar Chandan/Downloads/indiavix.csv")
# display 5 rows of dataset
df.head()
df=df.boxplot(column =['close'], grid = False)
You need to use
df.boxplot(column =['close'], grid = False)
By putting df = at the front of that line you are assigning the output of the line (the plot) back to df which erases your dataframe.
Don't assign value again to data frame.
In place of assigning use
df.boxplot(column =['close'], grid = False)

Plotting 2 columns of a csv with matplotlib error

I am trying to make a simple bar graph out of a 2 column CSV file. One column is the x axis names, the other column is the actual data which will be used for the bars. The CSV looks like this:
count,team
21,group1
15,group2
63,group3
22,group4
42,group5
72,group6
21,group7
23,group8
24,group9
31,group10
32,group11
I am using this code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.read_csv("sampleData.csv",sep=",").set_index('count')
d = dict(zip(df.index,df.values.tolist()))
df.plot.bar(x = 'count', y = 'team')
print(d)
However, I get an error
KeyError: 'count' from this line :
df.plot.bar(x = 'count', y = 'team')
I don't understand how there is an error for something that exists.
When you set the count as index, you just have a single column left in your DataFrame, i.e., team. Don't set the count as index and switch the order of x and y values for plotting the bar chart
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.read_csv("sampleData.csv", sep=",")
df.plot.bar(x = 'team', y = 'count')
Matplotlib solution
plt.bar(df['team'], df['count'])
plt.xticks(rotation=45) # Just rotating for better visualizaton

Generating simple plot from csv using Pandas in Python

I'm trying to make a graph of the first column ('Time') of a csv file plotted against the the second column ('Bid').
Here's what I have so far.
import pandas as pd
import datetime
import csv
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
headers = ['Time','Bid','Ask']
df = pd.read_csv('quotes_format.csv')
x = df['Time']
y = df['Bid']
plt.plot(x,y)
plt.gcf().autofmt_xdate()
plt.show()
The csv file looks something like this
This fails and returns exit code 1. How would I fix this so it would generate the graph I'm looking for?
You can specify what the names of each column in the dataframe are with the parameter names.
headers = ['Time','Bid','Ask']
df = pd.read_csv('quotes_format.csv', names=headers)
Here is the documentation for the pandas read_csv function.

matplotlib dataframe x axis date issue

import sys
import ConfigParser
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as DT
import bokeh
sys.path.extend(['..\..\myProj\SOURCE'])
fullfilepath = "../../myProj/SOURCE/" + 'myparts.txt'
ohg_df = pd.read_csv(fullfilepath, sep="\t" )
temp_df = temp_df[['as_on_date', 'ohg_qty']]
temp_df = temp_df.sort(['as_on_date'], ascending=[1])
temp_df.set_index('as_on_date')
plt.plot(temp_df.index, temp_df.ohg_qty)
plt.show()
This is my dataframe after importing.
I am trying to plot the line graph with x axis as date mentioned in the dataframe.
Can someone guide me... I am new to pandas.
dataframe picture
output pitcure
Easier:
# Set index directly
ohg_df = pd.read_csv(fullfilepath, sep="\t", index='as_on_date')
# Convert string index to dates
ohg_df.index = pd.to_datetime(ohg_df.index)
# Get a column and plot it (taking a column keeps the index)
plt.plot(ohg_df.ohg_qty)

pd.DataFrame.boxplot not compatible with pd.cut

sample code:
import pandas as pd
import numpy as np
sample = pd.DataFrame({"a":[1,2,3,1,2,3,1,2,3], "b":np.random.uniform(0,1,9)})
sample.boxplot(column="b", by=pd.cut(sample.a, bins=2))
Apart from the box plot picture, some text appears around the plot. How can I remove the text from the plot?
You can try create new column c by cut, because in DataFrame.boxplot parameter by can be column:
by : string or sequence
Column in the DataFrame to group by
import pandas as pd
import numpy as np
sample = pd.DataFrame({"a":[1,2,3,1,2,3,1,2,3], "b":np.random.uniform(0,1,9)})
sample['c'] = pd.cut(sample.a, bins=2)
sample.boxplot(column="b", by='c')

Categories