I'm trying to write a function that plots the entire contents of a DataFrame.
DF Sample:
print(df2)
Close Close Close
Date
2018-12-12 00:00:00-05:00 53.183998 24.440001 104.500504
2018-12-13 00:00:00-05:00 53.095001 25.119333 104.854973
2018-12-14 00:00:00-05:00 52.105000 24.380667 101.578560
2018-12-17 00:00:00-05:00 50.826500 23.228001 98.570381
2018-12-18 00:00:00-05:00 51.435501 22.468666 99.605042
Python:
fig = px.line(df2, x=df2.index, y=df2.columns[1:])
I'm trying to plot it but get this error:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
My data frame may have various numbers of columns, so I need my code to plot all columns.
By the way:
print(df2.columns[1:])
Index(['Close', 'Close'], dtype='object')
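Try changing the column names. All the columns share the label 'Close', so selecting 'Close' returns a DataFrame instead of a single Series, which is most likely what triggers the ambiguous truth value error. I ran the code above on this data with unique column names and got the expected line plot.
You can also use the date index as the x-axis of the plot; Plotly will then render a time-series chart.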
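A minimal sketch of that renaming step, assuming pandas and Plotly Express are installed. The helpers make_unique and plot_all_columns are hypothetical names, not part of either library, and the sample values are copied from the question.

```python
import pandas as pd


def make_unique(names):
    """Append _1, _2, ... to repeated names so every column label is distinct."""
    seen = {}
    unique = []
    for name in names:
        count = seen.get(name, 0)
        unique.append(name if count == 0 else f"{name}_{count}")
        seen[name] = count + 1
    return unique


def plot_all_columns(df):
    """Return a Plotly line figure with one trace per column (hypothetical helper)."""
    import plotly.express as px  # imported here so the renaming works without Plotly

    df = df.copy()
    df.columns = make_unique(df.columns)
    return px.line(df, x=df.index, y=list(df.columns))


# Three columns that all share the label 'Close', as in the question.
df2 = pd.DataFrame(
    [[53.183998, 24.440001, 104.500504],
     [53.095001, 25.119333, 104.854973],
     [52.105000, 24.380667, 101.578560]],
    index=pd.to_datetime(["2018-12-12", "2018-12-13", "2018-12-14"]),
    columns=["Close", "Close", "Close"],
)
df2.index.name = "Date"

renamed = make_unique(df2.columns)
```

Calling plot_all_columns(df2).show() should then draw every column, however many there are. The sketch does not guard against a generated name such as Close_1 colliding with a column that already exists.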
I've got a pandas DataFrame, df, with an index named date and the columns columnA, columnB and columnC.
I am trying to make a scatter plot with the index on the x-axis and columnA on the y-axis using the DataFrame syntax.
When I try:
df.plot(kind='scatter', x='date', y='columnA')
I get KeyError: 'date', probably because date is the index rather than a column. When I try:
df.plot(kind='scatter', y='columnA')
I am getting an error:
ValueError: scatter requires and x and y column
so the index is not used as the default x-axis. When I try:
df.plot(kind='scatter', x=df.index, y='columnA')
I get this error:
KeyError: "DatetimeIndex(['1818-01-01', '1818-01-02', '1818-01-03', '1818-01-04',\n
'1818-01-05', '1818-01-06', '1818-01-07', '1818-01-08',\n
'1818-01-09', '1818-01-10',\n ...\n
'2018-03-22', '2018-03-23', '2018-03-24', '2018-03-25',\n
'2018-03-26', '2018-03-27', '2018-03-28', '2018-03-29',\n
'2018-03-30', '2018-03-31'],\n
dtype='datetime64[ns]', name='date', length=73139, freq=None) not in index"
I can plot it if I use matplotlib.pyplot directly:
plt.scatter(df.index, df['columnA'])
Is there a way to plot the index as the x-axis using the DataFrame kind syntax?
This is kind of ugly (I think the matplotlib solution you used in your question is better, FWIW), but you can always create a temporary DataFrame with the index as a column using
df.reset_index()
If the index was nameless, the default name will be 'index'. Assuming this is the case, you could use
df.reset_index().plot(kind='scatter', x='index', y='columnA')
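As a small illustration of the naming rule, reset_index() reuses the index name for the new column when there is one and falls back to 'index' only for an unnamed index. The frames below are made-up sample data, and the scatter call uses the unnamed numeric index.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Unnamed index: reset_index() creates a column called 'index'.
df = pd.DataFrame({"columnA": [1.0, 4.0, 9.0, 16.0]})
unnamed_cols = list(df.reset_index().columns)

# Named index: the new column takes the index name instead.
df_named = df.copy()
df_named.index = pd.date_range("2018-03-28", periods=4, name="date")
named_cols = list(df_named.reset_index().columns)

# Scatter plot of columnA against the former index.
ax = df.reset_index().plot(kind="scatter", x="index", y="columnA")
```

For the frame in the question, whose index is named date, the column to pass as x would therefore be 'date' rather than 'index'.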
A simpler solution would be:
df['x1'] = df.index
df.plot(kind='scatter', x='x1', y='columnA')
Just create the index variable outside of the plot statement.
At least in pandas > 1.4, the easiest approach is this:
df['columnA'].plot(style=".")
This lets you mix scatter and line plots on the same axes while still using the standard pandas plot interface.
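A short sketch of this approach on made-up data with a DatetimeIndex named date. Series.plot draws against the index by default, so style="." gives a scatter-like plot, and a second series is added as an ordinary line on the same axes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

df = pd.DataFrame(
    {"columnA": [1.0, 3.0, 2.0, 5.0], "columnB": [2.0, 2.5, 3.0, 3.5]},
    index=pd.date_range("2018-03-28", periods=4, name="date"),
)

ax = df["columnA"].plot(style=".")    # markers only, index on the x-axis
df["columnB"].plot(ax=ax, style="-")  # ordinary line on the same axes
ax.legend()
```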
I am trying to plot a column from a dataframe. There are about 8500 rows and the Assignment group column has about 70+ categories. How do I plot this visually using seaborn to get some meaningful output?
nlp_data['Assignment group'].hist(figsize=(17,7))
I used the hist() method to plot
You can use a heatmap for such data; see seaborn.heatmap.
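As an alternative to a heatmap, one option for a column with many categories is to count the rows per category and draw only the largest groups as a horizontal bar chart. This is a sketch using pandas and matplotlib rather than seaborn; nlp_data here is synthetic stand-in data, and top_n is an arbitrary cutoff.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Synthetic stand-in: 8500 rows spread unevenly over 70 group labels.
rng = np.random.default_rng(0)
groups = [f"GRP_{i}" for i in range(70)]
weights = np.arange(70, 0, -1, dtype=float)
weights /= weights.sum()
nlp_data = pd.DataFrame(
    {"Assignment group": rng.choice(groups, size=8500, p=weights)}
)

top_n = 20  # arbitrary cutoff, chosen only to keep the chart readable
counts = nlp_data["Assignment group"].value_counts()  # sorted, largest first
top = counts.head(top_n)

# sort_values() puts the largest bar at the top of a horizontal bar chart.
ax = top.sort_values().plot(kind="barh", figsize=(10, 7))
ax.set_xlabel("number of rows")
ax.set_ylabel("Assignment group")
```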
I'm trying to plot the length of objects against the total count of objects using hist2d, but I get the following error. Can you please help me find the problem?
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
count=799.000000
plt.hist2d(length,count,bins = 10)
plt.xlabel('count')
plt.ylabel('length')
plt.grid(True)
plt.show()
print(length)
1 3.978235
2 4.740024
3 3.470375
4 3.978235
5 3.808948
...
807 5.078597
808 4.655381
809 4.232164
810 4.655381
811 3.470375
Name: length_mm, Length: 799, dtype: float64
I believe the issue in your code is the use of hist2d rather than good old hist. With hist, you don't have to pass the number of items; it gets that from the Series:
plt.hist(length, bins = 10)
plt.xlabel('length')  # hist puts the binned values on the x-axis
plt.ylabel('count')   # and the number of items per bin on the y-axis
plt.grid(True)
plt.show()
The result (for a small amount of data) is:
If, on the other hand, you're looking for a bar chart, here's the way to do it for length:
fig, ax = plt.subplots()
ax.bar(length.index, length)
plt.show()
The result (for limited data, of course) is:
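To check the numbers behind such a histogram without the figure, np.histogram returns the same bin counts that plt.hist draws. The sketch below uses a handful of the length values from the question as sample data; the choice of 4 bins is arbitrary for so few points.

```python
import numpy as np
import pandas as pd

length = pd.Series(
    [3.978235, 4.740024, 3.470375, 3.978235, 3.808948,
     5.078597, 4.655381, 4.232164, 4.655381, 3.470375],
    name="length_mm",
)

# counts[i] is the number of values falling between edges[i] and edges[i + 1].
counts, edges = np.histogram(length, bins=4)
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.3f} to {right:.3f} mm: {n} objects")
```

The counts always add up to the number of items in the Series, which is why hist needs no separate count argument.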
I have the following pandas Data Frame:
and I need to make line plots using the column names (400, 400.5, 401....) as the x axis and the data frame values as the y axis, and using the index column ('fluorophore') as the label for that line plot. I want to be able to choose which fluorophores I want to plot.
How can I accomplish that?
I do not know your dataset, but if the NaN values always fill entire columns, you could do
df[non_nan_cols].T[['FAM', 'TET']].plot.line()
Where non_nan_cols is a list of your columns that do not contain NaN values.
Alternatively, you could use
import numpy as np
import matplotlib.pyplot as plt

choice_of_fp = df.index.tolist()  # or a hand-picked list such as ['FAM', 'TET']
x_val = np.asarray(df.columns.tolist())
for i in choice_of_fp:
    mask = np.isfinite(df.loc[i].values)  # skip NaN entries in this row
    plt.plot(x_val[mask], df.loc[i].values[mask], label=i)
plt.legend()
plt.show()
which tolerates NaN values. Here choice_of_fp is a list containing the fluorophores you want to plot.
You can do the following; it uses fluorophore as the index and plots all the remaining columns.
abs_data.set_index('fluorophore').plot()
If you want to restrict the plot to certain fluorophores, you can filter first:
abs_data[abs_data.fluorophore.isin(['A', 'B'])].set_index('fluorophore').plot()
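One way to get the layout asked for (wavelengths on the x-axis, one labelled line per chosen fluorophore) is to select rows by index label and transpose before plotting. This is a sketch on made-up data that assumes fluorophore is the index, as described in the question; the values and the third label 'HEX' are invented.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

wavelengths = [400.0, 400.5, 401.0, 401.5]
df = pd.DataFrame(
    [[0.10, 0.12, 0.15, 0.19],
     [0.30, 0.28, np.nan, 0.25],  # NaN values are not drawn
     [0.05, 0.06, 0.07, 0.08]],
    index=pd.Index(["FAM", "TET", "HEX"], name="fluorophore"),
    columns=wavelengths,
)

chosen = ["FAM", "TET"]
# Rows become columns after .T, so each fluorophore is drawn as its own line
# and the wavelengths end up on the x-axis.
ax = df.loc[chosen].T.plot()
ax.set_xlabel("wavelength")
```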
I have thousands of data points for two values, Tm1 and Tm2, for a series of text labels of this form:
Tm1 Tm2
ID
A01 51 NaN
A03 51 NaN
A05 47 52
A07 47 52
A09 49 NaN
I managed to create a pandas DataFrame with the values from a CSV file. I now want to plot Tm1 and Tm2 as y values against the text IDs as x values in a scatter plot, with different color dots, in pandas/matplotlib.
With a test case like this I can get a line plot:
from pandas import *
df2= DataFrame([52,54,56],index=["A01","A02","A03"],columns=["Tm1"])
df2["Tm2"] = [None,42,None]
Tm1 Tm2
A01 52 NaN
A02 54 42
A03 56 NaN
I want to not connect the individual values with lines and just have the Tm1 and Tm2 values as scatter dots in different colors.
When I try to plot using
df2.reset_index().plot(kind="scatter",x='index',y=["Tm1"])
I get an error:
KeyError: u'no item named index'
I know this is a very basic plotting command, but I'm sorry, I have no idea how to achieve this in pandas/matplotlib. The scatter command does need an x and a y value, but I am somehow missing a key pandas concept needed to understand how to do this.
I think the problem here is that you are trying to plot a scatter graph against a non-numeric series. That will fail - although the error message you are given is so misleading that it could be considered a bug.
You could, however, explicitly set the xticks to use one per category and use the second argument of xticks to set the xtick labels. Like this:
import matplotlib.pyplot as plt
df1 = df2.reset_index() #df1 will have a numeric index, and a
#column named 'index' containing the index labels from df2
plt.scatter(df1.index,df1['Tm1'],c='b',label='Tm1')
plt.scatter(df1.index,df1['Tm2'],c='r',label='Tm2')
plt.legend(loc=4) # Optional - show labelled legend, loc=4 puts it at bottom right
plt.xticks(df1.index,df1['index']) # explicitly set one tick per category and label them
# according to the labels in column df1['index']
plt.show()
I've just tested it with 1.4.3 and it worked OK
For the example data you gave, this yields:
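Assuming a reasonably recent pandas, a shorter route for this case is DataFrame.plot with a marker-only style, which draws each column in its own color against the index labels. This is a sketch on the test frame from the question, not the approach used in the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    {"Tm1": [52, 54, 56], "Tm2": [np.nan, 42, np.nan]},
    index=["A01", "A02", "A03"],
)

# style="o" draws circle markers with no connecting lines, one color per column.
ax = df2.plot(style="o")

# Make sure every ID gets its own tick and label.
ax.set_xticks(range(len(df2)))
ax.set_xticklabels(df2.index)
```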