plotly multiple lines chart with a varying dataframe - python

I'm trying to write a function that plots all of a DataFrame's content.
DF Sample:
print(df2)
Close Close Close
Date
2018-12-12 00:00:00-05:00 53.183998 24.440001 104.500504
2018-12-13 00:00:00-05:00 53.095001 25.119333 104.854973
2018-12-14 00:00:00-05:00 52.105000 24.380667 101.578560
2018-12-17 00:00:00-05:00 50.826500 23.228001 98.570381
2018-12-18 00:00:00-05:00 51.435501 22.468666 99.605042
Python:
fig = px.line(df2, x=df2.index, y=df2.columns[1:])
I'm trying to plot it but get this error:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
My data frame may have various numbers of columns, so I need my code to plot all columns.
By the way:
print(df2.columns[1:])
Index(['Close', 'Close'], dtype='object')

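Try changing the column names so that each one is unique. With three columns all named Close, selecting df2['Close'] returns a DataFrame instead of a single Series, which is most likely what triggers the "truth value of a DataFrame is ambiguous" error. I ran the code above on this data with unique column names and got the expected plot.
You can also use the date index as the x-axis of the plot. Plotly will then draw a time-series chart.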
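A minimal sketch of the renaming step, assuming it is acceptable to append a numeric suffix to repeated names. The helpers make_unique_columns and plot_all_columns are hypothetical names, not part of pandas or Plotly, and the two sample rows are copied from the question. Note that y=df2.columns[1:] in the question skips the first column, so the sketch passes every column instead.

```python
import pandas as pd


def make_unique_columns(df):
    """Return a copy of df whose repeated column names get a numeric suffix."""
    seen = {}
    new_names = []
    for name in df.columns:
        n = seen.get(name, 0)
        # First occurrence keeps its name; later ones become name_1, name_2, ...
        new_names.append(name if n == 0 else f"{name}_{n}")
        seen[name] = n + 1
    out = df.copy()
    out.columns = new_names
    return out


def plot_all_columns(df):
    """Draw one line per column of df against its index."""
    import plotly.express as px  # imported here so the helper above runs without plotly

    unique = make_unique_columns(df)
    return px.line(unique, x=unique.index, y=list(unique.columns))


# Two rows from the question, with the same duplicated column names.
df2 = pd.DataFrame(
    [[53.183998, 24.440001, 104.500504],
     [53.095001, 25.119333, 104.854973]],
    index=pd.to_datetime(["2018-12-12", "2018-12-13"]).rename("Date"),
    columns=["Close", "Close", "Close"],
)
renamed = make_unique_columns(df2)
print(list(renamed.columns))
```

Calling plot_all_columns(df2).show() should then display the chart. The suffix scheme is only one option; it would clash if a column were already called Close_1, and using ticker symbols as names would usually be more readable.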

Related

How can I plot large amount of data? [duplicate]

I've got pandas DataFrame, df, with index named date and the columns columnA, columnB and columnC
I am trying to scatter plot index on a x-axis and columnA on a y-axis using the DataFrame syntax.
When I try:
df.plot(kind='scatter', x='date', y='columnA')
I am getting an error KeyError: 'date', probably because the date is the index and not a column.
df.plot(kind='scatter', y='columnA')
I am getting an error:
ValueError: scatter requires and x and y column
so no default index on x-axis.
df.plot(kind='scatter', x=df.index, y='columnA')
I am getting the error:
KeyError: "DatetimeIndex(['1818-01-01', '1818-01-02', '1818-01-03', '1818-01-04',\n
'1818-01-05', '1818-01-06', '1818-01-07', '1818-01-08',\n
'1818-01-09', '1818-01-10',\n ...\n
'2018-03-22', '2018-03-23', '2018-03-24', '2018-03-25',\n
'2018-03-26', '2018-03-27', '2018-03-28', '2018-03-29',\n
'2018-03-30', '2018-03-31'],\n
dtype='datetime64[ns]', name='date', length=73139, freq=None) not in index"
I can plot it if I use matplotlib.pyplot directly
plt.scatter(df.index, df['columnA'])
Is there a way to plot index as x-axis using the DataFrame kind syntax?
This is kind of ugly (I think the matplotlib solution you used in your question is better, FWIW), but you can always create a temporary DataFrame with the index as a column using
df.reset_index()
If the index is unnamed, the new column will be called 'index'; if it has a name (such as date in the question), the column takes that name instead. Assuming an unnamed index, you could use
df.reset_index().plot(kind='scatter', x='index', y='columnA')
A simpler solution would be:
df['x1'] = df.index
df.plot(kind='scatter', x='x1', y='columnA')
Just create the index variable outside of the plot statement.
At least in pandas > 1.4, the easiest option is this:
df['columnA'].plot(style=".")
This lets you mix scatter and line plots, and it uses the standard pandas plot interface.
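To make the naming behaviour of reset_index() concrete, here is a small sketch with made-up values and the index name date from the question. The helper scatter_against_index is a hypothetical name, and it assumes a pandas version recent enough to accept datetime values on the x-axis of a scatter plot.

```python
import pandas as pd

# Hypothetical data: three rows with a DatetimeIndex named "date".
df = pd.DataFrame(
    {"columnA": [1.0, 3.0, 2.0]},
    index=pd.date_range("2018-03-29", periods=3, name="date"),
)

named = df.reset_index()                      # index name becomes the column name
unnamed = df.rename_axis(None).reset_index()  # unnamed index becomes "index"
print(list(named.columns))
print(list(unnamed.columns))


def scatter_against_index(frame, y):
    """Scatter-plot column y against the index, whatever the index is called."""
    x_name = frame.index.name or "index"
    return frame.reset_index().plot(kind="scatter", x=x_name, y=y)
```

With the question's DataFrame, scatter_against_index(df, "columnA") would pick x='date' automatically.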

Using seaborn how do I plot a column which has 70+ categories

I am trying to plot a column from a dataframe. There are about 8500 rows and the Assignment group column has about 70+ categories. How do I plot this visually using seaborn to get some meaningful output?
nlp_data['Assignment group'].hist(figsize=(17,7))
I used the hist() method to plot
You can use a heatmap for such data:
seaborn.heatmap
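As an alternative to the heatmap suggested above (this is a different technique, not what the answer describes), a column with 70+ categories is often easier to read as a sorted count per category, limited to the largest groups. The sketch below is only an illustration: the group labels are invented, plot_top_groups is a hypothetical helper, and only the column name Assignment group comes from the question.

```python
import pandas as pd

# Invented sample; the real column has about 8500 rows and 70+ categories.
nlp_data = pd.DataFrame(
    {"Assignment group": ["GRP_0", "GRP_1", "GRP_0", "GRP_2", "GRP_0", "GRP_1"]}
)

# value_counts() returns the categories sorted from most to least frequent.
counts = nlp_data["Assignment group"].value_counts()
print(counts)


def plot_top_groups(counts, top_n=20):
    """Horizontal bar chart of the top_n most frequent categories."""
    import seaborn as sns  # imported here so the counting above runs without seaborn

    top = counts.head(top_n)
    return sns.barplot(x=top.to_numpy(), y=top.index.tolist(), orient="h")
```

Horizontal bars leave room for long category labels, and top_n keeps the chart legible when there are many small groups.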

unable to plot histogram(hist2d)

Trying to plot length of objects vs total count of objects using hist2d. I am getting the following error. Can you please help me in finding the error.
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
count=799.000000
plt.hist2d(length,count,bins = 10)
plt.xlabel('count')
plt.ylabel('length')
plt.grid(True)
plt.show()
print(length)
1 3.978235
2 4.740024
3 3.470375
4 3.978235
5 3.808948
...
807 5.078597
808 4.655381
809 4.232164
810 4.655381
811 3.470375
Name: length_mm, Length: 799, dtype: float64
I believe the issue in your code is the use of hist2d rather than good old hist. With hist, you don't have to pass the number of items; it counts them from the Series itself:
plt.hist(length, bins=10)
plt.xlabel('length')
plt.ylabel('count')
plt.grid(True)
plt.show()
The result (for a small amount of data) is: [figure not shown]
If, on the other hand, you're looking for a bar chart, here's the way to do it for length:
fig, ax = plt.subplots()
ax.bar(length.index, length)
fig.show()
The result (for limited data, of course) is: [figure not shown]
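As a side note on why hist needs no separate count: the binning it performs can be reproduced with numpy.histogram, which returns the number of values per bin and the bin edges. The sketch below uses only the ten length values printed in the question, so the numbers are not the full 799-row result.

```python
import numpy as np
import pandas as pd

# The ten values visible in the question's print(length) output.
length = pd.Series(
    [3.978235, 4.740024, 3.470375, 3.978235, 3.808948,
     5.078597, 4.655381, 4.232164, 4.655381, 3.470375],
    name="length_mm",
)

# counts[i] is the number of values falling between edges[i] and edges[i + 1].
counts, edges = np.histogram(length, bins=10)
print(counts)
print(edges)
```

The counts always add up to the number of items in the Series, which is the total that the question tried to pass in by hand.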

Plot pandas dataframe using column names as x axis

I have the following pandas Data Frame:
and I need to make line plots using the column names (400, 400.5, 401....) as the x axis and the data frame values as the y axis, and using the index column ('fluorophore') as the label for that line plot. I want to be able to choose which fluorophores I want to plot.
How can I accomplish that?
I do not know your dataset, but if the NaN values always fill entire columns, you could do
df[non_nan_cols].T[['FAM', 'TET']].plot.line()
Where non_nan_cols is a list of your columns that do not contain NaN values.
Alternatively, you could
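import numpy as np
import matplotlib.pyplot as plt

choice_of_fp = df.index.tolist()  # or a subset, e.g. ['FAM', 'TET']
x_val = np.asarray(df.columns.tolist(), dtype=float)  # column names as x values
for i in choice_of_fp:
    y_val = df.loc[i].to_numpy(dtype=float)
    mask = np.isfinite(y_val)  # skip NaN entries for this fluorophore
    plt.plot(x_val[mask], y_val[mask], label=i)
plt.legend()
plt.show()
which tolerates NaN values. Here choice_of_fp is a list containing the fluorophores you want to plot.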
You can do the below, assuming fluorophore is a regular column: it sets fluorophore as the index, transposes so that the column names become the x-axis, and plots one line per fluorophore.
abs_data.set_index('fluorophore').T.plot()
If you want to plot only certain fluorophores, filter the rows first:
abs_data[abs_data['fluorophore'].isin(['A', 'B'])].set_index('fluorophore').T.plot()
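The selection-and-transpose step shared by these answers can be checked without drawing anything. The sketch below uses invented absorbance values and a third, hypothetical fluorophore HEX; only the labels FAM and TET and the wavelength-style column names come from the thread.

```python
import numpy as np
import pandas as pd

# Invented values: rows are fluorophores, columns are wavelengths.
df = pd.DataFrame(
    [[0.1, 0.2, np.nan],
     [0.3, np.nan, 0.5],
     [0.2, 0.4, 0.6]],
    index=pd.Index(["FAM", "TET", "HEX"], name="fluorophore"),
    columns=[400.0, 400.5, 401.0],
)

# Pick the rows to plot, then transpose: wavelengths become the index
# (the x-axis of a line plot) and each fluorophore becomes one column (one line).
selected = df.loc[["FAM", "TET"]].T
print(selected)
```

Calling selected.plot.line() then draws one labelled line per chosen fluorophore, leaving gaps where values are NaN.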

Pandas plot dataframe as scatter complains of unknown item

I have thousands of data points for two values, Tm1 and Tm2, for a series of text labels of this type:
Tm1 Tm2
ID
A01 51 NaN
A03 51 NaN
A05 47 52
A07 47 52
A09 49 NaN
I managed to create a pandas DataFrame with the values from csv. I now want to plot the Tm1 and Tm2 as y values against the text ID's as x values in a scatter plot, with different color dots in pandas/matplotlib.
With a test case like this I can get a line plot
from pandas import *
df2= DataFrame([52,54,56],index=["A01","A02","A03"],columns=["Tm1"])
df2["Tm2"] = [None,42,None]
Tm1 Tm2
A01 52 NaN
A02 54 42
A03 56 NaN
I want to not connect the individual values with lines and just have the Tm1 and Tm2 values as scatter dots in different colors.
When I try to plot using
df2.reset_index().plot(kind="scatter",x='index',y=["Tm1"])
I get an error:
KeyError: u'no item named index'
I know this is a very basic plotting command, but I'm sorry, I have no idea how to achieve this in pandas/matplotlib. The scatter command does need an x and a y value, but I am somehow missing a key pandas concept for doing this.
I think the problem here is that you are trying to plot a scatter graph against a non-numeric series. That will fail - although the error message you are given is so misleading that it could be considered a bug.
You could, however, explicitly set the xticks to use one per category and use the second argument of xticks to set the xtick labels. Like this:
import matplotlib.pyplot as plt
# df1 will have a numeric index, and a column named 'index'
# containing the index labels from df2
df1 = df2.reset_index()
plt.scatter(df1.index, df1['Tm1'], c='b', label='Tm1')
plt.scatter(df1.index, df1['Tm2'], c='r', label='Tm2')
# Optional: show a labelled legend; loc=4 puts it at the bottom right
plt.legend(loc=4)
# Explicitly set one tick per category and label the ticks
# according to the labels in column df1['index']
plt.xticks(df1.index, df1['index'])
plt.show()
I've just tested it with 1.4.3 and it worked OK
For the example data you gave, this yields: [figure not shown]
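The key idea in this answer, plotting against integer positions and using the text IDs only as tick labels, can be seen in isolation. A minimal sketch with the test values from the question (np.nan stands in for the missing Tm2 entries):

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    {"Tm1": [52, 54, 56], "Tm2": [np.nan, 42, np.nan]},
    index=["A01", "A02", "A03"],
)

df1 = df2.reset_index()          # old labels move into a column named "index"
positions = df1.index.to_numpy() # 0, 1, 2, ... used as numeric x values
labels = df1["index"].tolist()   # text IDs used only as tick labels
print(positions, labels)
```

These are the two arguments that plt.xticks(positions, labels) receives, while plt.scatter gets positions as its x values.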
