I have a very simple data frame but I could not plot a line using a row and a column. Here is an image, I would like to plot a "line" that connects them.
enter image description here
I tried to plot it but x-axis disappeared. And I would like to swap those axes. I could not find an easy way to plot this simple thing.
Try:
import matplotlib.pyplot as plt
# Categories will be x axis, sexonds will be y
plt.plot(data["Categories"], data["Seconds"])
plt.show()
Matplotlib generates the axis dynamically, so if you want the labels of the x-axis to appear you'll have to increase the size of your plot.
Related
I am plotting 100 data points for 9 different groups. One group's data points are much larger than all the other groups so when I make a box graph using pandas only that group is shown, while all other groups are smashed to the bottom. Here is what it looks like now: smushed box plot
I would like the Y axis to be more spaced out so that I can see the other groups' box graphs. Here is similar data in a scatter plot that has the spacing I am looking for: well spaced scatter plot
What I have
What is need
Here is my code at the moment:
# use ``` to designate a code block in markdown
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("residues.csv")
df.plot.box()
plt.show()
It looks like you want y to be log-scaled:
df.plot.box(logy=True)
Try this:
boxplot = df.boxplot(column=df.columns)
plt.show()
Reference
See the pandas documentation on boxplot: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.boxplot.html
I've got a pandas dataframe with a bunch of values in and I want to plot each axis against each axis to get plots of every column against one another. Furthermore, I'm having an issue of the values of my y axis being so condensed that's it's unreadable. I've tried changing the height but have no clue how to "clean up" this axis.
Here is my plotting code:
import seaborn as sns
grid = sns.pairplot(df_merge, dropna = True, height=1.5)
Then here is the graph that has been plotted.
I am new to python and trying to plot a color magnitude diagram(CMD) for a selected cluster by matplotlib, there are 3400000 stars that I need to plot, the data for each star would be color on x axis and magnitude on y axis, However, my code should read two columns in a csv file and plot. The problem is when I using a part of the data (3000 stars), I can plot a CMD succesfully but when I use all the data, the plot is very mess(see figure below) and it seems that points are ploted by their positions in the column instead of its value. For example, a point has data (0.92,20.64) should be close to the y-axis, but is actually located at the far right of the plot just becasue it placed at last few columns of the dataset. So I wanna know how can I plot the entire dataset and show a plot like the first figure.Thanks for yout time. These are my codes:
import matplotlib.pyplot as plt
import pandas as pd
import csv
data = pd.read_csv(r'C:\Users\Peter\Desktop\F275W test.csv', low_memory=False)
# Generate some test data
x = data['F275W-F336W']
y = data['F275W']
#remove the axis
plt.axis('off')
plt.plot(x,y, ',')
plt.show()
This is the plot I got for 3000 stars it's a CMD
This is the plot I got for entire dataset, which is very mess
This may be a very stupid question, but when plotting a Pandas DataFrame using .plot() it is very quick and produces a graph with an appropriate index. As soon as I try to change this to a bar chart, it just seems to lose all formatting and the index goes wild. Why is this the case? And is there an easy way to just plot a bar chart with the same format as the line chart?
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame()
df['Date'] = pd.date_range(start='01/01/2012', end='31/12/2018')
df['Value'] = np.random.randint(low=5, high=100, size=len(df))
df.set_index('Date', inplace=True)
df.plot()
plt.show()
df.plot(kind='bar')
plt.show()
Update:
For comparison, if I take the data and put it into Excel, then create a line plot and a bar ('column') plot it instantly will convert the plot and keep the axis labels as they were for the line plot. If I try to produce many (thousands) of bar charts in Python with years of daily data, this takes a long time. Is there just an equivalent way of doing this Excel transformation in Python?
Pandas bar plots are categorical in nature; i.e. each bar is a separate category and those get their own label. Plotting numeric bar plots (in the same manner a line plots) is not currently possible with pandas.
In contrast matplotlib bar plots are numerical if the input data is numbers or dates. So
plt.bar(df.index, df["Value"])
produces
Note however that due to the fact that there are 2557 data points in your dataframe, distributed over only some hundreds of pixels, not all bars are actually plotted. Inversely spoken, if you want each bar to be shown, it needs to be one pixel wide in the final image. This means with 5% margins on each side your figure needs to be more than 2800 pixels wide, or a vector format.
So rather than showing daily data, maybe it makes sense to aggregate to monthly or quarterly data first.
The default .plot() connects all your data points with straight lines and produces a line plot.
On the other hand, the .plot(kind='bar') plots each data point as a discrete bar. To get a proper formatting on the x-axis, you will have to modify the tick-labels post plotting.
I am trying to create two interactive plots where the first plot is simply a plot of x and y and the second plot is a subplot which plots dates (fulldate) on its x axis, which correspond to the integer values of x (x axis values) from the first plot.
This code almost does what I want. The only problem is that the dates are not linked to the integers, so when I use the zoom function on the graph, it zooms into the first plot and the subplot is linked and zooms also, but the dates stay stationary and therefore are completely inaccurate.
note that this is just a simplified version of my program. i will be rearranging the dates on the bottom display to got on the bottom.
The integers and the dates must be linked because in my actual program i will be using integers to keep count of the days in the time series.
import matplotlib.pyplot as plt
import seaborn as sns
x=[1,5,7,4,6]
y=[1,3,8,4,6]
fulldate=['01/01/2018','02/01/2018','03/01/2018','04/01/2018','05/01/2018']
with sns.axes_style("darkgrid"):
ax1=plt.subplot2grid((6,1),(0,0),rowspan=3,colspan=1)
ax2=plt.subplot2grid((6,1),(4,0),rowspan=1,colspan=1,sharex=ax1)
ax2v = ax2.twiny()
ax1.plot(x,y)
ax2v.fill_between(fulldate,'Dates')
for label in ax2v.xaxis.get_ticklabels():
label.set_rotation(60)