I am trying to plot random rows in a dataset, where the data consists of data collated across different dates. I have plotted it in such a way that the x-axis is labelled for the specific dates, and there is no interpolation between dates.
The issue I am having is that the values plotted by matplotlib do not match the values in the dataset. I am unsure what is happening here. Would anyone be able to provide some insight, and possibly suggest how to fix it?
I have attached an image of the dataset and the plot, with the code contained below.
The code for generating the x-ticks is as follows:
In: #creating a flat dates object such that dates are integer objects
flat_Dates_dates = flat_Dates[2:7]
flat_Dates_dates
Out: [20220620, 20220624, 20220627, 20220701, 20220708]
In: #creating datetime object(pandas, not datetime module) to only plot specific dates and remove interpolation of dates
date_obj_pd = pd.to_datetime(flat_Dates_dates, format=("%Y%m%d"))
Out: DatetimeIndex(['2022-06-20', '2022-06-24', '2022-06-27', '2022-07-01',
'2022-07-08'],
dtype='datetime64[ns]', freq=None)
As you can see from the dataset, the plotted trends should not take that form: the plotted values are wildly different from the values in the dataset.
Edit: Apologies, I forgot to mention x = date_obj_pd - which is why I added the code, essentially just the array of datetime objects.
y is just the name of the pandas DataFrame (data table) I have included in the image.
You are plotting columns instead of rows. The blue line contains elements 1:7 from the first column, namely these:
If you transpose the dataframe you should get the desired result:
plt.plot(x, y[1:7].transpose(), 'o--')
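As a minimal sketch of why the transpose matters (the small DataFrame below is made up for illustration and is not the asker's data): when matplotlib is given a 2-D array of y values, it draws one line per column, so the rows you want as lines have to become columns first.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Five dates on the x-axis, as in the question.
x = pd.to_datetime(
    ["20220620", "20220624", "20220627", "20220701", "20220708"], format="%Y%m%d"
)

# Hypothetical data: 3 rows (the series we want as lines), one column per date.
y = pd.DataFrame(np.arange(15).reshape(3, 5), index=["a", "b", "c"])

fig, ax = plt.subplots()
# After transposing, the shape is (5 dates, 3 series): one line per original row.
lines = ax.plot(x, y.T.to_numpy(), "o--")
```

Note that x needs as many entries as the transposed array has rows, otherwise matplotlib raises a shape mismatch error.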
I have got the below plot of temperature in a time series dates aggregated hourly.
What I am trying to do is to interpolate the missing values between 2019 and 2020, using pandas pd.interpolate, and generate results hourly (same frequency as the rest of the data in weather_data). My data is called weather_data, the index column is called date_time (dtype is float64) and the temperature column has also got float64 as the dtype. Here is what I have tried:
test = weather_datetime_index.temperature.interpolate("cubicspline")
test.plot()
This gave the same plot. I also tried (based on this post):
interpolated_temp = weather_datetime_index["temperature"].astype(float).interpolate(method="time")
This still gave the same plot.
I also tried (as per this post):
test = weather_datetime_index.temperature.interpolate("spline",limit_direction="forward", order=1)
test.plot()
but still gave me the same plot.
How can I interpolate this data using pd.interpolate?
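One possible explanation, offered as an assumption because the data is not shown: if the hours between 2019 and 2020 are absent from the index altogether, rather than present with NaN temperatures, interpolate() has nothing to fill and the plot will not change. The sketch below uses a synthetic hourly series (not weather_data) with a DatetimeIndex, and reindexes to a regular hourly grid with asfreq() before interpolating.

```python
import numpy as np
import pandas as pd

# Synthetic hourly series standing in for the temperature column.
full_index = pd.date_range("2019-12-30", periods=96, freq="h")
temperature = pd.Series(np.arange(96, dtype=float), index=full_index)

# Simulate the gap: the rows are missing entirely, not stored as NaN.
gappy = temperature.drop(full_index[24:72])

# interpolate() only fills NaN values, so here it changes nothing.
unchanged = gappy.interpolate(method="time")

# Put the data on a regular hourly grid first; absent hours become NaN.
regular = gappy.asfreq("h")

# Now there are NaN values to fill, weighted by the time between points.
filled = regular.interpolate(method="time")
```

If the real index is still float64, it would need converting to datetimes first, since method="time" and asfreq() both rely on a DatetimeIndex.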
I'm trying to plot a multi-line graph from a pandas dataframe using seaborn. Below is a .csv of the data and the desired plot. In Excel I simply selected the whole dataset and swapped the axes. Technically there are 110 lines (rows) on this, but many aren't visible because they only contain 0's.
This is my code:
individual_burst_data = {'nb001':nb001, 'nb002':nb002, 'nb003':nb003, 'nb004':nb004, 'nb005':nb005, 'nb006':nb006, 'nb007':nb007, 'nb008':nb008, 'nb009':nb009, 'nb010':nb010, 'nb011':nb011, 'nb012':nb012, 'nb013':nb013, 'nb015':nb015, 'nb016':nb016 }
ibd_panda_conv = pd.DataFrame(individual_burst_data)
sns.lineplot(data = ibd_panda_conv, x = individual_burst_data, y =ibd_panda_conv)
Other sources seem to only extract one column, whereas I need all the columns.
I tried to create an index for the y-axis
index_data = list(range(0,len(individual_burst_data)))
but this didn't work either.
The seaborn lineplot() documentation says:
Passing the entire wide-form dataset to data plots a separate line for each column
Since you want a line for each row instead, you need to transpose your dataframe, so try this:
sns.lineplot(data=ibd_panda_conv.T, dashes=False)
The goal is to plot the data frame I'm working with on a single chart, with a line for each value of init_population where the y-axis is count and x-axis is tick_number.
I've figured out how to use groupby() and plot() together to make this:
As you can see, all the lines are there nicely, but I'm pretty confident that the blue at the top that doesn't follow the relationship the other lines are following is actually a different column of data.
So that this is reproducible, the data is available here.
import pandas as pd
max_runs_data = pd.read_csv('clean_table.csv')
del max_runs_data['visualization']
max_runs_data.columns = ['run_number','init_population', 'tick', 'turtle_count']
max_runs_data.set_index('tick', inplace = True)
test_plot_1 = max_runs_data.groupby('init_population')['turtle_count'].plot()
test_plot_2 = max_runs_data.groupby('init_population').plot(y='turtle_count')
test_plot_1 is the linked image, test_plot_2 is a separate plot for each group.
Is it obvious how to specify the columns for x and y without losing the grouping on a single chart?
Thanks
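One way to name the x and y columns explicitly while keeping every group on a single chart is to loop over the groups and draw each onto a shared Axes. This is a sketch with a made-up table in place of clean_table.csv, and it assumes tick is kept as an ordinary column rather than set as the index.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic stand-in for clean_table.csv; the values are illustrative only.
max_runs_data = pd.DataFrame({
    "init_population": [10, 10, 10, 20, 20, 20],
    "tick": [0, 1, 2, 0, 1, 2],
    "turtle_count": [10, 12, 15, 20, 26, 33],
})

fig, ax = plt.subplots()
# One line per init_population value, all drawn on the same Axes.
for population, group in max_runs_data.groupby("init_population"):
    group.plot(x="tick", y="turtle_count", ax=ax,
               label=f"init_population={population}")

ax.set_xlabel("tick")
ax.set_ylabel("turtle_count")
```

Because x and y are passed by name for each group, no other column can end up on the chart by accident.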
I have a bunch of data files, each containing a number of samples per time frame that depends on the system. For instance, at time t=1 one file might have 10 items and another 20, but within a given file every later time has the same number of items. The format is time, x, y, z in columns, loaded into a numpy array. The time value identifies the frame, and as mentioned the number of items per frame is constant; let's go with 10 as an example. So each frame is a (10,4) numpy array in which the time values are identical, but there are many frames in the file, say 100, so really I have a (1000,4) array.
I want to plot the data with time on the x-axis and manipulations of the other columns on the y-axis, but I am unsure how to do this with the line plot methods in matplotlib. Normally, to provide both x and y values, I believe I need to do a scatter plot, so I'm hoping there's a better way. Ideally, I want to treat each row within a frame as a different series (so it is coloured differently), and the row at the same position in the next frame (time value) should get the same colour, giving nice contiguous lines. We can look at the time column to figure out how many items share a time code; let's call that number "n". Sample code:
a = numpy.loadtxt('sampledata.txt')
plt.plot(a[:0,:,n],a[:1,:1])
plt.show()
I think this code expresses what I'm going for, though it doesn't work.
Edit:
I hope this is what you wanted.
seaborn's scatterplot can group data points that share the same code (the time code in this case) and draw each group in the same color.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv(r"E:\Programming\Python\Matplotlib\timecodes.csv",
names=["time","x","y","z","code"]) #use your file
df["time"]=pd.to_datetime(df["time"]) #recognize the data as Time
df["x"]=df["time"].dt.day # I changed the data into "Date only" and imported to x column. Easier to see on graph.
#just used random numbers in y and z in my data.
sns.scatterplot(x="x", y="y", data=df, hue="code") #hue does the grouping; x and y must be keyword arguments in recent seaborn versions
plt.show()
I used a csv file here, but you can do the same with your text file by adding sep="\t" to the arguments. I also added a code column to the file. If you have one, it can be used to group the data in the graph, so you don't have to split the data or build a hierarchical index. If you want to change the colors or the grouping, please see the seaborn website.
Hope this helps.
As an alternative, here is the method I used, although Tim's answer is accurate as well. Since the time codes are not date/time information, I modified my own code to add tags as a second column that I call "p" (they're polymers).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
datain = np.loadtxt('somefile.txt')
df = pd.DataFrame(data = datain, columns = ["t","p","x","y","z"])
ax = sns.scatterplot(x="t", y="x", data = df, hue = "p") # hue colours the points by polymer tag
plt.show()
And of course the other columns can be plotted similarly if desired.
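For connected lines rather than scatter points, here is a sketch that works on the original (1000,4) layout with no tag column. It uses synthetic data and assumes the rows are ordered frame by frame, with the items in the same order inside every frame.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the loaded file: columns are time, x, y, z.
n_frames, n_items = 100, 10
time = np.repeat(np.arange(n_frames, dtype=float), n_items)
rng = np.random.default_rng(0)
a = np.column_stack([time, rng.normal(size=(n_frames * n_items, 3))])

# "n" from the question: how many rows share the first time code.
n = int(np.sum(a[:, 0] == a[0, 0]))

# Split into frames; shape becomes (frames, items, columns).
frames = a.reshape(-1, n, 4)

fig, ax = plt.subplots()
# Given 2-D x and y, plot() draws one line per column, i.e. one per item.
lines = ax.plot(frames[:, :, 0], frames[:, :, 1])
```

Each item gets its own colour from the default colour cycle, and its points are joined across frames.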
I have data of shipping dates (1=Jan, 2=Feb, etc.) and the revenue corresponding to each in a pandas dataframe.
Data Frame Here
My code for the line graph that I am trying to make is:
finalhelp.plot(x='shippeddate',y='revenue',title='Revenue Per Month')
It returns a line graph like this
linegraph
I tried to fix it by using the code
fig = finalhelp.plot(x='shippeddate',y='revenue',title='Revenue Per Month',yticks=([0,20000,40000,60000,80000,100000]), legend=False,)
fig.set_xticklabels(['','Jan','Feb','March','April','May','June','July','August','Sept','Oct','Nov','Dec'])
I would like each x-axis tick to be labelled with its corresponding month; right now it still shows only Jan-June.
It returns this image
newlinegraph
You need to call both set_xticks and set_xticklabels, so that there is one tick per month before the labels are applied:
fig.set_xticks(finalhelp['shippeddate'])
fig.set_xticklabels(['Jan','Feb','March','April','May','June','July','August','Sept','Oct','Nov','Dec'])
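Put together as a runnable sketch (the revenue figures are invented, and the object returned by plot() is named ax here because it is a matplotlib Axes):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

months = ["Jan", "Feb", "March", "April", "May", "June",
          "July", "August", "Sept", "Oct", "Nov", "Dec"]

# Made-up revenue values, one per month number 1..12.
finalhelp = pd.DataFrame({
    "shippeddate": range(1, 13),
    "revenue": [52000, 48000, 61000, 70000, 66000, 73000,
                81000, 79000, 68000, 90000, 95000, 99000],
})

ax = finalhelp.plot(x="shippeddate", y="revenue",
                    title="Revenue Per Month", legend=False)

# Fix the tick positions first, then label them one-to-one.
ax.set_xticks(finalhelp["shippeddate"])
ax.set_xticklabels(months)
```

Setting the tick positions first matters because set_xticklabels only renames the ticks that already exist; with the default ticks there are fewer than twelve, so only the first few month names are used.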