I am trying to plot a large set of values against time. My dataset spans over 46 days and includes data for every second of the day. Since the plots are incomprehensible when plotted directly, I tried to group them. The groupby function in pandas works fine as long as one needs aggregates or summary statistics. I tried the following command, but it just gives a blob on the plot and does not do what I want.
df1 = df.groupby(pd.Grouper(key='time', freq='7D'))['values']
Is there a way to group the data according to a column and then add it in a new column?
I also tried plots after making time the index, but that also does not help.
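One possible way to approach this, sketched on synthetic data: store a bin label in a new column, broadcast a per-bin statistic back onto every row with transform, or downsample with resample before plotting. The 3-day span, the 1-day bin width, and the column names bin and bin_mean are assumptions made for illustration; only time and values come from the question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: one reading per second for 3 days (assumed).
rng = np.random.default_rng(0)
time = pd.date_range("2021-01-01", periods=3 * 24 * 3600, freq="s")
df = pd.DataFrame({"time": time, "values": rng.normal(size=len(time))})

# Label each row with the start of its 1-day bin and keep it as a new column.
df["bin"] = df["time"].dt.floor("D")

# Broadcast the per-bin mean back onto every row (result has the same length as df).
df["bin_mean"] = df.groupby("bin")["values"].transform("mean")

# Alternatively, reduce the data to one point per hour before plotting.
hourly = df.resample("h", on="time")["values"].mean()
```

Plotting hourly (for example with hourly.plot()) draws 72 points instead of 259,200, which is usually far easier to read. The bin width is a judgment call that depends on the features you want to keep visible.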
Related
I am trying to visualize measured data using Pyplot.
The data is stored in a dataframe from Pandas like this:
Dataframe
So I want to plot the outputs over the respective date and time.
Just plotting it works great, however there is an issue that I cannot fix.
Sometimes the measurements are stopped and resumed at later time. That means that during some measurements there is a gap in the date (e.g. 1 or 2 days skipped).
When I now plot the above dataframe, there is a large region that is skipped due to no data being present.
Graph with gap
Is there any way to change that, so the gap is closed and all the data is presented in a better way?
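A common workaround, sketched here under assumptions, is to plot against the row position instead of the timestamp and then label selected positions with the real times. The column names time and output and the sample values are hypothetical, since the original dataframe is only shown as an image.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: two measurement runs separated by a two-day pause.
run1 = pd.date_range("2021-03-01 08:00", periods=6, freq="h")
run2 = pd.date_range("2021-03-03 08:00", periods=6, freq="h")
df = pd.DataFrame({"time": run1.append(run2), "output": np.arange(12.0)})

# Plot against the row position, so consecutive samples sit side by side
# regardless of how much time passed between them.
positions = np.arange(len(df))
fig, ax = plt.subplots()
ax.plot(positions, df["output"], marker="o")

# Label every third position with its real timestamp.
tick_idx = positions[::3]
labels = df["time"].iloc[tick_idx].dt.strftime("%m-%d %H:%M").tolist()
ax.set_xticks(tick_idx)
ax.set_xticklabels(labels, rotation=45)
fig.tight_layout()
```

The trade-off is that the x axis is no longer linear in time, so distances along it stop being meaningful across the gap. Marking the break, for example with a vertical line at the first sample of the second run, helps readers notice it.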
I'm trying to create a basic scatterplot with pandas. I have two dataframes with identical structures, column numbers, headers etc. The only difference is that one DF has a couple of dozen rows, and the other has several thousand rows. When I create a scatterplot for the first DF I get this, exactly what I want:
But when I enter the same commands for the second, larger dataframe, it comes out wonky (for lack of a better term), specifically the y-axis. It is also not showing all of the data points:
I've messed around with the headings, limits, etc., and I can't get the second scatterplot to work. Any ideas? I'm wondering if there's some property of the second dataframe that I cannot figure out that's throwing it off...
I'm pretty new to Spotfire and I have to create some bar and line charts like these, which I made with Python and matplotlib:
bar chart from python
line chart from python
To create those graphs, I built a set of unique values for the x axis containing the different sprints of stories (see Jira and the agile method for more information), and I created 3 lists (begin, planned and ends) gathering all of the business values for each sprint occurrence. Then I created a pandas dataframe from my 4 new lists and used its columns with matplotlib to draw the graphs (the second graph shows the cumulative sum of begin and end business values per sprint).
My question is: is it possible to create a list of unique values for the x axis in Spotfire, and how can I create a data table from another data table, just like I did for the Python graphs? All I can get for the moment using Spotfire is this:
bar and line charts from spotfire
I already tried to merge the graphs of the same category in order to get the same result; however, the two x axes (begin and ends) do not have the same number of values, and I get errors if I try. If anyone has a solution to that problem, it would be great.
PS: I can't share any data files because I'm working for a company and some of the data could be confidential. Sorry for any clerical errors; I'm French.
I have tried to create a scatter plot with grouped boxplots like the ones in the following links:
matplotlib: Group boxplots
https://cmdlinetips.com/2019/03/how-to-make-grouped-boxplots-in-python-with-seaborn/
how to make a grouped boxplot graph in matplotlib
However, the data I want to use comes in a format as:
5y_spreads
7y_spreads
10y_spreads
(each of the images above comes from a different worksheet in the same workbook)
I need to work the data in Python to make it ready for seaborn and that is what is difficult for me.
It is not structured as in the examples from the links. I understand this requires mastering dataframes (something I am still learning).
I also need to show the latest value to see where the bonds are trading now, compared to the range.
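The worksheets are only shown as images, so the following is a sketch under a guessed layout: each sheet is assumed to hold a date column plus one spread column per bond. The sheet names, bond names, and numbers are all hypothetical. It shows how melt can turn such wide sheets into one long table of the kind seaborn expects, and how to pull out the latest value per bond.

```python
import pandas as pd

# Stand-in for pd.read_excel(path, sheet_name=None), which returns a dict
# mapping sheet names to DataFrames. Layout, names and values are assumed.
dates = pd.to_datetime(["2020-01-01", "2020-01-02"])
sheets = {
    "5y": pd.DataFrame({"date": dates, "bond_a": [100.0, 110.0], "bond_b": [200.0, 190.0]}),
    "10y": pd.DataFrame({"date": dates, "bond_a": [150.0, 155.0], "bond_b": [250.0, 240.0]}),
}

# Reshape every sheet from wide (one column per bond) to long
# (one row per date/bond pair) and remember which sheet it came from.
frames = []
for tenor, wide in sheets.items():
    long = wide.melt(id_vars="date", var_name="bond", value_name="spread")
    long["tenor"] = tenor
    frames.append(long)
tidy = pd.concat(frames, ignore_index=True)

# Latest observation per (tenor, bond), to overlay on the boxplots.
latest = tidy.sort_values("date").groupby(["tenor", "bond"], as_index=False).last()
```

With the data in this shape, a call such as seaborn.boxplot(data=tidy, x="bond", y="spread", hue="tenor") should give grouped boxes, and the rows in latest can be drawn on top as markers to show where each bond trades now relative to its range.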
Currently, I am working on a project and would like to plot data from a logger on a daily basis. The output is a .csv file in which one column contains the Date/Time stamp,
e.g. 2018-10-15 10:00. The other columns contain data in float format. The stamps are written automatically at 10-minute intervals from 00:00 until 23:50.
I am looking to analyze the data and group it by day using groupby(), and then compute the mean and standard deviation for each day. I want to plot the mean and standard deviation for several years as a scatter or line graph. The major ticks should be years or months, with days as minor ticks.
On a daily basis, I want to compare the variation of the mean within a certain month and plot it against the entire time interval, with hours as major ticks and 10-minute intervals as minor ticks. I want to be able to put this in a for loop if possible.
To be honest, I've tried a lot of different possibilities, but I can't achieve everything with only one. If I could, I would prefer not to set the Date/Time column as the index with set_index(), so it is easier to apply the grouping. I am using the pandas module for my whole analysis for convenience.
I would be really happy for any guidance.
Thank you very much!
Just a couple of pointers:
When reading the csv with pd.read_csv (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) you can specify which columns contain date/times:
df = pd.read_csv('myfile.csv', parse_dates=['date'])
Then you can use .dt to access date/time specific features, see: https://pandas.pydata.org/pandas-docs/stable/api.html#datetimelike-properties
So you can add a column with only day numbers, like:
df['day'] = df['date'].dt.dayofyear
Then you can group by this new column.
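Putting these pointers together, a minimal sketch could look like the following. It builds synthetic 10-minute data in memory instead of reading a file, and the column name temp is an assumption. It groups by day of year as suggested, and also by calendar date.

```python
import numpy as np
import pandas as pd

# Synthetic logger data: 10-minute stamps for two full days (column names assumed).
stamps = pd.date_range("2018-10-15 00:00", periods=2 * 144, freq="10min")
df = pd.DataFrame({"date": stamps, "temp": np.arange(len(stamps), dtype=float)})

# Group by day of year and compute mean and standard deviation per day.
df["day"] = df["date"].dt.dayofyear
daily = df.groupby("day")["temp"].agg(["mean", "std"])

# Grouping by calendar date instead keeps the same day of different years apart.
per_date = df.groupby(df["date"].dt.date)["temp"].agg(["mean", "std"])
```

Note that dayofyear repeats every year, so with several years of data the first grouping pools, for example, every 15 October together. The per_date variant is probably closer to what is wanted for a multi-year plot. Neither needs set_index().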