I'm new to data science and I'm looking for an approach to plot data over time, and I don't know where to start.
Right now I have an SQLite database containing datapoints consisting of a name, a timestamp,
and a value, and it looks like this.
I can convert this database to a CSV file for my Shiny app if necessary.
My goal is to plot the values by "name" over time (the time in the timestamps).
The problem I have is that the series have different frequencies and different start/end times.
I'm looking for an approach in either R or Python, but I prefer R.
I experimented with the R plot function, but I get errors like:
Error in xy.coords(x, y) : 'x' and 'y' lengths differ
Is there a library or approach that can help me achieve this goal?
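Here's roughly what I'm trying, in Python for lack of a better idea (the file name data.db and table name datapoints are placeholders; adjust to the real schema):

    import sqlite3
    import pandas as pd
    import matplotlib.pyplot as plt

    # Placeholder names: adjust the file and table to the real schema
    con = sqlite3.connect("data.db")
    df = pd.read_sql("SELECT name, timestamp, value FROM datapoints",
                     con, parse_dates=["timestamp"])
    con.close()

    # One line per name; each series keeps its own timestamps, so
    # differing frequencies and start/end times are not a problem
    fig, ax = plt.subplots()
    for name, group in df.groupby("name"):
        ax.plot(group["timestamp"], group["value"], label=name)
    ax.legend()
    plt.show()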
This seems like a trivial thing to solve, but every idea I have is very hacky. I have a series of timestamps that spans multiple days. What I'm interested in, is the distribution of these timestamps (events) within 24h: e.g., see whether there are more events in the morning.
My data is in a pandas.DataFrame, and the timestamp column has dtype datetime64[ns]. I tried matplotlib.pyplot.plot(data.timestamp.dt.time), but that gives an error. I also thought of subtracting the date from my timestamps so they all start on 'day 0', and formatting the x-axis in the plot to not show the date. That feels very clumsy. Is there a better way?
If you are interested in the distribution with resolution limited to, e.g., hours, you can:
Create a new column with the hour extracted from your source timestamp.
Group your data by hour.
Generate your plot.
As you didn't post any sample data, the sketch below has to invent some.
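Assuming a DataFrame with a datetime64[ns] column called timestamp (values made up), the three steps look like:

    import pandas as pd
    import matplotlib.pyplot as plt

    # Made-up events; replace with your own DataFrame
    data = pd.DataFrame({
        "timestamp": pd.to_datetime([
            "2023-01-01 08:15", "2023-01-01 09:30", "2023-01-02 08:45",
            "2023-01-02 20:10", "2023-01-03 09:05",
        ])
    })

    # 1. Extract the hour from each timestamp
    data["hour"] = data["timestamp"].dt.hour

    # 2. Group by hour and count the events
    counts = data.groupby("hour").size()

    # 3. Plot the distribution
    counts.plot(kind="bar")
    plt.xlabel("Hour of day")
    plt.ylabel("Number of events")
    plt.show()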
I have imported data from Excel into Python and now want to draw multiple plots on a single figure, but for that I need separate variables like x and y, since plt.plot(x, y) takes them. I have two datasets on which I am doing time series analysis. The first dataset has monthly and yearly data, whose columns I combined into a single column named Year-Month. The second has daily and yearly data, which I merged into a single column named Year-Daily. The dependent variable in both datasets is the number of sunspots.
Now I want to plot the daily and monthly sunspot numbers on a single graph in Python. How can I do that?
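One way this could look (a sketch with made-up sunspot values, assuming the combined Year-Month and Year-Daily columns parse as dates):

    import pandas as pd
    import matplotlib.pyplot as plt

    # Made-up values standing in for the two Excel imports
    monthly = pd.DataFrame({
        "Year-Month": ["1950-01", "1950-02", "1950-03"],
        "Sunspots": [101.3, 94.8, 109.7],
    })
    daily = pd.DataFrame({
        "Year-Daily": ["1950-01-05", "1950-01-20", "1950-02-10",
                       "1950-03-02"],
        "Sunspots": [98.0, 104.0, 92.0, 111.0],
    })

    # Parse the combined columns into real datetimes so both series
    # share one time axis
    monthly["date"] = pd.to_datetime(monthly["Year-Month"])
    daily["date"] = pd.to_datetime(daily["Year-Daily"])

    # Two calls to plot() on the same axes overlay the two series
    fig, ax = plt.subplots()
    ax.plot(daily["date"], daily["Sunspots"], label="Daily")
    ax.plot(monthly["date"], monthly["Sunspots"], label="Monthly")
    ax.set_xlabel("Date")
    ax.set_ylabel("Sunspot number")
    ax.legend()
    plt.show()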
What library are you using to import the data?
I've got a dataset with multiple time values as below.
Area,Year,Month,Day of Week,Time of Day,Hour of Day
x,2016,1,6.0,108,1.0
z,2016,1,6.0,140,1.0
n,2016,1,6.0,113,1.0
p,2016,1,6.0,150,1.0
r,2016,1,6.0,158,1.0
I have been trying to transform this into a single datetime object to simplify the dataset and be able to do proper time series analysis against it.
For some reason I have been unable to get the right outcome using the datetime library from Python. Would anyone be able to point me in the right direction?
Update - Example of stats here.
https://data.pa.gov/Public-Safety/Crash-Incident-Details-CY-1997-Current-Annual-Coun/dc5b-gebx/data
I don't think there is a week column. Hmm. I wonder if I've missed something?
Any suggestions would be great. Really just looking to simplify this dataset. Maybe even create another table/sheet for the causes of crash, as there are a lot of superfluous columns taking up a lot of space, which could be labeled with simple ints.
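If a day of month were available, pandas could assemble the datetime directly; with only a day of week the date is underdetermined, so this sketch pins a placeholder day (an assumption, not a fix):

    import pandas as pd

    # Small sample matching the columns above
    df = pd.DataFrame({
        "Area": ["x", "z"],
        "Year": [2016, 2016],
        "Month": [1, 1],
        "Day of Week": [6.0, 6.0],
        "Time of Day": [108, 140],
        "Hour of Day": [1.0, 1.0],
    })

    # pd.to_datetime can assemble datetimes from year/month/day (and
    # optionally hour) columns, but it needs a day of month. A day of
    # week alone can't identify a calendar date, so a placeholder day
    # is assumed here; replace it if the full data has a day column.
    parts = pd.DataFrame({
        "year": df["Year"],
        "month": df["Month"],
        "day": 1,  # assumption: no day-of-month column in the sample
        "hour": df["Hour of Day"].astype(int),
    })
    df["datetime"] = pd.to_datetime(parts)
    print(df[["Area", "datetime"]])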
I've tried searching for an example of how I might solve this but can't seem to find a specific example that matches my needs.
I'm trying to create multiple separate dataframes (up to 80, depending on the data that I have) from one large dataframe, using the value in a column as the "grouper". I have records for multiple patient "types" (where patient type is a column variable) and want to create a separate dataframe for each of these types.
The reason I want to do this is that I want to plot a separate Kaplan-Meier survival curve for each of these dataframes. I've tried doing this using subplots, but there are too many different patient types to fit in a series of subplots (the subplots end up looking too small).
I'm new to Python so apologies if this is a silly question...and thanks in advance for any suggestions.
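A sketch of the splitting step (column names are made up; the lifelines calls in the comment are one option for the plotting, not a requirement):

    import pandas as pd

    # Made-up survival data with a patient-type column
    df = pd.DataFrame({
        "patient_type": ["A", "A", "B", "C"],
        "duration": [5, 12, 7, 3],
        "event": [1, 0, 1, 1],
    })

    # One dataframe per patient type, keyed by the type value --
    # this scales to 80 groups without naming 80 variables
    frames = {ptype: group for ptype, group in df.groupby("patient_type")}

    # Each group can then get its own full-size figure, e.g. with
    # lifelines, instead of cramming everything into subplots:
    # from lifelines import KaplanMeierFitter
    # for ptype, group in frames.items():
    #     kmf = KaplanMeierFitter()
    #     kmf.fit(group["duration"], group["event"], label=ptype)
    #     ax = kmf.plot_survival_function()
    #     ax.figure.savefig(f"km_{ptype}.png")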
I have two GPX files (from a race I ran twice, obtained via the Strava API) and I would like to be able to compare the effort across both. The sampling frequency is irregular however (i.e. data is not recorded every second, or every meter), so a straightforward comparison is not possible and I would need to standardize the data first. Preferably, I would resample the data so that I have data points for every 10 meters for example.
I'm using Pandas, so I'm currently standardizing a single file by inserting rows for every 10 meters and interpolating the heartrate, duration, lat/lng, etc from the surrounding data points. This works, but doesn't make the data comparable across files, as the recording does not start at the exact same location.
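Roughly what I'm doing per file, with made-up numbers (the cumulative distance comes from the parsed GPX points):

    import numpy as np
    import pandas as pd

    # Made-up track; "distance" is cumulative meters from the GPX points
    track = pd.DataFrame({
        "distance": [0.0, 7.2, 18.9, 31.4],
        "heartrate": [120, 124, 131, 128],
        "elapsed": [0.0, 3.1, 8.0, 13.6],
    })

    # Regular 10 m grid over the recorded distance
    grid = np.arange(0.0, track["distance"].iloc[-1], 10.0)

    # Insert the grid points, interpolate by distance, keep the grid
    idx = pd.Index(track["distance"]).union(pd.Index(grid))
    resampled = (
        track.set_index("distance")
             .reindex(idx)
             .interpolate(method="index")
             .reindex(grid)
    )
    print(resampled)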
An alternative is to first standardize the course coordinates using something like geohashing and then map both efforts onto this standardized course. Since coordinates cannot easily be sorted, I'm not sure how to do that correctly, however.
Any pointers are appreciated, thanks!