How to remove day from datetime index in pandas? - python

The idea behind this question is that when I'm working with full datetime tags and data from different days, I sometimes want to compare the hourly behavior across those days.
But because the days are different, I cannot directly plot two of the hourly data sets on top of each other.
My naive idea would be that I need to remove the day from the datetime index on both sets and then plot them on top of each other. What's the best way to do that?
Or, alternatively, what's the better approach to my problem?

This may not be exactly it but should help you along, assuming ts is your timeseries:
import pandas as pd

hourly = ts.resample('H').mean()  # newer pandas needs an explicit aggregation after resample
hourly.index = pd.MultiIndex.from_arrays([hourly.index.hour, hourly.index.normalize()])  # (hour, day)
hourly.unstack().plot()  # unstacks the day level: one line per day, hour of day on the x-axis
If you don't care about the day AT ALL, just hourly.index = hourly.index.hour should work
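An alternative sketch that avoids touching the index at all, assuming ts is a Series with a DatetimeIndex (the mean() aggregation here is just an assumption):
by_hour = ts.groupby([ts.index.hour, ts.index.normalize()]).mean()  # group by (hour of day, day)
by_hour.unstack().plot()  # one line per day, hour of day on the x-axis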

Related

Is it possible to manipulate negative time values in Python?

Is there a way to generate negative time values in Python?
I want to generate a time range ranging from -4 minutes to a variable positive time (between 5 to 10 min), something like this:
import datetime
import pandas as pd
time_range = range(-datetime.time(minute=4), datetime.time(minute=5))
# or
time_range = pd.date_range(-datetime.time(minute=4), datetime.time(minute=5))
But datetime does not seem to support negative values.
I need it to generate a graph like the following one but with a time/datetime index instead of integer values (A time/datetime index is especially useful on a plotly graph as it gives a readable index at any zoom level)
In addition, I believe that the possibility to generate negative time values could have many other applications.
datetime.time doesn't accept negative values
Maybe you can try to do something with timedelta
from datetime import timedelta
delta = timedelta(minutes=-4)
I hope this clue will help you.
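Building on that clue, a hedged sketch for the plotting use case is to build the axis out of timedeltas instead of times; pd.timedelta_range accepts negative values (the 30-second step is just an assumption):
import pandas as pd

# a range from -4 minutes up to +5 minutes, in 30-second steps
time_range = pd.timedelta_range(start=pd.Timedelta(minutes=-4), end=pd.Timedelta(minutes=5), freq='30s')
print(time_range)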

Pandas datetime64 problem (datetime introduces spikes in data)

This is my first question on stackoverflow, so be kind :)
I work with imported CSV files and pandas, and I really like the pandas datetime possibilities for working with and filtering dataframes. But I have serious problems with plotting the data in a neat way when using dates as datetime64, whether using pandas plots or seaborn plots.
My csv looks like this:
date time Flux_ConNT_C Flux_ConB1 Flux_ConB2 Flux_ConB3 Flux_ConB4 Flux_ConB4
0 01.01.2015 00:30 2.552032129 2.193558665 1.0093326 1.013124869 1.159512896 1.159512896
1 01.01.2015 01:00 2.553308464 2.195533756 1.01003938 1.013935693 1.160672989 1.160672989
2 01.01.2015 01:30 2.554585438 2.197510626 1.010746655 1.014747166 1.161834243 1.161834243
3 01.01.2015 02:00 2.55586305 2.199489276 1.011454426 1.015559289 1.162996658 1.162996658
4 01.01.2015 02:30 2.557141301 2.201469707 1.012162692 1.016372061 1.164160236 1.164160236
When I plot the data with
df.plot(figsize=(15,8))
the output is correct,
but when I change the "date time" column to datetime64 with
df['date time'] = pd.to_datetime(df['date time'])
and use the same code to plot, the data is plotted with spikes and is not usable.
There seems to be a problem with matplotlib, but I can't find anything other than putting register_matplotlib_converters() before the plot, which doesn't change anything.
I'm working with Spyder IDE and Python 3.7 and all libraries are up to date.
Thanks for your help!
Your problem is no mystery; it's simply not reproducible.
Are you sure your csv doesn't have a header for the first index column 0..4?
Are you sure in the csv column 8 is a duplicate of column 7?
How did you actually import this csv and construct your dataframe?
The first plot only works after replacing the range index 0..4 by the "date time" column. What other transformations did you apply to the dataframe before calling the plot method?
Your to_datetime conversion only works on a column, not an index. Why don't you share all the code that you've been using?
In the 2 plots the first 5 rows don't differ. Why don't you share the data rows that actually differ between the 2 plots?
I will give you credit for trying to abstract the problem properly. Unfortunately, you omitted important information. Based on the limited information you've been showing here, there is no problem at all.
To make my point clear: What you observed is not related to the datetime64[ns] conversion, but to something probably very simple that you didn't consider important enough to share with us.
Have a look at How to create a Minimal, Reproducible Example. The idea is: when you're able to prepare your problem in a reproducible way, you'll probably be able to solve it yourself.
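As a rough sketch of what such a minimal example might look like for this data (the file name and the dayfirst reading of dates like 01.01.2015 are guesses; a misparsed day/month or an unsorted index are common causes of spikes like these):
import pandas as pd
import matplotlib.pyplot as plt

# every step from file to plot, so others can reproduce the exact result
df = pd.read_csv('flux.csv')                                      # file name is an assumption
df['date time'] = pd.to_datetime(df['date time'], dayfirst=True)  # 01.01.2015 looks day-first
df = df.set_index('date time').sort_index()                       # an unsorted index also causes zig-zag lines
df.plot(figsize=(15, 8))
plt.show()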

PySpark - converting hour and minute data to seconds

I have a given time of XXh:YYm (e.g. 1h:23m) that I'm trying to convert to seconds. The tricky part is that if it is less than an hour, the time will be given as just YYm (e.g. 52m).
I am currently using
%pyspark
newColumn = unix_timestamp(col("time"), "H:mm")
dataF.withColumn('time', regexp_replace('time', 'h|m', '')).withColumn("time", newColumn).show()
This works great for removing the h and m letters and then converting to seconds, but it returns null when the time is less than an hour, as explained above, since the value is not actually in the H:mm format. What's a good approach to this? I keep trying different things that seem to overcomplicate it, and I still haven't found a solution.
I am leaning toward some sort of conditional like
if value contains 'h:' then newColumn = unix_timestamp(col("time"), "H:mm")
else newColumn = unix_timestamp(col("time"), "mm")
but I am fairly new to pyspark and not sure how to do this to get the final output. I am basically looking for an approach that will convert a time to seconds and can handle formats of '1h:23m' as well as '53m'.
This should do the trick, assuming the time column is string type. I just used when/otherwise to separate the two different formats (by whether the value contains 'h') and used substring to get the desired minutes.
from pyspark.sql import functions as F
df.withColumn("seconds", F.when(F.col("time").contains("h"), F.unix_timestamp(F.regexp_replace("time", "h|m", ''),"H:mm"))\
.otherwise(F.unix_timestamp(F.substring("time",1,2),"mm")))\
.show()
+------+-------+
| time|seconds|
+------+-------+
|1h:23m| 4980|
| 23m| 1380|
+------+-------+
You can use the "unix_timestamp" function to convert a datetime to a unix timestamp in seconds.
You can refer to one of my blog posts on the Spark DateTime functions and go to the "unix_timestamp" section:
https://medium.com/expedia-group-tech/deep-dive-into-apache-spark-datetime-functions-b66de737950a
Regards,
Neeraj
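If you'd rather avoid time-format strings entirely, here is a hedged sketch that extracts the hour and minute parts with regexes and computes the seconds arithmetically (it assumes every value contains a minutes part like '23m' and optionally an hours part like '1h:'):
from pyspark.sql import functions as F

# hours is 0 when the 'h' part is absent; minutes is assumed to always be present
hours = F.coalesce(F.regexp_extract("time", r"(\d+)h", 1).cast("int"), F.lit(0))
minutes = F.regexp_extract("time", r"(\d+)m", 1).cast("int")
df.withColumn("seconds", hours * 3600 + minutes * 60).show()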

Apply a function to each row python

I am trying to convert from UTC time to local time in my dataframe. I have a dictionary where I store the number of hours I need to shift for each country code. So, for example, if df['CountryCode'][0] = 'AU' and df['UTCTime'][0] = 2016-08-12 08:01:00, I want to get df['LocaleTime'][0] = 2016-08-12 19:01:00, which is
df['UTCTime'][0]+datetime.timedelta(hours=dateDic[df['CountryCode'][0]])
I have tried to do it with a for loop, but since I have more than 1 million rows it's not efficient. I have looked into the apply function, but I can't seem to get it to take inputs from two different columns.
Can anyone help me?
Without a more concrete example it's difficult, but try this:
pd.to_timedelta(df.CountryCode.map(dateDic), unit='h') + df.UTCTime
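A minimal self-contained sketch of the same idea (the sample rows and hour offsets are made up; the real mapping comes from your dateDic):
import pandas as pd

# hypothetical sample data and hour offsets
df = pd.DataFrame({
    'CountryCode': ['AU', 'GB'],
    'UTCTime': pd.to_datetime(['2016-08-12 08:01:00', '2016-08-12 08:01:00']),
})
dateDic = {'AU': 11, 'GB': 1}  # assumed offsets in hours

# vectorized: map each country code to its offset, convert to timedelta, add to the UTC times
df['LocaleTime'] = df['UTCTime'] + pd.to_timedelta(df['CountryCode'].map(dateDic), unit='h')
print(df)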

Graphing the number of elements down based on timestamps start/end

I am trying to graph alarm counts in Python to get some sort of display giving an idea of the peak number of network elements down during a given time span. Our alarm report outputs CSV like this:
Name,Alarm Start,Alarm Clear
NE1,15:42 08/09/11,15:56 08/09/11
NE2,15:42 08/09/11,15:57 08/09/11
NE3,15:42 08/09/11,16:31 08/09/11
NE4,15:42 08/09/11,15:59 08/09/11
I am trying to graph, between those start and clear times, how many NEs were down at any given moment, including the maximum number and when the count went under or over a certain value. An example is below:
15:42 08/09/11 - 4 Down
15:56 08/09/11 - 3 Down
etc.
Any advice where to start on this would be great. Thanks in advance, you guys and gals have been a big help in the past.
I'd start by parsing your input data into a map indexed by timestamps with counts as values; just increase the count for each row with the same timestamp you encounter.
After that, use some plotting module, for instance matplotlib, to plot the keys of the map versus the values. That should cover it!
A rough sketch of that is below. Do you need any more detailed ideas?
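A hedged pandas/matplotlib sketch of that idea (the file name and the day/month/year reading of dates like 08/09/11 are assumptions; it uses the +1 on alarm start / -1 on alarm clear trick with a cumulative sum to get the number of elements down over time):
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('alarms.csv')                          # file name is an assumption
fmt = '%H:%M %d/%m/%y'                                  # assumes 08/09/11 means day/month/year
start = pd.to_datetime(df['Alarm Start'], format=fmt)
clear = pd.to_datetime(df['Alarm Clear'], format=fmt)

# +1 when an element goes down, -1 when it clears; the running sum is the number currently down
events = pd.concat([pd.Series(1, index=start), pd.Series(-1, index=clear)])
down = events.groupby(level=0).sum().sort_index().cumsum()

down.plot(drawstyle='steps-post')
plt.ylabel('Network elements down')
plt.show()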
