Interpolation between two datetimes - python

I have a time-series dataset and a set of events; an event is recorded whenever a specific error comes from my system. I want to plot the time series and place markers for the events on that graph, so I have to interpolate between two timestamps to get the exact y value. My problem appears when I run the following code:
import numpy as np
test = np.interp(event, [timestamp_timeseries[k-1], timestamp_timeseries[k]],
                 [y_value[k-1], y_value[k]])
with types:
timestamp_timeseries: datetime.datetime
y_value: int
event: datetime.datetime (timestamp at which an error is reported by the system)
Thanks for your help.
Example:
test = np.interp(
    datetime.datetime(2022, 10, 11, 12, 24, 5, 922000),
    [datetime.datetime(2022, 10, 11, 12, 6, 40, 480000),
     datetime.datetime(2022, 10, 11, 12, 52, 51, 481000)],
    [335872, 336896])
My result is:
TypeError: float() argument must be a string or a number, not 'datetime.datetime'

Use a numeric representation of datetime, e.g. the Unix time you get from .timestamp(). Example:
from datetime import datetime
import numpy as np
test = np.interp(
    datetime(2022, 10, 11, 12, 24, 5, 922000).timestamp(),
    [datetime(2022, 10, 11, 12, 6, 40, 480000).timestamp(),
     datetime(2022, 10, 11, 12, 52, 51, 481000).timestamp()],
    [335872, 336896])
test
# 336258.3342553605
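Applied to the loop from the question (variable names taken from there, so this is a sketch rather than tested code):
# Sketch: convert every datetime to a Unix timestamp before calling np.interp.
import numpy as np

test = np.interp(
    event.timestamp(),
    [timestamp_timeseries[k-1].timestamp(), timestamp_timeseries[k].timestamp()],
    [y_value[k-1], y_value[k]])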

Related

OpenCV VideoCapture.read() same frame has different values per pixel if running on different machines

I have a C++ project running on an old Windows PC, and I am trying to convert it to Python and run it on an Ubuntu machine (a Jetson Nano). I started by simply reading a frame from a grayscale video and writing it into a text file. I expected the results to be identical, but for some reason they are not: on the Nano, some of the pixel values are larger than on the Windows PC.
How is this possible? Can this be because of the OpenCV version, or maybe the decoder VideoCapture.read() uses is different (is there a way to check that)? Or am I missing something in the code?
Here is the C++ code:
#include <opencv2/opencv.hpp>
#include <stdio.h>
#include <fstream>
std::ofstream imagetest("image_test_cpp.txt");
int main() {
    cv::Mat img;
    cv::VideoCapture m_VideoCapture;
    m_VideoCapture.open("video.avi");
    m_VideoCapture.read(img);
    imagetest << img;
    return 0;
}
Output:
[ 12, 15, 13, 12, 15, 13, 12, 15, 13, 14, ...
12, 15, 13, 12, 15, 13, 13, 16, 14, 15, ...
14, 17, 15, 14, 17, 15, 13, 16, 14, 14, ...
... ]
And here is the Python code (maybe the way I write it to the file is wrong?):
import cv2
import numpy as np

cap = cv2.VideoCapture('video.avi')
ret, img = cap.read()  # read() returns a (success_flag, frame) tuple
np.savetxt('image_test_py.txt', img.reshape(img.shape[0], -1), fmt="%d")
Output:
[ 12, 16, 12, 12, 16, 12, 12, 16, 12, 15, ... ]
(The rows in the output files didn't line up, so I only included the start of the first one, but I think it's enough to see the difference.)
The frame I'm reading is this:
grayscale frame
Any help would be appreciated.
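As a side note on the decoder question, OpenCV can at least report which capture backend and codec it picked for the file; a hedged sketch (not a fix for the pixel-value discrepancy itself), comparing the output on both machines:
# Sketch: print the backend and FOURCC of the stream on each machine.
import cv2

cap = cv2.VideoCapture('video.avi')
print("backend:", cap.getBackendName())  # e.g. FFMPEG, GSTREAMER, MSMF
fourcc = int(cap.get(cv2.CAP_PROP_FOURCC)) & 0xFFFFFFFF
print("fourcc:", fourcc.to_bytes(4, 'little').decode('ascii', errors='replace'))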

Plotting a histogram from a database using matplot and python

From the database, I'm trying to plot a histogram using the matplotlib library in Python, as shown here:
import sqlite3
import pandas as pd

cnx = sqlite3.connect('practice.db')
sql = pd.read_sql_query('''
    SELECT CAST((deliverydistance/1) as int)*1 as bin, count(*)
    FROM orders
    GROUP BY 1
    ORDER BY 1;
''', cnx)
which outputs a dataframe with a bin column and a count(*) column.
From that dataframe, I try to extract the columns using a for loop and place them in arrays:
distance = []
counts = []
for x, y in sql.iterrows():
    y = y["count(*)"]
    counts.append(y)
    distance.append(x)
print(distance)
print(counts)
OUTPUT:
distance = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
counts = [57136, 4711, 6569, 7268, 6755, 5757, 7643, 6175, 7954, 9418, 4945, 4178, 2844, 2104, 1829, 9, 4, 1, 3]
When I plot a histogram
plt.hist(counts,bins=distance)
I get this output:
[histogram plot]
My question is, how do I make it so that the count is on the Y axis and the distance is on the X axis? It doesn't seem to allow me to put it there.
You could also skip the for loop and plot directly from your pandas dataframe using
sql.bin.plot(kind='hist', weights=sql['count(*)'])
or, with the for loop:
import matplotlib.pyplot as plt
import pandas as pd

distance = []
counts = []
for x, y in sql.iterrows():
    y = y["count(*)"]
    counts.append(y)
    distance.append(x)
plt.hist(distance, bins=distance, weights=counts)
You can skip the middle section where you count the instances of each distance. Check out this example:
import pandas as pd
import numpy as np
df = pd.DataFrame({'distance':np.round(20 * np.random.random(100))})
df['distance'].hist(bins = np.arange(0,21,1))
Pandas has a built-in histogram plot which counts, then plots the occurrences of each distance. You can specify the bins (in this case 0-20 with a width of 1).
If you are not after a bar chart but rather a horizontal histogram, then you want to pass orientation='horizontal':
distance = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
# plt.style.use('dark_background')
counts = [57136, 4711, 6569, 7268, 6755, 5757, 7643, 6175, 7954, 9418, 4945, 4178, 2844, 2104, 1829, 9, 4, 1, 3]
plt.hist(counts,bins=distance, orientation='horizontal')
Use:
plt.bar(distance, counts)
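For completeness, a self-contained version of that suggestion with the data from the question (the axis labels are my addition):
# Bar chart: distance on the X axis, count on the Y axis.
import matplotlib.pyplot as plt

distance = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
counts = [57136, 4711, 6569, 7268, 6755, 5757, 7643, 6175, 7954, 9418,
          4945, 4178, 2844, 2104, 1829, 9, 4, 1, 3]
plt.bar(distance, counts)
plt.xlabel('delivery distance (bin)')
plt.ylabel('count')
plt.show()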

matplotlib time series from dictionary and datetime

I'm trying to understand how time series plotting works in matplotlib.
Unfortunately, this doc just loads data straight from a file using numpy, which makes it very cryptic for those not fluent in numpy.
From the doc:
with cbook.get_sample_data('goog.npz') as datafile:
    r = np.load(datafile)['price_data'].view(np.recarray)
r = r[-30:]  # get the last 30 days
# Matplotlib works better with datetime.datetime than np.datetime64, but the
# latter is more portable.
date = r.date.astype('O')
In my case, I have a dictionary mapping datetime keys to int values, which I can transform into an array or list, but I haven't managed to get anything that pyplot will accept, and the doc isn't much help, especially for time series.
def toArray(d):  # parameter renamed to avoid shadowing the built-in dict
    data = list(d.items())
    return np.array(data)
>>>
[datetime.datetime(2020, 5, 4, 16, 44) -13]
[datetime.datetime(2020, 5, 4, 16, 45) 7]
[datetime.datetime(2020, 5, 4, 16, 46) -11]
[datetime.datetime(2020, 5, 4, 16, 47) -75]
[datetime.datetime(2020, 5, 4, 16, 48) -41]
[datetime.datetime(2020, 5, 4, 16, 49) -39]
[datetime.datetime(2020, 5, 4, 16, 50) -4]
The most important part is to split the X axis from the Y axis (in your case, dates from values). Using your function toArray() to retrieve the data, the following code produces the desired result:
import matplotlib.pyplot as plt
data = toArray(your_dict)
fig, ax = plt.subplots(figsize=(20, 10))
dates = [x[0] for x in data]
values = [x[1] for x in data]
ax.plot(dates, values, 'o-')
ax.set_title("Default")
fig.autofmt_xdate()
plt.show()
Note how we split the 2D array data into two 1D sequences, dates and values.
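If the intermediate numpy array is not needed for anything else, the same plot can be made straight from the dictionary; a minimal sketch, assuming the keys are datetime objects:
# Sketch: plot the dictionary directly, skipping toArray().
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(20, 10))
ax.plot(list(your_dict.keys()), list(your_dict.values()), 'o-')
fig.autofmt_xdate()
plt.show()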

keras ValueError: invalid literal for int() with base 10: [duplicate]

I have a list:
code = ['<s>', 'are', 'defined', 'in', 'the', '"editable', 'parameters"', '\n', 'section.', '\n', 'A', 'larger', '`tsteps`', 'value', 'means', 'that', 'the', 'LSTM', 'will', 'need', 'more', 'memory', '\n', 'to', 'figure', 'out']
And I want to convert to one hot encoding. I tried:
to_categorical(code)
And I get an error: ValueError: invalid literal for int() with base 10: '<s>'
What am I doing wrong?
Keras only supports one-hot encoding for data that has already been integer-encoded. You can manually integer-encode your strings like so:
Manual encoding
# this integer encoding is purely based on position, you can do this in other ways
integer_mapping = {x: i for i,x in enumerate(code)}
vec = [integer_mapping[word] for word in code]
# vec is
# [0, 1, 2, 3, 16, 5, 6, 22, 8, 22, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
Using scikit-learn
from sklearn.preprocessing import LabelEncoder
import numpy as np
code = np.array(code)
label_encoder = LabelEncoder()
vec = label_encoder.fit_transform(code)
# array([ 2, 6, 7, 9, 19, 1, 16, 0, 17, 0, 3, 10, 5, 21, 11, 18, 19,
# 4, 22, 14, 13, 12, 0, 20, 8, 15])
You can now feed this into keras.utils.to_categorical:
from keras.utils import to_categorical
to_categorical(vec)
Alternatively, use:
pandas.get_dummies(y_train)
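Applied to the code list from the question, that would look roughly like the following (get_dummies builds one indicator column per unique token):
# Sketch: one-hot encode the list of strings with pandas.
import pandas as pd

one_hot = pd.get_dummies(pd.Series(code))
print(one_hot.shape)  # (len(code), number of unique tokens)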
tf.keras.layers.CategoryEncoding
In TF 2.6.0, one-hot encoding (OHE) or multi-hot encoding (MHE) can be implemented using tf.keras.layers.CategoryEncoding, tf.keras.layers.StringLookup, and tf.keras.layers.IntegerLookup.
I don't think this was possible in TF 2.4.x, so it must have been added later.
See "Classify structured data using Keras preprocessing layers" for the actual implementation.
from tensorflow.keras import layers

def get_category_encoding_layer(name, dataset, dtype, max_tokens=None):
    # Create a layer that turns strings into integer indices.
    if dtype == 'string':
        index = layers.StringLookup(max_tokens=max_tokens)
    # Otherwise, create a layer that turns integer values into integer indices.
    else:
        index = layers.IntegerLookup(max_tokens=max_tokens)
    # Prepare a `tf.data.Dataset` that only yields the feature.
    feature_ds = dataset.map(lambda x, y: x[name])
    # Learn the set of possible values and assign them a fixed integer index.
    index.adapt(feature_ds)
    # Encode the integer indices.
    encoder = layers.CategoryEncoding(num_tokens=index.vocabulary_size())
    # Apply multi-hot encoding to the indices. The lambda function captures the
    # layer, so you can use them, or include them in the Keras Functional model later.
    return lambda feature: encoder(index(feature))
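Outside of a full tf.data pipeline, the same layers can be tried directly on the code list from the question; a minimal sketch, assuming TF >= 2.6:
# Sketch: StringLookup maps the strings to integer indices,
# CategoryEncoding then one-hot encodes those indices.
import tensorflow as tf

lookup = tf.keras.layers.StringLookup()
lookup.adapt(tf.constant(code))  # learn the vocabulary from the list
encoder = tf.keras.layers.CategoryEncoding(
    num_tokens=lookup.vocabulary_size(), output_mode='one_hot')
one_hot = encoder(lookup(tf.constant(code)))
print(one_hot.shape)  # (len(code), vocabulary size)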
Try converting it to a numpy array first:
from numpy import array
and then:
to_categorical(array(code))

How to separate 2 output arrays of sklearn kneighbors() Python?

I am a beginner in Python. I use NearestNeighbors from sklearn and call:
print(neigh.kneighbors([[0.00015217, 0.00050968, 0.00044049, 0.00014538,
0.00077339, 0.0020284 , 0.00047572]]))
And the output is:
(array([[1.01980586e-08, 7.73354596e-05, 7.73354596e-05, 1.20134585e-04,
         1.39792434e-04, 1.48002389e-04, 1.98794609e-04, 4.63512739e-04,
         5.31436554e-04, 5.36960418e-04, 5.72679303e-04, 6.28187320e-04,
         6.67923141e-04, 7.51928163e-04, 8.97313642e-04, 1.00023442e-03,
         1.06114362e-03, 1.11943158e-03, 1.12626043e-03, 1.20185118e-03,
         1.51073901e-03, 1.71592746e-03, 1.73362257e-03]]),
 array([[ 0, 16, 15, 19,  1, 23,  5,  8, 20,  9,  6, 10, 17,  3, 21, 22,
         14,  2, 13,  7, 11, 12, 18]], dtype=int64))
I would like to export these data to CSV because I need both arrays in CSV. How can I separate these arrays?
hh = neigh.kneighbors([[0.00015217, 0.00050968, 0.00044049, 0.00014538,
0.00077339, 0.0020284 , 0.00047572]])
first_array = hh[0]
second_array = hh[1]
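To get both arrays into CSV (the stated goal), something like np.savetxt should do; a sketch with hypothetical file names:
# Sketch: write distances and neighbor indices to separate CSV files.
import numpy as np

np.savetxt('distances.csv', first_array, delimiter=',')
np.savetxt('indices.csv', second_array, delimiter=',', fmt='%d')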
