Merging two different lists of datetime intervals - python

I'm trying to merge two lists. Each list entry has a start date, an end date and a value. The resulting list must carry both values, and intervals from list 2 need to be split wherever they don't line up with intervals from list 1. I can do this by walking day by day and looking up the value in both lists, but that is extremely inefficient and won't work with big lists. I would like to know the most efficient way of doing this.
Here's an example:
LIST 1
[
['ALL', 'ALL', 2],
['2013-11-24', '2013-11-30', 4],
['2013-12-24', '2014-01-01', 3],
]
LIST 2
[
['2013-07-08', '2013-08-29', '1800.00'],
['2013-08-30', '2013-09-06', '1800.00'],
['2013-10-01', '2013-10-31', '1500.00'],
['2013-11-24', '2013-12-03', '400.00'],
['2013-12-24', '2014-01-03', '500.00'],
]
RESULTING LIST
[
['2013-07-08', '2013-08-29', '1800.00', 2],
['2013-08-30', '2013-09-06', '1800.00', 2],
['2013-10-01', '2013-10-31', '1500.00', 2],
['2013-11-24', '2013-11-30', '400.00', 4],
['2013-12-01', '2013-12-03', '400.00', 2],
['2013-12-24', '2014-01-01', '500.00', 3],
['2014-01-02', '2014-01-03', '500.00', 2]
]
I would appreciate any help. Thank you.
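One possible approach (my own sketch, not from the original post): treat the 'ALL' row of list 1 as a default, sort its dated rows, and walk each interval of list 2, cutting a new segment whenever the covering list 1 interval changes. This touches only interval boundaries rather than every day:

```python
from datetime import date, timedelta

list1 = [
    ['ALL', 'ALL', 2],
    ['2013-11-24', '2013-11-30', 4],
    ['2013-12-24', '2014-01-01', 3],
]
list2 = [
    ['2013-07-08', '2013-08-29', '1800.00'],
    ['2013-08-30', '2013-09-06', '1800.00'],
    ['2013-10-01', '2013-10-31', '1500.00'],
    ['2013-11-24', '2013-12-03', '400.00'],
    ['2013-12-24', '2014-01-03', '500.00'],
]

default = next(v for s, e, v in list1 if s == 'ALL')
# dated intervals of list 1 as (start, end, value) tuples, sorted by start
dated = sorted(
    (date.fromisoformat(s), date.fromisoformat(e), v)
    for s, e, v in list1 if s != 'ALL'
)

result = []
for s2, e2, price in list2:
    cur, end = date.fromisoformat(s2), date.fromisoformat(e2)
    while cur <= end:
        # find the list 1 interval covering `cur`, if any
        hit = next(((s, e, v) for s, e, v in dated if s <= cur <= e), None)
        if hit:
            seg_end, val = min(end, hit[1]), hit[2]
        else:
            # the default value runs until the next dated interval begins
            nxt = next((s for s, e, v in dated if s > cur), None)
            seg_end = min(end, nxt - timedelta(days=1)) if nxt else end
            val = default
        result.append([cur.isoformat(), seg_end.isoformat(), price, val])
        cur = seg_end + timedelta(days=1)

for row in result:
    print(row)
```

Run on the example data above, this reproduces the RESULTING LIST exactly. For very long lists of dated intervals, the linear scans inside the loop could be replaced with a `bisect` lookup.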

Related

Compare the keys of two nested dictionaries with level 3

I have two nested dictionaries: main and test. I want to compare the keys of the dictionaries and find the difference.
main = {"A":{"AA":{"AAA1": [1, 3], "AAA2": [2,4]}, 'BB': {'BBB1': [2 ,4 ], 'BBB2': [5,7]}}}
test1 = {"A":{"AA":{"AAA1": [3, 3], "AAA2": [4,4]}, 'BB': {'BBB1': [4 ,4 ], 'BBB2': [7,7]}}}
test2 = {"A":{"AA":{"AAA1": [3, 3], "AAA2": [4,4]}, 'BB': {'BBB1': [4 ,4 ]}}}
When comparing main and test1, the expected output is {} as all keys till level 3 are present.
When comparing main and test2, the expected output is {'A': {'BB': ['BBB2']}}.
I have tried adapting the solution from How to get the difference between two dictionaries in Python? to three levels. Is there a more efficient method for nested dictionaries? Thanks in advance.
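A recursive sketch (my own attempt, not from the post) that compares keys at every depth. It uses the simplification that each level reports either its missing keys (as a list) or its nested differences (as a dict), which matches the two expected outputs above:

```python
main = {"A": {"AA": {"AAA1": [1, 3], "AAA2": [2, 4]}, "BB": {"BBB1": [2, 4], "BBB2": [5, 7]}}}
test1 = {"A": {"AA": {"AAA1": [3, 3], "AAA2": [4, 4]}, "BB": {"BBB1": [4, 4], "BBB2": [7, 7]}}}
test2 = {"A": {"AA": {"AAA1": [3, 3], "AAA2": [4, 4]}, "BB": {"BBB1": [4, 4]}}}

def key_diff(main, other):
    """Recursively compare keys; the values themselves are ignored."""
    missing = [k for k in main if k not in other]
    if missing:
        return missing  # report absent keys at this level as a list
    nested = {}
    for k, v in main.items():
        if isinstance(v, dict):
            sub = key_diff(v, other[k])
            if sub:
                nested[k] = sub
    return nested

print(key_diff(main, test1))  # {}
print(key_diff(main, test2))  # {'A': {'BB': ['BBB2']}}
```

The recursion visits each key once, so it is linear in the total number of keys, whatever the nesting depth.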

Pandas: Looking to avoid a for loop when creating a nested dictionary

Here is my data:
df:
id sub_id
A 1
A 2
B 3
B 4
and I have the following array:
[[1,2],
[2,5],
[1,4],
[7,8]]
Here is my code:
from collections import defaultdict
sub_id_array_dict = defaultdict(dict)
for i, s, a in zip(df['id'].to_list(), df['sub_id'].to_list(), arrays):
    sub_id_array_dict[i][s] = a
Now, my actual dataframe includes a total of 100M rows (unique sub_id) with 500K unique ids. Ideally, I'd like to avoid a for loop.
Any help would be much appreciated.
Assuming the arrays variable has the same number of rows as the DataFrame:
df['value'] = arrays
Convert into a dictionary by grouping:
df.groupby('id').apply(lambda x: dict(zip(x.sub_id, x.value))).to_dict()
Output
{'A': {1: [1, 2], 2: [2, 5]}, 'B': {3: [1, 4], 4: [7, 8]}}

Split numpy array based column value in list

I'm new to numpy and I want to split a 2D array based on whether its second-column values appear in another list.
I converted a pandas dataframe to a 2D numpy array, and I have a separate list. I want to split the numpy array into two arrays: the first with the rows whose second-column value is in the list, and the second with the remaining rows. I also want the rest of my list, i.e. the values that don't appear anywhere in the array.
numpy_data = np.array([
[1, 'p1', 2],
[11, 'p2', 8],
[1, 'p8', 21],
[13, 'p10', 2] ])
list_value = ['p1', 'p3', 'p8']
The expected output :
data_in_list = [
[1, 'p1', 2],
[1, 'p8', 21]]
list_val_in_numpy = ['p1', 'p8'] # intersection of second column with my list
rest_data = [
[11, 'p2', 8],
[13, 'p10', 2]]
rest_list_value = ['p3']
In my code I have found how to get the first output:
first_output = numpy_data[np.isin(numpy_data[:,1], list_value)]
But I couldn't work out how to get the rest of the array. I also tried looping over my list, checking whether each value appears in the second column of the array and then deleting that row; in that case I don't need the first output (which I called data_in_list, because I already do what I need with it), only the other outputs:
for val in list_value:
    row = numpy_data[np.where(numpy_data[:, 1] == val)]
    if row.size != 0:
        # My custom code
        # then remove this row from numpy_data -- I couldn't do it
Thanks in advance
Use Python's invert operator ~ on the result of np.isin:
rest = numpy_data[~np.isin(numpy_data[:,1], list_value)]
There are multiple ways of doing this. A vectorized approach would be faster, but for the sake of clarity here is a plain loop that builds all the outputs in one pass:
data_in_list = []
list_val_in_numpy = []
rest_data = []
for x in numpy_data:
    if x[1] in list_value:  # second column matches the list
        data_in_list.append(x)
        list_val_in_numpy.append(x[1])
    else:
        rest_data.append(x)
rest_list_value = [v for v in list_value if v not in list_val_in_numpy]
This gives you all four outputs you were looking for.
A list comprehension will solve it, I guess:
numpy_data = [
    [1, 'p1', 2],
    [11, 'p2', 8],
    [1, 'p8', 21],
    [13, 'p10', 2],
]
list_value = ['p1', 'p3', 'p8']
output_list = [item for item in numpy_data if item[1] in list_value]
print(output_list)
output:
[[1, 'p1', 2], [1, 'p8', 21]]
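For completeness, here is a vectorized sketch (mine, building on the np.isin idea above) that produces all four requested outputs at once. Note that np.array over mixed ints and strings coerces every element to a string, which is why the numeric columns come back as strings:

```python
import numpy as np

numpy_data = np.array([
    [1, 'p1', 2],
    [11, 'p2', 8],
    [1, 'p8', 21],
    [13, 'p10', 2],
])
list_value = ['p1', 'p3', 'p8']

mask = np.isin(numpy_data[:, 1], list_value)
data_in_list = numpy_data[mask]    # rows whose second column is in the list
rest_data = numpy_data[~mask]      # all remaining rows

present = set(numpy_data[:, 1])    # values actually found in the array
list_val_in_numpy = [v for v in list_value if v in present]
rest_list_value = [v for v in list_value if v not in present]
```

The boolean mask is computed once and reused for both splits, so the array is scanned a constant number of times regardless of the list length.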

Using Python to find matching arrays and combine two arrays into one

I would like to use Python to find the entries of set1 and set2 whose first elements x[0] match, for example [4012642, 0.10869565] in set1 and [4012642, 2] in set2. Then I would like to combine them into one array, dividing set2's value by set1's value, so it would become [4012642, (2/0.10869565)], or [4012642, 18.40]. I want to do this for every pair in set1 and set2 and put the results into a new array. Any help is greatly appreciated; sorry if I have worded this confusingly.
set1 = [[4012640, 0.014925373], [4012642, 0.10869565], [4012644, 0.40298506], [4012646, 0.04477612], [4012616, 0.6264330499999999], [4012618, 1.128477924], [4012620, 0], [4012622, 0.12820514], [4012624, 0.16417910000000002], [4013328, 0.16666667], [4012626, 0.149253743], [4012658, 0], [4012628, 0.41791046], [4012630, 0.28493894000000003], [4012632, 1.999999953], [4012634, 0.08955224], [4012636, 0], [4012638, 0]]
set2 = [[4012640, 2], [4012642, 2], [4012644, 2], [4012646, 1], [4012616, 5], [4012618, 8], [4012620, 1], [4012622, 2], [4012624, 5], [4013328, 2], [4012626, 6], [4012658, 1], [4012628, 4], [4012630, 8], [4012632, 4], [4012634, 4], [4012636, 1], [4012638, 1]]
Personally, I prefer to use a DataFrame to handle this kind of 'join' question:
import pandas as pd
# build two dataframes from set1 and set2
df1 = pd.DataFrame(columns=['x0', 'x1'])
df1['x0'] = [x[0] for x in set1]
df1['x1'] = [x[1] for x in set1]
df2 = pd.DataFrame(columns=['x0', 'x2'])
df2['x0'] = [x[0] for x in set2]
df2['x2'] = [x[1] for x in set2]
Then call the merge method in pandas to match the two dataframes on column 'x0':
# Merge the two dataframes on 'x0'
df = pd.merge(df1, df2, on=['x0'], how='left')
# Calculate a new column as 'x2'/'x1'
df['values'] = df['x2'] / df['x1']
Results (first rows):
        x0        x1  x2      values
0  4012640  0.014925   2  134.000001
1  4012642  0.108696   2   18.400000
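If pandas feels like overkill, a plain-Python sketch (mine, not from the answer above) does the same join with a dict lookup. The guard handles the zero entries that appear in set1, where the ratio is undefined; shown here on a hypothetical subset of the question's data:

```python
# hypothetical subset of the data from the question
set1 = [[4012640, 0.014925373], [4012642, 0.10869565], [4012620, 0]]
set2 = [[4012640, 2], [4012642, 2], [4012620, 1]]

# index set2 by id for O(1) lookup, then divide where both sides exist
lookup = dict(set2)
combined = [
    [key, lookup[key] / val if val else None]
    for key, val in set1
    if key in lookup
]
print(combined)
```

Each id is looked up in constant time, so the whole join is linear instead of the quadratic scan a nested loop would give.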

Merge and add duplicate integers from a multidimensional array

I have a multidimensional list where the first item is a date and the second is a time value that needs to be summed per date. For example (leaving the second item as an integer for simplicity):
[[01/01/2019, 10], [01/01/2019, 3], [02/01/2019, 4], [03/01/2019, 2]]
The resulting array should be:
[[01/01/2019, 13], [02/01/2019, 4], [03/01/2019, 2]]
Does someone have a short way of doing this?
The background to this is vehicle tracking, I have a list of trips performed by vehicle and I want to have a summary by day with a count of total time driven per day.
You should change your data 01/01/2019 to the string '01/01/2019'.
@naivepredictor suggested a good pandas sample; anyway, if you don't want to import pandas, use this:
my_list = [['01/01/2019', 10], ['01/01/2019', 3], ['02/01/2019', 4], ['03/01/2019', 2]]
result_d = {}
for i in my_list:
    result_d[i[0]] = result_d.get(i[0], 0) + i[1]
print(result_d) #{'01/01/2019': 13, '02/01/2019': 4, '03/01/2019': 2}
print([list(d) for d in result_d.items()]) #[['01/01/2019', 13], ['02/01/2019', 4], ['03/01/2019', 2]]
import pandas as pd
# create dataframe out of the given input
df = pd.DataFrame(data=[['01/01/2019', 10], ['01/01/2019', 3], ['02/01/2019', 4], ['03/01/2019', 2]], columns=['date', 'trip_len'])
# groupby date and sum values for each day
df = df.groupby('date').sum().reset_index()
# output result as list of lists
result = df.values.tolist()
