How to shift one column, based on values from another column? - python

I have a data frame like this:
I want to add a column "flag" that marks changes in the "value" column for each RB individually, and only when the value increases. The flag should be 1 in the month before the increase, but only if that month has the same RB as the month of the change. So I don't think a plain shift can help.
I also want a similar column: when the value increases for one RB, put 1 in the month before (as in the column above) and also in the rows 2 and 3 months before. The same rule applies: a month is flagged only if it has the same RB as the month of the change.

This should do what you want:
import pandas as pd
data = [
{"rb": 111, "date": "01/01/2020", "value": 5},
{"rb": 111, "date": "01/02/2020", "value": 5},
{"rb": 111, "date": "01/03/2020", "value": 4},
{"rb": 111, "date": "01/04/2020", "value": 6},
{"rb": 111, "date": "01/05/2020", "value": 6},
{"rb": 111, "date": "01/06/2020", "value": 6},
{"rb": 111, "date": "01/07/2020", "value": 6},
{"rb": 111, "date": "01/08/2020", "value": 7},
{"rb": 112, "date": "01/01/2020", "value": 3},
{"rb": 112, "date": "01/02/2020", "value": 3},
{"rb": 112, "date": "01/03/2020", "value": 4},
{"rb": 112, "date": "01/04/2020", "value": 4},
{"rb": 112, "date": "01/05/2020", "value": 5},
{"rb": 112, "date": "01/06/2020", "value": 5},
{"rb": 112, "date": "01/07/2020", "value": 5},
{"rb": 112, "date": "01/08/2020", "value": 5},
{"rb": 111, "date": "01/01/2020", "value": 18},
{"rb": 111, "date": "01/02/2020", "value": 18},
{"rb": 111, "date": "01/03/2020", "value": 17},
{"rb": 111, "date": "01/04/2020", "value": 11},
{"rb": 111, "date": "01/05/2020", "value": 13},
{"rb": 111, "date": "01/06/2020", "value": 13},
{"rb": 111, "date": "01/07/2020", "value": 13},
{"rb": 111, "date": "01/08/2020", "value": 13},
{"rb": 112, "date": "01/01/2020", "value": 14},
{"rb": 112, "date": "01/02/2020", "value": 14},
{"rb": 112, "date": "01/03/2020", "value": 17},
{"rb": 112, "date": "01/04/2020", "value": 17},
{"rb": 112, "date": "01/05/2020", "value": 5},
{"rb": 112, "date": "01/06/2020", "value": 5},
{"rb": 112, "date": "01/07/2020", "value": 5}
]
df = pd.DataFrame(data)
# flag: 1 if the next row has the same rb and a larger value
df["flag"] = 0
for index in range(len(df) - 1):
    df.loc[index, "flag"] = int(df.loc[index, "rb"] == df.loc[index + 1, "rb"] and
                                df.loc[index, "value"] < df.loc[index + 1, "value"])
# flag_3m: 1 if the value increases within the next three rows of the same rb
df["flag_3m"] = 0
for index in range(len(df)):
    try:
        df.loc[index, "flag_3m"] = int(df.loc[index, "flag_3m"] != 1 and
            ((df.loc[index, "value"] < df.loc[index + 1, "value"] and df.loc[index, "rb"] == df.loc[index + 1, "rb"]) or
             (df.loc[index + 1, "value"] < df.loc[index + 2, "value"] and df.loc[index, "rb"] == df.loc[index + 2, "rb"]) or
             (df.loc[index + 2, "value"] < df.loc[index + 3, "value"] and df.loc[index, "rb"] == df.loc[index + 3, "rb"])))
    except KeyError:
        # Dirty way ;) The lookahead ran past the last row, so the default 0 stays.
        pass
print(df)
PS: It may be easier to group by rb first and then check the data, but this should work too.
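The group-by idea from the PS could be sketched roughly as follows. This is a minimal sketch rather than the answer's code. It assumes the rows are already ordered by date within each run of the same rb, and it introduces a hypothetical helper series `block` so that an rb that reappears later (111 occurs in two separate runs in the sample) is treated as a separate run. The small frame is made-up illustration data.

```python
import pandas as pd

# Made-up sample: rb 111 appears in two separate runs, as in the question's data.
df = pd.DataFrame({
    "rb":    [111, 111, 111, 111, 112, 112, 112, 111, 111],
    "value": [5, 5, 4, 6, 3, 4, 4, 18, 19],
})

# Hypothetical helper: a new block id starts whenever rb differs from the previous row.
block = (df["rb"] != df["rb"].shift()).cumsum()

# 1 on every row whose value is larger than the previous row of the same block.
rose = (df.groupby(block)["value"].diff() > 0).astype(int)


def lookahead(k):
    # Was there an increase k rows later in the same block? (0 past the block end)
    return rose.groupby(block).shift(-k).fillna(0).astype(int)


df["flag"] = lookahead(1)
# An increase 1, 2 or 3 rows ahead within the block sets flag_3m.
df["flag_3m"] = pd.concat([lookahead(k) for k in (1, 2, 3)], axis=1).max(axis=1)
print(df)
```

Shifting inside each block is what keeps a flag from leaking across an rb boundary, which is the part a plain `shift` on the whole column would get wrong.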

Related

Python list of dict sorting move all zero at bottom

I have a list of dictionaries which I am sorting using multiple keys.
Now I want to push all the elements that have rank 0 (rank is a key) to the bottom.
mylist = [
{
"score": 5.0,
"rank": 2,
"duration": 123,
"amount": "18.000",
},
{
"score": -1.0,
"rank": 0,
"duration": 23,
"amount": "-8.000",
},
{
"score": -2.0,
"rank": 0,
"duration": 63,
"amount": "28.000",
},
{
"score": 2.0,
"rank": 1,
"duration": 73,
"amount": "18.000",
},
]
from operator import itemgetter
sort_fields = ['rank', 'duration']
sorted_list = sorted(mylist, key=itemgetter(*sort_fields), reverse=False)
print(sorted_list)
current output
[{'score': -1.0, 'rank': 0, 'duration': 23, 'amount': '-8.000'}, {'score': -2.0, 'rank': 0, 'duration': 63, 'amount': '28.000'}, {'score': 2.0, 'rank': 1, 'duration': 73, 'amount': '18.000'}, {'score': 5.0, 'rank': 2, 'duration': 123, 'amount': '18.000'}]
expected output
[{'score': 2.0, 'rank': 1, 'duration': 73, 'amount': '18.000'}, {'score': 5.0, 'rank': 2, 'duration': 123, 'amount': '18.000'},{'score': -1.0, 'rank': 0, 'duration': 23, 'amount': '-8.000'}, {'score': -2.0, 'rank': 0, 'duration': 63, 'amount': '28.000'}, ]
mylist = [
{
"score": 5.0,
"rank": 2,
"duration": 123,
"amount": "18.000",
},
{
"score": -1.0,
"rank": 0,
"duration": 23,
"amount": "-8.000",
},
{
"score": -2.0,
"rank": 0,
"duration": 63,
"amount": "28.000",
},
{
"score": 2.0,
"rank": 1,
"duration": 73,
"amount": "18.000",
},
]
sorted_list = sorted(mylist, key = lambda x: (x['rank'] if x['rank'] > 0 else float('inf'),x['duration']))
print(sorted_list)
You should make the key function return a tuple of values ordered by the precedence of the sorting criteria. Since your first sorting criterion is in fact whether the rank is zero, make that test the first item of the tuple. The rest, namely the rank and the duration, then follow in order:
sorted_list = sorted(mylist, key=lambda d: (d['rank'] == 0, d['rank'], d['duration']))
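To see why the tuple key works, here is a small sketch on made-up records (only the two keys that matter are kept, and the function name `zero_rank_last` is just an illustrative choice). Python compares tuples element by element, and `False` sorts before `True`, so every non-zero rank comes first.

```python
# Made-up records with only the fields used for sorting.
records = [
    {"rank": 2, "duration": 123},
    {"rank": 0, "duration": 63},
    {"rank": 0, "duration": 23},
    {"rank": 1, "duration": 73},
]


def zero_rank_last(d):
    # (is rank zero?, rank, duration): False < True, so zero ranks sink to the end.
    return (d["rank"] == 0, d["rank"], d["duration"])


ordered = sorted(records, key=zero_rank_last)
order = [(d["rank"], d["duration"]) for d in ordered]
print(order)  # [(1, 73), (2, 123), (0, 23), (0, 63)]
```

Within the zero-rank group the remaining tuple items still apply, so those records end up ordered by duration.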

using Python Merge two json file with each file having multiple json objects

I am very new to Python. I need to merge two JSON files, each containing multiple JSON objects, based on "id".
File1.json
{"id": 1, "name": "Ault", "class": 8, "email": "ault#pynative.com"}
{"id": 2, "name": "john", "class": 8, "email": "jhon#pynative.com"}
{"id": 3, "name": "josh", "class": 8, "email": "josh#pynative.com"}
{"id": 4, "name": "emma", "class": 8, "email": "emma#pynative.com"}
File2.json
{"id": 4, "math": "A", "class": 8, "physics": "D"}
{"id": 2, "math": "B", "class": 8, "physics": "C"}
{"id": 3, "math": "A", "class": 8, "physics": "A"}
{"id": 1, "math": "C", "class": 8, "physics": "B"}
I have tried both json.loads(jsonObj) and json.load(path). Both throw errors.
I know that neither file is valid JSON as a whole, but each line in a file is valid JSON. I want to read line by line and merge both.
You can read the files line by line and parse each line:
# assuming
# File1.json
# {"id": 1, "name": "Ault", "class": 8, "email": "ault#pynative.com"}
# {"id": 2, "name": "john", "class": 8, "email": "jhon#pynative.com"}
# {"id": 3, "name": "josh", "class": 8, "email": "josh#pynative.com"}
# {"id": 4, "name": "emma", "class": 8, "email": "emma#pynative.com"}
# File2.json
# {"id": 4, "math": "A", "class": 8, "physics": "D"}
# {"id": 2, "math": "B", "class": 8, "physics": "C"}
# {"id": 3, "math": "A", "class": 8, "physics": "A"}
# {"id": 1, "math": "C", "class": 8, "physics": "B"}
import json
merged = {}
with open('File1.json') as f:
    for line in f:
        jsonified = json.loads(line)
        merged[jsonified['id']] = jsonified
with open('File2.json') as f:
    for line in f:
        jsonified = json.loads(line)
        # assumes both files contain the same ids; otherwise wrap this in try/except KeyError
        merged[jsonified['id']].update(jsonified)
merged = list(merged.values())
print(merged)
[{'id': 1,
'name': 'Ault',
'class': 8,
'email': 'ault#pynative.com',
'math': 'C',
'physics': 'B'},
{'id': 2,
'name': 'john',
'class': 8,
'email': 'jhon#pynative.com',
'math': 'B',
'physics': 'C'},
{'id': 3,
'name': 'josh',
'class': 8,
'email': 'josh#pynative.com',
'math': 'A',
'physics': 'A'},
{'id': 4,
'name': 'emma',
'class': 8,
'email': 'emma#pynative.com',
'math': 'A',
'physics': 'D'}]
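If the two files do not contain exactly the same ids, one possible variation is to use `dict.setdefault` instead of try/except, so a record is created on first sight of an id. The sketch below is only an illustration: `io.StringIO` objects with made-up, shortened records stand in for the two files, and skipping blank lines is an extra assumption about the input.

```python
import io
import json

# Stand-ins for the two files so the sketch runs without files on disk.
file1 = io.StringIO('{"id": 1, "name": "Ault"}\n{"id": 2, "name": "john"}\n')
file2 = io.StringIO('{"id": 2, "math": "B"}\n\n{"id": 3, "math": "A"}\n')

merged = {}
for source in (file1, file2):
    for line in source:
        if not line.strip():
            continue  # skip blank lines, which json.loads would reject
        record = json.loads(line)
        # Create an empty entry for unseen ids, then merge the fields in.
        merged.setdefault(record["id"], {}).update(record)

result = list(merged.values())
print(result)
```

Ids that occur in only one file are kept with whatever fields that file provides.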
Here is my take on this using pandas.
import pandas as pd
file_path_1 = 'File1.json'
file_path_2 = 'File2.json'
df1 = pd.read_json(file_path_1, lines=True)
df2 = pd.read_json(file_path_2, lines=True)
df = df1.merge(df2, on='id')
print(df)
Output:
   id  name  class_x              email math  class_y physics
0   1  Ault        8  ault#pynative.com    C        8       B
1   2  john        8  jhon#pynative.com    B        8       C
2   3  josh        8  josh#pynative.com    A        8       A
3   4  emma        8  emma#pynative.com    A        8       D
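The class_x and class_y columns appear because "class" exists in both frames but is not part of the join key. If "class" is expected to agree between the files (an assumption about the data), joining on both columns keeps a single "class" column. A minimal sketch, with `io.StringIO` and shortened made-up records standing in for the files:

```python
import io

import pandas as pd

# Stand-ins for the two JSON-lines files.
file1 = io.StringIO(
    '{"id": 1, "name": "Ault", "class": 8}\n'
    '{"id": 2, "name": "john", "class": 8}\n'
)
file2 = io.StringIO(
    '{"id": 2, "math": "B", "class": 8}\n'
    '{"id": 1, "math": "C", "class": 8}\n'
)

df1 = pd.read_json(file1, lines=True)
df2 = pd.read_json(file2, lines=True)

# Joining on both shared columns avoids the _x/_y suffixes.
df = df1.merge(df2, on=["id", "class"])
print(df)
```

Rows whose "class" differs between the two files would be dropped by this inner join, so it only fits when the values are meant to match.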

How to combine dups from dictionary with Python [duplicate]

I have this list of dictionaries:
"ingredients": [
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1/2",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1/2",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Tablespoon", "id": 15},
"ingredient": {"name": "Basil Leaves", "id": 14},
"quantity": "3",
},
]
I want to be able to find the duplicates of ingredients (by either name or id). If there are duplicates and have the same unit_of_measurement, combine them into one dictionary and add the quantity accordingly. So the above data should return:
[
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Tablespoon", "id": 15},
"ingredient": {"name": "Basil Leaves", "id": 14},
"quantity": "3",
},
]
How do I go about it?
Assuming you have a dictionary represented like this:
data = {
"ingredients": [
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1/2",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1/2",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Tablespoon", "id": 15},
"ingredient": {"name": "Basil Leaves", "id": 14},
"quantity": "3",
},
]
}
What you could do is use a collections.defaultdict of lists to group the ingredients by a (name, id) grouping key:
from collections import defaultdict
ingredient_groups = defaultdict(list)
for ingredient in data["ingredients"]:
    key = tuple(ingredient["ingredient"].items())
    ingredient_groups[key].append(ingredient)
Then you could go through the grouped values of this defaultdict and calculate the sum of the fractional quantities using fractions.Fraction. For unit_of_measurement and ingredient, we could probably just use the first value in each group.
from fractions import Fraction
result = [
{
"unit_of_measurement": value[0]["unit_of_measurement"],
"quantity": str(sum(Fraction(ingredient["quantity"]) for ingredient in value)),
"ingredient": value[0]["ingredient"],
}
for value in ingredient_groups.values()
]
Which will then give you this result:
[{'ingredient': {'id': 12, 'name': 'Balsamic Vinegar'},
'quantity': '1',
'unit_of_measurement': {'id': 13, 'name': 'Pound (Lb)'}},
{'ingredient': {'id': 14, 'name': 'Basil Leaves'},
'quantity': '3',
'unit_of_measurement': {'id': 15, 'name': 'Tablespoon'}}]
You'll probably need to amend the above to account for ingredients with different units of measurement, but this should get you started.
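One way to account for different units, sketched under the assumption that the ids identify ingredients and units uniquely, is to put both ids into the grouping key. The third record below is made up (the same ingredient measured in tablespoons) to show that it stays separate.

```python
from collections import defaultdict
from fractions import Fraction

# Sample data; the last record is invented to show a different unit.
ingredients = [
    {"unit_of_measurement": {"name": "Pound (Lb)", "id": 13}, "quantity": "1/2",
     "ingredient": {"name": "Balsamic Vinegar", "id": 12}},
    {"unit_of_measurement": {"name": "Pound (Lb)", "id": 13}, "quantity": "1/2",
     "ingredient": {"name": "Balsamic Vinegar", "id": 12}},
    {"unit_of_measurement": {"name": "Tablespoon", "id": 15}, "quantity": "3",
     "ingredient": {"name": "Balsamic Vinegar", "id": 12}},
]

# Group by (ingredient id, unit id) so only like-for-like quantities are summed.
groups = defaultdict(list)
for item in ingredients:
    key = (item["ingredient"]["id"], item["unit_of_measurement"]["id"])
    groups[key].append(item)

result = [
    {
        "unit_of_measurement": items[0]["unit_of_measurement"],
        "quantity": str(sum(Fraction(i["quantity"]) for i in items)),
        "ingredient": items[0]["ingredient"],
    }
    for items in groups.values()
]
print(result)
```

The two half-pound entries collapse into one entry with quantity "1", while the tablespoon entry is left untouched.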

Grouping data by date in MongoDB and Python

I'm making a standard find query against my MongoDB database. It looks like this:
MyData = pd.DataFrame(list(db.MyData.find({'datetimer': {'$gte': StartTime, '$lt': Endtime}})), columns=['price', 'amount', 'datetime'])
Now I'm trying to do another query, but it's more complicated and I don't know how to do it. Here is a sample of my data:
{"datetime": "2020-07-08 15:10", "price": 21, "amount": 90}
{"datetime": "2020-07-08 15:15", "price": 22, "amount": 50}
{"datetime": "2020-07-08 15:19", "price": 21, "amount": 40}
{"datetime": "2020-07-08 15:30", "price": 21, "amount": 90}
{"datetime": "2020-07-08 15:35", "price": 32, "amount": 50}
{"datetime": "2020-07-08 15:39", "price": 41, "amount": 40}
{"datetime": "2020-07-08 15:49", "price": 32, "amount": 40}
I need to group that data in intervals of 30 minutes and have the records distinct by price. So all the records before 15:30 must have 15:30 as datetime, and all the records before 16:00 must have 16:00. With the sample above, the expected output is:
{"datetime": "2020-07-08 15:30", "price": 21, "amount": 90}
{"datetime": "2020-07-08 15:30", "price": 22, "amount": 50}
{"datetime": "2020-07-08 16:00", "price": 32, "amount": 50}
{"datetime": "2020-07-08 16:00", "price": 41, "amount": 40}
I don't know if this query is doable, so any kind of advice is appreciated. I can also do that from my code if it's not possible to do in a query.
I tried the code suggested here, but I got the following result, which is not the expected output:
Query = db.myData.aggregate([
{ "$group": {
"_id": {
"$toDate": {
"$subtract": [
{ "$toLong": "$datetime" },
{ "$mod": [ { "$toLong": "$datetime" }, 1000 * 60 * 15 ] }
]
}
},
"count": { "$sum": 1 }
}}
])
for x in Query:
    print(x)
//OUTPUT:
{'_id': datetime.datetime(2020, 7, 7, 9, 15), 'count': 39}
{'_id': datetime.datetime(2020, 7, 6, 18, 30), 'count': 44}
{'_id': datetime.datetime(2020, 7, 7, 16, 30), 'count': 54}
{'_id': datetime.datetime(2020, 7, 7, 11, 45), 'count': 25}
{'_id': datetime.datetime(2020, 7, 6, 22, 15), 'count': 48}
{'_id': datetime.datetime(2020, 7, 7, 15, 0), 'count': 30}
...
What @Gibbs suggested is correct; you just have to modify the data a little bit.
Check whether the aggregate query below is what you are looking for:
Query = db.myData.aggregate([
{
"$group": {
"_id": {
"datetime":{
"$toDate": {
"$subtract": [
{ "$toLong": "$datetime" },
{ "$mod": [ { "$toLong": "$datetime" }, 1000 * 60 * 30 ] }
]
}
},
"price": "$price",
"amount": "$amount"
},
}
},
{
"$replaceRoot": { "newRoot": "$_id"}
}
])
for x in Query:
    print(x)
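Note that subtracting the `$mod` remainder rounds each timestamp down to the start of its interval, while the expected output in the question rounds up to the end of it. Since the question mentions that the grouping could also be done in application code, here is a minimal Python-side sketch. It assumes the `datetime` field holds strings in the format shown in the sample, and that keeping the first record per (interval, price) pair is what "distinct by price" means; both are inferred from the sample rather than stated. The helper name `ceil_to_half_hour` is hypothetical.

```python
from datetime import datetime, timedelta

# Sample rows from the question.
rows = [
    {"datetime": "2020-07-08 15:10", "price": 21, "amount": 90},
    {"datetime": "2020-07-08 15:15", "price": 22, "amount": 50},
    {"datetime": "2020-07-08 15:19", "price": 21, "amount": 40},
    {"datetime": "2020-07-08 15:30", "price": 21, "amount": 90},
    {"datetime": "2020-07-08 15:35", "price": 32, "amount": 50},
    {"datetime": "2020-07-08 15:39", "price": 41, "amount": 40},
    {"datetime": "2020-07-08 15:49", "price": 32, "amount": 40},
]


def ceil_to_half_hour(text):
    # Round up to the next :00 or :30; exact boundaries stay where they are.
    moment = datetime.strptime(text, "%Y-%m-%d %H:%M")
    remainder = moment.minute % 30
    if remainder:
        moment += timedelta(minutes=30 - remainder)
    return moment.strftime("%Y-%m-%d %H:%M")


seen = set()
result = []
for row in rows:
    bucket = ceil_to_half_hour(row["datetime"])
    if (bucket, row["price"]) not in seen:  # keep the first record per interval and price
        seen.add((bucket, row["price"]))
        result.append({"datetime": bucket, "price": row["price"], "amount": row["amount"]})

for row in result:
    print(row)
```

On the sample rows this yields the four records listed as the expected output in the question.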

Merge and add two lists of dictionaries using Python

I have two lists of dictionaries and a piece of code that tries to merge them:
Json_Final = []
try:
    for keyIN in JSON1:
        json_data_Merge = {}
        for keyUS in JSON2:
            if (keyIN['YEAR'] == keyUS['YEAR']) & (keyIN['MONTH'] == keyUS['MONTH']) & (keyIN['Name'] == keyUS['Name']):
                json_data_Merge['YEAR'] = keyIN['YEAR']
                json_data_Merge['MONTH'] = keyIN['MONTH']
                json_data_Merge['Name'] = keyIN['Name']
                json_data_Merge['Total'] = int(keyIN['Total']) + int(keyUS['Total'])
                Json_Final.append(json_data_Merge)
    print(Json_Final)
except Exception as e:
    print('MergeException', e)
JSON1 = [{"YEAR": 2019, "MONTH": 1, "Name": "Apple", "Total": 100},
{"YEAR": 2019, "MONTH": 2, "Name": "Grape", "Total": 200},
{"YEAR": 2019, "MONTH": 2, "Name": "Apple", "Total": 300},
{"YEAR": 2019, "MONTH": 3, "Name": "Grape", "Total": 100},
{"YEAR": 2019, "MONTH": 3, "Name": "Apple", "Total": 200}]
JSON2 = [{"YEAR": 2019, "MONTH": 1, "Name": "Apple", "Total": 200},
{"YEAR": 2019, "MONTH": 1, "Name": "Orange", "Total": 300},
{"YEAR": 2019, "MONTH": 2, "Name": "Grape", "Total": 400},
{"YEAR": 2019, "MONTH": 2, "Name": "Orange", "Total": 300},
{"YEAR": 2019, "MONTH": 2, "Name": "Mango", "Total": 200},
{"YEAR": 2019, "MONTH": 3, "Name": "Grape", "Total": 500},
{"YEAR": 2019, "MONTH": 3, "Name": "Orange", "Total": 200},
{"YEAR": 2019, "MONTH": 3, "Name": "Apple", "Total": 250}]
Expected Output:
[{"YEAR": 2019, "MONTH": 1, "Name": "Apple", "Total": 300},
{"YEAR": 2019, "MONTH": 1, "Name": "Orange", "Total": 300},
{"YEAR": 2019, "MONTH": 2, "Name": "Grape", "Total": 600},
{"YEAR": 2019, "MONTH": 2, "Name": "Apple", "Total": 300},
{"YEAR": 2019, "MONTH": 2, "Name": "Orange", "Total": 300},
{"YEAR": 2019, "MONTH": 2, "Name": "Mango", "Total": 200},
{"YEAR": 2019, "MONTH": 3, "Name": "Grape", "Total": 600},
{"YEAR": 2019, "MONTH": 3, "Name": "Orange", "Total": 200},
{"YEAR": 2019, "MONTH": 3, "Name": "Apple", "Total": 450}]
My Code Output:
[{"YEAR": 2019, "MONTH": 1, "Name": "Apple", "Total": 300},
{"YEAR": 2019, "MONTH": 2, "Name": "Grape", "Total": 600},
{"YEAR": 2019, "MONTH": 3, "Name": "Grape", "Total": 300},
{"YEAR": 2019, "MONTH": 3, "Name": "Apple", "Total": 450}]
This is one approach.
Ex:
JSON_1 = [{"YEAR": 2019, "MONTH": 1, "Name": "Apple", "Total": 100},
{"YEAR": 2019, "MONTH": 2, "Name": "Grape", "Total": 200},
{"YEAR": 2019, "MONTH": 2, "Name": "Apple", "Total": 300},
{"YEAR": 2019, "MONTH": 3, "Name": "Grape", "Total": 100},
{"YEAR": 2019, "MONTH": 3, "Name": "Apple", "Total": 200}]
JSON_2 = [{"YEAR": 2019, "MONTH": 1, "Name": "Apple", "Total": 200},
{"YEAR": 2019, "MONTH": 1, "Name": "Orange", "Total": 300},
{"YEAR": 2019, "MONTH": 2, "Name": "Grape", "Total": 400},
{"YEAR": 2019, "MONTH": 2, "Name": "Orange", "Total": 300},
{"YEAR": 2019, "MONTH": 2, "Name": "Mango", "Total": 200},
{"YEAR": 2019, "MONTH": 3, "Name": "Grape", "Total": 500},
{"YEAR": 2019, "MONTH": 3, "Name": "Orange", "Total": 200},
{"YEAR": 2019, "MONTH": 3, "Name": "Apple", "Total": 250}]
JSON_2 = {"{}_{}_{}".format(i["YEAR"], i["MONTH"], i["Name"]): i for i in JSON_2}  # Create a dict for easy lookup
for i in JSON_1:
    key = "{}_{}_{}".format(i["YEAR"], i["MONTH"], i["Name"])  # Create key from year, month, name
    if key in JSON_2:  # Check if the item from JSON_1 exists in JSON_2
        JSON_2[key]['Total'] += i["Total"]  # Update total
    else:
        JSON_2[key] = i  # Otherwise add a new entry
print(list(JSON_2.values()))  # Get values
Output:
[{'MONTH': 1, 'Name': 'Orange', 'Total': 300, 'YEAR': 2019},
{'MONTH': 2, 'Name': 'Mango', 'Total': 200, 'YEAR': 2019},
{'MONTH': 3, 'Name': 'Apple', 'Total': 450, 'YEAR': 2019},
{'MONTH': 1, 'Name': 'Apple', 'Total': 300, 'YEAR': 2019},
{'MONTH': 3, 'Name': 'Grape', 'Total': 600, 'YEAR': 2019},
{'MONTH': 3, 'Name': 'Orange', 'Total': 200, 'YEAR': 2019},
{'MONTH': 2, 'Name': 'Grape', 'Total': 600, 'YEAR': 2019},
{'MONTH': 2, 'Name': 'Apple', 'Total': 300, 'YEAR': 2019},
{'MONTH': 2, 'Name': 'Orange', 'Total': 300, 'YEAR': 2019}]
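A variation on the same idea, sketched here with a shortened made-up sample, uses a tuple of (YEAR, MONTH, Name) as the key and accumulates totals in a separate dict. Unlike the approach above it leaves the input lists untouched, and sorting the keys gives a stable output order. The names `totals` and `merged` are illustrative.

```python
from itertools import chain

# Shortened sample in the same shape as the question's data.
JSON_1 = [
    {"YEAR": 2019, "MONTH": 1, "Name": "Apple", "Total": 100},
    {"YEAR": 2019, "MONTH": 2, "Name": "Apple", "Total": 300},
]
JSON_2 = [
    {"YEAR": 2019, "MONTH": 1, "Name": "Apple", "Total": 200},
    {"YEAR": 2019, "MONTH": 1, "Name": "Orange", "Total": 300},
]

# Accumulate totals per (year, month, name) without modifying the inputs.
totals = {}
for row in chain(JSON_1, JSON_2):
    key = (row["YEAR"], row["MONTH"], row["Name"])
    totals[key] = totals.get(key, 0) + int(row["Total"])

# Rebuild the list of dictionaries, ordered by year, month and name.
merged = [
    {"YEAR": year, "MONTH": month, "Name": name, "Total": total}
    for (year, month, name), total in sorted(totals.items())
]
print(merged)
```

Entries that exist in only one of the lists are carried over with their original total.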
