I would like to filter custom_fields1 so that the only remaining items are the ones that have 'Value' = 'Ja' under the entry 'Name' = 'GPP' (and not under the first 'Name' entry, 'Name': 'Informationen'). Does anyone know how to filter the dictionary efficiently? I am grateful for any tips!
custom_fields1 = [(44,
{'#odata.context': 'http://api.hellohq.io/v1/$metadata#CustomFields',
'value': [{'Name': 'Informationen',
'Value': '',
'Type': 'TextMultiline',
'Id': 18020,
'CreatedBy': 0,
'UpdatedBy': 0,
'CreatedOn': None,
'UpdatedOn': None},
{'Name': 'GPP',
'Value': 'Ja',
'Type': 'DropdownCheckbox',
'Id': 18049,
'CreatedBy': 0,
'UpdatedBy': 0,
'CreatedOn': None,
'UpdatedOn': None}]}),
(45,
{'#odata.context': 'http://api.hellohq.io/v1/$metadata#CustomFields',
'value': [{'Name': 'Informationen',
'Value': '',
'Type': 'TextMultiline',
'Id': 18020,
'CreatedBy': 0,
'UpdatedBy': 0,
'CreatedOn': None,
'UpdatedOn': None},
{'Name': 'GPP',
'Value': 'Ja',
'Type': 'DropdownCheckbox',
'Id': 18049,
'CreatedBy': 0,
'UpdatedBy': 0,
'CreatedOn': None,
'UpdatedOn': None}]}),
(46,
{'#odata.context': 'http://api.hellohq.io/v1/$metadata#CustomFields',
'value': [{'Name': 'Informationen',
'Value': '',
'Type': 'TextMultiline',
'Id': 18020,
'CreatedBy': 0,
'UpdatedBy': 0,
'CreatedOn': None,
'UpdatedOn': None},
{'Name': 'GPP',
'Value': 'Nein',
'Type': 'DropdownCheckbox',
'Id': 18049,
'CreatedBy': 0,
'UpdatedBy': 0,
'CreatedOn': None,
'UpdatedOn': None}]})]
Your structure is deeply nested! Let's break it down step by step.
It's a list, so we want to iterate over it:
for field in custom_fields1:
    ...  # I have an element
What is the element? It's a tuple, and its second element is what interests me: dict1 = field[1]
Now I have a dictionary, and its 'value' key is what interests me most: values1 = dict1['value']
Oh, is it a list again? Let's iterate again!
for dict_value in values1:
    ...  # this is the dict I need!
I got the proper dict; now I just need to check my conditions (note that the key is "Name", with a capital N):
def check(dict_value):
    return dict_value["Name"] == ... and dict_value["Value"] == ...  # fill in, e.g. "GPP" and "Ja"
How do you filter based on that? You can use a list comprehension:
[dict_value for dict_value in values1 if check(dict_value)]
Then assign the result back to the "value" key of the outer dict.
Another option is to use del to delete the records that do not satisfy the check.
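Put together, the steps above might look like the sketch below. It fills the check() conditions with 'GPP' and 'Ja' from the question and uses a trimmed copy of the data (only the keys the filter needs). The helper names filter_values and records_with_gpp_ja are hypothetical. Because the question can be read either as "filter the inner lists" or "keep only the records whose GPP is Ja", both readings are sketched.

```python
# Trimmed copy of the question's data: only the keys the filter needs are kept.
custom_fields1 = [
    (44, {"value": [{"Name": "Informationen", "Value": ""}, {"Name": "GPP", "Value": "Ja"}]}),
    (45, {"value": [{"Name": "Informationen", "Value": ""}, {"Name": "GPP", "Value": "Ja"}]}),
    (46, {"value": [{"Name": "Informationen", "Value": ""}, {"Name": "GPP", "Value": "Nein"}]}),
]


def check(dict_value):
    """True for the GPP field whose value is 'Ja' (conditions taken from the question)."""
    return dict_value.get("Name") == "GPP" and dict_value.get("Value") == "Ja"


def filter_values(records):
    """Keep every record but drop the inner 'value' entries that fail check()."""
    result = []
    for record_id, payload in records:
        new_payload = dict(payload)  # shallow copy, so the input list is left untouched
        new_payload["value"] = [v for v in payload["value"] if check(v)]
        result.append((record_id, new_payload))
    return result


def records_with_gpp_ja(records):
    """Alternative reading: keep whole records whose GPP field is 'Ja'."""
    return [(rid, payload) for rid, payload in records
            if any(check(v) for v in payload["value"])]


print([rid for rid, _ in records_with_gpp_ja(custom_fields1)])  # [44, 45]
print(filter_values(custom_fields1)[2][1]["value"])  # []
```

Building new lists instead of deleting with del avoids mutating a list while iterating over it.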
JSON File
[https://drive.google.com/file/d/1Jb3OdoffyA71vYfojxLedZNPDLq9bn7b/view?usp=sharing]
I am trying to read the JSON file in Python; the file is the same as the one at the link above. The code I wrote looks like this:
import json
lst = []
for line in open(json_path, 'r'):
    lst.append(json.loads(line))
But for some reason, I keep getting this error: JSONDecodeError: Expecting value: line 2 column 1 (char 1). Did I do something wrong in the code, or does the JSON file have an error in it?
Update
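The error comes from blank lines in the file: json.loads cannot parse a line that contains only a line break. You can strip the line break (\n) and skip empty lines before calling json.loads (thanks to @DeepSpace for the comment):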
import json
lst = []
with open("sample.json", 'r') as f:
    for line in f:
        stripped = line.strip()
        if stripped != "":  # skip blank lines: json.loads cannot parse an empty string
            lst.append(json.loads(stripped))
print(lst)
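To confirm that blank lines are what trip up json.loads, the sketch below reproduces the error on a bare newline and reads an in-memory stand-in for the file through io.StringIO. The SAMPLE string and the load_json_lines helper are hypothetical; the assumption is that the real file holds one JSON object per line, with blank lines in between.

```python
import io
import json

# Hypothetical JSON Lines content with a blank line between records.
SAMPLE = '{"id": "lni001", "doc_id": "7098727"}\n\n{"id": "lni002", "doc_id": "7097889"}\n'


def load_json_lines(fp):
    """Decode one JSON document per non-blank line of a text stream."""
    records = []
    for line in fp:
        stripped = line.strip()
        if not stripped:  # a blank line holds no JSON value, so skip it
            continue
        records.append(json.loads(stripped))
    return records


# Reproduce the question's error: decoding the bare "\n" of a blank line fails.
error_message = None
try:
    json.loads("\n")
except json.JSONDecodeError as exc:
    error_message = str(exc)

records = load_json_lines(io.StringIO(SAMPLE))
print(error_message)  # Expecting value: line 2 column 1 (char 1)
print(len(records))   # 2
```

Passing a real file object opened with open(...) instead of io.StringIO works the same way, since both yield lines when iterated.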
You can also use the ast module:
import ast
lst = []
with open("sample.json", 'r') as f:
    for line in f:
        if line.strip() != "":  # skip blank lines
            lst.append(ast.literal_eval(line.strip()))
Explanation
ast.literal_eval turns a string that holds a Python literal such as a dictionary or list (for example "[1,2,3]") into a usable Python object (such as [1, 2, 3]).
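The two parsers are not interchangeable: json.loads follows JSON syntax, while ast.literal_eval follows Python literal syntax. The short comparison below uses made-up one-line samples to show that ast.literal_eval rejects JSON's true and null, and that json.loads rejects single-quoted Python dicts. So the ast variant only works when every line happens to be a valid Python literal as well.

```python
import ast
import json

json_line = '{"flag": true, "missing": null}'
python_line = "{'flag': True, 'missing': None}"

# json.loads understands JSON syntax: true -> True, null -> None.
from_json = json.loads(json_line)

# ast.literal_eval only understands Python literals, so true/null are rejected.
try:
    ast.literal_eval(json_line)
    literal_eval_accepts_json = True
except ValueError:
    literal_eval_accepts_json = False

# Conversely, single-quoted Python dicts are fine for literal_eval but not for json.loads.
from_python = ast.literal_eval(python_line)
try:
    json.loads(python_line)
    json_accepts_python = True
except json.JSONDecodeError:
    json_accepts_python = False

print(from_json, literal_eval_accepts_json, json_accepts_python)
```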
The output of both snippets above would be:
[{'content': [{'c_id': '002',
'p_id': 'P02',
'source': 'internet',
'type': 'org'},
{'c_id': '003', 'p_id': 'P03', 'source': 'internet', 'type': 'org'},
{'c_id': '005', 'p_id': 'K01', 'source': 'news', 'type': 'people'}],
'doc_id': '7098727',
'id': 'lni001',
'pub_date': '20220301',
'unique_id': '64WP-UI-POLI'},
{'content': [{'c_id': '002',
'p_id': 'P02',
'source': 'internet',
'type': 'org'},
{'c_id': '003', 'p_id': 'P03', 'source': 'internet', 'type': 'org'},
{'c_id': '005', 'p_id': 'K01', 'source': 'news', 'type': 'people'}],
'doc_id': '7098727',
'id': 'lni001',
'pub_date': '20220301',
'unique_id': '64WP-UI-POLI'},
{'content': [{'c_id': '002',
'p_id': 'P02',
'source': 'internet',
'type': 'org'},
{'c_id': '003', 'p_id': 'P03', 'source': 'internet', 'type': 'org'},
{'c_id': '005', 'p_id': 'K01', 'source': 'news', 'type': 'people'}],
'doc_id': '7098727',
'id': 'lni001',
'pub_date': '20220301',
'unique_id': '64WP-UI-POLI'},
{'content': [{'c_id': '012',
'p_id': 'K21',
'source': 'internet',
'type': 'location'},
{'c_id': '034', 'p_id': 'P17', 'source': 'news', 'type': 'people'},
{'c_id': '098', 'p_id': 'K54', 'source': 'news', 'type': 'people'}],
'doc_id': '7097889',
'id': 'lni002',
'pub_date': '20220301',
'unique_id': '64WP-UI-CFGT'},
{'content': [{'c_id': '012',
'p_id': 'K21',
'source': 'internet',
'type': 'location'},
{'c_id': '034', 'p_id': 'P17', 'source': 'news', 'type': 'people'},
{'c_id': '098', 'p_id': 'K54', 'source': 'news', 'type': 'people'}],
'doc_id': '7097889',
'id': 'lni002',
'pub_date': '20220301',
'unique_id': '64WP-UI-CFGT'}]
Let's say I have a list of data, for example:
data = [
{'id': 1, 'name': 'brad', 'color': 'red', 'tags': [], 'author': {'name': 'admin'}},
{'id': 2, 'name': 'sylvia', 'color': 'blue', 'tags': [], 'author': {'name': 'user'}},
{'id': 3, 'name': 'sylwia', 'color': 'green', 'tags': [], 'author': {'name': 'admin'}},
{'id': 4, 'name': 'shane', 'color': 'red', 'tags': [], 'author': {'name': 'admin'}},
{'id': 5, 'name': 'shane', 'color': 'red', 'tags': ['python', 'django'], 'author': {'name': 'user'}}
]
and I want to make it queryable the way Django's ORM does:
ModelName.objects.filter(color__icontains="gree")
And this is what I have done:
import operator
from collections import namedtuple
from django.core.exceptions import MultipleObjectsReturned, ObjectDoesNotExist
class DataQuerySet:
    """
    Custom ORM for List dict data,
    https://stackoverflow.com/a/58351973/6396981
    """
    allowed_operations = {
        'gt': operator.gt,
        'lt': operator.lt,
        'eq': operator.eq,
        # case-insensitive substring match, as the "i" in icontains implies
        'icontains': lambda value, needle: needle.lower() in value.lower(),
    }
    def __init__(self, data):
        self.data = data
    def all(self):
        return self.data
    def filter(self, **kwargs):
        """
        >>> kwargs = {'name': 'sylwia', 'id__gt': 1}
        >>> DataQuerySet(data).filter(**kwargs)
        [{'id': 3, 'name': 'sylwia', 'color': 'green', 'tags': [], 'author': {'name': 'admin'}}]
        """
        operation = namedtuple('Q', 'op key value')
        def parse_filter(item):
            """item is expected to be a tuple with exactly two elements
            >>> parse_filter(('id__gt', 2))
            Q(op=<built-in function gt>, key='id', value=2)
            >>> parse_filter(('id__ ', 2))
            Q(op=<built-in function eq>, key='id', value=2)
            >>> parse_filter(('color__bad', 'red'))
            Traceback (most recent call last):
            ...
            AssertionError: 'bad' operation is not allowed
            """
            key, *op = item[0].split('__')
            # no value after __ means exact value query, e.g. name='sylvia'
            op = ''.join(op).strip() or 'eq'
            assert op in self.allowed_operations, f'{repr(op)} operation is not allowed'
            return operation(self.allowed_operations[op], key, item[1])
        filtered_data = self.data.copy()
        for item in map(parse_filter, kwargs.items()):
            filtered_data = [
                entry for entry in filtered_data
                if item.op(entry[item.key], item.value)
            ]
        return filtered_data
    def get(self, **kwargs):
        """
        >>> DataQuerySet(data).get(id=3)
        {'id': 3, 'name': 'sylwia', 'color': 'green', 'tags': [], 'author': {'name': 'admin'}}
        """
        operation = namedtuple('Q', 'op key value')
        def parse_get(item):
            key, *op = item[0].split('__')
            return operation(self.allowed_operations['eq'], key, item[1])
        filtered_data = self.data.copy()
        for item in map(parse_get, kwargs.items()):
            filtered_data = [
                entry for entry in filtered_data
                if item.op(entry[item.key], item.value)
            ]
        if len(filtered_data) > 1:
            raise MultipleObjectsReturned(filtered_data)
        elif len(filtered_data) < 1:
            raise ObjectDoesNotExist(kwargs)
        return filtered_data[0]
And to use it:
class DataModel:
    def __init__(self, data):
        self._data = DataQuerySet(data)
    @property
    def objects(self):
        return self._data
data = [
{'id': 1, 'name': 'brad', 'color': 'red', 'tags': [], 'author': {'name': 'admin'}},
{'id': 2, 'name': 'sylvia', 'color': 'blue', 'tags': [], 'author': {'name': 'user'}},
{'id': 3, 'name': 'sylwia', 'color': 'green', 'tags': [], 'author': {'name': 'admin'}},
{'id': 4, 'name': 'shane', 'color': 'red', 'tags': [], 'author': {'name': 'admin'}},
{'id': 5, 'name': 'shane', 'color': 'red', 'tags': ['python', 'django'], 'author': {'name': 'user'}}
]
d = DataModel(data)
print(d.objects.filter(id__gt=2))
print(d.objects.filter(color='green'))
print(d.objects.filter(color__icontains='gree'))
print(d.objects.get(id=1))
The tests above work properly, but there is a problem when we want to do more:
print(d.objects.filter(tags__in=['python']))
print(d.objects.filter(author__name='admin'))
print(d.objects.filter(author__name__icontains='use'))
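One way to support these nested lookups in plain Python is to treat every `__`-separated token as a dict key, except a trailing token that names a known operator. The sketch below is an illustration, not the DataQuerySet code above: OPERATIONS, parse_lookup, resolve and filter_data are hypothetical names. For a list field such as tags, Django's `__in` means "the value is one of these options", so the sketch uses `tags__contains='python'` for "the list contains python" and keeps `__in` for scalar fields.

```python
import operator

data = [
    {'id': 1, 'name': 'brad', 'color': 'red', 'tags': [], 'author': {'name': 'admin'}},
    {'id': 2, 'name': 'sylvia', 'color': 'blue', 'tags': [], 'author': {'name': 'user'}},
    {'id': 3, 'name': 'sylwia', 'color': 'green', 'tags': [], 'author': {'name': 'admin'}},
    {'id': 4, 'name': 'shane', 'color': 'red', 'tags': [], 'author': {'name': 'admin'}},
    {'id': 5, 'name': 'shane', 'color': 'red', 'tags': ['python', 'django'], 'author': {'name': 'user'}},
]

# Operator suffixes this sketch understands; any other token is treated as a key.
OPERATIONS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "lt": operator.lt,
    "contains": operator.contains,  # contains(container, item) means: item in container
    "icontains": lambda value, needle: needle.lower() in value.lower(),
    "in": lambda value, options: value in options,
}


def parse_lookup(lookup):
    """Split 'author__name__icontains' into (['author', 'name'], icontains)."""
    parts = lookup.split("__")
    if len(parts) > 1 and parts[-1] in OPERATIONS:
        return parts[:-1], OPERATIONS[parts[-1]]
    return parts, OPERATIONS["eq"]


def resolve(entry, path):
    """Follow nested dict keys, e.g. entry['author']['name']."""
    value = entry
    for key in path:
        value = value[key]
    return value


def filter_data(rows, **lookups):
    """AND together all lookups, like the keyword arguments of a Django filter()."""
    result = list(rows)
    for lookup, expected in lookups.items():
        path, op = parse_lookup(lookup)
        result = [row for row in result if op(resolve(row, path), expected)]
    return result


print([row["id"] for row in filter_data(data, author__name="admin")])            # [1, 3, 4]
print([row["id"] for row in filter_data(data, author__name__icontains="use")])  # [2, 5]
print([row["id"] for row in filter_data(data, tags__contains="python")])        # [5]
print([row["id"] for row in filter_data(data, name__in=["brad", "shane"])])     # [1, 4, 5]
```

A field whose name collides with an operator suffix (for example a key called "in") would be misread by this sketch.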
Finally, I found a nice module that handles these cases, called reobject, and here is the test:
from reobject.models import Model, Field
from reobject.query.parser import Q as Query
data = [
{'name': 'brad', 'color': 'red', 'tags': [], 'author': {'name': 'admin'}},
{'name': 'sylvia', 'color': 'blue', 'tags': [], 'author': {'name': 'user'}},
{'name': 'sylwia', 'color': 'green', 'tags': [], 'author': {'name': 'admin'}},
{'name': 'shane', 'color': 'red', 'tags': [], 'author': {'name': 'admin'}},
{'name': 'shane', 'color': 'red', 'tags': ['python', 'django'], 'author': {'name': 'user'}}
]
class Book(Model):
    name = Field()
    color = Field()
    tags = Field()
    author = Field()
for item in data:
    Book(**item)
Book.objects.all()
Book.objects.get(name='brad')
Book.objects.filter(name='brad')
Book.objects.filter(author__name='admin')
Book.objects.filter(tags__contains='python')
Book.objects.filter(Query(author__name='admin') | Query(author__name='user'))
However, it still doesn't support id or pk fields, maybe because those names are already taken.
I wrote a class, and everything works fine until I try to pass a parameter through a variable.
Let me show you:
INSTANCE ONE - ARGUMENTS PASSED DIRECTLY
a = Statements("AAPL","income_statement", "FY", ["2017","2018"])
d = a.get()
print(d)
Output (all good):
[{'tag': 'operatingrevenue', 'value': 229234000000.0}, {'tag': 'totalrevenue', 'value': 229234000000.0}, {'tag': 'operatingcostofrevenue', 'value': 141048000000.0}, {'tag': 'totalcostofrevenue', 'value': 141048000000.0}, {'tag': 'totalgrossprofit', 'value': 88186000000.0}, {'tag': 'sgaexpense', 'value': 15261000000.0}, {'tag': 'rdexpense', 'value': 11581000000.0}, {'tag': 'totaloperatingexpenses', 'value': 26842000000.0}, {'tag': 'totaloperatingincome', 'value': 61344000000.0}, {'tag': 'otherincome', 'value': 2745000000.0}, {'tag': 'totalotherincome', 'value': 2745000000.0}, {'tag': 'totalpretaxincome', 'value': 64089000000.0}, {'tag': 'incometaxexpense', 'value': 15738000000.0}, {'tag': 'netincomecontinuing', 'value': 48351000000.0}, {'tag': 'netincome', 'value': 48351000000.0}, {'tag': 'netincometocommon', 'value': 48351000000.0}, {'tag': 'weightedavebasicsharesos', 'value': 5217242000.0}, {'tag': 'basiceps', 'value': 9.27}, {'tag': 'weightedavedilutedsharesos', 'value': 5251692000.0}, {'tag': 'dilutedeps', 'value': 9.21}, {'tag': 'weightedavebasicdilutedsharesos', 'value': 5215900000.0}, {'tag': 'basicdilutedeps', 'value': 9.27}, {'tag': 'cashdividendspershare', 'value': 2.4}]
{'ticker': 'AAPL', 'statement': 'income_statement', 'type': 'FY', 'fiscal_year': '2017'}
[{'tag': 'operatingrevenue', 'value': 265595000000.0}, {'tag': 'totalrevenue', 'value': 265595000000.0}, {'tag': 'operatingcostofrevenue', 'value': 163756000000.0}, {'tag': 'totalcostofrevenue', 'value': 163756000000.0}, {'tag': 'totalgrossprofit', 'value': 101839000000.0}, {'tag': 'sgaexpense', 'value': 16705000000.0}, {'tag': 'rdexpense', 'value': 14236000000.0}, {'tag': 'totaloperatingexpenses', 'value': 30941000000.0}, {'tag': 'totaloperatingincome', 'value': 70898000000.0}, {'tag': 'otherincome', 'value': 2005000000.0}, {'tag': 'totalotherincome', 'value': 2005000000.0}, {'tag': 'totalpretaxincome', 'value': 72903000000.0}, {'tag': 'incometaxexpense', 'value': 13372000000.0}, {'tag': 'netincomecontinuing', 'value': 59531000000.0}, {'tag': 'netincome', 'value': 59531000000.0}, {'tag': 'netincometocommon', 'value': 59531000000.0}, {'tag': 'weightedavebasicsharesos', 'value': 4955377000.0}, {'tag': 'basiceps', 'value': 12.01}, {'tag': 'weightedavedilutedsharesos', 'value': 5000109000.0}, {'tag': 'dilutedeps', 'value': 11.91}, {'tag': 'weightedavebasicdilutedsharesos', 'value': 4956800000.0}, {'tag': 'basicdilutedeps', 'value': 12.01}, {'tag': 'cashdividendspershare', 'value': 2.72}]
{'ticker': 'AAPL', 'statement': 'income_statement', 'type': 'FY', 'fiscal_year': '2018'}
INSTANCE TWO - ARGUMENTS PASSED THROUGH VARIABLE
ticker = "MMM"
__________
Class ***:
class code
__________
e= Statements(ticker,"income_statement","FY", ["2017", "2018"])
f = e.get()
print(f)
Output (not good):
{'ticker': 'MMM', 'statement': 'income_statement', 'type': 'FY', 'fiscal_year': '2017'}
Traceback (most recent call last):
[]
File "C:/Users/ruleb/Desktop/python test/Ptf_Project/Financials.py", line 96, in <module>
{'ticker': 'MMM', 'statement': 'income_statement', 'type': 'FY', 'fiscal_year': '2018'}
f = e.get()
File "C:/Users/ruleb/Desktop/python test/Ptf_Project/Financials.py", line 86, in get
df = df.applymap(lambda x: x["value"])
File "C:\Users\ruleb\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\frame.py", line 6072, in applymap
return self.apply(infer)
File "C:\Users\ruleb\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\frame.py", line 6014, in apply
return op.get_result()
File "C:\Users\ruleb\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\apply.py", line 318, in get_result
return super(FrameRowApply, self).get_result()
File "C:\Users\ruleb\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\apply.py", line 142, in get_result
return self.apply_standard()
File "C:\Users\ruleb\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\apply.py", line 248, in apply_standard
self.apply_series_generator()
File "C:\Users\ruleb\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\apply.py", line 277, in apply_series_generator
results[i] = self.f(v)
File "C:\Users\ruleb\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\frame.py", line 6070, in infer
return lib.map_infer(x.astype(object).values, func)
File "pandas/_libs/src\inference.pyx", line 1472, in pandas._libs.lib.map_infer
File "C:/Users/ruleb/Desktop/python test/Ptf_Project/Financials.py", line 86, in <lambda>
df = df.applymap(lambda x: x["value"])
TypeError: ("'NoneType' object is not subscriptable", 'occurred at index operatingrevenue')
Process finished with exit code 1
I'm attaching the full code for your reference:
import requests
import pandas as pd
ticker = "MMM"
class Statements:
    def __init__(self, ticker="AAPL", statement="income_statement", period="FY", fiscal_year=["2017", "2018"]):
        self.ticker = ticker
        self.statement = statement
        self.period = period
        self.fiscal_year = fiscal_year
    # , ticker, statement, period, fiscal_year
    def get(self):
        api_username = 'x'
        api_password = 'x'
        base_url = "https://api.intrinio.com"
        s = []
        for year in self.fiscal_year:
            request_url = base_url + "/financials/standardized"
            query_params = {
                'ticker': self.ticker,
                'statement': self.statement,
                'type': self.period,
                'fiscal_year': year
            }
            response = requests.get(request_url, params=query_params, auth=(api_username, api_password))
            if response.status_code == 401:
                print("Unauthorized! Check your username and password.")
                exit()
            data = response.json()["data"]
            s.append(data)
            print(data)
            print(query_params)
        df = pd.DataFrame(s, index=self.fiscal_year)
        df.columns = [i["tag"] for i in df.iloc[0].values]
        df = df.applymap(lambda x: x["value"])
        # print(df)
        return df
a = Statements("AAPL", "income_statement", "FY", ["2017", "2018"])
d = a.get()
print(d)
e = Statements(ticker, "income_statement", "FY", ["2017", "2018"])
f = e.get()
print(f)
I don't understand what difference passing the ticker through an external variable makes.
Thanks to all!
Your assumption is wrong: it is not the external variable that causes the error. You are querying different tickers, "AAPL" vs "MMM".
The error is in the data you process: you got None values somewhere.
If you use
e = Statements("MMM","income_statement","FY", ["2017", "2018"])
you'll get the same error.
The problem is that somewhere you get a None value and try to use None[...], but None is not subscriptable.
Debug these lines:
data = response.json()["data"] # json result might be None ?
df.columns = [i["tag"] for i in df.iloc[0].values] # i might be None
df = df.applymap(lambda x: x["value"]) # x might be None
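Following the debugging hints above, one defensive option is to inspect each year's data before building the DataFrame. The [] printed in the MMM output suggests (an assumption, not confirmed) that the API returned an empty list for one year; a DataFrame built from lists of different lengths then has missing cells, which would explain how a None reaches x["value"]. The sketch below is pure Python; extract_values is a hypothetical helper and the payloads are made up.

```python
def extract_values(data_by_year):
    """Map each year to {tag: value}, tolerating empty or missing API data."""
    rows = {}
    for year, data in data_by_year.items():
        if not data:  # None or []: nothing came back for this year
            print(f"warning: no data returned for {year}")
            rows[year] = {}
            continue
        # skip None placeholders instead of subscripting them
        rows[year] = {item["tag"]: item["value"] for item in data if item is not None}
    return rows


# Made-up payloads shaped like the printed output; 2018 is empty on purpose.
responses = {
    "2017": [{"tag": "totalrevenue", "value": 1.0}, {"tag": "netincome", "value": 0.5}],
    "2018": [],
}
rows = extract_values(responses)
print(rows)  # {'2017': {'totalrevenue': 1.0, 'netincome': 0.5}, '2018': {}}
```

Inside get(), the per-year response.json()["data"] lists could be collected into such a dict and handed to pd.DataFrame.from_dict(rows, orient="index"), so that missing tags become NaN instead of raising; how an entirely empty year is represented is worth checking against your pandas version.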
I want to compare the dictionaries below. The Name key is common to the dictionaries in both lists.
If a Name matches in both lists, I want to do some other stuff with the data.
PerfData = [
{'Name': 'abc', 'Type': 'Ex1', 'Access': 'N1', 'perfStatus':'Latest Perf', 'Comments': '07/12/2017 S/W Version'},
{'Name': 'xyz', 'Type': 'Ex1', 'Access': 'N2', 'perfStatus':'Latest Perf', 'Comments': '11/12/2017 S/W Version upgrade failed'},
{'Name': 'efg', 'Type': 'Cust1', 'Access': 'A1', 'perfStatus':'Old Perf', 'Comments': '11/10/2017 S/W Version upgrade failed, test data is active'}
]
beatData = [
{'Name': 'efg', 'Status': 'Latest', 'rcvd-timestamp': '1516756202.632'},
{'Name': 'abc', 'Status': 'Latest', 'rcvd-timestamp': '1516756202.896'}
]
Thanks
Rajeev
l = [{'name': 'abc'}, {'name': 'xyz'}]
k = [{'name': 'a'}, {'name': 'abc'}]
[i['name'] for i in l for f in k if i['name'] == f['name']]
Hope the logic above works for you.
The answer above doesn't assign the result to any variable. If you want to print it, the following works:
result = [i['name'] for i in l for f in k if i['name'] == f['name']]
print(result)
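Applied to the question's own PerfData and beatData (which use a capitalized 'Name' key), indexing one list by Name in a dict replaces the nested loop with one dict lookup per record. What "other stuff" means is not stated, so merging each matched pair into one record is only an illustrative assumption.

```python
PerfData = [
    {'Name': 'abc', 'Type': 'Ex1', 'Access': 'N1', 'perfStatus': 'Latest Perf', 'Comments': '07/12/2017 S/W Version'},
    {'Name': 'xyz', 'Type': 'Ex1', 'Access': 'N2', 'perfStatus': 'Latest Perf', 'Comments': '11/12/2017 S/W Version upgrade failed'},
    {'Name': 'efg', 'Type': 'Cust1', 'Access': 'A1', 'perfStatus': 'Old Perf', 'Comments': '11/10/2017 S/W Version upgrade failed, test data is active'},
]
beatData = [
    {'Name': 'efg', 'Status': 'Latest', 'rcvd-timestamp': '1516756202.632'},
    {'Name': 'abc', 'Status': 'Latest', 'rcvd-timestamp': '1516756202.896'},
]

# Index beatData by Name once, so each membership test is a dict lookup.
beat_by_name = {entry["Name"]: entry for entry in beatData}

# Names present in both lists, in PerfData order.
matched_names = [perf["Name"] for perf in PerfData if perf["Name"] in beat_by_name]

# Hypothetical "other stuff": merge each matched pair into a single record.
merged = [{**perf, **beat_by_name[perf["Name"]]}
          for perf in PerfData if perf["Name"] in beat_by_name]

print(matched_names)  # ['abc', 'efg']
```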