Eliminate keys from list of dict python - python

I am pulling information from this website's API:
https://financialmodelingprep.com/
Specifically, I need the data from the income statements:
https://financialmodelingprep.com/developer/docs/#Company-Financial-Statements
What I get back from the API is a list containing 36 dictionaries with the following data:
[ {
"date" : "2019-09-28",
"symbol" : "AAPL",
"fillingDate" : "2019-10-31 00:00:00",
"acceptedDate" : "2019-10-30 18:12:36",
"period" : "FY",
"revenue" : 260174000000,
"costOfRevenue" : 161782000000,
"grossProfit" : 98392000000,
"grossProfitRatio" : 0.378178,
"researchAndDevelopmentExpenses" : 16217000000,
"generalAndAdministrativeExpenses" : 18245000000,
"sellingAndMarketingExpenses" : 0.0,
"otherExpenses" : 1807000000,
"operatingExpenses" : 34462000000,
"costAndExpenses" : 196244000000,
"interestExpense" : 3576000000,
"depreciationAndAmortization" : 12547000000,
"ebitda" : 81860000000,
"ebitdaratio" : 0.314636,
"operatingIncome" : 63930000000,
"operatingIncomeRatio" : 0.24572,
"totalOtherIncomeExpensesNet" : 422000000,
"incomeBeforeTax" : 65737000000,
"incomeBeforeTaxRatio" : 0.252666,
"incomeTaxExpense" : 10481000000,
"netIncome" : 55256000000,
"netIncomeRatio" : 0.212381,
"eps" : 2.97145,
"epsdiluted" : 2.97145,
"weightedAverageShsOut" : 18595652000,
"weightedAverageShsOutDil" : 18595652000,
"link" : "https://www.sec.gov/Archives/edgar/data/320193/000032019319000119/0000320193-19-000119-index.html",
"finalLink" : "https://www.sec.gov/Archives/edgar/data/320193/000032019319000119/a10-k20199282019.htm"
}, ...
]
The keys I don't need in each dictionary are:
fillingDate, acceptedDate, link, finalLink
I managed to remove them, but the code I wrote now prints those dictionaries far too often, and I can't work out why.
Here is what I tried:
import requests
import json
url = "https://financialmodelingprep.com/api/v3/income-statement/AAPL?apikey=b60bb3d1967bb15bfb9daaa4426e77dc"
response = requests.get(url)
data = response.text
dataList = json.loads(data)
entriesToRemove = {
    'fillingDate' : 0,
    'acceptedDate' : 0,
    'link' : 0,
    'finalLink' : 0
}
removedEntries = []
newDict = {}
for index in range(len(dataList)):
    for key in dataList[index]:
        newDict[key] = dataList[index].get(key)
        if key in entriesToRemove:
            removedEntries = newDict.pop(key)
        print(json.dumps(newDict, indent=4))
Thanks in advance

OP:
The dictionary is printed again for every key it contains.
Reason:
for index in range(len(dataList)):
    for key in dataList[index]:
        newDict[key] = dataList[index].get(key)
        if key in entriesToRemove:
            removedEntries = newDict.pop(key)
        print(json.dumps(newDict, indent=4)) # notice this line
The dictionary is printed once per key because the print(json.dumps(newDict, indent=4)) statement sits inside the inner loop, so it runs on every key-value iteration. Note also that newDict is created once, outside both loops, so every record is written into the same dictionary.
To remove the unwanted keys from a list of dicts, you can iterate over the list and build a new list of dicts without those keys:
s = [ {
"date" : "2019-09-28",
"symbol" : "AAPL",
"fillingDate" : "2019-10-31 00:00:00",
"acceptedDate" : "2019-10-30 18:12:36",
"period" : "FY",
"revenue" : 260174000000,
"costOfRevenue" : 161782000000,
"grossProfit" : 98392000000,
"grossProfitRatio" : 0.378178,
"researchAndDevelopmentExpenses" : 16217000000,
"generalAndAdministrativeExpenses" : 18245000000,
"sellingAndMarketingExpenses" : 0.0,
"otherExpenses" : 1807000000,
"operatingExpenses" : 34462000000,
"costAndExpenses" : 196244000000,
"interestExpense" : 3576000000,
"depreciationAndAmortization" : 12547000000,
"ebitda" : 81860000000,
"ebitdaratio" : 0.314636,
"operatingIncome" : 63930000000,
"operatingIncomeRatio" : 0.24572,
"totalOtherIncomeExpensesNet" : 422000000,
"incomeBeforeTax" : 65737000000,
"incomeBeforeTaxRatio" : 0.252666,
"incomeTaxExpense" : 10481000000,
"netIncome" : 55256000000,
"netIncomeRatio" : 0.212381,
"eps" : 2.97145,
"epsdiluted" : 2.97145,
"weightedAverageShsOut" : 18595652000,
"weightedAverageShsOutDil" : 18595652000,
"link" : "https://www.sec.gov/Archives/edgar/data/320193/000032019319000119/0000320193-19-000119-index.html",
"finalLink" : "https://www.sec.gov/Archives/edgar/data/320193/000032019319000119/a10-k20199282019.htm"
}
]
res = []
ignored_keys = ['fillingDate', 'acceptedDate', 'link', 'finalLink']
for dd in s:
    # build one filtered copy of each dictionary
    filtered = {}
    for k, v in dd.items():
        if k not in ignored_keys:
            filtered[k] = v
    res.append(filtered)
print(res)
EDIT:
one-liner:
print([{k: v for k, v in dd.items() if k not in ignored_keys} for dd in s])
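If mutating the original records is acceptable, the same result can be sketched with dict.pop and a default value, which avoids a KeyError when a record lacks one of the keys. The function name strip_keys and the sample records below are made up for illustration:

```python
def strip_keys(records, unwanted):
    """Remove every key in `unwanted` from each dict in `records`, in place."""
    for record in records:
        for key in unwanted:
            record.pop(key, None)  # default None: no error if the key is absent
    return records


# Hypothetical, shortened records standing in for the API response.
sample = [
    {"date": "2019-09-28", "symbol": "AAPL", "link": "a", "finalLink": "b"},
    {"date": "2018-09-29", "symbol": "AAPL", "fillingDate": "c"},
]
ignored = ["fillingDate", "acceptedDate", "link", "finalLink"]
print(strip_keys(sample, ignored))
```

Unlike the comprehension, this changes the dictionaries that were passed in, so it only fits when the unfiltered data is not needed afterwards.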

Related

Auto increment pymongo

I am trying to auto increment a field in my mongo collection. The field is an 'id' field and it contains the 'id' of each document. For example. 1, 2, 3 etc.
What I want to happen is insert a new document and take the 'id' from the last document and add 1 to it so that the new document is lastID + 1.
The way I have written the code makes it so that it gets the last document and adds 1 to the last document and then updates it. So if the last id is 5, then the new document will have 5 and the document that I was incrementing on now has the new 'id' of 6.
I am not sure how to get round this so any help would be appreciated.
Code
last_id = pokemons.find_one({}, sort=[( 'id', -1)])
last_pokemon = pokemons.find_one_and_update({'id' : last_id['id']}, {'$inc': {'id': 1}}, sort=[( 'id', -1)])
new_pokemon = {
"name" : name, "avg_spawns" : avg_spawns, "candy" : candy, "img" : img_link, "weaknesses" : [], "type" : [], "candy_count" : candy_count,
"egg" : egg, "height" : height, "multipliers" : [], "next_evolution" : [], "prev_evolution" : [],
"spawn_chance" : spawn_chance, "spawn_time" : spawn_time, "weight" : weight, "id" : last_pokemon['id'], "num" : last_pokemon['id'],
}
pokemons.insert_one(new_pokemon)
The variables in new_pokemon don't matter as I am just having issues with the last_pokemon part
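The find_one_and_update call with $inc is what changes the 'id' of the existing document, so drop it. Instead, read the document with the highest 'id', add 1 in Python, and use that value for the new document. PyMongo's find_one forwards keyword arguments such as sort to find(), so the sort itself is not the problem; the find() with limit(1) form below works as well.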
last_id = pokemons.find({}, {"id": 1}, sort=[('id', -1)]).limit(1).next() # Will error if there are no documents in collection due to the usage of `next()`
last_id["id"] += 1
new_pokemon = {
"name" : name, "avg_spawns" : avg_spawns, "candy" : candy, "img" : img_link, "weaknesses" : [], "type" : [], "candy_count" : candy_count,
"egg" : egg, "height" : height, "multipliers" : [], "next_evolution" : [], "prev_evolution" : [],
"spawn_chance" : spawn_chance, "spawn_time" : spawn_time, "weight" : weight, "id" : last_id['id'], "num" : last_id['id'],
}
pokemons.insert_one(new_pokemon)
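A minimal sketch of the "last id plus one" step as a helper that also copes with an empty collection. The name next_id and its parameters are hypothetical; the only PyMongo call it relies on is find_one with a projection and a sort argument, and it assumes every document carries a numeric 'id':

```python
def next_id(collection, field="id", start=1):
    """Return the highest value of `field` plus one, or `start` if the collection is empty."""
    # Sort descending so the first match holds the largest value.
    last = collection.find_one({}, {field: 1}, sort=[(field, -1)])
    if last is None:  # find_one returns None when nothing matches
        return start
    return last[field] + 1
```

It would be used as new_id = next_id(pokemons) before building new_pokemon. Reading and then inserting is not atomic, so two clients running this at the same time could compute the same id; a unique index on the field would at least turn such a collision into an error.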

MongoDB values to Dict in python

Basically, I need to read documents from MongoDB and put their values into a dict.
**MongoDB Values**
{ "_id" : "LAC1397", "code" : "MIS", "label" : "Marshall Islands", "mappingName" : "RESIDENTIAL_COUNTRY" }
{ "_id" : "LAC1852", "code" : "COP", "label" : "Colombian peso", "mappingName" : "FOREIGN_CURRENCY_CODE"}
How do I map them to a dict in the following form in Python?
**syntax :**
dict = {"mappingName|Code" : "Value" }
**Example :**
dict = { "RESIDENTIAL_COUNTRY|MIS" : "Marshall Islands" , "FOREIGN_CURRENCY_CODE|COP" : "Colombian peso" , "COMM_LANG|ENG" : "English" }
**Python Code**
from pymongo import MongoClient
client = MongoClient('localhost', 27017)
db = client.mongo
collection = db.masters
for post in collection.find():
I got stuck after this; I'm not sure how to put the values into a dict in the form described above.
post will be a dict with the values from Mongo, so you can loop over the records and add each one to a new dictionary. As the comments mention, any duplicate keys would be overwritten by the last value found. If this might be an issue, consider a sort() on the find() call.
Sample code:
from pymongo import MongoClient
db = MongoClient()['mydatabase']
db.mycollection.insert_one({ "_id" : "LAC1397", "code" : "MIS", "label" : "Marshall Islands", "mappingName" : "RESIDENTIAL_COUNTRY" })
db.mycollection.insert_one({ "_id" : "LAC1852", "code" : "COP", "label" : "Colombian peso", "mappingName" : "FOREIGN_CURRENCY_CODE"})
mydict = {}
for post in db.mycollection.find():
    k = f"{post.get('mappingName')}|{post.get('code')}"
    mydict[k] = post.get('label')
print(mydict)
Gives:
{'RESIDENTIAL_COUNTRY|MIS': 'Marshall Islands', 'FOREIGN_CURRENCY_CODE|COP': 'Colombian peso'}
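Since duplicate keys are silently overwritten, a small sketch like the following could report them. It works on any iterable of dicts, so a plain list stands in for the cursor here; the function name build_lookup is made up for illustration:

```python
def build_lookup(docs):
    """Map 'mappingName|code' to label and collect keys seen more than once."""
    lookup = {}
    duplicates = set()
    for doc in docs:
        key = f"{doc['mappingName']}|{doc['code']}"
        if key in lookup:
            duplicates.add(key)
        lookup[key] = doc["label"]  # the last value found wins
    return lookup, duplicates


# The two documents from the question, as plain dicts.
docs = [
    {"_id": "LAC1397", "code": "MIS", "label": "Marshall Islands", "mappingName": "RESIDENTIAL_COUNTRY"},
    {"_id": "LAC1852", "code": "COP", "label": "Colombian peso", "mappingName": "FOREIGN_CURRENCY_CODE"},
]
lookup, duplicates = build_lookup(docs)
print(lookup)
print(duplicates)
```

With a real collection, the call would be build_lookup(db.mycollection.find()).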

Extract values from oddly-nested Python

I must be really slow because I spent a whole day googling and trying to write Python code to simply list the "code" values only so my output will be Service1, Service2, Service2. I have extracted json values before from complex json or dict structure. But now I must have hit a mental block.
This is my json structure.
myjson='''
{
"formatVersion" : "ABC",
"publicationDate" : "2017-10-06",
"offers" : {
"Service1" : {
"code" : "Service1",
"version" : "1a1a1a1a",
"index" : "1c1c1c1c1c1c1"
},
"Service2" : {
"code" : "Service2",
"version" : "2a2a2a2a2",
"index" : "2c2c2c2c2c2"
},
"Service3" : {
"code" : "Service4",
"version" : "3a3a3a3a3a",
"index" : "3c3c3c3c3c3"
}
}
}
'''
#convert above string to json
somejson = json.loads(myjson)
print(somejson["offers"]) # I tried so many variations to no avail.
Or, if you want the "code" values:
>>> [s['code'] for s in somejson['offers'].values()]
['Service1', 'Service2', 'Service4']
somejson["offers"] is a dictionary. It seems you want to print its keys.
In Python 2:
print(somejson["offers"].keys())
In Python 3:
print(list(somejson["offers"].keys()))
In Python 3, keys() returns a view object rather than a list, so wrap it in list() (or use a list comprehension) to get a plain list.
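To illustrate the difference between a keys view and a list in Python 3, here is a small sketch with a made-up, shortened offers dict. The view follows later changes to the dictionary, while the list is a snapshot:

```python
# Hypothetical, shortened version of the "offers" dictionary.
offers = {
    "Service1": {"code": "Service1"},
    "Service2": {"code": "Service2"},
}

keys_view = offers.keys()    # dynamic view onto the dict's keys
keys_list = list(keys_view)  # independent snapshot

offers["Service3"] = {"code": "Service4"}

print(len(keys_view))  # the view sees the new key
print(keys_list)       # the snapshot does not
```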
This should do the trick if you don't know the number of services in the JSON in advance.
import json
myjson='''
{
"formatVersion" : "ABC",
"publicationDate" : "2017-10-06",
"offers" : {
"Service1" : {
"code" : "Service1",
"version" : "1a1a1a1a",
"index" : "1c1c1c1c1c1c1"
},
"Service2" : {
"code" : "Service2",
"version" : "2a2a2a2a2",
"index" : "2c2c2c2c2c2"
},
"Service3" : {
"code" : "Service4",
"version" : "3a3a3a3a3a",
"index" : "3c3c3c3c3c3"
}
}
}
'''
#convert above string to json
somejson = json.loads(myjson)
#Without knowing the Services:
offers = somejson["offers"]
keys = offers.keys()
for service in keys:
    print(somejson["offers"][service]["code"])

Scraping different style of Json

I am familiar with scraping data in this format.
{"data":[{"assists":0,"assistsPerGame":0.0000,"evAssists":0,"evPoints":0,"gamesPlayed":1,"goals":0,"penaltyMinutes":0,"playerBirthCity":"Windsor","playerBirthCountry":"CAN","playerBirthDate":"1996-02-07",
import csv
import requests
outfile = open("NHL_Recent.csv","a",newline='')
writer = csv.writer(outfile)
writer.writerow(["Player","Pos","GP","G","A","P","+/-","PIM","PPG","PPP","SHG","SHP","GWG","OTG","S","S%","TOI","Shifts/PG","FOW%"])
req = requests.get('http://www.nhl.com/stats/rest/skaters?isAggregate=true&reportType=basic&isGame=true&reportName=skatersummary&sort=[{%22property%22:%22shots%22,%22direction%22:%22DESC%22}]&cayenneExp=gameDate%3E=%222017-11-4%22%20and%20gameDate%3C=%222017-11-10%22%20and%20gameTypeId=2')
data = req.json()['data']
for item in data:
    Player = item['playerName']
    Pos = item['playerPositionCode']
    GP = item['gamesPlayed']
But not in this manner.
"totalItems" : 600,
"totalEvents" : 0,
"totalGames" : 600,
"totalMatches" : 0,
"wait" : 10,
"dates" : [ {
"date" : "2017-10-04",
"totalItems" : 4,
"totalEvents" : 0,
"totalGames" : 4,
"totalMatches" : 0,
"games" : [ {
"gamePk" : 2017020001,
"link" : "/api/v1/game/2017020001/feed/live",
"gameType" : "R",
"season" : "20172018",
"gameDate" : "2017-10-04T23:00:00Z",
"status" : {
"abstractGameState" : "Final",
"codedGameState" : "7",
"detailedState" : "Final",
"statusCode" : "7",
"startTimeTBD" : false
},
"teams" : {
"away" : {
"leagueRecord" : {
"wins" : 1,
"losses" : 0,
"ot" : 0,
"type" : "league"
},
"score" : 7,
"team" : {
"id" : 10,
"name" : "Toronto Maple Leafs",
"link" : "/api/v1/teams/10",
"venue" : {
"name" : "Air Canada Centre",
"link" : "/api/v1/venues/null",
"city" : "Toronto",
"timeZone" : {
"id" : "America/Toronto",
"offset" : -5,
"tz" : "EST"
}
},
"abbreviation" : "TOR",
"teamName" : "Maple Leafs",
"locationName" : "Toronto",
"firstYearOfPlay" : "1926",
"division" : {
"id" : 17,
"name" : "Atlantic",
"link" : "/api/v1/divisions/17"
},
"conference" : {
"id" : 6,
"name" : "Eastern",
"link" : "/api/v1/conferences/6"
},
"franchise" : {
"franchiseId" : 5,
"teamName" : "Maple Leafs",
"link" : "/api/v1/franchises/5
This is what I have so far with no success.
import csv
import requests
import os
outfile = open("NHL DIF JSON.csv","a",newline='')
writer = csv.writer(outfile)
writer.writerow(["Date","Game","gamep"])
req = requests.get('https://statsapi.web.nhl.com/api/v1/schedule?startDate=2017-10-04&endDate=2018-04-30&expand=schedule.teams,schedule.linescore,schedule.broadcasts.all,schedule.ticket,schedule.game.content.media.epg,schedule.radioBroadcasts,schedule.metadata,schedule.game.seriesSummary,seriesSummary.series&leaderCategories=&leaderGameTypes=R&site=en_nhl&teamId=&gameType=&timecode=')
data = req.json()['dates']
for item in data:
    Date = item['date']
    ##for item in games:
    Game = item['0']
    gamep = item['gamePk']
    print(Date,Game)
    writer.writerow([Date,Game,gamep])
outfile.close()
os.system("taskkill /f /im pythonw.exe")
I would like to pull the "gamePk" and "gameDate" from totalGames along with the teamNames within "teams" and other categories. Eventually I would like to put that into a CSV with the gamePk, gameDate, teams, score, etc. I'm just not sure how to get through the individual categories; any help would be greatly appreciated! Thanks!
It's normal JSON data, just more deeply nested. You can get the date from data['dates'][i]['date']. For the teams, score, etc., you have to iterate over data['dates'][i]['games'].
import csv
import requests

req = requests.get('https://statsapi.web.nhl.com/api/v1/schedule?startDate=2017-10-04&endDate=2018-04-30&expand=schedule.teams,schedule.linescore,schedule.broadcasts.all,schedule.ticket,schedule.game.content.media.epg,schedule.radioBroadcasts,schedule.metadata,schedule.game.seriesSummary,seriesSummary.series&leaderCategories=&leaderGameTypes=R&site=en_nhl&teamId=&gameType=&timecode=')
data = req.json()
my_data =[]
for item in data['dates']:
    date = item['date']
    games = item['games']
    for game in games:
        gamePk = game['gamePk']
        gameDate = game['gameDate']
        team_away, team_home = game['teams']['away'], game['teams']['home']
        team_away_score = team_away['score']
        team_home_score = team_home['score']
        team_away_name = team_away['team']['name']
        team_home_name = team_home['team']['name']
        my_data.append([date, gamePk, gameDate, team_away_name, team_home_name, team_away_score, team_home_score])
# one header per column, in the same order as the rows appended above
headers = ["Date","gamePk","gameDate","team_away_name","team_home_name","team_away_score","team_home_score"]
with open("my_file.csv", "a", newline='') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(my_data)
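The same dates-then-games traversal can be tried offline. The sketch below uses a trimmed-down sample shaped like the schedule response (the home team name and score are placeholders, not real data) and csv.DictWriter, so the header row is derived from the same keys as the data rows. The function name flatten_schedule is hypothetical:

```python
import csv
import io

# Hypothetical sample; only the fields used below are included.
sample = {
    "dates": [
        {
            "date": "2017-10-04",
            "games": [
                {
                    "gamePk": 2017020001,
                    "gameDate": "2017-10-04T23:00:00Z",
                    "teams": {
                        "away": {"score": 7, "team": {"name": "Toronto Maple Leafs"}},
                        "home": {"score": 0, "team": {"name": "Placeholder Home Team"}},
                    },
                }
            ],
        }
    ]
}


def flatten_schedule(data):
    """Yield one flat dict per game from the nested dates -> games structure."""
    for day in data["dates"]:
        for game in day["games"]:
            away, home = game["teams"]["away"], game["teams"]["home"]
            yield {
                "date": day["date"],
                "gamePk": game["gamePk"],
                "gameDate": game["gameDate"],
                "away_name": away["team"]["name"],
                "home_name": home["team"]["name"],
                "away_score": away["score"],
                "home_score": home["score"],
            }


rows = list(flatten_schedule(sample))
buffer = io.StringIO()  # stands in for a file opened with newline=''
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```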
As for your last question, you can get the 'pk' from data['gameData']['game']['pk']. The player, event, triCode and coordinates values are a little harder to get because some items don't have 'players' and 'team' keys, or the 'coordinates' dict is empty.
In this case the dict.get method can be helpful because it will return None (or you can set a default value) if you try to access a non-existent key.
You still have to design your code around the structure of the JSON data, for example:
import requests

req = requests.get('https://statsapi.web.nhl.com/api/v1/game/2017020001/feed/live?site=en_nhl')
data = req.json()
my_data = []
pk = data['gameData']['game']['pk']
for item in data['liveData']['plays']['allPlays']:
    players = item.get('players')
    if players:
        player_a = players[0]['player']['fullName'] if len(players) > 0 else None
        player_b = players[1]['player']['fullName'] if len(players) > 1 else None
    else:
        player_a, player_b = None, None
    event = item['result']['event']
    triCode = item.get('team', {}).get('triCode')
    coordinates_x, coordinates_y = item['coordinates'].get('x'), item['coordinates'].get('y')
    my_data.append([pk, player_a, player_b, event, triCode, coordinates_x, coordinates_y])
for row in my_data:
    print(row)
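The chained dict.get pattern used for triCode can be generalized into a small helper. This is only a sketch; safe_get is a made-up name and the two play dictionaries are hypothetical stand-ins for items of allPlays:

```python
def safe_get(mapping, *keys, default=None):
    """Follow `keys` through nested dicts; return `default` if any step is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical plays: one with a team and coordinates, one without.
play_with_team = {"team": {"triCode": "TOR"}, "coordinates": {"x": 10.0, "y": -5.0}}
play_without_team = {"coordinates": {}}

print(safe_get(play_with_team, "team", "triCode"))
print(safe_get(play_without_team, "team", "triCode"))
print(safe_get(play_without_team, "coordinates", "x", default=0.0))
```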

Python and Json

I'm trying out JSON since I have to work with a Cisco API, and I can't figure out how to loop through the JSON object. I can get my keys but I can't get the values.
I am using http://www.jsoneditoronline.org/ to help me understand the JSON format, but this is what I have so far.
JSON file:
{
"queryResponse" : {
"#rootUrl" : "\/webacs\/data",
"#requestUrl" : "https : \/\/192.168.116.207\/webacs\/api\/v1\/data\/DeviceGroups\/42",
"#responseType" : "getEntity",
"entity" : {
"#url" : "\/webacs\/data\/className\/15",
"#type" : "className",
"#dtoType" : "deviceGroupsDTO_$$_javassist_5196",
"deviceGroupsDTO" : {
"#id" : "15",
"#displayName" : "String value",
"clearedAlarms" : 1,
"criticalAlarms" : 1,
"groupId" : 2,
"groupName" : "String value",
"informationAlarms" : 1,
"majorAlarms" : 1,
"minorAlarms" : 1,
"name" : "String value",
"warningAlarms" : 1
}
}
}
}
My Python script:
import json
jsondata = json.load(open('data.json'))
for rows in jsondata['queryResponse']['entity']['deviceGroupsDTO']:
    print(rows)
It prints:
name
#id
warningAlarms
#displayName
informationAlarms
clearedAlarms
majorAlarms
groupId
groupName
criticalAlarms
minorAlarms
Not sure what I'm doing wrong...
jsondata['queryResponse']['entity']['deviceGroupsDTO'] is a dictionary.
Iterate over items() to get key, value pairs:
for key, value in jsondata['queryResponse']['entity']['deviceGroupsDTO'].items():
    print(key, value)
Note that in Python 2 it is better to use iteritems() in place of items().
See also: What is the difference between dict.items() and dict.iteritems()?
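When the nesting depth is not known in advance, a recursive generator can visit every value. The following is a sketch under the assumption that the data contains only nested dicts and scalar values (no lists); walk is a made-up name and the sample is a shortened version of the JSON file above:

```python
def walk(node, path=()):
    """Yield (path, value) for every non-dict value in a nested dict."""
    for key, value in node.items():
        if isinstance(value, dict):
            # Descend, extending the path with the current key.
            yield from walk(value, path + (key,))
        else:
            yield path + (key,), value


# Shortened version of the JSON file from the question.
jsondata = {
    "queryResponse": {
        "#responseType": "getEntity",
        "entity": {
            "#type": "className",
            "deviceGroupsDTO": {"#id": "15", "groupId": 2, "criticalAlarms": 1},
        },
    }
}

for path, value in walk(jsondata):
    print(" > ".join(path), "=", value)
```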
