Auto increment pymongo - python

I am trying to auto increment a field in my mongo collection. The field is an 'id' field that holds the 'id' of each document: 1, 2, 3, and so on.
What I want is to insert a new document whose 'id' is the last document's 'id' plus 1.
The way I have written the code, it fetches the last document, adds 1 to that document, and updates it. So if the last id is 5, the new document also gets 5, while the document I incremented now has the new 'id' of 6.
I am not sure how to get around this, so any help would be appreciated.
Code
last_id = pokemons.find_one({}, sort=[('id', -1)])
last_pokemon = pokemons.find_one_and_update({'id': last_id['id']}, {'$inc': {'id': 1}}, sort=[('id', -1)])
new_pokemon = {
    "name": name, "avg_spawns": avg_spawns, "candy": candy, "img": img_link, "weaknesses": [], "type": [], "candy_count": candy_count,
    "egg": egg, "height": height, "multipliers": [], "next_evolution": [], "prev_evolution": [],
    "spawn_chance": spawn_chance, "spawn_time": spawn_time, "weight": weight, "id": last_pokemon['id'], "num": last_pokemon['id'],
}
pokemons.insert_one(new_pokemon)
The variables in new_pokemon don't matter; I am just having issues with the last_pokemon part.

The findOne command in the MongoDB shell doesn't support sorting; with PyMongo you can either pass sort= to find_one (as the question's code does) or use a normal find with the limit parameter set to 1:
last_id = pokemons.find({}, {"id": 1}, sort=[('id', -1)]).limit(1).next()  # raises StopIteration if the collection is empty, because of `next()`
last_id["id"] += 1
new_pokemon = {
    "name": name, "avg_spawns": avg_spawns, "candy": candy, "img": img_link, "weaknesses": [], "type": [], "candy_count": candy_count,
    "egg": egg, "height": height, "multipliers": [], "next_evolution": [], "prev_evolution": [],
    "spawn_chance": spawn_chance, "spawn_time": spawn_time, "weight": weight, "id": last_id['id'], "num": last_id['id'],
}
pokemons.insert_one(new_pokemon)
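Note that both versions read the current maximum and then insert, so two concurrent writers can race and produce duplicate ids. A common alternative is a dedicated counters collection updated atomically with $inc. A minimal sketch (the MongoClient setup, database name, and counters collection are assumptions, not from the question):
from pymongo import MongoClient, ReturnDocument

client = MongoClient()   # assumed connection
db = client["pokedex"]   # hypothetical database name

def next_id(counter_name):
    # Atomically increment and return the counter; upsert=True creates
    # the counter document on first use, starting the sequence at 1.
    doc = db.counters.find_one_and_update(
        {"_id": counter_name},
        {"$inc": {"seq": 1}},
        upsert=True,
        return_document=ReturnDocument.AFTER,
    )
    return doc["seq"]

new_pokemon["id"] = new_pokemon["num"] = next_id("pokemons")
db.pokemons.insert_one(new_pokemon)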

Eliminate keys from list of dict python

I am pulling information from this website's API:
https://financialmodelingprep.com/
Specifically, I need the data from the income statements:
https://financialmodelingprep.com/developer/docs/#Company-Financial-Statements
What I get back from the API is a list containing 36 dictionaries with the following data:
[ {
"date" : "2019-09-28",
"symbol" : "AAPL",
"fillingDate" : "2019-10-31 00:00:00",
"acceptedDate" : "2019-10-30 18:12:36",
"period" : "FY",
"revenue" : 260174000000,
"costOfRevenue" : 161782000000,
"grossProfit" : 98392000000,
"grossProfitRatio" : 0.378178,
"researchAndDevelopmentExpenses" : 16217000000,
"generalAndAdministrativeExpenses" : 18245000000,
"sellingAndMarketingExpenses" : 0.0,
"otherExpenses" : 1807000000,
"operatingExpenses" : 34462000000,
"costAndExpenses" : 196244000000,
"interestExpense" : 3576000000,
"depreciationAndAmortization" : 12547000000,
"ebitda" : 81860000000,
"ebitdaratio" : 0.314636,
"operatingIncome" : 63930000000,
"operatingIncomeRatio" : 0.24572,
"totalOtherIncomeExpensesNet" : 422000000,
"incomeBeforeTax" : 65737000000,
"incomeBeforeTaxRatio" : 0.252666,
"incomeTaxExpense" : 10481000000,
"netIncome" : 55256000000,
"netIncomeRatio" : 0.212381,
"eps" : 2.97145,
"epsdiluted" : 2.97145,
"weightedAverageShsOut" : 18595652000,
"weightedAverageShsOutDil" : 18595652000,
"link" : "https://www.sec.gov/Archives/edgar/data/320193/000032019319000119/0000320193-19-000119-index.html",
"finalLink" : "https://www.sec.gov/Archives/edgar/data/320193/000032019319000119/a10-k20199282019.htm"
}, ...
]
What I don't need in the dictionaries are the keys:
fillingDate, acceptedDate, link, finalLink
I managed to remove them, but my problem is that the piece of code I wrote now prints those dictionaries way too often, and I am not able to understand why...
Here is what I tried:
import requests
import json
url = "https://financialmodelingprep.com/api/v3/income-statement/AAPL?apikey=b60bb3d1967bb15bfb9daaa4426e77dc"
response = requests.get(url)
data = response.text
dataList = json.loads(data)
entriesToRemove = {
    'fillingDate': 0,
    'acceptedDate': 0,
    'link': 0,
    'finalLink': 0
}
removedEntries = []
newDict = {}
for index in range(len(dataList)):
    for key in dataList[index]:
        newDict[key] = dataList[index].get(key)
        if key in entriesToRemove:
            removedEntries = newDict.pop(key)
        print(json.dumps(newDict, indent=4))
Thanks in advance
OP:
for each key in the dictionary, the dictionary gets printed again.
Reason:
for index in range(len(dataList)):
    for key in dataList[index]:
        newDict[key] = dataList[index].get(key)
        if key in entriesToRemove:
            removedEntries = newDict.pop(key)
        print(json.dumps(newDict, indent=4)) # notice this line
The dictionary is printed once per key because the print(json.dumps(newDict, indent=4)) statement sits inside the inner loop, so it executes on every key-value iteration over the dictionary.
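The minimal fix is to dedent the print so it runs once, after both loops finish:
for index in range(len(dataList)):
    for key in dataList[index]:
        newDict[key] = dataList[index].get(key)
        if key in entriesToRemove:
            newDict.pop(key)
print(json.dumps(newDict, indent=4))  # runs once, after the loops
Note that newDict is also overwritten on every pass through dataList, so only the last dictionary's values survive; the list-based approach below avoids that as well.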
To remove the unwanted keys from a list of dicts, you can iterate over the list and build another list of dicts without the unnecessary keys:
s = [ {
"date" : "2019-09-28",
"symbol" : "AAPL",
"fillingDate" : "2019-10-31 00:00:00",
"acceptedDate" : "2019-10-30 18:12:36",
"period" : "FY",
"revenue" : 260174000000,
"costOfRevenue" : 161782000000,
"grossProfit" : 98392000000,
"grossProfitRatio" : 0.378178,
"researchAndDevelopmentExpenses" : 16217000000,
"generalAndAdministrativeExpenses" : 18245000000,
"sellingAndMarketingExpenses" : 0.0,
"otherExpenses" : 1807000000,
"operatingExpenses" : 34462000000,
"costAndExpenses" : 196244000000,
"interestExpense" : 3576000000,
"depreciationAndAmortization" : 12547000000,
"ebitda" : 81860000000,
"ebitdaratio" : 0.314636,
"operatingIncome" : 63930000000,
"operatingIncomeRatio" : 0.24572,
"totalOtherIncomeExpensesNet" : 422000000,
"incomeBeforeTax" : 65737000000,
"incomeBeforeTaxRatio" : 0.252666,
"incomeTaxExpense" : 10481000000,
"netIncome" : 55256000000,
"netIncomeRatio" : 0.212381,
"eps" : 2.97145,
"epsdiluted" : 2.97145,
"weightedAverageShsOut" : 18595652000,
"weightedAverageShsOutDil" : 18595652000,
"link" : "https://www.sec.gov/Archives/edgar/data/320193/000032019319000119/0000320193-19-000119-index.html",
"finalLink" : "https://www.sec.gov/Archives/edgar/data/320193/000032019319000119/a10-k20199282019.htm"
}
]
res = []
ignored_keys = ['fillingDate', 'acceptedDate', 'link', 'finalLink']
for dd in s:
    # build one filtered dict per input dict, then append it once
    res.append({k: v for k, v in dd.items() if k not in ignored_keys})
print(res)
EDIT:
as a one-liner (a list with one filtered dict per element of s):
print([{k: v for k, v in dd.items() if k not in ignored_keys} for dd in s])
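If mutating the parsed list in place is acceptable, dict.pop with a default does the same job without building new dicts (a hedged alternative using the names from the question's code):
for dd in dataList:
    for key in ('fillingDate', 'acceptedDate', 'link', 'finalLink'):
        dd.pop(key, None)  # the None default avoids a KeyError if a key is absent
print(json.dumps(dataList, indent=4))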

Extract values from oddly-nested Python

I must be really slow, because I spent a whole day googling and trying to write Python code to simply list the "code" values only, so my output would be Service1, Service2, Service4. I have extracted JSON values from complex JSON or dict structures before, but now I must have hit a mental block.
This is my json structure.
import json

myjson='''
{
"formatVersion" : "ABC",
"publicationDate" : "2017-10-06",
"offers" : {
"Service1" : {
"code" : "Service1",
"version" : "1a1a1a1a",
"index" : "1c1c1c1c1c1c1"
},
"Service2" : {
"code" : "Service2",
"version" : "2a2a2a2a2",
"index" : "2c2c2c2c2c2"
},
"Service3" : {
"code" : "Service4",
"version" : "3a3a3a3a3a",
"index" : "3c3c3c3c3c3"
}
}
}
'''
#convert above string to json
somejson = json.loads(myjson)
print(somejson["offers"]) # I tried so many variations to no avail.
Or, if you want the "code" values:
>>> [s['code'] for s in somejson['offers'].values()]
['Service1', 'Service2', 'Service4']
somejson["offers"] is a dictionary. It seems you want to print its keys.
In Python 2:
print(somejson["offers"].keys())
In Python 3:
print(list(somejson["offers"].keys()))
In Python 3, keys() returns a 'view', not a list, so wrap it in list() (a list comprehension works too) if you want list output.
This should probably do the trick if you are not certain about the number of Services in the JSON.
import json
myjson='''
{
"formatVersion" : "ABC",
"publicationDate" : "2017-10-06",
"offers" : {
"Service1" : {
"code" : "Service1",
"version" : "1a1a1a1a",
"index" : "1c1c1c1c1c1c1"
},
"Service2" : {
"code" : "Service2",
"version" : "2a2a2a2a2",
"index" : "2c2c2c2c2c2"
},
"Service3" : {
"code" : "Service4",
"version" : "3a3a3a3a3a",
"index" : "3c3c3c3c3c3"
}
}
}
'''
#convert above string to json
somejson = json.loads(myjson)
#Without knowing the Services:
offers = somejson["offers"]
keys = offers.keys()
for service in keys:
    print(somejson["offers"][service]["code"])
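If some offers might lack a "code" key, a defensive variant with .get avoids a KeyError (a hedged addition, not from the answers above):
codes = [offer.get("code") for offer in somejson["offers"].values()]
print(codes)  # a missing code shows up as None instead of raising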

IfNull with Zero value is Boolean False. How to count?

EDIT: a more explicit example
I would like to count the number of values of one specific field in a collection.
chosenSensors = ["CO2_BUR_NE_I_001", "CO2_CEL_SE_I_001"]
match = {'$match': {'$or': list(map(lambda x: {x: {'$exists': True}}, chosenSensors))}}
group = {'$group': {'_id': {'year': {'$year': '$timestamp'}}}}
project = {'$project': {}}
for chosenSensor in chosenSensors:
    group['$group'][chosenSensor+'-Count'] = {'$sum': {'$cond': [{'$ifNull': ['$'+chosenSensor, False]}, 1, 0]}}
    project['$project'][chosenSensor+'-Count'] = True
sort = {'$sort': {"_id": 1}}
pipeline = [match, group, project, sort]
for doc in client["cleanData"]["test"].aggregate(pipeline):
    print(doc)
Below is a sample of my collection. I would like to count the number of values of CO2_BUR_NE_I_001.
I expect a count of 4.
{
"_id" : ObjectId("593ab6021ccb9b0c0fb226fd"),
"timestamp" : ISODate("2016-11-17T12:36:00.000Z"),
"CO2_CEL_SE_I_001" : 1210,
"CO2_BUR_NE_I_001" : 880
}
{
"_id" : ObjectId("593ab6021ccb9b0c0fb226fe"),
"timestamp" : ISODate("2016-11-17T12:37:00.000Z"),
"CO2_CEL_SE_I_001" : 1210,
"CO2_BUR_NE_I_001" : 880
}
{
"_id" : ObjectId("593ab6021ccb9b0c0fb226ff"),
"timestamp" : ISODate("2016-11-17T12:38:00.000Z"),
"CO2_CEL_SE_I_001" : 1210,
"CO2_BUR_NE_I_001" : 0
}
{
"_id" : ObjectId("593ab63a1ccb9b0c0fb3d3e5"),
"timestamp" : ISODate("2016-02-01T19:26:00.000Z"),
"CO2_CEL_SE_I_001" : 1080
}
{
"_id" : ObjectId("593ab6021ccb9b0c0fb22700"),
"timestamp" : ISODate("2016-11-17T12:39:00.000Z"),
"CO2_CEL_SE_I_001" : 1210,
"CO2_BUR_NE_I_001" : 880
}
{
"_id" : ObjectId("593ab6025ccb9b0c0fb226fd"),
"timestamp" : ISODate("2016-11-17T12:36:00.000Z"),
"TEM_ETG_001" : 1210
}
But I get 3. The value 0 of CO2_BUR_NE_I_001 is not counted as an existing value.
{'_id': {'year': 2016}, 'CO2_BUR_NE_I_001-Count': 3, 'CO2_CEL_SE_I_001-Count': 5}
If I replace the 0 with 880 in the document in question...
{
"_id" : ObjectId("593ab6021ccb9b0c0fb226ff"),
"timestamp" : ISODate("2016-11-17T12:38:00.000Z"),
"CO2_CEL_SE_I_001" : 1210,
"CO2_BUR_NE_I_001" : 880
}
... I get the expected result:
{'_id': {'year': 2016}, 'CO2_BUR_NE_I_001-Count': 4, 'CO2_CEL_SE_I_001-Count': 5}
EDIT: Beginning of an answer...
When I use $ifNull on a value that exists, it returns the value; when that value is 0, it returns 0. That return value is passed to $cond, and since 0 is falsy, the $cond is treated as false and returns 0 instead of 1 to my $sum. How can I handle that?
Counting the number of values of one specific field in a collection:
You can use db.collection.distinct() to get the distinct values from MongoDB and then take the length of the resulting list; no aggregation needed. Note that distinct collapses duplicate values, so this counts distinct values rather than documents containing the field.
values = db.collection.distinct('field', {conditions})
print(len(values))
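Applied to the collection from the question (client and collection names taken from the question's own code), this shows the caveat concretely:
values = client["cleanData"]["test"].distinct("CO2_BUR_NE_I_001", {"CO2_BUR_NE_I_001": {"$exists": True}})
print(len(values))  # 2 for the sample data (880 and 0): duplicates collapse, so this is not the per-document count of 4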
Another approach uses the fact that null sorts lower than numbers (int, double, long) in the comparison order of BSON types:
Documentation: comparison/sort order
So I just have to compare my value with None.
{'$sum':{'$cond':[{ '$gt': ['$'+chosenSensor, None]}, 1, 0]}}
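Applied to the original pipeline, only the group stage changes (same variable names as in the question):
for chosenSensor in chosenSensors:
    # $gt: [field, None] is true for any BSON value that sorts above null,
    # including 0, so zero readings are now counted
    group['$group'][chosenSensor+'-Count'] = {'$sum': {'$cond': [{'$gt': ['$'+chosenSensor, None]}, 1, 0]}}
    project['$project'][chosenSensor+'-Count'] = True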

how to select single field from _id embedded field mongodb

Is there a way to select only the userId field from the _id embedded document?
I tried the query below.
I also want to delete all the documents returned by this query, perhaps in batches of 10,000 per batch, keeping the database load in check so the operation doesn't hamper the database. Please suggest.
Sample Data:
"_id" : {
"Path" : 0,
"TriggerName" : "T1",
"userId" : NumberLong(231),
"Date" : "02/09/2017",
"OfferType" : "NOOFFER"
},
"OfferCount" : NumberLong(0),
"OfferName" : "NoOffer",
"trgtm" : NumberLong("1486623660308"),
"trgtype" : "PREDEFINED",
"desktopTop-normal" : NumberLong(1)
query:
mongo --eval 'db.l.find({"_id.Date": {"$lt": "03/09/2017"}}, {"_id.userId": 1}).limit(1).forEach(printjson)'
output:
{
"_id" : {
"Path" : 0,
"TriggerName" : "T1",
"userId" : NumberLong(231),
"Date" : "02/09/2017",
"OfferType" : "NOOFFER"
}
}
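As the output shows, projecting "_id.userId" still returns the whole embedded _id here. One way to get just the embedded field is to reshape the document with an aggregation $project stage; deletion can then be done in bounded batches of _ids. A sketch in PyMongo (the connection, database name, and throttling choices are assumptions):
from pymongo import MongoClient
import time

client = MongoClient()        # assumed connection
coll = client["test"]["l"]    # hypothetical database name; collection "l" from the question

# Reshape each document so only userId remains.
for doc in coll.aggregate([
    {"$match": {"_id.Date": {"$lt": "03/09/2017"}}},
    {"$project": {"userId": "$_id.userId", "_id": 0}},
]):
    print(doc)  # e.g. {'userId': 231}

# Delete in batches of 10000 to keep the load on the database bounded.
while True:
    ids = [d["_id"] for d in coll.find(
        {"_id.Date": {"$lt": "03/09/2017"}}, {"_id": 1}).limit(10000)]
    if not ids:
        break
    coll.delete_many({"_id": {"$in": ids}})
    time.sleep(1)  # pause between batches so other operations aren't starved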

Parsing relatively structured text files in python and inserting in mongodb

Testbed: ABC123
Image : FOOBAR
Keyword: heredity
Date : 6/27
Other : XYZ suite crash
Suite : XYZ, crash post XYZ delivery
Failure:
Reason :
Known :
Failure:
Reason :
Known :
Type :
Notes :
Testbed: ABC456
Image : FOOBAR
Keyword: isolate
Date :6/27
Other : 3 random failures in 3 different test suites
Suite : LMO Frag
Failure: jumbo_v4_to_v6
Reason : ?
Known : ?
Type :
Notes :
Suite : XYZ suite
Failure: XYZ_v4_to_v4v
Reason : failed to receive expected packets
Known : ?
Type :
Notes :
Suite : RST
Failure: RST_udp_v4_to_v6
Reason : failed to receive expected packets
Known : ?
Type :
Notes :
Image : BARFOO
Keyword: repugnat
Date : 6/26
Other :
Suite : PQR test
Failure: unable to destroy flow - flow created without ppx flow id
Reason : SCRIPT issue
Known : maybe?
Type : embtest
Notes :
Suite : UVW suite
Failure: 8 failures in UVW duplicate - interworking cases not working!
Reason : ?
Known : ?
Type :
Notes :
I am trying to create documents of the type
{
"_id" : "xxxxxxxxxxxxx",
"platform" : "ABC123",
"image" : "FOOBAR",
"keyword" : "parricide",
"suite" : [
{
"name" : "RST (rst_only_v6v_to_v6)",
"notes" : "",
"failure" : "flow not added properly",
"reason" : "EMBTEST script issue",
"known" : "yes?",
"type" : ""
}
]
}
Each document should be unique based on the testbed (stored as platform), image, and keyword.
I have tried using regex and came up with the following, but it is prone to human error in creating the structured text, in which case it would fail due to its dependencies:
for iter in content:
    if re.match(r"\s*testbed", iter, re.IGNORECASE):
        testbed = iter.split(':')[1].strip()
        if result_doc['platform'] == None:
            result_doc['platform'] = testbed
    if re.match(r"\s*image", iter, re.IGNORECASE):
        image = iter.split(':')[1].strip()
        if result_doc['image'] == None:
            result_doc['image'] = image
    if re.match(r"\s*keyword", iter, re.IGNORECASE):
        keyword = iter.split(':')[1].strip()
        if result_doc['keyword'] == None:
            result_doc['keyword'] = keyword
        key = str(testbed)+'-'+str(image)+'-'+str(keyword)
        if prev_key == None:
            prev_key = key
        if key != prev_key:  # if keys differ, then add to db
            self.insert(result_doc)
            prev_key = key
            result_doc = self.getTemplate("result")  # assign new document template
            result_doc['platform'] = testbed
            result_doc['image'] = image
            result_doc['keyword'] = keyword
            result_doc['_id'] = key
    if re.match(r"\s*suite", iter, re.IGNORECASE):
        suitename = iter.split(':')[1].strip()
    if re.match(r"\s*Failure", iter, re.IGNORECASE):
        suitefailure = iter.split(':')[1].strip()
        result_suite = self.getTemplate("suite")  # assign new suite template
        result_suite['name'] = suitename
        result_suite['failure'] = suitefailure
    if re.match(r"\s*Reason", iter, re.IGNORECASE):
        suitereason = iter.split(':')[1].strip()
        result_suite['reason'] = suitereason
    if re.match(r"\s*Known", iter, re.IGNORECASE):
        suiteknown = iter.split(':')[1].strip()
        result_suite['known'] = suiteknown
    if re.match(r"\s*type", iter, re.IGNORECASE):
        suitetype = iter.split(':')[1].strip()
        result_suite['type'] = suitetype
    if re.match(r"\s*Notes", iter, re.IGNORECASE):
        suitenotes = iter.split(':')[1].strip()
        result_suite['notes'] = suitenotes
        result_doc['suite'].append(result_suite)
self.insert(result_doc)  # last document to be inserted
Is there a better way to do this than matching on the next tag to create a new document?
Thanks
Yes, there is definitely a better, more robust way to do this. One approach is to use a hash table (a Python dictionary) to store the key-value pairings provided in an input file, and then do some formatting to print them out in the desired output format.
# Predefine some constants / inputs
testbed_dict = { "_id" : "xxxxxxxxxxxxx", "platform" : "ABC456" }
inputFile = "ABC456.txt"
with open(inputFile, "r") as infh:
    inputLines = infh.readlines()
image_start_indices = [inputLines.index(x) for x in inputLines if x.split(":")[0].strip() == "Image"]
image_end_indices = [x-1 for x in image_start_indices[1:]]
image_end_indices.append(len(inputLines)-1)
image_start_stops = list(zip(image_start_indices, image_end_indices))  # list() so the pairs can be re-used on Python 3
suite_start_indices = [i for i, x in enumerate(inputLines) if x.split(":")[0].strip() == "Suite"]
suite_end_indices = [i+1 for i, x in enumerate(inputLines) if x.split(":")[0].strip() == "Notes"]
suite_start_stops = list(zip(suite_start_indices, suite_end_indices))
for image_start_index, image_stop_index in image_start_stops:
    suiteCount = 1
    image_suite_indices, suites, image_dict = [], [], {}
    for start, stop in suite_start_stops:
        if start >= image_stop_index or image_start_index >= stop:
            continue
        image_suite_indices.append((start, stop))
    suites = [inputLines[x:y] for x, y in image_suite_indices]
    header_end_index = min([x for x, y in image_suite_indices])
    for line in inputLines[image_start_index:header_end_index]:
        if line.strip() == "":
            continue
        key, value = (line.split(":")[0].strip().lower(), line.split(":")[1].strip())
        image_dict[key] = value
    for suite in suites:
        suite_dict = {}
        for line in suite:
            if line.strip() == "":
                continue
            key, value = (line.split(":")[0].strip().lower(), line.split(":")[1].strip())
            suite_dict[key] = value
        image_dict["suite "+str(suiteCount)] = suite_dict
        suiteCount += 1
    with open(image_dict["image"]+".txt", "w") as outfh:
        outfh.write('{\n')
        for key, value in testbed_dict.items():  # items() instead of Python 2's iteritems()
            outfh.write('\t"'+key+'" : "'+testbed_dict[key]+'"\n')
        for key, value in image_dict.items():
            if 'suite' in key:
                continue
            else:
                outfh.write('\t"'+key+'" : "'+value+'",\n')
        for key, value in image_dict.items():
            if 'suite' not in key:
                continue
            else:
                outfh.write('\t"suite" : [\n\t\t{\n')
                for suitekey, suitevalue in value.items():
                    outfh.write('\t\t\t"'+suitekey+'" : "'+str(suitevalue)+'",\n')
                outfh.write("\t\t}\n")
                outfh.write("\t],\n")
        outfh.write('}\n')
The above code expects to be run in the same directory as the input file (i.e. inputFile = "ABC456.txt"), and it writes a variable number of output files depending on how many "images" are present in the input; in the case of your ABC456, the outputs written would be "FOOBAR.txt" and "BARFOO.txt". For example, if "ABC456.txt" contains the text of the "Testbed: ABC456" section in your question above, the outputs will be the following.
BARFOO.txt
{
"platform" : "ABC456"
"_id" : "xxxxxxxxxxxxx"
"keyword" : "repugnat",
"image" : "BARFOO",
"other" : "",
"date" : "6/26",
"suite" : [
{
"notes" : "",
"failure" : "8 failures in UVW duplicate - interworking cases not working!",
"reason" : "?",
"known" : "?",
"suite" : "UVW suite",
"type" : "",
}
],
"suite" : [
{
"notes" : "",
"failure" : "unable to destroy flow - flow created without ppx flow id",
"reason" : "SCRIPT issue",
"known" : "maybe?",
"suite" : "PQR test",
"type" : "embtest",
}
],
}
FOOBAR.txt
{
"platform" : "ABC456"
"_id" : "xxxxxxxxxxxxx"
"keyword" : "isolate",
"image" : "FOOBAR",
"other" : "3 random failures in 3 different test suites",
"date" : "6/27",
"suite" : [
{
"notes" : "",
"failure" : "RST_udp_v4_to_v6",
"reason" : "failed to receive expected packets",
"known" : "?",
"suite" : "RST",
"type" : "",
}
],
"suite" : [
{
"notes" : "",
"failure" : "XYZ_v4_to_v4v",
"reason" : "failed to receive expected packets",
"known" : "?",
"suite" : "XYZ suite",
"type" : "",
}
],
"suite" : [
{
"notes" : "",
"failure" : "jumbo_v4_to_v6",
"reason" : "?",
"known" : "?",
"suite" : "LMO Frag",
"type" : "",
}
],
}
The code above works but has some caveats: it doesn't preserve the ordering of the lines, but assuming you're just sticking this JSON into MongoDB, ordering doesn't matter. You would also need to modify it to handle some redundancies: if a "Suite" line has redundant info nested under it (e.g. multiple "Failure" lines, as in your ABC123 example), all but one is ignored. Hopefully you get a chance to look through the code, figure out how it's working, and modify it to meet whatever your needs are.
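Since the end goal is MongoDB insertion, the parsed image_dict can also be inserted directly instead of being written to a text file. A minimal sketch (hypothetical client, database, and collection names; it assumes the loop above has just built image_dict and testbed_dict):
from pymongo import MongoClient

client = MongoClient()                  # assumed connection
results = client["testdb"]["results"]   # hypothetical database/collection names

# Reshape into the question's target document: a single "suite" array
# instead of numbered "suite N" keys, plus a composite _id for uniqueness.
doc = {k: v for k, v in image_dict.items() if not k.startswith("suite")}
doc["platform"] = testbed_dict["platform"]
doc["suite"] = [v for k, v in image_dict.items() if k.startswith("suite")]  # relies on dict insertion order (Python 3.7+)
doc["_id"] = "-".join([doc["platform"], doc.get("image", ""), doc.get("keyword", "")])
results.replace_one({"_id": doc["_id"]}, doc, upsert=True)  # idempotent: re-running updates instead of duplicating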
Cheers.
