Map and reduce functions in CouchDB-Python - python

How is it possible to use map and reduce functions in CouchDB-Python, because the below code does not return anything?
Is it also possible to disable reduce function if it is not needed?
import couchdb
# $ sudo systemctl start couchdb
# http://localhost:5984/_utils/
def fill_data(users_no, database=None):
    """Create ``users_no`` user documents with ids "0".."users_no-1".

    Args:
        users_no: number of documents to create.
        database: optional database-like object with a ``save(doc)`` method;
            defaults to the module-level ``db`` for backward compatibility.
    """
    target = database if database is not None else db
    for i in range(users_no):
        doc = {
            '_id': str(i),
            'uname': "name_" + str(i),
        }
        target.save(doc)
if __name__ == "__main__":
    server = couchdb.Server()
    db = server.create("test-pagination")
    fill_data(300)

    # Map emits one row per user name; string content kept byte-identical
    # because it is sent to CouchDB verbatim.
    map_fun = """
function(doc) {
emit(doc.uname, 1);
}
"""
    reduce_fun = "_count"
    design = {'views': {
        'get_unames': {
            'map': map_fun,
            'reduce': reduce_fun
        }
    }}
    db["_design/users"] = design
    # reduce=False disables the reduce step for this query; without it the
    # view returns a single reduced row (the _count total, key=None) instead
    # of one row per emitted name — which is why "nothing" seemed to come back.
    uname_list = db.view('users/get_unames', reduce=False)
    print(uname_list)
    for r in uname_list:
        print(r.key)

You give very few details about what you want to get. But I infer from the code that you want unique names. If it is so, you definitely need data to be reduced.
Your problem is that you group data too much. You should call the view with group_level=exact (or group=true which is a synonym).

Yes it is possible to disable the reduce, and this is exactly what you need:
db.view('users/get_unames', reduce=False)
With the reduce active, you get exactly one row back, with just a value (300, the count of your rows) and an empty key.

Related

How to handle missing JSON nested keys from an API response in python?

Here is the JSON response I get from an API request:
{
"associates": [
{
"name":"DOE",
"fname":"John",
"direct_shares":50,
"direct_shares_details":{
"shares_PP":25,
"shares_NP":25
},
"indirect_shares":50,
"indirect_shares_details": {
"first_type": {
"shares_PP": 25,
"shares_NP": 0
},
"second_type": {
"shares_PP": 25,
"shares_NP": 0
}
}
}
]
}
However, in some occasions, some values will be equal to None. In that case, I handle it in my function for all the values that I know will be integers. But it doesn't work in this scenario for the nested keys inside indirect_shares_details:
{
"associates": [
{
"name":"DOE",
"fname":"John",
"direct_shares":50,
"direct_shares_details":{
"shares_PP":25,
"shares_NP":25
},
"indirect_shares":None,
"indirect_shares_details": None
}
]
}
So when I run my function to get the API values and put them in a custom dict, I get an error because the keys are simply inexistant in the response.
def get_shares_data(response):
    """Flatten the "associates" array of an API response into a list of dicts.

    Missing or null nested share details (e.g. ``indirect_shares_details``
    set to None) are tolerated: the corresponding values default to None and
    are then converted to 0.0, so the function no longer raises
    KeyError/TypeError on partial responses.

    Args:
        response: decoded JSON payload containing an "associates" list.

    Returns:
        A list with one dict per associate, holding "PM_shares", "full_name"
        and a "details" dict of share values scaled by PM_shares / 100.
    """
    associate_from_api = []
    for i in response.get("associates", []):
        # `or {}` turns an explicit null into an empty dict so .get() is safe.
        direct = i.get("direct_shares_details") or {}
        indirect = i.get("indirect_shares_details") or {}
        first = indirect.get("first_type") or {}
        second = indirect.get("second_type") or {}
        associate_data = {
            # NOTE(review): `company` is defined elsewhere in the caller's
            # module — confirm it is in scope wherever this function runs.
            "PM_shares": round(company["Shares"], 2),
            # Original had `["fname"]` (missing `i`) and a missing comma here.
            "full_name": i.get("name", "") + " " + i.get("fname", ""),
            "details": {
                "shares_in_PM": i.get("direct_shares"),
                "shares_PP_in_PM": direct.get("shares_PP"),
                "shares_NP_in_PM": direct.get("shares_NP"),
                "shares_directe": i.get("indirect_shares"),
                "shares_indir_PP_1": first.get("shares_PP"),
                "shares_indir_NP_1": first.get("shares_NP"),
                "shares_indir_PP_2": second.get("shares_PP"),
                "shares_indir_NP_2": second.get("shares_NP"),
            },
        }
        for key, value in associate_data["details"].items():
            if value is not None:
                associate_data["details"][key] = value * associate_data["PM_shares"] / 100
            else:
                # Original wrote to a non-existent "calculs" dict (KeyError);
                # the scaled "details" dict is the intended target.
                associate_data["details"][key] = 0.0
        associate_from_api.append(associate_data)
    return associate_from_api
I've tried conditioning the access of the nested keys only if the parent key wasn't equal to None but I ended up declaring 3 different dictionaries inside if/else conditions and it turned into a mess, is there an efficient way to achieve this?
You can try accessing the values using dict.get('key') instead of accessing them directly, as in dict['key'].
Using the first approach, you will get None instead of KeyError if the key is not there.
EDIT: tested using the dictionary from the question:
You can try pydantic
Install pydantic
pip install pydantic
# OR
conda install pydantic -c conda-forge
Define some models based on your response structure
from pydantic import BaseModel
from typing import List, Optional
# There are some common fields in your json response.
# So you can put them together.
class ShareDetail(BaseModel):
    """Shared shape of the shares_PP / shares_NP sub-objects in the response."""
    shares_PP: int
    shares_NP: int
class IndirectSharesDetails(BaseModel):
    """The two typed buckets found under "indirect_shares_details"."""
    first_type: ShareDetail
    second_type: ShareDetail
class Associate(BaseModel):
    """One entry of the top-level "associates" array.

    The two indirect fields are defaulted/optional so that a JSON null
    parses cleanly instead of failing validation.
    """
    name: str
    fname: str
    direct_shares: int
    direct_shares_details: ShareDetail
    indirect_shares: int = 0 # Sets a default value for this field.
    indirect_shares_details: Optional[IndirectSharesDetails] = None
class ResponseModel(BaseModel):
    """Top-level response wrapper: {"associates": [...]}."""
    associates: List[Associate]
use ResponseModel.parse_xxx functions to parse response.
Here I use the parse_file function; you can also use the parse_json function.
See: https://pydantic-docs.helpmanual.io/usage/models/#helper-functions
def main():
    """Parse the sample response file and print the validated data as a dict."""
    res = ResponseModel.parse_file("./NullResponse.json",
        content_type="application/json")
    print(res.dict())


if __name__ == "__main__":
    main()
Then the response can be successfully parsed. And it automatically validates the input.

Groovy Script AEM Asset Reference Update

So I'm working with AEM and am attempting to create a script that grabs all pages under a specific path and updates the image reference on the page from a list of assets under a curtain path.
Both of my select queries aren't returning the specific pages and assets I need.
I'm also getting an error that my queries are searching over 100000 Nodes
How can I resolve this error and query my resources better?
import com.day.cq.wcm.api.Page
import javax.jcr.query.Query
import javax.jcr.query.QueryManager
import org.apache.sling.api.resource.ModifiableValueMap
import groovy.transform.Field
// Entry point: collect page paths and asset paths, then write one asset
// reference onto each matching page component.
static void main(String[] args)
{
    String[] assetNodes
    String[] pageNodes
    String pagePath ="/content/we-retail/us/en"
    String pageResourceType = "weretail/components/structure/page"
    String assetPath ="/content/dam/microsoft/internal/en"
    // NOTE(review): passed as queryParam but never used inside
    // GetRosourceAsset's statement — presumably meant to narrow the asset
    // query; confirm intent.
    String assetQuery = "b1048291-23fa-422a-a7c4-9ea4bae0effc"
    boolean isAsset = true;
    pageNodes = GetResourcePath(pagePath, pageResourceType);
    assetNodes = GetRosourceAsset(assetPath, assetQuery);
    InputAssetsOnPage(pageNodes,assetNodes);
}
//Find the Node paths for all Pages to modify
//Narrow down to image component
// Query every node under rootPath whose sling:resourceType equals queryParam
// and return their JCR paths.
def GetResourcePath(String rootPath,String queryParam)
{
    // Up to 500 hits are kept; NOTE(review): more than 500 results would
    // throw ArrayIndexOutOfBounds and unused slots stay null — confirm sizing.
    int i = 0
    def String[] allNodes = new String[500]
    Page page = getPage(rootPath)
    def queryManager = session.workspace.queryManager;
    def param= queryParam;
    // Legacy JCR-SQL dialect: all nt:base nodes under the page with the
    // requested resource type.
    def statement = 'select * from nt:base where jcr:path like \''+page.path+'/%\' and sling:resourceType = \'' + param + '\'';
    Query query=queryManager.createQuery(statement, 'sql');
    final def result = query.execute()
    println "Total pages found = " + result.nodes.size();
    NodeIterator nodeIterator = result.getNodes();
    // Copy each hit's path into the fixed-size result array.
    while(nodeIterator.hasNext())
    {
        def hitNode = nodeIterator.nextNode();
        allNodes[i] = hitNode.getPath();
        i++;
    }
    println allNodes
    return allNodes;
}
//Find all assets paths to add to page
// Collect the JCR paths of every node below rootPath.
// NOTE: queryParam is accepted but not used in the statement; the public
// name (including its original spelling) is kept for callers.
def GetRosourceAsset(String rootPath, String queryParam)
{
    def paths = new String[500]
    int idx = 0
    Page page = getPage(rootPath)
    def qm = session.workspace.queryManager
    // Same legacy JCR-SQL statement as the original, built from rootPath.
    def stmt = 'select * from nt:base where jcr:path like \'' + rootPath + '/%\''
    Query q = qm.createQuery(stmt, 'sql')
    final def res = q.execute()
    println "Total Assets found = " + res.nodes.size();
    NodeIterator hits = res.getNodes()
    for (; hits.hasNext(); idx++) {
        paths[idx] = hits.nextNode().getPath()
    }
    println paths
    return paths;
}
//Modify image component property with unique asset path
// Write asset path i onto page component i via a ModifiableValueMap, saving
// the session unless this is a dry run.
void InputAssetsOnPage(String[]pageRefrence, String[]assetRefrences)
{
    String[] nodes= pageRefrence;
    String[] assetNodes = assetRefrences;
    // Pairs page node i with asset i; NOTE(review): assumes at least as many
    // assets as pages — otherwise assetNodes[i] is null or out of range.
    nodes.eachWithIndex { self,i->
        javax.jcr.Node node=getNode(nodes[i])
        ModifiableValueMap mVMap=resourceResolver.resolve(node.path).adaptTo(ModifiableValueMap.class);
        // Property name kept as-is; presumably "fileReference" was intended —
        // verify against the image component's dialog before running.
        mVMap.put("fileRefrence", assetNodes[i]);
        println "Property modified to "+node.path;
        println "Dry Run "+data.dryRun;
        // Persist only outside a dry run (data.dryRun is a script binding).
        if(!data.dryRun) {
            session.save();
        }
    }
}
//Save session
For mass-updates it is very likely to target too many nodes. You have to try some approaches, to either get just under the limit - or change you approach.
First select pages from cq:PageContent, instead from nt:base. The query-indexes are organised by jcr:primaryType. The nt:base index contains everything. So there are much more nodes.
Second use SQL-2 and ISDESCENDANTNODE() instead of a like operator. I don't expect the like-operator to be so specific. But, if you query almost all pages anyway - it won't help much.
Third, iterate over parts of your page-tree. Then each remaining subtree is much smaller and can be queried.
For a mass-update, which probably touches more than 15% of your pages, then just iterate over all pages. Don't forget to commit from time to time. (e.g. every 100 changes)
Code sample to iterate a page-tree
def rootPage = getPage("/content/uac/glooly/es")
rootPage.recurse() { page ->
println(page.path)
fixAssetsOnPage(page)
}
You can iterate over the first n levels over the content tree, and then deep dive with a query. So you could have less than 100.000 image components
Code sample to iterate over first 2 levels, and then query:
def rootPage = getPage("/content/we-retail/us/en")
rootPage.iterator().each { firstLevelPage ->
firstLevelPage.iterator().each { secondLevelPage ->
println(secondLevelPage.path)
queryAndFixAssets(secondLevelPage)
save()
}
}
Fourth, create an oak:index for your query, especially if you could reuse it later. But in your case there is an existing index for sling:resourceType, which should be good enough. You simply have too many hits.
PS: You probably don't need to query pages at all. Just query the image components in a small enough subtree. But we can't see that from your code sample.

How to set group = true in couchdb

I am trying to use map/reduce to find the duplication of the data in couchDB
the map function is like this:
function(doc) {
if(doc.coordinates) {
emit({
twitter_id: doc.id_str,
text: doc.text,
coordinates: doc.coordinates
},1)};
}
}
and the reduce function is:
function(keys,values,rereduce){return sum(values)}
I want to find the sum of the data in the same key, but it just add everything together and I get the result:
<Row key=None, value=1035>
Is that a problem of group? How can I set it to true?
Assuming you're using the couchdb package from pypi, you'll need to pass a dictionary with all of the options you require to the view.
for example:
import couchdb

# The design doc and view name of the view you want to use.
ddoc = "my_design_document"
view_name = "my_view"

# Your server -- note couchdb.Server (capital S); the module has no
# lowercase `server` callable.
server = couchdb.Server("http://localhost:5984")
db = server["aCouchDatabase"]

# Naming convention when passing a ddoc and view to the view method.
view_string = ddoc + "/" + view_name

# Query options.
view_options = {"reduce": True,
                "group": True,
                "group_level": 2}

# Call the view.  The options must be expanded as keyword arguments:
# view()'s second positional parameter is a row wrapper, so passing the
# dict positionally would silently misbehave.
results = db.view(view_string, **view_options)
for row in results:
    # do something
    pass

Extracting BIND parameters to build a JSON query

I have a file which was exported from BIND containing TSIG values for about 500 domain names. I need to repurpose the data into JSON for a REST API query. The BIND data is formatted like so:
// secondary-example.com.
key "2000000000000.key." {
algorithm hmac-md5;
secret "ahashedvalue=";
};
zone "secondary-example.com." {
type slave;
file "sec/secondary-example.com.";
allow-transfer { 1.1.1.1;
1.1.2.2;
};
also-notify { 1.1.1.1;
2.2.2.2;
};
masters {
1.2.3.4 key 2000000000000.key.;
};
};
From this I need to extract the key, zone and secret. Here's an example API request.
{
"properties":{
"name":"secondary-example.com.",
"accountName":"example",
"type":"SECONDARY"
},
"secondaryCreateInfo":{
"primaryNameServers":{
"nameServerIpList":{
"nameServerIp1":{
"ip":"1.2.3.4",
"tsigKey":"2000000000000.key.",
"tsigKeyValue":"ahashedvalue="
}
}
}
}
}
I'm having difficulty crafting a regular expression appropriate for the scenario. I'm looking construct the JSON in a python script and send the request through Postman.
I spent a couple of days reading up on regex and figured out a solution. Each of those "zones" began with a comment, e.g. "secondary-example.com", and each set of BIND info was exactly 17 lines long. This solution is hacky and assumes the data is always well-formed, but it managed to work.
Separate the zones into chunks of text.
# Split the BIND export into one text chunk per zone.  Each zone starts with
# a "// zone-name" comment line, so a comment marks the boundary between the
# previous zone and the next.
zones = []
cur_zone = ''
with open(bind_file) as f:  # close the handle instead of leaking it
    for line in f:
        if line.startswith('//'):
            zones.append(cur_zone)
            cur_zone = ''
        else:
            cur_zone = cur_zone + line
if cur_zone:
    # The original never appended the text accumulated after the last
    # comment, silently dropping the final zone.
    zones.append(cur_zone)
zones.pop(0) # Drop the first list item, it's empty
Iterate through those chunks and match the needed parameters.
# Pull the needed fields out of each fixed-layout zone chunk.  Raw strings
# avoid the invalid "\d" escape (a SyntaxWarning on modern Python), and the
# patterns are compiled once instead of on every iteration.
quoted_re = re.compile(r'"(.*)"')
ipv4_re = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
for z in zones:
    z_lines = z.splitlines()
    # Fields live at fixed line offsets within a 17-line zone chunk.
    key = quoted_re.findall(z_lines[0])[0]
    secret = quoted_re.findall(z_lines[2])[0]
    name = quoted_re.findall(z_lines[5])[0]
    master = ipv4_re.findall(z_lines[15])[0]

Google app engine: filter by ID

I am lost somehow, I want to do something like below which filter by the ID.
# NOTE(review): this filters on a datastore *property* named ID; the
# automatically assigned numeric id is not a property and cannot be queried
# this way — presumably Customers.get_by_id() is what's wanted (see the
# answers that follow).  Also, `id` shadows the builtin id().
id = 1000
query = Customers.all()
query.filter('ID =', id)
or
query = db.GqlQuery("select * from Customers where ID = %s" % id)
What is the correct method to filter by ID?
both are correct and even Customers.gql("WHERE ID = :1", id);
Edit: If ID is the automatically created id property you should use Customers.get_by_id()
you need to use Customers.get_by_id(id)
I had this same problem, and it turned out that I was just working too hard. The answer lies in getObjectById(). If this works for you, please go to my very-similar S.O. question and give Gordon's answer a vote-up, since he's the one who showed me this.
// Look up a Player by key via JDO's getObjectById; yields null when the key
// is null or no matching object exists.  (Enclosing method signature is not
// shown in this snippet.)
Player result = null;
if (playerKey == null)
{
    log.log(Level.WARNING, "Tried to find player with null key.");
}
else
{
    // assassin.PMF holds the app's PersistenceManagerFactory singleton.
    PersistenceManager pm = assassin.PMF.get().getPersistenceManager();
    try {
        result = (Player) pm.getObjectById(Player.class, playerKey);
    } catch (javax.jdo.JDOObjectNotFoundException notFound) {
        // Player not found; we will return null.
        result = null;
    }
    // NOTE(review): pm.close() is skipped if getObjectById throws anything
    // other than JDOObjectNotFoundException — a try/finally would be safer.
    pm.close();
}
return result;

Categories