So I'm working with AEM and am attempting to create a script that grabs all pages under a specific path and updates the image reference on each page from a list of assets under a certain path.
Neither of my select queries is returning the specific pages and assets I need.
I'm also getting an error that my queries are searching over 100,000 nodes.
How can I resolve this error and query my resources better?
import com.day.cq.wcm.api.Page
import javax.jcr.Node
import javax.jcr.NodeIterator
import javax.jcr.query.Query
import javax.jcr.query.QueryManager
import org.apache.sling.api.resource.ModifiableValueMap
import groovy.transform.Field

static void main(String[] args)
{
    String[] assetNodes
    String[] pageNodes
    String pagePath = "/content/we-retail/us/en"
    String pageResourceType = "weretail/components/structure/page"
    String assetPath = "/content/dam/microsoft/internal/en"
    String assetQuery = "b1048291-23fa-422a-a7c4-9ea4bae0effc"
    boolean isAsset = true;

    pageNodes = GetResourcePath(pagePath, pageResourceType);
    assetNodes = GetResourceAsset(assetPath, assetQuery);
    InputAssetsOnPage(pageNodes, assetNodes);
}

// Find the node paths for all pages to modify
// Narrow down to the image component
def GetResourcePath(String rootPath, String queryParam)
{
    int i = 0
    String[] allNodes = new String[500]
    Page page = getPage(rootPath)
    def queryManager = session.workspace.queryManager;
    def param = queryParam;
    def statement = 'select * from nt:base where jcr:path like \'' + page.path + '/%\' and sling:resourceType = \'' + param + '\'';
    Query query = queryManager.createQuery(statement, 'sql');
    final def result = query.execute()
    println "Total pages found = " + result.nodes.size();
    NodeIterator nodeIterator = result.getNodes();
    while (nodeIterator.hasNext())
    {
        def hitNode = nodeIterator.nextNode();
        allNodes[i] = hitNode.getPath();
        i++;
    }
    println allNodes
    return allNodes;
}

// Find all asset paths to add to the page
def GetResourceAsset(String rootPath, String queryParam)
{
    int i = 0
    String[] allNodes = new String[500]
    Page page = getPage(rootPath)
    def queryManager = session.workspace.queryManager;
    def param = queryParam;
    def statement = 'select * from nt:base where jcr:path like \'' + rootPath + '/%\'';
    Query query = queryManager.createQuery(statement, 'sql');
    final def result = query.execute()
    println "Total Assets found = " + result.nodes.size();
    NodeIterator nodeIterator = result.getNodes();
    while (nodeIterator.hasNext())
    {
        def hitNode = nodeIterator.nextNode();
        allNodes[i] = hitNode.getPath();
        i++;
    }
    println allNodes
    return allNodes;
}

// Modify the image component property with a unique asset path
void InputAssetsOnPage(String[] pageReferences, String[] assetReferences)
{
    String[] nodes = pageReferences;
    String[] assetNodes = assetReferences;
    nodes.eachWithIndex { self, i ->
        Node node = getNode(nodes[i])
        ModifiableValueMap mVMap = resourceResolver.resolve(node.path).adaptTo(ModifiableValueMap.class);
        mVMap.put("fileReference", assetNodes[i]);  // "fileReference" is the standard image property name
        println "Property modified to " + node.path;
        println "Dry Run " + data.dryRun;
        // Save the session unless this is a dry run
        if (!data.dryRun) {
            session.save();
        }
    }
}
Mass updates are very likely to target too many nodes. You have to try a few approaches to either get just under the limit, or change your approach.
First, select pages from cq:PageContent instead of nt:base. The query indexes are organised by jcr:primaryType, and the nt:base index contains everything, so it covers far more nodes.
Second, use SQL-2 and ISDESCENDANTNODE() instead of a like operator. I don't expect the like operator to be very selective. But if you query almost all pages anyway, it won't help much.
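For example, the first query could look roughly like this in SQL-2 (using the path and resource type from the question):
SELECT * FROM [cq:PageContent] AS s
WHERE ISDESCENDANTNODE(s, '/content/we-retail/us/en')
AND s.[sling:resourceType] = 'weretail/components/structure/page'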
Third, iterate over parts of your page tree. The remaining subtree is then much smaller and can be queried.
For a mass update that probably touches more than 15% of your pages, just iterate over all pages. Don't forget to commit from time to time (e.g. every 100 changes).
Code sample to iterate a page tree:
def rootPage = getPage("/content/uac/glooly/es")
rootPage.recurse() { page ->
println(page.path)
fixAssetsOnPage(page)
}
You can iterate over the first n levels of the content tree and then deep-dive with a query, so that you end up with fewer than 100,000 image components.
Code sample to iterate over the first two levels, and then query:
def rootPage = getPage("/content/we-retail/us/en")
rootPage.iterator().each { firstLevelPage ->
firstLevelPage.iterator().each { secondLevelPage ->
println(secondLevelPage.path)
queryAndFixAssets(secondLevelPage)
save()
}
}
Fourth, create an oak:index for your query, especially if you can reuse it later. But in your case there is an existing index for sling:resourceType, which should be good enough; you simply have too many hits.
PS: You probably don't need to query pages at all. Just query the image components in a small enough subtree. But we can't tell that from your code sample.
I'm working on a stock prediction project. This is what I want:
To show all the stocks available in Nifty50, Nifty100 and so on; the user will then select a stock to predict its high and low price for the next day only.
I'm using Django.
What I have done so far:
I'm able to display a list of stocks.
import io

import pandas as pd
import requests
from django.shortcuts import render


def index(request):
    api_key = 'myAPI_Key'
    url50 = 'https://archives.nseindia.com/content/indices/ind_nifty50list.csv'
    url100 = 'https://archives.nseindia.com/content/indices/ind_nifty100list.csv'
    url200 = 'https://archives.nseindia.com/content/indices/ind_nifty200list.csv'
    sfifty = requests.get(url50).content
    shundred = requests.get(url100).content
    stwohundred = requests.get(url200).content
    nifty50 = pd.read_csv(io.StringIO(sfifty.decode('utf-8')))
    nifty100 = pd.read_csv(io.StringIO(shundred.decode('utf-8')))
    nifty200 = pd.read_csv(io.StringIO(stwohundred.decode('utf-8')))
    nifty50 = nifty50['Symbol']
    nifty100 = nifty100['Symbol']
    nifty200 = nifty200['Symbol']
    context = {
        'fifty': nifty50,
        'hundred': nifty100,
        'twohundred': nifty200
    }
    return render(request, 'StockPrediction/index.html', context)
What I want:
I want to get live data for all stocks: open, high, LTP, change, volume. By live data I mean that it will change as the stock values change.
Please help!
You must combine Ajax/jQuery code like the example below to periodically fetch data and update values in the DOM:
(function getStocks() {
    $.ajax({
        type: "GET",
        url: "url to your view",
        success: function (data) {
            // here you can get data from the backend and make changes,
            // like changing colors based on the data coming from your view
        }
    }).then(function () {             // on completion, restart
        setTimeout(getStocks, 30000); // function refers to itself
    });
})();
But be careful about making too many requests; you must choose a proper interval in this line: setTimeout(getStocks, "proper interval");
And in your view you should return the query results as JSON, something like this:
return JsonResponse({'stocks': stocks})
Here stocks must be JSON-serializable.
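For example, a minimal sketch of such a view; get_live_quotes() is a hypothetical helper that returns a JSON-serializable list with the latest open/high/LTP/change/volume per symbol:
from django.http import JsonResponse


def live_stocks(request):
    # get_live_quotes() is a hypothetical helper that fetches the latest
    # open/high/LTP/change/volume figures for the tracked symbols
    stocks = get_live_quotes()
    return JsonResponse({'stocks': stocks})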
How can I print all the alarm names, instead of only 50, when using the describe_alarms function?
Code, using Python:
import boto

conn = boto.connect_cloudwatch()
alarms = conn.describe_alarms()
for item in alarms:
    print item.name
Thanks.
Even though I am a bit late to the party, here is my solution (in Java). You have to get the next token and keep asking for the result in a loop until there is no next token, so it behaves like pagination on a website.
String nextToken = null;
List<MetricAlarm> metricAlarms = new ArrayList<>();
for (int i = 0; i < 100; i++) {
    DescribeAlarmsRequest describeAlarmsRequest = new DescribeAlarmsRequest();
    describeAlarmsRequest.setNextToken(nextToken);
    describeAlarmsRequest.setMaxRecords(100);
    DescribeAlarmsResult describeAlarmsResult = getClient().describeAlarms(describeAlarmsRequest);
    List<MetricAlarm> metricAlarmsTmp = describeAlarmsResult.getMetricAlarms();
    metricAlarms.addAll(metricAlarmsTmp);
    nextToken = describeAlarmsResult.getNextToken();
    logger.info("nextToken: {}", nextToken);
    if (nextToken == null) {
        break;
    }
}
logger.info("metricAlarms size: {}", metricAlarms.size());
Of course there is room for improvement, e.g. create a while loop instead of a for loop.
UPDATE:
Here is my refined version:
String nextToken = null;
List<MetricAlarm> metricAlarms = new ArrayList<>();
while (nextToken != null || metricAlarms.size() == 0) {
    DescribeAlarmsRequest describeAlarmsRequest = new DescribeAlarmsRequest().withNextToken(nextToken).withMaxRecords(100); // create the request
    DescribeAlarmsResult describeAlarmsResult = getClient().describeAlarms(describeAlarmsRequest); // get the result
    metricAlarms.addAll(describeAlarmsResult.getMetricAlarms()); // add new alarms to our list
    nextToken = describeAlarmsResult.getNextToken(); // check if we have a nextToken
    if (nextToken == null && metricAlarms.size() == 0) { // if there are no alarms in AWS we would never exit the loop -> intercept that
        break;
    }
}
logger.info("metricAlarms size: {}", metricAlarms.size());
By default it returns 50 alarms. If you want more, set max_records=value and try again.
Due to an underlying AWS API restriction, it will return a maximum of 100 alarms per call. I don't know if that limit has been lifted since.
conn.describe_alarms(max_records=100)
Help on method describe_alarms in module boto.ec2.cloudwatch:
describe_alarms(self, action_prefix=None, alarm_name_prefix=None,
alarm_names=None, max_records=None, state_value=None, next_token=None)
:type max_records: int
:param max_records: The maximum number of alarm descriptions
to retrieve.
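If you stay on boto 2, a rough sketch of chaining calls with the next_token parameter shown above (assuming the returned result set exposes the token as next_token, which is how boto usually surfaces it):
import boto

conn = boto.connect_cloudwatch()
alarms = conn.describe_alarms(max_records=100)
names = [a.name for a in alarms]

# Keep following the token until the service stops returning one
while getattr(alarms, 'next_token', None):
    alarms = conn.describe_alarms(max_records=100, next_token=alarms.next_token)
    names.extend(a.name for a in alarms)

for name in names:
    print name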
Here's a complete example of how to paginate through the records in order to guarantee that you retrieve all of them rather than being limited by the maximum number of records per call on the CloudWatch Alarms API:
alarmMaxRecords = 10
response = client.describe_alarms(
    AlarmNamePrefix=prefix,
    MaxRecords=alarmMaxRecords
)
alarmsItems = []
while response:
    alarmsItems += response['MetricAlarms']
    response = client.describe_alarms(AlarmNamePrefix=prefix, MaxRecords=alarmMaxRecords, NextToken=response['NextToken']) if 'NextToken' in response else None
for alarm in alarmsItems:
    # Do something with the alarm
    print(alarm)
The above retrieves 10 records at a time, but MaxRecords can be set to anything up to 100.
Or more simply using the paginate method provided by boto3:
import boto3

# Create CloudWatch client
cloudwatch = boto3.client('cloudwatch')

# List alarms through the pagination interface
paginator = cloudwatch.get_paginator('describe_alarms')
for response in paginator.paginate(AlarmNamePrefix=prefix, MaxRecords=alarmMaxRecords):
    # Do something with each page of alarms
    print(response['MetricAlarms'])
How is it possible to use map and reduce functions in CouchDB-Python? The code below does not return anything.
Is it also possible to disable the reduce function if it is not needed?
import couchdb

# $ sudo systemctl start couchdb
# http://localhost:5984/_utils/


def fill_data(users_no):
    for i in range(users_no):
        doc = {
            '_id': str(i),
            'uname': "name_" + str(i),
        }
        db.save(doc)


if __name__ == "__main__":
    server = couchdb.Server()
    db = server.create("test-pagination")
    fill_data(300)

    map_fun = """
    function(doc) {
        emit(doc.uname, 1);
    }
    """
    reduce_fun = "_count"

    design = {'views': {
        'get_unames': {
            'map': map_fun,
            'reduce': reduce_fun
        }
    }}
    db["_design/users"] = design

    uname_list = db.view('users/get_unames')
    print uname_list
    for r in uname_list:
        print r.key
You give very few details about what you want to get, but I infer from the code that you want unique names. If so, you definitely need the data to be reduced.
Your problem is that you group the data too much. You should call the view with group_level=exact (or group=true, which is a synonym).
Yes, it is possible to disable the reduce function, and this is exactly what you need:
db.view('users/get_unames', reduce=False)
With the reduce active, you get exactly one row back, with just a value (300, the count of your rows) and an empty key.
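For reference, here is how both calls against the view defined above behave (assuming the same db handle and the Python 2 syntax used in the question):
# Map only: list every uname, with the reduce step disabled
for row in db.view('users/get_unames', reduce=False):
    print row.key

# Reduce with grouping: one row per unique uname, with its count as the value
for row in db.view('users/get_unames', group=True):
    print row.key, row.value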
I have a file which was exported from BIND containing TSIG values for about 500 domain names. I need to repurpose the data into JSON for a REST API query. The BIND data is formatted like so:
// secondary-example.com.
key "2000000000000.key." {
    algorithm hmac-md5;
    secret "ahashedvalue=";
};
zone "secondary-example.com." {
    type slave;
    file "sec/secondary-example.com.";
    allow-transfer { 1.1.1.1;
                     1.1.2.2;
    };
    also-notify { 1.1.1.1;
                  2.2.2.2;
    };
    masters {
        1.2.3.4 key 2000000000000.key.;
    };
};
From this I need to extract the key, zone and secret. Here's an example API request.
{
    "properties": {
        "name": "secondary-example.com.",
        "accountName": "example",
        "type": "SECONDARY"
    },
    "secondaryCreateInfo": {
        "primaryNameServers": {
            "nameServerIpList": {
                "nameServerIp1": {
                    "ip": "1.2.3.4",
                    "tsigKey": "2000000000000.key.",
                    "tsigKeyValue": "ahashedvalue="
                }
            }
        }
    }
}
I'm having difficulty crafting a regular expression appropriate for the scenario. I'm looking to construct the JSON in a Python script and send the request through Postman.
I spent a couple of days reading up on regex and figured out a solution. Each of those "zones" began with a comment, e.g. "secondary-example.com", and each set of BIND info was exactly 17 lines long. This solution is hacky and assumes the data is always correct, but it managed to work.
Separate the zones into chunks of text.
zones = []
cur_zone = ''
f = open(bind_file).readlines()
for line in f:
    if line[0:2] == '//':
        zones.append(cur_zone)
        cur_zone = ''
    else:
        cur_zone = cur_zone + line
zones.pop(0)  # Drop the first list item, it's empty
Iterate through those chunks and match the needed parameters.
import re

for z in zones:
    z_lines = z.splitlines()
    # Regex patterns to match the required parameters
    key = re.findall('\"(.*)\"', z_lines[0])[0]
    secret = re.findall('\"(.*)\"', z_lines[2])[0]
    name = re.findall('\"(.*)\"', z_lines[5])[0]
    master = re.findall('\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', z_lines[15])[0]
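From there, turning each chunk into the request body from the question is mostly dictionary assembly. A rough sketch using the extracted values (accountName is assumed to be a constant for your account, not something parsed from the BIND file):
import json

# ...inside the same "for z in zones" loop, after the re.findall calls:
payload = {
    "properties": {
        "name": name,
        "accountName": "example",  # assumed account name, adjust as needed
        "type": "SECONDARY"
    },
    "secondaryCreateInfo": {
        "primaryNameServers": {
            "nameServerIpList": {
                "nameServerIp1": {
                    "ip": master,
                    "tsigKey": key,
                    "tsigKeyValue": secret
                }
            }
        }
    }
}
body = json.dumps(payload)  # ready to send with requests or paste into Postman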
I have attempted to clean up and revise code from an answer here for my needs: I only want to delete records from the Reservations model dated prior to the date expressed in the GET request as yy, mm, dd.
If I am correctly anticipating the action of cleanTable/2012/10/5 against the routing ('/cleanTable/([\d]+)/([\d]+)/([\d]+)', CleanTable), then my code would delete at most 50 (10*nlimit) data records.
Btw, the author of the original code (who likely no longer subscribes to SO) claimed his main trick for accomplishing this was "to include redirect in html instead of using self.redirect".
I am unfamiliar with raise Exception and the like, but my instinct would be to add a raise Exception or raise StopIteration to the for loop after it is made into a while loop. It is not clear to me whether raising a StopIteration exception actually causes iteration to stop or if more is needed. Also, I don't know how to revise the code so the HTML ends cleanly on an early exit.
class CleanTable(BaseHandler):
    def get(self, yy, mm, dd):
        nlimit = 5
        iyy = int(yy)
        imm = int(mm)
        idd = int(dd)
        param = date(iyy, imm, idd)
        q = Reservations.all(keys_only=True)
        q.filter("date < ", dt(iyy, imm, idd))
        results = q.fetch(nlimit)
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.out.write("""
        <html>
        <meta HTTP-EQUIV="REFRESH" content="url=http://yourapp.appspot.com/cleanTable">
        <body>""")
        try:
            for i in range(10):
                db.delete(results)
                results = q.fetch(nlimit, len(results))
                for r in results:
                    logging.info("r.name: %s" % r.name)
                self.response.out.write("<p> " + str(nlimit) + " removed</p>")
            self.response.out.write("""
            </body>
            </html>""")
        except Exception, inst:
            logging.info("inst: %s" % inst)
            self.response.out.write(str(inst))
This is not the best approach to clean up your models. A better approach would be to get all the keys of your entities and create task queues; each task gets a batch of keys for the entities that need to be modified (a sketch follows below).
Another approach would be to create a cron job that queries for the x oldest modified entities, fixes them and then stores them back.
Finally, if the number of entities is huge, you could also consider using Backends.
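A rough Python sketch of the task-queue idea, using the deferred library and a query cursor so each task deletes a batch and re-queues itself (delete_old_reservations and the 500-key batch size are illustrative choices, not part of the original code):
from google.appengine.ext import db, deferred


def delete_old_reservations(cutoff, cursor=None):
    # Delete one batch of old Reservations keys, then re-queue ourselves for the next batch
    q = Reservations.all(keys_only=True).filter("date <", cutoff)
    if cursor:
        q.with_cursor(cursor)
    keys = q.fetch(500)
    if keys:
        db.delete(keys)
        deferred.defer(delete_old_reservations, cutoff, q.cursor())

# Kick it off once, e.g. from the handler:
# deferred.defer(delete_old_reservations, date(2012, 10, 5))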
Hope this helps.
Here is my update routine; it has converted 500,000 entities. Be sure to run it on a backend instance (you can target a queue at a backend instance). Notice that I am using a cursor; that's the only way you can consistently iterate through the data (never use offset!).
Queue queue = QueueFactory.getQueue("grinderQueue");
queue.add(TaskOptions.Builder.withPayload(new DeferredTask() { // lets generate
    private static final long serialVersionUID = 1L;

    @Override
    public void run() {
        String cursor = null;
        boolean done = false;
        Date now = new Date(1346763868L * 1000L); // 09/04/2012
        while (!done) {
            DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
            Query query = new Query("Venue");
            query.setFilter(new FilterPredicate("timeOfLastUpdate", Query.FilterOperator.LESS_THAN, now));
            PreparedQuery pq = datastore.prepare(query);
            FetchOptions fetchOptions = FetchOptions.Builder.withLimit(1000);
            if (cursor != null)
                fetchOptions.startCursor(Cursor.fromWebSafeString(cursor));
            QueryResultList<Entity> results = pq.asQueryResultList(fetchOptions);
            List<Entity> updates = new ArrayList<Entity>();
            List<Entity> oldVenueUpdates = new ArrayList<Entity>();
            int tuples = 0;
            for (Entity en : results) {
                tuples++;
                try {
                    if (en.getProperty(Venue.VENUE_KEY) == null)
                        continue;
                    Entity newVenue = new Entity("CPVenue", (String) en.getProperty(Venue.VENUE_KEY));
                    newVenue.setPropertiesFrom(en);
                    newVenue.removeProperty("timeOfLastVenueScoreCalculation");
                    newVenue.removeProperty("actionsSinceLastVenueScoreCalculation");
                    newVenue.removeProperty("venueImageUrl");
                    newVenue.removeProperty("foursquareId");
                    newVenue.setProperty("geoCell", GeoCellCalculator.calcCellId(Double.valueOf((String) en.getProperty("lng")), Double.valueOf((String) en.getProperty("lat")), 8));
                    newVenue.setProperty(Venue.TIME_SINCE_LAST_UPDATE, new Date());
                    updates.add(newVenue);
                    Venue v = new Venue(newVenue);
                    // Set timestamp on Venue
                    en.setProperty("timeOfLastUpdate", now);
                    oldVenueUpdates.add(en);
                } catch (Exception e) {
                    logger.log(Level.WARNING, "", e);
                }
            }
            done = tuples == 0;
            tuples = 0;
            if (results.getCursor() != null)
                cursor = results.getCursor().toWebSafeString();
            else
                done = true;
            System.out.println("Venue Conversion LOOP updates.. " + updates.size() + " cursor " + cursor);
            datastore.put(updates);
            datastore.put(oldVenueUpdates);
        }
        System.out.println("Venue Conversion DONE");
    }
}));