I have been familiarising myself with PRAW for Reddit. I am trying to get the top x posts for the week, but I am having trouble changing the limit for the "top" method.
The documentation doesn't seem to mention how to do it, unless I am missing something. I can change the time period fine by just passing in the string "week", but the limit has me flummoxed. The image shows that there is a param for limit and it is set to 100.
r = self.getReddit()
sub = r.subreddit('CryptoCurrency')
results = sub.top("week")
for r in results:
    print(r.title)
DOCS: subreddit.top()
IMAGE: Inspect listing generator params
From the docs you've linked:
Additional keyword arguments are passed in the initialization of
ListingGenerator.
So we follow that link and see the limit parameter for ListingGenerator:
limit – The number of content entries to fetch. If limit is None, then
fetch as many entries as possible. Most of reddit’s listings contain a
maximum of 1000 items, and are returned 100 at a time. This class will
automatically issue all necessary requests (default: 100).
So using the following should do it for you:
results = sub.top("week", limit=500)
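As the quoted docs note, listings come back at most 100 items per request and PRAW issues as many requests as needed, so the request count can be sketched with plain arithmetic (illustrative only, not PRAW internals):

```python
import math

# Reddit listings return at most 100 items per request (per the docs
# quoted above), so a limit of N costs roughly ceil(N / 100) requests.
def requests_needed(limit, per_page=100):
    return math.ceil(limit / per_page)

print(requests_needed(500))  # 5 requests behind sub.top("week", limit=500)
```

With limit=None, PRAW keeps requesting pages until the listing is exhausted (capped at roughly 1000 items by reddit itself).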
Related
I am trying to use query parameters with an API. If I hardcode the query parameters, I get the expected result (the top 2 items):
requests.get('https://someurl/Counties?$top=2')
But if I try
q = {"top":"2"}
requests.get('https://someurl/Counties', params=q)
I get the default response, with all items and not just the first two. When I try the same approach on a different API, both approaches work. The API uses OData, if that matters. But I can't get my head around how these two should yield different results in the request that is sent.
I was just missing a dollar sign in front of top.
q = {"$top":"2"}
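You can see the difference by encoding the two dicts by hand with the standard library (requests does the equivalent internally; the $ is percent-encoded to %24, which OData servers decode and accept):

```python
from urllib.parse import urlencode

# Without the dollar sign the server sees a plain "top" parameter,
# which OData ignores, so you get the unfiltered default response.
print(urlencode({"top": "2"}))   # top=2
print(urlencode({"$top": "2"}))  # %24top=2
```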
I am currently using the Elasticsearch Python client to search an index.
Let's say I have 20 million documents, and I am using pagination with the from and size parameters. I have read in the documentation that there is a limit of 10k, but I don't understand what that limit means.
For example:
Does that limit mean I can only make pagination (i.e. from and size) calls 10,000 times?
e.g. from=0, size=10; from=10, size=10; etc., 10,000 times.
Or does it mean I can make unlimited pagination calls using the from and size params, but there is a size limit of 10k per pagination call?
Can someone clarify this?
A pagination limit of 10k means that, for a given query, only the first 10k results can be retrieved.
from=0, size=10001 will give the error "Result window is too large".
from=10000, size=10 will give the same error.
In both cases we are trying to access documents past the 10,000th offset of the current query, hence the exception.
from doesn't represent a page number; it represents the starting offset.
The limit is called max_result_window and its default value is 10k. It is the maximum value that from + size can take:
from=1, size=10000 will give an error.
from=5, size=9996 will give an error.
from=9999, size=2 will give an error.
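The rule behind the examples above is simply from + size <= max_result_window, which you can sanity-check before sending a request (plain arithmetic, not an Elasticsearch API):

```python
MAX_RESULT_WINDOW = 10_000  # Elasticsearch default

def window_ok(from_, size):
    # A request is rejected once from + size exceeds max_result_window
    return from_ + size <= MAX_RESULT_WINDOW

print(window_ok(0, 10_000))  # True  (exactly at the limit)
print(window_ok(9_999, 2))   # False (10,001 > 10,000)
```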
search_after is the recommended alternative if you want deeper results.
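A sketch of what a search_after request can look like in the query DSL (the timestamp and id field names are illustrative; the search_after values are the sort values of the last hit from the previous page, and the sort must include a unique tiebreaker field):

```
GET myexistingindex/_search
{
  "size": 100,
  "sort": [
    { "timestamp": "asc" },
    { "id": "asc" }
  ],
  "search_after": [1609459200000, "41"],
  "query": { "match_all": {} }
}
```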
You can update the settings of an existing index with this request:
PUT myexistingindex/_settings
{
  "index": {
    "max_result_window": 20000000
  }
}
If you are creating the index dynamically, you can set the max_result_window parameter in the settings.
In Java, like this:
private void createIndex(String indexName) throws IOException {
    Settings settings = Settings.builder()
            .put("number_of_shards", 1)
            .put("number_of_replicas", 0)
            .put("index.translog.durability", "async")
            .put("index.translog.sync_interval", "5s")
            .put("max_result_window", "20000000").build();
    CreateIndexRequest createIndexRequest = new CreateIndexRequest(indexName).settings(settings);
    restHighLevelClient.indices().create(createIndexRequest, RequestOptions.DEFAULT);
}
After these configurations, you can use "from" values up to 20 million.
But this approach is not recommended.
You can review this document instead: Scroll API
This API returns 500 results by default: https://environment.data.gov.uk/flood-monitoring/id/floodAreas
I want to get the maximum number of results using Python, but all I have managed is setting _limit to a higher amount. How should I go about retrieving them all?
Additional info below:
https://environment.data.gov.uk/flood-monitoring/doc/reference#flood-areas
Used this to get an increased limit
https://environment.data.gov.uk/flood-monitoring/id/floodAreas/?_limit=100000
What I'd normally do is send multiple requests, one for each "page" of results, with progressively higher offsets. E.g. something like:
import requests

url = 'https://environment.data.gov.uk/flood-monitoring/id/floodAreas'
offset = 0
result = []
while True:
    res = requests.get(url, params={'_offset': offset})
    res.raise_for_status()
    data = res.json()
    items = data['items']
    if not items:
        break
    result.extend(items)
    offset += len(items)
There are multiple reasons for their API working like this. A common one is to reduce the impact of denial-of-service attacks: it's easy for attackers to put a few more zeros on the limit, causing the server to do far more work. Conversely, once you get above a few hundred objects returned from this service, the extra work of paging HTTP requests will always be a small percentage of the total work performed, and it keeps clients "honest".
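The offset bookkeeping in the loop above can be exercised offline against a stand-in for the API (the data and page size here are made up; the loop logic is the same):

```python
DATA = list(range(23))  # pretend the service holds 23 items
PAGE_SIZE = 10          # pretend the server returns at most 10 per request

def fake_get(offset):
    # Stand-in for requests.get(url, params={'_offset': offset}).json()['items']
    return DATA[offset:offset + PAGE_SIZE]

result = []
offset = 0
while True:
    items = fake_get(offset)
    if not items:
        break
    result.extend(items)
    offset += len(items)

print(len(result))  # 23 -- all items collected across three "pages"
```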
I'm running pyxero and trying to get the reference and description from a bank transaction, but am having trouble getting them.
I can run:
trans = xero.banktransactions.filter(BankAccount_Name="chosen_account")
which gives me the transactions and details; however, the reference and description are not present.
It also shows the LineItems are empty:
'LineItems': []
I also get the same if I try:
transaction = xero.banktransactions.filter(BankTransactionID=BankTransactionID)
Is there a way to get this information?
Many thanks
Need to use get instead of filter to get the LineItems:
transaction = xero.banktransactions.get(BankTransactionID)
To get all the line item detail you either need to:
retrieve a specific item by BankTransactionID (as @blountdj implies in their answer)
OR
use the Xero API's built-in paging by passing page=xxx as an optional parameter (you might need to loop through multiple pages/requests if there are more than 100 transactions, which is likely).
Refer to the Xero API reference on bank transaction paging.
How would I limit the number of results in a Twitter search?
This is what I thought would work... tweetSearchUser is user input from another line of code.
tso = TwitterSearchOrder() # create a TwitterSearchOrder object
tso.set_keywords(["tweetSearchUser"]) # let's define all words we would like to have a look for
tso.setcount(30)
tso.set_include_entities(False) # and don't give us all those entity information
Was looking at this reference
https://twittersearch.readthedocs.org/en/latest/advanced_usage_ts.html
I tried this, which seems like it should work, but I can't figure out the format to enter the date:
tso.set_until('2016-02-25')
You should use set_count, as specified in the documentation.
The default value for count is 200, because that is the maximum number of tweets returned by the Twitter API.
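On the set_until format from the question: as far as I remember, TwitterSearchOrder.set_until expects a datetime.date object rather than a "YYYY-MM-DD" string, so build the date with the standard library (worth double-checking against the TwitterSearch docs):

```python
import datetime

until = datetime.date(2016, 2, 25)
# tso.set_until(until)  # pass the date object, not the string '2016-02-25'
print(until.isoformat())  # 2016-02-25
```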