I am trying to find out my remaining rate limit using rate_limit_remaining, but it doesn't seem to work properly. According to pydoc:
class TwitterResponse(__builtin__.object)
| Response from a twitter request. Behaves like a list or a string
| (depending on requested format) but it has a few other interesting
| attributes.
|
| `headers` gives you access to the response headers as an
| httplib.HTTPHeaders instance. You can do
| `response.headers.get('h')` to retrieve a header.
|
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| rate_limit_limit
| The rate limit ceiling for that given request.
|
| rate_limit_remaining
| Remaining requests in the current rate-limit.
|
| rate_limit_reset
| Time in UTC epoch seconds when the rate limit will reset.
My code:
from twitter import *
t = Twitter(auth=OAuth('all the tokens'))
t.TwitterResponse.rate_limit_remaining
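Note that rate_limit_remaining is documented as an attribute of a response *instance* (the object an API call returns), not something to read off the class via t.TwitterResponse. Conceptually it is just derived from Twitter's rate-limit response headers. A toy illustration of that behaviour (FakeResponse is mine, not the library's code):

```python
class FakeResponse(object):
    # Toy stand-in mirroring the documented TwitterResponse behaviour:
    # the rate-limit attributes are derived from the response headers.
    def __init__(self, headers):
        self.headers = headers

    @property
    def rate_limit_remaining(self):
        # Twitter sends the remaining quota in this header
        return int(self.headers.get('x-rate-limit-remaining', 0))

resp = FakeResponse({'x-rate-limit-remaining': '14'})
print(resp.rate_limit_remaining)  # 14
```

With the real library you would presumably make a call first (e.g. something like reply = t.statuses.home_timeline()) and then read reply.rate_limit_remaining on the returned response.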
I would like to get the total number of records in an Aerospike set via Python.
I guess it is the value shown under n_objects for a set in the output of show sets -
aql> show sets
+-----------+------------------+----------------+-------------------+----------------+-------------+-----------+------------+
| n_objects | disable-eviction | set-enable-xdr | stop-writes-count | n-bytes-memory | ns_name     | set_name  | set-delete |
+-----------+------------------+----------------+-------------------+----------------+-------------+-----------+------------+
| 179       | "true"           | "use-default"  | 0                 | 0              | "namespace" | "setName" | "false"    |
+-----------+------------------+----------------+-------------------+----------------+-------------+-----------+------------+
From what I read here, it seems this is only possible via Lua scripting -
https://discuss.aerospike.com/t/fastest-way-to-count-records-returned-by-a-query/2379/4
Can someone confirm the same?
I am, however, able to find the count with a counter variable by iterating over the result of select(), and it matches the count above -
aeroCount = 0

def process_result((key, metadata, record)):
    global aeroCount
    aeroCount = aeroCount + 1

client = aerospike.client(config).connect()
scan = client.scan('namespace', 'set')
scan.select('PK', 'expiresIn', 'clientId', 'scopes', 'roles')
scan.foreach(process_result)
print "Total aeroCount"
print aeroCount
Update
I tried running the command asinfo -v sets on the command line first. It gave me the object count as well, like this -
ns=namespace:set=setName:objects=29949:.
Not sure exactly how to get the object count for a set from this. Does this command qualify as a command for the Python function? I tried this -
client = aerospike.client(config).connect()
response = client.info_all("asinfo -v sets")
Here is an error I am getting -
File "Sandeepan-oauth_token_cache_complete_sanity_cp.py", line 89, in <module>
response = client.info_all("asinfo -v sets")
AttributeError: 'aerospike.Client' object has no attribute 'info_all'
Look into info_all() in the Python client (https://www.aerospike.com/apidocs/python/client.html?highlight=info#aerospike.Client.info_all) and pass the correct info command from the info command reference here: https://www.aerospike.com/docs/reference/info
The sets info command gives you instantaneous stats, such as the number of objects in a specified set.
$ python
>>> import aerospike
>>> aerospike.__version__
'2.1.2'
>>> config = {'hosts':[("127.0.0.1", 3000)]}
>>> client = aerospike.client(config).connect()
>>> client.info("sets")
{'BB9BE1CFE290C00': (None, 'ns=test:set=testMap:objects=1:tombstones=0:memory_data_bytes=0:truncate_lut=0:stop-writes-count=0:set-enable-xdr=use-default:disable-eviction=false;\n')}
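If it helps, the objects count can then be pulled out of that response string with plain string handling. A minimal sketch (parse_sets_info and the sample string are mine for illustration, not part of the client API):

```python
def parse_sets_info(info_value):
    # Turn 'ns=...:set=...:objects=N:...;' into {(ns, set): objects}
    counts = {}
    for line in info_value.strip().split(';'):
        line = line.strip()
        if not line:
            continue
        fields = dict(pair.split('=', 1) for pair in line.split(':') if '=' in pair)
        counts[(fields['ns'], fields['set'])] = int(fields['objects'])
    return counts

# Sample value in the shape returned above by client.info("sets")
sample = 'ns=test:set=testMap:objects=1:tombstones=0:memory_data_bytes=0;\n'
print(parse_sets_info(sample))  # {('test', 'testMap'): 1}
```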
TL;DR - I am working on something that reports information about the entries in an archive file and specifies where the size in the archive comes from. The example below is far smaller than my real problem (which has hundreds of thousands of entries) but highlights the same issue: a non-trivial amount of my archive's size is unaccounted for (my guess is that it is overhead from the compression format). The sum of the parts of my archive (the total compressed size of all of my entries plus the expected gaps between them) is less than the actual size of the archive. How do I inspect the archive in a way that provides insight into this hidden overhead?
Where I'm at:
I have a directory that contains three files:
doc.pdf
cat.jpg
model.stl
Using a freeware program I dump these into a zip file: demo.zip
Using python I can inspect these pretty easily:
import zipfile

info_list = zipfile.ZipFile('demo.zip').infolist()
for i in info_list:
    print i.orig_filename
    print i.compress_size
    print i.header_offset
Using this we can gather some numbers. The total size of demo.zip is 84469 bytes, and the compressed sizes and header offsets are:
|---------------------|-----------------|---------------|
| File | Compressed Size | Header Offset |
|---------------------|-----------------|---------------|
| doc.pdf | 21439 | 0 |
|---------------------|-----------------|---------------|
| cat.jpg | 48694 | 21495 |
|---------------------|-----------------|---------------|
| model.stl | 13870 | 70232 |
|---------------------|-----------------|---------------|
I know that zipping will result in some space between entries (thus the difference between the sum of previous entry sizes and each entry's header offset). You can calculate this small 'gap':
gap = offset - previous_entry_size - previous_entry_offset
I can update my chart to look like:
|---------------------|-----------------|---------------|---------------|
| File | Compressed Size | Header Offset | 'Gap' |
|---------------------|-----------------|---------------|---------------|
| doc.pdf | 21439 | 0 | 0 |
|---------------------|-----------------|---------------|---------------|
| cat.jpg | 48694 | 21495 | 56 |
|---------------------|-----------------|---------------|---------------|
| model.stl | 13870 | 70232 | 43 |
|---------------------|-----------------|---------------|---------------|
Cool. So now one might expect that the size of demo.zip would be equal to the sum of the size of all entries and their gaps. (84102 in the example above).
But that's not the case. So, obviously, zipping requires headers and information about how zipping occurred (and how to unzip). But I'm running into a problem on how to define this or access any more information about it.
I could just take 84469 - 84102 and say ~magic zip overhead~ = 367 bytes. But that seems less than ideal because this number obviously is not magic. Is there a way to inspect the underlying zip data that is taking up this space?
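To make that bookkeeping concrete, here is a sketch of the same accounting for an arbitrary zip, built against an in-memory archive so it's self-contained (the file names and sizes are made up):

```python
import io
import zipfile

def unaccounted_bytes(zip_bytes):
    # Sum compressed sizes plus the inter-entry gaps, as in the table
    # above, and report how many bytes are left over.
    infos = sorted(zipfile.ZipFile(io.BytesIO(zip_bytes)).infolist(),
                   key=lambda i: i.header_offset)
    accounted = 0
    prev_end = 0
    for i in infos:
        gap = i.header_offset - prev_end  # mostly the previous local file header
        accounted += gap + i.compress_size
        prev_end = i.header_offset + i.compress_size
    return len(zip_bytes) - accounted

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:  # default is ZIP_STORED
    zf.writestr('doc.txt', b'x' * 1000)
    zf.writestr('cat.txt', b'y' * 2000)

# What's left over: the last entry's local file header, the central
# directory, and the end-of-central-directory record.
print(unaccounted_bytes(buf.getvalue()))
```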
An empty zip file is 22 bytes, containing only the End of Central Directory Record.
In [1]: import zipfile
In [2]: z = zipfile.ZipFile('foo.zip', 'w')
In [3]: z.close()
In [4]: import os
In [5]: os.stat('foo.zip').st_size
Out[5]: 22
If the zip file is not empty, then for every file you have a central directory file header (at least 46 bytes) and a local file header (at least 30 bytes).
The actual headers have a variable length, because the stated minimums do not include the file name, which is part of each header.
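You can check that arithmetic directly. A sketch using stored (uncompressed) entries, assuming no extra fields or archive comment, which holds for small files written by the stdlib zipfile module:

```python
import io
import zipfile

names = ['doc.txt', 'cat.txt', 'model.txt']
payloads = [b'a' * 100, b'b' * 200, b'c' * 300]

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w', zipfile.ZIP_STORED) as zf:
    for name, data in zip(names, payloads):
        zf.writestr(name, data)

total = len(buf.getvalue())
payload = sum(len(d) for d in payloads)
local_headers = sum(30 + len(n) for n in names)    # local file headers
central_headers = sum(46 + len(n) for n in names)  # central directory entries
eocd = 22                                          # end of central directory record

# Every byte of the archive is now accounted for
assert total == payload + local_headers + central_headers + eocd
```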
I am working on my first (bigger) Python application, but I am running into some issues. I am trying to select entries from a table using the web.py package (I am using it since I will be adding a web front-end later).
Below is my (simplified) code:
db = web.database(dbn='mysql', host='xxx', port=3306, user='monitor', pw='xxx', db='monitor')
vars = dict(hostname=nodeName)
#nodes = db.select('Nodes', vars, where="hostName = $hostname")
nodes = db.query('SELECT * FROM Nodes')  # I have tried both, with comparable results (this returns more entries)
length = len(list(nodes))
print(length)
print(list(nodes))
print(list(nodes)[0])
Below is the output from python:
0.03 (1): SELECT * FROM Nodes
6 <-- Length is correct
[] <-- Why is this empty?
Traceback (most recent call last):
File "monitor.py", line 30, in <module>
print(list(nodes)[0]) <-- If it is empty I can't select first element
IndexError: list index out of range
Below is mySQL output:
mysql> select * from monitor.Nodes;
+--------+-------------+
| nodeId | hostName |
+--------+-------------+
| 1 | TestServer |
| 2 | raspberryPi |
| 3 | TestServer |
| 4 | TestServer |
| 5 | TestServer |
| 6 | TestServer |
+--------+-------------+
6 rows in set (0.00 sec)
Conclusion: the table contains entries, and the select/query statement gets at them partially (it retrieves the length, but not the actual values?).
I have tried multiple ways, but currently I am not able to get what I want: selecting the data from my table and using it in my code.
Thanks for helping
Thanks to the people over at Reddit I was able to solve the issue: https://www.reddit.com/r/learnpython/comments/53hdq1/webpy_select_returning_no_results_but_length_of/
Bottom line: the implementation of the query method returns an iterator over the data, so the first list(nodes) call (inside len()) consumes it, and by the second time I call list(nodes) the data is already gone, hence the empty result.
The solution is to store list(nodes) in a variable and work from that.
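The effect is easy to reproduce with any plain iterator; fake_query below is a stand-in for the iterator-like object web.py returns, not its actual API:

```python
def fake_query():
    # Stand-in for the lazy result object web.py's query() returns
    return iter([{'nodeId': 1, 'hostName': 'TestServer'},
                 {'nodeId': 2, 'hostName': 'raspberryPi'}])

nodes = fake_query()
print(len(list(nodes)))  # 2: this call consumes the iterator
print(list(nodes))       # []: already exhausted

# The fix: materialize once and reuse the list
rows = list(fake_query())
print(rows[0]['hostName'])  # TestServer
```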
In a Python dict of 50 items would there be any known noticeable speed difference in matching an integer key (2 digits) to find a string value VERSUS matching a string key (5 - 10+ letters) to find an integer value over a large number of loops (100,000+)?
As a minor bonus: is there any benefit to performing an activity like this in MySQL rather than Python, if you're able to?
Micro-benchmarking language features is a useful exercise, but you have to take it with a grain of salt. It's hard to do benchmarks in accurate and meaningful ways, and generally what people care about is total performance, not individual feature performance.
I find using a "test harness" makes it easier to run different alternatives in a comparable way.
For dictionary lookup, here's an example using the benchmark module from PyPI: 100 randomized runs, setting up dicts of N=50 items each (either int keys and str values, or the reverse), then trying both the try/except and get access paradigms. Here's the code:
import benchmark

from random import choice, randint
import string

def str_key(length=8, alphabet=string.ascii_letters):
    return ''.join(choice(alphabet) for _ in xrange(length))

def int_key(min=10, max=99):
    return randint(min, max)

class Benchmark_DictLookup(benchmark.Benchmark):

    each = 100  # allows for differing number of runs

    def setUp(self):
        # Only using setUp in order to subclass later
        # Can also specify tearDown, eachSetUp, and eachTearDown
        self.size = 1000000
        self.n = 50
        self.intdict = { int_key():str_key() for _ in xrange(self.n) }
        self.strdict = { str_key():int_key() for _ in xrange(self.n) }
        self.intkeys = [ int_key() for _ in xrange(self.size) ]
        self.strkeys = [ str_key() for _ in xrange(self.size) ]

    def test_int_lookup(self):
        d = self.intdict
        for key in self.intkeys:
            try:
                d[key]
            except KeyError:
                pass

    def test_int_lookup_get(self):
        d = self.intdict
        for key in self.intkeys:
            d.get(key, None)

    def test_str_lookup(self):
        d = self.strdict
        for key in self.strkeys:
            try:
                d[key]
            except KeyError:
                pass

    def test_str_lookup_get(self):
        d = self.strdict
        for key in self.strkeys:
            d.get(key, None)

class Benchmark_Hashing(benchmark.Benchmark):

    each = 100  # allows for differing number of runs

    def setUp(self):
        # Only using setUp in order to subclass later
        # Can also specify tearDown, eachSetUp, and eachTearDown
        self.size = 100000
        self.intkeys = [ int_key() for _ in xrange(self.size) ]
        self.strkeys = [ str_key() for _ in xrange(self.size) ]

    def test_int_hash(self):
        for key in self.intkeys:
            id(key)

    def test_str_hash(self):
        for key in self.strkeys:
            id(key)

if __name__ == '__main__':
    benchmark.main(format="markdown", numberFormat="%.4g")
And the results:
$ python dictspeed.py
Benchmark Report
================
Benchmark DictLookup
--------------------
name | rank | runs | mean | sd | timesBaseline
---------------|------|------|--------|---------|--------------
int lookup get | 1 | 100 | 0.1756 | 0.01619 | 1.0
str lookup get | 2 | 100 | 0.1859 | 0.01477 | 1.05832996073
int lookup | 3 | 100 | 0.5236 | 0.03935 | 2.98143047487
str lookup | 4 | 100 | 0.8168 | 0.04961 | 4.65108861267
Benchmark Hashing
-----------------
name | rank | runs | mean | sd | timesBaseline
---------|------|------|----------|-----------|--------------
int hash | 1 | 100 | 0.008738 | 0.000489 | 1.0
str hash | 2 | 100 | 0.008925 | 0.0002952 | 1.02137781609
Each of the above 600 runs were run in random, non-consecutive order by
`benchmark` v0.1.5 (http://jspi.es/benchmark) with Python 2.7.5
Darwin-13.4.0-x86_64 on 2014-10-28 19:23:01.
Conclusion: String lookup in dictionaries is not that much more expensive than integer lookup. BUT the supposedly Pythonic "ask forgiveness not permission" paradigm takes much longer than simply using the get method call. Also, hashing a string (at least of size 8) is not much more expensive than hashing an integer.
But then things get even more interesting if you run on a different implementation, like PyPy:
$ pypy dictspeed.py
Benchmark Report
================
Benchmark DictLookup
--------------------
name | rank | runs | mean | sd | timesBaseline
---------------|------|------|---------|-----------|--------------
int lookup get | 1 | 100 | 0.01538 | 0.0004682 | 1.0
str lookup get | 2 | 100 | 0.01993 | 0.001117 | 1.295460397
str lookup | 3 | 100 | 0.0203 | 0.001566 | 1.31997704025
int lookup | 4 | 100 | 0.02316 | 0.001056 | 1.50543635375
Benchmark Hashing
-----------------
name | rank | runs | mean | sd | timesBaseline
---------|------|------|-----------|-----------|--------------
str hash | 1 | 100 | 0.0005657 | 0.0001609 | 1.0
int hash | 2 | 100 | 0.006066 | 0.0005283 | 10.724346492
Each of the above 600 runs were run in random, non-consecutive order by
`benchmark` v0.1.5 (http://jspi.es/benchmark) with Python 2.7.8
Darwin-13.4.0-x86_64 on 2014-10-28 19:23:57.
PyPy is about 11x faster, best case, but the ratios are much different. PyPy doesn't suffer the significant exception-handling cost that CPython does. And, hashing an integer is 10x slower than hashing a string. How about that for an unexpected result?
I would have tried Python 3, but benchmark didn't install well there. I also tried increasing the string length to 50. It didn't markedly change the results, the ratios, or the conclusions.
Overall, hashing and lookups are so fast that, unless you have to do them by the millions or billions, or have extraordinarily long keys, or some other unusual circumstance, developers generally needn't be concerned about their micro-performance.
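If you don't want to install the benchmark package, a rough equivalent is possible with only the stdlib timeit module (Python 3.5+ for the globals argument). The dict shapes below mirror the setup above: 50 two-digit int keys versus 50 eight-letter string keys.

```python
import string
import timeit
from random import choice, randint

n = 50
int_d = {randint(10, 99): 'value' for _ in range(n)}
str_d = {''.join(choice(string.ascii_letters) for _ in range(8)): 1
         for _ in range(n)}

# Same access paradigm for both key types: d.get(key, default)
stmt = 'for k in ks: d.get(k, None)'
t_int = timeit.timeit(stmt, globals={'d': int_d, 'ks': list(int_d)}, number=10000)
t_str = timeit.timeit(stmt, globals={'d': str_d, 'ks': list(str_d)}, number=10000)
print(t_int, t_str)  # expect the same order of magnitude
```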
I have written a simple function to resize an image from 1500x2000px to 900x1200px.
import os
from PIL import Image

def resizeImage(file_list):
    if file_list:
        if not os.path.exists('resized'):
            os.makedirs('resized')
        i = 0
        for files in file_list:
            i += 1
            im = Image.open(files)
            im = im.resize((900, 1200), Image.ANTIALIAS)
            im.save('resized/' + files, quality=90)
        print str(i) + " files resized successfully"
    else:
        print "No files to resize"
I used the timeit module to measure how long it takes to run with some example images. Here is an example of the results.
+---------------+-----------+---------------+---------------+---------------+
| Test Name | No. files | Min | Max | Average |
+---------------+-----------+---------------+---------------+---------------+
| Resize normal | 10 | 5.25000018229 | 5.31371171493 | 5.27186083393 |
+---------------+-----------+---------------+---------------+---------------+
But if I repeat the test, the times gradually keep increasing, i.e.
+---------------+-----------+---------------+---------------+---------------+
| Test Name | No. files | Min | Max | Average |
+---------------+-----------+---------------+---------------+---------------+
| Resize normal | 10 | 5.36660298734 | 5.57177596057 | 5.45903467485 |
+---------------+-----------+---------------+---------------+---------------+
+---------------+-----------+---------------+---------------+---------------+
| Test Name | No. files | Min | Max | Average |
+---------------+-----------+---------------+---------------+---------------+
| Resize normal | 10 | 5.58739076382 | 5.76515489024 | 5.70014196601 |
+---------------+-----------+---------------+---------------+---------------+
+---------------+-----------+---------------+---------------+-------------+
| Test Name | No. files | Min | Max | Average |
+---------------+-----------+---------------+---------------+-------------+
| Resize normal | 10 | 5.77366483042 | 6.00337707034 | 5.891541538 |
+---------------+-----------+---------------+---------------+-------------+
+---------------+-----------+---------------+--------------+---------------+
| Test Name | No. files | Min | Max | Average |
+---------------+-----------+---------------+--------------+---------------+
| Resize normal | 10 | 5.91993466793 | 6.1294756299 | 6.03516199948 |
+---------------+-----------+---------------+--------------+---------------+
This is how I'm running the test:
def resizeTest(repeats):
    os.chdir('C:/Users/dominic/Desktop/resize-test')
    files = glob.glob('*.jpg')
    t = timeit.Timer(
        "resizeImage(filess)",
        setup="from imageToolkit import resizeImage; import glob; filess = glob.glob('*.jpg')"
    )
    time = t.repeat(repeats, 1)
    results = {
        'name': 'Resize normal',
        'files': len(files),
        'min': min(time),
        'max': max(time),
        'average': averageTime(time)
    }
    resultsTable(results)
I have moved the images being processed from my mechanical hard drive to the SSD and the issue persists. I have also checked the memory being used: it stays pretty steady through all the runs, topping out at around 26 MB, and the process uses around 12% of one core of the CPU.
Going forward I'd like to experiment with the multiprocessing library to increase the speed, but I'd like to get to the bottom of this issue first.
Could this be an issue with my loop that causes the performance to degrade?
The im.save() call is slowing things down; repeated writing to the same directory is perhaps thrashing OS disk caches. When you removed the call, the OS was able to optimize the image read access times via disk caches.
If your machine has multiple CPU cores, you can indeed speed up the resize process, as the OS will schedule multiple sub-processes across those cores to run each resize operation. You'll not get a linear performance improvement, as all those processes still have to access the same disk for both reads and writes.
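One way to test that explanation is to time the resize alone on an in-memory image, so no disk reads or writes are involved. A sketch, assuming Pillow is installed (Image.LANCZOS is the current name for the ANTIALIAS filter):

```python
import timeit
from PIL import Image

img = Image.new('RGB', (1500, 2000))

def resize_only():
    # Resize in memory; no Image.open() or im.save(), so no disk
    # access is included in the timing.
    img.resize((900, 1200), Image.LANCZOS)

times = timeit.repeat(resize_only, repeat=3, number=5)
print(times)  # should stay roughly flat across repeats
```

If these numbers stay flat while your full test keeps slowing down, the growth is coming from the I/O side, not the resize itself.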