Extracting information from the Ethereum blockchain with Python

I am looking to do some analysis on the Ethereum blockchain, in particular looking for correlations in the data between available hash power and transaction confirmation times. However, I am unable to make sense of how to go about downloading either of the blockchains or extracting the transaction and worker information from them.
Ideally, I would download the blockchains, then use a Python script to extract the relevant information to a CSV file or something similar.
Any pointers on how this can be achieved?

The standard interface
Standard Ethereum nodes can expose a json-rpc interface. It is typically accessible over local sockets (aka IPC), or over HTTP, depending on which node you have and how you start it up.
From the command line, to get the stats of the block 5,000,000 from geth:
$ curl -X POST --data '{"jsonrpc":"2.0","method":"eth_getBlockByNumber","params":["0x4c4b40", false],"id":1}' -H "Content-Type: application/json" http://localhost:8545/
{'id': 1,
'jsonrpc': '2.0',
'result': {'difficulty': '0x90c21c56929b2',
'extraData': '0x743132',
'gasLimit': '0x7a121d',
'gasUsed': '0x79fac5',
'hash': '0x7d5a4369273c723454ac137f48a4f142b097aa2779464e6505f1b1c5e37b5382',
'logsBloom': '0x8584009c4dd8101162295d8604b1850200788d4c81f39044821155049d2c036a8a00d07f2a10383180984400b0290ba00293400c1d414a5018104a010220101909b918c601251215109755b90003c6a2c23490829e319a506281d9641ac39a840d3aa03e4a287900e0c09641594409a2010543016e966382c02040754030430e2d708316ec64008f0c0100c713b51f8004005bd48980143e08b22bf2262365b8b2658804a560f1028207666d10288144a5a14609a5bcb221280b13da2f4c8800d8422cc27126a46a04f08c00ca9004081d65cc75d10c62862256118481d2e881a993780808e0a00086e321a4602cb214c0044215281c2ccbca824aca00824a80',
'miner': '0xb2930b35844a230f00e51431acae96fe543a0347',
'mixHash': '0x94cd4e844619ee20989578276a0a9046877d569d37ba076bf2e8e34f76189dea',
'nonce': '0x4617a20003ba3f25',
'number': '0x4c4b40',
'parentHash': '0xcae4df80f5862e4321690857eded0d8a40136dafb8155453920bade5bd0c46c0',
'receiptsRoot': '0x6db67db55d5d972c59646a3bda26a39422e71fe400e4cdf9eb7f5c09b0efa7d0',
'sha3Uncles': '0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347',
'size': '0x5dd1',
'stateRoot': '0x6092dfd6bcdd375764d8718c365ce0e8323034da3d3b0c6d72cf7304996b86ad',
'timestamp': '0x5a70760d',
'totalDifficulty': '0x7be181d83d2d77d052',
'transactions': ['0x569c5b35f203ca6db6e2cec44bceba756fad513384e2bd79c06a8c0181273379',
...
'0xaa2703c3ae5d0024b2c3ab77e5200bb2a8eb39a140fad01e89a495d73760297c'],
'transactionsRoot': '0x91dfce7cc2174482b5ebcf6f4beedce854641982eadb1a8cf538e3206abf7836',
'uncles': []}}
Python API
There are several "web3" libraries available for different languages, each providing an abstraction layer over json-rpc. Web3.py is an Ethereum-Foundation-funded python interface, for example.
Using Web3.py at version 4 or later (installed with pip install web3), you can get the same kind of info this way (newer releases rename getBlock to get_block):
>>> from web3.auto import w3
>>> w3.eth.getBlock('latest')
AttributeDict({'difficulty': 2760989505172940,
'extraData': HexBytes('0x65746865726d696e652d6177732d7573312d32'),
'gasLimit': 8000029,
'gasUsed': 1729027,
'hash': HexBytes('0xff07c9bba34bf864d144c39b4f99d3fc981afcaab02c3da6456c096aab51eb89'),
'logsBloom': HexBytes('0x000200000042041002000480000080000000000000001c0400210040100801080000000000000000001040081040000000000000000020000a008000000000100200100004000021001450080000000202002004000000000002000c0041108804000000000000000000000400000111020200090811010000000074001024002000020000010000000000110000140001201044200000100002828000020000000040000040000000060200200080000000140100408000000080400000000010010002000000000010000000800402080000040000028004000080804000012800000120000210000000800020800010040010001080008000980820010001'),
'miner': '0xEA674fdDe714fd979de3EdF0F56AA9716B898ec8',
'mixHash': HexBytes('0xe3aeeafccb31673b210c17610d9706a51ad8f9f8bf35a8b71ea8fba5bb260f09'),
'nonce': HexBytes('0x79d592e01fafd7e9'),
'number': 5020225,
'parentHash': HexBytes('0x41e96f6e823dd46f25bb0219c6ff9bccf418879d50f9f426cc40028d115ca785'),
'receiptsRoot': HexBytes('0xdd7abf25ebb95c9629453b2d287d929b343ea86f52deed83c4a06d64a10137ad'),
'sha3Uncles': HexBytes('0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347'),
'size': 6699,
'stateRoot': HexBytes('0x410941207de8fe6a4ea7841a2f8eb67a101c58d35691bd998ecec9a7e2350813'),
'timestamp': 1517618294,
'totalDifficulty': 2338872012321049424432,
'transactions': [HexBytes('0x5ef662053e3acb450aefdbed9115c81c2562de71cc4907dc3a1647d0810d83ea'),
...
HexBytes('0x23d0cbc238d12c11a5df5cb8f6cf04e89a8d270baf4b5de94c7285750774784e')],
'transactionsRoot': HexBytes('0xdcaa0c4c4af12e9397e1d93312672e1aeb947262c25111a7ced7a31104135f26'),
'uncles': []})
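The JSON-RPC call shown with curl can also be issued from plain Python and flattened into CSV rows, which is roughly what the question asks for. The following is only a minimal sketch under some assumptions: the node URL is the localhost:8545 default mentioned in this thread, and the helper names (block_request, fetch_block, block_to_row) and the choice of columns are hypothetical, not part of any library. The sample dictionary reuses a few fields from the block 5,000,000 response above, so the conversion can be tried without a running node.

```python
import csv
import io
import json
import urllib.request


def block_request(number):
    """Build the JSON-RPC payload for eth_getBlockByNumber (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "method": "eth_getBlockByNumber",
        "params": [hex(number), False],  # False: return transaction hashes only
        "id": 1,
    }


def fetch_block(number, url="http://localhost:8545/"):
    """POST the request to a node; the URL is an assumption about your setup."""
    data = json.dumps(block_request(number)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["result"]


def block_to_row(block):
    """Convert the hex-encoded quantities of a block into plain integers."""
    return {
        "number": int(block["number"], 16),
        "timestamp": int(block["timestamp"], 16),
        "difficulty": int(block["difficulty"], 16),
        "gas_used": int(block["gasUsed"], 16),
        "miner": block["miner"],
    }


# A few fields copied from the block 5,000,000 response shown above.
sample_block = {
    "number": "0x4c4b40",
    "timestamp": "0x5a70760d",
    "difficulty": "0x90c21c56929b2",
    "gasUsed": "0x79fac5",
    "miner": "0xb2930b35844a230f00e51431acae96fe543a0347",
}

row = block_to_row(sample_block)
out = io.StringIO()  # swap for open("blocks.csv", "w", newline="") to write a file
writer = csv.DictWriter(out, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
csv_text = out.getvalue()
print(csv_text)
```

With a synced node running, looping fetch_block over a range of block numbers and writing each converted row would produce the CSV; for large ranges this is slow, since it makes one HTTP request per block.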

Give the RPC (--rpc) option when you start the process (newer geth releases name this option --http). Make sure you have the entire blockchain. The option starts a server on localhost:8545. You can change the port with the --rpcport option.
Simply send HTTP POST requests (with curl or some HTTP module) to localhost:8545 and get the necessary info in JSON format. You can also use the web3.js or web3.py APIs, which interface with the blockchain through the node; web3.js calls can also be run in the console that the process opens.
https://github.com/ethereum/wiki/wiki/JSON-RPC

Here is a guide on how to export Ethereum data to CSV: https://medium.com/@medvedev1088/exporting-and-analyzing-ethereum-blockchain-f5353414a94e
It uses https://github.com/medvedev1088/ethereum-etl which outputs the data into blocks.csv, transactions.csv, erc20_transfers.csv.
blocks.csv
Column | Type |
------------------------|---------------------
block_number | bigint |
block_hash | hex_string |
block_parent_hash | hex_string |
block_nonce | hex_string |
block_sha3_uncles | hex_string |
block_logs_bloom | hex_string |
block_transactions_root | hex_string |
block_state_root | hex_string |
block_miner | hex_string |
block_difficulty | bigint |
block_total_difficulty | bigint |
block_size | bigint |
block_extra_data | hex_string |
block_gas_limit | bigint |
block_gas_used | bigint |
block_timestamp | bigint |
block_transaction_count | bigint |
transactions.csv
Column | Type |
--------------------|--------------
tx_hash | hex_string |
tx_nonce | bigint |
tx_block_hash | hex_string |
tx_block_number | bigint |
tx_index | bigint |
tx_from | hex_string |
tx_to | hex_string |
tx_value | bigint |
tx_gas | bigint |
tx_gas_price | bigint |
tx_input | hex_string |
erc20_transfers.csv
Column | Type |
--------------------|--------------
erc20_token | hex_string |
erc20_from | hex_string |
erc20_to | hex_string |
erc20_value | bigint |
erc20_tx_hash | hex_string |
erc20_block_number | bigint |
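Once blocks.csv exists, the time between consecutive blocks can be read off with the standard csv module. This is a small sketch, assuming the file has a header row using the column names listed above (block_number, block_timestamp, block_difficulty); the function name block_intervals is hypothetical and the three sample rows are made-up values for illustration only, not real chain data.

```python
import csv
import io


def block_intervals(csv_file):
    """Return (block_number, seconds_since_previous_block, difficulty) tuples."""
    rows = sorted(csv.DictReader(csv_file), key=lambda r: int(r["block_number"]))
    result = []
    for prev, cur in zip(rows, rows[1:]):
        seconds = int(cur["block_timestamp"]) - int(prev["block_timestamp"])
        result.append((int(cur["block_number"]), seconds, int(cur["block_difficulty"])))
    return result


# Made-up sample standing in for open("blocks.csv", newline="").
sample = io.StringIO(
    "block_number,block_timestamp,block_difficulty\n"
    "100,1000,2000\n"
    "101,1014,2100\n"
    "102,1031,2050\n"
)

intervals = block_intervals(sample)
for number, seconds, difficulty in intervals:
    print(number, seconds, difficulty)
```

Block intervals paired with difficulty give a starting point for the correlation the question describes; transaction confirmation times would additionally need the time each transaction was first seen, which is not among the columns listed above.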

Related

Output format of subprocess python

I'm trying to keep the table format generated by a command that lists all WordPress plugins.
My script is:
import subprocess
r = open("/tmp/resultplugin", "w")
f = open("path/to/my/list", "r")
for x in f:
    x = x.rstrip()
    bashCommand = "wp plugin list --allow-root"
    process = subprocess.Popen(bashCommand, cwd=x, shell=True, stdout=r)
    output, error = process.communicate()
The output of the command when launched directly in the bash shell:
| name | status | update | version |
+---------------------------------+----------+-----------+------------+
| plugin name | inactive | none | 5.2.2 |
| plugin name | active | none | 10.4.0 |
| plugin name | inactive | none | 5.65 |
| plugin | inactive | none | 9.4.8 |
The output when I redirect it to a file:
name status update version
plugin name inactive none 5.2.2
plugin name active none 10.4.0
plugin name inactive none 5.65
plugin name inactive none 9.4.8
I'm having a hard time finding out how to keep the same format, or at least make the file more readable, as it is very hard to read as it stands.
Can someone explain how I can format the output to the file correctly?
Thank you

How to use jq or Python as an output encoder to convert a MySQL query result to JSON format

For example:
[root@test ~]# mysql -uroot -p'123123' -e"select user,host from mysql.user;"
+-------------------+-----------+
| user | host |
+-------------------+-----------+
| root | % |
| test | % |
| sqlaudit_test_mon | % |
| sysbase_test | % |
| mysql.session | localhost |
| mysql.sys | localhost |
+-------------------+-----------+
How can I quickly convert this query result to JSON format, using jq or Python?
The output should look like this:
[
{
"user":"root","host":"%"},
{
"user":"test","host":"%"},
{
"user":"sqlaudit_test_mon","host":"%"},
{
"user":"sysbase_test","host":"%"},
{
"user":"mysql.session","host":"localhost"},
{
"user":"mysql.sys","host":"localhost"}
]
I just want to know how to quickly convert a query result to JSON, thank you!
A jq or Python script that converts the query result to JSON format would be best.
Just do it in your SELECT instead of pulling another program into a pipeline. MySQL has JSON functions. Ones of interest here are JSON_ARRAYAGG() and JSON_OBJECT(). Something like:
SELECT json_arrayagg(json_object('user', user, 'host', host)) FROM mysql.user;
should do it, plus whatever's needed to not print out that fancy table ascii art.
Here's an all-jq solution that assumes an invocation like this:
jq -Rcn -f program.jq sql.txt
Note in particular the -R ("raw input") and -n options.
def trim: sub(" *$";"") | sub("^ *";"");
# input: an array of values
def objectify($headers):
. as $in
| reduce range(0; $headers|length) as $i ({}; .[$headers[$i]] = ($in[$i]) ) ;
def preprocess:
select( startswith("|") )
| split("|")
| .[1:-1]
| map(trim) ;
reduce (inputs|preprocess) as $in (null;
if . == null then {header: $in}
else .header as $h
| .table += [$in|objectify($h)]
end )
| .table

Python's Subprocess removing Mysql columns and spacing

Specifically, I am using:
Python 2.4.3 (#1, May 24 2008, 13:47:28)
[GCC 4.1.2 20070626 (Red Hat 4.1.2-14)] on linux2
I am trying to get the raw result of a mysql query with the column names and borders. Here is the raw command being run in bash:
[root@machine ~]# mysql -u root -e 'show databases;'
+--------------------+
| Database |
+--------------------+
| dbA |
| dbB |
| dbC |
+--------------------+
I am having trouble storing this value into a variable in Python:
import subprocess
cmd_array = ["mysql", "-u", "root", "-e", "show databases"]
p = subprocess.Popen(cmd_array)
raw_data = p.communicate()[0]
# Console outputs:
# +--------------------+
# | Database |
# +--------------------+
# | dbA |
# | dbB |
# | dbC |
# +--------------------+
#
# raw_data is None
p = subprocess.Popen(cmd_array, stdout=subprocess.PIPE)
rawData = p.communicate()[0]
print rawData
# Console outputs:
# Database
# dbA
# dbB
# dbC
#
# rawData is "Database\ndbA\ndbB\ndbC"
What is the best way to store the pretty printed version of the mysql output in a python variable?
You need to use -t:
from subprocess import check_output
p = check_output(["mysql", "-u", "root", "-t", "-e", "show databases;"])
print(p)
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| world |
+--------------------+
You can also use check_output to store the output.
mysql operates differently depending on whether it is being called from a console (pseudo-terminal, pty) or being called from another program. When called from another program, it is purposefully terse for easier parsing. One way to think about it is that
"Database\ndbA\ndbB\ndbC"
is the raw result and
+--------------------+
| Database |
+--------------------+
| dbA |
| dbB |
| dbC |
+--------------------+
is a pretty-printed result for people.
I have no idea what a "column visible string" is, but if you want to process the data in your program, you got the easily parsed result you want.
If you really do want the fancy print version, you can use the pexpect or pty modules to emulate a terminal. Here's an example:
>>> import pexpect
>>> import subprocess
>>> child = pexpect.spawn('mysql -e "show databases"')
>>> child.expect(pexpect.EOF)
0
>>> print child.before
+--------------------+
| Database |
+--------------------+
| information_schema |
| test |
+--------------------+
Looks like mysql uses isatty() to determine if STDOUT is a terminal which then modifies the output.
I can get the table borders by specifying the --table or -t option:
cmd_array = ["mysql", "-t", "-u", "root", "-e", "show databases"]
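The terminal detection described in these answers can be observed without mysql at all. The sketch below is only an illustration of the mechanism: it starts a child Python process with its stdout connected to a pipe and asks the child whether stdout is a terminal. The variable names are arbitrary.

```python
import subprocess
import sys

# The child reports whether its own stdout is attached to a terminal.
child_code = "import sys; print(sys.stdout.isatty())"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,      # same kind of redirection as in the question
    universal_newlines=True,     # decode the captured bytes to str
)
piped_is_tty = result.stdout.strip()
print(piped_is_tty)  # prints "False": a pipe is not a terminal
```

A program that checks this, as mysql appears to, can choose a terse format for pipes and a bordered table for terminals, which is why passing -t explicitly is needed when capturing the output.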

python boto - how to create health check for route 53

How do I create a health check for route53 using python boto? There are no examples and documentation is very lacking http://boto.readthedocs.org/en/latest/ref/route53.html
So, given an IP address, a port, and a path, what do I do next?
Even though boto documentation is lacking, you can understand it by the code and AWS API.
Take a look at boto.route53.healthcheck.HealthCheck and implement as
import boto
import boto.route53.healthcheck
route = boto.connect_route53()
hc = boto.route53.healthcheck.HealthCheck(...)
route.create_health_check(hc)
The ... should be filled in according to the help page of HealthCheck:
Help on HealthCheck in module boto.route53.healthcheck object:
class HealthCheck(__builtin__.object)
| An individual health check
|
| Methods defined here:
|
| __init__(self, ip_addr, port, hc_type, resource_path, fqdn=None, string_match=None, request_interval=30, failure_threshold=3)
| HealthCheck object
|
| :type ip_addr: str
| :param ip_addr: IP Address
|
| :type port: int
| :param port: Port to check
|
| :type hc_type: str
| :param ip_addr: One of HTTP | HTTPS | HTTP_STR_MATCH | HTTPS_STR_MATCH | TCP
|
| :type resource_path: str
| :param resource_path: Path to check
|
| :type fqdn: str
| :param fqdn: domain name of the endpoint to check
|
| :type string_match: str
| :param string_match: if hc_type is HTTP_STR_MATCH or HTTPS_STR_MATCH, the string to search for in the response body from the specified resource
|
| :type request_interval: int
| :param request_interval: The number of seconds between the time that Amazon Route 53 gets a response from your endpoint and the time that it sends the next health-check request.
|
| :type failure_threshold: int
| :param failure_threshold: The number of consecutive health checks that an endpoint must pass or fail for Amazon Route 53 to change the current status of the endpoint from unhealthy to healthy or vice versa.
|
| to_xml(self)
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| POSTXMLBody = '\n <HealthCheckConfig>\n <IPAddr...il...
|
| XMLFQDNPart = '<FullyQualifiedDomainName>%(fqdn)s</FullyQualifiedDomai...
|
| XMLRequestIntervalPart = '<RequestInterval>%(request_interval)d</Reque...
|
| XMLStringMatchPart = '<SearchString>%(string_match)s</SearchString>'
|
| valid_request_intervals = (10, 30)
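Putting the signature from the help page together with the question's inputs (IP address, port, path), a call might look like the sketch below. It is untested against a live AWS account and rests on assumptions: boto (version 2) is installed, credentials are configured in the usual way, and the IP address and path are placeholder values. The helper names health_check_args and create_http_health_check are hypothetical; only the keyword names come from the help output above.

```python
def health_check_args(ip_addr, port, path, hc_type="HTTP"):
    """Collect the arguments HealthCheck.__init__ expects (hypothetical helper)."""
    return {
        "ip_addr": ip_addr,
        "port": port,
        "hc_type": hc_type,          # HTTP | HTTPS | HTTP_STR_MATCH | HTTPS_STR_MATCH | TCP
        "resource_path": path,
        "request_interval": 30,      # help page lists (10, 30) as valid intervals
        "failure_threshold": 3,
    }


def create_http_health_check(ip_addr, port, path):
    """Create the health check; needs boto and AWS credentials to actually run."""
    import boto
    from boto.route53.healthcheck import HealthCheck

    conn = boto.connect_route53()
    hc = HealthCheck(**health_check_args(ip_addr, port, path))
    return conn.create_health_check(hc)


# Placeholder values for illustration only.
args = health_check_args("203.0.113.10", 80, "/health")
print(args)
```

Calling create_http_health_check("203.0.113.10", 80, "/health") would then send the request to Route 53.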

Python/Django/MySQL "Incorrect string value" error

I'm running a Django 1.4.2/Python 2.7.3/MySQL 5.5.28 site. One of the features of the site is that the admin can send an email to the server, which calls a Python script via procmail that parses the email and tosses it into the DB. I maintain two versions of the site: a development and a production site. The two sites use separate but identical virtualenvs (I even deleted them both and reinstalled all packages just to make sure).
I'm experiencing a weird issue. The exact same script succeeds on the dev server and fails on the production server. It fails with this error:
...django/db/backends/mysql/base.py:114: Warning: Incorrect string value: '\x92t kno...' for column 'message' at row 1
I'm well aware of the unicode issues Django has, and I know there are a ton of questions here on SO about this error, but I made sure to setup the database as UTF-8 from the beginning:
mysql> show variables like "character_set_database";
+------------------------+-------+
| Variable_name | Value |
+------------------------+-------+
| character_set_database | utf8 |
+------------------------+-------+
1 row in set (0.00 sec)
mysql> show variables like "collation_database";
+--------------------+-----------------+
| Variable_name | Value |
+--------------------+-----------------+
| collation_database | utf8_general_ci |
+--------------------+-----------------+
1 row in set (0.00 sec)
Additionally, I know that each column can have its own charset, but the message column is indeed UTF-8:
mysql> show full columns in listserv_post;
+------------+--------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+------------+--------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
| id | int(11) | NULL | NO | PRI | NULL | auto_increment | select,insert,update,references | |
| thread_id | int(11) | NULL | NO | MUL | NULL | | select,insert,update,references | |
| timestamp | datetime | NULL | NO | | NULL | | select,insert,update,references | |
| from_name | varchar(100) | utf8_general_ci | NO | | NULL | | select,insert,update,references | |
| from_email | varchar(75) | utf8_general_ci | NO | | NULL | | select,insert,update,references | |
| message | longtext | utf8_general_ci | NO | | NULL | | select,insert,update,references | |
+------------+--------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
6 rows in set (0.00 sec)
Does anyone have any idea why I'm getting this error? Why is it happening under the production config but not the dev config?
Thanks!
[edit 1]
To be clear, the data are the same as well. I send a single email to the server, and procmail sends it off. This is what the .procmailrc looks like:
VERBOSE=off
:0
{
:0c
| <path>/dev/ein/scripts/process_new_mail.py dev > outputdev
:0
| <path>/prd/ein/scripts/process_new_mail.py prd > outputprd
}
There are 2 copies of process_new_mail.py, but that's just because it's version controlled so that I can maintain two separate environments. If I diff the two output files (which contain the message received), they're identical.
[edit 2]
I actually just discovered that both dev and prd configs are failing. The difference is that the dev config fails silently (maybe having to do with the DEBUG setting?). The problem is that there are some unicode characters in one of the messages, and Django is choking on them for some reason. I'm making progress....
I've tried editing the code to explicitly encode the message as ASCII and UTF-8, but it's still not working. I'm getting closer, though.
I fixed it! The problem was that I wasn't parsing the email correctly with respect to the charsets. My fixed email parsing code comes from this post and this post:
#get the charset of an email
#courtesy http://ginstrom.com/scribbles/2007/11/19/parsing-multilingual-email-with-python/
def get_charset(message, default='ascii'):
    if message.get_content_charset():
        return message.get_content_charset()
    if message.get_charset():
        return message.get_charset()
    return default
#courtesy https://stackoverflow.com/questions/7166922/extracting-the-body-of-an-email-from-mbox-file-decoding-it-to-plain-text-regard
def get_body(message):
    body = None
    #Walk through the parts of the email to find the text body.
    if message.is_multipart():
        for part in message.walk():
            #If part is multipart, walk through the subparts.
            if part.is_multipart():
                for subpart in part.walk():
                    if subpart.get_content_type() == 'text/plain':
                        #Get the subpart payload (i.e., the message body).
                        charset = get_charset(subpart, get_charset(message))
                        body = unicode(subpart.get_payload(decode=True), charset)
            #Part isn't multipart so get the email body.
            elif part.get_content_type() == 'text/plain':
                charset = get_charset(part, get_charset(message))
                body = unicode(part.get_payload(decode=True), charset)
    #If this isn't a multi-part message then get the payload (i.e., the message body).
    elif message.get_content_type() == 'text/plain':
        charset = get_charset(message)
        body = unicode(message.get_payload(decode=True), charset)
    return body
Thanks very much for the help!
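For readers on Python 3, the standard library's newer email API can do the charset handling itself. The sketch below is not the code from this thread but a swapped-in alternative: it parses a message with policy.default and asks for the plain-text body, which is decoded using the charset declared in the part's headers. The sample message is made up; its \x92 byte (sent here as =92 in quoted-printable) is the Windows-1252 right single quote, the same byte that appears in the warning above.

```python
from email import message_from_bytes, policy

# Made-up message whose body is declared as windows-1252.
raw = (
    b"From: admin@example.com\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="windows-1252"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"I don=92t know\r\n"
)

# policy.default makes the parser return EmailMessage objects.
msg = message_from_bytes(raw, policy=policy.default)

# Prefer the text/plain part; this also works for multipart messages.
body_part = msg.get_body(preferencelist=("plain",))

# get_content() undoes the transfer encoding and decodes with the declared charset.
text = body_part.get_content() if body_part is not None else None
print(repr(text))
```

The resulting str can be stored in a UTF-8 column without the "Incorrect string value" warning, since the byte has been turned into a proper Unicode character.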
