sha1 different in go than in python and openssl - python

I am trying to build a base64-encoded SHA-1 hash in Go, but the result I am getting is very different from the results in other programming languages.
package main
import (
    "crypto/sha1"
    "encoding/base64"
    "fmt"
)
func main() {
    c := sha1.New()
    input := []byte("hello")
    myBytes := c.Sum(input)
    fmt.Println(base64.StdEncoding.EncodeToString(base64.StdPadding))
}
This Go Code prints out aGVsbG/aOaPuXmtLDTJVv++VYBiQr9gHCQ==
My Python Code Looks like this
import hashlib
import base64
print(base64.b64encode(hashlib.sha1('hello').digest()))
And outputs qvTGHdzF6KLavt4PO0gs2a6pQ00=
My bash command for comparison looks like this
echo -n hello| openssl dgst -sha1 -binary |base64
And outputs this qvTGHdzF6KLavt4PO0gs2a6pQ00=
This leads me to assume that the Python code is doing everything correctly.
But why does Go print a different result?
Where is my mistake?
Thanks in advance.

You are using the standard library the wrong way. Don't assume what a method or function does; always read the docs if it's new to you.
sha1.New() returns a hash.Hash. Its Sum() method does not hash the data you pass to it: it appends the current hash (of whatever has been written so far) to the slice you pass in and returns the result, without changing the underlying hash state. Since nothing was written, your myBytes is "hello" followed by the SHA-1 of empty input, which is why the output begins with aGVsbG, the same prefix as the base64 encoding of "hello" itself (aGVsbG8=).
hash.Hash implements io.Writer, and to calculate the hash of some data, you have to write that data into it. Hash.Sum() takes an optional slice to append the result (the hash) to, in case you already have one allocated. Pass nil if you want it to allocate a new one.
Also, base64.StdEncoding.EncodeToString() expects the byte data (byte slice) you want to convert to base64, so you have to pass the checksum data to it. In your code you didn't pass the checksum to EncodeToString().
Working example:
c := sha1.New()
input := []byte("hello")
c.Write(input)
sum := c.Sum(nil)
fmt.Println(base64.StdEncoding.EncodeToString(sum))
Output is as expected (try it on the Go Playground):
qvTGHdzF6KLavt4PO0gs2a6pQ00=
Note that the crypto/sha1 package also has a handy sha1.Sum() function which does this in one step:
input := []byte("hello")
sum := sha1.Sum(input)
fmt.Println(base64.StdEncoding.EncodeToString(sum[:]))
Output is the same. Try it on the Go Playground.
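To see where the unexpected output in the question comes from, here is a minimal sketch in Python (not Go, and not part of the original answer). It assumes, as the Go documentation describes, that Sum(b) appends the digest of everything written so far to b. The helper names go_sum_misuse and correct_digest are made up for this illustration.

```python
import base64
import hashlib

def go_sum_misuse(data):
    # Mimics sha1.New().Sum(data) in Go when nothing was written to the hash:
    # the input bytes followed by the SHA-1 digest of empty input.
    return base64.b64encode(data + hashlib.sha1(b"").digest()).decode("ascii")

def correct_digest(data):
    # Mimics c.Write(data) followed by c.Sum(nil): the SHA-1 digest of the data.
    return base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")

wrong = go_sum_misuse(b"hello")
right = correct_digest(b"hello")
print(wrong)  # the string the Go program in the question printed
print(right)  # the string Python and openssl printed
```

The first value reproduces the output from the question, which suggests the Go program was encoding the input plus the hash of nothing rather than the hash of the input.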

There is an example of how to use it properly. You should do:
c := sha1.New()
io.WriteString(c, "hello")
myBytes := c.Sum(nil)
fmt.Println(base64.StdEncoding.EncodeToString(myBytes))
https://play.golang.org/p/sELsWTcrdd

Related

What is the equivalent of Perl's DB_File module in Python?

I was asked by my supervisor to convert some Perl scripts into Python. I'm baffled by a few lines of code, and I am also relatively inexperienced with Python. I'm an IT intern, so this is something of a challenge.
Here are the lines of code:
my %sybase;
my $S= tie %sybase, "DB_File", $prismfile, O_RDWR|O_CREAT, 0666, $DB_HASH or die "Cannot open: $!\n";
$DB_HASH->{'cachesize' } = $cache;
I'm not sure what the equivalent of this statement is in Python. DB_File is a Perl module. DB_HASH is a database type that allows arbitrary keys/values to be stored in a data file, at least according to the Perl documentation.
The next lines of code also have me stumped on how to convert them to the equivalent in Python.
$scnt=0;
while(my $row=$msth->fetchrow_arrayref()) {
    $scnt++;
    $srow++;
    #if ($scnt <= 600000) {
    $S->put(join('#',@{$row}[0..5]),join('#',@{$row}[6..19]));
    perf(500000,'sybase') ;#if $VERBOSE ;
    # }
}
I'll probably use fetchall() in Python to store the entire result dataset in it, then work through it row by row. But I'm not sure how to implement join() correctly in Python, especially since these lines use range within the row index elements -- [0..5]. Also it seems to write the output to data file (look at put()). I'm not sure what perf() does, can anyone help me out here?
I'd appreciate any kind of help here. Thank you very much.
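As a rough starting point, the join-and-put loop could be sketched in Python as below. This is an assumption-laden sketch, not a drop-in replacement: it uses the standard-library dbm.dumb module as a simple key/value file store, which is not claimed to be file-compatible with the Berkeley DB files that Perl's DB_File writes. The function name store_rows and the sample rows are invented for the illustration. perf() is not a Perl built-in, so it is presumably defined elsewhere in the script and is omitted here.

```python
import dbm.dumb
import os
import tempfile

def store_rows(db_path, rows):
    """Store each row as key = columns 0..5 and value = columns 6..19, joined by '#'."""
    count = 0
    with dbm.dumb.open(db_path, "c", 0o666) as db:
        for row in rows:
            # Perl's @{$row}[0..5] is an inclusive slice, so it maps to row[0:6].
            key = "#".join(str(col) for col in row[0:6])
            value = "#".join(str(col) for col in row[6:20])
            db[key] = value
            count += 1
    return count

# Hypothetical rows standing in for what a database cursor would return.
tmp_dir = tempfile.mkdtemp()
db_path = os.path.join(tmp_dir, "prism")
sample_rows = [tuple(range(20)), tuple(range(100, 120))]
stored = store_rows(db_path, sample_rows)

with dbm.dumb.open(db_path, "r") as db:
    first_value = db["0#1#2#3#4#5"].decode("utf-8")
print(stored, first_value)
```

Iterating over the cursor row by row, rather than calling fetchall(), would probably be closer to the Perl loop and avoids holding the whole result set in memory.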

How do I reverse the byte order of a list of constants in Python?

I have been looking for a way to extract constants from C source files and reverse their byte order in one automated process (no manual input). So far, I've managed to use pycparser to do most of the heavy lifting and created a script that prints all of the constants of a C file to the console. The format it prints is like this:
Constant: int, 0x243F6A88
My question: does anyone know of an intuitive way to automate this conversion in Python? I know how to reverse the byte order with join(), but I am struggling to think of a way to do it that minimizes the amount of manual input. Ideally, my script would print out the constants (done already) and then use some sort of regex (maybe?) to convert any constant that starts with 0x (a lot of other numbers get printed that I don't want). I hope this makes sense, thanks!
What I have so far:
import sys
from pycparser import c_ast, parse_file

class ConstantVisitor(c_ast.NodeVisitor):
    def __init__(self):
        self.values = []
    def visit_Constant(self, node):
        # Record the literal text of every constant and print it
        self.values.append(node.value)
        node.show(showcoord=True)

def show_tree(filename):
    # Note that cpp is used. Provide a path to your own cpp or
    # make sure one exists in PATH.
    ast = parse_file(filename, use_cpp=True, cpp_args=['-E', r'-Iutils/fake_libc_include'])
    cv = ConstantVisitor()
    cv.visit(ast)

if __name__ == "__main__":
    if len(sys.argv) > 1:
        filename = sys.argv[1]
    else:
        filename = 'xmrig-master/src/crypto/c_blake256.c'
    show_tree(filename)
You seem to have 3 steps in the task:
1. Parse the code with pycparser - you have that
2. Find all constants (just integer constants? how about floats?) and reverse their byte order
3. Do something with the results
For (2) you can use something like the suggestions in this answer, but adjust it to the actual types you need.
For (3) it's not clear what you're trying to do; are you trying to write the constants back to the original C file? pycparser is not the best tool for that, then. You may want to use the Python bindings to Clang instead, because Clang tools are designed to modify existing code in place.
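Step (2) could look something like the sketch below, which filters for hex literals with a regex and swaps their byte order. It assumes every constant is a 32-bit unsigned integer such as 0x243F6A88; the names HEX_CONSTANT and reverse_byte_order are invented here, and a constant wider than the given width raises OverflowError rather than being silently truncated.

```python
import re

# Matches C hex literals such as 0x243F6A88, with optional u/l suffixes.
HEX_CONSTANT = re.compile(r"^0[xX]([0-9a-fA-F]+)[uUlL]*$")

def reverse_byte_order(literal, width_bytes=4):
    """Return the literal with its bytes reversed, or None if it is not a hex constant."""
    match = HEX_CONSTANT.match(literal)
    if match is None:
        return None
    value = int(match.group(1), 16)
    # Serialize as big-endian, then reinterpret the same bytes as little-endian.
    swapped = int.from_bytes(value.to_bytes(width_bytes, "big"), "little")
    return "0x{:0{}X}".format(swapped, width_bytes * 2)

constants = ["0x243F6A88", "42", "0x85A308D3"]
converted = [reverse_byte_order(c) for c in constants]
print(converted)
```

With the visitor from the question, the list of literals would come from cv.values after cv.visit(ast); entries that return None are the non-hex numbers to skip.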

How to use lists or dicts as command line arguments in pyJWT

The following python code produces a valid JWT token, using pyjwt:
>>> import jwt
>>> payload = {'nested': [{'name': 'me', 'id': '1'}]}
>>> token = jwt.encode(payload, 'secret')
>>> token.decode()
ey[...]ko0Zq_k
pyjwt also supports calls from the command line interface. But the docs only show examples with = separated key value pairs and not with nested payloads.
My best guess was this:
$ pyjwt --key=secret encode nested=[{name=me, id=1}]
ey[...]0FRW9gyU # not the same token as above :(
Which didn't work. Is it simply not supported?
As mentioned, your command line token when decoded returns this json object:
{'nested': '[{name=me,', 'id': '1}]'}
A quick dive into the __main__.py of jwt package gives this little snippet:
... snipped
def encode_payload(args):
# Try to encode
if args.key is None:
raise ValueError('Key is required when encoding. See --help for usage.')
# Build payload object to encode
payload = {}
for arg in args.payload:
k, v = arg.split('=', 1)
... some additional handling on v for time, int, float and True/False/None
... snipped
As you can see, the key and value of each payload entry are determined directly by split('=', 1), so anything past the first = in a command-line argument is always treated as a single value (with some conversion afterwards).
So in short, nested dicts are not supported in the CLI.
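The splitting behaviour can be reproduced in isolation. This is a simplified sketch of the loop in the snippet above, with the type conversion for time, int, float and True/False/None left out; split_cli_payload is a made-up name, and the argument list assumes the shell split the unquoted payload at the space.

```python
def split_cli_payload(args):
    """Mimic the key=value splitting from the snippet above (type conversion omitted)."""
    payload = {}
    for arg in args:
        key, value = arg.split("=", 1)
        payload[key] = value
    return payload

# The shell hands over two separate arguments because of the space after the comma.
shell_args = ["nested=[{name=me,", "id=1}]"]
parsed = split_cli_payload(shell_args)
print(parsed)
```

The result has the same shape as the decoded token shown earlier: two flat string values instead of one nested list.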
However, the semi-good news is that there are a few ways to work around this:
1. Run an impromptu statement off Python's CLI directly, like so:
> python -c "import jwt; print(jwt.encode({'nested':[{'name':'me', 'id':'1'}]}, 'secret').decode('utf-8'))"
# eyJ...Zq_k
Not exactly ideal, but it gives you what you need. (On PyJWT 2.0 and later, jwt.encode() already returns a str, so drop the .decode() call.)
2. Save the same script into a .py file capable of taking args and execute it on Python's CLI:
import sys, jwt
my_json = sys.argv[1]  # argv[0] is the script name; the payload is the first real argument
token = jwt.encode(eval(my_json), 'secret')
print(token.decode('utf-8'))
# run in CLI
> python my_encode.py "{'nested':[{'name':'me', 'id':'1'}]}"
# eyJ...Zq_k
Note that the use of eval() here is not ideal because of security concerns. This is just my lazy way of implementing it because I don't want to write a parser for the args. If you absolutely must use the CLI for your implementation and it's exposed, I would highly recommend you invest the effort into cleansing and parsing the argv values more carefully.
3. The most contrived way: you can try to modify the Lib\site-packages\jwt\__main__.py function (at your own peril) to suit your needs until official support is added. I'd caution that you should be rather comfortable with writing your own parser before considering messing with the main code. I took a few stabs at it before I realized the limitations you will run into:
a. The main encode() method doesn't consider a list a valid JSON object (but it should). So right off the bat you must have a dict-like string to manipulate.
b. The code always forces numbers to be cast as int or float if possible. You'll need to escape them somehow or entirely change the way it handles numbers.
My attempt went something like this:
def func(result, payload):
    for arg in payload:
        k, v = arg.split('=', 1)
        if v.startswith('{') and v.endswith('}'):
            result[k] = func({}, v[1:-1])
        else:
            ... the rest of the existing code
However, I quickly ran into a limitation: the original arguments are already space-delimited and each is assumed to be a k, v pair. I would need to handle another delimiter such as , as well as lists, and it could get messier. It's definitely doable, and the effect is immediate, i.e. the CLI runs directly off of this __main__.py, but it's more work than I'd like to invest at the moment, so I leave it in your capable hands.
The effort to overcome these issues might be more than it's worth, depending on your skill and comfort level. So pick your battles: if the CLI is not absolutely necessary, I'd suggest just using the .py methods instead.
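For the script in option 2, one way to avoid eval() is the standard library's ast.literal_eval, which only accepts Python literals (dicts, lists, strings, numbers and so on) and refuses function calls. The sketch below covers the parsing step only; parse_payload is a hypothetical helper, and its result would be passed to jwt.encode() in place of eval(my_json).

```python
import ast

def parse_payload(text):
    """Parse a Python-literal payload string without executing arbitrary code."""
    payload = ast.literal_eval(text)
    if not isinstance(payload, dict):
        raise ValueError("payload must be a dict")
    return payload

example = parse_payload("{'nested':[{'name':'me', 'id':'1'}]}")
print(example)
```

If the payload is supplied as strict JSON (double quotes), json.loads would be an equally reasonable choice.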

Python : file.seek(10000000000, 2000000000). Python int too large to convert to C long

I'm developing a program that downloads big files from the internet (from 200 MB to 5 GB) using threads, and uses file.seek to find an offset and insert the data into a main file. But when I try to set the offset above byte 2147483647 (which exceeds the C long max value), it gives the "int too large to convert to C long" error. How can I work around this? Below is a representation of my script code.
f = open("bigfile.txt")
#create big file
f.seek(5000000000-1)
f.write("\0")
#try to get the offset, this gives the error (Python int too large to convert to C long)
f.seek(3333333333, 4444444444)
I wouldn't be asking (because it has been asked a lot) if I had found a solution to this.
I read about casting it to an int64 and using something like UL, but I don't really understand it. I hope you can help, or at least try to make this clearer in my head. xD
f.seek(3333333333, 4444444444)
That second argument is supposed to be the whence argument, dictating whether you're seeking from:
the file start, os.SEEK_SET or 0;
the current position, os.SEEK_CUR or 1;
the end of the file, os.SEEK_END or 2.
4444444444 is not one of the allowed values.
The following program works fine:
import os
f = open("bigfile.txt",'w')
f.seek(5000000000-1)
f.write("\0")
f.seek(3333333333, os.SEEK_SET)
print f.tell() # 'print(f.tell())' for Python3
and outputs 3333333333 as expected.
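The three whence values can be compared side by side with a small Python 3 sketch. It opens the file in binary mode and never writes, so no large file is actually created; seek_positions and the temporary path are assumptions made for the illustration.

```python
import os
import tempfile

def seek_positions(path, offset):
    """Return the positions reported after seeking with each whence value."""
    with open(path, "wb") as f:
        f.seek(offset, os.SEEK_SET)   # absolute position from the file start
        from_start = f.tell()
        f.seek(10, os.SEEK_CUR)       # 10 bytes past the current position
        from_current = f.tell()
        f.seek(0, os.SEEK_END)        # end of the (still empty) file
        from_end = f.tell()
    return from_start, from_current, from_end

path = os.path.join(tempfile.mkdtemp(), "bigfile.bin")
positions = seek_positions(path, 3333333333)
print(positions)
```

The large number always belongs in the first argument (the offset); the second argument only selects the reference point.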

converting character or integer to md5 hash using python script

I used SQL to convert a social security number to MD5 hash. I am wondering if there is a module or function in python/pandas that can do the same thing.
My sql script is:
CREATE OR REPLACE FUNCTION MD5HASH(STR IN VARCHAR2) RETURN VARCHAR2 IS
V_CHECKSUM VARCHAR2(32);
BEGIN
V_CHECKSUM := LOWER(RAWTOHEX(UTL_RAW.CAST_TO_RAW(SYS.DBMS_OBFUSCATION_TOOLKIT.MD5(INPUT_STRING => STR))));
RETURN V_CHECKSUM;
EXCEPTION
WHEN NO_DATA_FOUND THEN
NULL;
WHEN OTHERS THEN
RAISE;
END MD5HASH;
SELECT HRPRO.MD5HASH('555555555') FROM DUAL
thanks.
I apologize, now that I read back over my initial question it is quite confusing.
I have a data frame that contains the following headings:
df[['ssno','regions','occ_ser','ethnicity','veteran','age','age_category']][:10]
Here ssno is personal information that I would like to convert to an MD5 hash and store in a new column of the dataframe.
thanks... sorry for the confusion.
Right now I have to send my file to Oracle and then convert the ssn to hash and then export back out so that I can continue working with it in Pandas. I want to eliminate this step.
Using the standard hashlib module:
import hashlib
hash = hashlib.md5()
hash.update('555555555')
print hash.hexdigest()
output
3665a76e271ada5a75368b99f774e404
As mentioned in timkofu's comment, you can also do this more simply, using
print hashlib.md5('555555555').hexdigest()
The .update() method is useful when you want to generate a checksum in stages. Please see the hashlib documentation (or the Python 3 version) for further details.
hashlib with md5 might be of interest to you.
import hashlib
hashlib.md5("Nobody inspects the spammish repetition").hexdigest()
output:
bb649c83dd1ea5c9d9dec9a18df0ffe9
Constructors for hash algorithms that are always present in this module are md5(), sha1(), sha224(), sha256(), sha384(), and sha512().
If you want a different digest, you can try the SHA family. Note that these digests are longer than MD5's, not more condensed.
Output for sha224:
'a4337bc45a8fc544c03f52dc550cd6e1e87021bc896588bd79e901e2'
For more details, see the hashlib documentation.
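The snippets above are written for Python 2. On Python 3, hashlib expects bytes, so a string has to be encoded first. Below is a minimal sketch with a made-up helper name, md5_hex, that accepts either a string or an integer; the sample values are the placeholder number from the question.

```python
import hashlib

def md5_hex(value):
    """Return the lowercase hex MD5 digest of value, converting it to str first."""
    return hashlib.md5(str(value).encode("utf-8")).hexdigest()

ssnos = ["555555555", 555555555]
hashes = [md5_hex(s) for s in ssnos]
print(hashes)
```

For the dataframe in the question, something along the lines of df['ssno_md5'] = df['ssno'].astype(str).apply(md5_hex) should add the new column, though that line is an untested assumption about how the data is stored.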
