Hash each line in Text file Python MD5 - python

I'm trying to write a program which will open a text file and give me an md5 hash for each line of text. For example I have a text file with:
66090001081992
66109801042010
68340016052015
68450001062015
79450001062016
This is my code:
import hashlib
hasher = hashlib.md5()
archivo_empleados = open("empleados.txt","rb")
lista = (archivo_empleados.readlines())
archivo_empleados.close()

You could open the file with a with context manager (so you don't need to call .close()), then iterate over the lines of the file with a for loop and print the MD5 hash of each one. Because the file is opened in text mode, each line is a str, so you also need to encode it (for example as UTF-8) before hashing.
import hashlib

def compute_MD5_hash(string, encoding='utf-8'):
    md5_hasher = hashlib.md5()
    md5_hasher.update(string.encode(encoding))
    return md5_hasher.hexdigest()

with open("path/to/file") as f:
    for line in f:
        print(compute_MD5_hash(line))
Which gives hash strings like the following:
58d3ab1af1afd247a90b046d4fefa330
6dea9449f52e07bae45a9c1ed6a03bbc
9e2d8de8f8b3df8a7078b8dc12bb3e35
20819f8084927f700bd58cb8108aabcd
620596810c149a5bc86762d2e1924074
You can have a look at the various hashlib functions in the documentation.
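One detail worth checking in the loop above: iterating over a file yields each line with its trailing newline, and that newline becomes part of the hashed input. The following is a minimal sketch, assuming you want the hash of the text alone; the helper name md5_per_line is hypothetical and not part of the original answer.

```python
import hashlib


def md5_per_line(path, encoding="utf-8"):
    """Return a list of (text, md5_hex) pairs, one per line of the file.

    The line terminator is removed before hashing, so the digest covers
    only the visible text of each line.
    """
    results = []
    with open(path, encoding=encoding) as f:
        for line in f:
            text = line.rstrip("\r\n")  # drop the line terminator only
            digest = hashlib.md5(text.encode(encoding)).hexdigest()
            results.append((text, digest))
    return results
```

If your reference hashes were computed with the newline included, skip the rstrip call; the two variants produce different digests for the same line.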

Related

Read md5 hash inside txt file and compare with a md5 hash of a file

I am new to Python. I am trying to detect a virus by comparing a file's MD5 hash against known hashes. I have a list of virus MD5 hashes in a file called viruslist, and I want to compare the MD5 of eicar.com with the hashes in that list. If it matches, the program should print "detected". But the output always shows "clear". Do I need to use readlines, or something else?
Below is my code.
import hashlib

md5_hash = hashlib.md5()
viruslist = open('C:/FYP/SecuCOM2022/viruslist.txt','rb')
virusinside = viruslist.readlines()
a_file = open('C:/Users/User/Desktop/irustesting/eicar.com','rb')
content = a_file.read()
md5_hash.update(content)
digest = md5_hash.hexdigest()
print(digest)
virus="detected"
novirus="clear"
if virusinside == digest:
    print(virus)
else:
    print(novirus)
Assuming your viruslist.txt looks like
bc6e6f16b8a077ef5fbc8d59d0b931b9
2d9fd9fbccf64a485304d7596772f2b0
...
Then you will probably need to make the following changes.
Open viruslist.txt in text mode with viruslist = open('C:/FYP/SecuCOM2022/viruslist.txt','rt'). This is because the output of hashlib.md5().hexdigest() is a str, not bytes, so it never compares equal to lines read in 'rb' mode.
Strip off the trailing newline of each line in viruslist.txt. For example, virusinside = [l.rstrip() for l in viruslist].
Use in instead of == and put the digest on the left, because virusinside is a list and the digest is a single string. For example, if digest in virusinside:
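Putting the three changes together might look like the sketch below. It assumes viruslist.txt holds one hex digest per line, as in the example above; the function names load_hashes, file_md5 and is_listed are hypothetical.

```python
import hashlib


def load_hashes(list_path):
    """Read one hex digest per line into a set, ignoring blank lines."""
    with open(list_path, "rt") as f:
        return {line.strip().lower() for line in f if line.strip()}


def file_md5(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, read in binary chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def is_listed(path, list_path):
    """True if the file's MD5 appears in the hash list."""
    return file_md5(path) in load_hashes(list_path)
```

A set is used instead of a list so that the membership test stays fast when the hash list grows; print("detected" if is_listed(sample, viruslist) else "clear") would then reproduce the output of the original script.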

Calling argument to function and getting the file hashes

I'm attempting to get the hashes of the file which is the argument supplied. Here is my current code:
import hashlib
import argparse

md5 = hashlib.md5()
sha1 = hashlib.sha1()
sha256 = hashlib.sha256()
BUF_SIZE = 32768

parse = argparse.ArgumentParser()
parse.add_argument("-test", help = 'testing')
args = parse.parse_args()

def hashing(hashThis=args.test):
    with open(hashThis, 'rb') as f:
        while True:
            data = f.read(BUF_SIZE)
            if not data:
                break
        md5.update(data)
        sha1.update(data)
        sha256.update(data)
    #print hashes
    print('MD5: {0}'.format(md5.hexdigest()))
    print('SHA1: {0}'.format(sha1.hexdigest()))
    print('SHA256: {0}'.format(sha256.hexdigest()))

hashing(hashThis=args.test)
This gives me the following output:
user@user:~/Testing$ python test.py -test test.txt
MD5: d41d8cd98f00b204e9800998ecf8427e
SHA1: da39a3ee5e6b4b0d3255bfef95601890afd80709
SHA256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
The issue is that these are the hashes of an empty file. Running sha256sum on the same file gives:
user@user:~/Testing$ sha256sum test.txt
8f434346648f6b96df89dda901c5176b10a6d83961dd3c1ac88b59b2dc327aa4 test.txt
It's not pulling the data from the file, and it works if I use the same code outside of a function. I feel like I'm missing something obvious, but can't figure it out.
You need to update the hash objects inside the while loop. Right now the updates run only after the loop exits, which happens once data is empty, so all you hash is that empty bytes object.
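A corrected version could be sketched as follows. This is one possible arrangement rather than the asker's final code: the function name hash_file and its algorithms parameter are hypothetical, and the hash objects are created inside the function so that repeated calls do not accumulate data from earlier files.

```python
import hashlib

BUF_SIZE = 32768  # read size in bytes, same value as in the question


def hash_file(path, algorithms=("md5", "sha1", "sha256")):
    """Return {algorithm name: hex digest} for the file at path."""
    # Fresh hash objects per call, so repeated calls do not accumulate data.
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        while True:
            data = f.read(BUF_SIZE)
            if not data:
                break
            # The updates sit inside the loop, so every chunk is hashed.
            for hasher in hashers.values():
                hasher.update(data)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Calling hash_file(args.test) and printing each entry of the returned dictionary would then replace the three print calls in the question.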

Download a gzipped file, md5 checksum it, and then save extracted data if matches

I'm currently attempting to download two files using Python, one a gzipped file, and the other, its checksum.
I would like to verify that the gzipped file's contents match the md5 checksum, and then I would like to save the contents to a target directory.
I found out how to download the files here, and I learned how to calculate the checksum here. I load the URLs from a JSON config file, and I learned how to parse JSON file values here.
I put it all together into the following script, but I'm stuck attempting to store the verified contents of the gzipped file.
import json
import gzip
import urllib
import hashlib

# Function for creating an md5 checksum of a file
def md5Gzip(fname):
    hash_md5 = hashlib.md5()
    with gzip.open(fname, 'rb') as f:
        # Make an iterable of the file and divide into 4096 byte chunks
        # The iteration ends when we hit an empty byte string (b"")
        for chunk in iter(lambda: f.read(4096), b""):
            # Update the MD5 hash with the chunk
            hash_md5.update(chunk)
    return hash_md5.hexdigest()

# Open the configuration file in the current directory
with open('./config.json') as configFile:
    data = json.load(configFile)

# Open the downloaded checksum file
with open(urllib.urlretrieve(data['checksumUrl'])[0]) as checksumFile:
    md5Checksum = checksumFile.read()

# Open the downloaded db file and get its md5 checksum via gzip.open
fileMd5 = md5Gzip(urllib.urlretrieve(data['fileUrl'])[0])

if (fileMd5 == md5Checksum):
    print 'Downloaded Correct File'
    # save correct file
else:
    print 'Downloaded Incorrect File'
    # do some error handling
In your md5Gzip, return a tuple instead of just the hash.
def md5Gzip(fname):
    hash_md5 = hashlib.md5()
    file_content = None
    with gzip.open(fname, 'rb') as f:
        # Make an iterable of the file and divide into 4096 byte chunks
        # The iteration ends when we hit an empty byte string (b"")
        for chunk in iter(lambda: f.read(4096), b""):
            # Update the MD5 hash with the chunk
            hash_md5.update(chunk)
        # get file content
        f.seek(0)
        file_content = f.read()
    return hash_md5.hexdigest(), file_content
Then, in your code:
fileMd5, file_content = md5Gzip(urllib.urlretrieve(data['fileUrl'])[0])
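If the decompressed data is large, an alternative is to verify first and then stream the contents to disk instead of returning them in memory. The sketch below is written for Python 3 and makes two assumptions: the published checksum covers the decompressed data (as in the question's md5Gzip), and the checksum text may carry a trailing newline or a file name after the hash. The name verify_and_extract and its parameters are hypothetical.

```python
import gzip
import hashlib
import shutil


def verify_and_extract(gz_path, expected_md5, out_path, chunk_size=4096):
    """Check the MD5 of the decompressed data, then write it to out_path.

    Returns True and writes the file if the digest matches, else False.
    """
    hash_md5 = hashlib.md5()
    with gzip.open(gz_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hash_md5.update(chunk)

    # Checksum files often hold "<hash>  <filename>\n"; keep the first field.
    fields = expected_md5.split()
    expected = fields[0].lower() if fields else ""
    if hash_md5.hexdigest() != expected:
        return False

    # Second pass: stream the decompressed bytes to disk without
    # holding the whole file in memory.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return True
```

Note that comparing the raw result of checksumFile.read() with a hex digest can fail purely because of a trailing newline, which is why the sketch normalises the expected value before comparing.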

Creating an MD5 Hash of A ZipFile

I want to create an MD5 hash of a ZipFile, not of one of the files inside it. However, ZipFile objects aren't easily convertible to streams.
from hashlib import md5
from zipfile import ZipFile
zipped = ZipFile(r'/Foo/Bar/Filename.zip')
hasher = md5()
hasher.update(zipped)
return hasher.hexdigest()
The above code generates the error: TypeError: must be convertible to a buffer, not ZipFile.
Is there a straightforward way to turn a ZipFile into a stream?
There are no security issues here; I just need a quick and easy way to determine if I've seen a file before. hash(zipped) works fine, but I'd like something a little more robust if possible.
Just open the zip file as a regular file. The following code works on my machine.
from hashlib import md5

m = md5()
with open("/Foo/Bar/Filename.zip", "rb") as f:
    data = f.read()  # for a large file, read in chunks and call update() on each chunk
    m.update(data)
print(m.hexdigest())
This function should return the MD5 hash of any file, given its path (requires the pycrypto module):
from Crypto.Hash import MD5

def get_MD5(file_path):
    chunk_size = 8192
    h = MD5.new()
    with open(file_path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if len(chunk):
                h.update(chunk)
            else:
                break
    return h.hexdigest()

print(get_MD5('pics.zip')) # example
output:
6a690fa3e5b34e30be0e7f4216544365
Info on pycrypto
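For the stated goal of recognising a file that has been seen before, the standard library is enough. Below is a minimal sketch, assuming an in-memory set of digests is acceptable; the names md5_of_file and seen_before are hypothetical. hashlib.file_digest exists only in Python 3.11 and later, so the sketch falls back to a manual chunked loop on older versions.

```python
import hashlib


def md5_of_file(path, chunk_size=8192):
    """Return the MD5 hex digest of the file's raw bytes."""
    with open(path, "rb") as f:
        if hasattr(hashlib, "file_digest"):  # Python 3.11+
            return hashlib.file_digest(f, "md5").hexdigest()
        h = hashlib.md5()
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
        return h.hexdigest()


def seen_before(path, seen):
    """Return True if this file's digest is already in `seen`; record it otherwise."""
    digest = md5_of_file(path)
    if digest in seen:
        return True
    seen.add(digest)
    return False
```

Because the archive is read as plain bytes, two zip files with identical contents but different compression settings or timestamps will hash differently, which matches the request to hash the ZipFile itself rather than the files inside it.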

How do I find the MD5 hash of an ISO file using Python?

I am writing a simple tool that allows me to quickly check MD5 hash values of downloaded ISO files. Here is my algorithm:
import sys
import hashlib

def main():
    filename = sys.argv[1] # Takes the ISO 'file' as an argument in the command line
    testFile = open(filename, "r") # Opens and reads the ISO 'file'

    # Use hashlib here to find MD5 hash of the ISO 'file'. This is where I'm having problems
    hashedMd5 = hashlib.md5(testFile).hexdigest()
    realMd5 = input("Enter the valid MD5 hash: ") # Prompt the user for the valid MD5 hash
    if (realMd5 == hashedMd5): # Check if valid
        print("GOOD!")
    else:
        print("BAD!!")

main()
My problem is on the 9th line, where I try to take the MD5 hash of the file with hashlib.md5(testFile). I'm getting TypeError: object supporting the buffer API required. Could anyone shed some light on how to make this function work?
The object created by hashlib.md5 doesn't take a file object. You need to feed it data a piece at a time, and then request the hash digest.
import hashlib

testFile = open(filename, "rb") # filename is the path to the ISO, as in the question
hash = hashlib.md5()
while True:
    piece = testFile.read(1024)
    if piece:
        hash.update(piece)
    else: # we're at end of file
        hex_hash = hash.hexdigest()
        break
print(hex_hash) # will produce what you're looking for
You need to read the file:
import sys
import hashlib

def main():
    filename = sys.argv[1] # Takes the ISO 'file' as an argument in the command line
    testFile = open(filename, "rb") # Opens and reads the ISO 'file'
    # Use hashlib here to find MD5 hash of the ISO 'file'. This is where I'm having problems
    m = hashlib.md5()
    while True:
        data = testFile.read(4*1024*1024)
        if not data: break
        m.update(data)
    hashedMd5 = m.hexdigest()
    realMd5 = input("Enter the valid MD5 hash: ") # Prompt the user for the valid MD5 hash
    if (realMd5 == hashedMd5): # Check if valid
        print("GOOD!")
    else:
        print("BAD!!")

main()
You also need to open the file in binary mode ("rb") and read the data in chunks, since an ISO file is likely too large to fit in memory.
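One further point when comparing against a hash typed or pasted by the user: hexdigest() returns lowercase hex, while published checksums are sometimes uppercase or picked up with stray whitespace. A minimal sketch of a comparison that tolerates both, with the hypothetical helper name md5_matches:

```python
import hashlib


def md5_matches(path, expected, chunk_size=4 * 1024 * 1024):
    """Return True if the file's MD5 equals `expected` (a hex string)."""
    m = hashlib.md5()
    with open(path, "rb") as f:  # binary mode: hash the raw bytes
        for data in iter(lambda: f.read(chunk_size), b""):
            m.update(data)
    # Normalise the reference value: ignore surrounding whitespace and case.
    return m.hexdigest() == expected.strip().lower()
```

In the script above, this would replace the manual loop and the == test: print("GOOD!" if md5_matches(filename, realMd5) else "BAD!!").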
