How to hash a file in Python (SHA256)

I am using a script to retrieve information from an email, such as attached files.
I no longer want to retrieve the full file, only its SHA256 digest. However, I can't figure out how to read the temporary file "tmp", which is of type "_io.BufferedRandom".
I tried using hashlib, but nothing works with the temporary file.
Here is the relevant part of the code (the full script is too long to post):
# Extract MIME parts
filename = part.get_filename()
print(filename)
mimetype = part.get_content_type()
print(mimetype)
if filename and mimetype:
    if mimetype in config['caseFiles'] or not config['caseFiles']:
        log.info("Found attachment: %s (%s)" % (filename, mimetype))
        # Decode the attachment and save it in a temporary file
        charset = part.get_content_charset()
        if charset is None:
            charset = chardet.detect(bytes(part))['encoding']
        # Get filename extension to not break TheHive analysers (see Github #11)
        fname, fextension = os.path.splitext(filename)
        fd, path = tempfile.mkstemp(prefix=slugify(fname) + "_", suffix=fextension)
        try:
            with os.fdopen(fd, 'w+b') as tmp:
                tmp.write(part.get_payload(decode=1))
                print(tmp)
                for line in tmp:
                    m = hashlib.sha256(line)
                    print("test :", m.hexdigest())
            attachments.append(path)
        except OSError as e:
            log.error("Cannot dump attachment to %s: %s" % (path, e.errno))
            return False
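For reference, a minimal sketch of what should work here (reusing the names from the snippet above; untested against the full script): after tmp.write() the file position sits at end-of-file, so iterating over tmp yields nothing, and hashlib.sha256(line) also creates a fresh hash object per line instead of feeding one object incrementally. Hashing the decoded payload directly, or rewinding and reading in chunks, avoids both problems:
payload = part.get_payload(decode=1)

# Option 1: hash the payload bytes directly, before (or instead of) writing them
print("SHA256:", hashlib.sha256(payload).hexdigest())

# Option 2: rewind the temporary file and feed one hash object in chunks
with os.fdopen(fd, 'w+b') as tmp:
    tmp.write(payload)
    tmp.seek(0)  # back to the start; write() left the position at EOF
    h = hashlib.sha256()
    for chunk in iter(lambda: tmp.read(4096), b''):
        h.update(chunk)
    print("SHA256:", h.hexdigest())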

Related

file.save not saving file in proper format

I have to upload a file to two places:
in a local directory
in Jira via curl
I have written a POST endpoint which reads files from the request, sends each file to Jira over a request, and saves it locally after a success response.
My code looks like this:
for file in request.files.getlist('file'):
    filename = file.filename
    mimetype = file.content_type
    if not is_valid_type(mimetype):
        return json.dumps({"success": False, "message": "Invalid File Format"}), 415
    files = {'file': (filename, file, mimetype)}
    r = requests.post(jira_url, files=files, headers=headers,
                      auth=HTTPBasicAuth(current_app.config.get('username'),
                                         current_app.config.get('password')),
                      verify=False)
    LOG.info("Got %s response from %s - text %s", r.status_code, "upload", r.json())
    data = r.json()
    filename = secure_filename(file.filename)
    file.save(os.path.join(current_app.config["UPLOAD_FOLDER"], filename))
It saves the file, but when I try to open it, it says we don't support this file format.
If I remove the POST call to Jira from the loop, then it saves the file in the proper format.
Did you try with open/write?
with open("input_file.txt", "w") as text_file:
    text_file.write("destination_file")
A solution to specify the destination path:
import os

filename = "input_file.txt"
path = "/pathlocation/..."
fullpath = os.path.join(path, filename)
with open(fullpath, "w") as text_file:
    text_file.write("destination_file")
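A likely cause of the original symptom, though this is an assumption about the poster's setup rather than something confirmed here: requests reads the uploaded stream to the end when posting to Jira, so the later file.save() writes an empty or truncated file. Rewinding the stream between the two uses should fix it:
# Sketch: rewind the Werkzeug FileStorage stream after requests has
# consumed it, so that file.save() can read the content again.
r = requests.post(jira_url, files=files, headers=headers,
                  auth=HTTPBasicAuth(current_app.config.get('username'),
                                     current_app.config.get('password')),
                  verify=False)
file.seek(0)  # back to the start of the uploaded file's stream
filename = secure_filename(file.filename)
file.save(os.path.join(current_app.config["UPLOAD_FOLDER"], filename))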

Comparing files once I have hostname

I need a way to compare two files that have the same hostname in them. I have written a function that parses out the hostnames and saves them in a list. Once I have that, I need to be able to compare the files.
Each file is in a different directory.
Step One: Retrieve "hostname" from each file.
Step Two: Run compare on files with same "hostname" from two directories.
Retrieve hostname Code:
def hostname_parse(directory):
    results = []
    try:
        for filename in os.listdir(directory):
            if filename.endswith(('.cfg', '.startup', '.confg')):
                file_name = os.path.join(directory, filename)
                with open(file_name, "r") as in_file:
                    for line in in_file:
                        match = re.search(r'hostname\s(\S+)', line)
                        if match:
                            results.append(match.group(1))
                            #print "Match Found"
        return results
    except IOError as (errno, strerror):
        print "I/O error({0}): {1}".format(errno, strerror)
        print "Error in hostname_parse function"
Sample Data:
Test File:
19-30#
!
version 12.3
service timestamps debug datetime msec
service timestamps log datetime msec
service password-encryption
!
hostname 19-30
!
boot-start-marker
boot-end-marker
!
ntp clock-period 17179738
ntp source Loopback0
!
end
19-30#
In this case the hostname is 19-30. For ease of testing I just used the same file, modified so that the two copies either match or differ.
As stated above, I can extract the hostname but am now looking for a way to compare the files based on the hostname found.
At its core this is a file comparison, but being able to look at specific fields is what I would ultimately like to accomplish. For starters I'm just looking to see that the files are identical. Case sensitivity shouldn't matter, as these are Cisco-generated files with the same formatting. The contents of the files are what is important, since I'm looking for "configuration" changes.
Here is some code to meet your requirements. I had no way to test it, so it may have a few rough edges. I used hashlib to calculate a hash of each file's contents as a way to find changes.
import hashlib
import os
import re

HOSTNAME_RE = re.compile(r'hostname +(\S+)')

def get_file_info_from_lines(filename, file_lines):
    hostname = None
    a_hash = hashlib.sha1()
    for line in file_lines:
        a_hash.update(line.encode('utf-8'))
        match = HOSTNAME_RE.match(line)
        if match:
            hostname = match.group(1)
    return hostname, filename, a_hash.hexdigest()

def get_file_info(filename):
    if filename.endswith(('.cfg', '.startup', '.confg')):
        with open(filename, "r") as in_file:
            return get_file_info_from_lines(filename, in_file.readlines())

def hostname_parse(directory):
    results = {}
    for filename in os.listdir(directory):
        # os.listdir returns bare names; join with the directory
        # so the file can be opened from any working directory
        info = get_file_info(os.path.join(directory, filename))
        if info is not None:
            results[info[0]] = info
    return results

results1 = hostname_parse('dir1')
results2 = hostname_parse('dir2')

for hostname, filename, filehash in results1.values():
    if hostname in results2:
        _, filename2, filehash2 = results2[hostname]
        if filehash != filehash2:
            print("%s has a change (%s, %s)" % (
                hostname, filehash, filehash2))
            print(filename)
            print(filename2)
            print()
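Since the end goal is to see configuration changes, a line-level diff could be printed once a hash mismatch is found. A sketch using the standard library's difflib (the function name show_config_diff is mine, not from the answer above):
import difflib

def show_config_diff(filename1, filename2):
    # Print a unified diff of two config files whose hashes differ.
    with open(filename1) as f1, open(filename2) as f2:
        lines1 = f1.readlines()
        lines2 = f2.readlines()
    for line in difflib.unified_diff(lines1, lines2,
                                     fromfile=filename1, tofile=filename2):
        print(line, end='')
Calling show_config_diff(filename, filename2) inside the "if filehash != filehash2" branch would then show exactly which lines changed.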

How can I filter out only JSON-mimetype files from a number of files in Python

I am writing a Python script that takes files one by one from a directory and gets each one's mimetype; if the mimetype is not JSON, I want to ignore the file. See the relevant part of my script below:
for filepath in files:
    filename = os.path.basename(filepath)
    mimetype = mimetypes.guess_type(filepath, strict=False)  # here I want to keep only JSON files and ignore the others
    version = "0"
    checksum = "0"
    fileext = os.path.splitext(filename)[1].lower()
    # get raw file data
    with open(filepath, "rb") as fr:
        filedata = fr.read()
        oldfilesize = len(filedata)
See my comment in the code above. Any suggestions?
You could try something like this:
for filepath in files:
    filename = os.path.basename(filepath)
    mimetype = mimetypes.guess_type(filepath, strict=False)
    if mimetype != ('application/json', None):
        with open(filepath) as f:
            try:
                json.load(f)
            except ValueError:
                # It's not json
                continue
    # do stuff
but this could be inefficient if there are lots of files, and/or they are large.
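If that cost matters, a cheaper pre-filter (an illustration of my own, not from either answer) is to peek at the first non-whitespace byte before attempting a full parse, since JSON files typically start with '{' or '[':
def looks_like_json(filepath):
    # Heuristic pre-filter: most JSON files begin with '{' or '['.
    # A full json.load() is still needed to be certain the file parses.
    with open(filepath, "rb") as f:
        head = f.read(64).lstrip()
    return head[:1] in (b'{', b'[')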
Well, mimetypes won't help here, because the mime type application/json for .json files is not inherent to the file's metadata. Rather, you use it to provide file-type information to whoever is going to handle the file, e.g. Content-Type: application/json in an HTTP response header tells the client that the body is JSON.
Anyway, the solution might be as follows:
import json

with open("filename", "rt") as f:
    try:
        d = json.load(f)  # no need to name it if you are just checking
    except json.JSONDecodeError:
        # handle it or just pass
        pass
    else:
        # Got a json file, do whatever
        pass

File write and file read in UTF-16 in Python

I have this file write function:
def filewrite(folderpath, filename, strdata, encmode):
    try:
        path = os.path.join(folderpath, filename)
        if not path:
            return
        create_dir_path(folderpath)
        #path = os.path.join(folderpath, filepath)
        with codecs.open(path, mode='w', encoding=encmode) as fp:
            fp.write(unicode(strdata))
    except Exception, e:
        raise Exception(e)
which I am using to write data to a file:
filewrite(folderpath, filename, strdata, 'utf-16')
But when I try to read this file, I get the exception:
Exception: UTF-16 stream does not start with BOM
My file read functions are shown below:
def read_in_chunks(file_object, chunk_size=4096):
    try:
        while True:
            data = file_object.read(chunk_size)
            if not data:
                break
            yield data
    except Exception, ex:
        raise ex

def fileread(folderPath, fileName, encmode):
    try:
        path = os.path.join(folderPath, fileName)
        fileData = ''
        if os.access(path, os.R_OK):
            with codecs.open(path, mode='r', encoding=encmode) as fp:
                for block in read_in_chunks(fp):
                    fileData = fileData + block
            return fileData
        return ''
    except Exception, ex:
        raise ex
Please let me know what I am doing wrong here.
Thanks
There doesn't appear to be anything wrong with your code. Running it on my machine creates the proper BOM at the start of the file automatically.
A BOM is a sequence of bytes at the start of a file that indicates the byte order in which multi-byte encodings such as UTF-16 should be read; you can read about system endianness if you're interested.
If you're running on a Mac or Linux, you should be able to run hd your_utf16file or hexdump your_utf16file to check the raw bytes inside the file. Running your code, I saw the correct bytes 0xff 0xfe at the beginning of mine.
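A quick in-Python check to the same effect (my own illustration): read the first two raw bytes and compare them against the BOM constants the codecs module provides:
import codecs

# Compare the file's first two bytes against the known UTF-16 BOMs.
with open("your_utf16file", "rb") as f:
    head = f.read(2)
if head == codecs.BOM_UTF16_LE:
    print("little-endian BOM: 0xff 0xfe")
elif head == codecs.BOM_UTF16_BE:
    print("big-endian BOM: 0xfe 0xff")
else:
    print("no UTF-16 BOM found")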
Try replacing the reading portion of your fileread function with
with codecs.open(path, mode='r', encoding=encmode) as fp:
    for block in fp:
        print block
to ensure you can still read the file after eliminating external factors (your read_in_chunks function).

How do I download a file from S3 using boto only if the remote file is newer than a local copy?

I'm trying to download a file from S3 using boto, but only if a local copy of the file is older than the remote file.
I'm using the header 'If-Modified-Since' and the code below:
#!/usr/bin/python
import os
import datetime
import boto
from boto.s3.key import Key

bucket_name = 'my-bucket'
conn = boto.connect_s3()
bucket = conn.get_bucket(bucket_name)

def download(bucket, filename):
    key = Key(bucket, filename)
    headers = {}
    if os.path.isfile(filename):
        print "File exists, adding If-Modified-Since header"
        modified_since = os.path.getmtime(filename)
        timestamp = datetime.datetime.utcfromtimestamp(modified_since)
        headers['If-Modified-Since'] = timestamp.strftime("%a, %d %b %Y %H:%M:%S GMT")
    try:
        key.get_contents_to_filename(filename, headers)
    except boto.exception.S3ResponseError as e:
        return 304
    return 200

print download(bucket, 'README')
The problem is that when the local file does not exist, everything works well and the file is downloaded. When I run the script a second time, my function returns 304 as expected, but the previously downloaded file gets deleted.
boto.s3.key.Key.get_contents_to_filename opens the file in wb mode, which truncates it at the beginning of the function (see boto/s3/key.py). On top of that, it removes the file when an exception is raised.
Instead of get_contents_to_filename, you can use get_contents_to_file with a different open mode.
def download(bucket, filename):
    key = Key(bucket, filename)
    headers = {}
    mode = 'wb'
    updating = False
    if os.path.isfile(filename):
        mode = 'r+b'
        updating = True
        print "File exists, adding If-Modified-Since header"
        modified_since = os.path.getmtime(filename)
        timestamp = datetime.datetime.utcfromtimestamp(modified_since)
        headers['If-Modified-Since'] = timestamp.strftime("%a, %d %b %Y %H:%M:%S GMT")
    try:
        with open(filename, mode) as f:
            key.get_contents_to_file(f, headers)
            f.truncate()
    except boto.exception.S3ResponseError as e:
        if not updating:
            # got an error and we are not updating an existing file;
            # delete the file that was created due to mode = 'wb'
            os.remove(filename)
        return e.status
    return 200
NOTE: file.truncate() is used to handle the case where the new file is smaller than the previous one.
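To see why the truncate() matters, here is a standalone illustration independent of boto: mode 'r+b' overwrites in place but never shrinks the file, so a shorter download would otherwise keep stale trailing bytes from the old copy:
# Standalone demo: 'r+b' overwrites in place but does not shrink the file.
with open('demo.bin', 'wb') as f:
    f.write(b'0123456789')   # the old, longer local copy (10 bytes)
with open('demo.bin', 'r+b') as f:
    f.write(b'abcd')         # the new, shorter download (4 bytes)
    f.truncate()             # cut the file at the current position
with open('demo.bin', 'rb') as f:
    print(f.read())          # 'abcd'; without truncate() it would be 'abcd456789'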
