Related
I am trying to store three different variables(which are results from a for loop) as one line in a file. My entire code just in case you wondering what I am trying to do :
from Bio.PDB import *
from Bio import SeqIO
from glob import glob
parser = PDBParser(PERMISSIVE=True)
pdb_files = glob('/pdb_RF0001/*')
for fileName in pdb_files:
structure_id = fileName.rsplit('/', 1)[1][:-4]
structure = parser.get_structure(structure_id, fileName)
model = structure[0]
for residue1 in structure.get_residues():
for residue2 in structure.get_residues():
if residue1 != residue2:
try:
distance = residue1['P'] - residue2['P']
except KeyError:
continue
f = open('%s.txt' % fileName, 'w')
line = str(residue1)+','+str(residue2)+','+str(distance)
f.write(line)
f.close()
break
Sample code for check :
f = open('%s.txt' % fileName, 'wb')
line = int(residue1)+','+int(residue)+','+float(distance)
f.write(line)
f.close()
How to store the three different variables from the line variable as one line in an output file?
Use f-string
line = f"{residue1}, {residue}, {distance}"
int(residue) is an integer, and float(distance) is a real number (specifically, a floating-point number, hence the "float"). Thus, in this line, you are trying to add numbers to strings:
line = int(residue1)+','+int(residue)+','+float(distance)
However, Python disallows this. What you probably want to do is convert residue1, residue, and distance from (what I assume are) numbers to strings, like this:
line = str(residue1)+','+str(residue)+','+str(distance)
str.format() is one of the string formatting methods in Python
"{}, {}, {}".format("residue1", "residue", "distance")
I'm attempting to anonymize a file so that all the content except certain keywords are replaced with gibberish, but the format is kept the same (including punctuation, length of string and capitalization). For example:
I am testing this, check it out! This is a keyword: long
Wow, another line.
should turn in to:
T ad ehistmg ptrs, erovj qo giw! Tgds ar o qpyeogf: long
Yeg, rmbjthe yadn.
I am attempting to do this in python, but i'm having no luck in finding a solution. I have tried replacing via tokenization and writing to another file, but without much success.
Initially let's disregard the fact that we have to preserve some keywords. We will fix that later.
The easiest way to perform this kind of 1-to-1 mapping is to use the method str.translate. The string module also contains constants that contain all ASCII lowercase and uppercase characters, and random.shuffle can be used to obtain a random permutation.
import string
import random
random_caps = list(string.ascii_uppercase)
random_lows = list(string.ascii_lowercase)
random.shuffle(random_caps)
random.shuffle(random_lows)
all_random_chars = ''.join(random_lows + random_caps)
translation_table = str.maketrans(string.ascii_letters, all_random_chars)
with open('the-file-i-want.txt', 'r') as f:
contents = f.read()
translated_contents = contents.translate(translation_table)
with open('the-file-i-want.txt', 'w') as f:
f.write(translated_contents)
In python 2 the str.maketrans is a function in the string module instead of a static method of str.
The translation_table is a mapping from characters to characters, so it will map every single ASCII character to an other one. The translate method simply applies this table to each character in the string.
Important note: the above method is actually reversible, because each letter its mapped to a unique other letter. This means that using a simple analysis over the frequency of the symbols it's possible to reverse it.
If you want to make this harder or impossible, you could re-create the translation_table for every line:
import string
import random
random_caps = list(string.ascii_uppercase)
random_lows = list(string.ascii_lowercase)
with open('the-file-i-want.txt', 'r') as f:
translated_lines = []
for line in f:
random.shuffle(random_lows)
random.shuffle(random_caps)
all_random_chars = ''.join(random_lows + random_caps)
translation_table = str.maketrans(string.ascii_letters, all_random_chars)
translated_lines.append(line.translate(translation_table))
with open('the-file-i-want.txt', 'w') as f:
f.writelines(translated_lines)
Also note that you could translate and save the file line by line:
with open('the-file-i-want.txt', 'r') as f, open('output.txt', 'w') as o:
for line in f:
random.shuffle(random_lows)
random.shuffle(random_caps)
all_random_chars = ''.join(random_lows + random_caps)
translation_table = str.maketrans(string.ascii_letters, all_random_chars)
o.write(line.translate(translation_table))
Which means you can translate huge files with this code, as far as the lines themselves are not insanely long.
The code above messing all characters, without taking into account such keywords.
The simplest way to handle the requirement is to simply check for each line whether one of keywords occur and "reinsert" it there:
import re
import string
import random
random_caps = list(string.ascii_uppercase)
random_lows = list(string.ascii_lowercase)
keywords = ['long'] # add all the possible keywords in this list
keyword_regex = re.compile('|'.join(re.escape(word) for word in keywords))
with open('the-file-i-want.txt', 'r') as f, open('output.txt', 'w') as o:
for line in f:
random.shuffle(random_lows)
random.shuffle(random_caps)
all_random_chars = ''.join(random_lows + random_caps)
translation_table = str.maketrans(string.ascii_letters, all_random_chars)
matches = keyword_regex.finditer(line)
translated_line = list(line.translate(translation_table))
for match in matches:
translated_line[match.start():match.end()] = match.group()
o.write(''.join(translated_line))
Sample usage (using the version that prevserves keywords):
$ echo 'I am testing this, check it out! This is a keyword: long
Wow, another line.' > the-file-i-want.txt
$ python3 trans.py
$ cat output.txt
M vy hoahitc hfia, ufoum ih pzh! Hfia ia v modjpel: long
Ltj, fstkwzb hdsz.
Note how long is preserved.
I am still learner in python. I was not able to find a specific string and insert multiple strings after that string in python. I want to search the line in the file and insert the content of write function
I have tried the following which is inserting at the end of the file.
line = '<abc hij kdkd>'
dataFile = open('C:\\Users\\Malik\\Desktop\\release_0.5\\release_0.5\\5075442.xml', 'a')
dataFile.write('<!--Delivery Date: 02/15/2013-->\n<!--XML Script: 1.0.0.1-->\n')
dataFile.close()
You can use fileinput to modify the same file inplace and re to search for particular pattern
import fileinput,re
def modify_file(file_name,pattern,value=""):
fh=fileinput.input(file_name,inplace=True)
for line in fh:
replacement=value + line
line=re.sub(pattern,replacement,line)
sys.stdout.write(line)
fh.close()
You can call this function something like this:
modify_file("C:\\Users\\Malik\\Desktop\\release_0.5\\release_0.5\\5075442.xml",
"abc..",
"!--Delivery Date:")
Python strings are immutable, which means that you wouldn't actually modify the input string -you would create a new one which has the first part of the input string, then the text you want to insert, then the rest of the input string.
You can use the find method on Python strings to locate the text you're looking for:
def insertAfter(haystack, needle, newText):
""" Inserts 'newText' into 'haystack' right after 'needle'. """
i = haystack.find(needle)
return haystack[:i + len(needle)] + newText + haystack[i + len(needle):]
You could use it like
print insertAfter("Hello World", "lo", " beautiful") # prints 'Hello beautiful world'
Here is a suggestion to deal with files, I suppose the pattern you search is a whole line (there is nothing more on the line than the pattern and the pattern fits on one line).
line = ... # What to match
input_filepath = ... # input full path
output_filepath = ... # output full path (must be different than input)
with open(input_filepath, "r", encoding=encoding) as fin \
open(output_filepath, "w", encoding=encoding) as fout:
pattern_found = False
for theline in fin:
# Write input to output unmodified
fout.write(theline)
# if you want to get rid of spaces
theline = theline.strip()
# Find the matching pattern
if pattern_found is False and theline == line:
# Insert extra data in output file
fout.write(all_data_to_insert)
pattern_found = True
# Final check
if pattern_found is False:
raise RuntimeError("No data was inserted because line was not found")
This code is for Python 3, some modifications may be needed for Python 2, especially the with statement (see contextlib.nested. If your pattern fits in one line but is not the entire line, you may use "theline in line" instead of "theline == line". If your pattern can spread on more than one line, you need a stronger algorithm. :)
To write to the same file, you can write to another file and then move the output file over the input file. I didn't plan to release this code, but I was in the same situation some days ago. So here is a class that insert content in a file between two tags and support writing on the input file: https://gist.github.com/Cilyan/8053594
Frerich Raabe...it worked perfectly for me...good one...thanks!!!
def insertAfter(haystack, needle, newText):
#""" Inserts 'newText' into 'haystack' right after 'needle'. """
i = haystack.find(needle)
return haystack[:i + len(needle)] + newText + haystack[i + len(needle):]
with open(sddraft) as f1:
tf = open("<path to your file>", 'a+')
# Read Lines in the file and replace the required content
for line in f1.readlines():
build = insertAfter(line, "<string to find in your file>", "<new value to be inserted after the string is found in your file>") # inserts value
tf.write(build)
tf.close()
f1.close()
shutil.copy("<path to the source file --> tf>", "<path to the destination where tf needs to be copied with the file name>")
Hope this helps someone:)
I want to search a CSV file and print either True or False, depending on whether or not I found the string. However, I'm running into the problem whereby it will return a false positive if it finds the string embedded in a larger string of text. E.g.: It will return True if string is foo and the term foobar is in the CSV file. I need to be able to return exact matches.
username = input()
if username in open('Users.csv').read():
print("True")
else:
print("False")
I've looked at using mmap, re and csv module functions, but I haven't got anywhere with them.
EDIT: Here is an alternative method:
import re
import csv
username = input()
with open('Users.csv', 'rt') as f:
reader = csv.reader(f)
for row in reader:
re.search(r'\bNOTSUREHERE\b', username)
when you look inside a csv file using the csv module, it will return each row as a list of columns. So if you want to lookup your string, you should modify your code as such:
import csv
username = input()
with open('Users.csv', 'rt') as f:
reader = csv.reader(f, delimiter=',') # good point by #paco
for row in reader:
for field in row:
if field == username:
print "is in file"
but as it is a csv file, you might expect the username to be at a given column:
with open('Users.csv', 'rt') as f:
reader = csv.reader(f, delimiter=',')
for row in reader:
if username == row[2]: # if the username shall be on column 3 (-> index 2)
print "is in file"
You should have a look at the csv module in python.
is_in_file = False
with open('my_file.csv', 'rb') as csvfile:
my_content = csv.reader(csvfile, delimiter=',')
for row in my_content:
if username in row:
is_in_file = True
print is_in_file
It assumes that your delimiter is a comma (replace with the your delimiter. Note that username must be defined previously. Also change the name of the file.
The code loops through all the lines in the CSV file. row a list of string containing each element of your row. For example, if you have this in your CSV file: Joe,Peter,Michel the row will be ['Joe', 'Peter', 'Michel']. Then you can check if your username is in that list.
I have used the top comment, it works and looks OK, but it was too slow for me.
I had an array of many strings that I wanted to check if they were in a large csv-file. No other requirements.
For this purpose I used (simplified, I iterated through a string of arrays and did other work than print):
with open('my_csv.csv', 'rt') as c:
str_arr_csv = c.readlines()
Together with:
if str(my_str) in str(str_arr_csv):
print("True")
The reduction in time was about ~90% for me. Code locks ugly but I'm all about speed. Sometimes.
import csv
scoresList=[]
with open ("playerScores_v2.txt") as csvfile:
scores=csv.reader(csvfile, delimiter= ",")
for row in scores:
scoresList.append(row)
playername=input("Enter the player name you would like the score for:")
print("{0:40} {1:10} {2:10}".format("Name","Level","Score"))
for i in range(0,len(scoresList)):
print("{0:40} {1:10} {2:10}".format(scoresList[i] [0],scoresList[i] [1], scoresList[i] [2]))
EXTENDED ALGO:
As i can have in my csv some values with space:
", atleft,atright , both " ,
I patch the code of zmo as follow
if field.strip() == username:
and it's ok, thanks.
OLD FASHION ALGO
i had previously coded an 'old fashion' algorithm that takes care of any allowed separators ( here comma, space and newline),so i was curious to compare performances.
With 10000 rounds on a very simple csv file, i got:
------------------ algo 1 old fashion ---------------
done in 1.931804895401001 s.
------------------ algo 2 with csv ---------------
done in 1.926626205444336 s.
As this is not too bad, 0.25% longer, i think that this good old hand made algo can help somebody (and will be useful if more parasitic chars as strip is only for spaces)
This algo uses bytes and can be used for anything else than strings.
It search for a name not embedded in another by checking left and right bytes that must be in the allowed separators.
It mainly uses loops with ejection asap through break or continue.
def separatorsNok(x):
return (x!=44) and (x!=32) and (x!=10) and (x!=13) #comma space lf cr
# set as a function to be able to run several chained tests
def searchUserName(userName, fileName):
# read file as binary (supposed to be utf-8 as userName)
f = open(fileName, 'rb')
contents = f.read()
lenOfFile = len(contents)
# set username in bytes
userBytes = bytearray(userName.encode('utf-8'))
lenOfUser = len(userBytes)
posInFile = 0
posInUser = 0
while posInFile < lenOfFile:
found = False
posInUser = 0
# search full name
while posInFile < lenOfFile:
if (contents[posInFile] == userBytes[posInUser]):
posInUser += 1
if (posInUser == lenOfUser):
found = True
break
posInFile += 1
if not found:
continue
# found a fulll name, check if isolated on left and on right
# left ok at very beginning or space or comma or new line
if (posInFile > lenOfUser):
if separatorsNok(contents[posInFile-lenOfUser]): #previousLeft
continue
# right ok at very end or space or comma or new line
if (posInFile < lenOfFile-1):
if separatorsNok(contents[posInFile+1]): # nextRight
continue
# found and bordered
break
# main while
if found:
print(userName, "is in file") # at posInFile-lenOfUser+1)
else:
pass
to check: searchUserName('pirla','test.csv')
As other answers, code exit at first match but can be easily extended to find all.
HTH
#!/usr/bin/python
import csv
with open('my.csv', 'r') as f:
lines = f.readlines()
cnt = 0
for entry in lines:
if 'foo' in entry:
cnt += 1
print"No of foo entry Count :".ljust(20, '.'), cnt
I have this error:
Traceback (most recent call last):
File "python_md5_cracker.py", line 27, in <module>
m.update(line)
TypeError: Unicode-objects must be encoded before hashing
when I try to execute this code in Python 3.2.2:
import hashlib, sys
m = hashlib.md5()
hash = ""
hash_file = input("What is the file name in which the hash resides? ")
wordlist = input("What is your wordlist? (Enter the file name) ")
try:
hashdocument = open(hash_file, "r")
except IOError:
print("Invalid file.")
raw_input()
sys.exit()
else:
hash = hashdocument.readline()
hash = hash.replace("\n", "")
try:
wordlistfile = open(wordlist, "r")
except IOError:
print("Invalid file.")
raw_input()
sys.exit()
else:
pass
for line in wordlistfile:
# Flush the buffer (this caused a massive problem when placed
# at the beginning of the script, because the buffer kept getting
# overwritten, thus comparing incorrect hashes)
m = hashlib.md5()
line = line.replace("\n", "")
m.update(line)
word_hash = m.hexdigest()
if word_hash == hash:
print("Collision! The word corresponding to the given hash is", line)
input()
sys.exit()
print("The hash given does not correspond to any supplied word in the wordlist.")
input()
sys.exit()
It is probably looking for a character encoding from wordlistfile.
wordlistfile = open(wordlist,"r",encoding='utf-8')
Or, if you're working on a line-by-line basis:
line.encode('utf-8')
EDIT
Per the comment below and this answer.
My answer above assumes that the desired output is a str from the wordlist file. If you are comfortable in working in bytes, then you're better off using open(wordlist, "rb"). But it is important to remember that your hashfile should NOT use rb if you are comparing it to the output of hexdigest. hashlib.md5(value).hashdigest() outputs a str and that cannot be directly compared with a bytes object: 'abc' != b'abc'. (There's a lot more to this topic, but I don't have the time ATM).
It should also be noted that this line:
line.replace("\n", "")
Should probably be
line.strip()
That will work for both bytes and str's. But if you decide to simply convert to bytes, then you can change the line to:
line.replace(b"\n", b"")
You must have to define encoding format like utf-8,
Try this easy way,
This example generates a random number using the SHA256 algorithm:
>>> import hashlib
>>> hashlib.sha256(str(random.getrandbits(256)).encode('utf-8')).hexdigest()
'cd183a211ed2434eac4f31b317c573c50e6c24e3a28b82ddcb0bf8bedf387a9f'
import hashlib
string_to_hash = '123'
hash_object = hashlib.sha256(str(string_to_hash).encode('utf-8'))
print('Hash', hash_object.hexdigest())
The error already says what you have to do. MD5 operates on bytes, so you have to encode Unicode string into bytes, e.g. with line.encode('utf-8').
To store the password (PY3):
import hashlib, os
password_salt = os.urandom(32).hex()
password = '12345'
hash = hashlib.sha512()
hash.update(('%s%s' % (password_salt, password)).encode('utf-8'))
password_hash = hash.hexdigest()
encoding this line fixed it for me.
m.update(line.encode('utf-8'))
Please take a look first at that answer.
Now, the error message is clear: you can only use bytes, not Python strings (what used to be unicode in Python < 3), so you have to encode the strings with your preferred encoding: utf-32, utf-16, utf-8 or even one of the restricted 8-bit encodings (what some might call codepages).
The bytes in your wordlist file are being automatically decoded to Unicode by Python 3 as you read from the file. I suggest you do:
m.update(line.encode(wordlistfile.encoding))
so that the encoded data pushed to the md5 algorithm are encoded exactly like the underlying file.
You could open the file in binary mode:
import hashlib
with open(hash_file) as file:
control_hash = file.readline().rstrip("\n")
wordlistfile = open(wordlist, "rb")
# ...
for line in wordlistfile:
if hashlib.md5(line.rstrip(b'\n\r')).hexdigest() == control_hash:
# collision
If it's a single line string. wrapt it with b or B. e.g:
variable = b"This is a variable"
or
variable2 = B"This is also a variable"
This program is the bug free and enhanced version of the above MD5 cracker that reads the file containing list of hashed passwords and checks it against hashed word from the English dictionary word list. Hope it is helpful.
I downloaded the English dictionary from the following link
https://github.com/dwyl/english-words
# md5cracker.py
# English Dictionary https://github.com/dwyl/english-words
import hashlib, sys
hash_file = 'exercise\hashed.txt'
wordlist = 'data_sets\english_dictionary\words.txt'
try:
hashdocument = open(hash_file,'r')
except IOError:
print('Invalid file.')
sys.exit()
else:
count = 0
for hash in hashdocument:
hash = hash.rstrip('\n')
print(hash)
i = 0
with open(wordlist,'r') as wordlistfile:
for word in wordlistfile:
m = hashlib.md5()
word = word.rstrip('\n')
m.update(word.encode('utf-8'))
word_hash = m.hexdigest()
if word_hash==hash:
print('The word, hash combination is ' + word + ',' + hash)
count += 1
break
i += 1
print('Itiration is ' + str(i))
if count == 0:
print('The hash given does not correspond to any supplied word in the wordlist.')
else:
print('Total passwords identified is: ' + str(count))
sys.exit()