I'm having trouble getting this to work, and I am hoping for any ideas:
My goal: to take a file, read it line by line, substitute any IP address for a specific substitute, and write the changes to the same file.
I KNOW THIS IS NOT CORRECT SYNTAX
Pseudo-Example:
$ cat foo
10.153.193.0/24 via 10.153.213.1
def swap_ip_inline(line):
m = re.search('some-regex', line)
if m:
for each_ip_it_matched:
ip2db(original_ip)
new_line = reconstruct_line_with_new_ip()
line = new_line
return line
for l in foo.readlines():
swap_ip_inline(l)
do some foo to rebuild the file.
I want to take the file 'foo', find each IP in a given line, substitute the ip using the ip2db function, and then output the altered line.
Workflow:
1. Open File
2. Read Lines
3. Swap IP's
4. Save lines (altered/unaltered) into tmp file
5. Overwrite original file with tmp file
*edited to add pseudo-code example
Here you go:
>>> import re
>>> ip_addr_regex = re.compile(r'\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b')
>>> f = open('foo')
>>> for line in f:
... print(line)
...
10.153.193.0/24 via 10.153.213.1
>>> f.seek(0)
>>>
specific_substitute = 'foo'
>>> for line in f:
... re.sub(ip_addr_regex, specific_substitute, line)
...
'foo/24 via foo\n'
This link gave me the breatkthrough I was looking for:
Python - parse IPv4 addresses from string (even when censored)
a simple modification passes initial smoke tests:
def _sub_ip(self, line):
pattern = r"((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)([ (\[]?(\.|dot)[ )\]]?(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3})"
ips = [each[0] for each in re.findall(pattern, line)]
for item in ips:
location = ips.index(item)
ip = re.sub("[ ()\[\]]", "", item)
ip = re.sub("dot", ".", ip)
ips.remove(item)
ips.insert(location, ip)
for ip in ips:
line = line.replace(ip, self._ip2db(ip))
return line
I'm sure I'll clean it up down the road, but it's a great start.
Related
How to parse a text file containing this pattern "KEYWORD: Out:" and dump the result into output file using Python?
input.txt
DEBUG 2020-11:11:17.401 KEYWORD: Out:0xaaaf0000 In:0x80000000.1110ffff.
DEBUG 2020-11:11:17.401 KEYWORD: Out:0xaaaf00cc In:0x80000000.1110ffaa.
output.txt
0xaaaf0000:1110ffff
0x80000000:1110ffaa
You could use a regex:
import re
txt='''\
DEBUG 2020-11:11:17.401 KEYWORD: Out:0xaaaf0000 In:0x80000000.1110ffff.
DEBUG 2020-11:11:17.401 KEYWORD: Out:0xaaaf00cc In:0x80000000.1110ffaa.'''
pat=r'KEYWORD: Out:(0x[a-f0-9]+)[ \t]+In:0x[a-f0-9]+\.([a-f0-9]+)'
>>> '\n'.join([m[0]+':'+m[1] for m in re.findall(pat, txt)])
0xaaaf0000:1110ffff
0xaaaf00cc:1110ffaa
If you want to do this line-by-line from a file:
import re
pat=r'KEYWORD: Out:(0x[a-f0-9]+)[ \t]+In:0x[a-f0-9]+\.([a-f0-9]+)'
with open(ur_file) as f:
for line in f:
m=re.search(pat, line)
if m:
print(m.group(1)+':'+m.group(2))
In [6]: lines
Out[6]:
['DEBUG 2020-11:11:17.401 KEYWORD: Out:0xaaaf0000 In:0x80000000.1110ffff.',
'DEBUG 2020-11:11:17.401 KEYWORD: Out:0xaaaf00cc In:0x80000000.1110ffaa.']
In [7]: [x.split('Out:')[1].split(' ')[0] + ':' + x.split('In:')[1].split('.')[1] for x in lines]
Out[7]: ['0xaaaf0000:1110ffff', '0xaaaf00cc:1110ffaa']
I think the 2nd line for your 'output.txt' might be wrong (or complex if not - you'd need to point this out).
Otherwise, maybe a RegEx like this:
(.*Out:)(0x[0-9a-f]{1,8}) In:0x[0-9a-f]{1,8}\.([0-9a-f]{1,8}).
https://regex101.com/r/lMrR06/2
How to check if a source_file contains email addresses or md5 once you download
data2 = pd.read_csv(source_file, header=None)
tried using regrex and str.contains...but not able to figure out how to proceed
if that is checked then according to that i need to proceed for rest of the script
source_file1:
abd#gmail.com
xyz#gmail.com
source_file2:
d131dd02c5e6vrc4
55ad340609f4fw02
So far, I have tried:
if(data2['email/md5'].str.contains(r'[a-zA-Z0-9._-]+#[a-zA-Z.]+')==1): print "yes"
Try this pattern r'#\w+\.com'.
Ex:
import pandas as pd
df1 = pd.read_csv(filename1, names=['email/md5'])
if df1['email/md5'].str.contains(r'#\w+\.com').all():
print("Email")
else:
print("md5")
If I understand well the question, you have 2 files, and you want to automatically detect which one has email adresses and which one has md5?
import re
import re
with open(source_file1, 'r') as f:
line = f.readline()
while not line:
line = f.readline()
#First line not empty containing a mail address
if re.match('[^#]+#[^#]+\.[^#]+', f.readline()):
mail_source_file = source_file1
md5_source_file = source_file2
else:
md5_source_file = source_file1
mail_source_file = source_file2
mail_dataframe = pd.read_csv(mail_source_file, header=None)
md5_dataframe = pd.read_csv(md5_source_file, header=None)
Does this help?
This might work, considering the hash as a 16 alphanumeric symbols and there are no invalid emails:
with open('file.txt', 'r') as myfile:
getFile = myfile.read()
# Emails
numberOfEmails = len(re.findall(r'#(.*?).com', getFile))
print "%d email(s) found"%(numberOfEmails)
# MD5
hashFormatCnt = 0
splitFile = getFile.split()
for str in splitFile:
if re.match('^[\w-]+$', str):
if len(str) == 16:
hashFormatCnt = hashFormatCnt + 1
print "%d look like hash found"%(hashFormatCnt)
you need to be more specific.
if you want to know how email looks, you can see it here.
if you want to know how md5 looks, it usually is represented as 32 hex digits (0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f).
I'm learning Python and wanted to automate one of my assignments in a cybersecurity class.
I'm trying to figure out how I would look for the contents of a file that are bound by a set of parenthesis. The contents of the (.txt) file look like:
cow.jpg : jphide[v5](asdfl;kj88876)
fish.jpg : jphide[v5](65498ghjk;0-)
snake.jpg : jphide[v5](poi098*/8!##)
test_practice_0707.jpg : jphide[v5](sJ*=tT#&Ve!2)
test_practice_0101.jpg : jphide[v5](nKFdFX+C!:V9)
test_practice_0808.jpg : jphide[v5](!~rFX3FXszx6)
test_practice_0202.jpg : jphide[v5](X&aC$|mg!wC2)
test_practice_0505.jpg : jphide[v5](pe8f%yC$V6Z3)
dog.jpg : negative`
And here is my code so far:
import sys, os, subprocess, glob, shutil
# Finding the .jpg files that will be copied.
sourcepath = os.getcwd() + '\\imgs\\'
destpath = 'stegdetect'
rawjpg = glob.glob(sourcepath + '*.jpg')
# Copying the said .jpg files into the destpath variable
for filename in rawjpg:
shutil.copy(filename, destpath)
# Asks user for what password file they want to use.
passwords = raw_input("Enter your password file with the .txt extension:")
shutil.copy(passwords, 'stegdetect')
# Navigating to stegdetect. Feel like this could be abstracted.
os.chdir('stegdetect')
# Preparing the arguments then using subprocess to run
args = "stegbreak.exe -r rules.ini -f " + passwords + " -t p *.jpg"
# Uses open to open the output file, and then write the results to the file.
with open('cracks.txt', 'w') as f: # opens cracks.txt and prepares to w
subprocess.call(args, stdout=f)
# Processing whats in the new file.
f = open('cracks.txt')
If it should just be bound by ( and ) you can use the following regex, which ensures starting ( and closing ) and you can have numbers and characters between them. You can add any other symbol also that you want to include.
[\(][a-z A-Z 0-9]*[\)]
[\(] - starts the bracket
[a-z A-Z 0-9]* - all text inside bracket
[\)] - closes the bracket
So for input sdfsdfdsf(sdfdsfsdf)sdfsdfsdf , the output will be (sdfdsfsdf)
Test this regex here: https://regex101.com/
I'm learning Python
If you are learning you should consider alternative implementations, not only regexps.
TO iterate line by line of a text file you just open the file and for over the file handle:
with open('file.txt') as f:
for line in f:
do_something(line)
Each line is a string with the line contents, including the end-of-line char '/n'. To find the start index of a specific substring in a string you can use find:
>>> A = "hello (world)"
>>> A.find('(')
6
>>> A.find(')')
12
To get a substring from the string you can use the slice notation in the form:
>>> A[6:12]
'(world'
You should use regular expressions which are implemented in the Python re module
a simple regex like \(.*\) could match your "parenthesis string"
but it would be better with a group \((.*)\) which allows to get only the content in the parenthesis.
import re
test_string = """cow.jpg : jphide[v5](asdfl;kj88876)
fish.jpg : jphide[v5](65498ghjk;0-)
snake.jpg : jphide[v5](poi098*/8!##)
test_practice_0707.jpg : jphide[v5](sJ*=tT#&Ve!2)
test_practice_0101.jpg : jphide[v5](nKFdFX+C!:V9)
test_practice_0808.jpg : jphide[v5](!~rFX3FXszx6)
test_practice_0202.jpg : jphide[v5](X&aC$|mg!wC2)
test_practice_0505.jpg : jphide[v5](pe8f%yC$V6Z3)
dog.jpg : negative`"""
REGEX = re.compile(r'\((.*)\)', re.MULTILINE)
print(REGEX.findall(test_string))
# ['asdfl;kj88876', '65498ghjk;0-', 'poi098*/8!##', 'sJ*=tT#&Ve!2', 'nKFdFX+C!:V9' , '!~rFX3FXszx6', 'X&aC$|mg!wC2', 'pe8f%yC$V6Z3']
In order to make sure I start and stop reading a text file exactly where I want to, I am providing 'start1'<->'end1', 'start2'<->'end2' as tags in between the text file and providing that to my python script. In my script I read it as:
start_end = ['start1','end1']
line_num = []
with open(file_path) as fp1:
for num, line in enumerate(fp1, 1):
for i in start_end:
if i in line:
line_num.append(num)
fp1.close()
print '\nLine number: ', line_num
fp2 = open(file_path)
for k, line2 in enumerate(fp2):
for x in range(line_num[0], line_num[1] - 1):
if k == x:
header.append(line2)
fp2.close()
This works well until I reach start10 <-> end10 and further. Eg. it checks if I have "start2" in the line and also reads the text that has "start21" and similarly for end tag as well. so providing "start1, end1" as input also reads "start10, end10". If I replace the line:
if i in line:
with
if i == line:
it throws an error.
How can I make sure that the script reads the line that contains ONLY "start1" and not "start10"?
import re
prog = re.compile('start1$')
if prog.match(line):
print line
That should return None if there is no match and return a regex match object if the line matches the compiled regex. The '$' at the end of the regex says that's the end of the line, so 'start1' works but 'start10' doesn't.
or another way..
def test(line):
import re
prog = re.compile('start1$')
return prog.match(line) != None
> test('start1')
True
> test('start10')
False
Since your markers are always at the end of the line, change:
start_end = ['start1','end1']
to:
start_end = ['start1\n','end1\n']
You probably want to look into regular expressions. The Python re library has some good regex tools. It would let you define a string to compare your line to and it has the ability to check for start and end of lines.
If you can control the input file, consider adding an underscore (or any non-number character) to the end of each tag.
'start1_'<->'end1_'
'start10_'<->'end10_'
The regular expression solution presented in other answers is more elegant, but requires using regular expressions.
You can do this with find():
for num, line in enumerate(fp1, 1):
for i in start_end:
if i in line:
# make sure the next char isn't '0'
if line[line.find(i)+len(i)] != '0':
line_num.append(num)
I am trying to open a text file. Parse the text file for specific regex patterns then when if I find that pattern I write the regex returned pattern to another text file.
Specifically a list of IP Addresses which I want to parse specific ones out of.
So the file may have
10.10.10.10
9.9.9.9
5.5.5.5
6.10.10.10
And say I want just the IPs that end in 10 (the regex I think I am good with) My example looks for the 10.180.42, o4 41.XX IP hosts. But I will adjust as needed.
I've tried several method and fail miserably at them all. It's days like this I know why I just never mastered any language. But I'm committed to Python so here goes.
import re
textfile = open("SymantecServers.txt", 'r')
matches = re.findall('^10.180\.4[3,1].\d\d',str(textfile))
print(matches)
This gives me empty backets. I had to encase the textfile in the str function or it just puked. I don't know if this is right.
This just failed all over the place no matter how I fine tuned it.
f = open("SymantecServers.txt","r")
o = open("JustIP.txt",'w', newline="\r\n")
for line in f:
pattern = re.compile("^10.180\.4[3,1].\d\d")
print(pattern)
#o.write(pattern)
#o.close()
f.close()
I did get one working but it just returned the entire line (including netmask and other test like hostname which are all on the same line in the text file. I just want IP)
Any help on how to read a text file and if it has a pattern of IP grab the full IP and write that into another text file so I end up with a text file with a list of just the IPs I want. I am 3 hours into it and behind on work so going to do the first file by hand...
I am just at a loss what I am missing. Sorry for being a newbie
here is it working:
>>> s = """10.10.10.10
... 9.9.9.9
... 5.5.5.5
... 10.180.43.99
... 6.10.10.10"""
>>> re.findall(r'10\.180\.4[31]\.\d\d', s)
['10.180.43.99']
you do not really need to add line boundaries, as you're matching a very specific IP address, if your file does not have weird things like '123.23.234.10.180.43.99.21354' that you don't want to match, it should be ok!
your syntax of [3,1] is matching either 3, 1 or , and you don't want to match against a comma ;-)
about your function:
r = re.compile(r'10\.180\.4[31]\.\d\d')
with open("SymantecServers.txt","r") as f:
with open("JustIP.txt",'w', newline="\r\n") as o:
for line in f:
matches = r.findall(line)
for match in matches:
o.write(match)
though if I were you, I'd extract IPs using:
r = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
with open("SymantecServers.txt","r") as f:
with open("JustIP.txt",'w', newline="\r\n") as o:
for line in f:
matches = r.findall(line)
for match in matches:
a, b, c, d = match.split('.')
if int(a) < 255 and int(b) < 255 and int(c) in (43, 41) and int(d) < 100:
o.write(match)
or another way to do it:
r = re.compile(r'(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})')
with open("SymantecServers.txt","r") as f:
with open("JustIP.txt",'w', newline="\r\n") as o:
for line in f:
m = r.match(line)
if m:
a, b, c, d = m.groups()
if int(a) < 255 and int(b) < 255 and int(c) in (43, 41) and int(d) < 100:
o.write(match)
which uses the regex to split the IP address into groups.
What you're missing is that you're doing a re.compile() which creates a Regular Expression object in Python. You're forgetting to match.
You could try:
# This isn't the best way to match IP's, but if it fits for your use-case keep it for now.
pattern = re.compile("^10.180\.4[13].\d\d")
f = open("SymantecServers.txt",'r')
o = open("JustIP.txt",'w')
for line in f:
m = pattern.match(line)
if m is not None:
print "Match: %s" %(m.group(0))
o.write(m.group(0) + "\n")
f.close()
o.close()
Which is compiling the Python object, attempting to match the line against the compiled object, and then printing out that current match. I can avoid having to split my matches, but I have to pay attention to matching groups - therefore group(0)
You can also look at re.search() which you can do, but if you're running search enough times with the same regular expression, it becomes more worthwhile to use compile.
Also note that I moved the f.close() to the outside of the for loop.