I have a .txt file that contains a list of IP addresses:
111.67.74.234:8080
111.67.75.89:8080
12.155.183.18:3128
128.208.04.198:2124
142.169.1.233:80
There's a lot more than that though :)
Anyway, I imported this into a list using Python and I'm trying to get it to sort them, but I'm having trouble. Does anybody have any ideas?
EDIT:
OK, since that was vague, this is what I had so far:
f = open("/Users/jch5324/Python/Proxy/resources/data/list-proxy.txt", 'r+')
lines = [x.split() for x in f]
new_file = (sorted(lines, key=lambda x:x[:18]))
You're probably sorting them by ASCII string comparison ('.' < '5', etc.), when you'd rather have them sort numerically. Try converting them to tuples of ints, then sorting:
def ipPortToTuple(string):
    """
    '12.34.5.678:910' -> (12,34,5,678,910)
    """
    ip, port = string.strip().split(':')
    return tuple(int(i) for i in ip.split('.')) + (int(port),)
with open('myfile.txt') as f:
    nonemptyLines = (line for line in f if line.strip() != '')
    sortedLines = sorted(nonemptyLines, key=ipPortToTuple)
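To see concretely why the plain string sort misorders these, here is a small check using three of the sample lines and a key equivalent to ipPortToTuple:

```python
lines = ['111.67.74.234:8080', '12.155.183.18:3128', '142.169.1.233:80']

# Plain string sort: '111...' lands before '12...' because comparison is
# character by character and '1' < '2' at the third position.
print(sorted(lines))
# ['111.67.74.234:8080', '12.155.183.18:3128', '142.169.1.233:80']

# Numeric sort: convert each octet (and the port) to int before comparing.
def ip_key(line):
    ip, port = line.strip().split(':')
    return tuple(int(n) for n in ip.split('.')) + (int(port),)

print(sorted(lines, key=ip_key))
# ['12.155.183.18:3128', '111.67.74.234:8080', '142.169.1.233:80']
```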
edit: The ValueError you are getting is because your text file is not entirely in the #.#.#.#:# format you imply. (There may be comments or blank lines, though in this case the error hints that there is a line with more than one ':'.) You can home in on the issue by catching the exception and emitting useful debugging data:
def tryParseLines(lines):
    for line in lines:
        try:
            yield ipPortToTuple(line.strip())
        except Exception:
            if __debug__:
                print('line {} did not match #.#.#.#:# format'.format(repr(line)))

with open('myfile.txt') as f:
    sortedLines = sorted(tryParseLines(f))
I was a bit sloppy in the above, in that it still lets some invalid IP addresses through (e.g. #.#.#.#.#, or 257.-1.#.#). Below is a more thorough solution, which lets you do things like compare IP addresses with the < operator, also making sorting work naturally:
#!/usr/bin/python3
import functools
import re

@functools.total_ordering
class Ipv4Port(object):
    regex = re.compile(r'(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}):(\d{1,5})')

    def __init__(self, ipv4:(int,int,int,int), port:int):
        try:
            assert type(ipv4)==tuple and len(ipv4)==4, 'ipv4 not 4-length tuple'
            assert all(0<=x<256 for x in ipv4), 'ipv4 numbers not in valid range (0<=n<256)'
            assert type(port)==int, 'port must be integer'
        except AssertionError as ex:
            print('Invalid IPv4 input: ipv4={}, port={}'.format(repr(ipv4), repr(port)))
            raise ex
        self.ipv4 = ipv4
        self.port = port
        self._tuple = ipv4 + (port,)

    @classmethod
    def fromString(cls, string:'12.34.5.678:910'):
        try:
            a,b,c,d,port = cls.regex.match(string.strip()).groups()
            ip = tuple(int(x) for x in (a,b,c,d))
            return cls(ip, int(port))
        except Exception as ex:
            args = list(ex.args) if ex.args else ['']
            args[0] += "\n...indicating ipv4 string {} doesn't match #.#.#.#:# format\n\n".format(repr(string))
            ex.args = tuple(args)
            raise ex

    def __lt__(self, other):
        return self._tuple < other._tuple

    def __eq__(self, other):
        return self._tuple == other._tuple

    def __repr__(self):
        #return 'Ipv4Port(ipv4={ipv4}, port={port})'.format(**self.__dict__)
        return "Ipv4Port.fromString('{}.{}.{}.{}:{}')".format(*self._tuple)
and then:
def tryParseLines(lines):
    for line in lines:
        line = line.strip()
        if line != '':
            try:
                yield Ipv4Port.fromString(line)
            except AssertionError as ex:
                raise ex
            except Exception as ex:
                if __debug__:
                    print(ex)
                raise ex
Demo:
>>> lines = '222.111.22.44:214 \n222.1.1.1:234\n 23.1.35.6:199'.splitlines()
>>> sorted(tryParseLines(lines))
[Ipv4Port.fromString('23.1.35.6:199'), Ipv4Port.fromString('222.1.1.1:234'), Ipv4Port.fromString('222.111.22.44:214')]
Changing the values to be for example 264... or ...-35... will result in the appropriate errors.
@Ninjagecko's solution is the best, but here is another way of doing it using re:
>>> import re
>>> with open('ips.txt') as f:
...     print sorted(f, key=lambda line: map(int, re.split(r'\.|:', line.strip())))
['12.155.183.18:3128\n', '111.67.74.234:8080\n', '111.67.75.89:8080\n',
'128.208.04.198:2124\n', '142.169.1.233:80 \n']
You can pre-process the list so that it can be sorted with the built-in comparison, and then process it back to a more normal format.
After padding, the strings are all the same length and sort correctly; afterwards, we simply remove all the spaces.
You can google around and find other examples of this.
for i in range(len(addresses)):
    addresses[i] = "%3s.%3s.%3s.%3s" % tuple(addresses[i].split("."))
addresses.sort()
for i in range(len(addresses)):
    addresses[i] = addresses[i].replace(" ", "")
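A self-contained sketch of this pad-sort-strip round trip (assuming a list named addresses of bare IPs; ports, which the question's data includes, are ignored here):

```python
addresses = ['111.67.74.234', '12.155.183.18', '142.169.1.233']

# Right-justify every octet to width 3 so string order equals numeric order.
padded = ["%3s.%3s.%3s.%3s" % tuple(ip.split(".")) for ip in addresses]
padded.sort()

# Remove the padding spaces again.
addresses = [ip.replace(" ", "") for ip in padded]
print(addresses)
# ['12.155.183.18', '111.67.74.234', '142.169.1.233']
```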
If you have a huge number of IP addresses, you will get better processing times in C++. It is more work up front, but the runtime will be shorter.
Related
I'm trying to find regex matches for each entry of a text file in order to structure the data better.
It keeps returning "No match", but if I call the function manually on an entry, it works.
import re

# The patterns
r1 = re.compile('.*full.*time.*', flags=re.IGNORECASE)
r2 = re.compile('.*contingent.*', flags=re.IGNORECASE)
r3 = re.compile('.*intern', flags=re.IGNORECASE)

def doSomething1():
    print("Full Time")

def doSomething2():
    print("Contract")

def doSomething3():
    print("Internship")

def default():
    print("No match")

def match(r, s):
    mo = re.match(r, s)
    try:
        return mo.group()
    except AttributeError:
        return None

def delegate(s):
    try:
        action = {
            match(r1, s): doSomething1,
            match(r2, s): doSomething2,
            match(r3, s): doSomething3
        }[s]()
        return action
    except KeyError:
        return default()

with open('data.txt', 'r') as data:
    for job in data:
        delegate(job)
This is the data.txt:
Full Time Remote
Contingent
Intern
If you set the flags as flags = re.IGNORECASE | re.DOTALL, then all three lines will match.
According to the docs, if the DOTALL flag has been specified, . matches any character, including a newline.
But your design of delegate is a little fragile; you'd better tell us what you really/finally want.
You want the dictionary key to be Full Time Remote, but the line read from the file is Full Time Remote\n: each line keeps its trailing newline, and since . does not match a newline by default, the matched group never includes the \n and so never equals the lookup key s. You can handle it with the re.DOTALL flag, as Lei Yang said, or by calling s.strip() to remove the newline.
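A minimal demonstration of that newline effect, using just the first pattern from the question:

```python
import re

r1 = re.compile('.*full.*time.*', flags=re.IGNORECASE)

s = 'Full Time Remote\n'  # a line read from a file keeps its newline

mo = r1.match(s)
print(repr(mo.group()))   # 'Full Time Remote' -- the '\n' is not consumed
print(mo.group() == s)    # False, so a dict lookup keyed on s misses

# Fix 1: strip the line before matching.
print(r1.match(s.strip()).group() == s.strip())  # True

# Fix 2: let '.' consume the newline with re.DOTALL.
r1_dotall = re.compile('.*full.*time.*', flags=re.IGNORECASE | re.DOTALL)
print(r1_dotall.match(s).group() == s)           # True
```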
However, I think string methods are enough for this task.
def delegate(s):
    # ignore case by lower() and remove newline by strip()
    s = s.lower().strip("\n")
    # check index by find() to ensure "full" comes before "time"
    if "full" in s and s.find("full") < s.find("time"):
        print("Full Time")
    elif "contingent" in s:
        print("Contract")
    # last 6 characters are "intern"
    elif s[-6:] == "intern":
        print("Intern")
    else:
        print("No match")

with open('data.txt', 'r') as data:
    for job in data:
        delegate(job)
With data.txt as:
Full Time Remote
Time Remote Full
Contingent
Intern
Intern 123
Result:
Full Time
No match
Contract
Intern
No match
Hello, I have a file like this:
WORKERS = yovel:10.0.0.6,james:10.0.0.7
BLACKLIST = 92.122.197.45:ynet,95.1.2.2:twitter
I'm trying to write a function in Python that takes a worker IP and returns the worker name, like this:
workername = getName(ip)
The only method I could think of uses splits (.split(":"), .split(",") etc.), but that would be very long and not very smart code.
Is there a shorter way to do it?
You can use re:
import re

def getName(ip, content=open('filename.txt').read()):
    _r = re.findall(r'\w+(?=:{})'.format(ip), content)
    return _r[0] if _r else None

print(getName('10.0.0.6'))
Output:
'yovel'
Note, however, it is slightly more robust to use split:
def getName(ip):
    lines = dict(i.strip('\n').split(' = ') for i in open('filename.txt'))
    d = {b: a for a, b in map(lambda x: x.split(':'), lines['WORKERS'].split(','))}
    return d.get(ip)
Using split() doesn't look too bad here:
def getName(ip_address, filename='file.txt', line_type='WORKERS'):
    with open(filename) as in_file:
        for line in in_file:
            name, info = [x.strip() for x in line.strip().split('=')]
            if name == line_type:
                info = [x.split(':') for x in info.split(',')]
                lookup = {ip: name for name, ip in info}
                return lookup.get(ip_address)
Which works as follows:
>>> getName('10.0.0.6')
'yovel'
I want to create a Python script which can modify code in the script itself, using Python Language Services or any other way.
e.g. a script which keeps track of the count of its successful executions:
import re
COUNT = 0

def updateCount():
    # code to update second line e.g. COUNT = 0
    pass

if __name__ == '__main__':
    print('This script has run {} times'.format(COUNT))
    updateCount()
On successful execution of this script code should get changed to
import re
COUNT = 1

def updateCount():
    # code to update second line e.g. COUNT = 0
    pass

if __name__ == '__main__':
    print('This script has run {} times'.format(COUNT))
    updateCount()
A simple approach that came to my mind was to open __file__ in write mode and make the required modifications using regular expressions, etc. But that did not work: I got the exception io.UnsupportedOperation: not readable. Even if this approach did work, it would be very risky, because it could spoil the whole script, so I am looking for a solution using Python Language Services.
Yes, you can use the language services to achieve self-modification, as in the following example:
>>> def foo(): print("original foo")
>>> foo()
original foo
>>> rewrite_txt = "def foo(): print('I am new foo')"
>>> newcode = compile(rewrite_txt, "", 'exec')
>>> eval(newcode)
>>> foo()
I am new foo
So, by new dynamically generated code you can replace stuff contained in the original source file, without modifying the file itself.
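The same idea can be restated as a self-contained sketch (the names here are illustrative), using return values so the rebinding is easy to verify:

```python
def foo():
    return "original foo"

print(foo())  # original foo

# Compile replacement source and execute it in the current namespace,
# rebinding the name foo without touching any file on disk.
newcode = compile("def foo(): return 'I am new foo'", "<generated>", "exec")
exec(newcode)

print(foo())  # I am new foo
```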
A Python script is nothing more than a text file, so you are able to open it as an external file and read from and write to it. (Using the __file__ variable you can get the exact name of your script):
def updateCount():
    fin = open(__file__, 'r')
    code = fin.read()
    fin.close()

    second_line = code.split('\n')[1]
    second_line_parts = second_line.split(' ')
    second_line_parts[2] = str(int(second_line_parts[2]) + 1)
    second_line = ' '.join(second_line_parts)

    lines = code.split('\n')
    lines[1] = second_line
    code = '\n'.join(lines)

    fout = open(__file__, 'w')
    fout.write(code)
    fout.close()
@kyriakosSt's answer works, but it hard-codes that the assignment to COUNT must be on the second line, which is prone to unexpected behavior when the line number changes because the source was modified for something else.
For a more robust solution, you can use lib2to3 to parse and update the source code instead. Subclass lib2to3.refactor.RefactoringTool to refactor the code, using a fixer (a subclass of lib2to3.fixer_base.BaseFix) whose pattern looks for an expression statement of the form 'COUNT' '=' any, and whose transform method updates the last child node by incrementing its integer value:
from lib2to3 import fixer_base, refactor

COUNT = 0  # this should be incremented every time the script runs

class IncrementCount(fixer_base.BaseFix):
    PATTERN = "expr_stmt< 'COUNT' '=' any >"

    def transform(self, node, results):
        node.children[-1].value = str(int(node.children[-1].value) + 1)
        return node

class Refactor(refactor.RefactoringTool):
    def __init__(self, fixers):
        self._fixers = [cls(None, None) for cls in fixers]
        super().__init__(None)

    def get_fixers(self):
        return self._fixers, []

with open(__file__, 'r+') as file:
    source = str(Refactor([IncrementCount]).refactor_string(file.read(), ''))
    file.seek(0)
    file.write(source)
Demo: https://repl.it/#blhsing/MushyStrangeClosedsource
This edits the module-level variables defined before _local_config: after the work is done, it computes an updated dictionary, then rewrites the matching lines while iterating over the source file, inserting the new _local_config values:
count = 0
a = 0
b = 1
c = 1

_local_config = dict(
    filter(
        lambda elem: (elem[0][:2] != "__") and (str(elem[1])[:1] != "<"),
        globals().items(),
    ),
)

# do some stuff
count += 1
c = a + b
a = b
b = c

# update with new values
_local_config = dict(
    filter(
        lambda elem: elem[0] in _local_config.keys(),
        globals().items(),
    )
)

# read self
with open(__file__, "r") as f:
    new_file = ""
    for line in f.read().split("\n"):
        for k, v in _local_config.items():
            search = f"{k} = "
            if search == line[: len(k) + 3]:
                line = search + str(v)
                _local_config.pop(k)
                break
        new_file += line + "\n"

# write self
with open(__file__, "w") as f:
    f.write(new_file[:-1])
Let's say I have a file named aaa.txt whose contents are stored in the following format:
# This is a comment
127.0.0.1 localhost
192.168.2.253 pyschools #pyschools server
100.0.0.4.9 amazon.com
.....
I need to write a Python function which accepts the string ip_address as an argument and returns the corresponding hostname by searching for it in the file.
If the ip_address is not found in the file, it should return unknown host.
This is my working solution.
def gethostname(ip_address):
    w = open("aaa.txt")
    for line in w:
        line = line.rstrip()
        l = line.split('\t')
        k = line.split(" ")
        if ip_address == l[0] or ip_address == k[0]:
            return l[-1]
        else:
            continue
    return "Unknown host"
Example:
if ip_address = 127.0.0.1, it should return localhost.
if ip_address = 194.2.3, it should return unknown host.
But when I submit this code on pyschools.com [Topic 13: Question 10], it says the private test cases failed.
I've been toiling hard with this problem for a long time now, and I don't understand what I am missing.
This is the link of that problem. You need to sign in using Gmail to access it. Please let me know if someone completes it.
Simply put them in a dict and use dict.get() like this:
def gethostname(ip_address):
    with open("aaa.txt") as f:
        # drop inline '#' comments so lines like "192.168.2.253 pyschools #..." still work
        data = [i.split('#')[0].strip() for i in f if i.split('#')[0].strip() != '']
    return dict([i.split() for i in data if len(i.split()) == 2]).get(ip_address, "Unknown host")
Demo:
def gethostname(ip_address):
    with open("aaa.txt") as f:
        # drop inline '#' comments so lines like "192.168.2.253 pyschools #..." still work
        data = [i.split('#')[0].strip() for i in f if i.split('#')[0].strip() != '']
    return dict([i.split() for i in data if len(i.split()) == 2]).get(ip_address, "Unknown host")
print gethostname('194.2.3')
print gethostname('192.168.2.253')
print gethostname('127.0.0.1')
Output:
Unknown host
pyschools
localhost
And to pass the quiz, here is another version from here:
def gethostname_split(ip_address):
    fh = open('/tmp/hosts', 'r')
    columns = {}
    for line in fh.readlines():
        if not line.startswith('#'):
            tokens = line.split()
            if len(tokens) > 1:
                columns[tokens[0]] = tokens[1]
    print columns
    try:
        return columns[ip_address]
    except KeyError:
        return 'Unknown host'
This code won't fail even if there is a line with an IP and no host name, or a blank line.
And by the way, all of your strip and rstrip calls are useless, since split() already discards surrounding whitespace!
def gethostname(ip_address):
    w = open("aaa.txt")
    for line in w:
        line = line.split()
        if line:
            ip = line[0]
            if ip_address == ip and len(line) > 1:
                return line[1]
    return "Unknown host"
If you want some advice: use split only once, and with no argument, since by default it splits on any whitespace (' \t\n', and more whitespace characters besides).
Also, else: continue is simply useless at the end of a loop. Finally, you seem to have some problems with your indentation.
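A quick illustration of the difference (hypothetical input line):

```python
line = '  127.0.0.1 \t localhost \n'

# split() with no argument splits on any run of whitespace and never
# produces empty strings, so no strip()/rstrip() is needed first.
print(line.split())      # ['127.0.0.1', 'localhost']

# split(' ') only splits on single spaces: empty strings and the tab survive.
print(line.split(' '))   # ['', '', '127.0.0.1', '\t', 'localhost', '\n']
```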
def gethostname(ip_address):
    w = open("aaa.txt")
    for line in w:
        k = line.split(" ")
        for iIdx, val in enumerate(k):
            k[iIdx] = val.strip()
        if ip_address in k:
            return k[-1].strip()
    return "Unknown host"
I am using the following method of a class to find out whether every .csv file has a corresponding .csv.meta file in the given directory.
I am getting None for files which are just .csv, and a match-object repr (the hexadecimal address) for .csv.meta files.
Result
None
<_sre.SRE_Match object at 0x1bb4300>
None
<_sre.SRE_Match object at 0xbd6378>
This is the code:
def validate_files(self, filelist):
    try:
        local_meta_file_list = []
        local_csv_file_list = []
        # Validate each file and see if they pair up based on the pattern *.csv and *.csv.meta
        for tmp_file_str in filelist:
            csv_match = re.search(self.vprefix_pattern + '([0-9]+)' + self.vcsv_file_postfix_pattern + '$', tmp_file_str)
            if csv_match:
                local_csv_file_list.append(csv_match.group())
                meta_file_match_pattern = self.vprefix_pattern + csv_match.group(1) + self.vmeta_file_postfix_pattern
                tmp_meta_file = [os.path.basename(s) for s in filelist if meta_file_match_pattern in s]
                local_meta_file_list.extend(tmp_meta_file)
    except Exception, e:
        print e
        self.m_logger.error("Error: Validate File Process thrown exception " + str(e))
        sys.exit(1)
    return local_csv_file_list, local_meta_file_list
These are file names.
File Names
rp_package.1406728501.csv.meta
rp_package.1406728501.csv
rp_package.1402573701.csv.meta
rp_package.1402573701.csv
rp_package.1428870707.csv
rp_package.1428870707.meta
Thanks
Sandy
If all you need is to find .csv files which have corresponding .csv.meta files, then I don’t think you need to use regular expressions for filtering them. We can filter the file list for those with the .csv extension, then filter that list further for files whose name, plus .meta, appears in the file list.
Here’s a simple example:
myList = [
    'rp_package.1406728501.csv.meta',
    'rp_package.1406728501.csv',
    'rp_package.1402573701.csv.meta',
    'rp_package.1402573701.csv',
    'rp_package.1428870707.csv',
    'rp_package.1428870707.meta',
]

def validate_files(file_list):
    loc_csv_list = filter(lambda x: x[-3:].lower() == 'csv', file_list)
    loc_meta_list = filter(lambda c: '%s.meta' % c in file_list, loc_csv_list)
    return loc_csv_list, loc_meta_list

print validate_files(myList)
If there may be CSV files that don’t conform to the rp_package format, and need to be excluded, then we can initially filter the file list using the regex. Here’s an example (swap out the regex parameters as necessary):
import re

vprefix_pattern = 'rp_package.'
vcsv_file_postfix_pattern = '.csv'
regex_str = vprefix_pattern + '[0-9]+' + vcsv_file_postfix_pattern

def validate_files(file_list):
    csv_list = filter(lambda x: re.search(regex_str, x), file_list)
    loc_csv_list = filter(lambda x: x[-3:].lower() == 'csv', csv_list)
    loc_meta_list = filter(lambda c: '%s.meta' % c in file_list, loc_csv_list)
    return loc_csv_list, loc_meta_list

print validate_files(myList)
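One Python 3 caveat (an aside, not part of the original answer): filter() there returns a lazy iterator, so the second filter would consume the first and printing would show filter objects. An equivalent list-comprehension sketch that behaves the same on both versions:

```python
myList = [
    'rp_package.1406728501.csv.meta',
    'rp_package.1406728501.csv',
    'rp_package.1402573701.csv.meta',
    'rp_package.1402573701.csv',
    'rp_package.1428870707.csv',
    'rp_package.1428870707.meta',
]

def validate_files(file_list):
    # keep files ending in 'csv', then keep those with a matching '.meta' twin
    loc_csv_list = [x for x in file_list if x[-3:].lower() == 'csv']
    loc_meta_list = [c for c in loc_csv_list if '%s.meta' % c in file_list]
    return loc_csv_list, loc_meta_list

csvs, metas = validate_files(myList)
print(csvs)   # all three .csv files
print(metas)  # only the two that have a .csv.meta counterpart
```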