I have a text file that contains something like this :
host host_name {
# comment (optional)
hardware ethernet 01:22:85:EA:A8:5D;
fixed-address 192.168.107.210;
}
host another_host_name {
# comment (optional)
hardware ethernet 01:22:85:EA:A8:5D;
fixed-address 192.168.107.210;
}
I want my program to detect the line with the 'host' and then modify the content of the block according to what I type.
When I do the following (for example with request.form.get('name') in flask):
#random inputs
host = name2
comment = nothing
hardware = 00:00:00:00:00:00
address = 192.168.101.123
I would like to have :
host host_name {
# comment (optional)
hardware ethernet 01:22:85:EA:A8:5D;
fixed-address 192.168.107.210;
}
#after the change
host name2 {
# nothing
hardware ethernet 00:00:00:00:00:00;
fixed-address 192.168.101.123;
}
I don't have a problem with regex but rather the program that I have to do in order to achieve this, so how can I do it?
If you start to code the way you are thinking about your problem, you will likely end up with a complete and utter mess that no one can change or maintain, even if it works at first.
You have several different small tasks, and you are thinking of them "as one thing".
No. You are using Flask to provide an easy, lightweight web interface. That is OK. You already know how to get a text block from it. You don't need to ask anything about Flask now, nor to put any further code in the same place as the code that gets data from the web.
Instead just write some plain Python functions that will get your textual data as parameters, and then update the configuration files for you.
And while at that, if you can pick a special template and create a new config file when doing this, instead of trying to parse an existing file and update your desired values in place, then this is something you can achieve.
Parsing a "real world" config file in place and live update it is not an easy task. Actually it can be so complicated that most Linux distributions skipped trying that for more than 10 years.
And then you have a further complication you don't mention: you probably want to keep any configurations you are not changing on the file. I was to advise you to keep a template of the file, and fill in your data, creating a new file on each run. But that would require you to have all the other config data in some other format, which would basically duplicate your problem.
So, ok, your idea of "getting data from the original file" with regular expressions might be a go. But still, keep it separate from writing back the file. And don't think in "lines" if said file is structured in blocks.
One feasible thing would be to read the file and get the data you are interested in into a Python data structure (for example, a list of dictionaries, each one having your host_name, comment, ethernet and ip fields). Then, in a second pass with the same regex, replace all of those with placeholders, so that the file contents can be filled back in by a call to the .format method, or by using Flask's jinja2 templating.
Separating the above into two functions will even allow you to present all the configured hosts on your web interface, so the user can edit them individually without having to type ethernet addresses by hand.
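To make the two-function idea concrete, here is a minimal sketch. It assumes the file looks exactly like the blocks in the question; the names HOST_RE, HOST_TEMPLATE, parse_hosts and render are made up for illustration, and the pattern will not cope with everything a real dhcpd.conf can contain. It does the extraction and the placeholder substitution in one pass rather than two.

```python
import re

# Hypothetical pattern for the block layout shown in the question only.
HOST_RE = re.compile(
    r"\bhost\s+(?P<name>\S+)\s*\{\s*"
    r"(?:#[ \t]*(?P<comment>[^\n]*)\n\s*)?"      # optional comment line
    r"hardware\s+ethernet\s+(?P<mac>[0-9A-Fa-f:]+);\s*"
    r"fixed-address\s+(?P<ip>[0-9.]+);\s*"
    r"\}"
)

# How one host block is written back (original indentation is not preserved).
HOST_TEMPLATE = (
    "host {name} {{\n"
    "# {comment}\n"
    "hardware ethernet {mac};\n"
    "fixed-address {ip};\n"
    "}}"
)


def parse_hosts(text):
    """Return (template, hosts): the text with {0}, {1}, ... placeholders
    in place of the host blocks, and a list of dicts with their fields."""
    hosts, pieces, pos = [], [], 0
    for match in HOST_RE.finditer(text):
        # Escape braces outside host blocks so str.format leaves them alone.
        outside = text[pos:match.start()]
        pieces.append(outside.replace("{", "{{").replace("}", "}}"))
        pieces.append("{%d}" % len(hosts))
        hosts.append({k: (v or "").strip() for k, v in match.groupdict().items()})
        pos = match.end()
    pieces.append(text[pos:].replace("{", "{{").replace("}", "}}"))
    return "".join(pieces), hosts


def render(template, hosts):
    """Fill the placeholders back in from the (possibly edited) host dicts."""
    return template.format(*(HOST_TEMPLATE.format(**h) for h in hosts))
```

The Flask view would then only call parse_hosts, update one dictionary from request.form, and write render(template, hosts) back to disk. A block without a comment comes back with an empty "# " line in this sketch, which you may want to handle differently.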
Sorry, but I won't be writing all this code for you. I hope the above can help you think about a proper approach. If you later come up with other questions, with some code from your attempts, we can help you further.
I recently started coding, but took a brief stint. I started a new job and I’m under some confidential restrictions. I need to make sure python and pandas are secure before I do this—I’ll also be talking with IT on Monday
I was wondering if pandas in python was a local library, or does the data get sent to or from elsewhere? If I write something in pandas—will the data be stored somewhere under pandas?
The best example of what I'm doing is in a Medium article about pulling data out of tables that don't have CSV exports.
https://medium.com/@ageitgey/quick-tip-the-easiest-way-to-grab-data-out-of-a-web-page-in-python-7153cecfca58
Creating a DataFrame out of a dict, doing vectorized operations on its rows, printing out slices of it, etc. are all completely local. I'm not sure why this matters. Is your IT department going to say, "Well, this looks fishy—but some random guy on the internet says it's safe, so forget our policies, we'll allow it"? But, for what it's worth, you have this random guy on the internet saying it's safe.
However, Pandas can be used to make network requests. Some of the IO functions can take a URL instead of a filename or file object. Some of them can also use another library that does so: for example, if you have lxml installed, read_html will pass the filename to lxml to open, and if that filename is an HTTP URL, lxml will go fetch it.
This is rarely a concern, but if you want to get paranoid, you could imagine ways in which it might be.
For example, let's say your program is parsing user-supplied CSV files and doing some data processing on them. That's safe; there's no network access at all.
Now you add a way for the user to specify CSV files by URL, and you pass them into read_csv and go fetch them. Still safe; there is network access, but it's transparent to the end user and obviously needed for the user's task; if this weren't appropriate, your company wouldn't have asked you to add this feature.
Now you add a way for CSV files to reference other CSV files: if column 1 is #path/to/other/file, you recursively read and parse path/to/other/file and embed it in place of the current row. Now, what happens if I can give one of your users a CSV file where, buried at line 69105, there's #http://example.com/evilendpoint?track=me (an endpoint which does something evil, but then returns something that looks like a perfectly valid thing to insert at line 69105 of that CSV)? Now you may be facilitating my hacking of your employees, without even realizing it.
Of course this is a more limited version of exactly the same functionality that's in every web browser with HTML pages. But maybe your IT department has gotten paranoid and clamped down security on browsers and written an application-level sniffer to detect suspicious followup requests from HTML, and hasn't thought to do the same thing for references in CSV files.
I don't think that's a problem a sane IT department should worry about. If your company doesn't trust you to think about these issues, they shouldn't hire you and assign you to write software that involves scraping the web. But then not every IT department is sane about what they do and don't get paranoid about. ("Sure, we can forward this under-1024 port to your laptop for you… but you'd better not install a newer version of Firefox than 16.0…")
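If you did build the hypothetical "CSV files that reference other CSV files" feature described above, the paranoid version would check each reference before following it. Here is a rough sketch using only the standard library; the "#" marker in column 1 comes from the example above, while the function names and the list of schemes are my own assumptions, not anything Pandas provides.

```python
import csv
import io
from urllib.parse import urlparse

# Assumed set of schemes that count as "network access" for this check.
REMOTE_SCHEMES = {"http", "https", "ftp"}


def is_remote(reference):
    """True when the reference names a network resource, not a local path."""
    return urlparse(reference).scheme in REMOTE_SCHEMES


def include_targets(csv_text, marker="#"):
    """Yield (line_number, target) for rows whose first column is a reference."""
    for line_number, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if row and row[0].startswith(marker):
            yield line_number, row[0][len(marker):]


def remote_includes(csv_text):
    """List the references that would trigger a network request if followed."""
    return [(n, target) for n, target in include_targets(csv_text) if is_remote(target)]
```

A program could refuse to load a file for which remote_includes returns anything, or ask the user first. Whether that is worth doing depends entirely on where your CSV files come from.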
I would like to generate geoiplist.acl file from a csv file. acl file format:
acl "A1" {
31.14.133.39/32;
37.221.172.0/23;
acl "A2" {
5.145.149.142/32;
57.72.6.0/24;
......
the csv file: http://download.db-ip.com/free/dbip-country-2016-09.csv.gz
Here are sample lines from CSV file with IP_Start, IP_End and Country columns.
"0.0.0.0","0.255.255.255","US"
"1.0.0.0","1.0.0.255","AU"
"1.0.1.0","1.0.3.255","CN"
"1.0.4.0","1.0.7.255","AU"
"1.0.8.0","1.0.15.255","CN"
"1.0.16.0","1.0.31.255","JP"
"1.0.32.0","1.0.63.255","CN"
"1.0.64.0","1.0.127.255","JP"
"1.0.128.0","1.0.255.255","TH"
"1.1.0.0","1.1.0.255","CN"
I got some references from here: http://geoip.site/ but their ACL doesn't have the complete list.
Can anyone help me do this in bash, please? Thanks in advance.
The issue here is that DB-IP provide the begin and end value of each range in human readable IP address format. Why they have done this, I'm not sure, because the more universal (easier to process) format is to simply present these values in integer form.
In any case, I have modified the Python script on http://geoip.site/ to handle this and included the DB-IP database URL within the script. The ACL file generated from their CSV file is now also available to download from http://geoip.site/download/DB-IP/GeoIP.acl
Note I have already identified some issues with this database:
1. The entry "::","2001:1ff:ffff:ffff:ffff:ffff:ffff:ffff","US" exists in it. This is obviously complete rubbish and also broke the Python script (I've improved the error detection code to handle this) and is also one of the reasons for point 4 below.
2. "224.0.0.0","255.255.255.255","CH" is an interesting entry. I'm not entirely sure how they have deemed the entire multicast block of the IPv4 address space to be delegated to Switzerland, or why it exists in their database at all.
3. The statistical analysis (available on http://geoip.site/) suggests that their DB/CSV file spans 100% of the IPv4 address space outside 224.0.0.0/3 (multicast). That's 3,758,096,384 addresses. But we already know that several address blocks should not exist in here, the obvious ones being 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16 (and indeed others; further investigation reveals the entry "192.168.0.0","192.169.31.255","US" exists, which covers 192.168.0.0/16 and beyond). So this result looks questionable.
4. The statistical analysis also reports that their DB/CSV spans 100% of the IPv6 address space. This is primarily because they have mapped 3000::/4 (and various other smaller address blocks) to the US, which is wrong (see http://www.iana.org/assignments/ipv6-unicast-address-assignments/ipv6-unicast-address-assignments.xhtml where 3000::/4 is listed as RESERVED). This mapping originates from the entry "2c10::","ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff","US". The other 2 databases on http://geoip.site/ are nowhere close to this magnitude of coverage across the IPv6 address space (both are currently less than 0.1%), so this result also looks questionable.
Given all of the above, I would question the accuracy of their database and contact them about it. But feel free to download the http://geoip.site/download/DB-IP/GeoIP.acl file if you wish to use it.
Finally, I would not have even attempted this in BASH. The conversions required to produce this file from their CSV file are only available in more advanced languages like Python; BASH just wouldn't cut this (well, not my BASH).
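For anyone who wants to see the kind of conversion involved, here is a minimal sketch using Python's standard ipaddress module. It is not the script from http://geoip.site/, just an illustration; the function names are made up, and it assumes every row is a clean "start","end","country" triple with both addresses of the same IP version.

```python
import csv
import io
import ipaddress
from collections import defaultdict


def ranges_to_acl(csv_text):
    """Map each country code to the CIDR blocks covering its address ranges."""
    acl = defaultdict(list)
    for start, end, country in csv.reader(io.StringIO(csv_text)):
        first = ipaddress.ip_address(start)
        last = ipaddress.ip_address(end)
        # summarize_address_range yields the CIDR blocks that exactly
        # cover first..last; one range may need several blocks.
        for network in ipaddress.summarize_address_range(first, last):
            acl[country].append(str(network))
    return acl


def format_acl(acl):
    """Render the mapping in the acl "XX" { ...; }; layout."""
    lines = []
    for country in sorted(acl):
        lines.append('acl "%s" {' % country)
        lines.extend("    %s;" % network for network in acl[country])
        lines.append("};")
    return "\n".join(lines)
```

With the sample rows from the question, the range 1.0.1.0 to 1.0.3.255 becomes two blocks, 1.0.1.0/24 and 1.0.2.0/23, which is exactly the step that is painful to do in plain shell.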
I hope this has helped resolve your query/problem.
UPDATE
As of the December 2016 version of their database, DB-IP have introduced a ZZ acl to cover IPv4 networks that are outside the realm of being mapped to any specific country. This certainly resolves some of the issues I raised above.
acl ZZ {
0.0.0.0/8;
10.0.0.0/8;
100.64.0.0/10;
127.0.0.0/8;
169.254.0.0/16;
172.16.0.0/12;
192.0.0.8/29;
192.0.0.16/28;
192.0.0.32/27;
192.0.0.64/26;
192.0.0.128/25;
192.0.2.0/24;
192.88.99.0/24;
192.168.0.0/16;
198.18.0.0/15;
198.51.100.0/24;
203.0.113.0/24;
};
So I made a python phonebook program which allows the user to add contacts, change contact info, delete contacts, etc. and write this data to a text file which I can read from every time the program is opened again and get existing contact data. However, in my program, I write to the text file in a very specific manner so I know what the format is and I can set it up to be read very easily. Since it is all formatted in a very specific manner, I want to prevent the user from opening the file and accidentally messing the data up with even just a simple space. How can I do this?
I want to prevent the user from opening the file and accidentally messing the data up...
I would advise you not to prevent users from accessing their own files. Messing with file permissions might result in some rogue files that the user won't be able to get rid of. Trust your user. If they delete or edit a sensitive file, it is their fault. Think of it this way - you have plenty of software installed on your own computer, but how often do you open its files in an editor and make some damaging changes? Even if you do edit these files, does the application developer prevent you from doing so?
If you do intend to allow users to change/modify that file, give them good documentation on how to do it. This is the most apt thing to do. Also, make a backup file during run-time (see tempfile below) as an added layer of safety. Backups are almost always a good idea.
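As a rough illustration of the backup idea, the sketch below copies the data file next to itself before the program touches it. It uses a plain copy rather than the tempfile module, and the function name and the ".bak" suffix are my own choices.

```python
import shutil
from pathlib import Path


def backup(path, suffix=".bak"):
    """Copy the data file to <name><suffix> in the same folder, if it exists."""
    source = Path(path)
    target = source.with_name(source.name + suffix)
    if source.exists():
        # copy2 also preserves timestamps, which helps when comparing later.
        shutil.copy2(source, target)
    return target
```

Calling backup("contacts.txt") at startup, before any write, means a bad edit or a crash mid-write costs the user at most one session of changes.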
However, you can take some precautions to hide that data, so that users can't accidentally open them in an editor by double-clicking on it. There are plenty of options to do this including
Creating a binary file in a custom format
Zipping the text file using zipfile module.
Using tempfile module to create a temporary file, and zipping it using the previous option. (easy to discard if no changes needs to be saved)
Encryption
Everything from here on is not about preventing access, but about hiding the contents of your file
Note that the options above don't have to be mutually exclusive. The advantages of a zip file are that it saves some space and that it is not easy to read and edit in a text editor (binary data). It can be easily manipulated in your Python script:
from zipfile import ZipFile

with ZipFile('spam.zip') as myzip:
    with myzip.open('eggs.txt') as myfile:
        print(myfile.read())
It is as simple as that! A temp file on the other hand, is a volatile (delete=True/False) file and can be discarded once you are done with it. You can easily copy its contents to another file or zip it before you close it as mentioned above.
import tempfile

with tempfile.NamedTemporaryFile() as temp:
    temp.write(b"Binary Data")
Again, another easy process. However, you must zip or encrypt it to achieve the final result. Now, moving on to encryption. The easiest way is an XOR cipher. Since we are simply trying to prevent 'readability' and not concerned about security, you can do the following:
recommended solution (XOR cipher):
from itertools import cycle

def xorcize(data, key):
    """Return a string of xor mutated data."""
    return "".join(chr(ord(a) ^ ord(b)) for a, b in zip(data, cycle(key)))

data = "Something came in the mail today"
key = "Deez Nuts"
encdata = xorcize(data, key)
decdata = xorcize(encdata, key)
print(data, encdata, decdata, sep="\n")
Notice how small that function is? It is quite convenient to include it in any of your scripts. All your data can be encrypted before it is written to a file, and the file can be saved with an extension such as ".dat" or ".contacts" or any custom name you choose. Make sure it is not an extension that opens in an editor by default (such as ".txt" or ".nfo").
It is difficult to prevent user access to your data storage completely. However, you can either make it more difficult for the user to access your data or actually make it easier not to break it. In the second case, your intention would be to make it clear to the user what the rules are and hope that not destroying the data is in the user's own best interest. Some examples:
Using a well established, human-readable serialization format, e.g. JSON. This is often the best solution as it actually allows an experienced user to easily inspect the data, or even modify it. Inexperienced users are unlikely to mess with the data anyways, and an experienced user knowing the format will follow the rules. At the same time, your parser will detect inconsistencies in the file structure.
Using a non-human readable, binary format, such as Pickle. Those files are likely to be left alone by the user as it is pretty clear that they are not meant to be modified outside the program.
Using a database, such as MySQL. Databases provide special protocols for data access which can be used to ensure data consistency and also make it easier to prevent unwanted access.
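To show what "your parser will detect inconsistencies" can look like for the JSON option, here is a small sketch. The "name" and "phone" fields and the function names are assumptions about what a phonebook entry might hold, not something taken from the question.

```python
import json


def save_contacts(path, contacts):
    """Write the contact list as indented, human-readable JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(contacts, handle, indent=2, sort_keys=True)


def load_contacts(path):
    """Return the contact list, or raise ValueError if the file was damaged."""
    with open(path, encoding="utf-8") as handle:
        # Broken syntax raises json.JSONDecodeError, a ValueError subclass.
        contacts = json.load(handle)
    if not isinstance(contacts, list):
        raise ValueError("expected a list of contacts")
    for entry in contacts:
        # Assumed schema: every contact is a dict with at least these keys.
        if not isinstance(entry, dict) or not {"name", "phone"} <= entry.keys():
            raise ValueError("malformed contact: %r" % (entry,))
    return contacts
```

The program can catch ValueError at startup and tell the user the file looks damaged, instead of silently reading garbage because of a stray space.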
Assuming that your file format has a comment character, or can be modified to have one, add these lines to the top of your text file:
# Do not edit this file. This file was automatically generated.
# Any change, no matter how slight, may corrupt this file beyond repair.
The contact file belongs to your user, not to you. The best you can do is to inform the user. The best you can hope for is that the user will make intelligent use of your product.
I think the best thing to do in your case is just to choose a new file extension for your format.
It obviously doesn't prevent editing, but it clearly signals to the user that the file has a specific format and probably shouldn't be edited manually. And the GUI probably won't open it by default (it will ask what to open it with).
And that would be enough for any case I can imagine if what you're worried about is the user messing up their own data. I don't think you can win against a user who actively tries to mess up their data. Also, I doubt any program does anything more. The usual "contract" is that the user's data is, well, the user's, so it can be destroyed by the user.
If you actually want to prevent editing, you could change permissions to forbid it, with os.chmod for example. The user would still be able to lift them manually, and there will be a time window when you are actually writing, so it will be neither clean nor significantly more effective. And I would expect more trouble than benefit from such a solution.
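For completeness, this is roughly what the os.chmod route looks like. Treat it as a sketch: the helper name is made up, and on Windows os.chmod can only toggle the read-only flag.

```python
import os
import stat


def set_read_only(path, read_only=True):
    """Clear the write bits on the file, or give the owner write access back."""
    mode = os.stat(path).st_mode
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    if read_only:
        os.chmod(path, mode & ~write_bits)
    else:
        os.chmod(path, mode | stat.S_IWUSR)
```

The program would have to call set_read_only(path, False) before every save and set_read_only(path) afterwards, which is exactly the writing window mentioned above.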
If you want to actually make it impossible for a user to read/edit a file, you can run your process as a different user (or use something heavier like SELinux or another MAC mechanism), which would make it really impossible to damage the data (with the user's permissions). But it is not worth the effort if it is only about protecting the user from the not-so-catastrophic effects of being careless.
I have a python script on my webserver which simply prints out 2 to 5 words, one under the other.
Ruyterplaats
Civic Centre
Racecourse
Atlantis
What I need to do is the following:
open the url www.webserveraddress.com?variable1=variable2
get the words from each line and put them into an array
No need to display the webpage, I just need the words. That's all.
I've seen I can use things like Libxml2 and Hpple, but these are ObjC wrappers around other code. I'm not sure how Swift will cope with that.
I quite frankly have no idea where to start or even if I'm going about it the wrong way or not :/
PS. I would post code but the python script is around 6500 lines :)
The quickest way to get the contents of a URL as a string is to use the constructor on NSString:
var contents = NSString(contentsOfURL: NSURL(string: "http://example.com"), encoding: NSUTF8StringEncoding, error: nil)
Then you can separate the contents into an array using componentsSeparatedByCharactersInSet:
var wordArray = contents.componentsSeparatedByCharactersInSet(NSCharacterSet.newlineCharacterSet())
Note: The server-side technology doesn't matter at all, which is one of the best things about the HTTP protocol ;). That URL could return a static file for all the Swift code (or anyone else) will care.
To preface I'm very new to python (about 7 days) but I'm an experienced software eng undergrad.
I would like to send data between machines running python scripts. The idea I had (in order to simplify things) was to concatenate the data (strings & ints) into a string and do the parsing client-side.
The UDP packets send beautifully with simple strings but when I try to send useful data python always complains about the data I send; specifically python won't let me concatenate tuples.
In order to parse the data on the client I need to separate the data with a dash character: '-'.
nodeList is of type dictionary where the key is a string and value is a double.
randKey = random.choice( nodeList.keys() )
data = str(randKey) +'-'+ str(nodeList[randKey])
mySocket.sendto ( data , address )
The code above produces the following error:
TypeError: coercing to Unicode: need string or buffer, tuple found
I don't understand why it thinks it is a tuple I am trying to concatenate...
So my question is how can I correct this to keep Python happy, or can someone suggest a better way of sending the data?
Thank you in advance.
I highly suggest using Google Protocol Buffers as implemented in Python as protobuf for this as it will handle the serialization on both ends of the line. It has Python bindings that will allow you to easily use it with your existing Python program.
Using your example code you would create a .proto file like so:
message SomeCoolMessage {
    required string key = 1;
    required double value = 2;
}
Then after generating, you can use it like so:
# list(...) keeps this working on Python 3, where keys() returns a view
randKey = random.choice(list(nodeList.keys()))
data = SomeCoolMessage()
data.key = randKey
data.value = nodeList[randKey]
mySocket.sendto(data.SerializeToString(), address)
I'd probably use the json module to serialize the data.
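Something along these lines, as a sketch rather than a drop-in fix. The function names and the "key"/"value" field names are my own, and it assumes address is a (host, port) tuple, which is what sendto expects.

```python
import json
import random


def encode_message(key, value):
    """Serialize one key/value pair to UTF-8 bytes ready for sendto."""
    return json.dumps({"key": key, "value": value}).encode("utf-8")


def decode_message(payload):
    """Inverse of encode_message: bytes from recvfrom -> (key, value)."""
    message = json.loads(payload.decode("utf-8"))
    return message["key"], message["value"]


def send_random_node(sock, address, node_list):
    """Pick a random entry from the dict and send it as one UDP datagram."""
    key = random.choice(list(node_list))
    sock.sendto(encode_message(key, node_list[key]), address)
    return key
```

On the receiving side, decode_message(sock.recvfrom(4096)[0]) gives back the key and the float. Unlike the dash-separated string, this still works when a key itself contains a '-'.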
You need to serialize the data. Pickle does this for you, and you can ask pickle for an ASCII representation of the data instead of binary data (see the docs). Or you could use json, which also serializes the data for you. Both are in the standard library. But really there are a hundred thousand different libraries that handle ALL the work of getting data from one machine to another for you. I'd suggest using a library.
Depending on speed, etc. there are different trade offs for the various libraries. In the standard library you get HTTP, that's about it (well and raw sockets). But there are others.
If raw speed is more important than other things, ZeroMQ or Google's Protocol Buffers might be valid options.
For me, I use rpyc usually, it lets me be totally lazy, and just call over to the other process across the network. It's fast enough usually.
You know that UDP has no guarantee that the data will ever show up on the other side, or that it will show up IN ORDER? For your application you may not care, I don't know, but I just thought I'd bring it up.