I am working in Python, but this is a general design question so general answers are welcome. I will explain the context not as part of the question, but to serve as an example:
I have a script which receives a CSV file, it uses the fields in this file to make automated phone calls. The fields represent names to be spoken, dates to be spoken, and the phone numbers to call. For example, input like "555-555-4321,Bob,Jill,3/30/2011" a phone call might be placed to 555-555-4321 and a robotic message delivered saying "Bob, don't forget Jills birthday is next Wednesday, one week from now."
My question is what design patterns would be useful for making this system configurable? More specifically, I'd like to specify what the format the input lines take, and some behaviors for generating the voice message. Some fields, like "Bob," can be as simple as "speak the field." Other fields, like the date, require some transformation in order to be spoken (ie, how does "3/30/2011" become "next Wednesday"). I'd also like to have various line formats, for example, input such as "555-555-4321,Bob,6:00" might call Bob every day at 6:00 and say "wake up!"
My goal is to have a web interface which allows defining and configuring these types of things. I know how to solve these problems by hacking my source code, but hacking the source code is a long ways from a simple and user friendly front-end.
I'm solving a related but not identical problem currently.
My solution is to create a control list of the same length as the target csv lines, where each element in the control list is the name of a useMethod. In my case the useMethod is an editor widget; in your case, it would be a function that defines how the field is interpreted by your text-to-speech engine. For each line, you can then iterate over the fields, calling the appropriate processing widget.
So for your example of "555-555-4321,Bob,Jill,3/30/2011",
import csv
def phoneNumber(number):
...
def userName(name):
...
def targetDate(datestring):
...
control = [phoneNumber, userName, userName, targetDate]
with open("csvFile", "r") as inFile:
reader = csv.reader(inFile)
for row in reader:
for op, item in zip(control, row):
op(item)
I note that this only works if the csv file has constant interpretation per element, but if it has variant interpretation then a csv file is the wrong storage method. I also note that you'd need some other control object to generate the rest of the sentence; this is left as an exercise for the reader. :)
This allows you to have a library of interpreter functions that can be assigned to fields in the csv file by simply changing the control string. A new control string would invoke a different order of field interpretations with no need to change the source code, and the new string could be entered on command line, stored in the first line of the csv file, or brought in some other way.
edit: And noting your addendum on using a web interface to configure, that would be a straight-forward way to provide a new control list.
Related
I recently started coding, but took a brief stint. I started a new job and I’m under some confidential restrictions. I need to make sure python and pandas are secure before I do this—I’ll also be talking with IT on Monday
I was wondering if pandas in python was a local library, or does the data get sent to or from elsewhere? If I write something in pandas—will the data be stored somewhere under pandas?
The best example of what I’m doing is best found on a medium article about stripping data from tables that don’t have csv Exports.
https://medium.com/#ageitgey/quick-tip-the-easiest-way-to-grab-data-out-of-a-web-page-in-python-7153cecfca58
Creating a DataFrame out of a dict, doing vectorized operations on its rows, printing out slices of it, etc. are all completely local. I'm not sure why this matters. Is your IT department going to say, "Well, this looks fishy—but some random guy on the internet says it's safe, so forget our policies, we'll allow it"? But, for what it's worth, you have this random guy on the internet saying it's safe.
However, Pandas can be used to make network requests. Some of the IO functions can take a URL instead of a filename or file object. Some of them can also use another library that does so—e.g., if you have lxml installed, read_html, will pass the filename to lxml to open, and if that filename is an HTTP URL, lxml will go fetch it.
This is rarely a concern, but if you want to get paranoid, you could imagine ways in which it might be.
For example, let's say your program is parsing user-supplied CSV files and doing some data processing on them. That's safe; there's no network access at all.
Now you add a way for the user to specify CSV files by URL, and you pass them into read_csv and go fetch them. Still safe; there is network access, but it's transparent to the end user and obviously needed for the user's task; if this weren't appropriate, your company wouldn't have asked you to add this feature.
Now you add a way for CSV files to reference other CSV files: if column 1 is #path/to/other/file, you recursively read and parse path/to/other/file and embed it in place of the current row. Now, what happens if I can give one of your users a CSV file where, buried at line 69105, there's #http://example.com/evilendpoint?track=me (an endpoint which does something evil, but then returns something that looks like a perfectly valid thing to insert at line 69105 of that CSV)? Now you may be facilitating my hacking of your employees, without even realizing it.
Of course this is a more limited version of exactly the same functionality that's in every web browser with HTML pages. But maybe your IT department has gotten paranoid and clamped down security on browsers and written an application-level sniffer to detect suspicious followup requests from HTML, and haven't thought to do the same thing for references in CSV files.
I don't think that's a problem a sane IT department should worry about. If your company doesn't trust you to think about these issues, they shouldn't hire you and assign you to write software that involves scraping the web. But then not every IT department is sane about what they do and don't get paranoid about. ("Sure, we can forward this under-1024 port to your laptop for you… but you'd better not install a newer version of Firefox than 16.0…")
I have a text file that contains something like this :
host host_name {
# comment (optional)
hardware ethernet 01:22:85:EA:A8:5D;
fixed-address 192.168.107.210;
}
host another_host_name {
# comment (optional)
hardware ethernet 01:22:85:EA:A8:5D;
fixed-address 192.168.107.210;
}
I want my program to detect the line with the 'host' and then modify the content of the block according to what I type.
When I do the following (for example with request.form.get('name') in flask):
#random inputs
host = name2
comment = nothing
hardware = 00:00:00:00:00:00
address = 192.168.101.123
I would like to have :
host host_name {
# comment (optional)
hardware ethernet 01:22:85:EA:A8:5D;
fixed-address 192.168.107.210;
}
#after the change
host name2 {
# nothing
hardware ethernet 00:00:00:00:00:00;
fixed-address 192.168.101.123;
}
I don't have a problem with regex but rather the program that I have to do in order to achieve this, so how can I do it?
If you start to cod e the way you are thinking about your problem, you will likely have a complete and utter mess no ne can change or maintain, even if it works at first.
You have several different small tasks, and you are thinking of them "as one thing".
No. You are using Falsk to provide you an easy and light-weight web interface. That is ok. You already know how to get a text block from it. You don't need to ask anything about Flask now. Nor tio put any further code in the same place the code that gets data from the web is.
Instead just write some plain Python functions that will get your textual data as parameters, and then update the configuration files for you.
And while at that, if you can pick an special template and create a new config file when doing this, instead of trying to parse an existing file, and update your desired values in place, then, this is something you can achieve.
Parsing a "real world" config file in place and live update it is not an easy task. Actually it can be so complicated that most Linux distributions skipped trying that for more than 10 years.
And then you have a further complication you don't mention: you probably want to keep any configurations you are not changing on the file. I was to advise you to keep a template of the file, and fill in your data, creating a new file on each run. But that would require you to have all the other config data in some other format, which would basically duplicate your problem.
So, ok, your idea of "getting data from the original file" with regular expressions might be a go. But still, keep it separate from writing back the file. And don't think in "lines" if said file is structured in blocks.
One feasible thing would be to read the file, get the data you are interested in into a Python data-structure (for example, a list of dictionaries, each one having your host_name, comment, ethernet and ip fields). And, in a second apply of the same regex, change all those for placeholders , so that the file contents could be filled back in by a call to the .format method, or using Flask's jinja2 templating.
Separating the above in 2 functions will even allow you to present all the configured hosts on your web interface, so the user can edit then individually without having to type ethernet addresses by hand.
Sorry, but i won be writing all this code for you. I hope the above can help you think about a proper approach there. So, if later you come up with other questions, with some code from your attempts, we can help you further.
So I made a python phonebook program which allows the user to add contacts, change contact info, delete contacts, etc. and write this data to a text file which I can read from every time the program is opened again and get existing contact data. However, in my program, I write to the text file in a very specific manner so I know what the format is and I can set it up to be read very easily. Since it is all formatted in a very specific manner, I want to prevent the user from opening the file and accidentally messing the data up with even just a simple space. How can I do this?
I want to prevent the user from opening the file and accidentally messing the data up...
I will advise you not to prevent users from accessing their own files. Messing with file permissions might result in some rogue files that the user won't be able to get rid of. Trust your user. If they delete or edit a sensitive file, it is their fault. Think of it this way - you have plenty of software installed on your own computer, but how often do you open them in an editor and make some damaging changes? Even if you do edit these files, does the application developer prevent you from doing so?
If you do intent to allow users to change/modify that file give them a good documentation on how to do it. This is the most apt thing to do. Also, make a backup file during run-time (see tempfile below) as an added layer of safety. Backups are almost always a good idea.
However, you can take some precautions to hide that data, so that users can't accidentally open them in an editor by double-clicking on it. There are plenty of options to do this including
Creating a binary file in a custom format
Zipping the text file using zipfile module.
Using tempfile module to create a temporary file, and zipping it using the previous option. (easy to discard if no changes needs to be saved)
Encryption
Everything from here on is not about preventing access, but about hiding the contents of your file
Note that all the above options doesn't have to be mutually exclusive. The advantages of using a zip file is that it will save some space, and it is not easy to read and edit in a text editor (binary data). It can be easily manipulated in your Python Script:
with ZipFile('spam.zip') as myzip:
with myzip.open('eggs.txt') as myfile:
print(myfile.read())
It is as simple as that! A temp file on the other hand, is a volatile (delete=True/False) file and can be discarded once you are done with it. You can easily copy its contents to another file or zip it before you close it as mentioned above.
with open tempfile.NamedTemporaryFile() as temp:
temp.write(b"Binary Data")
Again, another easy process. However, you must zip or encrypt it to achieve the final result. Now, moving on to encryption. The easiest way is an XOR cipher. Since we are simply trying to prevent 'readability' and not concerned about security, you can do the following:
recommended solution (XOR cipher):
from itertools import cycle
def xorcize(data, key):
"""Return a string of xor mutated data."""
return "".join(chr(ord(a)^ord(b)) for a, b in zip(data, cycle(key)))
data = "Something came in the mail today"
key = "Deez Nuts"
encdata = xorcize(data, key)
decdata = xorcize(encdata, key)
print(data, encdata, decdata, sep="\n")
Notice how small that function is? It is quite convenient to include it in any of your scripts. All your data can be encrypted before writing them to a file, and save it using a file extension such as ".dat" or ".contacts" or any custom name you choose. Make sure it is not opened in an editor by default (such as ".txt", ".nfo").
It is difficult to prevent user access to your data storage completely. However, you can either make it more difficult for the user to access your data or actually make it easier not to break it. In the second case, your intention would be to make it clear to the user what the rules are hope that not destroying the data is in the user's own best interest. Some examples:
Using a well established, human-readable serialization format, e.g. JSON. This is often the best solution as it actually allows an experienced user to easily inspect the data, or even modify it. Inexperienced users are unlikely to mess with the data anyways, and an experienced user knowing the format will follow the rules. At the same time, your parser will detect inconsistencies in the file structure.
Using a non-human readable, binary format, such as Pickle. Those files are likely to be left alone by the user as it is pretty clear that they are not meant to be modified outside the program.
Using a database, such as MySQL. Databases provide special protocols for data access which can be used to ensure data consistency and also make it easier to prevent unwanted access.
Assuming that you file format has a comment character, or can be modified to have one, add these lines to the top of your text file:
# Do not edit this file. This file was automatically generated.
# Any change, no matter how slight, may corrupt this file beyond repair.
The contact file belongs to your user, not to you. The best you can do is to inform the user. The best you can hope for is that the user will make intelligent use of your product.
I think the best thing to do in your case is just to choose a new file extension for your format.
It obviously doesn't prevent editing, but it clearly states for user that it has some specific format and probably shouldn't be edited manually. And GUI won't open it by default probably (it will ask what to edit it with).
And that would be enough for any case I can imagine if what you're worrying about is user messing up their own data. I don't think you can win with user who actively tries to mess up their data. Also I doubt any program does anything more. The usual "contract" is that user's data is, well, user's so it can be destroyed by the user.
If you actually won't to prevent editing you could change permissions to forbid editing with os.chmod for example. User would still be able to lift them manually and there will be some time window when you are actually writing, so it will be neither clean nor significantly more effective. And I would expect more trouble than benefit from such a solution.
If you want to actually make it impossible for a user to read/edit a file you can run your process from a different user (or use some heavier like SELinux or other MAC mechanism) and so you could make it really impossible to damage the data (with user's permissions). But it is not worth the effort if it is only about protecting the user from the not-so-catastophic effects of being careless.
Q = question
I want to access information from a csv file, for example if the file included:
NAME|DATE OF BIRTH|PHONE NUMBER|
xxx|yyyyyyyyyyyyy| zzzzzzzzzzz|
xxx|yyyyyyyyyyyyy| zzzzzzzzzzz|
xxx|yyyyyyyyyyyyy| zzzzzzzzzzz|
How would I ask a question, example:
(pseudo code)
what is the name of the person you are looking for? :
And for python to get the information and display it?
Without writing all your code for you...
Break your problem down into the different components you need to implement:
Read a .CSV file
Organize the data for retrieval (one or more dicts, keyed by each field you might want to use to look up records, is probably the simplest solution)
Get user input ("ask a question")
Retrieve the requested data
Display the requested data to the user
Figure out (i.e. research) how to implement each of these individually, and then you can write your program to do that. If you then have a specific problem with a specific piece, come back to SO with the code you've written and explain the trouble you're having.
Basically what I am trying to accomplish is have users be able to type in a certain word on one cgi script (which I currently have) and then it will save that entry in a list and display that word and the whole list on the other page. Also I will save it into a .txt file but first I am trying to figure out how to display the whole list. Right now it is only showing the keyword the user enters.
There's no way your code could ever accumulate a list of keywords over multiple posts. Firstly, CGI scripts have no state, so they will start from a blank list each time. And even if that wasn't true, you explicitly reset keywords to the blank list each time anyway.
You will need to store the list somewhere between runs. A text file will work, but only if you can guarantee that only one user will be accessing it at any one time.
Since you're new to CGI scripts, I've no idea why you are trying to learn them. There's very little good reason to use them these days. Really, you should drop the CGI scripts, use a web framework (a micro-framework like Flask would suit you), and store the list in a database (again, an unstructure "no-sql" store might be good for you).