Localization and future-proofing - python

So I know this is not a good question for the StackOverflow format, but I am at a bit of a loss. Feel free to recommend a different place to put this where it may get some eyeballs.
I am very familiar with the xgettext stuff for localizing your Python program. Currently we are saving user preferences as strings (e.g. user_prefs = {'temperature':u'\u00b0C Fahrenheit', 'volumetric':'Cubic inches', ...} ).
This works fine, and when operating in foreign languages we attempt (mostly successfully) to reverse the string localization back to English before saving it, so that the saved string is always the English version. We then localize the strings when we start the program.
However, it leads to problems later if we change 'Cubic inches' to u'in\u00b3', or even make a small change like 'Cubic inches' to 'Cubic Inches'. We have somewhat compensated for this by always doing comparisons in lower case, but it feels hackish to me, and it seems like there must be a better way to do it, such as saving an ID (be it an index into an array or even just some unique identifier). I am wondering about others' experiences with future-proofing user preferences so that they will always be recognized.
I suspect this will get closed (asking for subjective answers), but maybe I can get some good insight before it does.

It sounds like you are trying to use the English representation of your text as a unique ID for the message, but then when you change the English representation, it no longer matches the IDs in your previously stored preference files. The solution is to use a unique and permanent ID for each message. This ID could be English-readable, but you have to commit to never changing it. It's probably helpful to use a simple and standardized naming convention for this ID, with no Unicode or uppercase characters. For example, one message ID might be 'cubic inches'. Another could be 'degrees fahrenheit'.
Then, you should define internationalized text for displaying this message in every language, including English. So if you want to display the 'cubic inches' message on an English system, you will look up the English equivalent for this ID and get 'Cubic inches', 'Cubic Inches', u'in\u00b3' or whatever you like. Then your application can display that text. But you will always store the permanent message ID ('cubic inches') in the preferences file. This gives you the flexibility to change the English-language representation of the message, as shown to the user, without invalidating the IDs in previously stored preference files.
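As a rough illustration of the idea above (not the asker's actual code), here is a minimal sketch in which only permanent IDs are saved and the display text is looked up per language. The names DISPLAY_TEXT, display, save_prefs and load_prefs, the German strings, and the use of JSON for storage are all assumptions made for the example; in a gettext setup the permanent ID would typically play the role of the msgid.

```python
import json

# Hypothetical display tables: permanent message ID -> text shown to the user.
DISPLAY_TEXT = {
    "en": {"cubic inches": "Cubic inches", "degrees fahrenheit": "Fahrenheit"},
    "de": {"cubic inches": "Kubikzoll", "degrees fahrenheit": "Fahrenheit"},
}


def display(message_id, language="en"):
    """Look up the text to show for a permanent message ID."""
    return DISPLAY_TEXT[language][message_id]


def save_prefs(prefs):
    """Serialize preferences; the values are permanent IDs, never display text."""
    return json.dumps(prefs, sort_keys=True)


def load_prefs(text):
    """Read preferences back; the IDs come out exactly as they went in."""
    return json.loads(text)


user_prefs = {"temperature": "degrees fahrenheit", "volumetric": "cubic inches"}
stored = save_prefs(user_prefs)

# Later the English wording changes; the stored file is unaffected.
DISPLAY_TEXT["en"]["cubic inches"] = "Cubic Inches"

restored = load_prefs(stored)
print(display(restored["volumetric"], "en"))  # Cubic Inches
print(display(restored["volumetric"], "de"))  # Kubikzoll
```

Because the stored value never contains display text, no reverse localization or lower-case comparison is needed when loading.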

I don't get your actual problem. Where are you storing the preferences?
But I hope "base64 utf encoding decoding" can solve your problem.


Asking for help breaking down a piece of code --- Head First Python 2nd Edition (11/9/2022) pg 102

https://prnt.sc/B4pFd_w5reM0
<<< photo of page
https://prnt.sc/bfCN6MN3P9DM
<<< screenshot of code
favorite_languages = {
    'jen': 'python',
    'sarah': 'c',
    'edward': 'ruby',
    'phil': 'python',
}
friends = ['phil', 'sarah']
for name in favorite_languages.keys():
    print(f"Hi {name.title()}.")
    if name in friends:
        language = favorite_languages[name].title()
        print(f"\t{name.title()}, I see you love {language}!")
I am not sure why [name] is in brackets instead of language. We are interested in the language they prioritize and not who the person is (for the value 'language' in specific at least). So I am wondering why the brackets have 'name' inside them, and not 'language'.
Also, could someone break down what is happening in this code? I think I am just lost in general.
First we have a dictionary called favorite_languages
first column are keys, second column are values
going down, we have a list called friends, I think the words inside are called values.
for name in favorite_languages.keys():
this line of code tells the editor? right? that the keys (first column), each of the keys will be categorized as a 'name'. correct? the keys in the dictionary (named favorite_languages).
print(f"Hi {name.title()}.")
this line of code says we will print a message: Hi, with a name pulled from the dictionary with the first letter capitalized by the .title() command. Just not sure how to describe the f... all I know is that it is needed... and that it exists.
if name in friends:
as the names from the dictionary get pulled, this line of code tells the editor to check if the name pulled is identical to one of the values in the list 'friends'.
language = favorite_languages[name].title()
this is where I am stuck. the name pulled from the dictionary is the same as the value in the list.... so we tell the editor to put the name back into the dictionary... we find the value that is next to it? and then that value gets its first letter capitalized and becomes known as the value 'language'? or is it a variable? and not a value...
print(f"\t{name.title()}, I see you love {language}!")
coming back full circle. We are going to print on a new line. The message starts with the name we pulled from the dictionary that is also identical to a value found in the list. The name has its first letter capitalized. Text is added ', I see you love '. Then we add the value we created earlier called 'language' and add an exclamation point after it. Close the quotation marks and close the parenthesis.
What did I miss? Is my thought process right or wrong? Am I on the right track?
You are asking several questions at once, so I will answer them one at a time.
I will preface my answer by saying this:
Python is a programming language. A programming language consists of instructions that are provided to a computer program. The computer program simply reads one instruction at a time, and performs some work according to the contents of the instruction. Therefore, programming languages have strict rules about every operation that they support. The programming language provides a set of operations that it supports, and we build programs by combining those operations.
Moreover, programs need to be converted from text into a version of those instructions that the computer understands. Therefore the text must also follow a specific and clearly-defined syntax. The meaning of something like [] is and must be unambiguous, otherwise it would be impossible for the computer to interpret our programs.
The important conclusion here is that computer programs must be expressed in a specific way, and that every piece of text has a specific purpose and meaning. Python does not and cannot ever understand what your intentions are. Programming consists of taking your ideas and translating them into the specific operations and syntax provided by the programming language.
I write this because you seem to be taking an interpretive approach to reading code, and you seem to be lacking the fundamental knowledge required to understand it properly. You seem to be trying to guess at the meaning of code by broadly matching it up to how it looks in English. As nice as it is that Python looks like plain English text, it is not plain English text. It is a programming language like any other, and its rules are as strict as any other. If you don't know what something means, resist the temptation to guess at what it means.
I strongly suggest finding a structured learning resource, such as an online course or a book. It will guide you through all of the concepts, syntax elements, and data types that you will need to learn programming. You must learn what these things are. You will never be able to read or write code by interpreting and guessing based on visual patterns.
I am not sure why [name] is in brackets instead of language. We are interested in the language they prioritize and not who the person is (for the value 'language' in specific at least). So I am wondering why the brackets have 'name' inside them, and not 'language'.
It is in brackets because the names are the keys of the dictionary. The favorite_languages is a lookup table from names to languages. It is not a lookup table from languages to names.
In general, a dictionary is a one-way lookup table. The things on the left of : are the "keys", and the things on the right are the "values". The keys must be unique, but the values can be non-unique. The [] syntax performs lookups from keys to values only.
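To make the one-way nature concrete, here is a small sketch using the dictionary from the question. The variable name python_fans is made up for the example.

```python
favorite_languages = {
    'jen': 'python',
    'sarah': 'c',
    'edward': 'ruby',
    'phil': 'python',
}

# Key -> value: this is the lookup that [] performs.
print(favorite_languages['sarah'])  # c

# 'python' is a value, not a key, so the same syntax fails with KeyError.
try:
    favorite_languages['python']
except KeyError:
    print("'python' is not a key")

# Going from a value back to the keys requires an explicit search.
python_fans = [name for name, language in favorite_languages.items()
               if language == 'python']
print(python_fans)  # ['jen', 'phil']
```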
What you are "interested in" is relevant only insofar as it is the goal of your program. You do not get to choose what [] means, based on what you are interested in. Its meaning is built into Python and cannot be changed without modifying Python itself. You must use the tools that you are given, and cannot use tools that do not exist.
first we have a dictionary called favorite_languages first column are keys, second column are values
It's not helpful to think of a dictionary as "columns", like a spreadsheet. Think of it as a lookup table. That's why it's called a dictionary. The keys of the dictionary are like words in a traditional dictionary, and the values are like definitions.
It is important to have a clear understanding of what dictionaries do. Otherwise you won't understand when or how to use them, and you won't understand code that uses them. That might be what is happening here.
going down, we have a list called friends, I think the words inside are called values.
This is a somewhat different usage of the term "value".
Outside the context of dictionaries, "value" does not have a strict technical meaning. The word is usually used to distinguish values, which are tangible things, from variables, which are placeholders for things and do not mean anything on their own.
So when we talk about the values in a list, we are just talking about the things inside the list. People also use the word elements to mean the same thing.
editor? right?
I assume you are talking about an "editor" as in a "text editor" or "code editor". No. That's just a program that you use to edit text. It does not run your code. Some code editors like VS Code have built-in features that help you write code. But your code never "tells" your text editor anything. The code exists to be interpreted by Python. Anything else is just an attempt to help you write the code.
for name in favorite_languages.keys(): this line of code tells the editor? right? that the keys (first column), each of the keys will be categorized as a 'name'. correct? the keys in the dictionary (named favorite_languages).
No. This code does not categorize anything.
The .keys() method accesses the keys of a dictionary (as explained above). This loops over favorite_languages.keys() and assigns the looped values to name.
The syntax for x in things: loops over the elements of things. It assigns the first element to x, then runs the code in the block, then assigns the second element to x, then runs the code in the block, etc. Loops are an essential tool in programming, and it is very important that you spend time on understanding how they work and how to use them.
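The description above can be checked by writing the loop out by hand. This is only a sketch with a made-up three-element list; the names things, looped and unrolled are not from the book.

```python
things = ['jen', 'sarah', 'edward']

# The loop form: x takes each element in turn and the block runs once per element.
looped = []
for x in things:
    looped.append(x.title())

# The same steps written out by hand for a three-element list.
unrolled = []
x = things[0]
unrolled.append(x.title())
x = things[1]
unrolled.append(x.title())
x = things[2]
unrolled.append(x.title())

print(looped)              # ['Jen', 'Sarah', 'Edward']
print(looped == unrolled)  # True
```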
print(f"Hi {name.title()}.")
Your understanding seems correct.
name is the key from favorite_languages.keys(), because of for name in favorite_languages.keys():. We also happen to know that this corresponds to people's names.
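Since the question mentions not knowing how to describe the f, a short comparison may help. The f prefix marks the string as a formatted string literal, so the expressions inside {} are evaluated and their results inserted; without the prefix the braces are just ordinary characters. The variable names below are only for illustration.

```python
name = 'phil'

with_f = f"Hi {name.title()}."     # {name.title()} is evaluated
without_f = "Hi {name.title()}."   # plain text, nothing is evaluated

print(with_f)     # Hi Phil.
print(without_f)  # Hi {name.title()}.
```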
if name in friends:
Correct. name in friends is an expression that returns True if and only if the value of name is equal to one of the elements in friends.
The if then executes the code inside the block if and only if the result is True (or is equivalent to True in a specific sense that you don't need to worry about right now).
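A quick sketch of what the expression evaluates to, using the list from the question. The greeted list is an assumption added so the effect of the if is visible.

```python
friends = ['phil', 'sarah']

# "in" produces a plain True or False.
print('phil' in friends)  # True
print('jen' in friends)   # False
print('Phil' in friends)  # False, because the comparison is case-sensitive

# The if block only runs for names where the expression is True.
greeted = []
for name in ['jen', 'phil']:
    if name in friends:
        greeted.append(name)

print(greeted)  # ['phil']
```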
language = favorite_languages[name].title()
This code does not put anything back into the dictionary. This code does not operate by finding values "next to" anything.
favorite_languages is a dictionary, i.e. a lookup table. This code uses the value of name to look up a language in favorite_languages.
Finally, it assigns the result to the language variable. From the perspective of Python, this variable is completely unrelated to the favorite_languages. It is a completely separate placeholder.
The fact that they both say "languages" is relevant to you, the reader and programmer, but not relevant to Python.
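Here is the same line split into its individual steps, as a sketch. The names stored_value and banana are deliberately arbitrary, to show that the variable name has no connection to the dictionary.

```python
favorite_languages = {
    'jen': 'python',
    'sarah': 'c',
    'edward': 'ruby',
    'phil': 'python',
}
name = 'phil'

stored_value = favorite_languages[name]  # look up the key 'phil', giving 'python'
language = stored_value.title()          # capitalize the result, giving 'Python'

# The same thing in one line, stored under an unrelated variable name.
banana = favorite_languages[name].title()

print(language)                    # Python
print(banana)                      # Python
print(favorite_languages['phil'])  # python (the dictionary itself is unchanged)
```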
print(f"\t{name.title()}, I see you love {language}!")
\t is a tab character, not a new line.
Otherwise yes, your interpretation is correct.
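To see the difference between the two escape sequences, repr() can be used to show the characters that are actually in the string. This is just an illustrative sketch; the variables with_tab and with_newline are made up.

```python
name = 'phil'
language = 'Python'

with_tab = f"\t{name.title()}, I see you love {language}!"
with_newline = f"\n{name.title()}, I see you love {language}!"

# repr() shows escape sequences instead of rendering them.
print(repr(with_tab))      # '\tPhil, I see you love Python!'
print(repr(with_newline))  # '\nPhil, I see you love Python!'
```

The message still ends up on its own line in the original program, but that is because each print() call ends with a newline by default, not because of \t.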

How to prevent user from saving comma at integer field in whole Odoo system?

I noticed that whenever you type a comma in an integer field, Odoo (or Python) automatically removes that comma and merges the numbers (for example, whenever you type 1,3 it becomes 13). If I type 1.3 or 1;3 etc., everything is fine. I could probably do something like @api.constrains for a field, but how could I fix this for the whole system?
Thank you for your time considering my question.
This depends on the language the user has chosen. Under Settings/Translations/Languages (in newer versions of Odoo you first need to activate developer mode to see this menu), choose a language and look at "Thousands Separator" and "Decimal Separator".
For example using German language it's the other way around, because we use comma as decimal separator and dot as thousands separator.
So nothing to program here, it's just a configuration issue.
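The effect can be mimicked in plain Python. To be clear, this is not Odoo's parsing code; parse_integer and its thousands_sep parameter are hypothetical, and the sketch only assumes that the configured thousands separator is stripped before the text is converted to an integer.

```python
def parse_integer(text, thousands_sep):
    """Hypothetical parser: drop the grouping separator, then convert."""
    return int(text.replace(thousands_sep, ""))


# Comma configured as thousands separator: "1,3" collapses to 13.
print(parse_integer("1,3", thousands_sep=","))  # 13

# Dot configured as thousands separator (German-style): now "1.3" collapses.
print(parse_integer("1.3", thousands_sep="."))  # 13

# With comma as thousands separator, "1.3" is not a valid integer at all.
try:
    parse_integer("1.3", thousands_sep=",")
except ValueError:
    print("1.3 rejected")
```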

Django - short non-linear non-predictable ID in the URL

I know there are similar questions (like this, this, this and this), but I have specific requirements and am looking for a less expensive way to do the following (on Django 1.10.2):
Looking to not have sequential/guessable integer ids in the URLs and ideally meet the following requirements:
Avoid UUIDs since that makes the URL really long.
Avoid a custom primary key. It doesn’t seem to work well if the models have ManyToManyFields. Got affected by at least three bugs while trying that (#25012, #24030 and #22997), including messing up the migrations and having to delete the entire db and recreating the migrations (well, lots of good learning too)
Avoid checking for collisions if possible (hence avoid a db lookup for every insert)
Don’t just want to look up by the slug since it’s less performant than just looking up an integer id.
Don’t care too much about encrypting the id - just don’t want it to be a visibly sequential integer.
Note: The app would likely have 5 million records or so in the long term.
After researching a lot of options on SO, blogs etc., I ended up doing the following:
Encoding the id to base32 only for the URLs and decoding it back in urls.py (using an edited version of Django’s util functions to encode to base 36 since I needed uppercase letters instead of lowercase).
Not storing the encoded id anywhere. Just encoding and decoding every time on the fly.
Keeping the default id intact and using it as primary key.
(good hints, posts and especially this comment helped a lot)
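For readers who want to see the shape of this, here is a minimal sketch of encoding and decoding an integer id with an uppercase alphabet. It is not the edited Django utility mentioned above; the names ALPHABET, encode_id and decode_id are made up, and base 36 is assumed.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # 36 symbols, uppercase


def encode_id(number):
    """Turn a non-negative integer id into a short uppercase string."""
    if number < 0:
        raise ValueError("id must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, len(ALPHABET))
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_id(text):
    """Turn the encoded string back into the integer id."""
    return int(text, len(ALPHABET))


print(encode_id(5_000_000))  # 2Z60W
print(decode_id("2Z60W"))    # 5000000
```

In a Django project the decode step would sit wherever the URL parameter is turned into a primary key, and a URL pattern restricted to [0-9A-Z]+ would keep other input out; both of those are left out of the sketch.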
What this solution helps achieve:
Absolutely no edits to models or post_save signals.
No collision checks needed. Avoiding one extra request to the db.
Lookup still happens on the default id, which is fast. Also, no double save() requests on the model for every insert.
Short and sweet encoded ID (the number of characters goes up as the number of records increases, but it is still not very long)
What it doesn’t help achieve/any drawbacks:
Encryption - the ID is encoded but not encrypted, so the user may still be able to figure out the pattern to get to the id (but I don’t care about it much, as mentioned above).
A tiny overhead of encoding and decoding on each URL construction/request but perhaps that’s better than collision checks and/or multiple save() calls on the model object for insertions.
For reference, looks like there are multiple ways to generate random IDs that I discovered along the way (like Django’s get_random_string, Python’s random, Django’s UUIDField etc.) and many ways to encode the current ID (base 36, base 62, XORing, and what not).
The encoded ID can also be stored as another (indexed) field and looked up every time (like here), but that depends on the performance parameters of the web app (since looking up a varchar id is less performant than looking up an integer id). This identifier field can be saved either from an overridden save() function on the model or by using a post_save() signal (see here), though both approaches need the save() function to be called twice for every insert.
All ears for optimizations to the above approach. I love SO and the community. Every time there’s so much to learn here.
Update: More than a year after this post, I found this great library called hashids which does pretty much the same thing quite well! It’s available in many languages including Python.

Is using strings as an object identifier bad practice?

I am developing a small app for managing my favourite recipes. I have two classes - Ingredient and Recipe. A Recipe consists of Ingredients and some additional data (preparation, etc.). The reason I have an Ingredient class is that I want to save some additional info in it (proper technique, etc.). Ingredients are unique, so there cannot be two with the same name.
Currently I am holding all ingredients in a "big" dictionary, using the name of the ingredient as the key. This is useful, as I can ask my model whether an ingredient is already registered and use it (including all its other data) for a newly created recipe.
But thinking back to when I started programming (Java/C++), I always read that using strings as an identifier is bad practice. "The Magic String" was a keyword that I often read (but I think that describes another problem). I really like the string approach as it is right now. I don't have problems with encoding either, because all string generation/comparison is done within my program (Python3 uses UTF-8 everywhere if I am not mistaken), but I am not sure if what I am doing is the right way to do it.
Is using strings as an object identifier bad practice? Are there differences between different languages? Can strings prove to be a performance issue if the amount of data increases? What are the alternatives?
No. Actually, identifiers in Python are always strings, whether you keep them in a dictionary yourself (you say you are using a "big dictionary") or the object is used programmatically, with a name hard-coded into the source code. In this latter case, Python creates the name in one of its automatically handled internal dictionaries (which can be inspected as the return value of globals() or locals()).
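A small sketch of that point, using locals() inside a function; globals() exposes module-level names in the same way. The function name demo and the variable recipe_count are made up for the example.

```python
def demo():
    recipe_count = 3
    # locals() exposes the function's variables as a dict keyed by name.
    return locals()


names = demo()
print(names)                                        # {'recipe_count': 3}
print(all(isinstance(key, str) for key in names))   # True
```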
Moreover, Python does not use "utf-8" internally, it does use "unicode" - which means it is simply text, and you should not worry how that text is represented in actual bytes.
Python relies on dictionaries for many of its core features. For that reason the built-in dict already comes with an efficient, fast implementation out of the box, a decent hash function, etc.
Considering that, the dictionary performance itself should not be a concern for what you need (occasional reads and writes), although the way you handle and store it (in a Python file, JSON, pickle, gzip, etc.) could affect load and access time.
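For what it's worth, a registry keyed by ingredient name might look roughly like the sketch below. The question does not show its code, so the Ingredient fields, the ingredients dictionary and get_or_register are guesses, and normalizing the key with strip() and lower() is an extra assumption so that "Basil" and "basil " are treated as the same ingredient.

```python
class Ingredient:
    def __init__(self, name, technique=""):
        self.name = name
        self.technique = technique


ingredients = {}  # the "big" dictionary: normalized name -> Ingredient


def get_or_register(name, technique=""):
    """Return the existing Ingredient for this name, or create and store one."""
    key = name.strip().lower()  # assumption: names compare case-insensitively
    if key not in ingredients:
        ingredients[key] = Ingredient(name.strip(), technique)
    return ingredients[key]


first = get_or_register("Basil", technique="tear, don't chop")
second = get_or_register("basil ")

print(first is second)   # True
print(len(ingredients))  # 1
```

Both the membership test and the lookup are hash-based, so they do not slow down in proportion to the number of ingredients.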
Maybe if you provide a few lines of code showing us how you deal with the dictionary we could provide specific details.
About the string identifier, check jsbueno's answer; he gave a much better explanation than I could.

Is it possible to inject shell/python commands from a configuration file?

Say you have some metadata for a custom file format that your Python app reads. Something like a CSV with variables that can change as the file is manipulated:
var1,data1
var2,data2
var3,data3
So if the user can manipulate this metadata, do you have to worry about someone crafting a malformed metadata file that will allow some arbitrary code execution? The only thing I can imagine is if you made the poor choice to make var1 a shell command that you execute with os.system(data1) somewhere in your own code. Also, if this were C then you would have to worry about buffers being blown, but I don't think you have to worry about that with Python. If you're reading in that data as a string, is it possible to somehow escape the string, e.g. with "\n os.system('rm -r /')"? This SQL-like example totally won't work, but is there something similar that is possible?
If you are doing what you say there (plain text, just reading and parsing a simple format), you will be safe. As you indicate, Python is generally safe from the more mundane memory corruption errors that C developers can create if they are not careful. The SQL injection scenario you note is not a concern when simply reading in files in python.
However, if you are concerned about security, which it seems you are (interjection: good for you! A good programmer should be lazy and paranoid), here are some things to consider:
Validate all input. Make sure that each piece of data you read is of the expected size, type, range, etc. Error early, and don't propagate tainted variables elsewhere in your code.
Do you know the expected names of the vars, or at least their format? Make sure to validate that each name is the kind of thing you expect before you use it. If it should be just letters, confirm that with a regex or similar.
Do you know the expected range or format of the data? If you're expecting a number, make sure it's a number before you use it. If it's supposed to be a short string, verify the length; you get the idea.
What if you get characters or bytes you don't expect? What if someone throws unicode at you?
If any of these are paths, make sure you canonicalize and know that the path points to an acceptable location before you read or write.
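The validation points above could be sketched like this for the var,data format from the question. The name pattern, the length limit, and the function name parse_metadata are assumptions chosen for the example, not rules taken from the question.

```python
import csv
import io
import re

# Assumed rules: names are short identifiers, values are at most 256 characters.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,31}")
MAX_VALUE_LENGTH = 256


def parse_metadata(text):
    """Parse 'name,value' rows, failing early on anything unexpected."""
    result = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 2:
            raise ValueError(f"expected 2 fields, got {len(row)}")
        name, value = row
        if not NAME_PATTERN.fullmatch(name):
            raise ValueError(f"unexpected variable name: {name!r}")
        if len(value) > MAX_VALUE_LENGTH:
            raise ValueError(f"value for {name!r} is too long")
        result[name] = value
    return result


print(parse_metadata("var1,data1\nvar2,data2\n"))
# {'var1': 'data1', 'var2': 'data2'}

try:
    parse_metadata("var1; rm -r /,data1\n")
except ValueError as error:
    print("rejected:", error)
```

Note that even the rejected input is only ever treated as text here; nothing in the parser executes it.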
Some specific things not to do:
os.system(attackerControlledString)
eval(attackerControlledString)
__import__(attackerControlledString)
pickle/unpickle attacker controlled content (here's why)
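If the file really does need to select behaviour or carry structured values, two common safer patterns are an allowlist lookup instead of os.system or eval, and ast.literal_eval for plain Python literals. The sketch below is illustrative only; ALLOWED_ACTIONS, run_action and parse_literal are hypothetical names.

```python
import ast

# Allowlist: the file may only name one of these predefined actions.
ALLOWED_ACTIONS = {"upper": str.upper, "lower": str.lower, "strip": str.strip}


def run_action(action_name, argument):
    """Run a predefined action; anything not in the allowlist is rejected."""
    action = ALLOWED_ACTIONS.get(action_name)
    if action is None:
        raise ValueError(f"unknown action: {action_name!r}")
    return action(argument)


def parse_literal(text):
    """Parse numbers, strings, lists, dicts and so on without executing code."""
    return ast.literal_eval(text)


print(run_action("upper", "data1"))  # DATA1
print(parse_literal("[1, 2, 3]"))    # [1, 2, 3]

try:
    run_action("rm -r /", "data1")
except ValueError as error:
    print("rejected:", error)

try:
    parse_literal("__import__('os').system('echo hi')")
except ValueError:
    print("literal_eval refused to evaluate a function call")
```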
Also, rather than rolling your own config file format, consider ConfigParser or something like JSON. A well understood format (and libraries) helps you get a leg up on proper validation.
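As a quick sketch of what that looks like with the standard library (the module is named configparser in Python 3), here is the same var/data content read both ways. The section name metadata and the sample values are made up.

```python
import configparser
import json

# INI-style text parsed with configparser; all values come back as strings.
parser = configparser.ConfigParser()
parser.read_string("[metadata]\nvar1 = data1\nvar2 = data2\n")
print(parser["metadata"]["var1"])  # data1

# The same idea as JSON; types such as numbers are preserved.
loaded = json.loads('{"var1": "data1", "var3": 3}')
print(loaded["var3"] + 1)  # 4
```

Both parsers only produce data, so the values still need the same validation as before.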
OWASP would be my normal go-to for providing a "further reading" link, but their Input Validation page needs help. In lieu, this looks like a reasonably pragmatic read: "Secure Programmer: Validating Input". A slightly dated but more Python-specific one is "Dealing with User Input in Python".
Depends entirely on the way the file is processed, but generally this should be safe. In Python, you have to put in some effort if you want to treat text as code and execute it.
