Python convention for sub-delimiter - python

I am writing a script which takes a string as input and splits it into a list using the .split(sep = ',') function. Then, some of the items in the list will be split into sub-lists. For example:
input = 'my,string,1|2|3'
mylist = input.split(',')
mylist[2] = mylist[2].split('|')
print(mylist)
> ['my','string',['1','2','3']]
The code works without a problem. (I know which position in the list will have the sub-list.) My question is: is there any convention in Python for which delimiter should be used to separate values in a string that will eventually be converted to numbers (int or float), assuming that ',' is already used as the first delimiter?
As the programmer, I can request the string to be formatted using whichever delimiters I like. But I will have many users, so if there is a convention for separating numerical values, I would like to follow it. Note that the numbers may be float values, so I do not want to use the characters 'hyphen' or 'period' as delimiters.

I should preface this by saying I have never heard of such a convention, but I like the question. The convention for nested lists in English is to use commas for the inner list and semi-colons for the outer list, e.g.:
I have eaten: eggs, bacon, and apple for breakfast; toast, tuna, and a
banana for lunch; and chicken, salad, and potatoes for dinner.
That convention suggests input = 'my;string;1,2,3;'.
I also like the idea of using newlines: input = 'my\nstring\n1,2,3\n'. It has the benefit of being easy to read from / write to CSV.
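A minimal sketch of the semicolon/comma layout suggested above, assuming (as in the question) that the numeric sub-list is always the last field. The name parse_record and its parameters are made up for illustration.

```python
def parse_record(text, outer=";", inner=","):
    """Split text on the outer delimiter, then split the last field on the
    inner delimiter and convert its items to float.

    Assumption: the numeric sub-list is always the last field.
    """
    # rstrip tolerates a trailing outer delimiter such as 'my;string;1,2,3;'
    *labels, numbers = text.rstrip(outer).split(outer)
    return labels + [[float(n) for n in numbers.split(inner)]]


print(parse_record("my;string;1,2.5,3"))  # ['my', 'string', [1.0, 2.5, 3.0]]
```

Because neither ';' nor ',' can appear inside a float literal written with a decimal point, both delimiters stay unambiguous for int and float values.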

Related

How can I convert a 2d list from string into a python list?

I have a string, which is a 2d array, with the following fields [fruitname,qty,date,time]
Sample list:
['apples',1,'04-07-2022','16:35'],['oranges',5,'04-07-2022','18:35'],['mangoes',10,'04-07-2022','16:00']
I would like to store the above in a list in python (fruitsPurchaseList) and access it.
For example, if I wanted to get the quantity of mangoes purchased, I'd access it by something like:
mangoQty = fruitsPurchaseList[2][1]
EDIT:
The list also has some blanks.
Sample list:
['apples',1,'04-07-2022','16:35'],['oranges',5,'04-07-2022','18:35'],['mangoes',10,'04-07-2022','16:00'],['bananas',,'04-09-2022','11:00']
EDIT: original version used eval; new version uses literal_eval from the standard ast module.
If you have a string that represents a list of lists and want to turn it into a list,
you could use the built-in Python function eval()
(documentation). eval() accepts a string as an argument and
evaluates it as a Python expression (after parsing the string). The
result of the evaluated expression is returned. If you do use eval, you must be sure that the input source to eval is trusted.
Instead, you should (see discussion at end of post) use the function literal_eval (documentation) from the ast module in the Python Standard Library.
This code does what you want using literal_eval (eval achieves the same effect but see note below).
from pprint import pprint
from ast import literal_eval
# import re
s = """
[
['apples',1,'04-07-2022','16:35'],
['oranges',5,'04-07-2022','18:35'],
['mangoes',10,'04-07-2022','16:00'],
['bananas',,'04-09-2022','11:00']
]
"""
s = s.replace(",,", ",'',")
# s = re.sub(r', *,', ",'',", s)
l = literal_eval(s)
pprint(l)
print(f"\nMango Quantity: {l[2][1]}")
The above replaces all occurrences of ',,' with ",'',". If in the empty fields there are an arbitrary number of spaces between the commas (but nothing else), remove the s.replace line and uncomment the two commented-out lines. Using re.sub is a more general solution.
Output
[['apples', 1, '04-07-2022', '16:35'],
 ['oranges', 5, '04-07-2022', '18:35'],
 ['mangoes', 10, '04-07-2022', '16:00'],
 ['bananas', '', '04-09-2022', '11:00']]

Mango Quantity: 10
Safety Note
In many contexts, eval and literal_eval will achieve the same effect, as in this example. However, eval is known to be potentially very dangerous. eval takes its string argument, parses it and evaluates it as a Python expression, and this can be an arbitrary Python expression. One concern is that Python permits users to access and delete system files (e.g. via the os module), so a bad actor may supply dangerous data that you feed into eval, which could corrupt your system. This is why you need to be extra careful that you can trust the input source before supplying it to eval. literal_eval also accepts a string as an argument, but the string is restricted to only contain (quote from the literal_eval doc cited above):
Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, None and Ellipsis. This can be used for safely evaluating strings containing Python values from untrusted sources without the need to parse the values oneself. It is not capable of evaluating arbitrarily complex expressions[.]
In your case, you know that your string data represents a nested list that should only have numbers and strings as elements. So to be safe, use literal_eval, which would not evaluate a non-allowed expression. So if your list had elements that would normally evaluate to something dangerous (which eval would evaluate), literal_eval would not evaluate it -- it is restricted to Python literal structures. In this case, even if you know and absolutely trust your input source, there is no harm in using literal_eval instead.
Be safe!
Hope this helps. Please let me know if there are questions/concerns!
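To illustrate the safety note, here is a small sketch showing that literal_eval refuses anything that is not a plain literal. The wrapper name safe_parse is hypothetical, and returning None on failure is just one possible choice.

```python
from ast import literal_eval


def safe_parse(text):
    """Return the Python literal described by text, or None if text is not
    a plain literal (hypothetical helper, for illustration only)."""
    try:
        return literal_eval(text)
    except (ValueError, SyntaxError):
        # ValueError: valid syntax, but not a literal (e.g. a function call)
        # SyntaxError: not valid Python at all (e.g. an empty field ",,")
        return None


print(safe_parse("[['apples', 1, '04-07-2022', '16:35']]"))
print(safe_parse("__import__('os').getcwd()"))  # rejected: a call, not a literal
print(safe_parse("['bananas',,'04-09-2022']"))  # rejected: invalid syntax
```

The third call also shows why the blank field has to be patched (for example with s.replace) before the string is handed to literal_eval.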
You can use append() to add any element to a list in python - this includes other lists! Use a for loop to loop through each string element and append it to your new list. Access the elements as you showed in your example - listname[element index in whole list][element index in sublist].
string = ['apples',1,'04-07-2022','16:35'],['oranges',5,'04-07-2022','18:35'],['mangoes',10,'04-07-2022','16:00']
fruitsPurchaseList = []
for s in string:
    fruitsPurchaseList.append(s)
print(f"The fruits purchase list is: {fruitsPurchaseList}")
mangoQty = fruitsPurchaseList[2][1]
print(f"The mango quantity is: {mangoQty}")
Output is as follows:
The fruits purchase list is: [['apples', 1, '04-07-2022', '16:35'], ['oranges', 5, '04-07-2022', '18:35'], ['mangoes', 10, '04-07-2022', '16:00']]
The mango quantity is: 10

How to separate user's input with two separators? And controlling the users input

I want to separate the user's input using two different separators, ":" and ";".
The user should input 4 subjects and their amounts. The format should be:
(Subject:amount;Subject:amount;Subject:amount;Subject:amount)
If the input is wrong it should print "Invalid Input".
Here's my code, but I can only use one separator. How can I also control the user's input?
B = input("Enter 4 subjects and amount separated by (;) like Math:90;Science:80:").split(";")
Please help. I can't figure it out.
If you are fine with using regular expressions in python you could use the following code:
import re
output_list = re.split("[;:]", input_string)
Inside the square brackets, include all the characters (also known as delimiters) that you want to split by. Just make sure to keep the quotes around the square brackets, as that makes it a regex string (which tells the computer what to split on).
Further reading on regex can be found here if you feel like it: https://medium.com/factory-mind/regex-tutorial-a-simple-cheatsheet-by-examples-649dc1c3f285
However, if you want to do it without importing anything, here is another possible solution (I would recommend against it, but it gets the job done):
input_string = input_string.replace(";", ":")
output_list = input_string.split(":")
This works by first replacing all of the semicolons in the input string with colons (it could also work the other way around) and then splitting on the remaining character (in this case the colon).
Hope this helped, as it is my first answer on Stack Overflow.
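Neither approach above checks the input, which the question also asks for. One possible sketch, assuming subjects consist of letters and spaces and amounts are non-negative integers (both are assumptions, as are the names PAIR and parse_subjects):

```python
import re

# One "Subject:amount" pair. Assumption: letters/spaces, then digits.
PAIR = re.compile(r"([A-Za-z ]+):(\d+)")


def parse_subjects(text, expected=4):
    """Return a list of (subject, amount) tuples, or None if text does not
    contain exactly `expected` well-formed pairs separated by ';'."""
    pairs = text.strip().split(";")
    if len(pairs) != expected:
        return None
    result = []
    for pair in pairs:
        match = PAIR.fullmatch(pair.strip())
        if match is None:
            return None
        result.append((match.group(1).strip(), int(match.group(2))))
    return result


parsed = parse_subjects("Math:90;Science:80;English:85;History:70")
print(parsed if parsed is not None else "Invalid Input")
```

In the real script, the literal string would be replaced by the value returned from input().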

Python is there a way to count the number of string inserts into a string?

Let's say there's a string in Python like
'f"{var_1} is the same as {var_2}"'
or something like
"{} is the same as {}".format(var_1, var_2)
Is there a way to count the number of insertion strings that exist in the string?
I'm trying to create a function that counts the number of insertions in a string. I have code for generating middle names that can generate one or two of them, and to keep the code consistent I'd rather count the number of insertions that exist in the string.
you could use a regular expression:
import re
s = 'f"{var_1} is the same as {var_2}"'
len(list(re.finditer(r'{.+?}', s)))
output:
2
For simple cases you can just count the number of open braces
nsubst = "{var_1} is the same as {var_2}".count("{")
For complex cases, however, this doesn't work, and there is no easy solution: you need a full parser for the format syntax or have to handle quite a few special cases (the problems are, for example, escaped braces or nested field substitutions in format specs). Moreover, f-strings allow quite a big subset of valid Python expressions as fields, including literal nested dictionaries, which makes things even more complex.
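For str.format-style templates (not f-string source code), the standard library already ships a parser for the format syntax: string.Formatter.parse. A sketch, with count_fields as a made-up helper name:

```python
from string import Formatter


def count_fields(template):
    """Count the top-level replacement fields in a str.format template.

    Escaped braces ({{ and }}) are not counted. Fields nested inside a
    format spec, as in '{:{width}}', are not counted separately.
    """
    return sum(
        1
        for _literal, field_name, _spec, _conversion in Formatter().parse(template)
        if field_name is not None
    )


print(count_fields("{} is the same as {}"))            # 2
print(count_fields("{{not a field}} but {x} is one"))  # 1
```

This does not help with f-strings, since those are evaluated by the compiler and never exist as templates at run time.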

Indexing the wrong character for an expression

My program seems to be indexing the wrong character or not at all.
I wrote a basic calculator that allows expressions to be used. It works by having the user enter the expression, then turning it into a list, and indexing the first number at position 0 and then using try/except statements to index number2 and the operator. All this is in a while loop that is finished when the user enters done at the prompt.
The program seems to work fine if I type the expression like this "1+1" but if I add spaces "1 + 1" it cannot index it or it ends up indexing the operator if I do "1+1" followed by "1 + 1".
I have asked in a group chat before and someone told me to use tokenization instead of my method, but I want to understand why my program is not running properly before moving on to something else.
Here is my code:
https://hastebin.com/umabukotab.py
Thank you!
Strings are basically lists of characters. 1+1 contains three characters, whereas 1 + 1 contains five, because of the two added spaces. Thus, when you access the third character (index 2) in this longer string, you get the operator rather than the second number.
Parsing input is often not easy, and parsing arithmetic expressions in particular can get tricky quite quickly. Removing spaces from the input, as suggested by @Sethroph, is a viable solution, but it will only get you so far. If you suddenly need to support something like 1+2+3, it will still break.
Another solution would be to split your input on the operator. For example:
expression = '1 + 2'  # avoid shadowing the built-in input()
terms = expression.split('+') # ['1 ', ' 2'] note the spaces
terms = list(map(int, terms)) # [1, 2] since int() can handle leading/trailing whitespace
output = terms[0] + terms[1]
Still, although splitting can be extended to handle situations like 1 + 2 + 3 (for example by summing all the terms), it will break when multiple different operators are involved, or when there are parentheses (but that might be something you need not worry about, depending on how complex you want your calculator to be).
IMO, a better approach would indeed be to use tokenization. Personally, I'd use parser combinators, but that may be a bit overkill. For reference, here's an example calculator whose input is parsed using parsy, a parser combinator library for Python.
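As a rough idea of what tokenization means here, the sketch below uses a regular expression rather than parsy, so it is a simplified stand-in and not the linked calculator. It evaluates strictly left to right, with no operator precedence or parentheses, and all names are made up for illustration.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}
# A token is a number (optionally with a decimal part) or one operator.
TOKEN = re.compile(r"\d+(?:\.\d+)?|[-+*/]")


def tokenize(expression):
    """Return the numbers and operators in expression; whitespace is skipped."""
    return TOKEN.findall(expression)


def evaluate(expression):
    """Evaluate 'number (operator number)*' from left to right."""
    tokens = tokenize(expression)
    if len(tokens) % 2 == 0:  # also catches an empty token list
        raise ValueError(f"malformed expression: {expression!r}")
    result = float(tokens[0])
    for op, number in zip(tokens[1::2], tokens[2::2]):
        if op not in OPS:
            raise ValueError(f"expected an operator, got {op!r}")
        result = OPS[op](result, float(number))
    return result


print(tokenize("1 + 1"), tokenize("1+1"))  # both: ['1', '+', '1']
print(evaluate("1 + 2 + 3"))               # 6.0
```

Because the tokens no longer depend on character positions, "1+1" and "1 + 1" are handled identically.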
You could remove the spaces before processing the string by using replace().
Try adding in:
clean_input = hold_input.replace(" ", "")
just after you create hold_input.

Looking for a strategy for parsing a file

I'm an experienced C programmer, but a complete python newbie. I'm learning python mostly for fun, and as a first exercise want to parse a text file, extracting the meaningful bits from the fluff, and ending up with a tab-delimited string of those bits in a different order.
I've had a blast plowing through tutorials and documentation and stackoverflow Q&As, merrily splitting strings and reading lines from files and etc. Now I think I'm at the point where I need a few road signs from experienced folks to avoid blind alleys.
Here's one chunk of the text I want to parse (you may recognize this as a McMaster order). The actual file will contain one or more chunks like this.
1 92351A603 Lag Screw for Wood, 18-8 Stainless Steel, 5/16" Diameter, 5" Long, packs of 5
Your Part Number: 7218-GYROID
22
packs today
5.85
per pack 128.70
Note that the information is split over several lines in the file. I'd like to end up with a tab-delimited string that looks like this:
22\tpacks\tLag Screw for Wood, 18-8 Stainless Steel, 5/16" Diameter, 5" Long, packs of 5\t\t92351A603\t5.85\t\t128.70\t7218-GYROID\n
So I need to extract some parts of the string while ignoring others, rearrange them a bit, and re-pack them into a string.
Here's the (very early) code I have at the moment, it reads the file a line at a time, splits each line with delimiters, and I end up with several lists of strings, including a bunch of empty ones where there were double tabs:
import sys
import string
def split(delimiters, string, maxsplit=0):
    """Split the given string with the given delimiters (an array of strings)
    This function lifted from stackoverflow in a post by Kos"""
    import re
    regexPattern = '|'.join(map(re.escape, delimiters))
    return re.split(regexPattern, string, maxsplit)
delimiters = "\t", "\n", "\r", "Your Part Number: "
with open(sys.argv[1], 'r') as f:
    for line in f:
        print(split( delimiters, line))
f.close()
Question 1 is basic: how can I remove the empty strings from my lists, then mash all the strings together into one list? In C I'd loop through all the lists, ignoring the empties and sticking the other strings in a new list. But I have a feeling python has a more elegant way to do this sort of thing.
Question 2 is more open ended: what's a robust strategy here? Should I read more than one line at a time in the first place? Make a dictionary, allowing easier re-ordering of the items later?
Sorry for the novel. Thanks for any pointers. And please, stylistic comments are more than welcome, style matters.
You don't need to close the file when using with.
If I were to implement this, I might use one big regex to extract the parts from each chunk (with finditer) and reassemble them for the output.
You can remove empty strings with:
new_list = list(filter(None, old_list))
Replace the first parameter with a lambda expression that is True for elements you want to keep. Passing None is equivalent to lambda x: x. In Python 3, filter returns an iterator, hence the list() call.
You can mash strings together into one string using:
a_string = "".join(list_of_strings)
That will simply concatenate them, but you can use any non-empty string as the separator, e.g. "\t".join(list_of_strings).
If you have several lists (of whatever) and you want to join them together into one list, then:
from functools import reduce
new_list = reduce(lambda x, y: x+y, old_list)
If you're new to Python, then functions like filter and reduce may seem a bit alien (EDIT: in Python 3, filter returns an iterator and reduce has moved to the functools module), but they save a lot of time coding, so it's worth getting to know them.
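An alternative that many Python 3 programmers would reach for is a list comprehension combined with itertools.chain, which flattens the lists and drops the empty strings in one pass. The helper name flatten_nonempty is made up for this sketch.

```python
from itertools import chain


def flatten_nonempty(lists_of_strings):
    """Concatenate several lists into one, skipping empty strings."""
    return [s for s in chain.from_iterable(lists_of_strings) if s]


pieces = flatten_nonempty([["22", "", "packs"], ["", ""], ["5.85"]])
print(pieces)             # ['22', 'packs', '5.85']
print("\t".join(pieces))  # tab-delimited string
```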
I think you're on the right track to solving your problem. I'd do this:
1. break up everything into lines
2. break the resulting list into smaller lists, one list per order
3. parse the orders into "something meaningful"
4. sort, output the result
Personally, I'd make a class to handle the last two parts (they kind of belong together logically) but you could get by without it.
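The steps above could be sketched roughly as follows, without a class. This rests on assumptions the question does not confirm: every order is exactly six non-blank lines, the first line is tab-separated (line number, part number, description), and the total follows a tab on the last line. All function names and the SAMPLE constant are made up for illustration.

```python
SAMPLE = (
    '1\t92351A603\tLag Screw for Wood, 18-8 Stainless Steel, '
    '5/16" Diameter, 5" Long, packs of 5\n'
    "Your Part Number: 7218-GYROID\n"
    "22\n"
    "packs today\n"
    "5.85\n"
    "per pack\t128.70\n"
)


def group_orders(lines, size=6):
    """Yield lists of `size` non-blank lines (assumed: one order each)."""
    lines = [line for line in lines if line.strip()]
    for start in range(0, len(lines), size):
        yield lines[start:start + size]


def parse_order(lines):
    """Turn one six-line chunk into a dict of named fields."""
    _line_no, part_number, description = lines[0].split("\t", 2)
    return {
        "part_number": part_number,
        "description": description.strip(),
        "customer_part": lines[1].split(":", 1)[1].strip(),
        "quantity": lines[2].strip(),
        "unit": lines[3].split()[0],    # 'packs today' -> 'packs'
        "unit_price": lines[4].strip(),
        "total": lines[5].split()[-1],  # 'per pack<TAB>128.70' -> '128.70'
    }


def format_order(order):
    """Re-pack the fields in the order asked for; '' marks an empty column."""
    fields = [order["quantity"], order["unit"], order["description"], "",
              order["part_number"], order["unit_price"], "",
              order["total"], order["customer_part"]]
    return "\t".join(fields) + "\n"


for chunk in group_orders(SAMPLE.splitlines()):
    print(repr(format_order(parse_order(chunk))))
```

Using a dict (or a class) for the parsed order keeps the parsing separate from the output layout, so reordering the columns later only touches format_order.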
