How to detect if a file path is wrapped in " .. " with Python? - python

I read the ini file to open a file in python.
The thing is that the file info is sometimes inside the "..", but sometimes it's not.
For example,
fileA = "/a/b/c.txt"
fileB = /a/b/d.txt
Is there an easy way to detect whether a string is wrapped in "..", and return the string inside the quotation marks?

The simple detection would involve checking s[:1] == s[-1:] == '"' (carefully phrasing it with slicing rather than indexing to avoid exceptions if s is an empty string), and the conditional removal of exactly one quote from each end if one is present at both ends is
if s[:1] == s[-1:] == '"':
    s = s[1:-1]
Alternatively, the approach in @Magnus's answer, as he says, removes all leading and trailing quotes, and does so unconditionally; so, for example, if s starts with three quotes but doesn't end with any (and in all sorts of other weird cases, outside of your specs as stated), the snippet in my answer won't alter s, while @Magnus's will strip the three leading quotes.
"You pay your money and you take your choice"... if you don't care one way or another (i.e. you're sure that the situation where the two answers differ is "totally and utterly impossible"...), then I think @Magnus's higher-abstraction-level approach is neater (but, it's a matter of style -- both his approach and mine are correct Python solutions when you don't care about unmatched or unbalanced quotes;-).

To remove all leading and trailing quotes:
fileA = fileA.strip('"')
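To make the difference between the two answers concrete, here is a small sketch that runs both on a few inputs. The helper name unquote and the sample paths are only for illustration; they are not part of either answer.

```python
def unquote(s):
    """Remove one double quote from each end, but only if both ends have one."""
    # Slicing instead of indexing keeps this safe for the empty string.
    if s[:1] == s[-1:] == '"':
        return s[1:-1]
    return s


samples = ['"/a/b/c.txt"', '/a/b/d.txt', '"""/a/b/e.txt', '']
for path in samples:
    # Left: conditional removal. Right: unconditional strip of all quotes.
    print(repr(unquote(path)), 'vs', repr(path.strip('"')))
```

The two approaches agree on the first, second, and last sample; only the unbalanced third one is treated differently.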

Related

Is SPLIT doing strange things? [duplicate]

What is the point of '/segment/segment/'.split('/') returning ['', 'segment', 'segment', '']?
Notice the empty elements. If you're splitting on a delimiter that happens to be at position one and at the very end of a string, what extra value does it give you to have the empty string returned from each end?
str.split complements str.join, so
"/".join(['', 'segment', 'segment', ''])
gets you back the original string.
If the empty strings were not there, the first and last '/' would be missing after the join().
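A short sketch of that round trip, using the string from the question (the variable names are just for illustration):

```python
s = '/segment/segment/'
parts = s.split('/')
print(parts)                   # ['', 'segment', 'segment', '']
print('/'.join(parts) == s)    # True: join undoes split

# Dropping the empty strings loses the leading and trailing '/'.
trimmed = [p for p in parts if p]
print('/'.join(trimmed))       # segment/segment
```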
More generally, to remove empty strings returned in split() results, you may want to look at the filter function.
Example:
f = filter(None, '/segment/segment/'.split('/'))
s_all = list(f)
returns
['segment', 'segment']
There are two main points to consider here:
Expecting the result of '/segment/segment/'.split('/') to be equal to ['segment', 'segment'] is reasonable, but then this loses information. If split() worked the way you wanted, if I tell you that a.split('/') == ['segment', 'segment'], you can't tell me what a was.
What should the result of 'a//b'.split('/') be? ['a', 'b'], or ['a', '', 'b']? I.e., should split() merge adjacent delimiters? If it should, then it will be very hard to parse data that's delimited by a character, and some of the fields can be empty. I am fairly sure there are many people who do want the empty values in the result for the above case!
In the end, it boils down to two things:
Consistency: if I have n delimiters in a, I get n+1 values back after the split().
It should be possible to do complex things, and easy to do simple things: if you want to ignore empty strings as a result of the split(), you can always do:
def mysplit(s, delim=None):
    return [x for x in s.split(delim) if x]
but if one doesn't want to ignore the empty values, one should be able to.
The language has to pick one definition of split()—there are too many different use cases to satisfy everyone's requirement as a default. I think that Python's choice is a good one, and is the most logical. (As an aside, one of the reasons I don't like C's strtok() is because it merges adjacent delimiters, making it extremely hard to do serious parsing/tokenization with it.)
There is one exception: a.split() without an argument squeezes consecutive white-space, but one can argue that this is the right thing to do in that case. If you don't want the behavior, you can always do a.split(' ').
I'm not sure what kind of answer you're looking for. You get four items because you have three delimiters. If you don't want the empty ones, just use:
'/segment/segment/'.strip('/').split('/')
Having x.split(y) always return a list of 1 + x.count(y) items is a precious regularity -- as @gnibbler's already pointed out it makes split and join exact inverses of each other (as they obviously should be), it also precisely maps the semantics of all kinds of delimiter-joined records (such as csv file lines [[net of quoting issues]], lines from /etc/group in Unix, and so on), it allows (as @Roman's answer mentioned) easy checks for (e.g.) absolute vs relative paths (in file paths and URLs), and so forth.
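Both regularities can be checked mechanically. The following is only a sketch: the helper name holds and the sample strings (including the made-up /etc/group-style line) are assumptions for illustration.

```python
def holds(x, y):
    """Check the two invariants for string x and a non-empty delimiter y."""
    parts = x.split(y)
    return len(parts) == 1 + x.count(y) and y.join(parts) == x


samples = ['/segment/segment/', 'a//b', '/', '', 'wheel:x:10:']
for x in samples:
    for y in ['/', ':']:
        print(repr(x), repr(y), holds(x, y))
```

Every combination prints True, including the empty string, which splits into a single empty field.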
Another way to look at it is that you shouldn't wantonly toss information out of the window for no gain. What would be gained in making x.split(y) equivalent to x.strip(y).split(y)? Nothing, of course -- it's easy to use the second form when that's what you mean, but if the first form was arbitrarily deemed to mean the second one, you'd have a lot of work to do when you do want the first one (which is far from rare, as the previous paragraph points out).
But really, thinking in terms of mathematical regularity is the simplest and most general way you can teach yourself to design passable APIs. To take a different example, it's very important that for any valid x and y, x == x[:y] + x[y:] -- which immediately indicates why one extreme of a slicing should be excluded. The simpler the invariant assertion you can formulate, the likelier it is that the resulting semantics are what you need in real life uses -- part of the mystical fact that maths is very useful in dealing with the universe.
Try formulating the invariant for a split dialect in which leading and trailing delimiters are special-cased... counter-example: string methods such as isspace are not maximally simple -- x.isspace() is equivalent to x and all(c in string.whitespace for c in x) -- that silly leading x and is why you so often find yourself coding not x or x.isspace(), to get back to the simplicity which should have been designed into the is... string methods (whereby an empty string "is" anything you want -- contrary to man-in-the-street horse-sense, maybe [[empty sets, like zero &c, have always confused most people;-)]], but fully conforming to obvious well-refined mathematical common-sense!-).
Well, it lets you know there was a delimiter there. So, seeing 4 results lets you know you had 3 delimiters. This gives you the power to do whatever you want with this information, rather than having Python drop the empty elements, and then making you manually check for starting or ending delimiters if you need to know it.
Simple example: Say you want to check for absolute vs. relative filenames. This way you can do it all with the split, without also having to check what the first character of your filename is.
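As a rough sketch of that idea (the function name is_absolute is made up here, and it only considers '/'-separated paths; in real code os.path.isabs is the usual tool):

```python
def is_absolute(path):
    """Treat a '/'-separated path as absolute if it starts with a delimiter."""
    parts = path.split('/')
    # A leading '/' shows up as an empty first element followed by more items.
    return len(parts) > 1 and parts[0] == ''


for path in ['/usr/lib/python', 'usr/lib/python', '/', '']:
    print(repr(path), is_absolute(path))
```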
Consider this minimal example:
>>> '/'.split('/')
['', '']
split must give you what's before and after the delimiter '/', but there are no other characters. So it has to give you the empty string, which technically precedes and follows the '/', because '' + '/' + '' == '/'.
If you don't want empty strings to be returned by split, use it without arguments (it then splits on runs of whitespace).
>>> " this is a sentence ".split()
['this', 'is', 'a', 'sentence']
>>> " this is a sentence ".split(" ")
['', '', 'this', '', '', 'is', '', 'a', 'sentence', '']
Use strip before split if you want to ignore leading and trailing blank lines.
youroutput.strip().split('splitter')
Example:
yourstring =' \nhey\njohn\nhow\n\nare\nyou'
yourstring.strip().split('\n')
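Running this example shows that strip only deals with the ends of the string; the empty line in the middle still produces an empty element. A possible follow-up, sketched here with a list comprehension, is to filter those out as well:

```python
yourstring = ' \nhey\njohn\nhow\n\nare\nyou'

lines = yourstring.strip().split('\n')
print(lines)       # ['hey', 'john', 'how', '', 'are', 'you']

# Drop lines that are empty or contain only whitespace.
non_blank = [line for line in lines if line.strip()]
print(non_blank)   # ['hey', 'john', 'how', 'are', 'you']
```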

Indexing the wrong character for an expression

My program seems to be indexing the wrong character or not at all.
I wrote a basic calculator that allows expressions to be used. It works by having the user enter the expression, then turning it into a list, and indexing the first number at position 0 and then using try/except statements to index number2 and the operator. All this is in a while loop that is finished when the user enters done at the prompt.
The program seems to work fine if I type the expression like this "1+1" but if I add spaces "1 + 1" it cannot index it or it ends up indexing the operator if I do "1+1" followed by "1 + 1".
I have asked in a group chat before and someone told me to use tokenization instead of my method, but I want to understand why my program is not running properly before moving on to something else.
Here is my code:
https://hastebin.com/umabukotab.py
Thank you!
Strings are basically sequences of characters. 1+1 contains three characters, whereas 1 + 1 contains five, because of the two added spaces. Thus, the index that holds the second number in 1+1 (index 2) holds the operator, the middle character, in 1 + 1.
Parsing input is often not easy, and certainly parsing arithmetic expressions can get tricky quite quickly. Removing spaces from the input, as suggested by @Sethroph, is a viable solution, but will only go so far. If you all of a sudden need to support stuff like 1+2+3, it will still break.
Another solution would be to split your input on the operator. For example:
expression = '1 + 2'             # renamed from input to avoid shadowing the built-in
terms = expression.split('+')    # ['1 ', ' 2'] note the spaces
terms = [int(t) for t in terms]  # [1, 2] since int() can handle leading/trailing whitespace
output = terms[0] + terms[1]
Still, although this can be extended to handle situations like 1 + 2 + 3, it will still break when there are multiple different operators involved, or when there are parentheses (but that might be something you need not worry about, depending on how complex you want your calculator to be).
IMO, a better approach would indeed be to use tokenization. Personally, I'd use parser combinators, but that may be a bit overkill. For reference, here's an example calculator whose input is parsed using parsy, a parser combinator library for Python.
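To give an idea of what tokenization means here, the following is a minimal sketch using only the standard re module (not parsy). The names TOKEN, tokenize, and evaluate are made up for this example, and the evaluator deliberately works strictly left to right, with no operator precedence and no parentheses.

```python
import operator
import re

# A token is a run of digits or a single operator, with optional leading spaces.
TOKEN = re.compile(r'\s*(\d+|[-+*/])')
OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}


def tokenize(text):
    """Split an expression into number and operator tokens, ignoring spaces."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError('unexpected character at position %d' % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text):
    """Evaluate 'number (operator number)*' strictly left to right."""
    tokens = tokenize(text)
    result = int(tokens[0])
    for op, number in zip(tokens[1::2], tokens[2::2]):
        result = OPS[op](result, int(number))
    return result


print(tokenize('1 + 1'), tokenize('1+1'))   # both give ['1', '+', '1']
print(evaluate('1 + 2 + 3'))                # 6
```

Because the spaces are consumed by the tokenizer, the position of the operator no longer depends on how the user typed the expression.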
You could remove the spaces before processing the string by using replace().
Try adding in:
clean_input = hold_input.replace(" ", "")
just after you create hold_input.

Python CSV writer, how to handle quotes in order to avoid triple quotes in output

I am working with Python's CSV module, specifically the writer. My question is how can I add double quotes to a single item in a list and have the writer write the string the same way as a print statement would?
for example:
import csv
#test "data"
test = ['item1','01','001',1]
csvOut = csv.writer(open('file.txt','a')) #'a' used for keeping past results
test[1] = '"'+test[1]+'"'
print(test)
#prints: ['item1', '"01"', '001', 1]
csvOut.writerow(test)
#written in the output file: item1,"""01""",001,1
#I was expecting: item1,"01",001,1
del csvOut
I tried adding a quoting=csv.QUOTE_NONE option, but that raised an error. I am guessing this is related to the many csv dialects, I was hoping to avoid digging too far into that.
In retrospect I could probably have built my initial data set smarter and perhaps avoided the need for this situation but at this point curiosity is really getting the better of me (this is a simplified example): how do you keep the written output from adding those extra quotes?
It's not actually triple-quoting, although it looks that way. Try it with another example to see:
test = ['item1', 'abc"def']
Now you'll see that it writes this:
"abc""def"
In other words, it's just wrapping quotes around your string, and escaping the literal quote characters by doubling them, because that's how default Excel-style CSV handles quote characters.
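The doubling is easier to see, and to trust, when the output is read back. This sketch assumes Python 3 and writes to an in-memory io.StringIO buffer instead of a file:

```python
import csv
import io

buf = io.StringIO()
csv.writer(buf).writerow(['item1', 'abc"def', '"01"'])
print(buf.getvalue())   # item1,"abc""def","""01"""

# Reading the line back undoes the quoting and the doubling.
rows = list(csv.reader(io.StringIO(buf.getvalue())))
print(rows[0])          # ['item1', 'abc"def', '"01"']
```

So the file content looks odd to a human, but any CSV reader using the same dialect recovers the original strings, embedded quotes included.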
The question is, what format do you want here? Almost anything you want (within reason) is doable, but you have to pick something. Backslash-escaping quotes? Backslash-escaping everything instead of using quotes in the first place? Single quotes instead of double quotes?
For example, this looks like an answer:
csvOut = csv.writer(open('file.txt','a'), quotechar="'")
… until you have an item like Filet O'Fish and the whole thing gets single-quoted and the ' gets doubled and you have the exact same problem you were trying to avoid. If you're aiming for human readability, and ' is a lot less common in your data than ", that may actually be the right answer, but it's not a perfect answer.
And really, no answer can be perfect: you need some way to either quote or escape commas—and other things, like newlines—and the way you do that is going to add at least one more character that needs to be quote-doubled or escaped. If you know there are never any commas, newlines, etc. in your data, and there's at least one other character you know will never show up, you can get away with setting either quotechar to that other character, or escapechar to that other character and quoting=QUOTE_NONE. But the first time someone unexpectedly uses the character you were sure would never appear, your code will break, so you'd better actually be sure.
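As an illustration of the quotechar variant, here is a sketch that assumes Python 3 and picks '|' as the character that supposedly never appears in the data. That choice, and the helper name to_csv_line, are assumptions made for this example.

```python
import csv
import io


def to_csv_line(row, **fmt):
    """Write one row with csv.writer and return the resulting text."""
    buf = io.StringIO()
    csv.writer(buf, **fmt).writerow(row)
    return buf.getvalue()


# With '|' as the quote character, '"' is no longer special.
print(to_csv_line(['item1', '"01"', '001', 1], quotechar='|'))  # item1,"01",001,1
# A comma in the data still forces quoting, now with '|'.
print(to_csv_line(['a,b', '"01"'], quotechar='|'))              # |a,b|,"01"
# And if '|' does show up after all, it is the one that gets doubled.
print(to_csv_line(['x|y', '"01"'], quotechar='|'))              # |x||y|,"01"
```

The last line shows the same problem moved to a different character, which is the point made above.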
Quotes get escaped because your data could contain a comma. You probably don't want a CSV file if you don't want quotes escaped. Just join on a comma (this will break downstream if your data has a comma in it).
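A minimal sketch of the plain join, using the list from the question, followed by a made-up row that shows how it breaks:

```python
test = ['item1', '"01"', '001', 1]

# str() is needed because the list mixes strings and numbers.
line = ','.join(str(item) for item in test)
print(line)               # item1,"01",001,1

# Nothing protects a comma inside a field, so two fields come back as three.
risky = ','.join(['Smith, John', '42'])
print(risky.split(','))   # ['Smith', ' John', '42']
```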

Python style for `chained` function calls

More and more we use chained function calls:
value = get_row_data(original_parameters).refine_data(level=3).transfer_to_style_c()
It can get long. To avoid overly long lines, which is preferred?
value = get_row_data(
    original_parameters).refine_data(
    level=3).transfer_to_style_c()
or:
value = get_row_data(original_parameters)\
    .refine_data(level=3)\
    .transfer_to_style_c()
I feel it is good to use a backslash \ and put each .function on a new line. This gives each function call its own line, which is easy to read. But this does not seem to be preferred by many. And when code has subtle errors that are hard to debug, I always start to worry that it might be a space or something after the backslash (\).
To quote from the Python style guide:
Long lines can be broken over multiple lines by wrapping expressions
in parentheses. These should be used in preference to using a
backslash for line continuation. Make sure to indent the continued
line appropriately. The preferred place to break around a binary
operator is after the operator, not before it.
I tend to prefer the following, which eschews the non-recommended \ at the end of a line, thanks to an opening parenthesis:
value = (get_row_data(original_parameters)
         .refine_data(level=3)
         .transfer_to_style_c())
One advantage of this syntax is that each method call is on its own line.
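For a version that can actually be run, here is a sketch with a hypothetical Data class standing in for whatever get_row_data() really returns. The class and the behaviour of its methods are invented purely to make the chain executable.

```python
class Data:
    """Hypothetical stand-in for the object returned by get_row_data()."""

    def __init__(self, rows):
        self.rows = rows

    def refine_data(self, level):
        # Invented behaviour: keep only values at or above the given level.
        return Data([r for r in self.rows if r >= level])

    def transfer_to_style_c(self):
        # Invented behaviour: render the rows as a comma-separated string.
        return ','.join(str(r) for r in self.rows)


def get_row_data(original_parameters):
    return Data(list(original_parameters))


# The parentheses let the chain span several lines without backslashes.
value = (get_row_data([1, 2, 3, 4])
         .refine_data(level=3)
         .transfer_to_style_c())
print(value)   # 3,4
```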
A similar kind of \-less structure is also often useful with string literals, so that they don't go beyond the recommended limit of 79 characters per line:
message = ("This is a very long"
           " one-line message put on many"
           " source lines.")
This is a single string literal, which is created efficiently by the Python interpreter (this is much better than summing strings, which creates multiple strings in memory and copies them multiple times until the final string is obtained).
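A quick check that the adjacent literals really do end up as one ordinary string (the variable name same is just for this sketch):

```python
message = ("This is a very long"
           " one-line message put on many"
           " source lines.")

same = "This is a very long one-line message put on many source lines."
print(message == same)   # True
print(len(message))      # 62
```

Note that each continuation literal starts with a space; forgetting it is the usual mistake with this pattern, since the pieces are joined with nothing in between.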
Python's code formatting is nice.
What about this option:
value = get_row_data(original_parameters,
                     ).refine_data(level=3,
                                   ).transfer_to_style_c()
Note that commas are redundant if there are no other parameters but I keep them to maintain consistency.
Leaving aside my own preference (although see the comments on your question :)) and the alternatives, the answer to this is:
Stick to the style guidelines of any project you already have; if none are stated, keep as consistent in style as you can with the rest of the code base.
Otherwise, pick a style you like and stick with that, and let others know somehow that this is how you'd like chained function calls to be written when they are not reasonably readable on one line (or however you wish to describe it).
