Python: How to get double backslash from Path object

Python: How to get double backslash from Path object - python

I'm using the pathlib Path module to store the paths of a couple of my programs. The problem is, when I go to use this variable, I end up with an error because the \\ gets turned into a \ which the system in interpreting as a special character. My understanding was that depending on the OS, the Path module would handle this accordingly. (I'm using Windows)
Here is a recreation of my code:
from pathlib import Path
def get_dictionary():
path1 = Path("C:\\Programs\\program1")
path2 = Path("C:\\Programs\\program2")
path3 = Path("C:\\Programs\\program3")
path4 = Path("C:\\Programs\\program4")
info = {
"program1" : str(path1),
"program2" : str(path2),
"program3" : str(path3),
"program4" : str(path4)
}
return info
if __name__ == "__main__":
theInfo = get_dictionary()
print(theInfo['program1'])
print(theInfo['program2'])
print(theInfo['program3'])
print(theInfo['program4'])
print(theInfo)
And the console output is the following:
C:\Programs\program1
C:\Programs\program2
C:\Programs\program3
C:\Programs\program4
{'program1': 'C:\\Programs\\program1', 'program2': 'C:\\Programs\\program2', 'program3':
'C:\\Programs\\program3', 'program4': 'C:\\Programs\\program4'}
So my question is: Say I want to use theInfo['program1']. I get C:\Programs\program1 but I need to get C:\\Programs\\program1. How can I go about doing this? Thank you for any help!
Edit: The values I get from the dictionary are placed in a string that ends up being a line in a Tcl file. For instance I have a function where I write:
f"puts {theInfo['program1']}"
where I expect:
puts C:\\Programs\\program1
but I get:
puts C:\Programs\program1
With other characters, this interprets as a tab, newline, ect...

This phenomenon is caused caused by Escape character '\', if you really want the escaped data to be C:\\Programs\\program1， you can split the path then join it with \\\\.
In [4]: data = 'C:\\Programs\\program1'
In [5]: data
Out[5]: 'C:\\Programs\\program1'
In [6]: print(data)
C:\Programs\program1
In [7]: new_data = 'C:\\\\Programs\\\\program1'
In [8]: new_data
Out[8]: 'C:\\\\Programs\\\\program1'
In [9]: print(new_data)
C:\\Programs\\program1
In [21]: p = '\\\\'.join(("C:\\Programs\\program1").split('\\'))
In [22]: p
Out[22]: 'C:\\\\Programs\\\\program1'
In [23]: print(p)
C:\\Programs\\program1

Related

String from file to string utf-8 in python

So I am reading and manipulate a file with :
base_file = open(path+'/'+base_name, "r")
lines = base_file.readlines()
After this I search and find the "raw_data" start of line.
if re.match("\s{0,100}raw_data: ",line):
split_line = line.split("raw_data:")
print(split_line)
raw_string = split_line[1]
One example of raw_data is:
raw_data: "&\276!\300\307 =\277\"O\271\277vH9?j?\345?#\243\264=\350\034\345\277\260\345\033\300\023\017(#z|\273\277L\}\277\210\\031\300\213\263z\277\302\241\033\300\000\207\323\277\247Oh>j\354\215#\364\305\201\276\361+\202#t:\304\277\344\231\243#\225k\002\300vw\262\277\362\220j\300\"(\337\276\354b8\300\230\347H\300\201\320\204\300S;N\300Z0G\300>j\210\000#\034\014\220#\231\330J#\223\025\236#\006\332\230\276\227\273\n\277\353#,#\202\205\215\277\340\356\022\300/\223\035\277\331\277\362\276a\350\013#)\353\276\277v6\316\277K\326\207\300`2)\300\004\014Q\300\340\267\271\300MV\305\300\327\010\207\300j\346o\300\377\260\216\300[\332g\300\336\266\003\300\320S\272?6\300Y#\356\250\034\300\367\277&\300\335Uq>o\010&\300r\277\252\300U\314\243\300\253d\377\300"
And raw_string will be
print(raw_data)
"&\276!\300\307
=\277\"O\271\277vH9?j?\345?#\243\264=\350\034\345\277\260\345\033\300\023\017(#z|\273\277L\}\277\210\\031\300\213\263z\277\302\241\033\300\000\207\323\277\247Oh>j\354\215#\364\305\201\276\361+\202#t:\304\277\344\231\243#\225k\002\300vw\262\277\362\220j\300\"(\337\276\354b8\300\230\347H\300\201\320\204\300S;N\300Z0G\300>j\210\000#\034\014\220#\231\330J#\223\025\236#\006\332\230\276\227\273\n\277\353#,#\202\205\215\277\340\356\022\300/\223\035\277\331\277\362\276a\350\013#)\353\276\277v6\316\277K\326\207\300`2)\300\004\014Q\300\340\267\271\300MV\305\300\327\010\207\300j\346o\300\377\260\216\300[\332g\300\336\266\003\300\320S\272?6\300Y#\356\250\034\300\367\277&\300\335Uq>o\010&\300r\277\252\300U\314\243\300\253d\377\300"
If I tried to read this file I will obtain one char to one char even for escape characters.
So, my question is how to transform this plain text to utf-8 string so that I can have one character when reading \300 and not 4 characters.
I tried to pass "encondig =utf-8" in open file method but does not work.
I have made the same example passing raw_data as variable and it works properly.
RAW_DATA = "&\276!\300\307 =\277\"O\271\277vH9?j?\345?#\243\264=\350\034\345\277\260\345\033\300\023\017(#z|\273\277L\\}\277\210\\\031\300\213\263z\277\302\241\033\300\000\207\323\277\247Oh>j\354\215#\364\305\201\276\361+\202#t:\304\277\344\231\243#\225k\002\300vw\262\277\362\220j\300\"(\337\276\354b8\300\230\347H\300\201\320\204\300S;N\300Z0G\300<I>>j\210\000#\034\014\220#\231\330J#\223\025\236#\006\332\230\276\227\273\n\277\353#,#\202\205\215\277\340\356\022\300/\223\035\277\331\277\362\276a\350\013#)\353\276\277v6\316\277K\326\207\300`2)\300\004\014Q\300\340\267\271\300MV\305\300\327\010\207\300j\346o\300\377\260\216\300[\332g\300\336\266\003\300\320S\272?6\300Y#\356\250\034\300\367\277&\300\335Uq>o\010&\300r\277\252\300U\314\243\300\253d\377\300"
print(f"Qnt -> {len(RAW_DATA)}") # Qnt -> 256
print(type(RAW_DATA))
at = 0
total = 0
while at < len(RAW_DATA):
fin = at+4
substrin = RAW_DATA[at:fin]
resu = FourString_float(substrin)
at = fin
For this example \300 is only one char.
Hope someone can help me.

The problem is that on the read file the escape \ symbols are coming in as \, but in the example you've provided they are being evaluated as part of the numerics that follow it. ie, \276 is read as a single character.
If you run:
RAW_DATA = r"&\276!\300\307 =\277\"O\271\277vH9?j?\345?#\243\264=\350\034\345\277\260\345\033\300\023\017(#z|\273\277L\\}\277\210\\\031\300\213\263z\277\302\241\033\300\000\207\323\277\247Oh>j\354\215#\364\305\201\276\361+\202#t:\304\277\344\231\243#\225k\002\300vw\262\277\362\220j\300\"(\337\276\354b8\300\230\347H\300\201\320\204\300S;N\300Z0G\300<I>>j\210\000#\034\014\220#\231\330J#\223\025\236#\006\332\230\276\227\273\n\277\353#,#\202\205\215\277\340\356\022\300/\223\035\277\331\277\362\276a\350\013#)\353\276\277v6\316\277K\326\207\300`2)\300\004\014Q\300\340\267\271\300MV\305\300\327\010\207\300j\346o\300\377\260\216\300[\332g\300\336\266\003\300\320S\272?6\300Y#\356\250\034\300\367\277&\300\335Uq>o\010&\300r\277\252\300U\314\243\300\253d\377\300"
print(f"Qnt -> {len(RAW_DATA)}") # Qnt -> 256
print(type(RAW_DATA))
at = 0
total = 0
while at < len(RAW_DATA):
fin = at+4
substrin = RAW_DATA[at:fin]
resu = FourString_float(substrin)
at = fin
You would should be getting the same error that you were getting originally. Notice that we are using the raw-string literal instead of regular string literal. This will ensure that the \ don't get escaped.
You would need to evaluate the RAW_DATA to force it to evaluate the \.
You can do something like RAW_DATA = eval(f'"{RAW_DATA}"') or
import ast
RAW_DATA = ast.literal_eval(f'"{RAW_DATA}"')
Note, the second option is a bit more secure that doing a straight eval as you are limiting the scope of what can be executed.

Parametrized complex string working with gpkg in python

I need to construct a chain of text like this:
out = 'ogr:dbname=\'C:\\output\\2020.gpkg\' table=\"2020\" (geom) sql='
Here is my code:
import glob, time, sys, threading, os
from datetime import date, timedelta, datetime
import time, threading
#Parameters
layer = 'C:\\layer.gpkg'
n ='2020'
outdir = 'C:\\output'
#Process
l = os.path.realpath(layer)
pn = os.path.realpath(outdir + '/' + n + '.gpkg')
p = f"'{pn}'"
f = f"'{n}'"
o = f'ogr:dbname={p} table={f} (geom) sql='
#Test
out = 'ogr:dbname=\'C:\\output\\2020.gpkg\' table=\"2020\" (geom) sql='
o == out
The goal is to get o == out.
What do I need to change in the #Process part in order to get this as True ?
Moreover I need to run this either in linux or windows.
My final goal is to create a function that give 3 strings returns the complex string line shown above.

Assuming you are using python 3.6 or above you should use format strings (also known as f strings) to construct strings from variables. Start the string with the letter "f" and then put whatever variables you want in curly brackets {}. Also if you use single quotes as the outer quote then you don't have to escape double quotes and vice versa.
Code:
db_name = "'home/user/output/prueba.gpkg'"
table_name = '"prueba"'
outputlayer = f'ogr:dbname={db_name} table={table_name} (geom) sql='
outputlayer
Output:
'ogr:dbname=\'home/user/output/prueba.gpkg\' table="prueba" (geom) sql='

I think one of the issues that this isn't working is your path here pn = os.path.realpath(outdir + '/' + n + '.gpkg'). This is trying to combine UNIX path / with windows path \\. A more robust solution in terms of portability between linux and windows would be to use the path.join function in os module.
Additionally, python f strings will only add escapes to whichever quote character you used to open the string (' or "). If the escaped quotes around both strings are necessary, you're probably better off hard coding it into an f-string instead of setting 2 new variables with different quote types.
import glob, time, sys, threading, os
from datetime import date, timedelta, datetime
import time, threading
#Parameters
layer = 'C:\\layer.gpkg'
n ='2020'
outdir = 'C:\\output'
#Process
l = os.path.realpath(layer)
pn = os.path.realpath(os.path.join(outdir, f"{n}.gpkg"))
o = f'ogr:dbname=\'{pn}\' table=\"{n}\" (geom) sql='
#Test
out = 'ogr:dbname=\'C:\\output\\2020.gpkg\' table=\"2020\" (geom) sql='
o == out
A version of this (different path) has been tested to work on my linux machine.

Another option is to use a triple quoted string:
dbname = """/home/user/output/prueba.gpkg"""
outputlayer = """ogr:dbname='"""+dbname+"""' table="prueba" (geom) sql="""
which gives:
'ogr:dbname=\'/home/user/output/prueba.gpkg\' table="prueba" (geom) sql='

Absolute path of string that contains special characters

In my code, I get a path from the database that may contain special escaping characters that I need to convert them to a real path name. I'm using python 3.7 on Windows.
Suppose this path: C:\Files\2c2b2541\00025\test.x
IMPORTANT: the path is not a fixed value in the code and it is an output of executing a Stored Procedure from pyodbc.
When I try to convert it to an absolute path I get this error:
ValueError: _getfullpathname: embedded null character in path
I also tried to replace "\" with "/" but with no luck.
import os
# path = cursor.execute(query, "some_input").fetchone()[0]
path = 'C:\Files\2c2b2541\00025\test.x'
print(os.path.abspath(path))

Judging by your comments on the other answers, it sounds like the data is already corrupted in the database you're using. That is, you have a literal null byte stored there, and perhaps other bogus bytes (like \2 perhaps turning into \x02). So you probably need two fixes.
First, you should fix whatever code is putting values into the database, so it won't put bogus data in any more. You haven't described how the data gets into the database, so I we can't give you much guidance on how to do this. But most programming languages (and DB libraries) have tools to prevent escape sequences from being evaluated in strings where they're not wanted.
Once you've stopped new bad data from getting added, you can work on fixing the values that are already in the database. It probably shouldn't be too hard to write a query that will replace \0 null bytes with \\0 (or whatever the appropriate escape sequence is for your DB). You may want to look for special characters like newlines (\n) and unprintable characters (like \x02) as well.
I'd only try to fix this issue on the output end if you don't have any control of the database at all.

I think the below is the right way to solve your problem.
>>> def get_fixed_path(path):
... path = repr(path)
... path = path.replace("\\", "\\\\")
... path = path.replace("\\x", "\\\\0")
... path = os.path.abspath(path3).split("'")[1]
... return path
...
>>>
>>> path = 'C:\Files\2c2b2541\00025\test.x'
>>> path
'C:\\Files\x02c2b2541\x0025\test.x'
>>>
>>> print(path)
C:\Filesc2b2541 25 est.x
>>>
>>> final_path = get_fixed_path(path)
>>> final_path
'C:\\Files\\002c2b2541\\00025\\test.x'
>>>
>>> print(final_path)
C:\Files\002c2b2541\00025\test.x
>>>
And here is the detailed description of each and every steps/statements in the above solution.
First step (problem)
>>> import os
>>>
>>> path = 'C:\Files\2c2b2541\00025\test.x'
>>> path
'C:\\Files\x02c2b2541\x0025\test.x'
>>>
>>> print(path)
C:\Filesc2b2541 25 est.x
>>>
Second step (problem)
>>> path2 = repr(path)
>>> path2
"'C:\\\\Files\\x02c2b2541\\x0025\\test.x'"
>>>
>>> print(path2)
'C:\\Files\x02c2b2541\x0025\test.x'
>>>
Third step (problem)
>>> path3 = path2.replace("\\", "\\\\")
>>> path3
"'C:\\\\\\\\Files\\\\x02c2b2541\\\\x0025\\\\test.x'"
>>>
>>> print(path3)
'C:\\\\Files\\x02c2b2541\\x0025\\test.x'
>>>
>>> path3 = path3.replace("\\x", "\\\\0")
>>> path3
"'C:\\\\\\\\Files\\\\\\002c2b2541\\\\\\00025\\\\test.x'"
>>>
>>> print(path3)
'C:\\\\Files\\\002c2b2541\\\00025\\test.x'
>>>
Fourth step (problem)
>>> os.path.abspath(path3)
"C:\\Users\\RISHIKESH\\'C:\\Files\\002c2b2541\\00025\\test.x'"
>>>
>>> os.path.abspath(path2)
"C:\\Users\\RISHIKESH\\'C:\\Files\\x02c2b2541\\x0025\\test.x'"
>>>
>>> os.path.abspath('k')
'C:\\Users\\RISHIKESH\\k'
>>>
>>> os.path.abspath(path3).split("'")
['C:\\Users\\RISHIKESH\\', 'C:\\Files\\002c2b2541\\00025\\test.x', '']
>>> os.path.abspath(path3).split("'")[1]
'C:\\Files\\002c2b2541\\00025\\test.x'
>>>
Final step (solution)
>>> final_path = os.path.abspath(path3).split("'")[1]
>>>
>>> final_path
'C:\\Files\\002c2b2541\\00025\\test.x'
>>>
>>> print(final_path)
C:\Files\002c2b2541\00025\test.x
>>>

Replace "\" by "\\".
That's it.

You need to either use a raw string literal or double backslashes \\.
import os
path = r'C:\Files\2c2b2541\00025\test.x' #r before the string
print(os.path.abspath(path))

Python match '.\' at start of string

I need to identify where some powershell path strings cross over into Python.
How do I detect if a path in Python starts with .\ ??
Here's an example:
import re
file_path = ".\reports\dsReports"
if re.match(r'.\\', file_path):
print "Pass"
else:
print "Fail"
This Fails, in the debugger it lists
expression = .\\\\\\
string = .\\reports\\\\dsReports
If I try using replace like so:
import re
file_path = ".\reports\dsReports"
testThis = file_path.replace(r'\', '&jkl$ff88')
if re.match(r'.&jkl$ff88', file_path):
print "Pass"
else:
print "Fail"
the testThis variable ends up like this:
testThis = '.\\reports&jkl$ff88dsReports'
Quite agravating.

The reason this is happening is because \r is an escape sequence. You will need to either escape the backslashes by doubling them, or use a raw string literal like this:
file_path = r".\reports\dsReports"
And then check if it starts with ".\\":
if file_path.startswith('.\\'):
do_whatever()

python strip function is not giving expected output

i have below code in which filenames are FR1.1.csv, FR2.0.csv etc. I am using these names to print in header row but i want to modify these name to FR1.1 , Fr2.0 and so on. Hence i am using strip function to remove .csv. when i have tried it at command prompt its working fine. But when i have added it to main script its not giving output.
for fname in filenames:
print "fname : ", fname
fname.strip('.csv');
print "after strip fname: ", fname
headerline.append(fname+' Compile');
headerline.append(fname+' Run');
output i am getting
fname :FR1.1.csv
after strip fname: FR1.1.csv
required output-->
fname :FR1.1.csv
after strip fname: FR1.1
i guess some indentation problem is there in my code after for loop.
plesae tell me what is the correct way to achive this.

Strings are immutable, so string methods can't change the original string, they return a new one which you need to assign again:
fname = fname.strip('.csv') # no semicolons in Python!
But this call doesn't do what you probably expect it to. It will remove all the leading and trailing characters c, s, v and . from your string:
>>> "cross.csv".strip(".csv")
'ro'
So you probably want to do
import re
fname = re.sub(r"\.csv$", "", fname)

Strings are immutable. strip() returns a new string.
>>> "FR1.1.csv".strip('.csv')
'FR1.1'
>>> m = "FR1.1.csv".strip('.csv')
>>> print(m)
FR1.1
You need to do fname = fname.strip('.csv').
And get rid of the semicolons in the end!
P.S - Please see Jon Clement's comment and Tim Pietzcker's answer to know why this code should not be used.

You probably should use os.path for path manipulations:
import os
#...
for fname in filenames:
print "fname : ", fname
fname = os.path.splitext(fname)[0]
#...
The particular reason why your code fails is provided in other answers.

change
fname.strip('.csv')
with
fname = fname.strip('.csv')

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python: How to get double backslash from Path object - python

Related

String from file to string utf-8 in python

Parametrized complex string working with gpkg in python

Absolute path of string that contains special characters

Python match '.\' at start of string

python strip function is not giving expected output

Categories

Resources