re.match doesn't pick up on txt file format

re.match doesn't pick up on txt file format - python

import os.path
import re
def request ():
print ("What file should I write to?")
file = input ()
thing = os.path.exists (file)
if thing == True:
start = 0
elif re.match ("^.+.\txt$", file):
stuff = open (file, "w")
stuff.write ("Some text.")
stuff.close ()
start = 0
else:
start = 1
go = "yes"
list1 = (start, file, go)
return list1
start = 1
while start == 1:
list1 = request ()
(start, file, go) = list1
Whenever I enter Thing.txt as the text, the elif should catch that it's in the format given. However, start doesn't change to 0, and a file isn't created. Have I formatted the re.match incorrectly?

"^.+.\txt$" is an incorrect pattern for match .txt files you can use the following regex :
r'^\w+\.txt$'
As \w matches word character if you want that the file name only contain letters you could use [a-zA-Z] instead :
r'^[a-zA-Z]+\.txt$'
Note that you need to escape the . as is a special sign in regular expression .
re.match (r'^\w+\.txt$',file)
But as an alternative answer for match file names with special format you can use endswith() :
file.endswith('.txt')
Also instead of if thing == True you can just use if thing : that is more pythonic !

You should escape second dot and unescape the "t" character:
re.match ("^.+\.txt$", file)
Also note that you don't really need regex for this, you can simply use endswith or search for module that can give you files extensions:
import os
fileName, fileExtension = os.path.splitext('your_file.txt')
fileExtension is .txt, which is exactly what you're looking for.

Related

Python match '.\' at start of string

I need to identify where some powershell path strings cross over into Python.
How do I detect if a path in Python starts with .\ ??
Here's an example:
import re
file_path = ".\reports\dsReports"
if re.match(r'.\\', file_path):
print "Pass"
else:
print "Fail"
This Fails, in the debugger it lists
expression = .\\\\\\
string = .\\reports\\\\dsReports
If I try using replace like so:
import re
file_path = ".\reports\dsReports"
testThis = file_path.replace(r'\', '&jkl$ff88')
if re.match(r'.&jkl$ff88', file_path):
print "Pass"
else:
print "Fail"
the testThis variable ends up like this:
testThis = '.\\reports&jkl$ff88dsReports'
Quite agravating.

The reason this is happening is because \r is an escape sequence. You will need to either escape the backslashes by doubling them, or use a raw string literal like this:
file_path = r".\reports\dsReports"
And then check if it starts with ".\\":
if file_path.startswith('.\\'):
do_whatever()

removing a string of four characters from the front and thirteen characters from the end of a filename

I have seen the basic Python code for a filename replacement in a directory but they are always for known strings, but how would you remove random characters of a certain length?
Would this work?
newFileName = file.replace([-5:], "")
As I am trying to remove the last five characters from the filename without removing the extension.
Here is an update:
I am trying to do this:
DMC-CIWS15-AAAA-A00-00-0000-00A-018A-D_014-00_EN-US.xml
to
CIWS15-AAAA-A00-00-0000-00A-018A-D.xml
which removes DMC- and _014-00_EN-US from the end.
I need to add this to a code that will fix a directory of files.

This problem (if I understand it correctly) has a clear separation. Remove extension, remove X characters from beginning and end, and then add the extension again to get the final answer.
import os
oldFileName = 'xxxx-filename-xxxxx.XML'
# remove n chars in beginning, m chars at end
n = 5
m = 6
name, ext = os.path.splitext(oldFileName)
# splice away the chars, and add the extension
newFileName = '{}{}'.format(name[0:-m][n:], ext)
# newFileName == 'filename.XML'
So in your case, you would use n=4 and m=13.
If you didn't know the length, but you knew you wanted everything up to and including the first dash out, and likewise everything after the first underscore (which would mean there couldn't be underscores in the normal filename or the first part of it), this would work also:
import os
oldFileName = 'DMC-CIWS15-AAAA-A00-00-0000-00A-018A-D_014-00_EN-US.xml'
name, ext = os.path.splitext(oldFileName)
newFileName = '{}{}'.format(name[name.index('-')+1:name.index('_')], ext)
# newFileName == 'CIWS15-AAAA-A00-00-0000-00A-018A-D.xml'
And even if the pattern is something else, but there is a pattern, you can code to match it, like I have here.

Its not nice but I hope this works for you tho
If you know the files that you want to rename all have the same length, you can try:
>>>file = 'DMC-CIWS15-AAAA-A00-00-0000-00A-018A-D_014-00_EN-US.xml'
>>>ext = file[51:]
>>>newFile = file[4:38]+ext
when you print the newFile you now have:
>>>print(newFile)
CIWS15-AAAA-A00-00-0000-00A-018A-D.xml

How to find parenthesis bound strings in python

I'm learning Python and wanted to automate one of my assignments in a cybersecurity class.
I'm trying to figure out how I would look for the contents of a file that are bound by a set of parenthesis. The contents of the (.txt) file look like:
cow.jpg : jphide[v5](asdfl;kj88876)
fish.jpg : jphide[v5](65498ghjk;0-)
snake.jpg : jphide[v5](poi098*/8!##)
test_practice_0707.jpg : jphide[v5](sJ*=tT#&Ve!2)
test_practice_0101.jpg : jphide[v5](nKFdFX+C!:V9)
test_practice_0808.jpg : jphide[v5](!~rFX3FXszx6)
test_practice_0202.jpg : jphide[v5](X&aC$|mg!wC2)
test_practice_0505.jpg : jphide[v5](pe8f%yC$V6Z3)
dog.jpg : negative`
And here is my code so far:
import sys, os, subprocess, glob, shutil
# Finding the .jpg files that will be copied.
sourcepath = os.getcwd() + '\\imgs\\'
destpath = 'stegdetect'
rawjpg = glob.glob(sourcepath + '*.jpg')
# Copying the said .jpg files into the destpath variable
for filename in rawjpg:
shutil.copy(filename, destpath)
# Asks user for what password file they want to use.
passwords = raw_input("Enter your password file with the .txt extension:")
shutil.copy(passwords, 'stegdetect')
# Navigating to stegdetect. Feel like this could be abstracted.
os.chdir('stegdetect')
# Preparing the arguments then using subprocess to run
args = "stegbreak.exe -r rules.ini -f " + passwords + " -t p *.jpg"
# Uses open to open the output file, and then write the results to the file.
with open('cracks.txt', 'w') as f: # opens cracks.txt and prepares to w
subprocess.call(args, stdout=f)
# Processing whats in the new file.
f = open('cracks.txt')

If it should just be bound by ( and ) you can use the following regex, which ensures starting ( and closing ) and you can have numbers and characters between them. You can add any other symbol also that you want to include.
[\(][a-z A-Z 0-9]*[\)]
[\(] - starts the bracket
[a-z A-Z 0-9]* - all text inside bracket
[\)] - closes the bracket
So for input sdfsdfdsf(sdfdsfsdf)sdfsdfsdf , the output will be (sdfdsfsdf)
Test this regex here: https://regex101.com/

I'm learning Python
If you are learning you should consider alternative implementations, not only regexps.
TO iterate line by line of a text file you just open the file and for over the file handle:
with open('file.txt') as f:
for line in f:
do_something(line)
Each line is a string with the line contents, including the end-of-line char '/n'. To find the start index of a specific substring in a string you can use find:
>>> A = "hello (world)"
>>> A.find('(')
6
>>> A.find(')')
12
To get a substring from the string you can use the slice notation in the form:
>>> A[6:12]
'(world'

You should use regular expressions which are implemented in the Python re module
a simple regex like \(.*\) could match your "parenthesis string"
but it would be better with a group \((.*)\) which allows to get only the content in the parenthesis.
import re
test_string = """cow.jpg : jphide[v5](asdfl;kj88876)
fish.jpg : jphide[v5](65498ghjk;0-)
snake.jpg : jphide[v5](poi098*/8!##)
test_practice_0707.jpg : jphide[v5](sJ*=tT#&Ve!2)
test_practice_0101.jpg : jphide[v5](nKFdFX+C!:V9)
test_practice_0808.jpg : jphide[v5](!~rFX3FXszx6)
test_practice_0202.jpg : jphide[v5](X&aC$|mg!wC2)
test_practice_0505.jpg : jphide[v5](pe8f%yC$V6Z3)
dog.jpg : negative`"""
REGEX = re.compile(r'\((.*)\)', re.MULTILINE)
print(REGEX.findall(test_string))
# ['asdfl;kj88876', '65498ghjk;0-', 'poi098*/8!##', 'sJ*=tT#&Ve!2', 'nKFdFX+C!:V9' , '!~rFX3FXszx6', 'X&aC$|mg!wC2', 'pe8f%yC$V6Z3']

Python - How to make sure that a line being read from a file contain only a given string and nothing else

In order to make sure I start and stop reading a text file exactly where I want to, I am providing 'start1'<->'end1', 'start2'<->'end2' as tags in between the text file and providing that to my python script. In my script I read it as:
start_end = ['start1','end1']
line_num = []
with open(file_path) as fp1:
for num, line in enumerate(fp1, 1):
for i in start_end:
if i in line:
line_num.append(num)
fp1.close()
print '\nLine number: ', line_num
fp2 = open(file_path)
for k, line2 in enumerate(fp2):
for x in range(line_num[0], line_num[1] - 1):
if k == x:
header.append(line2)
fp2.close()
This works well until I reach start10 <-> end10 and further. Eg. it checks if I have "start2" in the line and also reads the text that has "start21" and similarly for end tag as well. so providing "start1, end1" as input also reads "start10, end10". If I replace the line:
if i in line:
with
if i == line:
it throws an error.
How can I make sure that the script reads the line that contains ONLY "start1" and not "start10"?

import re
prog = re.compile('start1$')
if prog.match(line):
print line
That should return None if there is no match and return a regex match object if the line matches the compiled regex. The '$' at the end of the regex says that's the end of the line, so 'start1' works but 'start10' doesn't.
or another way..
def test(line):
import re
prog = re.compile('start1$')
return prog.match(line) != None
> test('start1')
True
> test('start10')
False

Since your markers are always at the end of the line, change:
start_end = ['start1','end1']
to:
start_end = ['start1\n','end1\n']

You probably want to look into regular expressions. The Python re library has some good regex tools. It would let you define a string to compare your line to and it has the ability to check for start and end of lines.

If you can control the input file, consider adding an underscore (or any non-number character) to the end of each tag.
'start1_'<->'end1_'
'start10_'<->'end10_'
The regular expression solution presented in other answers is more elegant, but requires using regular expressions.

You can do this with find():
for num, line in enumerate(fp1, 1):
for i in start_end:
if i in line:
# make sure the next char isn't '0'
if line[line.find(i)+len(i)] != '0':
line_num.append(num)

python strip function is not giving expected output

i have below code in which filenames are FR1.1.csv, FR2.0.csv etc. I am using these names to print in header row but i want to modify these name to FR1.1 , Fr2.0 and so on. Hence i am using strip function to remove .csv. when i have tried it at command prompt its working fine. But when i have added it to main script its not giving output.
for fname in filenames:
print "fname : ", fname
fname.strip('.csv');
print "after strip fname: ", fname
headerline.append(fname+' Compile');
headerline.append(fname+' Run');
output i am getting
fname :FR1.1.csv
after strip fname: FR1.1.csv
required output-->
fname :FR1.1.csv
after strip fname: FR1.1
i guess some indentation problem is there in my code after for loop.
plesae tell me what is the correct way to achive this.

Strings are immutable, so string methods can't change the original string, they return a new one which you need to assign again:
fname = fname.strip('.csv') # no semicolons in Python!
But this call doesn't do what you probably expect it to. It will remove all the leading and trailing characters c, s, v and . from your string:
>>> "cross.csv".strip(".csv")
'ro'
So you probably want to do
import re
fname = re.sub(r"\.csv$", "", fname)

Strings are immutable. strip() returns a new string.
>>> "FR1.1.csv".strip('.csv')
'FR1.1'
>>> m = "FR1.1.csv".strip('.csv')
>>> print(m)
FR1.1
You need to do fname = fname.strip('.csv').
And get rid of the semicolons in the end!
P.S - Please see Jon Clement's comment and Tim Pietzcker's answer to know why this code should not be used.

You probably should use os.path for path manipulations:
import os
#...
for fname in filenames:
print "fname : ", fname
fname = os.path.splitext(fname)[0]
#...
The particular reason why your code fails is provided in other answers.

change
fname.strip('.csv')
with
fname = fname.strip('.csv')

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

re.match doesn't pick up on txt file format - python

Related

Python match '.\' at start of string

removing a string of four characters from the front and thirteen characters from the end of a filename

How to find parenthesis bound strings in python

Python - How to make sure that a line being read from a file contain only a given string and nothing else

python strip function is not giving expected output

Categories

Resources