Python: find string in XML or *.bat files

I have a directory that contains both XML and *.bat files. I would like to find and replace the string "-SNAPSHOT" in any .xml or .bat file. I can do this in Notepad++: I go to Find in Files, enter "-SNAPSHOT" as the search term and "pom.xml,.bat" as the filter, and it does what I am trying to do. However, I'd like to be able to accomplish the same thing in a Python script. What is the best approach for this? Thanks.

Try something like this:

import os

myfldr = "myfolder\\"
mydir = os.listdir(myfldr)
old = "-SNAPSHOT"
new = "NEW_STRING"

for file in mydir:
    tempdoc = []
    if file.endswith((".xml", ".bat")):
        path = myfldr + file
        # read the file, replacing the old string on every line
        with open(path, "rt") as f:
            for line in f:
                line = line.replace(old, new)
                tempdoc.append(line)
        # write the modified lines back to the same file
        with open(path, "wt") as f:
            f.writelines(tempdoc)
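If you also need to touch files in subdirectories, a pathlib-based sketch along the same lines might look like this (the folder name and replacement strings are just placeholders):

from pathlib import Path

root = Path("myfolder")              # placeholder folder name
old, new = "-SNAPSHOT", "NEW_STRING"

# rglob walks the directory tree recursively, one pattern per extension
for pattern in ("*.xml", "*.bat"):
    for path in root.rglob(pattern):
        text = path.read_text()
        if old in text:
            path.write_text(text.replace(old, new))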

Related

I need to extract this uid from a .sgm file

I need to extract the uid from a .sgm file. I tried the code below but it doesn't work. Can anybody help?
Sample .sgm file content:
<miscdoc n='1863099' uid='0001863099_20220120' type='seccomlett' t='frm' mdy='01/20/2022'><rname>Kimbell Tiger Acquisition Corp, 01/20/2022</rname>
<table col='2' type='txt'>
<colspec col='1' colwidth='*'>
<colspec col='2' colwidth='2*'>
<tname>Meta-data</tname>
<tbody>
<row><entry>SEC-HEADER</entry><entry>0001104659-22-005920.hdr.sgml : 20220304</entry></row>
<row><entry>ACCEPTANCE-DATETIME</entry><entry>20220120160231</entry></row>
<row><entry>PRIVATE-TO-PUBLIC</entry></row>
<row><entry>ACCESSION-NUMBER</entry><entry>0001104659-22-005920</entry></row>
<row><entry>TYPE</entry><entry>CORRESP</entry></row>
<row><entry>PUBLIC-DOCUMENT-COUNT</entry><entry>1</entry></row>
<row><entry>FILING-DATE</entry><entry>20220120</entry></row>
<row><entry>FILER</entry></row>
Code I tried:

import os

# Folder Path
path = "Enter Folder Path"

# Change the directory
os.chdir(path)

# Read text file
def read_file(file_path):
    with open(file_path, 'r') as f:
        print(f.read())

# iterate through all files
for file in os.listdir():
    # Check whether the file is an .sgm file
    if file.endswith(".sgm"):
        if 'uid' in file:
            print("true")
        file_path = f"{path}\\{file}"
        # call read text file function
        read_file(file_path)
I need to extract the uid value from the above .sgm file. Is there another way I could do this? What should I change in my code?
The SGM format may just be an XML superset. If it isn't, then for this particular case (and if one can rely on the format being as shown in the question):
import re

def get_uid(filename):
    with open(filename) as infile:
        for line in map(str.strip, infile):
            # the uid attribute sits on the opening <miscdoc ...> tag
            if line.startswith('<miscdoc'):
                if uid := re.findall("uid='(.*?)'", line):
                    return uid[0]
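A quick usage sketch, assuming the sample content above was saved to a file named example.sgm (a hypothetical name):

# prints the uid attribute from the <miscdoc> tag
print(get_uid('example.sgm'))   # -> 0001863099_20220120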

I want to open a JSON file in Python but got an error: No such file or directory

I wrote the code like this:
intents = json.loads(open('intents.json').read())
Check that your intents.json file is in the same folder as your Python file.
You can use, for example, the built-in os module to check for the existence of the file, and os.path for path manipulation. See the official docs at https://docs.python.org/3/library/os.path.html
import os

file = 'intents.json'
# location of the current directory
w_dir = os.path.abspath('.')
if os.path.isfile(os.path.join(w_dir, file)):
    with open(file, 'r') as fd:
        fd.read()
else:
    print('Such file does not exist here "{}"...'.format(w_dir))
You can try opening the file using a normal file operation and then use json.load or json.loads to parse the data as per your needs. I may be unfamiliar with this syntax, but to the best of my knowledge your syntax is wrong.
You can open the file like this:
f = open(file_name)
Then parse the data:
data = json.load(f)
You can refer to this link for more info and reference
https://www.geeksforgeeks.org/read-json-file-using-python/
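Putting those two steps together with a context manager, a minimal sketch (assuming the file is named intents.json as in the question):

import json

# open and parse the JSON file in one go; the with-block closes it afterwards
with open('intents.json') as f:
    intents = json.load(f)

print(intents)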

YAML find and replace text Python

I have some files in YAML format. I need to find the $title text in a file and replace it with what I specify. Here is roughly what the configuration file looks like:
JoinGame-MOTD:
  Enabled: true
  Messages:
  - '$title'
The YAML file may look different, so I want to write universal code that does not target one specific string but replaces every $title with what I specified.
What I was trying to do:
import sys
import yaml

with open(r'config.yml', 'w') as file:
    def tr(s):
        return s.replace('$title', 'Test')
    yaml.dump(file, sys.stdout, transform=tr)
Please help me. It is not necessary to work from my code; I will be happy with any example that suits me.
Might be easier to not use the yaml package at all.
with open("file.yml", "r") as fin:
with open("file_replaced.yml", "w") as fout:
for line in fin:
fout.write(line.replace('$title', 'Test'))
EDIT:
To update in place
with open("config.yml", "r+") as f:
contents = f.read()
f.seek(0)
f.write(contents.replace('$title', 'Test'))
f.truncate()
You can also read & write the data in one go. os.path.join is optional; it makes sure the YAML file is read relative to the path where your script is stored.
import re
import os

with open(os.path.join(os.path.dirname(__file__), 'temp.yaml'), 'r+') as f:
    data = f.read()
    f.seek(0)
    new_data = data.replace('$title', 'replaced!')
    f.write(new_data)
    f.truncate()
In case you wish to dynamically replace other keywords besides $title, such as $description or $name, you can write a small helper using regex like this:

def replaceString(text_to_search, keyword, replacement):
    # escape the keyword (it contains "$") and require a word boundary after it
    return re.sub(re.escape(keyword) + r"\b", replacement, text_to_search)

replaceString('My name is $name', '$name', 'Bob')
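Combining that helper with the read-and-write-in-one-go pattern above, a sketch that substitutes several placeholders in a file (the keyword-to-value mapping here is purely illustrative):

import re
import os

# hypothetical placeholder values; adjust to your own keywords
substitutions = {'$title': 'Test', '$description': 'A demo server', '$name': 'Bob'}

path = os.path.join(os.path.dirname(__file__), 'temp.yaml')
with open(path, 'r+') as f:
    data = f.read()
    for keyword, value in substitutions.items():
        data = re.sub(re.escape(keyword) + r"\b", value, data)
    f.seek(0)
    f.write(data)
    f.truncate()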

Python: open a file from an environment variable plus a path

In the system environment variables I have defined two variables:
A_home=C:\install\ahome
B_home=C:\install\bhome
The following script is written to read information from the location in variable A, close it, then open the location in variable B and write there. The thing is, the script only works with an exact path, e.g.
C:\install\a\components\xxx\etc\static-data\myfile.xml
C:\install\b\components\xxx\etc\static-data\myfile.xml
The problem is that I need Python to read the path defined in the environment variable plus a common path, like this: %a_home%\a\components\xxx\etc\static-data\myfile.xml
So far I have this and I can't move forward. Anyone have any ideas? This script reads only the exact path:
file = open(r'C:\install\a\components\xxx\etc\static-data\myfile.xml', 'r')
lines = file.readlines()
file.close()

file = open(r'C:\install\b\components\xxx\etc\static-data\myfile.xml', 'w')
for line in lines:
    if line != '</generic-entity-list>' + '\n':
        file.write(line)
file.write('<entity>XXX1</entity>\n')
file.write('<entity>XXX2</entity>\n')
file.write('</generic-entity-list>\n')
file.close()
Try something like this:

import os
import os.path

home = os.getenv("A_HOME")
filepath = os.path.join(home, "components", "xxx", "etc", "static-data", "GenericEntityList.xml")
with open(filepath, 'r') as f:
    for line in f:
        print(line)
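The same idea extends to the write side: build the destination path from the second environment variable and write the filtered lines there. A sketch, assuming the variables are exported as A_HOME and B_HOME as in the question:

import os
import os.path

rel = os.path.join("components", "xxx", "etc", "static-data", "GenericEntityList.xml")
src = os.path.join(os.getenv("A_HOME"), rel)
dst = os.path.join(os.getenv("B_HOME"), rel)

# copy everything except the closing tag, then append the new entities
with open(src, 'r') as fin, open(dst, 'w') as fout:
    for line in fin:
        if line != '</generic-entity-list>\n':
            fout.write(line)
    fout.write('<entity>XXX1</entity>\n')
    fout.write('<entity>XXX2</entity>\n')
    fout.write('</generic-entity-list>\n')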
So, finally, success. Thanks Tom, I was inspired by you.
Here goes:
import os

path1 = os.environ['SOME_ENVIRO1']
path2 = os.environ['SOME_ENVIRO2']

file = open(path1 + '\\components\\xxx\\etc\\static-data\\GenericEntityList.xml', 'r')
lines = file.readlines()
file.close()

file = open(path2 + '\\components\\xxx\\etc\\static-data\\GenericEntityList.xml', 'w')
for line in lines:
    if line != '</generic-entity-list>' + '\n':
        file.write(line)
file.write('<entity>ENTITY1</entity>\n')
file.write('<entity>ENTITY2</entity>\n')
file.write('</generic-entity-list>\n')
file.close()

How do I run this Python 2.7 script over multiple files in the same directory [closed]

This script currently grabs specific types of IP addresses out of a file and formats them into CSV.
How do I change it so it looks through all files in its directory (the same directory as the script) and creates a new output file? This is my first week with Python, so please keep it as simple as possible.
#!usr/bin/python
# Extract IP address from file

# import modules
import re

# Open source file
infile = open('stix1.xml', 'r')
# Open output file
outfile = open('ExtractedIPs.csv', 'w')
# Create a list
BadIPs = []

# search each line in doc
for line in infile:
    # ignore empty lines
    if line.isspace(): continue
    # find IPs that are Indicator Titles
    IP = (re.findall(r"(?:<indicator:Title>IP:) (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})", line))
    # Only take finds
    if not IP: continue
    # Add each found IP to the BadIP list
    BadIPs.append(IP)

# tidy up for CSV format
data = str(BadIPs)
data = data.replace('[', '')
data = data.replace(']', '')
data = data.replace("'", "")

# Write IPs to a file
outfile.write(data)

infile.close()
outfile.close()
I think you want to have a look at glob.glob: https://docs.python.org/2/library/glob.html
This will return a list of files matching a given pattern.
Then you can do something like:

import re, glob

def do_something_with(f):
    # Open source file
    infile = open(f, 'r')
    # Open output file
    outfile = open('ExtractedIPs.csv', 'a')  ## 'a' so each file's results are appended
    # Create a list
    BadIPs = []
    # ... rest of your code ...
    outfile.write(data)
    infile.close()
    outfile.close()

for f in glob.glob("*.xml"):
    do_something_with(f)
Assuming that you want to append all output to the same file, this would be the script:
#!usr/bin/python
import glob
import re

for infileName in glob.glob("*.xml"):
    # Open source file
    infile = open(infileName, 'r')
    # Append to the output file
    outfile = open('ExtractedIPs.csv', 'a')
    # Create a list
    BadIPs = []
    # search each line in doc
    for line in infile:
        # ignore empty lines
        if line.isspace(): continue
        # find IPs that are Indicator Titles
        IP = (re.findall(r"(?:<indicator:Title>IP:) (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})", line))
        # Only take finds
        if not IP: continue
        # Add each found IP to the BadIP list
        BadIPs.append(IP)
    # tidy up for CSV format
    data = str(BadIPs)
    data = data.replace('[', '')
    data = data.replace(']', '')
    data = data.replace("'", "")
    # Write IPs to the file
    outfile.write(data)
    infile.close()
    outfile.close()
You could get a list of all XML files like this.
import os
filenames = [nm for nm in os.listdir('.') if nm.endswith('.xml')]
And then you iterate over all the files.
for fn in filenames:
    with open(fn) as infile:
        for ln in infile:
            # do your thing
            pass
The with-statement makes sure that the file is closed after you're done with it.
import sys

Make a function out of your current code, for example def extract(filename).
Call the script with all filenames: python myscript.py file1 file2 file3
Inside your script, loop over the filenames: for filename in sys.argv[1:]:
Call the function inside the loop: extract(filename).
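Put together, that approach might look like the following sketch (extract here is a hypothetical wrapper around the question's existing extraction code):

import sys

def extract(filename):
    # placeholder: move the question's IP-extraction code in here,
    # reading from `filename` instead of the hard-coded 'stix1.xml'
    with open(filename, 'r') as infile:
        for line in infile:
            pass  # ... find and collect IPs ...

if __name__ == '__main__':
    # usage: python myscript.py file1.xml file2.xml file3.xml
    for filename in sys.argv[1:]:
        extract(filename)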
I had a need to do this, and to go into subdirectories as well. You need to import os and os.path, then you can use a function like this:
import os
import os.path

def recursive_glob(rootdir='.', suffix=()):
    """Recursively traverses the full path from rootdir; returns
    paths and file names for files whose name ends with suffix."""
    pathlist = []
    filelist = []
    for looproot, dirnames, filenames in os.walk(rootdir):
        for filename in filenames:
            if filename.endswith(suffix):
                pathlist.append(os.path.join(looproot, filename))
                filelist.append(filename)
    return pathlist, filelist
You pass the function the top-level directory you want to start from and the suffix for the file type you are looking for. This was written and tested on Windows, but I believe it will work on other OSes as well, as long as you've got file extensions to work from.
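For example, collecting every .xml file below the current directory might look like this (the tuple form lets several extensions be passed at once):

# hypothetical call: all .xml files under the current directory tree
paths, names = recursive_glob('.', ('.xml',))
for p in paths:
    print(p)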
You could just use os.listdir() if all files in your current folder are relevant. If not, say you only want the .xml files, then use glob.glob("*.xml"). But the overall program can be improved, roughly as follows.
# import modules
import csv
import os
import re

# the regex from the question
pat = re.compile(r"(?:<indicator:Title>IP:) (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")

with open("out.csv", "w") as fw:
    writer = csv.writer(fw)
    for f in os.listdir('.'):  # or glob.glob("*.xml")
        with open(f) as fr:
            # skip empty lines
            lines = (line for line in fr if not line.isspace())
            # genex for all IPs in that file
            ips = (ip for line in lines for ip in pat.findall(line))
            writer.writerow(ips)
You will probably have to change it to suit your exact needs. But the idea is that in this version there are far fewer side effects, much less memory consumption, and close is managed by the context manager. Please comment if it doesn't work.
