I need to extract the uid from a .sgm file. I tried the code below, but it doesn't work. Can anybody help?
Sample .sgm file content:
<miscdoc n='1863099' uid='0001863099_20220120' type='seccomlett' t='frm' mdy='01/20/2022'><rname>Kimbell Tiger Acquisition Corp, 01/20/2022</rname>
<table col='2' type='txt'>
<colspec col='1' colwidth='*'>
<colspec col='2' colwidth='2*'>
<tname>Meta-data</tname>
<tbody>
<row><entry>SEC-HEADER</entry><entry>0001104659-22-005920.hdr.sgml : 20220304</entry></row>
<row><entry>ACCEPTANCE-DATETIME</entry><entry>20220120160231</entry></row>
<row><entry>PRIVATE-TO-PUBLIC</entry></row>
<row><entry>ACCESSION-NUMBER</entry><entry>0001104659-22-005920</entry></row>
<row><entry>TYPE</entry><entry>CORRESP</entry></row>
<row><entry>PUBLIC-DOCUMENT-COUNT</entry><entry>1</entry></row>
<row><entry>FILING-DATE</entry><entry>20220120</entry></row>
<row><entry>FILER</entry></row>
Code I tried:
import os
# Folder Path
path = "Enter Folder Path"
# Change the directory
os.chdir(path)
# Read text File
def read_file(file_path):
    with open(file_path, 'r') as f:
        print(f.read())
# iterate through all file
for file in os.listdir():
    # Check whether file is in text format or not
    if file.endswith(".sgm"):
        if 'uid' in file:
            print("true")
        file_path = f"{path}\{file}"
        # call read text file function
        read_file(file_path)
I need to extract the uid value from the .sgm file above. Is there another way I could do this? What should I change in my code?
The SGM format may just be an XML superset. If it isn't, then for this particular case (and if you can rely on the format being exactly as shown in the question):
import re
def get_uid(filename):
    with open(filename) as infile:
        for line in map(str.strip, infile):
            if line.startswith('<miscdoc'):
                if uid := re.findall("uid='(.*?)'", line):
                    return uid[0]
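To tie this back to the question's folder loop, here is a minimal sketch that applies the same idea to every .sgm file in a folder. It assumes UTF-8 files and a single-quoted uid='...' attribute on the <miscdoc> line, as in the sample; get_uid_from_lines and get_uids are hypothetical helper names, not part of any library. (The question's 'uid' in file test checks the file name, not the file's contents, which is why it never prints "true".)

```python
import re
from pathlib import Path

# Captures the value of uid='...' (single-quoted, as in the sample file).
UID_PATTERN = re.compile(r"uid='([^']*)'")

def get_uid_from_lines(lines):
    """Return the uid of the first <miscdoc ...> line, or None if there is none."""
    for line in lines:
        if line.lstrip().startswith("<miscdoc"):
            match = UID_PATTERN.search(line)
            if match:
                return match.group(1)
    return None

def get_uids(folder):
    """Map every .sgm file name in `folder` to its uid (None if not found)."""
    uids = {}
    for path in sorted(Path(folder).glob("*.sgm")):
        with open(path, encoding="utf-8") as infile:
            uids[path.name] = get_uid_from_lines(infile)
    return uids
```

Usage would be something like print(get_uids("Enter Folder Path")), which avoids os.chdir and manual path building altogether.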
I have some files in YAML format. I need to find the text $title in each file and replace it with a value I specify. The configuration file looks roughly like this:
JoinGame-MOTD:
  Enabled: true
  Messages:
    - '$title'
The YAML files may look different, so I want universal code that doesn't look for one specific key but replaces every $title with the value I specify.
What I was trying to do:
import sys
import yaml
with open(r'config.yml', 'w') as file:
    def tr(s):
        return s.replace('$title', 'Test')
    yaml.dump(file, sys.stdout, transform=tr)
Please help me. It doesn't have to be based on my code; I'll be happy with any example that suits my needs.
It might be easier not to use the yaml package at all.
with open("file.yml", "r") as fin:
    with open("file_replaced.yml", "w") as fout:
        for line in fin:
            fout.write(line.replace('$title', 'Test'))
EDIT:
To update the file in place:
with open("config.yml", "r+") as f:
    contents = f.read()
    f.seek(0)
    f.write(contents.replace('$title', 'Test'))
    f.truncate()
You can also read and write the data in one go. The os.path.join call is optional; it makes sure the YAML file is read relative to the directory where your script is stored.
import re
import os
with open(os.path.join(os.path.dirname(__file__), 'temp.yaml'), 'r+') as f:
    data = f.read()
    f.seek(0)
    new_data = data.replace('$title', 'replaced!')
    f.write(new_data)
    f.truncate()
In case you wish to dynamically replace other keywords besides $title, like $description or $name, you can write a function using a regex like this:
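def replaceString(text_to_search, keyword, replacement):
    # re.escape makes '$' literal; (?!\w) stops '$name' from matching inside '$names'
    return re.sub(re.escape(keyword) + r"(?!\w)", lambda m: replacement, text_to_search)
replaceString('My name is $name', '$name', 'Bob')  # returns 'My name is Bob'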
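If several placeholders need replacing across many files, one option is to build a single regex from all the keys of a dictionary. This is a minimal sketch, not a YAML-aware tool: it treats the files as plain text, assumes UTF-8, and the helper names replace_placeholders and replace_in_yaml_files are hypothetical.

```python
import re
from pathlib import Path

def replace_placeholders(text, mapping):
    """Replace every placeholder key in `mapping` (e.g. '$title') with its value."""
    if not mapping:
        return text
    # Longest keys first, so e.g. '$titles' would win over '$title' if both are keys.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    # A function replacement inserts the value literally (no backslash processing).
    return pattern.sub(lambda match: mapping[match.group(0)], text)

def replace_in_yaml_files(folder, mapping):
    """Rewrite every .yml / .yaml file directly inside `folder` in place."""
    for path in Path(folder).iterdir():
        if path.is_file() and path.suffix in (".yml", ".yaml"):
            text = path.read_text(encoding="utf-8")
            path.write_text(replace_placeholders(text, mapping), encoding="utf-8")
```

For example, replace_in_yaml_files(".", {"$title": "Test", "$description": "My server"}) would update every YAML file in the current folder in one pass.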
I'm trying to use the code below to read 5 files from the source folder, write them to the destination folder, and then delete the files in the source folder. I get the following error: [Errno 13] Permission denied: 'c:\\data\\AM\\Desktop\\tester1'. The files, by the way, look like this:
import os
import time
source = r'c:\data\AM\Desktop\tester'
destination = r'c:\data\AM\Desktop\tester1'
for file in os.listdir(source):
    file_path = os.path.join(source, file)
    if not os.path.isfile:
        continue
    print(file_path)
    with open (file_path, 'r') as IN, open (destination, 'w') as OUT:
        data ={
            'Power': None,
        }
        for line in IN:
            splitter = (ID, Item, Content, Status) = line.strip().split()
            if Item in data == "Power":
                Content = str(int(Content) * 10)
    os.remove(IN)
I have rewritten your entire code. I assume you want to multiply the value of Power by 10 and write the updated content into a new file. The code below does just that.
Your code had multiple issues. First and foremost, much of what you intended never made it into the code (such as writing to a new file, and specifying what to write and where). The permission error itself happened because you were trying to open a directory for writing instead of a file.
import os
source = r'c:\data\AM\Desktop\tester'
destination = r'c:\data\AM\Desktop\tester1'
for file in os.listdir(source):
    source_file = os.path.join(source, file)
    destination_file = os.path.join(destination, file)  # a file path, not the folder
    if not os.path.isfile(source_file):  # isfile() must be called; skip subfolders
        continue
    print(source_file)
    with open(source_file, 'r') as IN, open(destination_file, 'w') as OUT:
        data = {
            'Power': None,
        }
        for line in IN:
            (ID, Item, Content, Status) = line.strip().split()
            if Item in data:  # == "Power": #Changed
                Content = str(int(Content) * 10)
                OUT.write(ID+'\t'+Item+'\t'+Content+'\t'+Status+'\n')  # Added to write the content into destination file.
            else:
                OUT.write(line)  # Added to write the content into destination file.
    os.remove(source_file)  # delete only after the with-block has closed the file
Hope this works for you.
I'm not sure what you're going for here, but here's what I came up with based on the question in the title.
import os
# Takes the text from the old file
with open('old file path.txt', 'r') as f:
    text = f.read()
# Takes text from old file and writes it to the new file
with open('new file path.txt', 'w') as f:
    f.write(text)
# Removes the old text file
os.remove('old file path.txt')
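The standard library's shutil.move can do the copy-and-delete step in one call. The sketch below is a swapped-in technique rather than the answer's code: it moves every regular file from one folder into another, creating the destination if it is missing (an assumption). move_files is a hypothetical name. Each move targets a full file path, never the folder itself.

```python
import os
import shutil

def move_files(source, destination):
    """Move every regular file directly inside `source` into `destination`.

    Returns the sorted list of moved file names. Subfolders of `source` are skipped.
    """
    os.makedirs(destination, exist_ok=True)  # destination must be a folder
    moved = []
    for name in sorted(os.listdir(source)):
        src_path = os.path.join(source, name)
        if not os.path.isfile(src_path):  # note: isfile() has to be called
            continue
        # Target a full file path inside destination, not the folder itself.
        shutil.move(src_path, os.path.join(destination, name))
        moved.append(name)
    return moved
```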
From your description, it sounds like this line fails:
with open (file_path, 'r') as IN, open (destination, 'w') as OUT:
Because of this operation:
open (destination, 'w')
So you might not have write access to
c:\data\AM\Desktop\tester1
Setting file permissions on Windows:
https://www.online-tech-tips.com/computer-tips/set-file-folder-permissions-windows/
@Sherin Jayanand
One more question, bro: I wanted to try something out with some pieces of your code. Here's what I made of it:
import os
import time
from datetime import datetime
#Make source, destination and archive paths.
source = r'c:\data\AM\Desktop\Source'
destination = r'c:\data\AM\Desktop\Destination'
archive = r'c:\data\AM\Desktop\Archive'
for root, dirs, files in os.walk(source):
    for f in files:
        pads = (root + '\\' + f)
        # print(pads)
        for file in os.listdir(source):
            dst_path=os.path.join(destination, file)
            print(dst_path)
            with open(pads, 'r') as IN, open(dst_path, 'w') as OUT:
                data={'Power': None,
                }
                for line in IN:
                    (ID, Item, Content, Status) = line.strip().split()
                    if Item in data:
                        Content = str(int(Content) * 10)
                        OUT.write(ID+'\t'+Item+'\t'+Content+'\t'+Status+'\n')
                    else:
                        OUT.write(line)
But again I received the same error: Permission denied: 'c:\\data\\AM\\Desktop\\Destination\\C'
How come? Thank you very much!
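A plausible cause is the nested for file in os.listdir(source) loop: os.listdir also returns subfolder names (such as C), so dst_path can point at a directory, and opening a directory for writing fails with Permission denied on Windows. The loop also pairs every source file with every destination name. A minimal sketch that walks the tree once and recreates each subfolder instead of opening it, assuming four whitespace-separated fields per row as in the question (transform_line and process_tree are hypothetical names; the archive step is left out):

```python
import os

def transform_line(line):
    """Multiply Content by 10 on 'Power' rows; return any other line unchanged.

    Assumes whitespace-separated rows of four fields (ID Item Content Status),
    as in the question's code. Blank or differently shaped lines pass through.
    """
    fields = line.split()
    if len(fields) == 4 and fields[1] == "Power":
        fields[2] = str(int(fields[2]) * 10)
        return "\t".join(fields) + "\n"
    return line

def process_tree(source, destination):
    """Copy each file under `source` to the same relative path under
    `destination`, transforming every line on the way."""
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        target_dir = os.path.normpath(os.path.join(destination, rel))
        os.makedirs(target_dir, exist_ok=True)  # create folders; never open them
        for name in files:
            src_path = os.path.join(root, name)
            dst_path = os.path.join(target_dir, name)  # always a file path
            with open(src_path) as fin, open(dst_path, "w") as fout:
                for line in fin:
                    fout.write(transform_line(line))
```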
I have a directory that contains both xml and *.bat files. I would like to find and replace the string "-SNAPSHOT" in every xml or .bat file. I can do this in Notepad++: I go to Find in Files, enter "-SNAPSHOT" in the Find field and "pom.xml,.bat" in the Filters field, and it does what I want. However, I'd like to accomplish the same thing in a Python script. What is the best approach? Thanks.
Try something like this:
import os
myfldr = "myfolder"
old = "-SNAPSHOT"
new = "NEW_STRING"
for file in os.listdir(myfldr):
    # endswith() only matches real extensions (".xml" in file would also match "a.xml.bak")
    if file.endswith((".xml", ".bat")):
        path = os.path.join(myfldr, file)
        tempdoc = []
        with open(path, "rt") as f:
            for line in f:
                line = line.replace(old, new)
                tempdoc.append(line)
        with open(path, "wt") as f:
            f.writelines(tempdoc)
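Notepad++'s Find in Files can also search subfolders. A minimal recursive sketch using pathlib.Path.rglob might look like this; replace_in_files is a hypothetical name, and it assumes UTF-8 files. Opening with newline="" leaves each file's line endings untranslated, so CRLF endings in .bat files are kept as they are.

```python
from pathlib import Path

def replace_in_files(folder, old, new, suffixes=(".xml", ".bat")):
    """Replace `old` with `new` in every file under `folder` (recursively)
    whose extension is in `suffixes`. Returns the paths that changed."""
    changed = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            with open(path, encoding="utf-8", newline="") as f:
                text = f.read()
            if old in text:  # only rewrite files that actually contain `old`
                with open(path, "w", encoding="utf-8", newline="") as f:
                    f.write(text.replace(old, new))
                changed.append(path)
    return changed
```

For example, replace_in_files("myfolder", "-SNAPSHOT", "") would strip the suffix everywhere and return the list of files it touched.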
This script currently grabs specific types of IP addresses out of a file and formats them as CSV.
How do I change it so that it looks through all files in its directory (the same directory as the script) and creates a new output file? This is my first week with Python, so please keep it as simple as possible.
#!usr/bin/python
# Extract IP address from file
#import modules
import re
# Open Source File
infile = open('stix1.xml', 'r')
# Open output file
outfile = open('ExtractedIPs.csv', 'w')
# Create a list
BadIPs = []
#search each line in doc
for line in infile:
    # ignore empty lines
    if line.isspace(): continue
    # find IP that are Indicator Titles
    IP = (re.findall(r"(?:<indicator:Title>IP:) (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})", line))
    # Only take finds
    if not IP: continue
    # Add each found IP to the BadIP list
    BadIPs.append(IP)
#tidy up for CSV format
data = str(BadIPs)
data = data.replace('[', '')
data = data.replace(']', '')
data = data.replace("'", "")
# Write IPs to a file
outfile.write(data)
infile.close
outfile.close
I think you want to have a look at glob.glob: https://docs.python.org/2/library/glob.html
This returns a list of files matching a given pattern.
Then you can do something like:
import re, glob
def do_something_with(f):
    # Open Source File
    infile = open(f, 'r')
    # Open output file
    outfile = open('ExtractedIPs.csv', 'a')  ## 'a' to append ('wa' is not a valid mode)
    # Create a list
    BadIPs = []
    ### rest of your code
    # ...
    outfile.write(data)
    infile.close()
    outfile.close()
for f in glob.glob("*.xml"):
    do_something_with(f)
Assuming that you want to add all outputs to the same file, this would be the script:
#!/usr/bin/python
import glob
import re
for infileName in glob.glob("*.xml"):
    # Open Source File
    infile = open(infileName, 'r')
    # Append to file
    outfile = open('ExtractedIPs.csv', 'a')
    # Create a list
    BadIPs = []
    #search each line in doc
    for line in infile:
        # ignore empty lines
        if line.isspace(): continue
        # find IP that are Indicator Titles
        IP = (re.findall(r"(?:<indicator:Title>IP:) (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})", line))
        # Only take finds
        if not IP: continue
        # Add each found IP to the BadIP list
        BadIPs.append(IP)
    #tidy up for CSV format
    data = str(BadIPs)
    data = data.replace('[', '')
    data = data.replace(']', '')
    data = data.replace("'", "")
    # Write IPs to a file, one line per input file
    outfile.write(data + '\n')
    infile.close()
    outfile.close()
You could get a list of all XML files like this.
import os
filenames = [nm for nm in os.listdir() if nm.endswith('.xml')]
And then you iterate over all the files.
for fn in filenames:
    with open(fn) as infile:
        for ln in infile:
            pass  # do your thing
The with-statement makes sure that the file is closed after you're done with it.
Use sys.argv (this needs import sys):
Make a function out of your current code, for example def extract(filename).
Call the script with all filenames: python myscript.py file1 file2 file3
Inside your script, loop over the filenames: for filename in sys.argv[1:]:
Call the function inside the loop: extract(filename).
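Put together, those steps might look like the sketch below. It reuses the question's regex; extract and main are the illustrative names from the steps, the one-row-per-file CSV layout is an assumption, and the csv module replaces the manual str()/replace() tidy-up.

```python
import csv
import re
import sys

# The question's pattern: IPs written as "<indicator:Title>IP: a.b.c.d".
IP_PATTERN = re.compile(r"<indicator:Title>IP: (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")

def extract(filename):
    """Return every matching IP address in `filename`, in file order."""
    with open(filename) as infile:
        return [ip for line in infile for ip in IP_PATTERN.findall(line)]

def main(filenames):
    # One CSV row per input file: the file name followed by its IPs.
    with open("ExtractedIPs.csv", "w", newline="") as outfile:
        writer = csv.writer(outfile)
        for filename in filenames:
            writer.writerow([filename] + extract(filename))

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python myscript.py file1 [file2 ...]")
    else:
        main(sys.argv[1:])
```

On Unix-like shells, python myscript.py *.xml passes every XML file at once, because the shell expands the wildcard; the Windows cmd shell does not expand it for you.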
I needed to do this too, and also to go into subdirectories. You need to import os (which also gives you os.path); then you can use a function like this:
import os
def recursive_glob(rootdir='.', suffix=()):
    """ recursively traverses full path from root, returns
    paths and file names for files with suffix in tuple """
    pathlist = []
    filelist = []
    for looproot, dirnames, filenames in os.walk(rootdir):
        for filename in filenames:
            if filename.endswith(suffix):
                pathlist.append(os.path.join(looproot, filename))
                filelist.append(filename)
    return pathlist, filelist
You pass the function the top-level directory you want to start from and the suffix (or a tuple of suffixes) for the file types you are looking for. This was written and tested on Windows, but I believe it will work on other operating systems as well, as long as your files have extensions.
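For comparison, pathlib's Path.rglob can do the same traversal. This is a swapped-in sketch rather than the answer's code; recursive_glob_pathlib is a hypothetical name, and it returns the same (paths, file names) pair, sorted by path.

```python
from pathlib import Path

def recursive_glob_pathlib(rootdir=".", suffix=()):
    """Return (paths, file names) for files under `rootdir` whose names end
    with `suffix` (a string or a tuple of strings), sorted by path."""
    if isinstance(suffix, str):
        suffix = (suffix,)
    matches = sorted(
        p for p in Path(rootdir).rglob("*") if p.is_file() and p.name.endswith(suffix)
    )
    return [str(p) for p in matches], [p.name for p in matches]
```

A call such as paths, names = recursive_glob_pathlib(".", (".xml", ".bat")) mirrors the os.walk version.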
You could just use os.listdir() if all files in your current folder are relevant. If only some are, say all the .xml files, use glob.glob("*.xml") instead. The overall program can also be improved, roughly as follows:
#import modules
import csv
import os
import re
pat = re.compile(reg) # reg is your regex
with open("out.csv", "w", newline="") as fw:
    writer = csv.writer(fw)
    for f in os.listdir(): # or glob.glob("*.xml")
        with open(f) as fr:
            # skip blank lines (note the "not")
            lines = (line for line in fr if not line.isspace())
            # genex for all ip in that file
            ips = (ip for line in lines for ip in pat.findall(line))
            writer.writerow(ips)
You will probably have to adapt it to your exact needs, but the idea is that this version has far fewer side effects and uses much less memory, and closing the files is handled by the context managers. Please comment if it doesn't work.
I have written a program in Python that does the following:
writes an initial header to a new file
merges the files into the new file (i.e. appends each file to the new file; I want all my log files put together)
finally, converts the space-separated file to CSV.
I specify the output directory where my file should go, and also a file list, which contains the location of each file that should be merged. It looks like this:
/Users/ra/Documents/Dryad01/meow.log
/Users/ra/Documents/Dryad01/meow1.log
Then I run python program.py path_to_list_file output_dir
Here is my program:
import argparse
import csv
def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("filelist", help="Format: Value File in each line")
    parser.add_argument("output_dir", help="output directory")
    args = parser.parse_args()
    # write header
    fout = open(args.output_dir+"merged.txt","a")
    fout.write("timestamp type response_time")
    #from each file get the data and put it in fout/merge
    with open(args.filelist) as f:
        for file in f:
            file_read = open(file)
            for line in file_read:
                fout.write(line)
    fout.close()
    #now all file in filelist have been merged
    #next make them into csv files
    make_csv(args.output_dir+"merged.txt",args.output_dir+"merged_csv.csv")
def make_csv(file1,file2):
    with open(file1) as fin, open(file2, 'w') as fout:
        o=csv.writer(fout)
        for line in fin:
            o.writerow(line.split())
But for some reason I get no error and no warning, just no file!
What do you think is wrong?
Did you maybe forget the following main pattern at the end of your source file?
if __name__ == '__main__':
    main()
If so, your main() function will never be called.
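Once main() is actually called, a few other details in the question's code are likely to surface: each line read from the file list keeps its trailing newline, so open(file) will most likely raise FileNotFoundError; args.output_dir+"merged.txt" drops the path separator unless output_dir already ends with one; the header has no newline; and mode "a" keeps appending on every run. A minimal sketch addressing these, assuming one path per line in the list file (run, read_file_list and merge_files are hypothetical names):

```python
import csv
import os

# Trailing newline keeps the header on its own line.
HEADER = "timestamp type response_time\n"

def read_file_list(list_path):
    """Return the paths listed one per line, with trailing newlines stripped."""
    with open(list_path) as f:
        return [line.strip() for line in f if line.strip()]

def merge_files(paths, merged_path):
    """Write HEADER, then the contents of each file in `paths`, in order."""
    with open(merged_path, "w") as fout:  # "w" so a rerun starts a fresh file
        fout.write(HEADER)
        for path in paths:
            with open(path) as fin:
                for line in fin:
                    # A file without a final newline must not run into the next one.
                    fout.write(line if line.endswith("\n") else line + "\n")

def make_csv(file1, file2):
    """Convert a whitespace-separated file into a CSV file."""
    with open(file1) as fin, open(file2, "w", newline="") as fout:
        writer = csv.writer(fout)
        for line in fin:
            if line.strip():  # skip blank lines
                writer.writerow(line.split())

def run(filelist, output_dir):
    """Merge every file named in `filelist`, then convert the result to CSV."""
    merged = os.path.join(output_dir, "merged.txt")  # join() adds the separator
    merge_files(read_file_list(filelist), merged)
    make_csv(merged, os.path.join(output_dir, "merged_csv.csv"))
```

Inside the question's main(), everything after args = parser.parse_args() would then reduce to run(args.filelist, args.output_dir), still called from the if __name__ == '__main__': guard.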