Parse Text files and save data against some headings in database - python

I'm working on a Python (3.6) project in which I need to parse a text file.
My specific problem is:
My text file has some headings such as Examples, Input, Output, Explanations and Notes. I need to parse this file, take the Input and Output headings that follow the Examples heading, write them at the end of the file (that is, after the Notes heading), and save the file.
Here is how I'm trying to get the line after the Examples line:
import os

count = 0
for root, dirs, files in os.walk(os.path.join('description2code_current')):
    for file in files:
        if file.endswith('description.txt'):
            old_name = os.path.join(os.path.abspath(root), file)
            print('File Path ````` is: ', old_name)
            with open(old_name, "r") as rf:
                # readlines() returns every line; readline() returns only the first one
                lines = rf.readlines()
            # print(lines)
            for line in lines:
                line = line.strip()
                if line == 'Examples':
                    print(line)
How can I make that change in my file and then parse it again to save the data under these headings into my database?
Help me, please!
Thanks in advance!
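One possible starting point, as a minimal sketch rather than a definitive solution: group the lines of a description under the most recent heading, then pick out the Input and Output sections that come after Examples. The function names split_sections and examples_io, and the sample text, are hypothetical. The sketch assumes each heading sits alone on its own line and that no ordinary body line is exactly equal to a heading word.

```python
HEADINGS = ("Examples", "Input", "Output", "Explanations", "Notes")

def split_sections(text, headings=HEADINGS):
    """Group the lines of `text` under the most recent heading line.

    Returns a list of (heading, lines) pairs in file order. Lines that
    appear before the first heading are ignored.
    """
    sections = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line in headings:
            current = (line, [])
            sections.append(current)
        elif current is not None:
            current[1].append(line)
    return sections

def examples_io(sections):
    """Return the Input/Output sections that come after the Examples heading."""
    seen_examples = False
    found = []
    for heading, body in sections:
        if heading == "Examples":
            seen_examples = True
        elif seen_examples and heading in ("Input", "Output"):
            found.append((heading, body))
    return found

# Hypothetical file contents, used only to exercise the two helpers
sample = """Input
two integers
Output
their sum
Examples
Input
1 2
Output
3
Notes
plain addition
"""
pairs = examples_io(split_sections(sample))
print(pairs)
```

The resulting pairs could then be appended to the file opened in 'a' mode, or passed to whatever database layer the project uses; neither step is shown here because the question does not say which database is involved.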

Related

Python adding a string leaves extra characters

If you need any more info, just let me know.
I have a Python script that adds a string to the end of each line of a CSV file. The line file_lines = [''.join([x.strip(), string_to_add, '\n']) for x in f.readlines()] is the troublemaker: for each line of the file it appends the string and then a newline.
Here is the script:
# Add the .JPG string to the end of each line for the part numbers
string_to_add = ".JPG"
# Open the file and join .JPG to the current lines
with open("PartNums.csv", 'r') as f:
    file_lines = [''.join([x.strip(), string_to_add, '\n']) for x in f.readlines()]
# Write the modified lines back to the file
with open("PartNums.csv", 'w') as f:
    f.writelines(file_lines)
The script works and does what it is supposed to; however, my issue comes later on in this larger script. This script outputs into a CSV file that looks like this:
X00TB0001.JPG
X01BJ0003.JPG
X01BJ0004.JPG
X01BJ0005.JPG
X01BJ0006.JPG
X01BJ0007.JPG
X01BJ0008.JPG
X01BJ0026.JPG
X01BJ0038.JPG
X01BJ0039.JPG
X01BJ0040.JPG
X01BJ0041.JPG
...
X01BJ0050.JPG
X01BJ0058.JPG
X01BJ0059.JPG
X01BJ0060.JPG
X01BJ0061.JPG
X01BJ0170.JPG
X01BJ0178.JPG
Without the \n in that line, i.e. file_lines = [''.join([x.strip(), string_to_add]) for x in f.readlines()], the CSV file output looks like this:
X00TB0001.JPGX01BJ0003.JPGX01BJ0004.JPGX01BJ0005.JPGX01BJ0006.JPG
The issue is when I go to read this file later and move files with it using this script:
# If the string matches a file name, move it to a new directory
dst = r"xxx"
with open('PicsWeHave.txt') as my_file:
    for filename in my_file:
        src = os.path.join("XXX") # .strip() to avoid unwanted white space
        #shutil.copy(src, os.path.join(dst, filename.strip()))
        shutil.copy(os.path.join(src, filename), os.path.join(dst, filename))
When I run this whole script, it works until it has to move the files. Then I get this error:
FileNotFoundError: [Errno 2] No such file or directory: 'XXX\\X15SL0447.JPG\n'
I know the file exists; however, the '\n' should not be there. That's why I am asking: how can I still get every name on its own line in the file, but not have \n after each name when I move the files, so that the strings match?
Thank You For Your Help!
As they said above, you should use .strip():
shutil.copy(os.path.join(src, filename.strip()), os.path.join(dst, filename.strip()))
.strip() removes leading and trailing whitespace, including the trailing '\n' that each line keeps when you iterate over a file, so the path you build matches the actual file name.
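To illustrate the point, here is a small sketch that cleans the lines once, before any path is built. The helper name clean_names and the directory name "pictures" are hypothetical; the sample lines mimic what iterating over the text file yields.

```python
import os

def clean_names(lines):
    """Strip surrounding whitespace (newline included) and drop blank lines."""
    return [line.strip() for line in lines if line.strip()]

# Lines as they come out of a file: each one still ends with a newline
lines = ["X00TB0001.JPG\n", "X01BJ0003.JPG\n", "\n"]

names = clean_names(lines)
# "pictures" is a placeholder source directory for this illustration
paths = [os.path.join("pictures", name) for name in names]
print(paths)
```

The file on disk keeps one name per line, while the strings passed to shutil.copy no longer carry the newline.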

I need to extract this uid from a .sgm file

I need to extract the uid from a .sgm file. I tried the code below, but it doesn't work. Can anybody help?
Sample .sgm file content:
<miscdoc n='1863099' uid='0001863099_20220120' type='seccomlett' t='frm' mdy='01/20/2022'><rname>Kimbell Tiger Acquisition Corp, 01/20/2022</rname>
<table col='2' type='txt'>
<colspec col='1' colwidth='*'>
<colspec col='2' colwidth='2*'>
<tname>Meta-data</tname>
<tbody>
<row><entry>SEC-HEADER</entry><entry>0001104659-22-005920.hdr.sgml : 20220304</entry></row>
<row><entry>ACCEPTANCE-DATETIME</entry><entry>20220120160231</entry></row>
<row><entry>PRIVATE-TO-PUBLIC</entry></row>
<row><entry>ACCESSION-NUMBER</entry><entry>0001104659-22-005920</entry></row>
<row><entry>TYPE</entry><entry>CORRESP</entry></row>
<row><entry>PUBLIC-DOCUMENT-COUNT</entry><entry>1</entry></row>
<row><entry>FILING-DATE</entry><entry>20220120</entry></row>
<row><entry>FILER</entry></row>
Code I tried:
import os
# Folder Path
path = "Enter Folder Path"
# Change the directory
os.chdir(path)
# Read text File
def read_file(file_path):
    with open(file_path, 'r') as f:
        print(f.read())
# iterate through all file
for file in os.listdir():
    # Check whether file is in text format or not
    if file.endswith(".sgm"):
        if 'uid' in file:
            print("true")
        file_path = f"{path}\{file}"
        # call read text file function
        read_file(file_path)
I need to extract the uid value from the above .sgm file. Is there any other way I could do this? What should I change in my code?
The SGM format may just be an XML superset. If it isn't, then for this particular case (and if one can rely on the format being as shown in the question):
import re

def get_uid(filename):
    with open(filename) as infile:
        for line in map(str.strip, infile):
            if line.startswith('<miscdoc'):
                # The walrus operator (:=) requires Python 3.8 or later
                if uid := re.findall("uid='(.*?)'", line):
                    return uid[0]
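A variation on the same idea, sketched under the same assumption that the uid attribute sits inside the opening miscdoc tag and is single-quoted as in the sample: the function below works on a string rather than a file, so it can be tried without any .sgm file on disk. The names UID_PATTERN and extract_uid are hypothetical.

```python
import re

# Match uid='...' only when it appears inside the opening <miscdoc ...> tag
UID_PATTERN = re.compile(r"<miscdoc\b[^>]*\buid='([^']*)'")

def extract_uid(text):
    """Return the uid attribute of the first <miscdoc> tag, or None."""
    match = UID_PATTERN.search(text)
    return match.group(1) if match else None

# First line of the sample file from the question
sample = ("<miscdoc n='1863099' uid='0001863099_20220120' type='seccomlett' "
          "t='frm' mdy='01/20/2022'><rname>Kimbell Tiger Acquisition Corp, "
          "01/20/2022</rname>")

print(extract_uid(sample))
```

Returning None when nothing matches lets the caller skip files without a miscdoc tag instead of raising an error.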

How to modify the contents of an .SVG file with Python?

I have a .svg file with example contents: <svg style="fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.001" /></svg>
I now want to use Python to edit the .svg file directly, change style="fill:#000000; to my desired color, and save it. I am not sure how to go about this; I have tried a lot of libraries, but none do what I need.
Try this: https://pythonexamples.org/python-replace-string-in-file/
# Read the input file's contents into a string
with open("data.svg", "rt") as fin:
    data = fin.read()
# Replace all occurrences of the required string
data = data.replace('style="fill:#000000;', 'style="fill:#FF0000;')
# Overwrite the input file with the resulting data
with open("data.svg", "wt") as fout:
    fout.write(data)
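If the current fill color is not always #000000, a regular expression can replace whatever hex color follows fill: instead of one fixed string. This is only a sketch: the name set_fill is hypothetical, and it assumes the fill is written as a hex color inside a style attribute, as in the question's example. It does not handle named colors or rgb() values.

```python
import re

# "fill:" followed by a hex color; the lookbehind avoids matching names
# that merely end in "fill", and "fill-opacity" never matches because
# the colon must come directly after "fill"
FILL_PATTERN = re.compile(r"(?<![\w-])fill:\s*#[0-9a-fA-F]{3,8}")

def set_fill(svg_text, color):
    """Return svg_text with every hex fill color replaced by `color`."""
    return FILL_PATTERN.sub(lambda m: "fill:" + color, svg_text)

original = '<svg style="fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.001" />'
updated = set_fill(original, "#FF0000")
print(updated)
```

Reading the file into a string and writing the result back works exactly as in the answer above; only the replacement step changes.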

Listing text file names in a new text file for reading links: print shows the desired result, but saving it to a file or a list does not

I am trying to extract links from many text files. To do that, I tried to build a list of the text file names from a separate file. When I print each name, I get the output I want, but when I store the names in a list or in a new file, they show up in an encoded or unreadable format.
I marked my desired output in green and the undesired output with a red circle.
Can you guide me toward getting the desired output?
docs = []
with open('files.txt','r') as f:
    content = f.readlines()
    for doc in content:
        if '-' in doc:
            print(doc[101:])
            docs.append(doc[101:])
            #print(doc[101:])
            #print(type(doc))
print(docs)
This snippet might help you fetch the file names from the txt file:
docs = []
with open('files.txt', 'r') as f:
    content = f.readlines()
    for doc in content:
        file_name = doc.rstrip().split(" ")[-1]
        docs.append(file_name)
print(docs)
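One plausible reading of the "unreadable" output, which cannot be confirmed without the screenshot: printing a list shows the repr of each string, so a trailing newline appears as a literal \n inside the quotes, whereas printing a single string renders it as a line break. The sketch below uses made-up listing lines (the real files.txt is not shown in the question) and a hypothetical helper last_field.

```python
def last_field(lines):
    """Return the last space-separated field of each non-blank line."""
    return [line.rstrip().split(" ")[-1] for line in lines if line.strip()]

# Hypothetical listing lines standing in for the contents of files.txt
listing = [
    "-rw-r--r-- 1 user 120 report-01.txt\n",
    "-rw-r--r-- 1 user 98 notes-02.txt\n",
]

docs = last_field(listing)
print(docs)

# An unstripped string keeps its newline, and repr() makes it visible
unstripped = listing[0][-14:]
print(repr(unstripped))
```

Stripping each line before slicing or splitting keeps the stored names free of the newline, so the list and any file written from it contain only the names.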

python printing a blank line on the first line when writing to a file

I'm stuck on why my code is printing a blank line before writing text to a file. What I am doing is reading two files from a zipped folder and writing the text to a new text file. I am getting the expected results in the file, except for the fact that there is a blank line on the first line of the file.
import io
import zipfile

def test():
    if zipfile.is_zipfile(r'C:\Users\test\Desktop\Zip_file.zip'):
        zf = zipfile.ZipFile(r'C:\Users\test\Desktop\Zip_file.zip')
        for filename in zf.namelist():
            with zf.open(filename, 'r') as f:
                words = io.TextIOWrapper(f)
                new_file = io.open(r'C:\Users\test\Desktop\new_file.txt', 'a')
                for line in words:
                    new_file.write(line)
                new_file.write('\n')
    else:
        pass
    zf.close()
    words.close()
    f.close()
    new_file.close()
Output in new_file (there is a blank line before the first "This is a test line...")
This is a test line...
This is a test line...
this is test #2
this is test #2
Any ideas?
Thanks!
My guess is that the first file in zf.namelist() doesn't contain anything, so you skip the for line in words loop for that file and just do new_file.write('\n'). It's difficult to tell without seeing the files that you're looping over; perhaps add some debug statements that print out the files' names and some info, e.g. their size.
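The debug step suggested above could look like the following sketch. It lists every archive member with its uncompressed size, using ZipFile.infolist() and the file_size attribute of each ZipInfo entry. The helper name describe_members and the in-memory archive are hypothetical stand-ins for the real Zip_file.zip, whose contents are unknown. One thing worth checking: when a folder is zipped, the archive can contain an entry for the folder itself, which has size 0.

```python
import io
import zipfile

def describe_members(zf):
    """Return (name, uncompressed size in bytes) for every archive member."""
    return [(info.filename, info.file_size) for info in zf.infolist()]

# Build a small in-memory archive; its contents are made up for illustration
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("Zip_file/", "")  # folder entry, no content
    zf.writestr("Zip_file/one.txt", "This is a test line...\n")

with zipfile.ZipFile(buffer) as zf:
    members = describe_members(zf)
    for name, size in members:
        print(name, size)
```

If the first member turns out to be empty, skipping entries whose size is 0 (or for which ZipInfo.is_dir() is true) before writing would avoid the stray newline.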
