I have a task where I have to combine two files, numbers1 and numbers2, and sort them into ascending order - python

I have the following tasks:
Create two text files called numbers1.txt and numbers2.txt with numbers sorted from smallest to largest
Create a text file called all_numbers.txt and write the sorted contents of both files into it
Use a function definition to merge the text files
def merge(numbers1, numbers2, all_numbers):
    # Open the two text files that were created
    numbers1 = open(numbers1)
    numbers2 = open(numbers2)
    # Open the third txt file that the other two will be merged into
    all_numbers = open(all_numbers, "w")
    # Read txt file 1 and 2 contents and store them as file content
    numbers1_storage = numbers1.read()
    numbers2_storage = numbers2.read()
    # Write the contents of txt files 1 and 2 to the new txt file
    all_numbers.write(numbers1_storage)
    all_numbers.write(numbers2_storage)
    # Close all three files
    numbers1.close()
    numbers2.close()
    all_numbers.close()
    # Use the sort function to sort contents in ascending order
    # Open a file containing sorted numbers
    # HOW WOULD I USE THE SORT FUNCTION

merge("numbers1.txt", "numbers2.txt", "all_numbers.txt")
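One way to use sorting here is to parse the numbers into a list, sort the list, and only then write the result out. A minimal sketch, assuming each input file holds one number per line:

```python
def merge(numbers1, numbers2, all_numbers):
    # Read every non-blank line of both input files as an integer
    with open(numbers1) as f1, open(numbers2) as f2:
        nums = [int(line) for line in f1 if line.strip()]
        nums += [int(line) for line in f2 if line.strip()]
    # Sort the combined list in place, ascending
    nums.sort()
    # Write one number per line to the merged output file
    with open(all_numbers, "w") as out:
        out.write("\n".join(str(n) for n in nums) + "\n")
```

Called the same way as before: `merge("numbers1.txt", "numbers2.txt", "all_numbers.txt")`. Sorting the list of ints (rather than the concatenated text) is what keeps 10 from sorting before 2.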

Related

Creating and then modifying pdf file in python

I am writing some code that merges some pdfs from their file paths and then writes some text on each page of the merged document. My problem is this: I can do both things separately - merge pdfs and write text to a pdf - I just can't seem to do it all in one go.
My code is below - the pdfs are merged together from their file paths contained in an excel workbook, they are then saved as a single pdf with a file name obtained from the workbook (this will change depending on what pdfs are merged, so it needs to be dynamic), and I am then attempting to write text (a question number) to this merged pdf.
I keep getting the error "cannot save with zero pages" and I'm not sure why, as I can save the merged file, and I can write the desired text to any other pdf with the function I made if I pass the document file path into it. Any ideas on how I can merge these pdfs into a single file, then edit it with the inserted text and save it with the chosen file name from the excel doc? Hopefully you get what I mean!
import fitz  # PyMuPDF
import xlwings as xw
from pypdf import PdfMerger

def insert_qu_numbers(document):
    qu_numbers = fitz.open(document)
    counter = 0
    for page in qu_numbers:
        page.clean_contents()
        counter += 1
        text = f"Question {counter}"
        text_length = fitz.get_text_length(text, fontname="times-roman")
        print(text_length)
        rect_x0 = 70
        rect_y0 = 50
        rect_x1 = rect_x0 + text_length + 35
        rect_y1 = rect_y0 + 40
        rect = fitz.Rect(rect_x0, rect_y0, rect_x1, rect_y1)
        page.insert_textbox(rect, text, fontsize=16, fontname="times-roman", align=0)
    qu_numbers.write()

# opens the workbook and gets the file paths
wbxl = xw.Book('demo.xlsm')
get_links = wbxl.sheets['Sheet1'].range('C2:C5').value

# gets rid of any blank cells in the range and makes a list of all the file paths, called filenames
filenames = []
for file in get_links:
    if file is not None:
        filenames.append(file)

# gets each file path from the filenames list and adds it to the merged pdf
merged_pdf = PdfMerger()
for i in range(len(filenames)):
    merged_pdf.append(filenames[i], 'rb')

# merges the separate file paths into one pdf and names it with the output name in the given cell
output_name = wbxl.sheets['Sheet1'].range('C7').value
final = merged_pdf.write(output_name + ".pdf")
insert_qu_numbers(final)
You can use PyMuPDF for merging and modification as well:
# filelist = list of files to merge
doc = fitz.open()  # the output to receive the merged PDFs
for file in filelist:
    src = fitz.open(file)
    doc.insert_pdf(src)  # append input file
    src.close()
for page in doc:  # now iterate through the pages of the result
    page.insert_text(...blahblah...)  # insert text or whatever was on your mind
doc.ez_save("output.pdf")

Creating a list with words counted from multiple .docx files

I'm trying to do a project where I automate my invoices for translation jobs. Basically the script reads multiple .docx files in a folder, counts the words in every separate file, then writes those filenames and the corresponding word counts into an Excel file.
I've created a word-counter script, but can't figure out how to add the counted words to a list, so I can later extract values from it for my Excel file and create an invoice.
Here is my code:
import docx
import os
import re
from docx import Document

# Folder to work with
folder = r'D:/Tulk_1'
files = os.listdir(folder)

# Lists with the names of files and word counts for each file
list_files = []
list_words = []

for file in files:
    # Getting the absolute location
    location = folder + '/' + file
    # Adding filenames to the list
    list_files.append(file)
    # Word counter
    document = docx.Document(location)
    newparatextlist = []
    for paratext in document.paragraphs:
        newparatextlist.append(paratext.text)
    # Printing file names
    print(file)
    # Printing word counts for each file
    print(len(re.findall(r'\w+', '\n'.join(newparatextlist))))
Output:
cold calls.docx
2950
Kristības.docx
1068
Tulkojums starpniecības līgums.docx
946
Tulkojums_PL_ULIHA_39_41_4 (1).docx
788
Unfortunately I copied the counter part from the web, and the last line is too complicated for me:
print(len(re.findall(r'\w+', '\n'.join(newparatextlist))))
So I don't know how to extract the results from it into a list.
When I try to store the last line into a variable like this:
x = len(re.findall(r'\w+', '\n'.join(newparatextlist)))
The output then shows the word count for only one of the files:
cold calls.docx
Kristības.docx
Tulkojums starpniecības līgums.docx
Tulkojums_PL_ULIHA_39_41_4 (1).docx
788
Maybe you could help me to break the last line into smaller steps? Or perhaps there are easier solutions to my task?
EDIT:
The desired output for the:
print(list_words)
should be:
[2950, 1068, 946, 788]
Similar to what I already have for file names:
print(list_files)
output:
['cold calls.docx', 'Kristības.docx', 'Tulkojums starpniecības līgums.docx', 'Tulkojums_PL_ULIHA_39_41_4 (1).docx']
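That last line can be broken into smaller steps, with the count appended to list_words inside the loop rather than printed. A sketch using the question's own names (the paragraph strings here are stand-ins for the docx contents):

```python
import re

list_words = []
newparatextlist = ["First paragraph here", "Second paragraph"]  # stand-in for document.paragraphs texts

# Step 1: join all paragraph texts into one string, separated by newlines
full_text = '\n'.join(newparatextlist)
# Step 2: find every word-like token (runs of letters, digits, underscores)
words = re.findall(r'\w+', full_text)
# Step 3: the word count is just the length of that list
count = len(words)
# Step 4: collect the count instead of only printing it
list_words.append(count)

print(list_words)  # [5]
```

Running step 4 inside the `for file in files:` loop, once per file, yields a list like `[2950, 1068, 946, 788]` that lines up index-for-index with list_files.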

comparing a txt file with csv file with python

I am working on developing Python code that compares a txt file and a csv file and finds out whether they are identical. If not, it should find the errors and summarize them in an Excel table.
def main():
    filename1 = input("Enter txt file name:- ")
    filename2 = input("Enter csv file name:- ")
    fp1 = open(filename1, "r")
    fp2 = open(filename2, "r")
    list1 = []
    list2 = []
    for line in fp1:  # iterating through each line in the file
        a = line.split(",")  # splitting line based on comma
        for i in a:  # iterating through each element in list
            i = i.rstrip()  # removing new line from elements in list
            list1.append(i)  # appending element to the list
    for line in fp2:
        a = line.split(",")
        for i in a:
            i = i.rstrip()
            list2.append(i)
    fp2 = open("res.csv", "a")  # opening res file in append mode
    flag = 0
    for i in range(0, len(list1)):  # iterating through lists
        if (i == len(list1) - 1 and list1[i] != list2[i]):  # if total is different in both files
            fp2.write(list1[i - 1] + "," + str(abs(int(list1[i]) - int(list2[i]))))  # printing difference
            flag = 1
        elif (list1[i] != list2[i]):  # if other lines are different
            fp2.write(list1[i - 1] + "," + list1[i] + "," + list2[i - 1] + "," + list2[i] + "\n")  # printing different lines
            flag = 1
    if (flag == 0):  # if there is no difference
        fp2.write("none")

main()  # calling main function
The output should be an Excel table with a summary of the differences between the two files. The above code gives the numeric difference between the files, but if the number of lines is different, the output should also print the lines that exist in only one file. I would appreciate any ideas to improve this code and help creating code that compares a txt and a csv file and outputs the differences to an Excel spreadsheet.
Thank you.
* I am still new here, so please let me know if I need to edit something or make part of my question clearer.
If both files contain the same type of data, I'd recommend reading both the .csv and the .txt data into pandas tables (after transformation).
After reading, you can easily operate on the columns and rows of both datasets, i.e. find the difference between the two tables, and output this difference in any format you want.
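As a dependency-free alternative, the same row-by-row comparison the question attempts can be sketched with the standard csv module; the function name and the "&lt;missing&gt;" marker below are illustrative choices, not anything from the question:

```python
import csv

def diff_files(path1, path2, out_path):
    # Read both inputs as lists of rows; csv.reader handles the comma splitting
    with open(path1, newline="") as f1, open(path2, newline="") as f2:
        rows1 = list(csv.reader(f1))
        rows2 = list(csv.reader(f2))
    diffs = []
    # Walk to the longer length so lines present in only one file are reported too
    for i in range(max(len(rows1), len(rows2))):
        r1 = rows1[i] if i < len(rows1) else None
        r2 = rows2[i] if i < len(rows2) else None
        if r1 != r2:
            diffs.append((i + 1, r1, r2))
    # The summary csv opens in Excel; each row: line number, file1 version, file2 version
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["line", "file1", "file2"])
        for line_no, r1, r2 in diffs:
            writer.writerow([line_no,
                             ",".join(r1) if r1 else "<missing>",
                             ",".join(r2) if r2 else "<missing>"])
    return diffs
```

With pandas instead, `DataFrame.compare` plus `DataFrame.to_excel` would cover the same ground for same-shaped tables.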

Python create dict from CSV and use the file name as key

I have a simple CSV text file called "allMaps.txt". It contains the following:
cat, dog, fish
How can I take the file name "allMaps" and use it as a key, with the file's contents as the values?
I wish to achieve this format:
{"allMaps": "cat", "dog", "fish"}
I have a range of txt files, all containing values separated by commas, so a more dynamic method that does it for all of them would be beneficial!
The other txt files are:
allMaps.txt
fiveMaps.txt
tenMaps.txt
sevenMaps.txt
They all contain comma separated values. Is there a way to look into the folder and convert each one on the text files into a key-value dict?
Assuming you have the file names in a list:
files = ["allMaps.txt", "fiveMaps.txt", "tenMaps.txt", "sevenMaps.txt"]
You can do the following:
my_dict = {}
for file in files:
    with open(file) as f:
        items = [i.strip() for i in f.read().split(",")]
    my_dict[file.replace(".txt", "")] = items
If the files are all in the same folder, you could do the following instead of maintaining a list of files:
import os
files = os.listdir("<folder>")
Given the file names, you can create a dictionary where the key stores the filenames with a corresponding value of a list of the file data:
files = ['allMaps.txt', 'fiveMaps.txt', 'tenMaps.txt', 'sevenMaps.txt']
final_results = {i:[b.strip('\n').split(', ') for b in open(i)] for i in files}
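Putting the first answer's loop together with os.listdir gives a runnable sketch; the temporary folder and file contents here are stand-ins for the real ones:

```python
import os
import tempfile

# Stand-in folder with files like the ones in the question
folder = tempfile.mkdtemp()
for name in ["allMaps.txt", "fiveMaps.txt"]:
    with open(os.path.join(folder, name), "w") as f:
        f.write("cat, dog, fish")

my_dict = {}
for file in os.listdir(folder):
    if file.endswith(".txt"):
        with open(os.path.join(folder, file)) as f:
            items = [i.strip() for i in f.read().split(",")]
        my_dict[file.replace(".txt", "")] = items

print(my_dict["allMaps"])  # ['cat', 'dog', 'fish']
```

Note the result is a dict mapping each base name to a list of values, which is the closest valid Python structure to the format shown in the question.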

How would I read and write from multiple files in a single directory? Python

I am writing a Python code and would like some more insight on how to approach this issue.
I am trying to read in, in order, multiple files that end with .log. From these, I hope to write specific values to a .csv file.
Within the text file, there are X/Y values that are extracted below:
Textfile.log:
X/Y = 5
X/Y = 6
Textfile.log.2:
X/Y = 7
X/Y = 8
Desired output in the CSV file:
5
6
7
8
Here is the code I've come up with so far:
def readfile():
    import os
    i = 0
    for file in os.listdir("\mydir"):
        if file.endswith(".log"):
            return file

def main():
    import re
    list = []
    list = readfile()
    for line in readfile():
        x = re.search(r'(?<=X/Y = )\d+', line)
        if x:
            list.append(x.group())
        else:
            break
    f = csv.write(open(output, "wb"))
    while 1:
        if (i > len(list - 1)):
            break
        else:
            f.writerow(list(i))
            i += 1

if __name__ == '__main__':
    main()
I'm confused on how to make it read the .log file, then the .log.2 file.
Is it possible to just have it automatically read all the files in 1 directory without typing them in individually?
Update: I'm using Windows 7 and Python V2.7
The simplest way to read files sequentially is to build a list and then loop over it. Something like:
for fname in list_of_files:
    with open(fname, 'r') as f:
        # Do all the stuff you do to each file
This way, whatever you do to read each file will be repeated and applied to every file in list_of_files. Since lists are ordered, the files will be processed in the same order the list is sorted in.
Borrowing from @The2ndSon's answer, you can pick up the files with os.listdir(dir). This will simply list all files and directories within dir in an arbitrary order. From this you can pull out and order all of your files like this:
allFiles = os.listdir(some_dir)
logFiles = [fname for fname in allFiles if "log" in fname.split('.')]
logFiles.sort(key = lambda x: x.split('.')[-1])
logFiles[0], logFiles[-1] = logFiles[-1], logFiles[0]
The above code will work with files named like "somename.log", "somename.log.2" and so on. You can then take logFiles and plug it in as list_of_files. Note that the last line is only necessary if the first file is "somename.log" instead of "somename.log.1". If the first file has a number on the end, just skip the last step.
Line By Line Explanation:
allFiles = os.listdir(some_dir)
This line takes all files and directories within some_dir and returns them as a list
logFiles = [fname for fname in allFiles if "log" in fname.split('.')]
Perform a list comprehension to gather all of the files with log in the name as part of the extension. "something.log.somethingelse" will be included, "log_something.somethingelse" will not.
logFiles.sort(key = lambda x: x.split('.')[-1])
Sort the list of log files in place by the last extension. x.split('.')[-1] splits the file name into a list of period delimited values and takes the last entry. If the name is "name.log.5", it will be sorted as "5". If the name is "name.log", it will be sorted as "log".
logFiles[0], logFiles[-1] = logFiles[-1], logFiles[0]
Swap the first and last entries of the list of log files. This is necessary because the sorting operation will put "name.log" as the last entry and "name.log.1" as the first.
If you change the naming scheme for your log files, you can easily return a list of files that have the ".log" extension. For example, if you change the file names to Textfile1.log and Textfile2.log, you can update readfile() to be:
import os
def readfile():
    my_list = []
    for file in os.listdir("."):
        if file.endswith(".log"):
            my_list.append(file)
Printing my_list will then show ['Textfile1.log', 'Textfile2.log']. Using the word 'list' as a variable name is generally avoided, as it is also the name of a built-in type in Python.
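Putting the pieces together, here is an end-to-end sketch in Python 3 (the same idea works in 2.7 with print changed back to a statement); the function names are illustrative, and the directory layout mirrors the Textfile.log / Textfile.log.2 example above:

```python
import csv
import os
import re

def collect_xy_values(dirpath):
    # Pick up files named like "name.log", "name.log.2", ... and order them,
    # treating the bare ".log" file as number 0 so it sorts first
    log_files = [f for f in os.listdir(dirpath) if "log" in f.split(".")]
    log_files.sort(key=lambda n: int(n.split(".")[-1]) if n.split(".")[-1].isdigit() else 0)
    values = []
    for fname in log_files:
        with open(os.path.join(dirpath, fname)) as f:
            for line in f:
                # Same lookbehind pattern as in the question
                m = re.search(r"(?<=X/Y = )\d+", line)
                if m:
                    values.append(m.group())
    return values

def write_csv(values, out_path):
    # One extracted value per row, as in the desired output
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for v in values:
            writer.writerow([v])
```

Sorting on the numeric suffix (instead of the raw string) avoids the first/last swap the earlier answer needs, at the cost of assuming every suffix after ".log" is a number.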
