Duplicating PDF file by variable

I'm working on a project and I've run into a brick wall.
We have a text file in which each line holds a numeric value and a PDF file name. If the value is 6 we need to print 6 copies of the PDF. I was thinking of creating X copies of the PDF per line and then combining them afterwards. I'm not sure this is the most efficient way to do it, and I was wondering if anyone has another idea.
DATA
1,PDF1
2,PDF5
7,PDF2
923,PDF33

You should use Python's csv module to read your data into two variables, numCopies and filePath: https://docs.python.org/2/library/csv.html
You can then just use
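import shutil
for i in range(numCopies):  # range(1, numCopies) would make one copy too few
    shutil.copyfile(filePath, newFilePath)  # newFilePath must differ for each copy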
or something along those lines.
If you want to physically work with the PDF files I'd recommend the pyPdf module.
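A minimal sketch of the approach above, assuming each data line has the form `count,name` and that the PDF is stored as `name.pdf` in a known folder. The function name `duplicate_pdfs`, its parameters, and the `_copyN` naming of the output files are assumptions made for illustration, not part of the original answer.

```python
import csv
import os
import shutil


def duplicate_pdfs(data_path, pdf_dir, out_dir):
    """Copy each PDF listed in data_path the requested number of times.

    Returns the list of created file paths, in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    created = []
    with open(data_path, newline="") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # skip blank lines
            num_copies, name = int(row[0]), row[1].strip()
            source = os.path.join(pdf_dir, name + ".pdf")
            # One distinct target name per copy (hypothetical naming scheme).
            for i in range(1, num_copies + 1):
                target = os.path.join(out_dir, "%s_copy%d.pdf" % (name, i))
                shutil.copyfile(source, target)
                created.append(target)
    return created
```

Giving every copy its own name avoids overwriting the same target file on each pass through the loop.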

Related

Looping through an xlsx file

I have two questions regarding reading data from a file in .xlsx format.
Is it possible to convert an .xlsx file to .csv without actually opening the file in pandas or using xlrd? Because when I have to open many files this is quite slow and I was trying to speed it up.
Is it possible to use some sort of for loop to loop through decoded xlsx lines? I put an example below.
xlsx_file = 'some_file.xlsx'
with open(xlsx_file) as lines:
    for line in lines:
        <do something like I would do for a normal string>
I would like to know if this is possible without the well known xlrd module.

how to compare two csv files and write 1 or 0 in a new file with python

I am very new to Python and would like to ask a question.
I want to compare 2 CSV files. The source file holds all attribute values in comma-separated form.
The source CSV file looks like this:
advapi32.dll,comctl32.dll,comdlg32.dll,gdi32.dll,gdiplus.dll,hal.dll,imagehlp.dll,kernel32.dll,mpr.dll,mscoree.dll,msi.dll,msvcrt.dll,mswsock.dll,ndis.sys,netapi32.dll,ntdll.dll,ntoskrnl.exe,ole32.dll,oleaut32.dll,oledlg.dll,opengl32.dll,psapi.dll,rpcrt4.dll,setupapi.dll,shell32.dll,shlwapi.dll,tapi32.dll,ucc12.dll,user32,user32.dll,wininet.dll,winmm.dll,winspool.drv,ws2_32.dll
The second CSV file is:
advapi32.dll,gdi32.dll,imagehlp.dll,kernel32.dll,msvcrt.dll,mswsock.dll,ntdll.dll,ole32.dll,oleaut32.dll
For each value in the source file, I would like to write 1 to a new file if the second file contains that value, otherwise 0.
An example of the output CSV file:
advapi32.dll,comctl32.dll,comdlg32.dll,gdi32.dll,gdiplus.dll,hal.dll,imagehlp.dll,kernel32.dll,mpr.dll,mscoree.dll,msi.dll,msvcrt.dll,mswsock.dll,ndis.sys,netapi32.dll,ntdll.dll,ntoskrnl.exe,ole32.dll,oleaut32.dll,oledlg.dll,opengl32.dll,psapi.dll,rpcrt4.dll,setupapi.dll,shell32.dll,shlwapi.dll,tapi32.dll,ucc12.dll,user32,user32.dll,wininet.dll,winmm.dll,winspool.drv,ws2_32.dll
1,0,0,1,0,0,1,1,0,0,0,1,1,0,0,1,0,1,1,0,0,0,0,0,1,1,0,0,0,1,1,0,0,1
Can someone help me, please? I am very new to Python programming.

If I understand you correctly then your shorter dll list is some sort of "inventory" (aka possible) and you want to go through the longer list to see if it is in the shorter list. If this is correct then something like this would do it (using split and in as the comments suggest):
possible = "advapi32.dll,gdi32.dll,imagehlp.dll,kernel32.dll,msvcrt.dll,mswsock.dll,ntdll.dll,ole32.dll,oleaut32.dll"
test = "advapi32.dll,comctl32.dll,comdlg32.dll,gdi32.dll,gdiplus.dll,hal.dll,imagehlp.dll,kernel32.dll,mpr.dll,mscoree.dll,msi.dll,msvcrt.dll,mswsock.dll,ndis.sys,netapi32.dll,ntdll.dll,ntoskrnl.exe,ole32.dll,oleaut32.dll,oledlg.dll,opengl32.dll,psapi.dll,rpcrt4.dll,setupapi.dll,shell32.dll,shlwapi.dll,tapi32.dll,ucc12.dll,user32,user32.dll,wininet.dll,winmm.dll,winspool.drv,ws2_32.dll"
possible_list = possible.split(",")
does_match = []
for t in test.split(","):
    if t in possible_list:
        does_match.append("1")
    else:
        does_match.append("0")
print(",".join(does_match))
This is just the basic idea of how you would approach it. What I leave to you as an exercise is reading the data from the files and writing the result back, but you will find plenty of answers for that on Stack Overflow.
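The file handling left as an exercise could look roughly like this. It is only a sketch: the function name `write_match_flags` and its three path parameters are hypothetical, and it assumes each input file holds a single comma-separated row, as in the question.

```python
import csv


def write_match_flags(source_path, subset_path, output_path):
    """Write the source row followed by a row of 1/0 flags.

    A flag is "1" when the source value also appears in the second file.
    """
    with open(source_path, newline="") as handle:
        source = next(csv.reader(handle))
    with open(subset_path, newline="") as handle:
        # A set makes each membership test cheap.
        subset = set(next(csv.reader(handle)))
    flags = ["1" if name in subset else "0" for name in source]
    with open(output_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(source)
        writer.writerow(flags)
    return flags
```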

Reading extracted XLSX files with OpenPyXL

So I've been using Python 3.2, and OpenPyXL's iterable workbook as demonstrated here in the "Optimized Reader" example.
My problem arises when I try to use this strategy to read a file or files that I've extracted from a simple .zip archive (both manually and through the python zipfile package). When I call .get_highest_column() I get "A" and .get_highest_row() I get 1, and when asked to print each cell's value as shown here:
wb = load_workbook(filename = file_name, use_iterators = True)
ws = wb.worksheets[0] # Only need to read the first sheet, nothing fancy
for row in ws.iter_rows():
    for entry in row:
        print(entry.internal_value)
It prints the values in A1, A2, A3, A4, A5, A6, and A7, regardless of how large the file actually is. There isn't any reason for this in the file itself, and it will open in Excel perfectly fine. I'm quite stumped as to why it does it like this, but I assume that the unzipped XLSX is formatted differently prior to being saved from within Excel, and OpenPyXL cannot interpret it correctly. I even renamed the '.xlsx' to '.zip' so that I could explore the file and examine the differences, but couldn't tell much except that the one saved from Excel also has a subfolder called "theme" within the "xl" folder that the previous version does not, with font and formatting data.
IMPORTANT NOTE: When I open it and re-save it with the same filename from within Excel and then run this bit of code, it works perfectly - returns correct greatest row and column values, and correctly prints every cell value. I've tried instead saving the workbook through OpenPyXL immediately after opening it, but this yields the same erroneous results.
Basically, I need to discover a method to properly extract a .xlsx file from a .zip file so that it can be read with OpenPyXL. There are many many files that need to be processed like this, so it must be external to Excel, and hopefully as efficient as possible.
Cheers!

It sounds like this has nothing to do with the extraction from the zipfile, as the problem also occurs if you manually extract the files.
I would try to store the files opened and saved with Excel in a zipfile and see what happens. If that works, then clearly the way the original .xlsx files were generated is the problem.
I strongly suspect that to be the case.
If that is the problem, see if you can extract the .xlsx files (they are zipfiles themselves) and compare the one you re-saved with Excel to the original problematic one. The XML does not compare easily, as Excel can rearrange most things at will, but you might be able to do a diff.
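One way to start that comparison, sketched with the standard `zipfile` module: list which archive members exist in only one of the two files and which shared members differ byte for byte. The function name `compare_xlsx_members` and its parameter names are assumptions for illustration.

```python
import zipfile


def compare_xlsx_members(original_path, resaved_path):
    """Return (only_in_original, only_in_resaved, differing) member names."""
    with zipfile.ZipFile(original_path) as a, zipfile.ZipFile(resaved_path) as b:
        names_a, names_b = set(a.namelist()), set(b.namelist())
        # Shared members whose raw bytes are not identical.
        differing = sorted(
            name for name in names_a & names_b if a.read(name) != b.read(name)
        )
    return sorted(names_a - names_b), sorted(names_b - names_a), differing
```

The members reported as differing are the ones worth extracting and diffing by hand; a byte-level mismatch alone does not say whether the difference matters.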

Python - Moving entire text between two .doc files

I have been stuck on this for a while and cannot figure out how to start doing it with Python. My OS is Windows XP Pro. I need a script that moves the entire text (100% of it) from one .doc file to another. But it's not as easy as it sounds. There may be many target .doc files, not just one. All target .doc files are always in the same folder (same path), but they don't all have the same name. There is only one source .doc file, the one I want to move the text FROM; it is always in the same folder (same path) and always has the same file name.
The target names are similar but, as I said, not the same. Here is the point of the whole script:
Target .doc files have the names:
HD1.doc HD2.doc HD3.doc HD4.doc
and so on
What I would like is to move the entire text (really all of it, 100%) into the .doc file with the highest ( ! ) number. The target .doc files will always start with ''HD'' and always look like the examples above.
It is possible that there is only one target file, HD1.doc. In that case ''1'' is the maximum number and the text is moved into this file.
Sometimes the target file is empty, but usually it won't be. If it isn't empty, the text should be appended at the end of the existing text, on the first new line (no empty lines in between).
For example, suppose the target file with the highest number in its name contains the following text:
a
b
c
The file I want to move the text from contains:
d
Then the target file should end up containing:
a
b
c
d
But no empty lines anywhere.
I have found this (it shows three different pieces of code):
http://paste.pocoo.org/show/169309/
But none of them makes any sense to me. I know I would need to begin by finding the correct target file (the HDX file where X is the highest number; again, all HD files are and will be in the same folder), but I have no idea how to do this.
I meant Microsoft Office Word .doc files. They contain "pure text", by which I mean I can also view them in Notepad (as .txt). But I need to work with the .doc extension. I want Python because I need this as an automated system, so that I don't even need to open any file. Why exactly Python and not another programming language? Because I recently started learning Python and need this script for my work. Python is the "only" programming language I'm interested in, which is why I would like to write this script in it.
By "really 100%" I meant that the entire text (everything in the source file, every single line, whether there are 2 or several thousand) should be moved to the correct target file (which one is correct is described in my first post). I cannot move the whole file, because the text must end up in the correct .doc file with the correct name, together with (that is, inside the same file as) any text that already exists there. The source file will always be the same, but its content will always be different (different words in the lines). The correct target file may also be empty.
If someone could suggest anything, I would really appreciate it.
Thank you, best wishes.
I have tried asking on the OpenOffice forum, but they don't answer. From what I have seen, the code could be something like this:
from time import sleep
import win32com.client
from win32com.client import Dispatch
wordApp = win32com.client.Dispatch('Word.Application')
wordApp.Visible=False
wordApp.Documents.Open('C:\\test.doc')
sleep(5)
HD1 = wordApp.Documents.Open('C:\\test.doc') #HD1 word document as object.
HD1.Content.Copy() #Copies the entire document content to the clipboard.
But I have no idea what that means. Also, I cannot hard-code the .doc file like that, because I never know the correct filename in advance (HDX.doc where X is the highest integer; all HD files are in the same directory), so the script should find the correct file itself. Also ''filename'' = wordApp.Documents.open... would surely give me a syntax error. :-(

Openoffice ships with full python scripting support, have a look: http://wiki.services.openoffice.org/wiki/Python
Might be easier than trying to mess around with MS Word and COM apis.

So you want to take the text from a doc file and append it to the end of the text in another doc file. The problem here is that these are MS Word files. It's a proprietary format, and as far as I know there is no module to access them from Python.
If you are on Windows, you can access them via the COM API, but that's pretty complicated. Still, look into it. Otherwise I recommend that you not use MS Word files. The above sounds like some sort of logging facility, and it sounds like a bad idea to use Word files for this; it's too fragile.
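Independent of how the Word files themselves are opened, the step of picking the target file can be done in plain Python. A minimal sketch, assuming the targets are named exactly `HD<number>.doc` as in the question; the function name `find_highest_hd_file` is made up for illustration.

```python
import os
import re


def find_highest_hd_file(folder):
    """Return the path of the HD<number>.doc file with the largest number,
    or None if the folder contains no such file."""
    pattern = re.compile(r"^HD(\d+)\.doc$", re.IGNORECASE)
    best_number, best_name = -1, None
    for name in os.listdir(folder):
        match = pattern.match(name)
        # Compare as integers so that HD10.doc ranks above HD9.doc.
        if match and int(match.group(1)) > best_number:
            best_number, best_name = int(match.group(1)), name
    return os.path.join(folder, best_name) if best_name else None
```

The returned path can then be passed to whatever opens the document, for example the `Documents.Open` call shown in the question.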

Get the inputs from Excel and use those inputs in python script

How can I get inputs from Excel and use them in a Python script?

Take a look at xlrd
This is the best reference I found for learning how to use it: http://www.dev-explorer.com/articles/excel-spreadsheets-and-python

Not sure if this is exactly what you're talking about, but:
If you have a very simple Excel file (i.e. basically just one table filled with string values, nothing fancy), and all you want to do is basic processing, then I'd suggest just converting it to a csv (comma-separated values file). This can be done by "saving as..." in Excel and selecting csv.
This is just a file with the same data as the Excel file, except represented by lines of values separated by commas:
cell A:1, cell A:2, cell A:3
cell B:1, cell B:2, cell B:3
This is then very easy to parse using standard Python functions (i.e., readlines to get each line of the file, then it's just a list that you can split on ",").
This is of course only helpful in some situations, like when you get a log from a program and want to quickly run a Python script which handles it.
Note: As was pointed out in the comments, splitting the string on "," is actually not very good, since you run into all sorts of problems. Better to use the csv module (which another answer here teaches how to use).
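To illustrate the note above, here is a small sketch of one such problem: a quoted field that itself contains a comma. The sample line is invented for the example.

```python
import csv
import io

# Invented sample: the second field contains a comma inside quotes.
line = 'Smith,"Portland, OR",42\n'

# Naive splitting cuts the quoted field in two and keeps the quote marks.
naive = line.strip().split(",")

# csv.reader honours the quotes and keeps the field whole.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)
print(parsed)
```

The naive split yields four pieces, while the csv module returns the three fields that were actually written.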

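import win32com.client  # part of the pywin32 package (Windows only)
excel = win32com.client.Dispatch("Excel.Application")
excel.Workbooks.Open(file_path)  # file_path: full path to the workbook
cells = excel.ActiveWorkbook.ActiveSheet.Cells
cells(row, column).Value = input_value  # write a value to a cell
output = cells(row, column).Value  # read a value from a cell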

If you can save as a csv file with headers:
Attrib1, Attrib2, Attrib3
value1.1, value1.2, value1.3
value2,1,...
Then I would highly recommend looking at the built-in csv module.
With that you can do things like:
import csv
with open("csvFile.csv", "r") as f:
    # skipinitialspace drops the blank after each comma in the header row
    for row in csv.DictReader(f, skipinitialspace=True):
        print(row['Attrib1'], row['Attrib2'])
