I have some files that I need to send to some companies, and I'm trying to send them one by one, but for some reason the script is attaching all the files to every email. How can I fix that?
import os
import pandas as pd
import win32com.client as win32

tabela = pd.read_excel('SepEmpresa.xlsx')
olApp = win32.Dispatch('Outlook.Application')
olNS = olApp.GetNameSpace('MAPI')
caminho = 'C:\ArquivosPDF\OK'

for i, Vencidos in enumerate(table['Vencidos']):
    if Vencidos == 'OK':
        email = tabela.loc[i, 'Email']
        mailItem = olApp.CreateItem(0)
        mailItem.Display()
        mailItem.Subject = 'Expired'
        mailItem.Body = 'anything'
        mailItem.To = email
        path = "C:/ArquivosPDF/OK"
        dirs = os.listdir(path)
        for files in dirs:
            mailItem.Attachments.Add(os.path.join(os.getcwd(), caminho, files))
Assuming that your list of companies is contained in table['Vencidos'], then for each company in that table, you are looping over all files in the path specified (path = "C:/ArquivosPDF/OK").
In other words, this:
for files in dirs:
    mailItem.Attachments.Add(os.path.join(os.getcwd(), caminho, files))
happens for all companies where table['Vencidos']=="OK". It sounds like that's not your desired behaviour.
If there is a single document in path that you want to attach for a particular company, then you'll have to find a way to associate that file with that company.
In pseudo-code:
for i, Vencidos in enumerate(table['Vencidos']):
    # set up email/mailItem objects
    # get the specific file attachment for that company
    file_to_attach = os.path.join(caminho, "file_for_company_A.pdf")
    mailItem.Attachments.Add(file_to_attach)
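For example, here is a minimal sketch assuming the spreadsheet gains a column (hypothetically named 'Arquivo') holding each company's file name; the Outlook calls are the same ones already used in the question:

import os
import pandas as pd
import win32com.client as win32

tabela = pd.read_excel('SepEmpresa.xlsx')
olApp = win32.Dispatch('Outlook.Application')
caminho = r'C:\ArquivosPDF\OK'

for i, vencidos in enumerate(tabela['Vencidos']):
    if vencidos == 'OK':
        mailItem = olApp.CreateItem(0)
        mailItem.Subject = 'Expired'
        mailItem.Body = 'anything'
        mailItem.To = tabela.loc[i, 'Email']
        # Attach only the one file listed for this row ('Arquivo' is a hypothetical column)
        mailItem.Attachments.Add(os.path.join(caminho, tabela.loc[i, 'Arquivo']))
        mailItem.Display()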
I want to extract some information from multiple PDF files (e.g. xxx1.pdf, xxx2.pdf, xxx3.pdf...). The output dataframe should have 4 fields: 1. the file name, 2. the text content of each PDF, 3. a specific keyword, and 4. the person's name related to that keyword.
I have written code that generates the first three fields. The person's name is harder; I assume it calls for NLP tools such as spaCy or NLTK, which I am very new to. What I want is: if the keyword ('Chair Man' in my case) is found in a PDF, search for the person's name closest to the keyword (before or after it) and put that name in the dataframe. I sense the logic is to compute the shortest distance between a person's name and the keyword, but I could not make it work. Here is my code so far:
import pandas as pd
import numpy as np
from tkinter import filedialog
from pathlib import Path
import os
import PyPDF2
file_path = filedialog.askdirectory()
file_list = Path(file_path + '/').rglob('*.pdf')
file_names = os.listdir(file_path)
file_names = [file for file in file_names if file.endswith('.pdf')]
print(len(file_names))
file_df = []
context = []
for file in file_list:
    print(file.stem)
    file_name = file.stem
    pdf = PyPDF2.PdfFileReader(file)
    page_num = pdf.getNumPages()
    text = ''
    for i in range(0, page_num):
        PgOb = pdf.getPage(i)
        text += PgOb.extractText()
    # print(text)
    file_df.append(file_name)
    context.append(text)
pdf_df = pd.DataFrame({'File': file_df, 'Context': context})
pdf_df['Chair_Man_Check'] = np.nan
pdf_df.loc[pdf_df['Context'].str.contains('Chair Man'), 'Chair_Man_Check'] = 'Chair Man - Yes'
print(pdf_df)
The ideal output would look like this:
File_Name   Context                      Chair_Man_Check    Name
xxx1.pdf    file context blah blah...    Chair Man - Yes    James Volks
xxx2.pdf    blah.....                    NaN                NaN
xxx3.pdf    blahhhhh                     Chair Man - Yes    Tim Aloepman
Could you provide some help please? Many thanks!
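Not an exact answer, but one way to sketch the "closest name" idea is with spaCy's named-entity recognizer (this assumes the en_core_web_sm model has been downloaded; closest_person is a hypothetical helper, not part of the question's code):

import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def closest_person(text, keyword="Chair Man"):
    """Return the PERSON entity nearest to the keyword, or None if either is missing."""
    pos = text.find(keyword)
    if pos == -1:
        return None
    doc = nlp(text)
    persons = [ent for ent in doc.ents if ent.label_ == "PERSON"]
    if not persons:
        return None
    # Pick the entity whose character offset is closest to the keyword's position
    return min(persons, key=lambda ent: abs(ent.start_char - pos)).text

Applied to the dataframe above, something like pdf_df['Name'] = pdf_df['Context'].apply(closest_person) would fill the Name column.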
I have a directory with a large number of files that I want to move into folders based on part of the file name. My list of files looks like this:
001-020-012B-B.nc
001-022-151-A.nc
001-023-022-PY-T1.nc.nc
001-096-016B-A.nc
I want to move the files into separate folders based on the first part of the file name (001-096-016B, 001-023-022, 001-022-151). The first part of the file name always has the same number of digits and always consists of 3 parts separated by a hyphen '-'.
The folders are named like this: \oe-xxxx\xxxx\xxxx\001-Disc-PED\020-Rotor-parts-1200.
So, for example, this file should be placed in the above folder, based on the numbers in the folder names:
001-020-012B-B.nc
The folder path, with the matching number parts highlighted, shows where the above file has to be moved to:
(001)-Disc-PED\(020)-Rotor-parts-1200.
Therefore:
(001)-Disc-PED\(020)-Rotor-parts-1200 receives (001)-(020)-012B-B.nc
This is what I have tried from looking online, but it does not work. My thinking is to loop through the folders and look for matches:
import os
import glob
import itertools
import re

#Source file
sourcefile = r'C:\Users\cah\Desktop\000Turning'
destinationPath = r'C:\Users\cah\Desktop\08-CAM'

#Seperation
dirs = glob.glob('*-*')
#Every file with file extension .nc
files = glob.glob('*.nc')

for root, dirs, files in os.walk(sourcefile):
    for file in files:
        if file.endswith(".nc"):
            first3Char = str(file[0:3])
            last3Char = str(file[4:7])
            for root in os.walk(destinationPath):
                first33CharsOfRoot = str(root[0:33])
                cleanRoot1 = str(root).replace("[", "")
                cleanRoot2 = str(cleanRoot1).replace("]", "")
                cleanRoot3 = str(cleanRoot2).replace(")", "")
                cleanRoot4 = str(cleanRoot3).replace("'", "")
                cleanRoot5 = str(cleanRoot4).replace(",", "")
                firstCharOfRoot = re.findall(r'(.{3})\s*$', str(cleanRoot5))
                print(firstCharOfRoot == first3Char)
                if(firstCharOfRoot == first3Char):
                    print("Hello")

for root in os.walk(destinationPath):
    print(os.path.basename(root))
    # if(os.path)
I realized that I should not look for the last 3 characters of the path; it is the first numbers (001, etc.) at the beginning that I need to look for in order to find the first folder I need to go into.
EDIT:
import os
import glob
import itertools
import re

#Source file
sourcefile = r'C:\Users\cah\Desktop\000Turning'
destinationPath = r'C:\Users\cah\Desktop\08-CAM'

#Seperation
dirs = glob.glob('*-*')
#Every file with file extension .nc
files = glob.glob('*.nc')

for root, dirs, files in os.walk(sourcefile):
    for file in files:
        if file.endswith(".nc"):
            first3Char = str(file[0:3])
            last3Char = str(file[4:7])
            for root in os.walk(destinationPath):
                cleanRoot1 = str(root).replace("[", "")
                cleanRoot2 = str(cleanRoot1).replace("]", "")
                cleanRoot3 = str(cleanRoot2).replace(")", "")
                cleanRoot4 = str(cleanRoot3).replace("'", "")
                cleanRoot5 = str(cleanRoot4).replace(",", "")
                firstCharOfRoot = re.findall(r'^(?:[^\\]+\\\\){5}(\d+).*$', str(cleanRoot5))
                secondCharOfRoot = re.findall(r'^(?:[^\\]+\\\\){6}(\d+).*$', str(cleanRoot5))
                firstCharOfRootCleaned = ''.join(firstCharOfRoot)
                secondCharOfRoot = ''.join(secondCharOfRoot)
                cleanRoot6 = str(cleanRoot5).replace("(", "")
                if(firstCharOfRootCleaned == str(first3Char) & secondCharOfRoot == str(last3Char)):
                    print("BINGOf")
                    # for root1 in os.walk(cleanRoot6):
Solution
There is an improved solution in the next section, but let's walk through the straightforward solution first.
First, get the complete list of subfolders.
all_folders_splitted = [os.path.split(f)
                        for f in glob.iglob(os.path.join(destinationPath, "**"), recursive=True)
                        if os.path.isdir(f)]
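Each entry is a (head, tail) pair produced by os.path.split, so with the folder layout from the question an element would look roughly like this (illustrative values only):

('C:\\Users\\cah\\Desktop\\08-CAM\\001-Disc-PED', '020-Rotor-parts-1200')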
Then, use a function on each of your files to find its matching folder, or to build a new folder path if none exists. This function, called find_folder(), is included in the full script below:
import os
import glob
import shutil

source_path = r'C:\Users\cah\Desktop\000Turning'
destinationPath = r'C:\Users\cah\Desktop\08-CAM'

all_folders_splitted = [os.path.split(f)
                        for f in glob.iglob(os.path.join(destinationPath, "**"), recursive=True)
                        if os.path.isdir(f)]

# It will create and return a new directory if no directory matches
def find_folder(part1, part2):
    matching_folders1 = [folder for folder in all_folders_splitted
                         if os.path.split(folder[0])[-1].startswith(part1)]
    matching_folder2 = None
    for matching_folder2 in matching_folders1:
        if matching_folder2[-1].startswith(part2):
            return os.path.join(*matching_folder2)
    # Whole new folder tree
    if matching_folder2 is None:
        dest = os.path.join(destinationPath, part1, part2)
        os.makedirs(dest)
        return dest
    # Inside the already existing folder part "1"
    dest = os.path.join(matching_folder2[0], part2)
    os.makedirs(dest)
    return dest

# All the files you want to move
files_gen = glob.iglob(os.path.join(source_path, "**", "*-*-*.nc"), recursive=True)
for file in files_gen:
    # Split on the first two "-"
    basename = os.path.basename(file)
    splitted = basename.split("-", 2)
    # Format the destination folder.
    # Creates it if necessary
    destination_folder = find_folder(splitted[0], splitted[1])
    # Copying the file
    shutil.copy2(file, os.path.join(destination_folder, basename))
Improved solution
If you have a large number of files, it could be costly to "split and match" every folder on each iteration.
We can cache the folder found for a given pattern in a dictionary: the dictionary is updated when a new pattern appears, and otherwise the previously found folder is returned.
import os
import glob
import shutil

source_path = r'C:\Users\cah\Desktop\000Turning'
destinationPath = r'C:\Users\cah\Desktop\08-CAM'

# Global dictionary to store folder paths, relative to a pattern
pattern_match = dict()

all_folders_splitted = [os.path.split(f)
                        for f in glob.iglob(os.path.join(destinationPath, "**"), recursive=True)
                        if os.path.isdir(f)]

def find_folder(part1, part2):
    current_key = (part1, part2)
    if current_key in pattern_match:
        # Already found previously.
        # We just return the folder path, stored as the value.
        return pattern_match[current_key]
    matching_folders1 = [folder for folder in all_folders_splitted
                         if os.path.split(folder[0])[-1].startswith(part1)]
    matching_folder2 = None
    for matching_folder2 in matching_folders1:
        if matching_folder2[-1].startswith(part2):
            dest = os.path.join(*matching_folder2)
            # Update the dictionary
            pattern_match[current_key] = dest
            return dest
    if matching_folder2 is None:
        dest = os.path.join(destinationPath, part1, part2)
    else:
        dest = os.path.join(matching_folder2[0], part2)
    # Update the dictionary
    pattern_match[current_key] = dest
    os.makedirs(dest, exist_ok=True)
    return dest

# All the files you want to move
files_gen = glob.iglob(os.path.join(source_path, "**", "*-*-*.nc"), recursive=True)
for file in files_gen:
    # Split on the first two "-"
    basename = os.path.basename(file)
    splitted = basename.split("-", 2)
    # Format the destination folder.
    # Creates it if necessary
    destination_folder = find_folder(splitted[0], splitted[1])
    # Copying the file
    shutil.copy2(file, os.path.join(destination_folder, basename))
This cached version is more efficient (especially when many files share the same folder), and you could also make use of the dictionary later if you save it.
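For instance, a small sketch of saving the cache, assuming JSON output is acceptable (the file name pattern_match.json is arbitrary):

import json

# Keys are (part1, part2) tuples; join them so JSON can store them as strings
with open("pattern_match.json", "w") as fp:
    json.dump({"-".join(key): folder for key, folder in pattern_match.items()}, fp, indent=2)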
I'm new to Python. I am trying to create a script that will add imported files to a new directory with a timestamp, as a daily backup. How do I point to the new directory when its name changes every day? Here is my script:
gis = GIS("https://arcgis.com", "xxx", "xxx")
items = gis.content.search(query="type:Feature Service, owner:xxx", max_items=5000,)

import datetime
import shutil
import os

now = datetime.datetime.today()
nTime = now.strftime("%d-%m-%Y")

source = r"C:\Users\Bruger\xxx\xxx\xxx\Backup\Backup"
dest = os.path.join(source + nTime)

if not os.path.exists(dest):
    os.makedirs(dest)  # create dest dir

source_files = os.listdir(source)
for f in source_files:
    source_file = os.path.join(source, f)
    if os.path.isfile(source_file):  # check if source file is a file, not a dir
        shutil.move(source_file, dest)  # move only files (not dirs) to dest dir

for item in items:
    service_title = item.title
    if service_title == "Bøjninger_09_06":
        try:
            service_title = item.title
            version = "1"
            fgdb_title = service_title + version
            result = item.export(fgdb_title, "File Geodatabase")
            result.download(r"C:\Users\Bruger\xxx\xxx\xxx\Backup\?????")  # what shall I write here in order to point to the new folder?
            result.delete()
        except:
            print("An error occurred downloading" + " " + service_title)
You could try an f-string instead of a regular string. Note that dest already holds the full dated path (source + nTime), so you can pass it directly, or build the same path inline like this:
result.download(rf"C:\Users\Bruger\xxx\xxx\xxx\Backup\Backup{nTime}")
You can use something like this:
result.download(r"C:\Users\Bruger\xxx\xxx\xxx\Backup\Backup{}".format(nTime))
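Putting the pieces together, here is a minimal sketch of the date-stamped folder idea (same hypothetical paths as the question; result stands for the exported item from the loop above):

import datetime
import os

backup_root = r"C:\Users\Bruger\xxx\xxx\xxx\Backup"
today = datetime.datetime.today().strftime("%d-%m-%Y")
dest = os.path.join(backup_root, "Backup" + today)  # e.g. ...\Backup\Backup20-06-2021
os.makedirs(dest, exist_ok=True)

# inside the export loop, point the download at the dated folder
result.download(dest)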
I am struggling to import an enormous amount of data from XML files into Access.
The problem I am facing is that the files I want to import contain an id attribute in their first row:
<vin id="11111111111111111">
<description>Mazda3 L 2.0l MZR 150 PS 4T 5AG AL-EDITION TRA-P</description>
<type>BL</type>
<typeapproval>e11*2001/116*0262*07</typeapproval>
<variant>B2F</variant>
<version>7EU</version>
<series>Mazda3</series>
<body>L</body>
<engine>2.0l MZR 150 PS</engine>
<grade>AL-EDITION</grade>
<transmission>5AG</transmission>
<colourtype>Mica</colourtype>
<extcolourcode>34K</extcolourcode>
<extcolourcodedescription>Crystal White Pearl</extcolourcodedescription>
<intcolourcode>BU4</intcolourcode>
<intcolourcodedescription>Black</intcolourcodedescription>
<registrationdate>2012-07-20</registrationdate>
<productiondate>2011-11-30</productiondate>
</vin>
So the result of my import is all of the fields except the vehicle's VIN number, which is actually stored in that id attribute.
I was trying to manually replace the id="..."> part of the opening tag to get rid of that id, but I have dozens of files with hundreds of thousands of records in each, so it is quite a pain...
So I thought about concatenating all the files together with a script in Python:
import os
import csv
import pandas as pd
import numpy as np

ver = '2011'
dirName = r'C:\Users\dawid\Desktop\DE_DATA\Mazda_DE\VINs_DE\Mazda\xml'.format(ver)
out_file = r'C:\Users\dawid\Desktop\DE_DATA\Mazda_DE\VINs_DE\Mazda\Output.xml'.format(ver)

def getListOfFiles(dirName):
    # create a list of files and subdirectories
    # in the given directory
    listOfFile = os.listdir(dirName)
    allFiles = list()
    # Iterate over all the entries
    for entry in listOfFile:
        # Create full path
        fullPath = os.path.join(dirName, entry)
        # If entry is a directory then get the list of files in this directory
        if os.path.isdir(fullPath):
            allFiles = allFiles + getListOfFiles(fullPath)
        else:
            allFiles.append(fullPath)
    return allFiles

listOfFileOut = getListOfFiles(dirName)
#filenames = allFiles

with open(out_file, 'w', encoding='ANSI') as outfile:
    for fname in listOfFileOut:
        with open(fname, encoding='ANSI') as infile:
            for line in infile:
                outfile.write(line)
print("Done")
But this completely destroyed the structure of the XML file and I cannot import it anymore.
Could anyone suggest whether it's possible to use Python to get rid of all those ids, so I can import the whole database into Access?
Thank you in advance.
Try this.
from simplified_scrapy import utils, SimplifiedDoc, req
dirName = r'C:\Users\dawid\Desktop\DE_DATA\Mazda_DE\VINs_DE\Mazda\xml'
listFile = utils.getSubFile(dirName, end='.xml')
for f in listFile:
    doc = SimplifiedDoc(utils.getFileContent(f, encoding='ANSI'))
    doc.replaceReg('<vin[^>]*>', '<vin>')
    print(doc.html)
    # utils.saveFile(f, doc.html, encoding='ANSI')  # write back to the original file
Result:
<vin>
<description>Mazda3 L 2.0l MZR 150 PS 4T 5AG AL-EDITION TRA-P</description>
<type>BL</type>
<typeapproval>e11*2001/116*0262*07</typeapproval>
<variant>B2F</variant>
<version>7EU</version>
...
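If installing simplified_scrapy is not an option, the same rewrite can be sketched with the standard library alone (assuming the folder layout from the question; 'mbcs' is used here for the Windows ANSI code page, so adjust it if the files use another encoding):

import os
import re

dirName = r'C:\Users\dawid\Desktop\DE_DATA\Mazda_DE\VINs_DE\Mazda\xml'

for root, _dirs, files in os.walk(dirName):
    for name in files:
        if not name.lower().endswith('.xml'):
            continue
        path = os.path.join(root, name)
        with open(path, encoding='mbcs') as fh:
            content = fh.read()
        # Drop the id attribute so <vin id="..."> becomes <vin>
        content = re.sub(r'<vin[^>]*>', '<vin>', content)
        with open(path, 'w', encoding='mbcs') as fh:
            fh.write(content)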
I have a folder with some 1500 Excel files. The file names look something like this:
0d20170101abcd.xlsx
1d20170101ef.xlsx
0d20170104g.xlsx
0d20170109hijkl.xlsx
1d20170109mno.xlsx
0d20170110pqr.xlsx
The first character of the file name is either '0' or '1', followed by 'd', followed by the date the file was created, followed by the customer id (abcd, ef, g, hijkl, mno, pqr). The customer id has no fixed length and can vary.
I want to create a folder for each unique date (the folder name should be the date) and move the files with the same date into that folder.
So for the above example, 4 folders (20170101, 20170104, 20170109, 20170110) have to be created, with files of the same date moved into their respective folders.
Is there any way to do this in Python? Sorry for not posting any sample code, but I have no idea how to start.
Try this out:
import os
import re

root_path = 'test'

def main():
    # Keep track of directories already created
    created_dirs = []
    # Go through all stuff in the directory
    file_names = os.listdir(root_path)
    for file_name in file_names:
        process_file(file_name, created_dirs)

def process_file(file_name, created_dirs):
    file_path = os.path.join(root_path, file_name)
    # Check if it's not itself a directory - safe guard
    if os.path.isfile(file_path):
        file_date, user_id, file_ext = get_file_info(file_name)
        # Check we could parse the infos of the file
        if file_date is not None \
                and user_id is not None \
                and file_ext is not None:
            # Make sure we haven't already created the directory
            if file_date not in created_dirs:
                create_dir(file_date)
                created_dirs.append(file_date)
            # Move the file and rename it
            os.rename(
                file_path,
                os.path.join(root_path, file_date, '{}.{}'.format(user_id, file_ext)))
            print(file_date, user_id)

def create_dir(dir_name):
    dir_path = os.path.join(root_path, dir_name)
    if not os.path.exists(dir_path) or not os.path.isdir(dir_path):
        os.mkdir(dir_path)

def get_file_info(file_name):
    match = re.search(r'[01]d(\d{8})([\w+-]+)\.(\w+)', file_name)
    if match:
        return match.group(1), match.group(2), match.group(3)
    return None, None, None

if __name__ == '__main__':
    main()
Note that depending on the names of your files, you might need to adjust the regex I use, i.e. [01]d(\d{8})([\w+-]+)\.(\w+), in the future (you can experiment with it on a regex tester to see how it reads)...
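For instance, applied to two of the sample names from the question, get_file_info splits them like this (an illustrative check, not part of the original answer):

get_file_info('0d20170101abcd.xlsx')   # -> ('20170101', 'abcd', 'xlsx')
get_file_info('1d20170109mno.xlsx')    # -> ('20170109', 'mno', 'xlsx')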
Check this code.
import os

# os.listdir() returns plain names, so use os.path.isfile() to keep only files
files = [x for x in os.listdir('.') if os.path.isfile(x)]
for i in files:
    d = i[2:10]  # get the date from the filename
    if not os.path.isdir(d):
        os.mkdir(d)  # create the date folder if it does not exist yet
    os.rename(i, os.path.join(d, i))  # move the file into its date folder
Try this out:
import os, shutil

filepath = "your_file_path"
files = [x for x in os.listdir(filepath) if x.endswith(".xlsx")]
dates = set(x[2:10] for x in files)  # unique dates, e.g. 20170101

# Create one folder per unique date
for j in dates:
    os.makedirs(os.path.join(filepath, j), exist_ok=True)

# Move each file into the folder matching its date
for i in files:
    j = i[2:10]
    shutil.move(os.path.join(filepath, i), os.path.join(filepath, j, i))