Python automatic script to convert EML file in Hebrew to PDF - python

Does anyone have suggestions on how to automate this using Thunderbird? I have an EML file that contains Hebrew, and every conversion tool I have tried renders the Hebrew text incorrectly (it reorders the sentence to read left to right), except for Thunderbird. I have been looking at Python modules such as autokey and dogtail, but could use some advice on how to tackle this problem. If I open the EML file in Thunderbird and go to Print, I can save the file as a PDF. I would like to automate this process.
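One possible direction that avoids GUI automation, offered as a sketch rather than a tested solution: parse the EML with Python's standard email package and wrap the body in an element marked dir="rtl", so that whatever HTML-to-PDF renderer is used afterwards (not shown here, and not verified to handle Hebrew correctly) gets an explicit right-to-left hint. The function name eml_to_rtl_html is hypothetical.

```python
import html
from email import message_from_bytes, policy


def eml_to_rtl_html(raw: bytes) -> str:
    """Return an HTML page whose body is the message text marked right-to-left."""
    # policy.default yields an EmailMessage, which provides get_body().
    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("html", "plain"))
    if body is None:
        raise ValueError("no text/html or text/plain part found")

    content = body.get_content()
    if body.get_content_type() == "text/plain":
        # Escape plain text and keep its line breaks.
        content = "<pre>" + html.escape(content) + "</pre>"

    # Assumption: a dir="rtl" wrapper is enough of a hint for the renderer.
    return (
        '<!DOCTYPE html><html><head><meta charset="utf-8"></head>'
        f'<body><div dir="rtl">{content}</div></body></html>'
    )
```

The resulting string would be saved as UTF-8 and passed to a PDF renderer of your choice; if the message body is already a full HTML document, the wrapper nests it inside the div, which most renderers tolerate but which is not strictly valid HTML.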

Related

Reading .nl file through browser in python

I am trying to read the characters in the screenshot below in Python. It has to be done through the browser, as this is the end of an automation script and I can't download the file. The barcode doesn't matter, just the text.
The file is of ".nl" type
Is there any way that I could possibly do this?

I need to extract text from PDF file and make a new .txt file to put in

I need help with a Python script that reads a PDF file, copies every word in it into a new .txt file (one word per line), then deletes the repeated words, counts them, and prints the count on the last line.
Install these libraries.
PyPDF2 (To convert simple, text-based PDF files into text readable by Python)
textract (To convert non-trivial, scanned PDF files into text readable by Python)
nltk (To clean and convert phrases into keywords)
Each of these libraries can be installed with the following command inside a terminal (on macOS):
pip install Libraryname
See this Tutorial https://medium.com/#rqaiserr/how-to-convert-pdfs-into-searchable-key-words-with-python-85aab86c544f
Use textract. It supports many file types, including PDF, so textract is the better choice.
Follow these links:
https://github.com/deanmalmgren/textract
https://textract.readthedocs.io/en/latest/
Did you search Stack Overflow for answers?
Here you can find some pretty good answers about how to extract text from a PDF file (see Jakobovski's answer):
How to extract text from a PDF file?
Here you can find information about writing/editing/creating .txt files:
https://www.guru99.com/reading-and-writing-files-in-python.html
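Once the text has been extracted from the PDF by one of the libraries above, the word-list part of the task can be sketched with the standard library alone. This is a minimal sketch under some assumptions: words are runs of letters, digits, or underscores, comparison is case-insensitive, and "the count" means the number of distinct words. The function names build_word_list and write_word_list are hypothetical.

```python
import re


def build_word_list(text: str) -> list[str]:
    """Return the unique words in first-seen order, followed by their count."""
    words = re.findall(r"\w+", text.lower())
    # dict.fromkeys drops repeats while preserving the original order.
    unique = list(dict.fromkeys(words))
    return unique + [str(len(unique))]


def write_word_list(text: str, out_path: str) -> None:
    """Write one word per line, with the count on the last line."""
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(build_word_list(text)) + "\n")
```

For example, write_word_list(extracted_text, "words.txt") would produce the file, where extracted_text is the string returned by the PDF extraction step.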

Unreadable characters from Python to csv file

I'm a linguistics student and I'm downloading tweets in Italian for my thesis. I've been reading previous answers to similar problems, but none of them worked for me. After downloading, my tweets are perfectly readable if I read them in the PyCharm terminal, but when I open the CSV file, no matter the program (LibreOffice on Ubuntu 18.04, Excel 2010, a text editor), characters like "é è à" are displayed as a Unicode string.
I tried every tutorial here and elsewhere without success. Any idea what I could do?
Thanks a lot
Two options you can try.
Use Sublime Text (free trial): Open your CSV file, then Save with encoding... and choose "UTF-8"
Import (rather than open) with Excel: Open blank sheet. Then Import, choose CSV File. In the following Assistant choose "UTF-8" as Source.
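A third option is to fix the encoding when the file is written. The sketch below assumes Python 3 and assumes the problem is the spreadsheet program guessing the wrong encoding; it writes the CSV as UTF-8 with a byte order mark (the "utf-8-sig" codec), which Excel generally uses to detect UTF-8. The function name write_tweets_csv and the column names are hypothetical.

```python
import csv


def write_tweets_csv(path, rows):
    """Write (id, text) rows as UTF-8 with a BOM so spreadsheets detect the encoding."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["id", "text"])
        writer.writerows(rows)
```

When reading the file back in Python, opening it with encoding="utf-8-sig" strips the mark again.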

Convert text file into pdf

I have a task to convert a simple text file into PDF format. I also need to add a header to the newly created PDF file.
The server that holds this text file and will convert it does not have Microsoft Office or any other conversion tools. Someone suggested using Python for the task, since the server has it installed.
Could you please help me get started with converting text to PDF using Python?
P.S. My system does not have pyPdf module and I failed to install it.
Thanks
Here is some update:
I run a program which generates a manifest at the end. The manifest is a simple text file which looks like a .csv file, but the columns are separated by white space. I ship this manifest to the client. My current task is to ship the client, in addition to this manifest, another file in PDF format with the same content plus a header containing the client name.
I am all set now.
I figured out that my server already has pdf installed and the only thing I had to do was to call it. Sorry for the confusion.
The ticket can be closed.

how to read ppt file using python?

I want to get the content (text only) of a ppt file. How do I do it?
(For a txt file, I just need to open and read it. What do I need to do to get the information from ppt files?)
By the way, I know there is win32com on Windows, but I am working on Linux now. Is there any possible way?
I found this discussion over on Superuser:
Command line tool in Linux to Extract Text From Word, Excel, Powerpoint?
There are several reasonable answers listed there, including using LibreOffice to do this (and for .doc, .docx, .pptx, etc, etc.), and the Apache Tika Project (which appears to be the 5,000lb gorilla in this solution space).
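If the file is in the newer .pptx format (a zip archive of XML files) rather than legacy binary .ppt, a rough standard-library sketch is also possible on Linux. It assumes slides are stored as ppt/slides/slideN.xml and that visible text lives in DrawingML a:t elements; the function name pptx_text is hypothetical, and legacy .ppt files would first need converting (for example with LibreOffice, as mentioned above).

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace used for text runs inside slides.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"


def pptx_text(path):
    """Return a list with one list of text runs per slide file."""
    slides = []
    with zipfile.ZipFile(path) as zf:
        names = [n for n in zf.namelist()
                 if re.fullmatch(r"ppt/slides/slide\d+\.xml", n)]
        # Sort by file number; this may differ from the on-screen slide order,
        # which is defined separately in ppt/presentation.xml.
        names.sort(key=lambda n: int(re.search(r"\d+", n.rsplit("/", 1)[1]).group()))
        for name in names:
            root = ET.fromstring(zf.read(name))
            slides.append([t.text for t in root.iter(A_NS + "t") if t.text])
    return slides
```

Calling pptx_text("deck.pptx") returns the text runs grouped by slide file; notes and embedded objects are not covered by this sketch.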
