I am trying to create a tool that assists me in filling out a specific web form.
For that I have a ".txt" file with the necessary information. Now I am trying to find a Python module that detects that I pasted the last content and loads the next content (from the txt file) into the clipboard. Is that possible with Python?
I believe you can use the pyperclip module to copy and paste text to and from the clipboard. You can take a look at http://pyperclip.readthedocs.io/en/latest/introduction.html for more info.
They have an example where they copy a string onto a clipboard and paste it.
>>> import pyperclip
>>> pyperclip.copy('Hello world!')
>>> pyperclip.paste()
'Hello world!'
But of course, I believe you can extract the string from your text file and use it.
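As far as I know, pyperclip cannot notice that another application has pasted the clipboard contents, so a simple workaround is to advance to the next entry when you confirm the paste yourself. Here is a minimal sketch of that idea. The function names, the one-entry-per-line layout of the text file, and the file name "form_data.txt" are my assumptions, not something from your question:

```python
from pathlib import Path


def load_entries(path):
    """Return the non-empty lines of the text file, assuming one form field per line."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def feed_clipboard(entries, copy, wait_for_user=input):
    """Copy each entry in turn, pausing until the user confirms the paste."""
    for number, entry in enumerate(entries, start=1):
        copy(entry)
        wait_for_user(
            f"[{number}/{len(entries)}] copied {entry!r}; paste it, then press Enter "
        )


# Usage (needs `pip install pyperclip`; "form_data.txt" is a placeholder name):
#   import pyperclip
#   feed_clipboard(load_entries("form_data.txt"), pyperclip.copy)
```

Passing the copy function in as an argument keeps the loop independent of pyperclip, so you could swap in another clipboard library later.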
I am new to coding python and have trouble when I print out from a file (only tried from .rtf) as it displays all the file properties. I've tried a variety of ways to code the same thing, but the output is always similar. Example of the code and the output:
opener=open("file.rtf","r")
print(opener.read())
opener.close()
The file only contains this:
Camila
Employee
Try it
But the outcome is always:
{\rtf1\ansi\ansicpg1252\cocoartf1671\cocoasubrtf600
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
{\*\expandedcolortbl;;}
\margl1440\margr1440\vieww10800\viewh8400\viewkind0
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0
\f0\fs24 \cf0 Camila\
\
Employees\
\
Try it}
Help? How to stop that from happening or what am I doing wrong?
The RTF file type contains more information than just the text, such as fonts.
Python reads the RTF file as plain text, and therefore includes this information.
If you want to get the plain text, you need a module that can translate it, like striprtf.
Make sure the module is installed by running this on the command line:
pip install striprtf
Then, to get your text:
from striprtf.striprtf import rtf_to_text
file = open("file.rtf", "r")
plaintext = rtf_to_text(file.read())
file.close()
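The same thing can be written with a `with` block, so the file is closed even if the conversion raises. This is only a sketch: the helper name `rtf_file_to_text` and its `convert` parameter are mine, and the default converter is the `rtf_to_text` function from striprtf shown above.

```python
def rtf_file_to_text(path, convert=None):
    """Read an RTF file and return its plain text.

    `convert` defaults to striprtf's rtf_to_text; it is a parameter only so
    that another converter can be swapped in (an assumption of this sketch).
    """
    if convert is None:
        # Imported lazily so the module is only needed when actually used.
        from striprtf.striprtf import rtf_to_text as convert
    with open(path, "r") as rtf_file:
        return convert(rtf_file.read())


# Usage (needs `pip install striprtf`):
#   print(rtf_file_to_text("file.rtf"))
```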
Use this package https://github.com/joshy/striprtf.
from striprtf.striprtf import rtf_to_text
rtf = "some rtf encoded string"
text = rtf_to_text(rtf)
print(text)
I have copied text from my software using pywinauto. Unfortunately, I don't know how to paste it into a text file. The following is the code that I wrote:
The last line of the code does not work, and I did not really expect it to, but it shows what I am trying to do. Can anyone help me solve this problem?
pywinauto.mouse.double_click(button='left', coords=(820,168))
pywinauto.keyboard.send_keys('^c')
f= open("trial.txt","w+")
f.write(pywinauto.keyboard.send_keys('^v'))
I see that you're trying to paste the contents of the clipboard, but there is no visual area to paste into.
f.write() accepts text, either from a variable or as a literal string. Invoking Ctrl + V is a GUI operation; it does not return the clipboard text, so it cannot supply the argument to f.write().
You can use the pyperclip module to access the clipboard contents.
import pyperclip
"""yourcode"""
f.write(pyperclip.paste())
f.close()
You can also programmatically copy something to the system clipboard using pyperclip.
pyperclip.copy("This is a text copied to clipboard from Python script!!")
You can now check the contents by invoking Ctrl + V in some GUI application like notepad.
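Putting the pieces together, a sketch of the whole flow might look like the following. The helper names are mine, the coordinates are the ones from the question, and the short sleep is an assumption (the target application may need a moment to put the text on the clipboard):

```python
def save_clipboard_text(path, paste):
    """Write whatever `paste()` returns to `path` and return that text."""
    text = paste()
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return text


def copy_selection_to_file(path, coords=(820, 168)):
    """Double-click at `coords`, press Ctrl+C, then save the clipboard to `path`."""
    # Third-party imports are kept local: pywinauto is Windows-only.
    import time
    import pyperclip
    import pywinauto.keyboard
    import pywinauto.mouse

    pywinauto.mouse.double_click(button='left', coords=coords)
    pywinauto.keyboard.send_keys('^c')
    time.sleep(0.2)  # assumption: give the application time to fill the clipboard
    return save_clipboard_text(path, pyperclip.paste)


# Usage:
#   copy_selection_to_file("trial.txt")
```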
You can try sending the hotkey with pyautogui:
pyautogui.hotkey('ctrl','v')
When you do Copy (CTRL+C) on a file, then in some programs (example: it works in the Windows Explorer address bar, also with Everything indexing software), when doing Paste (CTRL+V), the filename or directory name is pasted like text, like this: "d:\test\hello.txt".
I tried this:
CTRL+C on a file or folder in Windows Explorer
Run:
import win32clipboard
win32clipboard.OpenClipboard()
data = win32clipboard.GetClipboardData()
win32clipboard.CloseClipboard()
print data
But I get this error:
TypeError: Specified clipboard format is not available
Question: how to retrieve the filename of a file that has been "copied" (CTRL+C) in the Windows Explorer?
The clipboard may contain more than one format. For example, when formatted text is copied from MS Word, both the formatted text and the plain text are placed on the clipboard, and the application you paste into takes whichever format it supports.
From MSDN:
A window can place more than one clipboard object on the clipboard,
each representing the same information in a different clipboard
format. When placing information on the clipboard, the window should
provide data in as many formats as possible. To find out how many
formats are currently used on the clipboard, call the
CountClipboardFormats function.
Because of that, win32clipboard.GetClipboardData takes one argument: format, which is by default win32clipboard.CF_TEXT.
When you call it without arguments, it raises an error saying TypeError: Specified clipboard format is not available, because the TEXT format is not in the clipboard.
You can, instead, ask for win32clipboard.CF_HDROP format, which is "A tuple of Unicode filenames":
import win32clipboard
win32clipboard.OpenClipboard()
filenames = win32clipboard.GetClipboardData(win32clipboard.CF_HDROP)
win32clipboard.CloseClipboard()
for filename in filenames:
    print(filename)
See also MSDN doc for standard clipboard formats
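If you want to avoid the TypeError when something other than files is on the clipboard, you can check for the format first and close the clipboard in a `finally` block. A sketch, using only the win32clipboard calls that appear in this thread; the function name and the `clipboard` parameter (there so a stand-in object can be passed on non-Windows machines) are my own additions:

```python
def copied_file_paths(clipboard=None):
    """Return the paths copied in Explorer, or [] if the clipboard holds no files."""
    if clipboard is None:
        import win32clipboard as clipboard  # pywin32, Windows only
    clipboard.OpenClipboard()
    try:
        if not clipboard.IsClipboardFormatAvailable(clipboard.CF_HDROP):
            return []
        return list(clipboard.GetClipboardData(clipboard.CF_HDROP))
    finally:
        # Always release the clipboard, even if reading the data failed.
        clipboard.CloseClipboard()


# Usage on Windows, after pressing Ctrl+C on a file in Explorer:
#   for path in copied_file_paths():
#       print(path)
```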
This worked for me:
import win32clipboard
win32clipboard.OpenClipboard()
filename_format = win32clipboard.RegisterClipboardFormat('FileName')
if win32clipboard.IsClipboardFormatAvailable(filename_format):
    input_filename = win32clipboard.GetClipboardData(filename_format).decode("utf-8")
    print(input_filename)
win32clipboard.CloseClipboard()
That prints the whole file path, if you want just the file name use:
os.path.basename(input_filename)
Try passing the CF_UNICODETEXT argument, like this: win32clipboard.GetClipboardData(win32clipboard.CF_UNICODETEXT)
It works for me. Reference: https://learn.microsoft.com/en-us/windows/win32/dataxchg/standard-clipboard-formats
I'm working on a Python 3 project that uses the Gtk3 TextView/TextBuffer to get a user's input, and I've got it working to where I can have the user typing in rich text and able to format it as Bold/Italic/Underline/Combination of these.
However, I'm stuck on trying to figure out how to get the text from the TextBuffer with those flags included so I can use the formatting flags to convert the text to properly formatted HTML when I need to.
Calling textbuffer.get_text(start, end, True) simply returns the text without any flags.
Here's the code and the editor.glade file. Save them both in the same directory.
How can I get the text with the flags included? Or, alternatively, is there a way to get the user's input formatted as HTML automatically in another variable?
That's not very easy. Here is a link to some code that I once wrote to do the same thing for RTF output. You can probably adapt it to produce HTML output. If you manage to do so, I'd possibly integrate it into that library's successor.
Alternatively, if you prefer text processing to the above, you can export the rich text in GtkTextBuffer's internal serialization format and convert it to HTML yourself later:
format = textbuffer.register_serialize_tagset('my-tagset')
exported = textbuffer.serialize(textbuffer, format, start, end)
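A third option, if you would rather not parse the serialized data, is to walk the buffer from one tag toggle to the next and build the HTML from the tag names. This is a rough sketch, not tested against your editor.glade: it assumes the tags are named "bold", "italic" and "underline", and the function names are mine.

```python
from html import escape

# Assumption: these are the names given to the buffer's formatting tags.
HTML_TAGS = {"bold": "b", "italic": "i", "underline": "u"}


def buffer_runs(textbuffer):
    """Split a Gtk.TextBuffer into (text, tag_names) runs at every tag toggle."""
    runs = []
    it = textbuffer.get_start_iter()
    while not it.is_end():
        nxt = it.copy()
        if not nxt.forward_to_tag_toggle(None):  # None means "any tag"
            nxt = textbuffer.get_end_iter()
        names = {tag.props.name for tag in it.get_tags()}
        runs.append((textbuffer.get_text(it, nxt, True), names))
        it = nxt
    return runs


def runs_to_html(runs):
    """Turn (text, tag_names) runs into an HTML string."""
    parts = []
    for text, names in runs:
        html = escape(text).replace("\n", "<br>\n")
        # Sorted so the nesting order of the elements is deterministic.
        for name in sorted(names & HTML_TAGS.keys()):
            html = f"<{HTML_TAGS[name]}>{html}</{HTML_TAGS[name]}>"
        parts.append(html)
    return "".join(parts)


# Usage inside the application:
#   html = runs_to_html(buffer_runs(textbuffer))
```

Keeping the HTML step separate from the Gtk step means the conversion can be checked without a running GUI.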
I'm trying to use pyPdf to extract and print pages from a multipage PDF. Problem is, text is not extracted from some pages. I've put an example file here:
http://www.4shared.com/document/kmJF67E4/forms.html
If you run the following, the first 81 pages return no text, while the final 11 extract properly. Can anyone help?
from pyPdf import PdfFileReader
input1 = PdfFileReader(file("forms.pdf", "rb"))
for page in input1.pages:
    print page.extractText()
Note that extractText() still has problems extracting the text properly. From the documentation for extractText():
This works well for some PDF files,
but poorly for others, depending on
the generator used. This will be
refined in the future. Do not rely on
the order of text coming out of this
function, as it will change if this
function is made more sophisticated.
Since it is the text you want, you can use the Linux command pdftotext.
To invoke that using Python, you can do this:
>>> import subprocess
>>> subprocess.call(['pdftotext', 'forms.pdf', 'output'])
The text is extracted from forms.pdf and saved to output.
This works in the case of your PDF file and extracts the text you want.
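If you need the text page by page rather than in one output file, pdftotext accepts `-f` and `-l` for the first and last page, and `-` as the output name to write to stdout. A sketch on Python 3 (3.7 or later for `capture_output`); the helper names are mine and the UTF-8 decoding is an assumption about the tool's output encoding:

```python
import subprocess


def pdftotext_command(pdf_path, first_page=None, last_page=None):
    """Build the pdftotext argument list, writing the text to stdout ('-')."""
    cmd = ["pdftotext"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]
    if last_page is not None:
        cmd += ["-l", str(last_page)]
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, first_page=None, last_page=None):
    """Run pdftotext (must be on the PATH) and return the extracted text."""
    result = subprocess.run(
        pdftotext_command(pdf_path, first_page, last_page),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout.decode("utf-8", errors="replace")


# Usage: text of page 3 only
#   print(extract_text("forms.pdf", first_page=3, last_page=3))
```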
This isn't really an answer, but the problem with pyPdf is this: it doesn't yet support CMaps. PDF allows fonts to use CMaps to map character IDs (bytes in the PDF) to Unicode character codes. When you have a PDF that contains non-ASCII characters, there's probably a CMap in use, and even sometimes when there's no non-ASCII characters. When pyPdf encounters strings that are not in standard Unicode encoding, it just sees a bunch of byte code; it can't convert those bytes to Unicode, so it just gives you empty strings. I actually had this same problem and I'm working on the source code at the moment. It's time consuming, but I hope to send a patch to the maintainer some time around mid-2011.
You could also try the pdfminer library (also in python), and see if it's better at extracting the text. For splitting however, you will have to stick with pyPdf as pdfminer doesn't support that.
I sometimes find it useful to convert it to ps (try both pdf2ps and pdftops for potential differences) and then back to pdf (ps2pdf). Then try your original script again.
I had a similar problem with some PDFs, and on Windows this works very well for me:
1.- Download Xpdf tools for windows
2.- copy pdftotext.exe from xpdf-tools-win-4.00\bin32 to C:\Windows\System32 and also to C:\Windows\SysWOW64
3.- use subprocess to run command from console:
import subprocess
try:
    extInfo = subprocess.check_output('pdftotext.exe '+filePath + ' -',shell=True,stderr=subprocess.STDOUT).strip()
except Exception as e:
    print (e)
I'm starting to think I should adopt a messy two-part solution. There are two sections to the PDF: pp 1-82, which have text page labels (which pdftotext can extract), and pp 83-end, which have no page labels but which pyPDF can extract, and it explicitly knows the pages.
I think I need to combine the two. Clunky, but I don't see any way round it. Sadly I'm having to do this on a Windows machine.