Pasting Python's tabulate output into Microsoft Office editors - python

Quite frequently I need to copy-paste small tables from an SQL editor into Microsoft Office programs (Outlook and OneNote), and I want them to look nice when pasted. So I wrote a short script that takes the data from the clipboard, processes it with tabulate and returns it to the clipboard.
This works very well when I paste the new table into Notepad++ and other editors. It completely messes up when I paste into Outlook.
If I paste into Notepad++ and then copy paste from there, everything's fine.
I tried the different table formats, and I tried playing around with Outlook's editor options.
Would really appreciate any insights!
Thanks!
See code:
import win32clipboard
import pandas as pd
from tabulate import tabulate

# read the table currently on the clipboard into a DataFrame
df = pd.read_clipboard()
head = df.columns.tolist()
val = df.values

# render it as a plain-text grid
table = tabulate(val, headers=head, tablefmt="grid")

# set clipboard data
win32clipboard.OpenClipboard()
win32clipboard.EmptyClipboard()
win32clipboard.SetClipboardText(str(table))
win32clipboard.CloseClipboard()
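The plain-text grid only lines up in monospace editors like Notepad++; Outlook composes mail in a proportional font by default, which is likely why the table falls apart there. One possible direction (just a sketch, assuming the HtmlClipboard.py helper discussed in the CF_HTML question further down this page) is to have tabulate emit real HTML and put it on the clipboard in the CF_HTML format, so Outlook renders it as an actual table:
import pandas as pd
from tabulate import tabulate
import HtmlClipboard  # helper for the CF_HTML format, see the CF_HTML question below

df = pd.read_clipboard()
# "html" is a built-in tabulate format; Outlook can paste the result as a rendered table
html_table = tabulate(df.values, headers=df.columns.tolist(), tablefmt="html")
HtmlClipboard.PutHtml(html_table)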

Related

Web Scraping Data Visualization

I'm trying to capture AND present data in a table format after the script has finished. The website I am using is http://en.wikipedia.org/wiki/List_of_all-time_NFL_win-loss_records and the logic works like this:
I run the command, it opens to the URL
I then go to the URL http://en.wikipedia.org/wiki/List_of_all-time_NFL_win-loss_records
I proceed to copy any selected rows/columns from the Table/chart
I then go back to my IDE (Jupyter Notebook), and it takes the captured data and spits it out.
I can select the data on that webpage with my cursor by highlighting it and choosing "copy"; the script then spits out everything I selected and copied to my clipboard.
So far, the script I have written only captures the data and spits it back out as is (unformatted).
PROBLEM: I would like the captured data to be presented in a table format after I have finished selecting it and copying it to my clipboard.
I realize I probably need to write the logic to format the captured data. What would be the best approach for accomplishing this?
Here is my code so far:
import webbrowser
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

# open the page so the rows can be selected and copied manually
website = 'http://en.wikipedia.org/wiki/List_of_all-time_NFL_win-loss_records'
webbrowser.open(website)

# after copying rows from the table, read the clipboard into a DataFrame
nfl_frame = pd.read_clipboard(sep='\t')
nfl_frame
You can read your data directly into a DataFrame with pandas.read_html:
import pandas as pd
WIKI_URL = 'http://en.wikipedia.org/wiki/List_of_all-time_NFL_win-loss_records'
df = pd.read_html(WIKI_URL, header=0)[1]
df.head() # in jupyter or print(df.head()) to show a table with first 5 rows
pd.read_html returns a list of all the tables found in that HTML/URL. I set header to the first row and selected the second element of the list, which is the table you are looking for.
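If you are not sure which list index you need, a quick way (just a sketch) is to print how many tables read_html found and peek at each one's columns:
import pandas as pd

WIKI_URL = 'http://en.wikipedia.org/wiki/List_of_all-time_NFL_win-loss_records'
tables = pd.read_html(WIKI_URL, header=0)
print(len(tables))                   # how many tables were found on the page
for i, t in enumerate(tables):
    print(i, list(t.columns)[:5])    # first few column names of each table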

From Python web app: insert data into spreadsheet (e.g. LibreOffice / Excel), calculate and save as pdf

I am facing the problem that I would like to push data (one large dataframe and one image) from my Python web app (running on the Tornado web server on Ubuntu) into a spreadsheet, run the calculations, save it as a PDF and then deliver it to the frontend.
I took a look at several libs like openpyxl for writing sheets in MS Excel, but that would solve just one part. I was thinking about using LibreOffice and pyoo, but it seems that I need the same Python version on my backend as the one shipped with LibreOffice when importing pyuno.
Has somebody solved a similar issue, and do you have a recommendation for how to solve this?
Thanks
I came up with a solution that is, let's say, not pretty, but it works very flexibly for me:
1. Use openpyxl to open an existing Excel workbook that includes the layout (a template).
2. Insert the dataframe into a separate sheet in that workbook.
3. Use openpyxl to save it as temporary_file.xlsx.
4. Call LibreOffice with --headless --convert-to pdf temporary_file.xlsx. While executing this call, all integrated formulas are recalculated/updated and the PDF is created (you have to configure Calc so that auto-calculation is enabled when files are opened).
5. Deliver the PDF to the frontend or process it as you like.
6. Delete temporary_file.xlsx.
import datetime
from subprocess import call

import openpyxl
from openpyxl.utils.dataframe import dataframe_to_rows
import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)

# temporary workbook name based on a timestamp so parallel requests don't collide
now = datetime.datetime.now().strftime("%Y%m%d_%H%M_%f")
wb_template_name = 'Template.xlsx'
wb_temp_name = now + wb_template_name

# load the template and append the dataframe row by row into its dedicated sheet
wb = openpyxl.load_workbook(wb_template_name)
ws = wb['dataframe_sheet']
for r in dataframe_to_rows(df, index=True, header=True):
    ws.append(r)
wb.save(wb_temp_name)

# let LibreOffice recalculate the formulas and convert the workbook to PDF
pdf_convert_cmd = 'soffice --headless --convert-to pdf ' + wb_temp_name
call(pdf_convert_cmd, shell=True)
The reason I'm doing this is that I would like to be able to style the layout of the PDF independently from the data. I use named ranges or lookups that reference the separate dataframe sheet in Excel.
I haven't tried the image insertion yet, but it should work similarly. I think the performance could be improved by simply dumping the dataframe into the xlsx file (which is a zipped set of XML files), so that you don't need openpyxl.
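Steps 5 and 6 (deliver and clean up) are not shown in the snippet above. A rough sketch, assuming the converted PDF lands in the working directory with the same base name as the temporary workbook (which is what --convert-to does when no --outdir is given):
import os

pdf_name = os.path.splitext(wb_temp_name)[0] + '.pdf'
# ... hand pdf_name to the frontend here, e.g. as a Tornado file response ...
os.remove(wb_temp_name)   # remove the temporary workbook once the PDF exists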

How to Read a WebPage with Python and write to a flat file?

Very novice at Python here.
I'm trying to read the table presented on this page (with the current filters set as is) and then write it to a CSV file.
http://www65.myfantasyleague.com/2017/options?L=47579&O=243&TEAM=DAL&POS=RB
I tried the following approach. It creates the CSV file but does not fill it with the actual table contents.
I'd appreciate any help in advance. Thanks.
import requests
import pandas as pd
url = 'http://www65.myfantasyleague.com/2017/optionsL=47579&O=243&TEAM=DAL&POS=RB'
csv_file='DAL.RB.csv'
pd.read_html(requests.get(url).content)[-1].to_csv(csv_file)
Generally, try to describe your problem more precisely, try to debug, and don't put everything on one line. With that said, your specific problems here were the table index and the missing ? in the URL (after options):
import requests
import pandas as pd
url = 'http://www65.myfantasyleague.com/2017/options?L=47579&O=243&TEAM=DAL&POS=RB'
# -^-
csv_file='DAL.RB.csv'
pd.read_html(requests.get(url).content)[1].to_csv(csv_file)
# -^-
This yields a CSV file with the table in it.
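A quick sanity check (just a usage sketch) is to read the file back and confirm that it actually contains rows now:
import pandas as pd

check = pd.read_csv('DAL.RB.csv')
print(check.shape)    # (rows, columns) written to the file
print(check.head())   # first few rows of the scraped table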

Python 3.6, Windows: retrieving the clipboard CF_HTML format

I want to copy some rich text, modify its source code (changing some tags and text, using regex and/or BeautifulSoup) and send it back to the clipboard. I'm looking for the easiest way to do that.
I tried win32clipboard, but it doesn't support the CF_HTML format (the Windows clipboard exposes many formats).
So I'm looking for a module that could help me get at this format:
if the CF_HTML clipboard format contains HTML, store it in a variable, do some operation on it, then send it back. (Optionally: do other stuff on other clipboard formats.)
Here is a Linux equivalent of what I'm looking for. It retrieves the HTML source when there is some in the clipboard (source):
#!/usr/bin/env python
import gtk
print (gtk.Clipboard().wait_for_contents('text/html')).data
Edit 1: There is a workaround with pywin32 using this script. But is there a module able to do that directly (if CF_HTML contains data, get it, and send it back)?
The solution from Edit 1 actually seems to be the best:
put the script above (HtmlClipboard.py) in the Python site-packages folder: C:\Users\xxx\AppData\Local\Programs\Python\Python36\Lib\site-packages
install pywin32 (which provides win32clipboard)
With the two points above you can play with a script like this:
# get the CF_HTML clipboard content
import HtmlClipboard  # the HtmlClipboard.py script found on GitHub

if HtmlClipboard.HasHtml():
    # there is HTML on the clipboard
    dirty_HTML = HtmlClipboard.GetHtml()
    print(dirty_HTML)
else:
    print('no html')

clean_HTML = dirty_HTML  # do what you want with it here
# put the data back on the clipboard:
HtmlClipboard.PutHtml(clean_HTML)
Bonus:
##get CF_TEXT from clipboard
import win32clipboard
win32clipboard.OpenClipboard()
text = win32clipboard.GetClipboardData(win32clipboard.CF_TEXT)
win32clipboard.CloseClipboard()
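For reference, this is roughly what HtmlClipboard does under the hood. A minimal sketch of writing CF_HTML directly with win32clipboard, assuming an ASCII-only wrapper and a UTF-8 fragment (the header fields are byte offsets into the clipboard payload); the tested HtmlClipboard script remains the safer choice:
import win32clipboard

def put_html(fragment):
    # CF_HTML wraps the HTML in a small header whose fields are byte offsets
    header = ("Version:0.9\r\n"
              "StartHTML:{:09d}\r\n"
              "EndHTML:{:09d}\r\n"
              "StartFragment:{:09d}\r\n"
              "EndFragment:{:09d}\r\n")
    prefix = "<html><body><!--StartFragment-->"
    suffix = "<!--EndFragment--></body></html>"
    frag_bytes = fragment.encode("utf-8")
    header_len = len(header.format(0, 0, 0, 0))   # constant thanks to the zero padding
    start_html = header_len
    start_fragment = header_len + len(prefix)
    end_fragment = start_fragment + len(frag_bytes)
    end_html = end_fragment + len(suffix)
    payload = (header.format(start_html, end_html, start_fragment, end_fragment)
               + prefix).encode("ascii") + frag_bytes + suffix.encode("ascii")
    cf_html = win32clipboard.RegisterClipboardFormat("HTML Format")
    win32clipboard.OpenClipboard()
    win32clipboard.EmptyClipboard()
    win32clipboard.SetClipboardData(cf_html, payload)
    win32clipboard.CloseClipboard()

put_html("<table><tr><td>hello</td></tr></table>")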

How to read a data file containing "pandas.core.frame", "numpy.core.multiarray"

I came across a .df file encoded in a binary format. But when I open it with Vim, I can still see strings like "pandas.core.frame" and "numpy.core.multiarray", so I guess it is related to Python. However, I know little about the Python language. Though I have tried the pandas and numpy modules, I failed to read the file. Could you give any suggestions on this issue? Thank you in advance. Here is the Dropbox link to the DF file: https://www.dropbox.com/s/b22lez3xysvzj7q/flux.df
Looks like a DataFrame stored with pickle; use read_pickle() to read it:
import pandas as pd
df = pd.read_pickle('flux.df')
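Once loaded, you can confirm it really is a DataFrame and inspect it the usual way:
print(type(df))     # should report <class 'pandas.core.frame.DataFrame'>
print(df.shape)     # number of rows and columns
print(df.head())    # first five rows
print(df.dtypes)    # column types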
