I need to read a large amount of data from a temp file in Spotfire using IronPython.
First I exported my TIBCO data table to a temp file using the ExportText() method:
#Temp file for storing the TablePlot data
tempFolder = Path.GetTempPath()
tempFilename = Path.GetTempFileName()
#Export TablePlot data to the temp file
tp = tablePlotViz.As[TablePlot]()
writer = StreamWriter(tempFilename)
tp.ExportText(writer)
After that, I opened the temp file using the open() method.
f = open(tempFilename)
Now, when I read the data from the opened file and write it back into a string variable, it takes too much time, and my Spotfire screen stops responding.
Does anyone have an idea about this?
My data table is about 8 MB in size.
The full code is:
from Spotfire.Dxp.Application.Visuals import TablePlot, HtmlTextArea, VisualContent
import clr
import sys
clr.AddReference('System.Data')
import System
from System.Data import DataSet, DataTable, XmlReadMode
from Spotfire.Dxp.Data import DataType, DataTableSaveSettings
from Spotfire.Dxp.Data import IndexSet, RowSelection, DataValueCursor, DataSelection, DataPropertyClass
from Spotfire.Dxp.Data import Import
from Spotfire.Dxp.Data.Import import TextFileDataSource, TextDataReaderSettings
from Spotfire.Dxp.Data.Export import DataWriterTypeIdentifiers
from System.IO import StringReader, StreamReader, StreamWriter, MemoryStream, SeekOrigin, FileStream, FileMode, Path, File
from System.Threading import Thread
from System import Array
from System.Text import StringBuilder
#Temp file for storing the TablePlot data
tempFolder = Path.GetTempPath()
tempFilename = Path.GetTempFileName()
#Export TablePlot data to the temp file
tp = tablePlotViz.As[TablePlot]()
writer = StreamWriter(tempFilename)
tp.ExportText(writer)
#Build the table
sb = StringBuilder()
#Open the temp file for reading
f = open(tempFilename)
#build the html table
html = " <TABLE id='table' style='display:none;'>\n"
html += "<THEAD>"
html += " <TR><TH>"
html += " </TH><TH>".join(f.readline().split("\t")).strip()
html += " </TH></TR>"
html += "</THEAD>\n"
html += "<TBODY>\n"
for line in f:
    html += "<TR><TD>"
    html += "</TD><TD>".join(line.split("\t")).strip()
    html += "</TD></TR>\n"
#Assign all of the HTML data to the text area
print html
The code works fine with small data sets.
If I understand correctly, the intention behind the code is to read the Table Plot visualization data into a string for further use in an HTML Text Area.
There is an alternative way of doing this without writing the data to a temporary file: we can export the data to a memory stream and convert the exported text to a string for further reuse. The sample below is a minimal sketch of that approach (it assumes, as in your code, that tablePlotViz refers to the Table Plot visual):
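from Spotfire.Dxp.Application.Visuals import TablePlot
from System.IO import MemoryStream, SeekOrigin, StreamReader, StreamWriter
from System.Text import StringBuilder

#Export the Table Plot into an in-memory stream instead of a temp file
tp = tablePlotViz.As[TablePlot]()
stream = MemoryStream()
writer = StreamWriter(stream)
tp.ExportText(writer)
writer.Flush()

#Rewind the stream and read the exported text back as a single string
stream.Seek(0, SeekOrigin.Begin)
text = StreamReader(stream).ReadToEnd()

#Build the HTML with a StringBuilder; repeated string concatenation with +=
#is quadratic and is the likely cause of the freeze on an 8 MB table
sb = StringBuilder()
sb.Append("<TABLE id='table' style='display:none;'>\n<THEAD><TR><TH>")
lines = text.splitlines()
sb.Append(" </TH><TH>".join(lines[0].split("\t")))
sb.Append("</TH></TR></THEAD>\n<TBODY>\n")
for line in lines[1:]:
    sb.Append("<TR><TD>")
    sb.Append("</TD><TD>".join(line.split("\t")))
    sb.Append("</TD></TR>\n")
sb.Append("</TBODY></TABLE>")
html = sb.ToString()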
Related
I am trying to use the tablib library and create a Dataset from a .csv file. The following works:
import tablib
dataset = tablib.Dataset().load(open('data.csv').read())
However, in some cases, I'd like to load the .csv file from a URL.
Any ideas on how to do that?
You wrote
def get_ds(filename):
    return tablib.Dataset().load(open(filename).read())
You want
import os.path

import requests
import tablib

def get_ds(src):
    # Treat src as a local path if it exists, otherwise as a URL
    if os.path.exists(src):
        txt = open(src).read()
    else:
        req = requests.get(src)
        req.raise_for_status()
        txt = req.text
    return tablib.Dataset().load(txt)
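Both cases are then called the same way, for example (the URL here is just a placeholder):

ds = get_ds('data.csv')                        # local file
ds = get_ds('https://example.com/data.csv')    # fetched over HTTP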
I am trying to build a function that iterates over a bunch of names in a CSV I provide, extracts the last serial number written in a JSON file, increments it by one for each name, and puts the serial number beside every name in the CSV. What I get is that the function generates the first serial number successfully and saves it in the JSON file, but it fails to add it to the CSV via pandas and fails to update the number in the JSON file.
This is the code of the function:
from docx import Document
import pandas as pd
from datetime import datetime
import time
import os
from docx2pdf import convert
import json
date=datetime.date(datetime.now())
strdate=date.strftime("%d-%m-%Y")
year=date.strftime("%Y")
month=date.strftime("%m")
def genrateserial(a):
    jsonFile1 = open("data_file.json", "r")
    lastserial = jsonFile1.read()
    jsonFile1.close()
    for d in range(len(lastserial)):
        if lastserial[d] == "\"":
            lastserial[d].replace("\"", "")
    jsonFile1.close()
    if strdate == "01" or (month[1] != lastserial[8]):
        num = 1
        last = f"JO/{year}{month}{num}"
        data = f"{last}"
        jsonstring = json.dumps(data)
        jsonfile2 = open("data_file.json", "w")
        jsonfile2.write(jsonstring)
        jsonfile2.close()
    database = pd.read_csv(a)
    df = pd.DataFrame(database)
    df = df.dropna(axis=0)
    for z in range(len(df.Name)):
        newentry = f"JO/{year}{month}{num+1}"
        jsonstring1 = json.dumps(newentry)
        jsonfile3 = open("data_file.json", "w")
        jsonfile3.write(jsonstring1)
        jsonfile3.close()
        df.iloc[[z], 3] = newentry

genrateserial('database.csv')
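For reference, a rough sketch of what the intended flow might look like. It assumes the JSON file stores a single serial string such as "JO/2023051" and that the serial belongs in the CSV's fourth column (both assumptions are mine), and it omits the month-rollover reset for brevity:

import json
from datetime import datetime
import pandas as pd

date = datetime.date(datetime.now())
year = date.strftime("%Y")
month = date.strftime("%m")

def generate_serials(csv_path, json_path="data_file.json"):
    #json.load returns the stored value without surrounding quotes,
    #so no character-by-character quote stripping is needed
    with open(json_path) as fh:
        last = json.load(fh)                #e.g. "JO/2023057"
    num = int(last.split("/", 1)[1][6:])    #counter after the YYYYMM prefix

    df = pd.read_csv(csv_path).dropna(axis=0)
    serials = []
    for _ in range(len(df)):
        num += 1
        serials.append(f"JO/{year}{month}{num}")
    df.iloc[:, 3] = serials                 #assumed: serials go in column 4

    #Persist both outputs; the loop above computes each new serial once
    df.to_csv(csv_path, index=False)
    with open(json_path, "w") as fh:
        json.dump(serials[-1], fh)

generate_serials("database.csv")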
At the moment I use a script to populate a template for each of the entries in our database and generate a docx file for each entry. Following that, I convert the docx file to a PDF and mail it to the user.
For this I use the following code:
from docxtpl import DocxTemplate
from docx2pdf import convert
pathToTemplate='template.docx'
outputPath='output.docx'
template = DocxTemplate(pathToTemplate)
context = person.get_context(short) # gets the context used to render the document
template.render(context)
template.save(outputPath)
pdfpath = outputPath[:-4]+'pdf'
convert(outputPath, pdfpath)
This part of the code is embedded in a loop. Measuring the time needed to generate the context from the database (in the person.get_context(short) function) and to generate the docx file gives a result between 0.5 s and 1.0 s, while converting that docx to PDF takes 5.0 s to 7.0 s.
Because the loop has to run over more than 1000 users, this difference adds up. Does anyone have an idea whether DocxTemplate can save to PDF directly (and how fast that is), or whether there is a faster way to generate the PDF files?
As far as I know you can't do it with the docx library itself, but I have found an alternative way to achieve this: we can convert the docx to PDF using the following code.
from docxtpl import DocxTemplate
import pandas as pd
import time
import os
from win32com import client

df = pd.read_excel("Data.xlsx")
# Reuse one Word instance for the whole loop instead of starting Word per file
word_app = client.Dispatch("Word.Application")
records = df.to_dict(orient="records")
rod = os.path.dirname(os.path.abspath(__file__))

for i, j in df.iterrows():
    Name = j["Party_Name"]
    tpl = DocxTemplate("Invoice_Template.docx")
    tpl.render(records[i])
    tpl.save("hello.docx")
    time.sleep(2)
    # Convert to PDF via the Word COM interface (FileFormat 17 = wdFormatPDF);
    # note: every pass overwrites hello.docx/hello.pdf, so use per-row
    # file names if you need to keep all the PDFs
    doc = word_app.Documents.Open(rod + "\\hello.docx")
    doc.SaveAs(rod + "\\hello.pdf", FileFormat=17)
    doc.Close()
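A hedged note on why this tends to be faster: dispatching a single Word.Application and reusing it across the loop avoids paying Word's start-up cost for every document, which is where per-file converters such as docx2pdf spend most of their time, and the time.sleep(2) is probably unnecessary since tpl.save() returns only after the file has been written.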
I am currently using the code below to fetch a Salesforce report and write it to a CSV file. When I take the length of items it is 2000, but when I execute this code it produces a CSV file that contains only 55 rows in total. My guess is that something is off in the write function, but I am unsure.
Any suggestions would be appreciated.
import csv
from salesforce_reporting import Connection
import salesforce_reporting
sf = Connection(username='user',password='pw',security_token='token')
report = sf.get_report('report_id',details=True)
parser = salesforce_reporting.ReportParser(report)
items = parser.records()
with open("output.csv", "w") as f:
writer = csv.writer(f)
writer.writerows(items)
I was able to figure out that the issue was indeed in the writing aspect of my code. The code below will export your report without headers.
import csv
from salesforce_reporting import Connection
import salesforce_reporting
sf = Connection(username='user',password='pw',security_token='token')
report = sf.get_report('reportid',details=True)
parser = salesforce_reporting.ReportParser(report)
items = parser.records()
f = csv.writer(open('test_output.csv','w'))
f.writerows(items)
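One detail worth noting: with Python 3 on Windows, opening the output file without newline='' makes the csv module write an extra blank line after every row, and a file that is never closed can lose buffered rows at the end. A safer version of the same write, using a context manager:

with open('test_output.csv', 'w', newline='') as out:
    csv.writer(out).writerows(items)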
I'm trying to load a dataset with breaks in it, and I am trying to find an intelligent way to make this work. I got started on it with the code I've included.
As you can see, the data within the file posted on the public FTP site starts at line 11, ends at line 23818, then starts again at line 23823, and ends at line 45630.
import pandas as pd
import numpy as np
from io import BytesIO
from zipfile import ZipFile
from urllib.request import urlopen
url = urlopen("http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/10_Portfolios_Prior_12_2_Daily_CSV.zip")
#Download Zipfile and create pandas DataFrame
zipfile = ZipFile(BytesIO(url.read()))
df = pd.read_csv(zipfile.open('10_Portfolios_Prior_12_2_Daily.CSV'), header = 0,
names = ['asof_dt','1','2','3','4','5','6','7','8','9','10'], skiprows=10).dropna()
df['asof_dt'] = pd.to_datetime(df['asof_dt'], format = "%Y%m%d")
I would ideally like the first set to have a version number "1", the second to have "2", etc.
Any help would be greatly appreciated. Thank you.
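Not a full answer, but one rough way to tag each block with a version number, assuming the break rows are exactly the ones whose first field does not parse as a YYYYMMDD number (an assumption worth checking against the raw file):

import pandas as pd
from io import BytesIO
from zipfile import ZipFile
from urllib.request import urlopen

url = urlopen("http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/10_Portfolios_Prior_12_2_Daily_CSV.zip")
zipfile = ZipFile(BytesIO(url.read()))
cols = ['asof_dt','1','2','3','4','5','6','7','8','9','10']
raw = pd.read_csv(zipfile.open('10_Portfolios_Prior_12_2_Daily.CSV'),
                  header=0, names=cols, skiprows=10)

#Rows whose first field parses as a number are data; everything else is a break
is_data = pd.to_numeric(raw['asof_dt'], errors='coerce').notna()
#The first data row after each break starts a new version: 1, 2, ...
raw['version'] = (is_data & ~is_data.shift(fill_value=False)).cumsum()

df = raw[is_data].copy()
df['asof_dt'] = pd.to_datetime(df['asof_dt'].astype(str).str.strip(), format="%Y%m%d")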