Moving numpy arrays from VBA to Python and back - python

I have a VBA script in Microsoft Access. The VBA script is part of a large project with multiple people, and so it is not possible to leave the VBA environment.
In a section of my script, I need to do complicated linear algebra on a table quickly. So I move the VBA tables (stored as recordsets) into Python to do the linear algebra, and then back into VBA. The matrices in Python are represented as numpy arrays.
Some of the linear algebra is proprietary and so we are compiling the proprietary scripts with pyinstaller.
The details of the process are as follows:
The VBA script creates a csv file representing the table input.csv.
The VBA script runs the python script through the command line
The python script loads the csv file input.csv as a numpy matrix, does linear algebra on it, and creates an output csv file output.csv.
VBA waits until python is done, then loads output.csv.
VBA deletes the no-longer-needed input.csv file and output.csv file.
This process is inefficient.
Is there a way to load VBA matrices into Python (and back) without the csv clutter? Do these methods work with compiled python code through pyinstaller?
I have found the following examples on stackoverflow that are relevant. However, they do not address my problem specifically.
Return result from Python to Vba
How to pass Variable from Python to VBA Sub

Solution 1
Either retrieve the COM running instance of Access and get/set the data directly with the python script via the COM API:
VBA:
Private Cache
Public Function GetData()
    GetData = Cache
    Cache = Empty
End Function
Public Sub SetData(data)
    Cache = data
End Sub
Sub Usage()
    Dim wshell
    Set wshell = VBA.CreateObject("WScript.Shell")
    ' Make the data available via GetData()'
    Cache = Array(4, 6, 8, 9)
    ' Launch the python script compiled with pyinstaller '
    Debug.Assert 0 = wshell.Run("C:\dev\myapp.exe", 0, True)
    ' Handle the returned data (the list sent by Python arrives as a 0-based array) '
    Debug.Assert Cache(3) = 4
End Sub
Python (myapp.exe):
import win32com.client
if __name__ == "__main__":
    # get the running instance of Access
    app = win32com.client.GetObject(Class="Access.Application")
    # get some data from Access
    data = app.run("GetData")
    # return some data to Access
    app.run("SetData", [1, 2, 3, 4])
Solution 2
Or create a COM server to expose some functions to Access :
VBA:
Sub Usage()
Dim Py As Object
Set Py = CreateObject("Python.MyModule")
Dim result
result = Py.MyFunction(Array(5, 6, 7, 8))
End Sub
Python (myserver.exe or myserver.py):
import sys, os, win32api, win32com.server.localserver, win32com.server.register
class MyModule(object):
    _reg_clsid_ = "{5B4A4174-EE23-4B70-99F9-E57958CFE3DF}"
    _reg_desc_ = "My Python COM Server"
    _reg_progid_ = "Python.MyModule"
    _public_methods_ = ['MyFunction']
    def MyFunction(self, data) :
        return [(1,2), (3, 4)]
def register(*classes) :
    regsz = lambda key, val: win32api.RegSetValue(-2147483647, key, 1, val)
    isPy = not sys.argv[0].lower().endswith('.exe')
    python_path = isPy and win32com.server.register._find_localserver_exe(1)
    server_path = isPy and win32com.server.register._find_localserver_module()
    for cls in classes :
        if isPy :
            file_path = sys.modules[cls.__module__].__file__
            class_name = '%s.%s' % (os.path.splitext(os.path.basename(file_path))[0], cls.__name__)
            command = '"%s" "%s" %s' % (python_path, server_path, cls._reg_clsid_)
        else :
            file_path = sys.argv[0]
            class_name = '%s.%s' % (cls.__module__, cls.__name__)
            command = '"%s" %s' % (file_path, cls._reg_clsid_)
        regsz("SOFTWARE\\Classes\\" + cls._reg_progid_ + '\\CLSID', cls._reg_clsid_)
        regsz("SOFTWARE\\Classes\\AppID\\" + cls._reg_clsid_, cls._reg_progid_)
        regsz("SOFTWARE\\Classes\\CLSID\\" + cls._reg_clsid_, cls._reg_desc_)
        regsz("SOFTWARE\\Classes\\CLSID\\" + cls._reg_clsid_ + '\\LocalServer32', command)
        regsz("SOFTWARE\\Classes\\CLSID\\" + cls._reg_clsid_ + '\\ProgID', cls._reg_progid_)
        regsz("SOFTWARE\\Classes\\CLSID\\" + cls._reg_clsid_ + '\\PythonCOM', class_name)
        regsz("SOFTWARE\\Classes\\CLSID\\" + cls._reg_clsid_ + '\\PythonCOMPath', os.path.dirname(file_path))
        regsz("SOFTWARE\\Classes\\CLSID\\" + cls._reg_clsid_ + '\\Debugging', "0")
        print('Registered ' + cls._reg_progid_)
if __name__ == "__main__":
    if len(sys.argv) > 1 :
        win32com.server.localserver.serve(set([v for v in sys.argv if v[0] == '{']))
    else :
        register(MyModule)
Note that you'll have to run the script once without any argument to register the class and to make it available to VBA.CreateObject.
Both solutions work with pyinstaller, and the array received in Python can be converted with numpy.array(data).
Dependency :
https://pypi.python.org/pypi/pywin32
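To make the numpy.array(data) conversion concrete, here is a minimal sketch of the round trip on the Python side. It assumes the VBA array reaches Python as a tuple (nested tuples for a 2-D array), which is how pywin32 normally delivers COM arrays. The helper names from_com and to_com are hypothetical, and the sample value stands in for what app.run("GetData") would return.

```python
import numpy as np

def from_com(data):
    """Convert a COM array (a tuple, or nested tuples for 2-D) to a float ndarray."""
    return np.array(data, dtype=float)

def to_com(array):
    """Convert an ndarray to nested lists of plain Python floats for the return trip."""
    return np.asarray(array).tolist()

# Sample value standing in for app.run("GetData"); not real Access data.
received = ((1.0, 2.0), (3.0, 4.0))

matrix = from_com(received)
result = to_com(matrix @ matrix.T)   # placeholder for the real linear algebra
print(result)  # [[5.0, 11.0], [11.0, 25.0]]
```

Handing back the output of tolist() rather than the ndarray itself is a cautious choice here: it gives COM only plain Python lists and floats to marshal.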

You can try loading your record set into an array, dim'ed as Double
Dim arr(1 to 100, 1 to 100) as Double
by looping, then pass the pointer to the first element ptr = VarPtr(arr(1, 1)) to Python, where
arr = numpy.ctypeslib.as_array(ptr, (100 * 100,)) ?
But VBA will still own the array memory
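A rough sketch of what reading such a pointer could look like, using only ctypes. Two assumptions are baked in: the Python code runs inside the same process as the VBA host (an address from VarPtr means nothing in a separately launched executable), and the doubles are laid out column by column, as VBA stores multi-dimensional arrays. The buffer below is a stand-in created in Python, and read_matrix is a hypothetical helper.

```python
import ctypes

ROWS, COLS = 2, 3

# Stand-in for the VBA-owned memory: six doubles in column-major order,
# i.e. the matrix [[1, 2, 3], [4, 5, 6]] stored one column after another.
buffer = (ctypes.c_double * (ROWS * COLS))(1.0, 4.0, 2.0, 5.0, 3.0, 6.0)
address = ctypes.addressof(buffer)   # plays the role of VarPtr(arr(1, 1))

def read_matrix(address, rows, cols):
    """Copy the doubles starting at `address` into a list of rows."""
    ptr = ctypes.cast(address, ctypes.POINTER(ctypes.c_double))
    # Element (r, c) of a column-major matrix sits at offset c * rows + r.
    return [[ptr[c * rows + r] for c in range(cols)] for r in range(rows)]

matrix = read_matrix(address, ROWS, COLS)
print(matrix)  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```

Copying the values out, as this sketch does, sidesteps the ownership issue: once the list exists, it no longer depends on memory that VBA may free or move.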

There is a very simple way of doing this with xlwings. See xlwings.org and make sure to follow the instructions to enable macro settings, tick xlwings in VBA references, etc. etc.
The code would then look as simple as the following (a slightly silly block of code that just returns the same dataframe back, but you get the picture):
import xlwings as xw
import numpy as np
import pandas as pd
# the @xw.decorator is to tell xlwings to create an Excel VBA wrapper for this function.
# It has no effect on how the function behaves in python
@xw.func
@xw.arg('pensioner_data', pd.DataFrame, index=False, header=True)
@xw.ret(expand='table', index=False)
def pensioner_CF(pensioner_data, mortality_table = "PA(90)", male_age_adj = 0, male_improv = 0, female_age_adj = 0, female_improv = 0,
                 years_improv = 0, arrears_advance = 0, discount_rate = 0, qxy_tables=0):
    pensioner_data = pensioner_data.replace(np.nan, '', regex=True)
    cashflows_df = pd.DataFrame()
    return cashflows_df
I'd be interested to hear if this answers the question. It certainly made my VBA / python experience a lot easier.

Related

Visual Studio: unresolved import 'numpy'

I am trying to run the code below which requires numpy. I installed it via pip install numpy. However, numpy gets highlighted in the editor with the note unresolved import 'numpy'. When I try to run it I get the error No module named 'numpy'. After I got the error the first time I uninstalled numpy and re-installed it but the problem persists.
I am using Python 3.7.8 and NumPy 1.20.2.
The code I am trying to run:
#!/usr/bin/env python3
#
# Copyright (c) 2018 Matthew Earl
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
# OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN
# NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
# OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
# USE OR OTHER DEALINGS IN THE SOFTWARE.
"""
Super Mario Bros level extractor
This script requires py65emu, numpy, and PIL to run. Run with no arguments to see usage.
See http://matthewearl.github.io/2018/06/28/smb-level-extractor/ for a description of how this was written.
To run you'll need to compile https://gist.github.com/1wErt3r/4048722 with x816 to obtain the PRG-ROM and symbol files.
The CHR-ROM should be extracted from a Super Mario Bros ROM, or can be read from an INES ROM file. See
https://wiki.nesdev.com/w/index.php/INES for information on the INES format. In addition you'll need a NES palette
saved in "data/ntscpalette.pal", generated using the tool here: https://bisqwit.iki.fi/utils/nespalette.php
"""
import collections
import pathlib
import re
import numpy as np
from py65emu.cpu import CPU
from py65emu.mmu import MMU
_WORKING_RAM_SIZE = 0x800
Symbol = collections.namedtuple('Symbol', ('name', 'address', 'line_num'))
class SymbolFile:
_LINE_RE = r"(?P<name>[A-Z0-9_]+) *= \$(?P<address>[A-F0-9]*) *; <> \d+, statement #(?P<line_num>\d+)"
def __init__(self, fname):
with open(fname) as f:
self._symbols = [self._parse_symbol(line) for line in f.readlines()]
self._symbols = list(sorted(self._symbols, key=lambda s: s.address))
self._name_to_addr = {s.name: s.address for s in self._symbols}
self._addr_to_name = {s.address: s.name for s in self._symbols}
def _parse_symbol(self, line):
m = re.match(self._LINE_RE, line)
return Symbol(m.group('name'), int(m.group('address'), 16), int(m.group('line_num')))
def __getitem__(self, name):
return self._name_to_addr[name]
def _read_ppu_data(mmu, addr):
while True:
ppu_high_addr = mmu.read(addr)
if ppu_high_addr == 0x0:
break
ppu_low_addr = mmu.read(addr + 1)
assert ppu_high_addr == 0x3f and ppu_low_addr == 0x00
flags_and_length = mmu.read(addr + 2)
assert (flags_and_length & (1<<7)) == 0, "32-byte increment flag set"
assert (flags_and_length & (1<<6)) == 0, "Repeating flag set"
length = flags_and_length & 0b111111
addr += 3
for i in range(length):
yield mmu.read(addr)
addr += 1
def _load_palette(mmu, sym_file, nes_palette):
area_type = mmu.read(sym_file['AREATYPE'])
idx = mmu.read(sym_file['AREAPALETTE'] + area_type)
high_addr = mmu.read(sym_file['VRAM_ADDRTABLE_HIGH'] + idx)
low_addr = mmu.read(sym_file['VRAM_ADDRTABLE_LOW'] + idx)
palette_data = list(_read_ppu_data(mmu, high_addr << 8 | low_addr))
assert len(palette_data) == 32
a = np.array(palette_data[:16]).reshape(4, 4)
a[:, 0] = mmu.read(sym_file['BACKGROUNDCOLORS'] + area_type)
return nes_palette[a]
def _execute_subroutine(cpu, addr):
s_before = cpu.r.s
cpu.JSR(addr)
while cpu.r.s != s_before:
cpu.step()
def _get_metatile_buffer(mmu, sym_file):
return [mmu.read(sym_file['METATILEBUFFER'] + i) for i in range(13)]
def load_tile(chr_rom, idx):
chr_rom_addr = 0x1000 + 16 * idx
d = chr_rom[chr_rom_addr:chr_rom_addr + 16]
a = np.array([[b & (128 >> i) != 0 for i in range(8)] for b in d]).reshape(2, 8, 8)
return a[0] + 2 * a[1]
def _render_metatile(mmu, chr_rom, mtile, palette):
palette_num = mtile >> 6
palette_idx = mtile & 0b111111
high_addr = mmu.read(sym_file['METATILEGRAPHICS_HIGH'] + palette_num)
low_addr = mmu.read(sym_file['METATILEGRAPHICS_LOW'] + palette_num)
addr = (high_addr << 8 | low_addr) + palette_idx * 4
t = np.vstack([np.hstack([load_tile(chr_rom, mmu.read(addr + c * 2 + r)) for c in range(2)])
for r in range(2)])
return palette[palette_num][t]
def load_level(stage, prg_rom, chr_rom, sym_file, nes_palette):
# Initialize the MMU / CPU
mmu = MMU([
(0x0, _WORKING_RAM_SIZE, False, []),
(0x8000, 0x10000, True, list(prg_rom))
])
cpu = CPU(mmu, 0x0)
# Execute some preamble subroutines which set up variables used by the main subroutines.
if isinstance(stage, tuple):
world_num, area_num = stage
mmu.write(sym_file['WORLDNUMBER'], world_num - 1)
mmu.write(sym_file['AREANUMBER'], area_num - 1)
_execute_subroutine(cpu, sym_file['LOADAREAPOINTER'])
else:
area_pointer = stage
mmu.write(sym_file['AREAPOINTER'], area_pointer)
mmu.write(sym_file['HALFWAYPAGE'], 0)
mmu.write(sym_file['ALTENTRANCECONTROL'], 0)
mmu.write(sym_file['PRIMARYHARDMODE'], 0)
mmu.write(sym_file['OPERMODE_TASK'], 0)
_execute_subroutine(cpu, sym_file['INITIALIZEAREA'])
# Extract the palette.
palette = _load_palette(mmu, sym_file, nes_palette)
# Repeatedly extract meta-tile columns, until the level starts repeating.
cols = []
for column_pos in range(1000):
_execute_subroutine(cpu, sym_file['AREAPARSERCORE'])
cols.append(_get_metatile_buffer(mmu, sym_file))
_execute_subroutine(cpu, sym_file['INCREMENTCOLUMNPOS'])
if len(cols) >= 96 and cols[-48:] == cols[-96:-48]:
cols = cols[:-80]
break
level = np.array(cols).T
# Render a dict of metatiles.
mtiles = {mtile: _render_metatile(mmu, chr_rom, mtile, palette)
for mtile in set(level.flatten())}
return level, mtiles
def render_level(level, mtiles):
return np.vstack([np.hstack([mtiles[mtile] for mtile in row]) for row in level])
if __name__ == "__main__":
import sys
import PIL.Image
world_map = {
'{}-{}'.format(world_num, area_num): (world_num, area_num)
for world_num in range(1, 9)
for area_num in range(1, 5)
}
world_map.update({
'bonus': 0xc2,
'cloud1': 0x2b,
'cloud2': 0x34,
'water1': 0x00,
'water2': 0x02,
'warp': 0x2f,
})
if len(sys.argv) < 6:
print("Usage: {} <world> <prg-rom> <sym-file> <chr-rom> <out-file>".format(sys.argv[0]), file=sys.stderr)
print(" <world> is one of {}".format(', '.join(sorted(world_map.keys()))), file=sys.stderr)
print(" <prg-rom> is the binary output from x816")
print(" <sym-file> is the sym file output from x816")
print(" <chr-rom> is a CHR-ROM dump")
print(" <out-file> is the output image name")
sys.exit(-1)
stage = world_map[sys.argv[1]]
with open(sys.argv[2], 'rb') as f:
prg_rom = f.read()
sym_file = SymbolFile(sys.argv[3])
with open(sys.argv[4], 'rb') as f:
chr_rom = f.read()
out_fname = sys.argv[5]
with (pathlib.Path(sys.argv[0]).parent / "data" / "ntscpalette.pal").open("rb") as f:
nes_palette = np.array(list(f.read())).reshape(64, 3)
level, mtiles = load_level(stage, prg_rom, chr_rom, sym_file, nes_palette)
a = render_level(level, mtiles).astype(np.uint8)
im = PIL.Image.fromarray(a)
im.save(out_fname)
How did you create your workspace in Visual Studio? Do you have Python development tools installed with Visual Studio? Did you create a "Python application" as your project template?
If so then your project should have a virtual environment created, which you can see in the solution directory. If that is the case do:
Go to "Solution Explorer" Tab >
Find "Python Environments"
Find your active env. For me there was only one called "Python 3.9 (global default)"
Right click and select "Manage Python Packages..."
There it should list all the installed packages and their versions. If numpy is not there, just type "numpy" in the search box and click the suggested install option: "run command: pip install numpy".
Make sure you have installed NumPy in the same python environment that you use to run the program. (Check the PATH variable if it includes the path to the correct python environment)
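One way to check which environment is actually running the program is to ask the interpreter itself. The sketch below uses only the standard library; describe_module is a hypothetical helper name. Run it from the same project or run configuration that fails: if numpy is reported as not found, pip installed it into a different interpreter than the one printed on the first line.

```python
import importlib.util
import sys

def describe_module(name):
    """Report whether `name` can be imported by the interpreter running this script."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "{}: not found for {}".format(name, sys.executable)
    return "{}: found at {}".format(name, spec.origin)

print(sys.executable)             # the interpreter that is really being used
print(describe_module("numpy"))   # the package in question
print(describe_module("json"))    # a standard-library module, for comparison
```

Installing with the printed interpreter, for example by running that executable with `-m pip install numpy`, puts the package where this interpreter looks for it.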

How to get progress of successful build through Jenkins Python API

I have written Python code to retrieve information about builds. It prints a summary of successful and unsuccessful builds.
from prettytable import PrettyTable
t = PrettyTable(['Job name','Successful','Failed','Unstable','Aborted','Total Builds','Failure Rate'])
t1 = PrettyTable(['Status', 'Job name','Build #','Date','Duration','Node','User'])
aggregation ={}
jobs = server.get_all_jobs(folder_depth=None)
for job in jobs:
    print(job['fullname'])
    aggregation[job['fullname']] = {"success" : 0 , "failure" : 0 , "aborted" : 0, "unstable":0}
    info = server.get_job_info(job['fullname'])
    # Loop over builds
    builds = info['builds']
    for build in builds:
        information = server.get_build_info(job["fullname"],
                                            build['number'])
        if "SUCCESS" in information['result']:
            aggregation[job['fullname']]['success'] = str(int(aggregation[job['fullname']]['success']) + 1)
        if "FAILURE" in information['result']:
            aggregation[job['fullname']]['failure'] = str(int(aggregation[job['fullname']]['failure']) + 1)
        if "ABORTED" in information['result']:
            aggregation[job['fullname']]['aborted'] = str(int(aggregation[job['fullname']]['aborted']) + 1)
        if "UNSTABLE" in information['result']:
            aggregation[job['fullname']]['unstable'] = str(int(aggregation[job['fullname']]['unstable']) + 1)
        t1.add_row([ information['result'], job['fullname'],information["id"],datetime.fromtimestamp(information['timestamp']/1000),information["duration"],"master",information["actions"][0]["causes"][0]["userName"]])
    total_build = int(aggregation[job['fullname']]['success'])+int(aggregation[job['fullname']]['failure'])
    t.add_row([job["fullname"], aggregation[job['fullname']]['success'],aggregation[job['fullname']]['failure'],aggregation[job['fullname']]['aborted'],aggregation[job['fullname']]['unstable'],total_build,(float(aggregation[job['fullname']]['failure'])/total_build)*100])
with open('result', 'w') as w:
    w.write(str(t1))
    w.write(str(t))
This is what the output looks like:
And this is what Windows batch execute command looks like:
cd E:\airflowtmp
conda activate web_scraping
python hello.py
hello.py prints hello world. If I add print counter =100 or something like this, then how do I return it and print it in this resultant table?
Edit:
I am trying to get some kind of variable from the code to display. For instance, if I'm scraping pages and the scraper ran successfully, then I want to know the number of pages that it scraped. You can think of it as a simple counter. Is there any way to return a variable from Jenkins to Python?
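One possible approach, sketched under assumptions: let the job print the value in a recognisable form and read it back from the build's console log. The marker format "counter = N" and the helper extract_counter are hypothetical. The sample log is made up, and in the script above the text would presumably come from python-jenkins' get_build_console_output for the job and build number in question.

```python
import re

# Hypothetical marker: the job prints a line such as "counter = 100".
COUNTER_RE = re.compile(r"^counter\s*=\s*(\d+)\s*$", re.MULTILINE)

def extract_counter(console_text):
    """Return the last 'counter = N' value in the console log, or None if absent."""
    matches = COUNTER_RE.findall(console_text)
    return int(matches[-1]) if matches else None

# Made-up console log; with python-jenkins the text would come from something like
# server.get_build_console_output(job['fullname'], build['number']).
sample_log = "hello world\ncounter =100\nFinished: SUCCESS\n"

print(extract_counter(sample_log))  # 100
```

The extracted number could then be added as one more column in the t1.add_row(...) call.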

Why does Python 2.7 os.system(command) sometimes recurse in Windows 8 while os.startfile(command) does not recurse?

Why does Python 2.7 os.system(command) sometimes recurse in Windows 8 while os.startfile(command) does not recurse?
My command is backup.bat, which is a Windows batch file.
The contents of backup.bat are:
"C:\Users\Frank Chang\Anaconda2\python.exe" -m animation_mini
@echo off
echo %time%
timeout 10 > NUL
echo %time%
The way I discovered that the Python 2.7 animate function in animation_mini.py was being invoked multiple times when os.system
was used is to place a print statement at the beginning of the animate function entry point and count the print statements in the console.
I was told today that Python 2.7 os.system(command) is a wrapper around the C function execve. But that fact does not explain the recursion I see with os.system('backup.bat').
os.system is being called from adder.cgi, a Python 2.7 CGI script whose lines of code are:
#!C:\Users\Frank Chang\Anaconda2\python.exe
import cgitb
import cgi
import os
import signal
import threading , time
import sys
sys.path.insert(0,"C:\Users\Frank Chang\Documents\Arduino\mary\data\usr\lib\python2.7\dist-packages\HTMLgen")
import HTMLgen
import subprocess
import win32api
import pandas as pd
def main():
form = cgi.FieldStorage()
numStr1 = form.getfirst("input1", "0")
numStr2 = form.getfirst("input2", "0")
numStr3 = form.getfirst("input3", "0")
numStr4 = form.getfirst("input4", "0")
numStr5 = form.getfirst("input5", "0")
numStr6 = form.getfirst("input6", "0")
numStr7 = form.getfirst("input7", "0")
numStr8 = form.getfirst("input8", "0")
numStr9 = form.getfirst("input9", "0")
numStr10 = form.getfirst("input10", "0")
numStr11 = form.getfirst("input11", "0")
numStr12 = form.getfirst("input12", "0")
numStr13 = form.getfirst("input13", "0")
numStr14 = form.getfirst("input14", "0")
from pandas import ExcelWriter
writer = ExcelWriter('PythonExport.xlsx')
from pandas import DataFrame
yourdf = DataFrame({'DC Start': numStr1, 'DC Duration': numStr2, 'Plant Start': numStr3, 'Plant Duration': numStr4,
'Supplier Start': numStr5, 'Supplier Duration': numStr6}, index=[0])
yourdf.to_excel(writer,'Disruptions')
yourdf = DataFrame({'FGI': numStr10, 'WIP': numStr11, 'DC': numStr12, 'Plant': numStr13,
'Supplier' : numStr14}, index=[0])
yourdf.to_excel(writer,'Policy')
writer.save()
os.system('backup.bat')
def processInput(numStr1, numStr2):
'''Process input parameters and return the final page as a string.'''
num1 = int(numStr1) # transform input to output data
num2 = int(numStr2)
total = num1+num2
return str(total)
def fileToStr(fileName):
"""Return a string containing the contents of the named file."""
fin = open(fileName);
contents = fin.read();
fin.close()
return contents
main()
Might my CGI script be the cause of the os.system('backup.bat') recursion?
Both eryksun and Blckkngt had exactly the right answer yesterday about mixing multiple versions together to cause the recursive nightmare.
The solution to the "INTERNAL ERROR: cannot create temporary directory" message lies in pyinstaller. It can be corrected by using win32api.SetDllDirectory(..), copying the parent process's environment variables, and adding the key-value pair "TEMP", "C:/TMP" to that dictionary.
The CGI script had no bearing on whether our animation function was run several times consecutively.
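The environment part of that fix can be sketched as follows. This is only an illustration of copying the parent's environment and adding TEMP before launching a child process. The win32api.SetDllDirectory call is left out, run_with_temp is a hypothetical helper, and the child command here is a stand-in rather than backup.bat.

```python
import os
import subprocess
import sys

def run_with_temp(command, temp_dir):
    """Run `command` with a copy of the parent's environment plus TEMP=temp_dir."""
    env = os.environ.copy()   # copy, so the parent's own environment stays untouched
    env["TEMP"] = temp_dir
    return subprocess.call(command, env=env)

# Stand-in child process: exits with 0 only if it sees the overridden TEMP value.
check = "import os, sys; sys.exit(0 if os.environ.get('TEMP') == 'C:/TMP' else 1)"
exit_code = run_with_temp([sys.executable, "-c", check], "C:/TMP")
print(exit_code)  # 0
```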

py2exe SyntaxError: invalid syntax (asyncsupport.py, line 22) [duplicate]

This command works fine on my personal computer but keeps giving me this error on my work PC. What could be going on? I can run the Char_Limits.py script directly in Powershell without a problem.
error: compiling 'C:\ProgramData\Anaconda2\lib\site-packages\jinja2\asyncsupport.py' failed
SyntaxError: invalid syntax (asyncsupport.py, line 22)
My setup.py file looks like:
from distutils.core import setup
import py2exe
setup (console=['Char_Limits.py'])
My file looks like:
import xlwings as xw
from win32com.client import constants as c
import win32api
"""
Important Notes: Header row has to be the first row. No columns without a header row. If you need/want a blank column, just place a random placeholder
header value in the first row.
Product_Article_Number column is used to determine the number of rows. It must be populated for every row.
"""
#functions, hooray!
def setRange(columnDict, columnHeader):
column = columnDict[columnHeader]
rngForFormatting = xw.Range((2,column), (bttm, column))
cellReference = xw.Range((2,column)).get_address(False, False)
return rngForFormatting, cellReference
def msg_box(message):
win32api.MessageBox(wb.app.hwnd, message)
#Character limits for fields in Hybris
CharLimits_Fields = {"alerts":500, "certifications":255, "productTitle":300,
"teaserText":450 , "includes":1000, "compliance":255, "disclaimers":9000,
"ecommDescription100":100, "ecommDescription240":240,
"internalKeyword":1000, "metaKeywords":1000, "metaDescription":1000,
"productFeatures":7500, "productLongDescription":1500,"requires":500,
"servicePlan":255, "skuDifferentiatorText":255, "storage":255,
"techDetailsAndRefs":12000, "warranty":1000}
# Fields for which a break tag is problematic.
BreakTagNotAllowed = ["ecommDescription100", "ecommDescription240", "productTitle",
"skuDifferentiatorText"]
app = xw.apps.active
wb = xw.Book(r'C:\Users\XXXX\Documents\Import File.xlsx')
#identifies the blanket range of interest
firstCell = xw.Range('A1')
lstcolumn = firstCell.end("right").column
headers_Row = xw.Range((1,1), (1, lstcolumn)).value
columnDict = {}
for column in range(1, len(headers_Row) + 1):
header = headers_Row[column - 1]
columnDict[header] = column
try:
articleColumn = columnDict["Product_Article_Number"]
except:
articleColumn = columnDict["Family_Article_Number"]
firstCell = xw.Range((1,articleColumn))
bttm = firstCell.end("down").row
wholeRange = xw.Range((1,1),(bttm, lstcolumn))
wholeRangeVal = wholeRange.value
#Sets the font and deletes previous conditional formatting
wholeRange.api.Font.Name = "Arial Unicode MS"
wholeRange.api.FormatConditions.Delete()
for columnHeader in columnDict.keys():
if columnHeader in CharLimits_Fields.keys():
rng, cellRef = setRange(columnDict, columnHeader)
rng.api.FormatConditions.Add(2,3, "=len(" + cellRef + ") >=" + str(CharLimits_Fields[columnHeader]))
rng.api.FormatConditions(1).Interior.ColorIndex = 3
if columnHeader in BreakTagNotAllowed:
rng, cellRef = setRange(columnDict, columnHeader)
rng.api.FormatConditions.Add(2,3, '=OR(ISNUMBER(SEARCH("<br>",' + cellRef + ')), ISNUMBER(SEARCH("<br/>",' + cellRef + ")))")
rng.api.FormatConditions(2).Interior.ColorIndex = 6
searchResults = wholeRange.api.Find("~\"")
if searchResults is not None:
msg_box("There's a double quote in this spreadsheet")
else:
msg_box("There are no double quotes in this spreadsheet")
# app.api.FindFormat.Clear
# app.api.FindFormat.Interior.ColorIndex = 3
# foundRed = wholeRange.api.Find("*", SearchFormat=True)
# if foundRed is None:
# msg_box("There are no values exceeding character limits")
# else:
# msg_box("There are values exceeding character limits")
# app.api.FindFormat.Clear
# app.api.FindFormat.Interior.ColorIndex = 6
# foundYellow = wholeRange.api.Find("*", SearchFormat=True)
# if foundYellow is None:
# msg_box("There are no break tags in this spreadsheet")
# else:
# msg_box("There are break tags in this spreadsheet")
Note:
If you are reading this, I would try Santiago's solution first.
The issue:
Looking at what is likely at line 22 on the github package:
async def concat_async(async_gen):
This is making use of the async keyword which was added in python 3.5, however py2exe only supports up to python 3.4. Now jinja looks to be extending the python language in some way (perhaps during runtime?) to support this async keyword in earlier versions of python. py2exe cannot account for this language extension.
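The failure mode can be reproduced in miniature with the built-in compile(), on the assumption that the "compiling ... failed" step boils down to byte-compiling each module with the interpreter that runs the build. The helper can_compile is hypothetical.

```python
def can_compile(source, filename="<example>"):
    """Return True if the running interpreter can byte-compile `source`."""
    try:
        compile(source, filename, "exec")
        return True
    except SyntaxError:
        return False

# Accepted by Python 3.5 and later; a SyntaxError on interpreters that predate `async def`.
print(can_compile("async def concat_async(async_gen):\n    pass\n"))

# The reverse situation on a Python 3 interpreter: Python 2 print-statement syntax.
print(can_compile("print 'hello'\n"))
```

Whether a module compiles therefore depends on the interpreter doing the build, which is consistent with the same command working on one machine and failing on another.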
The Fix:
async support was added in jinja2 version 2.9 according to the documentation. So I tried installing an earlier version of jinja (version 2.8) which I downloaded here.
I made a backup of my current jinja installation by moving the contents of %PYTHONHOME%\Lib\site-packages\jinja2 to some other place.
extract the previously downloaded tar.gz file and install the package via pip:
cd .\Downloads\dist\Jinja2-2.8 # or wherever you extracted jinja2.8
python setup.py install
As a side note, I also had to increase my recursion limit because py2exe was reaching the default limit.
from distutils.core import setup
import py2exe
import sys
sys.setrecursionlimit(5000)
setup (console=['test.py'])
Warning:
If whatever it is you are using relies on the latest version of jinja2, then this might fail or have unintended side effects when actually running your code. I was compiling a very simple script.
I had the same trouble coding in Python 3.7. I fixed it by adding the excludes part to my build file (the snippet below is a pyinstaller Analysis call, as used in a .spec file):
a = Analysis(['pyinst_test.py'],
    #...
    excludes=['jinja2.asyncsupport','jinja2.asyncfilters'],
    #...
)
I took that from: https://github.com/pyinstaller/pyinstaller/issues/2393

Import text files to pig through python UDF

I'm trying to load files into Pig while using a Python UDF. I've tried two ways:
• (myudf1, sample1.pig): try to read the file from python, the file is located on my client server.
• (myudf2, sample2.pig): load file from hdfs to grunt shell first, then pass it as a parameter to python udf.
myudf1.py
from __future__ import with_statement
def get_words(dir):
    stopwords=set()
    with open(dir) as f1:
        for line1 in f1:
            stopwords.update([line1.decode('ascii','ignore').split("\n")[0]])
    return stopwords
stopwords=get_words("/home/zhge/uwc/mappings/english_stop.txt")
@outputSchema("findit: int")
def findit(stp):
    stp=str(stp)
    if stp in stopwords:
        return 1
    else:
        return 0
sample1.pig:
REGISTER '/home/zhge/uwc/scripts/myudf1.py' USING jython as pyudf;
item_title = load '/user/zhge/data/item_title_sample/000000_0' USING PigStorage(',') AS (title:chararray);
T = limit item_title 1;
S = FOREACH T GENERATE pyudf.findit(title);
DUMP S
I get: IOError: (2, 'No such file or directory', '/home/zhge/uwc/mappings/english_stop.txt')
For solution 2:
myudf2:
def get_wordlists(wordbag):
    stopwords=set()
    for t in wordbag:
        stopwords.update(t.decode('ascii','ignore'))
    return stopwords
@outputSchema("findit: int")
def findit(stopwordbag, stp):
    stopwords=get_wordlists(stopwordbag)
    stp=str(stp)
    if stp in stopwords:
        return 1
    else:
        return 0
Sample2.pig
REGISTER '/home/zhge/uwc/scripts/myudf2.py' USING jython as pyudf;
stops = load '/user/zhge/uwc/mappings/stopwords.txt' AS (stop_w:chararray);
-- this step works fine and I can see the "stops" object is loaded into Pig
item_title = load '/user/zhge/data/item_title_sample/000000_0' USING PigStorage(',') AS (title:chararray);
T = limit item_title 1;
S = FOREACH T GENERATE pyudf.findit(stops.stop_w, title);
DUMP S;
Then I got:
ERROR org.apache.pig.tools.grunt.Grunt -ERROR 1066: Unable to open iterator for alias S. Backend error : Scalar has more than one row in the output. 1st : (a), 2nd :(as
Your second example should work, though you applied LIMIT to the wrong relation: it should be on the stops relation. Therefore it should be:
stops = LOAD '/user/zhge/uwc/mappings/stopwords.txt' AS (stop_w:chararray);
item_title = LOAD '/user/zhge/data/item_title_sample/000000_0' USING PigStorage(',') AS (title:chararray);
T = LIMIT stops 1;
S = FOREACH item_title GENERATE pyudf.findit(T.stop_w, title);
However, since it looks like you need to process all of the stop words first, this will not be enough. You'll need to do a GROUP ALL and then pass the results to your get_wordlists function instead:
stops = LOAD '/user/zhge/uwc/mappings/stopwords.txt' AS (stop_w:chararray);
item_title = LOAD '/user/zhge/data/item_title_sample/000000_0' USING PigStorage(',') AS (title:chararray);
T = FOREACH (GROUP stops ALL) GENERATE pyudf.get_wordlists(stops) AS ready;
S = FOREACH item_title GENERATE pyudf.findit(T.ready, title);
You'll have to update your UDF to accept a list of dicts though for this method to work.
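A rough sketch of what the updated UDF logic might look like, written as plain Python so it can be tried outside Pig. It assumes each element of the bag reaches the function as a one-field tuple such as ("the",), so adjust the indexing if your setup delivers a different structure. The @outputSchema decorators are omitted so the snippet runs standalone, and the sample bag is made up.

```python
def get_wordlists(wordbag):
    """Collect the first field of every row in the bag into a set of stop words."""
    stopwords = set()
    for row in wordbag:
        if row and row[0] is not None:
            # add() stores the whole word; update() with a string would add
            # its individual characters instead.
            stopwords.add(row[0])
    return stopwords

def findit(stopwordbag, title):
    """Return 1 if the title is a stop word, else 0."""
    return 1 if str(title) in get_wordlists(stopwordbag) else 0

# Made-up bag standing in for the grouped `stops` relation.
sample_bag = [("a",), ("as",), ("the",)]
print(findit(sample_bag, "the"))    # 1
print(findit(sample_bag, "mario"))  # 0
```

Rebuilding the set for every title is wasteful for a long stop-word list, so caching it after the first call may be worth considering.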
