I have written python code to retrieve information about build. I prints a summary of successful and unsuccessful builds.
from prettytable import PrettyTable
t = PrettyTable(['Job name','Successful','Failed','Unstable','Aborted','Total Builds','Failure Rate'])
t1 = PrettyTable(['Status', 'Job name','Build #','Date','Duration','Node','User'])
aggregation ={}
jobs = server.get_all_jobs(folder_depth=None)
for job in jobs:
print(job['fullname'])
aggregation[job['fullname']] = {"success" : 0 , "failure" : 0 , "aborted" : 0, "unstable":0}
info = server.get_job_info(job['fullname'])
# Loop over builds
builds = info['builds']
for build in builds:
information = server.get_build_info(job["fullname"],
build['number'])
if "SUCCESS" in information['result']:
aggregation[job['fullname']]['success'] = str(int(aggregation[job['fullname']]['success']) + 1)
if "FAILURE" in information['result']:
aggregation[job['fullname']]['failure'] = str(int(aggregation[job['fullname']]['failure']) + 1)
if "ABORTED" in information['result']:
aggregation[job['fullname']]['aborted'] = str(int(aggregation[job['fullname']]['aborted']) + 1)
if "UNSTABLE" in information['result']:
aggregation[job['fullname']]['unstable'] = str(int(aggregation[job['fullname']]['unstable']) + 1)
t1.add_row([ information['result'], job['fullname'],information["id"],datetime.fromtimestamp(information['timestamp']/1000),information["duration"],"master",information["actions"][0]["causes"][0]["userName"]])
total_build = int(aggregation[job['fullname']]['success'])+int(aggregation[job['fullname']]['failure'])
t.add_row([job["fullname"], aggregation[job['fullname']]['success'],aggregation[job['fullname']]['failure'],aggregation[job['fullname']]['aborted'],aggregation[job['fullname']]['unstable'],total_build,(float(aggregation[job['fullname']]['failure'])/total_build)*100])
with open('result', 'w') as w:
w.write(str(t1))
w.write(str(t))
This is what the output looks like:
And this is what Windows batch execute command looks like:
cd E:\airflowtmp
conda activate web_scraping
python hello.py
hello.py prints hello world. If I add print counter =100 or something like this then how do I return it and print it in this resultant table.
Edit:
I am trying to get some kind of variable from code to display. For instance if Im scraping pages and scraper ran successfully then I want to know the number of pages that it scraped. You can think of it as a simple counter. Is there any way to return a variable from Jenkins to python
Related
I understand livy session statement intakes code statements like the below example.
data = {
'code': textwrap.dedent("""
import random
NUM_SAMPLES = 100000
def sample(p):
x, y = random.random(), random.random()
return 1 if x*x + y*y < 1 else 0
count = sc.parallelize(xrange(0, NUM_SAMPLES)).map(sample).reduce(lambda a, b: a + b)
print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
""")
}
r = requests.post(statements_url, data=json.dumps(data), headers=headers)
but is there a way in which I can provide pyspark files, maybe something like this:
data = {
'pySparkFile': file_name.py
}
I understand livy batch provides this functionality but I want an interactive session where users can pass multiple scripts one after another and we can also call variables of other scripts, just like in an interactive pySpark session.
I am not sure this answers your question, but I managed to create a Spark session on EMR using cURL like this:
$ curl -H "Content-Type: application/json" -X POST -d '{"kind":"pyspark", "conf": {"spark.yarn.dist.pyFiles": "s3://bucket-name/test.py"}}' http://ec2-3-87-28-125.compute-1.amazonaws.com:8998/sessions
{"id":0,"name":null,"appId":null,"owner":null,"proxyUser":null,"state":"starting","kind":"pyspark","appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":["stdout: ","\nstderr: ","\nYARN Diagnostics: "]}
I inspected /mnt/var/log/livy/livy-livy-server.out and found this line which indicates that the session was successfully created:
20/08/31 18:02:25 INFO InteractiveSession: Interactive session 0 created [appid: application_1598896609416_0002, owner: null, proxyUser: None, state: idle, kind: pyspark, info: {driverLogUrl=http://ip-172-31-85-247.ec2.internal:8042/node/containerlogs/container_1598896609416_0002_01_000001/livy, sparkUiUrl=http://ip-172-31-95-182.ec2.internal:20888/proxy/application_1598896609416_0002/}]
Can somebody help me identify the error here? I am using Python on Visual Studio, and I have copied the code from another source. When I paste the code into a playground like Ideone.com, the program runs without issue. Only when I paste into visual studio do I get an error. The error is in the last two lines of code that begin with "print". This is probably a huge noob question, but I am a beginner. Help!
import hashlib as hasher
import datetime as date
# Define what a Snakecoin block is
class Block:
def __init__(self, index, timestamp, data, previous_hash):
self.index = index
self.timestamp = timestamp
self.data = data
self.previous_hash = previous_hash
self.hash = self.hash_block()
def hash_block(self):
sha = hasher.sha256()
sha.update(str(self.index) + str(self.timestamp) + str(self.data) + str(self.previous_hash))
return sha.hexdigest()
# Generate genesis block
def create_genesis_block():
# Manually construct a block with
# index zero and arbitrary previous hash
return Block(0, date.datetime.now(), "Genesis Block", "0")
# Generate all later blocks in the blockchain
def next_block(last_block):
this_index = last_block.index + 1
this_timestamp = date.datetime.now()
this_data = "Hey! I'm block " + str(this_index)
this_hash = last_block.hash
return Block(this_index, this_timestamp, this_data, this_hash)
# Create the blockchain and add the genesis block
blockchain = [create_genesis_block()]
previous_block = blockchain[0]
# How many blocks should we add to the chain
# after the genesis block
num_of_blocks_to_add = 20
# Add blocks to the chain
for i in range(0, num_of_blocks_to_add):
block_to_add = next_block(previous_block)
blockchain.append(block_to_add)
previous_block = block_to_add
# Tell everyone about it!
print "Block #{} has been added to the blockchain!".format(block_to_add.index)
print "Hash: {}\n".format(block_to_add.hash)
In visual studio and most other editors, they are updated to the most recent python, also meaning that you have to place parentheses after the print statement.
For example:
print("hello")
Visual Studio runs Python 3.x.x, so, you have to write the print sentences on this way:
print('String to print')
Because in Python 3 print is a function, not a reserved word like in Python 2.
I have a VBA script in Microsoft Access. The VBA script is part of a large project with multiple people, and so it is not possible to leave the VBA environment.
In a section of my script, I need to do complicated linear algebra on a table quickly. So, I move the VBA tables written as recordsets) into Python to do linear algebra, and back into VBA. The matrices in python are represented as numpy arrays.
Some of the linear algebra is proprietary and so we are compiling the proprietary scripts with pyinstaller.
The details of the process are as follows:
The VBA script creates a csv file representing the table input.csv.
The VBA script runs the python script through the command line
The python script loads the csv file input.csv as a numpy matrix, does linear algebra on it, and creates an output csv file output.csv.
VBA waits until python is done, then loads output.csv.
VBA deletes the no-longer-needed input.csv file and output.csv file.
This process is inefficient.
Is there a way to load VBA matrices into Python (and back) without the csv clutter? Do these methods work with compiled python code through pyinstaller?
I have found the following examples on stackoverflow that are relevant. However, they do not address my problem specifically.
Return result from Python to Vba
How to pass Variable from Python to VBA Sub
Solution 1
Either retrieve the COM running instance of Access and get/set the data directly with the python script via the COM API:
VBA:
Private Cache
Public Function GetData()
GetData = Cache
Cache = Empty
End Function
Public Sub SetData(data)
Cache = data
End Sub
Sub Usage()
Dim wshell
Set wshell = VBA.CreateObject("WScript.Shell")
' Make the data available via GetData()'
Cache = Array(4, 6, 8, 9)
' Launch the python script compiled with pylauncher '
Debug.Assert 0 = wshell.Run("C:\dev\myapp.exe", 0, True)
' Handle the returned data '
Debug.Assert Cache(3) = 2
End Sub
Python (myapp.exe):
import win32com.client
if __name__ == "__main__":
# get the running instance of Access
app = win32com.client.GetObject(Class="Access.Application")
# get some data from Access
data = app.run("GetData")
# return some data to Access
app.run("SetData", [1, 2, 3, 4])
Solution 2
Or create a COM server to expose some functions to Access :
VBA:
Sub Usage()
Dim Py As Object
Set Py = CreateObject("Python.MyModule")
Dim result
result = Py.MyFunction(Array(5, 6, 7, 8))
End Sub
Python (myserver.exe or myserver.py):
import sys, os, win32api, win32com.server.localserver, win32com.server.register
class MyModule(object):
_reg_clsid_ = "{5B4A4174-EE23-4B70-99F9-E57958CFE3DF}"
_reg_desc_ = "My Python COM Server"
_reg_progid_ = "Python.MyModule"
_public_methods_ = ['MyFunction']
def MyFunction(self, data) :
return [(1,2), (3, 4)]
def register(*classes) :
regsz = lambda key, val: win32api.RegSetValue(-2147483647, key, 1, val)
isPy = not sys.argv[0].lower().endswith('.exe')
python_path = isPy and win32com.server.register._find_localserver_exe(1)
server_path = isPy and win32com.server.register._find_localserver_module()
for cls in classes :
if isPy :
file_path = sys.modules[cls.__module__].__file__
class_name = '%s.%s' % (os.path.splitext(os.path.basename(file_path))[0], cls.__name__)
command = '"%s" "%s" %s' % (python_path, server_path, cls._reg_clsid_)
else :
file_path = sys.argv[0]
class_name = '%s.%s' % (cls.__module__, cls.__name__)
command = '"%s" %s' % (file_path, cls._reg_clsid_)
regsz("SOFTWARE\\Classes\\" + cls._reg_progid_ + '\\CLSID', cls._reg_clsid_)
regsz("SOFTWARE\\Classes\\AppID\\" + cls._reg_clsid_, cls._reg_progid_)
regsz("SOFTWARE\\Classes\\CLSID\\" + cls._reg_clsid_, cls._reg_desc_)
regsz("SOFTWARE\\Classes\\CLSID\\" + cls._reg_clsid_ + '\\LocalServer32', command)
regsz("SOFTWARE\\Classes\\CLSID\\" + cls._reg_clsid_ + '\\ProgID', cls._reg_progid_)
regsz("SOFTWARE\\Classes\\CLSID\\" + cls._reg_clsid_ + '\\PythonCOM', class_name)
regsz("SOFTWARE\\Classes\\CLSID\\" + cls._reg_clsid_ + '\\PythonCOMPath', os.path.dirname(file_path))
regsz("SOFTWARE\\Classes\\CLSID\\" + cls._reg_clsid_ + '\\Debugging', "0")
print('Registered ' + cls._reg_progid_)
if __name__ == "__main__":
if len(sys.argv) > 1 :
win32com.server.localserver.serve(set([v for v in sys.argv if v[0] == '{']))
else :
register(MyModule)
Note that you'll have to run the script once without any argument to register the class and to make it available to VBA.CreateObject.
Both solutions work with pylauncher and the array received in python can be converted with numpy.array(data).
Dependency :
https://pypi.python.org/pypi/pywin32
You can try loading your record set into an array, dim'ed as Double
Dim arr(1 to 100, 1 to 100) as Double
by looping, then pass the pointer to the first element ptr = VarPtr(arr(1, 1)) to Python, where
arr = numpy.ctypeslib.as_array(ptr, (100 * 100,)) ?
But VBA will still own the array memory
There is a very simple way of doing this with xlwings. See xlwings.org and make sure to follow the instructions to enable macro settings, tick xlwings in VBA references, etc. etc.
The code would then look as simple as the following (a slightly silly block of code that just returns the same dataframe back, but you get the picture):
import xlwings as xw
import numpy as np
import pandas as pd
# the #xw.decorator is to tell xlwings to create an Excel VBA wrapper for this function.
# It has no effect on how the function behaves in python
#xw.func
#xw.arg('pensioner_data', pd.DataFrame, index=False, header=True)
#xw.ret(expand='table', index=False)
def pensioner_CF(pensioner_data, mortality_table = "PA(90)", male_age_adj = 0, male_improv = 0, female_age_adj = 0, female_improv = 0,
years_improv = 0, arrears_advance = 0, discount_rate = 0, qxy_tables=0):
pensioner_data = pensioner_data.replace(np.nan, '', regex=True)
cashflows_df = pd.DataFrame()
return cashflows_df
I'd be interested to hear if this answers the question. It certainly made my VBA / python experience a lot easier.
This command works fine on my personal computer but keeps giving me this error on my work PC. What could be going on? I can run the Char_Limits.py script directly in Powershell without a problem.
error: compiling 'C:\ProgramData\Anaconda2\lib\site-packages\jinja2\asyncsupport.py' failed
SyntaxError: invalid syntax (asyncsupport.py, line 22)
My setup.py file looks like:
from distutils.core import setup
import py2exe
setup (console=['Char_Limits.py'])
My file looks like:
import xlwings as xw
from win32com.client import constants as c
import win32api
"""
Important Notes: Header row has to be the first row. No columns without a header row. If you need/want a blank column, just place a random placeholder
header value in the first row.
Product_Article_Number column is used to determine the number of rows. It must be populated for every row.
"""
#functions, hooray!
def setRange(columnDict, columnHeader):
column = columnDict[columnHeader]
rngForFormatting = xw.Range((2,column), (bttm, column))
cellReference = xw.Range((2,column)).get_address(False, False)
return rngForFormatting, cellReference
def msg_box(message):
win32api.MessageBox(wb.app.hwnd, message)
#Character limits for fields in Hybris
CharLimits_Fields = {"alerts":500, "certifications":255, "productTitle":300,
"teaserText":450 , "includes":1000, "compliance":255, "disclaimers":9000,
"ecommDescription100":100, "ecommDescription240":240,
"internalKeyword":1000, "metaKeywords":1000, "metaDescription":1000,
"productFeatures":7500, "productLongDescription":1500,"requires":500,
"servicePlan":255, "skuDifferentiatorText":255, "storage":255,
"techDetailsAndRefs":12000, "warranty":1000}
# Fields for which a break tag is problematic.
BreakTagNotAllowed = ["ecommDescription100", "ecommDescription240", "productTitle",
"skuDifferentiatorText"]
app = xw.apps.active
wb = xw.Book(r'C:\Users\XXXX\Documents\Import File.xlsx')
#identifies the blanket range of interest
firstCell = xw.Range('A1')
lstcolumn = firstCell.end("right").column
headers_Row = xw.Range((1,1), (1, lstcolumn)).value
columnDict = {}
for column in range(1, len(headers_Row) + 1):
header = headers_Row[column - 1]
columnDict[header] = column
try:
articleColumn = columnDict["Product_Article_Number"]
except:
articleColumn = columnDict["Family_Article_Number"]
firstCell = xw.Range((1,articleColumn))
bttm = firstCell.end("down").row
wholeRange = xw.Range((1,1),(bttm, lstcolumn))
wholeRangeVal = wholeRange.value
#Sets the font and deletes previous conditional formatting
wholeRange.api.Font.Name = "Arial Unicode MS"
wholeRange.api.FormatConditions.Delete()
for columnHeader in columnDict.keys():
if columnHeader in CharLimits_Fields.keys():
rng, cellRef = setRange(columnDict, columnHeader)
rng.api.FormatConditions.Add(2,3, "=len(" + cellRef + ") >=" + str(CharLimits_Fields[columnHeader]))
rng.api.FormatConditions(1).Interior.ColorIndex = 3
if columnHeader in BreakTagNotAllowed:
rng, cellRef = setRange(columnDict, columnHeader)
rng.api.FormatConditions.Add(2,3, '=OR(ISNUMBER(SEARCH("<br>",' + cellRef + ')), ISNUMBER(SEARCH("<br/>",' + cellRef + ")))")
rng.api.FormatConditions(2).Interior.ColorIndex = 6
searchResults = wholeRange.api.Find("~\"")
if searchResults is not None:
msg_box("There's a double quote in this spreadsheet")
else:
msg_box("There are no double quotes in this spreadsheet")
# app.api.FindFormat.Clear
# app.api.FindFormat.Interior.ColorIndex = 3
# foundRed = wholeRange.api.Find("*", SearchFormat=True)
# if foundRed is None:
# msg_box("There are no values exceeding character limits")
# else:
# msg_box("There are values exceeding character limits")
# app.api.FindFormat.Clear
# app.api.FindFormat.Interior.ColorIndex = 6
# foundYellow = wholeRange.api.Find("*", SearchFormat=True)
# if foundYellow is None:
# msg_box("There are no break tags in this spreadsheet")
# else:
# msg_box("There are break tags in this spreadsheet")
Note:
If you are reading this, I would try Santiago's solution first.
The issue:
Looking at what is likely at line 22 on the github package:
async def concat_async(async_gen):
This is making use of the async keyword which was added in python 3.5, however py2exe only supports up to python 3.4. Now jinja looks to be extending the python language in some way (perhaps during runtime?) to support this async keyword in earlier versions of python. py2exe cannot account for this language extension.
The Fix:
async support was added in jinja2 version 2.9 according to the documentation. So I tried installing an earlier version of jinja (version 2.8) which I downloaded here.
I made a backup of my current jinja installation by moving the contents of %PYTHONHOME%\Lib\site-packages\jinja2 to some other place.
extract the previously downloaded tar.gz file and install the package via pip:
cd .\Downloads\dist\Jinja2-2.8 # or wherever you extracted jinja2.8
python setup.py install
As a side note, I also had to increase my recursion limit because py2exe was reaching the default limit.
from distutils.core import setup
import py2exe
import sys
sys.setrecursionlimit(5000)
setup (console=['test.py'])
Warning:
If whatever it is you are using relies on the latest version of jinja2, then this might fail or have unintended side effects when actually running your code. I was compiling a very simple script.
I had the same trouble coding in python3.7. I fixed that adding the excludes part to my py2exe file:
a = Analysis(['pyinst_test.py'],
#...
excludes=['jinja2.asyncsupport','jinja2.asyncfilters'],
#...)
I took that from: https://github.com/pyinstaller/pyinstaller/issues/2393
I'm trying to load files to pig while use python udf, i've tried two ways:
• (myudf1, sample1.pig): try to read the file from python, the file is located on my client server.
• (myudf2, sample2.pig): load file from hdfs to grunt shell first, then pass it as a parameter to python udf.
myudf1.py
from __future__ import with_statement
def get_words(dir):
stopwords=set()
with open(dir) as f1:
for line1 in f1:
stopwords.update([line1.decode('ascii','ignore').split("\n")[0]])
return stopwords
stopwords=get_words("/home/zhge/uwc/mappings/english_stop.txt")
#outputSchema("findit: int")
def findit(stp):
stp=str(stp)
if stp in stopwords:
return 1
else:
return 0
sample1.pig:
REGISTER '/home/zhge/uwc/scripts/myudf1.py' USING jython as pyudf;
item_title = load '/user/zhge/data/item_title_sample/000000_0' USING PigStorage(',') AS (title:chararray);
T = limit item_title 1;
S = FOREACH T GENERATE pyudf.findit(title);
DUMP S
I get: IOError: (2, 'No such file or directory', '/home/zhge/uwc/mappings/english_stop.txt')
For solution 2:
myudf2:
def get_wordlists(wordbag):
stopwords=set()
for t in wordbag:
stopwords.update(t.decode('ascii','ignore'))
return stopwords
#outputSchema("findit: int")
def findit(stopwordbag, stp):
stopwords=get_wordlists(stopwordbag)
stp=str(stp)
if stp in stopwords:
return 1
else:
return 0
Sample2.pig
REGISTER '/home/zhge/uwc/scripts/myudf2.py' USING jython as pyudf;
stops = load '/user/zhge/uwc/mappings/stopwords.txt' AS (stop_w:chararray);
-- this step works fine and i can see the "stops" obejct is loaded to pig
item_title = load '/user/zhge/data/item_title_sample/000000_0' USING PigStorage(',') AS (title:chararray);
T = limit item_title 1;
S = FOREACH T GENERATE pyudf.findit(stops.stop_w, title);
DUMP S;
Then I got:
ERROR org.apache.pig.tools.grunt.Grunt -ERROR 1066: Unable to open iterator for alias S. Backend error : Scalar has more than one row in the output. 1st : (a), 2nd :(as
Your second example should work. Though you LIMITed the wrong expression -- it should be on the stops relationship. Therefore it should be:
stops = LOAD '/user/zhge/uwc/mappings/stopwords.txt' AS (stop_w:chararray);
item_title = LOAD '/user/zhge/data/item_title_sample/000000_0' USING PigStorage(',') AS (title:chararray);
T = LIMIT stops 1;
S = FOREACH item_title GENERATE pyudf.findit(T.stop_w, title);
However, since it looks like you need to process all of the stop words first this will not be enough. You'll need to do a GROUP ALL and then pass the results to your get_wordlist function instead:
stops = LOAD '/user/zhge/uwc/mappings/stopwords.txt' AS (stop_w:chararray);
item_title = LOAD '/user/zhge/data/item_title_sample/000000_0' USING PigStorage(',') AS (title:chararray);
T = FOREACH (GROUP stops ALL) GENERATE pyudf.get_wordlists(stops) AS ready;
S = FOREACH item_title GENERATE pyudf.findit(T.ready, title);
You'll have to update your UDF to accept a list of dicts though for this method to work.