Script to nicify PC-lint MISRA C output - Python

I recently acquired a trial version of some source code to check MISRA compliance before purchasing. I have run PC-lint over the C code to verify compliance, and it reports a huge number of violations. I wanted to nicify the generated HTML so that I can sort the violations by rule. Googling for an existing tool to do this yielded little, so I began writing a Python script...
In short, the script iterates over every line of the HTML output multiple times, checking each line for a particular string. Of course this takes a ridiculously long time to execute. I have been unable to find an elegant solution, but I'm hoping I'm missing something obvious that someone could point out... otherwise, perhaps another language would be more appropriate and would execute faster. Cheers!
#!/usr/bin/env python
import re

rule_search = re.compile("Required Rule (.*?),", re.DOTALL | re.M)
rule_search2 = re.compile("MISRA 2004 Rule (.*?)]", re.DOTALL | re.M)
line_search = re.compile("<br>(.*?)<br>", re.DOTALL | re.M)

data = open('lint-all.html').read()
unique_rules = list(set(rule_search.findall(data)))
unique_rules2 = list(set(rule_search2.findall(data)))
MISRA_Rules = unique_rules + unique_rules2
count = [0] * len(MISRA_Rules)
pages = {}

# Write the header of the counts page, then one link per rule
counts = open("pages/counts.html", 'w')
counts.write("<h2>Violated Rules Count</h2><h3><ol>")
counts.close()
for i in range(len(MISRA_Rules)):
    pages[i] = open("pages/" + str(MISRA_Rules[i]).translate(None, '.') + ".html", 'w')
    pages[i].close()
    counts = open("pages/counts.html", 'a+')
    counts.write("<a href=" + str(MISRA_Rules[i]).translate(None, '.') + ".html>" + str(MISRA_Rules[i]) + "</a>: <font size='3'> 0 </font> ")
    if i % 4 == 0 and i != 0:
        counts.write("<br />")
counts.write("<br /><a href=sorted.html>Total:</a> " + "<font size='3'>" + str(count) + "</font>")
counts.write("</h3>")
counts.close()

# One page per rule, collecting every violating line for that rule
for i in range(len(MISRA_Rules)):
    pages[i] = open("pages/" + str(MISRA_Rules[i]).translate(None, '.') + ".html", 'a+')
    pages[i].write("<h1>MISRA Rule " + str(MISRA_Rules[i]) + "</h1>")
    pages[i].write("""<link rel="import" href="counts.html">""")
    for j in range(len(line_search.findall(data))):
        if "Rule " + str(MISRA_Rules[i]) in line_search.findall(data)[j]:
            count[i] += 1
            pages[i].write("<br>")
            pages[i].write(line_search.findall(data)[j])
            pages[i].write("</br>")
    print "out"

# Rebuild the summary page with the final counts
new_html = open('pages/sorted.html', 'w')
counts = """<h2>Violated Rules Count</h2><h3><ol>"""
for i in range(len(MISRA_Rules)):
    counts += str(MISRA_Rules[i]) + """: <font size="3">""" + str(count[i]) + """</font> """
    if i % 4 == 0 and i != 0:
        counts += """<br />"""
counts += """<br /><a href=sorted.html>Total:</a> <font size="3">""" + str(count) + """</font>"""
counts += """</h3>"""
new_html.write(counts)
new_html.write(data)
new_html.close()

Several approaches are possible.
The first is to optimize the existing code. It's difficult to say offhand what's wrong with it; in a case like this you go to the cProfile docs and set up a profiler. There you'll see the bottlenecks.
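For example, a minimal profiling sketch (assuming the script body were wrapped in a hypothetical main() function; the stats filename is illustrative):
import cProfile
import pstats

cProfile.run('main()', 'lint_profile')          # main() stands in for the script body above
stats = pstats.Stats('lint_profile')
stats.sort_stats('cumulative').print_stats(20)  # show the 20 biggest offenders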
The second approach (the most preferable, in my opinion): parse the data in Python, but leave HTML generation to specialized tools, such as the jinja2 template engine, which is used extensively in web development. A simpler alternative to jinja2 is mustache, which most likely won't require any installation.
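A minimal jinja2 sketch (the template text is illustrative; it reuses the MISRA_Rules and count lists from the script above):
from jinja2 import Template

# Keep the HTML separate from the parsing logic
template = Template("""
<h2>Violated Rules Count</h2>
{% for rule, n in rules %}
<a href="{{ rule|replace('.', '') }}.html">{{ rule }}</a>: {{ n }}<br />
{% endfor %}
""")

rules = zip(MISRA_Rules, count)  # data produced by the parsing step
open('pages/counts.html', 'w').write(template.render(rules=rules))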
The third approach is to do all of this in-browser: add jQuery for DOM manipulation (introducing new tags and classes) and a CSS stylesheet (defining how those new tags and classes should look).

Related

Trouble with initializing first project

I'm a newbie here. I'm trying to print a list out to a document, and I can't get my Python script to run. It's my first Python project, so I'm thinking that I am missing a line of code at the top, though I cannot figure out what that is. Here is my code:
num = input('--')
month = input('--')
crimes = ['_TOT_CLR_MURDER ','_TOT_CLR_MANSLGHTR ','_TOT_CLR_RAPE_TOTAL','_TOT_CLR_FORC_RAPE ','_TOT_CLR_ATMPTD_RAPE ','_TOT_CLR_TOTL_ROBERY ','_TOT_CLR_GUN_ROBBERY ','_TOT_CLR_KNIFE_ROBRY ','_TOT_CLR_OTH_WPN_ROB ','_TOT_CLR_STR_ARM_ROB ','_TOT_CLR_ASSLT_TOTAL ','_TOT_CLR_GUN_ASSAULT ','_TOT_CLR_KNIFE_ASSLT ','_TOT_CLR_OTH_WPN_ASLT ','_TOT_CLR_HND_FT_ASLT ','_TOT_CLR_SIMPLE_ASLT ','_TOT_CLR_BRGLRY_TOTL ','_TOT_CLR_FORC_ENTRY ','_TOT_CLR_ENTR-NO_FRC ','_TOT_CLR_ATT_BURGLRY ','_TOT_CLR_LARCNY_TOTL ','_TOT_CLR_VHC_THFT_TOT ','_TOT_CLR_AUTO_THEFT ','_TOT_CLR_TRCK_BS_THFT ','_TOT_CLR_OTH_VHC_THFT ','_TOT_CLR_ALL_FIELDS ']
f = open('list.txt','w')
for item in crimes:
    f.write(item + num + "-" + num + "\n")
    num++
Do you think you could help?
Edit - I made the arguments in open() into strings, and closed the file after writing - unfortunately the script still will not run.

Highlight every match with Solr 6, Python 3 and pysolr

I have this Solr index that contains a large number of quite long text files, indexed with the text_sv schema. I want to print out every single snippet for each indexed document. However, I only retrieve a few, even though I have tried to manipulate the various settings as specified in the documentation.
Here is the code section:
results = solr.search(search_string, rows=result_limit, sort=order,
    **{
        'hl': 'true',
        'hl.fragsize': 100,
        'hl.fl': 'fulltext',
        'hl.maxAnalyzedChars': -1,
        'hl.snippets': 100,
    })

resultcounter = 0
for result in results:
    resultcounter += 1
    fulltexturl = ('<a href="http://localhost/source/' + result['filename']
                   + '">' + result['filename'][:-4] + '</a>')
    year = str(result['year'])
    number = str(result['number'])
    highlights = results.highlighting
    print("Saw {0} result(s).".format(len(results)))
    print('<p>' + str(resultcounter) + '. <b>År:</b> ' + year
          + ', <b>Nummer: </b>' + number + ', <b>Fulltext:</b> ' + fulltexturl
          + '. <b></b> träffar.<br></p>')
    inSOUresults = 1
    for idnumber, h in highlights.items():
        for key, value in h.items():
            for v in value:
                print('<p>' + str(inSOUresults) + ". " + v + "</p>")
                inSOUresults += 1
What am I doing wrong?
You probably want a very large (or 0) value for the hl.fragsize parameter (from the Highlighting wiki page):
With the original Highlighter, if you have a use case where you need to highlight the complete text of a field and need to highlight every instance of the search term(s) you can set hl.fragsize to a very high value (whatever it takes to include all the text for the largest value for that field), for example &hl.fragsize=50000.
However, if you want to change fragsize to a value greater than 51200 to return long document texts with highlighting, you will need to pass the same value to hl.maxAnalyzedChars parameter too. These two parameters go hand in hand and changing just the hl.fragsize would not be sufficient for highlighting in very large fields.
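As a sketch against the code above (the values are illustrative; the point is that hl.maxAnalyzedChars is raised to match hl.fragsize):
results = solr.search(search_string, rows=result_limit, sort=order,
    **{
        'hl': 'true',
        'hl.fl': 'fulltext',
        'hl.snippets': 100,
        'hl.fragsize': 100000,          # large enough to cover the longest field
        'hl.maxAnalyzedChars': 100000,  # must be raised together with hl.fragsize
    })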

Python: Writing multiple variables to a file

I'm fairly new to Python and I've written a scraper that prints the data I scrape exactly the way I need it, but I'm having trouble writing the data to a file. I need it to look exactly the same, and be in the same order, as when it prints in IDLE.
import requests
import re
from bs4 import BeautifulSoup

year_entry = raw_input("Enter year: ")
week_entry = raw_input("Enter week number: ")
week_link = requests.get("http://sports.yahoo.com/nfl/scoreboard/?week=" + week_entry + "&phase=2&season=" + year_entry)
page_content = BeautifulSoup(week_link.content)
a_links = page_content.find_all('tr', {'class': 'game link'})

for link in a_links:
    r = 'http://www.sports.yahoo.com' + str(link.attrs['data-url'])
    r_get = requests.get(r)
    soup = BeautifulSoup(r_get.content)
    stats = soup.find_all("td", {'class': 'stat-value'})
    teams = soup.find_all("th", {'class': 'stat-value'})
    scores = soup.find_all('dd', {"class": 'score'})
    try:
        game_score = scores[-1]
        game_score = game_score.text
        x = game_score.split(" ")
        away_score = x[1]
        home_score = x[4]
        home_team = teams[1]
        away_team = teams[0]
        away_team_stats = stats[0::2]
        home_team_stats = stats[1::2]
        print away_team.text + ',' + away_score + ',',
        for stats in away_team_stats:
            print stats.text + ',',
        print '\n'
        print home_team.text + ',' + home_score + ',',
        for stats in home_team_stats:
            print stats.text + ',',
        print '\n'
    except:
        pass
I am totally confused about how to get this to print to a txt file the same way it prints in IDLE. The code is built to run only on completed weeks of the NFL season, so if you test it, I recommend year = 2014 and week = 12 (or earlier).
Thanks,
JT
To write to a file you need to build up each line as a string, then write that line to the file.
You'd use something like:
# Open/create a file for your output
with open('my_output_file.csv', 'wb') as csv_out:
    ...
    # Your BeautifulSoup code and parsing goes here
    ...
    # Then build up your output strings
    for link in a_links:
        away_line = ",".join([away_team.text, away_score])
        for stats in away_team_stats:
            away_line += "," + stats.text
        home_line = ",".join([home_team.text, home_score])
        for stats in home_team_stats:
            home_line += "," + stats.text
        # Write your output strings to the file
        csv_out.write(away_line + '\n')
        csv_out.write(home_line + '\n')
This is a quick and dirty fix. To do it properly you probably want to look into the csv module (docs)
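For instance, a minimal sketch of the csv version (reusing the names from the snippet above; csv.writer takes care of the commas and any quoting):
import csv

with open('my_output_file.csv', 'wb') as csv_out:
    writer = csv.writer(csv_out)
    # ... BeautifulSoup parsing as above ...
    writer.writerow([away_team.text, away_score] + [s.text for s in away_team_stats])
    writer.writerow([home_team.text, home_score] + [s.text for s in home_team_stats])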
From the structure of your output I agree with Jamie that using CSV is a logical choice.
But since you're using Python 2, it's possible to use an alternate form of the print statement to print to a file.
From https://docs.python.org/2/reference/simple_stmts.html#the-print-statement
print also has an extended form, defined by the second portion of the
syntax described above. This form is sometimes referred to as “print
chevron.” In this form, the first expression after the >> must
evaluate to a “file-like” object, specifically an object that has a
write() method as described above. With this extended form, the
subsequent expressions are printed to this file object. If the first
expression evaluates to None, then sys.stdout is used as the file for
output.
E.g.,
outfile = open("myfile.txt", "w")
print >>outfile, "Hello, world"
outfile.close()
However, this syntax is not supported in Python 3, so I guess it's probably not a good idea to use it. :) FWIW, I generally use the file write() method in my code when writing to files, except that I tend to use print >>sys.stderr for error messages.

python & pyparsing newb: how to open a file

Paul McGuire, the author of pyparsing, was kind enough to help a lot with a problem I'm trying to solve. We're on 1st down with a yard to the goal, but I can't even punt it across the goal line. Confucius said that if he gave a student 1/4 of the solution and the student did not return with the other 3/4, he would not teach that student again. So it is after almost a week of frustration, and with great anxiety, that I ask this...
How do I open an input file for pyparsing and print the output to another file?
Here is what I've got so far, but it's really all his work
from pyparsing import *

datafile = open('test.txt')

# Backus-Naur Form
num = Word(nums)
accessionDate = Combine(num + "/" + num + "/" + num)("accDate")
accessionNumber = Combine("S" + num + "-" + num)("accNum")
patMedicalRecordNum = Combine(num + "/" + num + "-" + num + "-" + num)("patientNum")
gleason = Group("GLEASON" + Optional("SCORE:") + num("left") + "+" + num("right") + "=" + num("total"))
patientData = Group(accessionDate + accessionNumber + patMedicalRecordNum)
partMatch = patientData("patientData") | gleason("gleason")

lastPatientData = None

# PARSE ACTIONS
def patientRecord(datafile):
    for match in partMatch.searchString(datafile):
        if match.patientData:
            lastPatientData = match
        elif match.gleason:
            if lastPatientData is None:
                print "bad!"
                continue
            print "{0.accDate}: {0.accNum} {0.patientNum} Gleason({1.left}+{1.right}={1.total})".format(
                lastPatientData.patientData, match.gleason)

patientData.setParseAction(lastPatientData)

# MAIN PROGRAM
if __name__ == "__main__":
    patientRecord()
It looks like you need to call datafile.read() in order to read the contents of the file. Right now you are trying to call searchString on the file object itself, not the text in the file. You should really look at the Python tutorial (particularly this section) to get up to speed on how to read files, etc.
It seems like you need some help putting it together. The advice of @BrenBarn is spot-on: work with a problem of simple complexity before you put it all together. I can help by giving you a minimal example of what you are trying to do, with a much simpler grammar. You can use this as a template to learn how to read and write a file in Python. Consider the input text file data.txt:
cat 3
dog 5
foo 7
Let's parse this file and output the results. To have some fun, let's multiply the second column by 2:
from pyparsing import *

# Read the input data
filename = "data.txt"
FIN = open(filename)
TEXT = FIN.read()

# Define a simple grammar for the text, multiplying the second col by 2
digits = Word(nums)
digits.setParseAction(lambda x: int(x[0]) * 2)
blocks = Group(Word(alphas) + digits)
grammar = OneOrMore(blocks)

# Parse the results
result = grammar.parseString(TEXT)

# This gives a list of lists
# [['cat', 6], ['dog', 10], ['foo', 14]]

# Open up a new file for the output
filename2 = "data2.txt"
FOUT = open(filename2, 'w')

# Walk through the results and write to the file
for item in result:
    print item
    FOUT.write("%s %i\n" % (item[0], item[1]))
FOUT.close()
This gives in data2.txt:
cat 6
dog 10
foo 14
Break each piece down until you understand it. From here, you can slowly adapt this minimal example to your more complex problem above. It's OK to read the file in (as long as it is relatively small) since Paul himself notes:
parseFile is really just a simple shortcut around parseString, pretty
much the equivalent of expr.parseString(open(filename).read()).
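In other words, the whole read-and-parse step above collapses to a single call:
result = grammar.parseFile("data.txt")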

Function append lines to .csv

It has been a while since I have written functions with for loops and writing to files, so bear with my ignorance.
This function is given an IP address read from a text file; it pings the IP, searches the ping output for the received-packets count, and then appends it to a .csv.
My question is: Is there a better or an easier way to write this?
def pingS(IPadd4):
    fTmp = "tmp"
    os.system("ping " + IPadd4 + " -n 500 > tmp")
    sName = siteNF  # sys.argv[1]
    scrap = open(fTmp, "r")
    nF = file(sName, "a")  # appends
    nF.write(IPadd4 + ",")
    for line in scrap:
        if line.startswith(" Packets"):
            arrT = line.split(" ")
            nF.write(arrT[10] + " \n")
    scrap.close()
    nF.close()
Note: If you need the full script I can supply that as well.
This, in my opinion at least, makes what is going on a bit more obvious. The len('Received = ') could obviously be replaced by a constant.
def pingS(IPadd4):
    fTmp = "tmp"
    os.system("ping " + IPadd4 + " -n 500 > tmp")
    sName = siteNF  # sys.argv[1]
    scrap = open(fTmp, "r")
    nF = file(sName, "a")  # appends
    ip_string = scrap.read()
    start = ip_string.find('Received = ') + len('Received = ')
    recvd = ip_string[start:ip_string.find(',', start)]  # the count, up to the next comma
    nF.write(IPadd4 + ',' + recvd + '\n')
You could also try looking at the Python csv module for writing to the csv. In this case it's pretty trivial though.
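For what it's worth, a minimal sketch of that (reusing sName, IPadd4, and recvd from above, and opening in append mode like the original):
import csv

with open(sName, 'ab') as out:
    csv.writer(out).writerow([IPadd4, recvd])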
This may not be a direct answer, but you may get some performance increase from using StringIO. I have had some dramatic speedups in IO with this. I'm a bioinformatics guy, so I spend a lot of time shooting large text files out of my code.
http://www.skymind.com/~ocrow/python_string/
I use method 5. It didn't require many changes. There are some fancier methods in there, but they didn't appeal to me as much.
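For reference, a minimal sketch of that buffering idea on Python 2 (assuming the linked article's "method 5" is the cStringIO approach; the lines variable is a stand-in for whatever you are emitting):
from cStringIO import StringIO

buf = StringIO()
for line in lines:   # build everything up in memory first
    buf.write(line + '\n')
open('out.txt', 'w').write(buf.getvalue())  # one write to disk at the end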
