indexError, searching within - python

I am writing a program that will read a CSV file with data that looks like this:
"10724_artifact11679.jpg","H. 3 1/4 in. (8.26 cm)","10.210.114","This artwork is currently on display in Gallery 171","11679"
And write it into an HTML table. I only want the rows that say "This artwork is not on display" in the 4th field (index 3), but I've been having issues with this set of data:
import csv
metlist4 = []
newList = csv.reader(open("v2img_10724_list.csv", 'r'))
for row in newList:
    metlist4.append(row)
artifact_template = """<td>
<div>
<img src= "%(image)s" alt = "artifact" />
<p>Dimensions: %(dimension)s </p>
<p>Accession #: %(accession)s </p>
<p>Display: %(display)s </p>
<p>index2: %(index2)s </p>
</div>
</td>"""
html_list = []
count = 5794
for artifact in metlist4:
    if artifact[3] in ["This artwork is not on display"]:
        artifactinfo = {}
        artifactinfo["image"]=artifact[0]
        artifactinfo["dimension"]=artifact[1]
        artifactinfo["accession"]=artifact[2]
        artifactinfo["display"]=artifact[3]
        artifactinfo["index2"]=count
        count = count + 1
        html_list.append(artifact_template % artifactinfo)
    else:
        pass
f = open("v3display_test.txt", "w")
f.write("\n".join(html_list))
f.close()
I get this error, but only when I run the entire metlist4...
File "/Users/Rose/Documents/workspace/METProjectFOREAL/src/no_display_Met4.py", line 34, in <module>
if artifact[3] in ["This artwork is not on display"]:
IndexError: list index out of range
If I run just a section, for example metlist4[0:500], the error does not occur. Any ideas or suggestions would be greatly appreciated! Thanks!

There is at least one row that doesn't have a 4th element. Perhaps the line is empty.
Test for the length, and print the row to test:
if len(artifact) < 4:
    print 'short row', artifact
If it is an empty line, just skip it:
if not artifact: continue
You are using a lot of verbose and redundant code; there is no need to build a separate list when you can just loop over the csv.reader() object directly, and there is no need to add an empty else: pass block either.
Idiomatic Python code would be:
import csv
artifact_template = """<td>
<div>
<img src= "%(image)s" alt = "artifact" />
<p>Dimensions: %(dimension)s </p>
<p>Accession #: %(accession)s </p>
<p>Display: %(display)s </p>
<p>index2: %(index2)s </p>
</div>
</td>"""
html_list = []
fields = 'image dimension accession display'.split()
with open("v2img_10724_list.csv", 'rb') as inputfile:
    # fieldnames names the columns; restval fills in fields missing from short rows
    reader = csv.DictReader(inputfile, fieldnames=fields, restval='_ignored')
    for count, artifact in enumerate(reader, 5794):
        if artifact and artifact['display'] == "This artwork is not on display":
            artifact["index2"] = count
            html_list.append(artifact_template % artifact)
This uses csv.DictReader() to create a dictionary per row, a with statement to ensure the file is closed when done, and enumerate() with a start value to track count.
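For anyone on Python 3, here is a minimal sketch of the same length check, so that blank or truncated rows get reported instead of raising IndexError. The sample data and the function name not_on_display are made up for illustration; they are not from the question's file.

```python
import csv
import io

# Hypothetical sample: the second row is blank and the third is truncated.
SAMPLE = (
    '"a.jpg","H. 3 in.","10.1","This artwork is not on display","1"\n'
    '\n'
    '"b.jpg","H. 4 in."\n'
    '"c.jpg","H. 5 in.","10.3","This artwork is currently on display in Gallery 171","3"\n'
)

def not_on_display(lines, wanted="This artwork is not on display"):
    """Yield (row_number, row) for complete rows whose 4th field matches."""
    for row_number, row in enumerate(csv.reader(lines), start=1):
        if len(row) < 4:
            # Blank or truncated row: report it instead of raising IndexError.
            print("short row", row_number, row)
            continue
        if row[3] == wanted:
            yield row_number, row

matches = list(not_on_display(io.StringIO(SAMPLE)))
print(matches)
```

Printing the short rows first is usually the quickest way to find out whether the bad line is empty or just missing fields.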

Related

Easier way to use csv file with jinja

stackoverflow.csv:
name,age,country
Dutchie, 10, Netherlands
Germie, 20, Germany
Swisie, 30, Switzerland
stackoverflow.j2:
Name: {{ name }}
Age: {{ age }}
Country: {{ country }}
#####
Python script:
#! /usr/bin/env python
import csv
from jinja2 import Template
import time
source_file = "stackoverflow.csv"
template_file = "stackoverflow.j2"
# String that will hold final full text
full_text = ""
# Open up the Jinja template file (as text) and then create a Jinja Template Object
with open(template_file) as f:
    template = Template(f.read(), keep_trailing_newline=True)
# Open up the CSV file containing the data
with open(source_file) as f:
    # Use DictReader to access data from CSV
    reader = csv.DictReader(f)
    # For each row in the CSV, generate a configuration using the jinja template
    for row in reader:
        text = template.render(
            name=row["name"],
            age=row["age"],
            country=row["country"]
        )
        # Append this text to the full text
        full_text += text
output_file = f"{template_file.split('.')[0]}_{source_file.split('.')[0]}.txt"
# Save the final configuration to a file
with open(output_file, "w") as f:
    f.write(full_text)
output:
Name: Dutchie
Age: 10
Country: Netherlands
#####
Name: Germie
Age: 20
Country: Germany
#####
Name: Swisie
Age: 30
Country: Switzerland
#####
See the script and input file above. Everything is working at the moment, but I would like to optimize the script so that when I add a new column to the CSV file, I don't need to update the script.
Example: when I add the column "address" to the CSV file, I would need to update template.render as follows:
text = template.render(
    name=row["name"],
    age=row["age"],
    country=row["country"],
    address=row["address"]
)
Is there a more efficient way to do this? I once had a code example for this, but I cannot find it anymore :(.
Pass the whole row dict to the template (for example template.render(row=row)) and unpack it into key and value variables in a for loop with items():
{% for key, value in row.items() %}
{{ key }}: {{ value }}
{% endfor %}
You can also pass the list of rows to the template and use another for loop so that you only have to render the template once.
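A rough sketch of that second suggestion, assuming jinja2 is installed. The CSV and template are inlined as strings here only to keep the example self-contained; CSV_TEXT and TEMPLATE_TEXT are stand-ins for stackoverflow.csv and stackoverflow.j2, not the asker's actual files.

```python
import csv
import io

from jinja2 import Template

# Hypothetical in-memory stand-ins for stackoverflow.csv and stackoverflow.j2.
CSV_TEXT = "name,age,country\nDutchie,10,Netherlands\nGermie,20,Germany\n"
TEMPLATE_TEXT = (
    "{% for row in rows %}"
    "{% for key, value in row.items() %}"
    "{{ key }}: {{ value }}\n"
    "{% endfor %}"
    "#####\n"
    "{% endfor %}"
)

# Read every row once, then render the template a single time with all rows.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
full_text = Template(TEMPLATE_TEXT).render(rows=rows)
print(full_text)
```

With this layout a new CSV column shows up in the output without touching the script or the template. The labels come straight from the CSV header, so they appear as "name" rather than "Name" unless the template changes them.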

can't display the line by line converted list on the webpage using python

This is the first function, which returns the list:
def getListOfKeyWord(keyword):
    df=pd.read_excel('finaltrue.xlsx')
    corpus=[]
    for i in range(len(df)):
        if keyword in df["text"][i]:
            corpus.append(df["text"][i])
    return corpus
This is the function where I print the list line by line:
def listing(result):
    x=0
    for item in range(0,len(result)):
        x+=1
        table = print(x,"",result[item])
    return table
This is the placeholder in the HTML:
<div class="row center">
{{key}}
</div>
And here is where I call the functions in the app route:
if request.method=='POST':
    word = request.form['kword']
    result=getListOfKeyWord(word)
    table=listing(result)
    return render_template('advance_search.html',key=table)
return render_template('advance_search.html')
Now I get the word "none" at the placeholder position.
Can anyone help, please?
First of all, print() writes to the console and returns None, so assigning its result to a variable gives you None.
The key issue here is your listing function. You can use this code instead:
def listing(result):
    table = '<br>'.join([ f'{i+1}- {item}' for i, item in enumerate(result) ])
    print(table)
    return table
You need to use <br> instead of \n to join the list since HTML will show <br> as new line, not \n.
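One caveat: if autoescaping is enabled, as it is by default for .html templates rendered with Flask's render_template, the <br> tags will be escaped and shown literally unless the placeholder is written as {{ key|safe }}. Since that filter turns escaping off, it is safer to escape each item yourself first. A minimal sketch of that variant, using only the standard library (the sample items are made up):

```python
from html import escape

def listing(result):
    """Return the items as numbered lines joined by <br>, each item HTML-escaped."""
    return "<br>".join(
        "{}- {}".format(number, escape(item))
        for number, item in enumerate(result, start=1)
    )

# Hypothetical items; the first contains a character that must be escaped.
table = listing(["a < b", "plain text"])
print(table)
```

An alternative is to pass the plain list to the template and loop over it there, which keeps the markup out of the Python code altogether.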

Converting CSV to HTML Table in Python

I'm trying to take data from a .csv file and import it into an HTML table using Python.
This is the csv file https://www.mediafire.com/?mootyaa33bmijiq
Context:
The csv is populated with data from a football team [Age group, Round, Opposition, Team Score, Opposition Score, Location]. I need to be able to select a specific age group and only display those details in separate tables.
This is all I've got so far....
infile = open("Crushers.csv","r")
for line in infile:
    row = line.split(",")
    age = row[0]
    week = row [1]
    opp = row[2]
    ACscr = row[3]
    OPPscr = row[4]
    location = row[5]
    if age == 'U12':
        print(week, opp, ACscr, OPPscr, location)
First install pandas:
pip install pandas
Then run:
import pandas as pd
columns = ['age', 'week', 'opp', 'ACscr', 'OPPscr', 'location']
df = pd.read_csv('Crushers.csv', names=columns)
# This you can change it to whatever you want to get
age_15 = df[df['age'] == 'U15']
# Other examples:
bye = df[df['opp'] == 'Bye']
crushed_team = df[df['ACscr'] == '0']
crushed_visitor = df[df['OPPscr'] == '0']
# Play with this
# Use the .to_html() to get your table in html
print(crushed_visitor.to_html())
You'll get something like:
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>age</th>
<th>week</th>
<th>opp</th>
<th>ACscr</th>
<th>OPPscr</th>
<th>location</th>
</tr>
</thead>
<tbody>
<tr>
<th>34</th>
<td>U17</td>
<td>1</td>
<td>Banyo</td>
<td>52</td>
<td>0</td>
<td>Home</td>
</tr>
<tr>
<th>40</th>
<td>U17</td>
<td>7</td>
<td>Aspley</td>
<td>62</td>
<td>0</td>
<td>Home</td>
</tr>
<tr>
<th>91</th>
<td>U12</td>
<td>7</td>
<td>Rochedale</td>
<td>8</td>
<td>0</td>
<td>Home</td>
</tr>
</tbody>
</table>
Firstly, install pandas:
pip install pandas
Then,
import pandas as pd
a = pd.read_csv("Crushers.csv")
# to save as html file
# named as "Table"
a.to_html("Table.htm")
# assign it to a
# variable (string)
html_file = a.to_html()
The function below takes a filename, optional headers and an optional delimiter as input, converts the CSV to an HTML table and returns it as a string.
If headers are not provided, it assumes the header is already present in the first line of the CSV file.
Converts csv file contents to HTML formatted table
def csv_to_html_table(fname,headers=None,delimiter=","):
    with open(fname) as f:
        content = f.readlines()
    #reading file content into list
    rows = [x.strip() for x in content]
    table = "<table>"
    #creating HTML header row if header is provided
    if headers is not None:
        table+= "".join(["<th>"+cell+"</th>" for cell in headers.split(delimiter)])
    else:
        table+= "".join(["<th>"+cell+"</th>" for cell in rows[0].split(delimiter)])
        rows=rows[1:]
    #Converting csv to html row by row
    for row in rows:
        table+= "<tr>" + "".join(["<td>"+cell+"</td>" for cell in row.split(delimiter)]) + "</tr>" + "\n"
    table+="</table><br>"
    return table
In your case, the function call will look like this. Note that it does not filter any entries; it converts the whole CSV file to an HTML table.
filename="Crushers.csv"
myheader='age,week,opp,ACscr,OPPscr,location'
html_table=csv_to_html_table(filename,myheader)
Note: To filter out entries with certain values, add a conditional statement in the for loop.
Before you begin printing the desired rows, output some HTML to set up an appropriate table structure.
When you find a row you want to print, output it in HTML table row format.
# begin the table
print("<table>")
# column headers: a <tr> row containing <th> cells
print("<tr>")
print("<th>Week</th>")
print("<th>Opp</th>")
print("<th>ACscr</th>")
print("<th>OPPscr</th>")
print("<th>Location</th>")
print("</tr>")
infile = open("Crushers.csv","r")
for line in infile:
    # strip the trailing newline so it does not end up in the last cell
    row = line.strip().split(",")
    age = row[0]
    week = row [1]
    opp = row[2]
    ACscr = row[3]
    OPPscr = row[4]
    location = row[5]
    if age == 'U12':
        print("<tr>")
        print("<td>%s</td>" % week)
        print("<td>%s</td>" % opp)
        print("<td>%s</td>" % ACscr)
        print("<td>%s</td>" % OPPscr)
        print("<td>%s</td>" % location)
        print("</tr>")
# end the table
print("</table>")
First some imports:
import csv
from html import escape
import io
Now the building blocks - let's make one function for reading the CSV and another function for making the HTML table:
def read_csv(path, column_names):
    with open(path, newline='') as f:
        # why newline='': see footnote at the end of https://docs.python.org/3/library/csv.html
        reader = csv.reader(f)
        for row in reader:
            record = {name: value for name, value in zip(column_names, row)}
            yield record
def html_table(records):
    # records is expected to be a list of dicts
    column_names = []
    # first detect all possible keys (field names) that are present in records
    for record in records:
        for name in record.keys():
            if name not in column_names:
                column_names.append(name)
    # create the HTML line by line
    lines = []
    lines.append('<table>\n')
    lines.append(' <tr>\n')
    for name in column_names:
        lines.append(' <th>{}</th>\n'.format(escape(name)))
    lines.append(' </tr>\n')
    for record in records:
        lines.append(' <tr>\n')
        for name in column_names:
            value = record.get(name, '')
            lines.append(' <td>{}</td>\n'.format(escape(value)))
        lines.append(' </tr>\n')
    lines.append('</table>')
    # join the lines to a single string and return it
    return ''.join(lines)
Now just put it together :)
records = list(read_csv('Crushers.csv', 'age week opp ACscr OPPscr location'.split()))
# Print first record to see whether we are loading correctly
print(records[0])
# Output:
# {'age': 'U13', 'week': '1', 'opp': 'Waterford', 'ACscr': '22', 'OPPscr': '36', 'location': 'Home'}
records = [r for r in records if r['age'] == 'U12']
print(html_table(records))
# Output:
# <table>
# <tr>
# <th>age</th>
# <th>week</th>
# <th>opp</th>
# <th>ACscr</th>
# <th>OPPscr</th>
# <th>location</th>
# </tr>
# <tr>
# <td>U12</td>
# <td>1</td>
# <td>Waterford</td>
# <td>0</td>
# <td>4</td>
# <td>Home</td>
# </tr>
# <tr>
# <td>U12</td>
# <td>2</td>
# <td>North Lakes</td>
# <td>12</td>
# <td>18</td>
# <td>Away</td>
# </tr>
# ...
# </table>
A few notes:
csv.reader works better than line splitting because it also handles quoted values, even quoted values that contain newlines.
html.escape is used to escape strings that could contain the characters <, > or &.
It is often easier to work with dicts than with tuples.
CSV files usually contain a header (a first line with column names) and can then be loaded easily with csv.DictReader; Crushers.csv has no header (the data start on the very first line), so we build the dicts ourselves in the function read_csv.
Both functions, read_csv and html_table, are generalised so that they work with any data; the column names are not hardcoded into them.
Yes, you could use pandas read_csv and to_html instead :) But it is good to know how to do it without pandas in case you need some customization. Or just as a programming exercise.
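To illustrate the first note, here is a small comparison of line splitting and csv.reader on a row with a quoted value. The row is invented for the example; it is not taken from Crushers.csv.

```python
import csv

# Hypothetical row: the opposition name contains a comma, so it is quoted.
line = 'U12,3,"Smith, Jones United",10,4,Home\n'

# Naive splitting cuts the quoted name in two and keeps the quote characters.
naive = line.strip().split(",")

# csv.reader accepts any iterable of lines and respects the quoting.
parsed = next(csv.reader([line]))

print(len(naive), naive[2])
print(len(parsed), parsed[2])
```

The split version ends up with seven fields, so every index after the name is shifted by one, while csv.reader returns the six expected fields.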
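This should work as well. Note that HTML here comes from the third-party html package on PyPI, not from the standard library's html module, and that iteritems() requires Python 2:
from html import HTML
import csv
def to_html(csvfile):
    H = HTML()
    t = H.table(border='2')
    # header row
    r = t.tr
    with open(csvfile) as infile:
        reader = csv.DictReader(infile)
        for column in reader.fieldnames:
            r.td(column)
        for row in reader:
            # start a new table row and add its cells to that row
            r = t.tr
            for col in row.iteritems():
                r.td(col[1])
    return t
Call the function by passing the path of the CSV file to it.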
Other answers are suggesting pandas, but that's probably overkill if formatting CSV to an HTML table is all you need. If you want to use an existing package just for this purpose, there's tabulate:
import csv
from tabulate import tabulate
with open("Crushers.csv") as file:
    reader = csv.reader(file)
    u12_rows = [row for row in reader if row[0] == "U12"]
print(tabulate(u12_rows, tablefmt="html"))

Parse Code from Text - Python

I am analyzing StackOverflow's dump file "Posts.Small.xml" using pySpark. I want to separate 'code block' from 'text' in a Row. A typical parsed row looks like:
['[u"<p>I want to use a track-bar to change a form\'s opacity.</p>
<p>This is my code:</p>
<pre><code>decimal trans = trackBar1.Value / 5000;
this.Opacity = trans;
</code></pre>
<p>When I try to build it, I get this error:</p>
<blockquote>
<p>Cannot implicitly convert type \'decimal\' to \'double\'.
</p>
</blockquote>
<p>I tried making <code>trans</code> a <code>double</code>, but then the control doesn\'t work.',
'", u\'This code has worked fine for me in VB.NET in the past.',
'\', u"</p>
When setting a form\'s opacity should I use a decimal or double?"]']
I've tried "itertools" and some python functions but couldn't get the result.
My initial code to extract the above row is:
postsXml = textFile.filter( lambda line: not line.startswith("<?xml version="))
postsRDD = postsXml.map(............)
tokensentRDD = postsRDD.map(lambda x:(x[0], nltk.sent_tokenize(x[3])))
new = tokensentRDD.map(lambda x: x[1]).take(1)
a = ''.join(map(str,new))
b = a.replace("&lt;", "<")
final = b.replace("&gt;", ">")
nltk.sent_tokenize(final)
Any ideas are appreciated!
You can extract the code contents by using XPath (the lxml library will help) and then extract the text content selecting everything else, for example:
import lxml.etree
data = '''<p>I want to use a track-bar to change a form's opacity.</p>
<p>This is my code:</p> <pre><code>decimal trans = trackBar1.Value / 5000; this.Opacity = trans;</code></pre>
<p>When I try to build it, I get this error:</p>
<p>Cannot implicitly convert type 'decimal' to 'double'.</p>
<p>I tried making <code>trans</code> a <code>double</code>.</p>'''
html = lxml.etree.HTML(data)
code_blocks = html.xpath('//code/text()')
text_blocks = html.xpath('//*[not(descendant-or-self::code)]/text()')
The easiest way will probably be to apply a regex to the text, matching the tags '<code>' and '</code>'. That would enable you to find the code blocks. You don't say what you would do with them afterwards, though. So ...
from itertools import zip_longest
sample_paras = [
"""<p>I want to use a track-bar to change a form\'s opacity.</p>
<p>This is my code:</p>
<pre><code>decimal trans = trackBar1.Value / 5000;
this.Opacity = trans;
</code></pre>
<p>When I try to build it, I get this error:</p>
<blockquote>
<p>Cannot implicitly convert type \'decimal\' to \'double\'. </p>
</blockquote>
<p>I tried making <code>trans</code> a <code>double</code>, but then the control doesn\'t work.""",
"""This code has worked fine for me in VB.NET in the past.""",
"""</p>
When setting a form\'s opacity should I use a decimal or double?""",
]
single_block = " ".join(sample_paras)
import re
separate_code = re.split(r"</?code>", single_block)
text_blocks, code_blocks = zip(*zip_longest(*[iter(separate_code)] * 2))
print("Text:\n")
for t in text_blocks:
    print("--")
    print(t)
print("\n\nCode:\n")
for t in code_blocks:
    print("--")
    print(t)
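If you would rather not depend on lxml or on a regex, the standard library's html.parser can do the same split. This is only a sketch under the assumption that the markup has already been unescaped into real tags; the class name CodeSplitter and the shortened sample are made up for illustration.

```python
from html.parser import HTMLParser

class CodeSplitter(HTMLParser):
    """Collect text inside <code> elements separately from all other text."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # number of <code> elements currently open
        self.code_blocks = []
        self.text_blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "code":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "code" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not data.strip():
            return  # skip whitespace-only runs between tags
        target = self.code_blocks if self.depth else self.text_blocks
        target.append(data.strip())

# Hypothetical, shortened version of the post body from the question.
sample = (
    "<p>This is my code:</p>"
    "<pre><code>decimal trans = trackBar1.Value / 5000;</code></pre>"
    "<p>I tried making <code>trans</code> a <code>double</code>.</p>"
)
splitter = CodeSplitter()
splitter.feed(sample)
splitter.close()
print(splitter.code_blocks)
print(splitter.text_blocks)
```

Note that inline <code> spans such as "trans" land in the code list too; if only block-level code is wanted, tracking <pre> instead of <code> would be the place to change it.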

How to create a dynamic table

I am creating a table using the following code, based on input provided in XML. It works perfectly fine, but I want to change the code to create the table dynamically, meaning that if I add more columns, the code should adjust automatically. Currently I have hardcoded that the table contains four columns. Please suggest what changes are needed to achieve this.
Input XML:-
<Fixes>
CR FA CL TITLE
409452 WLAN 656885 Age out RSSI values from buffer in Beacon miss scenario
12345,45678 BT 54567,34567 Test
379104 BT 656928 CR379104: BT doesn’t work that Riva neither sends HCI Evt for HID ACL data nor response to HCI_INQUIRY after entering into pseudo sniff subrating mode.
</Fixes>
Python code
crInfo = [ ]
CRlist = [ ]
CRsFixedStart=xmlfile.find('<Fixes>')
CRsFixedEnd=xmlfile.find('</Fixes>')
info=xmlfile[CRsFixedStart+12:CRsFixedEnd].strip()
for i in info.splitlines():
    index = i.split(None, 3)
    CRlist.append(index)
crInfo= CRlisttable(CRlist)
file.close()
def CRlisttable(CRlist,CRcount):
    #For logging
    global logString
    print "\nBuilding the CRtable\n"
    logString += "Building the build combo table\n"
    #print "CRlist"
    #print CRlist
    CRstring = "<table cellspacing=\"1\" cellpadding=\"1\" border=\"1\">\n"
    CRstring += "<tr>\n"
    CRstring += "<th bgcolor=\"#67B0F9\" scope=\"col\">" + CRlist[0][0] + "</th>\n"
    CRstring += "<th bgcolor=\"#67B0F9\" scope=\"col\">" + CRlist[0][1] + "</th>\n"
    CRstring += "<th bgcolor=\"#67B0F9\" scope=\"col\">" + CRlist[0][2] + "</th>\n"
    CRstring += "<th bgcolor=\"#67B0F9\" scope=\"col\">" + CRlist[0][3] + "</th>\n"
    CRstring += "</tr>\n"
    TEMPLATE = """
<tr>
<td><a href='http://prism/CR/{CR}'>{CR}</a></td>
<td>{FA}</td>
<td>{CL}</td>
<td>{Title}</td>
</tr>
"""
    for item in CRlist[1:]:
        CRstring += TEMPLATE.format(
            CR=item[0],
            FA=item[1],
            CL=item[2],
            Title=item[3],
        )
    CRstring += "\n</table>\n"
    #print CRstring
    return CRstring
Although I have some reservations about providing this since you seem unwilling to even attempt doing so yourself, here's an example showing one way it could be done, in the hope that you'll at least be inclined to make the effort to study it and possibly learn a little something, even though it's being handed to you...
with open('cr_fixes.xml') as file: # get some data to process
    xmlfile = file.read()
def CRlistToTable(CRlist):
    cols = CRlist[0] # first item is header-row of col names on the first line
    CRstrings = ['<table cellspacing="1" cellpadding="1" border="1">']
    # table header row
    CRstrings.append(' <tr>')
    for col in cols:
        CRstrings.append(' <th bgcolor="#67B0F9" scope="col">{}</th>'.format(col))
    CRstrings.append(' </tr>')
    # create a template for each table row
    TR_TEMPLATE = [' <tr>']
    # 1st col of each row is CR and handled separately since it corresponds to a link
    TR_TEMPLATE.append(
        ' <td>{{{}}}</td>'.format(*[cols[0]]*2))
    for col in cols[1:]:
        TR_TEMPLATE.append(' <td>{{}}</td>'.format(col))
    TR_TEMPLATE.append(' </tr>')
    TR_TEMPLATE = '\n'.join(TR_TEMPLATE)
    # then apply the template to all the non-header rows of CRlist
    for items in CRlist[1:]:
        CRstrings.append(TR_TEMPLATE.format(CR=items[0], *items[1:]))
    CRstrings.append("</table>")
    return '\n'.join(CRstrings) + '\n'
FIXES_START_TAG, FIXES_END_TAG = '<Fixes>, </Fixes>'.replace(',', ' ').split()
CRsFixesStart = xmlfile.find(FIXES_START_TAG) + len(FIXES_START_TAG)
CRsFixesEnd = xmlfile.find(FIXES_END_TAG)
info = xmlfile[CRsFixesStart:CRsFixesEnd].strip().splitlines()
# first line of extracted info is a blank-separated list of column names
num_cols = len(info[0].split())
# split non-blank lines of info into list of columnar data
# assuming last col is the variable-length title, comprising remainder of line
CRlist = [line.split(None, num_cols-1) for line in info if line]
# convert list into html table
crInfo = CRlistToTable(CRlist)
print crInfo
Output:
<table cellspacing="1" cellpadding="1" border="1">
<tr>
<th bgcolor="#67B0F9" scope="col">CR</th>
<th bgcolor="#67B0F9" scope="col">FA</th>
<th bgcolor="#67B0F9" scope="col">CL</th>
<th bgcolor="#67B0F9" scope="col">TITLE</th>
</tr>
<tr>
<td>409452</td>
<td>WLAN</td>
<td>656885</td>
<td>Age out RSSI values from buffer in Beacon miss scenario</td>
</tr>
<tr>
<td>12345,45678</td>
<td>BT</td>
<td>54567,34567</td>
<td>Test</td>
</tr>
<tr>
<td>379104</td>
<td>BT</td>
<td>656928</td>
<td>CR379104: BT doesnt work that Riva neither sends HCI Evt for HID ACL data nor
response to HCI_INQUIRY after entering into pseudo sniff subrating mode.</td>
</tr>
</table>
That doesn't look like an XML file - it looks like a tab delimited CSV document within a pair of tags.
I suggest looking into the csv module for parsing the input file, and then a templating engine like jinja2 for writing the HTML generation.
Essentially - read in the csv, check the length of the headers (gives you number of columns), and then pass that data into a template. Within the template, you'll have a loop over the csv structure to generate the HTML.
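A rough Python 3 sketch of that flow, under two assumptions that differ from the suggestion above: the columns are treated as whitespace-separated (as in the question's own split(None, 3) call) rather than parsed with the csv module, and plain string formatting stands in for jinja2 to keep the example dependency-free. The function name fixes_to_table and the shortened sample are invented for illustration.

```python
from html import escape

# Hypothetical excerpt in the question's layout: header line, then one row per line.
info = """CR FA CL TITLE
409452 WLAN 656885 Age out RSSI values from buffer in Beacon miss scenario
12345,45678 BT 54567,34567 Test"""

def fixes_to_table(text):
    """Build an HTML table whose column count follows the header line."""
    lines = [line for line in text.splitlines() if line.strip()]
    headers = lines[0].split()
    # The last column absorbs the rest of the line, so titles may contain spaces.
    rows = [line.split(None, len(headers) - 1) for line in lines[1:]]
    out = ["<table>", "<tr>"]
    out += ["<th>{}</th>".format(escape(h)) for h in headers]
    out.append("</tr>")
    for row in rows:
        out.append("<tr>")
        out += ["<td>{}</td>".format(escape(cell)) for cell in row]
        out.append("</tr>")
    out.append("</table>")
    return "\n".join(out)

table = fixes_to_table(info)
print(table)
```

Adding a column to the header line (before TITLE) changes the number of cells per row without any change to the function, which is the "dynamic" behaviour the question asks for.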
