I have been working on a program and ran into a problem.
This is my code:
dict=([{'geneA': [10, 20]}, {'geneB': [12, 45]}, {'geneC': [36, 50]}],
[{'geneD': [45, 90]}, {'geneT': [100, 200]}],
[{'geneF': [15, 25]}, {'geneX': [67, 200]}, {'GeneZ': [234, 384]}])
So I basically set dict equal to the data for chromosomes 1, 2, and 3.
Is there any way I can display the names of these three chromosomes in dict without having them as part of the dict index?
The main problem here is you're not declaring a dictionary but a tuple.
Try:
>>> dict = {
...     'chromosome1': [{'geneA': [10, 20]}, {'geneB': [12, 45]}, {'geneC': [36, 50]}],
...     'chromosome2': [{'geneD': [45, 90]}, {'geneT': [100, 200]}],
...     'chromosome3': [{'geneF': [15, 25]}, {'geneX': [67, 200]}, {'GeneZ': [234, 384]}]
... }
>>> print(dict['chromosome1'])
[{'geneA': [10, 20]}, {'geneB': [12, 45]}, {'geneC': [36, 50]}]
>>> print(dict['chromosome2'])
[{'geneD': [45, 90]}, {'geneT': [100, 200]}]
>>> print(dict['chromosome3'])
[{'geneF': [15, 25]}, {'geneX': [67, 200]}, {'GeneZ': [234, 384]}]
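To display only the names, you can iterate over the dictionary. Here is a minimal sketch, assuming the dictionary is stored under the hypothetical name `chromosomes` (chosen so the built-in `dict` is not shadowed); `chromosome_names` and `gene_names` are also illustrative names:

```python
# Hypothetical name `chromosomes`; same data as above.
chromosomes = {
    'chromosome1': [{'geneA': [10, 20]}, {'geneB': [12, 45]}, {'geneC': [36, 50]}],
    'chromosome2': [{'geneD': [45, 90]}, {'geneT': [100, 200]}],
    'chromosome3': [{'geneF': [15, 25]}, {'geneX': [67, 200]}, {'GeneZ': [234, 384]}],
}

# Chromosome names are simply the keys of the outer dictionary.
chromosome_names = list(chromosomes)

# Gene names are the keys of each single-entry dict in the list.
gene_names = {
    name: [gene_name for gene in genes for gene_name in gene]
    for name, genes in chromosomes.items()
}

for name in chromosome_names:
    print(name, gene_names[name])
```

Dictionaries keep insertion order in Python 3.7 and later, so the names come out in the order they were written.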
OOP approach
You can also try an object-oriented (OOP) approach. If you implement a couple of classes like the ones below:
class Gene:
    def __init__(self, name, data):
        self.name = name
        self.data = data

    def __getitem__(self, index):
        return self.data[index]

class Chromosome:
    def __init__(self, name, data):
        self.name = name
        self.data = data

    def __getitem__(self, index):
        return self.data[index]
You will be able to write code like:
chromosome1 = Chromosome("chromosome1", [
Gene('geneA', [10, 20]),
Gene('geneB', [12, 45]),
Gene('geneC', [36, 50])
])
and do things like:
print(chromosome1.name) # Print the chromosome name
>>> chromosome1
print(chromosome1[0].name) # The name of the first gene
>>> geneA
print(chromosome1[1].name) # The name of the second gene
>>> geneB
print(chromosome1[0][1]) # The second value of the first gene
20
You can also have a list of chromosomes (this is actually what you want):
lchrom = [chromosome1, ...]
print(lchrom[0][1]) # The second gene of the first chromosome in the list. (geneB)
print(lchrom[0][1][0]) # The first value of the second gene of the first chromosome in the list. (12)
I'm learning Python and had a small issue. I have this loop:
found = None
print ('Before', found)
for value in [ 41, 5, 77, 3, 21, 55, 6]:
    if value == 21:
        found = True
    else:
        found = False
    print (found, value)
print ('After', found)
The code runs, but the issue is print ('After', found): I want it to tell me that a True value was found in the loop. Is there a way to keep the code the way it is and resolve the issue?
You don't want to reset found to False once you've set it to True. Initialize it to False, then only set it to True if value == 21; don't do anything if value != 21.
found = False
print ('Before', found)
for value in [ 41, 5, 77, 3, 21, 55, 6]:
    if value == 21:
        found = True
    print (found, value)
print ('After', found)
Ignoring the print statement in the loop, you could just use any:
found = any(value == 21 for value in [ 41, 5, 77, 3, 21, 55, 6])
or even
found = 21 in [ 41, 5, 77, 3, 21, 55, 6]
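If you also need to know where the value was found, one option is to combine `enumerate` with `next` and a default. This is only a sketch; the names `values` and `position` are illustrative and not from the question:

```python
values = [41, 5, 77, 3, 21, 55, 6]

# Index of the first element equal to 21, or None if there is none.
position = next((i for i, value in enumerate(values) if value == 21), None)
found = position is not None

print(found, position)
```

Because `next` stops at the first match, the rest of the list is not scanned.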
I'm trying to print selected rows and a column from a spreadsheet. However, when I refer to the spreadsheet's dataframe attribute, it fails with an error stating that the name dataframe is not defined. Where have I gone wrong?
import pandas

class spreadsheet:
    def __init__(self, location, dataframe, column, rows):
        self.location = ('Readfrom.xlsx')
        self.dataframe = pandas.read_excel(location)
        self.column = 2
        self.rows = 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 27, 28, 29

a = dataframe.iloc[column,[rows]]
print(a)
You should instantiate an object from the Spreadsheet class and then access the attribute of the instance. You can learn more about Object-Oriented Programming in Python here.
I think that what you want to do in your code is something like the code below.
import pandas

class Spreadsheet:
    def __init__(self, location):
        self.location = location
        self.dataframe = pandas.read_excel(location)

sp = Spreadsheet(location="Readfrom.xlsx")
rows = [4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 27, 28, 29]
a = sp.dataframe.iloc[rows, 2]
print(a)
I think you have an indentation problem.
dataframe is a parameter of your spreadsheet constructor method, yet you try to access it from outside the class.
To access the dataframe variable, you have to either move the line a = dataframe.iloc[column,[rows]] inside your __init__ method or create a spreadsheet object first and access it via this object.
EDIT:
On second thought, I think you should check out the basics of how to use classes in Python.
You don't use the parameters of __init__, so why do you have them?
dataframe is only accessible through a spreadsheet object.
This code should fix your problem, but I recommend going through some basic tutorials to understand exactly how classes and objects work:
import pandas

class spreadsheet:
    def __init__(self):
        self.location = ('Readfrom.xlsx')
        self.dataframe = pandas.read_excel(self.location)
        self.column = 2
        self.rows = 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 27, 28, 29

s = spreadsheet()
# iloc takes the row selection first, then the column; rows must be a flat list
a = s.dataframe.iloc[list(s.rows), s.column]
print(a)
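The underlying rule does not depend on pandas. The sketch below is a stripped-down, hypothetical version of the class (no file is read) that shows which names survive after `__init__` returns and which do not:

```python
class Spreadsheet:
    def __init__(self, location):
        # Names assigned on `self` become attributes of the instance.
        self.location = location
        self.rows = [4, 6, 7]
        # A plain local variable disappears when __init__ returns.
        column = 2

sp = Spreadsheet("Readfrom.xlsx")
print(sp.location)            # attribute, reachable through the instance
print(hasattr(sp, "column"))  # `column` was never stored on self

try:
    location  # bare name: nothing called `location` exists at module level
except NameError as exc:
    error_message = str(exc)
    print(error_message)
```

This is the same kind of NameError the question ran into with the bare name dataframe.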
So I need to store input from the system and trigger a code block when a command given via input matches a condition. The commands are produced randomly by the system and are not the same every time the code is executed. What I do below: I store the input in a list until the input becomes an empty line, which marks the end of the commands; the problem statement specifically says the commands end with a blank line after the last command. I then read the commands and their values from that list until there is no command left to perform. I know this is bad practice. Since I am new to this language, I need some advice on how to change my code. Thanks in advance. By the way, I can't change the conditions in the if statements, since the commands given via input are not always the same but look like this, among others:
append_it 15
insert_it 0 25
remove_it 30
The code works fine; I need advice on making it follow good practice so I can improve my Python.
i = 0
command_list = []
while True:
    command = input('')
    if command == '':
        break
    command_list.append(command)
    i += 1

b = 0
arr = []
while i != b:
    command1 = command_list[b]
    b += 1
    # each of these command names is 9 characters long, followed by a space
    if command1[0:9] == "append_it":
        value = int(command1[10:])
        arr.append(value)
    elif command1[0:9] == "insert_it":
        index, value = (int(p) for p in command1[10:].split())
        arr.insert(index, value)
    elif command1[0:9] == "remove_it":
        value = int(command1[10:])
        if value in arr:
            arr.remove(value)
    elif command1[0:] == "print_it":
        print(arr)
    elif command1[0:] == "reverse_it":
        arr.reverse()
    elif command1[0:] == "sort_it":
        arr.sort()
    elif command1[0:] == "pop_it":
        arr.pop()
You can improve this by defining the actions in a dictionary, storing each input line as a split list, and calling the appropriate function for each input:
def appendit(a, *prms):
    v = int(prms[0])
    a.append(v)

def insertit(a, *prms):
    i = int(prms[0])
    v = int(prms[1])
    a.insert(i,v)

def removeit(a, *prms):
    v = int(prms[0])
    if v in a:  # list.remove raises ValueError if v is absent
        a.remove(v)

def reverseit(a): a.reverse()
def sortit(a): a.sort()
def popit(a): a.pop()

# define what command to run for what input
cmds = {"append_it" : appendit,
        "insert_it" : insertit,
        "remove_it" : removeit,
        "print_it" : print, # does not need any special function
        "reverse_it": reverseit,
        "sort_it" : sortit,
        "pop_it" : popit}

command_list = []
while True:
    command = input('')
    if command == '':
        break
    c = command.split() # split the command already
    # only allow commands you know into your list - they still might have the
    # wrong amount of params given - you should check that in the functions
    if c and c[0] in cmds:
        command_list.append(c)

arr = []
for (command, *prms) in command_list:
    # call the correct function with/without params
    if prms:
        cmds[command](arr, *prms)
    else:
        cmds[command](arr)
Output:
# inputs from user:
append_it 42
append_it 32
append_it 52
append_it 62
append_it 82
append_it 12
append_it 22
append_it 33
append_it 12
print_it # 1st printout
sort_it
print_it # 2nd printout sorted
reverse_it
print_it # 3rd printout reversed sorted
pop_it
print_it # one elem popped
insert_it 4 99
remove_it 42
print_it # 99 inserted and 42 removed
# print_it - outputs
[42, 32, 52, 62, 82, 12, 22, 33, 12]
[12, 12, 22, 32, 33, 42, 52, 62, 82]
[82, 62, 52, 42, 33, 32, 22, 12, 12]
[82, 62, 52, 42, 33, 32, 22, 12]
[82, 62, 52, 99, 33, 32, 22, 12]
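The comment in the code above notes that the number of parameters is not checked. One possible way to add that check is to store the expected argument count next to each handler. This is a rough sketch rather than part of the original answer; the `COMMANDS` table and the `run` function are hypothetical names, and only three of the commands are shown:

```python
# Hypothetical table: command name -> (handler, number of integer arguments).
COMMANDS = {
    "append_it": (lambda arr, v: arr.append(v), 1),
    "insert_it": (lambda arr, i, v: arr.insert(i, v), 2),
    "reverse_it": (lambda arr: arr.reverse(), 0),
}

def run(lines):
    """Apply each command line to a fresh list and return that list."""
    arr = []
    for line in lines:
        name, *params = line.split()
        if name not in COMMANDS:
            raise ValueError("unknown command: " + name)
        handler, arity = COMMANDS[name]
        if len(params) != arity:
            raise ValueError("%s expects %d argument(s)" % (name, arity))
        handler(arr, *(int(p) for p in params))
    return arr

result = run(["append_it 15", "insert_it 0 25", "reverse_it"])
print(result)
```

Keeping the argument count in the table means a malformed line fails with a clear message instead of an IndexError inside a handler.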
import re
import os
import sys

class Marks:
    def __init__(self):
        self.marks = []
        self.marks_file = '/root/projectpython/mark.txt'

    def loadAll(self):
        file = open(self.marks_file, 'r')
        for line in file.readlines():
            name,math,phy,chem = line.strip().split()
            name=name
            math=int(math)
            phy=int(phy)
            chem=int(chem)
            self.marks=[name,math,phy,chem]
            print(self.marks)
        file.close()

    def percent(self):
        dash = '-' * 40
        self.loadAll()
        for n in self.marks:
            print(n)

Book_1 = Marks()
Book_1.percent()
output:-
['gk', 50, 40, 30]
['rahul', 34, 54, 30]
['rohit', 87, 45, 9]
rohit
87
45
9
But I want to print all the values in tabular format; it is showing only the last record.
Is a list the right way to store each student's name and marks?
The problem here is with this line:
self.marks=[name,math,phy,chem]
It re-initializes the list each time a record is read.
Instead use:
self.marks.append([name,math,phy,chem])
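To see the effect in isolation, here is a small sketch that reads the three records from the question's output out of an in-memory stand-in for mark.txt (the `io.StringIO` object and the variable name `sample` are assumptions made for the example):

```python
import io

# Stand-in for mark.txt, using the three records shown in the output above.
sample = io.StringIO("gk 50 40 30\nrahul 34 54 30\nrohit 87 45 9\n")

marks = []
for line in sample:
    name, math, phy, chem = line.split()
    # append adds one record per line; plain assignment would keep only the last.
    marks.append([name, int(math), int(phy), int(chem)])

print(marks)
```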
You keep re-initializing the list inside the for loop,
so only the values from the last line are kept.
I think you can remove that assignment and use append instead.
import re
import os
import sys

class Marks:
    def __init__(self):
        self.marks = []
        self.marks_file = '/root/projectpython/mark.txt'

    def loadAll(self):
        file = open(self.marks_file, 'r')
        for line in file.readlines():
            name,math,phy,chem = line.strip().split()
            name=name
            math=int(math)
            phy=int(phy)
            chem=int(chem)
            self.marks.append(name)
            self.marks.append(math)
            self.marks.append(phy)
            self.marks.append(chem)
            # self.marks=[name,math,phy,chem]
            print(self.marks)
        file.close()

    def percent(self):
        dash = '-' * 40
        self.loadAll()
        for n in self.marks:
            print(n)

Book_1 = Marks()
Book_1.percent()
Change self.marks=[name,math,phy,chem] to self.marks.append([name,math,phy,chem]).
Then the easiest solution is to transpose the self.marks list and print the result.
Suppose your marks list is [['gk', 50, 40, 30],['rahul', 34, 54, 30],['rohit', 87, 45, 9]]; then simply transpose it.
print(marks)
transposed=list(zip(*marks))
print(transposed)
for x in transposed:
    print(x)
output :
[['gk', 50, 40, 30], ['rahul', 34, 54, 30], ['rohit', 87, 45, 9]] #marks list
[('gk', 'rahul', 'rohit'), (50, 34, 87), (40, 54, 45), (30, 30, 9)] #transposed list
('gk', 'rahul', 'rohit') # output the way you want
(50, 34, 87)
(40, 54, 45)
(30, 30, 9)
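If the goal is one row per student rather than one row per subject, a formatted print may be closer to a table. This is a minimal sketch, assuming the nested marks list shown above; the column labels and widths are illustrative choices:

```python
marks = [['gk', 50, 40, 30], ['rahul', 34, 54, 30], ['rohit', 87, 45, 9]]

# Left-align the name in 10 characters, right-align each mark in 6.
header = "{:<10}{:>6}{:>6}{:>6}".format("name", "math", "phy", "chem")
table_lines = [header, "-" * len(header)]
for name, math, phy, chem in marks:
    table_lines.append("{:<10}{:>6}{:>6}{:>6}".format(name, math, phy, chem))

print("\n".join(table_lines))
```

The fixed widths keep the columns aligned as long as names stay under ten characters.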
It's working now.
My earlier mistake was on this line only; the fix was self.marks.append([name,math,phy,chem])
[['gk', 50, 40, 30], ['rahul', 34, 54, 30], ['rohit', 87, 45, 9]]
I have loaded HTML into pyqt and would like to create a list of all the content on the page.
I then need to be able to get the position of the text, using .geometry()
I would like a list of objects, where the following would be possible:
for i in list_of_content_in_html:
    print i.toPlainText(), i.geometry() #prints the text, and the position.
In case I am unclear, by "contents" I mean in the HTML below, contents is
'c', 'r1 c1', 'r1, c2', 'row2 c2', 'more contents' - the text the web user sees in the browser, basically.
c
<table border="1">
<tr>
<td>r1 c1</td>
<td>r1 c2</td>
</tr>
<tr>
<td></td>
<td>row2 c2</td>
</tr>
</table>
more contents
This doesn't seem to be possible using QtWebKit with pages like this one, which nest objects but don't wrap the text outside the table in <p>...</p>. As a result, c and more contents don't go into separate QWebElements; they are only found in the BODY-level block. As a solution, one could run the page through a parser first. Simply traversing the children of the currentFrame documentElement yields the following elements:
# position in element tree, bounding box, tag, text:
(0, 0) [0, 0, 75, 165] HTML - u'c\nr1 c1\tr1 c2\nrow2 c2\nmore contents'
(1, 1) [8, 8, 67, 157] BODY - u'c\nr1 c1\tr1 c2\nrow2 c2\nmore contents'
(2, 0) [8, 27, 75, 119] TABLE - u'r1 c1\tr1 c2\nrow2 c2'
(3, 0) [9, 28, 74, 118] TBODY - u'r1 c1\tr1 c2\nrow2 c2'
(4, 0) [9, 30, 74, 72] TR - u'r1 c1\tr1 c2'
(5, 0) [11, 30, 32, 72] TD - u'r1 c1'
(5, 1) [34, 30, 72, 72] TD - u'r1 c2'
(4, 1) [9, 74, 74, 116] TR - u'row2 c2'
(5, 1) [34, 74, 72, 116] TD - u'row2 c2'
Code for this:
import sys
from PySide.QtCore import *
from PySide.QtGui import *
from PySide.QtWebKit import *

class WebPage(QObject):
    finished = Signal()

    def __init__(self, data, parent=None):
        super(WebPage, self).__init__(parent)
        self.output = []
        self.data = data
        self.page = QWebPage()
        self.page.loadFinished.connect(self.process)

    def start(self):
        self.page.mainFrame().setHtml(self.data)

    #Slot(bool)
    def process(self, something=False):
        self.page.setViewportSize(self.page.mainFrame().contentsSize())
        frame = self.page.currentFrame()
        elem = frame.documentElement()
        self.gather_info(elem)
        self.finished.emit()

    def gather_info(self, elem, i=0):
        if i > 200: return
        cnt = 0
        while cnt < 100:
            s = elem.toPlainText()
            rect = elem.geometry()
            name = elem.tagName()
            dim = [rect.x(), rect.y(),
                   rect.x() + rect.width(), rect.y() + rect.height()]
            if s: self.output.append(dict(pos=(i, cnt), dim=dim, tag=name, text=s))
            child = elem.firstChild()
            if not child.isNull():
                self.gather_info(child, i+1)
            elem = elem.nextSibling()
            if elem.isNull():
                break
            cnt += 1

webpage = None

def print_strings():
    for s in webpage.output:
        print s['pos'], s['dim'], s['tag'], '-', repr(s['text'])

if __name__ == '__main__':
    app = QApplication(sys.argv)
    data = open(sys.argv[1]).read()
    webpage = WebPage(data)
    webpage.finished.connect(print_strings)
    webpage.start()
A different approach
The desired course of action depends on what you want to achieve. You can get all the strings from the QWebPage using webpage.currentFrame().documentElement().toPlainText(), but that just shows the whole page as a string with no positioning information related to all the tags. Browsing the QWebElement tree gives you the desired information but it has the drawbacks, which I mentioned above.
If you really want to know the position of all text, the only accurate way to do this (other than rendering the page and using OCR) is to break the text into characters and save their individual bounding boxes. Here's how I did it:
First I parsed the page with BeautifulSoup4 and enclosed every non-space text character X in a <span class="Nd92KSx3u2">X</span>. Then I ran a PyQt script (actually a PySide script) which loads the altered page and printed out the characters with their bounding boxes after I looked them up using findAllElements('span[class="Nd92KSx3u2"]').
parser.py:
import sys, cgi, re
from bs4 import BeautifulSoup, element

magical_class = "Nd92KSx3u2"
restricted_tags="title script object embed".split()
# replace_with() below inserts the markup as plain text, so BeautifulSoup
# escapes it on output; match the escaped form here
re_my_span = re.compile(r'&lt;span class="%s"&gt;(.+?)&lt;/span&gt;' % magical_class)

def no_nl(s): return str(s).replace("\r", "").replace("\n", " ")

if len(sys.argv) != 3:
    print "Usage: %s <input_html_file> <output_html_file>" % sys.argv[0]
    sys.exit(1)

def process(elem):
    for x in elem.children:
        if isinstance(x, element.Comment): continue
        if isinstance(x, element.Tag):
            if x.name in restricted_tags:
                continue
        if isinstance(x, element.NavigableString):
            if not len(no_nl(x.string).strip()):
                continue # it's just empty space
            print '[', no_nl(x.string).strip(), ']', # debug output of found strings
            s = ""
            for c in x.string:
                if c in (' ', '\r', '\n', '\t'): s += c
                else: s += '<span class="%s">%s</span>' % (magical_class, c)
            x.replace_with(s)
            continue
        process(x)

soup = BeautifulSoup(open(sys.argv[1]))
process(soup)
# turn the escaped spans back into real tags
output = re_my_span.sub(r'<span class="%s">\1</span>' % magical_class, str(soup))
with open(sys.argv[2], 'w') as f:
    f.write(output)
charpos.py:
import sys
from PySide.QtCore import *
from PySide.QtGui import *
from PySide.QtWebKit import *

magical_class = "Nd92KSx3u2"

class WebPage(QObject):
    def __init__(self, data, parent=None):
        super(WebPage, self).__init__(parent)
        self.output = []
        self.data = data
        self.page = QWebPage()
        self.page.loadFinished.connect(self.process)

    def start(self):
        self.page.mainFrame().setHtml(self.data)

    #Slot(bool)
    def process(self, something=False):
        self.page.setViewportSize(self.page.mainFrame().contentsSize())
        frame = self.page.currentFrame()
        elements = frame.findAllElements('span[class="%s"]' % magical_class)
        for e in elements:
            s = e.toPlainText()
            rect = e.geometry()
            dim = [rect.x(), rect.y(),
                   rect.x() + rect.width(), rect.y() + rect.height()]
            if s and rect.width() > 0 and rect.height() > 0: print dim, s

if __name__ == '__main__':
    app = QApplication(sys.argv)
    data = open(sys.argv[1]).read()
    webpage = WebPage(data)
    webpage.start()
input.html (slightly altered to show more problems with simple string dumping):
a<span>b<span>c</span></span>
<table border="1">
<tr><td>r1 <font>c1</font> </td><td>r1 c2</td></tr>
<tr><td></td><td>row2 & c2</td></tr>
</table>
more <b>contents</b>
and the test run:
$ python parser.py input.html temp.html
[ a ] [ b ] [ c ] [ r1 ] [ c1 ] [ r1 c2 ] [ row2 & c2 ] [ more ] [ contents ]
$ charpos.py temp.html
[8, 8, 17, 26] a
[17, 8, 26, 26] b
[26, 8, 34, 26] c
[13, 48, 18, 66] r
[18, 48, 27, 66] 1
[13, 67, 21, 85] c
[21, 67, 30, 85] 1
[36, 48, 41, 66] r
[41, 48, 50, 66] 1
[36, 67, 44, 85] c
[44, 67, 53, 85] 2
[36, 92, 41, 110] r
[41, 92, 50, 110] o
[50, 92, 61, 110] w
[61, 92, 70, 110] 2
[36, 111, 47, 129] &
[51, 111, 59, 129] c
[59, 111, 68, 129] 2
[8, 135, 21, 153] m
[21, 135, 30, 153] o
[30, 135, 35, 153] r
[35, 135, 44, 153] e
[8, 154, 17, 173] c
[17, 154, 27, 173] o
[27, 154, 37, 173] n
[37, 154, 42, 173] t
[42, 154, 51, 173] e
[51, 154, 61, 173] n
[61, 154, 66, 173] t
[66, 154, 75, 173] s
Looking at the bounding boxes, it is (in this simple case without changes in font size and things like subscripts) quite easy to glue them back into words if you wish.
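As a rough illustration of that gluing step, the sketch below joins characters that share a top edge and nearly touch horizontally. It is not part of the original scripts: the function name `glue_words` and the `max_gap` threshold are assumptions, and it relies on the characters arriving in reading order with a uniform font size, as in the test run above.

```python
def glue_words(boxes, max_gap=1):
    """Join characters into words when they share a top edge and nearly touch.

    boxes: list of ([x1, y1, x2, y2], char) pairs in reading order.
    max_gap: guessed pixel threshold, not a value from the original answer.
    """
    words = []
    prev = None
    for (x1, y1, x2, y2), char in boxes:
        same_line = prev is not None and prev[1] == y1
        # prev[2] is the right edge of the previous character
        adjacent = same_line and x1 - prev[2] <= max_gap
        if adjacent:
            words[-1] += char
        else:
            words.append(char)
        prev = (x1, y1, x2, y2)
    return words

# Boxes copied from the "row2 & c2" cell in the test run above.
boxes = [
    ([36, 92, 41, 110], 'r'), ([41, 92, 50, 110], 'o'),
    ([50, 92, 61, 110], 'w'), ([61, 92, 70, 110], '2'),
    ([36, 111, 47, 129], '&'),
    ([51, 111, 59, 129], 'c'), ([59, 111, 68, 129], '2'),
]
words = glue_words(boxes)
print(words)
```

Mixed font sizes, subscripts, or right-to-left text would need a more careful comparison than an exact match on the top edge.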
I worked it out.
for elem in QWebView().page().currentFrame().documentElement().findAll('*'):
    print unicode(elem.toPlainText()), unicode(elem.geometry().getCoords()), '\n'
It matches anything, and then iterates over what is found - thereby iterating over the DOM tree.