My goal is to print lines from a text file that belong together. Some lines, however, are split apart when they should be joined. I resolved the first problem, where the denominator was on the following line. In the else branch, however, every printed result seems to have the same value/index.
import fitz # this is pymupdf
with fitz.open("math-problems.pdf") as doc: #below converts pdf to txt
    text = ""
    for page in doc:
        text += page.getText()
file_w = open("output.txt", "w") #save as txt file
file_w.write(text)
file_w.close()
file_r = open("output.txt", "r") #read txt file
word = 'f(x) = '
#--------------------------
list1 = file_r.readlines() # read each line and put into list
list2 = [k for k in list1 if word in k] # look for all elements with "f(x)" and put all in new list
list1_N = list1
list2_N = list2
list1 = [e[3:] for e in list1] #remove first three characters (the first three characters are always "1) " or "A) ")
char = str('\n')
for char in list2:
    index = list1.index(char)
    def digitcheck(s):
        isdigit = str.isdigit
        return any(map(isdigit,s))
    xx = digitcheck(list1[index])
    if xx:
        print(list1[index] + " / " + list1_N[index+1])
    else:
        print(list1[index] + list1[index+1]) # PROBLEM IS HERE, HOW COME EACH VALUE IS SAME HERE?
Output from terminal:
f(x) = x3 + x2 - 20x
/ x2 - 3x - 18
f(x) =
2 + 5x
f(x) =
2 + 5x
f(x) =
2 + 5x
f(x) =
2 + 5x
f(x) = x2 + 3x - 10
/ x2 - 5x - 14
f(x) = x2 + 2x - 8
/ x2 - 3x - 10
f(x) = x - 1
/ x2 + 8
f(x) = 3x3 - 2x - 6
/ 8x3 - 7x + 4
f(x) =
2 + 5x
f(x) = x3 - 6x2 + 4x - 1
/ x2 + 8x
Process finished with exit code 0
SOLVED
#copperfield was correct: list1 contained repeated values, so list1.index() kept returning the same (first) index. I solved this using a solution by #Shonu93. Essentially, it collects the indices of all duplicate values into one list, elem_pos, and then prints the element of list1 at each of those indices.
if empty in list1:
    counter = 0
    elem_pos = []
    for i in list1:
        if i == empty:
            elem_pos.append(counter)
        counter = counter + 1
    xy = elem_pos
    for i in xy:
        print(list1[i] + list1_N[i+1])
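The index-collecting loop above can also be written with enumerate, which avoids the manual counter. This is only a minimal sketch: the helper name all_indices and the sample lines are made up for illustration and are not taken from the original PDF.

```python
def all_indices(items, target):
    """Return every index at which target occurs in items."""
    return [i for i, item in enumerate(items) if item == target]

# Hypothetical sample data: the same "f(x) = " line appears twice.
lines = ["f(x) = \n", "2 + 5x\n", "f(x) = \n", "7 - x\n"]

# list.index() only ever reports the first match ...
first_only = lines.index("f(x) = \n")

# ... whereas all_indices() reports each occurrence.
positions = all_indices(lines, "f(x) = \n")

for i in positions:
    # Join each matching line with the line that follows it.
    print(lines[i].strip() + " " + lines[i + 1].strip())
```

Because list.index() stops at the first match, looping over the matching values and calling index() on each one repeats the same position, which is what produced the repeated "2 + 5x" lines.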
I have a list of strings, each of which must be no longer than X characters. Each string can contain many sentences (separated by punctuation such as dots). I need to split the strings that are longer than X characters using this logic:
I have to divide them into the minimum number of parts (starting from 2) so that every chunk is shorter than X and the chunks are as similar in length as possible (ideally identical), while respecting the punctuation. For example, if I have Hello. How are you?, I can't divide it into Hello. Ho and w are you?; it should become Hello. and How are you?, because that is the most even way to divide it into two parts without losing the sense of the sentences.
max = 10
strings = ["Hello. How are you? I'm fine", "other string containg dots", "another string containg dots"]
for string in strings:
    if len(string) > max:
        pass  # algorithm to chunk it
In this case, I will have to divide the first string Hello. How are you? I'm fine into 3 parts because with 2 parts, I'll have one of the 2 chunks longer than 10 characters (max).
Is there a smart existing solution? Or does anyone know how to do that?
An example function for chunking string (within the character minimum and maximum lengths) by punctuation (e.g. ".", ",", ";", "?"); in other words, prioritizing punctuation over character length:
import numpy as np
def chunkingStringFunction(strings, charactersDefiningChunking = [".", ",", ";", "?"], numberOfMaximumCharactersPerChunk = None, numberOfMinimumCharactersPerChunk = None, **kwargs):
    if numberOfMaximumCharactersPerChunk is None:
        numberOfMaximumCharactersPerChunk = 100
    if numberOfMinimumCharactersPerChunk is None:
        numberOfMinimumCharactersPerChunk = 2
    storingChunksOfString = []
    for string in strings:
        chunkingStartingAtThisIndex = 0
        indexingCharactersInStrings = 0
        while indexingCharactersInStrings < len(string) - 1:
            indexingCharactersInStrings += 1
            currentChunk = string[chunkingStartingAtThisIndex:indexingCharactersInStrings + 1]
            if len(currentChunk) >= numberOfMinimumCharactersPerChunk and len(currentChunk) <= numberOfMaximumCharactersPerChunk:
                indexesForStops = []
                for indexingCharacterDefiningChunking in range(len(charactersDefiningChunking)):
                    indexesForStops.append(currentChunk.find(charactersDefiningChunking[indexingCharacterDefiningChunking]) + chunkingStartingAtThisIndex)
                indexesForStops = np.max(indexesForStops, axis = None)
                addChunk = string[chunkingStartingAtThisIndex:indexesForStops + 1]
                if len(addChunk) > 1 and addChunk != " ":
                    storingChunksOfString.append(addChunk)
                    chunkingStartingAtThisIndex = indexesForStops + 1
                    indexingCharactersInStrings = chunkingStartingAtThisIndex
    return storingChunksOfString
Alternatively, to prioritize character length, that is, to start from a target (average) chunk length and then look for the nearest chunking character around that position:
import numpy as np
def chunkingStringFunction(strings, charactersDefiningChunking = [".", ",", ";", "?"], averageNumberOfCharactersPerChunk = None, **kwargs):
    if averageNumberOfCharactersPerChunk is None:
        averageNumberOfCharactersPerChunk = 10
    storingChunksOfString = []
    for string in strings:
        lastIndexChunked = 0
        for indexingCharactersInString in range(1, len(string), 1):
            chunkStopsAtADefinedCharacter = False
            if indexingCharactersInString - lastIndexChunked == averageNumberOfCharactersPerChunk:
                indexingNumberOfCharactersAwayFromAverageChunk = 1
                while chunkStopsAtADefinedCharacter == False:
                    indexingNumberOfCharactersAwayFromAverageChunk += 1
                    for thisCharacter in charactersDefiningChunking:
                        findingAChunkCharacter = string[indexingCharactersInString - indexingNumberOfCharactersAwayFromAverageChunk:indexingCharactersInString + (indexingNumberOfCharactersAwayFromAverageChunk + 1)].find(thisCharacter)
                        if findingAChunkCharacter > -1 and len(string[lastIndexChunked:indexingCharactersInString - indexingNumberOfCharactersAwayFromAverageChunk + findingAChunkCharacter + 1]) != 0:
                            storingChunksOfString.append(string[lastIndexChunked:indexingCharactersInString - indexingNumberOfCharactersAwayFromAverageChunk + findingAChunkCharacter + 1])
                            lastIndexChunked = indexingCharactersInString - indexingNumberOfCharactersAwayFromAverageChunk + findingAChunkCharacter + 1
                            chunkStopsAtADefinedCharacter = True
            elif indexingCharactersInString == len(string) - 1 and lastIndexChunked != len(string) - 1 and len(string[lastIndexChunked:indexingCharactersInString + 1]) != 0:
                storingChunksOfString.append(string[lastIndexChunked:indexingCharactersInString + 1])
    return storingChunksOfString
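For comparison, here is a shorter sketch of punctuation-first chunking that uses only the standard library. It is not the balanced split the question asks for: it splits after sentence punctuation and then greedily packs whole sentences into chunks of at most max_len characters. The function names split_sentences and pack_sentences, the chosen punctuation set, and the decision to keep an over-long sentence whole are all assumptions made for this illustration.

```python
import re

def split_sentences(text):
    """Split after ., ?, ! or ; when followed by whitespace; punctuation stays with its sentence."""
    return [s for s in re.split(r"(?<=[.?!;])\s+", text.strip()) if s]

def pack_sentences(text, max_len):
    """Greedily join consecutive sentences while the joined chunk fits in max_len.

    Assumption: a single sentence longer than max_len is kept whole
    rather than being cut in the middle of a word.
    """
    chunks = []
    current = ""
    for sentence in split_sentences(text):
        candidate = sentence if not current else current + " " + sentence
        if not current or len(candidate) <= max_len:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks

print(pack_sentences("Hello. How are you? I'm fine", 12))
print(pack_sentences("Hello. How are you? I'm fine", 20))
```

Greedy packing keeps the number of chunks small but does not try to equalize their lengths; balancing them would need an extra step that compares alternative split points.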
I've looked at quite a few posts, but none seem to help.
I want to calculate Term Frequency and Inverse Document Frequency (TF-IDF), a Bag of Words technique used in Deep Learning. The purpose of this code is just to calculate the formula. I do not implement an ANN here.
Below is a minimal code example. The problem occurs after the for loop.
import math
docs = 1000
words_per_doc = 100 # length of doc
#word_freq = 10
#doc_freq = 100
dp = 4
print('Term Frequency Inverse Document Frequency')
# term, word_freq, doc_freq
words = [['the', 10, 100], ['python', 10, 900]]
tfidf_ = []
for idx, val in enumerate(words):
    print(words[idx][0] + ':')
    word_freq = words[idx][1]
    doc_freq = words[idx][2]
    tf = round(word_freq/words_per_doc, dp)
    idf = round(math.log10(docs/doc_freq), dp)
    tfidf = round((tf*idf), dp)
    print(str(tf) + ' * ' + str(idf) + ' = ' + str(tfidf))
    tfidf_.append(tfidf)
    print()
max_val = max(tfidf)
max_idx = tfidf.index(max_val)
#max_idx = tfidf.index(max(tfidf))
lowest_idx = 1 - max_idx
print('Therefore, \'' + words[max_idx][0] + '\' semantically is more important than \'' + words[lowest_idx][0] + '\'.')
#print('log(N/|{d∈D:w∈W}|)')
Error:
line 25, in <module>
max_val = max(tfidf)
TypeError: 'float' object is not iterable
You are passing tfidf to max() instead of tfidf_.
tfidf is a float (the value from the last loop iteration) and tfidf_ is your list.
So the code should be
max_val = max(tfidf_)
max_idx = tfidf_.index(max_val)
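Putting the fix together, the calculation can be sketched as a small function that returns one score per word, so that max() always receives a list. The function name tf_idf and the variable names scores and best are illustrative choices; the numbers are the ones from the question.

```python
import math

def tf_idf(word_freq, words_per_doc, doc_freq, docs, dp=4):
    """Rounded TF-IDF, using the same formula and rounding as the question."""
    tf = round(word_freq / words_per_doc, dp)
    idf = round(math.log10(docs / doc_freq), dp)
    return round(tf * idf, dp)

# term, word_freq, doc_freq (values taken from the question)
words = [("the", 10, 100), ("python", 10, 900)]

# One score per word, kept in a list so max() has something to iterate over.
scores = [tf_idf(word_freq, 100, doc_freq, 1000) for _, word_freq, doc_freq in words]

# Index of the highest score.
best = scores.index(max(scores))
print(words[best][0], scores[best])
```

Keeping the per-word result (a float) and the collection of results (a list) under clearly different names makes this kind of mix-up harder to repeat.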
I have the following variables:
InputNumber (user defined from 1 to 6),
nameofstate (list of 2 names, let's call them X and Y)
list[state][protein][counts][x]
list[state][protein][counts][y], where x and y are lists that will equal the length of InputNumber
So where InputNumber = 3
list[state][protein][counts][x] = [a,b,c]
list[state][protein][counts][y] = [r,s,t]
Where InputNumber = 4
list[state][protein][counts][x] = [a,b,c,d]
list[state][protein][counts][y] = [r,s,t,u]
I want to create new columns for these counts, but I can't figure out how to make dynamic column headers:
For InputNumber = 3:
Name X1 X2 X3 Y1 Y2 Y3
Name a  b  c  r  s  t
For InputNumber = 4:
Name X1 X2 X3 X4 Y1 Y2 Y3 Y4
Name a  b  c  d  r  s  t  u
Here is the code I have for my headers:
listheaders = [str(nameofstate[0]) + str(x + 1) for x in range(0, InputNumber), str(nameofstate[1]) + str(x + 1) for x in range(0, InputNumber)]
with open("PATH and NAME" + ".csv", 'w') as myfile:
    wr = csv.writer(myfile, delimiter=',', lineterminator='\n')
    wr.writerow(["Name", (i for i in listheaders[0]), (e for e in listheaders[1]])
    for name in anotherlist:
        perform the same iterative code
My problem is with the generator expression - I can't wrap my head around how to tell it to unpack the entries in listheaders.
Not sure I follow everything that's happening in your question, but it appears that you're trying to do too much in listheaders. If I understand correctly, you want a helper function that constructs each list of headers:
def _get_header_list(name_of_state, input_number):
    return ['{}{}'.format(name_of_state, str(x)) for x in range(1, input_number + 1)]
first_list = _get_header_list(nameofstate[0], InputNumber)
second_list = _get_header_list(nameofstate[1], InputNumber)
then later...
wr.writerow(['Name'] + first_list + second_list)
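As a self-contained sketch of the same approach, the snippet below builds the header row for any number of states and writes it with csv.writer. It writes to an in-memory io.StringIO buffer instead of a real file so that it runs anywhere; the function name header_row, the state names "X" and "Y", and the sample data row are assumptions for illustration.

```python
import csv
import io

def header_row(state_names, input_number):
    """Build ['Name', 'X1', ..., 'Xn', 'Y1', ..., 'Yn'] for the given states."""
    headers = ["Name"]
    for state in state_names:
        headers += ["{}{}".format(state, n) for n in range(1, input_number + 1)]
    return headers

# An in-memory buffer stands in for the real output file.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=",", lineterminator="\n")
writer.writerow(header_row(["X", "Y"], 3))
writer.writerow(["protein1", "a", "b", "c", "r", "s", "t"])  # hypothetical data row
print(buffer.getvalue())
```

writerow expects one flat sequence of cell values, which is why the lists are concatenated rather than passed as nested generator expressions.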
I'd appreciate some help debugging this code:
testing = """There is something unique about this line
in that it can span across several lines, which is unique and
useful in python."""
listofthings = []
i = 0
while i < len(testing):
    if testing[i] == " ":
        listofthings.append(i + 1)
    i = i + 1
listofthings.insert(0, 0)
listofthings.append(len(testing))
print listofthings
word_list = []
i = 0
while i < len(listofthings):
    l = i + 1
    x = listofthings[i]
    y = listofthings[l]
    word = testing[x:y]
    word_list.append(word)
    i = l
print word_list
I am not sure why I am getting the index out of range error. I understand what the error means obviously, but am not sure what I am doing wrong. Weirdly enough, this only happens when I run the above code. It doesn't give me any errors when I run this:
word = testing[x:y]
print word
I am fairly new to Python (going on three days), so I am sure it is a stupid, overlooked syntactical error...
l = i + 1
x = listofthings[i]
y = listofthings[l]
word = testing[x:y]
word_list.append(word)
When i = length - 1, l = length, so listofthings[l] is out of range. Python list indexing starts at 0, so the largest valid index is length - 1.
The list listofthings has 21 elements, so its valid indices run from 0 to 20. On the final iteration, i is 20 and l is 21, which raises the out-of-range error. I think the following code is what you want:
testing = """There is something unique about this line
in that it can span across several lines, which is unique and
useful in python."""
listofthings = []
i = 0
while i < len(testing):
    if testing[i] == " ":
        listofthings.append(i)
    i = i + 1
listofthings.insert(0, 0)
listofthings.append(len(testing))
word_list = []
i = 0
while i < len(listofthings) - 1:
    l = i + 1
    x = listofthings[i]
    y = listofthings[l]
    word = testing[x:y]
    word_list.append(word)
    i = l
print word_list
while i < len(listofthings):
    l = i + 1
    x = listofthings[i]
    y = listofthings[l]
When i corresponds to the last element,
y = listofthings[l]
tries to access the element after the last element. That's why it throws the error.
On the last iteration of the second while loop, l is set to len(listofthings). This is past the end of listofthings; the last valid index is len(listofthings) - 1.
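One way to avoid the off-by-one altogether is to pair each boundary with the next one using zip, which stops automatically when the shorter sequence runs out. This is a Python 3 sketch, not the asker's original approach; the function name split_on_spaces is made up for illustration.

```python
def split_on_spaces(text):
    """Slice text into pieces that each end just after a space (or at the end of the text)."""
    # Boundaries: the start, the position after every space, and the end of the text.
    boundaries = [0] + [i + 1 for i, ch in enumerate(text) if ch == " "] + [len(text)]
    # zip pairs boundary k with boundary k + 1 and never reads past the last element.
    return [text[start:end] for start, end in zip(boundaries, boundaries[1:])]

pieces = split_on_spaces("a bc d")
print(pieces)
```

Since zip(boundaries, boundaries[1:]) yields one pair fewer than there are boundaries, there is no index len(boundaries) to trip over.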
I'm taking this intro to python course online
The problem reads:
For this program, the first line of input is an integer width. Then, there are some lines of text; the line "END" indicates the end of the text. For each line of text, you need to print out a centered version of it, by adding periods .. to the left and right, so that the total length of each line of text is width. (All input lines will have length at most width.) Centering means that the number of periods added to the left and added to the right should be equal if possible; if needed we allow one more period on the left than the right. For example, for input
13
Text
in
the
middle!
END
the correct output would be
.....Text....
......in.....
.....the.....
...middle!...
the Hint given is:
For an input line of length L, you should add (width-L)//2 periods to the right side
Here is my code so far:
width = int(input())
s1 = input()
periods_remain = width - len(s1)
L = periods_remain
periods_rtside = (width-L)//2
periods_leftside = width - periods_rtside
periods_rt_str = '.' * periods_rtside
periods_left_str = '.' * periods_leftside
line1 = periods_left_str + s1 + periods_rt_str
My line1 result looks like "...........Text.." instead of ".....Text....".
My problem seems to be the L. I'm not sure how to define L. Thanks!
You can use str.center for this:
>>> lis = ['Text', 'in', 'the', 'middle!', 'END']
>>> for item in lis:
...     print item.center(13, '.')
...
.....Text....
......in.....
.....the.....
...middle!...
.....END.....
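or format (note that with an odd number of periods it puts the extra one on the right, so the first two lines differ from the required output):
>>> for item in lis:
...     print format(item,'.^13')
...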
....Text.....
.....in......
.....the.....
...middle!...
.....END.....
Working version of your code:
lis = ['Text', 'in', 'the', 'middle!', 'END']
width = 13
for s1 in lis:
    L = len(s1) #length of line
    periods_rtside = (width - L)//2 #periods on the RHS
    periods_leftside = width - periods_rtside - L #periods on the LHS
    periods_rt_str = '.' * periods_rtside
    periods_left_str = '.' * periods_leftside
    line1 = periods_left_str + s1 + periods_rt_str
    print line1
output:
.....Text....
......in.....
.....the.....
...middle!...
.....END.....
For those still struggling with this tough question, here is my code that works in Python 3 shell, even though it still fails in http://cscircles.cemc.uwaterloo.ca/8-remix/
First_line = input("First input: ")
width = int(First_line)
while True:
    s1 = input("Second input: ")
    if s1 != 'END':
        L = len(s1) #length of line
        periods_rtside = (width - L)//2 #periods on the RHS
        periods_leftside = width - periods_rtside - L #periods on the LHS
        periods_rt_str = '.' * periods_rtside
        periods_left_str = '.' * periods_leftside
        line1 = periods_left_str + s1 + periods_rt_str
        print(line1)
    else:
        break
To get it to work in the http://cscircles.cemc.uwaterloo.ca/8-remix/ console, you need to replace the first two lines with
width = int(input())
and change the s1 line to
s1 = input()
and also provide your own test input by clicking on the "Enter test input" button.
Just some small changes to the above code and it worked in the grader.
width = int(input())
s1 = input()
while s1 != "END":
    L = len(s1)
    periods_rtside = (width - L)//2
    periods_leftside = width - periods_rtside - L
    periods_rt_str = '.' * periods_rtside
    periods_left_str = '.' * periods_leftside
    line1 = periods_left_str + s1 + periods_rt_str
    print(line1)
    s1 = input()
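The arithmetic shared by the working versions above can be isolated in one small function, which makes it easy to check against the expected output from the exercise without typing input. This is a minimal Python 3 sketch; the function name center_with_periods is an illustrative choice.

```python
def center_with_periods(text, width):
    """Pad text with periods to the given width; any extra period goes on the left."""
    total = width - len(text)   # periods needed in all
    right = total // 2          # the hint: (width - L) // 2 on the right
    left = total - right        # the remainder, at most one more than right
    return "." * left + text + "." * right

for line in ["Text", "in", "the", "middle!"]:
    print(center_with_periods(line, 13))
```

Computing the left side as the remainder (total - right) is what guarantees the line is exactly width characters long, which was the step missing from the original attempt.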