I'm trying to get rid of the whitespace in this list, but I can't. Does anyone know where I'm going wrong with my code?
love_maybe_lines = ['Always ', ' in the middle of our bloodiest battles ', 'you lay down your arms', ' like flowering mines ', ' to conquer me home. ']
love_maybe_lines_joined = '\n'.join(love_maybe_lines)
love_maybe_lines_stripped = love_maybe_lines_joined.strip()
print(love_maybe_lines_stripped)
Terminal:
Always
in the middle of our bloodiest battles
you lay down your arms
like flowering mines
to conquer me home.
love_maybe_lines = ['Always ', ' in the middle of our bloodiest battles ', 'you lay down your arms', ' like flowering mines ', ' to conquer me home. ']
love_maybe_lines = [item.strip() for item in love_maybe_lines]
That may help.
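Putting the two steps together, a minimal sketch: strip each line first, then join. Joining first only lets strip() touch the ends of the whole joined string, not each line, which is why the original attempt left the inner whitespace in place.

```python
love_maybe_lines = ['Always ', ' in the middle of our bloodiest battles ',
                    'you lay down your arms', ' like flowering mines ',
                    ' to conquer me home. ']

# Strip each line individually, then join with newlines.
stripped_poem = '\n'.join(line.strip() for line in love_maybe_lines)
print(stripped_poem)
```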
I'm trying to get a list of prices from an MMORPG, using pytesseract to extract the data as strings from screenshots.
Example screenshot:
The output from my image looks like this:
[' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', '
', ' ', ' ', '', ' ', '', "eel E J Gbasce'sthiel Sateen nach", '', ' ', '', 'Ly] Preis aufsteigend', '', '[ Tternname Anzahl Preis pro Stick Zeitraum. Verkaufer', '', ' ', '', '
', '', ' ', '', 'Holzstock 1 149,999 30 Tag#e) Heavend', '', '
I just want to get that bold section (name, amount, price) out of the output, but I really don't know how to cut it out of that text mess.
Does anyone have an idea how I can achieve it?
Thank you.
I think the best method is to find the Holzstock section of your images first. You could use an advanced detection model like YOLO, or try feature description and matching with classical methods like SURF or SIFT. Then crop that part and feed it to Tesseract.
This approach has some benefits: you will find every Holzstock section of your images before doing OCR, which decreases OCR errors and removes unnecessary parts of the text.
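If cropping the image isn't an option, another rough sketch is to post-process the raw OCR text instead: filter the strings for lines that look like a data row (name, amount, price). The pattern below is only a guess based on the sample output shown above, and will need tuning against real data:

```python
import re

# Sample lines as they come back from pytesseract (taken from the output above).
ocr_lines = ['', ' ', "eel E J Gbasce'sthiel Sateen nach",
             'Ly] Preis aufsteigend',
             'Holzstock 1 149,999 30 Tag#e) Heavend']

# Assumption: a data row starts with a word (item name), then an integer
# amount, then a price with digit/comma separators.
row_pattern = re.compile(r'^([A-Za-z]+)\s+(\d+)\s+([\d.,]+)')

rows = []
for line in ocr_lines:
    m = row_pattern.match(line.strip())
    if m:
        name, amount, price = m.groups()
        rows.append((name, int(amount), price))

print(rows)  # [('Holzstock', 1, '149,999')]
```

Since OCR noise is unpredictable, this kind of filter tends to need a whitelist of known item names or looser patterns in practice.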
I scraped data from a website with BeautifulSoup and requests, and collected the results in a list. The output looks like this:
['1\n',
' Saul Alvarez*',
'1545\n',
'\n\n',
' middle\n',
' 30\n',
' 53\xa01\xa02\n',
' \n',
'orthodox\n',
'Guadalajara, Mexico',
'2\n',
' Tyson Fury',
'1030\n',
'\n\n',
' heavy\n',
' 32\n',
' 30\xa00\xa01\n',
' \n',
'orthodox\n',
'Wilmslow, United Kingdom',
'3\n',
' Errol Spence Jr',
'697.2\n',
'\n\n',
' welter\n',
' 30\n',
' 27\xa00\xa00\n',
' \n',
'southpaw\n',
'Desoto, USA',
'4\n',
' Terence Crawford',
'658.9\n',
'\n\n',
' welter\n',
...
I'm having difficulty splitting this list into groups wherever there is a rank integer followed by '\n'.
Ideally I would like the output to be a list of lists:
[[
'1\n',
' Saul Alvarez*',
'1545\n',
'\n\n',
' middle\n',
' 30\n',
' 53\xa01\xa02\n',
' \n',
'orthodox\n',
'Guadalajara, Mexico'
],
['2\n',
' Tyson Fury',
'1030\n',
'\n\n',
' heavy\n',
' 32\n',
' 30\xa00\xa01\n',
' \n',
'orthodox\n',
'Wilmslow, United Kingdom'],
['3\n',
' Errol Spence Jr',
'697.2\n',
'\n\n',
' welter\n',
' 30\n',
' 27\xa00\xa00\n',
' \n',
'southpaw\n',
'Desoto, USA'],
...]
Well, there are two things going on, and I'll address only the first.
You can drop the blank strings and the '\n' entries, because those are just newline characters, i.e. linefeeds.
li = ['1\n',
' Saul Alvarez*',
'1545\n',
'\n\n',
' middle\n',
' 30\n',
' 53\xa01\xa02\n',
' \n',
]
li = [val.replace("\n", "") for val in li]  # note: r"\n" would be a literal backslash-n, not a newline
li = [val.strip() for val in li if val.strip()]
print(li)
That outputs:
['1', 'Saul Alvarez*', '1545', 'middle', '30', '53\xa01\xa02']
The second issue, which I won't address here since we don't know the HTML format you haven't shown, is that you are grabbing all the element values (the text in each tag) without looking at the HTML markup's structure. That's the wrong approach to take.
If you look at the page's source, you will probably find something like <div class="name">Saul Alvarez</div><div class="weightclass">middle</div>. Using the markup's annotations and semantic context is more productive than trying to guess at the structure from a flat list. BeautifulSoup can do it; try using soup.select("div.name"), for example.
The nice thing with soup.select which uses CSS selectors is that you can pre-test your query in your browser's dev tools.
Just remember, soup.select will return a list of html elements, from which you'll want to look at the value.
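For the grouping itself: since every boxer in the sample appears to occupy exactly ten consecutive elements, a simple sketch (assuming that fixed record length holds for the whole page) is to slice the flat list into chunks of ten rather than hunting for the rank integers:

```python
# Truncated sample of the scraped list shown above.
li = ['1\n', ' Saul Alvarez*', '1545\n', '\n\n', ' middle\n', ' 30\n',
      ' 53\xa01\xa02\n', ' \n', 'orthodox\n', 'Guadalajara, Mexico',
      '2\n', ' Tyson Fury', '1030\n']

# Slice the flat list into fixed-size records of 10 fields each.
records = [li[i:i + 10] for i in range(0, len(li), 10)]
print(records[0][1])  # ' Saul Alvarez*'
print(records[1][0])  # '2\n'
```

Splitting on "integer + '\n'" alone is fragile here, because the weight entries like '1545\n' are also bare integers followed by a newline; the fixed-width slice sidesteps that ambiguity.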
I am a beginner in Python and I decided to attempt a script to help me in my part-time job as a weather observer. Essentially, I have a list of observations, and I have written regexes to gather the information I need and put it in a spreadsheet using xlwings. Now I have a list of partial observations and I am trying to extract the sky condition from each one. The sky condition contains any of the words in the list called "words".
I am sure there is a better way to do this, but my approach is to look at the items in each element of the list and determine whether one of the key words is in the element. If one is found, I add it to a new list called 'found', which I eventually want to write to the Excel spreadsheet.
My problem is that I'm having trouble finding where to increment over the different sky conditions. I need the script to iterate over each element in a line of the skycons list, then move on to the next line. I have moved the increment to several different parts of the script, but it still won't work properly: it either increments too soon and doesn't iterate over all the elements of the line, or it doesn't increment at all. Here is the code...
skycons = [' -RA BR BKN008 OVC012 09/08 ',
           ' RA BR BKN008 OVC012 09/08 ',
           ' R02/2600VP6000FT -RA BR BKN008 OVC012 09/08 ',
           ' -RA BR SCT007 BKN013 OVC019 09/08 ',
           ' CLR 09/08 ']
i = 0
words = ['FEW', 'SCT', 'BKN', 'OVC', 'CLR']
found = []
for item in skycons[i].split():
    print(skycons[i].split())
    for word in words:
        print(word)
        if item.startswith(word):
            found += item
            print(found)
sky = " ".join(found)
print(sky)
i += 1
This may be a mess and I am open to suggestions. I basically need to grab the sky condition from each observation and insert it into a spreadsheet. I tried doing this strictly with a regex, but it ended up grabbing other elements of the observation that are not present in the above list. Any assistance is greatly appreciated.
found += item doesn't do what you seem to think it does. Look at this:
>>> found = []
>>> found += 'OVC'
>>> found
['O', 'V', 'C']
I don't think that's what you're expecting. It appends each character to the list, not the entire string. Try append() instead:
found.append(item)
Also, i isn't getting incremented at the right times. I think you need another outer loop to go through the METARs. Here's what I ended up with:
skycons = [' -RA BR BKN008 OVC012 09/08 ',
           ' RA BR BKN008 OVC012 09/08 ',
           ' R02/2600VP6000FT -RA BR BKN008 OVC012 09/08 ',
           ' -RA BR SCT007 BKN013 OVC019 09/08 ',
           ' CLR 09/08 ']
words = ['FEW', 'SCT', 'BKN', 'OVC', 'CLR']
for metar in skycons:
    found = []
    for item in metar.split():
        for word in words:
            if item.startswith(word):
                found.append(item)
    sky = " ".join(found)
    print(sky)
... which outputs...
BKN008 OVC012
BKN008 OVC012
BKN008 OVC012
SCT007 BKN013 OVC019
CLR
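Since the question mentions a regex attempt that grabbed unwanted tokens, here's a hedged sketch of a pattern that avoids that: anchoring the three-digit cloud height after FEW/SCT/BKN/OVC (with CLR standing alone) keeps tokens like R02/2600VP6000FT out of the matches.

```python
import re

skycons = [' -RA BR BKN008 OVC012 09/08 ',
           ' R02/2600VP6000FT -RA BR BKN008 OVC012 09/08 ',
           ' CLR 09/08 ']

# Cloud groups are FEW/SCT/BKN/OVC plus a 3-digit height; CLR appears alone.
pattern = re.compile(r'\b(?:FEW|SCT|BKN|OVC)\d{3}\b|\bCLR\b')

for metar in skycons:
    print(" ".join(pattern.findall(metar)))
```

Real METARs can carry extras like CB/TCU suffixes or VV (vertical visibility) groups, so the pattern would need extending for those cases.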
I'm a newbie, so I'm really sorry if this is too basic a question, but I just couldn't solve it on my own. Perhaps it's not considered complex enough (at all), which would explain why I couldn't find an adequate answer online.
I've made a tic-tac-toe program following the Automate the Boring Stuff with Python textbook, but modified it a tiny bit so it doesn't allow players to enter 'X'/'O' in already-filled slots. Here's what it looks like:
theBoard = {'top-L': ' ', 'top-M': ' ', 'top-R': ' ',
            'mid-L': ' ', 'mid-M': ' ', 'mid-R': ' ',
            'low-L': ' ', 'low-M': ' ', 'low-R': ' '}

def printBoard(board):
    print(board['top-L']+'|'+board['top-M']+'|'+board['top-R'])
    print('-+-+-')
    print(board['mid-L']+'|'+board['mid-M']+'|'+board['mid-R'])
    print('-+-+-')
    print(board['low-L']+'|'+board['low-M']+'|'+board['low-R'])

turn = 'X'
_range = 9
for i in range(_range):
    printBoard(theBoard)
    print('''It is the '''+turn+''' player's turn.''')
    move = input()
    if theBoard[move] == ' ':
        theBoard[move] = turn
    else:
        print('The slot is already filled !')
        _range += 1
    if turn == 'X':
        turn = 'O'
    else:
        turn = 'X'
printBoard(theBoard)
However, it doesn't seem like the _range variable is increased at all during iterations where I intentionally enter 'X'/'O' in slots where such symbols already exist.
Is there something I'm missing here? Is there any way I could make this work as I planned?
Thank you in advance.
It could be easier to sanitize the input immediately via a while loop instead of increasing the range.
move = input(f"It is the {turn} player's turn.")
while theBoard[move] != ' ':
    move = input('The slot is already filled')
Your attempt did not work because range(_range) is evaluated only once, when the for loop starts; changing _range afterwards has no effect on a loop that is already running.
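For illustration, here is a sketch of that while-loop idea, refactored to take the moves as a list instead of input() so it's easy to run and test. The function name and structure are my own, not the book's code:

```python
def play(moves):
    """Apply a sequence of moves; a move on a filled slot is rejected
    and the same player keeps the turn. Returns the final board."""
    board = {k: ' ' for k in ('top-L', 'top-M', 'top-R',
                              'mid-L', 'mid-M', 'mid-R',
                              'low-L', 'low-M', 'low-R')}
    turn = 'X'
    for move in moves:
        if board.get(move, 'X') != ' ':
            continue  # slot filled (or unknown key): same player tries again
        board[move] = turn
        turn = 'O' if turn == 'X' else 'X'
    return board

board = play(['top-L', 'top-L', 'mid-M'])
print(board['top-L'], board['mid-M'])  # X O
```

The point is the same as the answer's: don't count turns with a fixed range; only advance the turn after a valid move.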
I'm trying to build and interpret a parse tree of a sentence using spaCy in Python.
I've used the code below:
from spacy.en import English
nlp=English()
example = "The angry bear chased the frightened little squirrel"
parsedEx = nlp(unicode(example))
for token in parsedEx:
    print("Head:", token.head, " Left:", token.left_edge, " Right:", token.right_edge, " Relationship:", token.dep_)
The code gave the result below. Can someone tell me how to interpret it? Thanks in advance!
('Head:', bear, ' Left:', The, ' Right:', The, ' Relationship:', u'det')
('Head:', bear, ' Left:', angry, ' Right:', angry, ' Relationship:', u'amod')
('Head:', chased, ' Left:', The, ' Right:', bear, ' Relationship:', u'nsubj')
('Head:', chased, ' Left:', The, ' Right:', squirrel, ' Relationship:', u'ROOT')
('Head:', squirrel, ' Left:', the, ' Right:', the, ' Relationship:', u'det')
('Head:', squirrel, ' Left:', frightened, ' Right:', frightened, ' Relationship:', u'amod')
('Head:', squirrel, ' Left:', little, ' Right:', little, ' Relationship:', u'amod')
('Head:', chased, ' Left:', the, ' Right:', squirrel, ' Relationship:', u'dobj')
You can interpret the dependency tree by listing out its edges, as shown below:
import spacy
nlp = spacy.load('en')
doc = nlp(u'The world has enough for everyone\'s need, not for everyone\'s greed')
for tok in doc:
    print('{}({}-{}, {}-{})'.format(tok.dep_, tok.head.text, tok.head.i, tok.text, tok.i))
The result of the above code will look like this:
det(world-1, The-0)
nsubj(has-2, world-1)
ROOT(has-2, has-2)
dobj(has-2, enough-3)
prep(enough-3, for-4)
poss(need-7, everyone-5)
case(everyone-5, 's-6)
pobj(for-4, need-7)
punct(need-7, ,-8)
neg(for-10, not-9)
prep(need-7, for-10)
poss(greed-13, everyone-11)
case(everyone-11, 's-12)
pobj(for-10, greed-13)
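To see how those printed edges encode a hierarchy, here is a small sketch in plain Python that rebuilds and prints the tree from a few of the edges above (hardcoded as (dep, head_index, token_index) triples). Every non-ROOT edge says "token X is a child of token Y"; the ROOT edge points at itself and marks the tree's head:

```python
# (dep, head_index, token_index) triples from the first few edges above.
edges = [('det', 1, 0), ('nsubj', 2, 1), ('ROOT', 2, 2), ('dobj', 2, 3)]
tokens = ['The', 'world', 'has', 'enough']

children = {}
root = None
for dep, head, tok in edges:
    if dep == 'ROOT':
        root = tok          # the ROOT edge points at itself: the tree's head
    else:
        children.setdefault(head, []).append(tok)

def render(i, depth=0):
    """Return the subtree rooted at token i as indented lines."""
    lines = ['  ' * depth + tokens[i]]
    for child in children.get(i, []):
        lines.extend(render(child, depth + 1))
    return lines

print('\n'.join(render(root)))
# has
#   world
#     The
#   enough
```

Reading the indented output: "has" is the root, "world" is its subject (with determiner "The" below it), and "enough" is its direct object.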