Python re.findall (returns a number)[0] - python

I'm trying to teach myself a little Python and in the process I'm 'borrowing' code from places to help build my project. A snippet from a piece of code I have, which extracts a temperature value from a string, looks like this:
re.findall(r"Temp=(\d+.\d+)", *string_variable*)[0]
For the life of me, I cannot find any documentation on what the "[0]" at the end is for or how to use it.
Obviously I figured out that without it my final output is something like this:
['71.8']
and with it, my number is cleaner and rounded up:
72.0
Can someone point me to where this is documented so I can better understand how to use it in the future?

re.findall(r"Temp=(\d+.\d+)", string_variable) returns a list, [0] gets the first element of that list.
This is a sign that your method of teaching yourself by looking at snippets of code without context is not working. Go through a more traditional tutorial.

The documentation for re, in the section on re.findall, states: "Return all non-overlapping matches of pattern in string, as a list of strings." So the return value is a list. The Python Tutorial section on lists explains what [0] after a list does.
I highly recommend that you read through the entire Python Tutorial, as I did, or something similar, to learn Python.
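To make the two answers above concrete, here is a minimal sketch. The sample string is made up, and the dot in the pattern is escaped here (the unescaped `.` in the question matches any character, not only a decimal point). Note that indexing only picks an element out of the list; it does not round anything, so any rounding seen in the question presumably comes from other code around the snippet.

```python
import re

# Hypothetical input string, for illustration only.
text = "Temp=71.8 Humidity=40.2"

# findall returns a list of the captured groups, one per match.
matches = re.findall(r"Temp=(\d+\.\d+)", text)
print(matches)   # a list containing one string

# [0] is ordinary list indexing: it takes the first element of that list.
first = matches[0]
print(first)     # the string itself, no brackets or quotes

# The element is still a string; convert it explicitly if a number is needed.
value = float(first)
print(value)
```

If the pattern does not match, `findall` returns an empty list and `[0]` raises `IndexError`, so checking the list before indexing is usually worthwhile.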

Related

Forward slash notation with Python dictionaries

I've never seen forward slash notation with Python dictionaries, and when looking to the official documentation, I couldn't find any reference, so I'm hoping someone can school me here.
I was playing with a new library I hope to use on a project when I ran across the notation:
object['/someKeyword']['/anotherKeyword'].someMethod()
I didn't understand what the bracketed terms meant at the time. A colleague helped me understand it was dictionary notation, but I haven't been able to find any follow up to study.
Any information on the notation would be helpful!
Those are strings, and a string can contain any Unicode code point, which of course includes /.
In the example, it seems a dict-like object is being accessed by subscription with string keys that start with /.
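A small sketch of the same idea with plain dictionaries. The key names and the `Page` class are hypothetical stand-ins, not the library from the question:

```python
class Page:
    """Hypothetical object stored inside the nested dictionary."""

    def describe(self):
        return "a page object"


# The keys are ordinary strings that happen to begin with "/".
document = {"/Root": {"/Pages": Page()}}

# Each [...] is a normal dictionary lookup; the slash has no special meaning.
result = document["/Root"]["/Pages"].describe()
print(result)
```

The slash is part of the key, so `document["Root"]` would raise `KeyError` here.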

Google cloud speech to text grammar to narrow results to a number?

I very simply want to pass in a tiny audio clip (8 kHz telephony) containing a single-digit number, and get back that single digit as text, with recognition narrowed down to numbers.
File in > number as text out. Preferably via the python command line API.
The problem is, by default, it recognises things like 1,2,3,4,5 as won,too,free,fore,5 ... no good!
I believe I want what is called a grammar? Or something like Amazon's number slot types it uses in Alexa? I've looked over the cloud speech docs and can't find it. The only thing I could think of is looping over the alternatives given and see if any match an int rather than a word. And if none do, then what?
Thanks.
A.Queue's answer is correct, however, in case others are bitten by the docs:
The link given suggests:
{ "phrases": [ string], }
The python documentation says:
speech_contexts
Optional: A means to provide context to assist the speech recognition.
The python examples show:
language_code='en-US',
max_alternatives=max_alternatives,
profanity_filter=True,
speech_contexts=['Google', 'cloud'],
What actually works is:
speech_contexts=[speech.types.SpeechContext(
phrases=['Google', 'cloud'],
)]
I managed to get this from a Googler on Slack, who pointed me to some alternative, more comprehensive and accurate documentation. Bookmark that last link for future sanity.
Try adding speechContexts. You can then add a few phrases that you think are most probable.
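The fallback mentioned in the question (looping over the returned alternatives and keeping the first one that is a digit) could be sketched as below. This is plain Python that operates on a list of transcript strings; it does not call the Speech API, and the `DIGIT_WORDS` table and `pick_digit` name are assumptions made up for illustration.

```python
# Hypothetical table of spoken forms and common mis-hearings; extend as needed.
DIGIT_WORDS = {
    "zero": 0, "oh": 0,
    "one": 1, "won": 1,
    "two": 2, "too": 2, "to": 2,
    "three": 3, "free": 3,
    "four": 4, "for": 4, "fore": 4,
    "five": 5, "six": 6, "seven": 7,
    "eight": 8, "ate": 8, "nine": 9,
}


def pick_digit(alternatives):
    """Return the first single digit found among the transcripts, else None."""
    for transcript in alternatives:
        token = transcript.strip().lower()
        # Accept a literal ASCII digit such as "5".
        if len(token) == 1 and token in "0123456789":
            return int(token)
        # Otherwise try the word / homophone table.
        if token in DIGIT_WORDS:
            return DIGIT_WORDS[token]
    return None


print(pick_digit(["fore", "four"]))
print(pick_digit(["hello"]))
```

Returning `None` when nothing matches leaves the "and if none do, then what?" decision to the caller, for example re-prompting the user.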

What is the need of memoization in python

I was reading this article
http://programmingzen.com/2009/05/18/memoization-in-ruby-and-python/
Can anyone please explain, with an example, what will happen if I don't use it? I am not able to work out which problem it solves. I would just like to see two examples, one simple example without memoization and another with memoization, so that I can see why we use it.
If the example can be based on web-related stuff or Django, that would be good, so that I can understand it better. I am not too techy in algorithms.
Explained simply, I'll put the question like this. How many "E" characters are there in this block of text?
Now, how many "E" characters are there in the first block of text?
And now, how many "E" characters are there in the first block of text?
Finally, how many "E" characters are there in the first block of text?
If you were wondering, there were 9 "e"s and 2 "E"s in that first block. By the second run through, you had probably already memorized how many "E"s there were in the first block. That's memoization for a count function/method over that block of text.
Memoization caches (stores) results a function has already computed so it can return them quickly later. Basically, if a function is slow but is called with the same arguments, and so gives the same results, most of the time, it can be helpful.
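As a rough sketch of the counting analogy above, the standard library's `functools.lru_cache` can memoize a function. The `count_letter` function and the `call_count` counter are illustrative names, and the counter exists only to show how often the real work runs.

```python
from functools import lru_cache

call_count = 0  # how many times the function body actually executes


@lru_cache(maxsize=None)
def count_letter(text, letter):
    """Count occurrences of letter in text; results are cached per argument pair."""
    global call_count
    call_count += 1
    return text.count(letter)


block = 'How many "E" characters are there in this block of text?'

first = count_letter(block, "E")    # computed: the body runs
second = count_letter(block, "E")   # same arguments: served from the cache
third = count_letter(block, "e")    # different arguments: the body runs again

print(first, second, third, call_count)
```

Without the decorator the body would run on every call. Memoization only makes sense when the function returns the same result for the same arguments.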

Trying to understand which is better in python creating variables or using expressions

One of the practices I have gotten into in Python from the beginning is to reduce the number of variables I create, compared to the number I would create when trying to do the same thing in SAS or Fortran.
For example, here is some code I wrote tonight:
def idMissingFilings(dEFilings,indexFilings):
    inBoth=set(indexFilings.keys()).intersection(dEFilings.keys())
    missingFromDE=[]
    for each in inBoth:
        if len(dEFilings[each])<len(indexFilings[each]):
            dEtemp=[]
            for filing in dEFilings[each]:
                #dateText=filing.split("\\")[-1].split('-')[0]
                #year=dateText[0:5]
                #month=dateText[5:7]
                #day=dateText[7:]
                #dETemp.append(year+"-"+month+"-"+day+"-"+filing[-2:])
                dEtemp.append(filing.split('\\')[-1].split('-')[0][1:5]+"-"+filing.split('\\')[-1].split('-')[0][5:7]+"-"+filing.split('\\')[-1].split('-')[0][7:]+"-"+filing[-2:])
            indexTemp=[]
            for infiling in indexFilings[each]:
                indexTemp.append(infiling.split('|')[3]+"-"+infiling[-6:-4])
            tempMissing=set(indexTemp).difference(dEtemp)
            for infiling in indexFilings[each]:
                if infiling.split('|')[3]+"-"+infiling[-6:-4] in tempMissing:
                    missingFromDE.append(infiling)
    return missingFromDE
Now I split one of the strings I am processing 4 times in the line dEtemp.append(blah blah blah)
filing.split('\\')
Historically in Fortran or SAS if I were to attempt the same I would have 'sliced' my string once and assigned a variable to each part of the string that I was going to use in this expression.
I am constantly forcing myself to use expressions instead of first resolving to a value and then using the value. The only reason I do this is that I am learning by mimicking other people's code, but it has been in the back of my mind to ask this question: where can I find a cogent discussion of why one is better than the other?
The code compares a set of documents on a drive against a source list of those documents and checks whether all of those from the source are represented on the drive.
Okay, the commented section is much easier to read, and that is how I decided to respond to nosklo's answer.
Yeah, it is not better to put everything in the expression. Please use variables.
Using variables is not only better because you will do the operation only once and save the value for multiple uses. The main reason is that code becomes more readable that way. If you name the variable right, it doubles as free implicit documentation!
Use more variables. Python is known for its readability; code that gives that up is called "un-Pythonic" (see https://docs.python-guide.org/writing/style/). Code that is more readable will be easier for others to understand, and easier for you to understand later.
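As an illustration of the advice in these answers, the long `dEtemp.append(...)` expression from the question could be pulled into a small helper that splits the string once and names each part. The helper name `de_filing_key` and the sample path are hypothetical; the slices are copied from the live expression in the question (which uses `[1:5]`, not the `[0:5]` of the commented-out lines).

```python
def de_filing_key(filing):
    """Build a 'YYYY-MM-DD-NN' key from a filing path, splitting only once."""
    file_name = filing.split("\\")[-1]    # last path component
    date_text = file_name.split("-")[0]   # text before the first hyphen
    year = date_text[1:5]
    month = date_text[5:7]
    day = date_text[7:]
    suffix = filing[-2:]                  # last two characters of the path
    return "-".join([year, month, day, suffix])


# Hypothetical path layout, chosen only to exercise the slices.
sample = "C:\\filings\\R20090115-abc-10"
print(de_filing_key(sample))
```

Inside the loop this would become `dEtemp.append(de_filing_key(filing))`, and each intermediate name documents what the slice is meant to extract.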

Parsing a range of integers in a list

I've just begun learning Python and I've run into a small problem.
I need to parse a text file, more specifically an HTML file. Its syntax is weird (divs after divs after divs): it is the result of Google's 'View as HTML' for a certain PDF, and I can't seem to extract the text because it has a messy table made in MS Word.
Anyway, I chose a rather low-level approach because I just need the data ASAP and, since I'm beginning to learn Python, I figured learning the basics would do me some good too.
I've got everything done except for a small part in which I need to retrieve a set of integers from a set of divs. Here's an example:
<div style="position:absolute;top:522;left:1020"><nobr>*88</nobr></div>
Now, the numbers I want to retrieve are all the ones inside <nobr></nobr> (in that case, '588') and, since it's quite a messy file, I have to make sure that what I am getting is correct. To do so, the number inside <nobr></nobr> must be preceded by "left:1020", "left:1024" or "left:1028". This is a result of the automatic conversion, and in my opinion the best choice would be to get all the numbers preceded by left:102[0-9].
To do so, I was trying to use:
for o in re.finditer('left:102[0-9]"><nobr>(.*?)</nobr></div>', words[index])
out = o.group(1)
But so far, no such luck... How can I get those numbers?
Thanks in advance,
J.
Don't use regular expressions to parse HTML. BeautifulSoup will make light work of this.
As for your specific problem, it might be that you are missing a colon at the end of the first line:
for o in re.finditer('left:102[0-9]"><nobr>(.*?)</nobr></div>', words[index]):
    out = o.group(1)
If this isn't the problem, please post the error you are getting, and what you expect the output to be.
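For reference, a self-contained version of the corrected loop might look like the sketch below. The sample markup is made up to resemble the div in the question, and the variable names are illustrative.

```python
import re

# Hypothetical sample resembling the converted HTML from the question.
html = (
    '<div style="position:absolute;top:522;left:1020"><nobr>588</nobr></div>'
    '<div style="position:absolute;top:540;left:300"><nobr>ignored</nobr></div>'
    '<div style="position:absolute;top:558;left:1024"><nobr>612</nobr></div>'
)

# Only divs whose style ends in left:1020 through left:1029 are matched.
pattern = re.compile(r'left:102[0-9]"><nobr>(.*?)</nobr></div>')

values = []
for match in pattern.finditer(html):
    values.append(match.group(1))   # text between <nobr> and </nobr>

print(values)
```

This relies on the exact attribute order and quoting produced by the converter, which is one reason an HTML parser is usually the more robust choice.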
