How can I find this pattern in a text document?

I'm practicing some regex in Python, and I want to look through a log of transaction numbers to see whether any of them returned an error such as "Error in phone Activation".
I was able to search a dictionary for a value that starts with "Error" and ends with "Activation", so the error is found whether the device is a phone, tablet, watch, etc. However, the same pattern does not match when the data is a bulk text file.
In the dictionary, each key is a transaction number and each value is the error (or lack thereof). This is the code I used to search it:
for i in Transaction_Log:
    if bool(re.search("^Error.* Activation$", Transaction_Log[i])):
        print("Found requested error in transaction number " + i)
        error_count += 1
This works, but the same search can't find anything in a text file laid out like this:
Transnum: 20190510001 error: Error in phone Activation,
Transnum: 20190510002 error: none,
Transnum: 20190510003 error: Error in tablet Activation,
Ideally, it would find each type of error, and then I could add a counter to see how many there are. However, my boolean test is never True when I search the text file this way.
Searching for just the word "Error" does work, though.

With the help of @CAustin, I figured out that I was searching for the wrong pattern: the line does not start with "Error", and it ends with a comma rather than with "Activation". After removing both anchors, I was able to find what I needed in this example. For anyone else looking for something similar, this is what worked:
for line in testingDoc:
    if bool(re.search("Error.* Activation", line)):
        print("found error in transaction")
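As a minimal sketch of the counter mentioned in the question, the unanchored search can be extended with capture groups so that each device type is counted separately. The `log_lines` list below is a hypothetical stand-in for the lines read from the text file, and the exact pattern is an assumption based on the three sample lines shown above.

```python
import re

# Hypothetical sample lines, in the same layout as the log in the question.
log_lines = [
    "Transnum: 20190510001 error: Error in phone Activation,",
    "Transnum: 20190510002 error: none,",
    "Transnum: 20190510003 error: Error in tablet Activation,",
]

# Unanchored pattern: the groups capture the transaction number and device type.
pattern = re.compile(r"Transnum: (\d+) error: Error in (\w+) Activation")

error_counts = {}
for line in log_lines:
    match = pattern.search(line)
    if match:
        transnum, device = match.groups()
        print(f"Found {device} activation error in transaction {transnum}")
        error_counts[device] = error_counts.get(device, 0) + 1

total_errors = sum(error_counts.values())
print(total_errors, error_counts)

# The original anchored pattern fails because the line neither starts
# with "Error" nor ends with "Activation" (there is a trailing comma).
anchored_match = re.search("^Error.* Activation$", log_lines[0])
```

With a real file, the list could be replaced by iterating over the open file object line by line.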

Related

How to get partial match with soup.find()?

For some reason I wasn't able to find the answer to this anywhere.
So, I'm using this
soup.find(text="Dimensions").find_next().text
to grab the text after "Dimensions". My problem is that on the website I'm scraping it is sometimes displayed as "Dimensions:" (with a colon) and sometimes as "Dimensions " (with a trailing space), and then my code throws an error. So I'm looking for something like this (obviously invalid code) to get a partial match:
soup.find(if "Dimensions" in text).find_next().text
How can I do that?

OK, I've just found out that it's much simpler than I thought:
soup.find(text=re.compile(r"Dimensions")).find_next().text
does what I need.
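A small standard-library sketch of why the compiled pattern gives a partial match. It assumes that Beautiful Soup tests a compiled pattern against each text node with `search()` rather than requiring an exact string match; the `labels` list is a hypothetical set of text nodes, not data from the site in the question.

```python
import re

pattern = re.compile(r"Dimensions")

# Hypothetical text nodes as they might appear on different pages.
labels = ["Dimensions", "Dimensions:", "Dimensions ", "Weight"]

# search() succeeds if the pattern occurs anywhere in the string,
# so the colon and trailing-space variants both match.
matches = [label for label in labels if pattern.search(label)]
print(matches)
```

In newer Beautiful Soup releases the same argument is also available under the name `string=`, which may be worth using instead of `text=`.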

How do I add an error message to a question box?

I am trying to make some revision material for classmates, and I have a list of terms for OCR Computer Science in a dictionary named "my_dict". However, if a term is entered incorrectly, it just sends an error message to the Python shell. If anyone can help me add an error message to the code provided, that would be much appreciated.
I have tried basic if, while, next loops, but to no avail.
def button_click():
    typed_text = (entry1.get()).lower()
    output_text.delete(0.0, END)
    if typed_text is in my_dict{}:
        meaning = my_dict[(typed_text)]
    else:
        meaning = str("Are you sure you entered the term correctly?"
    output_text.insert(END, meaning)
I expect the output box to be filled with the error message "Are you sure you entered the term correctly?", but currently I just get an invalid syntax error.

my_dict{} is not valid syntax; just pass my_dict to the in expression. is in is also not valid syntax, so use in on its own.
In str("Are you sure you entered the term correctly?" you are missing a closing parenthesis at the end.
There are also some spots where you don't need the extra parentheses. Drop them in (entry1.get()) and [(typed_text)].
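Putting those fixes together, a minimal sketch might look like the following. The lookup is pulled out into a hypothetical helper, `lookup_meaning`, so it can be run without a Tkinter window; the one-entry `my_dict` is a made-up stand-in for the asker's glossary.

```python
def lookup_meaning(typed_text, my_dict):
    """Return the stored meaning, or a friendly message if the term is unknown."""
    term = typed_text.lower()
    if term in my_dict:  # plain "in", with the dictionary name only
        return my_dict[term]
    return "Are you sure you entered the term correctly?"


# Hypothetical glossary standing in for the asker's my_dict.
my_dict = {"algorithm": "A step-by-step procedure for solving a problem."}

print(lookup_meaning("Algorithm", my_dict))
print(lookup_meaning("algorythm", my_dict))

# Inside the Tkinter callback this might be wired up as follows
# (entry1 and output_text are the widgets from the question):
#
# def button_click():
#     output_text.delete("1.0", END)
#     output_text.insert(END, lookup_meaning(entry1.get(), my_dict))
```

Using `my_dict.get(term, message)` would be an equally reasonable way to write the lookup.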

Word .docx created using python-docx + python-docx-template results in error

I'm using python-docx and python-docx-template to generate a multi-page document. Word on both MacOS and Windows complains about an error in the .docx file produced, yet Word is able to open the file if allowed to continue and the document looks fine when opened. (On MacOS, the error dialog reads "HRESULT 0x80004005 Location: Part: /word/document.xml, line 0, column 0").
The .docx template is a pretty simple one-page document. The loop to construct the compound document is based on an answer to another question and is this simple Python code:
overall_doc = Document()
num_pages = len(records_list)
for index, record in enumerate(records_list):
    page = DocxTemplate(template)
    values = vars(records_list[index])
    page.render(values)
    if index < (num_pages - 1):
        page.add_page_break()
    for element in page.element.body:
        overall_doc.element.body.append(element)
overall_doc.save('outputfile.docx')
The values substituted into the template are UTF-8 strings with no special characters (and in particular, no ampersands or greater than/less than characters). I've verified the problem is not due to the string values being substituted into the template.
If you break the loop after the first page is created, no error results. If the loop is allowed to create even just 2 pages, the error in Word occurs. If I remove the page break code altogether, the error still occurs. If I add an extra page break at the end, the error still occurs.
I've tried to find a docx validation tool. The only thing I have been able to run is docx4j's OpenMainDocumentAndTraverse function, which as far as I can tell, should report errors. But docx4j does not report any error with the output document.
What could cause this error? If my mistake is not obvious, how can I diagnose the reason that Word is complaining?
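One way to start diagnosing this with only the standard library is to open the .docx as a zip archive and list what ends up directly inside the document body. The sketch below is an assumption about where to look, not a confirmed cause: the WordprocessingML schema allows at most one `w:sectPr` as a direct child of `w:body`, and copying every body element from several documents could plausibly produce more than one. The helper name `count_body_children` and the tiny in-memory sample are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_body_children(docx_file):
    """Count the direct children of <w:body> by local tag name."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    body = root.find(f"{{{W_NS}}}body")
    return Counter(child.tag.rsplit("}", 1)[-1] for child in body)


# Hypothetical stand-in for outputfile.docx: a body with two section
# properties elements, as might result from appending two whole bodies.
sample_xml = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p/><w:sectPr/><w:p/><w:sectPr/>"
    "</w:body></w:document>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", sample_xml)
buffer.seek(0)

counts = count_body_children(buffer)
print(counts)
```

For a real file, the path to the generated document can be passed instead of the in-memory buffer. A `sectPr` count above 1 would be one thing worth investigating.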

BioPython Pubmed Eutils url?

I'm trying to run some queries against Pubmed's Eutils service. If I run them on the website I get a certain number of records returned, in this case 13126 (link to pubmed).
A while ago I bodged together a python script to build a query to do much the same thing, and the resultant url returns the same number of hits (link to Eutils result).
Of course, not having any formal programming background, it was all a bit kludgy, so I'm trying to do the same thing using Biopython. I think the following code should do the same thing, but it returns a greater number of hits, 23303.
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"
handle = Entrez.esearch(db="pubmed", term="stem+cell[All Fields]",datetype="pdat", mindate="2012", maxdate="2012")
record = Entrez.read(handle)
print(record["Count"])
I'm fairly sure it's just down to some subtlety in how the url is being generated, but I can't work out how to see what url is being generated by Biopython. Can anyone give me some pointers?
Thanks!
EDIT:
It's something to do with how the url is being generated, as I can get back the original number of hits by modifying the code to include double quotes around the search term, thus:
handle = Entrez.esearch(db='pubmed', term='"stem+cell"[ALL]', datetype='pdat', mindate='2012', maxdate='2012')
I'm still interested in knowing what URL is being generated by Biopython, as it'll help me work out how I have to structure the search term when I want to do more complicated searches.

handle = Entrez.esearch(db="pubmed", term="stem+cell[All Fields]",datetype="pdat", mindate="2012", maxdate="2012")
print(handle.url)

You've solved this already (Entrez likes explicit double quoting round combined search terms), but currently the URL generated is not exposed via the API. The simplest trick would be to edit the Bio/Entrez/__init__.py file to add a print statement inside the _open function.
Update: Recent versions of Biopython now save the URL as an attribute of the returned handle, i.e. in this example try doing print(handle.url)
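For older versions, or just to reason about the encoding offline, the query string can be approximated with the standard library. This is only a sketch: it assumes the parameters are percent-encoded in roughly the way `urllib.parse.urlencode` does it, and the URL Biopython actually sends may contain extra parameters (such as the tool name and email).

```python
from urllib.parse import urlencode

BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

params = {
    "db": "pubmed",
    "term": "stem+cell[All Fields]",
    "datetype": "pdat",
    "mindate": "2012",
    "maxdate": "2012",
}

# urlencode escapes a literal "+" as %2B and turns spaces into "+".
url = BASE_URL + "?" + urlencode(params)
print(url)

# The same term written with a space instead of a literal plus sign.
spaced_term = urlencode({"term": "stem cell[All Fields]"})
print(spaced_term)
```

Under this assumption, a `+` typed into the Python string is sent as a literal plus sign rather than as a space, which may be one reason a hand-built URL and the Biopython call behave differently.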

ValueError: invalid literal for int() with base 10: 'MSIE'

After I run my Python code on a big file of only HTTP headers, it gives me the above error. Any idea what that means?
Here is a piece of the code:
users = output.split(' ')[1]
accesses = output.split(' ')[3]
ave_accesses = int(accesses)/int(users)
Basically the 'users' are users who have accessed a website and 'accesses' are the total number of accesses by the users to that site. The 'ave_accesses' gives the number of accesses to that site by an average user. I hope this is enough to clear things, if not I can explain more.
thanks a lot, Adia.

It means that you are trying to convert a string to an integer, and the value of the string is 'MSIE'. The traceback will show a filename and line number near this error (e.g., /my/module.py:123). Open the file and go to the line where the error occurred; you should see a call to int() with a parameter. That parameter is probably supposed to be a number in string form, but it isn't. Your parsing code is probably slightly off, so the fields got mixed up.
To track down the problem, use print statements around the code to see what is not working as expected. You can also use pdb.

I think your header output is garbled. The code is looking for a number where it is finding the string MSIE (which may be the value for User-Agent).
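A minimal sketch of the debugging approach suggested above: split the line once, attempt the conversions inside a try block, and print the offending fields when parsing fails. The function name `average_accesses` and both sample lines are hypothetical; the real input format is not shown in the question.

```python
def average_accesses(output):
    """Return accesses per user, or None if the line cannot be parsed."""
    fields = output.split(" ")
    try:
        users = int(fields[1])
        accesses = int(fields[3])
    except (IndexError, ValueError) as exc:
        # Show exactly which line and fields caused the problem.
        print(f"Could not parse {output!r}: fields={fields} ({exc})")
        return None
    return accesses / users


# Hypothetical well-formed line: numbers at positions 1 and 3.
good_result = average_accesses("users 4 accesses 20")
print(good_result)

# Hypothetical header line where position 1 holds 'MSIE' instead of a number.
bad_result = average_accesses("User-Agent: MSIE 6.0 Windows")
```

Running this over the whole file would list every line whose fields are not where the code expects them to be.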
