Extract and get value from a text file - python

I have executed ssh commands on a remote machine using the paramiko library and written the output to a text file. Now, I want to extract a few values from that text file. The output in the text file looks as pasted below:
b'\nMS Administrator\n(C) Copyright 2006-2016 LP\n\n[MODE]> SHOW INFO\n\n\nMode: \nTrusted Certificates\n1 Details\n------------\n\tDeveloper ID: MS-00c1\n\tTester ID: ms-00B1\n\tValid from: 2030-01-29T06:51:15Z\n\tValid until: 2030-01-30T06:51:15Z\n\t
How do I get the values of Developer ID and Tester ID? The file is huge.
As suggested by users I have written the snippet below.
file = open("Output.txt").readlines()
for lines in file:
    word = re.findall('Developer\sID:\s(.*)\n', lines)[0]
    print(word)
I see the error IndexError: list index out of range.
If I remove the index, I see empty output.

file = open("Output.txt").readlines()
developer_id = ""
for line in file:
    if 'Developer ID' in line:
        developer_id = line.split(":")[-1].strip()
print(developer_id)

You can use regular expressions:
text = """\nMS Administrator\n(C) Copyright 2006-2016 LP\n\n[MODE]> SHOW INFO\n\n\nMode: \nTrusted Certificates\n1 Details\n------------\n\tDeveloper ID: MS-00c1\n\tTester ID: ms-00B1\n\tValid from: 2030-01-29T06:51:15Z\n\tValid until: 2030-01-30T06:51:15Z\n\t"""
import re
developerID = re.search("Developer ID:(.+)\\n", text).group(1).strip()
testerID = re.search("Tester ID:(.+)\\n", text).group(1).strip()
Note that re.match only matches at the start of the string, so re.search is needed here, and group(1) returns the captured value rather than the whole match.

If your output is consistent in format, you can use something as easy as text.split():
developer_id = text.split('\n')[11].lstrip()
tester_id = text.split('\n')[12].lstrip()
Again, this assumes that every line is using the same formatting. Otherwise, use regex as suggested above.
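Combining the approaches above, a minimal sketch that pulls both values out of the sample output with `re.search` (the field names are taken from the pasted output; adjust the pattern if your file differs):

```python
import re

# sample output as pasted in the question
text = ("\nMS Administrator\n(C) Copyright 2006-2016 LP\n\n[MODE]> SHOW INFO\n\n\n"
        "Mode: \nTrusted Certificates\n1 Details\n------------\n"
        "\tDeveloper ID: MS-00c1\n\tTester ID: ms-00B1\n"
        "\tValid from: 2030-01-29T06:51:15Z\n\tValid until: 2030-01-30T06:51:15Z\n\t")

def extract(field, text):
    # re.search scans the whole string; group(1) is the captured value
    m = re.search(rf"{field}:\s*(\S+)", text)
    return m.group(1) if m else None

developer_id = extract("Developer ID", text)  # 'MS-00c1'
tester_id = extract("Tester ID", text)        # 'ms-00B1'
```

Since the file is huge, the same `extract` call can be applied per line while iterating over the open file, instead of reading everything into memory at once.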

Related

Is there a way to detect an existing link from a text file in python

I have code in a Jupyter notebook that uses requests to confirm whether a URL exists or not, and then prints the output to a text file. Here is the code for that:
import requests
Instaurl = open("dictionaries/insta.txt", 'w', encoding="utf-8")
cli = ['duolingo', 'ryanair', 'mcguinness.paddy', 'duolingodeutschland', 'duolingobrasil']
exist = []
url = []
for i in cli:
    r = requests.get("https://www.instagram.com/"+i+"/")
    if r.apparent_encoding == 'Windows-1252':
        exist.append(i)
        url.append("instagram.com/"+i+"/")
        Instaurl.write(url)
Let's say that inside the cli list, I accidentally added the same existing username as before (duolingo, for example). Is there a way that, if requests finds the same URL already in the text file, it will not be added to the text file again?
Thank you!
You defined a list:
cli = ['duolingo', ...]
It sounds like you would prefer to define a set:
cli = {'duolingo', ...}
That way, duplicates will be suppressed. This happens both for dups in the initial assignment and for any duplicate cli.add(entry) you might attempt later.
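A short sketch of the difference, using names from the question:

```python
# a list keeps duplicates; a set silently drops them
cli_list = ['duolingo', 'ryanair', 'duolingo']
cli_set = {'duolingo', 'ryanair', 'duolingo'}

print(len(cli_list))  # 3
print(len(cli_set))   # 2

cli_set.add('duolingo')          # already present: no effect
cli_set.add('mcguinness.paddy')  # new entry: added
print(sorted(cli_set))
```

Note that a set is unordered; if the order in which usernames were first seen matters for the output file, check membership in a set before appending to the list instead.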

Get the 'Last saved by' (windows file) with python

How can I get the username value from the "Last saved by" property from any windows file?
e.g.: I can see this info by right-clicking a Word file and opening the Details tab. See the picture below:
Does anybody know how I can get it using Python code?
Following the comment from @user1558604, I searched a bit on Google and reached a solution. I tested it on the extensions .docx, .xlsx, .pptx.
import zipfile
import xml.dom.minidom
# Open the MS Office file to see the XML structure.
filePath = r"C:\Users\Desktop\Perpetual-Draft-2019.xlsx"
document = zipfile.ZipFile(filePath)
# Open/read the core.xml (contains the last user and modified date).
uglyXML = xml.dom.minidom.parseString(document.read('docProps/core.xml')).toprettyxml(indent=' ')
# Split lines in order to create a list.
asText = uglyXML.splitlines()
# loop the list in order to get the value you need. In my case last Modified By and the date.
for item in asText:
    if 'lastModifiedBy' in item:
        itemLength = len(item)-20
        print('Modified by:', item[21:itemLength])
    if 'dcterms:modified' in item:
        itemLength = len(item)-29
        print('Modified On:', item[46:itemLength])
The result in the console is:
Modified by: adm.UserName
Modified On: 2019-11-08
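Instead of slicing the pretty-printed XML by character position, the same two properties can be read with namespace-aware XML parsing, which does not break if the attribute layout changes. A sketch (the namespace URIs are the standard Office Open XML / Dublin Core ones; the file path is whatever document you point it at):

```python
import zipfile
import xml.etree.ElementTree as ET

# standard namespaces used by docProps/core.xml
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dcterms": "http://purl.org/dc/terms/",
}

def core_properties(path):
    # core.xml inside any .docx/.xlsx/.pptx holds the document metadata
    with zipfile.ZipFile(path) as z:
        root = ET.fromstring(z.read("docProps/core.xml"))
    return {
        "lastModifiedBy": root.findtext("cp:lastModifiedBy", namespaces=NS),
        "modified": root.findtext("dcterms:modified", namespaces=NS),
    }
```

`zipfile.ZipFile` also accepts a file-like object, so this works on in-memory documents as well as paths.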

Using other languages with Pylatex

I'm trying to get Hebrew to print into a PDF using pylatex. In a sample Hebrew .tex file whose format I'm trying to emulate, the header looks like this:
%\title{Hebrew document in WriteLatex - מסמך בעברית}
\documentclass{article}
\usepackage[utf8x]{inputenc}
\usepackage[english,hebrew]{babel}
\selectlanguage{hebrew}
\usepackage[top=2cm,bottom=2cm,left=2.5cm,right=2cm]{geometry}
I was able to emulate this entire header except for the line \selectlanguage{hebrew}. I'm not sure how I should go about getting this in my .tex file using pylatex. The code for generating the rest of the file is:
doc = pylatex.Document('basic', inputenc = 'utf8x', lmodern = False, fontenc = None, textcomp = None)
packages = [Package('babel', options = ['english', 'hebrew']), Package('inputenc', options = 'utf8enc')]
doc.packages.append(Package('babel', options = ['english', 'hebrew']))
doc.append(text.decode('utf-8'))
doc.generate_pdf(clean_tex=False, compiler = "XeLaTeX ")
doc.generate_tex()
And the header of the .tex file generated is:
\documentclass{article}%
\usepackage[utf8x]{inputenc}%
\usepackage{lastpage}%
\usepackage[english,hebrew]{babel}%
How do you get the selectlanguage line in there? I'm pretty new to LaTeX, so I apologize for not being accurate with my terminology.
You want to use Command:
from pylatex import Command
To add it to your preamble,
doc.preamble.append(Command('selectlanguage', 'hebrew'))
or to another specific place in your document,
doc.append(Command('selectlanguage', 'hebrew'))

Python - parse inconsistently delimited data table

I have a folder of emails that contain data that I need to extract and put into a Database. These emails are all formatted differently so I've grouped them by how similar their formats are. The following two emails bodies are examples of what I am trying to parse right now:
1) (first example email: screenshot not included)
2) (second example email: screenshot not included)
So in my attempts to extract the valuable data (the fish stock, the weight, the price, the sector, the date) I have tried several methods. I have a list of all possible 30+ stocks, and I run a RegEx on the entire email.
fishy_re = re.compile(r'({}).*?([\d,]+)'.format('|'.join(stocks)), re.IGNORECASE|re.MULTILINE|re.DOTALL)
This RegEx, I was told, will search for any occurrence of a fish, then capture the next number that follows, and group the two together... and it does that job perfectly. But when I tried adding an additional .*?([\d,]+) chunk to capture the NEXT number (the price, as seen in email 2), it fails to do that.
Is my RegEx that tries to grab the price wrong?
Also, in trying to deal with emails that have a Package deal (email 1), I again tried using RegEx to search for any line that has the word Package and then capture the next number that follows on that line.
word = ['package']
package_re = re.compile(r'({}).*?([\d,]+)'.format('|'.join(word)), re.IGNORECASE|re.MULTILINE|re.DOTALL)
But that produces nothing... even when doing something as simple as:
with open(file_path) as f:
    for line in f:
        for match in package_re.finditer(f.read()):
            print("yes")
It fails to print "yes".
So is there a more effective way to extract the Package price information?
Thanks.
I created my own test email and parsed it like so:
import bs4 # BeautifulSoup html parsing
import email # built-in Python mail-parsing library
FNAME = "c:/users/Stephen/mail/test.eml" # full path to saved email
# load message
with open(FNAME) as in_f:
    msg = email.message_from_file(in_f)
# message is multipart/MIME - payload 0 is text, 1 is html
html_msg = msg.get_payload(1)
# grab the body of the message
body = html_msg.get_payload(decode=True)
# convert from bytes to unicode
html = body.decode()
# now parse to get table
table = bs4.BeautifulSoup(html, "html.parser").find("table")
data = [[cell.text.strip() for cell in row.find_all("td")] for row in table.find_all("tr")]
which returns something like
[
['', 'LIVE WGT', ''],
['BGE COD', '746', ''],
['GBW CODE', '13,894', ''],
['GOM COD', '60', 'Package deal $52,500.00'],
# etc
]
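Coming back to the asker's two-capture-group attempt: one workable sketch makes the price group optional, so lines that have a weight but no price still match (the stock names and the sample line below are made up for illustration):

```python
import re

stocks = ['BGE COD', 'GBW CODE', 'GOM COD']  # hypothetical subset of the 30+ stocks

# group 1: stock name; group 2: the next number (weight);
# optional group 3: a following dollar amount (price), None when absent
fishy_re = re.compile(
    r'({})\D*?([\d,]+)(?:\D*?\$([\d,.]+))?'.format('|'.join(stocks)),
    re.IGNORECASE)

line = 'GOM COD   60   Package deal $52,500.00'
m = fishy_re.search(line)
# m.group(1) -> 'GOM COD', m.group(2) -> '60', m.group(3) -> '52,500.00'
```

Wrapping the extra `.*?([\d,]+)` in a non-capturing optional group `(?: ... )?` is the key difference: without it, a required second number makes the whole pattern fail on weight-only lines.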

How can I remove a ppa using python apt module?

I am able to add a PPA with it, but I cannot remove one. I cannot find the correct syntax to remove the PPA from sources.list. Here's my code:
import aptsources.sourceslist as s
repo = ('deb', 'http://ppa.launchpad.net/danielrichter2007/grub-customizer/ubuntu', 'xenial', ['main'])
sources = s.SourcesList()
sources.add(repo)
sources.save()
#doesn't work
sources.remove(repo)
I tried reading the docs found here but I still cannot find the format to call sources.remove(repo)
The SourcesList.remove() help text reads remove(source_entry), which indicates that what it wants is a SourceEntry object. As it happens, sources.add() returns a SourceEntry object:
import aptsources.sourceslist as sl
sources = sl.SourcesList()
entry = sources.add('deb', 'mirror://mirrors.ubuntu.com/mirrors.txt', 'xenial', ['main'])
print(type(entry))
Outputs:
<class 'aptsources.sourceslist.SourceEntry'>
To remove the entry:
sources.remove(entry)
sources.save()
You can also disable it (which will leave a commented-out entry in sources.list):
entry.set_enabled(False)
sources.save()
I'm using this to do the removal for now.
import fileinput
filename = '/etc/apt/sources.list'
word = 'grub-customizer'
remove = fileinput.input(filename, inplace=1)
for line in remove:
    if word in line:
        line = ""
    print(line, end='')
remove.close()
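The line-filtering idea above can be factored into a small function that is easy to test before pointing it at the real /etc/apt/sources.list (a sketch; the sample entries below are made up):

```python
def strip_matching_lines(lines, word):
    # keep every line that does not mention the unwanted PPA
    return [line for line in lines if word not in line]

sources = [
    "deb http://archive.ubuntu.com/ubuntu xenial main\n",
    "deb http://ppa.launchpad.net/danielrichter2007/grub-customizer/ubuntu xenial main\n",
]
cleaned = strip_matching_lines(sources, "grub-customizer")
# cleaned keeps only the first entry
```

Separating the filtering from the in-place file rewrite means a bug in the match logic can be caught on sample data instead of on the live sources.list.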
