Searching between two CSV outputs in Python

Guys, I have written this code, which SSHes to a server using the paramiko module and gets output in CSV format for a couple of commands. Here is the code and output:
stdin, stdout, stderr = ssh_client.exec_command('isi nfs exports list --verbose --format=csv')
nfs_exports = (stdout.read().decode(encoding='ascii'))
stdin, stdout, stderr = ssh_client.exec_command('isi sync policies list --format=csv | grep True')
active_sync = (stdout.read().decode(encoding='ascii'))
print(nfs_exports)
16,System,"/test/true/usa/synctest",true
29,System,"/test/lab/Lab_File_Pool_1",false
32,System,"/test/vipr/Lab_File_Pool_1",false
33,System,"/test/testing2/apps001",null
print(active_sync)
synctest,/test/nam/test/synctest,sync,True,target.domain123.com
synctest,/test/lab/Lab_File_Pool_1,sync,True,target.domain123.com
synctest,/test/nar/usa/synctest,sync,True,target.domain123.com
synctest,/test/testing2/apps001,sync,True,target.domain123.com
synctest,/test/true/usa/synctest,sync,True,target.domain123.com
Now the challenging part for me: I need to search for each path (e.g. "/test/true/usa/synctest") from the nfs_exports output in the active_sync output. If a path matches, I need to create new CSV output with all the information from nfs_exports.
Desired output is:
33,System,"/test/testing2/apps001",null
29,System,"/test/lab/Lab_File_Pool_1",false

First, I would parse the CSV strings into a more usable format. You could use the csv library or re (regular expressions).
Then, if I understand the question correctly, it should just be a simple double-nested for loop:
for each line in one output:
    for each line in the other output:
        do the paths on the two lines match?
import csv
import io

def parse_csv(string):
    # wrap the string in a file-like object so csv.reader can consume it
    string_file = io.StringIO(string)
    reader = csv.reader(string_file)
    return list(reader)

nfs_exports = parse_csv(nfs_exports)
active_sync = parse_csv(active_sync)
print(nfs_exports)
print(active_sync)

results = []
for active_sync_line in active_sync:
    active_sync_path = active_sync_line[1]  # path is the second field
    for nfs_export_line in nfs_exports:
        nfs_export_path = nfs_export_line[2]  # path is the third field
        if nfs_export_path.strip() == active_sync_path.strip():
            results.append(nfs_export_line)

print("output:")
for line in results:
    print(",".join(line))
This gives the output:
29,System,/test/lab/Lab_File_Pool_1,false
33,System,/test/testing2/apps001,null
16,System,/test/true/usa/synctest,true
This is a little different from what you posted, but if I understand what you were asking it should be right. If not -- please let me know and I can amend this answer.
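As an aside, if the outputs ever grow large, building a set of the active sync paths avoids the quadratic nested loop. A minimal sketch, reusing the parsed lists from above:
active_paths = {line[1].strip() for line in active_sync}  # path is the second field
results = [line for line in nfs_exports if line[2].strip() in active_paths]
Membership tests against a set are O(1) on average, so this scales linearly with the total number of lines rather than with their product.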

Related

Is there a way to detect an existing link from a text file in python

I have code in a Jupyter notebook that uses requests to confirm whether a URL exists or not, and afterwards prints the output into a text file. Here is the code for that:
import requests

Instaurl = open("dictionaries/insta.txt", 'w', encoding="utf-8")
cli = ['duolingo', 'ryanair', 'mcguinness.paddy', 'duolingodeutschland', 'duolingobrasil']
exist = []
url = []
for i in cli:
    r = requests.get("https://www.instagram.com/" + i + "/")
    if r.apparent_encoding == 'Windows-1252':
        exist.append(i)
        url.append("instagram.com/" + i + "/")
Instaurl.write("\n".join(url))
Let's say that inside the cli list I accidentally added the same username that already exists in the text file (duolingo, for example). Is there a way that, if requests finds a URL already present in the text file, it will not be added to the text file again?
Thank you!
You defined a list:
cli = ['duolingo', ...]
It sounds like you would prefer to define a set:
cli = {'duolingo', ...}
That way, duplicates will be suppressed. This happens both for dups in the initial assignment and for any duplicate cli.add(entry) you might attempt later.
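A quick illustration of that behaviour:
cli = {'duolingo', 'ryanair', 'duolingo'}  # the duplicate literal is dropped
print(cli)           # e.g. {'ryanair', 'duolingo'} -- sets are unordered
cli.add('duolingo')  # adding an existing entry is a no-op
print(len(cli))      # still 2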

gnupg - decrypt into Python bytesio stream

How can I select a stream as the output of a decrypt_file operation in gnupg?
The docs and the code seem to suggest this is not possible. If I am correct (see below), what workarounds are possible?
The documentation seems to suggest it is not possible:
decrypt_file(filename, always_trust=False, passphrase=None, output=None)
with "output (str) – A filename to write the decrypted output to."
Opening up the code, I see:
def decrypt_file(self, file, always_trust=False, passphrase=None,
                 output=None, extra_args=None):
    args = ["--decrypt"]
    if output:  # write the output to a file with the specified name
        self.set_output_without_confirmation(args, output)
    if always_trust:  # pragma: no cover
        args.append("--always-trust")
    if extra_args:
        args.extend(extra_args)
    result = self.result_map['crypt'](self)
    self._handle_io(args, file, result, passphrase, binary=True)
    logger.debug('decrypt result: %r', result.data)
    return result
which points to set_output_without_confirmation, confirming the idea is that you pass a string filename:
def set_output_without_confirmation(self, args, output):
    "If writing to a file which exists, avoid a confirmation message."
    if os.path.exists(output):
        # We need to avoid an overwrite confirmation message
        args.extend(['--yes'])
    args.extend(['--output', no_quote(output)])
To output the decrypted data to a variable use decrypt instead of decrypt_file, as shown here in the "Decrypt a string" paragraph.
So the original code:
status = gpg.decrypt_file(input_file, passphrase='my_passphrase', output='my_output_file')
is substituted by:
import io

decrypted_data = gpg.decrypt(input_file.read(), passphrase='my_passphrase')
# decrypted_data.data contains the decrypted bytes
decrypted_stream = io.BytesIO(decrypted_data.data)
# io.BytesIO is the py3 idiom; it is also available from the io module in py2.6+
As an example, for the specific use case of CSV data (building on this SO post), you could then do:
my_df = pandas.read_csv(decrypted_stream)
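Putting it together, here is a minimal end-to-end sketch for the CSV case; the file name 'data.csv.gpg' and the passphrase are placeholders for illustration:
import io
import gnupg
import pandas

gpg = gnupg.GPG()
with open('data.csv.gpg', 'rb') as input_file:
    # decrypt (rather than decrypt_file) keeps the result in memory
    decrypted_data = gpg.decrypt(input_file.read(), passphrase='my_passphrase')

decrypted_stream = io.BytesIO(decrypted_data.data)
my_df = pandas.read_csv(decrypted_stream)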

Extract and get values from a text file in python

I have executed SSH commands on a remote machine using the paramiko library and written the output to a text file. Now I want to extract a few values from that text file. The output of the text file looks as pasted below:
b'\nMS Administrator\n(C) Copyright 2006-2016 LP\n\n[MODE]> SHOW INFO\n\n\nMode: \nTrusted Certificates\n1 Details\n------------\n\tDeveloper ID: MS-00c1\n\tTester ID: ms-00B1\n\tValid from: 2030-01-29T06:51:15Z\n\tValid until: 2030-01-30T06:51:15Z\n\t
How do I get the values of Developer ID and Tester ID? The file is huge.
As suggested by users, I have written the snippet below:
import re

file = open("Output.txt").readlines()
for lines in file:
    word = re.findall('Developer\sID:\s(.*)\n', lines)[0]
    print(word)
I see the error IndexError: list index out of range. If I remove the index, I see empty output.
file = open("Output.txt").readlines()
developer_id = ""
for line in file:
    if 'Developer ID' in line:
        developer_id = line.split(":")[-1].strip()
print(developer_id)
You can use regular expressions:
import re

text = """\nMS Administrator\n(C) Copyright 2006-2016 LP\n\n[MODE]> SHOW INFO\n\n\nMode: \nTrusted Certificates\n1 Details\n------------\n\tDeveloper ID: MS-00c1\n\tTester ID: ms-00B1\n\tValid from: 2030-01-29T06:51:15Z\n\tValid until: 2030-01-30T06:51:15Z\n\t"""

# re.search scans the whole string; re.match would return None here because
# the text does not start with "Developer ID"
developerID = re.search(r"Developer ID:\s*(.+)\n", text).group(1).strip()
testerID = re.search(r"Tester ID:\s*(.+)\n", text).group(1).strip()
If your output is consistent in format, you can use something as easy as line.split():
developer_id = line.split('\n')[11].lstrip()
tester_id = line.split('\n')[12].lstrip()
Again, this assumes that every line is using the same formatting. Otherwise, use regex as suggested above.
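Since the file is huge, one alternative is to stream it line by line instead of calling readlines(). A sketch, assuming the file contains real newlines (rather than literal \n escape sequences) and one ID per line as in the sample:
import re

developer_id = tester_id = None
with open("Output.txt") as fh:
    for line in fh:  # streams the file; never loads it all into memory
        m = re.search(r"Developer ID:\s*(\S+)", line)
        if m:
            developer_id = m.group(1)
        m = re.search(r"Tester ID:\s*(\S+)", line)
        if m:
            tester_id = m.group(1)
print(developer_id, tester_id)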

When working with a named pipe is there a way to do something like readlines()

Overall Goal: I am trying to read some progress data from a python exe to update the progress of the exe in another application
I have a python exe that is going to do some stuff, and I want to be able to communicate its progress to another program. Based on several other Q&As here, I have been able to have my running application send progress data to a named pipe using the following code:
import win32pipe
import win32file
import glob

test_files = glob.glob('J:\\someDirectory\\*.htm')
# test_files has two items a.htm and b.htm

p = win32pipe.CreateNamedPipe(r'\\.\pipe\wfsr_pipe',
                              win32pipe.PIPE_ACCESS_DUPLEX,
                              win32pipe.PIPE_TYPE_MESSAGE | win32pipe.PIPE_WAIT,
                              1, 65536, 65536, 300, None)
# the following line is the server-side function for accepting a connection
# see http://stackoverflow.com/questions/1749001/named-pipes-between-c-sharp-and-python
win32pipe.ConnectNamedPipe(p, None)

for each in test_files:
    win32file.WriteFile(p, each + '\n')

# send final message
win32file.WriteFile(p, 'Process Complete')
# close the connection
p.close()
In short, the example code writes the path of each file that was globbed to the named pipe - this is useful and can easily be extended to more logging-type events. However, the problem is figuring out how to read the contents of the named pipe without knowing the size of each possible message. For example, the first file could be named J:\someDirectory\a.htm, but the second could have 300 characters in its name.
So far, the code I am using to read the contents of the pipe requires that I specify a buffer size. First, establish the connection:
file_handle = win32file.CreateFile("\\\\.\\pipe\\wfsr_pipe",
                                   win32file.GENERIC_READ | win32file.GENERIC_WRITE,
                                   0, None,
                                   win32file.OPEN_EXISTING,
                                   0, None)
and then I have been playing around with reading from the file
data = win32file.ReadFile(file_handle,128)
This generally works, but I really want to read until I hit a newline character, do something with the content between the start of the read and the newline, and then repeat the process until I get to a line that contains Process Complete.
I have been struggling with how to read only until I find a newline character (\n). I basically want to read the file by lines and, based on the content of each line, do something (either display the line or shift the application focus).
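Something like this rough buffering sketch (untested) is what I am after - accumulate reads into a buffer and peel off complete lines as they arrive:
buffer = ''
finished = False
while not finished:
    err, chunk = win32file.ReadFile(file_handle, 4096)
    buffer += chunk
    while '\n' in buffer:
        line, buffer = buffer.split('\n', 1)
        if line == 'Process Complete':
            finished = True
            break
        print(line)  # or update progress, shift focus, etc.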
Based on the suggestion provided by @meuh, I am updating this because I think there is a dearth of examples and guidance on how to use pipes.
My server code
import win32pipe
import win32file
import glob
import os

p = win32pipe.CreateNamedPipe(r'\\.\pipe\wfsr_pipe',
                              win32pipe.PIPE_ACCESS_DUPLEX,
                              win32pipe.PIPE_TYPE_MESSAGE | win32pipe.PIPE_WAIT,
                              1, 65536, 65536, 300, None)
# the following line is the server-side function for accepting a connection
# see http://stackoverflow.com/questions/1749001/named-pipes-between-c-sharp-and-python
win32pipe.ConnectNamedPipe(p, None)

for file_id in glob.glob('J:\\level1\\level2\\level3\\*'):
    for filer_id in glob.glob(file_id + os.sep + '*'):
        win32file.WriteFile(p, filer_id)

# send final message
win32file.WriteFile(p, 'Process Complete')
# close the connection
p.close()  # still not sure if this should be here, I need more testing
           # I think the client can close p
The Client code
import win32pipe
import win32file

file_handle = win32file.CreateFile("\\\\.\\pipe\\wfsr_pipe",
                                   win32file.GENERIC_READ | win32file.GENERIC_WRITE,
                                   0, None, win32file.OPEN_EXISTING, 0, None)
# this is the key, setting readmode to MESSAGE
win32pipe.SetNamedPipeHandleState(file_handle,
                                  win32pipe.PIPE_READMODE_MESSAGE, None, None)

# for testing purposes I am just going to write the messages to a file
out_ref = open('e:\\testpipe.txt', 'w')
dstring = ''  # need some way to know that the messages are complete
while dstring != 'Process Complete':
    # setting the blocksize at 4096 to make sure it can handle any message I
    # might anticipate
    data = win32file.ReadFile(file_handle, 4096)
    # data is a tuple; the first position seems to always be 0 but I need to
    # find the docs to understand what determines the value, the second is
    # the message
    dstring = data[1]
    out_ref.write(dstring + '\n')

out_ref.close()      # got here so close my testfile
file_handle.close()  # close the file_handle
I don't have Windows, but looking through the API it seems you should convert your client to message mode by adding, after the CreateFile(), the call:
win32pipe.SetNamedPipeHandleState(file_handle,
                                  win32pipe.PIPE_READMODE_MESSAGE, None, None)
then each sufficiently long read will return a single message, i.e. what the other end wrote in a single write. You already set PIPE_TYPE_MESSAGE when you created the pipe.
You could simply use an implementation of io.IOBase that would wrap the NamedPipe.
import io
import win32file

class PipeIO(io.RawIOBase):
    def __init__(self, handle):
        self.handle = handle

    def read(self, n):
        if n == 0:
            return b""
        elif n == -1:
            return self.readall()
        # win32file.ReadFile returns an (errcode, data) tuple
        _, data = win32file.ReadFile(self.handle, n)
        return data

    def readinto(self, b):
        data = self.read(len(b))
        for i in range(len(data)):
            b[i] = data[i]
        return len(data)

    def readall(self):
        data = b""
        while True:
            _, chunk = win32file.ReadFile(self.handle, 10240)
            if len(chunk) == 0:
                return data
            data += chunk
Beware: untested, but it should work after fixing any remaining typos.
You could then do:
with PipeIO(file_handle) as fd:
    for line in fd:
        ...  # process the line
You could use the msvcrt module and open to turn the pipe into a file object.
Sending code
import win32pipe
import os
import msvcrt
from io import open

pipe = win32pipe.CreateNamedPipe(r'\\.\pipe\wfsr_pipe',
                                 win32pipe.PIPE_ACCESS_OUTBOUND,
                                 win32pipe.PIPE_TYPE_MESSAGE | win32pipe.PIPE_WAIT,
                                 1, 65536, 65536, 300, None)
# wait for another process to connect
win32pipe.ConnectNamedPipe(pipe, None)
# get a file descriptor to write to
write_fd = msvcrt.open_osfhandle(pipe, os.O_WRONLY)
with open(write_fd, "w") as writer:
    # now we have a file object that we can write to in a standard way
    for i in range(0, 10):
        # create "a\n" in the first iteration, "bb\n" in the second and so on
        text = chr(ord("a") + i) * (i + 1) + "\n"
        writer.write(text)
Receiving code
import win32file
import os
import msvcrt
from io import open
handle = win32file.CreateFile(r"\\.\pipe\wfsr_pipe",
                              win32file.GENERIC_READ,
                              0, None,
                              win32file.OPEN_EXISTING,
                              0, None)
read_fd = msvcrt.open_osfhandle(handle, os.O_RDONLY)
with open(read_fd, "r") as reader:
    # now we have a file object with readlines and the other file API methods
    lines = reader.readlines()
    print(lines)
Some notes.
I've only tested this with Python 3.4, but I believe you may be using Python 2.x.
Python seems to get weird if you try to close both the file object and the pipe, so I've only used the file object (by using the with block).
I've only created the file objects to read on one end and write on the other. You can of course make the file objects duplex (see the sketch after this list) by:
Creating the file descriptors (read_fd and write_fd) with the os.O_RDWR flag
Creating the file objects in "r+" mode rather than "r" or "w"
Going back to creating the pipe with the win32pipe.PIPE_ACCESS_DUPLEX flag
Going back to creating the file handle object with the win32file.GENERIC_READ | win32file.GENERIC_WRITE flags
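For instance, the receiving side of such a duplex setup might look like this untested sketch, which applies all four changes (the reply message is hypothetical):
import win32file
import os
import msvcrt
from io import open

handle = win32file.CreateFile(r"\\.\pipe\wfsr_pipe",
                              win32file.GENERIC_READ | win32file.GENERIC_WRITE,
                              0, None,
                              win32file.OPEN_EXISTING,
                              0, None)
rw_fd = msvcrt.open_osfhandle(handle, os.O_RDWR)
with open(rw_fd, "r+") as pipe_file:
    pipe_file.write("hello from the client\n")  # hypothetical reply message
    pipe_file.flush()
    print(pipe_file.readline())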

Yajl parse error with githubarchive.org JSON stream in Python

I'm trying to parse a GitHub archive file with yajl-py. I believe the basic format of the file is a stream of JSON objects, so the file itself is not valid JSON, but it contains objects which are.
To test this out, I installed yajl-py and then used their example parser (from https://github.com/pykler/yajl-py/blob/master/examples/yajl_py_example.py) to try to parse a file:
python yajl_py_example.py < 2012-03-12-0.json
where 2012-03-12-0.json is one of the GitHub archive files that's been decompressed.
Judging by their reference implementation in Ruby, it appears this sort of thing should work. Do the Python packages not handle JSON streams?
By the way, here's the error I get:
yajl.yajl_common.YajlError: parse error: trailing garbage
9478bbc3","type":"PushEvent"}{"repository":{"url":"https://g
(right here) ------^
You need to use a stream parser to read the data. Yajl supports stream parsing, which allows you to read one object at a time from a file/stream. Having said that, it doesn't look like Python has working bindings for Yajl.
py-yajl has iterload commented out, not sure why: https://github.com/rtyler/py-yajl/commit/a618f66005e9798af848c15d9aa35c60331e6687#L1R264
Not a Python solution, but you can use Ruby bindings to read in the data and emit it in a format you need:
# gem install yajl-ruby
require 'open-uri'
require 'zlib'
require 'yajl'
gz = open('http://data.githubarchive.org/2012-03-11-12.json.gz')
js = Zlib::GzipReader.new(gz).read
Yajl::Parser.parse(js) do |event|
  print event
end
The example does not enable any of Yajl's extra features; for what you are looking for, you need to enable the allow_multiple_values flag on the parser. Here is what you need to modify in the basic example to have it parse your file:
--- a/examples/yajl_py_example.py
+++ b/examples/yajl_py_example.py
@@ -37,6 +37,7 @@ class ContentHandler(YajlContentHandler):
 def main(args):
     parser = YajlParser(ContentHandler())
+    parser.allow_multiple_values = True
     if args:
         for fn in args:
             f = open(fn)
Yajl-Py is a thin wrapper around yajl, so you can use all the features Yajl provides. Here are all the flags that yajl provides that you can enable:
yajl_allow_comments
yajl_dont_validate_strings
yajl_allow_trailing_garbage
yajl_allow_multiple_values
yajl_allow_partial_values
To turn these on in yajl-py you do the following:
parser = YajlParser(ContentHandler())
# enabling these features, note that to make it more pythonic, the prefix `yajl_` was removed
parser.allow_comments = True
parser.dont_validate_strings = True
parser.allow_trailing_garbage = True
parser.allow_multiple_values = True
parser.allow_partial_values = True
# then go ahead and parse
parser.parse()
I know this has been answered, but I prefer the following approach, and it does not use any packages. The GitHub dictionary is on a single line for some reason, so you cannot assume a single dictionary per line. It looks like this:
{"json-key":"json-val", "sub-dict":{"sub-key":"sub-val"}}{"json-key2":"json-val2", "sub-dict2":{"sub-key2":"sub-val2"}}
I decided to create a function which fetches one dictionary at a time. It returns json as a string.
def read_next_dictionary(f):
    depth = 0
    json_str = ""
    while True:
        c = f.read(1)
        if not c:
            break  # EOF
        json_str += str(c)
        if c == '{':
            depth += 1
        elif c == '}':
            depth -= 1
            if depth == 0:
                break
    return json_str
I used this function to loop through the Github archive with a while loop:
import json
import pprint

arr_of_dicts = []
f = open(file_path)
while True:
    json_as_str = read_next_dictionary(f)
    try:
        json_dict = json.loads(json_as_str)
        arr_of_dicts.append(json_dict)
    except:
        break  # exception on loading json ends the loop

pprint.pprint(arr_of_dicts)
This works on the dataset posted here: http://www.githubarchive.org/ (after gunzip). Note that the brace counting assumes no '{' or '}' characters appear inside JSON string values; if they can occur, a real stream parser (or the line-splitting approach below) is safer.
As a workaround you can split the GitHub Archive files into lines and then parse each line as json:
import json

with open('2013-05-31-10.json') as f:
    lines = f.read().splitlines()

for line in lines:
    rec = json.loads(line)
    ...
