Issue using replace function in Python

I am trying to encrypt a word and then replace it in the given text, and for that I am using replace() in Python. The method does replace the word, but the original one is also kept in the text. Below is my code:
import subprocess
import bz2
import base64
from subprocess import Popen, PIPE
cat = subprocess.Popen(["hadoop", "fs", "-cat", "/user/cloudera/xxx.dat"], stdout=subprocess.PIPE)
for line in cat.stdout:
    code = line.split('|')[0]
    if (code == "ID"):
        name = line.split('|')[5]
        address = line.split('|')[11]
        ciphername = base64.b64encode(bz2.compress(name))
        cipheraddr = base64.b64encode(bz2.compress(address))
        line.replace(name,ciphername).replace(address,cipheraddr)
        print line
Sample:
ID|1|ZXD0629|ZXD0629||HODJON||11383129|M|||221 B POLLARD RD��KAsODK�TBN�37764|||||||629Z800060|480837
Output:
'ID|1|ZXD0629|ZXD0629||QlpoOTFBWSZTWbk9uLgAAAIGCAbRiAACACAAMQZMQQaMItAUVNzxdyRThQkLk9uLgA==||11383129|M|||QlpoOTFBWSZTWT0tjHQAAAQeCEAALeAkDdQAAgAgADFNMjExMQpo0ZqBmowcuKOA3JhB1VMGcoxTGvi7kinChIHpbGOg|||||||QlpoOTFBWSZTWc5EbhIAAAQKAFNgABAgACEpppkIYBoRvMsvi7kinChIZyI3CQA=|480837\n'
ID|1|ZXD0629|ZXD0629||HODJON||11383129|M|||221 B POLLARD RD��KAsODK�TBN�37764|||||||629Z800060|480837
Expected Output:
ID|1|ZXD0629|ZXD0629||QlpoOTFBWSZTWbk9uLgAAAIGCAbRiAACACAAMQZMQQaMItAUVNzxdyRThQkLk9uLgA==||11383129|M|||QlpoOTFBWSZTWT0tjHQAAAQeCEAALeAkDdQAAgAgADFNMjExMQpo0ZqBmowcuKOA3JhB1VMGcoxTGvi7kinChIHpbGOg|||||||QlpoOTFBWSZTWc5EbhIAAAQKAFNgABAgACEpppkIYBoRvMsvi7kinChIZyI3CQA=|480837\n
I don't need the original, unencrypted text; I only need the encrypted values in my text. I have a huge number of records, so I can't post the entire file here; that's why I have posted a small sample. I don't know whether this issue comes from replace() or from a mistake I made while implementing it. Please help.

When you call str.replace(), you don't change the original string: strings are immutable, and replace() returns a new string. So you need to reassign your original variable to the replaced value:
line = line.replace(name,ciphername).replace(address,cipheraddr)
print line
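A minimal Python 3 sketch of the point above, assuming the same '|'-separated records and the field positions 5 and 11 used in the question's code. The record below is a simplified, hypothetical sample without the unprintable bytes. It shows that replace() leaves the original string untouched and, as an alternative to calling str.replace() on the whole line, rewrites the fields by position, so an identical substring elsewhere in the line cannot be replaced by accident. Note that bz2 plus base64 is an encoding, not encryption: anyone can reverse it.

```python
import base64
import bz2


def encode_field(value):
    """Compress a field with bz2, then base64-encode it to printable ASCII."""
    return base64.b64encode(bz2.compress(value.encode("utf-8"))).decode("ascii")


def encode_line(line, indexes=(5, 11)):
    """Return a NEW line with the fields at `indexes` encoded (ID records only)."""
    fields = line.rstrip("\n").split("|")
    if fields[0] == "ID":
        for i in indexes:
            fields[i] = encode_field(fields[i])
    return "|".join(fields)


# Strings are immutable: replace() builds and returns a new string
s = "HODJON lives here"
t = s.replace("HODJON", "XXXX")
print(s)  # the original is unchanged
print(t)

# Hypothetical sample record (20 fields), modelled on the one in the question
record = "ID|1|ZXD0629|ZXD0629||HODJON||11383129|M|||221 B POLLARD RD|||||||629Z800060|480837\n"
line = encode_line(record)  # reassign the result, just as with replace()
print(line)
```

In the Hadoop loop, each line read from cat.stdout is bytes under Python 3, so it would need a .decode() before being passed to encode_line().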

Related

Extract only additions from diff in python

I am trying to solve a problem:
I receive an auto-generated email from the government whose HTML has no identifying tags: it's one table nested inside another. An abomination of a template. I get one every few days and I want to extract some fields from it. My idea was this:
1. Use the HTML of the email as a template, removing all the fields that change with every mail, such as my client's name, their unique ID and the issue explained in the mail.
2. Diff this template (with the fields removed) against each new email. That would give me all the new info in one shot, without having to parse the email.
The problem is that I can't find any way of extracting only these additions. I am trying to use difflib in Python, and it returns byte streams of the additions and subtractions in each line that I am not able to process properly. I want a way to return only the additions and nothing else. I am open to using other libraries or methods, but I do not want to write a huge regex full of HTML.
When I called diff through Popen, its stdout also came back as bytes.
You can decode the bytes into a string and then continue with your processing.
The script below calls diff on two files and prints only the lines beginning with the '>' symbol, i.e. the lines that are new in the right-hand (second) file:
#!/usr/bin/env python3
import os
import sys
import subprocess
file1 = 'test1'
file2 = 'test2'
if len(sys.argv) == 3:
    file1 = sys.argv[1]
    file2 = sys.argv[2]
if not os.access(file1, os.R_OK):
    print(f"Unable to read: '{file1}'")
    sys.exit(1)
if not os.access(file2, os.R_OK):
    print(f"Unable to read: '{file2}'")
    sys.exit(1)
argv = ['diff', file1, file2]
runproc = subprocess.Popen(args=argv, stdout=subprocess.PIPE)
out, err = runproc.communicate()
# Decode diff's stdout (bytes) into a str in one step; building it with chr()
# byte by byte is slow and garbles multi-byte UTF-8 characters
outstr = out.decode('utf-8', errors='replace')
for line in outstr.splitlines():
    # in diff's default output, '>' marks a line that is only in file2
    if line.startswith('>'):
        print(line)
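Since the question mentions difflib, here is a minimal sketch that keeps only the additions, assuming the template and the new email are both available as str (decode them first if they are bytes). The template and email strings are made-up examples, and additions() is a hypothetical helper name.

```python
import difflib


def additions(template, new):
    """Return the lines that appear in `new` but not in `template`."""
    # ndiff() prefixes every line with '  ' (in both), '- ' (only in template),
    # '+ ' (only in new) or '? ' (intraline hint); keep only the '+ ' lines.
    diff = difflib.ndiff(template.splitlines(), new.splitlines())
    return [line[2:] for line in diff if line.startswith("+ ")]


template = "<td>Name:</td>\n<td></td>\n<td>ID:</td>\n<td></td>"
email = "<td>Name:</td>\n<td>Alice</td>\n<td>ID:</td>\n<td>42</td>"
print(additions(template, email))
```

Because the comparison is line based, this works best when the template and the emails share the same line layout.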

PwnTools recv() on output that expects input directly after

Hi, I have a problem that I can't seem to find any solution for.
(Maybe I'm just bad at phrasing searches in English.)
I'm trying to run a binary from Python using pwntools and read its output completely before sending any input myself.
The output from my binary is as follows:
Testmessage1
Testmessage2
Enter input: <binary expects me to input stuff here>
I would like to read the first line, the second line and the printed part of the third line (ending with the ':' character).
The third line of the output has no newline at the end, and the binary expects the user to type input right after it. However, no matter what I try, I can't read the text that the third line starts with.
My current way of trying to achieve this:
from pwn import *
io = process("./testbin")
print io.recvline()
print io.recvline()
print io.recvuntil(":", timeout=1) # this gets stuck if I don't use a timeout
...
# maybe sending data here
# io.send(....)
io.close()
Do I misunderstand something about stdin and stdout? Isn't the "Enter input:" of the third line output that I should be able to receive before sending input?
Thanks in advance
I finally figured it out.
I got the hint I needed from
https://github.com/zachriggle/pwntools-glibc-buffering/blob/master/demo.py
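It seems the binary's output was being buffered: glibc's stdio fully buffers stdout when it is a pipe rather than a terminal, so the "Enter input:" prompt is not flushed before the binary blocks waiting for input.
When I explicitly make pwntools use a pseudo-terminal for stdin and stdout, it works!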
from pwn import *
pty = process.PTY
p = process("./testbin", stdin=pty, stdout=pty)
Alternatively, you can use the clean() function, which is more reliable and also works with remote connections: https://docs.pwntools.com/en/dev/tubes.html#pwnlib.tubes.tube.tube.clean
For example:
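from pwn import *
def start():
    p = remote("0.0.0.0", 4000)
    return p
io = start()
io.send(b"YYYY")
# clean() receives and discards whatever output has arrived so far (and returns it)
io.clean()
io.send(b"ZZZ")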
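For reference, here is a rough standard-library sketch of what running a program on a pseudo-terminal involves (POSIX only): it starts a child with a pty.openpty() pair as its stdin and stdout, then reads up to the ':' of a prompt that has no trailing newline. The shell one-liner is a hypothetical stand-in for ./testbin, and recvuntil() here is a simplified helper written for this example, not the pwntools method: it may return a few bytes past the delimiter.

```python
import os
import pty
import select
import subprocess


def recvuntil(fd, delim, timeout=5.0):
    """Read from fd until `delim` appears (simplified helper, not pwntools)."""
    buf = b""
    while delim not in buf:
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            raise TimeoutError("%r not seen, got %r" % (delim, buf))
        try:
            chunk = os.read(fd, 1024)
        except OSError:  # Linux raises EIO once the child has closed the pty
            break
        if not chunk:
            break
        buf += chunk
    return buf


# Hypothetical stand-in for ./testbin: two lines, then a prompt with no newline
script = 'printf "Testmessage1\\nTestmessage2\\nEnter input: "; read x; echo "got $x"'

master, slave = pty.openpty()
proc = subprocess.Popen(["sh", "-c", script], stdin=slave, stdout=slave,
                        stderr=slave, start_new_session=True)
os.close(slave)  # the child keeps its own copies of the slave end

prompt = recvuntil(master, b":")
print(prompt)  # the terminal turns "\n" into "\r\n"
os.write(master, b"hello\n")
reply = recvuntil(master, b"got hello")  # also contains the echoed input
proc.wait()
os.close(master)
```

With a terminal as stdout, a glibc-based binary line-buffers its output and flushes it before reading from a terminal stdin, which is why the PTY setting makes the prompt readable.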

search string in text file and capture following n characters

I'm using subprocess.Popen and stdout to write the output of a curl request to Backendless (BaaS) into a file.
What's written to the output file is a single long line of data, separated by commas. Here's a small portion of it:
{"APIEndpoint":"asdfaasdfa","created":1429550024000,"updated":null,"objectId":"EE51537D-A9AC-721C-FF33-F4B258931E00"...}
The value I need from this output file is the 36-character string following "objectId":". I've read many similar questions but haven't been able to find a solution to this specific one. I've tried something like:
objectid = 36
searchfile = open('backendless.txt', 'r')
for line in searchfile:
    if "\"objectId\":\"" in line:
        print(right[:objectid])
which returns nothing. Please correct me if I'm using line incorrectly. I'm very new to this. Also, is there a way to achieve the same result without saving it to the text file first and instead performing the curl with PIPE and communicate?
I'm using Python 3.4. Thank you.
EDIT/SOLUTION:
import json
from subprocess import Popen, PIPE
# appid, secretkey, apptype and contenttype hold the request headers (defined elsewhere)
baascurl = Popen(['curl', '-H', appid, '-H', secretkey, '-H', apptype, '-H', contenttype, '-X', 'GET', '-v', 'https://api.backendless.com/v1/data/Latency/last'], stdout=PIPE).communicate()[0]
objidbytes = baascurl.decode(encoding='utf-8')
objid = json.loads(objidbytes)["objectId"]
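For the "without saving it to a text file" part, the newer subprocess.run() API can capture the output directly. This is a minimal sketch and needs Python 3.7+ (for capture_output and text), newer than the 3.4 mentioned in the question. A tiny Python process printing a JSON body stands in for the curl command, since the real endpoint and headers are not reproducible here; with curl, the -v output goes to stderr and does not disturb the JSON on stdout.

```python
import json
import subprocess
import sys

# Stand-in for the curl command: a child process that prints a JSON body.
# With the real API you would pass the curl argument list instead.
cmd = [sys.executable, "-c",
       'print(\'{"objectId": "EE51537D-A9AC-721C-FF33-F4B258931E00"}\')']

# capture_output=True pipes stdout/stderr; text=True decodes them to str;
# check=True raises CalledProcessError if the command fails
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
objid = json.loads(result.stdout)["objectId"]
print(objid)
```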
Your data looks pretty much like JSON, so maybe you can use Python's json module:
import json
objid = json.loads(datastring)["objectId"]
If you want to stay on the text level, the right tool for this job is a regular expression. Look into Python's re module.
import re
m = re.search(r'"objectId":"([^"]+)"', datastring, re.IGNORECASE)
if m:
    objid = m.group(1)
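A small self-contained comparison of the approaches above, run on a hypothetical, complete version of the sample line (the question's sample is truncated with "..."). It also shows a working version of the question's find-and-slice idea, which reads up to the closing quote instead of hard-coding a length.

```python
import json
import re

# Hypothetical complete response, built from the fields shown in the question
datastring = ('{"APIEndpoint":"asdfaasdfa","created":1429550024000,'
              '"updated":null,"objectId":"EE51537D-A9AC-721C-FF33-F4B258931E00"}')

# 1. Parse it as JSON (preferred when the whole response is valid JSON)
objid_json = json.loads(datastring)["objectId"]

# 2. Regular expression: capture everything up to the closing quote
m = re.search(r'"objectId":"([^"]+)"', datastring)
objid_re = m.group(1) if m else None

# 3. Find the key, then slice from just after it up to the next quote
key = '"objectId":"'
start = datastring.find(key)
objid_slice = None
if start != -1:
    start += len(key)
    objid_slice = datastring[start:datastring.index('"', start)]

print(objid_json, objid_re, objid_slice)
```

The JSON route is the most robust, since it does not depend on how the server spaces or orders the keys.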

How to read next logical line in python

I would like to read the next logical line from a file into python, where logical means "according to the syntax of python".
I have written a small command which reads a set of statements from a file, and then prints out what you would get if you typed the statements into a python shell, complete with prompts and return values. Simple enough -- read each line, then eval. Which works just fine, until you hit a multi-line string.
I'm trying to avoid doing my own lexical analysis.
As a simple example, say I have a file containing
2 + 2
I want to print
>>> 2 + 2
4
and if I have a file with
"""Hello
World"""
I want to print
>>> """Hello
... World"""
'Hello\nWorld'
The first of these is trivial -- read a line, eval, print. But then I need special support for comment lines. And now triple quotes. And so on.
You may want to take a look at the InteractiveInterpreter class from the code module.
The runsource() method shows how to deal with incomplete input.
Okay, so resi had the correct idea. Here is my trivial code which does the job.
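#!/usr/bin/env python3
import sys
import code
class Shell(code.InteractiveConsole):
    def write(self, data):
        # send error messages (tracebacks) to stdout instead of stderr
        sys.stdout.write(data)
cons = Shell()
file_contents = sys.stdin
prompt = ">>> "
for line in file_contents:
    print(prompt + line, end='')
    # push() wants the line without its newline; keep the indentation intact
    if cons.push(line.rstrip('\n')):
        prompt = "... "
    else:
        prompt = ">>> "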
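A Python 3 sketch of the same idea as a function that returns the transcript as a string, which makes it easy to check. It relies on the standard library's code.InteractiveConsole, whose push() returns True while the statement is still incomplete; results and tracebacks are captured by temporarily redirecting stdout and stderr. The name transcript() is hypothetical. Lines are pushed with their indentation intact, since stripping leading whitespace would break indented blocks.

```python
import code
import contextlib
import io


def transcript(source):
    """Replay `source` as if typed at the REPL; return prompts, input and results."""
    console = code.InteractiveConsole()
    out = io.StringIO()
    prompt = ">>> "
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(out):
        for line in source.splitlines():
            out.write(prompt + line + "\n")
            # push() returns True while the statement is incomplete
            more = console.push(line)
            prompt = "... " if more else ">>> "
    return out.getvalue()


print(transcript('2 + 2\n"""Hello\nWorld"""\n'))
```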

Python not splitting CRLF correctly

I'm writing a script to convert very simple function documentation to XML in python. The format I'm using would convert:
date_time_of(date) Returns the time part of the indicated date-time value, setting the date part to 0.
to:
<item name="date_time_of">
<arg>(date)</arg>
<help> Returns the time part of the indicated date-time value, setting the date part to 0.</help>
</item>
So far it works great (the XML I posted above was generated from the program) but the problem is that it should be working with several lines of documentation pasted, but it only works for the first line pasted into the application. I checked the pasted documentation in Notepad++ and the lines did indeed have CRLF at the end, so what is my problem?
Here is my code:
mainText = input("Enter your text to convert:\r\n")
try:
    for line in mainText.split('\r\n'):
        name = line.split("(")[0]
        arg = line.split("(")[1]
        arg = arg.split(")")[0]
        hlp = line.split(")",1)[1]
        print('<item name="%s">\r\n<arg>(%s)</arg>\r\n<help>%s</help>\r\n</item>\r\n' % (name,arg,hlp))
except:
    print("Error!")
Any idea of what the issue is here?
Thanks.
input() only reads one line.
Try this. Enter a blank line to stop collecting lines.
lines = []
while True:
    line = input('line: ')
    if line:
        lines.append(line)
    else:
        break
print(lines)
The best way to handle reading lines from standard input (the console) is to iterate over the sys.stdin object. Rewritten to do this, your code would look something like this:
from sys import stdin
try:
    for line in stdin:
        line = line.rstrip('\r\n')  # iterating over stdin keeps the line ending
        name = line.split("(")[0]
        arg = line.split("(")[1]
        arg = arg.split(")")[0]
        hlp = line.split(")",1)[1]
        print('<item name="%s">\r\n<arg>(%s)</arg>\r\n<help>%s</help>\r\n</item>\r\n' % (name,arg,hlp))
except IndexError:  # a line without "(" or ")"
    print("Error!")
That said, it's worth noting that your parsing code could be significantly simplified with a little help from regular expressions. Here's an example:
import re, sys
for line in sys.stdin:
    result = re.match(r"(.*?)\((.*?)\)(.*)", line)
    if result:
        name = result.group(1)
        arg = result.group(2)
        hlp = result.group(3)
        print('<item name="%s">\r\n<arg>(%s)</arg>\r\n<help>%s</help>\r\n</item>\r\n' % (name,arg,hlp))
    else:
        print("There was an error parsing this line: '%s'" % line.rstrip('\n'))
I hope this helps you simplify your code.
Patrick Moriarty,
You didn't specifically mention the console, and your main concern seems to be passing several lines at once to be processed. I could reproduce your problem in only one way: running the program in IDLE, manually copying several lines from a file and pasting them into raw_input().
Trying to understand your problem led me to the following facts:
when data is copied from a file and pasted into raw_input(), the \r\n newlines are transformed into \n, so the string returned by raw_input() no longer contains \r\n. Hence split('\r\n') has nothing to split on in this string
when data containing isolated \r and \n characters is pasted into a Notepad++ window with the display of special characters turned on, CR LF symbols appear at the end of every line, even where there was only a lone \r or \n. Hence, using Notepad++ to check the kind of newlines leads to a wrong conclusion
.
The first fact is the cause of your problem. I don't know the underlying reason for this transformation of data copied from a file and passed to raw_input(), which is why I posted a question on Stack Overflow:
Strange vanishing of CR in strings coming from a copy of a file's content passed to raw_input()
The second fact is responsible for your confusion and despair: you had no chance of spotting the problem that way.
.
So, how do you solve your problem?
Here's code that reproduces the problem. Note the modified algorithm, which uses str.partition() instead of your repeated splits on each line.
# Python 2 (print statements, raw_input), as run in IDLE
ch = "date_time_of(date) Returns the time part.\r\n"+\
     "divmod(a, b) Returns quotient and remainder.\r\n"+\
     "enumerate(sequence[, start=0]) Returns an enumerate object.\r\n"+\
     "A\rB\nC"
with open('funcdoc.txt','wb') as f:
    f.write(ch)
print "Having just recorded the following string in a file named 'funcdoc.txt' :\n"+repr(ch)
print "open 'funcdoc.txt' to manually copy its content, and paste it on the following line"
mainText = raw_input("Enter your text to convert:\n")
print "OK, the content of 'funcdoc.txt' has been pasted"
print "\nrepr(mainText)==",repr(mainText)
try:
    for line in mainText.split('\r\n'):
        name,_,arghelp = line.partition("(")
        arg,_,hlp = arghelp.partition(") ")
        print('<item name="%s">\n<arg>(%s)</arg>\n<help>%s</help>\n</item>\n' % (name,arg,hlp))
except:
    print("Error!")
.
Here's the solution suggested by delnan: "read from the source instead of having a human copy and paste it."
It works with your split('\r\n'):
# Python 2 (print statements)
ch = "date_time_of(date) Returns the time part.\r\n"+\
     "divmod(a, b) Returns quotient and remainder.\r\n"+\
     "enumerate(sequence[, start=0]) Returns an enumerate object.\r\n"+\
     "A\rB\nC"
with open('funcdoc.txt','wb') as f:
    f.write(ch)
print "Having just recorded the following string in a file named 'funcdoc.txt' :\n"+repr(ch)
#####################################
with open('funcdoc.txt','rb') as f:
    mainText = f.read()
print "\nfile 'funcdoc.txt' has just been opened and its content read into mainText"
print "\nrepr(mainText)==",repr(mainText)
print
try:
    for line in mainText.split('\r\n'):
        name,_,arghelp = line.partition("(")
        arg,_,hlp = arghelp.partition(") ")
        print('<item name="%s">\n<arg>(%s)</arg>\n<help>%s</help>\n</item>\n' % (name,arg,hlp))
except:
    print("Error!")
.
Finally, here's Python's own way of handling the altered pasted copy: the splitlines() method, which treats every kind of newline (including \r, \n and \r\n) as a separator. So replace
for line in mainText.split('\r\n'):
by
for line in mainText.splitlines():
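A minimal Python 3 sketch of the whole conversion using splitlines() and the partition()-based parsing from the answer above. The pasted string is a hypothetical example mimicking text whose \r\n line endings were turned into \n on paste, and to_xml() is a hypothetical helper name.

```python
def to_xml(text):
    """Convert 'name(args) help' lines to <item> elements."""
    items = []
    # splitlines() splits on \r\n, \r and \n alike, so it does not matter
    # how the pasted text's line endings were converted
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, arghelp = line.partition("(")
        arg, _, hlp = arghelp.partition(") ")
        items.append('<item name="%s">\n<arg>(%s)</arg>\n<help>%s</help>\n</item>'
                     % (name, arg, hlp))
    return "\n".join(items)


pasted = "date_time_of(date) Returns the time part.\ndivmod(a, b) Returns quotient and remainder.\n"
print(pasted.split("\r\n"))  # a single element: there is no \r\n to split on
print(pasted.splitlines())   # two lines
print(to_xml(pasted))
```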
