I've got the following line in my .procmailrc on an SMTP server:
BODY=`formail -I ""`
Later I echo this body to a local file:
echo "$BODY" >> $HOME/$FILENAME; \
I've also tried printf (with the same result):
printf "$BODY" >> $HOME/$FILENAME; \
When I read this file I can see that the encoding has been changed. Here's what I get:
Administrator System=C3=B3w
while it should be (in Polish):
Administrator Systemów
How can I decode/encode the body, either directly in .procmailrc or later (bash/python), to get the right string?
Another line in my .procmailrc works properly, but it needs an additional pipe through a Perl encoder:
SUBJECT=`formail -xSubject: | tr -d '\n' | sed -e 's/^ //' | /usr/bin/perl -MEncode -ne 'print encode ("utf8",decode ("MIME-Header",$_ )) '`
SUBJECT contains UTF-8 characters and everything looks OK. Maybe there's a way to use a similar solution for the body of the mail?
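(For reference, a hedged Python equivalent of that Perl MIME-Header step; the encoded subject string below is a made-up sample, not taken from the question:)

from email.header import decode_header

# hypothetical RFC 2047 encoded subject, for illustration only
raw = '=?UTF-8?Q?Administrator_System=C3=B3w?='
text, charset = decode_header(raw)[0]
print(text.decode(charset))  # Administrator Systemów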
OK.
I finally got everything up and running. Here's what I did:
First the .procmailrc file:
VERBOSE=yes
LOGFILE=$HOME/procmail.log
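# the f flag below makes procmail treat the program as a filter:
# the whole message (headers and body) arrives on the script's stdin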
:0f
* ^From.*(some_address#somedomain.com)
| $HOME/python_script.py
Now to the python_script.py:
#!/usr/bin/python
from email.parser import Parser
import sys

# parse the full message that procmail feeds us on stdin
message = Parser().parse(sys.stdin)

temp_file = open("/home/(user)/file.txt", "w")
temp_file.write("START\n")
if not message.is_multipart():
    temp_file.write(message.get_payload(decode=True))
else:
    for part in message.get_payload():
        if part.get_content_type() == 'text/plain':
            temp_file.write(part.get_payload(decode=True))
temp_file.close()
The most difficult part to debug was the .procmailrc recipe, where I had to test many options for :0, :0f, :0fbW etc., until I finally found the one that fit best.
The next problematic step was decoding the $BODY part directly in .procmailrc. I solved it by getting rid of all that and moving everything into the Python script, just as tripleee suggested.
The encoding is not changed; rather, you are zapping the headers, so the correct Content-Type: header is no longer present (you should also keep Mime-Version: and any other standard Content-* headers).
You should see, by examining the source of the message in your mail client, that Procmail or Bash have actually not changed anything. The text you receive is in fact literally Administrator System=C3=B3w but the MIME headers inform your email client that this is Content-Transfer-Encoding: quoted-printable and Content-type: text/plain; charset="utf-8" and so it knows how to decode and display this correctly.
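(To illustrate, the quoted-printable/UTF-8 round trip is easy to reproduce with Python's standard quopri module:)

import quopri

raw = "Administrator System=C3=B3w"                # quoted-printable, as received
print(quopri.decodestring(raw).decode("utf-8"))    # Administrator Systemów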
If you want just the payload, you will need to decode it yourself, but in order to do that, you need this information from the MIME headers, so you should not kill them before you have handled the message (if at all). Something like this, perhaps:
from email.parser import Parser
import sys
message = Parser().parse(sys.stdin)
if message['content-type'].lower().startswith('text/'):
    print(message.get_payload(decode=True))
else:
    raise DieScreamingInAnguish('aaaargh!')  # pseudo-pseudocode
This is extremely simplistic in that it assumes (like your current, even more broken solution) that the message contains a single, textual part. Extending it to multipart messages is not technically hard, but how exactly you do that depends on what sort of multiparts you expect to receive, and what you want to do with the payload(s).
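(A minimal sketch of the multipart case, assuming you only want the text/plain parts:)

from email.parser import Parser
import sys

message = Parser().parse(sys.stdin)
for part in message.walk():  # walk() visits the message itself, then any subparts
    if part.get_content_type() == 'text/plain':
        print(part.get_payload(decode=True))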
As in your previous question, I would suggest that you move more, or all, of your email manipulation into Python if you are going to be using it anyway. Procmail has no explicit MIME support, so you would have to reinvent all of that in Procmail, which is neither simple nor particularly fruitful.
I think your echo may not be writing the correct Unicode to your file in the first place. Here are two of many solutions that may help:
Echo with escape sequences enabled:
echo -e "$BODY" >> $HOME/$FILENAME; \
Or use iconv or similar to re-encode your file to UTF-8, assuming you have iconv on Linux:
iconv -t UTF-8 original.txt > encoded_result.txt
Related
My problem: I'm trying to send logs from a Python 3 project to fluentd via the logging module.
log = '{"#timestamp":"2020-06-18T11:52:37.391","severity":"INFO", "message":"Processing request started"}'
logging.error(json.dumps(log))
At fluentd I get such error:
pattern not matched data="<14>{"#timestamp":"2020-06-18T11:52:37.391","severity":"INFO", "message":"Processing request started"}\x00"
I see strange symbols: <14> and \x00. When I send the same string from the bash console, everything works well:
echo -n '{"#timestamp":"2020-06-18T11:52:37.391","severity":"INFO", "message":"Processing request started"}' > /dev/udp/HOST/PORT
It looks like an encoding problem, but I can't figure out how to fix this error in Python.
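(For comparison, the bash /dev/udp one-liner corresponds to this plain-socket send in Python; HOST and PORT are placeholders, as in the question:)

import socket

HOST, PORT = "fluentd.example.org", 24224  # placeholders; use your real endpoint
msg = '{"#timestamp":"2020-06-18T11:52:37.391","severity":"INFO", "message":"Processing request started"}'
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(msg.encode('utf-8'), (HOST, PORT))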
I am using Python 3.x on Linux (Ubuntu) to read piped terminal content and save it to a database. So far everything works great! But there is a thing I don't really understand: it seems to skip some of the newlines.
This is how I use my script (simple example):
sudo ufw status verbose | python3 myscript.py
This is how the output looks directly in the terminal (without the pipe):
Status: active
Logging: on (low)
Default: deny (incoming), allow (outgoing), disabled (routed)
New profiles: skip
This is how it looks after stdin reads it:
Status: active
Logging: on (low) Default: deny (incoming), allow (outgoing), disabled (routed) New profiles: skip
This is how my code that reads from stdin looks:
if not sys.stdin.isatty():
    ldata = ""
    for line in sys.stdin:
        ldata += line + "\n"
As shown above, Python seems to skip some of the "\n". Call me pathetic, but this kind of behavior bothers me. With very long strings/content, it makes the text almost unreadable.
thx in advance!
I'm trying to make some functions in Python so that I can connect to a Linux terminal and do stuff (like, in this case, create a file). The code I have works partially. The only thing that doesn't work is doing something after you have entered the content. For instance, you create the file and then want to navigate somewhere else (cd /tmp). Instead of running the next command, it just gets appended to the created file.
def create_file(self, name, contents, location):
    try:
        log.info("Creating a file...")
        self.device.execute("mkdir -p {}".format(location))
        self.cd_path(location)
        self.device.sendline("cat > {}".format(name))
        self.device.sendline("{}".format(contents))
        self.device.sendline("EOF")  # meant to send CTRL+D to save and exit; I also tried ^D here
    except:
        log.info("Failed to create the file!")
The resulting contents of the file are:
cat test.txt
#!/bin/bash
echo "Fail Method Requested"
exit 1
EOF
ls -d /tmp/asdasd
The order of commands executed is:
execute.create_file(test.txt, the_message, the_location)
execute.check_path("/tmp/adsasd") #this function just checks with ls -d if the directory exists.
I have tried with sendline the following combinations:
^D, EOF, <<EOF
I don't really understand how I could make this happen. I just want to create a file with a specific message. (When researching how to do this with vi I hit the same problem, but there the command I needed was the one for ESC.)
If anyone could help with some input that would be great!!
Edit: As Rob mentioned below, sending the character "\x04" actually works. For anyone else having this issue, you can also consult this chart for other combinations if needed:
http://donsnotes.com/tech/charsets/ascii.html
You probably need to send the EOF character, which is typically CONTROL-D, not the three characters E, O, and F.
self.device.sendline("\x04")
http://wiki.bash-hackers.org/syntax/redirection#here_documents
Heredocs allow you to use any input-termination string you like to represent end of file (such as the literal EOF you're attempting to use now). Quoting that string tells the shell not to interpret expansions inside the heredoc content, ensuring that the content is treated as literal.
Using pipes.quote() here ensures that filenames with literal quotes, $s, spaces, or other surprising characters won't break your script. (Of course, you'll need to import pipes; on Python 3, by contrast, this has moved to shlex.quote()).
self.device.sendline("cat > {} <<'EOF'".format(pipes.quote(name)))
Then you can send EOF as-is, having told bash to interpret it as the end of the file input.
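(Putting it together, a minimal sketch inside create_file from the question; self.device and its sendline() are assumed to behave as shown there:)

import pipes  # use shlex.quote() instead on Python 3

self.device.sendline("cat > {} <<'EOF'".format(pipes.quote(name)))
self.device.sendline(contents)
self.device.sendline("EOF")  # the quoted heredoc delimiter ends the input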
One way of checking whether a file has been modified is to calculate and store a hash (or checksum) for it. Then, at any point, the hash can be recalculated and compared against the stored value.
I'm wondering if there is a way to store the hash of a file in the file itself. I'm thinking of text files.
The algorithm to calculate the hash would have to be iterative and take into account that the hash itself will be added to the file being hashed... makes sense? Is anything like that available?
Thanks!
edit:
https://security.stackexchange.com/questions/3851/can-a-file-contain-its-md5sum-inside-it
from Crypto.Hash import HMAC

secret_key = "Don't tell anyone"
h = HMAC.new(secret_key)
text = "whatever you want in the file"
## or: text = open("your_file_without_hash_yet").read()
h.update(text)
with open("file_with_hash", "w") as fh:  # note: opened for writing
    fh.write(text)
    fh.write(h.hexdigest())
Now, as some people tried to point out (though they seemed confused), you need to remember that this file has the hash on the end of it, and that the hash itself is not part of what gets hashed. So when you want to check the file, you would do something along these lines:
end_len = len(h.hexdigest())
all_text = open("file_with_hash").read()
# the hash is the *last* end_len characters; everything before it is the text
text, expected_hmac = all_text[:-end_len], all_text[-end_len:]
h = HMAC.new(secret_key)
h.update(text)
if h.hexdigest() != expected_hmac:
    raise Exception("Somebody messed with your file!")
It should be clear though that this alone doesn't ensure your file hasn't been changed; the typical use case is to encrypt your file, but take the hash of the plaintext. That way, if someone changes the hash (at the end of the file) or tries changing any of the characters in the message (the encrypted portion), things will mismatch and you will know something was changed.
A malicious actor won't be able to change the file AND fix the hash to match, because they would need to change some data and then rehash everything with your secret key. As long as no one knows your secret key, they won't know how to recreate the correct hash.
This is an interesting question. You can do it if you adopt a proper convention for hashing and verifying the integrity of the files. Suppose you have this file, namely, main.py:
#!/usr/bin/env python
# encoding: utf-8
print "hello world"
Now, you could append an SHA-1 hash to the python file as a comment:
(printf '#'; cat main.py | sha1sum) >> main.py
Updated main.py:
#!/usr/bin/env python
# encoding: utf-8
print "hello world"
#30e3b19d4815ff5b5eca3a754d438dceab9e8814 -
Hence, to verify if the file was modified you can do this in Bash:
if [ "$(printf '#';head -n-1 main.py | sha1sum)" == "$(tail -n1 main.py)" ]
then
echo "Unmodified"
else
echo "Modified"
fi
Of course, someone could try to fool you by changing the hash string manually. To stop these bad guys, you can improve the system by mixing a secret string into the hashed content before adding the hash as the last line.
Improved version
Append the hash as the last line, mixing in your secret string:
(printf '#'; (cat main.py; echo 'MyUltraSecretTemperString12345') | sha1sum) >> main.py
To check whether the file was modified:
if [ "$(printf '#'; (head -n-1 main.py; echo 'MyUltraSecretTemperString12345') | sha1sum)" == "$(tail -n1 main.py)" ]
then
    echo "Unmodified"
else
    echo "Modified"
fi
With this improved version, the bad guys can only fool you if they find your ultra-secret string first.
EDIT: This is a rough implementation of the keyed-hash message authentication code (HMAC).
Well, although it looks like a strange idea, this could be an application of a little-used but very powerful feature of the Windows NTFS file system: file streams.
It allows you to add many streams to a file without changing the content of the default stream. For example:
echo foo > foo.text
echo bar > foo.text:alt
type foo.text
=> foo
more < foo.text:alt
=> bar
But when listing the directory, you can only see one single file: foo.text
So in your use case, you could write the hash of the main stream into a stream named hash, and later compare the content of that stream with a freshly computed hash of the main stream.
Just a remark: for a reason I do not know, type foo.text:alt generates the following error:
"The filename, directory name, or volume label syntax is incorrect."
That's why my example uses more <, as recommended on the Using Streams page on MSDN.
So assuming you have a myhash function that gives the hash for a file (you can easily build one by using the hashlib module):
import hashlib

def myhash(filename):
    # compute a hex digest of the file contents (SHA-1, for example)
    with open(filename, "rb") as fd:
        return hashlib.sha1(fd.read()).hexdigest()
You can do:
def store_hash(filename):
    hash_string = myhash(filename)
    with open(filename + ":hash", "w") as fd:  # "w": write into the alternate stream
        fd.write(hash_string)

def compare_hash(filename):
    hash_string = myhash(filename)
    with open(filename + ":hash") as fd:
        orig = fd.read()
    return hash_string == orig
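(Usage, on an NTFS volume, would then be as simple as:)

store_hash("foo.text")
print(compare_hash("foo.text"))  # True until the main stream is modified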
After a few days of poring over Stack Overflow and the Python 2.7 docs, I have come to no conclusion about this.
Basically, I'm running a Python script on a Windows server that takes a block of text as input. This block of text (unfortunately) has to be passed through a pipe. Something like:
PS > [something_that_outputs_text] | python .\my_script.py
So the problem is:
The server uses cp1252 encoding and I really cannot change that, due to administrative regulations and whatnot. And when I pipe the text to my Python script and read it, it already arrives with ? where characters like \xe1 should be.
What I have done so far:
Tested with UTF-8. Yep, chcp 65001 and $OutputEncoding = [Console]::OutputEncoding "solve it", in that Python gets the text perfectly and I can then decode it to Unicode etc. But apparently they won't let me do that on the server /sadface.
A little script to test what the hell is happening:
import codecs
import sys

def main(argv=None):
    if argv is None:
        argv = sys.argv
    if len(argv) > 1:
        for arg in argv[1:]:
            print arg.decode('cp1252')
    sys.stdin = codecs.getreader('cp1252')(sys.stdin)
    text = sys.stdin.read().strip()
    print text
    return 0

if __name__ == "__main__":
    sys.exit(main())
I tried it both with the codecs wrapper and without it.
My input & output:
PS > echo "Blá" | python .\testinput.py blé
blé
Bl?
--> So there's no problem with the argument (blé) but the piped text (Blá) is no good :(
I even converted the text string to hex and, yes, it gets flooded with 3f (AKA mr ?), so it's not a problem with the print.
[Also: it's my first question here... feel free to ask for more info about what I did.]
EDIT
I don't know if this is relevant or not, but when I do sys.stdin.encoding it yields None
Update: So... I have no problems with cmd. I checked sys.stdin.encoding while running the program in cmd and everything went fine. I think my head just exploded.
How about saving the data to a file and piping it to Python in a CMD session? Invoke PowerShell and Python from CMD, like so:
c:\>powershell -command "c:\genrateDataForPython.ps1 -output c:\data.txt"
c:\>type c:\data.txt | python .\myscript.py
Edit
Another idea: convert the data to Base64 in PowerShell and decode it in Python. Base64 is simple in PowerShell, and I guess it isn't hard in Python either. Like so:
# Convert some accent chars to base64
$s = [Text.Encoding]::UTF8.GetBytes("éêèë")
[System.Convert]::ToBase64String($s)
# Output:
w6nDqsOow6s=
# Decode:
$d = [System.Convert]::FromBase64String("w6nDqsOow6s=")
[Text.Encoding]::UTF8.GetString($d)
# Output
éêèë
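(A minimal sketch of the Python side, assuming the Base64 string arrives on stdin; Python 2, to match the question:)

import base64
import sys

raw = sys.stdin.read().strip()  # Base64 is plain ASCII, so it survives any codepage
data = base64.b64decode(raw)    # back to the original UTF-8 bytes
text = data.decode('utf-8')     # a proper unicode object: u'\xe9\xea\xe8\xeb'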