I am currently reading objects from S3 and saving them in a DataFrame.
S3 objects are read in as bytes. However, even after decoding, the byte-string notation (b'...') is still present inside my string.
I am unable to decode the string using example_string.decode().
A related problem is finding emojis in the text. They are stored as UTF-8, and because they were saved as a byte string inside a string, extra backslashes (\) and similar characters appear.
I want just the plain string, with no embedded byte-string notation in any form.
Any help would be appreciated.
for next_val in bucket:
    # Read the object body (bytes) and decode it to text
    current_file = next_val.get()['Body'].read().decode('utf-8')
    split_file = current_file.split(']')
    for tweet in split_file:
        a = tweet.split(',')
        if len(a) == 10:
            a[0] = a[0][2:12]
            new_row = {'date':a[0], 'tweet':a[1], 'user':a[2], 'cashtags':a[3],'number_cashtags':a[4],'Hashtags':a[5],'number_hashtags':a[6],'quoted_tweet':a[7],'urs_present':a[8],'spam':a[9]}
            # Note: DataFrame.append was removed in pandas 2.0
            df = df.append(new_row, ignore_index=True)
Example of a line in the S3 bucket:
["2021-01-06 13:41:48", "Q1 2021 Earnings Estimate for The Walt Disney Company $DIS Issued By Truist Securiti https://t co/l5VSCCCgDF #stocks", "b'AmericanBanking'", "$DIS", "1", "#stocks'", "1", "False", "1", "0"]
Even though the item is a string, it still contains the leading b and the quotes, most likely because the bytes object was converted with str() before it was saved. A small helper can keep only what is inside the quotes:
def bytes_to_string(b):
    # Drop the leading b' and the trailing '
    return str(b)[2:-1]
EDIT: You could technically use regexes to do this, but this is a much more readable (and shorter) way of doing it.
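If the file really contains JSON arrays like the sample line, another option is to parse each line with json.loads and turn any field that looks like a bytes literal back into text with ast.literal_eval, which also resolves escapes such as \xf0. This is only a sketch: the helper name clean_field and the shortened sample line are made up for illustration, and it assumes each such field is a valid Python bytes literal holding UTF-8 data.

```python
import ast
import json


def clean_field(value):
    """Turn a string such as "b'abc'" into 'abc'; leave other strings alone."""
    looks_like_bytes = (
        len(value) >= 3
        and value[0] == "b"
        and value[1] in "'\""
        and value[-1] == value[1]
    )
    if looks_like_bytes:
        try:
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return value  # looked like a bytes literal but was not one
        if isinstance(parsed, bytes):
            return parsed.decode("utf-8", errors="replace")
    return value


# Shortened version of the sample line (hypothetical test data)
line = '["2021-01-06 13:41:48", "b\'AmericanBanking\'", "$DIS", "1"]'
fields = [clean_field(f) for f in json.loads(line)]
print(fields)

# A UTF-8 emoji stored as an escaped bytes literal becomes one character
emoji = clean_field("b'\\xf0\\x9f\\x98\\x80'")
print(emoji == "\U0001F600")
```

Unlike slicing with [2:-1], this decodes the escaped bytes, so emojis come back as real characters rather than as \x sequences.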
When the arguments of subprocess.run() are passed with double quotes, an extra '\' character is added before and after the double quotes, which causes a problem in the actual .exe that the Python script calls.
import json
import subprocess

def RunWithJsonResponse(self, commandList, argumentList):
    commandList.extend(argumentList)
    process = subprocess.run(commandList, stdout=subprocess.PIPE)
    # The process already returns its output in JSON format
    Response = process.stdout.decode('utf-8')
    # Translate to a Python dictionary
    ResponseDicts = {}
    try:
        ResponseDicts = json.loads(Response)
    except ValueError:
        # Not valid JSON: fall back to the raw string
        ResponseDicts = json.loads(json.dumps(Response))
        print(Response)
    commandList.clear()
    return ResponseDicts
Example command list sent by the Python script: ['C:\\Users\\srw.exe', 'port=COM3', 'writebytes="09 78 00 00"']
The command received at the .exe end: C:\\Users\\srw.exe port=COM3 writebytes=\"09 78 00 00"\
So I would like to find a way of getting rid of this extra '\' character on the other end.
I tried the following when constructing the third argument: 'writebytes=' + ' \" ' + 09 78 00 00 + ' \" ', but had no luck.
Maybe try escaping the quotes with \ in the input to subprocess, that is, in commandList in the first place.
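It may also help to look at what Python builds for Windows. When the command is a list, subprocess quotes each element itself, so embedded double quotes get escaped with a backslash. The sketch below uses subprocess.list2cmdline to compare the two forms; it is an internal helper of the module (not listed in its public API), so treat it as a debugging aid only. Whether srw.exe accepts the value without the inner quotes is an assumption that would need checking against that program.

```python
import subprocess

# Argument with embedded double quotes, as in the question
with_quotes = ['C:\\Users\\srw.exe', 'port=COM3', 'writebytes="09 78 00 00"']
# Same argument without the inner quotes; the list element itself already
# marks where the argument begins and ends
without_quotes = ['C:\\Users\\srw.exe', 'port=COM3', 'writebytes=09 78 00 00']

line_with = subprocess.list2cmdline(with_quotes)
line_without = subprocess.list2cmdline(without_quotes)
print(line_with)
print(line_without)
```

In the first line the inner quotes come out as \" sequences, which matches the stray backslashes seen on the receiving side; in the second the whole argument is simply wrapped in one pair of quotes.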
I'll be receiving a JSON encoded string from Objective-C, and I am decoding a dummy string (for now) like the code below. My output comes out with character 'u' prefixing each item:
[{u'i': u'imap.gmail.com', u'p': u'aaaa'}, {u'i': u'333imap.com', u'p': u'bbbb'}...
How is JSON adding this Unicode character? What's the best way to remove it?
import json
import sys

mail_accounts = []
da = {}
try:
    s = '[{"i":"imap.gmail.com","p":"aaaa"},{"i":"imap.aol.com","p":"bbbb"},{"i":"333imap.com","p":"ccccc"},{"i":"444ap.gmail.com","p":"ddddd"},{"i":"555imap.gmail.com","p":"eee"}]'
    jdata = json.loads(s)
    for d in jdata:
        for key, value in d.iteritems():
            if key not in da:
                da[key] = value
            else:
                da = {}
                da[key] = value
        mail_accounts.append(da)
except Exception, err:
    sys.stderr.write('Exception Error: %s' % str(err))
print mail_accounts
The u- prefix just means that you have a Unicode string. When you really use the string, it won't appear in your data. Don't be thrown by the printed output.
For example, try this:
print mail_accounts[0]["i"]
You won't see a u.
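The same distinction can be seen in Python 3, where the u prefix is gone but the quotes still appear when a container is printed. A minimal sketch, with sample data shortened from the question: printing a list or dict shows the repr() of each element, while printing a single string shows only its contents.

```python
import json

mail_accounts = json.loads('[{"i": "imap.gmail.com", "p": "aaaa"}]')

# Printing the container uses repr() on each key and value, hence the quotes
# (and, on Python 2, the u prefix)
container_text = str(mail_accounts)
print(container_text)

# Printing one value uses str(), which is just the text itself
value_text = str(mail_accounts[0]["i"])
print(value_text)
```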
Everything is cool, man. The 'u' is a good thing, it indicates that the string is of type Unicode in python 2.x.
http://docs.python.org/2/howto/unicode.html#the-unicode-type
The d3 print below is the one you are looking for (which is the combination of dumps and loads) :)
Having:
import json
d = """{"Aa": 1, "BB": "blabla", "cc": "False"}"""
d1 = json.loads(d) # Produces a dictionary out of the given string
d2 = json.dumps(d) # Produces a string out of a given dict or string
d3 = json.dumps(json.loads(d)) # 'dumps' gets the dict from 'loads' this time
print "d1: " + str(d1)
print "d2: " + d2
print "d3: " + d3
Prints:
d1: {u'Aa': 1, u'cc': u'False', u'BB': u'blabla'}
d2: "{\"Aa\": 1, \"BB\": \"blabla\", \"cc\": \"False\"}"
d3: {"Aa": 1, "cc": "False", "BB": "blabla"}
Those 'u' characters prefixed to the strings signify that they are Unicode strings.
If you want to remove those 'u' characters from your object, you can do this:
import json, ast
jdata = ast.literal_eval(json.dumps(jdata)) # Removing uni-code chars
Let's check it in the Python shell:
>>> import json, ast
>>> jdata = [{u'i': u'imap.gmail.com', u'p': u'aaaa'}, {u'i': u'333imap.com', u'p': u'bbbb'}]
>>> jdata = ast.literal_eval(json.dumps(jdata))
>>> jdata
[{'i': 'imap.gmail.com', 'p': 'aaaa'}, {'i': '333imap.com', 'p': 'bbbb'}]
Unicode is an appropriate type here. The JSONDecoder documentation describes the conversion table and states that JSON string objects are decoded into Unicode objects.
From 18.2.2. Encoders and Decoders:
JSON             Python
==================================
object           dict
array            list
string           unicode
number (int)     int, long
number (real)    float
true             True
false            False
null             None
"encoding determines the encoding used to interpret any str objects decoded by this instance (UTF-8 by default)."
The u prefix means that those strings are unicode rather than 8-bit strings. The best way to not show the u prefix is to switch to Python 3, where strings are unicode by default. If that's not an option, the str constructor will convert from unicode to 8-bit, so simply loop recursively over the result and convert unicode to str. However, it is probably best just to leave the strings as unicode.
I kept running into this problem when trying to capture JSON data in the log with the Python logging library, for debugging and troubleshooting purposes. Getting the u character is a real nuisance when you want to copy the text and paste it into your code somewhere.
As everyone will tell you, this is because it is a Unicode representation, and it could come from the fact that you’ve used json.loads() to load in the data from a string in the first place.
If you want the JSON representation in the log, without the u prefix, the trick is to use json.dumps() before logging it out. For example:
import json
import logging
# Prepare the data
json_data = json.loads('{"key": "value"}')
# Log normally and get the Unicode indicator
logging.warning('data: {}'.format(json_data))
# Output: WARNING:root:data: {u'key': u'value'}
# Dump to a string before logging and get clean output!
logging.warning('data: {}'.format(json.dumps(json_data)))
# Output: WARNING:root:data: {"key": "value"}
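On Python 3 the u prefix no longer shows up, but json.dumps() is still worth using before logging, because the default string form of a dict uses single quotes and is not valid JSON. A small sketch with the same sample data (the variable names as_repr and as_json are just for illustration):

```python
import json

json_data = json.loads('{"key": "value"}')

as_repr = str(json_data)         # Python's own dict representation
as_json = json.dumps(json_data)  # JSON text that can be pasted or re-parsed

print(as_repr)
print(as_json)

# Only the json.dumps() form round-trips through json.loads()
round_tripped = json.loads(as_json)
```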
Try encoding the individual strings (the list items are dicts, so encode their values):
mail_accounts[0]["i"].encode("ascii")
Just convert the list to a string and replace the u' with a single quote...
print (str(mail_accounts).replace("u'", "'"))
I have this very simple Python code:
in_data = "eNrtmD1Lw0AY..."
print("Input: " + in_data)
out_data = in_data.decode('base64').decode('zlib').encode('zlib').encode('base64')
print("Output: " + out_data)
It outputs:
Input: eNrtmD1Lw0AY...
Output: eJztmE1LAkEY...
The string is also correctly decoded; if I display in_data.decode('base64').decode('zlib'), it gives the expected result.
Also, the formatting is different for both strings:
Why is the decoding/encoding not working properly? Are there some sort of parameters I should use?
Your data on input starts with the hex bytes 78 DA, your output starts with 78 9C:
>>> 'eNrt'.decode('base64').encode('hex')[:4]
'78da'
>>> 'eJzt'.decode('base64').encode('hex')[:4]
'789c'
DA is the highest compression level, 9C is the default. See What does a zlib header look like?
Rather than use .encode('zlib'), use the zlib.compress() function and set the level to 9:
import zlib
zlib.compress(decoded_data, 9).encode('base64')
The output of the base64 encoding inserts a newline every 76 characters to make it suitable for MIME encapsulation (emailing). You could use the base64.b64encode() function instead to encode without newlines.
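For reference, the 'base64' and 'zlib' string codecs used above only work this way on Python 2. A rough Python 3 equivalent, using the base64 and zlib modules directly, might look like the sketch below. The payload is made-up sample data, and the exact compressed bytes can vary between zlib builds, so only the two header bytes are inspected.

```python
import base64
import zlib

payload = b"example payload " * 20  # made-up sample data

# Compress at level 9 and at the default level, then base64-encode
# without the line breaks that the MIME-style encoder adds
best = base64.b64encode(zlib.compress(payload, 9)).decode("ascii")
default = base64.b64encode(zlib.compress(payload)).decode("ascii")

# Show the first two bytes of each zlib stream as hex
best_header = base64.b64decode(best)[:2].hex()
default_header = base64.b64decode(default)[:2].hex()
print(best_header, default_header)

# Either variant decompresses to the same original bytes
restored = zlib.decompress(base64.b64decode(best))
```

The two encoded strings differ even though they carry the same data, which is the situation described in the question.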
Hello, I have an encrypted message. I have opened the file in Python and created a list of all the individual characters from the text document. I then want to add a key to each letter in the list.
print (chr(ord('Z') + key)) # takes the ASCII value of the letter, adds the key, then changes it back into a character
My issue is: how do I make Z + 1 equal A instead of [?
Use congruent addition!
key = 5
for i in range(26):
    print (chr((i + key) % 26 + ord('A')))
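Applied to a whole message, the modular approach could look like the sketch below. The function names are made up for illustration, and it assumes the message uses uppercase A to Z only, with every other character left unchanged.

```python
def shift_letter(letter, key):
    """Shift an uppercase letter by key positions, wrapping from Z back to A."""
    if not ("A" <= letter <= "Z"):
        return letter  # leave spaces, digits and punctuation untouched
    return chr((ord(letter) - ord("A") + key) % 26 + ord("A"))


def shift_message(message, key):
    """Apply the same shift to every character of the message."""
    return "".join(shift_letter(ch, key) for ch in message)


wrapped = shift_letter("Z", 1)
encrypted = shift_message("HELLO WORLD", 3)
decrypted = shift_message(encrypted, -3)
print(wrapped, encrypted, decrypted)
```

Because the modulo is applied after subtracting ord('A'), a negative key undoes the shift with the same function.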
As the comment says: if the result goes past Z, wrap it back around to A by subtracting 26 (the number of letters in the alphabet).
result = ord('Z') + key
if result > ord('Z'):
    result -= 26
result = chr(result)
An easy way out is to view both the message and the keys as bytes.
Then you can just perform an exclusive-or (^) for both encryption and decryption.
If you need readable output, use base64 encoding on the key and ciphertext before writing them to disk. You can use my onepad program as an example of this approach.
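A minimal sketch of the XOR idea, written for Python 3. This is not the onepad program mentioned above; the function name xor_bytes and the sample message are made up for illustration, and it assumes the key is random, exactly as long as the message, and never reused.

```python
import base64
import secrets


def xor_bytes(data, key):
    """XOR two equal-length byte strings; the same call encrypts and decrypts."""
    if len(data) != len(key):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


message = "ATTACK AT DAWN".encode("utf-8")
key = secrets.token_bytes(len(message))  # random key, used only once

ciphertext = xor_bytes(message, key)
# base64 makes the raw bytes safe to write to a text file
readable = base64.b64encode(ciphertext).decode("ascii")
print(readable)

# Decryption: decode the base64 text and XOR with the same key again
recovered = xor_bytes(base64.b64decode(readable), key).decode("utf-8")
print(recovered)
```

Working on bytes avoids the wrap-around problem entirely, since every XOR result is again a valid byte.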