How to unpack a double-digit hex file with Python msgpack?

I have a text file containing some data; among this data there is JSON packed with msgpack.
I am able to unpack it on https://toolslick.com/conversion/data/messagepack-to-json but I can't get it to work in Python.
Up to now I am trying the following:
import msgpack

def parseAndSplit(path):
    with open(path) as f:
        fContent = f.read()
        for subf in fContent.split('Payload: '):
            '''for ssubf in subf.split('DataChunkMsg'):
                print(ssubf)'''
            return subf.split('DataChunkMsg')[0]

fpath = "path/to/file"
t = parseAndSplit(fpath)
l = t.split("-")
s = ""
for i in l:
    s = s + i
print(s)
a = msgpack.unpackb(bytes(s, "UTF-8"), raw=False)
print(a)
but the output is:
Traceback (most recent call last):
File "C:/Users/Marco/PycharmProjects/codeTest/msgPack.py", line 19, in <module>
a = msgpack.unpackb(bytes(s,"UTF-8"), raw=False)
File "msgpack\_unpacker.pyx", line 202, in msgpack._cmsgpack.unpackb
msgpack.exceptions.ExtraData: unpack(b) received extra data.
9392AA6E722D736230322D3032AC4F444D44617...(string goes on)
I am quite sure that it's an encoding problem of some sort, but I am having no luck, whether in the docs or by experimenting.
Thank you very much for your attention.

I found the solution in the end:
msgpack.unpackb(bytes.fromhex(hexstring))
where hexstring is the string read from the file.
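For completeness, here is a minimal sketch of the fixed flow, assuming the payload extracted by parseAndSplit above is a dash-separated hex string (as printed by print(s)): strip the dashes, decode the hex into raw bytes, and only then unpack.
import msgpack

t = parseAndSplit(fpath)           # hex payload as text, e.g. "93-92-AA-..."
hexstring = t.replace("-", "")     # drop the dash separators
data = msgpack.unpackb(bytes.fromhex(hexstring), raw=False)
print(data)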

Related

Why can't we print a particular line of a text file with this method? [duplicate]

The main thing the code should do is open a file and compute the median. This is my code:
def medianStrat(lst):
    count = 0
    test = []
    for line in lst:
        test += line.split()
    for i in lst:
        count = count + 1
    if count % 2 == 0:
        x = count // 2
        y = lst[x]
        z = lst[x - 1]
        median = (y + z) / 2
        return median
    if count % 2 == 1:
        x = (count - 1) // 2
        return lst[x]  # Where the problem persists

def main():
    lst = open(input("Input file name: "), "r")
    print(medianStrat(lst))
Here is the error I get:
Traceback (most recent call last):
File "C:/Users/honte_000/PycharmProjects/Comp Sci/2015/2015/storelocation.py", line 30, in <module>
main()
File "C:/Users/honte_000/PycharmProjects/Comp Sci/2015/2015/storelocation.py", line 28, in main
print(medianStrat(lst))
File "C:/Users/honte_000/PycharmProjects/Comp Sci/2015/2015/storelocation.py", line 24, in medianStrat
return lst[x]
TypeError: '_io.TextIOWrapper' object is not subscriptable
I know lst[x] is causing this problem, but I'm not too sure how to solve it.
What could be the solution to this problem, or what could be done instead to make the code work?
You can't index (__getitem__) a _io.TextIOWrapper object. What you can do is work with a list of lines. Try this in your code:
lst = open(input("Input file name: "), "r").readlines()
Also, you aren't closing the file object, this would be better:
with open(input("Input file name: "), "r") as lst:
    print(medianStrat(lst.readlines()))
with ensures that the file gets closed.
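Putting the pieces together, here is a minimal sketch of a corrected version, assuming the file holds one numeric value per line; note it also converts the lines to numbers and sorts them, which the original code did not do.
def medianStrat(lst):
    nums = sorted(float(x) for x in lst)  # convert the text lines to numbers and sort them
    count = len(nums)
    if count % 2 == 0:
        x = count // 2
        return (nums[x] + nums[x - 1]) / 2
    return nums[(count - 1) // 2]

def main():
    with open(input("Input file name: "), "r") as f:
        print(medianStrat(f.readlines()))

main()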
A basic error on my end, shared in case anyone else finds it useful. The difference between data types is really important! Just because it looks like JSON doesn't mean it is JSON; I ended up on this answer after learning that the hard way.
The opened IO stream needs to be converted with Python's json.load method before it becomes a dict; otherwise it is still a string. Once it is a dict, it can be loaded into a DataFrame.
import json
import logging
import pandas as pd

def load_json():  # this function loads json and returns it as a dataframe
    with open("1lumen.com.json", "r") as io_str:
        data = json.load(io_str)
    df = pd.DataFrame.from_dict(data)
    logging.info(df.columns.tolist())
    return df

Determine if a file is "more likely" json or csv

I have a few files with generalized extensions, such as "txt" or no extension at all. I'm trying to determine in a very quick manner whether the file is json or a csv. I thought of using the magic module, but it doesn't work for what I'm trying to do. For example:
>>> import magic
>>> magic.from_file('my_json_file.txt')
'ASCII text, with very long lines, with no line terminators'
Is there a better way to determine if something is json or csv? I'm unable to load the entire file, and I want to determine it in a very quick manner. What would be a good solution here?
You can check if the file starts with either { or [ to determine if it's JSON, and you can load the first two lines with csv.reader and see if the two rows have the same number of columns to determine if it's CSV.
import csv

with open('file') as f:
    if f.read(1) in '{[':
        print('likely JSON')
    else:
        f.seek(0)
        reader = csv.reader(f)
        try:
            if len(next(reader)) == len(next(reader)) > 1:
                print('likely CSV')
        except StopIteration:
            pass
You can use the try/except technique: try to parse the data as a JSON object. Loading invalidly formatted JSON from a string raises a ValueError, which you can catch and process however you want:
>>> import json
>>> s1 = '{"test": 123, "a": [{"b": 32}]}'
>>> json.loads(s1)
If it is valid, nothing happens; if not:
>>> import json
>>> s2 = '1;2;3;4'
>>> json.loads(s2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 369, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 2 - line 1 column 8 (char 1 - 7)
So you can build a function as follows:
import json

def check_format(filedata):
    try:
        json.loads(filedata)
        return 'JSON'
    except ValueError:
        return 'CSV'
>>> check_format('{"test": 123, "a": [{"b": 32}]}')
'JSON'
>>> check_format('1;2;3;4')
'CSV'
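If loading the whole file is not an option, a variant of the first-character idea is to read only a small sample from the start of the file; this is a minimal sketch, where the helper name check_format_from_path and the 4096-byte sample size are arbitrary choices.
def check_format_from_path(path, sample_size=4096):
    with open(path) as f:
        sample = f.read(sample_size).lstrip()  # peek at the start only
    if sample.startswith(('{', '[')):
        return 'JSON'
    return 'CSV'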

How to apply regex sub to a csv file in python

I have a CSV file I wish to apply a regex replacement to with Python.
So far I have the following:
reader = csv.reader(open('ffrk_inventory_relics.csv', 'r'))
writer = csv.writer(open('outfile.csv','w'))
for row in reader:
    reader = re.sub(r'\+', 'z', reader)
Which is giving me the following error:
Script error: Traceback (most recent call last):
File "ffrk_inventory_tracker_v1.6.py", line 22, in response
getRelics(data['equipments'], 'ffrk_inventory_relics')
File "ffrk_inventory_tracker_v1.6.py", line 72, in getRelics
reader = re.sub(r'\+','z',reader)
File "c:\users\baconcatbug\appdata\local\programs\python\python36\lib\re.py",
line 191, in sub
return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
After googling with not much luck, I would like to ask the community here how to open the CSV file correctly so I can use re.sub on it and then write the altered CSV back out to the same filename.
csv.reader(open('ffrk_inventory_relics.csv', 'r')) is creating a list of lists, and when you iterate over it and pass each value to re.sub, you are passing a list, not a string. Try this:
import re
import csv
final_data = [[re.sub('\+', 'z', b) for b in i] for i in csv.reader(open('ffrk_inventory_relics.csv', 'r'))]
write = csv.writer(open('ffrk_inventory_relics.csv', 'w'))
write.writerows(final_data)
If you don't need csv you can use replace with regular open:
with open('ffrk_inventory_relics.csv', 'r') as reader, open('outfile.csv','w') as writer:
    for row in reader:
        writer.write(row.replace('+', 'z'))
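If the goal is to write the result back to the same filename, one option (a sketch, where the .tmp suffix is an arbitrary choice) is to write to a temporary file first and then swap it in:
import os
import re

src = 'ffrk_inventory_relics.csv'
tmp = src + '.tmp'

with open(src, 'r') as reader, open(tmp, 'w') as writer:
    for row in reader:
        writer.write(re.sub(r'\+', 'z', row))

os.replace(tmp, src)  # replace the original file with the edited copy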

python prettytable module raise Could not determine delimiter error for valid csv file

I'm trying to use the prettytable module to print out data from a CSV file, but it fails with a "Could not determine delimiter" error even though the CSV file is valid:
>>> import prettytable
>>> with file("/tmp/test.csv") as f:
... prettytable.from_csv(f)
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "build/bdist.linux-x86_64/egg/prettytable.py", line 1337, in from_csv
File "/usr/lib/python2.7/csv.py", line 188, in sniff
raise Error, "Could not determine delimiter"
_csv.Error: Could not determine delimiter
The CSV file:
input_gps,1424185824460,1424185902788,1424185939525,1424186019313,1424186058952,1424186133797,1424186168766,1424186170214,1424186246354,1424186298434,1424186376789,1424186413625,1424186491453,1424186606143,1424186719394,1424186756366,1424186835829,1424186948532,1424187107293,1424187215557,1424187250693,1424187323097,1424187358989,1424187465475,1424187475824,1424187476738,1424187548602,1424187549228,1424187550690,1424187582866,1424187584248,1424187639923,1424187641623,1424187774567,1424187776418,1424187810376,1424187820238,1424187820998,1424187916896,1424187917472,1424187919241,1424188048340,dummy-0,dummy-1,Total
-73.958315%2C 40.815569,0.0(nan%),1.0(100%),1.0(100%),0.0(nan%),1.0(100%),1.0(100%),0.0(nan%),1.0(100%),1.0(100%),0.0(nan%),1.0(100%),1.0(100%),0.0(nan%),1.0(100%),1.0(100%),1.0(100%),1.0(100%),1.0(100%),0.0(0%),0.0(nan%),0.0(0%),0.0(nan%),0.0(0%),0.0(nan%),0.0(nan%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),13.0 (42%)
-76.932984%2C 38.992186,0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),1.0(100%),0.0(nan%),1.0(100%),0.0(nan%),1.0(100%),0.0(nan%),0.0(nan%),1.0(100%),1.0(100%),1.0(100%),1.0(100%),1.0(100%),1.0(100%),1.0(100%),1.0(100%),0.0(nan%),1.0(100%),1.0(100%),0.0(nan%),1.0(100%),1.0(100%),0.0(nan%),1.0(100%),1.0(100%),0.0(nan%),0.0(0%),17.0 (55%)
null_input-0,0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(nan%),0.0(0%),0.0(nan%),0.0(nan%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0 (0%)
null_input-1,0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(nan%),0.0(0%),0.0(nan%),0.0(nan%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),1.0(100%),1.0 (3%)
Total,0.0(0%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),1.0(3%),1.0(3%),1.0(3%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),0.0(0%),1.0(3%),0.0(0%),0.0(0%),1.0(3%),1.0(3%),1.0(3%),1.0(3%),1.0(3%),1.0(3%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),31.0(100%)
If anyone can inform me how to work around the problem, or of other alternatives, it would be very helpful.
According to PyPI, prettytable is only at alpha level. I could not find a way to give it configuration to pass on to the csv module, so in that case you should probably read the CSV file by explicitly declaring the delimiter and build the PrettyTable line by line:
import csv
from prettytable import PrettyTable

pt = None  # to avoid it vanishing at the end of the block...
with open('/tmp/test.csv') as fd:
    rd = csv.reader(fd, delimiter=',')
    pt = PrettyTable(next(rd))
    for row in rd:
        pt.add_row(row)
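Once the rows are added, the table can be rendered with an ordinary print:
print(pt)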
I got the same error working on some messier CSVs and ended up implementing a fallback method with a manual search:
import csv
from collections import Counter
from string import punctuation, whitespace

def get_delimiter(contents: str) -> str:
    try:
        sniffer = csv.Sniffer()
        dialect = sniffer.sniff(contents)
        return dialect.delimiter
    except csv.Error:
        return fallback_delimiter_search(contents)

def fallback_delimiter_search(contents: str) -> str:
    # count punctuation characters, eliminating spaces in case of a lot of text
    content_chars = [c for c in contents if (c in punctuation or c in whitespace) and c != ' ']
    counts = Counter(content_chars)
    return counts.most_common(1)[0][0]
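A hypothetical usage of these helpers, assuming the file is small enough to read into memory for sniffing:
import csv

with open('/tmp/test.csv') as f:
    contents = f.read()

delimiter = get_delimiter(contents)
rows = list(csv.reader(contents.splitlines(), delimiter=delimiter))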

read an ascii file into a numpy array

I have an ASCII file and I want to read it into a numpy array. But it fails: for the first number in the file, numpy.genfromtxt returns nan. Then I tried the following way of reading the file into an array:
lines = file('myfile.asc').readlines()
X = []
for line in lines:
    s = str.split(line)
    X.append([float(s[i]) for i in range(len(s))])
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
ValueError: could not convert string to float: 15.514
When I printed the first line of the file, it looked like this:
>>> s
['\xef\xbb\xbf15.514', '15.433', '15.224', '14.998', '14.792', '15.564', '15.386', '15.293', '15.305', '15.132', '15.073', '15.005', '14.929', '14.823', '14.766', '14.768', '14.789']
How can I read such a file into a numpy array without problems and without any assumptions about the number of rows and columns?
Based on #falsetru's answer, I want to provide a solution with Numpy's file reading capabilities:
import numpy as np
import codecs

with codecs.open('myfile.asc', encoding='utf-8-sig') as f:
    X = np.loadtxt(f)
It loads the file through an open file handle using the correct encoding. NumPy accepts this kind of handle (it can also use handles from open()) and works seamlessly, as in every other case.
The file is encoded as UTF-8 with a BOM. Use codecs.open with the utf-8-sig encoding to handle it correctly (i.e. to exclude the BOM \xef\xbb\xbf).
import codecs

X = []
with codecs.open('myfile.asc', encoding='utf-8-sig') as f:
    for line in f:
        s = line.split()
        X.append([float(s[i]) for i in range(len(s))])
UPDATE: You don't need to use the index at all:
with codecs.open('myfile.asc', encoding='utf-8-sig') as f:
    X = [[float(x) for x in line.split()] for line in f]
By the way, instead of using the unbound method str.split(line), use line.split() if you have no special reason to do otherwise.
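As another sketch, assuming a NumPy version that supports the encoding keyword (1.14 or later), genfromtxt can strip the BOM itself without going through codecs:
import numpy as np

X = np.genfromtxt('myfile.asc', encoding='utf-8-sig')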
