Decoding a string list in Python from a binary file

I need to read a list of strings from a binary file and build a Python list from it.
I'm using the code below to extract the data from the binary file:
import struct
tmp = f.read(100)
abc, = struct.unpack('100s', tmp)  # '100s' gives one bytes object; '100c' would give 100 single bytes
The data I can see in the variable 'abc' is exactly as shown below, but I need it as a Python list of strings.
The list I need: 'UsrVal' 'VdetHC' 'VcupHC' ..... 'Gravity_Axis'
b'UsrVal\x00VdetHC\x00VcupHC\x00VdirHC\x00HdirHC\x00UpFlwHC\x00UxHC\x00UyHC\x00UzHC\x00VresHC\x00UxRP\x00UyRP\x00UzRP\x00VresRP\x00Gravity_Axis'

Here is how I would suggest doing it with a one-liner.
You need to decode the byte string, then split it on "\x00", which returns the list you are looking for.
For example:
my_binary_out = b'UsrVal\x00VdetHC\x00VcupHC\x00VdirHC\x00HdirHC\x00UpFlwHC\x00UxHC\x00UyHC\x00UzHC\x00VresHC\x00UxRP\x00UyRP\x00UzRP\x00VresRP\x00Gravity_Axis'
decoded_list = my_binary_out.decode("latin1", 'ignore').split('\x00')
#or
decoded_list = my_binary_out.decode("cp1252", 'ignore').split('\x00')
The output will look like this:
['UsrVal', 'VdetHC', 'VcupHC', 'VdirHC', 'HdirHC', 'UpFlwHC', 'UxHC', 'UyHC', 'UzHC', 'VresHC', 'UxRP', 'UyRP', 'UzRP', 'VresRP', 'Gravity_Axis']
Hope this helps
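As a variation on the answer above, the split can also be done on the bytes first and each piece decoded afterwards. This is only a sketch: the shortened sample data, the 100-byte NUL-padded field, and the names raw, record, field and names are assumptions made for illustration, not details from the question.

```python
import struct

# Hypothetical sample: NUL-separated names packed into a 100-byte field.
# struct.pack pads a short value with NUL bytes up to the field width.
raw = b"UsrVal\x00VdetHC\x00VcupHC\x00Gravity_Axis"
record = struct.pack("100s", raw)

# '100s' yields a single bytes object of length 100.
(field,) = struct.unpack("100s", record)

# Drop the trailing padding, split on the NUL byte, then decode each piece.
names = [part.decode("latin1") for part in field.rstrip(b"\x00").split(b"\x00")]
print(names)
```

Stripping the trailing NUL bytes before splitting avoids a run of empty strings at the end of the list when the field is padded.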

If you're going for a quick and messy way here, and assuming your string
b'UsrVal\x00VdetHC\x00VcupHC\x00VdirHC\x00HdirHC\x00UpFlwHC\x00UxHC\x00UyHC\x00UzHC\x00VresHC\x00UxRP\x00UyRP\x00UzRP\x00VresRP\x00Gravity_Axis'
is in fact interpreted as
" b'UsrVal\x00VdetHC\x00VcupHC\x00VdirHC\x00HdirHC\x00UpFlwHC\x00UxHC\x00UyHC\x00UzHC\x00VresHC\x00UxRP\x00UyRP\x00UzRP\x00VresRP\x00Gravity_Axis' "
Then the following two lines leave the list you want in 'b' (replace the placeholder with your string).
a = {YourStringHere}
b = a[2:-1].split("\x00")

Related

Split List Elements in byte format to separate bytes in python

I have a list with byte elements like this:
list = [b'\x00\xcc\n', b'\x14I\x8dy_\xeb\xbc1C']
Now I want to separate all bytes like following:
list_new =[b'\x00', b'\xcc', b'\x14I', b'\x8dy_', b'\xeb', b'\xbc1C']
I am assuming here that you want to split the data wherever a '\x' escape appears, since that matches your desired output. Let me know if that is wrong. I am also not sure why you have this kind of data; it is a little awkward to work with, and more context on the question would help. Nevertheless, I got your desired output in the following way (it may not be efficient, but it gets the job done):
import re
from codecs import encode
lists = [b'\x00\xcc\n', b'\x14I\x8dy_\xeb\xbc1C']
split = [re.split(r'(?=\\x)', str(item)) for item in lists]  ## split before each \x, using a lookahead
output = []  ## container to save the final items
for item in split:  ## split is a list of lists, hence two for loops
    for nitem in item:
        if nitem != "b'":  ## remove anything which is only "b'"
            output.append(nitem.replace('\\n', '').replace("'", '').encode())  ## finally append every item
## Note that each item in output now contains a doubled backslash; to remove it we use the encode function from the codecs module
## like below
[encode(itm.decode('unicode_escape'), 'raw_unicode_escape') for itm in output]  ## final output
Output:
[b'\x00', b'\xcc', b'\x14I', b'\x8dy_', b'\xeb', b'\xbc1C']
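A different technique, not the one used in the answer above, is to run a regular expression directly on the bytes instead of on their printed form. The sketch below assumes that each chunk starts at a byte outside printable ASCII, that newlines should be dropped, and that no chunk begins with a printable byte (such leading bytes would be discarded). The names items, chunk and pieces are made up for the example.

```python
import re

items = [b'\x00\xcc\n', b'\x14I\x8dy_\xeb\xbc1C']

# One chunk = a byte outside printable ASCII (newline excluded, so it is dropped)
# followed by any run of printable ASCII bytes.
chunk = re.compile(rb'[^\x20-\x7e\n][\x20-\x7e]*')

pieces = [match for item in items for match in chunk.findall(item)]
print(pieces)
```

Working on the bytes avoids the round trip through str() and unicode_escape, but it only reproduces the desired output under the assumptions listed above.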

How to turn a list containing strings into a list containing integers (Python)

I am optimizing PyRay (https://github.com/oscr/PyRay) to be a usable Python ray-casting engine, and I am working on a feature that reads a text file and turns it into a list (which PyRay uses as a map). But when I read the file into a list, the contents come back as strings, which PyRay cannot use. So my question is: how do I convert a list of strings into integers? Here is my code so far. (I commented out the actual code so I can test this.)
print("What map file to open?")
mapopen = input(">")
mapload = open(mapopen, "r")
worldMap = [line.split(',') for line in mapload.readlines()]
print(worldMap)
The map file:
1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,
2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
1,0,2,0,0,3,0,0,0,0,0,0,0,2,3,2,3,0,0,2,
2,0,3,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
1,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,2,
2,3,1,0,0,2,0,0,0,2,3,2,0,0,0,0,0,0,0,1,
1,0,0,0,0,0,0,0,0,1,0,1,0,0,1,2,0,0,0,2,
2,0,0,0,0,0,0,0,0,2,0,2,0,0,2,1,0,0,0,1,
1,0,0,0,0,0,0,0,0,1,3,1,0,0,0,0,0,0,0,2,
2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
1,0,2,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,2,
2,0,3,0,0,2,0,0,0,0,0,0,0,2,3,2,1,2,0,1,
1,0,0,0,0,3,0,0,0,0,0,0,0,1,0,0,2,0,0,2,
2,3,1,0,0,2,0,0,2,1,3,2,0,2,0,0,3,0,3,1,
1,0,0,0,0,0,0,0,0,3,0,0,0,1,0,0,2,0,0,2,
2,0,0,0,0,0,0,0,0,2,0,0,0,2,3,0,1,2,0,1,
1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,3,0,2,
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,1,
2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,
Please help me. I have been searching everywhere and I can't find anything.
Try this. Did you want a list of lists, or just one big list? For a list of lists:
with open(filename, "r") as txtr:
    data = txtr.read()
data = data.split("\n")  # split into a list of strings, one per line
# filter(None, ...) drops the empty field left by each trailing comma
data = [list(map(int, filter(None, x.split(",")))) for x in data if x]
The last line splits each string into a list at the commas, applies int() to each element, and turns the result into a list. It does this for every element in data. I hope it helps.
Here it is for just one large list:
with open(filename, "r") as txtr:
    data = txtr.readlines()
data = ",".join(line.strip() for line in data)  # turns it into one large string
data = data.split(",")  # now you have a list of strings
data = [int(x) for x in data if x]  # applies int() to each non-empty element
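Both variants can be checked without a file by parsing a small string. The following is a minimal sketch, assuming rows end with a trailing comma as in the map file above; parse_grid, sample, world_map and flat are hypothetical names chosen for the example.

```python
def parse_grid(text):
    """Return a list of rows, each row a list of ints."""
    rows = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split(",")]
        # Skip the empty field that follows a trailing comma.
        row = [int(field) for field in fields if field]
        if row:  # ignore blank lines
            rows.append(row)
    return rows


# Three short rows in the same shape as the map file, plus a blank line.
sample = "1,2,1,\n1,0,2,\n\n2,1,2,\n"
world_map = parse_grid(sample)

# Flatten the list of lists when one big list is wanted instead.
flat = [cell for row in world_map for cell in row]
print(world_map)
print(flat)
```

With a real file, the same function could be called on the result of reading the whole file as text.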
Look into the map built-in function in Python.
L = ['1', '2', '3']
ints = map(int, L)  # avoid naming the result map, which would shadow the built-in
for el in ints:
    print(el)
Output:
1
2
3
As per your question, below is one way to change a list of strings into a list of integers (or into single integers, if you index the list to get a value). Hope this helps.
myStrList = ["1", "2", "\n", "3"]
myNewIntList = []
for x in myStrList:
    if x != "\n":
        y = int(x)
        myNewIntList.append(y)
print(myNewIntList)

creating docID for each text file in folder

Hello, I have a folder named dict that contains 4 to 6 text files. I want to assign an ID (docID) to each text file in the folder, and I have used the code below:
docID_list = [int(docID_string) for docID_string in os.listdir('/Users/suryavamsi/dict')]
and I got this error:
invalid literal for int() with base 10:
I have tried lots of ways but couldn't crack it. Can anyone help me out?
It looks like you're trying to convert strings to integers.
That will only work if your strings look like integers (e.g. '1').
If you just want an integer value associated with each file, use enumerate:
docID_list = [i for i, _ in enumerate(os.listdir('/Users/suryavamsi/dict'))]
Or just:
docID_list = list(range(len(os.listdir('/Users/suryavamsi/dict'))))
You might want to keep a dict that maps docID to filename, in which case you can use a dictionary comprehension:
docID_list = {i:doc for i, doc in enumerate(os.listdir('/Users/suryavamsi/dict'))}
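The enumerate approach can be tried against a throwaway folder. This is a sketch only: the temporary directory, the file names, and the helper build_doc_ids are invented for the example. Sorting is an addition of mine, made because os.listdir returns names in arbitrary order, so unsorted IDs may differ between runs.

```python
import os
import tempfile


def build_doc_ids(folder):
    """Map 0, 1, 2, ... to the file names in folder, in sorted order."""
    names = sorted(os.listdir(folder))
    return {doc_id: name for doc_id, name in enumerate(names)}


# Create a few hypothetical text files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for name in ("b.txt", "a.txt", "c.txt"):
        with open(os.path.join(folder, name), "w") as fh:
            fh.write("sample")
    doc_ids = build_doc_ids(folder)

print(doc_ids)
```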

Python put string into dictionary

I want to convert a string into a dictionary. I saved this dictionary previously in a text file.
The problem is that I am not sure how the keys are structured. The values were generated with Counter(dictionaryName). The dictionary is really large, so I cannot check every key to see what it looks like.
The keys can contain single quotes ('), double quotes ("), commas, and maybe other characters. So is there any way to convert it back into a dictionary?
For example this is stored in the file:
Counter({'element0':512, "'4,5'element1":50, '4:55foobar':23,...})
I found previous solutions that use json, for example, but I have problems with the double quotes, and I cannot simply split on the commas.
If you trust the source, run from collections import Counter and then eval() the string.
How about something like:
>>> from collections import Counter
>>> line = '''Counter({'element0':512, "'4,5'element1":50, '4:55foobar':23})'''
>>> D = eval(line)
>>> D
Counter({'element0': 512, "'4,5'element1": 50, '4:55foobar': 23})
You could remove the Counter( and ) parts, then parse the rest with ast.literal_eval as long as it only involves basic Python data types:
import ast
from collections import Counter

def parse_Counter_string(s):
    s = s.strip()
    if not (s.startswith('Counter(') and s.endswith(')')):
        raise ValueError('String does not match expected format')
    # Counter( is 8 characters
    # 12345678
    s = s[8:-1]
    return Counter(ast.literal_eval(s))
In the future, I recommend picking a different way to serialize your data.
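One possible choice of serialization, offered as an assumption rather than what the answer above had in mind, is JSON from the standard library. Since Counter is a dict subclass, json.dumps accepts it directly as long as every key is a string, and quotes or commas inside keys are escaped automatically. The names counts, text and restored are illustrative.

```python
import json
from collections import Counter

counts = Counter({'element0': 512, "'4,5'element1": 50, '4:55foobar': 23})

# Serialize: keys with quotes, commas or colons are escaped by json.
text = json.dumps(counts)

# Deserialize: json.loads returns a plain dict, so wrap it in Counter again.
restored = Counter(json.loads(text))
print(restored == counts)
```

The string in text is what would be written to the file in place of the printed Counter(...) form.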
You can use the demjson library for this. If the text is directly in your program:
import demjson
counter = demjson.decode("enter your text here")
If it is in a file, you can do the following:
from os.path import dirname, join, realpath
WD = dirname(realpath(__file__))
file = open(join(WD, "filename"), "r")
counter = demjson.decode(file.read())
file.close()

python parse csv to lists

I have a CSV file whose data I want to parse into lists.
I am using the Python csv module to read it,
basically as follows:
import csv
fin = csv.reader(open(path,'rb'),delimiter=' ',quotechar='|')
print fin[0]
#gives the following
['"1239","2249.00","1","3","2011-02-20"']
#lets say i do the following
ele = str(fin[0])
ele = ele.strip().split(',')
print ele
#gives me following
['[\'"1239"', '"2249.00"', '"1"', '"3"', '"2011-02-20"\']']
Now
ele[0] gives me this output: ['"1239"
How do I get rid of that ['?
In the end, what I want is to get 1239 and convert it to an integer.
Any clues why this is happening?
Thanks
Edit: Never mind, resolved thanks to the first comment.
Change your delimiter to ',' and you will get a list of those values from the csv reader.
That happens because you are converting a list to a string, which there is no need to do. Grab the first element of the list (in this case a string) and parse that:
>>> a = ['"1239","2249.00","1","3","2011-02-20"']
>>> a
['"1239","2249.00","1","3","2011-02-20"']
>>> a[0]
'"1239","2249.00","1","3","2011-02-20"'
>>> b = a[0].replace('"', '').split(',')
>>> b[-1]
'2011-02-20'
Of course, before you call the replace and split string methods, you should check that the value is a string, or handle the exception if it isn't.
Also, Blahdiblah is correct: your delimiter is probably wrong.
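The delimiter suggestion can be sketched as follows in Python 3. The in-memory sample stands in for the real file, and the names sample, rows, first and order_id are assumptions made for the example.

```python
import csv
import io

# Hypothetical one-line sample in the same shape as the row in the question.
sample = io.StringIO('"1239","2249.00","1","3","2011-02-20"\n')

# With ',' as the delimiter and '"' as the quote character,
# the reader returns each field as its own unquoted string.
reader = csv.reader(sample, delimiter=',', quotechar='"')
rows = list(reader)  # a reader is an iterator, so collect it before indexing

first = rows[0]
order_id = int(first[0])
print(first)
print(order_id)
```

Collecting the reader into a list is what makes indexing such as rows[0] possible, since the reader object itself does not support subscripting.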
