Replace a string by another string in an "array" - python

My file so far looks like this:
[
{
"asks" : [
[
0.00276477,
NumberInt(9)
],
[
0.00276478,
NumberInt(582)]]
}
]
I would like to replace the "NumberInt(9)" by the digit 9.
What I tried so far looks like this:
json_data=open("test.json").read()
number = re.findall("NumberInt\(([0-9]+)\)", json_data)
Nint = re.findall("(Nu.*)", json_data)
json_data.replace('Nint', 'number')
But it does not replace it in my original file... Does someone has an idea?

Here's how to do it, based on the documentation of re.sub():
with open("test.json") as file:
json_data = file.read()
new_json = re.sub("NumberInt\(([0-9]+)\)", r"\1", json_data)
Note that re.sub() returns a copy of the string, just like the built-in str.replace() method does.

First point: use re.sub() instead of str.replace() here. Also note that python strings are immutable so in both case you have to rebind your string to the result of the function.
Second point: your file will OF COURSE not be updated if you don't explicitely do it by yourself - you have to write back the corrected string to the file (reopening the file in write mode).

Related

Decoding String list in python from a binary file

I need to read a list of strings from a binary file and create a python list.
I'm using the below command to extract data from binary file:
tmp = f.read(100)
abc, = struct.unpack('100c',tmp)
The data that I can see in variable 'abc' is exactly as shown below, but I need to get the below data into a python list as strings.
Data that I need as a list: 'UsrVal' 'VdetHC' 'VcupHC' ..... 'Gravity_Axis'
b'UsrVal\x00VdetHC\x00VcupHC\x00VdirHC\x00HdirHC\x00UpFlwHC\x00UxHC\x00UyHC\x00UzHC\x00VresHC\x00UxRP\x00UyRP\x00UzRP\x00VresRP\x00Gravity_Axis'
Here is how i would suggest you to do it with one liner.
You need to decode binary string and then you can do a split based on "\x00" which will return the list you are looking for.
e.g
my_binary_out = b'UsrVal\x00VdetHC\x00VcupHC\x00VdirHC\x00HdirHC\x00UpFlwHC\x00UxHC\x00UyHC\x00UzHC\x00VresHC\x00UxRP\x00UyRP\x00UzRP\x00VresRP\x00Gravity_Axis'
decoded_list = my_binary_out.decode("latin1", 'ignore').split('\x00')
#or
decoded_list = my_binary_out.decode("cp1252", 'ignore').split('\x00')
Output Will look like this :
['UsrVal', 'VdetHC', 'VcupHC', 'VdirHC', 'HdirHC', 'UpFlwHC', 'UxHC', 'UyHC', 'UzHC', 'VresHC', 'UxRP', 'UyRP', 'UzRP', 'VresRP', 'Gravity_Axis']
Hope this helps
If you're going for a quick and messy way here, AND assuming your string
b'UsrVal\x00VdetHC\x00VcupHC\x00VdirHC\x00HdirHC\x00UpFlwHC\x00UxHC\x00UyHC\x00UzHC\x00VresHC\x00UxRP\x00UyRP\x00UzRP\x00VresRP\x00Gravity_Axis'
is in fact interpreted as
" b'UsrVal\x00VdetHC\x00VcupHC\x00VdirHC\x00HdirHC\x00UpFlwHC\x00UxHC\x00UyHC\x00UzHC\x00VresHC\x00UxRP\x00UyRP\x00UzRP\x00VresRP\x00Gravity_Axis' "
Then the following few lines of code result with 'b' having the array you want.
a = {YourStringHere}
b = a[2:-1].split("\x00")

I can't remove whistespace

I have json file which is inserted in a sqlite database.
After inserting, all non breaking space are automatically converted to whitespace, which is good!
json file looks like : [{'john' : "6\u00a0500\u00a0\u20ac" , 'dams' : "7\u00a0500\u00a0\u20ac"}, {'john' : "10\u00a0900\u00a0\u20ac" , 'dams' : "13\u00a0980\u00a0\u20ac"}] ##style it in code block
sqlite file looks like:
My goal is to remove whitespace, '€' and cast the value to integer.
I used trim, ltrim, rtrim, replace and combinations of trim and replace to remove whitespace, but it doesn't work.
First off, I would suggest that you ensure that you're using double quotes throughout your JSON files. This is the standard for JSON syntax and moreover, not having things consistent will cause more of a headache later on.
With that out of the way, here's my solution:
with open(jsonFile, "r") as file:
jsonLines = file.readlines()
cleanJsonLines = []
for jsonDict in jsonLines:
for key in jsonDict:
almostCleanJson = jsonDict[key].replace("\u00a0", "")
cleanJson = almostCleanJson.replace("\u20ac", "")
cleanJsonLines.append({key: cleanJson})
print(cleanJsonLines)
Output:
[{'john': '6500'}, {'dams': '7500'}]

How to read text file in python with unsuccessful format?

I made a big mistake, when I choose the way of dumping data;
Now I have a text file, that consist of
{ "13234134": ["some", "strings", ...]}{"34545345": ["some", "strings", ...]} ..so on
How can I read it into python?
edit:
I have tried json,
when I add at begin and at end of file curly-braces manually, I have "ValueError: Expecting property name:", because "13234134" string maybi invalid for json, I do not know how to avoid it.
edit1
with open('new_file.txt', 'w') as outfile:
for index, user_id in enumerate(users):
json.dump(dict = get_user_tweets(user_id), outfile)
It looks like what you have is an undelimited stream of JSON objects. As if you'd called json.dump over and over on the same file, or ''.join(json.dumps(…) for …). And, in fact, the first one is exactly what you did. :)
So, you're in luck. JSON is a self-delimiting format, which means you can read up to the end of the first JSON object, then read from there up to the end of the next JSON object, and so on. The raw_decode method essentially does the hard part.
There's no stdlib function that wraps it up, and I don't know of any library that does it, but it's actually very easy to do yourself:
def loads_multiple(s):
decoder = json.JSONDecoder()
pos = 0
while pos < len(s):
pos, obj = decoder.raw_decode(s, pos)
yield obj
So, instead of doing this:
obj = json.loads(s)
do_stuff_with(obj)
… you do this:
for obj in loads_multi(s):
do_stuff_with(obj)
Or, if you want to combine all the objects into one big list:
objs = list(loads_multi(s))
Consider simply rewriting it to something that is valid json. If indeed your bad data only contains the format that you've shown (a series of json structures that are not comma-separated), then just add commas and square braces:
with open('/tmp/sto/junk.csv') as f:
data = f.read()
print(data)
s = "[ {} ]".format(data.strip().replace("}{", "},{"))
print(s)
import json
data = json.loads(s)
print(type(data))
Output:
{ "13234134": ["some", "strings"]}{"34545345": ["some", "strings", "like", "this"]}
[ { "13234134": ["some", "strings"]},{"34545345": ["some", "strings", "like", "this"]} ]
<class 'list'>

Python Regex: find all lines that start with '{' and end with '}'

I am receiving data over a socket, a bunch of JSON strings. However, I receive a set amount of bytes, so sometimes the last of my JSON strings is cut-off. I will typically get the following:
{"pitch":-30.778193,"yaw":-124.63285,"roll":-8.977466}
{"pitch":-30.856342,"yaw":-124.57556,"roll":-7.7220345}
{"pitch":-31.574106,"yaw":-124.65623,"roll":-7.911794}
{"pitch":-30.479567,"yaw":-124.24301,"roll":-8.730827}
{"pitch":-29.30239,"yaw":-123.97949,"roll":-8.134723}
{"pitch":-29.84712,"yaw":-124.584465,"roll":-8.588374}
{"pitch":-31.072054,"yaw":-124.707466,"roll":-8.877062}
{"pitch":-31.493435,"yaw":-124.75457,"roll":-9.019922}
{"pitch":-29.591925,"yaw":-124.960815,"roll":-9.379437}
{"pitch":-29.37105,"yaw":-125.14427,"roll":-9.642341}
{"pitch":-29.483717,"yaw":-125.16528,"roll":-9.687177}
{"pitch":-30.903332,"yaw":-124.603935,"roll":-9.423098}
{"pitch":-30.211857,"yaw":-124.471664,"roll":-9.116135}
{"pitch":-30.837414,"yaw":-125.18984,"roll":-9.824204}
{"pitch":-30.526165,"yaw":-124.85788,"roll":-9.158611}
{"pitch":-30.333513,"yaw":-123.68705,"roll":-7.9481263}
{"pitch":-30.903502,"yaw":-123.78847,"roll":-8.209373}
{"pitch":-31.194769,"yaw":-124.79708,"roll":-8.709783}
{"pitch":-30.816765,"yaw":-125
With Python, I would like to create a string array of the first 18 complete { data... } strings.
Here is what I have tried: cleanData = re.search('{.*}', data) but it seems like this is only giving me the very first { data... } entry. How can I get the full string array of complete { } sets?
To get all, you can use re.finditer or re.findall.
>>> re.findall(r'{.*}', s)
['{"pitch":-30.778193,"yaw":-124.63285,"roll":-8.977466}', '{"pitch":-30.856342,"yaw":-124.57556,"roll":-7.7220345}', '{"pitch":-31.574106,"yaw":-124.65623,"roll":-7.911794}', '{"pitch":-30.479567,"yaw":-124.24301,"roll":-8.730827}', '{"pitch":-29.30239,"yaw":-123.97949,"roll":-8.134723}', '{"pitch":-29.84712,"yaw":-124.584465,"roll":-8.588374}', '{"pitch":-31.072054,"yaw":-124.707466,"roll":-8.877062}', '{"pitch":-31.493435,"yaw":-124.75457,"roll":-9.019922}', '{"pitch":-29.591925,"yaw":-124.960815,"roll":-9.379437}', '{"pitch":-29.37105,"yaw":-125.14427,"roll":-9.642341}', '{"pitch":-29.483717,"yaw":-125.16528,"roll":-9.687177}', '{"pitch":-30.903332,"yaw":-124.603935,"roll":-9.423098}', '{"pitch":-30.211857,"yaw":-124.471664,"roll":-9.116135}', '{"pitch":-30.837414,"yaw":-125.18984,"roll":-9.824204}', '{"pitch":-30.526165,"yaw":-124.85788,"roll":-9.158611}', '{"pitch":-30.333513,"yaw":-123.68705,"roll":-7.9481263}', '{"pitch":-30.903502,"yaw":-123.78847,"roll":-8.209373}', '{"pitch":-31.194769,"yaw":-124.79708,"roll":-8.709783}']
>>>
OR
>>> [x.group() for x in re.finditer(r'{.*}', s)]
['{"pitch":-30.778193,"yaw":-124.63285,"roll":-8.977466}', '{"pitch":-30.856342,"yaw":-124.57556,"roll":-7.7220345}', '{"pitch":-31.574106,"yaw":-124.65623,"roll":-7.911794}', '{"pitch":-30.479567,"yaw":-124.24301,"roll":-8.730827}', '{"pitch":-29.30239,"yaw":-123.97949,"roll":-8.134723}', '{"pitch":-29.84712,"yaw":-124.584465,"roll":-8.588374}', '{"pitch":-31.072054,"yaw":-124.707466,"roll":-8.877062}', '{"pitch":-31.493435,"yaw":-124.75457,"roll":-9.019922}', '{"pitch":-29.591925,"yaw":-124.960815,"roll":-9.379437}', '{"pitch":-29.37105,"yaw":-125.14427,"roll":-9.642341}', '{"pitch":-29.483717,"yaw":-125.16528,"roll":-9.687177}', '{"pitch":-30.903332,"yaw":-124.603935,"roll":-9.423098}', '{"pitch":-30.211857,"yaw":-124.471664,"roll":-9.116135}', '{"pitch":-30.837414,"yaw":-125.18984,"roll":-9.824204}', '{"pitch":-30.526165,"yaw":-124.85788,"roll":-9.158611}', '{"pitch":-30.333513,"yaw":-123.68705,"roll":-7.9481263}', '{"pitch":-30.903502,"yaw":-123.78847,"roll":-8.209373}', '{"pitch":-31.194769,"yaw":-124.79708,"roll":-8.709783}']
>>>
You need re.findall() (or re.finditer)
>>> import re
>>> for r in re.findall(r'{.*}', data)[:18]:
print r
{"pitch":-30.778193,"yaw":-124.63285,"roll":-8.977466}
{"pitch":-30.856342,"yaw":-124.57556,"roll":-7.7220345}
{"pitch":-31.574106,"yaw":-124.65623,"roll":-7.911794}
{"pitch":-30.479567,"yaw":-124.24301,"roll":-8.730827}
{"pitch":-29.30239,"yaw":-123.97949,"roll":-8.134723}
{"pitch":-29.84712,"yaw":-124.584465,"roll":-8.588374}
{"pitch":-31.072054,"yaw":-124.707466,"roll":-8.877062}
{"pitch":-31.493435,"yaw":-124.75457,"roll":-9.019922}
{"pitch":-29.591925,"yaw":-124.960815,"roll":-9.379437}
{"pitch":-29.37105,"yaw":-125.14427,"roll":-9.642341}
{"pitch":-29.483717,"yaw":-125.16528,"roll":-9.687177}
{"pitch":-30.903332,"yaw":-124.603935,"roll":-9.423098}
{"pitch":-30.211857,"yaw":-124.471664,"roll":-9.116135}
{"pitch":-30.837414,"yaw":-125.18984,"roll":-9.824204}
{"pitch":-30.526165,"yaw":-124.85788,"roll":-9.158611}
{"pitch":-30.333513,"yaw":-123.68705,"roll":-7.9481263}
{"pitch":-30.903502,"yaw":-123.78847,"roll":-8.209373}
{"pitch":-31.194769,"yaw":-124.79708,"roll":-8.709783}
Extracting lines that start and end with a specific character can be done without any regex, use str.startswith and str.endswith methods when iterating through the lines in a file:
results = []
with open(filepath, 'r') as f:
for line in f:
if line.startswith('{') and line.rstrip('\n').endswith('}'):
results.append(line.rstrip('\n'))
Note the .rstrip('\n') is used before .endswith to make sure the final newline does not interfere with the } check at the end of the string.

How to make python program extensible

So I have a bunch of line of codes like these in a row in my program:
str = str.replace('ten', '10s')
str = str.replace('twy', '20s')
str = str.replave('fy', '40s')
...
I want to make it such that I don't have to manually open my source file to add new cases. For example ('sy', '70'). I know I have to put all these in a function somehow, but I'd like to map cases that are not in my "mapper lib" from the command line. Configuration file maybe? how?
Thanks!
You could use a config file in json format like this:
[
["ten", "10s"],
["twy", "20s"],
["fy", "40s"]
]
Save it as 'replacements.json' and then use it this way:
import json
with open('replacements.json') as i:
replacements = json.load(i)
text = 'ten, twy, fy'
for r in replacements:
text = text.replace(r[0], r[1])
Then when you need to change the values just edit the replacements.json file without touching any Python code.
The format for you replacements file could be anything but json is easy to use and edit.
a simple solution could be to put those in a file, read them in your program and do your replaces in a loop..
Many ways to do this, if it's a rarely changing thing you could consider doing it with a Python dict:
mappings = {
'ten': '10s',
'twy': '20s',
'fy': '40s',
}
def replace(str_):
for s, r in mappings.iteritems():
str_.replace(s, r)
return str_
Alternatively in a Text file (make sure you use a safe delimiter which isn't used in any of the keys!)
mappings.txt
ten|10s
twy|20s
fy|40s
And the Python part:
mappings = {}
for line in open('mappings.txt'):
k, v = line.split('|', 1)
mappings[k] = v
And use the replace from above :)
You could use csv to store the replacements in a human-editable form in a file:
import csv
with open('replacements.csv', 'rb') as f:
replacements = list(csv.reader(f))
for old, new in replacements:
your_string = your_string.replace(old, new)
where replacements.csv:
ten,10s
twy,20s
fy,40s
It avoids unnecessary markup such as ", [] in the json format and allows a delimiter (,) to be present in a string itself unlike the plain text format from #WoLpH's answer.
(live example)

Categories