I can't remove whitespace - python

I have a JSON file whose contents are inserted into a SQLite database.
After inserting, all non-breaking spaces are automatically converted to whitespace, which is good!
The JSON file looks like:
[{'john' : "6\u00a0500\u00a0\u20ac" , 'dams' : "7\u00a0500\u00a0\u20ac"}, {'john' : "10\u00a0900\u00a0\u20ac" , 'dams' : "13\u00a0980\u00a0\u20ac"}]
The SQLite file looks like:
My goal is to remove the whitespace and the '€', and cast the value to an integer.
I used trim, ltrim, rtrim, replace, and combinations of trim and replace to remove the whitespace, but it doesn't work.

First off, I would suggest that you ensure that you're using double quotes throughout your JSON files. This is the standard for JSON syntax and moreover, not having things consistent will cause more of a headache later on.
With that out of the way, here's my solution:
import json
with open(jsonFile, "r") as file:
    # Parse the whole file; this yields a list of dicts
    jsonDicts = json.load(file)
cleanJsonLines = []
for jsonDict in jsonDicts:
    for key in jsonDict:
        almostCleanJson = jsonDict[key].replace("\u00a0", "")
        cleanJson = almostCleanJson.replace("\u20ac", "")
        cleanJsonLines.append({key: cleanJson})
print(cleanJsonLines)
Output for the sample data from the question:
[{'john': '6500'}, {'dams': '7500'}, {'john': '10900'}, {'dams': '13980'}]
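Since the question also asks for an integer, a minimal sketch of the full cleanup might look like the following. The helper name to_int is hypothetical, and the sketch assumes each value contains only digits, spaces (regular or non-breaking), and the euro sign.

```python
def to_int(amount):
    """Drop spaces (regular and non-breaking) and the euro sign, then cast to int."""
    # Hypothetical helper, not part of the original answer.
    for unwanted in ("\u00a0", " ", "\u20ac"):
        amount = amount.replace(unwanted, "")
    return int(amount)


rows = [
    {"john": "6\u00a0500\u00a0\u20ac", "dams": "7\u00a0500\u00a0\u20ac"},
    {"john": "10\u00a0900\u00a0\u20ac", "dams": "13\u00a0980\u00a0\u20ac"},
]
# Convert every value in every row, keeping the keys unchanged.
clean_rows = [{name: to_int(value) for name, value in row.items()} for row in rows]
print(clean_rows)
```

Removing both kinds of space covers the case where the database has already turned the non-breaking spaces into regular ones.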


How to split a comma-separated line if the chunk contains a comma in Python?

I'm trying to split the current line into 3 chunks.
The title column contains a comma, which is also the delimiter:
1,"Rink, The (1916)",Comedy
The current code is not working:
id, title, genres = line.split(',')
Expected result:
id = 1
title = 'Rink, The (1916)'
genres = 'Comedy'
Any thoughts how to split it properly?
Ideally, you should use a proper CSV parser and specify that the double quote is the quote character. If you must proceed with the current string as the starting point, here is a regex trick which should work:
import re
inp = '1,"Rink, The (1916)",Comedy'
parts = re.findall(r'".*?"|[^,]+', inp)
print(parts) # ['1', '"Rink, The (1916)"', 'Comedy']
The regex pattern works by first trying to find a term "..." in double quotes. That failing, it falls back to finding a CSV term which is defined as a sequence of non comma characters (leading up to the next comma or end of the line).
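Note that the quoted term keeps its surrounding double quotes. A small sketch of removing them afterwards, assuming no field is empty and no quoted field contains an escaped quote (the variable names are illustrative):

```python
import re

inp = '1,"Rink, The (1916)",Comedy'
# Same pattern as above: a quoted term first, otherwise a run of non-comma characters.
parts = re.findall(r'".*?"|[^,]+', inp)
# Strip the surrounding double quotes that the quoted alternative leaves in place.
id_, title, genres = (part.strip('"') for part in parts)
print(id_, title, genres, sep=" | ")
```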
Let's talk about why your code does not work:
id, title, genres = line.split(',')
Here line.split(',') returns 4 values (since the line has 3 commas), but you are unpacking into 3 names, hence you get:
ValueError: too many values to unpack (expected 3)
My advice would be to use a delimiter other than a comma, one that does not occur in the data:
"1#\"Rink, The (1916)\"#Comedy"
and then
id, title, genres = line.split('#')
Use the csv package from the standard library:
>>> import csv, io
>>> s = """1,"Rink, The (1916)",Comedy"""
>>> # Load the string into a buffer so that csv reader will accept it.
>>> reader = csv.reader(io.StringIO(s))
>>> next(reader)
['1', 'Rink, The (1916)', 'Comedy']
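csv.reader accepts any iterable of strings, so the same approach extends to several lines or to an open file. A short sketch, where the second sample row is made up for illustration:

```python
import csv

lines = [
    '1,"Rink, The (1916)",Comedy',
    '2,"Example, An (1920)",Drama',  # made-up row, for illustration only
]
for movie_id, title, genres in csv.reader(lines):
    # Each row arrives as a list of three strings, with the quotes already removed.
    print(int(movie_id), title, genres)

# The same reader output collected into a list of rows.
rows = list(csv.reader(lines))
```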
Well, you can do it by making it a tuple:
line = (1, "Rink, The (1916)", "Comedy")
id, title, genres = line

Python put string into dictionary

I want to convert a string into a dictionary. I saved this dictionary previously in a text file.
The problem now is that I am not sure how the keys are structured. The values are generated with Counter(dictionaryName). The dictionary is really large, so I cannot check every key to see what it might contain.
The keys can contain single quotes ('), double quotes ("), commas, and maybe other characters. So is there any possibility to convert it back into a dictionary?
For example this is stored in the file:
Counter({'element0':512, "'4,5'element1":50, '4:55foobar':23,...})
I found previous solutions with for example json, but I have problems with the double quotes and I cannot simply split for the commas.
If you trust the source, run from collections import Counter and then eval() the string.
How about something like:
>>> from collections import Counter
>>> line = '''Counter({'element0':512, "'4,5'element1":50, '4:55foobar':23})'''
>>> D = eval(line)
>>> D
Counter({'element0': 512, "'4,5'element1": 50, '4:55foobar': 23})
You could remove the Counter( and ) parts, then parse the rest with ast.literal_eval as long as it only involves basic Python data types:
import ast
from collections import Counter
def parse_Counter_string(s):
    s = s.strip()
    if not (s.startswith('Counter(') and s.endswith(')')):
        raise ValueError('String does not match expected format')
    # Counter( is 8 characters
    # 12345678
    s = s[8:-1]
    return Counter(ast.literal_eval(s))
In the future, I recommend picking a different way to serialize your data.
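As one possible alternative, a Counter with string keys round-trips through JSON, since Counter is a dict subclass. A minimal sketch, with an assumed file name counts.json:

```python
import json
import os
import tempfile
from collections import Counter

counts = Counter({'element0': 512, "'4,5'element1": 50, '4:55foobar': 23})

# "counts.json" is an assumed file name; a temporary directory keeps the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), "counts.json")
with open(path, "w") as fp:
    json.dump(counts, fp)  # quotes and commas inside the keys are escaped by json

with open(path) as fp:
    restored = Counter(json.load(fp))

print(restored == counts)
```

This only works as-is when every key is a string, because JSON object keys are always strings.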
You can use the demjson library for this. You can have the text directly in your program:
import demjson
counter = demjson.decode("enter your text here")
If it is in a file, you can do the following:
from os.path import dirname, join, realpath
WD = dirname(realpath(__file__))
with open(join(WD, "filename"), "r") as file:
    counter = demjson.decode(file.read())

How to dump json without quotes in python

Here is how I dump a file:
with open('es_hosts.json', 'w') as fp:
    json.dump(','.join(host_list.keys()), fp)
The result is
"a,b,c"
I would like:
a,b,c
Thanks
Before doing a string replace, you might want to strip the quotation marks:
print('"a,b,c"'.strip('"'))
Output:
a,b,c
That's closer to what you want to achieve. Even just removing the first and the last character works: '"a,b,c"'[1:-1].
But have you looked into this question?
To remove the quotation marks around the keys only, which may be important if you are parsing it later (presumably with some tolerant parser, or maybe you just pipe it directly into node for bizarre reasons), you could try the following regex:
re.sub(r'(?<!: )"(\S*?)"', '\\1', json_string)
One issue is that this regex expects keys and values to be separated as key: value, and it will fail for key:value. You could make it work for the latter with a minor change, but similarly it won't work for variable amounts of whitespace after the colon.
There may be other edge cases, but it will work with the output of json.dumps. However, the results will not be parseable by json. Some more tolerant parsers, like yaml, might be able to read the results.
import json
import re
regex = r'(?<!: )"(\S*?)"'
o = {"noquotes" : 127, "put quotes here" : "and here", "but_not" : "there"}
s = json.dumps(o)
s2 = json.dumps(o, indent=3)
strip_s = re.sub(regex,'\\1',s)
strip_s2 = re.sub(regex,'\\1',s2)
print(strip_s)
print(strip_s2)
assert(json.loads(strip_s) == json.loads(s) == json.loads(strip_s2) == json.loads(s2) == object)
Will raise a ValueError but prints what you want.
Well, that's not valid json, so the json module won't help you to write that data. But you can do this:
import json
with open('es_hosts.json', 'w') as fp:
    data = ['a', 'b', 'c']
    fp.write(json.dumps(','.join(data)).replace('"', ''))
That's because you asked for json, but since that's not json, this should suffice:
with open('es_hosts.json', 'w') as fp:
    data = ['a', 'b', 'c']
    fp.write(','.join(data))
Use Python's built-in string replace function on the serialized output:
import json
with open('es_hosts.json', 'w') as fp:
    fp.write(json.dumps(','.join(host_list.keys())).replace('"', ''))
Just use a for loop to print each item of the list:
import json
with open('json_file') as f:
    data = json.loads(f.read())
for value_wo_bracket in data['key_name']:
    print(value_wo_bracket)
Note that there is a difference between json.load and json.loads.
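To illustrate the difference: json.load reads from a file-like object, while json.loads parses a string. A small sketch using an in-memory file; key_name is the same placeholder key as in the snippet above:

```python
import io
import json

text = '{"key_name": ["a", "b", "c"]}'

from_string = json.loads(text)            # loads: parse a str
from_file = json.load(io.StringIO(text))  # load: read and parse a file-like object

# Both calls produce the same Python object.
print(','.join(from_string['key_name']))
```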

How to make python program extensible

So I have a bunch of lines of code like these in a row in my program:
str = str.replace('ten', '10s')
str = str.replace('twy', '20s')
str = str.replace('fy', '40s')
...
I want to make it so that I don't have to manually open my source file to add new cases, for example ('sy', '70'). I know I have to put all these in a function somehow, but I'd like to map cases that are not in my "mapper lib" from the command line. A configuration file, maybe? How?
Thanks!
You could use a config file in json format like this:
[
["ten", "10s"],
["twy", "20s"],
["fy", "40s"]
]
Save it as 'replacements.json' and then use it this way:
import json
with open('replacements.json') as i:
    replacements = json.load(i)
text = 'ten, twy, fy'
for r in replacements:
    text = text.replace(r[0], r[1])
Then when you need to change the values just edit the replacements.json file without touching any Python code.
The format for your replacements file could be anything, but JSON is easy to use and edit.
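The question also mentions adding cases from the command line. One way this might be done is with argparse, merging extra pairs into the ones loaded from the file. This is only a sketch: the --map flag and its OLD=NEW syntax are assumptions, not something from the answer above.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("text")
    # Hypothetical flag: may be repeated, e.g. --map sy=70s --map ey=80s
    parser.add_argument("--map", action="append", default=[], metavar="OLD=NEW")
    return parser.parse_args(argv)


def apply_replacements(text, replacements):
    for old, new in replacements:
        text = text.replace(old, new)
    return text


# In a real program these pairs would come from replacements.json.
replacements = [["ten", "10s"], ["twy", "20s"], ["fy", "40s"]]
args = parse_args(["ten, twy, sy", "--map", "sy=70s"])  # pass None to read sys.argv
# Split each OLD=NEW argument once and append it to the pairs from the file.
replacements += [pair.split("=", 1) for pair in args.map]
result = apply_replacements(args.text, replacements)
print(result)
```

Pairs given on the command line are applied after the ones from the file, so the file stays the default and the flag covers one-off cases.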
A simple solution could be to put those in a file, read them in your program, and do your replacements in a loop.
There are many ways to do this. If it's a rarely changing thing, you could consider doing it with a Python dict:
mappings = {
    'ten': '10s',
    'twy': '20s',
    'fy': '40s',
}
def replace(str_):
    for s, r in mappings.items():
        # str.replace returns a new string, so the result must be reassigned
        str_ = str_.replace(s, r)
    return str_
Alternatively, in a text file (make sure you use a safe delimiter which isn't used in any of the keys!)
mappings.txt:
ten|10s
twy|20s
fy|40s
And the Python part:
mappings = {}
with open('mappings.txt') as f:
    for line in f:
        # strip the trailing newline so it does not end up in the value
        k, v = line.rstrip('\n').split('|', 1)
        mappings[k] = v
And use the replace from above :)
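One caveat with replacing in a loop is that a later replacement can match text produced by an earlier one. If that matters, a single-pass substitution with re.sub is one option. This sketch swaps in a different technique from the loop above, and the function name replace_all is made up:

```python
import re

mappings = {'ten': '10s', 'twy': '20s', 'fy': '40s'}


def replace_all(text, mappings):
    # Longest keys first, so a longer key wins over a shorter key that is its prefix.
    keys = sorted(mappings, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(key) for key in keys))
    # Every match is looked up in the dict; replaced text is never rescanned.
    return pattern.sub(lambda match: mappings[match.group(0)], text)


result = replace_all('ten, twy, fy', mappings)
print(result)
```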
You could use csv to store the replacements in a human-editable form in a file:
import csv
with open('replacements.csv', newline='') as f:
    replacements = list(csv.reader(f))
for old, new in replacements:
    your_string = your_string.replace(old, new)
where replacements.csv:
ten,10s
twy,20s
fy,40s
It avoids unnecessary markup such as ", [] in the json format and allows a delimiter (,) to be present in a string itself, unlike the plain text format from @WoLpH's answer.

removing double quote from json.dumps of string data

I have some data that I'm retrieving from a data feed as text. For example, I receive the data like the following:
1105488000000, 34.1300, 34.5750, 32.0700, 32.2800\r\n
1105574400000, 32.6750, 32.9500, 31.6500, 32.7300\r\n
1105660800000, 36.8250, 37.2100, 34.8650, 34.9000\r\n
etc.
(This is stock data, where the first column is the timestamp, the next columns are the open, high, low, and close price for the time period.)
I want to convert this into a json such as the following:
[
[1105488000000, 34.1300, 34.5750, 32.0700, 32.2800],
[1105574400000, 32.6750, 32.9500, 31.6500, 32.7300],
[1105660800000, 36.8250, 37.2100, 34.8650, 34.9000],
...
The code that I'm using is:
lines = data.split("\r\n")
output = []
for line in lines:
    currentLine = line.split(",")
    currentLine = [currentLine[0] , currentLine[1] , currentLine[2], currentLine[3], currentLine[4]]
    output.append(currentLine)
jsonOutput = json.dumps(output)
However, when I do this, I'm finding that the values are:
[
["1105488000000", "34.1300", "34.5750", "32.0700", "32.2800"],
["1105574400000", "32.6750", "32.9500", "31.6500", "32.7300"],
["1105660800000", "36.8250", "37.2100", "34.8650", "34.9000"],
Is there any way for me to get the output without the double quotes?
Pass the data through the int() or float() constructors before outputting in order to turn them into numbers.
...
currentLine = [float(i) for i in currentLine]
output.append(currentLine)
...
Change
currentLine = [currentLine[0] , currentLine[1] , currentLine[2], currentLine[3], currentLine[4]]
output.append(currentLine)
to
currentData = list(map(lambda num: float(num.strip()), currentLine))
output.append(currentData)
Whenever you initialize currentLine with
currentLine = line.split(",")
all the elements of currentLine are strings. So, whenever you write this to JSON, you get JSON strings throughout. By converting all the strings to numbers, you get something without quotes. Also, I added the strip() calls to handle leading and trailing whitespace as is shown in your data example.
P.S. Please don't use the same variable name for two completely different things. It's more clear to use currentLine for the list of strings, and currentData for the list of numbers.
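Putting the pieces together, a minimal end-to-end sketch might look like this. Keeping the timestamp as an int rather than a float is an assumption about the desired output, and the sample data is taken from the question:

```python
import json

data = (
    "1105488000000, 34.1300, 34.5750, 32.0700, 32.2800\r\n"
    "1105574400000, 32.6750, 32.9500, 31.6500, 32.7300\r\n"
)

output = []
for line in data.splitlines():
    if not line.strip():
        continue  # skip blank lines, if any
    fields = line.split(",")
    # int() and float() both tolerate surrounding whitespace.
    output.append([int(fields[0])] + [float(value) for value in fields[1:]])

jsonOutput = json.dumps(output)
print(jsonOutput)
```

Trailing zeros such as the ones in 34.1300 are not preserved, since the values become floats.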
