How to make json decimal round to three digits - python

I have the following list of list:
x = [["foo",3.923239],["bar",1.22333]]
I want to round each numeric value to three decimal places before serializing to a JSON string,
yielding:
myjsonfinal = '[["foo", 3.923], ["bar", 1.223]]'
I tried this, but it does not round the values:
import json
print(json.dumps(x))
Ideally this should be fast, because I need to deal with ~1000 items and then load the result to the web.

Try round():
import json
x = [["foo",3.923239],["bar",1.22333]]
json.dumps([[s, round(i, 3)] for s, i in x])
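If the data is nested more deeply than a flat list of pairs, the same round() idea can be applied recursively before calling json.dumps. This is only a minimal sketch; the helper name round_floats is made up for illustration and is not part of the json module:

```python
import json

def round_floats(obj, ndigits=3):
    """Return a copy of obj with every float rounded to ndigits places.

    round_floats is a hypothetical helper, not a json module function.
    """
    if isinstance(obj, float):
        return round(obj, ndigits)
    if isinstance(obj, (list, tuple)):
        return [round_floats(item, ndigits) for item in obj]
    if isinstance(obj, dict):
        return {key: round_floats(value, ndigits) for key, value in obj.items()}
    return obj  # strings, ints, None, etc. pass through unchanged

x = [["foo", 3.923239], ["bar", 1.22333]]
myjsonfinal = json.dumps(round_floats(x))
print(myjsonfinal)  # [["foo", 3.923], ["bar", 1.223]]
```

For ~1000 items the extra pass over the data should be negligible next to the serialization itself.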

@neversaint, try this:
x = [["foo", 3.923239], ["bar", 1.22333]]
for i, j in enumerate(x):
    x[i][1] = round(j[1], 3)  # round the number in place
print(x)
Output:
[['foo', 3.923], ['bar', 1.223]]
Cheers!!

Related

How to convert strings with billion or million abbreviation into integers in a list

I have a few strings in a list that are numbers with a billion or million abbreviation:
list = ["150M", "360M", "2.6B", "3.7B"]
I would like a way to convert those strings into integers counted in thousands (e.g. 150M -> 150,000 and 3.7B -> 3,700,000). Thanks.
You should really show some attempt to solve the problem yourself, but here is a simple example:
multipliers = {'K':1000, 'M':1000000, 'B':1000000000}
def string_to_int(string):
    if string[-1].isdigit(): # check if no suffix
        return int(string)
    mult = multipliers[string[-1]] # look up suffix to get multiplier
    # convert number to float, multiply by multiplier, then make int
    return int(float(string[:-1]) * mult)
testvals = ["150M", "360M", "2.6B", "3.7B"]
# prints the full values; divide by 1000 to count in thousands
print(list(map(string_to_int, testvals)))
You can use list comprehension with a dict mapping:
l = ["150M", "360M", "2.6B", "3.7B"]
m = {'K': 3, 'M': 6, 'B': 9, 'T': 12}
print([int(float(i[:-1]) * 10 ** m[i[-1]] / 1000) for i in l])
This outputs:
[150000, 360000, 2600000, 3700000]
Another solution, using re.sub:
import re
lst = ["150M", "360M", "2.6B", "3.7B"]
tbl = {'K':1, 'M':1_000, 'B':1_000_000}
new_lst = [int(i) for i in (re.sub(r'([\d\.]+)(K|M|B)', lambda v: str(int(float(v.groups()[0]) * tbl[v.groups()[1]])), i) for i in lst)]
print(new_lst)
Prints:
[150000, 360000, 2600000, 3700000]
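The answers above go through float, which can in principle introduce rounding error for some inputs. A possible variant, sketched here with decimal.Decimal and a made-up helper name to_thousands, keeps the arithmetic exact. It assumes every input ends in one of K, M or B:

```python
from decimal import Decimal

# Multipliers expressed in thousands, since the question counts in thousands.
THOUSANDS = {"K": 1, "M": 1_000, "B": 1_000_000}

def to_thousands(text):
    """Convert e.g. '2.6B' to 2600000 (thousands). Hypothetical helper."""
    suffix = text[-1].upper()
    if suffix not in THOUSANDS:
        raise ValueError("unsupported suffix in %r" % text)
    # Decimal parses the numeric part exactly, so no float rounding occurs.
    return int(Decimal(text[:-1]) * THOUSANDS[suffix])

values = ["150M", "360M", "2.6B", "3.7B"]
print([to_thousands(v) for v in values])  # [150000, 360000, 2600000, 3700000]
```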

CSV string to decimal in python list

I am using an API that returns what appears to be a CSV string. I need to parse two decimal numbers out of it and append them to separate lists as decimal numbers, ignoring the timestamp at the end:
returned_string_from_API = '0,F,F,1.139520,1.139720,0,0,20160608163132000'
decimal_lowest_in_string = []
decimal_highest_in_string = []
Processing time is a factor in this situation, so what is the fastest way to accomplish this?
Split the string by comma:
>>> string_values = returned_string_from_API.split(',')
>>> string_values
['0', 'F', 'F', '1.139520', '1.139720', '0', '0', '20160608163132000']
Get the values from string:
>>> string_values[3:5]
['1.139520', '1.139720']
Convert to float:
>>> decimal_values = [float(val) for val in string_values[3:5]]
>>> decimal_values
[1.13952, 1.13972]
Get min and max in the appropriate list:
>>> decimal_lowest_in_string = []
>>> decimal_highest_in_string = []
>>> decimal_lowest_in_string.append(min(decimal_values))
>>> decimal_lowest_in_string
[1.13952]
>>> decimal_highest_in_string.append(max(decimal_values))
>>> decimal_highest_in_string
[1.13972]
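The steps above can be folded into one small function. Since the question asks for "decimal numbers", this sketch uses decimal.Decimal rather than float, which is a substitution on my part. The function name append_low_high is made up, and it assumes the two values always sit in fields 3 and 4:

```python
from decimal import Decimal

returned_string_from_API = '0,F,F,1.139520,1.139720,0,0,20160608163132000'
decimal_lowest_in_string = []
decimal_highest_in_string = []

def append_low_high(csv_line, lows, highs):
    """Append the smaller and larger of the two values to lows and highs."""
    # Assumption: the two values are always fields 3 and 4 (zero-based).
    fields = csv_line.split(',')
    first, second = Decimal(fields[3]), Decimal(fields[4])
    lows.append(min(first, second))
    highs.append(max(first, second))

append_low_high(returned_string_from_API,
                decimal_lowest_in_string, decimal_highest_in_string)
print(decimal_lowest_in_string)   # [Decimal('1.139520')]
print(decimal_highest_in_string)  # [Decimal('1.139720')]
```

If plain floats are good enough, swapping Decimal for float leaves the rest unchanged.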
1) A version that does not rely on the csv module:
returned_string_from_API = '0,F,F,1.139520,1.139720,0,0,20160608163132000'
def isfloat(value):
    try:
        float(value)
        return True
    except ValueError:
        return False
# Note: this also keeps the integer fields and the timestamp, since float() accepts them
float_numbers = list(filter(isfloat, returned_string_from_API.split(',')))
2) Try the pandas package.
A regular expression may be the fastest way, though readability is another issue.
import re
returned_string_from_API = '0,F,F,1.139520,1.139720,0,0,20160608163132000'
decimal_lowest_in_string = []
decimal_highest_in_string = []
re_check = re.compile(r"[0-9]+\.\d*")
# Convert the matches to float so min/max compare numerically, not as strings
m = [float(v) for v in re_check.findall(returned_string_from_API)]
decimal_lowest_in_string.append(min(m))
decimal_highest_in_string.append(max(m))
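Which approach is actually faster is easy to measure rather than guess. A rough sketch using the standard timeit module follows. The function names by_split, by_regex and time_both are made up, and the timings will vary by machine and Python version, so no numbers are shown:

```python
import re
import timeit

LINE = '0,F,F,1.139520,1.139720,0,0,20160608163132000'
DECIMAL_RE = re.compile(r"[0-9]+\.\d*")

def by_split(line):
    # Assumes the two values are always fields 3 and 4.
    fields = line.split(',')
    return float(fields[3]), float(fields[4])

def by_regex(line):
    # Assumes exactly two fields contain a decimal point.
    first, second = DECIMAL_RE.findall(line)
    return float(first), float(second)

def time_both(number=100_000):
    """Return the total seconds each approach takes for `number` calls."""
    return {
        "split": timeit.timeit(lambda: by_split(LINE), number=number),
        "regex": timeit.timeit(lambda: by_regex(LINE), number=number),
    }

print(time_both())
```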

How to encode categorical values in Python

Given a vocabulary ["NY", "LA", "GA"],
how can one encode it in such a way that it becomes:
"NY" = 100
"LA" = 010
"GA" = 001
So if I do a lookup on "NY GA", I get 101
You can use numpy.in1d (deprecated in recent NumPy releases in favor of numpy.isin):
>>> import numpy as np
>>> xs = np.array(["NY", "LA", "GA"])
>>> ''.join('1' if f else '0' for f in np.in1d(xs, 'NY GA'.split(' ')))
'101'
or:
>>> ''.join(np.where(np.in1d(xs, 'NY GA'.split(' ')), '1', '0'))
'101'
vocab = ["NY", "LA", "GA"]
categorystring = '0'*len(vocab)
selectedVocabs = 'NY GA'
for sel in selectedVocabs.split():
    categorystring = list(categorystring)
    categorystring[vocab.index(sel)] = '1'
    categorystring = ''.join(categorystring)
This is the end result of my own testing. It turns out Python doesn't support string item assignment; somehow I thought it did.
Personally, I think behzad's solution is better: numpy does a better job and is faster.
Or you can do this:
vocabulary = ["NY","LA","GA"]
i = pow(10, len(vocabulary)-1)
dictVocab = dict()
for word in vocabulary:
    dictVocab[word] = i
    i //= 10  # integer division keeps the codes as ints
yourStr = "NY LA"
result = 0
for word in yourStr.split():
    result += dictVocab[word]
Another solution using numpy. It looks like you're trying to binary-encode a dictionary, so the code below feels natural to me.
import numpy as np
def to_binary_representation(your_str="NY LA"):
    xs = np.array(["NY", "LA", "GA"])
    ys = 2**np.arange(3)[::-1]
    lookup_table = dict(zip(xs,ys))
    return bin(np.sum([lookup_table[k] for k in your_str.split()]))
NumPy is not required for this, but it is probably faster if you have large arrays to work on. Without it, np.sum can be replaced by the builtin sum, and xs and ys by plain lists.
To create a lookup dictionary, reverse the vocabulary, enumerate it, and take the power of 2:
>>> vocabulary = ["NY", "LA", "GA"]
>>> d = dict((word, 2 ** i) for i, word in enumerate(reversed(vocabulary)))
>>> d
{'GA': 1, 'LA': 2, 'NY': 4}
To query the dictionary:
>>> query = "NY GA"
>>> sum(code for word, code in d.items() if word in query.split())
5
If you want it formatted to binary:
>>> '{0:b}'.format(5)
'101'
edit: if you want a 'one liner':
>>> '{0:b}'.format(
...     sum(2 ** i
...         for i, word in enumerate(reversed(vocabulary))
...         if word in query.split()))
'101'
edit2: if you want padding, for example with six 'bits':
>>> '{0:06b}'.format(5)
'000101'
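For comparison, the encoding can also be built directly as a string, one character per vocabulary entry, which keeps leading zeros without any padding step. A minimal pure-Python sketch, where the function name encode is made up for illustration:

```python
def encode(query, vocabulary):
    """Return one '1' or '0' per vocabulary word, in vocabulary order."""
    selected = set(query.split())
    return ''.join('1' if word in selected else '0' for word in vocabulary)

vocabulary = ["NY", "LA", "GA"]
print(encode("NY GA", vocabulary))  # 101
print(encode("GA", vocabulary))     # 001
```

Words in the query that are not in the vocabulary are silently ignored here; whether that is desirable depends on the use case.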

Single integer to multiple integer translation in Python

I'm trying to translate a single-character input into a multi-character output, and am currently using a translation table built with maketrans. For instance,
intab3 = "abcdefg"
outtab3 = "ABCDEFG"
trantab3 = maketrans(intab3, outtab3)
is the most basic version of what I'm doing. What I'd like to be able to do is have the input be a single letter and the output be multiple letters. So something like:
intab4 = "abc"
outtab = "yes,no,maybe"
But commas and quotation marks don't work; it keeps saying:
ValueError: maketrans arguments must have same length
Is there a better function I should be using? Thanks,
You can use a dict here:
>>> dic = {"a":"yes", "b":"no", "c":"maybe"}
>>> strs = "abcd"
>>> "".join(dic.get(x,x) for x in strs)
'yesnomaybed'
In Python 3, str.translate accepts a dict that maps code points to replacement strings of any length, so this just works:
>>> intab4 = "abc"
>>> outtab = "yes,no,maybe"
>>> d = {ord(k): v for k, v in zip(intab4, outtab.split(','))}
>>> print(d)
{97: 'yes', 98: 'no', 99: 'maybe'}
>>> 'abcdefg'.translate(d)
'yesnomaybedefg'
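In Python 3 the ord() step can also be left to str.maketrans, which accepts a single dict whose keys are one-character strings and converts them to code points. A short sketch:

```python
# str.maketrans with one dict argument turns single-character keys into
# their code points, so the table can be passed straight to str.translate.
table = str.maketrans({"a": "yes", "b": "no", "c": "maybe"})
print(table)                       # {97: 'yes', 98: 'no', 99: 'maybe'}
print("abcdefg".translate(table))  # yesnomaybedefg
```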

python get difference from arrays

I have the following two arrays. I am trying to check whether the elements of invalid_id_arr exist in valid_id_arr; those that don't should go into the diff array. With the code below, diff comes out as ['id123', 'id124', 'id125', 'id126', 'id789', 'id666'], but I expect ["id789","id666"]. What am I doing wrong here?
tag_file= {}
tag_file['invalid_id_arr']=["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"]
tag_file['valid_id_arr']=["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"]
diff = [ele.split('-')[0] for ele in tag_file['invalid_id_arr'] if str(ele.split('-')[0]) not in tag_file['valid_id_arr']]
Current Output:
['id123', 'id124', 'id125', 'id126', 'id789', 'id666']
Expected output:
["id789","id666"]
Using a set is more efficient, but your main problem is that you weren't removing the second half of the elements in valid_id_arr.
invalid_id_arr=["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"]
valid_id_arr=["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"]
valid_id_set = set(ele.split('-')[0] for ele in valid_id_arr)
diff = [ele for ele in invalid_id_arr if ele.split('-')[0] not in valid_id_set]
print(diff)
Output:
['id789-123', 'id666']
http://ideone.com/Q9JBw
Try sets:
invalid_id_arr = ["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"]
valid_id_arr = ["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"]
set_invalid = set(x.split('-')[0] for x in invalid_id_arr)
print(set_invalid.difference(x.split('-')[0] for x in valid_id_arr))
>>> a = ["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"]
>>> b = ["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"]
>>> c = {s.split('-')[0] for s in b}  # a set, so membership tests can be repeated
>>> [ele.split('-')[0] for ele in a if ele.split('-')[0] not in c]
['id789', 'id666']
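A note on testing membership with `in` against a generator expression such as (s.split('-')[0] for s in b): each test consumes items from the generator, so a later test can miss a value that was already consumed. The result then depends on the order of the data. A small sketch with made-up IDs shows the difference from a set:

```python
valid = ["id123-1", "id124-2"]  # made-up sample data

# Generator: the first test consumes both items while searching.
prefixes_gen = (s.split("-")[0] for s in valid)
found_124 = "id124" in prefixes_gen   # True, but the generator is now exhausted
found_123 = "id123" in prefixes_gen   # False, although id123 is a valid prefix

# Set: membership tests do not consume anything and can be repeated.
prefixes_set = {s.split("-")[0] for s in valid}
set_124 = "id124" in prefixes_set     # True
set_123 = "id123" in prefixes_set     # True

print(found_124, found_123, set_124, set_123)  # True False True True
```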
