Given a numpy array like this:
prob= [[ -1.77244641 -3.89116659 -4.92632753 -8.04921303
-9.05286957]]
I wish to write them in a file separated by commas and reduce the decimal place to 4.
This is what I have tried:
for pk in prob:
jk=' '.join(str(b) for b in pk)
f.write("("+ "%.4f" %jk + ")")
f.write("\t")
q=map(float, jk.split(','))
This however doesn't accept the decimal limiting logic and gives me an error saying float argument required no string.. I am trying to extract the probabilities and write them like this:
(-1.7724, -3.8911, -4.9263, -8.0492, -9.0528)
You need to do the formatting on each string, not on the joined string. Another thing: you said comma-separated, but you used a space to join them. Here is what your program should look like:
for pk in prob:
jk = ','.join(["%.4f" % b for b in pk])
f.write("(" + jk + ")" + "\t")
Your q = map(float, pk.split(',')) line is really useless since pk is already storing all of the floats.
I think your prob like this:
prob=numpy.array([[ -1.77244641 , -3.89116659 ,-4.92632753, -8.04921303,
-9.05286957]])
And I changed your code like this:
for pk in prob:
jk=', '.join("%.4f"%b for b in pk)
jk="("+jk+")"+"\t"
f.write(jk)
Related
I'm trying to use Python to call an API and clean a bunch of strings that represent a movie budget.
So far, I have the following 6 variants of data that come up.
"$1.2 million"
"$1,433,333"
"US$ 2 million"
"US$1,644,736 (est.)
"$6-7 million"
"£3 million"
So far, I've only gotten 1 and 2 parsed without a problem with the following code below. What is the best way to handle all of the other cases or a general case that may not be listed below?
def clean_budget_string(input_string):
number_to_integer = {'million' : 1000000, 'thousand' : 1000}
budget_parts = input_string.split(' ')
#Currently, only indices 0 and 1 are necessary for computation
text_part = budget_parts[1]
if text_part in number_to_integer:
number = budget_parts[0].lstrip('$')
int_representation = number_to_integer[text_part]
return int(float(number) * int_representation)
else:
number = budget_parts[0]
idx_dollar = 0
for idx in xrange(len(number)):
if number[idx] == '$':
idx_dollar = idx
return int(number[idx_dollar+1:].replace(',', ''))
The way I would approach a parsing task like this -- and I'm happy to hear other opinions -- would be to break up your function into several parts, each of which identify a single piece of information in the input string.
For instance, I'd start by identifying what float number can be parsed from the string, ignoring currency and order of magnitude (a million, a thousand) for now :
f = float(''.join([c for c in input_str if c in '0123456789.']))
(you might want to add error handling for when you end up with a trailing dot, because of additions like 'est.')
Then, in a second step, you determine whether the float needs to be multiplied to adjust for the correct order of magnitude. One way of doing this would be with multiple if-statements :
if 'million' in input_str :
oom = 6
elif 'thousand' in input_str :
oom = 3
else :
oom = 1
# adjust number for order of magnitude
f = f*math.pow(10, oom)
Those checks could of course be improved to account for small differences in formatting by using regular expressions.
Finally, you separately determine the currency mentioned in your input string, again using one or more if-statements :
if '£' in input_str :
currency = 'GBP'
else :
currency = 'USD'
Now the one case that this doesn't yet handle is the dash one where lower and upper estimates are given. One way of making the function work with these inputs is to split the initial input string on the dash and use the first (or second) of the substrings as input for the initial float parsing. So we would replace our first line of code with something like this:
if '-' in input_str :
lower = input_str.split('-')[0]
f = float(''.join([c for c in lower if c in '0123456789.']))
else :
f = float(''.join([c for c in input_str if c in '0123456789.']))
using regex and string replace method, i added the return of the curency as well if needed.
Modify accordingly to handle more input or multiplier like billion etc.
import re
# take in string and return integer amount and currency
def clean_budget_string(s):
mult_dict = {'million':1000000,'thousand':1000}
tmp = re.search('(^\D*?)\s*((?:\d+\.?,?)+)(?:-\d+)?\s*((?:million|thousand)?)', s).groups()
currency = tmp[0]
mult = tmp[-1]
tmp_int = ''.join(tmp[1:-1]).replace(',', '') # join digits and multiplier, remove comma
tmp_int = int(float(tmp_int) * mult_dict.get(mult, 1))
return tmp_int, currency
>>? clean_budget_string("$1.2 million")
(1200000, '$')
>>? clean_budget_string("$1,433,333")
(1433333, '$')
>>? clean_budget_string("US$ 2 million")
(2000000, 'US$')
>>? clean_budget_string("US$1,644,736 (est.)")
(1644736, 'US$')
>>? clean_budget_string("$6-7 million")
(6000000, '$')
>>? clean_budget_string("£3 million")
(3000000, '£') # my script don't recognize the £ char, might need to set the encoding properly
for line in open('transactions.dat','r'):
item=line.rstrip('\n')
item=item.split(',')
custid=item[2]
amt=item[4]
if custid in cust1:
a=cust1[custid]
b=amt
c=(a)+(b)
print(cust1[custid]+" : "+a+" :"+b+":"+c)
break
else:
cust1[custid]=amt
Output:
85.91 : 85.91 :85.91:85.9185.91
Well above is my code what I want is
when I read from a file I want to add the customer amount with same
id.
Secondly there should not be repetition of customer id in my
dictionary.
so I am trying to add customer amount which is c but it gives me appended string instead of adding the two. You can see in the last part of my output which is value of c. So how do I add the values.
Sample transaction data:
109400182,2016-09-10,119257029,1094,40.29
109400183,2016-09-10,119257029,1094,9.99
377700146,2016-09-10,119257029,3777,49.37
276900142,2016-09-10,135127654,2769,23.31
276900143,2016-09-10,135127654,2769,25.58
You reading strings, instead of floats, from the file. Use this amt=float(item[4]) to convert strings representing numbers to floats, and then print(str(cust1[custid])+" : "+str(a)+" :"+str(b)+":"+str(c)) to print out.
Your code may need lots of refactor, but in a nutshell and if I understand what you are trying to do you could do
c = float(a) + float(b)
and that should work.
I built a calculator for a bio equation, and I think I've narrowed down the source of my problem, which is a natural log I take:
goldman = ((R * T) / F) * cmath.log(float(top_row) / float(bot_row))
print("Membrane potential: " + str(goldman) + "V"
My problem is that it will only display the output in a complex form:
Membrane potential: (0.005100608207126714+0j)V
Is there any way of getting this to print as a floating number? Nothing I've tried has worked.
Complex numbers have a real part and an imaginary part:
>>> c = complex(1, 0)
>>> c
(1+0j)
>>> c.real
1.0
It looks like you just want the real part... so:
print("Membrane potential: " + str(goldman.real) + "V"
Use math.log instead of cmath.log.
Since you don't want the imaginary part of the result, it would be better to use math.log instead of cmath.log. This way, you'll get an error if your input is not in the valid domain for a real log, which is much better than silently giving you meaningless results (for example, if top_row is negative). Also, complex numbered results make no sense for this particular equation.
ie, use:
goldman = ((R * T) / F) * math.log(float(top_row) / float(bot_row))
If you want to just convert it to float value you can do this:
def print_float(x1):
a=x1.real
b=x1.imag
val=a+b
print(val)
ec=complex(2,3)
In this way you can actually get the floating value.
It is simple if you need only real part of the complex number.
goldman = 0.005100608207126714+0j
print(goldman)
(0.005100608207126714+0j)
goldman = float(goldman.real)
print(goldman)
0.005100608207126714
If you need absolute value then use following
goldman = 0.005100608207126714+0j
print(goldman)
(0.005100608207126714+0j)
goldman = float(abs(goldman))
print(goldman)
0.005100608207126714
print(type(goldman))
<class 'float'>
I am new to Python and I have a hard time solving this.
I am trying to sort a list to be able to human sort it 1) by the first number and 2) the second number. I would like to have something like this:
'1-1bird'
'1-1mouse'
'1-1nmouses'
'1-2mouse'
'1-2nmouses'
'1-3bird'
'10-1birds'
(...)
Those numbers can be from 1 to 99 ex: 99-99bird is possible.
This is the code I have after a couple of headaches. Being able to then sort by the following first letter would be a bonus.
Here is what I've tried:
#!/usr/bin/python
myList = list()
myList = ['1-10bird', '1-10mouse', '1-10nmouses', '1-10person', '1-10cat', '1-11bird', '1-11mouse', '1-11nmouses', '1-11person', '1-11cat', '1-12bird', '1-12mouse', '1-12nmouses', '1-12person', '1-13mouse', '1-13nmouses', '1-13person', '1-14bird', '1-14mouse', '1-14nmouses', '1-14person', '1-14cat', '1-15cat', '1-1bird', '1-1mouse', '1-1nmouses', '1-1person', '1-1cat', '1-2bird', '1-2mouse', '1-2nmouses', '1-2person', '1-2cat', '1-3bird', '1-3mouse', '1-3nmouses', '1-3person', '1-3cat', '2-14cat', '2-15cat', '2-16cat', '2-1bird', '2-1mouse', '2-1nmouses', '2-1person', '2-1cat', '2-2bird', '2-2mouse', '2-2nmouses', '2-2person']
def mysort(x,y):
x1=""
y1=""
for myletter in x :
if myletter.isdigit() or "-" in myletter:
x1=x1+myletter
x1 = x1.split("-")
for myletter in y :
if myletter.isdigit() or "-" in myletter:
y1=y1+myletter
y1 = y1.split("-")
if x1[0]>y1[0]:
return 1
elif x1[0]==y1[0]:
if x1[1]>y1[1]:
return 1
elif x1==y1:
return 0
else :
return -1
else :
return -1
myList.sort(mysort)
print myList
Thanks !
Martin
You have some good ideas with splitting on '-' and using isalpha() and isdigit(), but then we'll use those to create a function that takes in an item and returns a "clean" version of the item, which can be easily sorted. It will create a three-digit, zero-padded representation of the first number, then a similar thing with the second number, then the "word" portion (instead of just the first character). The result looks something like "001001bird" (that won't display - it'll just be used internally). The built-in function sorted() will use this callback function as a key, taking each element, passing it to the callback, and basing the sort order on the returned value. In the test, I use the * operator and the sep argument to print it without needing to construct a loop, but looping is perfectly fine as well.
def callback(item):
phrase = item.split('-')
first = phrase[0].rjust(3, '0')
second = ''.join(filter(str.isdigit, phrase[1])).rjust(3, '0')
word = ''.join(filter(str.isalpha, phrase[1]))
return first + second + word
Test:
>>> myList = ['1-10bird', '1-10mouse', '1-10nmouses', '1-10person', '1-10cat', '1-11bird', '1-11mouse', '1-11nmouses', '1-11person', '1-11cat', '1-12bird', '1-12mouse', '1-12nmouses', '1-12person', '1-13mouse', '1-13nmouses', '1-13person', '1-14bird', '1-14mouse', '1-14nmouses', '1-14person', '1-14cat', '1-15cat', '1-1bird', '1-1mouse', '1-1nmouses', '1-1person', '1-1cat', '1-2bird', '1-2mouse', '1-2nmouses', '1-2person', '1-2cat', '1-3bird', '1-3mouse', '1-3nmouses', '1-3person', '1-3cat', '2-14cat', '2-15cat', '2-16cat', '2-1bird', '2-1mouse', '2-1nmouses', '2-1person', '2-1cat', '2-2bird', '2-2mouse', '2-2nmouses', '2-2person']
>>> print(*sorted(myList, key=callback), sep='\n')
1-1bird
1-1cat
1-1mouse
1-1nmouses
1-1person
1-2bird
1-2cat
1-2mouse
1-2nmouses
1-2person
1-3bird
1-3cat
1-3mouse
1-3nmouses
1-3person
1-10bird
1-10cat
1-10mouse
1-10nmouses
1-10person
1-11bird
1-11cat
1-11mouse
1-11nmouses
1-11person
1-12bird
1-12mouse
1-12nmouses
1-12person
1-13mouse
1-13nmouses
1-13person
1-14bird
1-14cat
1-14mouse
1-14nmouses
1-14person
1-15cat
2-1bird
2-1cat
2-1mouse
2-1nmouses
2-1person
2-2bird
2-2mouse
2-2nmouses
2-2person
2-14cat
2-15cat
2-16cat
You need leading zeros. Strings are sorted alphabetically with the order different from the one for digits. It should be
'01-1bird'
'01-1mouse'
'01-1nmouses'
'01-2mouse'
'01-2nmouses'
'01-3bird'
'10-1birds'
As you you see 1 goes after 0.
The other answers here are very respectable, I'm sure, but for full credit you should ensure that your answer fits on a single line and uses as many list comprehensions as possible:
import itertools
[''.join(r) for r in sorted([[''.join(x) for _, x in
itertools.groupby(v, key=str.isdigit)]
for v in myList], key=lambda v: (int(v[0]), int(v[2]), v[3]))]
That should do nicely:
['1-1bird',
'1-1cat',
'1-1mouse',
'1-1nmouses',
'1-1person',
'1-2bird',
'1-2cat',
'1-2mouse',
...
'2-2person',
'2-14cat',
'2-15cat',
'2-16cat']
I wrote a script in Python removing tabs/blank spaces between two columns of strings (x,y coordinates) plus separating the columns by a comma and listing the maximum and minimum values of each column (2 values for each the x and y coordinates). E.g.:
100000.00 60000.00
200000.00 63000.00
300000.00 62000.00
400000.00 61000.00
500000.00 64000.00
became:
100000.00,60000.00
200000.00,63000.00
300000.00,62000.00
400000.00,61000.00
500000.00,64000.00
10000000 50000000 60000000 640000000
This is the code I used:
import string
input = open(r'C:\coordinates.txt', 'r')
output = open(r'C:\coordinates_new.txt', 'wb')
s = input.readline()
while s <> '':
s = input.readline()
liste = s.split()
x = liste[0]
y = liste[1]
output.write(str(x) + ',' + str(y))
output.write('\n')
s = input.readline()
input.close()
output.close()
I need to change the above code to also transform the coordinates from two decimal to one decimal values and each of the two new columns to be sorted in ascending order based on the values of the x coordinate (left column).
I started by writing the following but not only is it not sorting the values, it is placing the y coordinates on the left and the x on the right. In addition I don't know how to transform the decimals since the values are strings and the only function I know is using %f and that needs floats. Any suggestions to improve the code below?
import string
input = open(r'C:\coordinates.txt', 'r')
output = open(r'C:\coordinates_sorted.txt', 'wb')
s = input.readline()
while s <> '':
s = input.readline()
liste = string.split(s)
x = liste[0]
y = liste[1]
output.write(str(x) + ',' + str(y))
output.write('\n')
sorted(s, key=lambda x: x[o])
s = input.readline()
input.close()
output.close()
thanks!
First, try to format your code according to PEP8—it'll be easier to read. (I've done the cleanup in your post already).
Second, Tim is right in that you should try to learn how to write your code as (idiomatic) Python not just as if translated directly from its C equivalent.
As a starting point, I'll post your 2nd snippet here, refactored as idiomatic Python:
# there is no need to import the `string` module; `.strip()` is a built-in
# method of strings (i.e. objects of type `str`).
# read in the data as a list of pairs of raw (i.e. unparsed) coordinates in
# string form:
with open(r'C:\coordinates.txt') as in_file:
coords_raw = [line.strip().split() for line in in_file.readlines()]
# convert the raw list into a list of pairs (2-tuples) containing the parsed
# (i.e. float not string) data:
coord_pairs = [(float(x_raw), float(y_raw)) for x_raw, y_raw in coords_raw]
coord_pairs.sort() # you want to sort the entire data set, not just values on
# individual lines as in your original snippet
# build a list of all x and y values we have (this could be done in one line
# using some `zip()` hackery, but I'd like to keep it readable (for you at
# least)):
all_xs = [x for x, y in coord_pairs]
all_ys = [y for x, y in coord_pairs]
# compute min and max:
x_min, x_max = min(all_xs), max(all_xs)
y_min, y_max = min(all_ys), max(all_ys)
# NOTE: the above section performs well for small data sets; for large ones, you
# should combine the 4 lines in a single for loop so as to NOT have to read
# everything to memory and iterate over the data 6 times.
# write everything out
with open(r'C:\coordinates_sorted.txt', 'wb') as out_file:
# here, we're doing 3 things in one line:
# * iterate over all coordinate pairs and convert the pairs to the string
# form
# * join the string forms with a newline character
# * write the result of the join+iterate expression to the file
out_file.write('\n'.join('%f,%f' % (x, y) for x, y in coord_pairs))
out_file.write('\n\n')
out_file.write('%f %f %f %f' % (x_min, x_max, y_min, y_max))
with open(...) as <var_name> gives you guaranteed closing of the file handle as with try-finally; also, it's shorter than open(...) and .close() on separate lines. Also, with can be used for other purposes, but is commonly used for dealing with files. I suggest you look up how to use try-finally as well as with/context managers in Python, in addition to everything else you might have learned here.
Your code looks more like C than like Python; it is quite unidiomatic. I suggest you read the Python tutorial to find some inspiration. For example, iterating using a while loop is usually the wrong approach. The string module is deprecated for the most part, <> should be !=, you don't need to call str() on an object that's already a string...
Then, there are some errors. For example, sorted() returns a sorted version of the iterable you're passing - you need to assign that to something, or the result will be discarded. But you're calling it on a string, anyway, which won't give you the desired result. You also wrote x[o] where you clearly meant x[0].
You should be using something like this (assuming Python 2):
with open(r'C:\coordinates.txt') as infile:
values = []
for line in infile:
values.append(map(float, line.split()))
values.sort()
with open(r'C:\coordinates_sorted.txt', 'w') as outfile:
for value in values:
outfile.write("{:.1f},{:.1f}\n".format(*value))