limit a float list into 10 digits - python

I have a list imported from a data file.
lines=['1628.246', '100.0000', '0.4563232E-01', '0.4898217E-01', '0.3017656E-02', '0.2271272', '0.2437533', '0.1500232E-01', '0.4102987', '0.4117742', '0.5461504E-02', '2.080838', '0.5527303E-03', '-0.4542367E-03', '-0.2238781E-01', '-0.8196812E-03', '-0.3796306E-01', '-0.7906407E-03', '-0.6738000E-03', '0.000000']
I want to generate a new list with every element formatted to the same 10 decimal places and write it back to a file.
Here is what I did:
newline=map(float,lines)
newline=map("{:.10f}".format,newline)
newline=map(str,newline)
jitterfile.write(join(newline)+'\n')
It works, but it does not look elegant. Any ideas on how to make it cleaner?

You can do it in a single line like so:
newline=["{:.10f}".format(float(i)) for i in lines]
jitterfile.write(' '.join(newline)+'\n')
Of note, your third instruction newline=map(str,newline) is redundant as the entries in the list are already strings, so casting them is unnecessary.

The map function also accepts a lambda. Since the result of format is already a string, you don't need to apply str to the list, and you need to call join on a delimiter such as ',' (in Python 3, wrap map in list() to get a list back):
>>> newline=list(map(lambda x:"{:.10f}".format(float(x)),lines))
>>> newline
['1628.2460000000', '100.0000000000', '0.0456323200', '0.0489821700', '0.0030176560', '0.2271272000', '0.2437533000', '0.0150023200', '0.4102987000', '0.4117742000', '0.0054615040', '2.0808380000', '0.0005527303', '-0.0004542367', '-0.0223878100', '-0.0008196812', '-0.0379630600', '-0.0007906407', '-0.0006738000', '0.0000000000']
jitterfile.write(','.join(newline)+'\n')
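Putting the pieces together, a minimal self-contained sketch might look like the following. The helper name format_fixed and the shortened input list are assumptions made for illustration, and the write to jitterfile is replaced by building the output string so the snippet runs on its own.

```python
def format_fixed(values, decimals=10):
    """Re-format each numeric string with a fixed number of decimal places."""
    # The nested {} is filled by `decimals`, giving e.g. "{:.10f}".
    return ["{:.{}f}".format(float(v), decimals) for v in values]

# A shortened version of the list from the question.
lines = ['1628.246', '0.4563232E-01', '-0.4542367E-03', '0.000000']

newline = format_fixed(lines)

# This is the string that would be passed to jitterfile.write(...).
output_line = ','.join(newline) + '\n'
```

Keeping the number of decimals as a parameter means the same helper can be reused if a different width is needed later.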

Related

Problem in eliminating the brackets () and post-processing the dataframe using Pandas in Python

I am just a beginner in Python, so kindly excuse this question. I tried a lot to get this done but failed, so I am posting it here. I have a data set which looks like:
5.96303e-07 (11.6667 3.21427 -2.20471e-07) (11.8746 -1.75419 -2.37923e-07) (8.66991 -2.84873 5.29442e-07) (2.19427 13.547 1.16203e-05)
9.67139e-07 (11.6171 3.16081 -8.83286e-08) (11.8851 -1.763 -4.38136e-07) (8.68988 -2.85339 1.81039e-07) (1.61058 13.629 4.42662e-07)
1.34613e-06 (11.5562 3.11037 -7.74061e-08) (11.8897 -1.77006 -3.81523e-07) (8.70652 -2.8608 8.00436e-08) (1.47268 13.5569 -2.03173e-06)
1.73261e-06 (11.4961 3.06921 -1.49294e-07) (11.8919 -1.77567 -3.48887e-07) (8.71974 -2.86802 5.2652e-08) (1.59798 13.4556 -2.52073e-06)
2.12563e-06 (11.4423 3.03706 -1.53771e-07) (11.8932 -1.78022 -3.33928e-07) (8.73 -2.87398 4.65075e-08) (1.77817 13.3679 -2.42045e-06)
Now when I access the data frame, for instance with df.iloc[:,1], it gives me (11.6171. When I tried to plot it, I got an error. I thought the "(" was causing the problem, so I removed it using df.replace('\(','',regex=True).replace('\)','',regex=True). The plot function then seems to work but gives a very weird figure (I am not allowed to post the figure). In addition, when I tried to do some calculations like (df.iloc[:,1])^2, I get an error that says:
TypeError: can't multiply sequence by non-int of type 'str'
I guess the data is not in the correct form. Any comment or suggestion will be a great help. Thanks in advance.
There are two relatively minor issues. Something like the following might be what you're looking for. Maybe.
First, the column you are trying to plot is a string. Essentially it contains letters/symbols. Even when you remove the "(" ")" the "numbers" are still considered a string.
# To convert a "3.14" (string) to a 3.14 (float)
# floats are basically decimals
my_string = "3.14"
my_number = float(my_string)
Additionally, there are multiple "numbers" in the string. So to plot the numbers in that column, I think you would first need to split the string and then convert to numbers.
# Use your code to replace the special characters
# (assign the result back: replace() returns a new data frame)
df = df.replace(r'\(', '', regex=True).replace(r'\)', '', regex=True)
# new data frame with split value columns
new = df["colname_with_three_numbers"].str.split(" ", n=2, expand=True)
# Make a separate column for each number from the new data frame
df["first_number"] = new[0]
df["second_number"] = new[1]
df["third_number"] = new[2]
# Change the type so the column can be plotted;
# astype(float) converts the whole column, whereas float() only accepts a single value
df["first_number"] = df["first_number"].astype(float)
df
This is a pretty bad method to solve this, but if the dataset is not too large you can get each element using a for loop and remove the brackets using str.replace(")","").
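As a rough sketch of the same idea without pandas, each row can be cleaned and converted before any data frame is built. The function name parse_row is hypothetical, and the sample row is a shortened copy of the first line of the data set in the question.

```python
def parse_row(line):
    """Strip parentheses from one whitespace-separated row and convert each field to float."""
    cleaned = line.replace("(", "").replace(")", "")
    return [float(field) for field in cleaned.split()]

# Shortened copy of the first row from the question.
raw = "5.96303e-07 (11.6667 3.21427 -2.20471e-07) (11.8746 -1.75419 -2.37923e-07)"

row = parse_row(raw)

# Once the values are floats, arithmetic works.
# Note that Python's power operator is **, not ^ (which is bitwise XOR).
squared = row[1] ** 2
```

A list of such rows can then be passed to pandas.DataFrame, at which point every column is already numeric and can be plotted directly.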

Keep leading zeros when saving numbers that start with 0

I am trying to save a list of site codes, for example:
site_codes = [1302,9033,1103,5005,0016]
Then I want to add the site code to URLs before running web scraping, using site_codes[i], for example:
for i in range(len(site_codes)):
    Data_site_A.append("https://.../"+str(parameters[i])+"site="+str(site_codes[0]))
    Data_site_B.append("https://.../"+str(parameters[i])+"site="+str(site_codes[1]))
But I can not save 0016 into the list just like other numbers. I have tried many ways including:
# make a string
str("{0}{1}{2}".format(0,0,16))
# fill the 0
"%04d" % 16
But they all return '0016' instead of 0016. So when I input '0016' into the urls, it won't work, because it is not a number.
Is there a way to save this number just as 0016? Or since that print("%04d" % 16) will print out a pure 0016, is there a way to save the output from there?
For the desired output, the computer should interpret it as:
"https://...."+str(parameters[i])+"site=0016")
# use regular expression
import re
site_codes = '''
site code:
site_A: 1302
site_B: 9033
site_C: 1103
site_D: 5005
site_E: 0016
'''
site_codes = re.findall(r'\d+',site_codes)
for i in range(len(site_codes)):
    Data_site_A.append("https://.../"+str(parameters[i])+"site="+str(site_codes[0]))
    Data_site_B.append("https://.../"+str(parameters[i])+"site="+str(site_codes[1]))
Use str.zfill() to add leading zeros to a number:
Call str(object) with a number as object to convert it to a string.
Call str.zfill(width) on the numeric string to pad it with 0 to the specified width.
a_number = 123
print(a_number)
# Output: 123
# Convert a_number to a string
number_str = str(a_number)
# Pad number_str with zeros to 5 digits
zero_filled_number = number_str.zfill(5)
print(zero_filled_number)
# Output: 00123
Assuming that you really do have a list of integers that can't be retained as strings and want to create the URLs, and also assuming that you are using Python 3.6 or above, you can achieve this with a simple f-string.
print(f"https://.../{str(parameters[i])}site={site_codes[1]:04d}")
This will pad with leading zeros without the need to resort to zfill.
Alternatively, or if you're running Python below 3.6, this will also work:
print("https://.../{}site={:04d}".format(str(parameters[i]), site_codes[1]))
With a site code of 16, both of the above will give you
https://.../parametersite=0016
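For completeness, here is a small sketch that applies the padding to a whole list of integer codes. The function name build_site_url and the literal "parameter" are placeholders introduced for illustration; the base URL is left elided exactly as in the question.

```python
def build_site_url(base, parameter, site_code, width=4):
    """Build a URL whose integer site code is zero-padded to `width` digits."""
    # The nested {width} sets the field width, so 16 becomes "0016" for width=4.
    return f"{base}{parameter}site={site_code:0{width}d}"

# 0016 cannot be written as an integer literal, so it is stored as 16.
site_codes = [1302, 9033, 1103, 5005, 16]

urls = [build_site_url("https://.../", "parameter", code) for code in site_codes]
```

The padded value only exists in the resulting string; the list itself still holds plain integers, which is fine because a URL is text anyway.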

How to change a string list to a list of integers?

I have a list that combines inputs from two sources and ends up looking like 'pde_fin' below. I need to extract the integer values of the elements in the list for further processing. However, the second set of numbers seems to give an error ("invalid literal for int() with base 10: "'1118'").
pde_fin =['2174', '2053', '2080', '2160', '2065', "'1118'", "'1098'", "'2052'", "'2160'", "'2078'", "'2161'", "'2134'", "'2091'", "'2089'", "'2105'", "'2109'", "'2077'", "'2057'"]
for i in pde_fin:
    print(int(i))
The simplest fix is to strip the single quotes:
for i in pde_fin:
    print(int(i.strip("'")))
Use the following code to correct your values:
pde_fin =['2174', '2053', '2080', '2160', '2065', "'1118'", "'1098'", "'2052'", "'2160'", "'2078'", "'2161'", "'2134'", "'2091'", "'2089'", "'2105'", "'2109'", "'2077'", "'2057'"]
for i in pde_fin:
    print(int(i.replace("'",'')))
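If the goal is a new list of integers rather than printed output, the same stripping step fits in a list comprehension. This is a minimal sketch using a shortened copy of the list; the name pde_int is an assumption.

```python
# Shortened copy of the list from the question: some entries carry embedded quotes.
pde_fin = ['2174', '2053', "'1118'", "'1098'"]

# strip("'") removes single quotes from both ends before int() parses the digits.
pde_int = [int(s.strip("'")) for s in pde_fin]
```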

Using an if statement to pass through variables to further functions for python

I am a biologist that is just trying to use python to automate a ton of calculations, so I have very little experience.
I have a very large array that contains values that are formatted into two columns of observations. Sometimes the observations will be the same between the columns:
v1,v2
x,y
a,b
a,a
x,x
In order to save time and effort I wanted to make an if statement that just prints 0 if the two columns are the same and then moves on. If the values are the same there is no need to run those instances through the downstream analyses.
This is what I have so far just to test out the if statement. It has yet to recognize any instances where the columns are equivalent.
Script:
mylines=[]
with open('xxxx','r') as myfile:
    for myline in myfile:
        mylines.append(myline) ##reads the data into the two column format mentioned above
rang=len(open ('xxxxx','r').readlines( )) ##returns the number of lines in the file
for x in range(1, rang):
    li = mylines[x] ##selected row as defined by x and the number of lines in the file
    spit = li.split(',',2) ##splits the selected values so they can be accessed separately
    print(spit[0]) ##first value
    print(spit[1]) ##second value
    if spit[0] == spit[1]:
        print(0)
    else:
        print('Issue')
Output:
192Alhe52
192Alhe52
Issue ##should be 0
188Alhe48
192Alhe52
Issue
191Alhe51
192Alhe52
Issue
How do I get python to recognize that certain observations are actually equal?
When you read the lines and store them in the list, you are also storing '\n', the newline character, so your list actually looks like this
print(mylines)
['x,y\n', 'a,b\n', 'a,a\n', 'x,x\n']
To work around this issue, use strip(), which removes this character along with any leading or trailing whitespace that would also affect the comparison
mylines.append(myline.strip())
You shouldn't use rang=len(open('xxxxx','r').readlines()), because it reads the file a second time; the lines are already in mylines
rang=len(mylines)
There is a more readable, pythonic way to replicate your for loop
for li in mylines[1:]:
    spit = li.split(',')
    if spit[0] == spit[1]:
        print(0)
    else:
        print('Issue')
Or even
for spit in (li.split(',') for li in mylines[1:]):
    if spit[0] == spit[1]:
        print(0)
    else:
        print('Issue')
Both versions iterate over mylines starting from the second element, which skips the header row.
Also, if you're interested in python packages, you should have a look at pandas. Assuming you have a csv file:
import pandas as pd
df = pd.read_csv('xxxx')
for i, elements in df.iterrows():
    if elements['v1'] == elements['v2']:
        print('Equal')
    else:
        print('Different')
will do the trick. If you need to modify values and write another file
df.to_csv('nameYouWant')
For one, your issue with the equals test might be because iterating over lines like this also yields the newline character. There is a string function that can get rid of that, .strip(). Also, your argument to split is 2, which splits your row into three groups - but that probably doesn't show here. You can avoid having to parse it yourself when using the csv module, as your file presumably is that:
import csv
with open("yourfile.txt") as file:
    reader = csv.reader(file)
    next(reader) # skip header
    for first, second in reader:
        print(first)
        print(second)
        if first == second:
            print(0)
        else:
            print("Issue")
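To see why the original comparison fails, the sketch below runs both approaches on an in-memory copy of the sample data. io.StringIO stands in for the real file here, which is an assumption made so the snippet is self-contained.

```python
import csv
import io

# In-memory stand-in for the two-column file from the question.
sample = "v1,v2\nx,y\na,b\na,a\nx,x\n"

# Reading lines directly keeps the trailing newline on each line,
# so the second field of the row "a,a" is 'a\n' and never equals 'a'.
raw_lines = io.StringIO(sample).readlines()
unstripped = raw_lines[3].split(',')

# csv.reader handles line endings itself, so the fields compare cleanly.
reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row
results = [0 if first == second else 'Issue' for first, second in reader]
```

Here unstripped still carries the newline in its second element, while results flags only the rows whose two columns genuinely differ.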

adding string objects which are numbers in dictionary

for line in open('transactions.dat','r'):
    item=line.rstrip('\n')
    item=item.split(',')
    custid=item[2]
    amt=item[4]
    if custid in cust1:
        a=cust1[custid]
        b=amt
        c=(a)+(b)
        print(cust1[custid]+" : "+a+" :"+b+":"+c)
        break
    else:
        cust1[custid]=amt
Output:
85.91 : 85.91 :85.91:85.9185.91
The above is my code. When I read from the file, I want to add up the amounts for customers with the same id.
Also, a customer id should not be repeated in my dictionary.
I am trying to compute the total as c, but I get the two strings concatenated instead of the two numbers added, as you can see in the last part of my output. How do I add the values?
Sample transaction data:
109400182,2016-09-10,119257029,1094,40.29
109400183,2016-09-10,119257029,1094,9.99
377700146,2016-09-10,119257029,3777,49.37
276900142,2016-09-10,135127654,2769,23.31
276900143,2016-09-10,135127654,2769,25.58
You are reading strings, not floats, from the file. Use amt=float(item[4]) to convert strings representing numbers to floats, and then print(str(cust1[custid])+" : "+str(a)+" :"+str(b)+":"+str(c)) to print them.
Your code may need a lot of refactoring, but in a nutshell, if I understand what you are trying to do, you could do
c = float(a) + float(b)
and that should work.
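One possible way to accumulate a total per customer is sketched below, using the sample transactions from the question held in a string instead of transactions.dat. The use of collections.defaultdict and the name totals are choices made for this illustration, not part of the original code.

```python
from collections import defaultdict

# Sample transaction data from the question, held in memory for the example.
sample = """109400182,2016-09-10,119257029,1094,40.29
109400183,2016-09-10,119257029,1094,9.99
377700146,2016-09-10,119257029,3777,49.37
276900142,2016-09-10,135127654,2769,23.31
276900143,2016-09-10,135127654,2769,25.58"""

# defaultdict(float) starts every new customer id at 0.0,
# so each id appears exactly once as a key.
totals = defaultdict(float)

for line in sample.splitlines():
    fields = line.split(',')
    custid = fields[2]
    amt = float(fields[4])  # convert before adding, otherwise + concatenates strings
    totals[custid] += amt
```

Because the conversion to float happens as each line is read, the dictionary only ever stores numbers, and no break is needed inside the loop.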
