I'm new to Python and I don't know why I sometimes get this error.
This is the code:
import random
sorteio = []
urna = open("urna.txt")
y = 1
while y <= 50:
    sort = int(random.random() * 392)
    print sort
    while sort > 0:
        x = urna.readline()
        sort = sort - 1
    print x
    sorteio = sorteio + [int(x)]
    y = y + 1
print sorteio
Where urna.txt is a file in this format:
1156
459
277
166
638
885
482
879
33
559
I'll be grateful if anyone knows why this error appears and how to fix it.
Upon attempting to read past the end of the file, you're getting an empty string '' which cannot be converted to an int.
>>> int('')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: ''
To satisfy the requirement of selecting 50 random lines from the text file, if I understand your problem correctly:
import random
with open("urna.txt") as urna:
    sorteio = [int(line) for line in urna] # all lines of the file as ints
selection = random.sample(sorteio, 50)
print selection
.readline() returns an empty string when you come to the end of the file, and that is not a valid number.
Test for it:
if x.strip(): # not empty apart from whitespace
    sorteio = sorteio + [int(x)]
You appear to be appending to a list; lists have a method for that:
sorteio.append(int(x))
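Putting the two suggestions together, a minimal sketch might look like this. The helper name read_ints and the in-memory io.StringIO stand-in for urna.txt are assumptions made for illustration, not part of the original code.

```python
import io

def read_ints(handle, count):
    """Read up to `count` integers, one per line, stopping cleanly at end of file."""
    values = []
    while len(values) < count:
        line = handle.readline()
        if line == "":          # readline() returns '' only at end of file
            break
        if line.strip():        # skip lines that are blank apart from whitespace
            values.append(int(line))
    return values

# io.StringIO stands in for open("urna.txt") so the sketch is self-contained
urna = io.StringIO(u"1156\n459\n\n277\n")
numbers = read_ints(urna, 50)
print(numbers)
```

Checking for the empty string separately from blank lines lets the loop stop at end of file instead of spinning forever when fewer than 50 values are available.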
If you want to get a random sample from your file, there are better methods. One is to read all values, then use random.sample(), or you can pick values as you read the file line by line all the while adjusting the likelihood the next line is part of the sample. See a previous answer of mine for a more in-depth discussion on that subject.
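The second approach, picking values while reading line by line, is usually called reservoir sampling. The following is a sketch of one common variant, not necessarily the method from the linked answer; the function name reservoir_sample and the generated stand-in data are assumptions.

```python
import io
import random

def reservoir_sample(lines, k, rng=random):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    sample = []
    for seen, line in enumerate(lines, start=1):
        if len(sample) < k:
            # Fill the reservoir with the first k items
            sample.append(line)
        else:
            # Keep the new item with probability k / seen
            j = rng.randrange(seen)
            if j < k:
                sample[j] = line
    return sample

# Stand-in for urna.txt: 392 numbered lines held in memory
urna = io.StringIO(u"\n".join(str(n) for n in range(1000, 1392)))
selection = [int(line) for line in reservoir_sample(urna, 50)]
print(selection)
```

Only k lines are held in memory at any time, which matters when the file is too large to load whole; for a 392-line file, random.sample() on the full list is simpler.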
I have Python (2.7) code which takes a float, formats it with thousand-separating commas and 3 decimal places, and adds the string literal " sec" afterwards.
The result is then formatted further by being aligned left and given a width of 20:
num = '{:,.3f} sec'.format(1200300.4443333333)
print '{:<20}'.format(num) + 'more'
Output:
1,200,300.444 sec   more
I wanted to condense this into a single format call, but I couldn't figure out how to use the width properly with the string literal.
I tried the following:
num = '{:,.3f}'.format(1200300.4443333333)
print '{:<20} sec'.format(num) + 'more'
But the output isn't the same:
1,200,300.444        secmore
I also tried the following:
num = '{:,.3f}'.format(1200300.4443333333)
print '{:<20 sec}'.format(num) + 'more'
But that failed:
Traceback (most recent call last):
File "test.py", line 8, in <module>
print '{:<20 sec}'.format(num) + 'more'
ValueError: Invalid conversion specification
Is there any way to condense the initial code into a single format call?
I'm not sure this is exactly what you're after, but the following code does it in a single format call:
num = '{:<20,.3f} sec more'.format(1200300.4443333333)
print(num)
# 1,200,300.444        sec more
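If the goal is to reproduce the question's original output exactly, with the padding applied after " sec" rather than after the number, one option is to keep a single format call and pad its result with str.ljust(). This is a sketch of an alternative; the variable names are illustrative.

```python
value = 1200300.4443333333

# Format the number and the literal in one call, then pad the whole string to 20 characters
line = '{:,.3f} sec'.format(value).ljust(20) + 'more'
print(line)

# The question's original two-step version, for comparison
two_step = '{:<20}'.format('{:,.3f} sec'.format(value)) + 'more'
```

A width in a format spec applies only to the replacement field it belongs to, which is why literal text such as " sec" cannot be included in the padded region from inside the braces.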
I am having the hardest time figuring out why the scientific notation string I am passing into the float() function will not work:
time = []
WatBalR = []
Area = np.empty([1,len(time)])
Volume = np.empty([1,len(time)])
searchfile = open("C:\GradSchool\Research\Caselton\Hydrus2d3d\H3D2_profile1v3\Balance.out", "r")
for line in searchfile:
    if "Time" in line:
        time.append(re.sub("[^0-9.]", "", line))
    elif "WatBalR" in line:
        WatBalR.append(re.sub("[^0-9.]", "", line))
    elif "Area" in line:
        Area0 = re.sub("[^0-9.\+]", "", line)
        print repr(Area0[:-10])
        Area0 = float(Area0[:-10].replace("'", ""))
        Area = numpy.append(Area, Area0)
    elif "Volume" in line:
        Volume0 = re.sub("[^0-9.\+]", "", line)
        Volume0 = float(Volume0[:-10].replace("'", ""))
        Volume = numpy.append(Volume, Volume0)
searchfile.close()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-80-341de12bbc94> in <module>()
13 Area0 = re.sub("[^0-9.\+]", "", line)
14 print repr(Area0[:-10])
---> 15 Area0 = float(Area0[:-10].replace("'", ""))
16 Area = numpy.append(Area, Area0)
17 elif "Volume" in line:
ValueError: invalid literal for float(): 0.55077+03
However, the following works:
float(0.55077+03)
3.55077
If I put quotes around the argument, the same invalid literal error comes up, but I have tried to remove the quotes from the string and cannot seem to do so.
0.55077+03 is 0.55077 added to 03. You need an e for scientific notation:
0.55077e+03
float(0.55077+03) adds 3 to .55077 and then converts it to a float (which it already is).
Note that this also only works on python2.x. On python3.x, 03 is an invalid token -- the correct way to write it there is 0o3...
float('0.55077+03') doesn't work (and raises the error that you're seeing) because that isn't a valid notation for a python float. You need: float('0.55077e03') if you're going for a sort of scientific notation. If you actually want to evaluate the expression, then things become a little bit trickier . . .
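One way to get from '0.55077+03' to something float() accepts is to insert the missing e before the signed exponent. The sketch below assumes every such value has the form mantissa followed directly by a signed exponent; the helper name parse_compact_float and the regular expression are illustrative, not taken from the question.

```python
import re

# Matches a mantissa followed directly by a signed exponent, e.g. "0.55077+03" or "-1.2-05"
_MISSING_E = re.compile(r'^([+-]?\d*\.?\d+)([+-]\d+)$')

def parse_compact_float(text):
    """Convert text to float, inserting 'e' when the exponent marker was left out."""
    text = text.strip()
    match = _MISSING_E.match(text)
    if match:
        text = match.group(1) + 'e' + match.group(2)
    return float(text)

area = parse_compact_float('0.55077+03')
print(area)
```

Strings that already contain an e, or that have no exponent at all, do not match the pattern and are passed to float() unchanged.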
So I am having a problem extracting text from a large (>GB) text file. The file is structured as follows:
>header1
hereComesTextWithNewlineAtPosition_80
hereComesTextWithNewlineAtPosition_80
hereComesTextWithNewlineAtPosition_80
andEnds
>header2
hereComesTextWithNewlineAtPosition_80
hereComesTextWithNewlineAtPosAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAlineAtPosition_80
MaybeAnotherTargetBBBBBBBBBBBrestText
andEndsSomewhereHere
Now I have the information that in the entry with header2 I need to extract the text from position X to position Y (the A's in this example), starting with 1 as the first letter in the line below the header.
BUT: the positions do not account for newline characters. So basically when it says from 1 to 95 it really means just the letters from 1 to 80 and the following 15 of the next line.
My first solution was to use file.read(X-1) to skip the unwanted part in front and then file.read(Y-X) to get the part I want, but when that stretches over newline(s) I get too few characters extracted.
Is there a way to solve this with a Python function other than read()? I thought about just replacing all newlines with empty strings, but the file may be quite large (millions of lines).
I also tried to account for the newlines by adding extractLength // 80 to the length, but this is problematic in cases like the example: when the 95 characters are split 2-80-13 over 3 lines, I actually need 2 additional positions, but 95 // 80 is 1.
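The newline count depends on where the span starts, not only on its length, so it helps to convert each sequence position to a file offset separately. The sketch below assumes every full line holds exactly 80 letters, newlines are a single "\n" byte, the file is opened in binary mode, and the offset of the first letter after the header (body_start) is already known. The names file_offset and read_span are hypothetical.

```python
import io

def file_offset(seq_pos, line_width=80, newline_len=1):
    """Map a 0-based sequence position to an offset from the start of the sequence body."""
    return seq_pos + (seq_pos // line_width) * newline_len

def read_span(handle, body_start, first, last, line_width=80):
    """Return sequence characters first..last (1-based, inclusive) without newlines."""
    begin = body_start + file_offset(first - 1, line_width)
    end = body_start + file_offset(last - 1, line_width) + 1
    handle.seek(begin)
    return handle.read(end - begin).replace(b"\n", b"")

# Small in-memory stand-in for the large file: one header, 200 letters wrapped at 80
sequence = b"ABCDEFGHIJ" * 20
wrapped = b"\n".join(sequence[i:i + 80] for i in range(0, len(sequence), 80))
data = io.BytesIO(b">header2\n" + wrapped + b"\n")

span = read_span(data, len(b">header2\n"), 79, 173)  # 2 + 80 + 13 letters over 3 lines
```

Because the start and end offsets are computed independently, the two newlines inside this 95-letter span are accounted for without scanning the lines in between.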
UPDATE:
I modified my code to use Biopython:
for s in SeqIO.parse(sys.argv[2], "fasta"):
    #foundClusters stores the information for substrings I want extracted
    currentCluster = foundClusters.get(s.id)
    if(currentCluster is not None):
        for i in range(len(currentCluster)):
            outputFile.write(">"+s.id+"|cluster"+str(i)+"\n")
            flanking = 25
            start = currentCluster[i][0]
            end = currentCluster[i][1]
            left = currentCluster[i][2]
            if(start - flanking < 0):
                start = 0
            else:
                start = start - flanking
            if(end + flanking > end + left):
                end = end + left
            else:
                end = end + flanking
            #for debugging only
            print(currentCluster)
            print(start)
            print(end)
            outputFile.write(s.seq[start, end+1])
But I get the following error:
[[1, 55, 2782]]
0
80
Traceback (most recent call last):
File "findClaClusters.py", line 92, in <module>
outputFile.write(s.seq[start, end+1])
File "/usr/local/lib/python3.4/dist-packages/Bio/Seq.py", line 236, in __getitem__
return Seq(self._data[index], self.alphabet)
TypeError: string indices must be integers
UPDATE2:
Changed outputFile.write(s.seq[start, end+1]) to:
outRecord = SeqRecord(s.seq[start: end+1], id=s.id+"|cluster"+str(i), description="Repeat-Cluster")
SeqIO.write(outRecord, outputFile, "fasta")
and it's working :)
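The TypeError came from the comma: s.seq[start, end+1] passes a tuple as the index, while s.seq[start: end+1] passes a slice. The sketch below shows the difference on a plain string standing in for s.seq, and condenses the flanking logic with max() and min(). The function name flanked_window is hypothetical.

```python
def flanked_window(start, end, left, flanking=25):
    """Widen [start, end] by up to `flanking` positions without leaving the sequence."""
    new_start = max(0, start - flanking)
    new_end = end + min(flanking, left)
    return new_start, new_end

sequence = "ACGT" * 30            # stand-in for s.seq
start, end = flanked_window(1, 55, 2782)

try:
    sequence[start, end + 1]      # a comma builds a tuple index, which strings reject
except TypeError as error:
    print("TypeError:", error)

fragment = sequence[start:end + 1]  # a colon builds a slice
```

With the cluster [1, 55, 2782] from the debug output, this yields start 0 and end 80, matching the values printed before the traceback.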
With Biopython:
from Bio import SeqIO
X = 66
Y = 130
for s in SeqIO.parse("test.fst", "fasta"):
    if "header2" == s.id:
        print s.seq[X: Y+1]
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Biopython lets you parse a FASTA file and access its id, description and sequence easily. You then have a Seq object that you can manipulate conveniently without recoding everything (reverse complement and so on).
My code for the sorting of the file.
g = open('Lapse File.txt', 'r')
column = []
i = 1
next(g)
for line in g:
    column.append(int(line.split('\t')[2]))
column.sort()
This is the error I get.
Traceback (most recent call last):
File "E:/Owles/new lapse .py", line 51, in <module>
column.append(int(line.split('\t')[2]))
ValueError: invalid literal for int() with base 10: '-8.3\n'
My main question is why is there a \n. Earlier in the code I had written to another text file and wrote it by column from a previously read in file.
This is my code for writing the file
for columns in (raw.strip().split() for raw in Sounding):
    if (i >2 and i <=33):
        G.write(columns [3]+'\t'+columns[2]+'\t'+columns[4]+'\n')
        i = i + 1
    elif (i >= 34):
        G.write(columns [0]+'\t'+columns[1]+'\t'+columns[2]+'\n')
        i = i + 1
    else:
        i = i + 1
I am unsure if writing the lines like that is the issue because I have inserted the new line function.
The traceback is telling you exactly what happened:
ValueError: invalid literal for int() with base 10: '-8.3\n'
The problem here is that, while int() can handle the negative sign and the trailing newline character, it can't handle the decimal point, '.'. As you know, -8.3 may be a real, rational number, but it's not an integer. If you want to preserve the fractional value to end up with -8.3, use float() instead of int(). If you want to discard the fractional value to end up with -8, use float() to parse the string and then use int() on the result.
-8.3:
column.append(float(line.split('\t')[2]))
-8:
column.append(int(float(line.split('\t')[2])))
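A short check of these claims, using the value from the traceback (the variable names are illustrative):

```python
field = '-8.3\n'                     # the value from the traceback

print(int('-8\n'))                   # sign and trailing newline are accepted
as_float = float(field)              # keeps the fraction
as_int = int(float(field))           # truncates toward zero

try:
    int(field)                       # the decimal point is what int() rejects
except ValueError as error:
    print(error)
```

Note that int() on a float truncates toward zero rather than rounding, so -8.3 becomes -8 and -8.9 would also become -8.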
Because only strings that represent whole numbers can be cast to integers; look at this:
numeric_string = "109"
not_numeric_string = "f9"
This is okay:
>>> int(numeric_string)
109
And it cannot be cast:
>>> int(not_numeric_string)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'f9'
So somewhere in your script it is getting a non-numeric string.
It seems as though the "-8.3\n" string raised the error. The trailing newline is not the cause, since int() ignores surrounding whitespace; the decimal point is what makes the string a non-integer.
I've spent the last 2 hours trying to find a solution for this and came up with nothing. So either this is not possible or it's so basic that no one writes about it. Basically I have 2 strings that both represent numbers, but when I go to add them together I get a concatenation instead of a sum. Here is my code (Python):
currentNukeScriptName = nuke.root().name()
splitUpScriptName1 = currentNukeScriptName.split('/')
splitUpScriptName2 = splitUpScriptName1[-1]
splitScriptNameAndExtention = splitUpScriptName2.split('.')
currentNukeScriptName = splitScriptNameAndExtention[0]
splitUpCurrentScriptName = currentNukeScriptName.split('_')
currentVersionNumber = splitUpCurrentScriptName[-1]
decimalVersionNumber = "1" + "," + str(currentVersionNumber)
addingNumber = 1
newVersionNumber = str(decimalVersionNumber) + str(addingNumber)
print newVersionNumber
decimalVersionNumber = 1,019
If I change the newVersionNumber code to:
newVersionNumber = int(decimalVersionNumber) + int(addingNumber)
I get:
# Result: Traceback (most recent call last):
File "<string>", line 10, in <module>
ValueError: invalid literal for int() with base 10: '1,019'
I am unsure what to do.. Is this not possible? Or am I doing something totally wrong?
Edit:
So the problem was found in the decimalVersionNumber where I was adding a comma. What would be the best way of keeping the comma and still adding the numbers together?
ValueError: invalid literal for int() with base 10: '1,019'
Sounds like it doesn't like the comma - try removing it first.
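To keep the comma in the displayed result while still doing integer arithmetic, one option is to strip the separator before converting and add it back when formatting. This is a minimal sketch that reuses the variable names from the question and assumes the comma is only ever a thousands separator.

```python
decimalVersionNumber = '1,019'
addingNumber = 1

# Drop the thousands separator before converting, then add as integers
total = int(decimalVersionNumber.replace(',', '')) + addingNumber

# The ',' format option puts the separator back for display
newVersionNumber = '{:,}'.format(total)
print(newVersionNumber)
```

Keeping the value as an int and formatting it only when it is printed avoids having to strip the comma again on the next increment.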
You need to use
int(decimalVersionNumber.replace(',', '')) + int(addingNumber)
This parses the string representations of the numbers into integers, after dropping the comma that int() rejects, so they can be added.
eg:
String concatenation:
"10" + "20" = "1020"
Integer addition, parsed from strings:
int("10") + int("20") = 30