ValueError in Python 3 code - python

I have this code that should let me count the missing rows of numbers in a CSV file, in a Python 3.6 script. However, I get the following error:
Error:
Traceback (most recent call last):
File "C:\Users\GapReport.py", line 14, in <module>
EndDoc_Padded, EndDoc_Padded = (int(s.strip()[2:]) for s in line)
File "C:\Users\GapReport.py", line 14, in <genexpr>
EndDoc_Padded, EndDoc_Padded = (int(s.strip()[2:]) for s in line)
ValueError: invalid literal for int() with base 10: 'AC-SEC 000000001'
Code:
import csv
def out(*args):
    print('{},{}'.format(*(str(i).rjust(4, "0") for i in args)))
prev = 0
data = csv.reader(open('Padded Numbers_export.csv'))
print(*next(data), sep=', ') # header
for line in data:
    EndDoc_Padded, EndDoc_Padded = (int(s.strip()[2:]) for s in line)
    if start != prev+1:
        out(prev+1, start-1)
    prev = end
    out(start, end)
I'm stumped on how to fix this. Also, I think the CSV has many lines in it, so if there's a section that limits it to a few numbers, please let me know.
CSV Snippet (Sorry if I wasn't clear before!):

The values you have in your CSV file are not numeric.
For example, FMAC-SEC 000000001 is not a number. int(s.strip()[2:]) only drops the first two characters, leaving 'AC-SEC 000000001', which int() cannot convert.
Some more comments on the code:
What is the purpose of EndDoc_Padded, EndDoc_Padded = (...)? You are assigning values to two variables with the same name, so the second assignment simply overwrites the first. Either name one of them something else, or just have one variable there.
Are you trying to get two different values from each row? Note that csv.reader already splits each row on commas, so line is a list of the column values; calling line.split(',') would fail, because lists have no split method. If your file uses a different separator, pass it to the reader instead, e.g. csv.reader(f, delimiter=';').
You are running this inside a loop, so on each iteration the two variables are overwritten with the values from the current line; after the loop they only hold the last line's values. If you're trying to collect two lists of all the values, this won't work.
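A minimal sketch of the gap report, assuming (as the traceback suggests) that each field looks like FMAC-SEC 000000001, i.e. a text prefix, a space, then the number, and that the first two columns of each row hold the start and end of a range. The helper names, the column headers and the sample rows are hypothetical; in-memory data stands in for 'Padded Numbers_export.csv'.

```python
import csv
import io


def parse_number(field):
    """Return the numeric part of a value such as 'FMAC-SEC 000000001'.

    Assumption: the number is the last whitespace-separated token.
    """
    return int(field.strip().rsplit(maxsplit=1)[-1])


def find_gaps(rows):
    """Yield (first_missing, last_missing) for every gap between ranges.

    Each row is a list of fields, as produced by csv.reader; only the
    first two fields (range start and range end) are used.
    """
    prev_end = 0
    for row in rows:
        start, end = (parse_number(field) for field in row[:2])
        if start > prev_end + 1:  # the numbers between the two ranges are missing
            yield prev_end + 1, start - 1
        prev_end = end


# Hypothetical sample data standing in for the real CSV file.
sample = (
    "BegDoc,EndDoc\n"
    "FMAC-SEC 000000001,FMAC-SEC 000000005\n"
    "FMAC-SEC 000000009,FMAC-SEC 000000012\n"
)
reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row
gaps = list(find_gaps(reader))
for first, last in gaps:
    # Zero-padded output, mirroring out() in the question; prints 0006,0008
    print("{},{}".format(str(first).rjust(4, "0"), str(last).rjust(4, "0")))
```

For the real file, open it with open('Padded Numbers_export.csv', newline='') and pass that file object to csv.reader. The reader yields one row at a time, so nothing here limits how many lines the file can have.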

Related

Finding SUM, AVG, and DIFF of columns using Python with a CSV

Working with a CSV on Python 3.8 that goes something like:
Column_0       Column_1      Column_2      Column_3       Column_4
Some_Numbers0  Some_String1  Some_String2  Some_Numbers3  Some_Numbers4
I need the SUM and AVG of the numbers in Column_3 and Column_4, and the difference between their totals.
I'm currently stuck on trying to get both sums to print. This is how far I've got:
import csv
import decimal
with open("sample.csv") as myFile:
    reader = csv.DictReader(myFile)
    print(sum(float(line["Column_3"]) for line in reader))
    print(sum(float(line["Column_4"]) for line in reader))
Using this, Column_3's total prints, but for Column_4 I get 0. If I remove the print line for Column_3, I get Column_4's total just fine. I've also tried:
import csv
import decimal
with open("sample.csv") as myFile:
    total = 0
    for line in csv.DictReader(myFile):
        total += int(line["Column_3"])
    print(total)
but I get:
Traceback (most recent call last):
File "some file pathway", line 7, in <module>
total += int(line["Column_3"])
ValueError: invalid literal for int() with base 10: '1345.67'
That number is the first value in Column_3.
I'm stumped. Any help is appreciated. I'm sure I'll be back with questions on finding the AVG and then using the totals to find their difference; all of these need to print from the same program, but I'm already stuck here.
Your reader object can only go through the CSV file once. The first sum consumes every row while reading Column_3, so when you sum Column_4 there is nothing left to read and the result is 0. Your second approach would be fine; just replace int() with float(), because you're working with decimals.
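A minimal sketch of a single-pass version that accumulates both columns in one loop and uses float() as suggested, then derives the averages and the difference of the totals. The column names come from the question; the helper name, the sample rows, and returning an average of 0.0 for an empty file are assumptions.

```python
import csv
import io


def column_stats(csv_file, columns=("Column_3", "Column_4")):
    """Read csv_file once and return {column: (total, average)}.

    Values are parsed with float() because they contain decimals.
    """
    totals = {name: 0.0 for name in columns}
    count = 0
    for row in csv.DictReader(csv_file):  # a single pass over the rows
        for name in columns:
            totals[name] += float(row[name])
        count += 1
    return {
        name: (totals[name], totals[name] / count if count else 0.0)
        for name in columns
    }


# Hypothetical stand-in for sample.csv.
sample = (
    "Column_0,Column_1,Column_2,Column_3,Column_4\n"
    "1,a,b,1345.5,10\n"
    "2,c,d,4.5,20\n"
)
stats = column_stats(io.StringIO(sample))
total_3, avg_3 = stats["Column_3"]
total_4, avg_4 = stats["Column_4"]
print("Column_3 sum:", total_3, "avg:", avg_3)
print("Column_4 sum:", total_4, "avg:", avg_4)
print("Difference of totals:", total_3 - total_4)
```

With the real file, call column_stats on the object returned by open("sample.csv", newline=""). If you prefer to keep two separate sum() calls, you can instead rewind with myFile.seek(0) and create a new DictReader before the second sum.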

Name not defined error python when reading file line by line

So, I am very new to python and I am not sure if my code is the most effective, but would still be very appreciative if someone could explain to me why my script is returning the "name not defined" error when I run it. I have a list of 300 gene names in a separate file, one name per line, that I want to read, and store each line as a string variable.
Within the script I have 600 variables: 300 labeled name_bitscore and 300 labeled name_length, one pair for each of the 300 names.
I want to filter through the list based on a condition. My script looks like this:
#!/usr/bin/python
with open("seqnames-test1-iso-legal-temp.txt") as f:
    for line in f:
        exec("b="+line+"_bitscore")
        exec("l="+line+"_length")
        if 0.5*b <= 2*1.05*l and 0.5*b >= 2*0.95*l:
            print line
ham_pb_length=2973
ham_pb_bitscore=2165
g2225_ph_length=3303
cg2225_ph_bitscore=2278
etc. for the length and bitscore variables.
Essentially, what I am trying to do here is read line 1 of the file "seqnames-test1-iso-legal-temp.txt", which is ham_pb. Then I wanted to use the exec function to create the variables b=ham_pb_bitscore and l=ham_pb_length, so that I could test whether half the gene's bitscore is within 5% of double its length. Then, repeat this for every gene, i.e. every line of the file "seqnames-test1-iso-legal-temp.txt".
When I execute the script, I get the error message:
Traceback (most recent call last):
File "duplicatebittest.py", line 4, in <module>
exec("b="+line+"_bitscore")
File "<string>", line 1, in <module>
NameError: name 'ham_pb' is not defined
I made another short script to make sure I was using the exec function correctly that looks like this:
#!/usr/bin/python
name="string"
string_value=4
exec("b="+name+"_value")
print(name)
print(b)
And this returns:
string
4
So, I know that I can use exec to include a string variable in a variable declaration because b returns 4 as expected. So, I am not sure why I get an error in my first script.
I tested to make sure the variable line was a string by entering
#!/usr/bin/python
with open("seqnames-test1-iso-legal-temp.txt") as f:
    for line in f:
        print type(line)
And it returned the line
<type 'str'>
300 times, so I know each variable line is a string, which is why I don't understand why my test script worked, but this one did not.
Any help would be super appreciated!
line is yielded by the file iterator, and each line it yields keeps its trailing newline character.
So your expression:
exec("b="+line+"_bitscore")
is passed to exec as:
b=ham_pb
_bitscore
Strip the trailing newline and it will work:
exec("b="+line.rstrip()+"_bitscore")
provided that you move the following assignments before the loop, so the variables are already defined when exec runs:
ham_pb_length=2973
ham_pb_bitscore=2165
g2225_ph_length=3303
cg2225_ph_bitscore=2278
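To see the newline described above, print the repr() of the string that would be handed to exec. This small sketch hard-codes one line as the file iterator would yield it (the name ham_pb is from the question); nothing is actually executed.

```python
# One line as the file iterator yields it, trailing newline included.
line = "ham_pb\n"

broken = "b=" + line + "_bitscore"
fixed = "b=" + line.rstrip() + "_bitscore"

print(repr(broken))  # 'b=ham_pb\n_bitscore': the newline splits it into two statements
print(repr(fixed))   # 'b=ham_pb_bitscore'
```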
Better: quit using exec and use dictionaries to avoid defining variables dynamically.
Also, put #!/usr/bin/env python as the first line. See this question for more explanation.
As Jean pointed out, exec is not the right tool for this job. You should be using dictionaries: they are less dangerous (search for "code injection") and easier to read. Here's an example of how to use dictionaries, taken from the Python documentation:
>>> tel = {'jack': 4098, 'sape': 4139}
>>> tel['guido'] = 4127
>>> tel
{'jack': 4098, 'sape': 4139, 'guido': 4127}
>>> tel['jack']
4098
>>> del tel['sape']
>>> tel['irv'] = 4127
>>> tel
{'jack': 4098, 'guido': 4127, 'irv': 4127}
>>> list(tel.keys())
['jack', 'guido', 'irv']
>>> sorted(tel.keys())
['guido', 'irv', 'jack']
>>> 'guido' in tel
True
>>> 'jack' not in tel
False
Here's a way I can think of to accomplish your goal:
with open("seqnames-test1-iso-legal-temp.txt") as f:
    gene_data = {'ham_pb_length': 2973, 'ham_pb_bitscore': 2165,
                 'g2225_ph_length': 3303, 'cg2225_ph_bitscore': 2278}
    # Maybe you have more of these gene data entries. If so,
    # just add them to the dictionary literal above.
    for line in f:
        if not line.isspace():
            name = line.rstrip()  # drop the trailing newline
            bitscore = gene_data[name + '_bitscore']
            length = gene_data[name + '_length']
            if 0.95*length <= bitscore/4 <= 1.05*length:
                print(name)
I take advantage of a few useful Python features here. In Python 3, 5/7 evaluates to 0.7142857142857143, not 0 as in many programming languages (and in Python 2). If you want integer division in Python 3, use 5//7. Additionally, Python chains comparisons: 1<2<3 means 1<2 and 2<3, so it evaluates to True, and 1<3<2 evaluates to False. In many other languages, 1<2<3 is parsed as (1<2)<3, i.e. True<3, which might give an error or evaluate to True depending on the language.
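Both features can be seen together in a small helper that writes the question's condition as one chained comparison: half the bitscore within 5% of double the length is the same as bitscore/4 within 5% of the length. The helper name and the margin parameter are hypothetical; the ham_pb numbers come from the question. This assumes Python 3, where / is true division.

```python
def within_margin(bitscore, length, margin=0.05):
    """True if half the bitscore is within +/- margin of twice the length.

    0.5*b between 2*(1-m)*l and 2*(1+m)*l is equivalent to
    b/4 between (1-m)*l and (1+m)*l, written as a chained comparison.
    """
    return (1 - margin) * length <= bitscore / 4 <= (1 + margin) * length


print(5 / 7)   # 0.7142857142857143 (true division)
print(5 // 7)  # 0 (floor division)
print(within_margin(2165, 2973))  # False: 541.25 is far below 0.95 * 2973
print(within_margin(8000, 2000))  # True: 2000 is inside [1900, 2100]
```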

Convert a string field to a number field in arcpy

I have a large number (>1000) of files with fields that contain numbers but are defined as text fields. I need fields that hold these values as numbers. I can add the new fields, but I'm failing to populate them.
I'm using ArcGIS 10.1. Rows may have values ranging from 0 to 10, with up to one decimal place, or they may be empty for a variable (actually blank, no placeholder).
Below is the python script I'm using for two of the variables (N_CT and N_CFY), and the error I get. It looks like my problem is in how to transfer the text value into the Decimal conversion.
I'm new to scripting, so please excuse me if my description or word choices are unclear.
import arcpy, os, sys
from arcpy import env
from decimal import *
arcpy.env.overwriteOutput = True
env.workspace = "C:\Users\OuelletteMS\Desktop\Ice_Data_testarea"
listFCs = arcpy.ListFeatureClasses("*")
for fc in listFCs:
    print str("processing " + fc) # displays the file that is currently being handled
    strNCT = "N_CT" # the current, text version of the field
    newNCT = "NCT" # the new, number version I want to create
    strNCFY = "N_CFY" # the current, text version of the field
    newNCFY = "NCFY" # the new, number version I want to create
    arcpy.AddField_management(fc,newNCT,"DOUBLE")
    arcpy.AddField_management(fc,newNCFY,"DOUBLE")
    cursor = arcpy.UpdateCursor(fc)
    for row in cursor:
        row.setValue(newNCT, row.getValue(Decimal(strNCT)))
        row.setValue(newNCFY, row.getValue(Decimal(strNCFY)))
        cursor.updateRow(row)
Error message:
Runtime error
Traceback (most recent call last):
  File "<string>", line 23, in <module>
  File "C:\Python27\ArcGIS10.1\Lib\decimal.py", line 548, in __new__
    "Invalid literal for Decimal: %r" % value)
  File "C:\Python27\ArcGIS10.1\Lib\decimal.py", line 3844, in _raise_error
    raise error(explanation)
InvalidOperation: Invalid literal for Decimal: 'N_CT'
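The error comes from Decimal(strNCT), which tries to convert the field name 'N_CT' itself rather than the value stored in that field; read the value with row.getValue(strNCT) first and convert that. You could convert a string value to an integer or float by using: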
stringA = '12'
# Convert string to integer:
int(stringA)
# Convert string to float
float(stringA)
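Because some rows are blank, float('') would raise a ValueError. A minimal sketch of a conversion helper that returns None for blank values instead; the helper name is hypothetical, writing None is assumed to store a NULL in the new DOUBLE field, and the arcpy loop is shown only as a comment because it needs ArcGIS to run.

```python
def text_to_number(value):
    """Convert a text field value such as '7.5' to a float.

    Returns None for missing or blank values (assumed to be stored as NULL).
    """
    if value is None:
        return None
    text = str(value).strip()
    if not text:
        return None
    return float(text)


# Hypothetical use inside the question's UpdateCursor loop (requires arcpy):
#     for row in cursor:
#         row.setValue(newNCT, text_to_number(row.getValue(strNCT)))
#         row.setValue(newNCFY, text_to_number(row.getValue(strNCFY)))
#         cursor.updateRow(row)

print(text_to_number("7.5"))  # 7.5
print(text_to_number(""))     # None
```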

Rename a file using variables in the program - Python

I want to rename a file called decon.out using two variables in my program. So far I have:
import os
import re
gwf = input("Enter value: ")
myList = os.listdir('.')
for myFile in myList:
    if re.match("^HHEMQZ", myFile):
        numE = myFile
    elif re.match("^HHNMQZ", myFile):
        numN = myFile
    else:
        den = myFile
os.rename('decon.out', 'RF'+gwf+''+numE+'')
For example, gwf = 2.5 and numE = HHEMQZ20010101
I would then want decon.out to be renamed as RF2.5HHEMQZ20010101 where RF will always be the same.
Currently when I run the script I get an error:
Traceback (most recent call last):
File "RunDeconv.py", line 77, in <module>
os.rename('decon.out', 'RF'+gwf+''+numE+'')
TypeError: cannot concatenate 'str' and 'float' objects
Any suggestions?
Use raw_input() instead: in Python 2, input() evaluates what you type as Python code, so your 2.5 input becomes a float rather than a string.
About the error: in the string concatenation
'RF'+gwf+''+numE+''
all the members must be strings.
You can use
type(gwf)
type(numE)
to check which is a number.
You then just need to
str(gwf)
or
str(numE)
depending on which may be the case. Or probably both gwf and numE need the str() treatment, so your last line of code should look like this:
os.rename('decon.out', 'RF'+str(gwf)+''+str(numE)+'')
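An alternative to chaining + with str() is string formatting, which converts each value to text for you, so it works whether gwf is the float 2.5 or the string '2.5'. A minimal sketch; the helper name decon_output_name is hypothetical, and the actual os.rename call is left as a comment so the example does not touch the file system.

```python
def decon_output_name(gwf, num_e, prefix="RF"):
    """Build the new file name, e.g. RF2.5HHEMQZ20010101.

    str.format converts every argument to text, so no str() calls are needed.
    """
    return "{}{}{}".format(prefix, gwf, num_e)


print(decon_output_name(2.5, "HHEMQZ20010101"))    # RF2.5HHEMQZ20010101
print(decon_output_name("2.5", "HHEMQZ20010101"))  # RF2.5HHEMQZ20010101
# The rename itself would then be:
#     os.rename('decon.out', decon_output_name(gwf, numE))
```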

Substring in Python, what is wrong here?

I'm trying to simulate a substring in Python but I'm getting an error:
length_message = len(update)
if length_message > 140:
    length_url = len(short['url'])
    count_message = 140 - length_url
    update = update["msg"][0:count_message] # Substring update variable
    print update
    return 0
The error is the following:
Traceback (most recent call last):
File "C:\Users\anlopes\workspace\redes_sociais\src\twitterC.py", line 54, in <module>
x.updateTwitterStatus({"url": "http://xxx.com/?cat=49s", "msg": "Searching for some ....... tips?fffffffffffffffffffffffffffffdddddddddddddddddddddddddddddssssssssssssssssssssssssssssssssssssssssssssssssssseeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeedddddddddddddddddddddddddddddddddddddddddddddddfffffffffffffffffffffffffffffffffffffffffffff "})
File "C:\Users\anlopes\workspace\redes_sociais\src\twitterC.py", line 35, in updateTwitterStatus
update = update["msg"][0:count_message]
TypeError: string indices must be integers
I can't do this?
update = update["msg"][0:count_message]
The variable count_message holds 120.
Give me a clue.
Best Regards,
UPDATE
This is the call that update["msg"] comes from:
x = TwitterC()
x.updateTwitterStatus({"url": "http://xxxx.com/?cat=49", "msg": "Searching for some ...... ....?fffffffffffffffffffffffffffffdddddddddddddddddddddddddddddssssssssssssssssssssssssssssssssssssssssssssssssssseeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeedddddddddddddddddddddddddddddddddddddddddddddddfffffffffffffffffffffffffffffffffffffffffffffddddddddddddddddd"})
Are you looping through this code more than once?
If so, perhaps the first time through update is a dict, and update["msg"] returns a string. Fine.
But you set update equal to the result:
update = update["msg"][0:int(count_message)]
which is (presumably) a string.
If you are looping, the next time through the loop you will have an error because now update is a string, not a dict (and therefore update["msg"] no longer makes sense).
You can debug this by putting in a print statement before the error:
print(type(update))
or, if it is not too large,
print(repr(update))
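One way to avoid the problem is to store the trimmed text in a new variable instead of overwriting update, so the dict stays a dict however many times the code runs. A minimal sketch; the function name, the limit parameter, and subtracting the URL length from 140 (mirroring count_message in the question) are assumptions.

```python
def build_status_text(status, limit=140):
    """Return status['msg'] trimmed so that message plus URL fit in limit characters.

    status is assumed to be a dict like {"url": ..., "msg": ...}, as in the
    question's call. The dict itself is never modified.
    """
    room = max(limit - len(status["url"]), 0)
    return status["msg"][:room]  # slicing a shorter string returns it unchanged


status = {"url": "http://xxx.com/?cat=49s", "msg": "Searching for some tips? " + "f" * 200}
text = build_status_text(status)
print(len(text), type(status))  # status is still a dict afterwards
```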
