Python: ValueError: invalid literal for float()

I'm really new to Python, so my question may be very basic. For my work I'm checking different parameters over a period of time. To get started with Python I wanted to plot a simple list of daily measured temperature values for one month. Each line in the list has three fields, with the following structure:
Day -TAB- Temperature -TAB- Nr
My Code:
import pylab as pl
import numpy as np

filename = "u_netCDF_write"
file = open(filename)
NoOfValues = 31

counter = 0
data = []

for line in file:
    if counter <= NoOfValues:
        data.append(line.strip('\n').strip('\t').split(' '))
        if len(data[-1]) == 4:
            data[-1].pop(3)
        counter += 1

x = np.linspace(0, 30, 31)
data = np.transpose(data)

for i in range(len(data[2])):
    data[2][i] = float(data[2][i]) - 273.15
When I try to plot a temperature-per-day plot I get this error message:
Traceback (most recent call last):
File ".../.../unetCDFplot.py", line 43, in <module>
data[2][i] = float(data[2][i])-273.15
ValueError: invalid literal for float(): 03.07.2014
It looks like the code didn't transpose the data. Why is that? Can anybody help me?
Thanks!

I solved my problem! So for anybody who has the same problem, here is what I did: I used
print(repr(data))
(thanks to Stephany Dionysio) to check every step in my code, and I understood that the problem was not the transpose function but the empty fields in every line. After trying different methods to delete the empty fields, I saw that I couldn't remove them from the nested list created by 'data.append'. To get the values I needed I used pop() inside the append call:
data.append(line.strip('\n').strip('\t').split(' ').pop(7))
Now my code works fine. Thank you for your good advice, it put me on the right track! :)
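For reference, a hedged sketch of an equivalent parse that sidesteps the empty fields entirely; it assumes the Day/Temperature/Nr layout described in the question, so the temperature is read as the second field (split() with no argument collapses runs of spaces and tabs, so the index 7 used above no longer applies):

# Sketch only: the file name, column index and Kelvin offset are taken from the question.
temperatures = []
with open("u_netCDF_write") as f:
    for line in f:
        fields = line.split()          # no empty fields, unlike split(' ')
        if len(fields) >= 2:
            temperatures.append(float(fields[1]) - 273.15)   # Kelvin -> Celsius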

I don't know the content of your "u_netCDF_write" file, so it is reasonably difficult to debug. But as the other post shows, it could be a non-printing character that's present in the value.
See if this helps:
python ValueError: invalid literal for float()
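A quick way to spot such a character is to print the repr of the value that fails (a sketch; the indexing mirrors the loop in the question):

value = data[2][i]     # whichever element float() choked on
print(repr(value))     # repr exposes hidden characters such as '\t' or '\xa0'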

03.07.2014 cannot be converted to a float - it's a date string. It looks like you're using the wrong column from your data list.

Related

Python list(?) sending incorrect number of values to other program

I have been trying to upload close to a thousand SVG files to a particular program (FontForge) and searched for a way to automate it. Unfortunately I am really unfamiliar with Python, to the extent that I'm not sure whether the changes I made to the code ended up changing something fundamental.
With the original code you were meant to manually continue the table that the original coder left, filling in the file name and glyph name of each SVG file. This would have to be done by hand, since I quickly realized it wouldn't allow loops inside the brackets themselves. The original code was as follows, albeit with more items:
select = [
    ('null', 'null'),
    ('bs-info-circle', 'infoCircle'),
]
Looking at it, and with a lot of googling and experimentation, I guessed that it was a list of tuples. As such, I created various loops adding onto a toSelect list that I created. Since they are roughly the same, I'll just show one here:
for x in svc:
    icon = "svc-" + x + "con_lang"
    glyph = x
    #print((icon, glyph))
    toSelect.extend(((icon, glyph),))  # comma necessary to force it to add in a pair, rather than individually
The variable svc is a list of strings: ['ba', 'be', 'bi', 'bo'...] pulled from a TXT file. The variable toSelect, when printed, looks as follows:
[('svc-bacon_lang', 'ba'), ('svc-becon_lang', 'be'), ('svc-bicon_lang', 'bi'), ...]
Long story short, I now have a list that seems to be formatted the same as the contents of the original code. As such, I set it equal in a simple manner:
select = toSelect
However, running the build program that pulls from this code is giving the following error message:
Traceback (most recent call last):
File "C:\Users\*****\Downloads\ff-batch-main\ff-batch-main\build.py", line 392, in <module>
run_fontforge(config)
File "C:\Users\*****\Downloads\ff-batch-main\ff-batch-main\build.py", line 148, in run_fontforge
for key, name in config.select:
ValueError: not enough values to unpack (expected 2, got 1)
I have tried every variation of declaring select that I can, including a few that cause the following error message:
Traceback (most recent call last):
File "C:\Users\*****\Downloads\ff-batch-main\ff-batch-main\build.py", line 392, in <module>
run_fontforge(config)
File "C:\Users\*****\Downloads\ff-batch-main\ff-batch-main\build.py", line 148, in run_fontforge
for key, name in config.select:
ValueError: too many values to unpack (expected 2)
Printing the value of select[0] does seem to tell me that the error is caused by all of the entries being treated as a single entry in the list? So at least I know that.
Still, I can't figure out why it doesn't take my original attempt, as select[0] = ('null', 'null'). I'm worried that select isn't supposed to be a list at all, but is something different that I'm simply unfamiliar with since I do not know python. Is this some sort of function that I broke by adding items sequentially instead of all at once?
I will also show the code from the 'build' program that is flagged as the problem. The only thing that I edited was the 'config' program, as instructed by the coder, but hopefully this at least will give context?
def run_fontforge(config):
    from icon_map import maps
    try:
        from select_cache import maps as icons
    except:
        icons = {}

    print(f"Generating fonts => {config.build_dir}/{config.font_name}.ttf")
    with open('select_cache.py', 'w') as f:
        f.write("# SVG to Font Icon mapping\n")
        f.write(f"# Generated: {datetime.now()}\n")
        f.write("maps = {\n")
        last = config.font_start_code - 1
        for _, m in icons.items():
            last = max(last, m['code'])
        last += 1
        for key, name in config.select:
            icon = icons.get(key)
            src = maps.get(key)
So yeah. Um, any advice or explanations would be greatly appreciated, and I'll do my best to give additional information when needed. Unfortunately I started trying to understand Python yesterday and am coming at this from a rusty knowledge of Java, so I am not really fluent and might not know the right terms. I just wanted to import some files, man...
You've gotten quite far just 1 day into Python.
My hypothesis for the bug is config.select not containing the toSelect data.
Ways to investigate:
Interactively run this to verify that toSelect is a list of pairs:
for k, n in toSelect: pass
print both variables, or interactively evaluate config.select == toSelect to compare them, or set breakpoints in the PyCharm or VSCode debugger and examine these variables.
Is this some sort of function that I broke by adding items sequentially instead of all at once?
No.
BTW, you can make:
toSelect.extend(((icon, glyph),))
easier to understand by writing it as:
toSelect.append((icon, glyph))
Bonus: The most "Pythonic" way to write that for loop is as a list comprehension:
toSelect = [("svc-"+x+"con_lang", x) for x in svc]
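For illustration, a tiny usage sketch (the two-element svc list is made up) showing that the comprehension builds exactly the pair structure that build.py unpacks:

svc = ['ba', 'be']                                     # hypothetical sample input
toSelect = [("svc-" + x + "con_lang", x) for x in svc]
print(toSelect)   # [('svc-bacon_lang', 'ba'), ('svc-becon_lang', 'be')]
for key, name in toSelect:                             # same unpacking as the loop in run_fontforge
    print(key, name)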

Trying to edit private dicom tag

I'm currently trying to edit a private DICOM tag which is causing problems with a radiotherapy treatment, using pydicom in Python. Bit of a Python newbie here, so bear with me.
The dicom file imports correctly into python; I've attached some of the output in the first image from the commands
ds = dicomio.read_file("xy.dcm")
print(ds)
This returns the following data:
pydicom output
The highlighted tag is the one I need to edit.
When trying something like
ds[0x10,0x10].value
This gives the correct output:
'SABR Spine'
However, trying something along the lines of
ds[3249,1000]
or
ds[3249,1000].value
returns the following output:
Traceback (most recent call last):
File "<pyshell#64>", line 1, in <module>
ds[3249,1000].value
File "C:\Users\...\dataset.py", line 317, in __getitem__
data_elem = dict.__getitem__(self, tag)
KeyError: (0cb1, 03e8)
If I try accessing [3249,1010] via the same method, it returns a KeyError of (0cb1, 03f2).
I have tried adding the tag to the _dicom_dict.py file, as highlighted in the second image:
end of _dicom_dict.py
Have I done this right? I'm not even sure if I'm accessing the tags correctly - using
ds[300a,0070]
gives me 'SyntaxError: invalid syntax' as the output, for example, even though this tag is present in the file as Fraction Group Sequence. I have also been made aware that [3249,1000] is connected to [3249,1010] somehow, and apparently, since they are proprietary tags, they cannot be edited in MATLAB; however, it was suggested they could be edited in Python for some reason.
Thanks a lot
It looks like your lookup is treating the numbers you pass as plain decimal integers - 3249 is 0x0cb1 and 1000 is 0x03e8, which matches the KeyError you got.
You could try:
ds[0x3249,0x1000]
The 0x prefix makes Python read the group and element numbers as hexadecimal, which is how DICOM tags are written.
You can apparently access them directly as strings:
ds['3249', '1000']
However, your issue is that you are trying to access a data element that is nested several layers deep. Based on your output at the top, I would suggest trying:
first_list_item = ds['300a', '0070'][0]
for item in first_list_item['300c', '0004']:
    print(item['3249','1000'])
Essentially, a data element from the top level Dataset object can be either a list or another Dataset object. Makes parsing the data a little harder, but probably unavoidable.
Have a look at this for more info.
As Andrew Guy notes in his last comment, you need to get the first sequence item for 300a,0070. Then get the second sequence item from the 300c,0004 sequence in that item. In that sequence item, you should be able to get the 3249,1000 attribute.
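Putting that together, a hedged sketch (untested against this file; the nesting follows the description above, the new value is a placeholder, and dcmread is the current spelling of read_file) of reading, editing and saving the nested private element with pydicom:

import pydicom

ds = pydicom.dcmread("xy.dcm")                           # current pydicom API for dicomio.read_file

fraction_group = ds[0x300A, 0x0070].value[0]             # first item of the Fraction Group Sequence
referenced_beams = fraction_group[0x300C, 0x0004].value  # Referenced Beam Sequence inside that item

for beam_item in referenced_beams:                       # the comment above suggests the second item;
    if (0x3249, 0x1000) in beam_item:                    # checking each item is the safer sketch
        print(beam_item[0x3249, 0x1000].value)           # inspect before editing
        beam_item[0x3249, 0x1000].value = "new value"    # placeholder for whatever the plan needs

ds.save_as("xy_edited.dcm")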

Choosing random words from a file without duplicates Python (sets)

I'm attempting to create a program which selects 10 words from a text file containing 10+ words. When importing these 10 words from the text file, the program must not import the same word twice! Currently I'm using a set for this, however I'm greeted by a TypeError. I have some knowledge of sets and know they cannot hold the same value twice. As of now I'm clueless on how to solve this; any help would be much appreciated. THANKS!
Relevant code: (FileSelection) = open-file dialog
def GameStage03_E():
    global WordSet
    if WrdCount >= 10:
        WordSet = set()
        for n in range(0, 10):
            FileLines = open(FileSelection).read().splitlines()
            RandWrd = random.choice(FileLines)
            WordSet.update(set([RandWrd]))
        SelectButton.destroy()
        GameStage01Button.destroy()
        GameStage04_E()
    elif WrdCount <= 10:
        tkinter.messagebox.showinfo("ERROR", " Insufficient Amount Of Words Within Your Text File! ")
error code:
File "C:\Python34\lib\random.py", line 256, in choice
return seq[i]
`TypeError: 'set' object does not support indexing`
You can just use random.sample (see the Python 2 / Python 3 docs), so you don't have to do that yourself. You also don't need the call to list that bigblind's answer suggests, because random.sample can take a set as an argument:
WordSet.update(random.sample(FileLines, 10))
That way, you can replace the entire body of that function with this:
try:
    WordSet.update(random.sample(FileLines, 10))
except ValueError:
    tkinter.messagebox.showinfo("ERROR", "The text file doesn't have enough words!")
I also left out that global statement, which you don't need. It's only necessary if you're assigning a new value to the variable, but all you need to do is call one of its methods, update.
This happens because random.choice is trying to access the set as if it were a list (or some other data structure that implements __getitem__). To solve this, change your call to random.choice to:
random.choice(list(FileLines))
This converts the set to a list before passing it to random.choice.
You can just use random.sample(the_list, 10) to get 10 distinct elements instead of repeatedly trying to add to a set using a loop.
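A minimal sketch of that idea (the file name is made up, and the words are read one per line):

import random

with open("words.txt") as f:                 # hypothetical file name
    words = f.read().splitlines()

if len(words) >= 10:
    WordSet = set(random.sample(words, 10))  # 10 distinct lines, so no duplicates
else:
    print("Insufficient amount of words within your text file!")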

Series not callable when trying to parse string in DataFrame

I tried looking, but clearly I am missing a trick here. I tried a couple of ideas for splitting a string separated by ';' in a DataFrame in Python.
Can anybody tell me what I am doing wrong? I have only just picked up Python and would appreciate the help. What I want is to split the string in recipient-address and duplicate the rest of the row for each resulting address. I have a LOT of log files to get through, so it needs to be efficient. I am using Anaconda Python version 2.7 on Windows 7 64-bit. Thanks.
The data in the input looks roughly like this:
#Fields: date-time,sender-address,recipient-address
2015-06-22T00:00:01.051Z, persona#gmail.com, other#gmail.com;mickey#gmail.com
2015-06-22T00:00:01.254Z, personb#gmail.com, mickey#gmail.com
What I am aiming at is:
#Fields: date-time,sender-address,recipient-address
2015-06-22T00:00:01.051Z, persona#gmail.com, other#gmail.com
2015-06-22T00:00:01.051Z, persona#gmail.com, mickey#gmail.com
2015-06-22T00:00:01.254Z, personb#gmail.com, mickey#gmail.com
I have tried this based on this
for LOGfile in LOGfiles[:1]:
    readin = pandas.read_csv(LOGfile, skiprows=[0,1,2,3], parse_dates=['#Fields: date-time'], date_parser=dateparse)
    #s = df['recipient-address'].str.split(';').apply(Series, 1).stack()
    df = pandas.concat([Series(row['#Fields: date-time'], row['sender-address'], row['recipient-address'].split(';'))
                        for _, row in readin.iterrows()]).reset_index()
I keep getting the error:
NameError Traceback (most recent call last)
in ()
4 readin = pandas.read_csv(LOGfile, skiprows=[0,1,2,3], parse_dates= ['#Fields: date-time'], date_parser = dateparse )
5 df=pandas.concat([Series(row['#Fields: date-time'], row['sender-address'],row['recipient-address'].split(';'))
----> 6 for _, row in readin.iterrows()]).reset_index()
7
NameError: name 'Series' is not defined
I updated this with more complete/correct code - it now generates one row in the output Dataframe df for each recipient-address in the input logfile.
This might not be the most efficient solution but at least it works :-)
Err, you would get a quicker answer, and one that is easier for the answerer to give, if with your question you a) provide a complete and executable short example of the code you tried that reproduces your error, b) include the sample data needed to reproduce the error, and c) include the exact output/error messages from that code with that data. It's probably also a good idea to include version numbers and the platform you are running on. I'm working with 32-bit Python 2.7.8 on Windows 7 64-bit.
I created myself some sample data in a file log.txt:
date-time,sender-address,recipient-address
1-1-2015,me#my.com,me1#my.com;me2#my.com
2-2-2015,me3#my.com,me4#my.com;me5#my.com
I then created a complete working example python file (also making some minimal simplifications to your code snippet) and fixed it. My code which works with my data is:
import pandas

LOGfiles = ('log.txt','log.txt')
for LOGfile in LOGfiles[:1]:
    readin = pandas.read_csv(LOGfile, parse_dates=['date-time'])
    #s = df['recipient-address'].str.split(';').apply(Series, 1).stack()
    rows = []
    for _, row in readin.iterrows():
        for recip in row['recipient-address'].split(';'):
            rows.append(pandas.Series(data={'date-time': row['date-time'], 'sender-address': row['sender-address'], 'recipient-address': recip}))
    df = pandas.concat(rows)
    print df
The output from this code is:
date-time 2015-01-01 00:00:00
recipient-address me1#my.com
sender-address me#my.com
date-time 2015-01-01 00:00:00
recipient-address me2#my.com
sender-address me#my.com
date-time 2015-02-02 00:00:00
recipient-address me4#my.com
sender-address me3#my.com
date-time 2015-02-02 00:00:00
recipient-address me5#my.com
sender-address me3#my.com
dtype: object
The main thing I did to find out what was wrong with your code was to break the problem down, because although your code is short it includes several potential sources of problems besides the split: first I made sure that the iteration over the rows works and that split(';') behaves as expected (it does), then I started constructing a Series and found I needed the pandas. prefix on Series, and the data={...} passed as a dictionary.
HTH
barny
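For what it's worth, on newer pandas (0.25 or later, where DataFrame.explode exists) the same one-row-per-recipient reshaping can be written without an explicit Python loop; a hedged sketch, reusing the log.txt sample above:

import pandas

readin = pandas.read_csv("log.txt", parse_dates=["date-time"])
# turn 'a;b' into ['a', 'b'], then give each list element its own row
readin["recipient-address"] = readin["recipient-address"].str.split(";")
df = readin.explode("recipient-address").reset_index(drop=True)
print(df)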
I updated the code below to add untested code for passing through the first six lines of the logfile directly to the output.
If all you're doing with the csv logfiles is this transformation, then a possibly faster approach - although not without some significant potential disadvantages - would be to avoid csv reader/pandas and process the csv logfiles at a text level, maybe something like this:
LOGfiles = ('log.txt','log.txt')
outfile = open('result.csv', "wt")
for LOGfile in LOGfiles[:1]:
    linenumber = 0
    for line in open(LOGfile, "rt"):
        linenumber += 1
        if linenumber < 6:
            outfile.write(line)
        else:
            line = line.strip()
            fields = line.split(",")
            recipients = fields[2].split(';')
            for recip in recipients:
                outfile.write(','.join([fields[0], fields[1], recip]) + '\n')
Some of the disadvantages of this approach are:
- The field index for recipient-address is hardcoded, as are the fields written to the output.
- It happens to pass through the header line - you may want to make this more robust, e.g. by reading the header line before getting into the expansion code.
- It assumes the CSV field separator is a hardcoded comma (,) and so won't cope if any of the fields in the csv file contain a comma.
- It probably works OK with ASCII csv files, but may barf on extended character sets (UTF, etc.), which are very commonly found these days.
- It will likely be harder to maintain than the pandas approach.
Some of these are quite serious and would take a lot of messing about to fix if you were going to code it yourself - particularly the character sets - so personally it's difficult to strongly recommend this approach: you need to weigh up the pros and cons for your situation.
HTH
barny

ValueError: invalid literal for int() with base 10: 'MSIE'

After I run my Python code on a big file of only HTTP headers, it gives me the above error. Any idea what that means?
Here is a piece of the code:
users = output.split(' ')[1]
accesses = output.split(' ')[3]
ave_accesses = int(accesses)/int(users)
Basically the 'users' are the users who have accessed a website and 'accesses' is the total number of accesses by those users to that site. The 'ave_accesses' gives the number of accesses to that site by an average user. I hope this is enough to clear things up; if not, I can explain more.
thanks a lot, Adia.
It means that you are trying to convert a string to an integer, and the value of the string is 'MSIE'. The traceback will have a filename near this error and the line number (e.g., /my/module.py:123). Open the file and go to the line where the error occurred; you should see a call to int() with an argument. That argument is probably supposed to be a number in string form, but it's not. You probably got your parsing code a little wrong, and the fields were mixed up.
To track down the problem, use print statements around the code to see what is not working as expected. You can also use pdb.
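For example, a hedged sketch of that kind of check, using the field indices from the snippet in the question (they may not match your real log line):

parts = output.split(' ')
print(repr(parts))                      # inspect every field after the split
print(repr(parts[1]), repr(parts[3]))   # the two fields being converted to int

users = int(parts[1])                   # will raise the same ValueError if the field is 'MSIE'
accesses = int(parts[3])
ave_accesses = accesses / users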
I think your header output is garbled. The code is obviously looking for a number where it is finding the string 'MSIE' (which may be part of the User-Agent value).
