I have a Python script that collects data from a server in the form
<hh-mm-ss>,<ddd>
Here the first field is a time and the second field is an integer. This data is written to a file.
Another thread plots a live graph from that file.
So the file has data like this:
<hh-mm-ss>,<ddd>
<hh-mm-ss>,<ddd>
<hh-mm-ss>,<ddd>
<hh-mm-ss>,<ddd>
Now I want to plot a Matplotlib time-series graph from the data shown above,
but when I try, it throws this error:
ValueError: invalid literal for int() with base 10: '15:53:09'
When I have plain numeric data like the following, things are fine:
<ddd>,<ddd>
<ddd>,<ddd>
<ddd>,<ddd>
<ddd>,<ddd>
UPDATE
The code that generates the graph from the file described above is shown below:
def animate(i):
    pullData = open("sampleText.txt","r").read()
    dataArray = pullData.split('\n')
    xar = []
    yar = []
    for eachLine in dataArray:
        if len(eachLine)>1:
            x,y = eachLine.split(',')
            xar.append(int(x))
            yar.append(int(y))
    ax1.clear()
    ax1.plot(xar,yar)
UPDATED CODE
def animate(i):
    print("inside animate")
    pullData = open("sampleText.txt","r").read()
    dataArray = pullData.split('\n')
    xar = []
    yar = []
    for eachLine in dataArray:
        if len(eachLine)>1:
            x,y = eachLine.split(',')
            timeX=datetime.strptime(x, "%H:%M:%S")
            xar.append(timeX.strftime("%H:%M:%S"))
            yar.append(float(y))
    ax1.clear()
    ax1.plot(xar,yar)
Now I am getting the error at this line: ax1.plot(xar,yar)
How do I get past this?
You are trying to parse an integer from a string representing a timestamp. Of course it fails.
In order to be able to use the timestamps in a plot, you need to parse them to the proper type, e.g., datetime.time or datetime.datetime. You can use datetime.datetime.strptime(), dateutil.parser.parse() or maybe also time.strptime() for this.
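A minimal sketch of that parsing step, assuming lines shaped like 15:53:09,12 as in the error message. The sample lines and the parse_line helper name are made up for illustration and are not part of the original script:

```python
from datetime import datetime


def parse_line(line):
    """Split 'HH:MM:SS,value' into a (datetime, float) pair."""
    stamp, value = line.strip().split(",")
    # strptime with a time-only format fills in the default date 1900-01-01
    return datetime.strptime(stamp, "%H:%M:%S"), float(value)


# Hypothetical sample lines standing in for the contents of the data file.
sample = ["15:53:09,12", "15:53:10,15", "15:53:11,9"]

points = [parse_line(line) for line in sample]
times = [t for t, _ in points]    # datetime objects for the x axis
values = [v for _, v in points]   # floats for the y axis
```

The two lists can then be handed to ax1.plot(times, values); passing datetime objects rather than re-formatted strings lets Matplotlib treat the x axis as time.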
Plotting the data is straight-forward, then. Have a look at the interactive plotting mode: matplotlib.pyplot.ion().
For reference/further reading:
https://pypi.python.org/pypi/python-dateutil
http://dateutil.readthedocs.org/en/latest/parser.html#dateutil.parser.parse
https://docs.python.org/2/library/datetime.html#datetime.datetime.strptime
https://docs.python.org/2/library/time.html#time.strptime
http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.ion
Plotting time in Python with Matplotlib
How to iterate over the file in python
Based on your code I have created an example. I have inlined some notes as to why I think it's better to do it this way.
from datetime import datetime
import matplotlib.pyplot as plt

data = []
# use a with-statement to make sure the file is eventually closed
with open("sampleText.txt") as f:
    # iterate the file using the file object's iterator interface
    for line in f:
        try:
            # 'value' rather than 'f', so the file object is not shadowed
            t, value = line.split(",")
            # parse timestamp and number and append them to the data list
            data.append((datetime.strptime(t, "%H:%M:%S"), float(value)))
        except ValueError:
            # something went wrong: inspect later and continue for now
            print("failed to parse line: %r" % line)
# split columns into separate variables
x, y = zip(*data)
# plot
plt.plot(x, y)
plt.show()
plt.close()
For further reading:
https://docs.python.org/2/reference/datamodel.html#context-managers
https://docs.python.org/2/library/stdtypes.html#file-objects
The error tells you the cause of the problem: You're trying to convert a string, such as '15:53:09', into an integer. This string is not a valid number.
Instead, you should either look into using a datetime object from the datetime module to work with date/time values, or at least split the string into fields using ':' as the delimiter and then use each field separately.
Consider this brief demo:
>>> time = '15:53:09'
>>> time.split(':')
['15', '53', '09']
>>> [int(v) for v in time.split(':')]
[15, 53, 9]
>>> int(time) # expect exception
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '15:53:09'
>>>
Related
I am trying to generate a chart that shows the top 5 spending categories.
I've got it working up to a certain point and then it says "matplotlib does not support generators as input". I am pretty new to python in general but am trying to learn more about it.
Up to this point in the code it works:
import Expense
import collections
import matplotlib.pyplot as plt
expenses = Expense.Expenses()
expenses.read_expenses(r"C:\Users\budget\data\spending_data.csv")
spending_categories = []
for expense in expenses.list:
    spending_categories.append(expense.category)
spending_counter = collections.Counter(spending_categories)
top5 = spending_counter.most_common(5)
If you did a print(top5) on the above it would show the following results:
[('Eating Out', 8), ('Subscriptions', 6), ('Groceries', 5), ('Auto and Gas', 5), ('Charity', 2)]
Now I was trying to separate the items (the count from the category) and I guess I'm messing up on that part.
The rest of the code looks like this:
categories = zip(*top5)
count = zip(*top5)
fig, ax = plt.subplots()
ax.bar(count,categories)
ax.set_title('# of Purchases by Category')
plt.show()
This is where the error is occurring. I can get something to show if I make count and categories a string but it doesn't actually plot anything and doesn't make sense.
The error shows (the name of this .py file I'm working in is FrequentExpenses.py)
Traceback (most recent call last):
File "C:\Users\budget\data\FrequentExpenses.py", line 24, in <module>
ax.bar(count,categories)
File "C:\Users\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\matplotlib\__init__.py", line 1447, in inner
return func(ax, *map(sanitize_sequence, args), **kwargs)
File "C:\Users\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\matplotlib\axes\_axes.py", line 2407, in bar
self._process_unit_info(xdata=x, ydata=height, kwargs=kwargs)
File "C:\Users\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\matplotlib\axes\_base.py", line 2189, in _process_unit_info
kwargs = _process_single_axis(xdata, self.xaxis, 'xunits', kwargs)
File "C:\Users\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\matplotlib\axes\_base.py", line 2172, in _process_single_axis
axis.update_units(data)
File "C:\Users\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\matplotlib\axis.py", line 1460, in update_units
converter = munits.registry.get_converter(data)
File "C:\Users\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\matplotlib\units.py", line 210, in get_converter
first = cbook.safe_first_element(x)
File "C:\Users\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\matplotlib\cbook\__init__.py", line 1669, in safe_first_element
raise RuntimeError("matplotlib does not support generators "
RuntimeError: matplotlib does not support generators as input
The "Expense" import is another file (Expense.py), shown below, which defines two classes (Expenses and Expense) and a read_expenses() method:
import csv
from datetime import datetime

class Expense():
    def __init__(self, date_str, vendor, category, amount):
        self.date_time = datetime.strptime(date_str, '%m/%d/%Y %H:%M:%S')
        self.vendor = vendor
        self.category = category
        self.amount = amount

class Expenses():
    def __init__(self):
        self.list = []
        self.sum = 0

    # Read in the December spending data, row[2] is the $$, and need to format $$
    def read_expenses(self,filename):
        with open(filename, newline='') as csvfile:
            csvreader = csv.reader(csvfile, delimiter=',')
            for row in csvreader:
                if '-' not in row[3]:
                    continue
                amount = float((row[3][2:]).replace(',',''))
                self.list.append(Expense(row[0],row[1], row[2], amount))
                self.sum += amount

    def categorize_for_loop(self):
        necessary_expenses = set()
        food_expenses = set()
        unnecessary_expenses = set()
        for i in self.list:
            if (i.category == 'Phone' or i.category == 'Auto and Gas' or
                    i.category == 'Classes' or i.category == 'Utilities' or
                    i.category == 'Mortgage'):
                necessary_expenses.add(i)
            elif(i.category == 'Groceries' or i.category == 'Eating Out'):
                food_expenses.add(i)
            else:
                unnecessary_expenses.add(i)
        return [necessary_expenses, food_expenses, unnecessary_expenses]
I know this seems pretty simple to most, can anyone help me? I appreciate all the help and I'm looking forward to learning much more about python!
Python has objects called "generators" (and, more generally, lazy iterators), which produce values only when asked. Very often it is cheaper to have a generator than to build a full list up front. One example is the zip() function in Python 3. Instead of returning a list of tuples, it returns a lazy zip object which hands out one tuple after another:
zip([1,2,3],[4,5,6])
<zip object at 0x7f7955c6dd40>
If you iterate over such a generator it will generate one value after the other, so in this case it behaves like a list:
for q in zip([1,2,3],[4,5,6]):
print(q)
(1, 4)
(2, 5)
(3, 6)
But in other contexts it doesn't behave like the list, e.g. if it is being asked for the length of the result. A generator (typically) doesn't know that up front:
len(zip([1,2,3],[4,5,6]))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object of type 'zip' has no len()
This is mostly to save time during execution and is called lazy evaluation. Read more about generators in general.
In your case, you can simply skip the performance optimization by constructing a true list out of the generator by calling list(...) explicitly:
r = list(zip([1,2,3],[4,5,6]))
Then you can also ask for the length of the result:
len(r)
3
matplotlib presumably needs to inspect the data up front (for example its length), so it accepts lists as input but not generators. Pass it a list instead of a generator, and you will be fine.
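Applied to the question, a minimal sketch could look like this. It assumes top5 holds the pairs printed in the question; note that calling zip(*top5) twice, as the original code does, gives both names the same thing rather than splitting the pairs:

```python
# The pairs printed in the question, copied here so the sketch is self-contained.
top5 = [('Eating Out', 8), ('Subscriptions', 6), ('Groceries', 5),
        ('Auto and Gas', 5), ('Charity', 2)]

# zip(*top5) yields two tuples: first all the names, then all the counts.
# Unpacking consumes the lazy zip object right away.
categories, count = zip(*top5)

# Convert to lists so that anything needing a length or indexing is happy.
categories = list(categories)
count = list(count)
```

With these two lists, ax.bar(categories, count) should put the category names on the x axis and use the counts as bar heights; the original call had the two arguments the other way round.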
I'm attempting to get two different elements from an XML file and plot them as the x and y values of a scatter plot. I can get both elements, but when I plot them it only uses one of the dates to plot the other elements. I'm using the code below to fetch the weather data and save it as XML.
url = "http://api.met.no/weatherapi/locationforecast/1.9/?lat=52.41616;lon=-4.064598"
response = requests.get(url)
xml_text=response.text
weather= bs4.BeautifulSoup(xml_text, "xml")
f = open('file.xml', "w")
f.write(weather.prettify())
f.close()
I'm then trying to get the time ('from') element and the ('windSpeed' > 'mps') element and attribute. I'm then trying to plot it as an x and y on a scatter plot.
with open ('file.xml') as file:
    soup = bs4.BeautifulSoup(file, "xml")
    times = soup.find_all("time")
    windspeed = soup.select("windSpeed")
    form = ("%Y-%m-%dT%H:%M:%SZ")
    x = []
    y = []
    for element in times:
        time = element.get("from")
        t = datetime.datetime.strptime(time, form)
        x.append(t)
    for mps in windspeed:
        speed = mps.get("mps")
        y.append(speed)
plt.scatter(x, y)
plt.show()
I'm trying to make 2 lists from 2 loops, and then read them as the x and y, but when I run it it gives the error;
raise ValueError("x and y must be the same size")
ValueError: x and y must be the same size
I'm assuming it's because it prints the list as datetime.datetime(2016, 12, 22, 21, 0). How do I remove the datetime.datetime from the list?
I know there's probably a simple way of fixing it. Any ideas would be great; you people here on Stack are helping me a lot with learning to code. Thanks
Simply make two lists, one containing the x-axis values and the other the y-axis values, and pass them to the scatter function:
plt.scatter(list1, list2);
I suggest that you use lxml for analysing xml because it gives you the ability to use xpath expressions which can make life much easier. In this case, not every time entry contains a windSpeed entry; therefore, it's essential to identify the windSpeed entries first then to get the associated times. This code does that. There are two little problems I usually encounter: (1) I still need to 'play' with xpath to get it right; (2) Sometimes I get a list when I expect a singleton which is why there's a '[0]' in the code. I find it's better to build the code interactively.
>>> from lxml import etree
>>> XML = open('file.xml')
>>> tree = etree.parse(XML)
>>> for count, windSpeeds in enumerate(tree.xpath('//windSpeed')):
...     windSpeeds.attrib['mps'], windSpeeds.xpath('../..')[0].attrib['from']
...     if count>5:
...         break
...
('3.9', '2016-12-29T18:00:00Z')
('4.8', '2016-12-29T21:00:00Z')
('5.0', '2016-12-30T00:00:00Z')
('4.5', '2016-12-30T03:00:00Z')
('4.1', '2016-12-30T06:00:00Z')
('3.8', '2016-12-30T09:00:00Z')
('4.4', '2016-12-30T12:00:00Z')
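The same idea, sketched with the standard library's xml.etree.ElementTree instead of lxml, so that each wind speed stays paired with its own time. The sample XML is a hypothetical, trimmed-down stand-in for file.xml; it assumes the time > location > windSpeed nesting implied by the '../..' step above:

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical, trimmed-down sample; the real feed has many more fields.
SAMPLE = """<weatherdata><product>
  <time from="2016-12-29T18:00:00Z" to="2016-12-29T18:00:00Z">
    <location><windSpeed mps="3.9"/></location>
  </time>
  <time from="2016-12-29T12:00:00Z" to="2016-12-29T18:00:00Z">
    <location><precipitation value="0.0"/></location>
  </time>
  <time from="2016-12-29T21:00:00Z" to="2016-12-29T21:00:00Z">
    <location><windSpeed mps="4.8"/></location>
  </time>
</product></weatherdata>"""

root = ET.fromstring(SAMPLE)
x, y = [], []
for time_el in root.iter("time"):
    wind = time_el.find(".//windSpeed")
    if wind is None:
        continue  # this time entry has no wind speed, so skip it entirely
    x.append(datetime.strptime(time_el.get("from"), "%Y-%m-%dT%H:%M:%SZ"))
    y.append(float(wind.get("mps")))  # convert the attribute string to a number
```

Because a time is only appended when its entry also has a windSpeed, x and y always end up the same length, which is what plt.scatter(x, y) requires.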
I am brand new to Python and looking up examples for what I want to do. I am not sure what is wrong with this loop, what I would like to do is read a csv file line by line and for each line:
Split by comma
Remove the first entry (which is a name) and store it as name
Convert all other entries to floats
Store name and the float entries in my Community class
This is what I am trying at the moment:
class Community:
    num = 0
    def __init__(self, inName, inVertices):
        self.name = inName
        self.vertices = inVertices
        Community.num += 1

allCommunities = []
f = open("communityAreas.csv")
for i, line in enumerate(f):
    entries = line.split(',')
    name = entries.pop(0)
    for j, vertex in entries: entries[j] = float(vertex)
    print name+", "+entries[0]+", "+str(type(entries[0]))
    allCommunities.append(Community(name, entries))
f.close()
The error I am getting is:
>>>>> PYTHON ERROR!!! Traceback (most recent call last):
File "alexChicago.py", line 86, in <module>
for j, vertex in entries: entries[j] = float(vertex)
ValueError: too many values to unpack
It may be worth pointing out that this is running in omegalib, a library for a visual cluster that runs in C and interprets Python.
I think you forgot the enumerate() function on line 86; should be
for j, vertex in enumerate(entries): entries[j] = float(vertex)
If there's always a name and then a variable number of float values, it sounds like you need to split twice: the first time with a maxsplit of 1, and the other as many times as possible. Example:
name, float_values = line.split(',',1)
float_values = [float(x) for x in float_values.split(',')]
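Wrapped into a small function, the two-step split could be sketched as follows. The parse_row name and the sample line are invented for illustration:

```python
def parse_row(line):
    """Return (name, list of floats) from a 'name,v1,v2,...' line."""
    # maxsplit=1 separates the name from everything after the first comma
    name, rest = line.strip().split(",", 1)
    return name, [float(value) for value in rest.split(",")]


# Hypothetical line standing in for one row of communityAreas.csv.
name, vertices = parse_row("Loop,41.88,-87.63,41.87,-87.62\n")
```

The result can be passed straight to the constructor from the question, e.g. Community(name, vertices).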
I am not entirely certain what you want to achieve here, but if the goal is to convert all the elements in entries to float, this should be sufficient on line 86 (in Python 3, wrap it in list(...), since map() returns an iterator there):
entries=map(float, entries)
I need to read the values from text files into an array, Z. This works fine with a single file, ChiTableSingle, but fails when I try to use multiple files. It seems to read the lines correctly and produces Z, but Z[0] comes out as just [], and then I get the error "setting an array element with a sequence".
This is my current code:
rootdir='C:\users\documents\ChiGrid'
fileNameTemplate = r'C:\users\documents\ContourPlots\Plot{0:02d}.png'
for subdir,dirs,files in os.walk(rootdir):
    for count, file in enumerate(files):
        fh=open(os.path.join(subdir,file),'r')
        #fh = open( "ChiTableSingle.txt" );
        print 'file is '+ str(file)
        Z = []
        for line in fh.readlines():
            y = [value for value in line.split()]
            Z.append( y )
        print Z[0][0]
        fh.close()
        plt.figure() # Create a new figure window
        Temp=open('TempValues.txt','r')
        lineTemp=Temp.readlines()
        for i in range(0, len(lineTemp)):
            lineTemp[i]=[float(lineTemp[i])]
        Grav=open('GravValues2.txt','r')
        lineGrav=Grav.readlines()
        for i in range(0, len(lineGrav)):
            lineGrav[i]=[float(lineGrav[i])]
        X,Y = np.meshgrid(lineTemp, lineGrav) # Create 2-D grid xlist,ylist values
        plt.contour(X, Y, Z,[1,2,3], colors = 'k', linestyles = 'solid')
        plt.savefig(fileNameTemplate.format(count), format='png')
        plt.clf()
The first thing I noticed is that your list comprehension y = [value for ...] is only going to return a list of strings (from the split() function), so you will want to convert them to a numeric format at some point before trying to plot it.
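A minimal sketch of doing that conversion while reading, using an in-memory sample in place of a real file. The sample contents are invented, and the blank-line check is an assumption about where an empty [] row might come from:

```python
import io

# Hypothetical table contents, including a blank line such as a file might contain.
SAMPLE = "1.0 2.5 3.0\n\n4.0 5.5 6.0\n"

Z = []
for line in io.StringIO(SAMPLE):
    # split() yields strings, so convert each one to float here
    row = [float(value) for value in line.split()]
    if row:  # skip blank lines so no empty [] rows end up in Z
        Z.append(row)
```

Rows of equal length holding floats are what a contour call expects; ragged or empty rows are a common source of the "setting an array element with a sequence" error.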
In addition, if the files you are reading in are simply white-space delimited tables of numbers, you should consider using numpy.loadtxt(fh) since it takes care of splitting and type conversion and returns a 2-d numpy.array. You can also add comment text that it will ignore if the line starts with the regular python comment character (e.g. # this line is a comment and will be ignored).
Just another thought: I would be careful about using variable names that are the same as a Python built-in (e.g. the name file in this case, which is a built-in in Python 2). Once you rebind it to something else, the built-in is no longer reachable under that name in that scope.
I want to import several coordinates (possibly up to 20,000) from a text file.
These coordinates need to be added to a list that looks like the following:
coords = [[0,0],[1,0],[2,0],[0,1],[1,1],[2,1],[0,2],[1,2],[2,2]]
However, when I try to import the coordinates I get the following error:
invalid literal for int() with base 10
I can't figure out how to import the coordinates correctly.
Does anyone have any suggestions as to why this does not work?
I think there's some problem with creating the integers.
I use the following script:
Bronbestand = open("D:\\Documents\\SkyDrive\\afstuderen\\99 EEM - Abaqus 6.11.2\\scripting\\testuitlezen4.txt", "r")
headerLine = Bronbestand.readline()
valueList = headerLine.split(",")
xValueIndex = valueList.index("x")
#xValueIndex = int(xValueIndex)
yValueIndex = valueList.index("y")
#yValueIndex = int(yValueIndex)
coordList = []
for line in Bronbestand.readlines():
    segmentedLine = line.split(",")
    coordList.extend([segmentedLine[xValueIndex], segmentedLine[yValueIndex]])
coordList = [x.strip(' ') for x in coordList]
coordList = [x.strip('\n') for x in coordList]
coordList2 = []
#CoordList3 = [map(int, x) for x in coordList]
for i in coordList:
    coordList2 = [coordList[int(i)], coordList[int(i)]]
print "coordList = ", coordList
print "coordList2 = ", coordList2
#print "coordList3 = ", coordList3
The coordinates to be imported look like this (this is "Bronbestand" in the script):
id,x,y,
1, -1.24344945, 4.84291601
2, -2.40876842, 4.38153362
3, -3.42273545, 3.6448431
4, -4.22163963, 2.67913389
5, -4.7552824, 1.54508495
6, -4.99013376, -0.313952595
7, -4.7552824, -1.54508495
8, -4.22163963, -2.67913389
9, -3.42273545, -3.6448431
Thus the script should result in:
[[-1.24344945, 4.84291601],[-2.40876842, 4.38153362],[-3.42273545, 3.6448431],[-4.22163963, 2.67913389],[-4.7552824, 1.54508495],[-4.99013376,-0.313952595],[-4.7552824, -1.54508495],[-4.22163963, -2.67913389],[-3.42273545, -3.6448431]]
I also tried importing the coordinates with the native python csv parser but this didn't work either.
Thank you all in advance for the help!
Your numbers are not integers so the conversion to int fails.
Try using float(i) instead of int(i) to convert into floating point numbers instead.
>>> int('1.5')
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
int('1.5')
ValueError: invalid literal for int() with base 10: '1.5'
>>> float('1.5')
1.5
Other answers have said why your script fails, however, there is another issue here - you are massively reinventing the wheel.
This whole thing can be done in a couple of lines using the csv module and a list comprehension:
import csv
with open("test.csv") as file:
    data = csv.reader(file)
    next(data)
    print([[float(x) for x in line[1:]] for line in data])
Gives us:
[[-1.24344945, 4.84291601], [-2.40876842, 4.38153362], [-3.42273545, 3.6448431], [-4.22163963, 2.67913389], [-4.7552824, 1.54508495], [-4.99013376, -0.313952595], [-4.7552824, -1.54508495], [-4.22163963, -2.67913389], [-3.42273545, -3.6448431]]
We open the file, make a csv.reader() to parse the csv file, skip the header row, then make a list of the numbers parsed as floats, ignoring the first column.
As pointed out in the comments, as you are dealing with a lot of data, you may wish to iterate over the data lazily. While making a list is good to test the output, in general, you probably want a generator rather than a list. E.g:
([float(x) for x in line[1:]] for line in data)
Note that the file will need to remain open while you utilize this generator (remain inside the with block).
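A small sketch of that lazy variant. It uses an in-memory io.StringIO in place of the real file; the iter_coords name and the two sample rows are chosen for illustration:

```python
import csv
import io

# Hypothetical sample standing in for the real coordinate file.
SAMPLE = "id,x,y,\n1, -1.24344945, 4.84291601\n2, -2.40876842, 4.38153362\n"


def iter_coords(lines):
    """Lazily yield [x, y] float pairs, skipping the header and the id column."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    for row in reader:
        yield [float(value) for value in row[1:3]]


with io.StringIO(SAMPLE) as source:
    # The generator must be consumed while the source is still open.
    total_x = sum(x for x, _ in iter_coords(source))
```

Only one row is held in memory at a time, which matters more as the file approaches the 20,000 lines mentioned in the question.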