Removing datetime.datetime from a list in Python

I'm attempting to get two different elements from an XML file and plot them as the x and y values of a scatter plot. I can get both elements, but when I plot them it only uses one of the dates to plot the other elements. I'm using the code below to fetch the weather data and save it as XML:
import bs4
import requests
url = "http://api.met.no/weatherapi/locationforecast/1.9/?lat=52.41616;lon=-4.064598"
response = requests.get(url)
xml_text = response.text
weather = bs4.BeautifulSoup(xml_text, "xml")
with open('file.xml', "w") as f:
    f.write(weather.prettify())
Next, I get the time (the 'from' attribute of each 'time' element) and the wind speed (the 'mps' attribute of each 'windSpeed' element) and try to plot them as x and y on a scatter plot:
import datetime
import bs4
import matplotlib.pyplot as plt
with open('file.xml') as file:
    soup = bs4.BeautifulSoup(file, "xml")
    times = soup.find_all("time")
    windspeed = soup.select("windSpeed")
form = "%Y-%m-%dT%H:%M:%SZ"
x = []
y = []
for element in times:
    time = element.get("from")
    t = datetime.datetime.strptime(time, form)
    x.append(t)
for mps in windspeed:
    speed = mps.get("mps")
    y.append(speed)
plt.scatter(x, y)
plt.show()
I build two lists in two loops and pass them as x and y, but when I run it I get this error:
raise ValueError("x and y must be the same size")
ValueError: x and y must be the same size
I'm assuming it's because the list shows its items as datetime.datetime(2016, 12, 22, 21, 0). How do I remove the datetime.datetime from the list?
I know there's probably a simple way of fixing it, so any ideas would be great. You people here on Stack Overflow are helping me a lot with learning to code. Thanks!

Simply make two lists of the same length, one containing the x-axis values and the other the y-axis values, and pass them to the scatter function:
plt.scatter(list1, list2)
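The datetime.datetime(...) text is only how Python displays each item (its repr) when a list is printed; matplotlib accepts datetime objects on the x axis, so there is nothing to remove. What the error actually complains about is the two lists having different lengths, which happens because not every time entry has a windSpeed. A minimal stdlib sketch with hypothetical sample values illustrating both points:

```python
import datetime

# Hypothetical sample: two parsed times but only one wind speed
x = [datetime.datetime(2016, 12, 22, 21, 0), datetime.datetime(2016, 12, 23, 0, 0)]
y = [3.9]

# Printing a list shows each item's repr(); the items are ordinary datetimes
print(x)          # [datetime.datetime(2016, 12, 22, 21, 0), datetime.datetime(2016, 12, 23, 0, 0)]
print(str(x[0]))  # 2016-12-22 21:00:00

# Comparing lengths before plotting reveals the real cause of the ValueError
if len(x) != len(y):
    print(f"length mismatch: {len(x)} times vs {len(y)} speeds")
```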

I suggest that you use lxml for analysing XML because it lets you use XPath expressions, which can make life much easier. In this case, not every time entry contains a windSpeed entry; therefore, it's essential to identify the windSpeed entries first and then get the associated times. This code does that. There are two little problems I usually encounter: (1) I still need to 'play' with XPath to get it right; (2) sometimes I get a list when I expect a singleton, which is why there's a '[0]' in the code. I find it's better to build the code interactively.
>>> from lxml import etree
>>> XML = open('file.xml')
>>> tree = etree.parse(XML)
>>> for count, windSpeeds in enumerate(tree.xpath('//windSpeed')):
...     windSpeeds.attrib['mps'], windSpeeds.xpath('../..')[0].attrib['from']
...     if count > 5:
...         break
...
('3.9', '2016-12-29T18:00:00Z')
('4.8', '2016-12-29T21:00:00Z')
('5.0', '2016-12-30T00:00:00Z')
('4.5', '2016-12-30T03:00:00Z')
('4.1', '2016-12-30T06:00:00Z')
('3.8', '2016-12-30T09:00:00Z')
('4.4', '2016-12-30T12:00:00Z')
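If lxml isn't available, the same "find windSpeed first, then climb to its time" idea can be approximated with the standard library's xml.etree.ElementTree. ElementTree elements have no parent pointers, so this sketch builds a child-to-parent map once; the trimmed XML sample is hypothetical and only assumes the time > location > windSpeed nesting that the '../..' XPath relies on. It also converts the values to datetime and float so both lists are ready to plot:

```python
import datetime
import xml.etree.ElementTree as ET

# Hypothetical, trimmed sample; the middle <time> has no windSpeed
SAMPLE = """<weatherdata><product>
<time from="2016-12-29T18:00:00Z"><location><windSpeed mps="3.9"/></location></time>
<time from="2016-12-29T17:00:00Z"><location><precipitation value="0.0"/></location></time>
<time from="2016-12-29T21:00:00Z"><location><windSpeed mps="4.8"/></location></time>
</product></weatherdata>"""

FORM = "%Y-%m-%dT%H:%M:%SZ"

def wind_series(root):
    """Return (times, speeds) lists of equal length, one pair per windSpeed."""
    # child -> parent map, since ElementTree cannot navigate upwards
    parent = {child: par for par in root.iter() for child in par}
    times, speeds = [], []
    for ws in root.iter("windSpeed"):
        time_el = parent[parent[ws]]  # windSpeed -> location -> time, like '../..'
        times.append(datetime.datetime.strptime(time_el.get("from"), FORM))
        speeds.append(float(ws.get("mps")))
    return times, speeds

x, y = wind_series(ET.fromstring(SAMPLE))
print(len(x), len(y))  # 2 2
```

For the saved file, wind_series(ET.parse("file.xml").getroot()) should work the same way, provided the real file has the same nesting.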

Related

Elliptic curve addition - how to add point coordinates from a file in Python

I'm just a tech newbie learning how EC cryptography works, and I stumbled into a problem with my Python code.
I'm testing basic elliptic curve operations like adding points, multiplying by G, etc. Let's say I have:
Ax = 0xbc46aa75e5948daa08123b36f2080d234aac274bf62fca8f9eb0aadf829c744a
Ay = 0xe5f28c3a044b1cac54a9b4bf719f02dfae93a0bae73832301e786104f43255a5
A = (Ax,Ay)
f = open('B-coordinates.txt', 'r')
data = f.read()
f.close()
print (data)
B = 'data'
where B-coordinates.txt contains lines like (0xe7e6bd3424a1e92abb45846c82d570f0596850661d1c952f9fe3564567d9b9e8,0x59c9e0bba945e45f40c0aa58379a3cb6a5a2283993e90c58654af4920e37f5)
Then I perform basic point addition A + B with add(A, B).
Because of B = 'data', I obviously get this error:
TypeError: unsupported operand type(s) for -: 'int' and 'str'
And if I use int(data), I get:
invalid literal for int() with base 10:
because of the letters in the input (i.e. in the point coordinates).
So my question is: can someone knowledgeable in Python and elliptic curve calculations tell me how to add the coordinates of a point from a file so as to get past these int problems? I'd be very grateful for an answer! I've been trying to figure out how to do it right for many hours now, and maybe I'm just goofing off, but I'd appreciate any hints.
You can load B from B-coordinates.txt by evaluating its content as Python code:
B = eval(data)
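However, the code above allows arbitrary code execution, so if you don't trust the content of B-coordinates.txt, parse the hexadecimal tuple manually instead (strip() first removes a trailing newline so that [1:-1] drops the parentheses):
B = tuple(int(z, 16) for z in data.strip()[1:-1].split(','))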
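Then, to sum A and B pairwise in native Python 3 and keep a tuple, you can zip the two tuples and sum the unpacked coordinates. Note that this is a plain component-wise sum, not elliptic-curve point addition; for that, pass the parsed tuples to your add(A, B) function.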
print(tuple([a + b for (a, b) in zip(A, B)]))
UPDATE:
Assume B-coordinates.txt looks like the following, as described in a comment by the OP:
(0x1257e93a78a5b7d8fe0cf28ff1d8822350c778ac8a30e57d2acfc4d5fb8c192,0x1124ec11c77d356e042dad154e1116eda7cc69244f295166b54e3d341904a1a7)
(0x754e3239f325570cdbbf4a87deee8a66b7f2b33479d468fbc1a50743bf56cc18,0x673fb86e5bda30fb3cd0ed304ea49a023ee33d0197a695d0c5d98093c536683)
...
You can load the Bs from this file by doing:
with open('B-coordinates.txt', 'r') as f:
    lines = f.read().splitlines()
Bs = [eval(line) for line in lines]
As above, to avoid arbitrary code execution, use the following instead:
Bs = [tuple([int(z, 16) for z in line[1:-1].split(',')]) for line in lines]
For instance, Bs[0] is the first B pair, defined by the first line of B-coordinates.txt:
(0x1257e93a78a5b7d8fe0cf28ff1d8822350c778ac8a30e57d2acfc4d5fb8c192,0x1124ec11c77d356e042dad154e1116eda7cc69244f295166b54e3d341904a1a7)
You probably don't want to set B equal to 'data' (a string literal) but to data (the variable):
replace B = 'data' with B = data in the last line.
Your data, however, is still a string holding a tuple of hex numbers.
Use int(hex_string, 16) to convert each of them (hex is base 16, not 10).
EDIT based on comment:
Assuming your file has one (0x...,0x...) tuple per line:
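with open("B-coordinates.txt", "r") as file:
    raw = file.read()
# skip blank lines (e.g. a trailing newline) and drop the parentheses before converting
data = [tuple(int(hex_str, 16) for hex_str in item.strip()[1:-1].split(","))
        for item in raw.splitlines() if item.strip()]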
You can then get the first Bx, By like this:
Bx, By = data[0]
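As a middle ground between eval() and manual slicing, the standard library's ast.literal_eval accepts Python literals only, including hex integers inside a tuple, and raises ValueError for function calls or other code. A minimal sketch, assuming one tuple per line; the helper name load_points and the short sample values are hypothetical (they are not real curve points):

```python
import ast

def load_points(text):
    """Parse lines like '(0x1f,0x2a)' into (x, y) tuples of ints."""
    points = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        # literal_eval evaluates literals only; calls like __import__(...) raise ValueError
        point = ast.literal_eval(line)
        if not (isinstance(point, tuple) and len(point) == 2
                and all(isinstance(c, int) for c in point)):
            raise ValueError(f"not an (x, y) integer pair: {line!r}")
        points.append(point)
    return points

sample = "(0x1f,0x2a)\n(0xff,0x10)\n"
Bs = load_points(sample)
print(Bs)  # [(31, 42), (255, 16)]
```

With the real file, load_points(open("B-coordinates.txt").read()) would give the same list as the int(z, 16) versions above.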

If statement to only select certain XML attributes

I'm attempting to get two different elements from an XML file and plot them as the x and y values of a scatter plot. I can get both elements, but one list is 155 items long and the other only 50.
So I need an if statement that selects only the time elements that have an associated windSpeed element.
import bs4
import requests
url = "http://api.met.no/weatherapi/locationforecast/1.9/?lat=52.41616;lon=-4.064598"
response = requests.get(url)
xml_text = response.text
weather = bs4.BeautifulSoup(xml_text, "xml")
with open('file.xml', "w") as f:
    f.write(weather.prettify())
I'm then trying to get the time ('from') attribute and the wind speed ('windSpeed' > 'mps') attribute. I'd like to use BeautifulSoup if possible, but a plain if statement would be great too.
import datetime
import bs4
import matplotlib.pyplot as plt
with open('file.xml') as file:
    soup = bs4.BeautifulSoup(file, "xml")
    times = soup.find_all("time")
    windspeed = soup.select("windSpeed")
form = "%Y-%m-%dT%H:%M:%SZ"
x = []
y = []
for element in times:
    time = element.get("from")
    t = datetime.datetime.strptime(time, form)
    x.append(t)
for mps in windspeed:
    speed = mps.get("mps")
    y.append(speed)
plt.scatter(x, y)
plt.show()
When I run it, it raises the following error:
raise ValueError("x and y must be the same size")
ValueError: x and y must be the same size
I'm assuming it's because the lists are different lengths.
I know there's probably a simple way of fixing it; any ideas would be great.
Just modify your code snippet as follows. Only the times that have a windSpeed element are appended, which solves the length problem.
....
for element in times:
    time = element.get("from")
    t = datetime.datetime.strptime(time, form)
    # keep only the times that have an associated windSpeed element
    if element.find('windSpeed') is not None:
        x.append(t)
....
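An alternative that keeps each speed paired with its own time is to append x and y in the same loop, skipping time entries without a windSpeed. This sketch swaps in the standard library's xml.etree.ElementTree for BeautifulSoup so it runs without extra packages; the trimmed XML sample is hypothetical, and with BeautifulSoup the same check would be element.find('windSpeed') is not None:

```python
import datetime
import xml.etree.ElementTree as ET

# Hypothetical, trimmed sample: only the first <time> has a windSpeed
SAMPLE = """<weatherdata><product>
<time from="2016-12-29T18:00:00Z"><location><windSpeed mps="3.9"/></location></time>
<time from="2016-12-29T17:00:00Z"><location><symbol number="4"/></location></time>
</product></weatherdata>"""

FORM = "%Y-%m-%dT%H:%M:%SZ"

root = ET.fromstring(SAMPLE)
x, y = [], []
for element in root.iter("time"):
    ws = element.find(".//windSpeed")
    if ws is None:
        continue  # skip time entries that carry no wind speed
    # append both values in the same iteration so they stay paired
    x.append(datetime.datetime.strptime(element.get("from"), FORM))
    y.append(float(ws.get("mps")))

print(len(x), len(y))  # 1 1
```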

Efficient regex parsing of html

I have a piece of Python code scraping data point values from what seems to be a JavaScript graph on a webpage. The data looks like:
...html/javascript...
{'y':765000,...,'x':1248040800000,...},
{'y':1020000,...,'x':1279144800000,...},
{'y':1105000,...,'x':1312754400000,...}
...html/javascript...
where the dots stand for plotting data I skipped.
To scrape the useful information (the x/y data point coordinates), I used a regex:
import re
# first getting the raw x data
xData = re.findall(r"'x':\d+", htmlContent)
# now reading each value one by one
xData = [int(re.findall(r"\d+", x)[0]) for x in xData]
Same for the y values. I don't know if this is terribly inefficient, but it does not look pretty or very smart, as I have many redundant calls to re.findall. Is there a way to do it in one pass? One pass for x and one pass for y?
You can do it a little more easily with a capture group, so that re.findall returns only the digits:
import re
htmlContent = """
...html/javascript...
{'y':765000,...,'x':1248040800000,...},
{'y':1020000,...,'x':1279144800000,...},
{'y':1105000,...,'x':1312754400000,...}
...html/javascript...
"""
# Get the numbers
xData = [int(v) for v in re.findall(r"'x':(\d+)", htmlContent)]
print(xData)
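Since the question also asks about a single pass, a pattern with two capture groups makes re.findall return (y, x) tuples, so both coordinates come out of one scan. A sketch, assuming (as in the sample) that 'y' appears before 'x' inside every {...} object; the [^}]*? part keeps each match from running into the next object:

```python
import re

html_content = """
...html/javascript...
{'y':765000,...,'x':1248040800000,...},
{'y':1020000,...,'x':1279144800000,...},
{'y':1105000,...,'x':1312754400000,...}
...html/javascript...
"""

# One pass: each match yields a (y, x) tuple of digit strings
pairs = re.findall(r"'y':(\d+)[^}]*?'x':(\d+)", html_content)

# Split the tuples into two int lists
y_data = [int(y) for y, _ in pairs]
x_data = [int(x) for _, x in pairs]
print(x_data)  # [1248040800000, 1279144800000, 1312754400000]
```

If the key order can vary between objects, this pattern would miss those objects, and the two separate capture-group passes are the safer choice.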

Python - live update graphs; to plot Time on x-axis

I have a Python script that collects data from a server in the form
<hh:mm:ss>,<ddd>
where the first field is a time and the second field is an integer. This data is written to a file.
I have another thread running that plots a live graph from that file.
So the file has data like:
<hh:mm:ss>,<ddd>
<hh:mm:ss>,<ddd>
<hh:mm:ss>,<ddd>
<hh:mm:ss>,<ddd>
Now I want to plot a time-series Matplotlib graph of this data,
but when I try, it throws this error:
ValueError: invalid literal for int() with base 10: '15:53:09'
When I have plain numeric data like the following, things are fine:
<ddd>,<ddd>
<ddd>,<ddd>
<ddd>,<ddd>
<ddd>,<ddd>
UPDATE
My code that generates the graph from the file described above is shown below:
def animate(i):
    pullData = open("sampleText.txt","r").read()
    dataArray = pullData.split('\n')
    xar = []
    yar = []
    for eachLine in dataArray:
        if len(eachLine)>1:
            x,y = eachLine.split(',')
            xar.append(int(x))
            yar.append(int(y))
    ax1.clear()
    ax1.plot(xar,yar)
UPDATED CODE
def animate(i):
    print("inside animate")
    pullData = open("sampleText.txt","r").read()
    dataArray = pullData.split('\n')
    xar = []
    yar = []
    for eachLine in dataArray:
        if len(eachLine)>1:
            x,y = eachLine.split(',')
            timeX=datetime.strptime(x, "%H:%M:%S")
            xar.append(timeX.strftime("%H:%M:%S"))
            yar.append(float(y))
    ax1.clear()
    ax1.plot(xar,yar)
Now I am getting the error at this line: ax1.plot(xar,yar)
How can I get past this?
You are trying to parse an integer from a string representing a timestamp. Of course it fails.
In order to be able to use the timestamps in a plot, you need to parse them to the proper type, e.g., datetime.time or datetime.datetime. You can use datetime.datetime.strptime(), dateutil.parser.parse() or maybe also time.strptime() for this.
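A minimal sketch of that parsing step, using hypothetical sample lines in the question's <hh:mm:ss>,<ddd> format. Unlike the updated code in the question, it keeps the datetime objects instead of turning them back into strings with strftime, since matplotlib can place datetimes on a date axis but treats strings as plain labels. Note that strptime with only a time format fills in the default date 1900-01-01:

```python
from datetime import datetime

# Hypothetical lines in the "<hh:mm:ss>,<ddd>" format
lines = ["15:53:09,120", "15:53:10,125"]

xar, yar = [], []
for line in lines:
    x, y = line.split(",")
    # keep the parsed datetime object; do not convert it back to a string
    xar.append(datetime.strptime(x, "%H:%M:%S"))
    yar.append(float(y))

print(xar[0])         # 1900-01-01 15:53:09
print(xar[0].time())  # 15:53:09
```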
Plotting the data is then straightforward. Have a look at the interactive plotting mode: matplotlib.pyplot.ion().
For reference/further reading:
https://pypi.python.org/pypi/python-dateutil
http://dateutil.readthedocs.org/en/latest/parser.html#dateutil.parser.parse
https://docs.python.org/2/library/datetime.html#datetime.datetime.strptime
https://docs.python.org/2/library/time.html#time.strptime
http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.ion
Plotting time in Python with Matplotlib
How to iterate over the file in python
Based on your code I have created an example. I have inlined some notes as to why I think it's better to do it this way.
from datetime import datetime
import matplotlib.pyplot as plt
# use with-statement to make sure the file is eventually closed
with open("sampleText.txt") as f:
    data = []
    # iterate the file using the file object's iterator interface
    for line in f:
        try:
            # use a new name so the file object f is not shadowed
            t, value = line.split(",")
            # parse timestamp and number and append it to data list
            data.append((datetime.strptime(t, "%H:%M:%S"), float(value)))
        except ValueError:
            # something went wrong: inspect later and continue for now
            print("failed to parse line:", line)
# split columns to separate variables
x, y = zip(*data)
# plot
plt.plot(x, y)
plt.show()
plt.close()
For further reading:
https://docs.python.org/2/reference/datamodel.html#context-managers
https://docs.python.org/2/library/stdtypes.html#file-objects
The error tells you the cause of the problem: You're trying to convert a string, such as '15:53:09', into an integer. This string is not a valid number.
Instead, you should either use a datetime object from the datetime module to work with dates and times, or at least split the string into fields using ':' as the delimiter and then use each field separately.
Consider this brief demo:
>>> time = '15:53:09'
>>> time.split(':')
['15', '53', '09']
>>> [int(v) for v in time.split(':')]
[15, 53, 9]
>>> int(time) # expect exception
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '15:53:09'
>>>
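If a plain number is enough for the x axis, the split fields can be combined into seconds since midnight. A minimal sketch; the function name seconds_since_midnight is hypothetical:

```python
def seconds_since_midnight(hms):
    """Convert an 'HH:MM:SS' string into a plain number of seconds."""
    hours, minutes, seconds = (int(v) for v in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

print(seconds_since_midnight("15:53:09"))  # 57189
```

This loses the readable time labels on the axis, which is one reason to prefer datetime objects when plotting.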

Test Highcharts with selenium webdriver

I would like to test the accuracy of a Highcharts graph presenting data from a JSON file (which I already read) using Python and Selenium Webdriver.
How can I read the Highchart data from the website?
thank you,
Evgeny
The Highcharts data is converted to an SVG path, so you'd have to interpret the path yourself. I'm not sure why you would want to do this, actually: in general you can trust third-party libraries to work as advertised, and the testing of that code belongs in that library.
If you still want to do it, you'll have to use JavaScript to retrieve the data. Taking the Highcharts demo as an example, you can extract the data points for the first line as shown below. This gives you the SVG path definition as a string, which you can then parse to determine the origin and the data points. Comparing these to the positions of the vertical axis labels should allow you to calculate the value implied by the graph.
import re
# Get the origin and datapoints of the first line (.attr('d') returns the path string)
s = selenium.get_eval("window.jQuery('svg g.highcharts-tracker path:eq(0)').attr('d')")
splitted = re.split(r'\s+L\s+', s)
origin = splitted[0].split(' ')[1:]  # drop the leading 'M'
data = [p.split(' ') for p in splitted[1:]]
# Convert to floats
origin = [float(origin[0]), float(origin[1])]
data = [[float(x), float(y)] for x, y in data]
# Get the min and max y-axis value and position
min_y_val = float(selenium.get_eval(
    "window.jQuery('svg g.highcharts-axis:eq(1) text:first').text()"))
max_y_val = float(selenium.get_eval(
    "window.jQuery('svg g.highcharts-axis:eq(1) text:last').text()"))
min_y_pos = float(selenium.get_eval(
    "window.jQuery('svg g.highcharts-axis:eq(1) text:first').attr('y')"))
max_y_pos = float(selenium.get_eval(
    "window.jQuery('svg g.highcharts-axis:eq(1) text:last').attr('y')"))
# Calculate the value based on the retrieved positions
# (SVG y grows downwards, so min_y_pos > max_y_pos)
y_scale = min_y_pos - max_y_pos
y_range = max_y_val - min_y_val
y_fraction = (data[0][1] - max_y_pos) / y_scale
value = max_y_val - y_range * y_fraction
Disclaimer: I didn't have time to fully verify it, but something along these lines should give you what you want.
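The calculation above boils down to parsing the M/L path and a linear pixel-to-value mapping, which can be checked without a browser. A stdlib-only sketch; the function names and the pixel positions (300 for the bottom axis label, 100 for the top one) are hypothetical, and it assumes the path uses only absolute M and L commands with whitespace-separated numbers:

```python
import re

def parse_path(d):
    """Split an SVG path like 'M x0 y0 L x1 y1 ...' into (x, y) float pairs."""
    points = []
    for segment in re.split(r"\s*[ML]\s*", d.strip()):
        if segment:  # the split leaves an empty string before the leading 'M'
            x, y = segment.split()
            points.append((float(x), float(y)))
    return points

def pixel_to_value(y_pix, min_y_pos, max_y_pos, min_y_val, max_y_val):
    """Linearly map an SVG y pixel to a data value.

    SVG y grows downwards, so min_y_pos (bottom label) is the larger pixel value.
    """
    fraction = (y_pix - max_y_pos) / (min_y_pos - max_y_pos)
    return max_y_val - (max_y_val - min_y_val) * fraction

points = parse_path("M 10 200 L 20 150 L 30 100")
values = [pixel_to_value(y, 300.0, 100.0, 0.0, 100.0) for _, y in points]
print(values)  # [50.0, 75.0, 100.0]
```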
