I am running the following script:
# STEP 1: import packages, declare tindexes
import pandas as pd
import yfinance as yf
import datetime as dt
bnpl_tindex = ["APT.AX","Z1P.AX","LFS.AX","SZL.AX","HUM.AX","SPT.AX","OPY.AX","IOU.AX","LBY.AX","DOU.AX"]
target_dates = pd.date_range(start=dt.date.today() - dt.timedelta(days=365), end=dt.date.today(), freq="M").strftime('%Y-%m-%d')
target_dates = target_dates.append(pd.Index([dt.date.today().strftime('%Y-%m-%d')]))
target_dates = target_dates.sort_values()
#STEP 2: source functions
from collect_index_data import collect_index_data
#DELETE LATER... TESTING!
collect_index_data(bnpl_tindex, target_dates)
#
collect_index_data is as follows:
def collect_index_data(ticker_list,date_list):
    if (bool(ticker_list) and all(isinstance(elem, str) for elem in ticker_list) )==False:
        sys.exit('Input should be a list or a single string.')
    else:
        print("Components of Index: ")
        #initialise dictionary
        d = {}
        #loop through ticker list
        for x in ticker_list:
            d["DF.{0}".format(x)] = yf.Ticker(x)
            #testing
            print(x)
        #testing
        print(d)
and I get the following error message
Components of Index:
Traceback (most recent call last):
  File "C:\Users\thoma\Desktop\Files\Programming\Python\run_tindex_data.py", line 27, in <module>
    collect_index_data(bnpl_tindex, target_dates)
  File "C:\Users\thoma\Desktop\Files\Programming\Python\collect_index_data.py", line 12, in collect_index_data
    d["DF.{0}".format(x)] = yf.Ticker("MSFT")
NameError: name 'yf' is not defined
My question is: why is the yfinance package not being recognised in my function?
I could import it inside the function, but I plan to run the function multiple times in a script - so this would be computationally wasteful.
thanks!
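A minimal sketch of what is probably going on, assuming the usual Python scoping rules apply here: a function looks up global names in the module where it was defined, not in the script that calls it. The example below uses the standard-library `math` module as a stand-in for `yf`, and the module names `broken_helper` and `fixed_helper` are hypothetical stand-ins for collect_index_data.py.

```python
import math   # imported by the calling script only
import types

# Hypothetical stand-in for collect_index_data.py: it uses `math`
# (playing the role of `yf`) without importing it itself.
HELPER_SOURCE = "def circle_area(r):\n    return math.pi * r ** 2\n"

broken = types.ModuleType("broken_helper")
exec(HELPER_SOURCE, broken.__dict__)

try:
    broken.circle_area(1.0)
    outcome = "ok"
except NameError:
    # The function resolves names in its own module's globals,
    # not in the globals of the script that called it.
    outcome = "NameError"

# Same function, but now the defining module performs the import.
fixed = types.ModuleType("fixed_helper")
exec("import math\n" + HELPER_SOURCE, fixed.__dict__)
area = fixed.circle_area(1.0)

print(outcome)
print(area)
```

If that is the cause, putting `import yfinance as yf` at the top of collect_index_data.py should be enough. The import cost is paid once: Python caches loaded modules in `sys.modules`, so later imports of the same module only rebind the name.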
So, I have a .csv file which updates itself. I would like to do some things with it and am not sure how to approach it, hope you can help me.
There are no headers. I can also join the date and time into a single column without a delimiter.
The data in the csv looks like this:
07/12/2017,23:50,113.179,113.182,113.168,113.180,113.187,113.189,113.176,113.186,144
07/12/2017,23:51,113.180,113.190,113.180,113.187,113.186,113.196,113.186,113.193,175
07/12/2017,23:52,113.187,113.188,113.174,113.186,113.193,113.194,113.181,113.192,340
07/12/2017,23:53,113.186,113.192,113.175,113.181,113.192,113.199,113.182,113.188,282
07/12/2017,23:54,113.181,113.183,113.170,113.171,113.188,113.188,113.176,113.179,74
07/12/2017,23:55,113.171,113.181,113.170,113.179,113.179,113.188,113.176,113.186,329
07/12/2017,23:56,113.179,113.189,113.174,113.181,113.186,113.195,113.181,113.187,148
07/12/2017,23:57,113.181,113.181,113.169,113.169,113.187,113.187,113.175,113.175,55
07/12/2017,23:58,113.169,113.183,113.169,113.182,113.175,113.188,113.175,113.187,246
07/12/2017,23:59,113.182,113.210,113.175,113.203,113.187,113.215,113.181,113.209,378
08/12/2017,00:00,113.203,113.213,113.180,113.183,113.209,113.220,113.187,113.190,651
08/12/2017,00:01,113.183,113.190,113.164,113.167,113.190,113.196,113.171,113.174,333
08/12/2017,00:02,113.167,113.182,113.156,113.156,113.174,113.188,113.162,113.163,265
08/12/2017,00:03,113.156,113.165,113.151,113.163,113.163,113.172,113.158,113.170,222
08/12/2017,00:04,113.163,113.163,113.154,113.159,113.170,113.170,113.159,113.166,148
08/12/2017,00:05,113.159,113.163,113.153,113.154,113.166,113.168,113.159,113.162,162
For starters I would be interested in using just the first two (or 3 if date and time are separate) columns for this exercise. So for example:
07/12/2017,21:54,113.098
07/12/2017,21:55,113.096
07/12/2017,21:56,113.087
07/12/2017,21:57,113.075
07/12/2017,21:58,113.087
07/12/2017,21:59,113.079
New rows are being added with more recent date time every second or so.
I can do something like
df = pd.read_csv("C:\\Users\\xxx\\Desktop\\csvexport\\thefile.csv", header=None)
print(df[-1:])
To see the last row (tail) from the dataframe
Now, I can't see how to do the following and appreciate your help:
Update the dataframe so that I have the most recent version up to date available to make calculations on when new rows appear (without using sleep timer?)
Be able to plot the data with the newly updating data being reflected in the plot automatically as new data arrives (datetime on x axis, float on y)
The output I see in the command window from the program generating the .csv file is like this, if that matters
asset 08/12/2017 05:16:37 float:113.336 floattwo:113.328 digit:20
asset 08/12/2017 05:16:40 float:113.334 floattwo:113.328 digit:21
asset 08/12/2017 05:16:40 float:113.335 floattwo:113.323 digit:22
asset 08/12/2017 05:16:41 float:113.331 floattwo:113.328 digit:23
asset 08/12/2017 05:16:43 float:113.334 floattwo:113.327 digit:24
asset 08/12/2017 05:16:47 float:113.332 floattwo:113.328 digit:25
So you can see the updates are not exactly one second apart, they can have gaps, and can sometimes occur within the same second too (05:16:40 twice)
Therefore, what I would like is to keep the plot at equal time intervals (1 minute, 5 minutes, etc.) but keep changing the most recent point according to the float value in the .csv belonging to that minute. Only when a row with the next minute arrives should the plot move to the right (while the newest point keeps fluctuating as the float number changes)... Hope you get the idea. I would like to use pyqtgraph for the plot.
I managed to code this much... but it is not the greatest example, excuse me. Of course the plot is not meant to look like this. Just illustrating what I would like to see. So the green bar should be changing value constantly until the next time step is added to the csv
import pyqtgraph as pg
from pyqtgraph import QtCore, QtGui
import pandas as pd
import datetime
x = pd.read_csv("C:\\Users\\xxx\\Desktop\\csvexport\\thefile.csv")
z = x[-1:]
def getlastrow():
    for a in z.iterrows():
        d = ((int(((a[1][0]).split("/")[0]))))
        m = ((int(((a[1][0]).split("/")[1]))))
        y = ((int(((a[1][0]).split("/")[2]))))
        hh = ((int(((a[1][1]).split(":")[0]))))
        mm = ((int(((a[1][1]).split(":")[1]))))
        #ss = ((int(((a[1][1]).split(":")[2]))))
        thedate = datetime.date(y, m, d)
        thetime = datetime.time(hh, mm)
        p = (a[1][2])
    return ((thedate,thetime,p))
# print(str(getlastrow()[0]).replace("-",""))
# print(getlastrow()[1])
# print(getlastrow()[2])
class CandlestickItem(pg.GraphicsObject):
    def __init__(self):
        pg.GraphicsObject.__init__(self)
        self.flagHasData = False
    def set_data(self, data):
        self.data = data
        self.flagHasData = True
        self.generatePicture()
        self.informViewBoundsChanged()
    def generatePicture(self):
        self.picture = QtGui.QPicture()
        p = QtGui.QPainter(self.picture)
        p.setPen(pg.mkPen('w'))
        w = (self.data[1][0] - self.data[0][0]) / 2.
        for (t, open) in self.data:
            p.drawLine(QtCore.QPointF(t, open), QtCore.QPointF(t, open))
            p.setBrush(pg.mkBrush('r'))
            if open > 122.8:
                p.setBrush(pg.mkBrush('g'))
            p.drawRect(QtCore.QRectF(t-w, open, w*2, open))
        p.end()
    def paint(self, p, *args):
        if self.flagHasData:
            p.drawPicture(0, 0, self.picture)
    def boundingRect(self):
        return QtCore.QRectF(self.picture.boundingRect())
app = QtGui.QApplication([])
data = [
    [(int(str(getlastrow()[0]).replace("-",""))), (getlastrow()[2])],
    [(int(str(getlastrow()[0]).replace("-","")))+1, (getlastrow()[2])+0.1],
    [(int(str(getlastrow()[0]).replace("-","")))+2, (getlastrow()[2])+0.2],
]
item = CandlestickItem()
item.set_data(data)
plt = pg.plot()
plt.addItem(item)
plt.setWindowTitle('pyqtgraph example: customGraphicsItem')
def update():
    global item, data
    new_bar = (int(str(getlastrow()[0]).replace("-","")))+3, ((getlastrow()[2])+10)
    data.append(new_bar)
    item.set_data(data)
    app.processEvents()
timer = QtCore.QTimer()
timer.timeout.connect(update)
timer.start(100)
if __name__ == '__main__':
    import sys
    if (sys.flags.interactive != 1) or not hasattr(QtCore, 'PYQT_VERSION'):
        QtGui.QApplication.instance().exec_()
Hopefully the code below will help with point (1). I realise this is a partial answer. I tested it on Linux; the code should be OS-agnostic, but I have not tested that.
The code monitors the directory defined in TEST_DIR using the watchdog library. If the file defined in TEST_FILE changes, a message is sent from the event-handling class MyHandler to the dispatcher_receive function, which main() connects to the signal. I put in some ugly time checking because each alteration of a file triggers multiple events, so only a single dispatch is sent for events occurring within THRESHOLD_TIME of each other. I set this to 0.01 s.
Add code to the dispatcher_receive function to read in the updated file.
import ntpath
# pip3 install pydispatcher --user
from pydispatch import dispatcher
import sys
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
MYHANDLER_SENDER = 'myhandler_sender'
MYHANDLER_SIGNAL = 'myhandler_signal'
TEST_FILE = 'test_data.csv'
TEST_DIR = '/home/bill/data/documents/infolab2/progs/jupyter_notebooks/pyqtgraph/test_data/'
THRESHOLD_TIME = 0.01
class MyHandler(FileSystemEventHandler):
    ''' handle events from the file system '''
    def __init__(self):
        self.start_time = time.time()
    def on_modified(self, event):
        now_time = time.time()
        # filter out multiple modified events occurring for a single file operation
        if (now_time - self.start_time) < THRESHOLD_TIME:
            print('repeated event, not triggering')
            return
        changed_file = ntpath.basename(event.src_path)
        if changed_file == TEST_FILE:
            print('changed file: {}'.format(changed_file))
            print('event type: {}'.format(event.event_type))
            print('do something...')
            # print(event)
            message = '{} changed'.format(changed_file)
            dispatcher.send(message=message, signal=MYHANDLER_SIGNAL, sender=MYHANDLER_SENDER)
        self.start_time = now_time
def main():
    dispatcher.connect(dispatcher_receive, signal=MYHANDLER_SIGNAL, sender=MYHANDLER_SENDER)
    observer = Observer()
    observer.schedule(event_handler, path=TEST_DIR, recursive=False)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
def dispatcher_receive(message):
    print('received dispatch: {}'.format(message))
    # read in the altered file
if __name__ == "__main__":
    event_handler = MyHandler()
    main()
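The "read in the altered file" step is left open above. One possible way to fill it in, as a sketch only: remember the byte offset already consumed, read just the rows appended since then, and keep the latest value per minute so the newest bar can keep changing until a row for the next minute arrives. The helper names `read_new_rows` and `update_bars` are hypothetical, and the sketch assumes the file is only ever appended to, is UTF-8, and has date, time and value in its first three columns as in the question.

```python
import csv
import io

def read_new_rows(path, offset):
    """Return (rows, new_offset) for complete lines appended after `offset`."""
    with open(path, "rb") as fh:
        fh.seek(offset)
        chunk = fh.read()
    # Keep only complete lines; a half-written last line waits for the next call.
    end = chunk.rfind(b"\n") + 1
    text = chunk[:end].decode("utf-8")
    rows = list(csv.reader(io.StringIO(text)))
    return rows, offset + end

def update_bars(bars, rows):
    """Store the most recent value seen for each (date, HH:MM) key."""
    for row in rows:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        date_str, time_str, value = row[0], row[1], float(row[2])
        bars[(date_str, time_str[:5])] = value  # later rows overwrite earlier ones
    return bars
```

Calling `rows, offset = read_new_rows(path, offset)` followed by `update_bars(bars, rows)` from dispatcher_receive would keep `bars` current without re-reading the whole file. Since dictionaries preserve insertion order, the keys stay in arrival order and can be handed to the plotting code as they are.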
I am trying to batch run workspaces from a standalone python script and it fails. I am particularly wondering if I am passing the parameters correctly. I was using an exception catch with a message but it actually provided less information:
Traceback (most recent call last):
  File "C:\Users\YouDoWellItDoesGood\Downloads \SixtyFourTest.py", line 89, in <module>
    wrkRunner.runWithParameters(qcWorkspace,parameters)
FMEException: FMEException: 1: Failure running workspace 'C:\Users\YouDoWellItDoesGood\Downloads\Anadarko_QC_Tool.fmwt'
The code:
# Note:
# The path to fmeobjects must be in your python
# path so you may need something like this:
import sys
sys.path.append(r"C:\Program Files\FME\fmeobjects\python27")
import fmeobjects
import fnmatch
import xlrd
import arcpy
import datetime
import os
##dictionary to hold the day of week value generated by the statusRunDateNum
##variable. the first
##number in the dictionary entery is the day of the week as a number. The other
##two numbers are the column numbers where we will find our project status values
datColumnDict={1:[2,3],2:[5,6],3:[8,9],4:[11,12],5:[14,15]}
qcWorkspace=r"C:\Users\YouDoWellItDoesGood\Downloads\Anadarko_QC_Tool.fmwt"
#will return a number depending what day it is.
statusRunDateNum=datetime.datetime.today().isoweekday()
# location of the status report/ will be turned into a variable.
PSL=r"C:\Users\YouDoWellItDoesGood\Downloads\Project%20Status%20Worksheet%203_20%20to%203_24.xlsx"
pjStatus=xlrd.open_workbook(PSL)
pjSheet=pjStatus.sheet_by_index(0)
pjList=[]
for row_index in range(1,pjSheet.nrows):
    statusColumn=str(pjSheet.cell(row_index,datColumnDict[statusRunDateNum][1]))
    statusColumnNoColon=statusColumn.split(":")[1]
    projColumn=str(pjSheet.cell(row_index,datColumnDict[statusRunDateNum][0]))
    projColumnNoDec=projColumn.split(".")[0]
    projColumnNoColon=projColumnNoDec.split(":")[1]
    if "QC" in str(statusColumn):
        pjList.append(projColumnNoColon)
    print statusColumnNoColon
print pjList
pjDir=r"F:\Projects\ANADARKO-DELAWARE BASIN"
pjDirList=os.listdir(pjDir)
wrkspc=[]
for x in range(len(pjList)):
    #create string from TGS proj number
    match=str(pjList[x])+"*"
    #declare feature list
    for f in pjDirList:
        if fnmatch.fnmatch(f,match):
            ## print match[:-1]
            gisPath=os.path.join(pjDir,f,"GIS")
            for gf in os.listdir(gisPath):
                if gf.startswith("DB_") or gf.startswith("ID") and gf.endswith(".gdb"):
                    wrkspc.append(os.path.join(gisPath,gf))
print wrkspc
for w in wrkspc:
    parameters={}
    parameters['_gdbpath']=w
    parameters['_user']="test"
    parameters['_with_corrections']='False'
    #parameters['FEATURE_TYPES']="WE_PT WE_PATH_LN WE_PAD_POLY WD_PT WD_POLY WD_LN VG_PT VG_POLY VG_LN VG_BUFFER_POLY TR_TRANS_OTHER_PT TR_TRANS_OTHER_POLY TR_TRANS_OTHER_LN TR_RUNWAY_POLY TR_ROAD_LOW_WATER_PT TR_ROAD_CENTER_LN TR_RAILROAD_PT TR_RAILROAD_POLY TR_RAILROAD_LN TR_LANDING_ZONE_PT TR_LANDING_ZONE_POLY TR_EDGE_OF_PAVEMENT_LN ST_OTHER_PT ST_OTHER_POLY ST_OTHER_LN SI_FACILITY_PERIMETER_POLY SI_FACILITY_OTHER_POLY RC_WETLANDS_POLY RC_SOILBED_PREPARATION_POLY RC_SOIL_SAMPLE_PT RC_SOIL_AMENDMENT_POLY RC_SEEDBED_PREPARATION_POLY RC_RECLAMATION_OTHER_PT RC_RECLAMATION_OTHER_POLY RC_RECLAMATION_OTHER_LN RC_MULCH_POLY RC_HYDROMULCH_POLY RC_BORROW_PIT_POLY PL_VENT_PIPE_PT PL_TRENCH_BREAKER_LN PL_TEST_LEAD_PT PL_TEE_PT PL_TAP_PT PL_SLEEVE_LN PL_ROUTING_NOTE_PT PL_ROCK_SHIELD_LN PL_REDUCER_PT PL_PUMP_STATION_PT PL_PIPELINE_LN PL_PIPE_BEND_LN PL_PIG_SIGNAL_PT PL_PI_EXCAVATION_PT PL_NAT_GROUND_PT PL_METER_STATION_PT PL_JOIN_PT PL_INJECTOR_PT PL_GIRTH_WELD_PT PL_FLANGE_PT PL_ELBOW_PT PL_DRIP_PT PL_DEPTH_OF_COVER_PT PL_COMPRESSOR_STATION_PT"
    wrkRunner=fmeobjects.FMEWorkspaceRunner()
    wrkRunner.runWithParameters(qcWorkspace,parameters)
    print w
    print parameters
This description may be a bit complicated so I will try to keep it short.
I have the following code that is working correctly...
def singlelist():
    from datetime import datetime
    from subprocess import Popen
    from subprocess import PIPE
    output=Popen(["sar","-r"], stdout=PIPE).communicate()[0]
    date=datetime.now()
    date=str(date).split()[0]
    listtimeval=[]
    for line in output.split('\n'):
        if line == '' or 'Average' in line or 'kb' in line or 'Linux' in line or 'RESTART' in line:
            pass
        else:
            (time,ampm,field1,field2,field3,field4,field5,field6,field7) = line.split()
            listtimeval.append((time + " "+ ampm + "," + field3).split(','))
    updatelist= [ [str(date) + " " +x[0],x[1]] for x in listtimeval]
    return updatelist
val=singlelist()
...notice how time,ampm,etc are not defined previously...
I am trying to make this more dynamic as the output of sar will not always have the same number of columns.
What I want to do is this...
def fields(method):
    if method == '-r':
        nf = (time,ampm,field1,field2,field3,field4,field5,field6,field7)
    return nf
def singlelist(nf):
    from datetime import datetime
    from subprocess import Popen
    from subprocess import PIPE
    output=Popen(["sar","-r"], stdout=PIPE).communicate()[0]
    date=datetime.now()
    date=str(date).split()[0]
    listtimeval=[]
    for line in output.split('\n'):
        if line == '' or 'Average' in line or 'kb' in line or 'Linux' in line or 'RESTART' in line:
            pass
        else:
            nf = line.split()
            listtimeval.append((time + " "+ ampm + "," + field3).split(','))
    updatelist= [ [str(date) + " " +x[0],x[1]] for x in listtimeval]
    return updatelist
method='-r'
nf=fields(method)
val=singlelist(nf)
However I am getting this...
Traceback (most recent call last):
  File "./Logic.py", line 110, in <module>
    nf=fields(method)
  File "./Logic.py", line 58, in fields
    nf = (time,ampm,field1,field2,field3,field4,field5,field6,field7)
NameError: global name 'time' is not defined
How can I accomplish this?
You haven't defined time in your fields function. In fact, none of (time,ampm,field1,field2,field3,field4,field5,field6,field7) is defined in that function...
You don't use nf in singlelist, except to reassign it. What are you trying to achieve?
You could modify fields to accept the parameters (time,ampm,field1,field2,field3,field4,field5,field6,field7) along with your method argument, but how would you define them? You would still have to call fields from singlelist.
Following up on Pierre's answer: you can assign to an undeclared name (which implicitly creates it), but you cannot read from one without getting a NameError.
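A tiny illustration of that difference, with made-up names (`total`, `time_field` and `ampm_field` are hypothetical and appear nowhere else):

```python
def assign_creates_name():
    total = 1 + 2              # assignment binds a brand-new local name
    return total

def read_unbound_name():
    try:
        # Neither name has ever been assigned, so merely reading them fails.
        return (time_field, ampm_field)
    except NameError as exc:
        return str(exc)

print(assign_creates_name())
print(read_unbound_name())
```

This is why the tuple unpacking in the working version is fine (the names sit on the left of the `=`), while building a tuple out of the same names inside fields fails.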
You also seem to be making that poor function do an awful lot of unrelated things - loading modules, calling subprocesses, parsing and reparsing data. It might be easier to understand and maintain if you break it up as follows:
import datetime
from itertools import izip
from subprocess import Popen, PIPE
def call_sar(options, columns):
    sar = Popen(["sar"]+options, stdout=PIPE)   # create subprocess
    res = sar.communicate()[0]                  # get stdout text
    data = res.splitlines()[3:-1]               # grab the relevant lines
    return (dict(izip(columns, row.split())) for row in data)
def get_system_stats(mode, fmt=None):
    modes = {   # different ways to call sar, and the values returned by each
        "all_cpus": ('-u', 'time ampm cpu user_pct nice_pct system_pct iowait_pct steal_pct idle_pct'),
        "each_cpu": ('-P ALL', 'time ampm cpu user_pct nice_pct system_pct iowait_pct steal_pct idle_pct'),
        "mem": ('-r', 'time ampm memfree_kb memused_kb memused_pct buffers_kb cached_kb commit_kb commit_pct active_kb inactive_kb'),
        "swap": ('-S', 'time ampm swapfree_kb swapused_kb swapused_pct swapcad_kb swapcad_pct'),
        "all_io": ('-b', 'time ampm ts read_ts write_ts read_bs write_bs'),
        "each_io": ('-p -d', 'time ampm dev ts read_ss write_ss avg_req_sz avg_queue_sz avg_wait'),
        "switch": ('-w', 'time ampm proc_s switch_s'),
        "queue": ('-q', 'time ampm runq_sz plist_sz avg_load_1 avg_load_5 avg_load_15 blocked')
    }
    if mode in modes:
        options, columns = modes[mode]
        data = call_sar(options.split(), columns.split())
        if fmt is None:
            # return raw data (list of dict)
            return list(data)
        else:
            # return formatted data (list of str)
            return [fmt.format(**d) for d in data]
    else:
        raise ValueError("I don't know mode '{}'".format(mode))
Now you can easily define your function like so:
def single_list():
    today = datetime.datetime.now().date()
    fmt = "{} {} {} {}".format(today, '{time}', '{ampm}', '{memused_pct}')
    return get_system_stats("mem", fmt)
Note: I am writing this on a Windows 7 machine, so I don't have sar and can't actually run-test it - it has no syntax errors, and I think it should work properly as-is, but it may need minor tweaking.
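For anyone who wants to try the parsing step without sar installed, here is a rough Python 3 sketch of the same idea run against canned text. In Python 3 `itertools.izip` is gone (the built-in `zip` is already lazy) and `communicate()` returns bytes that need decoding first. The sample text, the shortened column list and the name `parse_sar_text` are all made up for illustration; real sar output has more columns and varies between versions.

```python
# Made-up text shaped like sar output: header line, blank line,
# column titles, data rows, then an Average line.
SAMPLE = "\n".join([
    "Linux 0.0 (example-host)  01/01/2024",
    "",
    "12:00:01 AM kbmemfree kbmemused %memused",
    "12:10:01 AM 1000 3000 75.00",
    "12:20:01 AM 900 3100 77.50",
    "Average: 950 3050 76.25",
])

def parse_sar_text(text, columns):
    """Pair each whitespace-separated field with a column name."""
    rows = text.splitlines()[3:-1]      # drop the 3 header lines and the Average line
    return [dict(zip(columns, row.split())) for row in rows]

columns = "time ampm memfree_kb memused_kb memused_pct".split()
records = parse_sar_text(SAMPLE, columns)
lines = ["{time} {ampm} {memused_pct}".format(**r) for r in records]
print(lines)
```

Because the column names travel as plain strings, supporting another sar mode only means supplying a different list of names; no variables have to exist ahead of time.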
In my code I am trying to extract some data from a file. I get an error on line 61 when I run it. My code is here:
from datetime import date
from math import floor;
from adjset_builder import adjset_builder
def check_and_update(d,start,end,log):
    # print start,end;
    if start in d:
        if end in d[start]:
            log.write("{0}-{1}\n".format(start, end))
            if d[start][end] == 1:
                print "one found"
            d[start][end] += 1
def build_dictionary(my_adjset,my_list,factor,outfile,log):
    log.write("building dictionary:\n");
    window_size = int(floor(len(my_list)*factor));
    if window_size<2:
        log.write("too small\n")
        return;
    log.write('Total:{0},windowsize:{1}\n'.format(len(my_list),window_size));
    log.write("Window at place: 0,")
    for i in xrange(window_size):
        j = i+1;
        while j<window_size:
            check_and_update(my_adjset, my_list[i][1], my_list[j][1],log);
            j=j+1
    i=1;
    while i<=len(my_list)-window_size:
        log.write("{0},".format(i))
        j=i;
        k=i+window_size-1;
        while j<k:
            check_and_update(my_adjset, my_list[i][1], my_list[j][1],log);
            j+=1
        i += 1
    log.write("\nDictionary building done\n")
def make_movie_list(infilename,factor):
    log=open('log.txt','w');
    outfile=open(infilename.split('.')[0]+"_plot_"+str(factor)+".txt",'w');
    f=open(infilename,'r');
    my_adjset=dict()
    adjset_builder('friends.txt', my_adjset);
    count =1
    while True:
        string = f.readline();
        if string=='':
            break;
        log.write("count:{0}\n".format(count))
        count += 1
        [movie,freunde] = string.split('=');
        freunde = freunde.split(';')
        mylist=[]
        for i in freunde:
            [user_id,date] = i.split(' ');
            [yy,mm,dd] = date.split('-');
            # l=list((date(int(yy),int(mm),int(dd)),user_id))
            mylist.append([date(int(yy),int(mm),int(dd)),user_id]); ## line 61
        log.write("list built");
        print mylist
        break;
        # build_dictionary(my_adjset, mylist, factor,outfile,log)
        del(mylist);
    print 'Done'
    log.close();
    outfile.close();
    f.close();
    print count
if __name__ == '__main__':
    make_movie_list('grades_processed.txt',.5)
However when I tried to simulate the same thing in 'Console' I do not get any error:
dd='12'
mm='2'
yy='1991'
user_id='98807'
from datetime import date
l=list((date(int(yy),int(mm),int(dd)),user_id))
l
[datetime.date(1991, 2, 12), '98807']
Might be something very silly but I am a beginner so can not seem to notice the mistake. Thank you!
This makes date a function:
from datetime import date
This makes date a string:
[user_id,date] = i.split(' ');
You get a TypeError now, since date is no longer a function:
mylist.append([date(int(yy),int(mm),int(dd)),user_id]);
One way to avoid this error is to import modules instead of functions:
import datetime as dt
mylist.append([dt.date(int(yy),int(mm),int(dd)),user_id])
or more succinctly,
mylist.append([dt.date(*map(int, date.split('-'))), user_id])
PS: Remove all those unnecessary semicolons!
You have a variable called date, rename it so that it doesn't shadow the date function from datetime.
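Putting both suggestions together, a small sketch in Python 3 syntax: import the module under an alias and give the string its own name, so nothing shadows the date class. The helper names `parse_entry` and `parse_line` are hypothetical, and the line layout `movie=user_id yyyy-mm-dd;user_id yyyy-mm-dd` is inferred from the splitting logic in the question rather than from the actual file.

```python
import datetime as dt

def parse_entry(entry):
    """Turn 'user_id yyyy-mm-dd' into [date, user_id]."""
    user_id, date_str = entry.split(' ')     # date_str cannot shadow dt.date
    yy, mm, dd = (int(part) for part in date_str.split('-'))
    return [dt.date(yy, mm, dd), user_id]

def parse_line(line):
    """Split one 'movie=entry;entry;...' line into (movie, list of entries)."""
    movie, freunde = line.rstrip('\n').split('=')
    return movie, [parse_entry(entry) for entry in freunde.split(';')]

movie, mylist = parse_line("m1=98807 1991-2-12;42 2001-12-01\n")
print(movie, mylist)
```

Keeping the parsing in its own function also makes it easy to check on a single line in the console before running it over the whole file.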