Clarifying how strings work as arguments in a function in Python

I have already found a solution to my issue, at least for now, but I wonder if there are better ways, and also what the actual logic is.
I am learning Python for financial applications. I have learned how to request data for a single quote or multiple quotes through pandas_datareader. Today I tried to turn the process into a function.
For a single quote (let's say Apple) the code goes like this:
stock_data = pandas_datareader.DataReader('AAPL', data_source = 'yahoo', start = '2000-1-1')
I wanted to turn this into a function where you can pass the stock's symbol as an argument and get the stock's data, e.g.:
def stock(x):
    stock_data = pandas_datareader.DataReader(x, data_source = 'yahoo', start = '2000-1-1')
    print(stock_data)
The rationale being that stock('AAPL') would return Apple's data, stock(FB) would do the same for FB and so on.
I found out it doesn't work this way and I wonder how do you tell the function that the argument it should expect is a string?
For now this is how I worked around it, but I didn't really follow a particular logic, I just kept trying things:
def stock(x):
    stock_data = pandas_datareader.DataReader(str(x), data_source = 'yahoo', start = '2000-1-1')
    print(stock_data)
The way I understand this works is that in line 2 I tell it to take x, convert it to a string, and go from there, so when I finally write stock('AAPL') it works as expected. I guess my question is: do I always need to convert the argument to a string? Why can't x, as an argument, be anything, including a string?

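stock(FB) didn't work because FB is not a string: without quotes, Python treats FB as a variable name, and no variable called FB has been defined yet. Python raises an error like: NameError: name 'FB' is not defined
It should be stock("FB").
Here we are passing "FB" as a string.
Or you could do
x = "FB"
stock(x)
The quotes are what make the argument a string; without them Python looks up a name that isn't defined yet. The str(x) call inside the function is not what fixed it: the NameError is raised before the function is even called, and stock('AAPL') already passes a string.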
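To make the difference between a quoted string and a bare name concrete, here is a minimal sketch. The describe function is a hypothetical stand-in for stock() so that the example runs without a network call; it is not part of pandas_datareader.

```python
def describe(symbol):
    # Hypothetical stand-in for stock(): reports the argument and its type.
    return "{} ({})".format(symbol, type(symbol).__name__)

# A string literal is already a str, so no str() conversion is needed.
print(describe('AAPL'))

try:
    # FB without quotes is a name lookup; it fails before describe() runs.
    describe(FB)
except NameError as exc:
    print("NameError:", exc)

# A variable that holds a string works just like the literal.
ticker = 'FB'
print(describe(ticker))
```

A function parameter can receive any object; the quotes only matter at the call site, where they decide whether Python sees a string or a variable name.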

Related

Python - Pandas Dataframe - Stocks data - Adjusted Close attribute variable creation

I would like to create a variable to easily use the Adjusted Close price, which is simple for other single-worded attributes (like 'Close') but not that straightforward with Adjusted Close. How can I do the same thing I do here below with the column 'Close'?
end = dt.datetime.now()
start = end - dt.timedelta(weeks=104)
stocks_list = ['COST', 'NIO', 'AMD']
df = pdr.get_data_yahoo(stocks_list, start, end)
Close = df.Close
Close.head()
What you are referring to here is called Attribute Access in the documentation. Near the end of the section on Attribute access, there is the following warning:
You can use this access only if the index element is a valid Python
identifier, e.g. s.1 is not allowed. See here for an explanation of
valid identifiers.
From the linked documentation on what a valid identifier is:
Within the ASCII range (U+0001..U+007F), the valid characters for
identifiers are the same as in Python 2.x: the uppercase and lowercase
letters A through Z, the underscore _ and, except for the first
character, the digits 0 through 9.
Adjusted Close is not a valid python identifier (due to the space), so it will not be accessible as an attribute of your dataframe. There are also some other exceptions in the same warning that are worth reading about.
I would recommend an approach like the one BigBen suggests in the comments, where you simply assign a local variable based on the column name as a string: adjusted_close = df['Adjusted Close'].
If you are dead set on using Attribute access, you can mangle your column's name so it becomes a valid python identifier, and then you can use attribute access. From your example, this can be done with:
df.rename(columns = {"Adjusted Close" : "AdjustedClose"}, inplace = True)
Once you do this, you can access the Adjusted Close with df.AdjustedClose. I don't really recommend this approach, but it's possible.
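As a rough illustration of the identifier rule quoted above, Python's built-in str.isidentifier() reports whether a name could be used with attribute access. The table dictionary below is only a hypothetical stand-in for a DataFrame keyed by column name; the column names are taken from the question as written.

```python
# Check which column names are valid Python identifiers.
columns = ['Close', 'Adjusted Close', 'AdjustedClose', '1']
for column in columns:
    print(repr(column), column.isidentifier())

# Hypothetical stand-in for a DataFrame: a plain dict keyed by column name.
table = {'Adjusted Close': [101.5, 102.0]}

# Bracket access with a string key works regardless of spaces in the name.
adjusted_close = table['Adjusted Close']
print(adjusted_close)
```

Bracket access never depends on the name being a valid identifier, which is why it is the safer default.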

How to use quotation marks and full stops in Python function argument

I can't seem to get this formatting working in Python. I am trying to define a function that takes an argument of the form "[Some].[Name]".
Can anyone tell me how I can get this working? I think I have tried all combinations of ' and ", but regardless, both the [.] and the ["] in the argument seem not to work.
In the code below I am trying to pass the argument "VWS.co":
def get_stock_data(Company):
    # This function defines the data to be collected.
    # Send a GET request to query Company's end-of-day stock prices in the period.
    global VWS_data
    Stock_data = yf.Ticker(Company)
    Stock_data = Stock_data.history(period="5y")
    # Print the dataframe and its summary statistics.
    print(Stock_data)
    print(Stock_data.describe(include='all'))

get_stock_data("VWS.co")
Edit:
Using escape characters, get_stock_data("\"VWS.co\""), got the definition working. However, something is still wrong. When I run the script, it still only works with "VWS.co" hard-coded in the definition. See the code below: VWS_data works, VWS_data_with_arg does not. Am I missing something really obvious here?
def get_stock_data(Company):
    Stock_data = yf.Ticker("VWS.co")
    Stock_data_with_arg = yf.Ticker(Company)
    VWS_data = Stock_data.history(period="5y")
    VWS_data_with_arg = Stock_data_with_arg.history(period="5y")
    print(VWS_data) #This returns the expected values
    print(VWS_data_with_arg) #This returns an empty dataset

get_stock_data("\"VWS.co\"")
You should use escape characters.
get_stock_data("\"VWS.co\"")
You can use raw strings
get_stock_data(r'"VWS.co"')
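It may help to see what these forms actually produce. The sketch below only compares string values; it does not call yfinance. That the extra quote characters are the reason for the empty dataset in the question is an assumption, not something confirmed by the source.

```python
plain = "VWS.co"         # six characters, no quote characters inside
escaped = "\"VWS.co\""   # escaped quotes become part of the string
raw = r'"VWS.co"'        # same value as escaped, written as a raw string

print(len(plain), len(escaped))
print(escaped == raw)                 # both contain literal quote characters
print(plain == escaped)               # so neither equals the plain symbol
print(escaped.strip('"') == plain)    # stripping the quotes recovers it
```

The quotes around "VWS.co" in the original call are Python syntax that delimit the string; they are not part of its value, so get_stock_data("VWS.co") already passes the symbol with its full stop intact.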

Converting company name to ticker

Hey so I have an excel document that has a mapping of company names to their respective tickers. I currently have this function
def ticker(ticker):
    mapping = pd.read_excel('ticker.xlsx',header = 3,parse_cols='A,B')
    for index,row in mapping.iterrows():
        if ticker.upper() in row['Name'].upper().split():
            ticker = row['Ticker']
    return ticker
The reason I am using "in" on line 4 instead of "==" is because in the excel document "Apple" is listed as "Apple Inc." and since the user isn't likely to type that I want ticker("apple") to return "AAPL".
In the code above the if statement never gets executed and I was curious on the best possible solution here.
Haven't seen this type of syntax before. Must be the nltk syntax.
That being said, I will try to be helpful.
If the in command is the same as in SQL, then it means exactly equal, meaning 'Apple' in ('Apple Inc') would be false.
You want to do something like if 'Apple Inc' like '%Apple%'
or perhaps a .Match using regex. That's about the extent to which I can make suggestions, as I don't do Python.
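For reference, Python's in operator does not behave exactly like SQL's IN. On a list it tests whether some element is equal to the value; on a string it tests for a substring. A small sketch using the "Apple Inc." name from the question (the variable names are illustrative only):

```python
name = 'Apple Inc.'
query = 'apple'

# split() breaks the name into whole words.
words = name.upper().split()
print(words)

# List membership: true only if an element equals the value exactly.
print(query.upper() in words)

# 'APP' is not a whole word in the list, so this is False.
print('APP' in words)

# String membership: a substring test, so this is True.
print('APP' in name.upper())
```

So the condition on line 4 of the question's function should be true for "apple" against "Apple Inc."; if it never fires, the cause is more likely in what read_excel returns (for example the header row or column names) than in the in test itself, though that is a guess without seeing the spreadsheet.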

Using an IF THEN loop with nested JSON files in Python

I am currently writing a program which uses the Companies House API to return a JSON file containing information about a certain company.
I am able to retrieve the data easily using the following commands:
r = requests.get('https://api.companieshouse.gov.uk/company/COMPANY-NO/filing-history', auth=('API-KEY', ''))
data = r.json()
With that information I can do an awful lot; however, I've run into a problem which I was hoping you guys could possibly help me with. What I aim to do is go through every nested entry in the JSON file and check whether the values of certain keys match certain criteria; if the values of 2 keys match, other code is executed.
One of the keys is the date of an entry, and I would like to ignore results that are older than a certain date, I have attempted to do this with the following:
date_threshold = datetime.date.today() - datetime.timedelta(days=30)
for each in data["items"]:
    date = ['date']
    type = ['type']
    if date < date_threshold and type is "RM01":
        print("wwwwww")
In case it isn't clear, what I'm attempting to do (albeit very badly) is assign each of the entries to a variable, which then gets tested against certain criteria.
However, this doesn't work; Python raises a type error:
TypeError: unorderable types: list() < datetime.date()
Which makes me think the date is being stored as a string, and so I can't compare it to the datetime value set earlier, but when I check the API documentation (https://developer.companieshouse.gov.uk/api/docs/company/company_number/filing-history/filingHistoryItem-resource.html), it says clearly that the 'date' entry is returned as a date type.
What am I doing wrong? It's very clear that I'm extremely new to Python, given what I presume is the atrocity of my code, but in my head it seems to make at least a little sense. In case none of this is clear: I basically want to go through all the entries in the JSON file, and if the date and type match a certain description, then other code can be executed (in this case I have just used random text).
Any help is greatly appreciated! Let me know if you need anything cleared up.
:)
EDIT
After tweaking my code to the below:
for each in data["items"]:
    date = each['date']
    type = each['type']
    if date is '2016-09-15' and type is "RM01":
        print("wwwwww")
The code executes without any errors, but the words aren't printed, even though I know there is an entry in the json file with that exact date, and that exact type, any thoughts?
SOLUTION:
Thanks to everyone for helping me out. I had made a couple of very basic errors; the code that works as expected is below:
for each in data["items"]:
    date = each['date']
    typevariable = each['type']
    if date == '2016-09-15' and typevariable == "RM01":
        print("wwwwww")
This prints the word "wwwwww" 3 times, which is correct seeing as there are 3 entries in the JSON that fulfil those criteria.
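In your first attempt, date = ['date'] creates a new list containing the string 'date'; it does not read anything from the entry, so you end up comparing a list (date) with a datetime.date (date_threshold). Use each['date'] to read the value.
JSON has no date type, so that value is a string. Convert it with datetime.strptime() and call .date() on the result before comparing it with date_threshold.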
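A minimal sketch of that conversion, assuming the dates arrive as 'YYYY-MM-DD' strings as in the question. The sample data, the recent_items helper, and the fixed threshold are hypothetical and only stand in for the real API response. The comparison uses >= because the stated goal is to ignore entries older than the threshold.

```python
import datetime

# Hypothetical sample shaped like the "items" list in the question.
data = {
    "items": [
        {"date": "2016-09-15", "type": "RM01"},
        {"date": "2016-09-15", "type": "AA"},
        {"date": "2015-01-02", "type": "RM01"},
    ]
}

def recent_items(items, threshold, wanted_type):
    """Return entries of wanted_type dated on or after threshold."""
    matches = []
    for item in items:
        # Parse the string into a datetime, then keep only the date part.
        item_date = datetime.datetime.strptime(item["date"], "%Y-%m-%d").date()
        # Compare values with ==, not identity with "is".
        if item_date >= threshold and item["type"] == wanted_type:
            matches.append(item)
    return matches

# Fixed threshold so the example is reproducible; the question used
# datetime.date.today() - datetime.timedelta(days=30).
threshold = datetime.date(2016, 9, 1)
matches = recent_items(data["items"], threshold, "RM01")
print(len(matches))
```

The is operator tests whether two names refer to the same object, which is why the edited version ran without errors but printed nothing; == compares the values.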

Using Strings to Name Hash Keys?

I'm working through a book called "Head First Programming," and there's a particular part where I'm confused as to why they're doing this.
There doesn't appear to be any reasoning for it, nor any explanation anywhere in the text.
The issue in question is in using multiple-assignment to assign split data from a string into a hash (which doesn't make sense as to why they're using a hash, if you ask me, but that's a separate issue). Here's the example code:
line = "101;Johnny 'wave-boy' Jones;USA;8.32;Fish;21"
s = {}
(s['id'], s['name'], s['country'], s['average'], s['board'], s['age']) = line.split(";")
I understand that this will take the string line and split it up into each named part, but I don't understand why what I think are keys are being named by using a string, when just a few pages prior, they were named like any other variable, without single quotes.
The purpose of the individual parts is to be searched based on an individual element and then printed on screen. For example, being able to search by ID number and then return the entire thing.
The language in question is Python, if that makes any difference. This is rather confusing for me, since I'm trying to learn this stuff on my own.
My personal best guess is that it doesn't make any difference and that it was personal preference on part of the authors, but it bewilders me that they would suddenly change form like that without it having any meaning, and further bothers me that they don't explain it.
EDIT: So I tried printing the id key both with and without single quotes around the name, and it worked perfectly fine, either way. Therefore, I'd have to assume it's a matter of personal preference, but I still would like some info from someone who actually knows what they're doing as to whether it actually makes a difference, in the long run.
EDIT 2: Apparently, it doesn't make any sense as to how my Python interpreter is actually working with what I've given it, so I made a screen capture of it working https://www.youtube.com/watch?v=52GQJEeSwUA
I don't understand why what I think are keys are being named by using a string, when just a few pages prior, they were named like any other variable, without single quotes
The answer is right there. If there's no quote, mydict[s], then s is a variable, and you look up the key in the dict based on what the value of s is.
If it's a string, then you look up literally that key.
So, in your example s[name] won't work as that would try to access the variable name, which is probably not set.
EDIT: So I tried printing the id key both with and without single
quotes around the name, and it worked perfectly fine, either way.
That's just pure luck... There's a built-in function called id:
>>> id
<built-in function id>
Try another name, and you'll see that it won't work.
Actually, as it turns out, for dictionaries (Python's term for hashes) there is a semantic difference between having the quotes there and not.
For example:
s = {}
s['test'] = 1
s['othertest'] = 2
defines a dictionary called s with two keys, 'test' and 'othertest.' However, if I tried to do this instead:
s = {}
s[test] = 1
I'd get a NameError exception, because this would be looking for an undefined variable called test whose value would be used as the key.
If, then, I were to type this into the Python interpreter:
>>> s = {}
>>> s['test'] = 1
>>> s['othertest'] = 2
>>> test = 'othertest'
>>> print(s[test])
2
>>> print(s['test'])
1
you'll see that using test as a key with no quotes uses the value of that variable to look up the associated entry in the dictionary s.
Edit: Now, the REALLY interesting question is why using s[id] gave you what you expected. The name "id" is not a keyword; it is a built-in function in Python that returns a unique id for the object passed as its argument. What in the world the Python interpreter is doing with the expression s[id] is a total mystery to me.
Edit 2: Watching the OP's Youtube video, it's clear that he's staying consistent when assigning and reading the hash about using id or 'id', so there's no issue with the function id as a hash key somehow magically lining up with 'id' as a hash key. That had me kind of worried for a while.
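For what it's worth, s[id] is not magic: function objects are hashable, so the built-in function id can itself serve as a dictionary key, entirely separate from the string 'id'. A short sketch (undefined_name is a deliberately unassigned, hypothetical name):

```python
s = {}
s['id'] = '101'        # the key is the string 'id'
s[id] = 'builtin'      # the key is the built-in function object id

# The dictionary now holds two distinct keys.
print(len(s))
print(s['id'], s[id])

# An unquoted name is evaluated first; its value becomes the key.
key = 'id'
print(s[key])

try:
    s[undefined_name]  # never assigned, so the lookup of the name fails
except NameError as exc:
    print(type(exc).__name__)
```

This is consistent with the video: assigning with s[id] and reading with s[id] works, as does assigning and reading with s['id'], but mixing the two would raise a KeyError.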
