Python - Pandas Dataframe - Stocks data - Adjusted Close attribute variable creation - python

I would like to create a variable to easily use the Adjusted Close price, which is simple for other single-worded attributes (like 'Close') but not that straightforward with Adjusted Close. How can I do the same thing I do here below with the column 'Close'?
import datetime as dt
from pandas_datareader import data as pdr

end = dt.datetime.now()
start = end - dt.timedelta(weeks=104)
stocks_list = ['COST', 'NIO', 'AMD']
df = pdr.get_data_yahoo(stocks_list, start, end)
Close = df.Close
Close.head()

What you are referring to here is called Attribute Access in the documentation. Near the end of the section on Attribute access, there is the following warning:
You can use this access only if the index element is a valid Python
identifier, e.g. s.1 is not allowed. See here for an explanation of
valid identifiers.
From the linked documentation on what a valid identifier is:
Within the ASCII range (U+0001..U+007F), the valid characters for
identifiers are the same as in Python 2.x: the uppercase and lowercase
letters A through Z, the underscore _ and, except for the first
character, the digits 0 through 9.
Adjusted Close is not a valid Python identifier (because of the space), so it is not accessible as an attribute of your DataFrame. The same warning lists a few other exceptions that are worth reading about.
I would recommend the approach BigBen suggests in the comments: assign a local variable from the column, using its name as a string key, e.g. adjusted_close = df['Adjusted Close'].
If you are dead set on using attribute access, you can mangle the column's name so that it becomes a valid Python identifier. From your example, this can be done with:
df.rename(columns = {"Adjusted Close" : "AdjustedClose"}, inplace = True)
Once you do this, you can access the Adjusted Close with df.AdjustedClose. I don't really recommend this approach, but it's possible.
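To make the difference concrete, here is a minimal sketch that uses a small hand-built DataFrame instead of real Yahoo data. The prices are made up, and the column name 'Adjusted Close' is taken from the question (the actual data source may label the column differently).

```python
import pandas as pd

# Hypothetical prices standing in for the downloaded stock data.
df = pd.DataFrame({
    "Close": [10.0, 10.5, 11.0],
    "Adjusted Close": [9.8, 10.3, 10.8],
})

# Attribute access works because "Close" is a valid Python identifier.
close = df.Close

# "Adjusted Close" contains a space, so use bracket access with a string key.
adjusted_close = df["Adjusted Close"]

# Alternative: rename into a valid identifier (returns a new DataFrame here).
renamed = df.rename(columns={"Adjusted Close": "AdjustedClose"})

print(close.equals(df["Close"]))
print(adjusted_close.tolist())
print(renamed.AdjustedClose.tolist())
```

The bracket form works for any column name, which is why it is usually the safer default.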

Related

How to use quotation marks and full stops in Python function argument

I can't seem to get this formatting working in Python. I am trying to define a function that takes an argument of the form "[Some].[Name]".
Can anyone tell me how I can get this working? I think I have tried all combinations of ' and ", but neither the [.] nor the ["] in the argument seems to work.
In the code below I am trying to pass the argument "VWS.co".
import yfinance as yf

def get_stock_data(Company):
    # This function defines the data to be collected.
    # Send a GET request to query Company's end-of-day stock prices for the period.
    global VWS_data
    Stock_data = yf.Ticker(Company)
    Stock_data = Stock_data.history(period="5y")
    # Print the dataframe and its summary statistics
    print(Stock_data)
    print(Stock_data.describe(include='all'))

get_stock_data("VWS.co")
Edit:
Using escape characters, get_stock_data("\"VWS.co\""), got the definition working. However, something is still wrong: when I run the script, the lookup still only works when "VWS.co" is hard-coded inside the function. See the code below: VWS_data works, VWS_data_with_arg does not. Am I missing something really obvious here?
def get_stock_data(Company):
    Stock_data = yf.Ticker("VWS.co")
    Stock_data_with_arg = yf.Ticker(Company)
    VWS_data = Stock_data.history(period="5y")
    VWS_data_with_arg = Stock_data_with_arg.history(period="5y")
    print(VWS_data) #This returns the expected values
    print(VWS_data_with_arg) #This returns an empty dataset

get_stock_data("\"VWS.co\"")
You should use escape characters:
get_stock_data("\"VWS.co\"")
Or you can use a raw string:
get_stock_data(r'"VWS.co"')
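It may help to look at what these literals actually contain. The sketch below uses only built-in string operations (no yfinance call). It shows that the escaped and raw forms are the same string, and that both carry literal quote characters that a plain "VWS.co" does not, which may be why the lookup in the question's edit came back empty.

```python
plain = "VWS.co"          # the six characters V, W, S, ., c, o
escaped = "\"VWS.co\""    # the same text wrapped in literal quote characters
raw = r'"VWS.co"'         # raw-string spelling of the escaped version

print(escaped == raw)               # the two spellings are the same string
print(len(plain), len(escaped))     # the quoted version is two characters longer
print(escaped.strip('"') == plain)  # removing the quotes recovers the plain symbol
```

In other words, the quotes in get_stock_data("VWS.co") are Python syntax for delimiting the string; they are not part of the value the function receives.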

Clarifying how strings work as arguments in a function in python

I have already found a solution to my issue, at least for now, but I wonder if there are better ways, and also what the actual logic is.
I am learning Python for financial applications. I have learned how to request data for a single quote or multiple quotes through pandas_datareader. Today I tried to turn the process into a function.
For a single quote (let's say Apple) the code goes like this:
stock_data = pandas_datareader.DataReader('AAPL', data_source = 'yahoo', start = '2000-1-1')
I wanted to turn this into a function where you can pass the stock's symbol as an argument and get the stock's data, e.g.:
def stock(x):
    stock_data = pandas_datareader.DataReader(x, data_source = 'yahoo', start = '2000-1-1')
    print(stock_data)
The rationale being that stock('AAPL') would return Apple's data, stock(FB) would do the same for FB and so on.
I found out it doesn't work this way and I wonder how do you tell the function that the argument it should expect is a string?
For now this is how I worked around it, but I didn't really follow a particular logic, I just kept trying things:
def stock(x):
    stock_data = pandas_datareader.DataReader(str(x), data_source = 'yahoo', start = '2000-1-1')
    print(stock_data)
The way I understand this works is that in line 2 I tell it to take x convert it to a string and move from there, so when I finally write stock('AAPL') it works as expected. I guess my question is do I always need to convert the argument to a string? Why can't x, as an argument, be anything, including a string?
stock(FB) didn't work because FB is not a string; it is a name that hasn't been defined yet. Python will throw an error like: NameError: name 'FB' is not defined
It should be stock("FB").
Here we are passing "FB" as a string.
Or you could do
x = "FB"
stock(x)
You have to put the symbol in quotes; otherwise Python treats it as a variable name that isn't defined yet.
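A small sketch of the same point, runnable without any data source. The stock function here is a hypothetical stand-in that only reports what it received, and MSFT is deliberately left undefined to trigger the error.

```python
def stock(x):
    # Hypothetical stand-in for the data request: it only reports its argument.
    return f"requesting data for {x!r} ({type(x).__name__})"

# A quoted symbol is a string literal, so this works.
print(stock("FB"))

# A variable that holds a string works just as well.
symbol = "AAPL"
print(stock(symbol))

# An unquoted, undefined name fails before stock() is even called.
try:
    stock(MSFT)
except NameError as exc:
    error_message = str(exc)
    print(error_message)

# str() applied to a value that is already a string changes nothing.
print(str(symbol) == symbol)
```

So the str(x) call in the workaround is not what made it work; quoting the symbol at the call site is.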

How to extract URL from Pandas DataFrame?

I need to extract URLs from a column of a DataFrame that was created from the following values
creation_date,tweet_id,tweet_text
2020-06-06 03:01:37,1269102116364324865,#Webinar: Sign up for #SumoLogic's June 16 webinar to learn how to navigate your #Kubernetes environment and unders… https://stackoverflow.com/questions/42237666/extracting-information-from-pandas-dataframe
2020-06-06 01:29:38,1269078966985461767,"In this #webinar replay, #DisneyStreaming's #rothgar chats with #SumoLogic's #BenoitNewton about how #Kubernetes is… https://stackoverflow.com/questions/46928636/pandas-split-list-into-columns-with-regex
The tweet_text column contains the URLs. I am trying the following code.
df["tweet_text"]=df["tweet_text"].astype(str)
pattern = r'https?:\/\/(www\.)?[-a-zA-Z0-9#:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()#:%_\+.~#?&//=]*)'
df['links'] = ''
df['links']= df["tweet_text"].str.extract(pattern, expand=True)
print(df)
I am using the regex from an answer to this question, and it matches the URL in both rows.
But I am getting NaN as the values of the new column df['links']. I have also tried the solution provided in the first answer to this question, which was
df['links']= df["tweet_text"].str.extract(pattern, expand=False).str.strip()
But I am getting the following error
AttributeError: 'DataFrame' object has no attribute 'str'
Lastly, I created an empty column using df['links'] = '' because I was otherwise getting the error ValueError: Wrong number of items passed 2, placement implies 1, in case that's relevant.
Can someone help me out here?
The main problem is that your URL pattern contains capturing groups where you need non-capturing ones. You need to replace all ( with (?: in the pattern.
However, that alone is not enough: str.extract requires at least one capturing group in the pattern in order to return anything. Thus, you need to wrap the whole pattern in a capturing group.
You may use
pattern = r'(https?:\/\/(?:www\.)?[-a-zA-Z0-9#:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}[-a-zA-Z0-9()#:%_+.~#?&/=]*)'
Note that + does not need to be escaped inside a character class. Also, there is no need to use // inside a character class; one / is enough.
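As a quick check, the pattern can be tried on a small made-up DataFrame. The tweets and example.com / example.org URLs below are placeholders, not the data from the question.

```python
import pandas as pd

# Made-up tweets; the URLs are placeholders.
df = pd.DataFrame({
    "tweet_text": [
        "Sign up for the webinar https://example.com/webinar?id=42",
        "No link in this one",
        "Replay is up at http://www.example.org/replay",
    ]
})

# Same pattern as above, split across two lines for readability.
pattern = (
    r'(https?:\/\/(?:www\.)?[-a-zA-Z0-9#:%._+~#=]{1,256}'
    r'\.[a-zA-Z0-9()]{1,6}[-a-zA-Z0-9()#:%_+.~#?&/=]*)'
)

# One capturing group plus expand=False yields a Series, so it fits one column.
df["links"] = df["tweet_text"].str.extract(pattern, expand=False)
print(df["links"].tolist())
```

Rows without a URL end up as NaN in the new column. With the original pattern, the two capturing groups produce a two-column DataFrame, which is consistent with both errors reported in the question.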

Get code from module in Excel

I have a number of workbooks that have Macros which point to a particular SQL server using a connection string embedded in the code. We've migrated to a new SQL server so I need to go through these and alter the connection string to look at the new server in each of the Macros that explicitly mentions it.
Currently I'm able to list all of the modules in the workbook, however I'm unable to get the code from each module, just the name and type number.
for vbc in wb.VBProject.VBComponents:
    print(vbc.Name + ": " + str(vbc.Type) + "\n" + str(vbc.CodeModule))
What property stores the code so that I can find and replace the server name? I've had a look through the VBA and pywin32 docs but can't find anything.
Got it: there's a Lines method on the CodeModule object that lets you take a selection based on a starting line and a line count. Using this in conjunction with the CountOfLines property allows you to get the whole thing.
for vbc in wb.VBProject.VBComponents:
    print(vbc.Name + ":\n" + vbc.CodeModule.Lines(1, vbc.CodeModule.CountOfLines))
It's also worth noting that the first line is line 1, not line 0, as that caught me out. The following will raise an error because index 0 is out of range: vbc.CodeModule.Lines(0, vbc.CodeModule.CountOfLines - 1)
The Lines method of the CodeModule property has the signature
Function Lines(ByVal first as Integer, ByVal count as Integer) as String
first is the index of the first line of the code section you want to retrieve; it is in the range 1...CodeModule.CountOfLines.
count is the number of lines in the section; it is in the range 1...CodeModule.CountOfLines-first+1.
The return value is a concatenation of the code lines of the section with separator vbNewLine.
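For the find-and-replace step itself, one possible sketch is below. It assumes the CodeModule object also exposes ReplaceLine(line, text), as the VBIDE object model does, alongside the Lines and CountOfLines members used above. FakeCodeModule, the server names, and the connection string are hypothetical stand-ins so that the example runs without Excel; with pywin32 you would pass vbc.CodeModule instead.

```python
def replace_in_code_module(code_module, old, new):
    """Replace `old` with `new` on every line of a CodeModule-like object.

    Returns the number of lines that were changed.
    """
    changed = 0
    # VBA code lines are numbered from 1, not 0.
    for line_no in range(1, code_module.CountOfLines + 1):
        text = code_module.Lines(line_no, 1)
        if old in text:
            code_module.ReplaceLine(line_no, text.replace(old, new))
            changed += 1
    return changed


class FakeCodeModule:
    """Hypothetical in-memory stand-in so the sketch runs without Excel."""

    def __init__(self, source):
        self._lines = source.split("\r\n")

    @property
    def CountOfLines(self):
        return len(self._lines)

    def Lines(self, first, count):
        return "\r\n".join(self._lines[first - 1:first - 1 + count])

    def ReplaceLine(self, line_no, text):
        self._lines[line_no - 1] = text


module = FakeCodeModule(
    'Sub Connect()\r\n    conn = "Server=OLDSQL;Database=Sales"\r\nEnd Sub'
)
n_changed = replace_in_code_module(module, "OLDSQL", "NEWSQL")
print(n_changed)
print(module.Lines(1, module.CountOfLines))
```

Working line by line keeps the rest of each module untouched; remember to save the workbook afterwards for the change to persist.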

Using Strings to Name Hash Keys?

I'm working through a book called "Head First Programming," and there's a particular part where I'm confused as to why they're doing this.
There doesn't appear to be any reasoning for it, nor any explanation anywhere in the text.
The issue in question is in using multiple-assignment to assign split data from a string into a hash (which doesn't make sense as to why they're using a hash, if you ask me, but that's a separate issue). Here's the example code:
line = "101;Johnny 'wave-boy' Jones;USA;8.32;Fish;21"
s = {}
(s['id'], s['name'], s['country'], s['average'], s['board'], s['age']) = line.split(";")
I understand that this will take the string line and split it up into each named part, but I don't understand why what I think are keys are being named by using a string, when just a few pages prior, they were named like any other variable, without single quotes.
The purpose of the individual parts is to be searched based on an individual element and then printed on screen. For example, being able to search by ID number and then return the entire thing.
The language in question is Python, if that makes any difference. This is rather confusing for me, since I'm trying to learn this stuff on my own.
My personal best guess is that it doesn't make any difference and that it was personal preference on part of the authors, but it bewilders me that they would suddenly change form like that without it having any meaning, and further bothers me that they don't explain it.
EDIT: So I tried printing the id key both with and without single quotes around the name, and it worked perfectly fine, either way. Therefore, I'd have to assume it's a matter of personal preference, but I still would like some info from someone who actually knows what they're doing as to whether it actually makes a difference, in the long run.
EDIT 2: Apparently, it doesn't make any sense as to how my Python interpreter is actually working with what I've given it, so I made a screen capture of it working https://www.youtube.com/watch?v=52GQJEeSwUA
I don't understand why what I think are keys are being named by using a string, when just a few pages prior, they were named like any other variable, without single quotes
The answer is right there. If there's no quote, mydict[s], then s is a variable, and you look up the key in the dict based on what the value of s is.
If it's a string, then you look up literally that key.
So, in your example s[name] won't work as that would try to access the variable name, which is probably not set.
EDIT: So I tried printing the id key both with and without single
quotes around the name, and it worked perfectly fine, either way.
That's just pure luck... There's a built-in function called id:
>>> id
<built-in function id>
Try another name, and you'll see that it won't work.
Actually, as it turns out, for dictionaries (Python's term for hashes) there is a semantic difference between having the quotes there and not.
For example:
s = {}
s['test'] = 1
s['othertest'] = 2
defines a dictionary called s with two keys, 'test' and 'othertest.' However, if I tried to do this instead:
s = {}
s[test] = 1
I'd get a NameError exception, because this would be looking for an undefined variable called test whose value would be used as the key.
If, then, I were to type this into the Python interpreter:
>>> s = {}
>>> s['test'] = 1
>>> s['othertest'] = 2
>>> test = 'othertest'
>>> print(s[test])
2
>>> print(s['test'])
1
you'll see that using test as a key with no quotes uses the value of that variable to look up the associated entry in the dictionary s.
Edit: Now, the REALLY interesting question is why using s[id] gave you what you expected. The name "id" is not a keyword; it is a built-in function in Python that returns a unique id for the object passed as its argument. What in the world the Python interpreter is doing with the expression s[id] is a total mystery to me.
Edit 2: Watching the OP's YouTube video, it's clear that he stays consistent between assigning to and reading from the hash, using either id or 'id' in both places, so there's no issue with the function id as a hash key somehow magically lining up with 'id' as a hash key. That had me kind of worried for a while.
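For what it's worth, s[id] is not magic: functions are hashable objects, so the built-in id can itself serve as a dictionary key, distinct from the string 'id'. A minimal sketch (the stored values are made up):

```python
s = {}
s['id'] = "101"            # the key is the string 'id'
s[id] = "function key"     # the key is the built-in function object id

print(len(s))              # two separate entries
print(s['id'])
print(s[id])

# An unquoted name that is not defined anywhere fails instead.
try:
    s[name] = "Johnny"
except NameError as exc:
    name_error = str(exc)
    print(name_error)
```

This is why the unquoted form appeared to work for id but would fail for a key such as name or country.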
