Converting company name to ticker - python

I have an Excel document that maps company names to their respective tickers. I currently have this function:
def ticker(ticker):
    mapping = pd.read_excel('ticker.xlsx', header=3, parse_cols='A,B')
    for index, row in mapping.iterrows():
        if ticker.upper() in row['Name'].upper().split():
            ticker = row['Ticker']
    return ticker
The reason I am using "in" on line 4 instead of "==" is that in the Excel document "Apple" is listed as "Apple Inc.", and since the user isn't likely to type that, I want ticker("apple") to return "AAPL".
In the code above the if statement never gets executed, and I would like to know what the best solution is here.

I haven't seen this type of syntax before. It must be the nltk syntax.
That being said, I will try to be helpful.
If the in operator works the same way as in SQL, then it means exactly equal, meaning 'Apple' in ('Apple Inc') would be false.
You want to do something like if ('Apple Inc' like '%Apple%'),
or perhaps a .Match using a regex. That's about the extent to which I can make suggestions, as I don't do Python.
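For reference, Python has no LIKE operator, but on strings the in operator is already a substring test, so a case-insensitive "contains" check covers the '%Apple%' idea. The sketch below is only an illustration: the ROWS list is a hypothetical stand-in for the Name and Ticker columns of the spreadsheet, and find_ticker is a made-up helper name.

```python
# Hypothetical rows standing in for the 'Name' and 'Ticker' columns of the sheet.
ROWS = [
    ("Apple Inc.", "AAPL"),
    ("Microsoft Corporation", "MSFT"),
    ("Alphabet Inc.", "GOOGL"),
]


def find_ticker(query, rows=ROWS):
    """Return the ticker of the first company whose name contains query, else None."""
    needle = query.strip().upper()
    if not needle:
        return None
    for name, ticker in rows:
        # On strings, `in` is a substring test, so "APPLE" matches "APPLE INC."
        if needle in name.upper():
            return ticker
    return None


print(find_ticker("apple"))
print(find_ticker("unknown co"))
```

Returning None for an unknown name is a design choice made for this sketch; the original function returns the input unchanged instead.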


Get the language of an excel instance in python

I set a number format using xlwings, ws.range('A1').number_format = '#.0;[Red]-#.0', but one of the users has a French Excel and it raises an error because of [Red]. I have to add a condition based on the Excel language: for French Excel instances it must be [Rouge].
Here is my question: do you know how I can get the language of an Excel instance in Python (pywin32 / xlwings)?
In VBA, the following code will return 1 for English Excel and 33 for French Excel:
Application.International(xlApplicationInternational.xlCountryCode)
But I can't manage to get the python equivalent.
Thanks.
wb.app.api.International returns a tuple with index 0 being 1 or 33 depending on the language.
As asked, the Python equivalent to
Application.International(xlApplicationInternational.xlCountryCode)
would be something like (wb being an xlwings.Workbook instance):
from xlwings.constants import ApplicationInternational
int_constants = wb.app.api.International
country_code = int_constants[ApplicationInternational.xlCountryCode]
The class ApplicationInternational defines the indices you can use to retrieve the respective properties of International, as done above. From experience, you sometimes need to subtract 1 from these values, because the returned tuple is indexed from 0 (as noted above, the country code sits at index 0).
I don't have a lot of experience with specifying colours, but it's probably possible to specify the colour in RGB (it seems to be possible in VBA). That way you can skip the locale part altogether (and you could use something like xlwings.constants.RgbColor.rgbRed if you're feeling fancy).
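If you keep the keyword-based format instead, the branch on the country code could look roughly like this. It is a minimal sketch in plain Python: the function name red_number_format is made up for illustration, only the two country codes mentioned in the question are covered, and falling back to the English keyword for any other code is an assumption.

```python
# Country codes mentioned in the question: 1 for English Excel, 33 for French Excel.
COLOUR_KEYWORD_BY_COUNTRY = {1: "Red", 33: "Rouge"}


def red_number_format(country_code):
    """Build the number format; unknown codes fall back to "Red" (an assumption)."""
    keyword = COLOUR_KEYWORD_BY_COUNTRY.get(country_code, "Red")
    return "#.0;[{0}]-#.0".format(keyword)


print(red_number_format(1))   # #.0;[Red]-#.0
print(red_number_format(33))  # #.0;[Rouge]-#.0
```

The result would then be assigned to ws.range('A1').number_format, with the country code read from the Excel instance as described above.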

Using an IF THEN loop with nested JSON files in Python

I am currently writing a program which uses the Companies House API to return a JSON file containing information about a certain company.
I am able to retrieve the data easily using the following commands:
r = requests.get('https://api.companieshouse.gov.uk/company/COMPANY-NO/filing-history', auth=('API-KEY', ''))
data = r.json()
With that information I can do an awful lot. However, I've run into a problem that I was hoping you could help me with. I want to go through every nested entry in the JSON file and check whether the values of certain keys match certain criteria; if the values of two keys match, other code is executed.
One of the keys is the date of an entry, and I would like to ignore results that are older than a certain date. I have attempted to do this with the following:
date_threshold = datetime.date.today() - datetime.timedelta(days=30)
for each in data["items"]:
    date = ['date']
    type = ['type']
    if date < date_threshold and type is "RM01":
        print("wwwwww")
In case it isn't clear, what I'm attempting to do (albeit very badly) is assign each of the entries to a variable, which then gets tested against certain criteria.
However, this doesn't work; Python raises a type error:
TypeError: unorderable types: list() < datetime.date()
This makes me think the date is being stored as a string, so I can't compare it to the datetime value set earlier. But when I check the API documentation (https://developer.companieshouse.gov.uk/api/docs/company/company_number/filing-history/filingHistoryItem-resource.html), it clearly says that the 'date' entry is returned as a date type.
What am I doing wrong? It's very clear that I'm extremely new to Python, given what I presume is the atrocity of my code, but in my head it seems to make at least a little sense. In case none of this is clear, I basically want to go through all the entries in the JSON file, and if the date and type match a certain description, other code can be executed (in this case I have just used random text).
Any help is greatly appreciated! Let me know if you need anything cleared up.
:)
EDIT
After tweaking my code to the below:
for each in data["items"]:
    date = each['date']
    type = each['type']
    if date is '2016-09-15' and type is "RM01":
        print("wwwwww")
The code executes without any errors, but the words aren't printed, even though I know there is an entry in the JSON file with that exact date and that exact type. Any thoughts?
SOLUTION:
Thanks to everyone for helping me out. I had made a couple of very basic errors; the code that works as expected is below:
for each in data["items"]:
    date = each['date']
    typevariable = each['type']
    if date == '2016-09-15' and typevariable == "RM01":
        print("wwwwww")
This prints the word "wwwwww" 3 times, which is correct, seeing as there are 3 entries in the JSON that fulfil those criteria.
You first need to convert the date value, which arrives as a string, to a date using datetime.strptime().
In your first attempt you are comparing date, which is a list (['date']), with date_threshold, which is a datetime.date.
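Putting the two points together, a threshold comparison could look like the sketch below. The data dictionary is a hypothetical stand-in for r.json(), the helper name recent_items is made up, and the "%Y-%m-%d" format is assumed from the '2016-09-15' value used in the question. The reference date is passed in as a parameter so the result is reproducible; in real use it would be datetime.date.today().

```python
import datetime

# Hypothetical stand-in for r.json(); the real response has more fields.
data = {
    "items": [
        {"date": "2016-09-15", "type": "RM01"},
        {"date": "2016-09-15", "type": "AA"},
        {"date": "2016-07-01", "type": "RM01"},
    ]
}


def recent_items(data, wanted_type, today, max_age_days=30):
    """Return items of wanted_type dated within max_age_days of today."""
    date_threshold = today - datetime.timedelta(days=max_age_days)
    matches = []
    for item in data["items"]:
        # JSON has no date type, so the value arrives as a string.
        item_date = datetime.datetime.strptime(item["date"], "%Y-%m-%d").date()
        # Use == to compare values; `is` tests object identity.
        if item_date >= date_threshold and item["type"] == wanted_type:
            matches.append(item)
    return matches


print(recent_items(data, "RM01", today=datetime.date(2016, 9, 30)))
# [{'date': '2016-09-15', 'type': 'RM01'}]
```

The `is` comparisons in the edited code failed for the same reason the comment notes: two equal strings are not guaranteed to be the same object.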

Match hex string with list index

I'm building a de-identify tool. It replaces all names by other names.
We got a report that <name>Peter</name> met <name>Jane</name> yesterday. <name>Peter</name> is suspicious.
Output:
We got a report that <name>Billy</name> met <name>Elsa</name> yesterday. <name>Billy</name> is suspicious.
This can be applied to multiple documents, and a given name is always replaced by the same counterpart, so you can still understand who the text is talking about. BUT every document has an ID referring to the person the file is about (I'm working with files in a public service), and only documents with the same person ID will be de-identified the same way, with the same names (the goal is to follow people's history over time). This is a security measure: when I hand the tool over to a third party, I don't hand over the key to my own documents with it.
So the same input, with a different ID, produces:
We got a report that <name>Henry</name> met <name>Alicia</name> yesterday. <name>Henry</name> is suspicious.
Right now, I hash each name with the document ID as a salt, convert the hash to an integer, then subtract the length of the name list until I can use that integer as an index into the list. But I feel like there should be a quicker, more straightforward approach?
It's really more of an algorithmic question, but if it's of any relevance, I'm working with Python 2.7. Please ask for more explanation if needed. Thank you!
I hope it's clearer this way ô_o Sorry, when you are neck-deep in your code you forget that others need the bigger picture to understand how you got there.
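For what it's worth, repeatedly subtracting the list length until the integer is a valid index gives the same result as a single modulo operation. The sketch below illustrates that under a few assumptions: it is written for Python 3 rather than 2.7, SHA-256 and the "id:name" salting scheme are arbitrary choices, and REPLACEMENT_NAMES and pseudonym are hypothetical names.

```python
import hashlib

# Hypothetical list of replacement names.
REPLACEMENT_NAMES = ["Billy", "Elsa", "Henry", "Alicia", "Oscar", "Nina"]


def pseudonym(name, document_id, names=REPLACEMENT_NAMES):
    """Map (name, document_id) to a replacement name deterministically."""
    salted = "{0}:{1}".format(document_id, name).encode("utf-8")
    digest = hashlib.sha256(salted).hexdigest()
    # The modulo replaces the repeated subtraction of len(names).
    index = int(digest, 16) % len(names)
    return names[index]


print(pseudonym("Peter", "person-42"))
print(pseudonym("Peter", "person-42") == pseudonym("Peter", "person-42"))  # True
```

Note that with a short name list two different real names can land on the same replacement, so a longer list or an explicit collision check may be needed.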
As @LutzHorn pointed out, you could just use a dict to map real names to false ones.
You could also just do something like:
existing_names = []
for nameoccurrence in original_text:
    if nameoccurrence.name not in existing_names:
        nameoccurrence.id = len(existing_names)
        existing_names.append(nameoccurrence.name)
    else:
        nameoccurrence.id = existing_names.index(nameoccurrence.name)
for idx, _ in enumerate(existing_names):
    existing_names[idx] = gimme_random_name()
Try using a dictionary of names.
import re
s = "We got a report that <name>Peter</name> met <name>Jane</name> yesterday. <name>Peter</name> is suspicious."
names = {"Peter": "Billy", "Jane": "Elsa"}
for name in re.findall("<name>([a-zA-Z]+)</name>", s):
    s = re.sub("<name>" + name + "</name>", "<name>" + names[name] + "</name>", s)
print(s)
Output:
We got a report that <name>Billy</name> met <name>Elsa</name> yesterday. <name>Billy</name> is suspicious.
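A variation on the dictionary idea is to do the replacement in a single pass by giving re.sub a function instead of a string. This is only a sketch: replace_names is a made-up helper, and leaving unknown names untouched is an assumption about the desired behaviour.

```python
import re

# Mapping taken from the example above.
names = {"Peter": "Billy", "Jane": "Elsa"}


def replace_names(text, mapping):
    """Replace every <name>...</name> occurrence in one pass."""
    def swap(match):
        original = match.group(1)
        # Unknown names are kept as they are (an assumption for this sketch).
        return "<name>" + mapping.get(original, original) + "</name>"
    return re.sub(r"<name>([a-zA-Z]+)</name>", swap, text)


s = "We got a report that <name>Peter</name> met <name>Jane</name> yesterday. <name>Peter</name> is suspicious."
print(replace_names(s, names))
```

Because each tag is rewritten exactly once, a replacement name that also happens to be a key in the mapping is not replaced a second time, which can happen when substitutions are applied one after another.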

Using Strings to Name Hash Keys?

I'm working through a book called "Head First Programming," and there's a particular part where I'm confused as to why they're doing this.
There doesn't appear to be any reasoning for it, nor any explanation anywhere in the text.
The issue in question is in using multiple-assignment to assign split data from a string into a hash (which doesn't make sense as to why they're using a hash, if you ask me, but that's a separate issue). Here's the example code:
line = "101;Johnny 'wave-boy' Jones;USA;8.32;Fish;21"
s = {}
(s['id'], s['name'], s['country'], s['average'], s['board'], s['age']) = line.split(";")
I understand that this will take the string line and split it up into each named part, but I don't understand why what I think are keys are being named by using a string, when just a few pages prior, they were named like any other variable, without single quotes.
The purpose of the individual parts is to be searched based on an individual element and then printed on screen. For example, being able to search by ID number and then return the entire thing.
The language in question is Python, if that makes any difference. This is rather confusing for me, since I'm trying to learn this stuff on my own.
My personal best guess is that it doesn't make any difference and that it was personal preference on part of the authors, but it bewilders me that they would suddenly change form like that without it having any meaning, and further bothers me that they don't explain it.
EDIT: So I tried printing the id key both with and without single quotes around the name, and it worked perfectly fine, either way. Therefore, I'd have to assume it's a matter of personal preference, but I still would like some info from someone who actually knows what they're doing as to whether it actually makes a difference, in the long run.
EDIT 2: Apparently, it doesn't make any sense as to how my Python interpreter is actually working with what I've given it, so I made a screen capture of it working https://www.youtube.com/watch?v=52GQJEeSwUA
I don't understand why what I think are keys are being named by using a string, when just a few pages prior, they were named like any other variable, without single quotes
The answer is right there. If there's no quote, mydict[s], then s is a variable, and you look up the key in the dict based on what the value of s is.
If it's a string, then you look up literally that key.
So, in your example s[name] won't work as that would try to access the variable name, which is probably not set.
EDIT: So I tried printing the id key both with and without single
quotes around the name, and it worked perfectly fine, either way.
That's just pure luck... There's a built-in function called id:
>>> id
<built-in function id>
Try another name, and you'll see that it won't work.
Actually, as it turns out, for dictionaries (Python's term for hashes) there is a semantic difference between having the quotes there and not.
For example:
s = {}
s['test'] = 1
s['othertest'] = 2
defines a dictionary called s with two keys, 'test' and 'othertest'. However, if I tried to do this instead:
s = {}
s[test] = 1
I'd get a NameError exception, because this would be looking for an undefined variable called test whose value would be used as the key.
If, then, I were to type this into the Python interpreter:
>>> s = {}
>>> s['test'] = 1
>>> s['othertest'] = 2
>>> test = 'othertest'
>>> print s[test]
2
>>> print s['test']
1
you'll see that using test as a key with no quotes uses the value of that variable to look up the associated entry in the dictionary s.
Edit: Now, the REALLY interesting question is why using s[id] gave you what you expected. The name "id" is not a keyword; it is a built-in function in Python that returns a unique identifier for the object passed as its argument. What in the world the Python interpreter is doing with the expression s[id] is a total mystery to me.
Edit 2: Watching the OP's Youtube video, it's clear that he's staying consistent when assigning and reading the hash about using id or 'id', so there's no issue with the function id as a hash key somehow magically lining up with 'id' as a hash key. That had me kind of worried for a while.
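To make the s[id] case concrete: function objects are hashable, so the built-in function id can itself serve as a dictionary key, and it is a different key from the string 'id'. The short sketch below shows both keys side by side, plus the NameError you get with an unquoted name that is not defined (undefined_key_name is a deliberately undefined, hypothetical name).

```python
s = {}
s[id] = "stored under the built-in function object"
s['id'] = "stored under the string 'id'"

print(len(s))              # 2
print(s[id] == s['id'])    # False
print(id in s, 'id' in s)  # True True

try:
    s[undefined_key_name]  # no such variable, so this raises before any lookup
except NameError as exc:
    print("NameError:", exc)
```

So using id consistently for both writing and reading works, but only because the same function object is used as the key each time.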

Trouble calling Dictionary using Variable using Python

I have only been working with python for a few months,
so sorry if I am asking a stupid question. I am having
a problem calling a dictionary name using a variable.
The problem is, if I use a variable to call a dictionary & [] operators,
python interprets my code trying to return a single character in the string
instead of anything within the dictionary list.
To illustrate by an example ... let's say I
have a dictionary list like below.
USA={'Capital':'Washington',
'Currency':'USD'}
Japan={'Capital':'Tokyo',
'Currency':'JPY'}
China={'Capital':'Beijing',
'Currency':'RMB'}
country=input("Enter USA or JAPAN or China? ")
print(USA["Capital"]+USA["Currency"]) #No problem -> WashingtonUSD
print(Japan["Capital"]+Japan["Currency"]) #No problem -> TokyoJPY
print(China["Capital"]+China["Currency"]) #No problem -> BeijingRMB
print(country["Capital"]+country["Currency"]) #Error -> TypeError: string indices must be integers
In the example above, I understand the interpreter
is expecting an integer because it views the value
of "country" as a string instead of dictionary...
like if I put country[2] using Japan as input (for example),
it will return the character "p". But clearly that
is not what my intent is.
Is there a way I can work around this?
You should put your countries themselves into a dictionary, with the keys being the country names. Then you would be able to do COUNTRIES[country]["Capital"], etc.
Example:
COUNTRIES = dict(
    USA={'Capital':'Washington',
         'Currency':'USD'},
    Japan={'Capital':'Tokyo',
           'Currency':'JPY'},
    ...
)
country = input("Enter USA or Japan or China? ")
print(COUNTRIES[country]["Capital"])
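A complete version of this idea might look like the following sketch. The three countries come from the question; the describe helper and the message for unknown input are assumptions added for illustration, and the country name is passed as an argument instead of being read with input() so the example runs unattended.

```python
COUNTRIES = {
    "USA": {"Capital": "Washington", "Currency": "USD"},
    "Japan": {"Capital": "Tokyo", "Currency": "JPY"},
    "China": {"Capital": "Beijing", "Currency": "RMB"},
}


def describe(country):
    """Return capital + currency, or a message for an unknown name."""
    info = COUNTRIES.get(country)
    if info is None:
        return "Unknown country: " + country
    return info["Capital"] + info["Currency"]


print(describe("Japan"))   # TokyoJPY
print(describe("France"))  # Unknown country: France
```

Keys are case-sensitive, so "JAPAN" would not match "Japan" unless the input is normalised first.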
Disclaimer: Any other way of doing it is definitely better than the way I'm about to show. This way will work, but it is not pythonic. I'm offering it for entertainment purposes, and to show that Python is cool.
USA={'Capital':'Washington',
'Currency':'USD'}
Japan={'Capital':'Tokyo',
'Currency':'JPY'}
China={'Capital':'Beijing',
'Currency':'RMB'}
country=input("Enter USA or Japan or China? ")
print(USA["Capital"]+USA["Currency"]) #No problem -> WashingtonUSD
print(Japan["Capital"]+Japan["Currency"]) #No problem -> TokyoJPY
print(China["Capital"]+China["Currency"]) #No problem -> BeijingRMB
# This works, but it is probably unwise to use it.
print(vars()[country]["Capital"] + vars()[country]['Currency'])
This works because the built-in function vars, when given no arguments, returns a dict of variables (and other stuff) in the current namespace. Each variable name, as a string, becomes a key in the dict.
But @tom's suggestion is actually a much better one.
