Naming DataFrames iteratively using items from a list + a string - python

I have a list of country names, and a large DataFrame where one of the columns is ' COUNTRY ' (yes, it has a space before and after the word). I want to create smaller DataFrames based on country names.
cleaned_df[cleaned_df[' COUNTRY ']==asia_country_list[1]]
This seems too long a command for what it does. It does work, though.
Now,
str("%s_data" % (asia_country_list[1]))
gives
'Taiwan_data'
but when I combine the above two:
str("%s_data" % (asia_country_list[1])) = cleaned_df[cleaned_df[' COUNTRY ']==asia_country_list[1]]
I get:
SyntaxError: can't assign to function call
Happy to learn other ways to achieve this as well, please. Thanks very much.

I don't think you should do this, but if you really need it:
exec(str("%s_data" % (asia_country_list[1])) +"= cleaned_df[cleaned_df[' COUNTRY ']==asia_country_list[1]]")
should work.
Using a dictionary is likely to solve your problem:
D = {}
D["%s_data" % asia_country_list[1]] = cleaned_df[cleaned_df[' COUNTRY '] == asia_country_list[1]]
EDIT: the first solution is a bad idea. exec is a dangerous command: it runs whatever text ends up in the string, so if one of the names is something like "del cleaned_df", that text becomes part of the code being executed, and it can get destructive. More typically, I am guessing spaces are the problem in your case, since a name containing a space is not a valid variable name. It's a bit like SQL injection.
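A minimal sketch of the dictionary approach applied to every country at once. The sample rows and the `value` column are made up to stand in for the real cleaned_df, and `country_frames` / `frames_by_country` are hypothetical names:

```python
import pandas as pd

# Hypothetical stand-in for the real cleaned_df; note the spaces in ' COUNTRY '.
cleaned_df = pd.DataFrame({
    " COUNTRY ": ["Japan", "Taiwan", "Taiwan", "India"],
    "value": [10, 20, 30, 40],
})
asia_country_list = ["Japan", "Taiwan"]

# One dictionary entry per country instead of one variable per country.
country_frames = {
    "%s_data" % country: cleaned_df[cleaned_df[" COUNTRY "] == country]
    for country in asia_country_list
}

# groupby splits the frame on every distinct country in a single pass.
frames_by_country = {name: group for name, group in cleaned_df.groupby(" COUNTRY ")}

print(country_frames["Taiwan_data"])
```

Keys are ordinary strings, so country names with spaces or punctuation cause no trouble, and nothing is ever executed as code.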

Converting company name to ticker

Hey, so I have an Excel document that maps company names to their respective tickers. I currently have this function:
def ticker(ticker):
    mapping = pd.read_excel('ticker.xlsx',header = 3,parse_cols='A,B')
    for index,row in mapping.iterrows():
        if ticker.upper() in row['Name'].upper().split():
            ticker = row['Ticker']
    return ticker
The reason I am using "in" on line 4 instead of "==" is that in the Excel document "Apple" is listed as "Apple Inc.", and since the user isn't likely to type that, I want ticker("apple") to return "AAPL".
In the code above the if statement never gets executed, and I was curious about the best possible solution here.
Haven't seen this type of syntax before. Must be the nltk syntax.
That being said, I will try to be helpful.
If the "in" command is the same as in SQL, then it means exactly equal, meaning 'Apple' in ('Apple Inc') would be false.
You want to do an if ('Apple Inc' like '%Apple%')
or perhaps a .Match using regex. That's about the extent to which I can make suggestions, as I don't do Python.
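For reference, Python's `in` is a substring test when the right-hand side is a string and a membership test when it is a list, so the result depends on whether `.split()` has been applied. A minimal sketch, using a made-up dictionary in place of ticker.xlsx (`find_ticker` and the sample rows are hypothetical):

```python
def find_ticker(query, names_to_tickers):
    """Return the ticker of the first company whose name contains query as a whole word."""
    wanted = query.upper()
    for name, symbol in names_to_tickers.items():
        # Strip trailing punctuation so "Inc." compares as "INC".
        words = [word.strip(".,") for word in name.upper().split()]
        if wanted in words:  # list membership: whole-word match
            return symbol
    return None  # no company matched


sample = {"Apple Inc.": "AAPL", "Microsoft Corporation": "MSFT"}

print("Apple" in "Apple Inc.")          # substring test on a string
print("Apple" in "Apple Inc.".split())  # membership test on a list of words
print(find_ticker("apple", sample))
```

Returning None when nothing matches, rather than echoing the input back, makes a failed lookup easy to detect.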

Python - Searching a dictionary for strings

Basically, I have a troubleshooting program in which I want the user to enter their input. I then take this input and split it into separate words. After that, I want to create a dictionary from the contents of a .CSV file, with recognisable keywords as the keys and the solutions (the second column) as the values. Finally, I want to check whether any of the words from the user's split input is a dictionary key and, if so, print the solution.
The problem I am facing is that, although I can do the above, the program loops through every word: if my input was 'My phone is wet' and 'wet' was a recognisable keyword, it would say 'Not recognised', 'Not recognised', 'Not recognised', and only then print the solution. It says 'Not recognised' that many times because the strings 'My', 'phone' and 'is' are not recognised.
So how do I test whether a user's split input is in my dictionary without it outputting 'Not recognised' for every other word?
Sorry if this was unclear, I'm quite confused by the whole matter.
Code:
import csv, easygui as eg
KeywordsCSV = dict(csv.reader(open('Keywords and Solutions.csv')))
Problem = eg.enterbox('Please enter your problem: ', 'Troubleshooting').lower().split()
for Problems, Solutions in (KeywordsCSV.items()):
    pass
Note, I have the pass there because this is the part I need help with.
My CSV file consists of:
problemKeyword | solution
For example:
wet | Put the phone in a bowl of rice.
Your code reads like some ugly code golf. Let's clean it up before we look at how to solve the problem:
import easygui as eg
import csv
# KeywordsCSV = dict(csv.reader(open('Keywords and Solutions.csv')))
# why are you nesting THREE function calls? That's awful. Don't do that.
# KeywordsCSV should be named something different, too. `problems` is probably fine.
with open("Keywords and Solutions.csv") as f:
    reader = csv.reader(f)
    problems = dict(reader)
problem = eg.enterbox('Please enter your problem: ', 'Troubleshooting').lower().split()
# this one's not bad, but I lowercased your `Problem` because capital-case
# words are idiomatically class names. Chaining this many functions together isn't
# ideal, but for this one-shot case it's not awful.
Let's break a second here and notice that I changed something on literally every line of your code. Take time to familiarize yourself with PEP8 when you can! It will drastically improve any code you write in Python.
Anyway, once you've got a problems dict, and a problem that should be a KEY in that dict (that is, a single word, not the list that .split() returns), you can do:
if problem in problems:
    solution = problems[problem]
or even using the default return of dict.get:
solution = problems.get(problem)
# no KeyError here: solution is None if the key is missing
If you wanted to loop this, you could do something like:
while True:
    problem = eg.enterbox(...)  # as above
    solution = problems.get(problem)
    if solution is None:
        # invalid problem, warn the user
        continue
    else:
        # display the solution? Do whatever it is you're doing with it and...
        break
Just have a boolean and an if after the loop that only runs if none of the words in the sentence were recognized.
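The flag idea can be sketched as follows, assuming the keywords have already been loaded into a dict (here a one-entry literal instead of the CSV file; `troubleshoot` is a hypothetical helper name):

```python
def troubleshoot(sentence, problems):
    """Return the solutions for recognised words, or one fallback message."""
    messages = []
    recognised = False
    for word in sentence.lower().split():
        if word in problems:
            messages.append(problems[word])
            recognised = True
    # This check runs once, after the loop, not once per word.
    if not recognised:
        messages.append("Not recognised")
    return messages


problems = {"wet": "Put the phone in a bowl of rice."}
print(troubleshoot("My phone is wet", problems))
print(troubleshoot("My phone is slow", problems))
```

Because the "Not recognised" branch sits outside the loop, unrecognised words such as 'My' and 'phone' no longer produce any output of their own.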
I think you might be able to use something like:
for word in Problem:
    if word in KeywordsCSV:
        print(KeywordsCSV[word])
or the list comprehension:
[KeywordsCSV[word] for word in Problem if word in KeywordsCSV]

Match hex string with list index

I'm building a de-identification tool. It replaces all names with other names.
We got a report that <name>Peter</name> met <name>Jane</name> yesterday. <name>Peter</name> is suspicious.
Output:
We got a report that <name>Billy</name> met <name>Elsa</name> yesterday. <name>Billy</name> is suspicious.
It can be done on multiple documents, and one name is always replaced by the same counterpart, so you can still understand who the text is talking about. BUT all documents have an ID referring to the person the file is about (I'm working with files in a public service), and only documents with the same person ID will be de-identified the same way, with the same names (the goal is to follow people's history and how it evolves). This is a security measure, so that when I hand over the tool to a third party, I don't hand over the key to my own documents with it.
So the same input, with a different ID, produces:
We got a report that <name>Henry</name> met <name>Alicia</name> yesterday. <name>Henry</name> is suspicious.
Right now, I'm hashing each name with the document ID as a salt, converting the hash to an integer, then subtracting the length of the name list until I can request a name with that integer as an index. But I feel like there should be a quicker/more straightforward approach?
It's really more of an algorithmic question, but if it's of any relevance, I'm working with Python 2.7. Please request more explanation if needed. Thank you!
I hope it's clearer this way ô_o Sorry, when you are neck-deep in your code you forget others need a bigger picture to understand how you got there.
As @LutzHorn pointed out, you could just use a dict to map real names to false ones.
You could also just do something like:
existing_names = []
for name_occurrence in original_text:
    if name_occurrence.name not in existing_names:
        name_occurrence.id = len(existing_names)
        existing_names.append(name_occurrence.name)
    else:
        name_occurrence.id = existing_names.index(name_occurrence.name)
for idx, _ in enumerate(existing_names):
    existing_names[idx] = gimme_random_name()
Try using a dictionary of names.
import re
s = "We got a report that <name>Peter</name> met <name>Jane</name> yesterday. <name>Peter</name> is suspicious."
names = {"Peter": "Billy", "Jane": "Elsa"}
for name in re.findall("<name>([a-zA-Z]+)</name>", s):
    s = re.sub("<name>" + name + "</name>", "<name>" + names[name] + "</name>", s)
print(s)
Output:
We got a report that <name>Billy</name> met <name>Elsa</name> yesterday. <name>Billy</name> is suspicious.
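The repeated subtraction described in the question is what the modulo operator does in one step. A minimal sketch of the salted-hash lookup, written for Python 3 and assuming SHA-256 and a made-up pool of replacement names (`FAKE_NAMES` and `pseudonym` are hypothetical; the question does not say which hash it uses):

```python
import hashlib

# Hypothetical pool of replacement names.
FAKE_NAMES = ["Billy", "Elsa", "Henry", "Alicia", "Oscar", "Nora"]


def pseudonym(real_name, document_id, pool=FAKE_NAMES):
    """Pick a replacement name deterministically from (document_id, real_name)."""
    salted = ("%s:%s" % (document_id, real_name)).encode("utf-8")
    digest = hashlib.sha256(salted).hexdigest()
    # The hex digest is one large integer; modulo folds it into a valid index,
    # replacing the loop that subtracts len(pool) until the value fits.
    return pool[int(digest, 16) % len(pool)]


print(pseudonym("Peter", "doc-42"))
print(pseudonym("Jane", "doc-42"))
```

One caveat with any hash-to-index scheme: two different real names can land on the same replacement, so a per-ID dictionary that hands out unused names may be needed if the pool is small.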

Python Struct Arrays

I'm currently working on some Python scripts to interact with the Spacewalk/Satellite API. I'm able to return one piece of an array I'm looking for, but not the rest. Below is the API call I'm making in my script. (key) is the session key used to authenticate with the server.
duplicates = client.system.listDuplicatesByHostname(key)
Running my script will produce the following kind of output:
print duplicates
[{'hostname': 'host01', 'systems': [{'last_checkin': <DateTime '20131231T14:06:54' at 192a908>, 'systemName': 'host01.example.com', 'systemId': 1000011017}
I can pull out the 'hostname' field using something like this:
for duplicate in duplicates:
    print 'Hostname: %s' % (duplicate.get('hostname'))
But I can't retrieve any of the other items. "systems" is apparently a separate array (nested?) within the first array. I'm unsure of how to reference that second "systems" array. The API reference says the output will be in this format:
Returns:
    array:
        struct - Duplicate Group
            string "hostname"
            array "systems"
                struct - system
                    int "systemId"
                    string "systemName"
                    dateTime.iso8601 "last_checkin" - Last time server successfully checked in
I'm not sure how to pull out the other values such as systemID, systemName. Is this considered a tuple? How would I go about retrieving these values? (I'm very new to Python, I've read about "structs" but haven't found any examples that really made sense to me.) Not necessarily looking for an answer to this exact question, but anywhere someone could point me to examples that clearly explain how to work with these kinds of arrays would be most helpful!!
Inside of the for loop you will have a dictionary called duplicate that contains the keys 'hostname' and 'systems', so duplicate['hostname'] will get the hostname (a string) and duplicate['systems'] will get the systems array.
You can then access an individual element from that systems array using indexing; for example, duplicate['systems'][0] would get the first system. However, what you probably want to do instead is create a loop like for system in duplicate['systems'], so that you can iterate over each system in order.
Each system you get will be a dictionary that has the keys 'systemId', 'systemName', and 'last_checkin'.
Here is what I imagine the full code might look like:
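for duplicate in duplicates:
    print 'Hostname: ' + duplicate['hostname']
    for system in duplicate['systems']:
        # systemId is an int and last_checkin a DateTime, so format them
        # with %s instead of concatenating them to a string
        print 'System ID: %s' % system['systemId']
        print 'System Name: ' + system['systemName']
        print 'Last Checkin: %s' % system['last_checkin']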
I would suggest taking a look at the data structures tutorial.
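As a self-contained illustration of the same nesting (a list of dicts, each holding a list of dicts), here is a sketch in Python 3 with hand-written sample data in place of the API call. The second system entry and the plain-string timestamps are invented for the example, and `flatten_duplicates` is a hypothetical helper:

```python
# Hand-written data mimicking the documented return shape.
duplicates = [
    {
        "hostname": "host01",
        "systems": [
            {"systemId": 1000011017, "systemName": "host01.example.com",
             "last_checkin": "20131231T14:06:54"},
            {"systemId": 1000011018, "systemName": "host01.example.com",
             "last_checkin": "20140102T09:15:00"},
        ],
    },
]


def flatten_duplicates(groups):
    """Turn the nested structure into flat (hostname, id, name) tuples."""
    rows = []
    for group in groups:                # outer array: one dict per hostname
        for system in group["systems"]:  # inner array: one dict per system
            rows.append((group["hostname"], system["systemId"], system["systemName"]))
    return rows


for hostname, system_id, system_name in flatten_duplicates(duplicates):
    print("Hostname: %s  System ID: %s  Name: %s" % (hostname, system_id, system_name))
```

The API's "struct" arrives as a Python dict and its "array" as a list, so no tuple handling is involved.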
Thanks guys, the input provided helped me figure this out. I got the output I needed using the following:
for duplicate in duplicates:
    print 'IP: ' + duplicate['ip']
    for system in duplicate['systems']:
        print 'SystemID: ', system['systemId'], 'Name: ', system['systemName']

Using Strings to Name Hash Keys?

I'm working through a book called "Head First Programming," and there's a particular part where I'm confused as to why they're doing this.
There doesn't appear to be any reasoning for it, nor any explanation anywhere in the text.
The issue in question is the use of multiple assignment to put split data from a string into a hash (why they're using a hash at all doesn't make sense to me, but that's a separate issue). Here's the example code:
line = "101;Johnny 'wave-boy' Jones;USA;8.32;Fish;21"
s = {}
(s['id'], s['name'], s['country'], s['average'], s['board'], s['age']) = line.split(";")
I understand that this will take the string line and split it up into each named part, but I don't understand why what I think are keys are being named by using a string, when just a few pages prior, they were named like any other variable, without single quotes.
The purpose of the individual parts is to be searched based on an individual element and then printed on screen. For example, being able to search by ID number and then return the entire thing.
The language in question is Python, if that makes any difference. This is rather confusing for me, since I'm trying to learn this stuff on my own.
My personal best guess is that it doesn't make any difference and that it was personal preference on part of the authors, but it bewilders me that they would suddenly change form like that without it having any meaning, and further bothers me that they don't explain it.
EDIT: So I tried printing the id key both with and without single quotes around the name, and it worked perfectly fine, either way. Therefore, I'd have to assume it's a matter of personal preference, but I still would like some info from someone who actually knows what they're doing as to whether it actually makes a difference, in the long run.
EDIT 2: Apparently, the way my Python interpreter is handling what I've given it doesn't make any sense, so I made a screen capture of it working: https://www.youtube.com/watch?v=52GQJEeSwUA
I don't understand why what I think are keys are being named by using a string, when just a few pages prior, they were named like any other variable, without single quotes
The answer is right there. If there's no quote, mydict[s], then s is a variable, and you look up the key in the dict based on what the value of s is.
If it's a string, then you look up literally that key.
So, in your example s[name] won't work as that would try to access the variable name, which is probably not set.
EDIT: So I tried printing the id key both with and without single quotes around the name, and it worked perfectly fine, either way.
That's just pure luck... There's a built-in function called id:
>>> id
<built-in function id>
Try another name, and you'll see that it won't work.
Actually, as it turns out, for dictionaries (Python's term for hashes) there is a semantic difference between having the quotes there and not.
For example:
s = {}
s['test'] = 1
s['othertest'] = 2
defines a dictionary called s with two keys, 'test' and 'othertest'. However, if I tried to do this instead:
s = {}
s[test] = 1
I'd get a NameError exception, because this would be looking for an undefined variable called test whose value would be used as the key.
If, then, I were to type this into the Python interpreter:
>>> s = {}
>>> s['test'] = 1
>>> s['othertest'] = 2
>>> test = 'othertest'
>>> print s[test]
2
>>> print s['test']
1
you'll see that using test as a key with no quotes uses the value of that variable to look up the associated entry in the dictionary s.
Edit: Now, the REALLY interesting question is why using s[id] gave you what you expected. The name "id" is not a keyword but a built-in function in Python that gives you a unique id for an object passed as its argument. What in the world the Python interpreter is doing with the expression s[id] is a total mystery to me.
Edit 2: Watching the OP's Youtube video, it's clear that he's staying consistent when assigning and reading the hash about using id or 'id', so there's no issue with the function id as a hash key somehow magically lining up with 'id' as a hash key. That had me kind of worried for a while.
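A short sketch of what s[id] does, for anyone puzzled by the same thing: function objects are hashable, so the built-in function itself can serve as a dictionary key, and it is a different key from the string 'id'. The stored values are made up for illustration:

```python
s = {}

# The key here is the built-in function object, not a string.
s[id] = "stored under the built-in function id"

# The key here is the three-character string 'id'.
s["id"] = "stored under the string 'id'"

print(len(s))   # two separate entries
print(s[id])
print(s["id"])
```

Assigning and reading with the same form therefore always works, which matches what the video shows; mixing the two forms would look up a key that was never set.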
