How would I be able to remove this part of the variable? - python

So I am making a code like a guessing game. The data for the guessing game is in the CSV file so I decided to use pandas. I have tried to use pandas to import my csv file, pick a random row and put the data into variables so I can use it in the rest of the code but, I can't figure out how to format the data in the variable correctly.
I've tried to split the string with split() but I am quite lost.
ar = pandas.read_csv('names.csv')
ar.columns = ["Song Name","Artist","Intials"]
randomsong = ar.sample(1)
songartist = randomsong["Artist"]
songname = (randomsong["Song Name"])
songintials = randomsong["Intials"]
print(songname)
My CSV file looks like this.
Song Name,Artist,Intials
Someone you loved,Lewis Capaldi,SYL
Bad Guy,Billie Eilish,BG
Ransom,Lil Tecca,R
Wow,Post Malone, W
I expect the output to be the name of the song from the csv file. For Example
Bad Guy
Instead the output is
1 Bad Guy
Name: Song Name, dtype:object
If anyone knows the solution please let me know. Thanks

You're getting a series object as output. You can try
randomsong["Song Name"].to_string()

Use df['column].values to get values of the column.
In your case, songartist = randomsong["Artist"].values[0] because you want only the first element of the returned list.

Related

how to define sample in a natural language processing model

for doc in sample['documents']:
The error is 'sample' undefined (I was trying to reproduce a natural language processing model)
I this you are searching that the way to display the natural processing language and i this this is helpful to you. i mention the link below so please come check this..
https://www.tableau.com/learn/articles/natural-language-processing-examples
In this case, your problem is the way you are reading the input. Not big deal no worries !
In the loop:
for doc in sample['documents']
sample is the Dataframe of input, or a dictionary, and 'documents' is the name of the column.
Let's suppose I have a csv of input like the following:
documents,label
Being offensive isnt illegal you idiot, negative
Loving the first days of summer! <3, positive
I hate when people put lol when we are having a serious talk ., negative
in python you will read the csv using pandas dataframe, for example:
sample=pd.read_csv('inputdata.csv',header=0)
and your sample['documents'] is the first colum of the input file. header =0 means that the label of your column are specified at the first line of the csv.
for doc in sample['documents'] will iterate over the first column, like this:
Being offensive isnt illegal you idiot
Loving the first days of summer! <3
I hate when people put lol when we are having a serious talk
This means that maybe the origin of your error is that you call your input data in some other ways instead of sample or it is not reading the header of the csv input.
If the csv doesn't have documents as the name of the header you can specify it like this:
columns = ['documents', 'labels']
sample = pd.read_csv(inputdata.csv', header = None, names = columns)
sample
Hope it helps !

Reading a dictionary from within a dictionary

I have a json file for tweet data. The data that I want to look at is the text of the tweet. For some reason, some of the tweets are too long to put into the normal text part of the dictionary.
It seems like there is a dictionary within another dictionary and I can't figure out how to access it very well.
Basically, what I want in the end is one column of a data frame that will have all of the text from each individual tweet. Here is a link to a small sample of the data that contains a problem tweet.
Here is the code I have so far:
import json
import pandas as pd
tweets = []
#This writes the json file so that I can work with it. This part works correctly.
with open("filelocation.txt") as source
for line in source:
if line.strip():
tweets.append(json.loads(line))
print(len(tweets)
df = pd.DataFrame.from_dict(tweets)
df.info()
When looking at the info you can see that there will be a column called extended_tweet that only encompasses one of the two sample tweets. Within this column, there seems to be another dictionary with one of those keys being full_text.
I want to add another column to the dataframe that just has this information along with the normal text column when the full_text is null.
My first thought was to try and read that specific column of the dataframe as a dictionary again using:
d = pd.DataFrame.from_dict(tweets['extended_tweet]['full_text])
But this doesn't work. I don't really understand why that doesn't work as that is how I read the data the first time.
My guess is that I can't look at the specific names because I am going back to the list and it would have to read all or none. The error it gives me says "KeyError: 'full_text' "
I also tried using the recommendation provided by this website. But this gave me a None value no matter what.
Thanks in advance!
I tried to do what #Dan D. suggested, however, this still gave me errors. But it gave me the idea to try this:
tweet[0]['extended_tweet']['full_text']
This works and gives me the value that I am looking for. But I need to run through the whole thing. So I tried this:
df['full'] = [tweet[i]['extended_tweet']['full_text'] for i in range(len(tweet))
This gives me "Key Error: 'extended_tweet' "
Does it seem like I am on the right track?
I would suggest to flatten out the dictionaries like this:
tweet = json.loads(line)
tweet['full_text'] = tweet['extended_tweet']['full_text']
tweets.append(tweet)
I don't know if the answer suggested earlier works. I never got that successfully. But I did figure out something else that works well for me.
What I really needed was a way to display the full text of a tweet. I first loaded the tweets from the json with what I posted above. Then I noticed that in the data file, there is something called truncated. If this value is true, the tweet is cut short and the full tweet is placed within the
tweet[i]['extended_tweet]['full_text]
In order to access it, I used this:
tweet_list = []
for i in range(len(tweets)):
if tweets[i]['truncated'] == 'True':
tweet_list.append(tweets[i]['extended_tweet']['full_text']
else:
tweet_list.append(tweets[i]['text']
Then I can work with the data using the whol text from each tweet.

Creating dict from csv file

I am working with a csv file containing tweets which was generated using this project: https://github.com/Jefferson-Henrique/GetOldTweets-python.
The 2 first tweets, and the headings in the csv file can be seen below:
username;date;retweets;favorites;text;geo;mentions;hashtags;id;permalink;;
thepsalami;02-04-2014 01:59;0;2;Must be #aprilfools because everyone is
saying #HIMYM is over! Haha it'll never stop as long as we hold fast to the
memories.;;;#aprilfools #HIMYM;
4,51147E+17;https://twitter.com/thepsalami/status/451146992131923968;;
shahanasiddiqui;02-04-2014 01:59;0;0;#promahuq yeah B-R was no surprise -
the ending was just right. My FB turned into #HIMYM blog site! Man that show
had a huge impact!;;#promahuq;#HIMYM;4,51147E+17;https://twitter.com/shahanasiddiqui/status
/451146991955759105;;
I want to save this in a dict such that I can easily access e.g. the username, the time or the text. I tried using csv.DictReader:
input_file = csv.DictReader(open("HIMYM_tweets.csv"))
But that results in something very weird:
{'username;date;retweets;favorites;text;geo;mentions;hashtags;id;permalink;;':
"thepsalami;02-04-2014 01:59;0;2;Must be #aprilfools because everyone is
saying #HIMYM is over! Haha it'll never stop as long as we hold fast to the
memories.;;;#aprilfools #HIMYM; 4", None:['51147E+17;https://twitter.com/thepsalami/status/451146992131923968;;']}
{'username;date;retweets;favorites;text;geo;mentions;hashtags;id;permalink;;': ' ....
Any help on creating such a dict, or maybe doing something smarter is very appreciated :D
As the comment by David you need to consider the delimeter when using the DictReader.
Just replace your code with this and it should work
input_file = csv.DictReader(open("HIMYM_tweets.csv"),delimeter=";")

Python - Searching a dictionary for strings

Basically, I have a troubleshooting program, which, I want the user to enter their input. Then, I take this input and split the words into separate strings. After that, I want to create a dictionary from the contents of a .CSV file, with the key as recognisable keywords and the second column as solutions. Finally, I want to check if any of the strings from the split users input are in the dictionary key, print the solution.
However, the problem I am facing is that I can do what I have stated above, however, it loops through and if my input was 'My phone is wet', and 'wet' was a recognisable keyword, it would go through and say 'Not recognised', 'Not recognised', 'Not recognised', then finally it would print the solution. It says not recognised so many times because the strings 'My', 'phone' and 'is' are not recognised.
So how do I test if a users split input is in my dictionary without it outputting 'Not recognised' etc..
Sorry if this was unclear, I'm quite confused by the whole matter.
Code:
import csv, easygui as eg
KeywordsCSV = dict(csv.reader(open('Keywords and Solutions.csv')))
Problem = eg.enterbox('Please enter your problem: ', 'Troubleshooting').lower().split()
for Problems, Solutions in (KeywordsCSV.items()):
pass
Note, I have the pass there, because this is the part I need help on.
My CSV file consists of:
problemKeyword | solution
For example;
wet Put the phone in a bowl of rice.
Your code reads like some ugly code golf. Let's clean it up before we look at how to solve the problem
import easygui as eg
import csv
# # KeywordsCSV = dict(csv.reader(open('Keywords and Solutions.csv')))
# why are you nesting THREE function calls? That's awful. Don't do that.
# KeywordsCSV should be named something different, too. `problems` is probably fine.
with open("Keywords and Solutions.csv") as f:
reader = csv.reader(f)
problems = dict(reader)
problem = eg.enterbox('Please enter your problem: ', 'Troubleshooting').lower().split()
# this one's not bad, but I lowercased your `Problem` because capital-case
# words are idiomatically class names. Chaining this many functions together isn't
# ideal, but for this one-shot case it's not awful.
Let's break a second here and notice that I changed something on literally every line of your code. Take time to familiarize yourself with PEP8 when you can! It will drastically improve any code you write in Python.
Anyway, once you've got a problems dict, and a problem that should be a KEY in that dict, you can do:
if problem in problems:
solution = problems[problem]
or even using the default return of dict.get:
solution = problems.get(problem)
# if KeyError: solution is None
If you wanted to loop this, you could do something like:
while True:
problem = eg.enterbox(...) # as above
solution = problems.get(problem)
if solution is None:
# invalid problem, warn the user
else:
# display the solution? Do whatever it is you're doing with it and...
break
Just have a boolean and an if after the loop that only runs if none of the words in the sentence were recognized.
I think you might be able to use something like:
for word in Problem:
if KeywordsCSV.has_key(word):
KeywordsCSV.get(word)
or the list comprehension:
[KeywordsCSV.get(word) for word in Problem if KeywordsCSV.has_key(word)]

Use generic keys in dictionary in Python

I am trying to name keys in my dictionary in a generic way because the name will be based on the data I get from a file. I am a new beginner to Python and I am not able to solve it, hope to get answer from u guys.
For example:
from collections import defaultdict
dic = defaultdict(dict)
dic = {}
if cycle = fergurson:
dic[cycle] = {}
if loop = mourinho:
a = 2
dic[cycle][loop] = {a}
Sorry if there is syntax error or any other mistake.
The variable fergurson and mourinho will be changing due to different files that I will import later on.
So I am expecting to see my output when i type :
dic[fergurson][mourinho]
the result will be:
>>>dic[fergurson][mourinho]
['2']
It will be done by using Python
Naming things, as they say, is one of the two hardest problems in Computer Science. That and cache invalidation and off-by-one errors.
Instead of focusing on what to call it now, think of how you're going to use the variable in your code a few lines down.
If you were to read code that was
for filename in directory_list:
print filename
It would be easy to presume that it is printing out a list of filenames
On the other hand, if the same code had different names
for a in b:
print a
it would be a lot less expressive as to what it is doing other than printing out a list of who knows what.
I know that this doesn't help what to call your 'dic' variable, but I hope that it gets you on the right track to find the right one for you.
i have found a way, if it is wrong please correct it
import re
dictionary={}
dsw = "I am a Geography teacher"
abc = "I am a clever student"
search = re.search(r'(?<=Geography )(.\w+)',dsw)
dictionary[search]={}
again = re.search(r'(?<=clever )(.\w+)' abc)
dictionary[search][again]={}
number = 56
dictionary[search][again]={number}
and so when you want to find your specific dictionary after running the program:
dictionary["teacher"]["student"]
you will get
>>>'56'
This is what i mean to

Categories