Using Python to get Excel data and fill form - python

I have an excel sheet with data I'd like to input into boxes on a web form.
import pandas as pd
df = pd.read_excel('C:\\Users\\jj\\Documents\\python_date_test.xlsx', sheet_name=0)
(df['bx1'][0])
The output of the above code is '2'
When I insert this code into the code I'm using to webcrawl, I get the following error 'TypeError: object of type 'numpy.int64' has no len()'
Here's the code that produced this error:
mea1 = browser.find_element_by_name("data1_14581")
mea1.click()
mea1.send_keys((df['bx1'][0]))
mea1.send_keys(Keys.TAB)
mea1 refers to the first box for user input.
How can I get the value of (df['bx1'][0]) and enter it into the box?

I haven't used this package, but from looking at it I believe you are on the right track. The value you read from the DataFrame is a numpy.int64, not a string, so try converting it before sending it:
mea1.send_keys(str((df['bx1'][0])))
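The error message suggests that send_keys calls len() on what it is given, which a numpy.int64 does not support. Here is a minimal sketch of why the str() conversion helps, assuming numpy is installed and using np.int64(2) as a stand-in for df['bx1'][0] (no browser is involved here):

```python
import numpy as np

# Stand-in for df['bx1'][0]; pandas returns numpy scalars for integer columns.
value = np.int64(2)

# A numpy integer has no length, which matches the TypeError in the question.
try:
    len(value)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Converting to str gives something that can be typed character by character.
text = str(value)
print(error_message)
print(repr(text))
```

The string in text is what would then be passed to mea1.send_keys.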

Related

How would I be able to remove this part of the variable?

I am writing a guessing game. The data for the game is in a CSV file, so I decided to use pandas. I import the CSV file, pick a random row, and put the data into variables for use in the rest of the code, but I can't figure out how to format the data in the variables correctly.
I've tried to split the string with split(), but I am quite lost.
ar = pandas.read_csv('names.csv')
ar.columns = ["Song Name","Artist","Intials"]
randomsong = ar.sample(1)
songartist = randomsong["Artist"]
songname = (randomsong["Song Name"])
songintials = randomsong["Intials"]
print(songname)
My CSV file looks like this.
Song Name,Artist,Intials
Someone you loved,Lewis Capaldi,SYL
Bad Guy,Billie Eilish,BG
Ransom,Lil Tecca,R
Wow,Post Malone, W
I expect the output to be the name of the song from the csv file. For Example
Bad Guy
Instead the output is
1 Bad Guy
Name: Song Name, dtype:object
If anyone knows the solution please let me know. Thanks
You're getting a Series object as output. You can try the following, where index=False drops the row label from the printed text:
randomsong["Song Name"].to_string(index=False)
Use df['column'].values to get the values of the column.
In your case, songartist = randomsong["Artist"].values[0] because you want only the first element of the returned array.
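To make the difference concrete, here is a small sketch that rebuilds the CSV from the question in memory (io.StringIO stands in for names.csv, which is an assumption made only to keep the example self-contained) and pulls plain strings out of the sampled row:

```python
import io
import pandas as pd

# In-memory copy of the CSV from the question, standing in for names.csv.
csv_text = """Song Name,Artist,Intials
Someone you loved,Lewis Capaldi,SYL
Bad Guy,Billie Eilish,BG
Ransom,Lil Tecca,R
Wow,Post Malone,W
"""
ar = pd.read_csv(io.StringIO(csv_text))

# sample(1) returns a one-row DataFrame, so each column is a one-element Series.
randomsong = ar.sample(1)

# Take the single element out of each Series to get a plain Python string.
songname = randomsong["Song Name"].iloc[0]
songartist = randomsong["Artist"].values[0]
print(songname)
```

Both .iloc[0] and .values[0] return the bare value, so print shows only the song name, without the index or the dtype line.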

Reading a dictionary from within a dictionary

I have a json file for tweet data. The data that I want to look at is the text of the tweet. For some reason, some of the tweets are too long to put into the normal text part of the dictionary.
It seems like there is a dictionary within another dictionary and I can't figure out how to access it very well.
Basically, what I want in the end is one column of a data frame that will have all of the text from each individual tweet. Here is a link to a small sample of the data that contains a problem tweet.
Here is the code I have so far:
import json
import pandas as pd
tweets = []
# This reads the json file, one tweet per line, so that I can work with it. This part works correctly.
with open("filelocation.txt") as source:
    for line in source:
        # Skip blank lines between tweets
        if line.strip():
            tweets.append(json.loads(line))
print(len(tweets))
df = pd.DataFrame.from_dict(tweets)
df.info()
When looking at the info you can see that there will be a column called extended_tweet that only encompasses one of the two sample tweets. Within this column, there seems to be another dictionary with one of those keys being full_text.
I want to add another column to the dataframe that just has this information along with the normal text column when the full_text is null.
My first thought was to try and read that specific column of the dataframe as a dictionary again using:
d = pd.DataFrame.from_dict(tweets['extended_tweet']['full_text'])
But this doesn't work. I don't really understand why that doesn't work as that is how I read the data the first time.
My guess is that I can't look at the specific names because I am going back to the list and it would have to read all or none. The error it gives me says "KeyError: 'full_text' "
I also tried using the recommendation provided by this website. But this gave me a None value no matter what.
Thanks in advance!
I tried to do what @Dan D. suggested, however, this still gave me errors. But it gave me the idea to try this:
tweets[0]['extended_tweet']['full_text']
This works and gives me the value that I am looking for. But I need to run through the whole thing. So I tried this:
df['full'] = [tweets[i]['extended_tweet']['full_text'] for i in range(len(tweets))]
This gives me "KeyError: 'extended_tweet'"
Does it seem like I am on the right track?
I would suggest to flatten out the dictionaries like this:
tweet = json.loads(line)
tweet['full_text'] = tweet['extended_tweet']['full_text']
tweets.append(tweet)
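Note that the lookup above raises a KeyError for any tweet that has no extended_tweet key. Here is a hedged sketch of the same flattening idea that falls back to the normal text instead, using two made-up JSON lines in place of the real file (the sample content is an assumption, only the key names come from the question):

```python
import json
import pandas as pd

# Two invented lines standing in for the real file: one short tweet, one truncated.
lines = [
    '{"text": "short tweet", "truncated": false}',
    '{"text": "long tweet cut off...", "truncated": true, '
    '"extended_tweet": {"full_text": "long tweet in full"}}',
]

tweets = []
for line in lines:
    if line.strip():
        tweet = json.loads(line)
        # dict.get returns the default instead of raising when the key is missing.
        extended = tweet.get("extended_tweet", {})
        tweet["full_text"] = extended.get("full_text", tweet["text"])
        tweets.append(tweet)

df = pd.DataFrame(tweets)
print(df["full_text"].tolist())
```

Every row then has a full_text value, whether or not the tweet was truncated.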
I don't know if the answer suggested earlier works. I never got that successfully. But I did figure out something else that works well for me.
What I really needed was a way to display the full text of a tweet. I first loaded the tweets from the json with what I posted above. Then I noticed that in the data file, there is something called truncated. If this value is true, the tweet is cut short and the full tweet is placed within the
tweets[i]['extended_tweet']['full_text']
In order to access it, I used this:
tweet_list = []
for i in range(len(tweets)):
    # json.loads parses JSON true as the boolean True; if your file
    # stores the string 'True' instead, compare against that.
    if tweets[i]['truncated'] is True:
        tweet_list.append(tweets[i]['extended_tweet']['full_text'])
    else:
        tweet_list.append(tweets[i]['text'])
Then I can work with the data using the whole text from each tweet.
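If the tweets are already in a DataFrame, the same fallback can be written against the extended_tweet column directly. This is only a sketch under the assumption that the column holds a dictionary for truncated tweets and a missing value otherwise; the helper name extended_text and the sample tweets are made up for illustration:

```python
import pandas as pd

# Invented sample data using the key names from the question.
tweets = [
    {"text": "short tweet", "truncated": False},
    {"text": "long tweet cut off...", "truncated": True,
     "extended_tweet": {"full_text": "long tweet in full"}},
]
df = pd.DataFrame(tweets)

def extended_text(value):
    # Hypothetical helper: rows without an extended tweet hold NaN, not a dict.
    return value.get("full_text") if isinstance(value, dict) else None

# Use the extended text where it exists, otherwise fall back to the normal text.
df["full"] = df["extended_tweet"].apply(extended_text).fillna(df["text"])
print(df["full"].tolist())
```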

How do I print data in a csv file that corresponds to another column?

I am attempting to print every value under the attribute Title of a csv file that corresponds to the value ('Image segmentation') under the attribute Field. I've tried numerous approaches to figure this out but keep getting the wrong answer or no answer at all. My latest attempt is below. I'm not sure where to go from here, so any help is appreciated.
import pandas as pd
data_file=pd.read_csv('7papers.csv')
data_file.columns=data_file.columns.str.strip()
field=data_file.Field
title=data_file.Title
for field in data_file:
    if field == ('Image segmentation'):
        print(title)
I'm not certain I'm following your question and as commented above seeing the csv file would help, but it sounds like you are wanting to do this:
print(data_file.loc[data_file.Field == 'Image segmentation', 'Title'])
This says "select the rows where Field has the value we want and then print the value of Title in those rows". Does that work?
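Since the csv file was not posted, here is a small sketch of that selection on an invented three-row frame (the titles and the second field name are assumptions, only the column names Title and Field come from the question):

```python
import pandas as pd

# Made-up stand-in for 7papers.csv.
data_file = pd.DataFrame({
    "Title": ["Paper A", "Paper B", "Paper C"],
    "Field": ["Image segmentation", "Object detection", "Image segmentation"],
})

# Boolean mask: True for the rows whose Field matches.
mask = data_file["Field"] == "Image segmentation"

# .loc with the mask and a column name keeps only the matching titles.
titles = data_file.loc[mask, "Title"]
for title in titles:
    print(title)
```

The loop in the question iterates over the column names of the DataFrame rather than over its rows, which is why the comparison never behaves as expected there.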

adding character to start of each value in column pandas

I have a sample data set:
ID sequence
H100 ATTCCT
H231 CTGGGA
H2002 CCCCCCA
I simply want to add a ">" in front of each ID:
ID sequence
>H100 ATTCCT
>H231 CTGGGA
>H2002 CCCCCCA
From this post Append string to the start of each value in a said column of a pandas dataframe (elegantly)
I got the code :
df["ID"] = '>' + df["ID"].astype(str)
However, this warning message came up:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
so I tried:
df.loc[: , "ID"] = '>'
The same warning message came up.
How should I correct this?
Thanks
Give this a shot - works for me in Python 3.5:
df['ID'] = ('>' + df['ID'])
If that won't do it, you may have to refer to df.iloc[:,1] for example (just type it in the terminal first to ensure you grabbed the right field where ID is located).
The other problem you may be experiencing is that your dataframe was created as a slice of another dataframe. Try converting your "slice" to its own dataframe:
dataframename = pandas.DataFrame(dataframename)
Then do the code snip I posted.
Best - Matt
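A common way to make the "slice" its own DataFrame is to call .copy() when it is created. Here is a minimal sketch, assuming df came from filtering a larger frame (the frame named full and the length filter are hypothetical, added only to reproduce that situation):

```python
import pandas as pd

# Hypothetical parent frame that df is sliced from.
full = pd.DataFrame({
    "ID": ["H100", "H231", "H2002"],
    "sequence": ["ATTCCT", "CTGGGA", "CCCCCCA"],
})

# .copy() gives df its own data, so assigning to it is unambiguous.
df = full[full["sequence"].str.len() >= 6].copy()

# Prepend ">" to every ID in the copy; the parent frame is left untouched.
df["ID"] = ">" + df["ID"].astype(str)
print(df)
```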

Unable to convert pandas DataFrame to json using to_json

I am aware that there are several other posts on Stack Overflow regarding this same issue, however, not a single solution found on those posts, or any other post I've found online for that matter, has worked. I have followed numerous tutorials, videos, books, and Stack Overflow posts on pandas and all mentioned solutions have failed.
The frustrating thing is that all the solutions I have found are correct, or at least they should be; I am fairly new to pandas so my only conclusion is that I am probably doing something wrong.
Here is the pandas documentation that I started with: Pandas to_json Doc. I can't seem to get pandas to_json to convert a pandas DataFrame to a json object or json string.
Basically, I want to convert a csv string into a DataFrame, then convert that DataFrame into a json object or json string (I don't care which one). Then, once I have my json data structure, I'm going to bind it to a D3.js bar chart.
Here is an example of what I am trying to do:
# Declare my csv string (Works):
csvStr = '"pid","dos","facility","a1c_val"\n"123456","2013-01-01 13:37:00","UOFU",5.4\n"65432","2014-01-01 14:32:00","UOFU",5.8\n"65432","2013-01-01 13:01:00","UOFU",6.4'
print (csvStr) # Just checking the variables contents
# Read csv and convert to DataFrame (Works):
csvDf = pandas.read_csv(StringIO.StringIO(csvStr))
print (csvDf) # Just checking the variables contents
# Convert DataFrame to json (Three of the ways I tried - None of them work):
myJSON = csvDf.to_json(path_or_buf = None, orient = 'record', date_format = 'epoch', double_precision = 10, force_ascii = True, date_unit = 'ms', default_handler = None) # Attempt 1
print (myJSON) # Just checking the variables contents
myJSON = csvDf.to_json() # Attempt 2
print (myJSON) # Just checking the variables contents
myJSON = pandas.io.json.to_json(csvDf)
print (myJSON) # Just checking the variables contents
The error that I am getting is:
argument 1 must be string or read-only character buffer, not DataFrame
Which is misleading because the documentation says "A Series or DataFrame can be converted to a valid JSON string."
Regardless, I tried giving it a string anyway, and it resulted in the exact same error.
I have tried creating a test scenario, following the exact steps from books and other tutorials and/or posts, and it just results in the same error. At this point, I need a simple solution asap. I am open to suggestions, but I must emphasize that I do not have time to waste on learning a completely new library.
For your first attempt, the correct string is 'records', not 'record'. This worked for me:
myJSON = csvDf.to_json(path_or_buf = None, orient = 'records', date_format = 'epoch', double_precision = 10, force_ascii = True, date_unit = 'ms', default_handler = None) # Attempt 1
Printing gives:
[{"pid":123456,"dos":"2013-01-01 13:37:00","facility":"UOFU","a1c_val":5.4},
{"pid":65432,"dos":"2014-01-01 14:32:00","facility":"UOFU","a1c_val":5.8},
{"pid":65432,"dos":"2013-01-01 13:01:00","facility":"UOFU","a1c_val":6.4}]
It turns out that the problem was because of my own stupid mistake. While testing my use of to_json, I copied and pasted an example into my code and went from there. Thinking I had commented out that code, I proceeded to try using to_json with my test data. It turns out the error I was receiving was being thrown by the example code that I had copied and pasted. Once I deleted everything and rewrote it using my test data, it worked.
However, as user667648 (Bair) pointed out, there was another mistake in my code. The orient param was supposed to be orient = 'records' and NOT orient = 'record'.
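For reference, here is a short sketch of the whole round trip with the corrected orient value. It assumes Python 3, where io.StringIO replaces the StringIO.StringIO call used in the question, and it parses the result with json.loads only to check it:

```python
import io
import json
import pandas as pd

# Same csv string as in the question.
csv_str = (
    '"pid","dos","facility","a1c_val"\n'
    '"123456","2013-01-01 13:37:00","UOFU",5.4\n'
    '"65432","2014-01-01 14:32:00","UOFU",5.8\n'
    '"65432","2013-01-01 13:01:00","UOFU",6.4'
)

# Read the csv string into a DataFrame.
csv_df = pd.read_csv(io.StringIO(csv_str))

# orient='records' yields a JSON array with one object per row.
my_json = csv_df.to_json(orient="records")

# Parse it back to confirm the string is valid JSON.
records = json.loads(my_json)
print(records[0])
```

A list of row objects like this is a shape that is straightforward to bind to a D3.js chart.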
