I don't know if I'm asking this question right, so feel free to ask for more info if needed.
I create a dataframe by reading a CSV file inside a button callback, and then I want to use it for other tasks. I want that df to stay "active", but it seems the dataframe isn't recognised outside of the button callback.
def on_button_clicked(b):
    df = pd.read_csv(f"./siivous/cleanedfiles/node_{karry.value}.csv")
    with output:
        display(df)
        display(img)
        clear_output(wait=True)
So how can I make that dataframe available after a click of the button? For example, if I then write print(df), it should print that df.
Your dataframe named df is defined inside a function, so it is local to that function and you cannot access it from outside.
I suggest you check out this thread.
I hope it helped!
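A minimal sketch of one way around this, assuming it is acceptable to keep the dataframe in a module-level container that the callback fills in. The names state and CSV_TEXT are made up for illustration, and the click is simulated by calling the function directly so the snippet runs without ipywidgets:

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file on disk.
CSV_TEXT = "node,value\n1,10\n2,20\n"

# Module-level container: it outlives the callback, unlike a local variable.
state = {"df": None}


def on_button_clicked(b):
    # Store the result in the container instead of a local name.
    state["df"] = pd.read_csv(io.StringIO(CSV_TEXT))


# Simulate a button click; in a notebook the widget would call this for you.
on_button_clicked(None)

# The dataframe is now reachable outside the callback.
print(state["df"])
```

Declaring global df at the top of the callback would also work; the container just keeps the shared state in one obvious place.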
I was trying to get a data frame of spam messages so I can analyze them. This is what the original CSV file looks like.
I want it to be like
This is what I had tried:
###import the original CSV (it's simplified sample which has only two columns - sender, text)
import pandas as pd
df = pd.read_csv("spam.csv")
### if any of those is in the text column, I'll put that row in the new data frame.
keyword = ["prize", "bit.ly", "shorturl"]
### putting rows that have a keyword into a new data frame.
spam_list = df[df['text'].str.contains('|'.join(keyword))]
### creating a new column 'detected keyword' and trying to show what was detected keyword
spam_list['detected word'] = keyword
spam_list
However, "detected word" is in order of the list.
I know it's because I put the list into the new column, but I couldn't think/find a better way to do this. Should I have used "for" as the solution? Or am I approaching it in a totally wrong way?
You can define a function that gets the result for each row:
def detect_keyword(row):
    for key in keyword:
        if key in row['text']:
            return key
then get it done for all rows with pandas.apply() and save results as a new column:
df['detected_word'] = df.apply(lambda x: detect_keyword(x), axis=1)
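As a possible alternative to the row-wise apply, here is a sketch that uses Series.str.extract to pull out the first keyword that appears in each text. The sample rows are made up for illustration, and re.escape is used because "bit.ly" contains a dot, which is a regex metacharacter:

```python
import re

import pandas as pd

# Made-up sample rows; the real data would come from spam.csv.
df = pd.DataFrame({
    "sender": ["a", "b", "c"],
    "text": ["You won a prize", "visit bit.ly/abc now", "hello friend"],
})

keyword = ["prize", "bit.ly", "shorturl"]

# One capture group containing the escaped keywords joined with "|".
pattern = "(" + "|".join(re.escape(k) for k in keyword) + ")"

# expand=False returns a Series; rows without a match get NaN.
df["detected_word"] = df["text"].str.extract(pattern, expand=False)

# Keep only the rows where a keyword was found.
spam_list = df[df["detected_word"].notna()]
print(spam_list)
```

Note that this reports the first match by position in the text, while the loop above reports the first match in the order of the keyword list; the two can differ when a text contains several keywords.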
You can use the code shown in the picture below to solve your problem. I wasn't able to paste the code as text because Stack Overflow doesn't allow posting short links. The link to the code is available.
The code has been adapted from here
Been working on this project all day and it's destroying me. Currently have finished web scraping and have a final .csv which contains the elements of a pandas dataframe. Working with this dataframe in a new file, and currently have the following:
df = pd.read_csv('active_homes.csv')
for i in range(len(df)):
    add = df['Address'][i]
    price = df['Price'][i]
    if (price<100000) == True:
        print(price)
'active_homes.csv' looks like this:
Address,Status,Price,Meta
"387 8th St, Burlington, CO 80807",For Sale,169500,"4bed2bath1,560sqft"
and the resulting df's shape is (1764, 4).
This should, in theory, print the price for each iteration of price<100000.
In practice, it prints this:
I have confirmed that at each iteration of the above for loop, it is collecting the correct 'Price' and 'Address' information, and have also confirmed that at each interval the logic (price<100000) is working correctly. However, it is still doing the above. I was originally trying to just drop the rows of the dataframe that were <100000 but that wasn't doing anything. I was also trying to reassign the data to a new dataframe and it would either return an empty dataframe, or return a dataframe with duplicate data of this house (with the 'Price' of 58900).
So far, from all of that, I believe that the program is recognizing the amount of correct houses < 100000, but for some reason the assignment is sticking for the one address. It also does the same thing without assignment, as in:
for i in range(len(df)):
    if (df['Price'][i]<100000) == True:
        print(df['Price'][i])
Any help in identifying the error would be much appreciated.
With pandas, you generally try to avoid iterating over rows in the traditional Python way. Instead, you could achieve the desired result by filtering with a boolean condition:
df = pd.read_csv('active_homes.csv')
temp_df = df[df["Price"]<100000] # initiating a new df isn't required, just a force of a habit
print(temp_df["Price"]) # displaying a series of houses that are below 100K; imo prettier print
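To make the filtering approach concrete, here is a small self-contained sketch. The first CSV row is the one from the question; the second row is invented for illustration (only its price, 58900, is mentioned in the question). It also shows the reverse filter, which is one way to "drop" the cheap rows:

```python
import io

import pandas as pd

# First data row is from the question; the second is a made-up example.
csv_text = '''Address,Status,Price,Meta
"387 8th St, Burlington, CO 80807",For Sale,169500,"4bed2bath1,560sqft"
"1 Example Rd, Nowhere, CO 00000",For Sale,58900,"2bed1bath900sqft"
'''

homes = pd.read_csv(io.StringIO(csv_text))

# Boolean mask: True for every row whose price is below 100000.
mask = homes["Price"] < 100000

cheap = homes[mask]                           # rows under 100K
kept = homes[~mask].reset_index(drop=True)    # the same data with those rows dropped

print(cheap["Price"])
print(kept)
```

Filtering returns a new dataframe, so the result has to be assigned to a name (as with cheap and kept here); the original homes is left unchanged.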
I am trying to figure out a way to call a dataframe in a different python script using a variable.
I have a main dataframe (maindf) in main.py that holds the names of all the "sub" dataframes (called df1, df2....df9) located in other.py
other.py is imported properly using import others; I've also used from others import df1.
Variable gets created by looping through the main dataframe to get the correct name of one of the sub dataframes using dfname = (maindf.loc[i, ['dfnames']].values[0]).
What I'm currently doing to access the correct dataframe once the variable is created is using if statements, and it makes me wanna vomit just looking at it.
if dfname == "df1":
    df = others.df1
if dfname == "df2":
    df = others.df2
if dfname == "df3":
    df = others.df3
print(df)
Except with many more of these if statements. It gets me the result I want, but there's got to be a better way to go about it.
My original idea was to do this:
df = others.dfname
print(df)
I also tried moving the dataframes df1-df9 into main.py, but I still can't call them using a variable.
I strongly agree with @TimRoberts: use a container. Dictionaries are perfect for this case, as you could do other.dict_container['df1'].
That said, you can access the attribute by name with getattr: getattr(other, 'df1')
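Both options can be sketched as follows. To keep the snippet runnable on its own, a types.SimpleNamespace stands in for the imported module, and the frame contents and the name frames are made up for illustration:

```python
import types

import pandas as pd

# Stand-in for the imported module that defines df1, df2, ...
others = types.SimpleNamespace(
    df1=pd.DataFrame({"x": [1, 2]}),
    df2=pd.DataFrame({"x": [3, 4]}),
)

# Option 1: a dictionary that maps names to dataframes.
frames = {"df1": others.df1, "df2": others.df2}

# The name would normally come from the main dataframe.
dfname = "df2"

df_from_dict = frames[dfname]

# Option 2: look the attribute up by name on the module object.
df_from_attr = getattr(others, dfname)

print(df_from_dict)
```

With the dictionary, an unknown name raises KeyError; with getattr, it raises AttributeError unless a default is passed as a third argument.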
I've been looking around but could not find a similar post, so I thought I'd give it a go.
I wrote a pandas program that successfully displays the resulting dataframe in pandas table format in a tkinter textbox. The aim is that the user can select the data and copy/paste it into an (existing) Excel sheet. When doing this, the index is always copied as well. I was wondering if one could programmatically select the complete table except the index?
I know that one can save to Excel or other formats with index=False, but I could not find a kind of df.select....index=False. I hope my explanation is more or less clear ;-)
Thanks a lot
You could use the dataframe's to_string function, which accepts index=False as one of its parameters. For example, say we have this df:
import pandas as pd
df = pd.DataFrame({'a': ['yes', 'no', 'yes' ], 'b': [10, 5, 20]})
print(df.to_string(index = False))
This would give you:
a b
yes 10
no 5
yes 20
Hope this helps!
I finally found it.
Instead of using something like self.mytable.copy('columns') to select everything and then switch to Excel and paste it, I use this line of code which does exactly what I need :
df.to_clipboard(sep="\t", index=False)
The sep="\t" makes it split up amongst columns in Excel.
Hopefully someone can use this at some stage.
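For anyone who wants to check what text ends up being pasted, here is a small sketch that builds the same tab-separated, index-free text as a string with to_csv, so it runs even where no clipboard is available. It reuses the example df from the answer above; treating this string as equivalent to the clipboard contents is an assumption:

```python
import pandas as pd

df = pd.DataFrame({'a': ['yes', 'no', 'yes'], 'b': [10, 5, 20]})

# With no path given, to_csv returns the text instead of writing a file.
# sep="\t" puts a tab between columns; index=False leaves the index out.
text = df.to_csv(sep="\t", index=False)

print(text)

# On a machine with a clipboard, the equivalent copy would be:
# df.to_clipboard(sep="\t", index=False)
```

Excel splits pasted text into cells on tab characters, which is why the tab separator lines the columns up.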
Using Jupyter Notebook, if I put in the following code:
import pandas as pd
df = pd.read_csv('path/to/csv')
while True:
    df
The dataframe won't show. Can anyone tell me why this is the case? I'm guessing it's because the constant looping is preventing the dataframe from loading fully. Is that what's happening here?
I need code that would let me get a user's input. If they type in a name, for example, I'll extract the person with that name's info from the dataframe and display it, then the program needs to ask them to give another name. This will continue until they type in "quit". I figured a while loop would be the best for that, but it looks like there's just something about while loops and pandas that won't mix. Does anyone have any suggestions on what I can do instead?
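A sketch of one way to do this. Two things seem to be going on in the snippet above: Jupyter only auto-displays the value of the last expression in a cell, so a bare df inside a loop shows nothing, and while True never ends without a break. Calling print (or display in a notebook) explicitly and breaking on "quit" addresses both. The dataframe, the column name "name", and the helper lookup_loop are made up for illustration, and the user's answers are scripted here so the snippet runs unattended:

```python
import pandas as pd

# Made-up data; the real frame would come from pd.read_csv.
people = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 45]})


def lookup_loop(df, read_name=input):
    """Ask for names until 'quit' is entered; return the matches shown."""
    shown = []
    while True:
        name = read_name("Name (or 'quit'): ")
        if name == "quit":
            break  # leave the loop so the cell can finish
        match = df[df["name"] == name]
        print(match)  # explicit output; display(match) also works in Jupyter
        shown.append(match)
    return shown


# Scripted answers stand in for a user typing "Ann" and then "quit".
answers = iter(["Ann", "quit"])
results = lookup_loop(people, read_name=lambda prompt: next(answers))
```

In a notebook, calling lookup_loop(people) with the default read_name would prompt interactively through input.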