I'm trying to move two dataframes from notebook1 to notebook2.
I've tried using nbimporter:
import nbimporter
import notebook1 as nb1
nb1.df()
Which returns:
AttributeError: module 'notebook1' has no attribute 'df' (even though notebook1 does define df)
I also tried using the ipynb package, but that didn't work either.
I would just write them to an Excel file and read them back, but the index gets messed up when reading it in the other notebook.
You could use a magic command (that's literally what it's called, not me being cute lol) called %store. It works like this:
In notebook A:
df = pd.DataFrame(...)
%store df # Store the variable df in the IPython database
Then in another notebook B:
%store -r # This will load variables from the IPython database
df
An advantage of this approach is that you won't run into problems with datatypes changing or indexes getting messed up. This will work with variable types other than pandas dataframes too.
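Since the question involves two dataframes, note that %store accepts several variable names at once, both when storing and when restoring (df1 and df2 here stand in for your two dataframes):
In notebook A:
%store df1 df2 # store both dataframes in the IPython database
In notebook B:
%store -r df1 df2 # restore only these two variables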
The official IPython documentation describes some more features of %store.
You could do something like this to save it as a csv:
df.to_csv('example.csv')
Then, to read it back in the other notebook, simply use:
df = pd.read_csv('example.csv', index_col=0)
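As a quick sanity check that the index survives the round trip, here is a minimal sketch (reusing the example.csv name from above; the data is made up):
import pandas as pd
df = pd.DataFrame({'a': [1, 2]}, index=['x', 'y'])
df.to_csv('example.csv')  # the index is written as the first column
restored = pd.read_csv('example.csv', index_col=0)  # read that column back in as the index
print(restored.index)  # Index(['x', 'y'], dtype='object')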
I suggest using pickle to save and then load your dataframe.
From the first notebook:
df.to_pickle("./df.pkl")
Then, from the second notebook:
df = pd.read_pickle("./df.pkl")
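Unlike a CSV round trip, pickle preserves dtypes exactly. A small sketch illustrating this (the ts column is made up for the example):
import pandas as pd
df = pd.DataFrame({'ts': pd.to_datetime(['2021-01-01', '2021-01-02'])})
df.to_pickle('./df.pkl')
restored = pd.read_pickle('./df.pkl')
print(restored.dtypes)  # ts is still datetime64[ns]; via CSV it would come back as object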
I am working on multiple data sets with similar attributes (column names) in a Jupyter notebook, but it is really tiresome to run all the commands again and again for each data set to achieve the same result. Can anyone let me know if I can automate the process and run it for various data sets? Let's say I'm running the following commands for one data set in the notebook:
data = pd.read_csv(r"/d/newfolder/test.csv", low_memory=False)
data.head()
list(data.columns)
data_new = data.sort_values(by='column_name')
Now I'd like to save all these commands in one function and run it for different data sets in the notebook.
Can anyone suggest possible ways to do this? Thanks in advance.
IIUC, your issue is that something like print(df) doesn't display as nicely as just having df as the last line in a Jupyter cell.
You can have the pretty output whenever you want (as long as your jupyter is updated) by using display!
Modifying your code:
import pandas as pd
from IPython.display import display  # display is built in to Jupyter; the import makes the dependency explicit

def process_data(file):
    data = pd.read_csv(file, low_memory=False)
    display(data.head())
    display(data.columns)
    data_new = data.sort_values(by='column_name')
    display(data_new.head())

process_data(r"/d/newfolder/test.csv")
This will output data.head(), data.columns, and data_new.head() from a single cell.
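Since the goal is to repeat this for several data sets, you can then call the function once per file, either in separate cells or in a loop (the second path here is hypothetical):
for f in [r"/d/newfolder/test.csv", r"/d/newfolder/test2.csv"]:
    process_data(f)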
I have a lot of different files that I'm trying to load into pandas in a Pythonic way, and I also want each one in its own cell to keep things readable. I actually have 36 different variables, but to keep things simple, I'll show an example with three dataframes.
Let's say I'm loading these CSV files into dataframes, each in its own automatically generated cell.
file_list = ['df1.csv', 'df2.csv', 'df3.csv']
name_list = ['df1', 'df2', 'df3']
I could easily create three different cells and type:
df1 = pd.read_csv('df1.csv')
But there are dozens of different CSVs, and I want to do similar things to each of them (like deleting columns), so there must be an easier way.
I've done something such as:
var_list = []
for file, name in zip(file_list, name_list):
    var_name = name
    var_file = pd.read_csv(file)
    var_list.append((file, name, var_file))
print(var_list)
But this all occurs in the same cell.
Now, I've looked at the IPython docs, since I believe that's the relevant package, but I couldn't find anything. I appreciate your help.
From what I understand, you need to load the content of several .csv files into several pandas dataframes, and you want to execute a repeatable process on each of them. You're not sure they will all load correctly, but you still want to get the most out of them, so you want to run each process in its own Jupyter cell.
As ddejohn pointed out, I don't know if this is the best option, but I think it's a cool question anyway. The code below generates several cells, each with a common structure but different variables (in my example, I simply sort each loaded dataframe by age). It is based on "How to programmatically create several new cells in a Jupyter notebook", which should get the credit if this is indeed what you were looking for:
from IPython.core.getipython import get_ipython
import pandas as pd

def create_new_cell(contents):
    # Ask the running IPython shell to queue a new input cell with the given source
    shell = get_ipython()
    payload = dict(
        source='set_next_input',
        text=contents,
        replace=False,
    )
    shell.payload_manager.write_payload(payload, single=False)

def get_df(file_name, df_name):
    # Build the source code for one cell: load, sort, and display one dataframe
    content = "{df} = pd.read_csv('{file}', names=['Name', 'Age', 'Height'])\n" \
              "{df}.sort_values(by='Age', inplace=True)\n" \
              "{df}".format(df=df_name, file=file_name)
    create_new_cell(content)

file_list = ['filename_1.csv', 'filename_2.csv']
name_list = ['df1', 'df2']

for file, name in zip(file_list, name_list):
    get_df(file, name)
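Running that final loop writes one new cell per file below the current one, each prefilled with its own read_csv/sort_values code, so you can execute and tweak them independently.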
I am starting to learn the pandas module in Python. However, my issue is with the rename method. The rename appears to work fine when I use print, which shows the column as renamed:
print(data.rename(columns={"Rep": "Name"}))
However, when I use print(data) to show all of the data from the document, the column does not show as renamed. The rename also doesn't appear when the file is exported using data.to_csv("example.csv").
Would really appreciate if somebody could shed some light on this please.
Full Source code below:
import pandas as pd
data = pd.read_excel(r"D:\Downloads\Book1.xlsx")
del data["Region"]
del data["Item"]
print(data.rename(columns={"Rep": "Name"}))
print(data)
data.to_csv("example.csv")
Use the inplace argument to make the change apply to the DataFrame itself, like this:
data.rename(columns={"Rep": "Name"}, inplace = True)
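Alternatively, assign the result back instead of mutating in place, which is the pattern the pandas documentation generally favors:
data = data.rename(columns={"Rep": "Name"})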
Try adding inplace=True to data.rename. Note that with inplace=True the method returns None, so rename first and then print the dataframe:
data.rename(columns={"Rep": "Name"}, inplace=True)
print(data)
I'm new on this site, so please be indulgent if I make a mistake :)
I recently imported a CSV file into my Jupyter notebook for a student project. I want to use data from a specific column of this file. The problem is that after the import, the file appears as a table with 5286 rows (representing dates and times of measurements) in a single column that crams together all the variables, separated by ;, that I want to use for my work.
I don't know how to turn this into a regular table.
I used this code to import my CSV file:
import pandas as pd
data = pd.read_csv('/work/Weather_data/data 1998-2003.csv', error_bad_lines=False)
Output: a dataframe with all values crammed into a single column.
Desired output: the same data split into multiple columns on the ; separator.
You can try this:
import pandas as pd
data = pd.read_csv('<location>', sep=';')
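If you also want to skip malformed rows (what the error_bad_lines argument in the question was attempting), note that error_bad_lines was deprecated in pandas 1.3; the current spelling is:
data = pd.read_csv('<location>', sep=';', on_bad_lines='skip')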
First question: please be kind.
I am having trouble loading a CSV file into a DataFrame in Spyder, using IPython. When I load an XLS file, there seems to be no problem, and the new DataFrame variable appears in the Variable Explorer.
For example:
import pandas as pd
energy = pd.read_excel('file.xls', skiprows=17)
The above returns a DataFrame, named energy, populated in the variable explorer (i.e. I can actually see the DataFrame).
However, when I try to load a CSV file the same way, it seems to read the file, but the variable does not appear in the Variable Explorer.
For example:
import pandas as pd
GDP = pd.read_csv('file.csv')
When I run the above line, I don't get an error message, but the new DataFrame, GDP, does not appear in the Variable Explorer. If I print GDP, I get the values (268 rows x 60 columns). Am I not saving the new DataFrame correctly as a variable?
Thanks!
The problem is not with the variable but with the way the Variable Explorer filters what it shows. Go to "Tools > Preferences", select "Variable explorer", and uncheck the option "Exclude all-uppercase references". Since GDP is all uppercase, that filter was hiding it.
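Alternatively, a variable name that isn't all uppercase is never caught by that filter, so this would also show up without changing any settings:
gdp = pd.read_csv('file.csv')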