First question: please be kind.
I am having trouble loading a CSV file into a DataFrame in Spyder, using IPython. When I load an XLS file, there is no problem and the new DataFrame variable shows up in the Variable Explorer.
For example:
import pandas as pd
energy = pd.read_excel('file.xls', skiprows=17)
The above returns a DataFrame named energy that shows up in the Variable Explorer (i.e. I can actually see the DataFrame).
However, when I try to load a CSV file the same way, the file seems to be read, but the DataFrame does not appear in the Variable Explorer.
For example:
import pandas as pd
GDP = pd.read_csv('file.csv')
When I run the above line, I don't get an error message, but the new DataFrame, GDP, does not appear in the Variable Explorer. If I print GDP I get the values (268 rows x 60 columns). Am I not saving the new DataFrame correctly as a variable?
Thanks!
The problem is not with the variable, but with the way the Variable Explorer filters what it shows. Go to "Tools/Preferences", select "Variable explorer", and uncheck the option "Exclude all-uppercase references".
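If you prefer not to change that setting, a name that is not all uppercase is not filtered out, for example:
import pandas as pd
gdp = pd.read_csv('file.csv')  # lowercase name, so the all-uppercase filter does not hide it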
Hi, I am trying to create multiple CSV files from a single big CSV using Python. The original CSV file has 1-minute date/time data for multiple stocks, with Open, High, Low, Close and Volume as the other columns.
Sample data from the original file is here.
At first, I tried to copy a single Ticker and all its corresponding rows to a new file with the following code:
import pandas as pd
excel_file_path = r'C:\Users\mahan\Documents\test projects\01_07_APR_WEEKLY_expiry_data_VEGE_NF_AND_BNF_Options_Desktop_Vege.csv'
export_path = r"C:\Users\mahan\Documents\exportfiles\{output_file_name}_sheet.csv"
data = pd.read_csv(excel_file_path, index_col="Ticker")  # Making a DataFrame from the csv file
rows = data.loc[['NIFTYWK17500CE']]  # Retrieving rows by the loc method
output_file_name = "NIFTYWK17500CE_"
print(type(rows))
rows
rows.to_csv(export_path)
Result was something like this:
a file was saved with the name "{output_file_name}__sheet.csv"
I failed at naming the file, but the data for all rows with Ticker value 'NIFTYWK17500CE' was copied correctly.
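For the naming issue, one possible fix (just a sketch reusing the same paths) is to define output_file_name first and build export_path as an f-string, since a plain string keeps the braces as literal text:
output_file_name = "NIFTYWK17500CE_"
export_path = rf"C:\Users\mahan\Documents\exportfiles\{output_file_name}_sheet.csv"  # f-string fills in the ticker name
rows.to_csv(export_path)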
Then I tried to create an array of the unique values in the "Ticker" column, created a DataFrame from the original file with all the data, and used a for loop over the values in the array to match the first column 'Ticker' and copy that data to a new file, using the value in the exported CSV file name.
Code as below:
import pandas as pd
excel_file_path = r'C:\Users\mahan\Documents\test projects\01_07_APR_WEEKLY_expiry_data_VEGE_NF_AND_BNF_Options_Desktop_Vege.csv'
df2 = pd.read_csv(excel_file_path)
df2_uniques = df2['Ticker'].unique()
df2_counts = df2['Ticker'].value_counts()
for value in df2_uniques:
    value = value.replace(' ', '_')
    export_path = r"C:\Users\mahan\Documents\exportfiles\{value}__sheet.csv"
    df = pd.read_csv(excel_file_path, index_col="Ticker")
    rows = df.loc[['value']]
    print(type(rows))
    rows.to_csv(export_path)
Received an error:
KeyError: "None of [Index(['value'], dtype='object', name='Ticker')] are in the [index]"
Where did I go wrong:
In naming the file properly in the earlier code.
In the second code snippet.
Any help is really appreciated. Thanks in advance.
SOLVED
What worked for me was the following with comments:
import pandas as pd
excel_file_path = r'C:\Users\mahan\Documents\test projects\01_07_APR_WEEKLY_expiry_data_VEGE_NF_AND_BNF_Options_Desktop_Vege.csv'
df2 = pd.read_csv(excel_file_path)
df2_uniques = df2['Ticker'].unique()
for value in df2_uniques:
    value = value.replace(' ', '_')
    df = pd.read_csv(excel_file_path, index_col="Ticker")
    rows = df.loc[[value]]  # Changed from 'value' to value
    print(type(rows))
    rows.to_csv(r'_' + value + '.csv')
    # Removed export_path, as filename and filepath together were giving me a hard time to figure out.
    # The files get saved in the same filepath as the original imported file, so that'll do. Sharing just for reference.
Final output looks like this:
I can't know for sure without seeing the DataFrame, but the error indicates that there is no column named 'Ticker'. It appears that you set this column to be the index, so you can try df2_uniques = set(df2.index).
Changed
rows = df.loc[['value']]
to
rows = df.loc[[value]]
Also, removed export_path, as the filename and filepath together were giving me a hard time to figure out. The files get saved in the same filepath as the original imported file, so that'll do. Sharing just for reference.
Final code that worked looked like this:
import pandas as pd
excel_file_path = r'C:\Users\mahan\Documents\test projects\01_07_APR_WEEKLY_expiry_data_VEGE_NF_AND_BNF_Options_Desktop_Vege.csv'
df2 = pd.read_csv(excel_file_path)
df2_uniques = df2['Ticker'].unique()
for value in df2_uniques:
    value = value.replace(' ', '_')
    df = pd.read_csv(excel_file_path, index_col="Ticker")
    rows = df.loc[[value]]  # Changed from 'value' to value
    print(type(rows))
    rows.to_csv(r'_' + value + '.csv')
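For reference, a more compact version of the same idea (a sketch that assumes the CSV has a 'Ticker' column, as above; the output file names are only illustrative) is to read the file once and let groupby do the splitting:
import pandas as pd

df = pd.read_csv(excel_file_path)  # read the file once instead of once per ticker
for ticker, group in df.groupby('Ticker'):
    safe_name = ticker.replace(' ', '_')
    group.to_csv(f'{safe_name}.csv', index=False)  # one CSV per ticker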
python version: 3.7.11
pandas version: 1.1.3
IDE: Jupyter Notebook
Software for opening and resaving the .csv file: Microsoft Excel
I have a .csv file. You can download it from here: https://icedrive.net/0/35CvwH7gqr
In the .csv file, I looked for rows that have blank cells, and after finding those rows I deleted them. To do this I followed the instructions below:
I opened the .csv file with Microsoft Excel.
I pressed F5, wrote "A1:E9030" in the "Reference" field, then clicked OK.
I pressed F5 again, clicked the "Special..." button, selected "Blanks", then clicked OK.
In the "Home" tab, under "Cells", I clicked "Delete", then "Delete Sheet Rows".
I saved the file and closed it.
This is the file after deleting some rows: https://icedrive.net/0/cfG1dT6bBr
But when I run the code below, it seems that extra columns were added after deleting those rows.
import pandas as pd
# The file doesn't have any header.
my_file = pd.read_csv(path_to_my_file, header=None)
my_file.head()
print(my_file.shape)
The output:
(9024, 244)
You can also see the difference by opening the file with Notepad:
.csv file before deleting some rows:
.csv file after deleting some rows:
Before deleting the rows, my_file.shape showed 5 columns, but after deleting some rows it shows 244 columns.
Question:
How can I remove the rows in Excel, or in some other way, so that I won't end up with this problem?
Note: I can't remove these rows with pandas, because pandas automatically ignores them, so I have to do this manually.
Thanks in advance for any help.
I am not familiar with the operation you are carrying out in the first part of your question, but I suggest a different solution. Pandas only recognizes np.nan objects as null, so in this case we could start by loading the .csv file into pandas and replacing the empty cells with np.nan values:
>>> import pandas as pd
>>> import numpy as np
>>> my_file = pd.read_csv(path_to_my_file, header=None)
>>> my_file = my_file.replace('', np.nan)
Then, we could ask pandas to drop all the rows containing np.nan:
>>> my_file = my_file.dropna()
This should give you the desired output. I think it is a good habit to work on DataFrames directly from your IDE. Hope this helped!
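If the extra columns produced by the Excel edit are completely empty, another option (a sketch, assuming they really contain no data) is to drop them after loading:
>>> import pandas as pd
>>> my_file = pd.read_csv(path_to_my_file, header=None)
>>> my_file = my_file.dropna(axis=1, how='all')  # drop columns that are empty in every row
>>> print(my_file.shape)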
I'm new on this site, so be indulgent if I make a mistake :)
I recently imported a CSV file into my Jupyter notebook for a student project. I want to use data from specific columns of this file. The problem is that after the import, the file appears as a table with 5286 lines (which represent the dates and times of the measurements) in a single column (which contains all the variables I want to use, separated by ;).
I don't know how to turn this into a regular table.
I used this code to import my CSV from my board:
import pandas as pd
data = pd.read_csv('/work/Weather_data/data 1998-2003.csv','error_bad_lines = false')
Output:
Desired output: the same data in multiple columns, split on the ; separator.
You can try this:
import pandas as pd
data = pd.read_csv('<location>', sep=';')
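A quick check such as data.head() should then show the date/time and each measurement in its own column.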
I am working with pandas, and I've just modified a table.
Now, I would like to see my table in Excel, but it's just a quick look, and I will have to modify the table again later on, so I don't want to save the table anywhere.
In other words, the solution
import os
import pandas as pd

my_df = pd.DataFrame()
item_path = "my/path"
my_df.to_csv(item_path)
os.startfile(os.path.normpath(item_path))
is not what I want. I would like to obtain the same behavior without saving the DataFrame as a CSV first.
#Something like:
my_df = pd.DataFrame()
start_excel(table_to_load = my_df) #Opens excel with a COPY of my_df
Note
To quickly explore a DataFrame, df.head() is the way to go, but I want to open my DataFrame from a Tkinter application, so I need to use an external program to open this temporary table.
You can have a quick look using
<dataframe_name>.head()
It will display the top 5 rows by default, or you can simply specify how many rows you want:
<dataframe_name>.head(<rows_you_want>)
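If a throwaway temporary file on disk is acceptable, a minimal sketch of the "open a copy in Excel" behaviour could look like this (assuming Windows, since os.startfile is Windows-only, and an Excel writer such as openpyxl installed for to_excel):
import os
import tempfile
import pandas as pd

def open_in_excel(df):
    # Write a copy of the DataFrame to a temporary .xlsx file and open it
    # with the default application (usually Excel on Windows).
    with tempfile.NamedTemporaryFile(suffix='.xlsx', delete=False) as tmp:
        path = tmp.name
    df.to_excel(path, index=False)
    os.startfile(path)  # Windows-only

my_df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
open_in_excel(my_df)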
I'm having trouble writing something that I believe should be relatively easy.
I have a template Excel file that has some visualizations on it, spread over a few sheets. I want to write a script that loads the template, inserts rows from an existing DataFrame into specific cells on each sheet, and saves the result as a new Excel file.
The template already has all the cells and the visualizations designed, so I only want to insert the data, without changing the design.
I tried several packages and none of them seemed to work for me.
Thanks for your help! :-)
I have written a package for inserting pandas DataFrames into Excel sheets (specific rows/cells/columns); it's called pyxcelframe:
https://pypi.org/project/pyxcelframe/
It has very simple and short documentation, and the method you need is insert_frame
So, let's say we have a pandas DataFrame called df that we have to insert into the sheet named "MySheet" of the Excel file "MyWorkbook", starting from cell B5; we can just use the insert_frame function as follows:
from pyxcelframe import insert_frame
from openpyxl import load_workbook

workbook = load_workbook("MyWorkbook.xlsx")
worksheet = workbook["MySheet"]
insert_frame(worksheet=worksheet,
             dataframe=df,
             row_range=(5, 0),
             col_range=(2, 0))
A value of 0 as the second element of row_range or col_range means that no ending row or column is specified; if you need a specific ending row/column, you can replace the 0 with it.
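To end up with a new file rather than overwriting the template, the openpyxl workbook can then be saved under a different name (the file name here is just an example):
workbook.save("MyWorkbook_filled.xlsx")  # writes a new file; the template on disk stays untouched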
Sounds like a job for xlwings. You didn't post any test data, but modifying the code below to suit your needs should be quite straightforward.
import xlwings as xw
wb = xw.Book('your_excel_template.xlsx')
wb.sheets['Sheet1'].range('A1').value = df[your_selected_rows]
wb.save('new_file.xlsx')
wb.close()