Problem when importing Excel File with Pandas - python

I'm new to python and was hoping someone could help me out.
I imported an Excel file using pandas just to play around with. However, when I try to do any additional analysis or coding on the data, it only uses the header row of the Excel file.
Here's the code I used:
import pandas as pd
df = pd.read_excel(r'C:\Users\at0789\Documents\Test File.xlsx')
data=list(df)
print(data)
Here's the output:
runfile('C:/Users/at0789/.spyder-py3/temp.py', wdir='C:/Users/at0789/.spyder-py3')
['Name', 'Number', 'Color', 'Date']
This is what my test file looks like:

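Keep the r prefix on the path (or use forward slashes): without it, Python 3 reads the \U in '\Users' as a Unicode escape and raises a SyntaxError.
And in an interactive console you don't have to print the df, you can just evaluate it, like this:
import pandas as pd
df = pd.read_excel(r'C:\Users\at0789\Documents\Test File.xlsx')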
df

import pandas as pd
df = pd.read_excel(r'C:\Users\at0789\Documents\Test File.xlsx')
Here df is a DataFrame.
DataFrames have many built-in functions, so you can do a lot in a few lines of code and with good performance.
One of the best features is that you can query the data much as you would with an SQL query.
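As a small illustration of why the question's code printed only the header row: calling list() on a DataFrame iterates over its column labels, not its rows. The sketch below uses a made-up in-memory DataFrame (the values are hypothetical stand-ins for the test file) so it runs without an Excel file.

```python
import pandas as pd

# Hypothetical stand-in for the contents of the Excel test file.
df = pd.DataFrame({
    'Name': ['Ann', 'Bob'],
    'Number': [1, 2],
    'Color': ['red', 'blue'],
})

# Iterating over a DataFrame yields its column labels (the header row).
columns = list(df)

# To get the actual rows as plain Python lists, go through the values.
rows = df.values.tolist()

# An SQL-like filter: keep only the rows where Color is 'red'.
reds = df.query("Color == 'red'")

print(columns)
print(rows)
print(reds)
```

So to work with the data itself, use the DataFrame directly (df.head(), df['Name'], df.query(...)) rather than list(df).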

Related

Inserting Data into an Excel file using Pandas - Python

I have an excel file that contains the names of 60 datasets.
I'm trying to write a piece of code that "enters" the Excel file, accesses a specific dataset (whose name is in the Excel file), gathers and analyses some data and finally, creates a new column in the Excel file and inserts the information gathered beforehand.
I can do most of it, except for the part of adding a new column and entering the data.
I was trying to do something like this:
path_data = **the path to the excel file**
recap = pd.read_excel(os.path.join(path_data,'My_Excel.xlsx')) # where I access the Excel file
recap['New information Column'] = Some Value
Is this the correct way of doing this? And if not, can someone suggest a better way (one that works, ehehe)?
Thank you a lot!
You can import the excel file into python using pandas.
import pandas as pd
df = pd.read_excel (r'Path\Filename.xlsx')
print (df)
If you have many sheets, then you could do this:
import pandas as pd
df = pd.read_excel (r'Path\Filename.xlsx', sheet_name='sheetname')
print (df)
To add a new column you could do the following:
df['name of the new column'] = 'things to add'
Then when you're ready, you can export it as xlsx:
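# an Excel writer engine such as openpyxl must be installed, but it does not need to be imported
# index=False keeps the row index from being written as an extra column
df.to_excel(r'Path\filename.xlsx', index=False)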
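For the case in the question, where each row names a dataset and the new column should hold a value computed per dataset, something along these lines might work. This is only a sketch: the 'Dataset' column name and the analyse() helper are hypothetical placeholders, and the DataFrame is built in memory instead of being read from My_Excel.xlsx.

```python
import pandas as pd


def analyse(dataset_name: str) -> int:
    """Hypothetical placeholder for the real per-dataset analysis."""
    return len(dataset_name)


# Stand-in for: recap = pd.read_excel(os.path.join(path_data, 'My_Excel.xlsx'))
recap = pd.DataFrame({'Dataset': ['alpha', 'beta', 'gamma_long']})

# Compute one value per row and store it in a new column.
recap['New information Column'] = recap['Dataset'].map(analyse)

print(recap)

# Writing back would then be (needs an Excel writer engine installed):
# recap.to_excel('My_Excel.xlsx', index=False)
```

Assigning a single scalar, as in the question, fills every row with the same value; mapping a function over the name column gives one value per dataset.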

How do I prevent pandas from writing a new column when I save to csv

I wrote this code just to show the problem I'm having. I need to save my data to a csv and reopen it later, but when I reload it into a pandas dataframe it has an extra unnamed column at the front that I don't want. It's messing up my data when I try to do .drop_duplicates(), because each row now has its own number, and every time I reopen it from a csv it gets another column of numbers at the front, making everything worse. How do I make it so it doesn't have this?
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(100,4), columns=list('ABCD'))
df.to_csv('data.csv')
print(df.head())
df1 = pd.read_csv('data.csv')
print(df1.head())
It's the dataframe index. You can turn that off with
df.to_csv('data.csv', index=False)
The docs are the first stop to learn the different options you have when writing. pandas.DataFrame.to_csv
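Note that .dropna() only removes rows with missing values and will not remove the extra column. If the file was already written with the index, you can instead tell pandas to reuse that first column as the index while reading:
df = pd.read_csv("data.csv", index_col=0)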
The solution was super easy. I needed to do
df.to_csv('data.csv', index=False)
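To see the difference side by side, here is a minimal sketch that round-trips the same kind of DataFrame through CSV three ways. It writes to in-memory buffers (io.StringIO) instead of data.csv purely so it leaves no files behind; that substitution is an assumption of the example, not part of the question.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))

# 1) Default: the index is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
reloaded_bad = pd.read_csv(with_index)

# 2) index=False: the index is not written at all.
no_index = io.StringIO()
df.to_csv(no_index, index=False)
no_index.seek(0)
reloaded_ok = pd.read_csv(no_index)

# 3) File already has the index column: read it back as the index.
with_index.seek(0)
reloaded_idx = pd.read_csv(with_index, index_col=0)

print(list(reloaded_bad.columns))
print(list(reloaded_ok.columns))
print(list(reloaded_idx.columns))
```

Only the first variant ends up with the extra 'Unnamed: 0' column.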

Using Pandas to pull excel document info and save selected columns to new file

Hey guys, I'm really confused as to how to approach this and have tried looking all over the place. I want to save the selected columns in a new Excel file. Any help is appreciated!
import pandas as pd
import numpy as np
data = pd.read_excel('C:\\Users\\me\\Downloads\\Reconcile.xlsx')
data[['batched_at', 'batch_id', 'total', 'customer_firstname', 'customer_lastname']]
data.to_excel('C:\\Users\\me\\Downloads\\Newfile.xlsx')
The column-selection line does nothing here because its result is discarded. Assign it to a new dataframe and save that one.
import pandas as pd
import numpy as np
data = pd.read_excel('C:\\Users\\me\\Downloads\\Reconcile.xlsx')
new_data = data[['batched_at', 'batch_id', 'total', 'customer_firstname', 'customer_lastname']]
new_data.to_excel('C:\\Users\\me\\Downloads\\Newfile.xlsx')
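The reason the original attempt saved every column is that selecting columns returns a new DataFrame and leaves the original untouched. A small sketch with made-up rows (the values are hypothetical; only the column names come from the question):

```python
import pandas as pd

# Hypothetical stand-in for the contents of Reconcile.xlsx.
data = pd.DataFrame({
    'batched_at': ['2020-01-01', '2020-01-02'],
    'batch_id': [101, 102],
    'total': [9.99, 19.50],
    'customer_firstname': ['Ann', 'Bob'],
    'internal_note': ['x', 'y'],
})

wanted = ['batched_at', 'batch_id', 'total', 'customer_firstname']

# This expression builds a new DataFrame, but the result is thrown away.
data[wanted]

# Keeping the result in a variable is what makes the selection usable.
new_data = data[wanted]

print(list(data.columns))      # still has every column
print(list(new_data.columns))  # only the selected ones
```

As an alternative, to_excel also accepts a columns argument, so data.to_excel(path, columns=wanted) should write just those columns without an intermediate variable.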

saving a dataframe to csv file (python)

I am trying to restructure the way my precipitation data is organized in an Excel file. To do this, I've written the following code:
import pandas as pd
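# sheet_name=None loads every sheet into a dict keyed by sheet name
df = pd.read_excel('El Jem_Souassi.xlsx', sheet_name=None, header=None)
data=df["El Jem"]
T=[]
for column in range(1,56):
    liste=data[column].tolist()
    # skip the first row and keep every non-missing value as a string
    for row in range(1,len(liste)):
        liste[row]=str(liste[row])
        if liste[row]!='nan':
            T.append(liste[row])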
result=pd.DataFrame(T)
result
This code works fine and through Jupyter I can see that the result is good
However, I am facing a problem when attempting to save this dataframe to a csv file.
result.to_csv("output.csv")
The resulting file contains the vertical index column and it seems I am unable to call for a specific cell.
(Hopefully, someone can help me with this problem)
Many thanks !!
It's all in the docs.
You are interested in skipping the index column, so do:
result.to_csv("output.csv", index=False)
If you also want to skip the header add:
result.to_csv("output.csv", index=False, header=False)
I don't know what your input data looks like (it is a good idea to make it available in your question). But note that currently you can obtain the same results just by doing:
import pandas as pd
df = pd.DataFrame([0]*16)
df.to_csv('results.csv', index=False, header=False)
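For reference, the column-by-column loop in the question can also be written without explicit loops. The sketch below is one possible version under stated assumptions: the small DataFrame is a made-up stand-in for the "El Jem" sheet, and the CSV is written to an in-memory buffer instead of output.csv so the example is self-contained.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the sheet: first row and first column are labels.
data = pd.DataFrame([
    ['label', 'c1', 'c2'],
    ['r1', 1.5, np.nan],
    ['r2', np.nan, 2.0],
    ['r3', 3.0, 4.5],
])

# Drop the label row and label column, like the loops that start at 1.
block = data.iloc[1:, 1:]

# Flatten column by column (order='F'), then drop the missing values.
values = pd.Series(block.to_numpy().ravel(order='F')).dropna()
result = pd.DataFrame(values.astype(str).to_list())

# Write without the index column and without a header row.
buffer = io.StringIO()
result.to_csv(buffer, index=False, header=False)
csv_text = buffer.getvalue()
print(csv_text)

# A specific cell can be read by position: third row, first column.
print(result.iloc[2, 0])
```

With the index left out of the file, reading it back gives a single column, and individual cells can be addressed by position with .iloc.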

how to save Python pandas data into excel file?

I am trying to load data from a web source and save it as an Excel file, but I'm not sure how to do it. What should I do? The original dataframe has different columns. Let's say that I am trying to save the 'Open' column.
import matplotlib.pyplot as plt
import pandas_datareader.data as web
import datetime
import pandas as pd
def ViewStockTrend(compcode):
    start = datetime.datetime(2015,2,2)
    end = datetime.datetime(2016,7,13)
    stock = web.DataReader(compcode,'yahoo',start,end)
    print(stock['Open'])
compcode = ['FDX','GOOGL','FB']
aa= ViewStockTrend(compcode)
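Once you have made the pandas dataframe, just use to_excel on the entire thing if you want. Note that ViewStockTrend only prints and returns None as written, so it needs to return stock for aa to hold the dataframe: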
aa.to_excel('output/filename.xlsx')
If stock is a pandas DataFrame and you only want one column, you can construct a new DataFrame from that column and output that one to Excel:
df = pd.DataFrame(stock['Open'])
df.to_excel('path/to/your/file')
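Putting the two points together, a rough sketch might look like this. The prices are invented and the DataFrame is built by hand in place of the web.DataReader call, and select_open is a hypothetical helper name; the point is only that the function returns the data so it can be saved afterwards.

```python
import pandas as pd


def select_open(stock: pd.DataFrame) -> pd.DataFrame:
    """Return only the 'Open' column, kept as a one-column DataFrame."""
    return stock[['Open']]


# Hypothetical stand-in for the DataFrame returned by web.DataReader.
stock = pd.DataFrame(
    {'Open': [10.0, 10.5, 11.0], 'Close': [10.2, 10.4, 11.3]},
    index=pd.date_range('2015-02-02', periods=3, name='Date'),
)

open_only = select_open(stock)
print(open_only)

# Saving would then be (needs an Excel writer engine installed):
# open_only.to_excel('open_prices.xlsx')
```

Using double brackets, stock[['Open']], keeps the result a DataFrame, which has the same effect as wrapping the column in pd.DataFrame(...).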
