I am trying to restructure the way my precipitations' data is being organized in an excel file. To do this, I've written the following code:
import pandas as pd
df = pd.read_excel('El Jem_Souassi.xlsx', sheetname=None, header=None)
data=df["El Jem"]
T=[]
for column in range(1,56):
liste=data[column].tolist()
for row in range(1,len(liste)):
liste[row]=str(liste[row])
if liste[row]!='nan':
T.append(liste[row])
result=pd.DataFrame(T)
result
This code works fine and through Jupyter I can see that the result is good
screenshot
However, I am facing a problem when attempting to save this dataframe to a csv file.
result.to_csv("output.csv")
The resulting file contains the vertical index column and it seems I am unable to call for a specific cell.
(Hopefully, someone can help me with this problem)
Many thanks !!
It's all in the docs.
You are interested in skipping the index column, so do:
result.to_csv("output.csv", index=False)
If you also want to skip the header add:
result.to_csv("output.csv", index=False, header=False)
I don't know how your input data looks like (it is a good idea to make it available in your question). But note that currently you can obtain the same results just by doing:
import pandas as pd
df = pd.DataFrame([0]*16)
df.to_csv('results.csv', index=False, header=False)
Related
I have an excel file that contains the names of 60 datasets.
I'm trying to write a piece of code that "enters" the Excel file, accesses a specific dataset (whose name is in the Excel file), gathers and analyses some data and finally, creates a new column in the Excel file and inserts the information gathered beforehand.
I can do most of it, except for the part of adding a new column and entering the data.
I was trying to do something like this:
path_data = **the path to the excel file**
recap = pd.read_excel(os.path.join(path_data,'My_Excel.xlsx')) # where I access the Excel file
recap['New information Column'] = Some Value
Is this a correct way of doing this? And if so, can someone suggest a better way (that works ehehe)
Thank you a lot!
You can import the excel file into python using pandas.
import pandas as pd
df = pd.read_excel (r'Path\Filename.xlsx')
print (df)
If you have many sheets, then you could do this:
import pandas as pd
df = pd.read_excel (r'Path\Filename.xlsx', sheet_name='sheetname')
print (df)
To add a new column you could do the following:
df['name of the new column'] = 'things to add'
Then when you're ready, you can export it as xlsx:
import openpyxl
# to excel
df.to_excel(r'Path\filename.xlsx')
I have a problem with "pandas read_excel", thats my code:
import pandas as pd
df = pd.read_excel('myExcelfile.xlsx', 'Table1', engine='openpyxl', header=1)
print(df.__len__())
If I run this code in Pycharm on Windows PC I got the right length of the dataframe, which is 28757
but if I run this code on my linux server I got only 26645 as output.
Any ideas whats the reason for that?
Thanks
Try this way:
import pandas as pd
data= pd.read_excel('Advertising.xlsx')
data.head()
I got the solution.
The problem was an empty first row in my .xlsx File.
My file is automatically created by another program, so I used openpyxl to delete the first row and make a new .xlsx File.
import openpyxl
path = 'myExcelFile.xlsx'
book = openpyxl.load_workbook(path)
sheet = book['Tabelle1']
#start at row 0, length 1 row:
sheet.delete_rows(0,1)
#save in new file:
book.save('myExcelFile_new.xlsx')
Attention, in this code sample I don`t check if the first row is empty!
So I delete the first line no matter if there is content in it or not.
I wrote this code just so show the example that I'm having. I need to save the data I have to a csv then reopen it later but when I reload the data into a pandas dataframe from csv it now has an extra unnamed column at the front that I don't want and it's messing up my data when I try to do .drop_duplicates() because each row now has its own number and every I reopen it from a csv it will have a new row of number at the front, just making everything worse. How do I make it so it doesn't have this?
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(100,4), columns=list('ABCD'))
df.to_csv('data.csv')
print(df.head())
df1 = pd.read_csv('data.csv')
print(df1.head())
Its the dataframe index. You can turn that off with
df.to_csv('data.csv', index=False)
The docs are the first stop to learn the different options you have when writing. pandas.DataFrame.to_csv
While reading, you can prevent columns with empty rows like:
df = pd.read_csv("data.csv").dropna()
The solution was super easy. I needed to do
df.to_csv('data.csv', index= False)
I have a CSV file, diseases_matrix_KNN.csv which has excel table.
Now, I would like to store all the numbers from the row like:
Hypothermia = [0,-1,0,0,0,0,0,0,0,0,0,0,0,0]
For some reason, I am unable to find a solution to this. Even though I have looked. Please let me know if I can read this type of data in the chosen form, using Python please.
most common way to work with excel is use Pandas.
Here is example:
import pandas as pd
df = pd.read_excel(filename)
print (df.iloc['Hypothermia']). # gives you such result
So i am trying to read a csv file by the code as below:
import pandas as pd
user_cols = ['id','listing_type','status','listing_class','property_type','street_address','city','state',' 'zip_4','cross_street','street_index','unit','floor','location','Latitude',
'longitude','subway','neighborhood','price','incentives','fee_type','fee_percentage','fee_details_broker',
'fee_details_clients','application_information','maintenance','taxes','max_financing','other_costs','beds',
'baths','full_baths','three_quarter_baths','half_baths','total_rooms','square_feet','exterior_square_feet',
'lot_area','lot_dimensions','date_available','date_listed','closed_on','year_built','recent_renovation',
'lease_min','lease_max','date_added','date_edited','date_update','contact','access','keys','mls_name','mls_id',
'courtesy_of','vow_opt_out','idx_opt_out','pet_details','notes','sync','private','listing_score','added_by_id',
'featured_office_id','date_expires','exclusive_file_id','condition','guarantor','blast_link']
data = pd.read_csv("C:\\Users\\Desktop\\dump-4.csv", low_memory=False, dtype=object, header=None, names=user_cols)
I am able to read the file but when i try to display the columns there are about 15-16 column names that are missing. Why is this happening and what can I do.
So when i deleted the dtype=object and header=None..it did print all the columns. not really sure what wouldve been the correct dtype though! Thanks anyway! :)