I have an .xlsx file with 5 sheets; each sheet has 4 columns, and I need to read the first column of the 5th sheet into a column of a dataframe.
I've tried this:
df = pd.read_excel('file_path.xlsx', sheet_name='sheet_5', index_col='column_name')
However this seems to copy the whole sheet into the dataframe rather than just the first column.
Thanks to @Quang Hoang's comment, I found the solution.
df = pd.read_excel('file_path.xlsx', sheet_name='sheet_5', usecols=['column_name'])
The usecols option in read_excel reads only the column I wanted into the dataframe.
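For completeness, a minimal sketch of the same idea using positions instead of names (assuming the 5th sheet sits at zero-based position 4 and only its first column is needed):
import pandas as pd

# Read only the first column of the fifth sheet; sheet_name accepts a
# zero-based position and usecols accepts column positions as well as names.
df = pd.read_excel('file_path.xlsx', sheet_name=4, usecols=[0])
print(df.head())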
Hey, let's try it this way.
import pandas as pd
df = pd.read_excel('path/to/file.xlsx', sheet_name='sheet_5', index_col=0)
print(df[['column_name']])
Tell me if it works. I recommend reading the documentation:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
There are many similar questions to this. I have looked through them all and I can't figure out how to fix my issue.
I have 11 dataframes. I would like to export all of these dataframes to one Excel file, with one sheet per dataframe. I have 2 lists: one is a list of dataframe objects, and one is a list of the names I want for each df. The lists are ordered so that iterating through both at the same time pairs each df with the name I want for it.
Here is my code:
for (df, df_name) in zip(df_list, df_name_list):
    sheetname = "{}".format(df_name)
    df.to_excel(r"myfolder\myfile.xlsx", index=False, sheet_name=sheetname)
It exports to Excel, but it appears to overwrite the sheet each time. The final sheet has the same name as the final dataframe, so it looped through both lists, but it won't save separate sheets. Any help would be much appreciated!
UPDATE - ISSUE FIXED - EDITING TO ADD THE CODE THAT WORKED
with pd.ExcelWriter(r"myfolder\myfile.xlsx") as writer:
    for (df, df_name) in zip(df_list, df_name_list):
        sheetname = "{}".format(df_name)
        df.to_excel(writer, sheet_name=sheetname)
I just tried something based on the docs example and it seems to work OK, see:
import pandas as pd
data = [(f'Sheet {j}', pd.DataFrame({'a': [i for i in range(20)], 'b': [j for i in range(20)] })) for j in range(10)]
with pd.ExcelWriter('output.xlsx') as writer:
    for sheet, df in data:
        df.to_excel(writer, sheet_name=sheet)
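As a quick check (a sketch, assuming the output.xlsx written above), you can read the workbook back and confirm that one sheet was written per dataframe:
import pandas as pd

# List the sheet names in the workbook that was just written.
xls = pd.ExcelFile('output.xlsx')
print(xls.sheet_names)  # expected: ['Sheet 0', 'Sheet 1', ..., 'Sheet 9']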
I am trying to write code that deletes the unnamed column that comes right before Unix Timestamp. After deleting it, I will save the modified dataframe into data.csv. How would I be able to get the Expected Output below?
import pandas as pd
data = pd.read_csv('data.csv')
data.drop('')
data.to_csv('data.csv')
data.csv file
,Unix Timestamp,Date,Symbol,Open,High,Low,Close,Volume
0,1635686220,2021-10-31 13:17:00,BTCUSD,60638.0,60640.0,60636.0,60638.0,0.4357009185659157
1,1635686160,2021-10-31 13:16:00,BTCUSD,60568.0,60640.0,60568.0,60638.0,3.9771881707839967
2,1635686100,2021-10-31 13:15:00,BTCUSD,60620.0,60633.0,60565.0,60568.0,1.3977284440628714
Updated csv (Expected Output):
Unix Timestamp,Date,Symbol,Open,High,Low,Close,Volume
1635686220,2021-10-31 13:17:00,BTCUSD,60638.0,60640.0,60636.0,60638.0,0.4357009185659157
1635686160,2021-10-31 13:16:00,BTCUSD,60568.0,60640.0,60568.0,60638.0,3.9771881707839967
1635686100,2021-10-31 13:15:00,BTCUSD,60620.0,60633.0,60565.0,60568.0,1.3977284440628714
This is the index. Use index=False in to_csv.
data.to_csv('data.csv', index=False)
Set the first column as the index with df = pd.read_csv('data.csv', index_col=0) and set index=False when writing the results.
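Putting those two steps together, a minimal sketch of the round trip (assuming the data.csv shown above):
import pandas as pd

# Treat the unnamed first column as the index when reading...
data = pd.read_csv('data.csv', index_col=0)
# ...and drop it on the way out by not writing the index.
data.to_csv('data.csv', index=False)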
You can follow the code below. It takes the columns from the first position onward, and then you can save that df to CSV without index values.
df = df.iloc[:,1:]
df.to_csv("data.csv",index=False)
I have been using the following code from another StackOverflow answer to concatenate data from multiple excel sheets in the same workbook into one sheet.
This works great when the column names are uniform across all sheets in a workbook. However, I'm running into an issue with one specific workbook where only the first column is named differently (or not named at all, so it is blank) but the rest of the columns are the same.
How do I merge such sheets? Is there a way to rename the first column of each sheet into one name so that I can then use the steps from the answer linked above?
Yes, you can rename all the columns as:
# read excel
dfs = pd.read_excel('tmp.xlsx', sheet_name=None)
# rename columns
column_names = ['col1', 'col2', ...]
for df in dfs.values(): df.columns = column_names
# concat
total_df = pd.concat(dfs.values(), ignore_index=True)
Or, you can ignore the header in read_excel so that the columns are labeled as 0,1,2,...:
# read ignore header
dfs = pd.read_excel('tmp.xlsx', sheet_name=None, header=None, skiprows=1)
total_df = pd.concat(dfs.values())
# rename
total_df.columns = column_names
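Another option (a sketch of my own, not from the linked answer, reusing the tmp.xlsx name from above): rename only the first column of each sheet to a common name before concatenating, which leaves the remaining headers untouched. The name 'col1' is just an arbitrary placeholder.
import pandas as pd

# Read every sheet into a dict of dataframes.
dfs = pd.read_excel('tmp.xlsx', sheet_name=None)

# Rename each sheet's first column to a common placeholder name.
for df in dfs.values():
    df.rename(columns={df.columns[0]: 'col1'}, inplace=True)

total_df = pd.concat(dfs.values(), ignore_index=True)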
I have these lines of code reading and writing an excel:
df = pd.read_excel(file_path, sheet_name, header=[0, 1])
df.to_excel(output_path, index=False)
When it tries to write the excel I get the following error:
NotImplementedError: Writing to Excel with MultiIndex columns and no index ('index'=False) is not yet implemented
I have no idea why this is happening, and I cannot find a concrete answer online.
Please help.
You can simply set index=True instead of False
That is because you have MultiIndex columns in your dataframe.
You can either reset_index() or drop your level=1 index.
If you don't want the index, then you can insert a column to use as the index:
df.insert(0, 'index', df.index)  # something like this
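For example, if the second header row is disposable, a sketch of dropping one level of the column MultiIndex before writing (reusing the file_path, sheet_name and output_path names from the question):
# Keep only the first header row by dropping level 1 of the column MultiIndex;
# with single-level columns, to_excel then accepts index=False.
df = pd.read_excel(file_path, sheet_name, header=[0, 1])
df.columns = df.columns.droplevel(1)
df.to_excel(output_path, index=False)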
Multi-index columns can actually be exported to Excel. You just have to set index=True.
So for the example the solution becomes...
df = pd.read_excel(file_path, sheet_name, header=[0, 1])
df.to_excel(output_path, index=True)
NB. This is true as of Pandas version 1.2.0
Multi-index columns cannot be exported to Excel with index=False. It is possible to transform the multi-index to a single index, and then export it to Excel.
df = df.reset_index()
df.to_excel('file_name.xlsx', index=False)
One can write data to a specific cell, using:
xlsworksheet.write('B5', 'Hello')
But if you try to write a whole dataframe, df2, starting in cell 'B5':
xlsworksheet.write('B5', df2)
TypeError: Unsupported type <class 'pandas.core.frame.DataFrame'> in write()
What should be the way to write a whole dataframe starting in a specific cell?
The reason I ask this is because I need to paste 2 different pandas dataframes in the same sheet in excel.
XlsxWriter doesn't write Pandas dataframes directly. However, it is integrated with Pandas so you can do it the other way around.
Here is a small example of writing 2 dataframes to the same worksheet using the startrow parameter of Pandas to_excel:
import pandas as pd
df1 = pd.DataFrame({'Data': [10, 20, 30, 40]})
df2 = pd.DataFrame({'Data': [13, 24, 35, 46]})
writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')
df1.to_excel(writer, sheet_name='Sheet1')
df2.to_excel(writer, sheet_name='Sheet1', startrow=6)
writer.close()
You can turn off the column and row indexes using other to_excel options.
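For instance, a self-contained sketch (the pandas_cell.xlsx filename is just for illustration) that places df2 with its top-left value at cell B5: startrow and startcol are zero-based, so B5 is startrow=4, startcol=1, and the index and header are turned off.
import pandas as pd

df2 = pd.DataFrame({'Data': [13, 24, 35, 46]})

with pd.ExcelWriter('pandas_cell.xlsx', engine='xlsxwriter') as writer:
    # Write df2 so its values begin at cell B5, without the row index
    # or the header row.
    df2.to_excel(writer, sheet_name='Sheet1', startrow=4, startcol=1,
                 index=False, header=False)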
You write strings to Excel. To convert DataFrames to strings, you have several options, of which to_csv looks like your best bet:
>>> string1 = df1.to_csv()
>>> xlsworksheet.write('B5', string1)
>>> string2 = df2.to_csv()
>>> xlsworksheet.write('C5', string2)
Note this will write your entire dataframe to one cell. The only way I know of to write a frame to individual cells in an Excel sheet is to combine them and then use to_excel:
>>> writer = pd.ExcelWriter('output.xlsx')
>>> frame = pd.concat([df1, df2])
>>> frame.to_excel(writer, sheet_name='Sheet1')
Hope this helps...