I have an .xlsx file with 5 sheets; each sheet has 4 columns, and I need to read the first column of the 5th sheet into a column of a dataframe.
I've tried this:
df = pd.read_excel('file_path.xlsx', sheet_name='sheet_5', index_col='column_name')
However this seems to copy the whole sheet into the dataframe rather than just the first column.
Thanks to @Quang Hoang's comment, I found the solution.
df = pd.read_excel('file_path.xlsx', sheet_name='sheet_5', usecols=['column_name'])
The usecols option in read_excel reads only the column I wanted into the dataframe.
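For completeness, a minimal sketch of the same idea using positions instead of names (assuming the 5th sheet sits at zero-based position 4 and only its first column is needed):
import pandas as pd

# Read only the first column of the fifth sheet; sheet_name accepts a
# zero-based position and usecols accepts column positions as well as names.
df = pd.read_excel('file_path.xlsx', sheet_name=4, usecols=[0])
print(df.head())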
Hey, let's try it this way.
import pandas as pd
df = pd.read_excel('path/to/file.xlsx', sheet_name='sheet_5', index_col=0)
print(df[['column_name']])
Tell me if it works. I recommend reading the documentation:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
There are many similar questions to this. I have looked through them all and I can't figure out how to fix my issue.
I have 11 dataframes. I would like to export all of these dataframes to one Excel file, with one sheet per dataframe. I have 2 lists: one is a list of dataframe objects, and one is a list of the names I want for each df. The lists are ordered so that iterating through both at the same time pairs each df with the name I want for it.
Here is my code:
for (df, df_name) in zip(df_list, df_name_list):
    sheetname = "{}".format(df_name)
    df.to_excel(r"myfolder\myfile.xlsx", index=False, sheet_name=sheetname)
It exports to Excel, but it appears to overwrite the sheet each time. The final sheet has the same name as the final dataframe, so it looped through both lists, but it won't save separate sheets. Any help would be much appreciated!
UPDATE - ISSUE FIXED - EDITING TO ADD THE CODE THAT WORKED
with pd.ExcelWriter(r"myfolder\myfile.xlsx") as writer:
    for (df, df_name) in zip(df_list, df_name_list):
        sheetname = "{}".format(df_name)
        df.to_excel(writer, sheet_name=sheetname)
I just tried something based on the docs example and it seems to work OK, see:
import pandas as pd
data = [(f'Sheet {j}', pd.DataFrame({'a': [i for i in range(20)], 'b': [j for i in range(20)] })) for j in range(10)]
with pd.ExcelWriter('output.xlsx') as writer:
    for sheet, df in data:
        df.to_excel(writer, sheet_name=sheet)
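As a quick check (a sketch, assuming the output.xlsx written above), you can read the workbook back and confirm that one sheet was written per dataframe:
import pandas as pd

# List the sheet names in the workbook that was just written.
xls = pd.ExcelFile('output.xlsx')
print(xls.sheet_names)  # expected: ['Sheet 0', 'Sheet 1', ..., 'Sheet 9']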
I am trying to write code that deletes the unnamed column that comes right before Unix Timestamp. After deleting it, I will save the modified dataframe into data.csv. How would I be able to get the Expected Output below?
import pandas as pd
data = pd.read_csv('data.csv')
data.drop('')
data.to_csv('data.csv')
data.csv file
,Unix Timestamp,Date,Symbol,Open,High,Low,Close,Volume
0,1635686220,2021-10-31 13:17:00,BTCUSD,60638.0,60640.0,60636.0,60638.0,0.4357009185659157
1,1635686160,2021-10-31 13:16:00,BTCUSD,60568.0,60640.0,60568.0,60638.0,3.9771881707839967
2,1635686100,2021-10-31 13:15:00,BTCUSD,60620.0,60633.0,60565.0,60568.0,1.3977284440628714
Updated csv (Expected Output):
Unix Timestamp,Date,Symbol,Open,High,Low,Close,Volume
1635686220,2021-10-31 13:17:00,BTCUSD,60638.0,60640.0,60636.0,60638.0,0.4357009185659157
1635686160,2021-10-31 13:16:00,BTCUSD,60568.0,60640.0,60568.0,60638.0,3.9771881707839967
1635686100,2021-10-31 13:15:00,BTCUSD,60620.0,60633.0,60565.0,60568.0,1.3977284440628714
This is the index. Use index=False in to_csv.
data.to_csv('data.csv', index=False)
Set the first column as the index with df = pd.read_csv('data.csv', index_col=0) and set index=False when writing the results.
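Putting those two steps together, a minimal sketch of the round trip (assuming the data.csv shown above):
import pandas as pd

# Treat the unnamed first column as the index when reading...
data = pd.read_csv('data.csv', index_col=0)
# ...and drop it on the way out by not writing the index.
data.to_csv('data.csv', index=False)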
You can follow the code below. It takes the columns from the first position onward, and then you can save that df to CSV without index values.
df = df.iloc[:,1:]
df.to_csv("data.csv",index=False)
I have been using the following code from another StackOverflow answer to concatenate data from multiple excel sheets in the same workbook into one sheet.
This works great when the column names are uniform across all sheets in a workbook. However, I'm running into an issue with one specific workbook where only the first column is named differently (or not named at all, so it is blank) but the rest of the columns are the same.
How do I merge such sheets? Is there a way to rename the first column of each sheet into one name so that I can then use the steps from the answer linked above?
Yes, you can rename all the columns as:
# read excel
dfs = pd.read_excel('tmp.xlsx', sheet_name=None)
# rename columns
column_names = ['col1', 'col2', ...]
for df in dfs.values(): df.columns = column_names
# concat
total_df = pd.concat(dfs.values(), ignore_index=True)
Or, you can ignore the header in read_excel so that the columns are labeled as 0,1,2,...:
# read ignore header
dfs = pd.read_excel('tmp.xlsx', sheet_name=None, header=None, skiprows=1)
total_df = pd.concat(dfs.values())
# rename
total_df.columns = column_names
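Another option (a sketch of my own, not from the linked answer, reusing the tmp.xlsx name from above): rename only the first column of each sheet to a common name before concatenating, which leaves the remaining headers untouched. The name 'col1' is just an arbitrary placeholder.
import pandas as pd

# Read every sheet into a dict of dataframes.
dfs = pd.read_excel('tmp.xlsx', sheet_name=None)

# Rename each sheet's first column to a common placeholder name.
for df in dfs.values():
    df.rename(columns={df.columns[0]: 'col1'}, inplace=True)

total_df = pd.concat(dfs.values(), ignore_index=True)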
I have these lines of code reading and writing an excel:
df = pd.read_excel(file_path, sheet_name, header=[0, 1])
df.to_excel(output_path, index=False)
When it tries to write the excel I get the following error:
NotImplementedError: Writing to Excel with MultiIndex columns and no index ('index'=False) is not yet implemented
I have no idea why this is happening, and I cannot find a concrete answer online.
Please help.
You can simply set index=True instead of False
That is because you have MultiIndex columns in your dataframe.
You can either reset_index() or drop your level=1 index.
If you don't want the index, then you can insert a column to use as the index:
df.insert(0, 'index', df.index)  # something like this
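For example, if the second header row is disposable, a sketch of dropping one level of the column MultiIndex before writing (reusing the file_path, sheet_name and output_path names from the question):
# Keep only the first header row by dropping level 1 of the column MultiIndex;
# with single-level columns, to_excel then accepts index=False.
df = pd.read_excel(file_path, sheet_name, header=[0, 1])
df.columns = df.columns.droplevel(1)
df.to_excel(output_path, index=False)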
Multi-index columns can actually be exported to Excel. You just have to set index=True.
So for the example the solution becomes...
df = pd.read_excel(file_path, sheet_name, header=[0, 1])
df.to_excel(output_path, index=True)
NB. This is true as of Pandas version 1.2.0
Multi-index columns cannot be exported to Excel with index=False. It is possible to transform the multi-index to a single index, and then export it to Excel.
df = df.reset_index()
df.to_excel('file_name.xlsx', index=False)
One can write data to a specific cell, using:
xlsworksheet.write('B5', 'Hello')
But if you try to write a whole dataframe, df2, starting in cell 'B5':
xlsworksheet.write('B5', df2)
TypeError: Unsupported type <class 'pandas.core.frame.DataFrame'> in write()
What should be the way to write a whole dataframe starting in a specific cell?
The reason I ask this is because I need to paste 2 different pandas dataframes in the same sheet in excel.
XlsxWriter doesn't write Pandas dataframes directly. However, it is integrated with Pandas so you can do it the other way around.
Here is a small example of writing 2 dataframes to the same worksheet using the startrow parameter of Pandas to_excel:
import pandas as pd
df1 = pd.DataFrame({'Data': [10, 20, 30, 40]})
df2 = pd.DataFrame({'Data': [13, 24, 35, 46]})
writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')
df1.to_excel(writer, sheet_name='Sheet1')
df2.to_excel(writer, sheet_name='Sheet1', startrow=6)
writer.close()
You can turn off the column and row indexes using other to_excel options.
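For instance, a self-contained sketch (the pandas_cell.xlsx filename is just for illustration) that places df2 with its top-left value at cell B5: startrow and startcol are zero-based, so B5 is startrow=4, startcol=1, and the index and header are turned off.
import pandas as pd

df2 = pd.DataFrame({'Data': [13, 24, 35, 46]})

with pd.ExcelWriter('pandas_cell.xlsx', engine='xlsxwriter') as writer:
    # Write df2 so its values begin at cell B5, without the row index
    # or the header row.
    df2.to_excel(writer, sheet_name='Sheet1', startrow=4, startcol=1,
                 index=False, header=False)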
You write strings to Excel. To convert DataFrames to strings, you have several options, of which to_csv looks like your best bet:
>>> string1 = df1.to_csv()
>>> xlsworksheet.write('B5', string1)
>>> string2 = df2.to_csv()
>>> xlsworksheet.write('C5', string2)
Note this will write your entire dataframe to one cell. The only way I know of to write a frame to individual cells in an Excel sheet is to combine them and then use to_excel:
>>> writer = pd.ExcelWriter('output.xlsx')
>>> frame = pd.concat([df1, df2])
>>> frame.to_excel(writer, sheet_name='Sheet1')
Hope this helps...