I just started learning the pandas library for Python and made an Excel sheet that I saved as a .csv file.
The CSV file reopened in Excel:
import pandas as pd
df = pd.read_csv('purchases.csv')
print(df)
Then I read the file with pandas using the code above and get the following output:
;apples;oranges
0 June;3;0
1 Robert;2;3
2 Lily;0;7
3 David;1;2
What should I do so that the DataFrame shows the same columns as the Excel sheet?
You did not post your code.
Try this one:
df = pd.read_csv('purchases.csv', sep=';')  # the file uses ';' as its separator
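To see what sep=';' changes, here is a minimal sketch that parses the same data from an in-memory string instead of purchases.csv. The string stand-in and the index_col=0 argument are my own assumptions, not something from the question.

```python
import io
import pandas as pd

# Stand-in for purchases.csv: same content as in the question, ';'-separated.
raw = ";apples;oranges\nJune;3;0\nRobert;2;3\nLily;0;7\nDavid;1;2\n"

# The default separator is ',', so every line lands in a single column.
wrong = pd.read_csv(io.StringIO(raw))

# With sep=';' the fields are split; index_col=0 uses the names as row labels.
right = pd.read_csv(io.StringIO(raw), sep=';', index_col=0)

print(wrong.shape)
print(right)
```

If you would rather keep the names as an ordinary column, drop index_col=0.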
I am handling text responses from surveys, and it is common to have responses that start with -, for example: -I am sad today.
Excel interprets this as a formula and shows #NAME?
So when I import the Excel file into pandas using read_excel, the value shows up as NaN.
Is there any method to force Excel to keep these as raw strings instead of interpreting them as formulas?
I wrote a VBA macro that formats the entire column as text and then clicks through all the cells in the column, which is slow when there are more than ten thousand rows.
I was hoping this could be done at the Python level instead. Any ideas?
I hope this works for you: use openpyxl to extract the Excel data and then convert it into a pandas DataFrame.
from openpyxl import load_workbook
import pandas as pd
# Load the workbook and take its active worksheet
ws = load_workbook(filename='./formula_contains_raw.xlsx').active
# ws.values yields one tuple per row; the first row holds the column names
rows = list(ws.values)
df = pd.DataFrame(rows[1:], columns=rows[0])
df.head()
It works for me using a CSV file instead of an Excel file.
In the CSV file (opened in Excel), I need to select the option Formulas/Show Formulas and then save the file.
pd.read_csv('draft.csv')
Output:
Col1
0 hello
1 =-hello
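In the output above the value still carries the leading = that Excel added. If you only want the original text, a small sketch of stripping it at the Python level could look like this. The DataFrame below is made-up data shaped like that output, and it assumes no genuine answer starts with =.

```python
import pandas as pd

# Made-up data shaped like the output above: Excel stored the '-' answers
# as formulas, so they come back with a leading '='.
df = pd.DataFrame({"Col1": ["hello", "=-hello", "=-I am sad today"]})

# Remove a single leading '=' only; everything else is left untouched.
df["Col1"] = df["Col1"].str.replace(r"^=", "", regex=True)

print(df)
```

Note that .str methods return NaN for non-string cells, so apply this only to the text column.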
I have an Excel file that contains the names of 60 datasets.
I'm trying to write a piece of code that "enters" the Excel file, accesses a specific dataset (whose name is in the Excel file), gathers and analyses some data and finally, creates a new column in the Excel file and inserts the information gathered beforehand.
I can do most of it, except for the part of adding a new column and entering the data.
I was trying to do something like this:
import os
import pandas as pd
path_data = 'path/to/folder'  # placeholder for the path to the Excel file
recap = pd.read_excel(os.path.join(path_data, 'My_Excel.xlsx'))  # where I access the Excel file
recap['New information Column'] = some_value  # some_value stands for the gathered information
Is this the correct way of doing this? If not, can someone suggest a better way (one that works, ehehe)?
Thank you a lot!
You can import the Excel file into Python using pandas.
import pandas as pd
df = pd.read_excel(r'Path\Filename.xlsx')
print(df)
If the workbook has several sheets, you can pick one by name:
import pandas as pd
df = pd.read_excel(r'Path\Filename.xlsx', sheet_name='sheetname')
print(df)
To add a new column you could do the following:
df['name of the new column'] = 'things to add'
Then when you're ready, you can export it as xlsx:
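import openpyxl  # pandas needs openpyxl installed to write .xlsx files
# to excel; index=False keeps the row index out of the sheet
df.to_excel(r'Path\filename.xlsx', index=False)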
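Since the question is about one value per dataset rather than a single constant, here is a rough sketch of filling the new column row by row. The column name "dataset", the sample names and the analyse function are all hypothetical stand-ins for whatever your sheet and analysis actually contain.

```python
import pandas as pd

# Stand-in for the sheet that lists the dataset names (assumed column name).
recap = pd.DataFrame({"dataset": ["alpha", "beta", "gamma_2"]})


def analyse(name: str) -> int:
    """Hypothetical analysis step: here it only returns the name length."""
    return len(name)


# One result per row, aligned with the dataset names.
recap["New information Column"] = recap["dataset"].map(analyse)

print(recap)
# Writing it back would then be a call to recap.to_excel(..., index=False).
```

In real code analyse would open the dataset named in the row and return the value you gathered from it.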
I am making my bot set permissions automatically when it joins a guild, but the code for this was getting too long. So I want the bot to read an xlsx file as a DataFrame and set permissions from the data inside.
I want this xlsx file to be read as a DataFrame with multi-level columns, but I don't think my program recognises it as one. Is there an error in my code below, or do I have to change my Excel file for it to be recognised the way I want?
from pandas import read_excel
perm_data = read_excel('E:/Discord bot/Grail-Relique/data/xlsx/TextPermission.xlsx', header=[0,1], engine='openpyxl')
print(perm_data)
print(perm_data.loc[0,(0,0)])
This should do the work:
import pandas as pd
df = pd.read_excel('your/path/to/file.xlsx',
                   header=[0, 1],
                   index_col=0)
print(df.head())
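Once the two header rows are read as a MultiIndex, .loc expects labels, so a column is addressed by a tuple of the two header strings rather than by (0, 0). A minimal sketch of that, using a CSV string as a stand-in for the xlsx file (the channel and permission names are made up):

```python
import io
import pandas as pd

# Made-up two-row header; a CSV string stands in for the xlsx file.
raw = (
    ",general,general,admin,admin\n"
    ",read,send,read,send\n"
    "member,True,True,False,False\n"
    "moderator,True,True,True,False\n"
)

df = pd.read_csv(io.StringIO(raw), header=[0, 1], index_col=0)

# Columns are addressed by a (level 0, level 1) tuple of labels.
print(df.loc["moderator", ("admin", "read")])

# Selecting only the first level returns all columns under it.
print(df["general"])
```

If you really want positions instead of labels, .iloc[0, 0] is the positional counterpart.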
I'm trying to read binary Excel files using the read_excel method in pandas with the pyxlsb engine, as below:
import pandas as pd
df = pd.read_excel('test.xlsb', engine='pyxlsb')
If the xlsb file is like this file (right now I'm sharing this file via WeTransfer, but if there is a better way to share files on StackOverflow, let me know), the returned DataFrame is filled with NaNs. I suspected this might be because the file was originally saved with the active cell pointing at the empty cells after the data. So I tried this:
import pandas as pd
with open('test.xlsb', 'rb') as data:
    data.seek(0, 0)
    df = pd.read_excel(data, engine='pyxlsb')
but it still doesn't seem to work. I also tried reading the data from byte number 0 (from the beginning), writing it into a new file, 'test_1.xlsb', and finally reading it with pandas, but that doesn't work.
with open('test.xlsb', 'rb') as data:
    data.seek(0, 0)
    with open('test_1.xlsb', 'wb') as outfile:
        outfile.write(data.read())
df = pd.read_excel('test_1.xlsb', engine='pyxlsb')
If anyone has suggestion as to what might be going on and how to resolve it, I'd greatly appreciate the help.
I would like to access the first sheet of the Excel file. How can I do this? Below is my code:
import pandas as pd
df = pd.read_excel(r'S:\hotel pan Management\zero Material\Test run\Indepedent Run_2020\Return.Xlsx',sheet_names='FactorRtn')
print(df)
Replacing sheet_names with sheet_name should do it.
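As a side note, sheet_name also accepts a zero-based position, so the first sheet can be selected without knowing its name. A small sketch that builds a throwaway two-sheet workbook in memory to show both forms (it assumes openpyxl is installed; the sheet contents are made up):

```python
import io
import pandas as pd

# Build a small two-sheet workbook in memory (needs openpyxl installed).
# The values and the second sheet name are made up for illustration.
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"x": [1, 2]}).to_excel(writer, sheet_name="FactorRtn", index=False)
    pd.DataFrame({"y": [3, 4]}).to_excel(writer, sheet_name="Other", index=False)
data = buffer.getvalue()

# sheet_name=0 selects the first sheet by position.
first = pd.read_excel(io.BytesIO(data), sheet_name=0)

# A string selects the sheet by its name.
named = pd.read_excel(io.BytesIO(data), sheet_name="FactorRtn")

print(first.equals(named))
```

With a real file you would pass the path instead of the BytesIO object.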