CSV file misread with pandas - python

I just started to learn the pandas library for python and made an excel sheet that I saved as a .csv file.
The csv file reopened in excel
import pandas as pd
df = pd.read_csv('purchases.csv')
print(df)
Then I read the file with the code above and get the following output:
;apples;oranges
0 June;3;0
1 Robert;2;3
2 Lily;0;7
3 David;1;2
What should I do so that the file shows the same way in the dataframe as it does in the Excel sheet?

You did not post your code.
Try this one:
df = pd.read_csv('purchases.csv', sep=';')
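To see the effect of `sep=';'` without the original file, here is a minimal sketch. The string `raw` is a hypothetical stand-in for `purchases.csv`, rebuilt from the output in the question; `index_col=0` is an optional extra that turns the names into row labels.

```python
import io

import pandas as pd

# Hypothetical stand-in for purchases.csv, rebuilt from the question's output.
raw = ";apples;oranges\nJune;3;0\nRobert;2;3\nLily;0;7\nDavid;1;2\n"

# sep=';' makes pandas split fields on semicolons instead of commas.
# index_col=0 uses the first (unnamed) column as the row labels.
df = pd.read_csv(io.StringIO(raw), sep=";", index_col=0)
print(df)
```

Excel in some regional settings writes semicolons as the list separator, which is probably why the default comma separator put everything into one column.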

Related

Treat everything as raw string (even formulas) when reading into pandas from excel

I am handling text responses from surveys, and it is common to have responses that start with -, for example: -I am sad today.
Excel interprets such a cell as a formula and shows #NAME?
So when I import the Excel file into pandas using read_excel, it shows NaN.
Is there any method to force Excel to keep these cells as raw strings instead of interpreting them as formulas?
I wrote a VBA macro that formats the entire column as text and clicks through all the cells in the column, which is slow when there are more than ten thousand rows.
I was hoping to do this at the Python level instead. Any ideas?
I hope this works for you: use openpyxl to extract the Excel data and then convert it into a pandas dataframe.
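from openpyxl import load_workbook
import pandas as pd
# load the workbook and take its active worksheet;
# data_only defaults to False, so formula cells come back as their formula text
ws = load_workbook(filename='./formula_contains_raw.xlsx').active
# ws.values is a generator of row tuples, so materialise it once
rows = list(ws.values)
# the first row holds the column names, the remaining rows are the data
df = pd.DataFrame(rows[1:], columns=rows[0])
df.head()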
It works for me using a CSV instead of an Excel file.
With the CSV file opened in Excel, I need to select the option Formulas/Show Formulas and then save the file.
pd.read_csv('draft.csv')
Output:
Col1
0 hello
1 =-hello
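The CSV route works because pandas reads cell contents as plain text and never evaluates formulas. A minimal sketch, with an in-memory string as a hypothetical stand-in for `draft.csv`; `dtype=str` is an optional extra that keeps every column as strings:

```python
import io

import pandas as pd

# Hypothetical stand-in for draft.csv as saved with Show Formulas enabled.
raw = "Col1\nhello\n=-hello\n"

# dtype=str keeps every value as a string; nothing is evaluated as a formula.
df = pd.read_csv(io.StringIO(raw), dtype=str)
print(df["Col1"].tolist())  # ['hello', '=-hello']
```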

Inserting Data into an Excel file using Pandas - Python

I have an excel file that contains the names of 60 datasets.
I'm trying to write a piece of code that "enters" the Excel file, accesses a specific dataset (whose name is in the Excel file), gathers and analyses some data and finally, creates a new column in the Excel file and inserts the information gathered beforehand.
I can do most of it, except for the part of adding a new column and entering the data.
I was trying to do something like this:
path_data = **the path to the excel file**
recap = pd.read_excel(os.path.join(path_data,'My_Excel.xlsx')) # where I access the Excel file
recap['New information Column'] = Some Value
Is this a correct way of doing this? And if not, can someone suggest a better way (one that works, ehehe)?
Thank you a lot!
You can import the excel file into python using pandas.
import pandas as pd
df = pd.read_excel(r'Path\Filename.xlsx')
print(df)
If you have many sheets, then you could do this:
import pandas as pd
df = pd.read_excel(r'Path\Filename.xlsx', sheet_name='sheetname')
print(df)
To add a new column you could do the following:
df['name of the new column'] = 'things to add'
Then when you're ready, you can export it as xlsx:
import openpyxl
# to excel; index=False keeps the row index from being written as an extra column
df.to_excel(r'Path\filename.xlsx', index=False)
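As a rough sketch of the whole flow (one value per dataset name, stored in a new column), the following may help. The helper `add_result_column`, the column name `Dataset`, and the use of `len` as the "analysis" are all hypothetical placeholders; the in-memory dataframe stands in for the sheet read with `pd.read_excel`.

```python
import pandas as pd


def add_result_column(recap, name_column, analyse, new_column):
    """Return a copy of recap with one analysed value per dataset name."""
    out = recap.copy()
    # Call the analysis once per dataset name, in row order.
    out[new_column] = [analyse(name) for name in out[name_column]]
    return out


# Hypothetical stand-in for the sheet listing the dataset names.
recap = pd.DataFrame({"Dataset": ["set_a", "set_b", "set_c"]})

# len is only a placeholder for the real analysis of each dataset.
result = add_result_column(recap, "Dataset", len, "New information Column")
print(result)
```

Calling `result.to_excel(...)` with `index=False` afterwards would write the extended table back to a workbook.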

How to make pandas recognise my xlsx file as multiple-columned dataframe

While making my bot set permissions automatically when it joins a guild, the code for this was getting too long. So I wanted my bot to read an xlsx file as a dataframe and set permissions from the data inside.
I wanted to load this xlsx file of mine as a multiple-columned dataframe, but I don't think my program recognises it as one. Is there an error in my code below, or do I have to change my Excel file for it to be recognised the way I want?
from pandas import read_excel
perm_data = read_excel('E:/Discord bot/Grail-Relique/data/xlsx/TextPermission.xlsx', header=[0,1], engine='openpyxl')
print(perm_data)
print(perm_data.loc[0,(0,0)])
result
This should do the work:
import pandas as pd
df = pd.read_excel('your/path/to/file.xlsx',
                   header=[0, 1],
                   index_col=0)
print(df.head())
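A small sketch of what `header=[0, 1]` together with `index_col=0` produces, and how to address a cell afterwards. To stay self-contained it swaps in `pd.read_csv` on an in-memory string, since `read_csv` accepts the same two parameters; the role names, channel names, and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical permission table: two header rows (role, permission)
# and the channel name in the first column.
raw = (
    ",Admin,Admin,Member,Member\n"
    ",read,send,read,send\n"
    "general,1,1,1,0\n"
    "logs,1,0,0,0\n"
)

# header=[0, 1] builds two-level column labels; index_col=0 makes
# the channel names the row labels.
df = pd.read_csv(io.StringIO(raw), header=[0, 1], index_col=0)

# .loc is label-based: row label first, then a (level 0, level 1) tuple.
value = df.loc["general", ("Member", "send")]
print(value)
```

Because `.loc` looks up labels, an expression like `perm_data.loc[0, (0, 0)]` only works if the sheet really contains `0` as a row label and as both header labels.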

Reading XLSB (binary) file with Pandas read_excel using pyxlsb reads empty rows for some xlsb file

I'm trying to read binary Excel files using read_excel method in pandas with pyxlsb engine as below:
import pandas as pd
df = pd.read_excel('test.xlsb', engine='pyxlsb')
If the xlsb file is like this file (right now I'm sharing it via WeTransfer, but if there is a better way to share files on Stack Overflow, let me know), the returned dataframe is filled with NaNs. I suspected this might be because the file was originally saved with the active cell pointing at the empty cells after the data. So I tried this:
import pandas as pd
with open('test.xlsb', 'rb') as data:
    data.seek(0, 0)
    df = pd.read_excel(data, engine='pyxlsb')
but it still doesn't seem to work. I also tried reading the data from byte number 0 (from the beginning), writing it into a new file, 'test_1.xlsb', and finally reading it with pandas, but that doesn't work.
with open('test.xlsb', 'rb') as data:
    data.seek(0, 0)
    with open('test_1.xlsb', 'wb') as outfile:
        outfile.write(data.read())
df = pd.read_excel('test_1.xlsb', engine='pyxlsb')
If anyone has suggestion as to what might be going on and how to resolve it, I'd greatly appreciate the help.

Pandas not able to read specific Excel sheet, instead it is reading the first sheet of the workbook

I would like to access the first sheet of the excel file. How can I do this? Below is my code:
import pandas as pd
df = pd.read_excel(r'S:\hotel pan Management\zero Material\Test run\Indepedent Run_2020\Return.Xlsx',sheet_names='FactorRtn')
print(df)
Replacing sheet_names with sheet_name should do it.
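A minimal sketch of the corrected call, wrapped in a hypothetical helper named `load_sheet` so that the keyword is spelled in one place only. The default sheet name `FactorRtn` comes from the question; the workbook path is left to the caller.

```python
import pandas as pd


def load_sheet(path, sheet="FactorRtn"):
    """Read one named sheet from an Excel workbook."""
    # The keyword is sheet_name (singular), as the answer points out.
    return pd.read_excel(path, sheet_name=sheet)
```

Calling `load_sheet(path_to_workbook)` then returns the `FactorRtn` sheet rather than the first sheet in the workbook.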
