CSV to Xls conversion using Pandas script - python

I'm trying to convert a CSV file to Excel using a Python script. I have a sample CSV file, test_report.csv, with the following values:
COL1  COL2  COL3
A     1
B     2
C     5
COL3 contains only empty (NULL) values.
When I convert the CSV to Excel with my Python script, COL3 is missing from the header of the converted file because it has no data. If COL3 does contain data, the header and the corresponding values appear without any issue.
This is the script I tried. I'm not sure what mistake I'm making:
import os

import pandas as pd

os.chdir("/dev/test/test01/sub/subdir01/")
print(pd.__file__)

# Reading the csv file
col_names = ["COL1", "COL2", "COL3"]
df_new = pd.read_csv(
    "test_report.csv",
    quotechar='"',
    names=col_names,
    sep="|",
    skiprows=1,
    low_memory=False,
    on_bad_lines="skip",  # replaces error_bad_lines=False, which pandas 2.0 removed
    header=None,
).dropna(axis=1, how="all")  # drops every column whose values are all NaN

# Saving xlsx file
file = f"test_report{pd.Timestamp('now').strftime('%Y%m%d_%I%M')}.xlsx"
df_new.to_excel(file, index=False)
Need guidance on fixing this issue.
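The header disappears because of the trailing `.dropna(axis=1, how="all")`: with `axis=1` and `how="all"`, pandas drops every column whose values are all missing, and an empty (or literal `NULL`) COL3 is parsed as all NaN. Separately, `error_bad_lines` was deprecated in pandas 1.3 and removed in 2.0; `on_bad_lines="skip"` is its replacement. A minimal sketch of the fix, assuming the file is pipe-delimited (as `sep="|"` in the script suggests) and using an in-memory sample in place of test_report.csv:

```python
import io

import pandas as pd

# Stand-in for test_report.csv (assumed pipe-delimited, COL3 left empty)
sample = io.StringIO(
    "COL1|COL2|COL3\n"
    "A|1|\n"
    "B|2|\n"
    "C|5|\n"
)

col_names = ["COL1", "COL2", "COL3"]

# No .dropna(axis=1, how="all") here, so the all-empty COL3 is kept
df_new = pd.read_csv(
    sample,
    sep="|",
    names=col_names,
    header=None,
    skiprows=1,
    on_bad_lines="skip",
)
print(df_new.columns.tolist())  # ['COL1', 'COL2', 'COL3']

# For comparison: the original dropna call removes COL3 entirely
dropped = df_new.dropna(axis=1, how="all")
print(dropped.columns.tolist())  # ['COL1', 'COL2']
```

Writing the undropped frame with `df_new.to_excel(file, index=False)` should then produce a COL3 header with empty cells underneath.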

Related

Treat everything as raw string (even formulas) when reading into pandas from excel

So, I am handling text responses from surveys, and it is common for responses to start with "-", for example: -I am sad today.
Excel interprets such a cell as a formula and displays #NAME?
So when I import the Excel file into pandas using read_excel, the value shows up as NaN.
Is there any way to force Excel to keep these values as raw strings instead of interpreting them as formulas?
I wrote a VBA macro that sets the entire column to Text by clicking through every cell in the column, which is slow when there are ten thousand or more rows.
I was hoping to do this at the Python level instead. Any ideas?
I hope this works for you: use openpyxl to extract the Excel data and then convert it into a pandas DataFrame.
from openpyxl import load_workbook
import pandas as pd

# .active is the active (usually first) worksheet; data_only defaults to False,
# so formula cells are returned as their formula text
ws = load_workbook(filename='./formula_contains_raw.xlsx').active
rows = list(ws.values)  # each row is a tuple of cell values
df = pd.DataFrame(rows[1:], columns=rows[0])  # first row holds the headers
df.head()
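To see why this works, here is a self-contained sketch that builds a tiny workbook in memory instead of reading ./formula_contains_raw.xlsx (the in-memory file and its cell contents are assumptions for illustration). openpyxl stores a string that starts with "=" as a formula. With the default `data_only=False` it hands back the formula text; with `data_only=True` it returns the cached result instead, which is `None` here because openpyxl never calculates formulas and no application has cached a value:

```python
import io

import pandas as pd
from openpyxl import Workbook, load_workbook

# Build a small workbook in memory; "=-hello" is stored as a formula cell
wb = Workbook()
ws = wb.active
ws["A1"] = "Col1"
ws["A2"] = "hello"
ws["A3"] = "=-hello"
buf = io.BytesIO()
wb.save(buf)

# data_only=False (the default): formula cells come back as their formula text
buf.seek(0)
rows = list(load_workbook(buf).active.values)
df = pd.DataFrame(rows[1:], columns=rows[0])
print(df["Col1"].tolist())  # ['hello', '=-hello']

# data_only=True: formula cells return their cached value, which is None here
buf.seek(0)
cached = list(load_workbook(buf, data_only=True).active.values)
print(cached[2])  # (None,)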
It works for me with a CSV file instead of an Excel file.
With the CSV file open in Excel, I select Formulas > Show Formulas and then save the file.
pd.read_csv('draft.csv')
Output:
      Col1
0    hello
1  =-hello
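Once the text is in a CSV, pandas reads each field literally and never evaluates formulas, so a leading "-" or "=" survives. A minimal sketch with an in-memory CSV standing in for draft.csv (the sample rows are assumptions); `dtype=str` keeps every answer as text, and `keep_default_na=False` stops answers such as "NA" from being turned into NaN:

```python
import io

import pandas as pd

# Stand-in for draft.csv: survey answers, some starting with "-" or "="
sample = io.StringIO(
    "Col1\n"
    "hello\n"
    "-I am sad today\n"
    "=-hello\n"
    "NA\n"
)

# dtype=str keeps fields as strings; keep_default_na=False keeps "NA" as text
df = pd.read_csv(sample, dtype=str, keep_default_na=False)
print(df["Col1"].tolist())
# ['hello', '-I am sad today', '=-hello', 'NA']
```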

Read a csv file in python correctly using pandas

I am trying to read this file using read_csv in pandas (Python), but I am not able to capture all the columns.
Can you help?
Here is the code:
import pandas as pd
file = r'path of file'
df = pd.read_csv(file, encoding='cp1252', on_bad_lines='skip')
Thank you
I tried to read your file and first noticed that the encoding you specified does not match the one actually used in the file. I also noticed that the separator is not a comma (,) but a tab (\t).
First, to find the file encoding (on Linux), just run:
$ file -i kopie.csv
kopie.csv: text/plain; charset=utf-16le
In Python:
import pandas as pd
path_to_file = 'kopie.csv'
df = pd.read_csv(path_to_file, encoding='utf-16le', sep='\t')
And when I print the shape of the loaded dataframe:
>>> df.shape
(869, 161)
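If the `file` command is not available (for example on Windows), a rough Python-side check is to look for a byte-order mark (BOM) at the start of the file. The helper `guess_bom_encoding` below is hypothetical, and it only helps when the file actually starts with a BOM; without one it returns None and the encoding has to be determined some other way. The demo writes its own small tab-separated UTF-16 file to a temporary directory rather than using the real kopie.csv:

```python
import codecs
import os
import tempfile

import pandas as pd


def guess_bom_encoding(path):
    """Hypothetical helper: return an encoding name based on the file's BOM, or None."""
    with open(path, "rb") as f:
        head = f.read(4)
    # Check UTF-32 first: the UTF-32-LE BOM begins with the UTF-16-LE BOM bytes
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # the "utf-16" codec consumes the BOM while decoding
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return None


# Demo: Python's "utf-16" codec writes a BOM followed by the text
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "kopie.csv")
    with open(path, "w", encoding="utf-16", newline="") as f:
        f.write("A\tB\tC\n1\t2\t3\n4\t5\t6\n")

    enc = guess_bom_encoding(path)
    df = pd.read_csv(path, encoding=enc, sep="\t")
    print(enc, df.shape)  # utf-16 (2, 3)
```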

CSV file misread with pandas

I just started to learn the pandas library for python and made an excel sheet that I saved as a .csv file.
(Screenshot: the CSV file reopened in Excel)
import pandas as pd
df = pd.read_csv('purchases.csv')
print(df)
Then I read the file with pandas and got the following output:
;apples;oranges
0 June;3;0
1 Robert;2;3
2 Lily;0;7
3 David;1;2
What should I do so that the file looks the same in the Excel sheet and in the DataFrame?
You did not post your code.
Try this one:
df = pd.read_csv(<your file>, sep=';')
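The output shows that the file uses semicolons rather than commas as the field separator (Excel typically does this in locales where the comma is the decimal separator), so with the default `sep=","` pandas put each whole line into a single column. A small sketch using the rows from the question as an in-memory string; `index_col=0` is an optional extra that uses the names column as the row index, which is closer to how the sheet looks in Excel:

```python
import io

import pandas as pd

# The rows from the question, as Excel saved them (semicolon-separated)
sample = io.StringIO(
    ";apples;oranges\n"
    "June;3;0\n"
    "Robert;2;3\n"
    "Lily;0;7\n"
    "David;1;2\n"
)

# sep=";" splits the fields; index_col=0 turns the names into the row index
df = pd.read_csv(sample, sep=";", index_col=0)
print(df)
```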

How to convert data from txt files to Excel files using python

I have a text file that contains data like this. It is just a small example, but the real one is pretty similar.
I am wondering how to display such data in an "Excel Table" like this using Python?
The pandas library is wonderful for reading CSV files (which is what the file in the image you linked contains). You can read in a CSV or TXT file with pandas and write it out to Excel in 3 simple lines.
import pandas as pd
df = pd.read_csv('input.csv') # if your file is comma separated
or, if your file is tab-delimited ('\t'):
df = pd.read_csv('input.csv', sep='\t')
To save to excel file add the following:
df.to_excel('output.xlsx', sheet_name='Sheet1')
complete code:
import pandas as pd
df = pd.read_csv('input.csv') # can replace with df = pd.read_table('input.txt') for '\t'
df.to_excel('output.xlsx', sheet_name='Sheet1')
By default this also writes the DataFrame index, so if your input file was:
A,B,C
1,2,3
4,5,6
7,8,9
Your output Excel sheet would look like this:
   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9
You can see your data has been shifted one column and your index axis has been kept. If you do not want this index column (because you never assigned your DataFrame an index, so it just has pandas' default 0, 1, 2, ...):
df.to_excel('output.xlsx', sheet_name='Sheet1', index=False)
Your output will look like:
A  B  C
1  2  3
4  5  6
7  8  9
Here you can see the index has been dropped from the Excel file.
You do not need Python! Just rename your text file to CSV and voilà, you get your desired output :)
If you want to do the rename in Python, you can use the os.rename function:
import os
os.rename(src, dst)
where src is the source file and dst is the destination file.
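A minimal sketch of the same rename using pathlib instead of os.rename (a swap-in, not what the answer used); the file name data.txt is a hypothetical placeholder, created in a temporary directory so the example runs on its own:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "data.txt"  # hypothetical input file
    src.write_text("A,B,C\n1,2,3\n")

    # Same file and contents, only the extension changes (rename returns the new Path)
    dst = src.rename(src.with_suffix(".csv"))
    renamed_ok = dst.exists() and not src.exists()
    print(dst.name, renamed_ok)  # data.csv True
```

Renaming does not change the contents, so this only helps when the text file is already comma-separated.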
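XLWT
I use the xlwt library. It produces native Excel files, which is much better than simply importing text files as CSV files. It is a bit of work, but it provides most key Excel features, including setting column widths, cell colors, cell formatting, etc. Note that xlwt only writes the legacy .xls format, and pandas removed its xlwt engine in version 2.0.
With pandas, saving the DataFrame to an .xlsx file is simply:
df.to_excel("testfile.xlsx")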

Column size issue : read_csv

I have a DataFrame with 4 columns. I need to convert it to CSV so I can work with it on my local computer. When I convert the DataFrame to CSV and read it back, I get only one column:
df = pd.read_csv("final.csv")
df.info()  # info() prints its summary itself (print df.info() is Python 2 syntax)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20479 entries, 0 to 20478
Data columns (total 1 columns)
How can I convert this csv to dataframe with 4 columns?
The question is unclear. Are you trying to write a pandas DataFrame object to a CSV file, or create a DataFrame object from an existing CSV file?
The pandas documentation for DataFrame.to_csv covers writing a DataFrame to a CSV file, and the documentation for pandas.read_csv covers the reverse.
A CSV (comma-separated values) file is separated by commas, so make sure the separator is consistent.
When you read in your dataframe, you might have to explicitly state what type of separator is being used. I would open the csv in a text editor and see what the separator is. If, for example, the separator used was "|", I would use the following code:
df = pd.read_csv('final.csv', sep='|')
Then, to save to a .csv the code should be as simple as:
df.to_csv('path/to/file/csvFileName.csv', index=False)
I would recommend using index=False like I did, otherwise the pandas index will be included as a column in your csv file. Cheers.
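A short sketch of both points, using an in-memory buffer instead of final.csv (the four column names are made up): a "|"-separated file read with the default comma separator collapses into a single column, and writing with the default `index=True` brings the index back as an extra unnamed column when the file is read again:

```python
import io

import pandas as pd

# A 4-column frame; the column names are made up for the example
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})

# Write with "|" as the separator and without the index
buf = io.StringIO()
df.to_csv(buf, sep="|", index=False)

buf.seek(0)
wrong_sep = pd.read_csv(buf)  # default sep="," finds no commas
buf.seek(0)
right_sep = pd.read_csv(buf, sep="|")
print(wrong_sep.shape, right_sep.shape)  # (2, 1) (2, 4)

# With the default index=True, the index comes back as an extra column
buf_with_index = io.StringIO()
df.to_csv(buf_with_index)
buf_with_index.seek(0)
with_index_cols = pd.read_csv(buf_with_index).columns.tolist()
print(with_index_cols)  # ['Unnamed: 0', 'a', 'b', 'c', 'd']
```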
