I have a requirement where I have to split some columns into a first row and the remaining columns into a second row.
I have stored them in one dataframe such as:
columnA columnB columnC columnD
A B C D
and write them to a text file sample.txt as:
A,B
C,D
This is the code:
cleaned_data.iloc[:, 0:1].to_csv("report_csv.txt", encoding='utf-8', index=False, header=False, line_terminator='')
cleaned_data.iloc[:,1:].to_csv("report_csv.txt", encoding='utf-8', index=False, header=False, mode='a', line_terminator='')
It produces sample.txt as expected. However, there is a third line which is empty, and I don't want it to exist. I tried line_terminator='': it does not work with '', though it does work with values such as ' ' or 'abc'.
I'm sure there is a better way of producing the sample text file than what I've written, so I'm open to alternatives.
Still, how can I remove the last empty line? I'm using Python 3.8.
I'm not able to reproduce your issue, but it might be the case that your strings in the dataframe contain trailing line breaks. I'm running pandas 0.23.4 on Linux:
import pandas
print(pandas.__version__)
I created what I think your dataframe contains using the command
df = pandas.DataFrame({'colA':['A'], 'colB': ['B'], 'colC':['C'], 'colD':['D']})
To check the contents of a cell, you could use df['colA'][0].
The indexing I needed to grab the first and second columns was
df.iloc[:, 0:2]
and the way I got to a CSV did not rely on line_terminator:
df.iloc[:, 0:2].to_csv("report_csv.txt", encoding='utf-8', index=False, header=False)
df.iloc[:,2:].to_csv("report_csv.txt", encoding='utf-8', index=False, header=False, mode='a')
When I run
with open('report_csv.txt','r') as file_handle:
    dat = file_handle.read()
I get 'A,B\nC,D\n' from dat.
To get no trailing newline on the last line, use to_string()
with open('output.txt','w') as file_handle:
    file_handle.write(df.iloc[:, 0:2].to_string(header=False, index=False) + "\n")
    file_handle.write(df.iloc[:, 2:].to_string(header=False, index=False))
Then we can verify the file is formatted as desired by running
with open('output.txt','r') as file_handle:
    dat = file_handle.read()
Then dat contains 'A B\nC D'. If spaces are not an acceptable delimiter, they could be replaced by a comma prior to writing to the file, as sketched below.
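For instance, a minimal sketch of that replacement (this assumes to_string() pads the columns with plain spaces, so a regex collapses each run of spaces into a comma):
import re
with open('output.txt', 'w') as file_handle:
    # collapse the space padding that to_string() inserts into a single comma
    file_handle.write(re.sub(' +', ',', df.iloc[:, 0:2].to_string(header=False, index=False)) + "\n")
    file_handle.write(re.sub(' +', ',', df.iloc[:, 2:].to_string(header=False, index=False)))
This produces 'A,B\nC,D' with no trailing newline.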
Related
I want to append this single-rowed df
rndList = ["albert", "magnus", "calc", 2, 5, "drop"]
rndListDf = pd.DataFrame([rndList])
to a new row of this csv file,
first,second,third,fourth,fifth,sixth
so that each value is placed under the corresponding column header.
Using this approach:
rndListDf.to_csv('./rnd_data.csv', mode='a', header=False)
leaves an empty row between the header and the data in the csv file.
How can I append the row without the empty row? The desired result is:
first,second,third,fourth,fifth,sixth
0,albert,magnus,calc,2,5,drop
I think you have empty lines after your header row, but you can try:
data = pd.read_csv('./rnd_data.csv')
rndListDf.rename(columns=dict(zip(rndListDf.columns, data.columns))) \
    .to_csv('./rnd_data.csv', index=False)
Content of your file after this operation:
first,second,third,fourth,fifth,sixth
albert,magnus,calc,2,5,drop
I tested it; neither your code nor pandas.to_csv appends a new line. It comes from your original csv file. If you are trying to figure out how to add a header to your dataframe:
rndList = ["albert", "magnus", "calc", 2, 5, "drop"]
rndListDf = pd.DataFrame([rndList])
rndListDf.columns = 'first,second,third,fourth,fifth,sixth'.split(',')
rndListDf.to_csv('./rnd_data.csv', index=False)
Alternatively, you can first clean your csv as suggested by Corralien and continue doing what you are doing. However, I would suggest going with Corralien's solution.
# Cleanup
pd.read_csv('./rnd_data.csv').to_csv('rnd_data.csv', index=False)
# Your Code
rndList = ["albert", "magnus", "calc", 2, 5, "drop"]
rndListDf = pd.DataFrame([rndList])
rndListDf.to_csv('./rnd_data.csv', mode='a', header=False)
# Result
first,second,third,fourth,fifth,sixth
albert,magnus,calc,2,5,drop
I'm trying to read a csv file with pandas.
This file actually has only one row but it causes an error whenever I try to read it.
Something seems to go wrong at line 8, but I can hardly find an 8th line since there's clearly only one row in the file.
I do the following:
with codecs.open("path_to_file", "rU", "Shift-JIS", "ignore") as file:
    df = pd.read_csv(file, header=None, sep="\t")
df
Then I get:
ParserError: Error tokenizing data. C error: Expected 1 fields in line 8, saw 3
I don't get what's really going on, so any advice will be appreciated.
I struggled with this for almost half a day. I opened the csv with Notepad, noticed that the separator is a tab rather than a comma, and then tried the combination below.
df = pd.read_csv('C:\\myfile.csv',sep='\t', lineterminator='\r')
Try df = pd.read_csv(file, header=None, error_bad_lines=False)
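Note that error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; on newer versions the equivalent is:
df = pd.read_csv(file, header=None, on_bad_lines='skip')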
The existing answer will not include these additional lines in your dataframe. If you'd like your dataframe to be as wide as its widest point, you can use the following:
delimiter = ','
max_columns = max(open(path_name, 'r'), key=lambda x: x.count(delimiter)).count(delimiter) + 1  # field count = delimiter count + 1
df = pd.read_csv(path_name, header = None, skiprows = 1, names = list(range(0,max_columns)))
Set skiprows = 1 if there's actually a header; you can always retrieve the header column names later.
You can also identify rows that have more columns populated than the number of column names in the original header.
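As a rough sketch of that check (reusing path_name and delimiter from above, assuming the first line is the header, and noting that a plain count ignores delimiters inside quoted fields):
with open(path_name, 'r') as fh:
    header_width = fh.readline().count(delimiter) + 1
    # 1-based line numbers of rows with more fields than the header
    wide_rows = [i + 2 for i, line in enumerate(fh) if line.count(delimiter) + 1 > header_width]
print(wide_rows)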
I'm currently sorting a csv file. As far as my output goes, it's correct, but it isn't properly formatted. The following is the file I'm sorting.
And here is the output after I sort (I'll include the code after the image)
Obviously I'm having a delimiter issue, but here is my code:
with open(out_file, 'r') as unsort:  ## Opens OMI data
    with open(Pandora_Sorted, 'w') as sort:  ## Opens file to write to
        for line in unsort:
            if "Datetime" in line:  ## Searches lines
                writer = csv.writer(sort, delimiter=',')
                writer.writerow(headers)  ## Writes header
            elif "T13" in line:
                writer = csv.writer(sort)
                writer.writerow(line)  ## Writes to output file
I think it's easier to read the csv file into a pandas data frame and sort it; please check the sample code below.
import pandas as pd
df = pd.read_csv(input_file)
df.sort_values(by = ['Datetime'], inplace = True)
df.to_csv(output_file)
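As a side note, to_csv writes the dataframe index as an extra first column by default; pass index=False if you don't want it in the sorted file:
df.to_csv(output_file, index=False)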
Do you need to be explicit about your separator for the writer?
Here in the second line of your elif:
elif "T13" in line:
writer=csv.writer(sort, delimiter = ',')
writer.writerow(line) # Writes to output file
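Separately, writerow() expects a sequence of fields; handing it a raw string makes csv.writer treat each character as a separate field, which produces the comma-between-every-character output. A minimal sketch of the fix, assuming the input lines are comma-separated:
elif "T13" in line:
    writer = csv.writer(sort, delimiter=',')
    writer.writerow(line.rstrip('\n').split(','))  # split the raw line into fields first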
For the provided code, the header would also have formatting similar to the other rows due to the following line:
writer=csv.writer(sort, delimiter = ',')
Using pandas, the following can be used to sort in ascending order by a list of columns, list_of_columns:
import csv
import pandas as pd
input_csv = pd.read_csv(out_file, sep=',')
input_csv = input_csv.sort_values(by=list_of_columns, ascending=True)  # assign the result: sort_values returns a new dataframe
input_csv.to_csv(Pandora_Sorted, sep=',')
For example, list_of_columns could be:
list_of_columns = ['Datetime', 'JulianDate', 'repetition']
I have a csv file with rows that look like this:
87.89,"2,392.05",14.77,373.2 (the quoted value uses a comma as a thousands separator)
pandas keeps treating the comma inside the second column as a field separator and shows an "Error tokenizing data" error.
Is there a way in pandas to ignore commas between double quotes?
Thanks.
Sample rows:
9999992613813558569,87.89,"2,392.05",14.77,373.2
9999987064038821584,95.11,"3,397.04",42.15,"1,461.14"
9999956300203713283,6.67,194.02,41.23,"1,105.45"
9999946809576027532,15.08,353.84,29.43,591.9
Edit:
I already tried:
read_csv(file, quotechar='"', encoding='latin1', thousands=',')
read_csv(file, quotechar='"', encoding='latin1', escapechar ='"')
Try reading it with:
pd.read_csv(myfile, encoding='latin1', quotechar='"')
Each column that contains these quoted numbers will be treated as type object.
Once you get this, to get back to float use:
df = df.apply(lambda x: pd.to_numeric(x.astype(str).str.replace(',',''), errors='coerce'))
Alternatively you can try:
pd.read_csv(myfile, encoding='latin1', quotechar='"', error_bad_lines=False)
This lets you see what was omitted from the original csv, i.e. what caused the problem.
For each line that was omitted you'll receive a warning instead of an error.
This worked for me:
pd.read_csv(myfile, encoding='latin1', quotechar='"', thousands=',')
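You can then confirm that the quoted numbers were parsed numerically by checking the dtypes, e.g.:
df = pd.read_csv(myfile, encoding='latin1', quotechar='"', thousands=',')
print(df.dtypes)  # the previously quoted columns should now be float64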
I want to know if it is possible to use the pandas to_csv() function to add a dataframe to an existing csv file. The csv file has the same structure as the loaded data.
You can specify a python write mode in the pandas to_csv function. For append it is 'a'.
In your case:
df.to_csv('my_csv.csv', mode='a', header=False)
The default mode is 'w'.
If the file initially might be missing, you can make sure the header is printed at the first write using this variation:
import os
output_path = 'my_csv.csv'
df.to_csv(output_path, mode='a', header=not os.path.exists(output_path))
You can append to a csv by opening the file in append mode:
with open('my_csv.csv', 'a') as f:
    df.to_csv(f, header=False)
If this was your csv, foo.csv:
,A,B,C
0,1,2,3
1,4,5,6
If you read that and then append, for example, df + 6:
In [1]: df = pd.read_csv('foo.csv', index_col=0)
In [2]: df
Out[2]:
A B C
0 1 2 3
1 4 5 6
In [3]: df + 6
Out[3]:
A B C
0 7 8 9
1 10 11 12
In [4]: with open('foo.csv', 'a') as f:
   ...:     (df + 6).to_csv(f, header=False)
foo.csv becomes:
,A,B,C
0,1,2,3
1,4,5,6
0,7,8,9
1,10,11,12
with open(filename, 'a') as f:
    df.to_csv(f, header=f.tell() == 0)
Create the file unless it exists, otherwise append.
Add the header if the file is being created, otherwise skip it.
A little helper function I use with some header checking safeguards to handle it all:
def appendDFToCSV_void(df, csvFilePath, sep=","):
    import os
    import pandas as pd
    if not os.path.isfile(csvFilePath):
        df.to_csv(csvFilePath, mode='a', index=False, sep=sep)
    elif len(df.columns) != len(pd.read_csv(csvFilePath, nrows=1, sep=sep).columns):
        raise Exception("Columns do not match!! Dataframe has " + str(len(df.columns)) + " columns. CSV file has " + str(len(pd.read_csv(csvFilePath, nrows=1, sep=sep).columns)) + " columns.")
    elif not (df.columns == pd.read_csv(csvFilePath, nrows=1, sep=sep).columns).all():
        raise Exception("Columns and column order of dataframe and csv file do not match!!")
    else:
        df.to_csv(csvFilePath, mode='a', index=False, sep=sep, header=False)
Initially starting with pyspark dataframes, I got type conversion errors (when converting to pandas dataframes and then appending to csv) given the schema/column types in my pyspark dataframes.
I solved the problem by forcing all columns in each df to be of type string and then appending to the csv as follows:
with open('testAppend.csv', 'a') as f:
    df2.toPandas().astype(str).to_csv(f, header=False)
This is how I did it in 2021
Let us say I have a csv sales.csv which has the following data in it:
sales.csv:
Order Name,Price,Qty
oil,200,2
butter,180,10
and to add more rows I can load them in a data frame and append it to the csv like this:
import pandas
data = [
    ['matchstick', '60', '11'],
    ['cookies', '10', '120']
]
dataframe = pandas.DataFrame(data)
dataframe.to_csv("sales.csv", index=False, mode='a', header=False)
and the output will be:
Order Name,Price,Qty
oil,200,2
butter,180,10
matchstick,60,11
cookies,10,120
A bit late to the party, but you can also use a context manager if you're opening and closing your file multiple times, or logging data, statistics, etc.
from contextlib import contextmanager
import pandas as pd
@contextmanager
def open_file(path, mode):
    file_to = open(path, mode)
    yield file_to
    file_to.close()
## later
saved_df = pd.DataFrame(data)
with open_file('yourcsv.csv', 'a') as outfile:
    saved_df.to_csv(outfile, header=False)  # write through the managed handle opened in append mode