I'm currently sorting a CSV file. The contents of my output are correct, but the output isn't properly formatted. The following is the file I'm sorting:
And here is the output after I sort (I'll include the code after the image)
Obviously I'm having a delimiter issue, but here is my code:
with open(out_file, 'r') as unsort:  # Opens OMI data
    with open(Pandora_Sorted, 'w') as sort:  # Opens file to write to
        for line in unsort:
            if "Datetime" in line:  # Searches lines
                writer = csv.writer(sort, delimiter=',')
                writer.writerow(headers)  # Writes header
            elif "T13" in line:
                writer = csv.writer(sort)
                writer.writerow(line)  # Writes to output file
I think it's easier to read the CSV file into a pandas DataFrame and sort it there. Please check the sample code below.
import pandas as pd
df = pd.read_csv(input_file)
df.sort_values(by=['Datetime'], inplace=True)
df.to_csv(output_file, index=False)  # index=False avoids writing the row index as an extra column
Do you need to be explicit about your separator for the writer?
Here in the second line of your elif:
elif "T13" in line:
    writer = csv.writer(sort, delimiter=',')
    writer.writerow(line)  # Writes to output file
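The default delimiter of csv.writer is already a comma, so another possible cause is what gets passed to writerow. writerow expects a sequence of fields; when it is given a plain string, it iterates over the string character by character. The following is a minimal sketch of that behavior and one way around it, assuming each line is a raw comma-separated string. The helper names and the sample rows are made up for illustration.

```python
import csv
import io


def write_raw_string(line):
    """Show what csv.writer produces when writerow receives a plain string."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(line)
    return buffer.getvalue()


def filter_rows(lines, header, marker="T13"):
    """Write the header once, then every line containing `marker`, split into fields."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")  # one writer is enough
    writer.writerow(header)
    for line in lines:
        if marker in line:
            # Split the raw text into a list of fields before writing it
            writer.writerow(line.rstrip("\n").split(","))
    return buffer.getvalue()


# Hypothetical sample data, not taken from the original file
sample = [
    "Datetime,Value\n",
    "2016-01-01T13:00,1.5\n",
    "2016-01-01T14:00,2.5\n",
]

print(write_raw_string("a,b"))   # every character becomes its own field
print(filter_rows(sample, ["Datetime", "Value"]))
```

If the input can contain quoted fields, reading it with csv.reader instead of str.split would be the safer choice.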
In the provided code, the header would also end up formatted like the other rows because of the following line:
writer=csv.writer(sort, delimiter = ',')
With pandas, the following sorts in ascending order by a list of columns, list_of_columns:
import pandas as pd
input_csv = pd.read_csv(out_file, sep=',')
# sort_values returns a new DataFrame, so the result has to be assigned
input_csv = input_csv.sort_values(by=list_of_columns, ascending=True)
input_csv.to_csv(Pandora_Sorted, sep=',', index=False)  # index=False skips the row index column
For example, list_of_columns could be:
list_of_columns = ['Datetime', 'JulianDate', 'repetition']
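A small sketch of the sort step, assuming the CSV has a header row that contains the sort columns. By default sort_values returns a new DataFrame rather than sorting in place, so the result has to be kept. The function name and the sample data are hypothetical.

```python
import io

import pandas as pd


def sort_csv_text(csv_text, columns):
    """Read CSV text, sort by `columns` in ascending order, return CSV text."""
    frame = pd.read_csv(io.StringIO(csv_text))
    sorted_frame = frame.sort_values(by=columns, ascending=True)  # returns a new frame
    return sorted_frame.to_csv(index=False)  # no path given, so a string is returned


# Hypothetical sample, not the original data
unsorted_text = "Datetime,v\n2016-01-02,2\n2016-01-01,1\n"
sorted_text = sort_csv_text(unsorted_text, ["Datetime"])
print(sorted_text)
```

Here the Datetime column is compared as text, which orders ISO-style timestamps correctly; other date formats would probably need parse_dates in read_csv.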
I have a requirement to write some columns as the first row and the remaining columns as the second row.
I have stored them in one DataFrame, such as:
columnA columnB columnC columnD
A B C D
and I want to write them to a text file sample.txt as:
A,B
C,D
This is the code :
cleaned_data.iloc[:, 0:1].to_csv("report_csv.txt", encoding='utf-8', index=False, header=False, line_terminator='')
cleaned_data.iloc[:,1:].to_csv("report_csv.txt", encoding='utf-8', index=False, header=False, mode='a', line_terminator='')
It should produce the expected sample.txt. However, there is a third line which is empty, and I don't want it to exist. I tried line_terminator='', but it does not work for '', although it works for values such as ' ' or 'abc'.
I'm sure there is a better way of producing the sample text file than what I've written. I'm open to alternatives.
Still, how can I remove the last empty line? I'm using Python 3.8.
I'm not able to reproduce your issue, but it might be the case that your strings in the dataframe contain trailing line breaks. I'm running Pandas 0.23.4 on linux
import pandas
print(pandas.__version__)
I created what I think your dataframe contains using the command
df = pandas.DataFrame({'colA':['A'], 'colB': ['B'], 'colC':['C'], 'colD':['D']})
To check the contents of a cell, you could use df['colA'][0].
The indexing I needed to grab the first and second columns was
df.iloc[:, 0:2]
and the way I got to a CSV did not rely on lineterminator
df.iloc[:, 0:2].to_csv("report_csv.txt", encoding='utf-8', index=False, header=False)
df.iloc[:,2:].to_csv("report_csv.txt", encoding='utf-8', index=False, header=False, mode='a')
When I run
with open('report_csv.txt','r') as file_handle:
    dat = file_handle.read()
I get 'A,B\nC,D\n' from dat.
To get no trailing newline on the last line, use to_string()
with open('output.txt','w') as file_handle:
    file_handle.write(df.iloc[:, 0:2].to_string(header=False,index=False)+"\n")
    file_handle.write(df.iloc[:,2:].to_string(header=False,index=False))
Then we can verify the file is formatted as desired by running
with open('output.txt','r') as file_handle:
    dat = file_handle.read()
dat then contains 'A B\nC D'. If spaces are not an acceptable delimiter, they could be replaced by a comma prior to writing to the file.
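As an alternative that produces commas directly, the two lines can be built as one string and written in a single call, which leaves no trailing newline. This is a minimal sketch, assuming a single-row DataFrame like the one created above; the function name two_line_text and its parameters are made up for illustration.

```python
import pandas as pd


def two_line_text(frame, split_at=2, sep=","):
    """Put the first `split_at` cells of row 0 on line one and the rest on line two."""
    cells = [str(value) for value in frame.iloc[0]]
    first = sep.join(cells[:split_at])
    second = sep.join(cells[split_at:])
    return first + "\n" + second  # nothing is appended after the second line


df = pd.DataFrame({'colA': ['A'], 'colB': ['B'], 'colC': ['C'], 'colD': ['D']})
text = two_line_text(df)
print(repr(text))

# Writing the string as-is; newline="" keeps "\n" from being translated on Windows.
# with open("report_csv.txt", "w", encoding="utf-8", newline="") as file_handle:
#     file_handle.write(text)
```

Unlike to_csv, this plain join does no quoting, so it only suits cells that contain neither the separator nor line breaks.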
Is there a better way to import a txt file in a single pandas row than the solution below?
import pandas as pd
with open(path_txt_file) as f:
    text = f.read().replace("\n", "")
df = pd.DataFrame([text], columns = ["text"])
Sample lines from the .txt file:
Today is a beautiful day.
I will go swimming.
I tried pd.read_csv but it is returning multiple rows due to new lines.
You can concatenate the lines with .str.cat() [pandas-doc]:
text = pd.read_csv(path_txt_file, sep='\n', header=None)[0].str.cat()
df = pd.DataFrame([text], columns=['text'])
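Another option that avoids read_csv altogether is to read the lines with pathlib and join them. This is a sketch, assuming a UTF-8 text file; the function name and the temporary demo file are hypothetical.

```python
import os
import tempfile
from pathlib import Path

import pandas as pd


def text_file_to_single_row(path, joiner=""):
    """Load a text file and return a one-row DataFrame with all lines joined."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return pd.DataFrame({"text": [joiner.join(lines)]})


# Demo with a temporary file holding the two sample lines from the question
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "sample.txt")
    Path(demo_path).write_text(
        "Today is a beautiful day.\nI will go swimming.\n", encoding="utf-8"
    )
    demo_df = text_file_to_single_row(demo_path)

print(demo_df)
```

Passing joiner=" " would keep a space between the original lines instead of gluing them together.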
I am a novice in Python, and after several searches about how to convert my list of lists into a CSV file, I didn't find how to correct my issue.
Here is my code :
#!C:\Python27\read_and_convert_txt.py
import csv
if __name__ == '__main__':
    with open('c:/python27/mytxt.txt',"r") as t:
        lines = t.readlines()
        list = [ line.split() for line in lines ]
    with open('c:/python27/myfile.csv','w') as f:
        writer = csv.writer(f)
        for sublist in list:
            writer.writerow(sublist)
The first open() will create a list of lists from the txt file like
list = [["hello","world"], ["my","name","is","bob"], .... , ["good","morning"]]
then the second part will write the list of lists into a csv file but only in the first column.
What I need is from this list of lists to write it into a csv file like this :
Column 1, Column 2, Column 3, Column 4 ......
hello world
my name is bob
good morning
To summarize, when I open the CSV file in a text editor:
hello;world
my;name;is;bob
good;morning
Simply use pandas dataframe
import pandas as pd
df = pd.DataFrame(list)
df.to_csv('filename.csv')
Rows that are shorter than the longest one are padded with None. To replace None with an empty string, use:
df.fillna('', inplace=True)
So your final code should be like
import pandas as pd
df = pd.DataFrame(list)
df.fillna('', inplace=True)
df.to_csv('filename.csv')
Cheers!!!
Note: You should not use list as a variable name. It is not a keyword, but it is the name of the built-in list type, and assigning to it shadows that built-in.
I do not know if this is what you want:
import csv
list = [["hello","world"], ["my","name","is","bob"] , ["good","morning"]]
with open("d:/test.csv","w") as f:
    writer = csv.writer(f, delimiter=";")
    writer.writerows(list)
Gives as output file:
hello;world
my;name;is;bob
good;morning
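If the header row and equally long rows from the question are wanted as well, the rows can be padded before writing. This is a rough sketch using only the csv module and Python 3; the function name, the "Column N" header labels, and the choice of ";" as the delimiter are assumptions based on the desired output shown in the question.

```python
import csv
import io


def rows_to_csv_text(rows, delimiter=";"):
    """Write ragged rows as CSV text with a generated header and padded rows."""
    width = max((len(row) for row in rows), default=0)
    header = ["Column %d" % (index + 1) for index in range(width)]
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        # Pad short rows with empty strings so every row has `width` fields
        writer.writerow(list(row) + [""] * (width - len(row)))
    return buffer.getvalue()


words = [["hello", "world"], ["my", "name", "is", "bob"], ["good", "morning"]]
csv_text = rows_to_csv_text(words)
print(csv_text)
```

To write to a real file in Python 3, open it with newline="" and pass the file object to csv.writer in place of the StringIO buffer.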
I have a 1-million-line CSV file. I want to call a lookup function on each row's 1st column and append its result as a new column in the same CSV (if possible).
What I want is something like this:
for each row in dataframe:
    string = row[1]
    result = lookupFunction(string)
    row.append(result)
I know I could do it using Python's csv library by opening my CSV, reading each row, doing my operation, and writing the results to a new CSV.
This is my code using Python's CSV library
with open(rawfile, 'r') as f:
    with open(newFile, 'a') as csvfile:
        csvwritter = csv.writer(csvfile, delimiter=' ')
        for line in f:
            #do operation
However I really want to do it with Pandas because it would be something new to me.
This is what my data looks like
77,#oshkosh # tannersville pa,,PA,US
82,#osithesakcom ca,,CA,US
88,#osp open records or,,OR,US
89,#ospbco tel ord in,,IN,US
98,#ospwmnwithn return in,,IN,US
99,#ospwmnwithn tel ord in,,IN,US
100,#osram sylvania inc ma,,MA,US
106,#osteria giotto montclair nj,,NJ,US
Any help and guidance will be appreciated. Thanks.
Here is a simple example of adding two columns together into a new column from your CSV file:
import pandas as pd
df = pd.read_csv("yourpath/yourfile.csv")
df['newcol'] = df['col1'] + df['col2']
create df and csv
import pandas as pd
df = pd.DataFrame(dict(A=[1, 2], B=[3, 4]))
df.to_csv('test_add_column.csv')
read csv into dfromcsv
dfromcsv = pd.read_csv('test_add_column.csv', index_col=0)
create new column
dfromcsv['C'] = dfromcsv['A'] * dfromcsv['B']  # use the frame that was read back from the CSV
dfromcsv
write csv
dfromcsv.to_csv('test_add_column.csv')
read it again
dfromcsv2 = pd.read_csv('test_add_column.csv', index_col=0)
dfromcsv2
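For the lookup question itself, a possible pandas version applies the function to one column with Series.map and stores the result as a new column. This is only a sketch: lookup_function is a hypothetical stand-in for the real lookup, the file is assumed to have no header row, and the text to look up is assumed to sit in column index 1, as in the row[1] pseudocode.

```python
import io

import pandas as pd


def lookup_function(text):
    """Hypothetical stand-in for the real lookup: last word, upper-cased."""
    return text.split()[-1].upper()


def add_lookup_column(source, column=1):
    """Read a header-less CSV and append the lookup result as a new column."""
    frame = pd.read_csv(source, header=None)
    frame["result"] = frame[column].map(lookup_function)
    return frame


# Two rows copied from the sample data in the question
sample = io.StringIO(
    "77,#oshkosh # tannersville pa,,PA,US\n"
    "82,#osithesakcom ca,,CA,US\n"
)
with_result = add_lookup_column(sample)
csv_text = with_result.to_csv(index=False, header=False)
print(csv_text)
```

For a file with a million rows this should still fit in memory on most machines; if it does not, read_csv accepts a chunksize argument so the same step can be run chunk by chunk.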
How do I prevent Python from automatically writing objects into csv as a different format than originally? For example, I have list object such as the following:
row = ['APR16', '100.00000']
I want to write this row as is, however when I use writerow function of csv writer, it writes into the csv file as 16-Apr and just 10. I want to keep the original formatting.
EDIT:
Here is the code:
import pandas as pd
dates = ['APR16', 'MAY16', 'JUN16']
numbers = [100.00000, 200.00000, 300.00000]
for i in range(3):
    row = []
    row.append(dates[i])
    row.append(numbers[i])
    prow = pd.DataFrame(row)
    prow.to_csv('test.csv', index=False, header=False)
And result:
Using pandas:
import pandas as pd
dates = ['APR16', 'MAY16', 'JUN16']
numbers = [100.00000, 200.00000, 300.00000]
data = list(zip(dates, numbers))  # list of (date, number) rows
fd = pd.DataFrame(data)
fd.to_csv('test.csv', index=False, header=False) # csv-file
fd.to_excel("test.xls", header=False,index=False) # or xls-file
Result in my terminal:
➜ ~ cat test.csv
APR16
100.00000
Result in LibreOffice:
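Two separate effects may be involved here. Turning APR16 into 16-Apr is most likely done by the spreadsheet program when it opens the file, not by Python. The trailing zeros are a different matter: a float such as 100.00000 carries no formatting, so to_csv writes it as 100.0 unless told otherwise. The following sketch keeps five decimals through the float_format argument of to_csv; the function name is made up, and whether a spreadsheet then displays the zeros still depends on its import settings.

```python
import pandas as pd


def frame_to_csv_text(dates, numbers):
    """Return CSV text in which floats are written with five decimal places."""
    frame = pd.DataFrame(list(zip(dates, numbers)))
    # float_format is applied to float columns only; the date labels stay as text
    return frame.to_csv(index=False, header=False, float_format="%.5f")


dates = ['APR16', 'MAY16', 'JUN16']
numbers = [100.00000, 200.00000, 300.00000]
csv_text = frame_to_csv_text(dates, numbers)
print(csv_text)
```

Viewing the file in a plain text editor, or with cat as above, shows what was actually written, independent of any spreadsheet's automatic conversion.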