Writing columns from separate files into a single file - python

I am relatively new to working with csv files in python and would appreciate some guidance. I have 6 separate csv files. I would like to copy data from column 1, column 2 and column 3 from each of the csv files into the corresponding first 3 columns in a new file.
How do I word that into my code?
Here is my incomplete code:
import csv
file1 = open ('fileA.csv', 'rb')
reader1 = csv.reader (file1)
file2 = open ('fileB.csv', 'rb')
reader2 = csv.reader (file2)
file3 = open ('fileC.csv', 'rb')
reader3 = csv.reader (file3)
file4 = open ('fileD.csv', 'rb')
reader4 = csv.reader (file4)
file5 = open ('fileE.csv', 'rb')
reader5 = csv.reader (file5)
file6 = open ('fileF.csv', 'rb')
reader6 = csv.reader (file6)
WriteFile = open ('NewFile.csv','wb')
writer = csv.writer(WriteFile)
next(reader1, None)
Data1 = (col[0:3] for col in reader1)
next(reader2, None)
Data2 = (col[0:3] for col in reader2)
next(reader3, None)
Data3 = (col[0:3] for col in reader3)
next(reader4, None)
Data4 = (col[0:3] for col in reader4)
next(reader5, None)
Data5 = (col[0:3] for col in reader5)
next(reader6, None)
Data6 = (col[0:3] for col in reader6)
.......????????
file1.close()
file2.close()
file3.close()
file4.close()
file5.close()
file6.close()
WriteFile.close()
Thanks!

If you just want these all concatenated, that's easy. You can either call writerows on each of your iterators, or chain them together:
writer.writerows(itertools.chain(Data1, Data2, Data3, Data4, Data5, Data6))
Or, if you want them interleaved, where you get row 1 from Data1, then row 1 from Data2, and so on, and then row 2 from Data1, etc., use zip to transpose the data, and then chain again to flatten it:
writer.writerows(itertools.chain.from_iterable(zip(Data1, Data2, Data3,
Data4, Data5, Data6)))
If the files are of different lengths, that zip will stop as soon as you reach the end of any of the files. Is that what you want? I have no idea. You might want that. You might want to fill in the gaps with blank rows (in which case look at zip_longest). You might want to skip over the gaps (which you can do with zip_longest plus filter). Or a million other possibilities.
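As a rough illustration of the "skip over the gaps" option, here is a minimal Python 3 sketch built on itertools.zip_longest (called izip_longest in Python 2). The helper name interleave and the sample rows are made up for this example; they stand in for Data1, Data2 and so on.

```python
import itertools


def interleave(*row_iterables):
    """Yield row 1 of each input, then row 2 of each, and so on.

    Inputs may have different lengths; exhausted inputs are skipped.
    """
    missing = object()  # sentinel that cannot collide with a real row
    for group in itertools.zip_longest(*row_iterables, fillvalue=missing):
        for row in group:
            if row is not missing:
                yield row


# Hypothetical sample rows standing in for Data1, Data2 and Data3.
data1 = [["a1", "b1", "c1"], ["a2", "b2", "c2"]]
data2 = [["d1", "e1", "f1"]]
data3 = [["g1", "h1", "i1"], ["g2", "h2", "i2"], ["g3", "h3", "i3"]]

rows = list(interleave(data1, data2, data3))
```

The result could then be passed straight to writer.writerows(...). To fill the gaps with blank rows instead, pass fillvalue=[] and drop the sentinel check.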
As a side note, once you get to this many similar variables, it's usually a good sign that you really wanted a single iterable instead of separate variables. For example:
import csv
import itertools

# 'rb'/'wb' are the Python 2 modes used in the question; on Python 3 use
# open(filename, newline='') and open('NewFile.csv', 'w', newline='') instead.
filenames = ('fileA.csv', 'fileB.csv', 'fileC.csv',
             'fileD.csv', 'fileE.csv', 'fileF.csv')
files = [open(filename, 'rb') for filename in filenames]
readers = [csv.reader(file) for file in files]
WriteFile = open('NewFile.csv', 'wb')
writer = csv.writer(WriteFile)
for reader in readers:
    next(reader, None)  # skip each file's header row
Data = [(col[0:3] for col in reader) for reader in readers]
writer.writerows(itertools.chain.from_iterable(Data))
for file in files:
    file.close()
WriteFile.close()
(Notice that I used list comprehensions, not generator expressions, for the collections of files, readers, data, etc. That's because we need to iterate over them repeatedly—e.g., create a reader for every file, and later call close on every file. Also because there are a fixed, small number of elements—6—so "wasting" a whole list isn't really any issue.)
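If the explicit close() calls feel fragile, one possible variation (not part of the answer above) is to let contextlib.ExitStack own all the open files. This is a Python 3 sketch; the function name concat_first_columns, its ncols parameter and the demo file contents are assumptions made for the example.

```python
import contextlib
import csv
import itertools
import os
import tempfile


def concat_first_columns(in_paths, out_path, ncols=3):
    """Write the first `ncols` columns of every input file to `out_path`."""
    with contextlib.ExitStack() as stack:
        # ExitStack closes every file it was given, even if an error occurs.
        files = [stack.enter_context(open(p, newline="")) for p in in_paths]
        out = stack.enter_context(open(out_path, "w", newline=""))
        readers = [csv.reader(f) for f in files]
        for reader in readers:
            next(reader, None)  # skip each header row
        data = [(row[:ncols] for row in reader) for reader in readers]
        csv.writer(out).writerows(itertools.chain.from_iterable(data))


# Demo on two small throwaway files (hypothetical contents).
with tempfile.TemporaryDirectory() as demo_dir:
    path_a = os.path.join(demo_dir, "fileA.csv")
    path_b = os.path.join(demo_dir, "fileB.csv")
    out_path = os.path.join(demo_dir, "NewFile.csv")
    for path, text in ((path_a, "h1,h2,h3,h4\n1,2,3,4\n"),
                       (path_b, "h1,h2,h3,h4\n5,6,7,8\n")):
        with open(path, "w", newline="") as f:
            f.write(text)
    concat_first_columns([path_a, path_b], out_path)
    with open(out_path, newline="") as f:
        combined = list(csv.reader(f))
```

The list of filenames can be any length, so the same function covers 6 files or 60.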

The way I understand your question is that you have six separate csv's that have 3 columns each and the data in each column is of the same type in all six files. If so you could use pandas. Say you had 3 files that looked like ...
file1:
col1 col2 col3
1 1 1
1 1 1
and then a second and third file with 2's in the second and 3's in the third you could write...
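#!/usr/bin/env python
import pandas as pd
cols = ['col1', 'col2', 'col3']
files = ['~/one.txt', '~/two.txt', '~/three.txt']
# header=0 treats the first line as a header row and replaces it with `names`
# (current pandas rejects header=False)
data_1 = pd.read_csv(files[0], sep=',', header=0, names=cols)
data_2 = pd.read_csv(files[1], sep=',', header=0, names=cols)
data_3 = pd.read_csv(files[2], sep=',', header=0, names=cols)
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames instead
data_final = pd.concat([data_1, data_2, data_3])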
Then data_final should have the contents of all three data sets stacked on each other. You can modify for 6 (or n) datasets. Hope this is what you wanted.
Out[1]: col1 col2 col3
1 1 1
1 1 1
2 2 2
2 2 2
3 3 3
3 3 3


Adding a Python List to a CSV File as a Column [duplicate]

I have several CSV files that look like this:
Input
Name Code
blackberry 1
wineberry 2
rasberry 1
blueberry 1
mulberry 2
I would like to add a new column to all CSV files so that it would look like this:
Output
Name Code Berry
blackberry 1 blackberry
wineberry 2 wineberry
rasberry 1 rasberry
blueberry 1 blueberry
mulberry 2 mulberry
The script I have so far is this:
import csv
with open(input.csv,'r') as csvinput:
    with open(output.csv, 'w') as csvoutput:
        writer = csv.writer(csvoutput)
        for row in csv.reader(csvinput):
            writer.writerow(row+['Berry'])
(Python 3.2)
But in the output, the script skips every line and the new column has only Berry in it:
Output
Name Code Berry
blackberry 1 Berry
wineberry 2 Berry
rasberry 1 Berry
blueberry 1 Berry
mulberry 2 Berry
This should give you an idea of what to do:
>>> v = open('C:/test/test.csv')
>>> r = csv.reader(v)
>>> row0 = r.next()
>>> row0.append('berry')
>>> print row0
['Name', 'Code', 'berry']
>>> for item in r:
...     item.append(item[0])
...     print item
...
['blackberry', '1', 'blackberry']
['wineberry', '2', 'wineberry']
['rasberry', '1', 'rasberry']
['blueberry', '1', 'blueberry']
['mulberry', '2', 'mulberry']
>>>
Edit: note that in Python 3 you must use next(r) instead of r.next().
Thanks for accepting the answer. Here you have a bonus (your working script):
import csv
with open('C:/test/test.csv','r') as csvinput:
    with open('C:/test/output.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput, lineterminator='\n')
        reader = csv.reader(csvinput)
        all = []
        row = next(reader)
        row.append('Berry')
        all.append(row)
        for row in reader:
            row.append(row[0])
            all.append(row)
        writer.writerows(all)
Please note:
- the lineterminator parameter in csv.writer. By default it is set to '\r\n', and this is why you have double spacing.
- the use of a list to collect all the lines and write them in one shot with writerows. If your file is very, very big this is probably not a good idea (RAM), but for normal files I think it is faster because there is less I/O.
As indicated in the comments to this post, note that instead of nesting the two with statements, you can put them on the same line:
with open('C:/test/test.csv','r') as csvinput, open('C:/test/output.csv', 'w') as csvoutput:
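An alternative to overriding lineterminator is the approach the csv module documentation recommends: open the files with newline='' so that Python does not translate the writer's '\r\n' a second time. The sketch below also streams rows instead of collecting them in a list. The function name add_copy_column and its parameters are invented for this example, and io.StringIO objects stand in for real files.

```python
import csv
import io


def add_copy_column(src, dst, header="Berry", source_index=0):
    """Copy rows from src to dst, appending a copy of one column to each row."""
    reader = csv.reader(src)
    writer = csv.writer(dst)  # default lineterminator is '\r\n'
    first = next(reader, None)
    if first is None:
        return  # empty input: nothing to write
    writer.writerow(first + [header])
    for row in reader:
        writer.writerow(row + [row[source_index]])


# In-memory stand-ins for files opened with newline=''.
src = io.StringIO("Name,Code\nblackberry,1\nwineberry,2\n", newline="")
dst = io.StringIO(newline="")
add_copy_column(src, dst)
result = dst.getvalue()
```

With real files the call would look like: with open('input.csv', newline='') as src, open('output.csv', 'w', newline='') as dst: add_copy_column(src, dst).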
I'm surprised no one suggested Pandas. Although using a set of dependencies like Pandas might seem more heavy-handed than is necessary for such an easy task, it produces a very short script and Pandas is a great library for doing all sorts of CSV (and really all data types) data manipulation. Can't argue with 4 lines of code:
import pandas as pd
csv_input = pd.read_csv('input.csv')
csv_input['Berries'] = csv_input['Name']
csv_input.to_csv('output.csv', index=False)
Check out Pandas Website for more information!
Contents of output.csv:
Name,Code,Berries
blackberry,1,blackberry
wineberry,2,wineberry
rasberry,1,rasberry
blueberry,1,blueberry
mulberry,2,mulberry
import csv
with open('input.csv','r') as csvinput:
    with open('output.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput)
        for row in csv.reader(csvinput):
            if row[0] == "Name":
                writer.writerow(row+["Berry"])
            else:
                writer.writerow(row+[row[0]])
Maybe something like that is what you intended?
Also, csv stands for comma separated values. So, you kind of need commas to separate your values like this I think:
Name,Code
blackberry,1
wineberry,2
rasberry,1
blueberry,1
mulberry,2
I used pandas and it worked well.
While using it, I had to open a file, add some arbitrary columns to it, and then save it back to the same file.
This code adds multiple columns; edit it as much as you need.
import pandas as pd
csv_input = pd.read_csv('testcase.csv') #reading my csv file
csv_input['Phone1'] = csv_input['Name'] #this would also copy the cell value
csv_input['Phone2'] = csv_input['Name']
csv_input['Phone3'] = csv_input['Name']
csv_input['Phone4'] = csv_input['Name']
csv_input['Phone5'] = csv_input['Name']
csv_input['Country'] = csv_input['Name']
csv_input['Website'] = csv_input['Name']
csv_input.to_csv('testcase.csv', index=False) #this writes back to your file
If you don't want the cell values to be copied, first create an empty column in your CSV file manually (named Hours, for example), then add this line to the code above:
csv_input['New Value'] = csv_input['Hours']
Or, without adding the column manually, simply assign an empty string:
csv_input['New Value'] = '' #simple and easy
I hope it helps.
Yes, it's an old question, but this might help someone:
import csv
import uuid
# read and write csv files
with open('in_file','r') as r_csvfile:
    with open('out_file','w',newline='') as w_csvfile:
        dict_reader = csv.DictReader(r_csvfile,delimiter='|')
        # add the new column to the existing ones
        fieldnames = dict_reader.fieldnames + ['ADDITIONAL_COLUMN']
        writer_csv = csv.DictWriter(w_csvfile,fieldnames,delimiter='|')
        writer_csv.writeheader()
        for row in dict_reader:
            row['ADDITIONAL_COLUMN'] = str(uuid.uuid4().int >> 64)[0:6]
            writer_csv.writerow(row)
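I don't see where you're adding the new column, but try this:
import csv
# newcolumn.csv is assumed to hold one new value per line, in row order
with open("newcolumn.csv", "r") as f:
    Berry = f.read().splitlines()
with open('input.csv', 'r', newline='') as csvinput:
    with open('output.csv', 'w', newline='') as csvoutput:
        writer = csv.writer(csvoutput)
        # zip pairs each input row with its new value, so no counter is needed
        for row, value in zip(csv.reader(csvinput), Berry):
            writer.writerow(row + [value])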
This code should do what you asked; I have tested it on the sample data.
import csv
# in_path and out_path are the input and output file paths
with open(in_path, 'r') as f_in, open(out_path, 'w') as f_out:
    csv_reader = csv.reader(f_in, delimiter=';')
    writer = csv.writer(f_out)
    for row in csv_reader:
        writer.writerow(row + [row[0]])
For a large file, you can use pandas.read_csv with the chunksize argument, which reads the dataset one chunk at a time:
import pandas as pd
INPUT_CSV = "input.csv"
OUTPUT_CSV = "output.csv"
CHUNKSIZE = 1_000 # Maximum number of rows in memory
header = True
mode = "w"
for chunk_df in pd.read_csv(INPUT_CSV, chunksize=CHUNKSIZE):
    chunk_df["Berry"] = chunk_df["Name"]
    # You apply any other transformation to the chunk
    # ...
    # index=False keeps the DataFrame index out of the output file
    chunk_df.to_csv(OUTPUT_CSV, header=header, mode=mode, index=False)
    header = False # Do not save the header for the other chunks
    mode = "a" # 'a' stands for append mode, all the other chunks will be appended
If you want to update the file in place, you can write to a temporary file and then replace the original with it:
import os
import pandas as pd
INPUT_CSV = "input.csv"
TMP_CSV = "tmp.csv"
CHUNKSIZE = 1_000 # Maximum number of rows in memory
header = True
mode = "w"
for chunk_df in pd.read_csv(INPUT_CSV, chunksize=CHUNKSIZE):
    chunk_df["Berry"] = chunk_df["Name"]
    # You apply any other transformation to the chunk
    # ...
    # index=False keeps the DataFrame index out of the rewritten file
    chunk_df.to_csv(TMP_CSV, header=header, mode=mode, index=False)
    header = False # Do not save the header for the other chunks
    mode = "a" # 'a' stands for append mode, all the other chunks will be appended
os.replace(TMP_CSV, INPUT_CSV)
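The same temporary-file pattern also works with only the standard library, which may be handy when pandas is not available. This is a sketch under assumptions: the function name add_column_in_place and its make_value callback are invented here, and the temporary file is created in the same directory as the target so that os.replace stays on one filesystem.

```python
import csv
import os
import tempfile


def add_column_in_place(path, new_header, make_value):
    """Rewrite `path` with one extra column computed from each row."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", dir=directory, delete=False, suffix=".tmp"
    ) as tmp:
        reader = csv.reader(src)
        writer = csv.writer(tmp)
        header = next(reader, None)
        if header is not None:
            writer.writerow(header + [new_header])
        for row in reader:  # one row in memory at a time
            writer.writerow(row + [make_value(row)])
    # Both files are closed here; swap the new file over the original.
    os.replace(tmp.name, path)


# Demo on a throwaway file (hypothetical contents).
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "input.csv")
    with open(demo_path, "w", newline="") as f:
        f.write("Name,Code\r\nblackberry,1\r\nwineberry,2\r\n")
    add_column_in_place(demo_path, "Berry", lambda row: row[0])
    with open(demo_path, newline="") as f:
        updated_rows = list(csv.reader(f))
```

If an exception is raised midway, the original file is left untouched, although the partial .tmp file would remain and need cleaning up.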
For adding a new column to an existing CSV file (with headers), if the column to be added has a small enough number of values, here is a convenient function (somewhat similar to @joaquin's solution). The function takes:
- the existing CSV filename
- the output CSV filename (which will have the updated content)
- a list with the header name followed by the column values
import csv

def add_col_to_csv(csvfile, fileout, new_list):
    with open(csvfile, 'r') as read_f, \
            open(fileout, 'w', newline='') as write_f:
        csv_reader = csv.reader(read_f)
        csv_writer = csv.writer(write_f)
        i = 0
        for row in csv_reader:
            row.append(new_list[i])
            csv_writer.writerow(row)
            i += 1
Example:
new_list1 = ['test_hdr',4,4,5,5,9,9,9]
add_col_to_csv('exists.csv','new-output.csv',new_list1)
Append a new column to an existing CSV file using Python, without a header name:
import csv

default_text = 'Some Text'
# Open the input_file in read mode and output_file in write mode
with open('problem-one-answer.csv', 'r') as read_obj, \
        open('output_1.csv', 'w', newline='') as write_obj:
    # Create a csv.reader object from the input file object
    csv_reader = csv.reader(read_obj)
    # Create a csv.writer object from the output file object
    csv_writer = csv.writer(write_obj)
    # Read each row of the input csv file as list
    for row in csv_reader:
        # Append the default text in the row / list
        row.append(default_text)
        # Add the updated row / list to the output file
        csv_writer.writerow(row)
Thank you
You may just write:
import pandas as pd
import csv
df = pd.read_csv('csv_name.csv')
df['Berry'] = df['Name']
df.to_csv("csv_name.csv",index=False)
Then you are done. To check it, you may run:
h = pd.read_csv('csv_name.csv')
print(h)
If you want to add a column with some arbitrary new elements (a, b, c), you may replace the 4th line of the code with the following (the list must contain exactly one element per row of the file):
df['Berry'] = ['a','b','c']

Python: merge csv data with differing headers

I have a bunch of software output files that I have manipulated into csv-like text files. I have probably done this the hard way, because I am not too familiar with the Python libraries.
The next step is to gather all this data in one single csv file. The files have different headers, or are sorted differently.
Let's say this is file A:
A | B | C | D | id
0 2 3 2 "A"
...
and this is file B:
B | A | Z | D | id
4 6 1 0 "B"
...
I want the append.csv file to look like:
A | B | C | D | Z | id
0 2 3 2 "A"
6 4 0 1 "B"
...
How can I do this, elegantly? Thank you for all answers.
You can use pandas to read CSV files into DataFrames and use the concat method, then write the result to CSV:
import pandas as pd
df1 = pd.read_csv("file1.csv")
df2 = pd.read_csv("file2.csv")
df = pd.concat([df1, df2], axis=0, ignore_index=True)
df.to_csv("file.csv", index=False)
The csv module in the standard library provides tools you can use to do this. The DictReader class produces a mapping of column name to value for each row in a csv file; the DictWriter class will write such mappings to a csv file.
DictWriter must be provided with a list of column names, but does not require all column names to be present in each row mapping.
import csv
list_of_files = ['1.csv', '2.csv']
# Collect the column names.
all_headers = set()
for file_ in list_of_files:
    with open(file_, newline='') as f:
        reader = csv.reader(f)
        headers = next(reader)
        all_headers.update(headers)
all_headers = sorted(all_headers)
# Generate the output file.
with open('append.csv', 'w', newline='') as outfile:
    writer = csv.DictWriter(outfile, fieldnames=all_headers)
    writer.writeheader()
    for file_ in list_of_files:
        with open(file_, newline='') as f:
            reader = csv.DictReader(f)
            writer.writerows(reader)
$ cat append.csv
A,B,C,D,Z,id
0,2,3,2,,A
6,4,,0,1,B
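A possible variation on the DictReader/DictWriter approach: keep the columns in first-seen order instead of sorting them, and use DictWriter's restval argument to choose what goes into cells a file has no value for. The function name merge_csv is invented for this sketch, and io.StringIO objects stand in for the real (seekable) files.

```python
import csv
import io


def merge_csv(sources, dst, restval=""):
    """Merge seekable CSV text files with differing headers into dst."""
    fieldnames = []
    for src in sources:
        # First pass: read only the header row, keeping first-seen order.
        for name in next(csv.reader(src), []):
            if name not in fieldnames:
                fieldnames.append(name)
        src.seek(0)  # rewind for the second pass
    writer = csv.DictWriter(
        dst, fieldnames=fieldnames, restval=restval, lineterminator="\n"
    )
    writer.writeheader()
    for src in sources:
        # Second pass: columns missing from a row are filled with restval.
        writer.writerows(csv.DictReader(src))


file_a = io.StringIO("A,B,C,D,id\n0,2,3,2,A\n")
file_b = io.StringIO("B,A,Z,D,id\n4,6,1,0,B\n")
out = io.StringIO(newline="")
merge_csv([file_a, file_b], out, restval="NA")
merged = out.getvalue()
```

Here merged starts with the header A,B,C,D,id,Z, because Z is first seen in the second file.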

Filter large csv files (10GB+) based on column value in Python

EDITED : Added Complexity
I have a large csv file, and I want to filter out rows based on the column values. For example consider the following CSV file format:
Col1,Col2,Nation,State,Col4...
a1,b1,Germany,state1,d1...
a2,b2,Germany,state2,d2...
a3,b3,USA,AL,d3...
a3,b3,USA,AL,d4...
a3,b3,USA,AK,d5...
a3,b3,USA,AK,d6...
I want to filter all rows with Nation == 'USA', and then based on each of the 50 state. What's the most efficient way of doing this? I'm using Python. Thanks
Also, is R better than Python for such tasks?
Use boolean indexing or DataFrame.query:
df1 = df[df['Nation'] == "Japan"]
Or:
df1 = df.query('Nation == "Japan"')
Second should be faster, see performance of query.
If that is still not possible (not enough RAM), try dask, as Jon Clements suggested in the comments (thank you).
One way would be to filter the csv first and then load, given the size of the data
import csv
with open('yourfile.csv', 'r') as f_in:
    with open('yourfile_edit.csv', 'w') as f_outfile:
        f_out = csv.writer(f_outfile, escapechar=' ',quoting=csv.QUOTE_NONE)
        for line in f_in:
            line = line.strip()
            row = []
            if 'Japan' in line:
                row.append(line)
                f_out.writerow(row)
Now load the csv
df = pd.read_csv('yourfile_edit.csv', sep = ',',header = None)
You get
0 1 2 3 4
0 2 a3 b3 Japan d3
You could open the file, index the position of the Nation header, then iterate over a reader().
import csv
temp = r'C:\path\to\file'
with open(temp, 'r', newline='') as f:
    cr = csv.reader(f, delimiter=',')
    # next(cr) gets the header row (row[0])
    i = next(cr).index('Nation')
    # list comprehension through remaining cr iterables
    filtered = [row for row in cr if row[i] == 'Japan']
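For a 10GB+ file, building one big list may not fit in memory. A streaming variant, sketched below, writes each matching row out as soon as it is read and splits the result by state, which is what the question asks for. The function name split_by_state and the open_output callback are assumptions made for this example; io.StringIO objects stand in for the per-state output files.

```python
import csv
import io


def split_by_state(lines, nation, open_output):
    """Stream rows for one nation into a separate output per state.

    `open_output(state)` must return a writable text file for that state.
    Returns the sorted list of states that were seen.
    """
    reader = csv.DictReader(lines)
    writers = {}
    for row in reader:  # only one row is held in memory at a time
        if row["Nation"] != nation:
            continue
        state = row["State"]
        if state not in writers:
            writers[state] = csv.DictWriter(
                open_output(state),
                fieldnames=reader.fieldnames,
                lineterminator="\n",
            )
            writers[state].writeheader()
        writers[state].writerow(row)
    return sorted(writers)


sample = io.StringIO(
    "Col1,Col2,Nation,State,Col4\n"
    "a1,b1,Germany,state1,d1\n"
    "a3,b3,USA,AL,d3\n"
    "a3,b3,USA,AL,d4\n"
    "a3,b3,USA,AK,d5\n"
)
outputs = {}


def open_output(state):
    outputs[state] = io.StringIO(newline="")
    return outputs[state]


states = split_by_state(sample, "USA", open_output)
```

With real files, open_output could return open(state + '.csv', 'w', newline=''), and the caller would be responsible for closing those files afterwards.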

Extract and merge columns in a CSV

Here's my situation....I have two CSV files (file 1 and file2). File1 has about 15 columns and file2 has about 10 columns. I need to grab all 15 columns from file1 and extract just column 13 from file2 and merge all 16 columns in a new csv file called "final.csv" Please suggest me some ideas as to how I can make this code work. Here is what I have so far...
import csv
File1 = 'F:\somedata\somefolder\file1.csv'
File2 = 'F:\somedata\somefolder\file2.csv'
File3 = 'F:\\somedata\somefolder\final.csv'
with open('r', 'File1' and 'File2', 'rt') as f, open('r', 'File3', 'wt', newline='') as f_out:
    headings = next(iter(csv.reader(f)))
    csv.writer(f_out).writerow(headings)
    csvout = csv.DictWriter(f_out, fieldnames=headings)
    for d in csv.DictReader(f, fieldnames=headings):
        csvout.writerow(d)
I would start by using pandas to load your files as tables. Then use indexing to select the columns you want, merge the tables, and write a new file. Obviously you can't select the thirteenth column from file2 if it only has 10 columns, so here I am assuming you DO have 13 columns in that file.
import pandas as pd
# Raw strings keep sequences such as \f and \n in the Windows paths literal
file1 = pd.read_table(r'F:\somedata\somefolder\file1.csv', delimiter=',', header=None)
file2 = pd.read_table(r'F:\somedata\somefolder\file2.csv', delimiter=',', header=None)
# .ix was removed from pandas; .iloc selects by position (column 13 is index 12)
file2_short = file2.iloc[:, 12:13]
# pd.concat takes a list of frames; axis=1 places them side by side
new = pd.concat([file1, file2_short], axis=1)
new.to_csv(r'F:\somedata\somefolder\newfile.csv')
This assumes that you want column 13 from file 2. If that column has a header (of course you would remove the 'header = None' part) you can select by that instead...
file2_short = file2['col_13']
Hope this helps
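If you would rather avoid pandas, a rough standard-library equivalent is to read both files in lockstep with zip and append the chosen column to each row. This sketch assumes the two files have their rows in the same order; the function name append_column and the tiny sample data are invented for the example.

```python
import csv
import io


def append_column(left, right, dst, column_index):
    """Write every row of `left` plus one column taken from `right`.

    Rows are paired by position; zip stops at the end of the shorter file.
    """
    writer = csv.writer(dst, lineterminator="\n")
    for left_row, right_row in zip(csv.reader(left), csv.reader(right)):
        writer.writerow(left_row + [right_row[column_index]])


# Tiny stand-ins for file1 and file2; column 13 would be column_index=12.
file1 = io.StringIO("id,name\n1,apple\n2,pear\n")
file2 = io.StringIO("id,price,stock\n1,0.5,10\n2,0.8,3\n")
final = io.StringIO(newline="")
append_column(file1, file2, final, column_index=1)
final_text = final.getvalue()
```

With real files, open all three with newline='' and pass the file objects in the same way.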
