How do I add a new column at the very beginning of a CSV file? I know it can be done with pandas, but I am having issues with pandas, so is there another way to do it? I have something that looks like this:
a  b  c  d
0  1  2  3
I want to do this instead:
letters  a  b  c  d
numbers  0  1  2  3
If the tables are not formatted properly, here is a picture:
Tables
What do you mean by "I am having issues with pandas"?
Have you tried running df.insert(0, "letters", "numbers")?
Alternatively, you can use the csv.reader function from the csv module with a list comprehension to insert the new column:
import csv

with open("input.csv", "r") as file:
    rows = [["letters" if idx == 0 else "numbers"] + row
            for idx, row in enumerate(csv.reader(file, delimiter=","))]

with open("output.csv", "w", newline="") as file:
    csv.writer(file, delimiter=",").writerows(rows)
Output:
from tabulate import tabulate  # pip install tabulate

with open("output.csv", "r") as file:
    reader = csv.reader(file, delimiter=",")
    print(tabulate([row for row in reader]))
------- - - - -
letters a b c d
numbers 0 1 2 3
------- - - - -
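For reference, the pandas route hinted at in the comment needs a list of values rather than a single string as the new column. A minimal sketch (the file names and the column name "label" are illustrative, assuming the two data rows shown in the question):

```python
import pandas as pd

# Stand-in for the question's file: two rows, no header.
with open("letters_numbers.csv", "w") as f:
    f.write("a,b,c,d\n0,1,2,3\n")

df = pd.read_csv("letters_numbers.csv", header=None)
df.insert(0, "label", ["letters", "numbers"])  # insert as the new first column
df.to_csv("with_label.csv", index=False, header=False)
```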
I am parsing hex data obtained from a pipeline. The data is parsed line-by-line and written to a CSV file. I need to add the header.
So data obtained:
a b c d e....iy
f g h i j....iy
Required format:
1 2 3 4 5....259
a b c d e....iy
f g h i j....iy
I tried the writerow function. As the parsing is line-by-line, the data obtained is as follows:
1 2 3 4 5....259
a b c d e....iy
1 2 3 4 5....259
e f g h i....iy
It prints the header name after every line.
The code I am currently using to print data to file is as below:
if '[' in line:
    # processdata functions (converting from hex)
    line = processdata
    f = open("output.csv", "a+")
    f.write(line)
    f.close()
I'd appreciate any suggestions regarding this line-by-line parsing of the file.
I am looking for something like open("file.csv", "a+", header=['1', '2', '3', 'n']). Thank you.
Using pandas
# importing python package
import pandas as pd
# read contents of csv file
file = pd.read_csv("gfg.csv")
print("\nOriginal file:")
print(file)
# adding header
headerList = ['id', 'name', 'profession']
# converting data frame to csv
file.to_csv("gfg2.csv", header=headerList, index=False)
# display modified csv file
file2 = pd.read_csv("gfg2.csv")
print('\nModified file:')
print(file2)
see https://www.geeksforgeeks.org/how-to-add-a-header-to-a-csv-file-in-python/
and https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
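open() has no header argument, but the same effect can be had without pandas by writing the header only when the file is still empty. A minimal sketch (the append_row helper and the five-column example are illustrative, not from the original code):

```python
import csv
import os

def append_row(path, row, header):
    """Append one row, writing the header first only when the file is new or empty."""
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if need_header:
            writer.writerow(header)
        writer.writerow(row)

if os.path.exists("parsed.csv"):
    os.remove("parsed.csv")  # start fresh for the demo

header = [str(i) for i in range(1, 6)]  # '1'..'5'; the question would use range(1, 260)
append_row("parsed.csv", ["a", "b", "c", "d", "e"], header)
append_row("parsed.csv", ["f", "g", "h", "i", "j"], header)
```

Because append_row checks the file size before writing, the header comes out exactly once no matter how many lines are appended.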
I have a bunch of software output files that I have manipulated into CSV-like text files. I have probably done this the hard way, because I am not too familiar with the Python libraries.
The next step is to gather all this data into one single CSV file. The files have different headers, or are sorted differently.
Let's say this is file A:
A | B | C | D | id
0 2 3 2 "A"
...
and this is file B:
B | A | Z | D | id
4 6 1 0 "B"
...
I want the append.csv file to look like:
A | B | C | D | Z | id
0   2   3   2       "A"
6   4       0   1   "B"
...
How can I do this, elegantly? Thank you for all answers.
You can use pandas to read CSV files into DataFrames and use the concat method, then write the result to CSV:
import pandas as pd
df1 = pd.read_csv("file1.csv")
df2 = pd.read_csv("file2.csv")
df = pd.concat([df1, df2], axis=0, ignore_index=True)
df.to_csv("file.csv", index=False)
The csv module in the standard library provides tools you can use to do this. The DictReader class produces a mapping of column name to value for each row in a csv file; the DictWriter class will write such mappings to a csv file.
DictWriter must be provided with a list of column names, but does not require all column names to be present in each row mapping.
import csv

list_of_files = ['1.csv', '2.csv']

# Collect the column names.
all_headers = set()
for file_ in list_of_files:
    with open(file_, newline='') as f:
        reader = csv.reader(f)
        headers = next(reader)
        all_headers.update(headers)
all_headers = sorted(all_headers)

# Generate the output file.
with open('append.csv', 'w', newline='') as outfile:
    writer = csv.DictWriter(outfile, fieldnames=all_headers)
    writer.writeheader()
    for file_ in list_of_files:
        with open(file_, newline='') as f:
            reader = csv.DictReader(f)
            writer.writerows(reader)
$ cat append.csv
A,B,C,D,Z,id
0,2,3,2,,A
6,4,,0,1,B
I have a CSV from which I am selecting a random sample of 500 rows, using the following code:
import csv
import random
with open('Original.csv', 'r') as source:
    lines = [line for line in source]
random_choice = random.sample(lines, 500)
What I'd like to do is update a column called [winner] for the rows that exist within the sample, and then save the data back to a CSV file, but I have no idea how to achieve this...
There is a unique identifier in a column called [ID].
How would I go about doing this?
Starting with a CSV that looks like this:
ID something winner
1 a
2 b
3 c
4 a
5 d
6 a
7 b
8 e
9 f
10 g
You could use the following approach. The whole file is read in, rows are chosen by randomly selected indices and marked, and the data is written back out to the file.
import csv
import random

# Read in the data
with open('example.csv', 'r') as infile:
    reader = csv.reader(infile)
    header = next(reader)  # We want the headers, but not as part of the sample
    data = []
    for row in reader:
        data.append(row)

# Find the column called winner
winner_column_index = header.index('winner')

# Pick some random indices which will be used to generate the sample
all_indices = list(range(len(data)))
sampled_indices = random.sample(all_indices, 5)

# Add the winner column to those rows selected
for index in sampled_indices:
    data[index][winner_column_index] = 'Winner'

# Write the data back
with open('example_out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(header)   # Make sure we get the headers back in
    writer.writerows(data)    # Write the rest of the data
This will give the following output:
ID something winner
1 a
2 b Winner
3 c
4 a Winner
5 d
6 a Winner
7 b
8 e
9 f Winner
10 g Winner
EDIT: It turns out that naming the first column of the CSV ID is not a good idea if you want to open the file with Excel: Excel then incorrectly thinks the file is in SYLK format.
First, why are you using CSV and not a database? Even SQLite would be much easier (it's built in: import sqlite3).
Second, you'll need to write the whole file again. I suggest you keep your rows as lists and just update them in place (lists are mutable, so changing an inner value updates the row stored in the full list as well):
rows = [row for row in csv.reader(source)]
and then, after sampling from rows instead of the raw lines:
for choice in random_choice:
    choice[WINNER_INDEX] = 'Winner'
and write the file.
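Putting that together, a minimal runnable sketch (the file names, the sample size of 3, and the generated example data are all illustrative, not from the original code):

```python
import csv
import random

WINNER_INDEX = 2  # position of the 'winner' column

# Generated stand-in for Original.csv: ten rows with an empty winner column.
with open('sample_in.csv', 'w', newline='') as f:
    csv.writer(f).writerows(
        [['ID', 'something', 'winner']] +
        [[str(i), 'x', ''] for i in range(1, 11)])

with open('sample_in.csv', newline='') as source:
    reader = csv.reader(source)
    header = next(reader)
    rows = [row for row in reader]

# The sampled lists are the same objects stored in `rows`,
# so mutating them updates `rows` too.
for choice in random.sample(rows, 3):
    choice[WINNER_INDEX] = 'Winner'

with open('sample_out.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(rows)
```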
How could I sort a column in a CSV file the way that Excel would sort it? Below are my CSV file and the snippet of code I have so far. I want to sort by ArrivalTime so that the corresponding Process and ServiceTime move along with it. Thanks for any help or advice.
csv:
Process,ArrivalTime,ServiceTime
A,0,3
B,2,6
C,4,4
D,6,5
E,8,2
and my code:
import csv
from collections import defaultdict

columns = defaultdict(list)

with open('file.csv') as f:
    reader = csv.DictReader(f)
    for row in reader:
        for (k, v) in row.items():
            columns[k].append(v)

st = columns['ServiceTime']
at = columns['ArrivalTime']
pr = columns['Process']
Have you considered using pandas? It has built-in methods for handling exactly this type of situation.
import pandas as pd
# create a dataframe from the file, like an Excel spreadsheet
df = pd.read_csv('file.csv')
df.sort_values('ArrivalTime')
# returns:
Process ArrivalTime ServiceTime
0 A 0 3
1 B 2 6
2 C 4 4
3 D 6 5
4 E 8 2
I agree that you should use pandas...
Apart from that you don't need a defaultdict here.
Read the file and sort:
import csv
import operator as op

list_of_dicts = []
with open('in.csv', 'r') as f:
    reader = csv.DictReader(f)
    for line in reader:
        list_of_dicts.append(line)

# Note: the values are strings, so this sorts lexicographically; for
# multi-digit times use key=lambda d: int(d['ArrivalTime']) instead.
list_of_dicts.sort(key=op.itemgetter('ArrivalTime'))
Write it back out:
with open('out.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=list_of_dicts[0].keys())
    writer.writeheader()  # don't forget the header row
    for i in list_of_dicts:
        writer.writerow(i)
I am currently trying to count repeated values in a column of a CSV file and write the count to another CSV column in Python.
For example, my CSV file :
KeyID GeneralID
145258 KL456
145259 BG486
145260 HJ789
145261 KL456
What I want to achieve is to count how many data have the same GeneralID and insert it into a new CSV column. For example,
KeyID Total_GeneralID
145258 2
145259 1
145260 1
145261 2
I have tried to split each column using the split method but it didn't work so well.
My code :
case_id_list_data = []
with open(file_path_1, "rU") as g:
    for line in g:
        case_id_list_data.append(line.split('\t'))
# print case_id_list_data[0][0]  # the result is dissatisfying
# I'm stuck here...
And if you are averse to pandas and want to stay with the standard library:
Code:
import csv
from collections import Counter

with open('file1', 'r', newline='') as f:
    reader = csv.reader(f, delimiter='\t')
    header = next(reader)
    lines = [line for line in reader]

counts = Counter([l[1] for l in lines])
new_lines = [l + [str(counts[l[1]])] for l in lines]

# Use 'w' with newline='' in Python 3 ('wb' is Python 2 only)
with open('file2', 'w', newline='') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerow(header + ['Total_GeneralID'])
    writer.writerows(new_lines)
Results:
KeyID GeneralID Total_GeneralID
145258 KL456 2
145259 BG486 1
145260 HJ789 1
145261 KL456 2
You have to divide the task into three steps:
1. Read CSV file
2. Generate new column's value
3. Add value to the file back
import csv
import fileinput
import sys

# 1. Read CSV file
with open("dev.csv") as filein:
    reader = csv.reader(filein, skipinitialspace=True)
    xs, ys = zip(*reader)

# 2. Generate the new column's values by counting each "GeneralID"
result = ["Total_GeneralID"]
for i in range(1, len(ys)):
    result.append(ys.count(ys[i]))

# 3. Add the new column to the file in place
for ind, line in enumerate(fileinput.input("dev.csv", inplace=True)):
    sys.stdout.write("{}, {}\n".format(line.rstrip(), result[ind]))
I haven't used a temp file or any high-level module like pandas.
import pandas as pd

# read your csv into a dataframe
df = pd.read_csv('file_path_1')

# count the values in the GeneralID column and look up the occurrence count for the current row
df['Total_GeneralID'] = df.GeneralID.apply(lambda x: df.GeneralID.value_counts()[x])
df = df[['KeyID', 'Total_GeneralID']]
Out[442]:
KeyID Total_GeneralID
0 145258 2
1 145259 1
2 145260 1
3 145261 2
You can use the pandas library:
1. read the file with read_csv
2. get the counts of the values in column GeneralID with value_counts and rename the result to the output column name
3. join the result to the original DataFrame
import pandas as pd
df = pd.read_csv('file')
s = df['GeneralID'].value_counts().rename('Total_GeneralID')
df = df.join(s, on='GeneralID')
print(df)
KeyID GeneralID Total_GeneralID
0 145258 KL456 2
1 145259 BG486 1
2 145260 HJ789 1
3 145261 KL456 2
Use csv.reader instead of the split() method.
It's easier.
Thanks
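For instance, a minimal sketch of that suggestion against the question's data (the file names and generated sample rows are illustrative):

```python
import csv
from collections import Counter

# Sample data standing in for the question's tab-delimited file.
rows = [['KeyID', 'GeneralID'],
        ['145258', 'KL456'],
        ['145259', 'BG486'],
        ['145260', 'HJ789'],
        ['145261', 'KL456']]
with open('ids_in.tsv', 'w', newline='') as f:
    csv.writer(f, delimiter='\t').writerows(rows)

# csv.reader handles the splitting, instead of manual line.split('\t')
with open('ids_in.tsv', newline='') as f:
    reader = csv.reader(f, delimiter='\t')
    header = next(reader)
    data = list(reader)

counts = Counter(row[1] for row in data)  # occurrences of each GeneralID

with open('ids_out.tsv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerow(['KeyID', 'Total_GeneralID'])
    writer.writerows([row[0], counts[row[1]]] for row in data)
```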