Read all columns from CSV file? - python

I am trying to read in a CSV file and put the values from each column into a separate list; I do not want the values grouped by row. Since the csv reader only allows a single pass through the file, I am using the seek() method to jump back to the beginning before reading the next column. Besides using a dict mapping, is there a better way to do this?
import csv

infile = open(fpath, "r")
reader = csv.reader(infile)
NOUNS = [col[0] for col in reader]
infile.seek(0)  # <-- reset the reader to the beginning of the input file
VERBS = [col[1] for col in reader]
infile.seek(0)
ADJECTIVES = [col[2] for col in reader]
infile.seek(0)
SENTENCES = [col[3] for col in reader]

Something like this would do it in one pass:
import csv

kinds = NOUNS, VERBS, ADJECTIVES, SENTENCES = [], [], [], []
with open(fpath, "r") as infile:
    for cols in csv.reader(infile):
        for i, kind in enumerate(kinds):
            kind.append(cols[i])

You could feed the reader to zip and unpack it to variables as you wish.
import csv

with open('input.csv') as f:
    first, second, third, fourth = zip(*csv.reader(f))
    print('first: {}, second: {}, third: {}, fourth: {}'.format(
        first, second, third, fourth
    ))
With the following input:
1,2,3,4
A,B,C,D
It will produce this output:
first: ('1', 'A'), second: ('2', 'B'), third: ('3', 'C'), fourth: ('4', 'D')
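One caveat worth noting with this approach: zip stops at the shortest input, so if any row is missing a column, every resulting tuple is silently truncated. A minimal illustration:

```python
# Two rows, the second one missing its third column.
rows = [["1", "2", "3"], ["A", "B"]]

# zip(*rows) pairs up positions across rows and stops at the shortest row,
# so the lone "3" is silently dropped.
columns = list(zip(*rows))
print(columns)
```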

This works assuming you know exactly how many columns are in the csv (and there isn't a header row).
import csv

NOUNS = []
VERBS = []
ADJECTIVES = []
SENTENCES = []
with open(fpath, "r") as infile:
    reader = csv.reader(infile)
    for row in reader:
        NOUNS.append(row[0])
        VERBS.append(row[1])
        ADJECTIVES.append(row[2])
        SENTENCES.append(row[3])
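If the file does have a header row, a single next() call on the reader before the loop will discard it. A small sketch, with StringIO standing in for a real file and hypothetical contents:

```python
import csv
from io import StringIO

# Hypothetical input whose first row is a header.
infile = StringIO("noun,verb,adjective,sentence\ncat,runs,fast,ok\ndog,sits,slow,fine\n")

reader = csv.reader(infile)
next(reader)  # discard the header row
NOUNS = [row[0] for row in reader]
print(NOUNS)
```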
If you don't know the column headers, you'll have to read off the first row, create a list for every column you encounter, and then loop through every following row, inserting each value into the appropriate list. A list of lists (or a dict of lists) works well for this.
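That idea can be sketched briefly; this assumes the first row is the header, and uses StringIO with made-up data in place of a real file:

```python
import csv
from io import StringIO

# Hypothetical two-column input; the first row names the columns.
data = StringIO("noun,verb\ncat,run\ndog,jump\n")

reader = csv.reader(data)
header = next(reader)                    # read off the first row
columns = {name: [] for name in header}  # one list per column encountered
for row in reader:
    for name, value in zip(header, row):
        columns[name].append(value)
print(columns)
```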
If you don't mind adding a dependency, use pandas: read the file into a DataFrame with pandas.read_csv() and access each column by its name, i.e.
import pandas as pd

df = pd.read_csv(fpath)
print(df['NOUNS'])
print(df['VERBS'])

I am not sure why you don't want to use a dict mapping. This is what I ended up doing.
Data
col1,col2,col3
val1,val2,val3
val4,val5,val6
Code
import csv

d = dict()
with open("abc.text") as csv_file:
    reader = csv.DictReader(csv_file)
    for row in reader:
        for key, value in row.items():
            if d.get(key) is None:
                d[key] = [value]
            else:
                d[key].append(value)
print(d)
{'col2': ['val2', 'val5'], 'col3': ['val3', 'val6'], 'col1': ['val1', 'val4']}
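The get()/None check can also be collapsed with dict.setdefault, which inserts the empty list the first time a key is seen. A short variant of the same loop, with StringIO standing in for the file:

```python
import csv
from io import StringIO

csv_file = StringIO("col1,col2,col3\nval1,val2,val3\nval4,val5,val6\n")

d = {}
for row in csv.DictReader(csv_file):
    for key, value in row.items():
        # setdefault returns the existing list, or inserts and returns a new one
        d.setdefault(key, []).append(value)
print(d)
```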

Related

Get rows from CSV by matching header to multiple dictionary key-values

I have a CSV file with a header, and I want to retrieve all the rows from the CSV that match a dictionary's key-value pairs. Note that the dictionary can contain any number of arbitrary keys and values to match against.
Here is the code I have written to solve this; is there any better way to approach it (other than a pandas DataFrame)?
By "better" I mean: removal of unnecessary variables, a better data structure or library, or lower space/time complexity than the solution below.
import csv

options = {'h1': 'v1', 'h2': 'v2'}
output = []
with open("data.csv", "rt") as csvfile:
    data = csv.reader(csvfile, delimiter=',', quotechar='"')
    header = next(data)
    for row in data:
        match = 0
        for k, v in options.items():
            match += 1 if row[header.index(k)] == v else 0
        if len(options.keys()) == match:
            output.append(dict(zip(header, row)))
print(output)
You don't say what you would consider a "better" approach to be. That said, it would take fewer lines of code if you used a csv.DictReader to process the input file, as illustrated below.
import csv

def find_matching_rows(filename, criteria, delimiter=',', quotechar='"'):
    criteria_values = tuple(criteria.values())
    matches = []
    with open(filename, 'r', newline='') as csvfile:
        for row in csv.DictReader(csvfile, delimiter=delimiter, quotechar=quotechar):
            if tuple(row[key] for key in criteria) == criteria_values:
                matches.append(row)
    return matches

results = find_matching_rows('matchtest.csv', {'h1': 'v1', 'h2': 'v2'})
for row in results:
    print(row)
You can use a list comprehension to read and filter the rows of a DictReader. Make the wanted options a set, and a subset test then checks that every criterion matches.
import csv

def test():
    options = {'h1': 'v1', 'h2': 'v2'}
    wanted = set(options.items())
    with open("data.csv", "rt", newline="") as csvfile:
        # keep a row only if every wanted (key, value) pair appears in it
        return [row for row in csv.DictReader(csvfile) if wanted <= set(row.items())]

print(test())
print(len(test()))

Create dictionary from CSV where column names are keys

I'm trying to read a CSV file (actually a TSV, but never mind) and turn it into a dictionary whose keys are the column names of said file and whose values are the rest of the rows.
I also have some comments marked by the '#' character, which I intend to ignore:
csv_in.csv
##Some comments
##Can ignore these lines
Location Form Range <-- This would be the header
North Dodecahedron Limited <---|
East Toroidal polyhedron Flexible <------ These lines would be lists
South Icosidodecahedron Limited <---|
The main idea is to store them like this:
final_dict = {'Location': ['North','East','South'],
'Form': ['Dodecahedron','Toroidal polyhedron','Icosidodecahedron'],
'Range': ['Limited','Flexible','Limited']}
So far I could come close like so:
tryercode.py
import csv

dct = {}
# Open csv file
with open(tsvfile) as file_in:
    # Open reader instance with tab delimiter
    reader = csv.reader(file_in, delimiter='\t')
    # Iterate through rows
    for row in reader:
        # First I skip those rows that start with '#'
        if row[0].startswith('#'):
            pass
        elif row[0].startswith('L'):
            # Here I keep the first row, which starts with the letter 'L',
            # in a separate list and insert its values as keys with empty lists
            dictkeys_list = []
            for i in range(len(row)):
                dictkeys_list.append(row[i])
                dct[row[i]] = []
        else:
            # Insert each row's values under the matching keys
            print('¿?')
So far, the dictionary's skeleton looks fine:
print(dct)
{'Location': [], 'Form': [], 'Range': []}
But everything I tried so far failed to append the values to the keys' empty lists the way I intended; I could only ever keep a single row.
(...)
else:
    # Insert each row's values under the matching keys
    print('¿?')
    for j in range(len(row)):
        dct[dictkeys_list[j]] = row[j]  # Here I pick the intended key of the dict through the previously built list of key names
I searched far and wide on Stack Overflow but couldn't find an answer for this case (the code template is inspired by an answer on this post, but the dictionary there has a different structure).
Using collections.defaultdict, we can create a dictionary that automatically initialises its values as lists. Then we can iterate over a csv.DictReader to populate the defaultdict.
Given this data:
A,B,C
a,b,c
aa,bb,cc
aaa,bbb,ccc
This code
import collections
import csv

d = collections.defaultdict(list)
with open('myfile.csv', 'r', newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        for k, v in row.items():
            d[k].append(v)
print(d)
Produces this result:
defaultdict(<class 'list'>, {'A': ['a', 'aa', 'aaa'],
'B': ['b', 'bb', 'bbb'],
'C': ['c', 'cc', 'ccc']})
I amended a few things in your code and ran it; with the changes it produces the right result.
The code is below
import csv

dct = {}
tsvfile = "./tsv.csv"  # This is the tsv file path
# Open csv file
with open(tsvfile) as file_in:
    # Open reader instance with tab delimiter
    reader = csv.reader(file_in, delimiter='\t')
    # Iterate through rows
    for row in reader:
        # First, skip rows that start with '#'
        if row[0].startswith('#'):
            pass
        elif row[0].startswith('L'):
            # Keep the first row, which starts with the letter 'L',
            # in a separate list and insert its values as keys with empty lists
            dictkeys_list = []
            for i in range(len(row)):
                dictkeys_list.append(row[i])
                dct[row[i]] = []
        else:
            # Insert each row's values under the matching keys
            for i in range(len(row)):
                dct[dictkeys_list[i]].append(row[i])
print(dct)
Running it produces the intended dictionary.
Besides that, I amended your code a bit further, as below; I think this version can deal with more complicated input.
import csv

dct = {}
tsvfile = "./tsv.csv"  # This is the tsv file path
is_head = True  # tracks whether the next non-comment row is the header
with open(tsvfile) as file_in:
    # Open reader instance with tab delimiter
    reader = csv.reader(file_in, delimiter='\t')
    # Iterate through rows
    for row in reader:
        # Skip empty rows and rows that start with '#'
        # Use strip() to remove surrounding whitespace from each item
        if len(row) == 0 or row[0].strip().startswith('#'):
            pass
        elif is_head:
            # Treat the first real row as the header: keep its values in a
            # separate list and insert them as keys with empty lists
            is_head = False
            dictkeys_list = []
            for i in range(len(row)):
                item = row[i].strip()
                dictkeys_list.append(item)
                dct[item] = []
        else:
            # Insert each row's values under the matching keys
            for i in range(len(row)):
                dct[dictkeys_list[i]].append(row[i].strip())
print(dct)
You can also try the pandas library.
import pandas as pd
df = pd.read_csv("csv_in.csv")
df.to_dict(orient="list")
To reproduce this, I created a CSV file with the content below and saved it as 'csvfile.csv'.
Location,Form,Range
North,Dodecahedron,Limited
East,Toroidal polyhedron,Flexible
South,Icosidodecahedron,Limited
Now, to achieve your goal, I used the pandas library as below:
import pandas as pd
df_csv = pd.read_csv('csvfile.csv')
dict_csv = df_csv.to_dict(orient='list')
print(dict_csv)
and here's the output you needed:
{'Location': ['North', 'East', 'South'],
'Form': ['Dodecahedron', 'Toroidal polyhedron', 'Icosidodecahedron'],
'Range': ['Limited', 'Flexible', 'Limited']}
Hope this helps.

Writing Python's List or Dictionary into CSV file (Row containing More than One Column)

I am just starting out with Python!!
I want to make a CSV file from a dictionary and print each member of it in its own column in the same row.
That is, I have an array of dictionaries, and I want each row to represent one of them and each column of each row to hold one item from inside it.
import csv

"... we are going to create an array of dictionaries and print them all..."
st_dic = []
true = 1
while true:
    dummy = input("Please Enter name, email, mobile, university, major")
    x = dummy.split(",")
    if "Stop" in x:
        break
    dict = {"Name": x[0], "Email": x[1], "Mobile": x[2], "University": x[3], "Major": x[4]}
    st_dic.append(dict)

f2 = open("data.csv", "w")
with open("data.csv", "r+") as f:
    writer = csv.writer(f)
    for item in st_dic:
        writer.writerow([item["Name"], item["Email"], item["Mobile"], item["University"], item["Major"]])
f.close()
The thing I output now is a single row which contains the data of the first dictionary. I just want them separated, each in its own column within its row.
It is surprising how many questions here try to fill in data with a while loop and the input() command. In all fairness, that is not Python's best use case.
Imagine you had the dictionary just filled in your code:
dict1 = {'name': "Kain", 'email': 'make_it_up@something.com'}
dict2 = {'name': "Abel", 'email': 'make_it_up2@otherthing.com'}
dict_list = [dict1, dict2]
After that you can export to csv easily:
import csv

with open('data.csv', 'w') as f:
    w = csv.DictWriter(f, ['name', 'email'], lineterminator='\n')
    w.writeheader()
    for row in dict_list:
        w.writerow(row)
Note there are many questions about the csv module on SO, as well as examples in the documentation.

Python read CSV file, and write to another skipping columns

I have a CSV input file with 18 columns, and I need to create a new CSV file with all columns from the input except columns 4 and 5.
My function currently looks like this:
import csv

def modify_csv_report(input_csv, output_csv):
    begin = 0
    end = 3
    with open(input_csv, "r") as file_in:
        with open(output_csv, "w") as file_out:
            writer = csv.writer(file_out)
            for row in csv.reader(file_in):
                writer.writerow(row[begin:end])
    return output_csv
So it reads and writes columns 0 - 3, but I don't know how to skip columns 4 and 5 and continue from there.
You can add the other part of the row using slicing, like you did with the first part:
writer.writerow(row[:4] + row[6:])
Note that to include column 3, the stop index of the first slice should be 4. Specifying start index 0 is also usually not necessary.
A more general approach would employ a list comprehension and enumerate:
exclude = (4, 5)
writer.writerow([r for i, r in enumerate(row) if i not in exclude])
If your CSV has meaningful headers an alternative solution to slicing your rows by indices, is to use the DictReader and DictWriter classes.
#!/usr/bin/env python
from csv import DictReader, DictWriter

data = '''A,B,C
1,2,3
4,5,6
6,7,8'''

reader = DictReader(data.split('\n'))

# You'll need your fieldnames first in a list to ensure order
fieldnames = ['A', 'C']
# We'll also use a set for efficient lookup
fieldnames_set = set(fieldnames)

with open('outfile.csv', 'w') as outfile:
    writer = DictWriter(outfile, fieldnames)
    writer.writeheader()
    for row in reader:
        # Use a dictionary comprehension to iterate over the key, value pairs,
        # discarding those pairs whose key is not in the set
        filtered_row = {k: v for k, v in row.items() if k in fieldnames_set}
        writer.writerow(filtered_row)
This is what you want:
import csv

def remove_csv_columns(input_csv, output_csv, exclude_column_indices):
    with open(input_csv) as file_in, open(output_csv, 'w') as file_out:
        reader = csv.reader(file_in)
        writer = csv.writer(file_out)
        writer.writerows(
            [col for idx, col in enumerate(row)
             if idx not in exclude_column_indices]
            for row in reader)

remove_csv_columns('in.csv', 'out.csv', (3, 4))

Writing intersection data to new CSV

I have two CSV files, each containing a list of unique words. After computing their intersection I get the right results, but when I try to write them to a new file it creates a very large file of almost 155 MB, when it should be well below 2 MB.
Code:
import csv

alist, blist = [], []
with open("SetA-unique.csv", "r") as fileA:
    reader = csv.reader(fileA, delimiter=',')
    for row in reader:
        alist += row
with open("SetB-unique.csv", "r") as fileB:
    reader = csv.reader(fileB, delimiter=',')
    for row in reader:
        blist += row

first_set = set(alist)
second_set = set(blist)
res = first_set.intersection(second_set)

writer = csv.writer(open("SetA-SetB.csv", 'w'))
for row in res:
    writer.writerow(res)
You're writing the entire set res to the file on each iteration. You probably want to write the rows instead:
for row in res:
    writer.writerow([row])
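Equivalently, the loop can be replaced with a single writerows call over a generator of one-element lists. A sketch with a hypothetical result set, written to a temporary file:

```python
import csv
import os
import tempfile

res = {"alpha", "beta", "gamma"}  # hypothetical intersection result

path = os.path.join(tempfile.gettempdir(), "SetA-SetB.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    # writerows consumes an iterable of rows; wrap each word in its own list
    writer.writerows([word] for word in res)
```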
Apart from writing the whole set on each iteration, you also don't need to create multiple sets and lists; you can use itertools.chain:
import csv
from itertools import chain

with open("SetA-unique.csv") as file_a, open("SetB-unique.csv") as file_b, open("SetA-SetB.csv", 'w') as inter:
    r1 = csv.reader(file_a)
    r2 = csv.reader(file_b)
    for word in set(chain.from_iterable(r1)).intersection(chain.from_iterable(r2)):
        inter.write(word + "\n")
If you are just writing words, there is also no need to use csv.writer; just use file.write as above.
If you are actually trying to do the comparison row-wise, you should not be creating a flat iterable of words; you can map the rows to tuples:
import csv

with open("SetA-unique.csv") as file_a, open("SetB-unique.csv") as file_b, open("SetA-SetB.csv", 'w') as inter:
    r1 = csv.reader(file_a)
    r2 = csv.reader(file_b)
    writer = csv.writer(inter)
    for row in set(map(tuple, r1)).intersection(map(tuple, r2)):
        writer.writerow(row)
And if you only have one word per line, you don't need the csv lib at all:
with open("SetA-unique.csv") as file_a, open("SetB-unique.csv") as file_b, open("SetA-SetB.csv", 'w') as inter:
    for word in set(map(str.strip, file_a)).intersection(map(str.strip, file_b)):
        inter.write(word + "\n")
