I'm trying to read a CSV file (actually a TSV, but never mind) into a dictionary whose keys are the column names of the file and whose values are lists holding the entries of the remaining rows for each column.
I also have some comment lines marked by the '#' character, which I intend to ignore:
csv_in.csv
##Some comments
##Can ignore these lines
Location Form Range <-- This would be the header
North Dodecahedron Limited <---|
East Toroidal polyhedron Flexible <------ These lines would be lists
South Icosidodecahedron Limited <---|
The main idea is to store them like this:
final_dict = {'Location': ['North','East','South'],
'Form': ['Dodecahedron','Toroidal polyhedron','Icosidodecahedron'],
'Range': ['Limited','Flexible','Limited']}
So far I could come close like so:
tryercode.py
import csv
dct = {}
# Open csv file
with open(tsvfile) as file_in:
    # Open reader instance with tab delimiter
    reader = csv.reader(file_in, delimiter='\t')
    # Iterate through rows
    for row in reader:
        # First I skip those rows that start with '#'
        if row[0].startswith('#'):
            pass
        elif row[0].startswith('L'):
            # Here I try to keep the first row that starts with the letter 'L' in a separate list
            # and insert this first row values as keys with empty lists inside
            dictkeys_list = []
            for i in range(len(row)):
                dictkeys_list.append(row[i])
                dct[row[i]] = []
        else:
            # Insert each row indexes as values by the quantity of rows
            print('¿?')
So far, the dictionary's skeleton looks fine:
print(dct)
{'Location': [], 'Form': [], 'Range': []}
But everything I have tried so far fails to append the values to the keys' empty lists as intended. I could only do so for the first row.
(...)
        else:
            # Insert each row indexes as values by the quantity of rows
            print('¿?')
            for j in range(len(row)):
                dct[dictkeys_list[j]] = row[j]  # Here I indicate the intended key of the dict through the previous list of key names
I searched far and wide on Stack Overflow but couldn't find an answer for this structure (the code template is inspired by an answer to this post, but the dictionary there is structured differently).
Using collections.defaultdict, we can create a dictionary that automatically initialises its values as lists. Then we can iterate over a csv.DictReader to populate the defaultdict.
Given this data:
A,B,C
a,b,c
aa,bb,cc
aaa,bbb,ccc
This code
import collections
import csv
d = collections.defaultdict(list)
with open('myfile.csv', 'r', newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        for k, v in row.items():
            d[k].append(v)
print(d)
Produces this result:
defaultdict(<class 'list'>, {'A': ['a', 'aa', 'aaa'],
'B': ['b', 'bb', 'bbb'],
'C': ['c', 'cc', 'ccc']})
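Applied to the file in the question, the same idea might look like the sketch below. It assumes the file really is tab-separated and that comment lines start with '#'. The in-memory sample text and the name `columns` are only there for illustration; with a real file you would open it with `newline=''` instead.

```python
import collections
import csv
import io

# Stand-in for the question's file (assumed to be tab-separated).
sample = (
    "##Some comments\n"
    "##Can ignore these lines\n"
    "Location\tForm\tRange\n"
    "North\tDodecahedron\tLimited\n"
    "East\tToroidal polyhedron\tFlexible\n"
    "South\tIcosidodecahedron\tLimited\n"
)

columns = collections.defaultdict(list)
with io.StringIO(sample, newline='') as f:
    # Drop comment lines before DictReader sees them, so the first
    # remaining line is treated as the header row.
    data_lines = (line for line in f if not line.startswith('#'))
    reader = csv.DictReader(data_lines, delimiter='\t')
    for row in reader:
        for key, value in row.items():
            columns[key].append(value)

print(dict(columns))
```

Filtering the lines first works because csv.DictReader accepts any iterable of strings, not just a file object.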
I changed the else branch of your code and ran it; with that change your code produces the right result.
The code is below:
import csv
dct = {}
# Open csv file
tsvfile = "./tsv.csv"  # This is the tsv file path
with open(tsvfile) as file_in:
    # Open reader instance with tab delimiter
    reader = csv.reader(file_in, delimiter='\t')
    # Iterate through rows
    for row in reader:
        # First I skip those rows that start with '#'
        if row[0].startswith('#'):
            pass
        elif row[0].startswith('L'):
            # Here I try to keep the first row that starts with the letter 'L' in a separate list
            # and insert this first row values as keys with empty lists inside
            dictkeys_list = []
            for i in range(len(row)):
                dictkeys_list.append(row[i])
                dct[row[i]] = []
        else:
            # Append each value of the row to the list of its column
            for i in range(len(row)):
                dct[dictkeys_list[i]].append(row[i])
print(dct)
I also revised your code further, as shown below. I think this version can cope with more complicated input, such as empty rows and stray whitespace around the values:
import csv
dct = {}
# Open csv file
tsvfile = "./tsv.csv"  # This is the tsv file path
is_head = True  # True until the header row has been read
with open(tsvfile) as file_in:
    # Open reader instance with tab delimiter
    reader = csv.reader(file_in, delimiter='\t')
    # Iterate through rows
    for row in reader:
        # Skip empty rows and rows that start with '#'
        # Use strip() to remove the whitespace around each item
        if len(row) == 0 or row[0].strip().startswith('#'):
            pass
        elif is_head:
            # The first row that is not a comment is the header: keep it in a separate list
            # and insert its values as keys with empty lists inside
            is_head = False
            dictkeys_list = []
            for i in range(len(row)):
                item = row[i].strip()
                dictkeys_list.append(item)
                dct[item] = []
        else:
            # Append each value of the row to the list of its column
            for i in range(len(row)):
                dct[dictkeys_list[i]].append(row[i].strip())
print(dct)
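Hi, you can try the pandas library. For the tab-separated file in the question, with its '#' comment lines, pass sep and comment to read_csv:
import pandas as pd
df = pd.read_csv("csv_in.csv", sep="\t", comment="#")
final_dict = df.to_dict(orient="list")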
To reproduce this, I created a CSV file with the content below and saved it as 'csvfile.csv'.
Location,Form,Range
North,Dodecahedron,Limited
East,Toroidal polyhedron,Flexible
South,Icosidodecahedron,Limited
Now, to achieve your goal, I used the pandas library as below:
import pandas as pd
df_csv = pd.read_csv('csvfile.csv')
dict_csv = df_csv.to_dict(orient='list')
print(dict_csv)
and here's the output as you needed:
{'Location': ['North', 'East', 'South'],
'Form': ['Dodecahedron', 'Toroidal polyhedron', 'Icosidodecahedron'],
'Range': ['Limited', 'Flexible', 'Limited']}
Hope this helps.
I am just starting out with Python!
I want to make a CSV file from a dictionary, printing each member of it in its own column in the same row.
That is, I have an array of dictionaries, and I want each row to represent one of them and each column of that row to represent one item inside it.
import csv
"... we are going to create an array of dictionaries and print them all..."
st_dic = []
true = 1
while true:
    dummy = input("Please Enter name, email, mobile, university, major")
    x = dummy.split(",")
    if "Stop" in x:
        break
    dict ={"Name":x[0],"Email":x[1],"Mobile":x[2],"University":x[3],"Major":x[4]}
    st_dic.append(dict)
f2 = open("data.csv" , "w")
with open("data.csv", "r+") as f:
    writer = csv.writer(f)
    for item in st_dic:
        writer.writerow([item["Name"], item["Email"], item["Mobile"] , item["University"] , item["Major"]])
    f.close()
What I get now is a row containing the data of the first dictionary; I just want the items separated, each in its own column within its row.
It is surprising how many questions here try to fill in data with a while loop and the input() command. In all fairness, this is not Python's best use case.
Imagine you had the dictionaries already filled in your code:
dict1 = {'name': "Kain", 'email': 'make_it_up#something.com'}
dict2 = {'name': "Abel", 'email': 'make_it_up2#otherthing.com'}
dict_list = [dict1, dict2]
After that you can export to csv easily:
import csv
with open('data.csv', 'w') as f:
    w = csv.DictWriter(f, ['name', 'email'], lineterminator='\n')
    w.writeheader()
    for row in dict_list:
        w.writerow(row)
Note that there are many questions about the csv module on SO,
as well as examples in the documentation.
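To see that each key really ends up in its own column, here is a small sketch that writes into an in-memory buffer instead of a file. The records, the e-mail addresses and the names `students` and `fieldnames` are made up for illustration; the column names follow the ones used in the question.

```python
import csv
import io

# Hypothetical records, shaped like the dictionaries built in the question.
students = [
    {"Name": "Kain", "Email": "kain@example.com", "Mobile": "555-0100"},
    {"Name": "Abel", "Email": "abel@example.com", "Mobile": "555-0101"},
]
fieldnames = ["Name", "Email", "Mobile"]

buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator='\n')
writer.writeheader()          # first row: the column names
writer.writerows(students)    # one row per dictionary, one column per key
csv_text = buffer.getvalue()
print(csv_text)
```

With a real file, the csv documentation recommends opening it with `newline=''` so the writer controls the line endings itself.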
So I have a csv file formatted like this
data_a,dataA,data1,data11
data_b,dataB,data1,data12
data_c,dataC,data1,data13
, , ,
data_d,dataD,data2,data21
data_e,dataE,data2,data22
data_f,dataF,data2,data23
HEADER1,HEADER2,HEADER3,HEADER4
The column headers are at the bottom, and I want the third column to be the keys. You can see that the third column is the same value for each of the two blocks of data and these blocks of data are separated by empty values, so I want to store the 3 rows of values to this 1 key and also disregard some columns such as column 4. This is my code right now
#!/usr/bin/env python
import csv
with open("example.csv") as f:
    readCSV = csv.reader(f)
    for row in readCSV:
        # disregard separating rows
        if row[2] != '':
            myDict = {row[2]:[row[0],row[1]]}
            print(myDict)
What I basically want is that when I call
print(myDict['data2'])
I get
{[data_d,dataD][data_e,dataE][data_f,dataF]}
I tried editing my if statement to
if row[2] == 'data2':
    myDict = {'data2':[row[0],row[1]]}
and just making an if for every individual key, but I don't think this will work either way.
With your current method, you probably want a defaultdict. This is a dictionary-like object that provides a default value if the key doesn't already exist. So in your case, we set this up to be a list, and then for each row we loop through, we append the values in columns 0 and 1 to this list as a tuple, like so:
import csv
from collections import defaultdict
data = defaultdict(list)
with open("example.csv") as f:
    readCSV = csv.reader(f)
    for row in readCSV:
        # disregard separating rows (their fields are empty or contain only spaces)
        key = row[2].strip()
        if key != '':
            data[key].append((row[0], row[1]))
print(data)
With the example provided, this prints a defaultdict whose data1 and data2 entries are as follows (the header row at the bottom also ends up under the key 'HEADER3' unless you skip it):
{'data1': [('data_a', 'dataA'), ('data_b', 'dataB'), ('data_c', 'dataC')], 'data2': [('data_d', 'dataD'), ('data_e', 'dataE'), ('data_f', 'dataF')]}
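If the separator rows and the header at the bottom should both be left out, one possible sketch is the following. It assumes the last row of the file is always the header and that a separator row has only blank fields. The in-memory `sample` and the name `grouped` are illustrative, not part of the original code.

```python
import csv
import io
from collections import defaultdict

# Stand-in for example.csv from the question.
sample = (
    "data_a,dataA,data1,data11\n"
    "data_b,dataB,data1,data12\n"
    "data_c,dataC,data1,data13\n"
    ", , ,\n"
    "data_d,dataD,data2,data21\n"
    "data_e,dataE,data2,data22\n"
    "data_f,dataF,data2,data23\n"
    "HEADER1,HEADER2,HEADER3,HEADER4\n"
)

rows = list(csv.reader(io.StringIO(sample, newline=''), skipinitialspace=True))
header, body = rows[-1], rows[:-1]  # assumption: the header is the last row

grouped = defaultdict(list)
for row in body:
    key = row[2].strip()
    if not key:  # separator row: the key column is blank
        continue
    grouped[key].append((row[0], row[1]))

print(grouped["data2"])
```

Reading all rows into a list first is what makes it possible to peel the header off the end; for a very large file a different approach would be needed.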
I'm not a super Python geek, but I would suggest using pandas (import pandas as pd). You load the data with pd.read_csv(file, header=...), where header specifies the row you want to use as the header. After that it's much easier to manipulate the dataset (e.g. dropping columns with del df['column_name'], creating dictionaries, etc.).
Here is documentation to pd.read_csv: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
I am trying to read in a CSV file and then take all values from each column and put into a separate list. I do not want the values by row. Since the CSV reader only allows to loop through the file once, I am using the seek() method to go back to the beginning and read the next column. Besides using a Dict mapping, is there a better way to do this?
infile = open(fpath, "r")
reader = csv.reader(infile)
NOUNS = [col[0] for col in reader]
infile.seek(0) # <-- set the iterator to beginning of the input file
VERBS = [col[1] for col in reader]
infile.seek(0)
ADJECTIVES = [col[2] for col in reader]
infile.seek(0)
SENTENCES = [col[3] for col in reader]
Something like this would do it in one pass:
kinds = NOUNS, VERBS, ADJECTIVES, SENTENCES = [], [], [], []
with open(fpath, "r") as infile:
    for cols in csv.reader(infile):
        for i, kind in enumerate(kinds):
            kind.append(cols[i])
You could feed the reader to zip and unpack it to variables as you wish.
import csv
with open('input.csv') as f:
    first, second, third, fourth = zip(*csv.reader(f))
    print('first: {}, second: {}, third: {}, fourth: {}'.format(
        first, second, third, fourth
    ))
With following input:
1,2,3,4
A,B,C,D
It will produce output:
first: ('1', 'A'), second: ('2', 'B'), third: ('3', 'C'), fourth: ('4', 'D')
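When the number of columns is not known in advance, or the file starts with a header row, the zip idea can still be used without unpacking into fixed variable names. The sketch below is one way to do that; the sample data and the names `header`, `columns` and `by_name` are assumptions for illustration.

```python
import csv
import io

# Hypothetical input with a header row and any number of columns.
sample = "noun,verb\ncat,sits\ndog,runs\n"

reader = csv.reader(io.StringIO(sample, newline=''))
header = next(reader)          # first row holds the column names
columns = list(zip(*reader))   # transpose the remaining rows into columns
by_name = dict(zip(header, columns))
print(by_name)
```

Note that zip stops at the shortest row, so a ragged file would silently lose trailing values with this approach.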
This works assuming you know exactly how many columns are in the csv (and there isn't a header row).
NOUNS = []
VERBS = []
ADJECTIVES = []
SENTENCES = []
with open(fpath, "r") as infile:
    reader = csv.reader(infile)
    for row in reader:
        NOUNS.append(row[0])
        VERBS.append(row[1])
        ADJECTIVES.append(row[2])
        SENTENCES.append(row[3])
If you don't know the column headers, you're going to have to be clever and read off the first row, make lists for every column you encounter, and loop through every new row and insert in the appropriate list. You'll probably need to do a list of lists.
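The list-of-lists approach described above could be sketched roughly like this. The sample text and the names `header` and `columns` are made up for the example; only the csv.reader usage comes from the code above.

```python
import csv
import io

# Hypothetical file content; the column names are not known to the code.
sample = "NOUN,VERB,ADJECTIVE\ncat,sits,small\ndog,runs,loud\n"

reader = csv.reader(io.StringIO(sample, newline=''))
header = next(reader)               # read off the first row
columns = [[] for _ in header]      # one empty list per column encountered
for row in reader:
    # Put each value into the list that belongs to its column position.
    for column, value in zip(columns, row):
        column.append(value)

print(header)
print(columns)
```

Here `columns[i]` holds the values for `header[i]`, so the two lists can be zipped into a dictionary afterwards if a name-based lookup is wanted.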
If you don't mind adding a dependency, use pandas. Load the file into a DataFrame with pandas.read_csv() and access each column by its name, e.g.
import pandas
df = pandas.read_csv(fpath)
print(df['NOUN'])
print(df['VERBS'])
I am not sure why you don't want to use a dict mapping. This is what I ended up doing:
Data
col1,col2,col3
val1,val2,val3
val4,val5,val6
Code
import csv
d = dict()
with open("abc.text") as csv_file:
    reader = csv.DictReader(csv_file)
    for row in reader:
        for key, value in row.items():
            if d.get(key) is None:
                d[key] = [value]
            else:
                d[key].append(value)
print d
{'col2': ['val2', 'val5'], 'col3': ['val3', 'val6'], 'col1': ['val1', 'val4']}
I have the csv file as follows:
product_name, product_id, category_id
book, , 3
shoe, 3, 1
lemon, 2, 4
I would like to update product_id of each row by providing the column name using python's csv library.
So for an example if I pass:
update_data = {"product_id": [1,2,3]}
then the csv file should be:
product_name, product_id, category_id
book, 1, 3
shoe, 2, 1
lemon, 3, 4
You can use your existing dict and iter to take items in order, eg:
import csv
update_data = {"product_id": [1,2,3]}
# Convert the values of your dict to be directly iterable so we can `next` them
to_update = {k: iter(v) for k, v in update_data.items()}
# 'rb'/'wb' are the Python 2 modes; on Python 3 use open(..., newline='') and open(..., 'w', newline='')
with open('input.csv', 'rb') as fin, open('output.csv', 'wb') as fout:
    # create in/out csv readers, skip initial space so it matches the update dict
    # and write the header out
    csvin = csv.DictReader(fin, skipinitialspace=True)
    csvout = csv.DictWriter(fout, csvin.fieldnames)
    csvout.writeheader()
    for row in csvin:
        # Update rows - if we have something left and it's in the update dictionary,
        # use that value, otherwise we use the value that's already in the column.
        row.update({k: next(to_update[k], row[k]) for k in row if k in to_update})
        csvout.writerow(row)
Now - this assumes that each new column value goes to the row number and that the existing values should be used after that. You could change that logic to only use new values when the existing value is blank for instance (or whatever other criteria you wish).
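As one example of such a change, the sketch below only takes a new value when the existing field is blank. It is written for Python 3 and uses in-memory buffers in place of input.csv and output.csv so it can be run as is; the names `source`, `fin`, `fout` and `result` are illustrative.

```python
import csv
import io

update_data = {"product_id": [1, 2, 3]}
to_update = {k: iter(v) for k, v in update_data.items()}

# Stand-in for input.csv from the question.
source = (
    "product_name, product_id, category_id\n"
    "book, , 3\n"
    "shoe, 3, 1\n"
    "lemon, 2, 4\n"
)

fin = io.StringIO(source, newline='')
fout = io.StringIO(newline='')
csvin = csv.DictReader(fin, skipinitialspace=True)
csvout = csv.DictWriter(fout, csvin.fieldnames, lineterminator='\n')
csvout.writeheader()
for row in csvin:
    for key, values in to_update.items():
        # Only consume a new value when the existing field is empty.
        if key in row and not (row[key] or '').strip():
            row[key] = next(values, row[key])
    csvout.writerow(row)

result = fout.getvalue()
print(result)
```

With this rule only the book row changes, because it is the only one whose product_id is empty; the unused update values are simply left in the iterator.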
(assuming you're using 3.x)
Python has a CSV module in the standard library which helps read and amend CSV files.
Using that I'd find the index for the column you are after and store it in the dictionary you've made. Once that has been found it's simply a matter of popping the list item into each row.
import csv
update_data = {"product_id": [None, [1,2,3]]}
#I've nested the original list inside another so that we can hold the column index in the first position.
line_no = 0
#simple counter for the first step.
new_csv = []
#Holds the new rows for when we rewrite the file.
with open('test.csv', 'r', newline='') as csvfile:
    filereader = csv.reader(csvfile, skipinitialspace=True)
    #skipinitialspace drops the blank after each comma so the header names match the dictionary keys.
    for line in filereader:
        if line_no == 0:
            for key in update_data:
                update_data[key][0] = line.index(key)
                #This finds us the columns index and stores it for us.
        else:
            for key in update_data:
                line[update_data[key][0]] = update_data[key][1].pop(0)
                #using the column index we enter the new data into the correct place whilst removing it from the input list.
        new_csv.append(line)
        line_no +=1
with open('test.csv', 'w', newline='') as csvfile:
    filewriter = csv.writer(csvfile)
    for line in new_csv:
        filewriter.writerow(line)
I hope someone can point me in the right direction. From what I've read, I believe a dictionary would best suit this need, but I am by no means a master programmer, and I hope someone can shed some light and give me a hand. This is the CSV file I have:
11362672,091914,100914,100.00,ITEM,11,N,U08
12093169,092214,101514,25.00,ITEM,11,N,U10
12162432,091214,101214,175.00,ITEM,11,N,U07
11362672,091914,100914,65.00,ITEM,11,N,U08
11362672,091914,100914,230.00,ITEM,11,N,U08
I would like to treat the first column as a key, and the following columns as the values for that key, in order to:
sort the data by the key
count the occurrences
append the counter
This is the output I would like to attain:
1,11362672,091914,100914,100.00,ITEM,11,N,U08 # occurrence 1 for key: 11362672
2,11362672,091914,100914,65.00,ITEM,11,N,U08 # occurrence 2 for key: 11362672
3,11362672,091914,100914,230.00,ITEM,11,N,U08 # occurrence 3 for key: 11362672
1,12093169,092214,101514,25.00,ITEM,11,N,U10 # occurrence 1 for key: 12093169
1,12162432,091214,101214,175.00,ITEM,11,N,U07 # occurrence 1 for key: 12162432
I need to keep the integrity of each line, which is why I think a dictionary will work best. I don't have much, but this is what I started with. This is where I need help to sort, count and append the counter.
import csv
with open('C:/Download/item_report1.csv', 'rb') as infile:
    reader = csv.reader(infile)
    dict1 = {row[0]:row[1:7] for row in reader}
    print dict1
gives me:
{
'11362672': ['091914', '100914', '230.00', 'ITEM', '11', 'N'],
'12093169': ['092214', '101514', '25.00', 'ITEM', '11', 'N'],
'12162432': ['091214', '101214', '175.00', 'ITEM', '11', 'N']
}
Briefly, you should use a counter to tally the keys and a list to store the rows.
As you read in the csv, keep track of how many times you've seen the key value, inserting it into the start of each row as you read them.
Once you've read the file in, you can sort it by the key value first and the occurrence counter second.
import csv
counter = {}
data = []
with open('report.csv','rb') as infile:
  for row in csv.reader(infile):
    key = row[0]
    if key not in counter:
      counter[key] = 1
    else:
      counter[key] += 1
    row.insert(0,counter[key])
    data.append(row)
for row in sorted(data,key=lambda x: (x[1],x[0])):
  print row
Here's the same thing again, written slightly differently and indented with 4 spaces in accordance with the official style guide rather than my personal preference of two.
import csv
# key function for sorting later
def second_and_first(x):
    return (x[1],x[0])
# dictionary to store key_fields and their counts
counter = {}
# list to store rows from the csv file
data = []
with open('report.csv','rb') as infile:
    for row in csv.reader(infile):
        # For convenience, assign the value of row[0] to key_field
        key_field = row[0]
        # if key_field is not in the dictionary counter, add it with a value of 1
        if key_field not in counter:
            counter[key_field] = 1
        # otherwise, it is there, so increment the value by one.
        else:
            counter[key_field] += 1
        # insert the value associated with key_field in the counter into the start of
        # the row
        row.insert(0,counter[key_field])
        # Append the row to the data list
        data.append(row)
for row in sorted(data,key=second_and_first):
    print row
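The code above is written for Python 2 ('rb' mode and the print statement). For Python 3, a rough equivalent might look like the sketch below. It swaps the plain dictionary for collections.Counter, reads the question's sample rows from an in-memory string and writes the result back out as CSV text. The names `seen`, `data` and `out` are illustrative.

```python
import csv
import io
from collections import Counter

# The sample rows from the question, standing in for report.csv.
sample = (
    "11362672,091914,100914,100.00,ITEM,11,N,U08\n"
    "12093169,092214,101514,25.00,ITEM,11,N,U10\n"
    "12162432,091214,101214,175.00,ITEM,11,N,U07\n"
    "11362672,091914,100914,65.00,ITEM,11,N,U08\n"
    "11362672,091914,100914,230.00,ITEM,11,N,U08\n"
)

seen = Counter()   # how many times each key has appeared so far
data = []
for row in csv.reader(io.StringIO(sample, newline='')):
    seen[row[0]] += 1
    data.append([seen[row[0]]] + row)   # prepend the occurrence number

# Sort by the key (now in position 1), then by the occurrence number.
data.sort(key=lambda r: (r[1], r[0]))

out = io.StringIO(newline='')
csv.writer(out, lineterminator='\n').writerows(data)
print(out.getvalue())
```

The keys are compared as strings here, which is fine for the equal-length IDs in the sample but would need an int() conversion if the IDs could differ in length.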