I have a CSV file like the one below. It is a small CSV file, and I have uploaded it here.
I am trying to convert the CSV values into an array.
My expected output is like this:
My solution:
import csv

results = []
with open("Solutions10.csv") as csvfile:
    reader = csv.reader(csvfile, quoting=csv.QUOTE_NONNUMERIC)  # change contents to floats
    for row in reader:  # each row is a list
        results.append(row)
but I am getting this error:
ValueError: could not convert string to float: ' [1'
There is a problem with your CSV: it is not really CSV (comma-separated values), because fields like ' [1' contain brackets as well as numbers. To read it you need to do some cleaning first:
import re

# if you expect only integers
pattern = re.compile(r'\d+')
# if you expect floats (uncomment below)
# pattern = re.compile(r'\d+\.?\d*')

result = []
with open(filepath) as csvfile:  # filepath is the path to your CSV file
    for row in csvfile:
        result.append([
            int(val.group(0))
            # float(val.group(0))
            for val in pattern.finditer(row)
        ])
print(result)
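Below is a self-contained version of the same regex idea. The actual contents of Solutions10.csv are not shown, so this sketch assumes each line looks roughly like `[1, 2, 3]`. The helper name `parse_numbers` is hypothetical. It accepts any iterable of lines, so you can pass it an open file or a list of strings. Its pattern also accepts a leading minus sign and a decimal part.

```python
import re

# Optional minus sign, digits, optional decimal part
NUMBER = re.compile(r'-?\d+(?:\.\d+)?')

def parse_numbers(lines, as_float=False):
    """Return one list of numbers per non-blank line (hypothetical helper).

    Use as_float=True if the file contains decimals: int("1.5") would fail.
    """
    convert = float if as_float else int
    result = []
    for line in lines:
        if not line.strip():          # skip blank lines
            continue
        result.append([convert(m.group(0)) for m in NUMBER.finditer(line)])
    return result

sample = ["[1, 2, 3]\n", "[4, 5, 6]\n"]
print(parse_numbers(sample))  # [[1, 2, 3], [4, 5, 6]]
```

With a real file this becomes `with open("Solutions10.csv") as f: result = parse_numbers(f)`.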
You can also solve this with substrings if it's easier for you and you know the format exactly.
Note: I also see there is an "eval" suggestion. Please be careful with it, as you can get into a lot of trouble if you run it on unknown or untrusted files...
You can do this:
with open("Solutions10.csv") as csvfile:
    result = [eval(k) for k in csvfile.readlines()]
Edit: Karl is cranky and wants you to do this:
with open("Solutions10.csv") as csvfile:
    result = []
    for line in csvfile.readlines():
        line = line.replace("[", "").replace("]", "")
        result.append([int(k) for k in line.split(",")])
But you're the programmer, so you can do what you want. If you trust your input file, eval is fine.
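If the file might come from an untrusted source, the standard library's `ast.literal_eval` is a safer replacement for `eval` here. It accepts only Python literals, such as numbers, strings, lists, tuples and dicts, and it raises `ValueError` on anything else, including function calls. This is a minimal sketch that again assumes each line is a list literal like `[1, 2, 3]`. The helper name `load_list_lines` is hypothetical.

```python
import ast

def load_list_lines(lines):
    """Parse each non-blank line as a Python literal (hypothetical helper)."""
    result = []
    for line in lines:
        line = line.strip()
        if line:
            # Unlike eval, literal_eval refuses calls such as __import__('os')
            result.append(ast.literal_eval(line))
    return result

print(load_list_lines(["[1, 2, 3]\n", "[4, 5.5, 6]\n"]))  # [[1, 2, 3], [4, 5.5, 6]]
```

With the real file: `with open("Solutions10.csv") as csvfile: result = load_list_lines(csvfile)`.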
I'm hoping you good folks can help with a project I'm working on. Essentially, I am trying to create a class that takes a CSV file as input, examines the file for the number of columns of data, and stores that data as key-value pairs in a dictionary. The code I have up to this point is below:
import csv

class DataStandard():
    '''class to store and examine columnar data saved as a csv file'''
    def __init__(self, file_name):
        self.file_name = file_name
        self.full_data_set = {}
        with open(self.file_name) as f:
            reader = csv.reader(f)
            # get labels of each column in list format
            self.col_labels = next(reader)
            # find the number of columns of data in the file
            self.number_of_cols = len(self.col_labels)
            # initialize lists to store data using column label as key
            for label in self.col_labels:
                self.full_data_set[label] = []
The part I am having a hard time with comes after the dictionary (full_data_set) is created. I'm not sure how to loop through the rest of the CSV file and store the data in the value for each key (column). Nothing I have tried so far has worked, because of the way I have to loop through the csv.reader object.
I hope this question makes sense, but please feel free to ask any clarifying questions. Also, if you can think of a better, more Pythonic approach, I would appreciate the input. This is one of my first self-guided projects involving classes, so the subject is fairly new to me. Thanks in advance!
To read the rows, you can use for row in reader:
import csv

data = []
with open('test.csv') as f:
    reader = csv.reader(f)
    headers = next(reader)
    for row in reader:
        d = dict(zip(headers, row))
        # print(d)
        data.append(d)

print('data:', data)
As @PM2Ring said, csv has DictReader:
with open('test.csv') as f:
    reader = csv.DictReader(f)
    data = list(reader)

print('data:', data)
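The two snippets above produce one dict per row. The original question asked for one list per column, keyed by the header. This sketch assumes the first row holds the labels and every later row holds data. It reads the header with next(), then appends each value to the list for its column. The function name `read_columns` is hypothetical, and the same loop could go straight into DataStandard.__init__.

```python
import csv
import io

def read_columns(f):
    """Return {label: [values, ...]} for a CSV file object (hypothetical helper)."""
    reader = csv.reader(f)
    labels = next(reader)                     # first row: column labels
    columns = {label: [] for label in labels}
    for row in reader:
        if not row:                           # skip blank lines
            continue
        # zip pairs each label with the value in the same position
        for label, value in zip(labels, row):
            columns[label].append(value)
    return columns

sample = io.StringIO("x,y\n1,2\n3,4\n")
print(read_columns(sample))  # {'x': ['1', '3'], 'y': ['2', '4']}
```

The values stay strings, so convert them (for example with float) where needed. In the class, this would be `with open(self.file_name, newline='') as f: self.full_data_set = read_columns(f)`.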
This might give you some ideas towards a solution. It assumes that the labels are only on row 1, that the rest is data, and that a row's length is 0 when it contains no data:
import csv

class DataStandard():
    '''class to store and examine columnar data saved as a csv file'''
    def __init__(self, file_name):
        self.file_name = file_name
        self.full_data_set = {}
        # modify method to the following:
        with open(self.file_name, newline='') as f:
            reader = csv.reader(f)
            # enumerate gives the row number, so row 0 can be treated as the header
            for row_number, row in enumerate(reader):
                if row_number == 0:
                    # get labels of each column in list format
                    self.col_labels = row
                    # find the number of columns of data in the file
                    self.number_of_cols = len(self.col_labels)
                    # initialize lists to store data using column label as key
                    for label in self.col_labels:
                        self.full_data_set[label] = []
                else:
                    if len(row) != 0:
                        # append each value to the list of its own column
                        for i in range(self.number_of_cols):
                            label = self.col_labels[i]
                            self.full_data_set[label].append(row[i])
...My one concern is how deeply everything is nested inside 'with open(...)'. In my experience, some of those indentation levels can be avoided. To reduce the nesting, you could put the first-row and remaining-row operations into separate 'with open(...)' blocks: read row 1, close the file, then open it again and read the remaining rows.
I am new to Python. I have a .csv file with 13 columns. I want to round off the floating-point values in the 2nd column. I managed to do that and stored the results in a list. Now I can't figure out how to write the rounded values back into the same CSV file, in the same column (column 2). I am using Python 3. Any help will be much appreciated.
My code is as follows:
Import the csv module:
import csv
Create an empty list:
list_string = []
Read the CSV file:
with open('/home/user/Desktop/wine.csv', 'r') as csvDataFile:
    csvReader = csv.reader(csvDataFile, delimiter=',')
    next(csvReader, None)
    for row in csvReader:
        floatParse = float(row[1])
        closestInteger = int(round(floatParse))
        stringConvert = str(closestInteger)
        list_string.append(stringConvert)
print(list_string)
Write into the same CSV file for the second column (this overwrites the entire CSV file):
with open('/home/user/Desktop/wine.csv', 'w') as csvDataFile:
    writer = csv.writer(csvDataFile)
    next(csvDataFile)
    row[1] = list_string
    writer.writerows(row[1])
PS: Writing to the CSV overwrites the entire file and removes all the other columns, which I don't want. I just want to overwrite the 2nd column with the rounded values and keep the rest of the data the same.
This might be what you're looking for:
import pandas as pd

# Some sample data
data = {"Document_ID": [102994, 51861, 51879, 38242, 60880, 76139, 76139],
        "SecondColumnName": [7.256, 1.222, 3.16547, 4.145658, 4.154656, 6.12, 17.1568],
        }
wine = pd.DataFrame(data)

# This is how you'd read in your data
# wine = pd.read_csv('/home/user/Desktop/wine.csv')

# Replace SecondColumnName with the real column name
# ('{:.2f}' has no thousands separator, which would otherwise add commas inside values)
wine["SecondColumnName"] = wine["SecondColumnName"].map('{:.2f}'.format)

# This overwrites the file, but it keeps all the data as before
# (index=False stops pandas from writing an extra index column)
wine.to_csv('/home/user/Desktop/wine.csv', index=False)
Pandas is way easier than the csv module... I'd recommend checking it out.
I think this answers the specific question better. The key is to open both an input_file and an output_file in the same with statement.
The StringIO part is only there to provide sample data for this example. newline='' is needed in Python 3. Without it, blank lines appear between the rows of the output on Windows. More info.
import csv
from io import StringIO

s = '''A,B,C,D,E,F,G,H,I,J,K,L
1,4.4343,3,4,5,6,7,8,9,10,11
1,8.6775433,3,4,5,6,7,8,9,10,11
1,16.83389832,3,4,5,6,7,8,9,10,11
1,32.2711122,3,4,5,6,7,8,9,10,11
1,128.949483,3,4,5,6,7,8,9,10,11'''

with StringIO(s) as input_file, open('output_file.csv', 'w', newline='') as output_file:
    reader = csv.reader(input_file)
    writer = csv.writer(output_file)
    # copy the header row through unchanged instead of dropping it
    header = next(reader, None)
    if header is not None:
        writer.writerow(header)
    for row in reader:
        # round the 2nd column to the nearest integer; other columns are untouched
        floatParse = float(row[1])
        closestInteger = int(round(floatParse))
        row[1] = str(closestInteger)
        writer.writerow(row)
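To overwrite the original wine.csv rather than produce a separate output_file.csv, one common pattern is to write to a temporary file in the same directory. Once that file is complete, os.replace swaps it in. This sketch assumes column 2 (index 1) always holds a number and the first row is a header that should be copied unchanged. `round_column_in_place` is a hypothetical name.

```python
import csv
import os
import tempfile

def round_column_in_place(path, col=1):
    """Round column `col` (0-based) to the nearest integer and rewrite `path`.

    Hypothetical helper. Python 3's round() rounds halves to even,
    so 2.5 becomes 2 and 3.5 becomes 4.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Temporary file in the same directory, so os.replace() stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.csv')
    try:
        with os.fdopen(fd, 'w', newline='') as dst, open(path, newline='') as src:
            reader = csv.reader(src)
            writer = csv.writer(dst)
            header = next(reader, None)
            if header is not None:
                writer.writerow(header)       # copy the header row unchanged
            for row in reader:
                if row:                       # leave blank lines alone
                    row[col] = str(int(round(float(row[col]))))
                writer.writerow(row)
        # Both files are closed here; swap the finished file in
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)                   # discard the partial output
        raise
```

Usage: `round_column_in_place('/home/user/Desktop/wine.csv')`. If anything fails partway through, the original file is left untouched.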
I have a large CSV file that cannot be loaded into memory all at once. I also want to add some columns to the side of the CSV. I want to add them one column at a time, because that does not use much memory. I am using Python and pandas. What can I do?
Here's my code:
import csv

def toCsv(filepath, lists):
    i = 0
    with open(filepath, 'r+') as f:
        reader = csv.reader(f)
        writer = csv.writer(f)
        for row in reader:
            print(lists)
            row.append(lists[i])
            writer.writerows(row)
            i = i + 1
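Opening one file with 'r+' cannot insert a column in place, because writing to a file overwrites existing bytes rather than inserting new ones. Also, writerows(row) treats each field of the row as a separate row. A memory-friendly alternative is to stream rows from the original file into a new file and append one value to each row. That way only one row is in memory at a time. This sketch assumes `values` is any iterable, possibly a generator, with one item per row, and that its first item is the new header cell. `add_column` is a hypothetical helper.

```python
import csv

def add_column(src_path, dst_path, values):
    """Stream src_path into dst_path, appending one item of `values` per row.

    Hypothetical helper: only the current row is held in memory, and
    `values` may be a generator. Its first item becomes the header cell.
    """
    values = iter(values)
    with open(src_path, newline='') as src, \
            open(dst_path, 'w', newline='') as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for row in reader:
            try:
                row.append(next(values))
            except StopIteration:
                raise ValueError('fewer values than rows') from None
            writer.writerow(row)      # writerow: one row, not writerows
```

After each call you can use os.replace(dst_path, src_path) to swap the result in, then call add_column again for the next column. If you prefer pandas, pd.read_csv(..., chunksize=...) also reads a large file in pieces.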
I have no knowledge of Python.
What I want to do is create a script that edits a CSV file so that every field in column 3 is wrapped in quotes. I haven't been able to find much help. Is this quick and easy to do? Thanks.
column1,column2,column3
1111111,2222222,333333
This is a fairly crude solution, very specific to your request (assuming your source file is called "csvfile.csv" and is in C:\Temp).
import csv

newrow = []
# Python 3: the csv module needs text mode with newline='' (not 'rb'/'wb')
csvFileRead = open('c:/temp/csvfile.csv', 'r', newline='')
csvFileNew = open('c:/temp/csvfilenew.csv', 'w', newline='')

# Open the CSV
csvReader = csv.reader(csvFileRead, delimiter=',')

# Append the rows to variable newrow
for row in csvReader:
    newrow.append(row)

# Add quotes around the third list item
for row in newrow:
    row[2] = "'" + str(row[2]) + "'"

csvFileRead.close()

# Create a new CSV file
csvWriter = csv.writer(csvFileNew, delimiter=',')

# Append the csv with rows from newrow variable
for row in newrow:
    csvWriter.writerow(row)

csvFileNew.close()
There are MUCH more elegant ways of doing what you want, but I've tried to break it down into basic chunks to show how each bit works.
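One more elegant option works if you want real double quotes ("333333") rather than single quotes. The csv module's csv.QUOTE_NONNUMERIC mode quotes every field that is not a number. If columns 1 and 2 are converted to int, only column 3 gets quoted. This sketch assumes exactly three columns, with integers in the first two. It writes the header with a default writer so the labels stay unquoted. The function name `quote_third_column` is hypothetical.

```python
import csv
import io

def quote_third_column(src, dst):
    """Copy CSV text from src to dst, double-quoting only column 3 (hypothetical helper)."""
    reader = csv.reader(src)
    plain_writer = csv.writer(dst)                                # QUOTE_MINIMAL
    quoting_writer = csv.writer(dst, quoting=csv.QUOTE_NONNUMERIC)
    plain_writer.writerow(next(reader))                           # header unquoted
    for row in reader:
        # ints are written bare; the remaining str field gets quoted
        quoting_writer.writerow([int(row[0]), int(row[1]), row[2]])

out = io.StringIO()
quote_third_column(io.StringIO("column1,column2,column3\n1111111,2222222,333333\n"), out)
print(out.getvalue())
```

With real files: `with open('c:/temp/csvfile.csv', newline='') as src, open('c:/temp/csvfilenew.csv', 'w', newline='') as dst: quote_third_column(src, dst)`.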
I would start by looking at the csv module.
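import csv

filename = 'file.csv'
rows = []
# open for reading: 'w'/'wb' would truncate the file before it could be read
with open(filename, newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        row[2] = "'%s'" % row[2]
        rows.append(row)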
And then write the rows back to the CSV file.
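You can only safely write back to the same file after every row has been read, because opening it with 'w' truncates it immediately. This sketch reads everything into a list first and then rewrites the file, so it assumes the file fits in memory. Like the answers above, it wraps column 3 in single quotes, but it leaves the header row alone. `quote_column_in_file` is a hypothetical name.

```python
import csv

def quote_column_in_file(filename, col=2):
    """Wrap column `col` of every data row in single quotes, in place (hypothetical helper)."""
    with open(filename, newline='') as f:
        rows = list(csv.reader(f))            # read everything before truncating
    for row in rows[1:]:                      # rows[0] is the header
        if len(row) > col:
            row[col] = "'%s'" % row[col]
    with open(filename, 'w', newline='') as f:
        csv.writer(f).writerows(rows)
```

Because the csv module's default quote character is the double quote, the single quotes are written as plain text and are not escaped.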