Trying to convert a CSV file to int in Python [duplicate]

Trying to convert a CSV file to int in Python [duplicate] - python

I am asking Python to print the minimum number from a column of CSV data, but the top row is the column number, and I don't want Python to take the top row into account. How can I make sure Python ignores the first line?
This is the code so far:
import csv
with open('all16.csv', 'rb') as inf:
incsv = csv.reader(inf)
column = 1
datatype = float
data = (datatype(column) for row in incsv)
least_value = min(data)
print least_value
Could you also explain what you are doing, not just give the code? I am very very new to Python and would like to make sure I understand everything.

You could use an instance of the csv module's Sniffer class to deduce the format of a CSV file and detect whether a header row is present along with the built-in next() function to skip over the first row only when necessary:
import csv
with open('all16.csv', 'r', newline='') as file:
has_header = csv.Sniffer().has_header(file.read(1024))
file.seek(0) # Rewind.
reader = csv.reader(file)
if has_header:
next(reader) # Skip header row.
column = 1
datatype = float
data = (datatype(row[column]) for row in reader)
least_value = min(data)
print(least_value)
Since datatype and column are hardcoded in your example, it would be slightly faster to process the row like this:
data = (float(row[1]) for row in reader)
Note: the code above is for Python 3.x. For Python 2.x use the following line to open the file instead of what is shown:
with open('all16.csv', 'rb') as file:

To skip the first line just call:
next(inf)
Files in Python are iterators over lines.

Borrowed from python cookbook,
A more concise template code might look like this:
import csv
with open('stocks.csv') as f:
f_csv = csv.reader(f)
headers = next(f_csv)
for row in f_csv:
# Process row ...

In a similar use case I had to skip annoying lines before the line with my actual column names. This solution worked nicely. Read the file first, then pass the list to csv.DictReader.
with open('all16.csv') as tmp:
# Skip first line (if any)
next(tmp, None)
# {line_num: row}
data = dict(enumerate(csv.DictReader(tmp)))

You would normally use next(incsv) which advances the iterator one row, so you skip the header. The other (say you wanted to skip 30 rows) would be:
from itertools import islice
for row in islice(incsv, 30, None):
# process

use csv.DictReader instead of csv.Reader.
If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as field names. you would then be able to access field values using row["1"] etc

Python 2.x
csvreader.next()
Return the next row of the reader’s iterable object as a list, parsed
according to the current dialect.
csv_data = csv.reader(open('sample.csv'))
csv_data.next() # skip first row
for row in csv_data:
print(row) # should print second row
Python 3.x
csvreader.__next__()
Return the next row of the reader’s iterable object as a list (if the
object was returned from reader()) or a dict (if it is a DictReader
instance), parsed according to the current dialect. Usually you should
call this as next(reader).
csv_data = csv.reader(open('sample.csv'))
csv_data.__next__() # skip first row
for row in csv_data:
print(row) # should print second row

The documentation for the Python 3 CSV module provides this example:
with open('example.csv', newline='') as csvfile:
dialect = csv.Sniffer().sniff(csvfile.read(1024))
csvfile.seek(0)
reader = csv.reader(csvfile, dialect)
# ... process CSV file contents here ...
The Sniffer will try to auto-detect many things about the CSV file. You need to explicitly call its has_header() method to determine whether the file has a header line. If it does, then skip the first row when iterating the CSV rows. You can do it like this:
if sniffer.has_header():
for header_row in reader:
break
for data_row in reader:
# do something with the row

this might be a very old question but with pandas we have a very easy solution
import pandas as pd
data=pd.read_csv('all16.csv',skiprows=1)
data['column'].min()
with skiprows=1 we can skip the first row then we can find the least value using data['column'].min()

The new 'pandas' package might be more relevant than 'csv'. The code below will read a CSV file, by default interpreting the first line as the column header and find the minimum across columns.
import pandas as pd
data = pd.read_csv('all16.csv')
data.min()

Because this is related to something I was doing, I'll share here.
What if we're not sure if there's a header and you also don't feel like importing sniffer and other things?
If your task is basic, such as printing or appending to a list or array, you could just use an if statement:
# Let's say there's 4 columns
with open('file.csv') as csvfile:
csvreader = csv.reader(csvfile)
# read first line
first_line = next(csvreader)
# My headers were just text. You can use any suitable conditional here
if len(first_line) == 4:
array.append(first_line)
# Now we'll just iterate over everything else as usual:
for row in csvreader:
array.append(row)

Well, my mini wrapper library would do the job as well.
>>> import pyexcel as pe
>>> data = pe.load('all16.csv', name_columns_by_row=0)
>>> min(data.column[1])
Meanwhile, if you know what header column index one is, for example "Column 1", you can do this instead:
>>> min(data.column["Column 1"])

For me the easiest way to go is to use range.
import csv
with open('files/filename.csv') as I:
reader = csv.reader(I)
fulllist = list(reader)
# Starting with data skipping header
for item in range(1, len(fulllist)):
# Print each row using "item" as the index value
print (fulllist[item])

I would convert csvreader to list, then pop the first element
import csv
with open(fileName, 'r') as csvfile:
csvreader = csv.reader(csvfile)
data = list(csvreader) # Convert to list
data.pop(0) # Removes the first row
for row in data:
print(row)

I would use tail to get rid of the unwanted first line:
tail -n +2 $INFIL | whatever_script.py

just add [1:]
example below:
data = pd.read_csv("/Users/xyz/Desktop/xyxData/xyz.csv", sep=',', header=None)**[1:]**
that works for me in iPython

Python 3.X
Handles UTF8 BOM + HEADER
It was quite frustrating that the csv module could not easily get the header, there is also a bug with the UTF-8 BOM (first char in file).
This works for me using only the csv module:
import csv
def read_csv(self, csv_path, delimiter):
with open(csv_path, newline='', encoding='utf-8') as f:
# https://bugs.python.org/issue7185
# Remove UTF8 BOM.
txt = f.read()[1:]
# Remove header line.
header = txt.splitlines()[:1]
lines = txt.splitlines()[1:]
# Convert to list.
csv_rows = list(csv.reader(lines, delimiter=delimiter))
for row in csv_rows:
value = row[INDEX_HERE]

Simple Solution is to use csv.DictReader()
import csv
def read_csv(file): with open(file, 'r') as file:
reader = csv.DictReader(file)
for row in reader:
print(row["column_name"]) # Replace the name of column header.

Related

Reading column names alone in a csv file

I have a csv file with the following columns:
id,name,age,sex
Followed by a lot of values for the above columns.
I am trying to read the column names alone and put them inside a list.
I am using Dictreader and this gives out the correct details:
with open('details.csv') as csvfile:
i=["name","age","sex"]
re=csv.DictReader(csvfile)
for row in re:
for x in i:
print row[x]
But what I want to do is, I need the list of columns, ("i" in the above case)to be automatically parsed with the input csv than hardcoding them inside a list.
with open('details.csv') as csvfile:
rows=iter(csv.reader(csvfile)).next()
header=rows[1:]
re=csv.DictReader(csvfile)
for row in re:
print row
for x in header:
print row[x]
This gives out an error
Keyerrror:'name'
in the line print row[x]. Where am I going wrong? Is it possible to fetch the column names using Dictreader?

Though you already have an accepted answer, I figured I'd add this for anyone else interested in a different solution-
Python's DictReader object in the CSV module (as of Python 2.6 and above) has a public attribute called fieldnames.
https://docs.python.org/3.4/library/csv.html#csv.csvreader.fieldnames
An implementation could be as follows:
import csv
with open('C:/mypath/to/csvfile.csv', 'r') as f:
d_reader = csv.DictReader(f)
#get fieldnames from DictReader object and store in list
headers = d_reader.fieldnames
for line in d_reader:
#print value in MyCol1 for each row
print(line['MyCol1'])
In the above, d_reader.fieldnames returns a list of your headers (assuming the headers are in the top row).
Which allows...
>>> print(headers)
['MyCol1', 'MyCol2', 'MyCol3']
If your headers are in, say the 2nd row (with the very top row being row 1), you could do as follows:
import csv
with open('C:/mypath/to/csvfile.csv', 'r') as f:
#you can eat the first line before creating DictReader.
#if no "fieldnames" param is passed into
#DictReader object upon creation, DictReader
#will read the upper-most line as the headers
f.readline()
d_reader = csv.DictReader(f)
headers = d_reader.fieldnames
for line in d_reader:
#print value in MyCol1 for each row
print(line['MyCol1'])

You can read the header by using the next() function which return the next row of the reader’s iterable object as a list. then you can add the content of the file to a list.
import csv
with open("C:/path/to/.filecsv", "rb") as f:
reader = csv.reader(f)
i = reader.next()
rest = list(reader)
Now i has the column's names as a list.
print i
>>>['id', 'name', 'age', 'sex']
Also note that reader.next() does not work in python 3. Instead use the the inbuilt next() to get the first line of the csv immediately after reading like so:
import csv
with open("C:/path/to/.filecsv", "rb") as f:
reader = csv.reader(f)
i = next(reader)
print(i)
>>>['id', 'name', 'age', 'sex']

The csv.DictReader object exposes an attribute called fieldnames, and that is what you'd use. Here's example code, followed by input and corresponding output:
import csv
file = "/path/to/file.csv"
with open(file, mode='r', encoding='utf-8') as f:
reader = csv.DictReader(f, delimiter=',')
for row in reader:
print([col + '=' + row[col] for col in reader.fieldnames])
Input file contents:
col0,col1,col2,col3,col4,col5,col6,col7,col8,col9
00,01,02,03,04,05,06,07,08,09
10,11,12,13,14,15,16,17,18,19
20,21,22,23,24,25,26,27,28,29
30,31,32,33,34,35,36,37,38,39
40,41,42,43,44,45,46,47,48,49
50,51,52,53,54,55,56,57,58,59
60,61,62,63,64,65,66,67,68,69
70,71,72,73,74,75,76,77,78,79
80,81,82,83,84,85,86,87,88,89
90,91,92,93,94,95,96,97,98,99
Output of print statements:
['col0=00', 'col1=01', 'col2=02', 'col3=03', 'col4=04', 'col5=05', 'col6=06', 'col7=07', 'col8=08', 'col9=09']
['col0=10', 'col1=11', 'col2=12', 'col3=13', 'col4=14', 'col5=15', 'col6=16', 'col7=17', 'col8=18', 'col9=19']
['col0=20', 'col1=21', 'col2=22', 'col3=23', 'col4=24', 'col5=25', 'col6=26', 'col7=27', 'col8=28', 'col9=29']
['col0=30', 'col1=31', 'col2=32', 'col3=33', 'col4=34', 'col5=35', 'col6=36', 'col7=37', 'col8=38', 'col9=39']
['col0=40', 'col1=41', 'col2=42', 'col3=43', 'col4=44', 'col5=45', 'col6=46', 'col7=47', 'col8=48', 'col9=49']
['col0=50', 'col1=51', 'col2=52', 'col3=53', 'col4=54', 'col5=55', 'col6=56', 'col7=57', 'col8=58', 'col9=59']
['col0=60', 'col1=61', 'col2=62', 'col3=63', 'col4=64', 'col5=65', 'col6=66', 'col7=67', 'col8=68', 'col9=69']
['col0=70', 'col1=71', 'col2=72', 'col3=73', 'col4=74', 'col5=75', 'col6=76', 'col7=77', 'col8=78', 'col9=79']
['col0=80', 'col1=81', 'col2=82', 'col3=83', 'col4=84', 'col5=85', 'col6=86', 'col7=87', 'col8=88', 'col9=89']
['col0=90', 'col1=91', 'col2=92', 'col3=93', 'col4=94', 'col5=95', 'col6=96', 'col7=97', 'col8=98', 'col9=99']

How about
with open(csv_input_path + file, 'r') as ft:
header = ft.readline() # read only first line; returns string
header_list = header.split(',') # returns list
I am assuming your input file is CSV format.
If using pandas, it takes more time if the file is big size because it loads the entire data as the dataset.

I am just mentioning how to get all the column names from a csv file.
I am using pandas library.
First we read the file.
import pandas as pd
file = pd.read_csv('details.csv')
Then, in order to just get all the column names as a list from input file use:-
columns = list(file.head(0))

Thanking Daniel Jimenez for his perfect solution to fetch column names alone from my csv, I extend his solution to use DictReader so we can iterate over the rows using column names as indexes. Thanks Jimenez.
with open('myfile.csv') as csvfile:
rest = []
with open("myfile.csv", "rb") as f:
reader = csv.reader(f)
i = reader.next()
i=i[1:]
re=csv.DictReader(csvfile)
for row in re:
for x in i:
print row[x]

here is the code to print only the headers or columns of the csv file.
import csv
HEADERS = next(csv.reader(open('filepath.csv')))
print (HEADERS)
Another method with pandas
import pandas as pd
HEADERS = list(pd.read_csv('filepath.csv').head(0))
print (HEADERS)

import pandas as pd
data = pd.read_csv("data.csv")
cols = data.columns

I literally just wanted the first row of my data which are the headers I need and didn't want to iterate over all my data to get them, so I just did this:
with open(data, 'r', newline='') as csvfile:
t = 0
for i in csv.reader(csvfile, delimiter=',', quotechar='|'):
if t > 0:
break
else:
dbh = i
t += 1

Using pandas is also an option.
But instead of loading the full file in memory, you can retrieve only the first chunk of it to get the field names by using iterator.
import pandas as pd
file = pd.read_csv('details.csv'), iterator=True)
column_names_full=file.get_chunk(1)
column_names=[column for column in column_names_full]
print column_names

Creating a single dictionary from two tab delimited files

I'm somewhat new to Python and still trying to learn all its tricks and exploitations.
I'm looking to see if it's possible to collect column data from two separate files to create a single dictionary, rather than two distinct dictionaries. The code that I've used to import files before looks like this:
import csv
from collections import defaultdict
columns = defaultdict(list)
with open("myfile.txt") as f:
reader = csv.DictReader(f,delimiter='\t')
for row in reader:
for (header,variable) in row.items():
columns[header].append(variable)
f.close()
This code makes each element of the first line of the file into a header for the columns of data below it. What I'd like to do now is to import a file that only contains one line which I'll use as my header, and import another file that only contains data that I'll match the headers up to. What I've tried so far resembles this:
columns = defaultdict(list)
with open("headerData.txt") as g:
reader1 = csv.DictReader(g,delimiter='\t')
for row in reader1:
for (h,v) in row.items():
columns[h].append(v)
with open("variableData.txt") as f:
reader = csv.DictReader(f,delimiter='\t')
for row in reader:
for (h,v) in row.items():
columns[h].append(v)
Is nesting the open statements the right way to attempt this? Honestly I am totally lost on what to do. Any help is greatly appreciated.

You can't use DictReader like that if the headers are not in the file. But you can create a fake file object that would yield the headers and then the data, using itertools.chain:
from itertools import chain
with open('headerData.txt') as h, open('variableData.txt') as data:
f = chain(h, data)
reader = csv.DictReader(f,delimiter='\t')
# proceed with you code from the first snippet
# no close() calls needed when using open() with "with" statements
Another way of course would be to just read the headers into a list and use regular csv.reader on variableData.txt:
with open('headerData') as h:
names = next(h).split('\t')
with open('variableData.txt') as f:
reader = csv.reader(f, delimiter='\t')
for row in reader:
for name, value in zip(names, row):
columns[name].append(value)

By default, DictReader will take the first line in your csv file and use that as the keys for the dict. However, according to the docs, you can also pass it a fieldnames parameter, which is a sequence containing the names of the keys to use for the dict. So you could do this:
columns = defaultdict(list)
with open("headerData.txt") as f, open("variableData.txt") as data:
reader = csv.DictReader(data,
fieldnames=f.read().rstrip().split('\t'),
delimiter='\t')
for row in reader:
for (h,v) in row.items():
columns[h].append(v)

How to skip the headers when processing a csv file using Python?

I am using below referred code to edit a csv using Python. Functions called in the code form upper part of the code.
Problem: I want the below referred code to start editing the csv from 2nd row, I want it to exclude 1st row which contains headers. Right now it is applying the functions on 1st row only and my header row is getting changed.
in_file = open("tmob_notcleaned.csv", "rb")
reader = csv.reader(in_file)
out_file = open("tmob_cleaned.csv", "wb")
writer = csv.writer(out_file)
row = 1
for row in reader:
row[13] = handle_color(row[10])[1].replace(" - ","").strip()
row[10] = handle_color(row[10])[0].replace("-","").replace("(","").replace(")","").strip()
row[14] = handle_gb(row[10])[1].replace("-","").replace(" ","").replace("GB","").strip()
row[10] = handle_gb(row[10])[0].strip()
row[9] = handle_oem(row[10])[1].replace("Blackberry","RIM").replace("TMobile","T-Mobile").strip()
row[15] = handle_addon(row[10])[1].strip()
row[10] = handle_addon(row[10])[0].replace(" by","").replace("FREE","").strip()
writer.writerow(row)
in_file.close()
out_file.close()
I tried to solve this problem by initializing row variable to 1 but it didn't work.
Please help me in solving this issue.

Your reader variable is an iterable, by looping over it you retrieve the rows.
To make it skip one item before your loop, simply call next(reader, None) and ignore the return value.
You can also simplify your code a little; use the opened files as context managers to have them closed automatically:
with open("tmob_notcleaned.csv", "rb") as infile, open("tmob_cleaned.csv", "wb") as outfile:
reader = csv.reader(infile)
next(reader, None) # skip the headers
writer = csv.writer(outfile)
for row in reader:
# process each row
writer.writerow(row)
# no need to close, the files are closed automatically when you get to this point.
If you wanted to write the header to the output file unprocessed, that's easy too, pass the output of next() to writer.writerow():
headers = next(reader, None) # returns the headers or `None` if the input is empty
if headers:
writer.writerow(headers)

Another way of solving this is to use the DictReader class, which "skips" the header row and uses it to allowed named indexing.
Given "foo.csv" as follows:
FirstColumn,SecondColumn
asdf,1234
qwer,5678
Use DictReader like this:
import csv
with open('foo.csv') as f:
reader = csv.DictReader(f, delimiter=',')
for row in reader:
print(row['FirstColumn']) # Access by column header instead of column number
print(row['SecondColumn'])

Doing row=1 won't change anything, because you'll just overwrite that with the results of the loop.
You want to do next(reader) to skip one row.

Simply iterate one time with next()
with open(filename) as file:
csvreaded = csv.reader(file)
header = next(csvreaded)
for row in csvreaded:
empty_list.append(row) #your csv list without header
or use [1:] at the end of reader object
with open(filename) as file:
csvreaded = csv.reader(file)
header = next(csvreaded)
for row in csvreaded[1:]:
empty_list.append(row) #your csv list without header

Inspired by Martijn Pieters' response.
In case you only need to delete the header from the csv file, you can work more efficiently if you write using the standard Python file I/O library, avoiding writing with the CSV Python library:
with open("tmob_notcleaned.csv", "rb") as infile, open("tmob_cleaned.csv", "wb") as outfile:
next(infile) # skip the headers
outfile.write(infile.read())

How to ignore the first line of data when processing CSV data?

I am asking Python to print the minimum number from a column of CSV data, but the top row is the column number, and I don't want Python to take the top row into account. How can I make sure Python ignores the first line?
This is the code so far:
import csv
with open('all16.csv', 'rb') as inf:
incsv = csv.reader(inf)
column = 1
datatype = float
data = (datatype(column) for row in incsv)
least_value = min(data)
print least_value
Could you also explain what you are doing, not just give the code? I am very very new to Python and would like to make sure I understand everything.

You could use an instance of the csv module's Sniffer class to deduce the format of a CSV file and detect whether a header row is present along with the built-in next() function to skip over the first row only when necessary:
import csv
with open('all16.csv', 'r', newline='') as file:
has_header = csv.Sniffer().has_header(file.read(1024))
file.seek(0) # Rewind.
reader = csv.reader(file)
if has_header:
next(reader) # Skip header row.
column = 1
datatype = float
data = (datatype(row[column]) for row in reader)
least_value = min(data)
print(least_value)
Since datatype and column are hardcoded in your example, it would be slightly faster to process the row like this:
data = (float(row[1]) for row in reader)
Note: the code above is for Python 3.x. For Python 2.x use the following line to open the file instead of what is shown:
with open('all16.csv', 'rb') as file:

To skip the first line just call:
next(inf)
Files in Python are iterators over lines.

Borrowed from python cookbook,
A more concise template code might look like this:
import csv
with open('stocks.csv') as f:
f_csv = csv.reader(f)
headers = next(f_csv)
for row in f_csv:
# Process row ...

In a similar use case I had to skip annoying lines before the line with my actual column names. This solution worked nicely. Read the file first, then pass the list to csv.DictReader.
with open('all16.csv') as tmp:
# Skip first line (if any)
next(tmp, None)
# {line_num: row}
data = dict(enumerate(csv.DictReader(tmp)))

You would normally use next(incsv) which advances the iterator one row, so you skip the header. The other (say you wanted to skip 30 rows) would be:
from itertools import islice
for row in islice(incsv, 30, None):
# process

use csv.DictReader instead of csv.Reader.
If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as field names. you would then be able to access field values using row["1"] etc

Python 2.x
csvreader.next()
Return the next row of the reader’s iterable object as a list, parsed
according to the current dialect.
csv_data = csv.reader(open('sample.csv'))
csv_data.next() # skip first row
for row in csv_data:
print(row) # should print second row
Python 3.x
csvreader.__next__()
Return the next row of the reader’s iterable object as a list (if the
object was returned from reader()) or a dict (if it is a DictReader
instance), parsed according to the current dialect. Usually you should
call this as next(reader).
csv_data = csv.reader(open('sample.csv'))
csv_data.__next__() # skip first row
for row in csv_data:
print(row) # should print second row

The documentation for the Python 3 CSV module provides this example:
with open('example.csv', newline='') as csvfile:
dialect = csv.Sniffer().sniff(csvfile.read(1024))
csvfile.seek(0)
reader = csv.reader(csvfile, dialect)
# ... process CSV file contents here ...
The Sniffer will try to auto-detect many things about the CSV file. You need to explicitly call its has_header() method to determine whether the file has a header line. If it does, then skip the first row when iterating the CSV rows. You can do it like this:
if sniffer.has_header():
for header_row in reader:
break
for data_row in reader:
# do something with the row

this might be a very old question but with pandas we have a very easy solution
import pandas as pd
data=pd.read_csv('all16.csv',skiprows=1)
data['column'].min()
with skiprows=1 we can skip the first row then we can find the least value using data['column'].min()

The new 'pandas' package might be more relevant than 'csv'. The code below will read a CSV file, by default interpreting the first line as the column header and find the minimum across columns.
import pandas as pd
data = pd.read_csv('all16.csv')
data.min()

Because this is related to something I was doing, I'll share here.
What if we're not sure if there's a header and you also don't feel like importing sniffer and other things?
If your task is basic, such as printing or appending to a list or array, you could just use an if statement:
# Let's say there's 4 columns
with open('file.csv') as csvfile:
csvreader = csv.reader(csvfile)
# read first line
first_line = next(csvreader)
# My headers were just text. You can use any suitable conditional here
if len(first_line) == 4:
array.append(first_line)
# Now we'll just iterate over everything else as usual:
for row in csvreader:
array.append(row)

Well, my mini wrapper library would do the job as well.
>>> import pyexcel as pe
>>> data = pe.load('all16.csv', name_columns_by_row=0)
>>> min(data.column[1])
Meanwhile, if you know what header column index one is, for example "Column 1", you can do this instead:
>>> min(data.column["Column 1"])

For me the easiest way to go is to use range.
import csv
with open('files/filename.csv') as I:
reader = csv.reader(I)
fulllist = list(reader)
# Starting with data skipping header
for item in range(1, len(fulllist)):
# Print each row using "item" as the index value
print (fulllist[item])

I would convert csvreader to list, then pop the first element
import csv
with open(fileName, 'r') as csvfile:
csvreader = csv.reader(csvfile)
data = list(csvreader) # Convert to list
data.pop(0) # Removes the first row
for row in data:
print(row)

I would use tail to get rid of the unwanted first line:
tail -n +2 $INFIL | whatever_script.py

just add [1:]
example below:
data = pd.read_csv("/Users/xyz/Desktop/xyxData/xyz.csv", sep=',', header=None)**[1:]**
that works for me in iPython

Python 3.X
Handles UTF8 BOM + HEADER
It was quite frustrating that the csv module could not easily get the header, there is also a bug with the UTF-8 BOM (first char in file).
This works for me using only the csv module:
import csv
def read_csv(self, csv_path, delimiter):
with open(csv_path, newline='', encoding='utf-8') as f:
# https://bugs.python.org/issue7185
# Remove UTF8 BOM.
txt = f.read()[1:]
# Remove header line.
header = txt.splitlines()[:1]
lines = txt.splitlines()[1:]
# Convert to list.
csv_rows = list(csv.reader(lines, delimiter=delimiter))
for row in csv_rows:
value = row[INDEX_HERE]

Simple Solution is to use csv.DictReader()
import csv
def read_csv(file): with open(file, 'r') as file:
reader = csv.DictReader(file)
for row in reader:
print(row["column_name"]) # Replace the name of column header.

Python to insert quotes to column in CSV

I have no knowledge of python.
What i want to be able to do is create a script that will edit a CSV file so that it will wrap every field in column 3 around quotes. I haven't been able to find much help, is this quick and easy to do? Thanks.
column1,column2,column3
1111111,2222222,333333

This is a fairly crude solution, very specific to your request (assuming your source file is called "csvfile.csv" and is in C:\Temp).
import csv
newrow = []
csvFileRead = open('c:/temp/csvfile.csv', 'rb')
csvFileNew = open('c:/temp/csvfilenew.csv', 'wb')
# Open the CSV
csvReader = csv.reader(csvFileRead, delimiter = ',')
# Append the rows to variable newrow
for row in csvReader:
newrow.append(row)
# Add quotes around the third list item
for row in newrow:
row[2] = "'"+str(row[2])+"'"
csvFileRead.close()
# Create a new CSV file
csvWriter = csv.writer(csvFileNew, delimiter = ',')
# Append the csv with rows from newrow variable
for row in newrow:
csvWriter.writerow(row)
csvFileNew.close()
There are MUCH more elegant ways of doing what you want, but I've tried to break it down into basic chunks to show how each bit works.

I would start by looking at the csv module.

import csv
filename = 'file.csv'
with open(filename, 'wb') as f:
reader = csv.reader(f)
for row in reader:
row[2] = "'%s'" % row[2]
And then write it back in the csv file.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Trying to convert a CSV file to int in Python [duplicate] - python

To skip the first line just call: next(inf) Files in Python are iterators over lines.

Borrowed from python cookbook, A more concise template code might look like this: import csv with open('stocks.csv') as f: f_csv = csv.reader(f) headers = next(f_csv) for row in f_csv: # Process row ...

You would normally use next(incsv) which advances the iterator one row, so you skip the header. The other (say you wanted to skip 30 rows) would be: from itertools import islice for row in islice(incsv, 30, None): # process

use csv.DictReader instead of csv.Reader. If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as field names. you would then be able to access field values using row["1"] etc

this might be a very old question but with pandas we have a very easy solution import pandas as pd data=pd.read_csv('all16.csv',skiprows=1) data['column'].min() with skiprows=1 we can skip the first row then we can find the least value using data['column'].min()

The new 'pandas' package might be more relevant than 'csv'. The code below will read a CSV file, by default interpreting the first line as the column header and find the minimum across columns. import pandas as pd data = pd.read_csv('all16.csv') data.min()

Well, my mini wrapper library would do the job as well. >>> import pyexcel as pe >>> data = pe.load('all16.csv', name_columns_by_row=0) >>> min(data.column[1]) Meanwhile, if you know what header column index one is, for example "Column 1", you can do this instead: >>> min(data.column["Column 1"])

For me the easiest way to go is to use range. import csv with open('files/filename.csv') as I: reader = csv.reader(I) fulllist = list(reader) # Starting with data skipping header for item in range(1, len(fulllist)): # Print each row using "item" as the index value print (fulllist[item])

I would convert csvreader to list, then pop the first element import csv with open(fileName, 'r') as csvfile: csvreader = csv.reader(csvfile) data = list(csvreader) # Convert to list data.pop(0) # Removes the first row for row in data: print(row)

I would use tail to get rid of the unwanted first line: tail -n +2 $INFIL | whatever_script.py

just add [1:] example below: data = pd.read_csv("/Users/xyz/Desktop/xyxData/xyz.csv", sep=',', header=None)[1:] that works for me in iPython

Simple Solution is to use csv.DictReader() import csv def read_csv(file): with open(file, 'r') as file: reader = csv.DictReader(file) for row in reader: print(row["column_name"]) # Replace the name of column header.

Related

Reading column names alone in a csv file

Creating a single dictionary from two tab delimited files

How to skip the headers when processing a csv file using Python?

How to ignore the first line of data when processing CSV data?

Python to insert quotes to column in CSV

Categories

Resources

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Trying to convert a CSV file to int in Python [duplicate] - python

To skip the first line just call: next(inf) Files in Python are iterators over lines.

Borrowed from python cookbook, A more concise template code might look like this: import csv with open('stocks.csv') as f: f_csv = csv.reader(f) headers = next(f_csv) for row in f_csv: # Process row ...

You would normally use next(incsv) which advances the iterator one row, so you skip the header. The other (say you wanted to skip 30 rows) would be: from itertools import islice for row in islice(incsv, 30, None): # process

use csv.DictReader instead of csv.Reader. If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as field names. you would then be able to access field values using row["1"] etc

this might be a very old question but with pandas we have a very easy solution import pandas as pd data=pd.read_csv('all16.csv',skiprows=1) data['column'].min() with skiprows=1 we can skip the first row then we can find the least value using data['column'].min()

The new 'pandas' package might be more relevant than 'csv'. The code below will read a CSV file, by default interpreting the first line as the column header and find the minimum across columns. import pandas as pd data = pd.read_csv('all16.csv') data.min()

Well, my mini wrapper library would do the job as well. >>> import pyexcel as pe >>> data = pe.load('all16.csv', name_columns_by_row=0) >>> min(data.column[1]) Meanwhile, if you know what header column index one is, for example "Column 1", you can do this instead: >>> min(data.column["Column 1"])

For me the easiest way to go is to use range. import csv with open('files/filename.csv') as I: reader = csv.reader(I) fulllist = list(reader) # Starting with data skipping header for item in range(1, len(fulllist)): # Print each row using "item" as the index value print (fulllist[item])

I would convert csvreader to list, then pop the first element import csv with open(fileName, 'r') as csvfile: csvreader = csv.reader(csvfile) data = list(csvreader) # Convert to list data.pop(0) # Removes the first row for row in data: print(row)

I would use tail to get rid of the unwanted first line: tail -n +2 $INFIL | whatever_script.py

just add [1:] example below: data = pd.read_csv("/Users/xyz/Desktop/xyxData/xyz.csv", sep=',', header=None)**[1:]** that works for me in iPython

Simple Solution is to use csv.DictReader() import csv def read_csv(file): with open(file, 'r') as file: reader = csv.DictReader(file) for row in reader: print(row["column_name"]) # Replace the name of column header.

Related

Reading column names alone in a csv file

Creating a single dictionary from two tab delimited files

How to skip the headers when processing a csv file using Python?

How to ignore the first line of data when processing CSV data?

Python to insert quotes to column in CSV

Categories

Resources

just add [1:] example below: data = pd.read_csv("/Users/xyz/Desktop/xyxData/xyz.csv", sep=',', header=None)[1:] that works for me in iPython