Converting CSV into Array in Python

I have a small CSV file like the one below (I have uploaded it here).
I am trying to convert the CSV values into an array.
My expected output looks like this:
My solution
import csv

results = []
with open("Solutions10.csv") as csvfile:
    reader = csv.reader(csvfile, quoting=csv.QUOTE_NONNUMERIC)  # change contents to floats
    for row in reader:  # each row is a list
        results.append(row)
but I am getting a
ValueError: could not convert string to float: ' [1'

There is a problem with your CSV: it's simply not CSV (comma-separated values). To handle it you need some cleaning:
import re

# if you expect only integers
pattern = re.compile(r'\d+')
# if you expect floats (uncomment below)
# pattern = re.compile(r'\d+\.*\d*')

result = []
with open(filepath) as csvfile:
    for row in csvfile:
        result.append([
            int(val.group(0))
            # float(val.group(0))
            for val in re.finditer(pattern, row)
        ])
print(result)
You can also solve this with substrings if it's easier for you and you know the format exactly.
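For example, a minimal sketch of the substring approach, assuming each row looks like "[1, 2, 3]" (filepath is the same placeholder as above):
result = []
with open(filepath) as csvfile:
    for row in csvfile:
        cleaned = row.strip().strip("[]")  # drop surrounding whitespace and brackets
        if cleaned:
            result.append([int(val) for val in cleaned.split(",")])
print(result)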
Note: I also see an "eval" suggestion. Please be careful with it, as you can get into a lot of trouble if you scan unknown/untrusted files...

You can do this:
with open("Solutions10.csv") as csvfile:
result = [eval(k) for k in csvfile.readlines()]
Edit: Karl is cranky and wants you to do this:
with open("Solutions10.csv") as csvfile:
result = []
for line in csvfile.readlines():
line = line.replace("[","").replace("]","")
result.append([int(k) for k in line.split(",")]
But you're the programmer so you can do what you want. If you trust your input file eval is fine.
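A middle ground (a sketch, not part of the original answer) is ast.literal_eval from the standard library, which only accepts Python literals such as lists and numbers, so it avoids the code-execution risk of eval. This assumes each line is a Python-style list literal like [1, 2, 3]:
import ast

with open("Solutions10.csv") as csvfile:
    # literal_eval raises ValueError on anything that isn't a plain literal
    result = [ast.literal_eval(line) for line in csvfile if line.strip()]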

Related

Csv, Python, separating elements in one column to different columns

So I have a CSV file like this;
how can I separate the elements into different columns like this,
using Python, without using the pandas lib?
An implementation that should work in Python 3.6+:
import csv

with open("input.csv", newline="") as inputfile:
    with open("output.csv", "w", newline="") as outputfile:
        reader = csv.DictReader(inputfile)  # reader
        fieldnames = reader.fieldnames
        writer = csv.DictWriter(outputfile, fieldnames=fieldnames)  # writer

        # make header
        writer.writeheader()

        # loop over each row in input CSV
        for row in reader:
            # get first column
            column: str = str(row[fieldnames[0]])
            numbers: list = column.split(",")
            if len(numbers) != len(fieldnames):
                print("Error: Lengths not equal")
            # write row in output CSV
            writer.writerow({field: num for field, num in zip(fieldnames, numbers)})
Explanation of the code:
The above code takes two file names, input.csv and output.csv. The names are self-explanatory.
It reads each row from input.csv and writes corresponding row in output.csv.
The last line is a "dictionary comprehension" combined with zip (similar to "list comprehensions" for lists). It's a nice way to do a lot of stuff in a single line, but the same code in expanded form looks like:
row = {}
for field, num in zip(fieldnames, numbers):
    row[field] = num
writer.writerow(row)
It is already separated into different columns with , as the separator, but European versions of Excel usually use ; as the separator. You can specify the separator when you import the CSV:
https://support.microsoft.com/en-us/office/import-or-export-text-txt-or-csv-files-5250ac4c-663c-47ce-937b-339e391393ba
If you really want to change the file content with Python, use the replace function and replace , with ;: How to search and replace text in a file?
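A minimal sketch of that replacement, assuming placeholder file names input.csv and output_semicolon.csv:
with open("input.csv") as src, open("output_semicolon.csv", "w") as dst:
    for line in src:
        dst.write(line.replace(",", ";"))  # swap the delimiter on every line
Note that this also replaces commas inside quoted fields; for data with quoting, read and rewrite it with the csv module instead.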

convert items in csv column to list using python

So I have been reading answers on Stack Overflow and haven't been able to find an answer to this specific question.
I have a csv with a single column with values as follows:
**Values**
abc
xyz
bcd,fgh
tew,skdh,fsh
As you can see above, some cells have more than one value separated by commas.
I used the following code:
with open('dat.csv', 'rb') as inputfile:
    reader = csv.reader(inputfile)
    colnames = ['Keywords']
    data = pandas.read_csv('dat.csv', names=colnames)
    lkn = data.values.tolist()
    print lkn
The output I got was: [['abc'],['xyz'],['bcd,fgh'],['tew,skdh,fsh']]
I would like to have the output as:
[['abc'],['xyz'],['bcd','fgh'],['tew','skdh','fsh']]
which I believe is a proper list-of-lists format (I'm fairly new to lists of lists). Please provide guidance in the right direction.
Thanks!
NB: CSV file showing how the cells are arranged (image)
Looking at your attached image, I'd bet that the cells have been quoted (although, to be sure, open the CSV file in a text editor, not in Excel) so you have to do the manual splitting yourself:
import csv

with open("file.csv", "r") as f:
    reader = csv.reader(f)
    your_list = [e[0].strip().split(",") for e in reader if e]
Try something like this:
import csv

with open('file.csv', 'r') as f:
    reader = csv.reader(f)
    your_list = list(reader)
    for item in your_list:
        item = list(item)
print(your_list)
Credit : Python import csv to list

Python read a file replace a string in a word

I am trying to read a file with the data below:
Et1, Arista2, Ethernet1
Et2, Arista2, Ethernet2
Ma1, Arista2, Management1
I need to read the file and replace Et with Ethernet and Ma with Management; the digit at the end should stay the same. The expected output is as follows:
Ethernet1, Arista2, Ethernet1
Ethernet2, Arista2, Ethernet2
Management1, Arista2, Management1
I tried code with regular expressions; I am able to get to the point where I can parse all of Et1, Et2, and Ma1, but I am unable to replace them.
import re

with open('test.txt', 'r') as fin:
    for line in fin:
        data = re.findall(r'\A[A-Z][a-z]\Z\d[0-9]*', line)
        print(data)
The output looks like this..
['Et1']
['Et2']
['Ma1']
import re

# compile once, to avoid compiling in each iteration
re_et = re.compile(r'^Et(\d+),')
re_ma = re.compile(r'^Ma(\d+),')

with open('test.txt') as fin:
    for line in fin:
        data = re_et.sub(r'Ethernet\g<1>,', line.strip())
        data = re_ma.sub(r'Management\g<1>,', data)
        print(data)
This example follows Joseph Farah's suggestion
import csv

file_name = 'data.csv'
output_file_name = "corrected_data.csv"

data = []
with open(file_name, "rb") as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        data.append(row)

corrected_data = []
for row in data:
    tmp_row = []
    for col in row:
        if 'Et' in col and not "Ethernet" in col:
            col = col.replace("Et", "Ethernet")
        elif 'Ma' in col and not "Management" in col:
            col = col.replace("Ma", "Management")
        tmp_row.append(col)
    corrected_data.append(tmp_row)

with open(output_file_name, "wb") as csvfile:
    writer = csv.writer(csvfile, delimiter=',')
    for row in corrected_data:
        writer.writerow(row)

print data
Here are the steps you should take:
Read each line in the file.
Separate each line into smaller list items, using the commas as delimiters.
Use str.replace() to replace the abbreviations with the words you want; keep in mind that anything containing "Et" (including the start of the word "Ethernet") would be replaced, so remember to account for that. The same goes for Ma and Management.
Roll it back into one big list and put it back in the file with file.write(). You may have to overwrite the original file. A sketch following these steps is shown below.
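A sketch following those steps, using the test.txt file name from the question (the guard conditions are an assumption about the data):
with open("test.txt") as src:
    fixed_lines = []
    for line in src:
        parts = [p.strip() for p in line.strip().split(",")]
        for i, part in enumerate(parts):
            # only expand abbreviations, not names that are already spelled out
            if part.startswith("Et") and not part.startswith("Ethernet"):
                parts[i] = part.replace("Et", "Ethernet", 1)
            elif part.startswith("Ma") and not part.startswith("Management"):
                parts[i] = part.replace("Ma", "Management", 1)
        fixed_lines.append(", ".join(parts))

with open("test.txt", "w") as dst:  # overwrite the original file
    dst.write("\n".join(fixed_lines) + "\n")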

Trying to convert a CSV file to int in Python [duplicate]

I am asking Python to print the minimum number from a column of CSV data, but the top row is the column number, and I don't want Python to take the top row into account. How can I make sure Python ignores the first line?
This is the code so far:
import csv

with open('all16.csv', 'rb') as inf:
    incsv = csv.reader(inf)
    column = 1
    datatype = float
    data = (datatype(column) for row in incsv)
    least_value = min(data)
    print least_value
Could you also explain what you are doing, not just give the code? I am very very new to Python and would like to make sure I understand everything.
You could use an instance of the csv module's Sniffer class to deduce the format of a CSV file and detect whether a header row is present along with the built-in next() function to skip over the first row only when necessary:
import csv

with open('all16.csv', 'r', newline='') as file:
    has_header = csv.Sniffer().has_header(file.read(1024))
    file.seek(0)  # Rewind.
    reader = csv.reader(file)
    if has_header:
        next(reader)  # Skip header row.
    column = 1
    datatype = float
    data = (datatype(row[column]) for row in reader)
    least_value = min(data)
    print(least_value)
Since datatype and column are hardcoded in your example, it would be slightly faster to process the row like this:
data = (float(row[1]) for row in reader)
Note: the code above is for Python 3.x. For Python 2.x use the following line to open the file instead of what is shown:
with open('all16.csv', 'rb') as file:
To skip the first line just call:
next(inf)
Files in Python are iterators over lines.
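Applied to the code in the question, that looks roughly like this (a sketch; the column index 1 is taken from the question):
import csv

with open('all16.csv', newline='') as inf:
    next(inf)  # discard the header line before handing the file to csv.reader
    incsv = csv.reader(inf)
    least_value = min(float(row[1]) for row in incsv)
print(least_value)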
Borrowed from the Python Cookbook, a more concise code template might look like this:
import csv

with open('stocks.csv') as f:
    f_csv = csv.reader(f)
    headers = next(f_csv)
    for row in f_csv:
        # Process row ...
        pass
In a similar use case I had to skip annoying lines before the line with my actual column names. This solution worked nicely. Read the file first, then pass the list to csv.DictReader.
import csv

with open('all16.csv') as tmp:
    # Skip first line (if any)
    next(tmp, None)
    # {line_num: row}
    data = dict(enumerate(csv.DictReader(tmp)))
You would normally use next(incsv), which advances the iterator one row so that you skip the header. The other option (say you wanted to skip 30 rows) would be:
from itertools import islice

for row in islice(incsv, 30, None):
    # process the row
    pass
Use csv.DictReader instead of csv.reader.
If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as field names. You would then be able to access field values using row["1"], etc.
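A short sketch of that approach, assuming the header cell is literally the text 1, as described in the question:
import csv

with open('all16.csv', newline='') as f:
    reader = csv.DictReader(f)  # the first row becomes the field names
    least_value = min(float(row["1"]) for row in reader)
print(least_value)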
Python 2.x
csvreader.next()
Return the next row of the reader’s iterable object as a list, parsed
according to the current dialect.
csv_data = csv.reader(open('sample.csv'))
csv_data.next()  # skip first row
for row in csv_data:
    print(row)  # should print second row
Python 3.x
csvreader.__next__()
Return the next row of the reader’s iterable object as a list (if the
object was returned from reader()) or a dict (if it is a DictReader
instance), parsed according to the current dialect. Usually you should
call this as next(reader).
csv_data = csv.reader(open('sample.csv'))
csv_data.__next__()  # skip first row
for row in csv_data:
    print(row)  # should print second row
The documentation for the Python 3 CSV module provides this example:
with open('example.csv', newline='') as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read(1024))
    csvfile.seek(0)
    reader = csv.reader(csvfile, dialect)
    # ... process CSV file contents here ...
The Sniffer will try to auto-detect many things about the CSV file. You need to explicitly call its has_header() method to determine whether the file has a header line. If it does, then skip the first row when iterating the CSV rows. You can do it like this:
# keep sniffer = csv.Sniffer() and sample = csvfile.read(1024) from the sniffing step above
if sniffer.has_header(sample):
    for header_row in reader:
        break

for data_row in reader:
    # do something with the row
    pass
This might be a very old question, but with pandas we have a very easy solution:
import pandas as pd

data = pd.read_csv('all16.csv', skiprows=1)
data['column'].min()
With skiprows=1 we can skip the first row, and then find the least value using data['column'].min().
The new 'pandas' package might be more relevant than 'csv'. The code below will read a CSV file, by default interpreting the first line as the column header and find the minimum across columns.
import pandas as pd
data = pd.read_csv('all16.csv')
data.min()
Because this is related to something I was doing, I'll share here.
What if we're not sure whether there's a header and you also don't feel like importing Sniffer and other things?
If your task is basic, such as printing or appending to a list or array, you could just use an if statement:
import csv

array = []

# Let's say there's 4 columns
with open('file.csv') as csvfile:
    csvreader = csv.reader(csvfile)

    # read first line
    first_line = next(csvreader)

    # My headers were just text. You can use any suitable conditional here
    if len(first_line) == 4:
        array.append(first_line)

    # Now we'll just iterate over everything else as usual:
    for row in csvreader:
        array.append(row)
Well, my mini wrapper library would do the job as well.
>>> import pyexcel as pe
>>> data = pe.load('all16.csv', name_columns_by_row=0)
>>> min(data.column[1])
Meanwhile, if you know what header column index one is, for example "Column 1", you can do this instead:
>>> min(data.column["Column 1"])
For me the easiest way to go is to use range.
import csv

with open('files/filename.csv') as I:
    reader = csv.reader(I)
    fulllist = list(reader)

# Starting with data, skipping header
for item in range(1, len(fulllist)):
    # Print each row using "item" as the index value
    print(fulllist[item])
I would convert csvreader to list, then pop the first element
import csv

with open(fileName, 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    data = list(csvreader)  # Convert to list
    data.pop(0)  # Removes the first row
    for row in data:
        print(row)
I would use tail to get rid of the unwanted first line:
tail -n +2 $INFIL | whatever_script.py
Just add [1:].
Example below:
data = pd.read_csv("/Users/xyz/Desktop/xyxData/xyz.csv", sep=',', header=None)[1:]
That works for me in IPython.
Python 3.X
Handles UTF8 BOM + HEADER
It was quite frustrating that the csv module could not easily get the header; there is also a bug with the UTF-8 BOM (the first character in the file).
This works for me using only the csv module:
import csv

def read_csv(self, csv_path, delimiter):
    with open(csv_path, newline='', encoding='utf-8') as f:
        # https://bugs.python.org/issue7185
        # Remove UTF8 BOM.
        txt = f.read()[1:]

        # Remove header line.
        header = txt.splitlines()[:1]
        lines = txt.splitlines()[1:]

        # Convert to list.
        csv_rows = list(csv.reader(lines, delimiter=delimiter))
        for row in csv_rows:
            value = row[INDEX_HERE]
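An alternative sketch (not part of the original answer): opening the file with the utf-8-sig codec strips the BOM automatically, so only the header line needs skipping by hand:
import csv

def read_csv(csv_path, delimiter=','):
    with open(csv_path, newline='', encoding='utf-8-sig') as f:  # utf-8-sig removes the BOM if present
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader)  # grab (and skip) the header row
        return header, list(reader)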
A simple solution is to use csv.DictReader():
import csv

def read_csv(file):
    with open(file, 'r') as file:
        reader = csv.DictReader(file)
        for row in reader:
            print(row["column_name"])  # Replace "column_name" with the column header's name.

Appending data to csv file

I am trying to append two data sets to my CSV file. Below is my code. The code runs, but my data gets appended below an existing set of data in the first column (i.e. col[0]). I would, however, like to append my data sets as separate columns at the end of the file. Could I please get advice on how I might be able to do this? Thanks.
import csv

Trial = open('Trial_test.csv', 'rt', newline='')
reader = csv.reader(Trial)

Trial_New = open('Trial_test.csv', 'a', newline='')
writer = csv.writer(Trial_New, delimiter=',')

Cortex = []
Liver = []
for col in reader:
    Cortex_Diff = float(col[14])
    Liver_Diff = float(col[17])
    Cortex.append(Cortex_Diff)
    Liver.append(Liver_Diff)

Avg_diff_Cortex = sum(Cortex)/len(Cortex)
Data1 = str(Avg_diff_Cortex)
Avg_diff_Liver = sum(Liver)/len(Liver)
Data2 = str(Avg_diff_Liver)

writer.writerows(Data1 + Data2)

Trial.close()
Trial_New.close()
I think I see what you are trying to do. I won't try to rewrite your function entirely for you, but here's a tip: assuming you are dealing with a manageable size of dataset, try reading your entire CSV into memory as a list of lists (or list of tuples), then perform your calculations on the values on this object, then write the python object back out to the new CSV in a separate block of code. You may find this article or this one of use. Naturally the official documentation should be helpful too.
Also, I would suggest using different files for input and output to make your life easier.
For example:
import csv

data = []
with open('Trial_test.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in reader:
        data.append(row)

# now do your calculations on the 'data' object.

with open('Trial_test_new.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile, delimiter=' ', quotechar='|')
    for row in data:
        writer.writerow(row)
Something like that, anyway!
