convert items in csv column to list using python - python

I have been reading answers on StackOverflow and haven't been able to find one that covers this specific question.
I have a CSV with a single column with values as follows:
**Values**
abc
xyz
bcd,fgh
tew,skdh,fsh
As you can see above, some cells have more than one value, separated by commas.
I used the following code:
import csv
import pandas
with open('dat.csv', 'rb') as inputfile:
    reader = csv.reader(inputfile)
colnames = ['Keywords']
data = pandas.read_csv('dat.csv', names=colnames)
lkn = data.values.tolist()
print(lkn)
The output I got was: [['abc'],['xyz'],['bcd,fgh'],['tew,skdh,fsh']]
I would like to have the output as:
[['abc'],['xyz'],['bcd','fgh'],['tew','skdh','fsh']]
which I believe is a proper list-of-lists format (I am fairly new to lists of lists). Please point me in the right direction.
Thanks!
NB: CSV file showing how the cells are arranged (image)

Looking at your attached image, I'd bet that the cells have been quoted (although, to be sure, open the CSV file in a text editor, not in Excel) so you have to do the manual splitting yourself:
import csv
with open("file.csv", "r") as f:
    reader = csv.reader(f)
    # each row holds a single field; split that field on commas
    your_list = [e[0].strip().split(",") for e in reader if e]
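To see why the manual split is needed, here is a minimal sketch that runs the same comprehension over an in-memory string instead of a file. It assumes the multi-value cells are wrapped in double quotes, as suspected above; the sample text and the name sample are made up for illustration.

```python
import csv
import io

# Hypothetical file contents: multi-value cells are quoted, so csv.reader
# returns each line as a single field.
sample = 'abc\nxyz\n"bcd,fgh"\n"tew,skdh,fsh"\n'

reader = csv.reader(io.StringIO(sample))
# Split the single field of each non-empty row on commas.
your_list = [row[0].strip().split(",") for row in reader if row]

print(your_list)
# [['abc'], ['xyz'], ['bcd', 'fgh'], ['tew', 'skdh', 'fsh']]
```

If the cells turn out not to be quoted, csv.reader already splits them on commas and the extra split is unnecessary.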

Try something like this:
import csv
with open('file.csv', 'r') as f:
    reader = csv.reader(f)
    your_list = list(reader)
for item in your_list:
    item = list(item)
print(your_list)
Credit : Python import csv to list

Related

Converting CSV into Array in Python

I have a small CSV file, which I have uploaded here.
I am trying to convert the CSV values into an array.
My expected output looks like
My solution:
import csv
results = []
with open("Solutions10.csv") as csvfile:
    reader = csv.reader(csvfile, quoting=csv.QUOTE_NONNUMERIC)  # change contents to floats
    for row in reader:  # each row is a list
        results.append(row)
but I am getting a
ValueError: could not convert string to float: ' [1'
There is a problem with your CSV: it is simply not CSV (comma-separated values). To read it you need some cleaning:
import re
# if you expect only integers
pattern = re.compile(r'\d+')
# if you expect floats (uncomment below)
# pattern = re.compile(r'\d+\.*\d*')
filepath = "Solutions10.csv"  # file name taken from the question
result = []
with open(filepath) as csvfile:
    for row in csvfile:
        result.append([
            int(val.group(0))
            # float(val.group(0))
            for val in re.finditer(pattern, row)
        ])
print(result)
You can also solve this with substrings if it's easier for you and you know the format exactly.
Note: I also see there is an "eval" suggestion. Please be careful with it, as you can get into a lot of trouble if you scan unknown or untrusted files.
You can do this:
with open("Solutions10.csv") as csvfile:
    result = [eval(k) for k in csvfile.readlines()]
Edit: Karl is cranky and wants you to do this:
with open("Solutions10.csv") as csvfile:
    result = []
    for line in csvfile.readlines():
        line = line.replace("[","").replace("]","")
        result.append([int(k) for k in line.split(",")])
But you're the programmer, so you can do what you want. If you trust your input file, eval is fine.
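A middle ground between eval and manual string cleaning is ast.literal_eval from the standard library, which parses Python literals but refuses anything else. The sketch below assumes each line of the file is a bracketed list such as [1, 2, 3]; the real layout of Solutions10.csv is not shown in the question, so the sample lines are hypothetical.

```python
import ast

# Hypothetical file contents: one Python-style list literal per line.
lines = ["[1, 2, 3]\n", "[4, 5, 6]\n"]

# literal_eval only accepts literals (numbers, strings, lists, ...),
# so a function call in the input raises ValueError instead of running.
result = [ast.literal_eval(line.strip()) for line in lines if line.strip()]

print(result)
# [[1, 2, 3], [4, 5, 6]]
```

With a real file you would iterate over the open file object instead of the lines list.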

Read CSV file in custom string format

I have a .csv file which looks something like this:
-73.933087,40.6960679
-84.39591587,39.34949003
-111.2325173,47.49438049
How can I read that .csv file in Python to get a format like this (two numbers between quotes, separated by a comma):
numbers = ["-73.933087,40.6960679",
"-84.39591587,39.34949003",
"-111.2325173,47.49438049"]
I managed to load the .csv into a list, but the formatting is the problem.
import csv
with open('coordinates.csv', newline='') as f:
    reader = csv.reader(f)
    my_list = list(reader)
print(my_list)
input("Press enter to exit.")
Where I get output like this:
[['-73.933087', '40.6960679'],
['-84.39591587', '39.34949003'],
['-111.2325173', '47.49438049']]
So I need to remove the single quotes here and replace the square brackets with double quotes.
Just use join to combine each line. You were 95% there with your code already.
import csv
numbers = []
with open('coordinates.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        # glue the fields of each row back together with a comma
        nums = ",".join(row)
        numbers.append(nums)
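For a quick check without a file on disk, the same join can be run over an in-memory string. This is only a sketch; the name sample is made up and holds the three rows from the question.

```python
import csv
import io

# The three coordinate rows from the question, as in-memory text.
sample = (
    "-73.933087,40.6960679\n"
    "-84.39591587,39.34949003\n"
    "-111.2325173,47.49438049\n"
)

reader = csv.reader(io.StringIO(sample))
# Each row is a list of two strings; join turns it back into one string.
numbers = [",".join(row) for row in reader]

print(numbers)
```

Note that print shows the strings with single quotes because that is how Python displays strings; the values themselves are the same as the double-quoted ones in the question.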
I think you should simply be able to store it in a pandas dataframe like this:
import pandas as pd
# header=None because the file has no header row
numbers = pd.read_csv(r'Path where the CSV file is stored\File name.csv', header=None)
print(numbers)
Then you can convert it to a numpy array or whatever you like.

Prints to console. Now I want to print to CSV file

I can read a text file with names and print in ascending order to console. I simply want to write the sorted names to a column in a CSV file. Can't I take the printed(file) and send to CSV?
Thanks!
import csv
with open('/users/h/documents/pyprojects/boy-names.txt','r') as file:
    for file in sorted(file):
        print(file, end='')
#the following isn't working.
with open('/users/h/documents/pyprojects/boy-names.csv', 'w', newline='') as csvFile:
    names = ['Column1']
    writer = csv.writer(names)
    print(file)
You can do something like this:
import csv
with open('boy-names.txt', 'rt') as file, open('boy-names.csv', 'w', newline='') as csv_file:
    csv_writer = csv.writer(csv_file, quoting=csv.QUOTE_MINIMAL)
    csv_writer.writerow(['Column1'])
    for boy_name in sorted(file.readlines()):
        boy_name = boy_name.rstrip('\n')
        print(boy_name)
        csv_writer.writerow([boy_name])
This is covered in the documentation.
The only tricky part is converting the lines from the file to a list of 1-element lists.
import csv
with open('/users/h/documents/pyprojects/boy-names.txt','r') as file:
    names = [[k.strip()] for k in sorted(file.readlines())]
with open('/users/h/documents/pyprojects/boy-names.csv', 'w', newline='') as csvFile:
    writer = csv.writer(csvFile)
    writer.writerow(['Column1'])
    writer.writerows(names)
So, names will contain (for example):
[['Able'],['Baker'],['Charlie'],['Delta']]
The CSV writer expects to write a row or a set of rows. EACH ROW has to be a list (or tuple). That's why I created it like I did. When calling writerows, the outer list contains the set of rows to be written. Each element of the outer list is a row. I want each row to contain one item, so each is a one-element list.
If I had created this:
['Able','Baker','Charlie','Delta']
then writerows would have treated each string as a sequence, resulting in a CSV file like this:
A,b,l,e
B,a,k,e,r
C,h,a,r,l,i,e
D,e,l,t,a
which is amusing but not very useful. And I know that because I did it while I was creating your answer.
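The difference is easy to reproduce in memory. The sketch below writes the same names twice, once as bare strings and once as one-element lists; it uses io.StringIO in place of a file and sets lineterminator to "\n" only to keep the output easy to read.

```python
import csv
import io

names = ['Able', 'Baker']

# Bare strings: writerows iterates each string character by character.
wrong = io.StringIO()
csv.writer(wrong, lineterminator="\n").writerows(names)

# One-element lists: each name becomes a single cell on its own row.
right = io.StringIO()
csv.writer(right, lineterminator="\n").writerows([[n] for n in names])

print(wrong.getvalue())  # A,b,l,e and B,a,k,e,r on separate lines
print(right.getvalue())  # Able and Baker on separate lines
```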

How to read just the first column of each row of a CSV file [duplicate]

How to read just the first column of each row of a CSV file in Python?
My data is something like this:
1 abc
2 bcd
3 cde
and I only need to loop through the values of the first column.
Also, when I open the CSV file in Calc, the data in each row is all in the same cell. Is that normal?
import csv
with open(file) as f:
    reader = csv.reader(f, delimiter="\t")
    for i in reader:
        print(i[0])
Or change the delimiter to a space if necessary:
reader = csv.reader(f, delimiter=" ")
Without the csv module:
with open(file) as f:
    for line in f:
        print(line.split()[0])
You can use zip (itertools.izip in Python 2) to build an iterator over the columns and use next to get the first column. This saves you from indexing every row yourself, although unpacking with * still reads all the rows.
import csv
with open('ex.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ')
    print(next(zip(*spamreader)))
To get just the first column as a list:
with open('myFile.csv') as f:
    firstColumn = [line.split(',')[0] for line in f]
For the second part of your question:
When opening CSV documents in LibreOffice Calc (OpenOffice should work the same way), I get a dialog asking a few things about the document, such as the character encoding and the type of separator. If you select "space", it should work. There is a preview at the bottom of this dialog.
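As a small self-contained check of the approaches above, this sketch reads the three sample rows from the question out of an in-memory string. It assumes the columns are separated by a single space; if the real file uses tabs, change the delimiter accordingly.

```python
import csv
import io

# Sample rows from the question, assumed to be space-separated.
sample = "1 abc\n2 bcd\n3 cde\n"

reader = csv.reader(io.StringIO(sample), delimiter=" ")
# Keep only the first field of every non-empty row.
first_column = [row[0] for row in reader if row]

print(first_column)
# ['1', '2', '3']
```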

Reading column names alone in a csv file

I have a csv file with the following columns:
id,name,age,sex
Followed by a lot of values for the above columns.
I am trying to read the column names alone and put them inside a list.
I am using DictReader and this gives out the correct details:
with open('details.csv') as csvfile:
    i=["name","age","sex"]
    re=csv.DictReader(csvfile)
    for row in re:
        for x in i:
            print row[x]
But what I want is for the list of columns ("i" in the above case) to be parsed automatically from the input CSV rather than hardcoded in a list.
with open('details.csv') as csvfile:
    rows=iter(csv.reader(csvfile)).next()
    header=rows[1:]
    re=csv.DictReader(csvfile)
    for row in re:
        print row
        for x in header:
            print row[x]
This gives an error
KeyError: 'name'
in the line print row[x]. Where am I going wrong? Is it possible to fetch the column names using DictReader?
Though you already have an accepted answer, I figured I'd add this for anyone else interested in a different solution-
Python's DictReader object in the CSV module (as of Python 2.6 and above) has a public attribute called fieldnames.
https://docs.python.org/3.4/library/csv.html#csv.csvreader.fieldnames
An implementation could be as follows:
import csv
with open('C:/mypath/to/csvfile.csv', 'r') as f:
    d_reader = csv.DictReader(f)
    #get fieldnames from DictReader object and store in list
    headers = d_reader.fieldnames
    for line in d_reader:
        #print value in MyCol1 for each row
        print(line['MyCol1'])
In the above, d_reader.fieldnames returns a list of your headers (assuming the headers are in the top row).
Which allows...
>>> print(headers)
['MyCol1', 'MyCol2', 'MyCol3']
If your headers are in, say the 2nd row (with the very top row being row 1), you could do as follows:
import csv
with open('C:/mypath/to/csvfile.csv', 'r') as f:
    #you can eat the first line before creating DictReader.
    #if no "fieldnames" param is passed into
    #DictReader object upon creation, DictReader
    #will read the upper-most line as the headers
    f.readline()
    d_reader = csv.DictReader(f)
    headers = d_reader.fieldnames
    for line in d_reader:
        #print value in MyCol1 for each row
        print(line['MyCol1'])
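The same behaviour also explains the KeyError in the question: once the header line has been consumed by another reader, DictReader takes the first data row as its field names. A minimal sketch, using made-up sample rows under the id,name,age,sex header from the question:

```python
import csv
import io

# Header from the question plus two hypothetical data rows.
sample = "id,name,age,sex\n1,Ann,30,F\n2,Bob,25,M\n"

# Normal use: DictReader takes the first line it sees as the header.
f = io.StringIO(sample)
good = csv.DictReader(f)
print(good.fieldnames)  # ['id', 'name', 'age', 'sex']

# If the header line is consumed first, DictReader treats the first
# data row as the header, so row['name'] would raise KeyError.
f = io.StringIO(sample)
next(csv.reader(f))
bad = csv.DictReader(f)
print(bad.fieldnames)  # ['1', 'Ann', '30', 'F']
```

So reading fieldnames from the DictReader itself, rather than reading the header separately from the same file object, avoids the problem.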
You can read the header by using the next() function, which returns the next row of the reader's iterable object as a list. Then you can add the rest of the file's content to a list.
import csv
with open("C:/path/to/file.csv", "rb") as f:
    reader = csv.reader(f)
    i = reader.next()
    rest = list(reader)
Now i has the column names as a list.
print i
>>>['id', 'name', 'age', 'sex']
Also note that reader.next() does not work in Python 3. Instead, use the built-in next() to get the first line of the CSV immediately after creating the reader, and open the file in text mode, like so:
import csv
with open("C:/path/to/file.csv", "r", newline="") as f:
    reader = csv.reader(f)
    i = next(reader)
    print(i)
>>>['id', 'name', 'age', 'sex']
The csv.DictReader object exposes an attribute called fieldnames, and that is what you'd use. Here's example code, followed by input and corresponding output:
import csv
file = "/path/to/file.csv"
with open(file, mode='r', encoding='utf-8') as f:
    reader = csv.DictReader(f, delimiter=',')
    for row in reader:
        print([col + '=' + row[col] for col in reader.fieldnames])
Input file contents:
col0,col1,col2,col3,col4,col5,col6,col7,col8,col9
00,01,02,03,04,05,06,07,08,09
10,11,12,13,14,15,16,17,18,19
20,21,22,23,24,25,26,27,28,29
30,31,32,33,34,35,36,37,38,39
40,41,42,43,44,45,46,47,48,49
50,51,52,53,54,55,56,57,58,59
60,61,62,63,64,65,66,67,68,69
70,71,72,73,74,75,76,77,78,79
80,81,82,83,84,85,86,87,88,89
90,91,92,93,94,95,96,97,98,99
Output of print statements:
['col0=00', 'col1=01', 'col2=02', 'col3=03', 'col4=04', 'col5=05', 'col6=06', 'col7=07', 'col8=08', 'col9=09']
['col0=10', 'col1=11', 'col2=12', 'col3=13', 'col4=14', 'col5=15', 'col6=16', 'col7=17', 'col8=18', 'col9=19']
['col0=20', 'col1=21', 'col2=22', 'col3=23', 'col4=24', 'col5=25', 'col6=26', 'col7=27', 'col8=28', 'col9=29']
['col0=30', 'col1=31', 'col2=32', 'col3=33', 'col4=34', 'col5=35', 'col6=36', 'col7=37', 'col8=38', 'col9=39']
['col0=40', 'col1=41', 'col2=42', 'col3=43', 'col4=44', 'col5=45', 'col6=46', 'col7=47', 'col8=48', 'col9=49']
['col0=50', 'col1=51', 'col2=52', 'col3=53', 'col4=54', 'col5=55', 'col6=56', 'col7=57', 'col8=58', 'col9=59']
['col0=60', 'col1=61', 'col2=62', 'col3=63', 'col4=64', 'col5=65', 'col6=66', 'col7=67', 'col8=68', 'col9=69']
['col0=70', 'col1=71', 'col2=72', 'col3=73', 'col4=74', 'col5=75', 'col6=76', 'col7=77', 'col8=78', 'col9=79']
['col0=80', 'col1=81', 'col2=82', 'col3=83', 'col4=84', 'col5=85', 'col6=86', 'col7=87', 'col8=88', 'col9=89']
['col0=90', 'col1=91', 'col2=92', 'col3=93', 'col4=94', 'col5=95', 'col6=96', 'col7=97', 'col8=98', 'col9=99']
How about
with open(csv_input_path + file, 'r') as ft:
    header = ft.readline()  # read only first line; returns string
    header_list = header.strip().split(',')  # returns list, without the trailing newline
I am assuming your input file is in CSV format.
Using pandas takes more time if the file is large, because it loads the entire data set.
I am just mentioning how to get all the column names from a csv file.
I am using pandas library.
First we read the file.
import pandas as pd
file = pd.read_csv('details.csv')
Then, to get all the column names from the input file as a list, use:
columns = list(file.head(0))
Thanks to Daniel Jimenez for his solution for fetching the column names alone from my CSV. I extended it to use DictReader so we can iterate over the rows using column names as indexes. Thanks, Jimenez.
import csv
# first pass: read only the header row and drop the first column name
with open('myfile.csv', newline='') as f:
    i = next(csv.reader(f))[1:]
# second pass: iterate over the rows, indexing by column name
with open('myfile.csv', newline='') as csvfile:
    re = csv.DictReader(csvfile)
    for row in re:
        for x in i:
            print(row[x])
Here is the code to print only the headers (column names) of the CSV file.
import csv
HEADERS = next(csv.reader(open('filepath.csv')))
print(HEADERS)
Another method, with pandas:
import pandas as pd
HEADERS = list(pd.read_csv('filepath.csv').head(0))
print(HEADERS)
import pandas as pd
data = pd.read_csv("data.csv")
cols = data.columns
I literally just wanted the first row of my data, which holds the headers I need, and didn't want to iterate over all my data to get them, so I just did this:
import csv
with open(data, 'r', newline='') as csvfile:
    t = 0
    for i in csv.reader(csvfile, delimiter=',', quotechar='|'):
        if t > 0:
            break
        else:
            dbh = i
            t += 1
Using pandas is also an option.
But instead of loading the full file into memory, you can retrieve only the first chunk of it to get the field names by using an iterator.
import pandas as pd
file = pd.read_csv('details.csv', iterator=True)
column_names_full = file.get_chunk(1)
column_names = [column for column in column_names_full]
print(column_names)
