How can I get a specific field of a csv file?


I need a way to get a specific item (field) of a CSV. Say I have a CSV with 100 rows and 2 columns (comma separated). The first column holds emails, the second passwords. For example, I want to get the password of the email in row 38, so I need only the item from the 2nd column of row 38.
Say I have a csv file:
aaaaa@aaa.com,bbbbb
ccccc@ccc.com,ddddd
How can I get only 'ddddd', for example?
I'm new to the language and tried some stuff with the csv module, but I don't get it...

import csv
mycsv = csv.reader(open(myfilepath))
for row in mycsv:
    text = row[1]
Following the comments to the SO question here, a better, more robust version would be:
import csv
# 'rb' is the Python 2 idiom; on Python 3 use open(myfilepath, newline='')
with open(myfilepath, 'rb') as f:
    mycsv = csv.reader(f)
    for row in mycsv:
        text = row[1]
        # ...
Update: If what the OP actually wants is the last string in the last row of the csv file, there are several approaches that do not necessarily need csv. For example:
fulltxt = open(myfilepath, 'rb').read()
laststring = fulltxt.split(',')[-1]
This is not good for very big files because it loads the complete text into memory, but it could be OK for small files. Note that laststring could include a trailing newline character, so strip it before use.
And finally, if what the OP wants is the second string in line n (for n=2):
Update 2: This is now the same code as the one in the answer from J.F.Sebastian (the credit goes to him):
import csv
line_number = 2
with open(myfilepath, 'rb') as f:
    mycsv = csv.reader(f)
    mycsv = list(mycsv)
    text = mycsv[line_number][1]
    # ...

#!/usr/bin/env python
"""Print a field specified by row, column numbers from given csv file.

USAGE:
    %prog csv_filename row_number column_number
"""
import csv
import sys
filename = sys.argv[1]
# Convert the 1-based command-line numbers to 0-based indexes.
row_number, column_number = [int(arg, 10)-1 for arg in sys.argv[2:]]
with open(filename, 'rb') as f:
    rows = list(csv.reader(f))
    print rows[row_number][column_number]
Example
$ python print-csv-field.py input.csv 2 2
ddddd
Note: list(csv.reader(f)) loads the whole file in memory. To avoid that you could use itertools:
import itertools
# ...
with open(filename, 'rb') as f:
    row = next(itertools.islice(csv.reader(f), row_number, row_number+1))
    print row[column_number]
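
For readers on Python 3, the same lazy lookup might be sketched as below. The helper name get_field and the in-memory sample are illustrative assumptions, not part of the original answer; with a real file you would pass a handle opened with open(path, newline='').

```python
import csv
import io
from itertools import islice

def get_field(lines, row_number, column_number):
    """Return one field (0-based row and column) without loading every row."""
    reader = csv.reader(lines)
    # islice skips the first row_number rows lazily; next() fetches the target row.
    row = next(islice(reader, row_number, None), None)
    if row is None:
        raise IndexError("row %d is past the end of the data" % row_number)
    return row[column_number]

# In-memory stand-in for an opened csv file (hypothetical sample data).
sample = io.StringIO("aaaaa@aaa.com,bbbbb\nccccc@ccc.com,ddddd\n")
print(get_field(sample, 1, 1))  # ddddd
```

Only the rows up to the requested one are parsed, so memory use stays flat even for large files.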

import csv
def read_cell(x, y):
    with open('file.csv', 'r') as f:
        reader = csv.reader(f)
        y_count = 0
        for n in reader:
            if y_count == y:
                cell = n[x]
                return cell
            y_count += 1
print (read_cell(4, 8))
This example prints cell 4, 8 in Python 3.

There is an important point to understand about the csv.reader object: it is an iterator, not a list, so it is not subscriptable.
This works:
for r in csv.reader(file_obj): # file not closed
    print r
This does not:
r = csv.reader(file_obj)
print r[0]
So you first have to convert the reader to a list to make the indexing work:
r = list( csv.reader(file_obj) )
print r[0]
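
A small Python 3 sketch of the same point, using an in-memory sample (an assumption made here so the snippet runs without a file): indexing the reader raises TypeError, while next() and list() work because the reader is an iterator.

```python
import csv
import io

reader = csv.reader(io.StringIO("a,b\nc,d\n"))

error = None
try:
    reader[0]                 # a reader is an iterator, not a sequence
except TypeError as exc:
    error = str(exc)

first_row = next(reader)      # ['a', 'b']
remaining = list(reader)      # [['c', 'd']], only the rows not yet consumed
print(first_row, remaining)
```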

Finally I got it!
import csv
def select_index(index):
    csv_file = open('oscar_age_female.csv', 'r')
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        l = line['Index']
        if l == index:
            print(line[' "Name"'])
select_index('11')
This prints:
"Bette Davis"

The following may be what you are looking for:
import pandas as pd
df = pd.read_csv("table.csv")
row_number = 38  # for example
print(df["Password"][row_number])

import csv
inf = csv.reader(open('yourfile.csv','r'))
for row in inf:
    print row[1]

Related

Extract two columns sorted from CSV

I have a large csv file, containing multiple values, in the form
Date,Dslam_Name,Card,Port,Ani,DownStream,UpStream,Status
2020-01-03 07:10:01,aart-m1-m1,204,57,302xxxxxxxxx,0,0,down
I want to extract the Dslam_Name and Ani values, sort them by Dslam_name and write them to a new csv in two different columns.
So far my code is as follows:
import csv
import operator
with open('bad_voice_ports.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    sortedlist = sorted(readCSV, key=operator.itemgetter(1))
    for row in sortedlist:
        bad_port = row[1][:4],row[4][2::]
        print(bad_port)
        f = open("bad_voice_portsnew20200103SORTED.csv","a+")
        f.write(row[1][:4] + " " + row[4][2::] + '\n')
        f.close()
But my Dslam_Name and Ani values are kept in the same column.
As a next step I would like to count how many times the same value appears in the 1st column.

You are forcing them to be a single column. Joining the two into a single string means Python no longer regards them as separate.
But try this instead:
import csv
import operator
with open('bad_voice_ports.csv') as readfile, open('bad_voice_portsnew20200103SORTED.csv', 'w') as writefile:
    readCSV = csv.reader(readfile)
    writeCSV = csv.writer(writefile)
    for row in sorted(readCSV, key=operator.itemgetter(1)):
        bad_port = row[1][:4],row[4][2::]
        print(bad_port)
        writeCSV.writerow(bad_port)
If you want to include the number of times each key occurred, you can easily include that in the program, too. I would refactor slightly to separate the reading and the writing.
import csv
import operator
from collections import Counter
with open('bad_voice_ports.csv') as readfile:
    readCSV = csv.reader(readfile)
    rows = []
    counts = Counter()
    for row in readCSV:
        rows.append([row[1][:4], row[4][2::]])
        counts[row[1][:4]] += 1
with open('bad_voice_portsnew20200103SORTED.csv', 'w') as writefile:
    writeCSV = csv.writer(writefile)
    for row in sorted(rows):
        print(row)
        writeCSV.writerow([counts[row[0]]] + row)
I would recommend removing the header line from the CSV file entirely; throwing away (or separating out and prepending back) the first line should be an easy change if you want to keep it.
(Also, hard-coding input and output file names is problematic; maybe have the program read them from sys.argv[1:] instead.)
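
One way to separate out the header and prepend it back might look like the sketch below. The function name sort_keeping_header and the second sample row (bbbb-m1-m1) are made up for illustration; the column layout follows the question.

```python
import csv
import io

def sort_keeping_header(lines, key_column):
    """Sort the data rows by one column while leaving the header row on top."""
    reader = csv.reader(lines)
    header = next(reader)     # consume the header before sorting
    body = sorted(reader, key=lambda row: row[key_column])
    return [header] + body

# In-memory stand-in for the input file (second data row is hypothetical).
sample = io.StringIO(
    "Date,Dslam_Name,Card,Port,Ani,DownStream,UpStream,Status\n"
    "2020-01-03 07:10:01,bbbb-m1-m1,204,57,302xxxxxxxxx,0,0,down\n"
    "2020-01-03 07:10:01,aart-m1-m1,204,57,302xxxxxxxxx,0,0,down\n"
)
rows = sort_keeping_header(sample, 1)
print([row[1] for row in rows])  # ['Dslam_Name', 'aart-m1-m1', 'bbbb-m1-m1']
```

Because the header is pulled off with next() first, it never takes part in the sort or in any per-key counting.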

So my suggestion is fairly simple. As I stated in a previous comment, there is good documentation on reading and writing CSV in Python here: https://realpython.com/python-csv/
As an example, to read only the columns you need from a csv, you can simply do this:
>>> import csv
>>> file = open('some.csv', mode='r')
>>> csv_reader = csv.DictReader(file)
>>> for line in csv_reader:
...     print(line["Dslam_Name"] + " " + line["Ani"])
...
This would print:
aart-m1-m1 302xxxxxxxxx
Now you can just as easily store the column values in a variable and write them to a file later, or open a new file while reading lines and write the column values to it. I hope this helps.
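
As a rough Python 3 sketch of that last idea, csv.DictWriter can write just the two wanted columns while the rows are being read. The in-memory buffers stand in for real files and are an assumption of this example.

```python
import csv
import io

source = io.StringIO(
    "Date,Dslam_Name,Card,Port,Ani,DownStream,UpStream,Status\n"
    "2020-01-03 07:10:01,aart-m1-m1,204,57,302xxxxxxxxx,0,0,down\n"
)
out = io.StringIO()  # stand-in for open('new.csv', 'w', newline='')

writer = csv.DictWriter(out, fieldnames=["Dslam_Name", "Ani"],
                        extrasaction="ignore", lineterminator="\n")
writer.writeheader()
for line in csv.DictReader(source):
    # extrasaction="ignore" drops every key not listed in fieldnames.
    writer.writerow(line)

print(out.getvalue())
```

Each value lands in its own column because the writer, not string concatenation, inserts the delimiter.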

After the help from @tripleee and @marxmacher, my final code is:
import csv
import operator
from collections import Counter
with open('bad_voice_ports.csv') as csv_file:
    readCSV = csv.reader(csv_file, delimiter=',')
    sortedlist = sorted(readCSV, key=operator.itemgetter(1))
    line_count = 0
    rows = []
    counts = Counter()
    for row in sortedlist:
        Dslam = row[1][:4]
        Ani = row[4][2:]
        if line_count == 0:
            print(row[1], row[4])
            line_count += 1
        else:
            rows.append([row[1][:4], row[4][2::]])
            counts[row[1][:4]] += 1
            print(Dslam, Ani)
            line_count += 1
    for row in sorted(rows):
        f = open("bad_voice_portsnew202001061917.xls","a+")
        f.write(row[0] + '\t' + row[1] + '\t' + str(counts[row[0]]) + '\n')
        f.close()
    print('Total of Bad ports =', str(line_count-1))
This way the desired columns are extracted from the initial csv file, and a new xls file is generated with the values stored in separate columns, the count per key, and the total number of entries.
Thanks for all the help; suggestions for improvement are welcome!

You can use sorted:
import csv
_h, *data = csv.reader(open('filename.csv'))
with open('new_csv.csv', 'w') as f:
    write = csv.writer(f)
    # Keep only the Dslam_Name and Ani columns, for the header as well as the data.
    write.writerows([(_h[1], _h[4]), *sorted([(i[1], i[4]) for i in data], key=lambda x:x[0])])

Edit a piece of data inside a csv

I have a csv file looking like this
34512340,1
12395675,30
56756777,30
90673412,45
12568673,25
22593672,25
I want to be able to edit the data after the comma from python and then save the csv.
Does anybody know how I would be able to do this?
This bit of code below will write a new line, but not edit:
f = open("stockcontrol","a")
f.write(code)

Here is a sample, which adds 1 to the second column:
import csv
with open('data.csv') as infile, open('output.csv', 'wb') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row in reader:
        # Transform the second column, which is row[1]
        row[1] = int(row[1]) + 1
        writer.writerow(row)
Notes
The csv module correctly parses the CSV file, so using it is highly recommended.
By default, each field is parsed as text, which is why I converted it to an integer: int(row[1])
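
To make the second note concrete, here is a tiny Python 3 sketch (the one-line sample is taken from the question's data): without the conversion, + concatenates the strings instead of adding numbers.

```python
import csv
import io

row = next(csv.reader(io.StringIO("34512340,1\n")))

as_text = row[1] + row[1]                # string concatenation
as_number = int(row[1]) + int(row[1])    # integer addition
print(repr(as_text), as_number)  # '11' 2
```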
Update
If you really want to edit the file "in place", then use the fileinput module:
import fileinput
for line in fileinput.input('data.csv', inplace=True):
    fields = line.strip().split(',')
    fields[1] = str(int(fields[1]) + 1) # "Update" second column
    line = ','.join(fields)
    print line # Write the line back to the file, in place

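You can use pandas to edit the column you want, e.g. to increase the values in a column by n:
import pandas
n = 1  # amount to add to each value
data_df = pandas.read_csv('input.csv')
# Assign back to the column so the rest of the DataFrame is kept.
data_df['column2'] = data_df['column2'].apply(lambda x: x + n)
print(data_df)
To add a different amount, change n.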

Row count in a csv file

I am probably making a stupid mistake, but I can't find where it is. I want to count the number of lines in my csv file. I wrote this, and obviously isn't working: I have row_count = 0 while it should be 400. Cheers.
f = open(adresse,"r")
reader = csv.reader(f,delimiter = ",")
data = [l for l in reader]
row_count = sum(1 for row in reader)
print row_count

with open(adresse,"r") as f:
    reader = csv.reader(f,delimiter = ",")
    data = list(reader)
    row_count = len(data)
You are trying to read the file twice, when the file pointer has already reached the end of file after saving the data list.
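
The effect can be reproduced in a few lines of Python 3. The three-row in-memory sample is an assumption used here in place of the 400-row file.

```python
import csv
import io

f = io.StringIO("a,1\nb,2\nc,3\n")
reader = csv.reader(f)

data = [row for row in reader]          # this pass consumes every row
leftover = sum(1 for row in reader)     # the reader is exhausted, so nothing is counted
print(len(data), leftover)  # 3 0
```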

First, open the file with open:
input_file = open("nameOfFile.csv","r+")
Then pass it to csv.reader to parse the CSV:
reader_file = csv.reader(input_file)
Finally, get the number of rows with len:
value = len(list(reader_file))
The complete code is:
import csv
input_file = open("nameOfFile.csv","r+")
reader_file = csv.reader(input_file)
value = len(list(reader_file))
Remember that if you want to reuse the csv file, you have to call input_file.seek(0) first, because building a list from reader_file reads the whole file and leaves the file pointer at the end.
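
A minimal Python 3 sketch of that rewind step, assuming an in-memory buffer in place of the real file. Creating a fresh reader after the rewind is a choice made here for clarity.

```python
import csv
import io

f = io.StringIO("a,1\nb,2\n")

row_count = len(list(csv.reader(f)))   # reading to the end leaves the position at EOF
f.seek(0)                              # rewind to the start of the file
rows = list(csv.reader(f))             # a fresh reader sees every row again
print(row_count, rows)
```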

If you are working with python3 and have pandas library installed you can go with
import pandas as pd
results = pd.read_csv('f.csv')
print(len(results))

I would consider using a generator. It does the job and keeps you safe from a MemoryError, because only one line is held in memory at a time.
def generator_count_file_rows(input_file):
    for row in open(input_file,'r'):
        yield row
And then:
count = 0
for row in generator_count_file_rows('very_large_set.csv'):
    count+=1

The important part is hidden in the comments of the accepted answer.
Re-sharing Erdős-Bacon's solution here for better visibility.
Why? Because it saves a lot of memory by not creating a list.
So I think it is better to do it this way:
import csv
def read_raw_csv(file_name):
    with open(file_name, 'r') as file:
        csvreader = csv.reader(file)
        # count number of rows
        entry_count = sum(1 for row in csvreader)
        print(entry_count-1) # -1 is for discarding header row.
Checkout this link for more info

# with built-in libraries
opened_file = open('f.csv')
from csv import reader
read_file = reader(opened_file)
apps_data = list(read_file)
rowcount = len(apps_data) # which includes the header row
print("Total rows including header: " + str(rowcount))

Simply open the csv file in Notepad++. It shows the total row count in a jiffy. :)
Or, in the Windows command prompt, change to the file's directory and run:
find /c /v "some meaningless string" Filename.csv

append content of one csv file to another using python

I have 2 csv files:
output.csv
output1.csv
output.csv has a 5 columns of titles.
output1.csv has about 40 columns of different types of data.
I need to append all the content of output1.csv to output.csv. How can I do this?
Could somebody please give me a hint on how to go about it?
I have the following code:
reader=csv.DictReader(open("test.csv","r"))
allrows = list(reader)
keepcols = [c for c in allrows[0] if all(r[c] != '0' for r in allrows)]
print keepcols
writer=csv.DictWriter(open("output.csv","w"),fieldnames='keepcols',extrasaction='ignore')
writer.writerows(allrows)
with open("test1.csv","r") as f:
    fields=next(f).split()
    # print(fields)
    allrows=[]
    for line in f:
        line=line.split()
        row=dict(zip(fields,line))
        allrows.append(row)
        # print(row)
    keepcols = [c for c in fields if any(row[c] != '0' for row in allrows)]
    print keepcols
    writer=csv.DictWriter(open("output1.csv","w"),fieldnames=keepcols,extrasaction='ignore')
    writer.writerows(allrows)
test.csv generates output.csv
test1.csv generates output1.csv
I'm trying to see if I can make both files write their output to the same file.

If I understand your question correctly, you want to create a csv with 41 columns - the 1 from output.csv followed by the 40 from output1.csv.
I assume they have the same number of rows (if not - what is the necessary behavior?)
Try using the csv module:
import csv
reader = csv.reader(open('output.csv', 'rb'))
reader1 = csv.reader(open('output1.csv', 'rb'))
writer = csv.writer(open('appended_output.csv', 'wb'))
for row in reader:
    row1 = next(reader1)  # matching row from the second file
    writer.writerow(row + row1)
If your csv files are formatted with special delimiters or quoting characters, you can use the optional keyword arguments for the csv.reader and csv.writer objects.
See Python's csv module documentation for details...
EDIT: Added 'b' flag, as suggested.
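
As an illustration of those optional keyword arguments, the Python 3 sketch below reads semicolon-delimited, single-quoted input and writes it back tab-delimited. The sample data and the choice of delimiters are assumptions made for the example.

```python
import csv
import io

# Hypothetical input that uses ';' as delimiter and "'" as quote character.
source = io.StringIO("name;note\nalice;'likes; semicolons'\n")
reader = csv.reader(source, delimiter=';', quotechar="'")
rows = list(reader)

out = io.StringIO()  # stand-in for open('out.csv', 'w', newline='')
writer = csv.writer(out, delimiter='\t', lineterminator='\n')
writer.writerows(rows)

print(rows)
print(out.getvalue())
```

The quoted field keeps its embedded semicolon on the way in, and needs no quoting on the way out because it contains no tab.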

This recent discussion looks very similar to what you are looking for except that the OP there wanted to concatenate mp3 files.
EDIT:
import os, sys
target = '/path/to/target'
src1 = '/path/to/source1.csv'
src2 = '/path/to/source2.csv'
tf = open(target, 'a')
tf.write(open(src1).read())
tf.write(open(src2).read())
tf.close()
Try this; it should work, since you simply want the equivalent of the shell command cat src1 src2 >> target (the target is opened in append mode).

"I need to append all the content of output1.csv to output.csv." Taken literally, that would mean writing each row in the first file followed by each row in the second file. Is that what you want?
Titles of what? The 40 columns in the other file? If so, then assuming that you want the titles written as a row of column headings:
import csv
titles = [x[0] for x in csv.reader(open('titles.csv', 'rb'))]
writer = csv.writer(open('merged.csv', 'wb'))
writer.writerow(titles)
for row in csv.reader(open('data.csv', 'rb')):
    writer.writerow(row)

You could also build a generator on top of the reader if you want to filter rows by a condition:
import csv
def read_generator(filepath: str, condition):
    # condition is the value the first column must match
    with open(filepath, newline='') as f:
        reader = csv.reader(f)
        for row in reader:
            if row[0] == condition:
                yield row
and then write from that with:
with open("process.csv", "w", newline='') as out:
    writer = csv.writer(out)
    writer.writerows(read_generator("file_to_read.csv", condition))

csv python questions

I am opening a csv file like this:
import csv
reader = csv.reader(open("book1.csv", "rb"))
for row in reader:
    print row
How can I replace the value in column 3 with its log and then save the result into a new csv?

Like this?
>>> import csv
>>> from math import log
>>> input = "1,2,3\n4,5,6\n7,8,9".splitlines()
>>> reader=csv.reader(input)
>>> for row in reader:
...     row[2] = log(float(row[2]))
...     print ','.join(map(str,row))
...
1,2,1.09861228867
4,5,1.79175946923
7,8,2.19722457734

These links might help:
http://docs.python.org/library/csv.html#csv.writer
http://docs.python.org/tutorial/datastructures.html?highlight=array
Each row returned by the reader is a list of strings. Lists in Python are 0-based, so to access the third entry in a row you would use row[2].
That should help you on your way.

You should use the with statement (a context manager) for files: it is cleaner, takes less code, and removes the need for explicit file.close() calls.
e.g.
import csv
import math
with open('book1.csv', 'rb') as f1,open('book2.csv', 'wb') as f2:
    reader = csv.reader(f1)
    writer = csv.writer(f2)
    for row in reader:
        row[2] = str(math.log(float(row[2])))
        writer.writerow(row)
