Reading file paths inside a CSV file using pandas - Python

I have a CSV file which looks like the following:
/user/desktop/1.jpg, 0
/user/desktop/2.jpg, 0
The first column is a file path and the second column is a class label.
What I want is to read the files one by one. How should I do that?
What I have tried is to read the CSV file, assign column names, and read the path column, but it doesn't seem to work:
df = pandas.read_csv('csv_file', names=['name','class'])
x = matplotlib.image.imread(df['name'])

I think you shouldn't use pandas like this. It's better to just open the file and read it row by row, like:
import csv
import matplotlib.image

with open('file.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        img_path = row[0]
        img_class = row[1].strip()  # the sample data has a space after the comma
        img = matplotlib.image.imread(img_path)
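If you do want to stay with pandas, the mistake in the question is passing the whole column to imread(), which takes a single path. A minimal sketch (assuming the CSV has no header row, as in the sample):
import matplotlib.image
import pandas as pd

# skipinitialspace handles the space after the comma in the sample data
df = pd.read_csv('csv_file', names=['name', 'class'], skipinitialspace=True)

# imread() takes one path at a time, so loop over the column
for path, label in zip(df['name'], df['class']):
    img = matplotlib.image.imread(path)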

Related

Create and append CSV row by row

I would just like to create a csv file and at the same time add my data row by row with a for loop.
for x in y:
    newRow = "\n%s,%s\n" % (sentence1, sentence2)
    with open('Mydata.csv', "a") as f:
        f.write(newRow)
After the above process, I tried to read the CSV file, but I can't separate the columns. It seems that there is only one column; maybe I did something wrong in the creation process?
colnames = ['A_sentence', 'B_sentence']
Mydata = pd.read_csv(Mydata, names=colnames, delimiter=";")
print(Mydata['A_sentence'])  # output: NaN
When you are writing the file, it looks like you are using commas as separators, but when reading the file you are using semicolons (probably just a typo). Change delimiter=";" to delimiter="," and it should work.
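For instance, writing and reading with the same comma delimiter (a sketch; pairs is a hypothetical iterable standing in for wherever sentence1 and sentence2 come from):
import pandas as pd

with open('Mydata.csv', 'a') as f:
    for sentence1, sentence2 in pairs:  # 'pairs' is a hypothetical iterable
        f.write("%s,%s\n" % (sentence1, sentence2))

colnames = ['A_sentence', 'B_sentence']
Mydata = pd.read_csv('Mydata.csv', names=colnames, delimiter=",")
print(Mydata['A_sentence'])
Dropping the leading "\n" from the original format string also avoids writing blank lines between rows.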

Missing out rows with blank spaces when writing to a new CSV file

I'm attempting to write a program that enters a directory full of CSV files (all with the same layout but different data), reads the files, and writes all the data from specific columns to a new CSV file. I would also like it to miss out the entire row of data if there is a blank space in one of the columns (in this case, if there is a gap in the Name column).
The program works fine when writing the specific columns (in this case Name and Location) from the old CSV files to the new one; however, I am unsure how to miss out a line if there is a blank space.
import nltk
import csv
from nltk.corpus import PlaintextCorpusReader

root = '/Users/bennaylor/local documents/humanotics'
incorpus = root + '/chats/input/'
outcorpus = root + '/chats/output.csv'

doccorpus = PlaintextCorpusReader(incorpus, r'.*\.csv')
filelist = doccorpus.fileids()

with open(outcorpus, 'w', newline='') as fw:
    fieldnames = ['Name', 'Location']
    writer = csv.DictWriter(fw, fieldnames=fieldnames)
    writer.writeheader()
    print('Writing Headers!!')
    for rawdoc in filelist:
        infile = incorpus + rawdoc
        with open(infile, encoding='utf-8') as fr:
            reader = csv.DictReader(fr)
            for row in reader:
                rowName = row['Name']
                rowLocation = row['Location']
                writer.writerow({'Name': rowName, 'Location': rowLocation})
An example CSV input file would look like this:
Name,Age,Location,Birth Month
Steve,14,London,November
,18,Sydney,April
Matt,12,New York,June
Jeff,20,,December
Jonty,19,Greenland,July
There are gaps in the Name column on the third row and the Location column on the fifth. In this case, I would like the program to miss out the third row when writing the data to the new CSV, as there is a gap in the Name column.
Thanks in advance for any help.
This is easy to do using pandas:
import pandas as pd
import os

# Read each file and collect the resulting data frames in a list
frames = []
for filename in filelist:
    frames.append(pd.read_csv(os.path.join(incorpus, filename)))

# Combine the data from all the files into a single data frame
df = pd.concat(frames, ignore_index=True)

# Drop rows with any empty values
df = df.dropna()

# Keep only the needed columns
df = df.reindex(columns=['Name', 'Location'])

# Write the dataframe to a .csv file
df.to_csv(outcorpus, index=False)
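One caveat worth noting: dropna() with no arguments drops a row if any column is empty, so in the sample above the Jeff row (blank Location) would be dropped along with the blank-Name row. If you only want to skip rows with a blank Name, df.dropna(subset=['Name']) should do it.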

How to replace one value with another in a csv file?

I have a CSV file with information and want to replace the information in a specific location with a new value.
For example if my CSV file looks like this:
example1,example2,0
example3,example4,0
example5,example6,0
Note that each column in a row is labelled, for example:
test = row[0]
test1 = row[1]
test2 = row[2]
If I want to replace test[0] with a new value, how would I go about doing it?
The simplest way, without installing any additional package, would be to use the built-in csv module to read the whole file into a matrix and replace the desired element.
Here is code that would do just that:
import csv

with open('test.csv', 'r') as in_file, open('test_out.csv', 'w', newline='') as out_file:
    data = [row for row in csv.reader(in_file)]
    data[0][0] = 'new value'
    writer = csv.writer(out_file)
    writer.writerows(data)
There are a handful of ways to do this, but personally I'm a big fan of pandas. With pandas, you can read a CSV file with df = pd.read_csv('path_to_file.csv'). Make changes however you want; if you wanted row 1, column 1, you'd use df.iloc[0, 0] = new_val (position-based indexing, since .loc looks up labels). Then when you are done, save back to the same file with df.to_csv('path_to_file.csv', index=False).
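As a concrete sketch of that approach (header=None is an assumption here, since the sample file has no header row):
import pandas as pd

df = pd.read_csv('test.csv', header=None)  # the sample data has no header row
df.iloc[0, 0] = 'new value'                # first row, first column, by position
df.to_csv('test.csv', header=False, index=False)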

How to overwrite a particular column of a csv file using pandas or normal python?

I am new to Python. I have a .csv file which has 13 columns. I want to round off the floating-point values in the 2nd column, which I was able to achieve successfully; I did this and stored the results in a list. Now I am unable to figure out how to overwrite the rounded-off values into the same CSV file, in the same column, i.e. column 2. I am using Python 3. Any help will be much appreciated.
My code is as follows:
Import statements for module import:
import csv
Creating an empty list:
list_string = []
Reading a csv file
with open('/home/user/Desktop/wine.csv', 'r') as csvDataFile:
    csvReader = csv.reader(csvDataFile, delimiter=',')
    next(csvReader, None)
    for row in csvReader:
        floatParse = float(row[1])
        closestInteger = int(round(floatParse))
        stringConvert = str(closestInteger)
        list_string.append(stringConvert)
print(list_string)
Writing into the same CSV file for the second column (this overwrites the entire file):
with open('/home/user/Desktop/wine.csv', 'w') as csvDataFile:
    writer = csv.writer(csvDataFile)
    next(csvDataFile)
    row[1] = list_string
    writer.writerows(row[1])
PS: The writing step overwrites the entire CSV and removes all the other columns, which I don't want. I just want to overwrite the 2nd column with the rounded-off values and keep the rest of the data the same.
This might be what you're looking for.
import pandas as pd
import numpy as np
# Some sample data
data = {"Document_ID": [102994, 51861, 51879, 38242, 60880, 76139, 76139],
        "SecondColumnName": [7.256, 1.222, 3.16547, 4.145658, 4.154656, 6.12, 17.1568]}
wine = pd.DataFrame(data)

# This is how you'd read in your data
# wine = pd.read_csv('/home/user/Desktop/wine.csv')

# Replace SecondColumnName with the real column name
wine["SecondColumnName"] = wine["SecondColumnName"].map('{:.2f}'.format)

# This will overwrite the file, but it will have all the columns as before
wine.to_csv('/home/user/Desktop/wine.csv', index=False)
Pandas is way easier than the csv module; I'd recommend checking it out.
I think this better answers the specific question. The key is to define an input_file and an output_file in the with statement.
The StringIO part is just there to provide sample data for this example. newline='' is for Python 3; without it, blank lines appear between rows in the output. More info.
import csv
from io import StringIO

s = '''A,B,C,D,E,F,G,H,I,J,K
1,4.4343,3,4,5,6,7,8,9,10,11
1,8.6775433,3,4,5,6,7,8,9,10,11
1,16.83389832,3,4,5,6,7,8,9,10,11
1,32.2711122,3,4,5,6,7,8,9,10,11
1,128.949483,3,4,5,6,7,8,9,10,11'''

with StringIO(s) as input_file, open('output_file.csv', 'w', newline='') as output_file:
    reader = csv.reader(input_file)
    writer = csv.writer(output_file)
    header = next(reader, None)  # read the header row
    writer.writerow(header)      # and copy it through to the output
    for row in reader:
        floatParse = float(row[1])
        closestInteger = int(round(floatParse))
        stringConvert = str(closestInteger)
        row[1] = stringConvert
        writer.writerow(row)

Copying one column of a CSV file and adding it to another file using Python

I have two files, the first one is called book1.csv, and looks like this:
header1,header2,header3,header4,header5
1,2,3,4,5
1,2,3,4,5
1,2,3,4,5
The second file is called book2.csv, and looks like this:
header1,header2,header3,header4,header5
1,2,3,4
1,2,3,4
1,2,3,4
My goal is to copy the column that contains the 5's in book1.csv to the corresponding column in book2.csv.
The problem with my code seems to be that it is not appending correctly, nor is it selecting just the index that I want to copy. It also gives an error that I have selected an incorrect index position. The output is as follows:
header1,header2,header3,header4,header5
1,2,3,4
1,2,3,4
1,2,3,41,2,3,4,5
Here is my code:
import csv

with open('C:/Users/SAM/Desktop/book2.csv', 'a') as csvout:
    write = csv.writer(csvout, delimiter=',')
    with open('C:/Users/SAM/Desktop/book1.csv', 'rb') as csvfile1:
        read = csv.reader(csvfile1, delimiter=',')
        header = next(read)
        for row in read:
            row[5] = write.writerow(row)
What should I do to get this to append properly?
Thanks for any help!
What about something like this? I read in both books, append the last element of each book1 row to the corresponding book2 row, and store the combined rows in a list. Then I write the contents of that list to a new .csv file.
import csv

with open('book1.csv', 'r') as book1:
    with open('book2.csv', 'r') as book2:
        reader1 = csv.reader(book1, delimiter=',')
        reader2 = csv.reader(book2, delimiter=',')
        both = []
        fields = next(reader1)  # read header row
        next(reader2)           # read and ignore header row
        for row1, row2 in zip(reader1, reader2):
            row2.append(row1[-1])
            both.append(row2)

with open('output.csv', 'w', newline='') as output:
    writer = csv.writer(output, delimiter=',')
    writer.writerow(fields)  # write a header row
    writer.writerows(both)
Although some of the code above will work, it is not really scalable; a vectorised approach is needed. Getting to know numpy or pandas will make tasks like this easier, so it is worth learning a bit of them.
You can download pandas from the Pandas Website.
# Load pandas
import pandas as pd

# Load each file into a pandas dataframe, which is backed by a numpy array
data1 = pd.read_csv('csv1.csv', sep=',')
data2 = pd.read_csv('csv2.csv', sep=',')

# Now add 'header5' from data1 to data2
data2['header5'] = data1['header5']

# Save it back to csv
data2.to_csv('output.csv', index=False)
Regarding the "error that I have selected an incorrect index position," I suspect this is because you're using row[5] in your code. Indexing in Python starts from 0, so if you have A = [1, 2, 3, 4, 5] then to get the 5 you would do print(A[4]).
Assuming the two files have the same number of rows and the rows are in the same order, I think you want to do something like this:
import csv

# Open the two input files, which I've renamed to be more descriptive,
# and also an output file that we'll be creating
with open("four_col.csv", mode='r') as four_col, \
     open("five_col.csv", mode='r') as five_col, \
     open("five_output.csv", mode='w', newline='') as outfile:
    four_reader = csv.reader(four_col)
    five_reader = csv.reader(five_col)
    five_writer = csv.writer(outfile)

    _ = next(four_reader)  # Ignore headers for the 4-column file
    headers = next(five_reader)
    five_writer.writerow(headers)

    for four_row, five_row in zip(four_reader, five_reader):
        last_col = five_row[-1]  # Or use five_row[4]
        four_row.append(last_col)
        five_writer.writerow(four_row)
Why not read the files line by line and use the -1 index to find the last item?
endings = []
with open('book1.csv') as book1:
    for line in book1:
        # if not header line:
        endings.append(line.split(',')[-1].strip())

linecounter = 0
with open('book2.csv') as book2:
    for line in book2:
        # if not header line:
        print(line.rstrip('\n') + ',' + endings[linecounter])  # or write to file
        linecounter += 1
You should also catch errors if row numbers don't match.
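For instance, itertools.zip_longest makes a row-count mismatch visible instead of silently truncating at the shorter file (a sketch along those lines, writing to an output file rather than printing):
import csv
from itertools import zip_longest

with open('book1.csv') as book1, open('book2.csv') as book2, \
        open('output.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    # zip_longest pads the shorter file with None, which we treat as an error
    for row1, row2 in zip_longest(csv.reader(book1), csv.reader(book2)):
        if row1 is None or row2 is None:
            raise ValueError("book1.csv and book2.csv have different row counts")
        writer.writerow(row2 + [row1[-1]])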
