I've looked at all the threads I can find, but I still can't figure it out; I'm having all sorts of issues.
The first issue is that I can't change the items in a list to lower case, so I have to convert the list to a string first. Once I do that, I can't append the strings back into the list without creating a nested list. Why can't I simply change a list to lowercase, delete the contents of the csv, then paste the lowercase list back in?
Here's my latest attempt, but I've tried many things:
with open(teacherDD, 'r+') as f:
    read = csv.reader(f, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)
    for row in read:
        copyRow = row.copy()
        # print(copyRow)
        del row[:]
        # print(row)
        getLowerStr = str(copyRow).lower()
        # appendLower = row.append(getLowerStr)
        # print(getLowerStr)
        print(row)
        f.write(getLowerStr)
f.close()
If you just want to convert to lowercase, why use the csv reader?
You can use the fileinput module to edit lines in place.
Python 3.x

import fileinput

for line in fileinput.input("test.txt", inplace=1):
    print(line.lower(), end='')
Python 2.x

import fileinput
import sys

for line in fileinput.input("test.txt", inplace=1):
    sys.stdout.write(line.lower())
One cool feature of this module is that when you open a file with it in in-place mode, anything printed with print or sys.stdout.write is redirected to the file.
Sample input:
UPPER,CASE,ROW
Output:
upper,case,row
Related
I am a newbie to Python and I am trying to read in a file with the below format:
ORDER_NUMBER!Speed_Status!Days!
10!YES!100!
10!NO!100!
10!TRUE!100!
And the output to be written to the same file is
ORDER_NUMBER!STATUS!Days!
10!YES!100!
10!NO!100!
10!TRUE!100!
So far I have tried:
# a file named "repo" will be opened in reading mode.
file = open('repo.dat', 'r+')
# This will print every line one by one in the file
for line in file:
    if line.startswith('ORDER_NUMBER'):
        words = [w.replace('Speed_Status', 'STATUS') for w in line.partition('!')]
        file.write(words)
input()
But somehow it's not working. What am I missing?
Read file ⇒ replace content ⇒ write to file:
with open('repo.dat', 'r') as f:
    data = f.read()

data = data.replace('Speed_Status', 'STATUS')

with open('repo.dat', 'w') as f:
    f.write(data)
The ideal way would be to use the fileinput module to replace the file contents in place, instead of opening the file in update mode (r+):
from __future__ import print_function
import fileinput

for line in fileinput.input("repo.dat", inplace=True):
    if line.startswith('ORDER_NUMBER'):
        print(line.replace("Speed_Status", "STATUS"), end="")
    else:
        print(line, end="")
As for why your attempt didn't work: the logic that builds words is incorrect. When you partition the line on '!', the list you form comes back out of order as ['ORDER_NUMBER', '!', 'STATUS!Days!\n'], with the newline still embedded. Also, your write() call will never accept a list; you need to join it back into a single string before writing it.
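To see concretely what went wrong, here is a small sketch of what str.partition and the comprehension actually produce for the header line:

```python
# str.partition splits on the FIRST separator only, returning a 3-tuple.
line = 'ORDER_NUMBER!Speed_Status!Days!\n'

parts = line.partition('!')
print(parts)  # ('ORDER_NUMBER', '!', 'Speed_Status!Days!\n')

# The comprehension then does the replacement inside each piece,
# but the separator and trailing newline are still mixed in:
words = [w.replace('Speed_Status', 'STATUS') for w in parts]
print(words)  # ['ORDER_NUMBER', '!', 'STATUS!Days!\n']

# file.write() needs one string, so the pieces must be re-joined:
fixed = ''.join(words)
print(fixed)  # ORDER_NUMBER!STATUS!Days!
```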
I have an interesting situation with Python's csv module. I have a function that takes specific lines from a text file and writes them to csv file:
import os
import csv

def csv_save_use(textfile, csvfile):
    with open(textfile, "rb") as text:
        for line in text:
            line = line.strip()
            with open(csvfile, "ab") as f:
                if line.startswith("# Online_Resource"):
                    write = csv.writer(f, dialect='excel',
                                       delimiter='\t',
                                       lineterminator="\t",
                                       )
                    write.writerow([line.lstrip("# ")])
                if line.startswith("##"):
                    write = csv.writer(f, dialect='excel',
                                       delimiter='\t',
                                       lineterminator="\t",
                                       )
                    write.writerow([line.lstrip("# ")])
Here is a sample of some strings from the original text file:
# Online_Resource: https://www.ncdc.noaa.gov/
## Corg% percent organic carbon,,,%,,paleoceanography,,,N
What is really bizarre is that the final csv file looks good, except that the characters in the first column only (those originally prefixed with #) partially "overwrite" each other when I try to manually delete some characters from the cell:
Oddly enough, there seems to be no pattern to how the characters get jumbled each time I try to delete some after running the script. I tried encoding the csv file as unicode, to no avail.
Thanks.
You've selected the excel dialect but then overrode it with odd parameters:
You're using TAB as both the separator and the line terminator, which produces a one-line CSV file. Close enough to "truncated" to me.
Also, quotechar shouldn't be a space.
This had a nice side effect, as you noted: the csv module actually splits the lines on commas!
The code is also inefficient and error-prone: you're opening the file in append mode inside the loop and creating a new csv writer each time. Better done outside the loop.
Also, the comma split must now be done by hand. So, even better: use the csv module to read the file as well. My proposed fix for your routine:
import os
import csv

def csv_save_use(textfile, csvfile):
    with open(textfile, "rU") as text, open(csvfile, "wb") as f:
        write = csv.writer(f, dialect='excel', delimiter='\t')
        reader = csv.reader(text, delimiter=",")
        for row in reader:
            if not row:
                continue  # skip possible empty rows
            if row[0].startswith("# Online_Resource"):
                write.writerow([row[0].lstrip("# ")])
            elif row[0].startswith("##"):
                # write the row, stripping the leading hashes from the first item
                write.writerow([row[0].lstrip("# ")] + row[1:])
Note that the file isn't displayed properly in Excel unless you remove delimiter='\t' (reverting to the default comma).
Also note that for Python 3 you need to replace open(csvfile, "wb") as f with open(csvfile, "w", newline='') as f.
Here's how the output looks now (note that the empty cells appear because the source lines contain several commas in a row):
More problems:
line=line.strip(" ") removes only leading and trailing spaces; it doesn't remove \r or \n. Try line=line.strip(), which removes all leading and trailing whitespace.
You get the whole line, commas included, in one cell because you haven't split it up somehow, e.g. by using a csv.reader instance. See here:
https://docs.python.org/2/library/csv.html#csv.reader
str.lstrip's non-default argument is treated as a set of characters to be removed, so '## ' has the same effect as '# '. If guff.startswith('## '), then do guff = guff[3:] to get rid of the unwanted text.
It is not at all clear what the sentence containing "bizarre" means. We need to see exactly what is in the output csv file. Create a small test file with 3 records: (1) with '# Online_Resource', (2) with '## ', (3) neither. Run your code and show the output, like this:
print repr(open('testout.csv', 'rb').read())
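The str.lstrip pitfall described above can be sketched in a couple of lines (guff is just the illustrative name used above):

```python
# str.lstrip treats its argument as a SET of characters to strip,
# not as a literal prefix, so '## ' and '# ' behave identically.
guff = '## Corg% percent organic carbon'

print(guff.lstrip('# '))   # Corg% percent organic carbon
print(guff.lstrip('## '))  # same result: Corg% percent organic carbon

# Slicing removes an exact-length prefix instead:
if guff.startswith('## '):
    print(guff[3:])        # Corg% percent organic carbon
```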
I was just wondering what this line of code does:
writerow([recordlist[i][0], recordlist[i][1], recordlist[i][2]])
I know its a parameter of some sort, but what does it actually do in all of this code:
recordlist = [["1", chinese, "male"], ["2", indian, "female"]]

import math
import csv

file_name = 'info.txt'
ofile = open(file_name, 'a')
writer = csv.writer(ofile, delimiter=',', lineterminator='\n')

for i in range(0, len(recordlist)):
    writer.writerow([recordlist[i][0], recordlist[i][1], recordlist[i][2]])

ofile.close()
Thank you!
You've created a csv writer. It has a method writerow that takes a sequence (list, tuple, etc.) of values and writes them to the underlying file in delimited format, which in this case uses a comma as the delimiter. So it will create a row in the csv file for each row in the recordlist variable as the for loop iterates over it. Each row will consist of the values defined on the first line of your code, separated by commas.
The real answer should be "run it and try it" to see what it does.
Then read the documentation of the csv module in Python here
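To make writerow concrete, here is a minimal, self-contained sketch; since chinese and indian are undefined in the original snippet, plain strings stand in for them, and an io.StringIO buffer stands in for the file:

```python
import csv
import io

# Stand-ins for the undefined names in the original snippet
recordlist = [["1", "chinese", "male"], ["2", "indian", "female"]]

buf = io.StringIO()  # in-memory file, so the sketch needs no disk access
writer = csv.writer(buf, delimiter=',', lineterminator='\n')

for record in recordlist:  # iterating directly is more idiomatic than range(len(...))
    writer.writerow(record)  # each list becomes one comma-separated line

print(buf.getvalue())
# 1,chinese,male
# 2,indian,female
```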
import csv, Tkinter

with open('most_common_words.csv') as csv_file:  # opened via a context manager, so the file is closed automatically
    csv_reader = csv.reader(csv_file)  # create a csv reader instance
    for row in csv_reader:  # read each line of the csv file into 'row' as a list
        print row[0]  # print the first item in the list
I'm trying to import this list of most common words using csv. It continues to give me the same error
for row in csv_reader: # Read each line in the csv file into 'row' as a list
Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
I've tried a couple different ways to do it as well, but they didn't work either. Any suggestions?
Also, where does this file need to be saved? Is it okay just being in the same folder as the program?
You should always open a CSV file in binary mode in Python 2, or with newline='' in Python 3. Also, make sure that the delimiter and quote characters are , and ", or you'll need to specify otherwise:
with open('most_common_words.csv', 'rb') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=';', quotechar='"')  # for EU-style CSV
You can save the file in the same folder as your program. If you don't, you can provide the correct path to open() as well. Be sure to use raw strings if you're on Windows, otherwise the backslashes may trick you: open(r"C:\Python27\data\table.csv")
It seems you have a file with one column as you say here:
It is a simple list of words. When I open it up, it opens into Excel
with one column and 500 rows of 500 different words.
If so, you don't need the csv module at all:
with open('most_common_words.csv') as f:
    rows = list(f)
Note that in this case each item in the list will keep its trailing newline, so if your file is:
apple
dog
cat
rows will be ['apple\n', 'dog\n', 'cat\n']
If you want to strip the end of line, then you can do this:
with open('most_common_words.csv') as f:
    rows = [i.rstrip() for i in f]
I haven't been able to re.sub a csv file.
My expression is doing its job, but the writerow is where I'm stuck.
before                    after re.sub
"A1","Address2"           "A1","Address2"
0138,"DEERFIELD AVE"      0138,"DEERFIELD"
0490,"REMMINGTON COURT"   0490,"REMMINGTON"
2039,"SANDHILL DR"        2039,"SANDHILL"
import csv
import re

with open('aa_street.txt', 'rb') as f:
    reader = csv.reader(f)
    read = csv.reader(f)
    for row in read:
        row_one = re.sub('\s+(DR|COURT|AVE|)\s*$', ' ', row[1])
        row_zero = row[0]
        print row_one
    for row in reader:
        print writerow([row[0], row[1]])
Perhaps something like this is what you need?
#!/usr/local/cpython-3.3/bin/python

# "A1","Address2"           "A1","Address2"
# 0138,"DEERFIELD AVE"      0138,"DEERFIELD"
# 0490,"REMMINGTON COURT"   0490,"REMMINGTON"
# 2039,"SANDHILL DR"        2039,"SANDHILL"

import re
import csv

with open('aa_street.txt', 'r') as infile, open('actual-output', 'w') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row in reader:
        row_zero = row[0]
        row_one = re.sub(r'\s+(DR|COURT|AVE|)\s*$', '', row[1])
        writer.writerow([row_zero, row_one])
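A quick check of the pattern's behavior may help; note that the empty alternative in (DR|COURT|AVE|) lets the group match nothing, so plain trailing whitespace is removed too:

```python
import re

# Same pattern as in the answer above: strips a trailing street-type
# word (DR, COURT, AVE) along with the whitespace around it.
pattern = r'\s+(DR|COURT|AVE|)\s*$'

for addr in ['DEERFIELD AVE', 'REMMINGTON COURT', 'SANDHILL DR', 'MAIN ST  ']:
    print(re.sub(pattern, '', addr))
# DEERFIELD
# REMMINGTON
# SANDHILL
# MAIN ST    <- only the trailing spaces were removed (empty alternative)
```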
A file is an iterator—you iterate over it once, and then it's empty.
A csv.reader is also an iterator.
In general, if you want to reuse an iterator, there are three ways to do it:
Re-generate the iterator (and, if its source was an iterator, re-generate that as well, and so on up the chain); in this case, that means opening the file again.
Use itertools.tee.
Copy the iterator into a sequence and reuse that instead.
In the special case of files, you can fake #1 by using f.seek(0). Some other iterators have similar behavior. But in general, you shouldn't rely on this.
Anyway, the last one is the easiest, so let's just see how that works:
reader = list(csv.reader(f))
read = reader
Now you've got a list of all of the rows in the file. You can copy it, loop over it, loop over the copy, close the file, loop over the copy again; it's still there.
Of course, the downside is that you need enough memory to hold the whole thing (plus, you can't start processing the first line until you've finished reading the last one). If that's a problem, you need to either reorganize your code so it only needs one pass, or re-open (or seek) the file.
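For completeness, option #2 above (itertools.tee) looks like this; a plain list iterator stands in for the csv.reader here:

```python
import itertools

# tee() splits one iterator into independent iterators, buffering
# whatever one side has consumed and the other hasn't yet.
rows = iter([['a', '1'], ['b', '2'], ['c', '3']])  # stand-in for csv.reader(f)

first_pass, second_pass = itertools.tee(rows)

print([row[0] for row in first_pass])   # ['a', 'b', 'c']
print([row[1] for row in second_pass])  # ['1', '2', '3']

# Caveat: if the two passes run far apart, tee buffers everything in
# between, so memory use can approach that of list() anyway.
```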