I have a csv file that contains something like BM13302, EM13203, etc.
I have to read this from a file and then reformat it to something like 'BM13302', 'EM13203', etc.
What I'm having problems with is how to export it (write it to either the clipboard or a file I can cut and paste from). This is a tiny little project for reformatting part of some SQL code that's given to me in an unclean format, and I otherwise have to spend a while formatting it by hand. I would like to just point Python at a directory, paste the list into a file, and have it export everything the way I need it.
I have the following code working
import os
import csv

f = open(r"/User/person/Desktop/folder/file.csv")
csv_f = csv.reader(f)
for row in csv_f:
    print(row)
I get the expected results
I would like to find out how to take the list(?) and format it like this:
'BM1234', 'BM2351', '20394',....etc
and copy that to the clipboard
I thought about doing something like
with open('/Users/person/Desktop/csv/export.txt') as f:
    f.write("open=", + "', '")
    f.close()
but nothing is printed. I can't find an example of what I need. Is anyone able to help me out?
Much appreciated!
You can have the csv module quote things for you. As far as I know there is no clipboard support in the Python standard library, but there are various mechanisms out there. Here I'm using pyperclip, which is reasonable for text-only copies.
import pyperclip
import csv
import io
def clip_csv(filename):
    outbuf = io.StringIO()
    with open(filename, newline='') as infile:
        incsv = csv.reader(infile, skipinitialspace=True)
        outcsv = csv.writer(outbuf, quotechar="'", quoting=csv.QUOTE_ALL)
        outcsv.writerows(incsv)
    pyperclip.copy(outbuf.getvalue())

clip_csv('file.csv')

# DEBUG: Verify by printing clipboard
print(pyperclip.paste())
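If the end goal is really a single comma-separated line you can paste straight into a SQL IN (...) clause, a minimal sketch along the same lines (clip_sql_list is just a hypothetical helper name, and it assumes the file is a flat list of codes) could be:

import csv
import pyperclip

def clip_sql_list(filename):
    # Flatten every non-empty cell from every row and quote it for SQL
    with open(filename, newline='') as infile:
        cells = [cell.strip() for row in csv.reader(infile, skipinitialspace=True)
                 for cell in row if cell.strip()]
    # Produces: 'BM13302', 'EM13203', ...
    pyperclip.copy(", ".join("'{}'".format(c) for c in cells))

clip_sql_list('file.csv')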
I'm not sure, but I think you are trying to add the quote char ' to all the data in the CSV.
import csv
with open('export.csv', 'w') as f:
    # use quote char `'` for all data
    writer = csv.writer(f, quotechar="'", quoting=csv.QUOTE_ALL)
    writer.writerow(["BM1234", "BM2351", "20394"])
Hi, I am trying to iterate through a CSV file but I cannot get it to work somehow. I followed the Python docs but I am still not able to iterate through it. I have a gzipped CSV file that I work with, in this format:
2015-01-10 00:00:05;32
As you can see, it's delimited with a ';'.
Here is my code to run through it (simplified):
gzip_fd = gzip.decompress(gzip_file).decode(encoding='utf8')
csv_data = csv.reader(gzip_fd, delimiter=';', lineterminator='\n')
for data in csv_data:
    print(data)
But when I want to work with data, it only contains the first character (like: 2) and not the first part of the CSV data that I need. Has anyone here had the same issue? I also tried csv.DictReader, but with no success.
Even if your snippet were fixed to work, it would buffer all the data in memory, which might not scale well for very large files.
Gzipped data can also be iterated on-the-fly -- the following works for me on CPython 3.8:
import csv
import gzip
with gzip.open('test.csv.gz', 'r') as gzipped:
    reader = csv.reader(gzipped, delimiter=';', lineterminator='\n')
    for line in reader:
        print(line)
['2015-01-10 00:00:05', '32']
<...>
Update: As per comments below, my snippet does not work on older Python versions (reproduced on CPython 3.5).
You can use io.TextIOWrapper to achieve the same effect:
import csv
import io
import gzip
with gzip.open('test.csv.gz', 'rb') as gzipped:
    reader = csv.reader(io.TextIOWrapper(gzipped), delimiter=';',
                        lineterminator='\n')
    for line in reader:
        print(line)
So I fixed my issue. The problem was that I didn't split the string that I get (I can't do gzip.open because it isn't a file but rather a bytes string of the gzipped file).
Here is the fix to my problem:
gzip_fd = gzip.decompress(compressed_data).decode(encoding='utf-8').split('\n')
self.data = csv.reader(gzip_fd, delimiter=';', lineterminator='\n')
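For what it's worth, you don't have to decompress everything into one big string first: gzip.GzipFile accepts any file-like object, so a hedged alternative (assuming compressed_data holds the raw gzipped bytes) would be to wrap the bytes in io.BytesIO and stream the rows:

import csv
import gzip
import io

# compressed_data is assumed to be the gzipped payload as a bytes string
with gzip.GzipFile(fileobj=io.BytesIO(compressed_data)) as gz:
    reader = csv.reader(io.TextIOWrapper(gz, encoding='utf-8'),
                        delimiter=';', lineterminator='\n')
    for row in reader:
        print(row)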
I have looked at previous answers to this question, but in each of those scenarios the questioners were asking about something specific they were doing with the file; the problem occurs for me even when I am not.
I have a .csv file of 27,204 rows. When I open the python interpreter:
python
import csv
o = open('btc_usd1hour.csv','r')
p = csv.reader(o)
for row in p:
    print(row)
I then only see roughly the last third of the document displayed to me.
Try this; it works for me:
with open(name) as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row)
reference:
https://docs.python.org/3.6/library/csv.html#csv.DictReader
Try the following code
import csv
fname = 'btc_usd1hour.csv'
with open(fname, newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
It is difficult to tell what the problem is without having a sample. I guess the problem would go away if you add that newline='' when opening the file.
Use the with construct to close the file automatically. Use the name f for the file object when no further explanation is needed. Store the file name in fname to make future modifications easier (and also to make it easy to copy the code fragment into your later programs).
olisch may be right that the console just scrolled so fast you could not see the result. You can write the result to another text file like this:
with open(fname, newline='') as fin, \
        open('output.txt', 'w') as fout:
    reader = csv.reader(fin)
    for row in reader:
        fout.write(repr(row) + '\n')
The repr function converts the row list into its string representation. print produces the same representation for a list, so you will have the same result in the file that you would otherwise observe on screen.
Maybe your scrollback buffer is just too short to see the whole list?
In general your csv.reader call should be working fine, unless your 27k rows are extremely long and you are hitting some 64-bit boundary, which would be quite uncommon.
len(o.readlines()) might be interesting to see.
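If you want to rule out the reader stopping early, a quick sketch (using the file name from the question) is to count the rows and compare against the expected 27,204:

import csv

with open('btc_usd1hour.csv', newline='') as f:
    row_count = sum(1 for _ in csv.reader(f))
print(row_count)  # should print 27204 if every row is being read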
I have a large file from a proprietary archive format. Unzipping this archive gives a file that has no extension, but the data inside is comma-delimited. Adding a .csv extension or simply opening the file with Excel will work.
I have about 375-400 of these files, and I'm trying to extract a chunk of rows (about 13,500 out of 1.2M+ rows) between a keyword "Point A" and another keyword "Point B".
I found some code on this site that I think is extracting the data correctly, but I'm getting an error:
AttributeError: 'list' object has no attribute 'rows'
when trying to save out the file. Can somebody help me get this data to save into a csv?
import re
import csv
import time
print(time.ctime())
file = open('C:/Users/User/Desktop/File with No Extension That\'s Very Similar to CSV', 'r')
data = file.read()
x = re.findall(r'Point A(.*?)Point B', data, re.DOTALL)
name = "C:/Users/User/Desktop/testoutput.csv"
with open(name, 'w', newline='') as file2:
    savefile = csv.writer(file2)
    for i in x.rows:
        savefile.writerow([cell.value for cell in i])
print(time.ctime())
Thanks in advance, any help would be much appreciated.
The following should work nicely. As mentioned, your regex usage was almost correct. It is possible to still use the Python CSV library to do the CSV processing by converting the found text into a StringIO object and passing that to the CSV reader:
import re
import csv
import time
import StringIO
print(time.ctime())
input_name = "C:/Users/User/Desktop/File with No Extension That's Very Similar to CSV"
output_name = "C:/Users/User/Desktop/testoutput.csv"
with open(input_name, 'r') as f_input, open(output_name, 'wb') as f_output:
    # Read whole file in
    all_input = f_input.read()
    # Extract interesting lines
    ab_input = re.findall(r'Point A(.*?)Point B', all_input, re.DOTALL)[0]
    # Convert into a file object and parse using the CSV reader
    fab_input = StringIO.StringIO(ab_input)
    csv_input = csv.reader(fab_input)
    csv_output = csv.writer(f_output)
    # Iterate a row at a time from the input
    for input_row in csv_input:
        # Skip any empty rows
        if input_row:
            # Write row at a time to the output
            csv_output.writerow(input_row)

print(time.ctime())
You have not given us an example from your CSV file, so if there are problems, you might need to configure the CSV 'dialect' to process it better.
Tested using Python 2.7
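If the extracted text turns out to use a different delimiter or quoting convention, the reader can be configured accordingly. A small sketch of that (Python 3 here, and the delimiter and quotechar are only placeholders) looks like:

import csv
import io

sample = "1;2.5;'three'\n4;5.5;'six'\n"
# Hypothetical dialect settings -- adjust to whatever the archive actually uses
reader = csv.reader(io.StringIO(sample), delimiter=';', quotechar="'")
for row in reader:
    print(row)  # ['1', '2.5', 'three'] then ['4', '5.5', 'six']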
You have 2 problems here: the first is related to the regular expression and the other is about the list syntax.
Getting what you want
The way you are using the regular expression will return a list with a single value (all the lines in one unique string).
There is probably a better way of doing this, but for now I would go with something like this:
with open('bla', 'r') as input:
    data = input.read()
    x = re.findall(r'Point A(.*?)Point B', data, re.DOTALL)[0]
    x = x.splitlines(False)[1:]
That's not pretty but will return a list with all values between those two points.
Working with lists
There is no rows attribute inside lists. You just have to iterate over it:
for i in x:
    pass  # do what you have to do
Mind you, I'm not familiar with the csv library, but it looks like you will have to perform some manipulation on the i value before handing it to the library.
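For instance, a rough sketch of that manipulation (assuming x is the list of lines from the snippet above and that the fields never contain embedded commas) might be:

import csv

# x is assumed to be the list of raw lines produced by splitlines() above
with open('testoutput.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    for i in x:
        if i.strip():                      # skip blank lines
            writer.writerow(i.split(','))  # naive split; fine only without quoted commas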
IMHO, I would avoid using the CSV format, since it is kind of "locale dependent" and may not work as expected depending on the settings your end users have in their OS.
Updating the code so that @Martin Evans' answer works on the latest Python version:
import re
import csv
import time
import io
print(time.ctime())
input_name = "C:/Users/User/Desktop/File with No Extension That's Very Similar to CSV"
output_name = "C:/Users/User/Desktop/testoutput.csv"
with open(input_name, 'r') as f_input, open(output_name, 'wt') as f_output:
    # Read whole file in
    all_input = f_input.read()
    # Extract interesting lines
    ab_input = re.findall(r'Point A(.*?)Point B', all_input, re.DOTALL)[0]
    # Convert into a file object and parse using the CSV reader
    fab_input = io.StringIO(ab_input)
    csv_input = csv.reader(fab_input)
    csv_output = csv.writer(f_output)
    # Iterate a row at a time from the input
    for input_row in csv_input:
        # Skip any empty rows
        if input_row:
            # Write row at a time to the output
            csv_output.writerow(input_row)

print(time.ctime())
Also, by using 'wt' instead of 'wb' one can avoid
"TypeError: a bytes-like object is required, not 'str'"
This question piggybacks on a question I had posted yesterday. I actually got my code to work fine. I was starting small. I switched out the JSON in the Python code for multiple JSON files outside of the Python code. I actually got that to work beautifully. And then there was some sort of catastrophe, and my code was lost.
I have spent several hours trying to recreate it to no avail. I am actually using arcpy (ArcGIS's Python module) since I will later on be using it to perform some spatial analysis, but I don't think you need to know much about arcpy to help me out with this part (I don't think, but it may help).
Here is one version of my latest attempts, but it is not working. I switched out my actual path for just "Pathname." I have everything working up until the point when I try to populate the rows in the CSV (which are latitude and longitude values; it successfully writes the latitude/longitude headers in the CSV files). So apparently whatever is below dict_writer.writerows(openJSONfile) is not working:
import json, csv, arcpy
from arcpy import env
arcpy.env.workspace = r"C:\GIS\1GIS_DATA\Pathname"
workspaces = arcpy.ListWorkspaces("*", "Folder")

for workspace in workspaces:
    arcpy.env.workspace = workspace
    JSONfiles = arcpy.ListFiles("*.json")
    for JSONfile in JSONfiles:
        descJSONfile = arcpy.Describe(JSONfile)
        JSONfileName = descJSONfile.baseName
        openJSONfile = open(JSONfile, "wb+")
        print "JSON file is open"
        fieldnames = ['longitude', 'latitude']
        with open(JSONfileName+"test.csv", "wb+") as f:
            dict_writer = csv.DictWriter(f, fieldnames=fieldnames)
            dict_writer.writerow(dict(zip(fieldnames, fieldnames)))
            dict_writer.writerows(openJSONfile)

#Do I have to open the CSV files? Aren't they already open?
#openCSVfile = open(CSVfile, "r+")
for row in openJSONfile:
    f.writerow( [row['longitude'], row['latitude']] )
Any help is greatly appreciated!!
You're not actually loading the JSON file.
You're trying to write rows from an open file instead of writing rows from json.
You will need to add something like this:
rows = json.load(openJSONfile)
and later:
dict_writer.writerows(rows)
The last two lines you have should be removed: all the CSV writing is done before you reach them, and they are outside of the loop, so they would only apply to the last file anyway (and they don't write anything, since there are no lines left in the file at that point).
Also, I see you're using with open... for the CSV file, but not for the JSON file.
You should always use it rather than calling open() without a with statement.
You should use a csv.DictWriter object to do everything. Here's something similar to your code with all the Arc stuff removed because I don't have it, that worked when I tested it:
import json, csv
JSONfiles = ['sample.json']

for JSONfile in JSONfiles:
    with open(JSONfile, "rb") as openJSONfile:
        rows = json.load(openJSONfile)
    fieldnames = ['longitude', 'latitude']
    with open(JSONfile+"test.csv", "wb") as f:
        dict_writer = csv.DictWriter(f, fieldnames=fieldnames)
        dict_writer.writeheader()
        dict_writer.writerows(rows)
It was unnecessary to write out each row because your json file was a list of row dictionaries (assuming it was what you had embedded in your linked question).
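For reference, the kind of JSON this assumes is simply a list of row dictionaries keyed by the field names; a tiny sketch (the coordinates are made up) would be:

import json

# Hypothetical contents of sample.json: a list of row dictionaries
rows = json.loads('[{"longitude": -73.97, "latitude": 40.78},'
                  ' {"longitude": -118.24, "latitude": 34.05}]')
print(rows[0]['latitude'])  # 40.78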
I can't say I know for sure what was wrong, but putting all of the .JSON files in the same folder as my code (and changing my code appropriately) works. I will have to keep investigating why, when trying to read from other folders, it gives me the error:
IOError: [Errno 2] No such file or directory:
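A likely explanation (an educated guess, not verified against these folders) is that arcpy.ListFiles returns bare file names rather than full paths, so open() looks for them in the current working directory instead of the workspace; joining the workspace onto the name inside the loop should cover the other folders:

import os

# inside the `for JSONfile in JSONfiles:` loop
JSONpath = os.path.join(arcpy.env.workspace, JSONfile)
with open(JSONpath, "rb") as openJSONfile:
    rows = json.load(openJSONfile)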
For now, the following code DOES work :)
import json, csv, arcpy, os
from arcpy import env
arcpy.env.workspace = r"C:\GIS\1GIS_DATA\MyFolder"
JSONfiles = arcpy.ListFiles("*.json")
print JSONfiles
for JSONfile in JSONfiles:
    print "Current JSON file is: " + JSONfile
    descJSONfile = arcpy.Describe(JSONfile)
    JSONfileName = descJSONfile.baseName
    with open(JSONfile, "rb") as openJSONfile:
        rows = json.load(openJSONfile)
        print "JSON file is loaded"
    fieldnames = ['longitude', 'latitude']
    with open(JSONfileName+"test.csv", "wb") as f:
        dict_writer = csv.DictWriter(f, fieldnames = fieldnames)
        dict_writer.writerow(dict(zip(fieldnames, fieldnames)))
        dict_writer.writerows(rows)
        print "CSVs are Populated with headers and rows from JSON file.", '\n'
Thanks everyone for your help.
I'm trying to write a program that looks at a .CSV file (input.csv) and rewrites to a new file (corrected.csv) only the rows that begin with a certain element, as listed in a text file (output.txt).
This is what my program looks like right now:
import csv
lines = []
with open('output.txt','r') as f:
    for line in f.readlines():
        lines.append(line[:-1])

with open('corrected.csv','w') as correct:
    writer = csv.writer(correct, dialect = 'excel')
    with open('input.csv', 'r') as mycsv:
        reader = csv.reader(mycsv)
        for row in reader:
            if row[0] not in lines:
                writer.writerow(row)
Unfortunately, I keep getting this error, and I have no clue what it's about.
Traceback (most recent call last):
File "C:\Python32\Sample Program\csvParser.py", line 12, in <module>
for row in reader:
_csv.Error: line contains NULL byte
Credit to all the people here for even getting me to this point.
I'm guessing you have a NUL byte in input.csv. You can test for that with:
if '\0' in open('input.csv').read():
    print("you have null bytes in your input file")
else:
    print("you don't")
If you do,
reader = csv.reader(x.replace('\0', '') for x in mycsv)
may get you around that. Or it may indicate that you have UTF-16 or something 'interesting' in the .csv file.
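If you suspect the UTF-16 case, a quick hedged check (the file name is a placeholder) is to look at the first two bytes for a byte-order mark:

with open('input.csv', 'rb') as f:
    head = f.read(2)
# UTF-16 files usually start with a BOM: FF FE (little-endian) or FE FF (big-endian)
print(head in (b'\xff\xfe', b'\xfe\xff'))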
I've solved a similar problem with an easier solution:
import codecs
csvReader = csv.reader(codecs.open('file.csv', 'rU', 'utf-16'))
The key was using the codecs module to open the file with the UTF-16 encoding; there are a lot more encodings, check the documentation.
If you want to replace the nulls with something you can do this:
def fix_nulls(s):
    for line in s:
        yield line.replace('\0', ' ')

r = csv.reader(fix_nulls(open(...)))
You could just inline a generator to filter out the null values if you want to pretend they don't exist. Of course, this assumes the null bytes are not really part of the encoding and really are some kind of erroneous artifact or bug.
See the (line.replace('\0','') for line in mycsv) below; you'll also probably want to open the file using mode rb.
import csv
lines = []
with open('output.txt','r') as f:
    for line in f.readlines():
        lines.append(line[:-1])

with open('corrected.csv','w') as correct:
    writer = csv.writer(correct, dialect = 'excel')
    with open('input.csv', 'rb') as mycsv:
        reader = csv.reader( (line.replace('\0','') for line in mycsv) )
        for row in reader:
            if row[0] not in lines:
                writer.writerow(row)
This will tell you which line is the problem:
import csv
lines = []
with open('output.txt','r') as f:
    for line in f.readlines():
        lines.append(line[:-1])

with open('corrected.csv','w') as correct:
    writer = csv.writer(correct, dialect = 'excel')
    with open('input.csv', 'r') as mycsv:
        reader = csv.reader(mycsv)
        try:
            for i, row in enumerate(reader):
                if row[0] not in lines:
                    writer.writerow(row)
        except csv.Error:
            print('csv choked on line %s' % (i+1))
            raise
Perhaps this from daniweb would be helpful:
I'm getting this error when reading from a csv file: "Runtime Error! line contains NULL byte". Any idea about the root cause of this error?
...
Ok, I got it and thought I'd post the solution. Simple, yet it caused me grief... The file used was saved in .xls format instead of .csv. I didn't catch this because the file name itself had the .csv extension while the type was still .xls.
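If you want to catch that situation programmatically, one hedged check is the file signature: legacy .xls files are OLE2 compound documents that start with a well-known eight-byte magic number:

with open('suspect.csv', 'rb') as f:
    signature = f.read(8)
# Legacy Excel (.xls) files begin with the OLE2 signature D0 CF 11 E0 A1 B1 1A E1
if signature == b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1':
    print("This is really an .xls file, not a CSV")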
A tricky way:
If you develop under Linux, you can use all the power of sed:
from subprocess import check_call, CalledProcessError
PATH_TO_FILE = '/home/user/some/path/to/file.csv'
try:
    check_call("sed -i -e 's|\\x0||g' {}".format(PATH_TO_FILE), shell=True)
except CalledProcessError as err:
    print(err)
The most efficient solution for huge files.
Checked for Python3, Kubuntu
def fix_nulls(s):
    for line in s:
        yield line.replace('\0', '')

with open(csv_file, 'r', encoding = "utf-8") as f:
    reader = csv.reader(fix_nulls(f))
    for line in reader:
        pass  # do something
This way works for me.
I've recently fixed this issue; in my case, it was a compressed file that I was trying to read. Check the file format first, then check that the contents are what the extension suggests.
Turning my Linux environment into a clean, complete UTF-8 environment did the trick for me.
Try the following in your command line:
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
This is long settled, but I ran across this answer because I was experiencing an unexpected error while reading a CSV to process as training data in Keras and TensorFlow.
In my case, the issue was much simpler, and is worth being conscious of: the data being written into the CSV wasn't consistent, resulting in some columns being completely missing, which seems to end up throwing this error as well.
The lesson: If you're seeing this error, verify that your data looks the way that you think it does!
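A quick way to check for that kind of ragged data (a rough sketch; the file name is a placeholder, and it counts raw delimiters per line so the check itself can't choke on odd bytes) is:

from collections import Counter

with open('training_data.csv', 'rb') as f:
    counts = Counter(line.count(b',') for line in f)
print(counts)  # more than one distinct count suggests rows with missing columns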
pandas.read_csv now handles different UTF encodings when reading/writing, and can therefore deal directly with null bytes:
data = pd.read_csv(file, encoding='utf-16')
see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
For skipping the NULL byte rows:
import csv
with open('sample.csv', newline='') as csv_file:
    reader = csv.reader(csv_file)
    while True:
        try:
            row = next(reader)
            print(row)
        except csv.Error:
            continue
        except StopIteration:
            break
The above information is great. I had this same error; my fix was easy and just user error (aka myself): simply save the file as a CSV and not an Excel file.
It is very simple: don't make a CSV file with "create new Excel" or by saving as ".csv" from Windows.
Simply import the csv module, write a dummy CSV file, and then paste your data into it.
A CSV made by the Python csv module itself will no longer show you the encoding or blank-line error.