Map over csv in python

I'm trying to use "map" on a csv file in python.
However, the line map(lambda x: x, reseller_csv) gives nothing.
I've tried iterating over the csv object, and it works fine and can print the rows.
Here's the code.
# imports
import csv
# Opens files
ifile = open('C:\Users\josh.SCL\Desktop\Records.csv', 'r')
ofile = open('C:\Users\josh.SCL\Desktop\RecordsNew.csv', 'w')
resellers_file = open('C:\Users\josh.SCL\Desktop\Reseller.csv', 'r')
# Setup CSV objects
csvfile = csv.DictReader(ifile, delimiter=',')
reseller_csv = csv.DictReader(resellers_file, delimiter=',')
# Get names only in resellers
resellers = map(lambda x: x.get('Reseller'), reseller_csv)

A csv.DictReader is a use-once gadget. You probably ran it a second time.
>>> import csv
>>> iterable = ['Reseller,cost', 'fred,100', 'joe,99']
>>> reseller_csv = csv.DictReader(iterable)
>>> map(lambda x: x.get('Reseller'), reseller_csv)
['fred', 'joe']
>>> map(lambda x: x.get('Reseller'), reseller_csv)
[]
>>>
While we're here:
(1) [Python 2.x] Always open csv files in BINARY mode.
[Python 3.x] Always open csv files in text mode (the default), and use newline=''
(2) If you insist on hardcoding file paths in Windows, use r"...." instead of "...", or use forward slashes -- otherwise \n and \t will be interpreted as control characters.
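For example, a minimal Python 3.x sketch of the opening code above, with newline='' and a raw string for the path (the path is the one from the question):
import csv

# newline='' lets the csv module manage line endings itself (Python 3.x)
with open(r'C:\Users\josh.SCL\Desktop\Reseller.csv', newline='') as resellers_file:
    reseller_csv = csv.DictReader(resellers_file)
    resellers = [row.get('Reseller') for row in reseller_csv]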

The following works for me:
>>> data = ["name,age", "john,32", "bob,45"]
>>> list(map(lambda x: x.get("name"), csv.DictReader(data))) # Python 3 so using list to see values.
['john', 'bob']
Are you sure you get any data at all from your DictReader? Do you read any data from it prior to that, exhausting the reader perhaps?
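If the reader was exhausted earlier, you can rewind the underlying file and build a fresh reader; a small sketch using the names from the question:
resellers_file.seek(0)                         # rewind to the start of the file
reseller_csv = csv.DictReader(resellers_file)  # a fresh reader re-reads the header
resellers = list(map(lambda x: x.get('Reseller'), reseller_csv))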

First, on your specific problem: check whether there is actually a key named 'Reseller'; chances are it's there with different capitalization or an extra space. To see a list of all the keys (assuming a non-exhausted DictReader):
>>> csvfile.next().keys()
Otherwise the map() should work fine. But I'd argue it's more readable (and faster!) done like this:
resellers = [x['Reseller'] for x in reseller_csv]

Related

Read content of txt files into lists to find duplicates

I'm new to Python.
My code should read 2 different .txt files into lists and compare them to find and delete duplicates.
Code
import os
dir = os.listdir
T = "Albums"
if T not in dir():
    os.mkdir("Albums")
with open('list.txt','w+') as f:
    linesA = f.readlines()
print(linesA) # output empty
with open('completed.txt','w+') as t:
    linesB = t.readlines()
print(linesB) # output empty
for i in linesA[:]:
    if i in linesB:
        linesA.remove(i)
print(linesA)
print(linesB)
I tried the code above with the following inputs:
in list.txt I wrote (on separate lines) A, B and C.
in completed.txt I wrote (also on separate lines) A and B.
It should have first output the contents of the lists, but they were empty for some reason.
Why are the read lists empty?
Does this help:
I suggest using not os.path.exists(entry) instead of not entry in os.listdir(); it's not relevant to the problem, but I point it out anyway. (Also, you overwrote the built-in dir function.)
I've split up the file using split("\n")
I've changed the way the files are opened to r+, which, unlike w+, doesn't clear the file.
Please note that if you want to use readlines() you have to strip the trailing newline from each entry.
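For instance, a minimal sketch of that stripping step, had readlines() been kept:
with open('list.txt', 'r') as f:
    linesA = [line.rstrip('\n') for line in f.readlines()]
The full example: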
import os

with open('list.txt','w+') as file:
    file.write("Foo\n")
    file.write("Bar")
with open('completed.txt','w+') as file:
    file.write("Bar\n")
    file.write("Python")
T = "Albums"
if not os.path.exists(T):
    os.mkdir("Albums")
with open('list.txt','r+') as f:
    linesA = f.read().split("\n")
print(linesA)
with open('completed.txt','r+') as t:
    linesB = t.read().split("\n")
print(linesB)
for entry in list(linesA):
    if entry in linesB:
        linesA.remove(entry)
print(linesA)
print(linesB)
Output:
['Foo', 'Bar']
['Bar', 'Python']
['Foo']
['Bar', 'Python']
This makes little sense.
dir = os.listdir
You wanted to call os.listdir().
What you did was assign a reference to that function,
without actually calling the function.
Better to dispense with dir and just phrase it this way:
if T not in os.listdir():
with open('list.txt','w+') as f:
    linesA = f.readlines()
...
with open('completed.txt','w+') as t:
    linesB = t.readlines()
You wanted to open those with 'r' read mode,
rather than write.
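A minimal corrected sketch of those two blocks:
with open('list.txt', 'r') as f:
    linesA = f.readlines()
with open('completed.txt', 'r') as t:
    linesB = t.readlines()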

How to write a list to a file in Python? [duplicate]

How do I write a list to a file? writelines() doesn't insert newline characters, so I need to do:
f.writelines([f"{line}\n" for line in lines])
Use a loop:
with open('your_file.txt', 'w') as f:
    for line in lines:
        f.write(f"{line}\n")
For Python <3.6:
with open('your_file.txt', 'w') as f:
    for line in lines:
        f.write("%s\n" % line)
For Python 2, one may also use:
with open('your_file.txt', 'w') as f:
    for line in lines:
        print >> f, line
If you're keen on a single function call, at least remove the square brackets [], so that the strings to be printed get made one at a time (a genexp rather than a listcomp) -- no reason to take up all the memory required to materialize the whole list of strings.
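That is, a sketch of the same call with a generator expression (the lines name is the question's):
f.writelines(f"{line}\n" for line in lines)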
What are you going to do with the file? Does this file exist for humans, or other programs with clear interoperability requirements?
If you are just trying to serialize a list to disk for later use by the same python app, you should be pickling the list.
import pickle
with open('outfile', 'wb') as fp:
    pickle.dump(itemlist, fp)
To read it back:
with open('outfile', 'rb') as fp:
    itemlist = pickle.load(fp)
Simpler is:
with open("outfile", "w") as outfile:
outfile.write("\n".join(itemlist))
To ensure that all items in the item list are strings, use a generator expression:
with open("outfile", "w") as outfile:
outfile.write("\n".join(str(item) for item in itemlist))
Remember that "\n".join(...) still builds the whole output string in memory, so take care with very large lists.
Using Python 3 and Python 2.6+ syntax:
with open(filepath, 'w') as file_handler:
    for item in the_list:
        file_handler.write("{}\n".format(item))
This is platform-independent. It also terminates the final line with a newline character, which is a UNIX best practice.
Starting with Python 3.6, "{}\n".format(item) can be replaced with an f-string: f"{item}\n".
Yet another way. Serialize to json using simplejson (included as json in python 2.6):
>>> import simplejson
>>> f = open('output.txt', 'w')
>>> simplejson.dump([1,2,3,4], f)
>>> f.close()
If you examine output.txt:
[1, 2, 3, 4]
This is useful because the syntax is pythonic, it's human readable, and it can be read by other programs in other languages.
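Reading it back is symmetric; a small sketch, assuming the same output.txt:
>>> f = open('output.txt', 'r')
>>> simplejson.load(f)
[1, 2, 3, 4]
>>> f.close()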
I thought it would be interesting to explore the benefits of using a genexp, so here's my take.
The example in the question uses square brackets to create a temporary list, and so is equivalent to:
file.writelines( list( "%s\n" % item for item in list ) )
This needlessly constructs a temporary list of all the lines that will be written out; it may consume significant amounts of memory, depending on the size of your list and how verbose the output of str(item) is.
Dropping the square brackets (equivalent to removing the wrapping list() call above) will instead pass a temporary generator to file.writelines():
file.writelines( "%s\n" % item for item in list )
This generator will create newline-terminated representations of your item objects on demand (i.e. as they are written out). This is nice for a couple of reasons:
Memory overheads are small, even for very large lists
If str(item) is slow there's visible progress in the file as each item is processed
This avoids memory issues, such as:
In [1]: import os
In [2]: f = file(os.devnull, "w")
In [3]: %timeit f.writelines( "%s\n" % item for item in xrange(2**20) )
1 loops, best of 3: 385 ms per loop
In [4]: %timeit f.writelines( ["%s\n" % item for item in xrange(2**20)] )
ERROR: Internal Python error in the inspect module.
Below is the traceback from this internal error.
Traceback (most recent call last):
...
MemoryError
(I triggered this error by limiting Python's max. virtual memory to ~100MB with ulimit -v 102400).
Putting memory usage to one side, this method isn't actually any faster than the original:
In [4]: %timeit f.writelines( "%s\n" % item for item in xrange(2**20) )
1 loops, best of 3: 370 ms per loop
In [5]: %timeit f.writelines( ["%s\n" % item for item in xrange(2**20)] )
1 loops, best of 3: 360 ms per loop
(Python 2.6.2 on Linux)
Because I'm lazy...
import json
a = [1,2,3]
with open('test.txt', 'w') as f:
    f.write(json.dumps(a))

# Now read the file back into a Python list object
with open('test.txt', 'r') as f:
    a = json.loads(f.read())
Serialize the list into a text file with comma-separated values:
mylist = dir()
with open('filename.txt','w') as f:
    f.write(','.join(mylist))
In Python 3 you can use print and * for argument unpacking:
with open("fout.txt", "w") as fout:
print(*my_list, sep="\n", file=fout)
Simply:
with open("text.txt", 'w') as file:
file.write('\n'.join(yourList))
In General
Following is the syntax for the writelines() method:
fileObject.writelines( sequence )
Example
#!/usr/bin/python
# Open a file ("rw+" in the original is not a valid mode; "a+" appends)
fo = open("foo.txt", "a+")
seq = ["This is 6th line\n", "This is 7th line"]
# Write sequence of lines at the end of the file.
fo.writelines(seq)
# Close opened file
fo.close()
Reference
http://www.tutorialspoint.com/python/file_writelines.htm
file.write('\n'.join(list))
Using numpy.savetxt is also an option:
import numpy as np
np.savetxt('list.txt', list, delimiter="\n", fmt="%s")
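To read it back, numpy.loadtxt should work; a sketch, assuming one item was written per line:
items = np.loadtxt('list.txt', dtype=str)  # returns an array of strings, one per line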
You can also use the print function if you're on Python 3, as follows.
with open("myfile.txt", "w") as f:   # text mode; "wb" would make print() fail here
    print(mylist, file=f)
with open ("test.txt","w")as fp:
for line in list12:
fp.write(line+"\n")
Why don't you try
file.write(str(list))
I recently found pathlib.Path to be useful. It helps me get around having to use with open('file') as f and then writing to the file. Hope this becomes useful to someone :).
from pathlib import Path
import json
a = [[1,2,3],[4,5,6]]
# write
Path("file.json").write_text(json.dumps(a))
# read
json.loads(Path("file.json").read_text())
You can also go through the following example:
my_list = [1,2,3,4,5,"abc","def"]
with open('your_file.txt', 'w') as file:
    for item in my_list:
        file.write("%s\n" % item)
Output:
In your_file.txt items are saved like:
1
2
3
4
5
abc
def
Your script would save them the same way.
Otherwise, you can use pickle
import pickle
my_list = [1,2,3,4,5,"abc","def"]

# to write
with open('your_file.txt', 'wb') as file:
    pickle.dump(my_list, file)

# to read
with open('your_file.txt', 'rb') as file:
    Outlist = pickle.load(file)
print(Outlist)
Output:
[1, 2, 3, 4, 5, 'abc', 'def']
Pickle dumps the list as a list, so when we load it we read back exactly the same list.
The same output is also possible with simplejson:
import simplejson as sj
my_list = [1,2,3,4,5,"abc","def"]

# to write
with open('your_file.txt', 'w') as file:
    sj.dump(my_list, file)

# to read
with open('your_file.txt', 'r') as file:
    mlist = sj.load(file)
print(mlist)
This logic first converts each item in the list to a string (str). Sometimes the list contains tuples, like
alist = [('i12', 'tiger'),
         ('113', 'lion')]
This logic will write each tuple to the file on a new line. We can later use eval when loading each tuple while reading the file (see the sketch below):
outfile = open('outfile.txt', 'w') # open a file in write mode
for item in list_to_persistence: # iterate over the list items
outfile.write(str(item) + '\n') # write to the file
outfile.close() # close the file
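A sketch of the read-back step; ast.literal_eval is used here as a safer substitute for the eval mentioned above:
import ast

infile = open('outfile.txt', 'r')                    # open the file in read mode
items = [ast.literal_eval(line) for line in infile]  # parse each line back into a tuple
infile.close()                                       # close the file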
Another way of iterating and adding newline:
for item in items:
    filewriter.write(f"{item}" + "\n")
In Python 3 you can use this loop (file objects have no .print method, so use the print function with file=):
with open('your_file.txt', 'w') as f:
    for item in my_list:
        print(item, file=f)
Redirecting stdout to a file might also be useful for this purpose:
from contextlib import redirect_stdout

with open('test.txt', 'w') as f:
    with redirect_stdout(f):
        for item in mylst:   # iterate directly; the original range(mylst.size) assumed a numpy array
            print(item)
I suggest this solution:
with open('your_file.txt', 'w') as f:
    list(map(lambda item: f.write("%s\n" % item), my_list))
Let avg be the list, then:
In [29]: a = n.array(avg)
In [31]: a.tofile('avgpoints.dat', sep='\n', format='%f')
You can use %e or %s depending on your requirement.
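To load the values again, numpy.fromfile with the same separator should work; a sketch:
In [32]: n.fromfile('avgpoints.dat', sep='\n')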
I think you are looking for an answer like this:
f = open('output.txt', 'w')
list = [3, 15.2123, 118.3432, 98.2276, 118.0043]
f.write('a= {:>3d}, b= {:>8.4f}, c= {:>8.4f}, d= {:>8.4f}, e= {:>8.4f}\n'.format(*list))
f.close()
poem = '''\
Programming is fun
When the work is done
if you wanna make your work also fun:
use Python!
'''
f = open('poem.txt', 'w') # open for 'w'riting
f.write(poem) # write text to file
f.close() # close the file
How It Works:
First, open a file by using the built-in open function, specifying the name of the file and the mode in which we want to open it. The mode can be read mode ('r'), write mode ('w') or append mode ('a'). We can also specify whether we are reading, writing, or appending in text mode ('t') or binary mode ('b'). There are actually many more modes available, and help(open) will give you more details about them. By default, open() considers the file to be a 't'ext file and opens it in 'r'ead mode.
In our example, we first open the file in write text mode and use the write method of the file object to write to the file, and then we finally close the file.
The above example is from the book "A Byte of Python" by Swaroop C H.
swaroopch.com

Memory efficient way to add columns to .csv files

Ok, I couldn't really find an answer to this anywhere else, so I figured I'd ask.
I'm working with some .csv files that have about 74 million lines right now and I'm trying to add columns into one file from another file.
ex.
Week,Sales Depot,Sales Channel,Route,Client,Product,Units Sold,Sales,Units Returned,Returns,Adjusted Demand
3,1110,7,3301,15766,1212,3,25.14,0,0,3
3,1110,7,3301,15766,1216,4,33.52,0,0,4
combined with
Units_cat
0
1
so that
Week,Sales Depot,Sales Channel,Route,Client,Product,Units Sold,Units_cat,Sales,Units Returned,Returns,Adjusted Demand
3,1110,7,3301,15766,1212,3,0,25.14,0,0,3
3,1110,7,3301,15766,1216,4,1,33.52,0,0,4
I've been using pandas to read in and output the .csv files, but the issue I'm running into is that the program keeps crashing because creating the DataFrame overloads my memory. I've tried the csv library from Python, but I'm not sure how to merge the files the way I want (not just append).
Anyone know a more memory efficient method of combining these files?
Something like this might work for you:
Using csv.DictReader()
import csv
from itertools import izip

with open('file1.csv') as file1:
    with open('file2.csv') as file2:
        with open('result.csv', 'w') as result:
            file1 = csv.DictReader(file1)
            file2 = csv.DictReader(file2)
            # Get the field order correct here:
            fieldnames = file1.fieldnames
            index = fieldnames.index('Units Sold') + 1
            fieldnames = fieldnames[:index] + file2.fieldnames + fieldnames[index:]
            result = csv.DictWriter(result, fieldnames)

            def dict_merge(a, b):
                a.update(b)
                return a

            result.writeheader()
            result.writerows(dict_merge(a, b) for a, b in izip(file1, file2))
Using csv.reader()
import csv
from itertools import izip

with open('file1.csv') as file1:
    with open('file2.csv') as file2:
        with open('result.csv', 'w') as result:
            file1 = csv.reader(file1)
            file2 = csv.reader(file2)
            result = csv.writer(result)
            result.writerows(a[:7] + b + a[7:] for a, b in izip(file1, file2))
Notes:
This is for Python 2. You can use the normal zip() function in Python 3. If the two files are not of equivalent lengths, consider itertools.izip_longest() (see the sketch after these notes).
The memory efficiency comes from passing a generator expression to .writerows() instead of a list. This way, only the current line is under consideration at any moment in time, not the entire file. If a generator expression isn't appropriate, you'll get the same benefit from a for loop: for a,b in izip(...): result.writerow(...)
The dict_merge function is not required from Python 3.5 onwards. In sufficiently new Pythons, try result.writerows({**a, **b} for a, b in zip(file1, file2)).
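A sketch of the izip_longest() variant for the DictReader version, padding the shorter file with empty dicts (Python 2 names, matching the code above; DictWriter fills missing fields with its restval, an empty string by default):
from itertools import izip_longest

result.writerows(dict_merge(a, b) for a, b in izip_longest(file1, file2, fillvalue={}))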

Trying to copy column1 from a csv file to another empty file using python

I'm looking for a way using python to copy the first column from a csv into an empty file. I'm trying to learn python so any help would be great!
So if this is test.csv
A 32
D 21
C 2
B 20
I want this output
A
D
C
B
I've tried the following commands in python but the output file is empty
f= open("test.csv",'r')
import csv
reader = csv.reader(f,delimiter="\t")
names=""
for each_line in reader:
names=each_line[0]
First, you want to open your files. A good practice is to use the with statement (which, technically speaking, introduces a context manager), so that when your code exits from the with block all the files are automatically closed:
with open('test.csv') as inpfile, open('out.csv', 'w') as outfile:
next you want a loop on the lines of the input file (note the indentation, we are inside the with block); line splitting is automatic when you read a text file with lines separated by newlines…
    for line in inpfile:
each line is a string, but you think of it as two fields separated by white space; this situation is so common that strings have a method to deal with it (note again the increasing indent, we are in the for loop block)
        fields = line.split()
by default .split() splits on white space, but you can use, e.g., split(',') to split on commas, etc. That said, fields is a list of strings; for your first record it is equal to ['A', '32'], and you want to output just the first field in this list… for this purpose a file object has the .write() method, which writes a string, just a string, to the file. fields[0] IS a string, but we have to add a newline character to it because, in this respect, .write() is different from print().
        outfile.write(fields[0] + '\n')
That's all, but if you omit my comments it's 4 lines of code
with open('test.csv') as inpfile, open('out.csv', 'w') as outfile:
    for line in inpfile:
        fields = line.split()
        outfile.write(fields[0] + '\n')
When you are done with learning (some) Python, ask for an explanation of this...
with open('test.csv') as ifl, open('out.csv', 'w') as ofl:
    ofl.write('\n'.join(line.split()[0] for line in ifl))
Addendum
The csv module, in such a simple case, adds the conveniences of auto-splitting each line into a list of strings and taking care of the details of output (newlines, etc.); when learning Python it's more fruitful to see how these steps can be done using the bare language, or at least that is my opinion…
The situation is different when your data file is complex: headers, quoted strings possibly containing quoted delimiters, and so on. In those cases the use of csv is recommended, as it takes into account all the gory details. For complex data analysis requirements you will need other packages, not included in the standard library, e.g., numpy and pandas, but that is another story.
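For completeness, a minimal sketch of the same task with csv, assuming the space-delimited input shown in the question:
import csv

with open('test.csv') as inpfile, open('out.csv', 'w', newline='') as outfile:
    reader = csv.reader(inpfile, delimiter=' ')
    writer = csv.writer(outfile, delimiter=' ')
    for fields in reader:
        writer.writerow([fields[0]])  # a one-element row: just the first column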
This answer reads the CSV file, treating a column as delimited by a space character. You have to add header=None, otherwise the first row will be taken to be the header / names of columns.
ss is a slice: the 0th column, taking all rows as denoted by :.
The last line writes the slice to a new file.
import pandas as pd
df = pd.read_csv('test.csv', sep=' ', header=None)
ss = df.iloc[:, 0]  # .ix is gone from modern pandas; .iloc selects by position
ss.to_csv('new_path.csv', sep=' ', index=False)
import csv

reader = csv.reader(open("test.csv", "rb"), delimiter='\t')
writer = csv.writer(open("output.csv", "wb"))
for e in reader:
    writer.writerow([e[0]])  # wrap in a list, or each character becomes its own column
The best you can do is create an empty list, append the column to it, and then write that new list into another csv. For example:
import csv

def writetocsv(l):
    # convert the input to a list
    b = list(l)
    print(b)
    with open("newfile.csv", 'w', newline='') as f:
        w = csv.writer(f, delimiter=',')
        for value in b:
            w.writerow([value])

adcb_list = []
f = open("test.csv", 'r')
reader = csv.reader(f, delimiter="\t")
for each_line in reader:
    adcb_list.append(each_line[0])  # keep just the first column
writetocsv(adcb_list)
hope this works for you :-)

How to write a number as text while writing in csv file in python

import csv
a = ['679L', 'Z60', '033U', '0003']
z = csv.writer(open("test1.csv", "wb"))
z.writerow(a)
Consider the code above
Output:
679L Z60 33U 3
I need to get it in the text format itself, as
679L Z60 033U 0003
How can I do that?
The Python csv module does not treat strings as numbers when writing the file:
>>> import csv
>>> from StringIO import StringIO
>>> a = ['679L', 'Z60', '033U', '0003']
>>> out = StringIO()
>>> z = csv.writer(out)
>>> z.writerow(a)
>>> out.getvalue()
'679L,Z60,033U,0003\r\n'
If you are seeing 3 in some other tool when reading you need to fix that tool; Python is not at fault here.
You can instruct the csv.writer() to put quotes around anything that is not a number; this could make it clearer to whatever reads your CSV that the column is not numeric. Set quoting to csv.QUOTE_NONNUMERIC:
>>> out = StringIO()
>>> z = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC)
>>> z.writerow(a)
>>> out.getvalue()
'"679L","Z60","033U","0003"\r\n'
but this won't prevent Excel from treating the column as numeric anyway.
If you are loading this into Excel then don't use the Open feature. Instead create a new empty worksheet and use the Import feature instead. This will let you designate a column as Text rather than General.
