I am trying to import a CSV file while removing the '$' signs from the first column.
Is there any way I can omit the '$' sign with csv.reader?
If not, how can I modify aList to remove the $ signs?
>>> import csv
>>> with open('test.csv', 'rb') as csvfile:
...     reader = csv.reader(csvfile, delimiter=',')
...     for a in reader:
...         print a
...
['$135.20 ', '2']
['$137.20 ', '3']
['$139.20 ', '4']
['$141.20 ', '5']
['$143.20 ', '8']
>>> print(aList)
[['$135.20 ', '2'], ['$137.20 ', '3'], ['$139.20 ', '4'], ['$141.20 ', '5'], ['$143.20 ', '8']]
Ultimately, I would like to prep aList for Numpy functions.
You can modify the first column and then accumulate the results somewhere else:
for col_a, col_b in reader:
    results.append([col_a[1:], col_b])
That will remove the first character from the first column and append both columns to another list, results.
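Since the question mentions prepping the data for NumPy, here is a minimal sketch along the same lines (assuming the first column should become floats and the second integers, and that the '$' and trailing space should simply be stripped):
import csv
import numpy as np

results = []
with open('test.csv', 'rb') as csvfile:
    for col_a, col_b in csv.reader(csvfile, delimiter=','):
        # drop the leading '$' and surrounding whitespace, then convert
        results.append([float(col_a.lstrip('$').strip()), int(col_b)])
data = np.array(results)  # a (rows, 2) float array ready for NumPy functions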
You can do it like this:
for a in reader:
    print a[0][1:], a[1]
a[0] is the first entry in your array, a[0][1:] is the first entry starting with the second character.
For example:
a="$123"
print a[1:]
# prints 123
If you want to modify the stored list itself (aList from the question, since a csv.reader object can't be indexed), try the following:
for x in xrange(len(aList)):
    aList[x] = [aList[x][0][1:], aList[x][1]]
Related
When I'm moving through a file with a csv.reader, how do I return to the top of the file? If I were doing it with a normal file I could just do something like "file.seek(0)". Is there anything like that for the csv module?
Thanks ahead of time ;)
You can seek the file directly. For example:
>>> f = open("csv.txt")
>>> c = csv.reader(f)
>>> for row in c: print row
['1', '2', '3']
['4', '5', '6']
>>> f.seek(0)
>>> for row in c: print row # again
['1', '2', '3']
['4', '5', '6']
You can still use file.seek(0). For instance, look at the following:
import csv
file_handle = open("somefile.csv", "r")
reader = csv.reader(file_handle)
# Do stuff with reader
file_handle.seek(0)
# Do more stuff with reader as it is back at the beginning now
This should work since csv.reader is reading from the same underlying file object.
I've found csv.reader and csv.DictReader a little difficult to work with because they only track the current line_num and move forward. Making a list from the first read works well:
>>> import csv
>>> f = open('csv.txt')
>>> lines = list(csv.reader(f))  # <-- list built from csv.reader
>>>
>>> for line in lines:
...     print(line)
['1', '2', '3']
['4', '5', '6']
>>>
>>> for line in lines:
...     print(line)
['1', '2', '3']
['4', '5', '6']
>>>
>>> lines[1]
['4', '5', '6']
This keeps the first row (which DictReader would otherwise consume as field names) but lets you work with the list again and again, and even inspect individual rows.
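As a rough illustration of the same idea with csv.DictReader (assuming csv.txt gained a header row such as a,b,c), the list can be reused the same way:
import csv

with open('csv.txt') as f:
    rows = list(csv.DictReader(f))  # header row consumed once, data rows kept as dicts
for row in rows:       # first pass
    print(row['a'])
for row in rows:       # second pass works too, no seek needed
    print(row['a'])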
import csv
import re
column_3 = []
f = open('E:\pythontest\ip_data.csv')
csv_f = csv.reader(f)
for row in csv_f:
    column_3.append(row[2])
f.close
print column_3
for row in column_3:
    if re.search(r'\d.*', 'column_3'):
        print("its numeric value")
    else:
        print("its not numeric")
I am doing this, but it prints "its numeric" for every row, even though one row contains a string rather than an integer.
I would assume you have already read the data from the csv into column_3. To check if all the values in this column are integers, you can use all() with a regex match:
>>> import re
>>> column_3 = ['1', '2', '3', '4', 'a']
>>> all(re.match(r'^\d+$', c) for c in column_3)
False
>>> column_3 = ['1', '2', '3', '4', '56']
>>> all(re.match(r'^\d+$', c) for c in column_3)
True
>>> column_3 = ['1', '2', '3', '4', '56', 'a43']
>>> all(re.match(r'^\d+$', c) for c in column_3)
False
^\d+$ will only match a string made up entirely of digits; the start and end anchors make sure nothing non-numeric can appear before or after them.
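Tying that back to the reader loop from the question, a minimal sketch might look like this (assuming the same ip_data.csv layout, i.e. every row has at least three columns):
import csv
import re

with open('ip_data.csv') as f:
    column_3 = [row[2] for row in csv.reader(f)]
if all(re.match(r'^\d+$', c) for c in column_3):
    print("its numeric value")
else:
    print("its not numeric")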
While reading the file, try appending the value as a string:
for row in csv_f:
    column_3.append(str(row[2]))
Hope it helps. Happy coding :)
Your regex for matching only digits isn't right: \d.* matches a digit followed by 0 or more characters of any kind.
Also, there is no need to store the column values in a separate list and iterate over it again. Instead, you can check inside the reader loop itself, like this:
import csv, re

with open('data', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        if re.match(r'^\d+$', row[1]) is None:
            print('All not Numeric')
            break
    else:
        print('All Numeric')
The regex ^\d+$ (or equivalently ^[0-9]+$) matches a string of 1 or more digits and nothing else.
Here are 2 sample test cases:
$ cat data
a,1
b,123
c,456
d,56
$ python3 a.py
All Numeric
$ vim data
$ cat data
a,1
b,123
c,456
d,56s
$ python a.py
All not Numeric
I have a CSV file with names and scores in it. I've made each line a separate list but when appending a variable to this list it doesn't actually do it.
My code is:
import csv
f = open('1scores.csv')
csv_f = csv.reader(f)
newlist = []
for row in csv_f:
    newlist.append(row[0:4])
    minimum = min(row[1:4])
    newlist.append(minimum)
print(newlist)
With the data in the file being
Person One,4,7,4
Person Two,1,4,2
Person Three,3,4,1
Person Four,2
Surely the output would be ['Person One', '4', '7', '4', '4'] as the minimum is 4, which I'm appending to the list. But I get this: ['Person One', '4', '7', '4'], '4',
What am I doing wrong? I want the minimum to be inside the list instead of outside, but I don't understand why it isn't.
Append the min to each row and then append the row itself; currently you append the sliced list first and then add the min value to newlist, not to the sliced list:
for row in csv_f:
    row.append(min(row[1:], key=int))
    newlist.append(row)
You could also use a list comp:
new_list = [row + [min(row[1:], key=int)] for row in csv_f]
You also need the key=int, or you may get strange results because your scores are strings and will be compared lexicographically:
In [1]: l = ["100" , "2"]
In [2]: min(l)
Out[2]: '100'
In [3]: min(l,key=int)
Out[3]: '2'
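Putting it together as a rough sketch (using the 1scores.csv layout from the question, and assuming every score field really is an integer string):
import csv

newlist = []
with open('1scores.csv') as f:
    for row in csv.reader(f):
        row.append(min(row[1:], key=int))  # lowest score, compared numerically
        newlist.append(row)
print(newlist[0])  # e.g. ['Person One', '4', '7', '4', '4']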
I have a text file in the following format:
1,"20130219111529","90UP:34","0000","9999","356708","2"
"-2","20130219105824","0001:11","0000","","162_005",""
I want to compare row 1 and row 2 (In this case 1 and -2) for some purpose. To strip out all the quotes and parse this file I have the following code:
if os.path.exists(FileName):
    with open(FileName) as File:
        for row in csv.reader(File, delimiter=',', skipinitialspace=True):
            print(row)
The following is the output:
['1', '20130219111529', '90UP:34', '0000', '9999', '356708', '2']
['-2', '20130219105824', '0001:11', '0000', '', '162_005', '']
I want to iterate through the columns. For example, iterate through '1' then '-2' and so on.
How do I go about doing this?
Use zip(). It turns two iterables into one iterable of tuples, with elements coming from both lists.
l1 = ['1', '20130219111529', '90UP:34', '0000', '9999', '356708', '2']
l2 = ['-2', '20130219105824', '0001:11', '0000', '', '162_005', '']
for elem1, elem2 in zip(l1, l2):
    print("elem1 is {0} and elem2 is {1}.".format(elem1, elem2))
Perhaps the following.
if os.path.exists(FileName):
    with open(FileName) as File:
        lastRow = []
        # loop over the lines in the file
        for row in csv.reader(File, delimiter=',', skipinitialspace=True):
            # saves the first row, for comparison below
            if lastRow == []:
                lastRow = row
                continue
            # loop over the columns, if all rows have the same number
            for colNum in range(len(row)):
                # compare row[colNum] and lastRow[colNum] as you wish
                pass
            # save this row, to compare with the next row in the loop
            lastRow = row
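As a concrete (and purely hypothetical) example of that comparison step, the first field of consecutive rows could be compared as integers, which is what the question hints at with 1 and -2:
import csv
import os

FileName = 'data.csv'  # hypothetical path; reuse whatever FileName holds in your script
if os.path.exists(FileName):
    with open(FileName) as File:
        lastRow = None
        for row in csv.reader(File, delimiter=',', skipinitialspace=True):
            if lastRow is not None:
                # with the sample data this prints -3 (that is, -2 minus 1)
                print(int(row[0]) - int(lastRow[0]))
            lastRow = row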
just print the first element in the row:
for row in csv.reader(File, delimiter=',', skipinitialspace=True):
    print(row[0])
EDIT
rows = list(csv.reader(File, delimiter=',', skipinitialspace=True))
print len(rows)  # how many rows were read from the file
for row in rows:
    print(row[0])
If (as you said in the question, though I'm not sure if you wanted this) you want to iterate through the columns, you can do the following:
if os.path.exists(file_name):
    with open(file_name) as csv_file:
        for columns in zip(*csv.reader(csv_file, delimiter=',', skipinitialspace=True)):
            print columns
This will output the following:
('1', '-2')
('20130219111529', '20130219105824')
('90UP:34', '0001:11')
('0000', '0000')
('9999', '')
('356708', '162_005')
('2', '')
Really quick question here: some other people helped me on another problem, but I can't get any of their code to work because I don't understand something very fundamental.
8000.5 16745 0.1257
8001.0 16745 0.1242
8001.5 16745 0.1565
8002.0 16745 0.1595
8002.5 16745 0.1093
8003.0 16745 0.1644
I have a data file as such, and when I type
f1 = open(sys.argv[1], 'rt')
for line in f1:
    fields = line.split()
    print list(fields[0])
I get the output
['1', '6', '8', '2', '5', '.', '5']
['1', '6', '8', '2', '6', '.', '0']
['1', '6', '8', '2', '6', '.', '5']
['1', '6', '8', '2', '7', '.', '0']
['1', '6', '8', '2', '7', '.', '5']
['1', '6', '8', '2', '8', '.', '0']
['1', '6', '8', '2', '8', '.', '5']
['1', '6', '8', '2', '9', '.', '0']
Whereas I would have expected from trialling stuff like print list(fields) to get something like
[16825.5, 162826.0 ....]
What obvious thing am I missing here?
thanks!
Remove the list; .split() already returns a list.
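For example, with one line from the sample data:
>>> '8000.5 16745 0.1257'.split()
['8000.5', '16745', '0.1257']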
You are turning the first element of the fields into a list:
>>> fields = ['8000.5', '16745', '0.1257']
>>> fields[0]
'8000.5'
>>> list(fields[0])
['8', '0', '0', '0', '.', '5']
If you want to have the first column as a list, you can build a list as you go:
myfirstcolumn = []
for line in f1:
    fields = line.split()
    myfirstcolumn.append(fields[0])
This can be simplified into a list comprehension:
myfirstcolumn = [line.split()[0] for line in f1]
The last command is the problem.
print list(fields[0]) takes the zeroth item from your split list and converts it into a list.
Since you already have a list of strings, ['8000.5', '16745', '0.1257'], the zeroth item is a string, which becomes a list of individual characters when list() is applied to it.
Your first problem is that you apply list to a string:
list("123") == ["1", "2", "3"]
Secondly, you print once per line in the file, but it seems you want to collect the first item of each line and print them all at once.
Third, in Python 2 the 't' in the open mode is unnecessary (text mode is the default).
I think what you want is:
with open(sys.argv[1], 'r') as f:
    print [ line.split()[0] for line in f ]
The problem was you were converting the first field which you correctly extracted into a list.
Here's a solution to print the first column:
with open(sys.argv[1]) as f1:
    first_col = []
    for line in f1:
        fields = line.split()
        first_col.append(fields[0])
    print first_col
gives:
['8000.5', '8001.0', '8001.5', '8002.0', '8002.5', '8003.0']
Rather than doing f1 = open(sys.argv[1], 'rt'), consider using with, which will close the file when you are done or in case of an exception. Also, I left off 'rt' since open() defaults to read and text mode.
Finally, this could also be written using list comprehension:
with open(sys.argv[1]) as f1:
    first_col = [line.split()[0] for line in f1]
Others have already done a great job answering this question; the behavior you're seeing is because you're using list on a string. list will take any object that you can iterate over and turn it into a list -- one element at a time. This isn't really surprising, except that the object doesn't even have to have an __iter__ method (which is the case with strings) -- there are a number of posts on SO about __iter__ so I won't focus on that part.
In any event, try the following code and see what it prints out:
>>> def enlighten_me(obj):
...     print (list(obj))
...     print (hasattr(obj, '__iter__'))
...
>>> enlighten_me("Hello World")
>>> enlighten_me( (1,2,3,4) )
>>> enlighten_me( {'red':'wagon',1:5} )
Of course, you can try the example with sets, lists, generators ... Anything you can iterate over.
Levon posted a nice answer about how to create a column while reading your file. I will demonstrate the same thing using the built-in zip function.
rows = []
for row in myfile:
    rows.append(row.split())
# now rows is stored as [ [col1,col2,...] , [col1,col2,...], ... ]
At this point we could get the first column by (Levon's answer):
column1 = []
for row in rows:
    column1.append(row[0])
or more succinctly:
column1=[row[0] for row in rows] #<-- This is called a list comprehension
But what if you want all the columns? (and what if you don't know how many columns there are?). This is a job for zip.
zip takes iterables as input and matches them up. In other words:
zip(iter1,iter2)
will take iter1[0] and match it with iter2[0], and match iter1[1] with iter2[1] and so on -- kind of like a zipper if you think about it. But, zip can take more than just 2 arguments ...
zip(iter1,iter2,iter3)  # results in [ (iter1[0],iter2[0],iter3[0]) , (iter1[1],iter2[1],iter3[1]), ... ]
Now, the last piece of the puzzle that we need is argument unpacking with the star operator.
If I have a function:
def foo(a,b,c):
    print a
    print b
    print c
I can call that function like this:
A=[1,2,3]
foo(A[0],A[1],A[2])
Or, I can call it like this:
foo(*A)
Hopefully this makes sense -- the star takes each element in the list and "unpacks" it before passing it to foo.
So, putting the pieces together (remember back to the list of rows), we can unpack the list of rows and pass it to zip which will match corresponding indices in each row (i.e. columns).
columns=zip(*rows)
Now to get the first column, we just do:
columns[0] #first column
for lists of lists, I like to think of zip(*list_of_lists) as a sort of poor-man's transpose.
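A quick interactive check of that transpose idea (Python 2, where zip returns a list):
>>> rows = [['8000.5', '16745', '0.1257'], ['8001.0', '16745', '0.1242']]
>>> zip(*rows)
[('8000.5', '8001.0'), ('16745', '16745'), ('0.1257', '0.1242')]
>>> zip(*rows)[0]   # the first column
('8000.5', '8001.0')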
Hopefully this has been helpful.