I have a .csv file that I am trying to turn into a dict. I have mostly tried pandas and csv.DictReader, but so far I can only print the data with DictReader, and not in the way I want.
The main problem is that the file looks like
header;data (1 column)
for about 50 rows, and after that the schema changes to
header1;header2;header3;header4
in row 50, and from row 50 onwards
data1;data2;data3;data4 etc.
with open(filename, 'r', encoding='utf-16') as f:
    for line in csv.DictReader(f):
        print(line)
That's the code I have for now.
Thanks for your help.
You can't use DictReader for this, because it requires all the rows to have the same fields.
Use csv.reader and check the length of the row that it returns. When the length changes, treat that as a new header.
Hopefully you don't have adjacent sections of the file that have the same number of fields but different headers. It will be difficult for the script to detect when the section changes.
import csv
data = []
with open(filename, 'r', encoding='utf-16') as f:
    r = csv.reader(f, delimiter=';')
    # process first 52 rows in format header;data
    for _ in range(52):
        row = next(r)
        data.append({row[0]: row[1]})
    # rest of file is a header row followed by variable number of data rows
    header = next(r)
    for row in r:
        if len(row) != len(header):  # new header
            header = row
            continue
        d = dict(zip(header, row))
        data.append(d)
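For anyone who wants to try the length-change idea without the original file, here is a minimal self-contained sketch. The sample text and the helper name parse_sections are made up for illustration. It assumes the leading key/value part always has exactly two fields and that each later section has a different field count from the one before it.

```python
import csv
import io


def parse_sections(f, delimiter=';'):
    """Return (pairs, records) from a file with a key;value preamble
    followed by one or more header/data sections."""
    pairs = {}      # the leading "header;data" rows
    records = []    # dicts built from the later sections
    header = None
    for row in csv.reader(f, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        if header is None and len(row) == 2:
            pairs[row[0]] = row[1]
        elif header is None or len(row) != len(header):
            header = row  # field count changed: treat this row as a new header
        else:
            records.append(dict(zip(header, row)))
    return pairs, records


# Hypothetical sample data, not the asker's real file
sample = "name;test\ndate;2020\nh1;h2;h3;h4\n1;2;3;4\n5;6;7;8\na;b;c\nx;y;z\n"
pairs, records = parse_sections(io.StringIO(sample))
print(pairs)
print(records)
```

Unlike the fixed count of 52 above, this version decides where the key/value part ends by looking at the field count, which only works if no later section has exactly two fields before the first wider header appears.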
import csv
with open('example.csv', 'r') as f:
    csvfile = csv.reader(f, delimiter=',')
    client_email = ['#example.co.uk', '#moreexamples.com', 'lastexample.com']
    for row in csvfile:
        if row not in client_email:
            print(row)
Assume the code is indented properly; the formatting did not survive copy and paste. I've created a list of company email domain names (as seen in the example), and a loop to print every row in my CSV that is not present in the list. The CSV also has other columns, such as first name, second name and company name, so it is not limited to emails.
The problem is that when I test it, it prints rows with emails from the list, e.g. jackson#example.co.uk.
Any ideas?
In your example, row refers to a list of strings. So each row is ['First name', 'Second name', 'Company Name'] etc.
You're currently checking whether the whole row, which is a list, is itself one of the elements of client_email. That is never true, so every row gets printed.
I suspect you want to check whether the text of any column contains one of the elements in client_email.
You could use another loop:
for row in csvfile:
    for column in row:
        # check whether this column contains any of the email domains
        if any(domain in column for domain in client_email):
            print(row)
            break  # stop at the first match so the row is printed only once
To check if a string contains any strings in another list, I often find this approach useful:
s = "xxabcxx"
stop_list = ["abc", "def", "ghi"]
if any(elem in s for elem in stop_list):
    pass
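Putting the two pieces together for the original goal (keep only the rows that do not mention a client domain) might look like the sketch below. The sample rows and the helper name mentions_client are invented for illustration, and it assumes a substring match on any column is what is wanted.

```python
import csv
import io

client_email = ['#example.co.uk', '#moreexamples.com', 'lastexample.com']

# Invented sample data standing in for example.csv
sample = (
    "Jack,Jackson,Example Ltd,jackson#example.co.uk\n"
    "Ann,Other,Other Co,ann#other.org\n"
)


def mentions_client(row, domains):
    """True if any column of the row contains any of the domains."""
    return any(domain in column for column in row for domain in domains)


# Keep only the rows that mention none of the client domains
kept = [row for row in csv.reader(io.StringIO(sample))
        if not mentions_client(row, client_email)]
for row in kept:
    print(row)
```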
One way to check is to see whether the set of client_email and the set of values in row have common elements (by changing the if condition in the loop):
import csv
with open('example.csv', 'r') as f:
    csvfile = csv.reader(f, delimiter=',')
    client_email = ['#example.co.uk', '#moreexamples.com', 'lastexample.com']
    for row in csvfile:
        if set(row) & set(client_email):
            print(row)
You can also use any, as follows:
import csv
with open('untitled.csv', 'r') as f:
    csvfile = csv.reader(f, delimiter=',')
    client_email = ['#example.co.uk', '#moreexamples.com', 'lastexample.com']
    for row in csvfile:
        if any(item in row for item in client_email):
            print(row)
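One thing to watch: both `item in row` and the set intersection test whether a whole cell equals an entry, not whether a cell contains it. A quick check with a made-up row shows the difference from a substring test:

```python
row = ['Jack', 'Jackson', 'jackson#example.co.uk']  # made-up row
client_email = ['#example.co.uk', '#moreexamples.com', 'lastexample.com']

# Membership in a list compares whole cells, so nothing matches here
exact = any(item in row for item in client_email)

# Set intersection is also an exact comparison of whole cells
overlap = set(row) & set(client_email)

# Testing each cell as a string does a substring search
substring = any(item in cell for cell in row for item in client_email)

print(exact, bool(overlap), substring)  # False False True
```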
Another possible way:
import csv
emails = {'#example.co.uk', '#moreexamples.com', 'lastexample.com'}
with open('example.csv', 'r') as f:
    for row in csv.reader(f):
        if any(email in cell for cell in row for email in emails):
            print(row)
I have a large CSV file that cannot be loaded into memory all at once. I also want to add some columns to the side of the CSV, so I want to add one column at a time, because that does not use much memory. I am using Python and pandas. What can I do?
Here's my code.
def toCsv(filepath, lists):
    i = 0
    with open(filepath, 'r+') as f:
        reader = csv.reader(f)
        writer = csv.writer(f)
        for row in reader:
            print(lists)
            row.append(lists[i])
            writer.writerows(row)
            i = i + 1
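Reading and writing through the same 'r+' handle will not insert a column in place, since a CSV row cannot grow without overwriting what follows it. One common workaround, sketched here with the standard csv module rather than pandas, is to stream rows into a temporary file and then swap it in. The function name add_column and the demo file are made up for illustration, and the sketch assumes `values` supplies exactly one item per row.

```python
import csv
import os
import tempfile


def add_column(filepath, values):
    """Append one value per row, holding only one row in memory at a time."""
    dirname = os.path.dirname(os.path.abspath(filepath))
    with open(filepath, newline='') as src, \
            tempfile.NamedTemporaryFile('w', newline='', dir=dirname,
                                        delete=False) as dst:
        writer = csv.writer(dst)
        # zip stops at the shorter input, so values must cover every row
        for row, value in zip(csv.reader(src), values):
            writer.writerow(row + [value])
    os.replace(dst.name, filepath)  # swap the new file in for the old one


# Small demo on a throwaway file
demo = os.path.join(tempfile.mkdtemp(), 'demo.csv')
with open(demo, 'w', newline='') as f:
    csv.writer(f).writerows([['a', '1'], ['b', '2']])
add_column(demo, ['x', 'y'])
with open(demo, newline='') as f:
    result = list(csv.reader(f))
print(result)
```

Because `values` can be any iterable, it could itself be a generator, so neither the file nor the new column has to fit in memory.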
file1.csv:
Country,Location,number,letter,name,pup-name,null
a,ab,1,qw,abcd,test1,3
b,cd,1,df,efgh,test2,4
c,ef,2,er,fgh,test3,5
d,gh,3,sd,sds,test4,
e,ij,DDDD,we,sdrt,test5,
f,kl,6,sc,asdf,test6,
g,mn,7,df,xcxc,test7,
h,op,8,gb,eretet,test8,
i,qr,8,df,hjjh,test9,
I want to search for a string or number in the 3rd column of the above CSV file and, if it is present, write the values of the first two columns to another file.
For example:
If the number 6 is present in the 3rd column, I want to write 'f','kl' to a new CSV file (with headers).
If the string DDDD is present in the 3rd column, I want to write 'e','ij' to a new CSV file.
How can I do this with Python?
I am trying the code below:
import csv
import time
search_string = "1"
with open('file1.csv') as f, open('file3.csv', 'w') as g:
    reader = csv.reader(f)
    next(reader, None)  # discard the header
    writer = csv.writer(g)
    for row in reader:
        if row[2] == search_string:
            writer.writerow(row[:2])
But it's only printing the last two row values.
I don't see any problem in your code:
The third column of the row is row[2], you are right.
The first two columns are row[0:2] or row[:2], you are right.
If I simulate the reading, like this:
import io
import csv
data = """Country,Location,number,letter,name,pup-name,null
a,ab,1,qw,abcd,test1,3
b,cd,1,df,efgh,test2,4
c,ef,2,er,fgh,test3,5
d,gh,3,sd,sds,test4,
e,ij,DDDD,we,sdrt,test5,
f,kl,6,sc,asdf,test6,
g,mn,7,df,xcxc,test7,
h,op,8,gb,eretet,test8,
i,qr,8,df,hjjh,test9,
"""
with io.StringIO(data) as f:
    reader = csv.reader(f)
    next(reader, None)  # discard the header
    for row in reader:
        if row[2] == "1":
            print(row[:2])
It prints:
['a', 'ab']
['b', 'cd']
Change the value of search_string…
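Since the question also asked for headers in the output, one possible variation writes the first two header fields before the matching rows. This is only a sketch: it uses in-memory text in place of file1.csv and file3.csv, includes just a few of the sample rows, and the helper name extract is made up.

```python
import csv
import io

data = """Country,Location,number,letter,name,pup-name,null
a,ab,1,qw,abcd,test1,3
e,ij,DDDD,we,sdrt,test5,
f,kl,6,sc,asdf,test6,
"""


def extract(src, dst, search_string):
    """Copy the first two columns of rows whose third column matches."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator='\n')
    header = next(reader, None)
    if header is not None:
        writer.writerow(header[:2])  # keep Country,Location as the header
    for row in reader:
        # guard against short or blank rows before indexing column 3
        if len(row) > 2 and row[2] == search_string:
            writer.writerow(row[:2])


out = io.StringIO()
extract(io.StringIO(data), out, "DDDD")
print(out.getvalue())
```

With real files, opening the output with newline='' avoids blank lines between rows on Windows.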
I have a very large .csv file (10GB) and want to extract rows based on different criteria in a tuple.
The fourth column of each row contains an IP address (IPAdd).
I need to extract only the rows with specific IPs.
I'm new to Python and would like to know how I can iterate over each IP in the tuple and write the matching rows to the WYE_Data.csv file.
A sample of the CSV file's content:
xxx,1234,abc,199.199.1.1,1,fghy,xxx
xxx,1234,abc,10.10.1.1,1,fghy,xxx
xxx,1234,abc,144.122.1.1,1,fghy,xxx
xxx,1234,abc,50.200.50.32,1,fghy,xxx
import csv
customers = csv.reader(open('data.csv', newline=''), delimiter=',')
## This is the line I'm having issues with
IPAdd = ('199.199.1.1' or '144.122.1.1' or '22.22.36.22')
csvout = csv.writer(open('WYE_Data.csv', 'a', newline=''))
for customer in customers:
    if customer[3] == IPAdd:
        csvout.writerow(customer)
I recommend that you use a list of the values you want to match for IP.
ips = ['199.199.1.1', '144.122.1.1', '22.22.36.22']
Then you can say:
if customer[3] in ips:
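The original comparison never matches the second or third address because `or` between non-empty strings simply returns the first one. A short check, using the addresses and sample rows from the question:

```python
# `or` returns its first truthy operand, so this is just one string
IPAdd = ('199.199.1.1' or '144.122.1.1' or '22.22.36.22')
print(IPAdd)  # 199.199.1.1

# A collection plus `in` tests against every address
ips = ['199.199.1.1', '144.122.1.1', '22.22.36.22']
rows = [
    ['xxx', '1234', 'abc', '199.199.1.1', '1', 'fghy', 'xxx'],
    ['xxx', '1234', 'abc', '10.10.1.1', '1', 'fghy', 'xxx'],
    ['xxx', '1234', 'abc', '144.122.1.1', '1', 'fghy', 'xxx'],
]
matches = [row for row in rows if row[3] in ips]
print(len(matches))  # 2
```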
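import csv
look_for = {'199.199.1.1', '144.122.1.1', '22.22.36.22'}
# Python 3: open CSV files in text mode with newline=''
with open('data.csv', newline='') as inf, open('wye_data.csv', 'w', newline='') as outf:
    incsv = csv.reader(inf, delimiter=',')
    outcsv = csv.writer(outf, delimiter=',')
    # the generator streams rows one at a time, so the 10GB file is never fully loaded
    outcsv.writerows(row for row in incsv if row[3] in look_for)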