Handling new line characters in CSV using python - python

I am trying to upload a CSV file into a an SQL Server database using python.I am not able to handle new line characters.The file behaves differently in MS Excel and Noptepad++
The following is an example of CSV file that contains new line characters.The file looks like this in notepad++ but it looks like this in excel.
The text breaks into two parts in column C.
I tried to handle newline characters like this
wfile = open(UploadFile, "rU")
reader = csv.reader(wfile,delimiter = ",",dialect='excel')
with open(UploadFile, "r") as uploadData:
formatter_string = "%d/%m/%y %H:%M"
for row in reader:
datetime_object = datetime.strptime(row[9], formatter_string)
row[9] = str(datetime_object.date())
cursor.execute("insert into "+UploadTable+" values ("+(row[9])+","+(row[0])+","+(row[1])+","+(row[2])+","+(row[8])+","+(row[3])+","+(row[4])+","+(row[5])+","+(row[6])+","+(row[7])+")")
I read this here
When I tried to upload this file I got this error
Failed to upload Facebook data.
list index out of range
I am not sure what is going wrong.
stack trace:
Traceback (most recent call last):
File "G:/P/14. Digital metrics - Phase 2/3. Execution/4. Code/Code Final/Unmetric Post level/test.py", line 66, in <module>
FBPostupload(os.getcwd()+'\FB_Unm_postlevel_camp_mapping.csv','Unm_Fb_Posts_Stage1_test')
File "G:/P/14. Digital metrics - Phase 2/3. Execution/4. Code/Code Final/Unmetric Post level/test.py", line 53, in FBPostupload
datetime_object = datetime.strptime(row[9], formatter_string)
IndexError: list index out of range

Related

Python OSError: [Errno 9] Bad file descriptor after opening big json file

I just tried to read in a big json file (the Wikipedia json dump) in Python line by line and got the Error:
Traceback (most recent call last):
File "C:/.../test_json_wiki_file.py", line 19, in <module>
test_fct()
File "C:/.../test_json_wiki_file.py", line 12, in test_fct
for line in f:
OSError: [Errno 9] Bad file descriptor
Here is my code:
import json
def test_fct():
data = []
i = 0
with open('E:/.../20200713.json/20200713.json') as f:
for line in f:
data.append(json.loads(line))
i = i + 1
if i > 1:
input_file.close()
return data
test_data = test_fct()
The file size is around 700GB and the description (https://www.wikidata.org/wiki/Wikidata:Database_download) of the file states that it can be read line by line. I don't know if this is important but the E:/ hard drive is an external one.
Thank you for your help in advance :)
I don't have any firsthand knowledge on opening large files in python, but did you mean to have the path as 20200713.json/20200713.json. Is the first one actually a directory that has a .json extension? I'd also suggest trying to first load a smaller sample of the file (opening might be hard, so maybe just use the more command in terminal?).

How to append keywords to IPTC data in a JPG image?

I'm trying to add keywords to the IPTC data in a JPG file and failing miserably. I'm able to read in the keywords using the iptcinfo3 library and, seemingly, append the keyword to the list of current keywords but I'm failing when trying to write those keywords back to the JPG file, if not sooner. The error message is a bit unclear to me and may actually reference the appending of the new keyword (although a print statement seems to indicate it took).
I've tried three different metadata libraries (there doesn't seem to be one standard) and this is the furthest I've gotten with any of them (failing to even install one and not being able to get a second one to run). This seems so basic but I can't figure it out and haven't been able to adapt the few other code examples I've seen online to work, including iptcinfo3's example code fragment.
The current Error message is:
| => pipenv run python editMetadata.py
WARNING: problems with charset recognition (b'\x1b')
[b'Gus']
[b'Gus', b'frog']
Traceback (most recent call last):
File "editMetadata.py", line 22, in <module>
info.save_as('Gus2.jpg')
File "/Users/Scott/.local/share/virtualenvs/editPhotoMetadata-tx0JAOmI/lib/python3.7/site-packages/iptcinfo3.py", line 635, in save_as
jpeg_parts = jpeg_collect_file_parts(fh)
File "/Users/Scott/.local/share/virtualenvs/editPhotoMetadata-tx0JAOmI/lib/python3.7/site-packages/iptcinfo3.py", line 324, in jpeg_collect_file_parts
adobeParts = collect_adobe_parts(partdata)
File "/Users/Scott/.local/share/virtualenvs/editPhotoMetadata-tx0JAOmI/lib/python3.7/site-packages/iptcinfo3.py", line 433, in collect_adobe_parts
out = [''.join(out)]
TypeError: sequence item 0: expected str instance, bytes found
Code:
from iptcinfo3 import IPTCInfo
import os
# Create new info object
info = IPTCInfo('Gus.jpg')
# Print list of keywords
print(info['keywords'])
# Append the keyword I want to add
info['keywords'].append(b'frog')
# Print to test keyword has been added
print(info['keywords'])
# Save new info to file
info.save()
info.save_as('Gus2.jpg')
Instead of appending use equal "="
from iptcinfo3 import IPTCInfo
info = IPTCInfo('Gus.jpg')
print(info['keywords'])
# add keyword
info['keywords'] = ['new keyword']
info.save()
info.save_as('Gus_2.jpg')
I have the same error. It seems to be an issue with the save depending on the file.
from iptcinfo3 import IPTCInfo
info = IPTCInfo('image.jpg', force=True)
info.save()
Which gives me the same error.
WARNING: problems with charset recognition (b'\x1b')
WARNING: problems with charset recognition (b'\x1b')
Traceback (most recent call last):
File "./searchimages.py", line 123, in <module>
main(sys.argv[1:])
File "./searchimages.py", line 119, in main
find_photos(str(sys.argv[1]))
File "./searchimages.py", line 46, in find_photos
write_keywords(image, current_keywords, new_keywords)
File "./searchimages.py", line 109, in write_keywords
info.save_as('out.jpg')
File "/usr/local/lib/python3.7/site-packages/iptcinfo3.py", line 635, in save_as
jpeg_parts = jpeg_collect_file_parts(fh)
File "/usr/local/lib/python3.7/site-packages/iptcinfo3.py", line 324, in jpeg_collect_file_parts
adobeParts = collect_adobe_parts(partdata)
File "/usr/local/lib/python3.7/site-packages/iptcinfo3.py", line 433, in collect_adobe_parts
out = [''.join(out)]
TypeError: sequence item 0: expected str instance, bytes found

CSV file gives error on reading with open() in python 2.7

import unicodecsv
engagement_file=r'G:\college\udacity\intro to data analitics\datasets\daily_engagement.csv'
enrollment_file=r'G:\college\udacity\intro to data analitics\datasets\enrollments.csv'
project_submissions_file=r'G:\college\udacity\intro to data analitics\datasets\project_submissions.csv'
def csv_to_list(csv_file):
with open(csv_file,'rb') as f:
reader=unicodecsv.DictReader(f)
return list(reader)
daily_engagement=csv_to_list(engagement_file)
enrollment=csv_to_list(enrollment_file)
project_submissions=csv_to_list(project_submissions_file)
on executing this piece of code I get following errors
Traceback (most recent call last):
File "G:\college\udacity\intro to data analitics\data_analytis_csv_to_list.py", line 10, in <module>
daily_engagement=csv_to_list(engagement_file)
File "G:\college\udacity\intro to data analitics\data_analytis_csv_to_list.py", line 8, in csv_to_list
return list(reader)
File "C:\ProgramData\Anaconda2\lib\site-packages\unicodecsv\py2.py", line 217, in next
row = csv.DictReader.next(self)
File "C:\ProgramData\Anaconda2\lib\csv.py", line 108, in next
row = self.reader.next()
File "C:\ProgramData\Anaconda2\lib\site-packages\unicodecsv\py2.py", line 117, in next
row = self.reader.next()
ValueError: I/O operation on closed file
I dont know how to solve it ,I m new to python
thanks in advance
When using with open() as f: in python the file f is only open inside the with clause. That is the point of using it; it provides automatic file closing and cleaning in a easy and readable way.
If you want to work on the file either open it without the with clause (that is plain opening a file) or do the operations on that file inside the clause, calling it directly as f.
You need to move your return under your with statement. Once control flow has gone out of the with statement, Python automatically closes the file for you. That means any file I/O you have to do needs to be done under the context manager:
def csv_to_list(csv_file):
with open(csv_file,'rb') as f:
reader = unicodecsv.DictReader(f)
return list(reader) # return the file under the context manager

Python reading from CSV file in python 2.7 but not python 3.6. What do i have to do to make it work in 3.6

I have some code which i coded in python 2.7, however I need it to work for 3.6 and when i run it i get this error and i am not sure why.
import csv
def ReadFromFile():
with open('File.csv', 'r') as File:
cr = csv.reader(File)
for row in cr:
Name = row[0]
Gender = row[1]
print(Name + Gender)
Traceback (most recent call last):
File "<pyshell#0>", line 1, in <module>
ReadFromFile()
File "F:/Test.py", line 6, in ReadFromFile
Name = row[0]
IndexError: list index out of range
I am using the same code saved on a memory stick with the file in 2.7 i get my desired out come of it being read but in 3.6 i am stuck with the error. Thanks for any help
Edit: Added Print
After adding print i got
ELIZABETHFemale
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
ReadFromFile()
File "F:/Test.py", line 6, in ReadFromFile
Name = row[0]
IndexError: list index out of range
So it gave me the first line but nothing more
Python's CSV module has changed how it wants the files you pass to it to be opened. You want to avoid the file object doing any newline transformation because some CSV formats allow embedded newlines within quoted fields. The csv module will do its own newline normalization, so the usual universal newline handling the file object does is redundant.
This is mentioned in the csv.reader documentation, where it is talking about the file argument:
If csvfile is a file object, it should be opened with newline=''.
So for your code, try changing open('File.csv', 'r') to open('File.csv', 'r', newline='').
Have you tried pandas?
I think you may want to use something like
import pandas as pd
def ReadFromFile():
df = pd.read_csv('File.csv')
for row in df:
Name = row[0]
Gender = row[1]
print(Name + Gender)

Issue with csv module in Python

I'm having an issue using Python's csv module. I created a simple file in Excel containing two columns (names and ages of people) and saved it as a csv file. I then ran the following lines in Python:
import csv
csv_rader = csv.DictReader(open('people.csv'))
people = list(csv_reader)
and I got the following error:
Traceback (most recent call last):
File "", line 1, in
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 107, in next
self.fieldnames
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 90, in fieldnames
self._fieldnames = self.reader.next()
_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
Does anyone know what could be causing this, and how to go about fixing it?

Categories