I am extracting data from an Oracle 11g database using Python and writing it to an Excel file. During extraction, I'm using a Python list of tuples (each tuple represents one row in the dataset) and the openpyxl module to write the data into Excel. It works fine for some datasets, but for others it throws the exception:
openpyxl.utils.exceptions.IllegalCharacterError
This is the solution I've already tried:
Openpyxl.utils.exceptions.IllegalcharacterError
Here is my Code:
for i in range(0, len(list)):
    for j in range(0, len(header)):
        worksheet_ntn.cell(row=i+2, column=j+1).value = list[i][j]
Here is the error message:
raise IllegalCharacterError
openpyxl.utils.exceptions.IllegalCharacterError
I got this error because of some hex characters in some of my strings:
'Suport\x1f_01'
The encode/decode solutions mess with the accented words too.
So...
I resolved this with repr():
value = repr(value)
That gives a safe representation, with quotation marks.
Then I remove the first and last characters:
value = repr(value)[1:-1]
Now you can safely insert value into your cell.
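For example, with the string from above:
>>> value = 'Suport\x1f_01'
>>> repr(value)
"'Suport\\x1f_01'"
>>> repr(value)[1:-1]
'Suport\\x1f_01'
The control character is now a literal backslash escape (\x1f) rather than the raw character, which openpyxl accepts.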
The exception tells you everything you need to know: you must replace the characters that cause it. This can be done using re.sub(), but only you can decide what to replace them with: spaces, empty strings, etc.
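A minimal sketch, assuming you simply want to strip the offending characters (the character class below mirrors the control characters openpyxl rejects: everything below 0x20 except tab, newline, and carriage return):

import re

# Control characters that openpyxl refuses in cell values.
ILLEGAL = re.compile(r'[\000-\010\013\014\016-\037]')

def sanitize(value):
    # Only strings can contain illegal characters; pass other types through.
    if isinstance(value, str):
        return ILLEGAL.sub('', value)
    return value

worksheet_ntn.cell(row=i+2, column=j+1).value = sanitize(list[i][j])

The last line shows it dropped into the loop from the question.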
I read in a simple JSON text file and it parses to a dict just fine.
>>> data.keys()
dict_keys(['metadata', 'value'])
I want to get specific elements and I typically use the dpath package. However, in this case I get an error which seems to imply that I have empty string keys somewhere:
dpath.util.get(data, 'metadata', separator='..')
InvalidKeyName: Empty string keys not allowed without dpath.options.ALLOW_EMPTY_STRING_KEYS=True
I don't see any empty string keys, only the two above. I can reproduce with some other seemingly random JSON text files but for others it works just fine. Any idea what is going on here?
Searching this error message in the library's codebase finds dpath/path.py:88:
for (k, v) in iteritems:
    if issubclass(k.__class__, (string_class)):
        if (not k) and (not dpath.options.ALLOW_EMPTY_STRING_KEYS):
            raise dpath.exceptions.InvalidKeyName("Empty string keys not allowed without "
                                                  "dpath.options.ALLOW_EMPTY_STRING_KEYS=True")
So, this error is raised when your data structure contains an empty string as a key, possibly nested somewhere below the top-level keys you inspected.
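If the empty keys are legitimate, you can enable the option named in the error message; otherwise, a small helper (hypothetical, not part of dpath) can locate them:

import dpath.options
import dpath.util

# Option 1: accept empty string keys, as the error message suggests.
dpath.options.ALLOW_EMPTY_STRING_KEYS = True
result = dpath.util.get(data, 'metadata', separator='..')

# Option 2: walk the structure to find where the empty keys live.
def find_empty_keys(obj, path='<root>'):
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == '':
                print('empty key under', path)
            find_empty_keys(v, '%s/%s' % (path, k))
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            find_empty_keys(v, '%s/%d' % (path, i))

find_empty_keys(data)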
I am trying to read a JSON file which includes a number of tweets, but I get the following error.
OverflowError: int too large to convert
The script filters multiple JSON files to get specific tweets, and it crashes when it reaches a particular file.
The line that creates the error is this one :
df_temp = pd.read_json(path_or_buf=json_path, lines=True)
Just store the user id as a string, and treat it like one (this is actually what you should do when dealing with this kind of id). If you can't change the JSON input format, you can always treat the input as text before parsing it as a JSON object, and add quotes around the id using, for instance, regexes: Regex in python.
I don't know which library you are parsing the JSON with, but maybe implicit casting will also work: either try a "getString"-style method on the number instead of a "getInt"-style one, or force Python to treat the value as a string, with something like x = "" + json.getId().
Python is pretty loose on typing and may let you do it.
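As a sketch of the regex approach (the field name "id" is an assumption; adjust it to whatever key holds the oversized integer):

import io
import re
import pandas as pd

def quote_big_ints(line):
    # Wrap 15+ digit integer ids in quotes so pandas parses them as strings.
    return re.sub(r'"id":\s*(\d{15,})', r'"id": "\1"', line)

with open(json_path) as f:
    fixed = ''.join(quote_big_ints(line) for line in f)

df_temp = pd.read_json(path_or_buf=io.StringIO(fixed), lines=True)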
I just started using Python and am trying to convert some of my R code into Python. The task is relatively simple; I have many csv files with a variable name (in this case cell lines) and values (IC50s). I need to pull out all variables and their values shared in common among all files. Some of these files share the same variables but are formatted differently: in some files a variable is just "Cell_line" and in others it is MEL:Cell_line. So, first things first, to make a direct string comparison I need to format them the same, and hence I am trying to use str.split() to do so. There is probably a much better way to do this, but for now I am using the following code:
import csv
import os

# Change working directory
os.chdir("/Users/joshuamannheimer/downloads")
file_name="NCI60_Bleomycin.csv"
with open(file_name) as csvfile:
    NCI_data=csv.reader(csvfile, delimiter=',')
    alldata={}
    for row in NCI_data:
        name_str=row[0]
        splt=name_str.split(':')
        n_name=splt[1]
        alldata[n_name]=row[1]
name_str.split(':') returns a list of length 2. Since the portion I want is after the ":", I want the second element, which should be indexed as splt[1], since splt[0] is the first in Python. However, when I run the code I get this error message: "IndexError: list index out of range".
I'm trying to take the second element out of a list of length 2, so I have no idea why it is out of range. Any help or suggestions would be appreciated.
I am pretty sure there are some rows where name_str does not have a ":" in it. From your own example, if name_str is Cell_line it would fail.
If you are sure there would be at most one ":" in name_str, or if, when there are multiple ":", you want to select the last part, you should use splt[-1] instead of splt[1]. The -1 index takes the last element in the list (unless it's empty).
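For example, splt[-1] handles both of the formats you describe:
>>> "MEL:Cell_line".split(':')[-1]
'Cell_line'
>>> "Cell_line".split(':')[-1]
'Cell_line'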
The simple answer is that sometimes the data doesn't follow the specification the code assumes (i.e. that there is a colon and two fields).
The easiest way to deal with this is to add an if block, if len(splt)==2:, and do the subsequent lines within that block.
Optionally, add an else: and print the lines that are not to spec, or save them somewhere so you can diagnose them.
Like this:
import csv
import os

# Change working directory
os.chdir("/Users/joshuamannheimer/downloads")
file_name="NCI60_Bleomycin.csv"
with open(file_name) as csvfile:
    NCI_data=csv.reader(csvfile, delimiter=',')
    alldata={}
    for row in NCI_data:
        name_str=row[0]
        splt=name_str.split(':')
        if len(splt)==2:
            n_name=splt[1]
            alldata[n_name]=row
        else:
            print "invalid name: "+name_str
Alternatively, you can use try/except, which in this case is a bit more robust: we can handle an IndexError anywhere, whether in row[0] or in splt[1], with one exception handler, and we don't have to specify that the ":"-split result should have length 2.
In addition, we can explicitly check that there actually is a ":" before splitting, and assign the name appropriately.
import csv
import os

# Change working directory
os.chdir("/Users/joshuamannheimer/downloads")
file_name="NCI60_Bleomycin.csv"
with open(file_name) as csvfile:
    NCI_data=csv.reader(csvfile, delimiter=',')
    alldata={}
    for row in NCI_data:
        try:
            name_str=row[0]
            if ':' in name_str:
                splt=name_str.split(':')
                n_name=splt[1]
            else:
                n_name = name_str
            alldata[n_name]=row
        except IndexError:
            print "bad row:"+str(row)
Warning: I'm a total newbie; apologies if I didn't search for the right thing before submitting this question. I found lots on how to ignore errors, but nothing quite like what I'm trying to do here.
I have a simple script that I'm using to grab data off a database, parse some fields apart, and re-write the parsed values back to the database. Multiple users are submitting to the database according to a delimited template, but there is some degree of non-compliance, meaning sometimes the string won't contain all/any delimiters. My script needs to be able to handle those instances by throwing them out entirely.
I'm having trouble throwing out non-compliant strings, rather than just ignoring the errors they raise. When I've tried try-except-pass, I've ended up getting errors when my script attempts to append parsed values into the array I'm ultimately writing back to the db.
Originally, my script said:
def parse_comments(comments):
    parts = comments.split("||")
    if len(parts) < 20:
        raise ValueError("Comment didn't have enough || delimiters")
    return Result._make([parts[i].strip() for i in xrange(2, 21, 3)])
Fully compliant uploads would append Result to an array and write back to db.
I've tried try/except:
def parse_comments(comments):
    parts = comments.split("||")
    try:
        Thing._make([parts[i].strip() for i in xrange(2, 21, 3)])
    except:
        pass
    return Thing
But I end up getting an error when I try to append the parsed values to an array -- specifically TypeError: 'type' object has no attribute '__getitem__'
I've also tried:
def parse_comments(comments):
    parts = comments.split("||")
    if len(parts) >= 20:
        Thing._make([parts[i].strip() for i in xrange(2, 21, 3)])
    else:
        pass
    return Thing
but to no avail.
tl;dr: I need to parse stuff and append parsed items. If a string can't be parsed how I want it, I want my code to ignore that string entirely and move on.
But I end up getting an error when I try to append the parsed values to an array -- specifically TypeError: 'type' object has no attribute '__getitem__'
Because Thing means the Thing class itself, not an instance of that class.
You need to think more clearly about what you want to return when the data is invalid. It may be the case that you can't return anything directly usable here, so that the calling code has to explicitly check.
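One common pattern is to return None for invalid data and have the caller skip those rows explicitly. A sketch (the loop and the results list are stand-ins for your own code):

def parse_comments(comments):
    parts = comments.split("||")
    if len(parts) < 20:
        return None  # non-compliant string: signal "no result" to the caller
    return Thing._make([parts[i].strip() for i in xrange(2, 21, 3)])

results = []
for comments in all_comments:  # all_comments stands in for your db rows
    thing = parse_comments(comments)
    if thing is not None:
        results.append(thing)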
I am not sure I understand everything you want to do, but I think you are not catching the error in the right place. You said yourself that it arises when you append the value to an array. So maybe you should do:
try:
    # append the parsed values to an array, e.g.:
    results.append(parse_comments(comments))
except TypeError:
    pass
You should give the exception type to catch after except; otherwise it will catch any exception, even the user's CTRL+C, which raises a KeyboardInterrupt.
Issue
The code does not correctly identify the input (item). It simply falls through to my failure message, even when a matching value exists in the CSV file. Can anyone help me determine what I am doing wrong?
Background
I am working on a small program that asks for user input (function not given here), searches a specific column in a CSV file (Item) and returns the entire row. The CSV data format is shown below. I have shortened the data from the actual amount (49 field names, 18000+ rows).
Code
import csv
from collections import namedtuple
from contextlib import closing

def search():
    item = 1000001
    raw_data = 'active_sanitized.csv'
    failure = 'No matching item could be found with that item code. Please try again.'
    check = False
    with closing(open(raw_data, newline='')) as open_data:
        read_data = csv.DictReader(open_data, delimiter=';')
        item_data = namedtuple('item_data', read_data.fieldnames)
        while check == False:
            for row in map(item_data._make, read_data):
                if row.Item == item:
                    return row
                else:
                    return failure
CSV structure
active_sanitized.csv
Item;Name;Cost;Qty;Price;Description
1000001;Name here:1;1001;1;11;Item description here:1
1000002;Name here:2;1002;2;22;Item description here:2
1000003;Name here:3;1003;3;33;Item description here:3
1000004;Name here:4;1004;4;44;Item description here:4
1000005;Name here:5;1005;5;55;Item description here:5
1000006;Name here:6;1006;6;66;Item description here:6
1000007;Name here:7;1007;7;77;Item description here:7
1000008;Name here:8;1008;8;88;Item description here:8
1000009;Name here:9;1009;9;99;Item description here:9
Notes
My experience with Python is relatively limited, but I thought this would be a good problem to start with in order to learn more.
I worked out how to open the CSV file (wrapped with closing so it is closed afterwards), read the data via DictReader (to get the field names), and then create a namedtuple to be able to quickly select the desired columns for the output (Item, Cost, Price, Name). Column order is important, hence the use of DictReader and namedtuple.
While there is the possibility of hard-coding each of the field names, I felt that if the program can read them on file open, it would be much more helpful when working on similar files that have the same column names but different column organization.
Research
CSV Header and named tuple:
What is the pythonic way to read CSV file data as rows of namedtuples?
Converting CSV data to tuple: How to split a CSV row so row[0] is the name and any remaining items are a tuple?
There were additional links of research, but I cannot post more than two.
You have three problems with this:
You return on the first failure, so it will never get past the first line.
You are reading strings from the file, and comparing to an int.
_make iterates over the dictionary keys, not the values, producing the wrong result (item_data(Item='Name', Name='Price', Cost='Qty', Qty='Item', Price='Cost', Description='Description')).
for row in (item_data(**data) for data in read_data):
    if row.Item == str(item):
        return row
return failure
This fixes the issues at hand - we check against a string, and we only return if none of the items matched (although you might want to begin converting the strings to ints in the data rather than this hackish fix for the string/int issue).
I have also changed the way you are looping - using a generator expression makes for a more natural syntax, using the normal construction syntax for named attributes from a dict. This is cleaner and more readable than using _make and map(). It also fixes problem 3.
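As a sketch of the int-conversion alternative mentioned above (assuming every Item field in the file is numeric):

for row in (item_data(**data) for data in read_data):
    # Compare as integers instead of coercing the search key to a string.
    if int(row.Item) == item:
        return row
return failure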