Dataset loading error in Python in Jupyter


import pandas as pd
import numpy as ny
studentPerfomance = 'C:\Users\Vignesh\Desktop\project\students-performance-in-exams\StudentsPerformance.csv'
Error:
File "<ipython-input-10-056bf84aaa71>", line 1
studentPerfomance = 'C:\Users\Vignesh\Desktop\project\students-performance-in-exams\StudentsPerformance.csv'
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

Use the standard forward slash / instead of the backslash. Backslashes are a poor choice for separating folders in Python source code, even though Windows still uses them as the standard way to display paths.
The problem with the backslash is that it starts escape sequences such as \n (new line) or \t (tab). In this path, \U starts a Unicode escape that must be followed by 8 hexadecimal digits, which is what the error message complains about.
So the solution is to replace all backslashes with forward slashes:
import pandas as pd
import numpy as ny
studentPerfomance = 'C:/Users/Vignesh/Desktop/project/students-performance-in-exams/StudentsPerformance.csv'
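The invalid \U escape is rejected when the cell's source is compiled, so the error appears before pandas ever sees the path. Here is a minimal sketch that checks both spellings with the built-in compile(). The path is shortened for illustration, and `compiles` is a hypothetical helper name.

```python
# Source code for the two assignments, written as raw strings so that the
# backslashes reach compile() exactly as they would appear in a notebook cell.
bad_src = r"path = 'C:\Users\Vignesh\Desktop\StudentsPerformance.csv'"
good_src = r"path = 'C:/Users/Vignesh/Desktop/StudentsPerformance.csv'"


def compiles(src):
    """Hypothetical helper: True if src is valid Python source, False on SyntaxError."""
    try:
        compile(src, "<cell>", "exec")
        return True
    except SyntaxError as exc:
        print("SyntaxError:", exc.msg)
        return False


print(compiles(bad_src))   # prints the SyntaxError message, then False
print(compiles(good_src))  # True
```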

The problem is that the path is written as a normal string literal, in which backslashes start escape sequences.
Put r before the string to turn it into a raw string:
studentPerfomance = r'C:\Users\Vignesh\Desktop\project\students-performance-in-exams\StudentsPerformance.csv'
or
studentPerfomance = 'C:\\Users\\Vignesh\\Desktop\\project\\students-performance-in-exams\\StudentsPerformance.csv'
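A quick check shows that the raw and doubled spellings produce the same string, and that swapping the separators gives the forward-slash version. It uses only the path from the question and opens nothing.

```python
raw = r'C:\Users\Vignesh\Desktop\project\students-performance-in-exams\StudentsPerformance.csv'
doubled = 'C:\\Users\\Vignesh\\Desktop\\project\\students-performance-in-exams\\StudentsPerformance.csv'
forward = 'C:/Users/Vignesh/Desktop/project/students-performance-in-exams/StudentsPerformance.csv'

print(raw == doubled)                      # True: each "\\" in a normal string is one backslash
print(raw.count('\\'))                     # 6 separators in the path
print(raw.replace('\\', '/') == forward)   # True
```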

In general, there is nothing wrong with what you did, and it is good that your path contains no spaces, since spaces in paths often cause trouble. The issue is that the backslashes (\) in your studentPerfomance string start escape sequences in Python. Every time Python sees a \ in a normal string literal, it tries to interpret it together with the following characters, and \U must be followed by 8 hexadecimal digits.
That said, Windows uses backslashes in system paths instead of the forward slashes used by Linux-based operating systems, which causes users extra pain.
The best way to fix this issue is to prefix your string with an r, like so:
studentPerfomance = r'C:\Users\Vignesh\Desktop\project\students-performance-in-exams\StudentsPerformance.csv'
This tells Python to treat backslashes as ordinary characters, so they are not interpreted as escape sequences.

Related

Reading .dat file through pandas.read_csv( ) gives unicode error

I have a set of .dat files and I'm not sure what type of data they carry (mostly not video or audio content; it should be a mix of integers, text and special characters). I learned that .dat files can be read into Python with pandas read_csv or read_table, so I tried the following:
DATA = pd.read_csv(r'file_path\Data.dat', header=None)
Below is the error I get:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
I've tried the solutions listed around the web, including Stack Overflow and blogs, and also tried the options below, but none of them worked:
Used double quotes for the file path instead of single quotes (pd.read_csv(r"filepath"))
Used double backslashes instead of single backslashes
Used forward slashes
Used a double backslash only at the beginning of the file path, something like C:\User\folder....
Tried a few encodings such as utf-8, ascii and latin-1; for all of the above, the error is "EmptyDataError: No columns to parse from file"
Tried without r in the read_csv argument. That didn't work either
Tried sep='\s+' and also set skiprows. No luck
One thing to mention is that one of my folder names contains numbers as well as text. Could that be causing this issue?
Can someone point out what I need to do? Thanks in advance.
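This question has no answer here. One possible diagnostic, which is an assumption and not something from the thread, follows from how pandas behaves: it raises "EmptyDataError: No columns to parse from file" when it finds nothing to parse, for example in an empty file. So it may help to look at the file's size and first bytes before choosing read_csv options. `peek_bytes` is a hypothetical helper name, and the demo writes a throwaway file so it runs anywhere.

```python
import os
import tempfile


def peek_bytes(path, n=64):
    """Hypothetical helper: return the file size and its first n bytes."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:  # binary mode: no decoding, so no encoding errors
        head = f.read(n)
    return size, head


# Demo on a temporary file; on Windows you would pass r'file_path\Data.dat' instead.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "Data.dat")
    with open(sample, "wb") as f:
        f.write(b"1,abc,#\n2,def,$\n")
    size, head = peek_bytes(sample)
    print(size, head)  # 16 b'1,abc,#\n2,def,$\n'
```

A size of 0 would explain the EmptyDataError. Readable text with a consistent delimiter suggests read_csv can work with the right sep. Mostly non-printable bytes suggest a binary format that read_csv is not designed for.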

Changing file path and need for raw? [duplicate]

This question already has answers here:
What exactly do "u" and "r" string prefixes do, and what are raw string literals?
(7 answers)
Closed 1 year ago.
import os
cwd = os.getcwd()
print("Current working directory: {0}".format(cwd))
# Print the type of the returned object
print("os.getcwd() returns an object of type: {0}".format(type(cwd)))
os.chdir(r"C:\Users\ghph0\AppData\Local\Programs\Python\Python39\Bootcamp\PDFs")
# Print the current working directory
print("Current working directory: {0}".format(os.getcwd()))
Hi all, I was changing my working directory so I could access specific files, and was then greeted with this error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
From there I did some research and was told that converting the string to a raw string would fix the problem. My questions are: why do I convert it to raw, what does that do, and why does it turn the file path a red colour? (Not really important, but I've never seen this before.) Picture below:
https://i.stack.imgur.com/4oHlC.png
Many thanks to anyone that can help.

Backslashes in strings have a specific meaning in Python and are translated by the interpreter. You have surely already encountered "\n". Despite taking two characters to type, that is actually a one-character string meaning "newline". A backslash in a normal string literal is interpreted this way whenever it forms a recognized escape sequence. In your particular case, you used "\U", which is how Python lets you type a Unicode code point as exactly 8 hex digits. "\U0001F600", for example, is the grinning face emoji. "\Users" does not supply 8 hex digits, hence the error.
Because regular expressions often need to use backslashes for other uses, Python introduced the "raw" string. In a raw string, backslashes are not interpreted. So, r"\n" is a two-character string containing a backslash and an "n". This is NOT a newline.
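Comparing lengths makes the difference visible. The last lines use a shortened form of the path from the question.

```python
normal = "\n"   # one character: a newline
raw = r"\n"     # two characters: a backslash and "n"
print(len(normal), len(raw))  # 1 2
print(raw == "\\n")           # True

# The raw string keeps every backslash, so the path splits cleanly on them.
path = r"C:\Users\ghph0\AppData"
print(path.split("\\"))       # ['C:', 'Users', 'ghph0', 'AppData']
```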
Windows paths often use backslashes, so raw strings are convenient there. As it turns out, the Windows file APIs also accept forward slashes, so you can use those as well.
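Python's pathlib can show the equivalence without touching the file system. PureWindowsPath applies Windows path rules on any operating system and accepts either separator. The path below is shortened from the question.

```python
from pathlib import PureWindowsPath

p1 = PureWindowsPath("C:/Users/ghph0/Bootcamp/PDFs")
p2 = PureWindowsPath(r"C:\Users\ghph0\Bootcamp\PDFs")

print(p1 == p2)   # True
print(str(p1))    # C:\Users\ghph0\Bootcamp\PDFs  (normalized to backslashes)
print(p1.parts)   # ('C:\\', 'Users', 'ghph0', 'Bootcamp', 'PDFs')
```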
As for the colour, that is probably just how your editor highlights raw strings; it has no effect on how the code runs.

Syntax error RunPython using xlwings, sys.path

OS Windows 10 Pro
Versions of xlwings, Excel, and Python (0.9.0, Office 365, Python 3.8.2)
I am new to using xlwings through VBA. I ran the exact code from a tutorial webpage in both VBA and Python, but it gives this error:
File "<string>", line 1
import sys, os; sys.path[0:0]=os.path.normcase(os.path.expandvars(r'C:\Users\User\Trial2;C:\Users\User\Trial2\Trial2.zip;C:\Users\User\Anaconda3\')).split(';'); import Trial2;Trial2.main()
SyntaxError: invalid syntax
I used the original VBA code, and the Python code I used is:
import xlwings as xw
##xw.sub # only required if you want to import it or run it via UDF Server
def main():
    wb = xw.Book.caller()
    wb.sheets[0].range("A1").value = "Hello xlwings!"
##xw.func
def hello(name):
    return "hello {0}".format(name)
if __name__ == "__main__":
    xw.Book("Trial2.xlsm").set_mock_caller()
    main()
I could barely find any clue about this problem, so I'm hoping someone can give me a solution.

I realize this is a long time after the initial question but I had the same issue and couldn't find an answer anywhere. After playing around (for much longer than I care to admit) I found the problem for me was that my .xlsm/.py file names contained a space. With no other changes, everything worked when I replaced the space with an underscore.

This is a quirk in Python's string literals. Even in raw strings, a backslash escapes the quote character, so r"ends in quote\"" is valid. It also means that raw strings can't end in a single backslash: r"ends in slash\" is a syntax error. If you need to end a string with a backslash, you can't use a raw string; "ends in slash\\" is okay.
I'm not sure where the failing string comes from, but you need to change it to
import sys, os; sys.path[0:0]=os.path.normcase(os.path.expandvars('C:\\Users\\User\\Trial2;C:\\Users\\User\\Trial2\\Trial2.zip;C:\\Users\\User\\Anaconda3\\')).split(';'); import Trial2;Trial2.main()
See Python Lexical Analysis
Even in a raw literal, quotes can be escaped with a backslash, but the backslash remains in the result; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw literal cannot end in a single backslash (since the backslash would escape the following quote character).
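The rule can be checked with compile(). A common workaround is to leave the trailing backslash out of the raw literal and append it separately. This is a sketch using paths shortened from the error message, and `is_valid` is a hypothetical helper name.

```python
def is_valid(src):
    """Hypothetical helper: True if src compiles as Python source."""
    try:
        compile(src, "<test>", "exec")
        return True
    except SyntaxError:
        return False


print(is_valid(r"s = r'C:\Users\User\Anaconda3\'"))   # False: the last \ escapes the quote
print(is_valid(r"s = r'C:\Users\User\Anaconda3\\'"))  # True: an even number of backslashes

# Workaround: raw literal without the trailing backslash, then add one separately.
entries = r'C:\Users\User\Trial2;C:\Users\User\Trial2\Trial2.zip;C:\Users\User\Anaconda3' + '\\'
print(entries.split(';')[-1])  # C:\Users\User\Anaconda3\
```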

Error "(unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape" [duplicate]

This question already has answers here:
How should I write a Windows path in a Python string literal?
(5 answers)
Closed 3 years ago.
I'm trying to read a CSV file into Python (Spyder), but I keep getting an error. My code:
import csv
data = open("C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener")
data = csv.reader(data)
print(data)
I get the following error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
I have tried replacing the \ with \\ or with /, and I've tried putting an r before "C.., but none of these worked.

This error occurs because you are using a normal string literal containing backslashes as a path. You can use one of the following three solutions to fix your problem:
1: Put r before your normal string. It converts a normal string to a raw string:
pandas.read_csv(r"C:\Users\DeePak\Desktop\myac.csv")
2: Use forward slashes:
pandas.read_csv("C:/Users/DeePak/Desktop/myac.csv")
3: Escape each backslash by doubling it:
pandas.read_csv("C:\\Users\\DeePak\\Desktop\\myac.csv")

The first backslash in your string is being interpreted as a special character. In fact, because it's followed by a "U", it's being interpreted as the start of a Unicode code point.
To fix this, you need to escape the backslashes in the string. The direct way to do this is by doubling the backslashes:
data = open("C:\\Users\\miche\\Documents\\school\\jaar2\\MIK\\2.6\\vektis_agb_zorgverlener")
If you don't want to escape backslashes in a string, and you don't have any need for escape codes or quotation marks in the string, you can instead use a "raw" string, using "r" just before it, like so:
data = open(r"C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener")

You can just put r in front of the string with your actual path, which denotes a raw string. For example:
data = open(r"C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener")

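Treat the path as a raw string: the simple answer is to add r before your Windows path. Also note that csv.reader returns an iterator, so print(data) only shows the reader object. Loop over it to print the rows:
import csv
with open(r"C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener", newline="") as f:
    data = csv.reader(f)
    for row in data:
        print(row)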

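Try writing the file path as "C:\\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener", i.e. with a double backslash after the drive, as opposed to "C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener". This removes the \U error, but the remaining single backslashes are still processed: \2 becomes the control character chr(2) and \v becomes a vertical tab, so the path is silently wrong. Doubling every backslash or using a raw string avoids that.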
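A quick check shows what the half-escaped literal actually contains. It uses only the tail of the path from the question, where the backslashes are not doubled.

```python
# Tail of the path with single backslashes: "\2" and "\v" are valid escapes,
# so Python silently converts them instead of raising an error.
half = "MIK\2.6\vektis_agb_zorgverlener"
raw = r"MIK\2.6\vektis_agb_zorgverlener"

print(repr(half))            # 'MIK\x02.6\x0bektis_agb_zorgverlener'
print(len(raw) - len(half))  # 2: each two-character escape became one character
```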

Add r before your string. It converts a normal string to a raw string.

As per String literals:
String literals can be enclosed within single quotes (i.e. '...') or double quotes (i.e. "..."). They can also be enclosed in matching groups of three single or double quotes (these are generally referred to as triple-quoted strings).
The backslash character (i.e. \) is used to escape characters which otherwise will have a special meaning, such as newline, backslash itself, or the quote character. String literals may optionally be prefixed with a letter r or R. Such strings are called raw strings and use different rules for backslash escape sequences.
In triple-quoted strings, unescaped newlines and quotes are allowed, except that the three unescaped quotes in a row terminate the string.
Unless an r or R prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C.
So ideally you need to replace the line:
data = open("C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener")
With any one of the following:
Using the raw prefix and single quotes (i.e. '...'):
data = open(r'C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener')
Using double quotes (i.e. "...") and escaping the backslash character (i.e. \):
data = open("C:\\Users\\miche\\Documents\\school\\jaar2\\MIK\\2.6\\vektis_agb_zorgverlener")
Using double quotes (i.e. "...") and the forward slash character (i.e. /):
data = open("C:/Users/miche/Documents/school/jaar2/MIK/2.6/vektis_agb_zorgverlener")

Just putting an r in front works well.
e.g.:
white = pd.read_csv(r"C:\Users\hydro\a.csv")

It worked for me by escaping the backslash: f = open('F:\\file.csv')

Doubling the backslashes (\\) works on Windows, but you still need to take care of the folders you mention in your path. All of them must already exist, and when reading, so must the file itself. Otherwise you will get a FileNotFoundError.
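Here is a small sketch of that failure and one way to avoid it. It uses a temporary directory so it runs anywhere, and the folder and file names are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical target inside a folder that does not exist yet.
    target = os.path.join(tmp, "missing_folder", "file.csv")

    try:
        open(target, "w").close()
        created = True
    except FileNotFoundError:  # raised because the parent folder is missing
        created = False
    print(created)  # False

    # Create the missing folders first, then the write succeeds.
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "w") as f:
        f.write("a,b\n1,2\n")
    exists_after = os.path.exists(target)
    print(exists_after)  # True
```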

Interpreting Unicode from Terminal

I'm having issues reading Unicode text from the shell into Python. I have a test document with the following metadata attribute:
kMDItemAuthors = (
"To\U0304ny\U0308 Sta\U030ark"
)
I see this when I run mdls -name kMDItemAuthors path/to/the/file
I am attempting to get this data into usable form within a Python script. However, I cannot get the Unicode represented text into actual Unicode in Python.
Here's what I am currently doing:
import unicodedata
import subprocess
import os
os.environ['LANG'] = 'en_US.UTF-8'
cmd = 'mdls -name kMDItemAuthors path/to/the/file'
proc = subprocess.Popen(cmd,
shell=True,
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
(stdout, stderr) = proc.communicate()
u = unicode(stdout, 'utf8')
a = unicodedata.normalize('NFC', u)
Now, when I print(a), I get exactly the same string representation as above. I have tried normalizing with all of the options (NFC, NFD, NFKC, NFKD), all with the same result.
The weirder thing is, when I try this code:
print('To\U0304ny\U0308 Sta\U030ark')
I get the following error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-7: truncated \UXXXXXXXX escape
So, when that sub-string is within the variable, there's no problem, but as a raw string, it creates an issue.
I had felt pretty confident in my understanding of Python and Unicode, but now the shell has broken me. Any help would be greatly appreciated.
PS. I am running all this in Python 2.7.X

You have multiple problems here.
Like all escape sequences, Python only interprets the \U sequence in string literals in your source code. If a file actually has a \ followed by a U in it, Python isn't going to treat that as anything other than a \ and a U, any more than it'll treat a \ followed by an n as a newline. If you want to unescape them manually, you can, by using the unicodeescape codec. (But note that this will treat your file as ASCII, not UTF-8. If you actually have both UTF-8 and \U sequences, you will have to decode it as UTF8, then encode it with unicodeescape, then decode it back with unicodeescape.)
A Python \U sequence requires 8 digits, not 4. If you only have 4, you have to use \u. So whatever program generated this string, its output can't be parsed with unicodeescape directly. You might be able to hack it into shape with a quick-and-dirty workaround like s.replace(r'\U', r'\U0000') or s.replace(r'\U', r'\u'), or you may have to write a simple parser for it.
In your test, you're trying to use \U escapes in a string literal. You can only do that in Unicode string literals, like print(u'To\U0304ny\U0308 Sta\U030ark'). (If you do that, of course, you'll get the previous error again.)
Also, since this appears to be a Mac, you probably shouldn't be doing os.environ['LANG'] = 'en_US.UTF-8'. If Python sees that it's on OS X, it assumes everything is UTF-8. Anything you do to try to force UTF-8 will probably do nothing, and could in theory confuse it so it doesn't notice it's on OS X. Unless you're trying to work around a driver program that intentionally sets the locale to "C" before calling your script, you're usually better off not doing this.

As mentioned in the other answers, here is a slightly more direct code example (Python 2):
>>> s="To\U0304ny\U0308 Sta\U030ark"
>>> s
'To\\U0304ny\\U0308 Sta\\U030ark'
>>> s.replace("\\U","\\u").decode("unicode-escape")
u'To\u0304ny\u0308 Sta\u030ark'
>>> print s.replace("\\U","\\u").decode("unicode-escape")
Tōnÿ Stårk
>>>
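In Python 3, str no longer has a .decode() method, so the same trick needs an explicit round trip through bytes. This sketch assumes that the mdls output is pure ASCII and that every \U in it really has exactly 4 hex digits. A genuine 8-digit \U escape would be broken by the replace.

```python
import unicodedata

# Text as mdls prints it: literal backslashes followed by 4 hex digits.
s = r"To\U0304ny\U0308 Sta\U030ark"

# \U expects 8 hex digits, so turn each \U into \u (4 digits) before decoding.
decoded = s.replace("\\U", "\\u").encode("ascii").decode("unicode_escape")

# NFC combines each base letter with the following combining mark (o + U+0304 -> U+014D, ...).
name = unicodedata.normalize("NFC", decoded)
print(name)                     # Tōnÿ Stårk
print(len(decoded), len(name))  # 13 10
```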

\U always takes exactly 8 hex digits and is needed for characters outside the BMP. For characters within the BMP, use \u, which takes exactly 4.
>>> print u'To\u0304ny\u0308 Sta\u030ark'
Tōnÿ Stårk
Python 3:
>>> print('To\u0304ny\u0308 Sta\u030ark')
Tōnÿ Stårk
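The digit-count rule can be checked in Python 3 with compile(). `is_valid` is a hypothetical helper name.

```python
def is_valid(src):
    """Hypothetical helper: True if src compiles as Python source."""
    try:
        compile(src, "<test>", "exec")
        return True
    except SyntaxError:
        return False


print(is_valid(r"x = '\U0304'"))  # False: \U needs 8 hex digits
print(is_valid(r"x = '\u0304'"))  # True: \u takes exactly 4
print("\U00000304" == "\u0304")   # True: same code point, zero-padded to 8 digits
print(len("\U0001F600"))          # 1: an emoji outside the BMP, written with \U
```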
