Pandas Read CSV for file address with \t in it - python

This may be a redundant question because I know that I can rename the file and solve the issue, but I'm still pretty new at this and it would be really useful information for the future. Thanks in advance to respondents!
So, I have a CSV file which is a table exported from SQL with the filename "t_SQLtable" located in a sub-folder of my working directory.
In order to open the file in Pandas I use the following command:
SQLfile= pd.read_csv('SUBFOLDER\t_SQLtable.csv', sep=',')
This is the error I receive:
FileNotFoundError: [Errno 2] File SUBFOLDER _SQLtable.csv does not exist: 'SUBFOLDER\t_SQLtable.csv'
My understanding is that Pandas is reading the <\t> as a tab and thus is not able to find the file, because that's not the file name it is looking for. But I don't know how to format the text in order to tell Pandas how to recognize the <t> as part of the filename. Would anyone know how to resolve this?
Thank you!

Use / as the folder separator; a forward slash never starts an escape sequence:
SQLfile = pd.read_csv('SUBFOLDER/t_SQLtable.csv', sep=',')
In future, if you want to keep \t in a string without it being read as a tab, use a raw string. Compare:
print('SUBFOLDER\t_SQLtable.csv')
print(r'SUBFOLDER\t_SQLtable.csv')
Output:
SUBFOLDER _SQLtable.csv
SUBFOLDER\t_SQLtable.csv
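To see why these fixes all point at the same file, here is a minimal sketch comparing the spellings. It only builds strings, so no file needs to exist; the variable names are illustrative.

```python
# Four ways to write the path; only the first one is broken.
broken = 'SUBFOLDER\t_SQLtable.csv'     # \t is turned into a tab character
escaped = 'SUBFOLDER\\t_SQLtable.csv'   # doubled backslash
raw = r'SUBFOLDER\t_SQLtable.csv'       # raw string, backslash kept as typed
forward = 'SUBFOLDER/t_SQLtable.csv'    # forward slash, nothing to escape

# The escaped and raw spellings are the same string.
print(escaped == raw)

# The broken spelling is one character shorter: tab instead of backslash + t.
print(len(broken), len(raw))
print('\t' in broken, '\t' in raw)
```

Windows accepts forward slashes in paths passed to Python file functions, so the last spelling is usually the least error-prone.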

Try either of these:
SQLfile= pd.read_csv('SUBFOLDER\\t_SQLtable.csv', sep=',')
SQLfile= pd.read_csv('SUBFOLDER/t_SQLtable.csv', sep=',')
If that doesn't work, build the path with os.path.join:
import os
file_path = os.path.join(os.getcwd(), "SUBFOLDER", "t_SQLtable.csv")
SQLfile= pd.read_csv(file_path, sep=',')
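The same idea can be sketched with pathlib, which joins components without any hand-typed separators. This is an illustration under the question's folder layout, not the only way to do it; the existence check is an added convenience, and the pandas call is left as a comment so the sketch runs without pandas installed.

```python
from pathlib import Path

# Join the components instead of typing separators by hand.
csv_path = Path.cwd() / "SUBFOLDER" / "t_SQLtable.csv"

# Check before reading so a wrong path produces a clear message.
if csv_path.is_file():
    print("found:", csv_path)
    # SQLfile = pd.read_csv(csv_path, sep=',')
else:
    print("missing:", csv_path)
```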

Simply do what you did before, except add an r right before the string:
SQLfile = pd.read_csv(r'SUBFOLDER\t_SQLtable.csv', sep=',')
Adding r before the opening quote makes Python treat the string as a raw string, so escape sequences such as \t are not evaluated.

Related

Python pandas csv file unicode error and stuffs

I'm trying to read a csv file on python. The code goes like this -
import pandas as pd
df = pd.read_csv("C:\Users\User\Desktop\Inan")
print(df.head())
However, it keeps showing a unicode error. I tried adding the r prefix and changing the slashes in several ways, but it didn't work; it just showed different errors such as "file not found". What can I do?
Try this; it may work:
df = pd.read_csv("C:/Users/User/Desktop/Inan.csv", encoding="utf-8")
Also include the file extension (.csv, .xlsx).
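For context on the "unicode error": in a normal Python 3 string literal, \U starts an eight-hex-digit Unicode escape, and the text that follows it in C:\Users is not valid hex. The sketch below reproduces the failure safely by compiling the offending line from a string; the variable names are illustrative.

```python
# The source text of the failing assignment, held in a raw string so that
# this example itself compiles.
source = r'path = "C:\Users\User\Desktop\Inan"'

error_message = None
try:
    compile(source, "<example>", "exec")
except SyntaxError as exc:
    error_message = str(exc)

# Shows the "(unicode error) ..." message raised for the \U escape.
print(error_message)

# The same literal with an r prefix compiles without complaint.
compile(r'path = r"C:\Users\User\Desktop\Inan"', "<example>", "exec")
```

Once the literal compiles, a remaining "file not found" error usually means the path itself is wrong, for example a missing extension.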

Python - open a file [duplicate]

def choose_option(self):
    if self.option_picker.currentRow() == 0:
        description = open(":/description_files/program_description.txt","r")
        self.information_shower.setText(description.read())
    elif self.option_picker.currentRow() == 1:
        requirements = open(":/description_files/requirements_for_client_data.txt", "r")
        self.information_shower.setText(requirements.read())
    elif self.option_picker.currentRow() == 2:
        menus = open(":/description_files/menus.txt", "r")
        self.information_shower.setText(menus.read())
I am using resource files, and something goes wrong when I pass one as the argument to the open function, but when I use them to load pictures and icons everything is fine.
That is not a valid file path. You must either use a full path
open(r"C:\description_files\program_description.txt","r")
Or a relative path
open("program_description.txt","r")
Add an 'r' prefix at the start of the path:
path = r"D:\Folder\file.txt"
That works for me.
I also ran into this error when I used open(file_path). In my case the cause was that file_path contained a special character such as "?" or "<".
I received the same error when trying to print an absolutely enormous dictionary. When I attempted to print just the keys of the dictionary, all was well!
In my case, I was using an invalid string prefix.
Wrong:
path = f"D:\Folder\file.txt"
Right:
path = r"D:\Folder\file.txt"
In my case the error was due to lack of permissions to the folder path. I entered and saved the credentials and the issue was solved.
I had the same problem
It happens because file names can't contain special characters such as ":", "?", ">", etc.
You should replace these characters using the replace() function:
filename = filename.replace("special character to replace", "-")
You should add one more backslash before the last component of the path, because \b on its own is read as a backspace escape. That is, change
open('C:\Python34\book.csv') to open('C:\Python34\\book.csv'). Doubling every backslash is safer still. For example:
import csv
with open('C:\\Python34\\book.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in spamreader:
        print(row)
Just use "/" as the separator in the file path:
open("description_files/program_description.txt","r")
On Windows with PyCharm: if the file path contains a sequence like \t, escape it with an additional backslash, as in \\t.
Use a raw string (an 'r' prefix) with single quotation marks and forward slashes, e.g.:
f = open(r'C:/Desktop/file.txt','r')
print(f.read())
I had special characters like '*' in my strings. For example, one location was named Varzea*, so when I built the file name with an f-string and tried to save it ('Varzea*.csv'), Windows complained. I just "sanitized" the string and everything went back to normal.
The best approach in my case was to keep the strings to letters only, without special characters.
For me this issue was caused by trying to write a datetime to file.
Note: this doesn't work:
myFile = open(str(datetime.now()),"a")
The string form of datetime.now() contains the colon (:) character, which Windows does not allow in file names.
To fix this, use a filename which avoids restricted special characters. Note this resource on detecting and replacing invalid characters:
https://stackoverflow.com/a/13593932/9053474
For completeness, replace unwanted characters with the following:
import re
re.sub(r'[^\w_. -]', '_', filename)
Note that these are the characters Windows restricts; invalid characters differ by platform.
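Putting the two suggestions together, a minimal sketch might look like this. The helper name sanitize_filename and the fixed timestamp are illustrative assumptions; the pattern is the one quoted above with the redundant underscore dropped, since \w already matches it.

```python
import re
from datetime import datetime


def sanitize_filename(name):
    """Replace anything except word characters, dot, space and hyphen."""
    return re.sub(r'[^\w. -]', '_', name)


# A fixed timestamp keeps the example reproducible; use datetime.now() in practice.
stamp = datetime(2020, 1, 2, 3, 4, 5)

unsafe_name = str(stamp)                         # contains colons
safe_name = sanitize_filename(unsafe_name)       # colons replaced by underscores
formatted = stamp.strftime("%Y-%m-%d_%H-%M-%S")  # avoids colons from the start

print(unsafe_name)
print(safe_name)
print(formatted)
```

Formatting the timestamp with strftime is often the simpler route, because the result never contains a restricted character in the first place.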
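import os
# Python 2 code: unicode() decodes the byte-string path. In Python 3, pass docs_dir as a str directly.
for folder, subs, files in os.walk(unicode(docs_dir, 'utf-8')):
    for filename in files:
        if not filename.startswith('.'):
            file_path = os.path.join(folder, filename)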
In my case, the problem existed because I had not set permissions for drive "C:\"; when I changed my path to another drive such as "F:\", the problem was resolved.
import pandas as pd
df = pd.read_excel ('C:/Users/yourlogin/new folder/file.xlsx')
print (df)
I got this error because an old server instance was still running and using the log file, so the new instance could not write to it. Deleting the log file resolved the issue.
When I copied the path by right-clicking the file -> Properties -> Security, I got the error. What worked was copying the path and the file name separately.
I faced the same issue while working with pandas and trying to open a big CSV file:
wrong_df = pd.read_csv("D:\Python Projects\ML\titanic.csv")
right_df = pd.read_csv("D:\Python Projects\ML\\titanic.csv")
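When a path fails like this, one quick diagnostic is to look for control characters that an escape sequence slipped in. The helper below is a hypothetical sketch, not part of pandas; the "wrong" string is assembled piece by piece to show what the first literal above actually contains.

```python
def find_control_chars(path):
    """Return (index, character) pairs for control characters in path."""
    return [(i, ch) for i, ch in enumerate(path) if ord(ch) < 32]


# What "D:\Python Projects\ML\titanic.csv" becomes: \t is read as a tab.
wrong = "D:\\Python Projects\\ML" + "\t" + "itanic.csv"
right = r"D:\Python Projects\ML\titanic.csv"

print(find_control_chars(wrong))  # one hit: the tab that replaced \t
print(find_control_chars(right))  # no hits
```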

How to open a .data file extension

I am working on side stuff where the data provided is in a .data file. How do I open a .data file to see what the data looks like and also how do I read from a .data file programmatically through python? I have Mac OSX
NOTE: The Data I am working with is for one of the KDD cup challenges
Kindly try using Notepad or Gedit to check delimiters in the file (.data files are text files too). After you have confirmed this, then you can use the read_csv method in the Pandas library in python.
import pandas as pd
file_path = "~/AI/datasets/wine/wine.data"
# above .data file is comma delimited
wine_data = pd.read_csv(file_path, delimiter=",")
It vastly depends on what is in it. It could be a binary file or it could be a text file.
If it is a text file then you can open it in the same way you open any file (f=open(filename,"r"))
If it is a binary file you can just add a "b" to the open command (open(filename,"rb")). There is an example here:
Reading binary file in Python and looping over each byte
Depending on the type of data in there, you might want to try passing it through a csv reader (csv python module) or an xml parsing library (an example of which is lxml)
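If you are unsure which case applies, a rough check is to read the first bytes in binary mode and look for a NUL byte, which rarely appears in plain text. This is only a heuristic sketch; the function name looks_binary and the sample size are assumptions, and the demo files are throwaway.

```python
import os
import tempfile


def looks_binary(path, sample_size=1024):
    """Heuristic: treat a file as binary if its first bytes contain NUL."""
    with open(path, "rb") as handle:
        return b"\x00" in handle.read(sample_size)


# Demo on two temporary files that are deleted afterwards.
with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "text.data")
    binary_path = os.path.join(tmp, "binary.data")
    with open(text_path, "w", encoding="utf-8") as handle:
        handle.write("a\tb\tc\n1\t2\t3\n")
    with open(binary_path, "wb") as handle:
        handle.write(bytes([0, 1, 2, 255]))
    text_result = looks_binary(text_path)
    binary_result = looks_binary(binary_path)

print(text_result, binary_result)
```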
After the further info above, and looking at the page, the format is:
Data Format
The datasets use a format similar as that of the text export format from relational databases:
One header lines with the variables names
One line per instance
Separator tabulation between the values
There are missing values (consecutive tabulations)
Therefore see this answer:
parsing a tab-separated file in Python
I would advise trying to process one line at a time rather than loading the whole file, but if you have the ram why not...
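A minimal sketch of line-at-a-time processing for the format described above (one header line, tab separators, consecutive tabs for missing values) could look like this. The sample contents are invented, and io.StringIO stands in for an opened file so the example runs on its own.

```python
import csv
import io

# Stand-in for open("file.data", newline=""); the contents are made up.
sample = io.StringIO("id\tage\tcity\n1\t34\tLyon\n2\t\tOslo\n")

reader = csv.reader(sample, delimiter="\t")
header = next(reader)  # first line holds the variable names

rows = []
for fields in reader:
    # Consecutive tabs yield an empty string; map it to None for "missing".
    record = {
        name: (value if value != "" else None)
        for name, value in zip(header, fields)
    }
    rows.append(record)

print(header)
print(rows)
```

With a real file, each record could be handled inside the loop instead of being appended to a list, which keeps memory use flat for very large files.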
I suspect it doesn't open in Sublime because the file is huge, but that is just a guess.
To get a quick overview of what the file may contain, you could do this within a terminal, using strings or cat, for example:
$ strings file.data
or
$ cat -v file.data
If you forget to pass the -v option to cat and the file is binary, you could mess up your terminal and need to reset it:
$ reset
I was just dealing with this issue myself so I thought I would share my answer. I have a .data file and was unable to open it by simply right-clicking it. macOS recommended I open it using Xcode, so I tried that, but it did not work.
Next I tried opening it using a program named "Brackets". It is a text editing program primarily used for HTML and CSS. Brackets did work.
I also tried PyCharm, as I am a Python programmer. PyCharm worked as well, and I was also able to read from the file using the following lines of code:
inf = open("processed-1.cleveland.data", "r")
lines = inf.readlines()
for line in lines:
    print(line, end="")
It works for me.
import pandas as pd
# define your file path here
your_data = pd.read_csv(file_path, sep=',')
your_data.head()
I mean, just treat it as a CSV file if it is separated with ','.
Solution from #mustious.

read_csv() & EOF character in string cause parsing issue

I am trying to read in 50 csv files from a zip file but keep getting
CParserError: Error tokenizing data. C error: EOF inside string starting at line 166
I know there is an error with reading a particular string within the data and can fix it manually, but I don't want to have to extract all the csv files manually to fix each one.
with zipfile.ZipFile(r'C:\Users\Austen\Anaconda\cs109_final\CA34.zip') as zf:
    for name in zf.namelist():
        container[name] = pd.read_csv(zf.open(name))
The problem I found is that there is a single ; in each csv file towards the end of the file. How would I ignore that?
With reference from:
https://github.com/pydata/pandas/issues/5500
Tried to add
container[name] = pd.read_csv(zf.open(name),skipfooter=4)
But I get 'unexpected end of data'
Would adding an option to read_csv fix the problem? I had a similar problem and it was fixed by adding the option quoting=csv.QUOTE_NONE
For example:
df = pd.read_csv(csvfile, header = None, delimiter="\t", quoting=csv.QUOTE_NONE, encoding='utf-8')
The second comment in this discussion talks about why:
https://github.com/pydata/pandas/issues/5500
Passing engine="python" solves the issue.
Reference: Most frequent errors
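To illustrate what quoting=csv.QUOTE_NONE changes, here is a stdlib-only sketch that reads every CSV member of an in-memory zip archive. It uses csv.reader rather than pandas so it runs without extra packages; the archive contents and member name are invented, and the same quoting constant can be passed to pd.read_csv as shown in the answer above.

```python
import csv
import io
import zipfile

# Build a small archive in memory; the second line has an unterminated quote.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("one.csv", 'id,comment\n1,"unterminated\n2,fine\n')

container = {}
with zipfile.ZipFile(buffer) as zf:
    for name in zf.namelist():
        with zf.open(name) as raw:
            # Members open as bytes, so wrap them to get text lines.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # QUOTE_NONE treats the quote character as ordinary data,
            # so a stray quote cannot swallow the rest of the file.
            container[name] = list(csv.reader(text, quoting=csv.QUOTE_NONE))

print(container["one.csv"])
```

The trade-off is that fields which really are quoted keep their quote characters, so this suits files where quotes are noise rather than structure.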

Pandas read excel with Chinese filename

I am trying to load as a pandas dataframe a file that has Chinese characters in its name.
I've tried:
df=pd.read_excel("url/某物2008.xls")
and
import sys
df=pd.read_excel("url/某物2008.xls", encoding=sys.getfilesystemencoding())
But the response is something like: "no such file or directory 'url/\xa1\xa92008.xls'"
I've also tried changing the names of the files using os.rename, but the filenames aren't even read properly (asking python to just print the filenames yields only question marks or squares).
df=pd.read_excel(u"url/某物2008.xls", encoding=sys.getfilesystemencoding())
This may work, but you may have to declare an encoding at the top of the file.
Try this for Unicode conversion:
df=pd.read_excel(u"url/某物2008.xls", encoding='utf-8')
