I downloaded this dataset and stored it in a folder called AutomobileDataset.
I cross-checked the working directory using:
import pandas as pd
import numpy as np
import os
os.chdir("/Users/madan/Desktop/ML/Datasets/AutomobileDataset")
os.getcwd()
Output:
'/Users/madan/Desktop/ML/Datasets/AutomobileDataset'
Then I tried reading the file using pandas:
import pandas as pd
import numpy as np
import os
os.chdir("/Users/madan/Desktop/ML/Datasets/AutomobileDataset")
os.getcwd()
automobile_data = pd.read_csv("AutomobileDataset.txt", sep = ',',
header = None, na_values = '?')
automobile_data.head()
Output:
---------------------------------------------------------------------------
ParserError: Error tokenizing data. C error: Expected 1 fields in line 2, saw 26
Someone please help me with this, I don't know where I am making a mistake.
You can try this:
import os
# Read in a plain text file
with open(os.path.join("c:user/xxx/xx", "xxx.txt"), "r") as f:
    text = f.read()
print(text)
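To expand on that: the message "Expected 1 fields in line 2, saw 26" usually means the parser took the first line to have a single field and then met 26 comma-separated values on the next one. Here is a minimal sketch of how to check for that and work around it, assuming the file starts with one stray non-data line. The sample text is made up and is not the real dataset:

```python
import io
import pandas as pd

# Made-up stand-in for the file: a one-field first line, then comma-separated rows.
sample = "Automobile data\n3,?,alfa-romero,gas\n1,158,audi,gas\n"

# Look at the raw first lines before handing the text to pandas.
first_lines = sample.splitlines()[:3]
print(first_lines)

# Reading as-is reproduces the same kind of tokenizing error.
try:
    pd.read_csv(io.StringIO(sample), sep=",", header=None, na_values="?")
    raised_parser_error = False
except pd.errors.ParserError:
    raised_parser_error = True

# Skipping the stray first line gives every remaining row the same field count.
automobile_data = pd.read_csv(io.StringIO(sample), sep=",", header=None,
                              na_values="?", skiprows=1)
print(automobile_data.head())
```

If the first lines of the real file look different, skiprows (or the file itself) would need adjusting accordingly.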
Related
I'm trying to read the two latest spreadsheets in my folder, READ1 and READ2, with pandas. Usually when I read files the file name has to be formatted as 'File.xlsx', but the method I'm using prints it in the terminal as File.xlsx. I tried changing the format with:
one = [str("'")+str(READ1)+ str("'")]
print(one)
Which outputs as ["'None'"]
My Code:
import glob
import os
import os.path
import pandas as pd
import xlsxwriter as xl
from pandas_datareader import data as pdr
import numpy as np
latest_file = sorted(glob.iglob(r'C:\My Folder\*'), key=os.path.getmtime)
READ1 = print(latest_file[0])
READ2 = print(latest_file[1])
File1 = pd.read_excel(READ1,sheet_name='Sheet1', header=None)
File2 = pd.read_excel(READ2,sheet_name='Sheet1', header=None)
print(File1)
If I run my code as is I get
inspect_excel_format assert content_or_path is not None
AssertionError
I have tried changing them to csv files too, but that doesn't change anything. I think Python is reading it as an undefined variable. For example:
READ1 = [File.xlsx]
has the error:
NameError: name 'File' is not defined
I have been referencing:
How to get the latest file in a folder?
https://datatofish.com/latest-file-python/
The print function just writes its arguments to the terminal and returns None, so READ1 and READ2 are both None.
Replace:
READ1 = print(latest_file[0])
READ2 = print(latest_file[1])
with:
READ1 = latest_file[0]
READ2 = latest_file[1]
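Here is a small sketch of both points, assuming you want the two most recently modified files. The helper name newest_files and the demo file names are made up for illustration. Note that sorting by os.path.getmtime in ascending order puts the oldest file first, so the sketch sorts with reverse=True:

```python
import glob
import os
import tempfile

def newest_files(folder, count=2):
    """Return the paths of the `count` most recently modified files in `folder`."""
    paths = [p for p in glob.glob(os.path.join(folder, "*")) if os.path.isfile(p)]
    return sorted(paths, key=os.path.getmtime, reverse=True)[:count]

# print() writes to the terminal and returns None, which is why READ1 was None.
returned = print("File.xlsx")

# Demo on a throwaway folder with three empty files and known modification times.
with tempfile.TemporaryDirectory() as folder:
    for i, name in enumerate(["old.xlsx", "mid.xlsx", "new.xlsx"]):
        path = os.path.join(folder, name)
        open(path, "w").close()
        os.utime(path, (1_000_000 + i, 1_000_000 + i))
    newest_names = [os.path.basename(p) for p in newest_files(folder)]

print(newest_names)
```

The returned paths are plain strings, so they can be passed straight to pd.read_excel without adding quote characters.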
I'm new to Python and I got this error that I couldn't solve:
import pandas as pd
import numpy as np
url = 'http://localhost:8888/edit/Downloads/untitled.cvs'
food2014_recalls = pd.read_csv(url)
This is my csv file:
animal,uniq_id,water_need
elephant,1001,500
elephant,1002,600
elephant,1003,550
I got this error:
import pandas as pd
import numpy as np
import io
import requests
url ='http://localhost:8888/edit/Downloads/untitled.cvs'
res =requests.get(url).content
food2014_recalls =pd.read_csv(io.StringIO(res.decode('utf-8')), error_bad_lines=False, comment='#', sep=',')
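One thing that may be worth checking, offered as a guess rather than a diagnosis: a URL of the form http://localhost:8888/edit/... looks like the Jupyter file editor page, which would serve HTML rather than the raw CSV. Reading the file straight from disk sidesteps that. Here is a minimal sketch that uses the CSV text from the question in place of the real file (the local path in the comment is hypothetical):

```python
import io
import pandas as pd

# The CSV content shown in the question, standing in for the file on disk.
csv_text = """animal,uniq_id,water_need
elephant,1001,500
elephant,1002,600
elephant,1003,550
"""

# With the real file you would pass a local path instead, e.g.
# pd.read_csv("/path/to/Downloads/untitled.csv")  # hypothetical path
food2014_recalls = pd.read_csv(io.StringIO(csv_text))
print(food2014_recalls)
```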
Hi, I am unable to read a CSV file from a URL using:
import pandas as pd
import numpy as np
data_url = 'https://data.baltimorecity.gov/Financial/Real-Property-Taxes/27w9-urtv.csv'
df = pd.read_csv(data_url)
df.head()
I got an error: "not acceptable"
I also tried different codes importing "requests" but none of them worked. How do I fix this?
Your URL wasn't correct. This should work:
import pandas as pd
data_url = 'https://data.baltimorecity.gov/resource/27w9-urtv.csv'
df = pd.read_csv(data_url)
df.head()
I am trying a project as a beginner. It is driving me nuts because I keep getting minor errors that paralyze the whole execution. Here's an error that has been plaguing me.
### SOLUTION
## 1. Introduction of dataset
import pandas as pd
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt
# This lets us see many columns in the output
pd.set_option('display.expand_frame_repr', False)
df = pd.read_csv('data.csv', index_col=0)
Error:
File "C:\ProgramData\Anaconda\Lib\site-packages\pandas\_libs\parsers.cp36-win_amd64.pyd", line 695, in pandas._libs.parsers.TextReader._setup_parser_source
builtins.FileNotFoundError: File b'data.csv' does not exist.
Why do I get this error even though the csv file exists?
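A relative path like 'data.csv' is resolved against the current working directory, which is not necessarily the folder your script lives in. Provide the absolute path to the file instead.
For example: pd.read_csv("/path/to/the/file/data.csv", ...)
Or, if the file sits in the same folder as your script, build the path from the script's location:
import os
import pandas as pd
# Directory that contains this script
script_dir = os.path.dirname(os.path.abspath(__file__))
csv_path = os.path.join(script_dir, 'data.csv')
df = pd.read_csv(csv_path, index_col=0)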
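Here is a minimal sketch of the same idea with pathlib, which also reports the working directory when the file is missing. The helper name load_csv_from and the demo data are made up; in a real script base_dir would typically be Path(__file__).resolve().parent:

```python
from pathlib import Path
import tempfile

import pandas as pd

def load_csv_from(base_dir, name="data.csv"):
    """Read `name` from `base_dir`, with a clearer message if it is missing."""
    csv_path = Path(base_dir) / name
    if not csv_path.is_file():
        raise FileNotFoundError(
            f"{csv_path} not found (current working directory: {Path.cwd()})"
        )
    return pd.read_csv(csv_path, index_col=0)

# Demo in a temporary folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data.csv").write_text("id,value\n1,10\n2,20\n")
    df = load_csv_from(tmp)
    try:
        load_csv_from(tmp, name="missing.csv")
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True

print(df)
```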
I'm trying to load a dataset with breaks in it, and I am looking for an intelligent way to make this work. I got started with the code included below.
As you can see, the data in the file posted on the public FTP site starts at line 11, ends at line 23818, then starts again at line 23823 and ends at line 45630.
import pandas as pd
import numpy as np
from io import BytesIO
from zipfile import ZipFile
from urllib.request import urlopen
url = urlopen("http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/10_Portfolios_Prior_12_2_Daily_CSV.zip")
#Download Zipfile and create pandas DataFrame
zipfile = ZipFile(BytesIO(url.read()))
df = pd.read_csv(zipfile.open('10_Portfolios_Prior_12_2_Daily.CSV'), header = 0,
names = ['asof_dt','1','2','3','4','5','6','7','8','9','10'], skiprows=10).dropna()
df['asof_dt'] = pd.to_datetime(df['asof_dt'], format = "%Y%m%d")
I would ideally like the first set to have a version number "1", the second to have "2", etc.
Any help would be greatly appreciated. Thank you.
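One possible approach, sketched under the assumption that every data row starts with an 8-digit date and that blocks are separated by at least one non-data line (a blank line, a title, or a repeated header). The sample text and the read_blocks helper are invented for illustration, so the real file's layout should be checked first:

```python
import pandas as pd

# Invented sample with two data blocks separated by non-data lines.
sample = """Some description line
,1,2,3
19270103,0.1,0.2,0.3
19270104,-0.4,0.5,0.6

  Average Equal Weighted Returns -- Daily
,1,2,3
19270103,1.1,1.2,1.3
19270104,1.4,-1.5,1.6
"""

def read_blocks(text, names):
    """Collect date-led rows and tag each contiguous block with a version number."""
    rows = []
    version = 0
    in_block = False
    for line in text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        is_data = (len(fields) == len(names)
                   and len(fields[0]) == 8 and fields[0].isdigit())
        if is_data:
            if not in_block:
                # First data row after a break: start a new version.
                version += 1
                in_block = True
            rows.append(fields + [version])
        else:
            in_block = False
    df = pd.DataFrame(rows, columns=names + ["version"])
    df[names[0]] = pd.to_datetime(df[names[0]], format="%Y%m%d")
    for col in names[1:]:
        df[col] = df[col].astype(float)
    return df

df = read_blocks(sample, ["asof_dt", "1", "2", "3"])
print(df)
```

With the real download, the text could presumably come from zipfile.open(...).read().decode() and names would be the eleven column names from the question.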