I am trying a project as a beginner. It is driving me nuts because I keep getting minor errors that paralyze the whole execution. Here's an error that has been plaguing me.
### SOLUTION
## 1. Introduction of dataset
import pandas as pd
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt
# This lets us see many columns in the output
pd.set_option('display.expand_frame_repr', False)
df = pd.read_csv('data.csv', index_col=0)
Error:
File "C:\ProgramData\Anaconda\Lib\site-packages\pandas\_libs\parsers.cp36-win_amd64.pyd", line 695, in pandas._libs.parsers.TextReader._setup_parser_source
V\000~Ã\000\000ëtA¸P\000\000\000H»Ì\000H
builtins.FileNotFoundError: File b'data.csv' does not exist.
Why do I get this error even though the csv file exists?
You need to provide absolute path for the file.
For example: pd.read_csv("/path/to/the/file/data.csv", ...)
Or if you want to read the file from current directory:
import os
import sys
csv_path = os.path.dirname(os.path.abspath(sys.executable)) + '/data.csv'
df = pd.read_csv(csv_path, index_col=0)
Related
I have the following python code and it runs just fine if I run it manually. When I go to schedule it in Window's scheduler it doesn't work. Any idea why?
#Data
from datetime import date, datetime, timedelta
import os
import sys
import traceback
#Pandas and Numpy
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import numpy as np
#Set Directory
#print('\nRead in data file')
indirname = r'C:/Users/user/Desktop'
outdirname = r'C:/Users/user/Desktop'
#Read in file
data_file = os.path.join(indirname, r'File Name.xlsx')
df = pd.read_excel(data_file, sheet_name='Sheet2', skiprows=range(1))
df.to_excel('C:/Users/user/Desktop/test5.xlsx')
If your script is fine than you can check your python.exe location using where python command in your command prompt then paste the path in Program/script dilogue box then provide full path of your python file in add arguments with double quotes. e.g. "C:\user\name\folder\file.py" and empty your start in (optional)
I tried the following:
import pandas as pd
import numpy as np
x=pd.ExcelFile('Energy Indicator.xls')
energy= x.parse(skiprows =17, skip_footer=38))
...
I got the following error message:
FileNotFoundError, no such file or directory.
I think it's best to declare your filepath specification with something like os.path or filedialog from tkinter.
Hi I am unable to read CSV file from the URL by using
import pandas as pd
import numpy as np
data_url = 'https://data.baltimorecity.gov/Financial/Real-Property-Taxes/27w9-urtv.csv'
df = pd.read_csv(data_url)
df.head()
I got an error: "not acceptable"
I also tried different codes importing "requests" but none of them worked. How do I fix this?
Your URL wasnt correct. This should work:
import pandas as pd
data_url = 'https://data.baltimorecity.gov/resource/27w9-urtv.csv'
df = pd.read_csv(data_url)
df.head()
I downloaded this dataset and stored it in a folder called AutomobileDataset.
I cross checked the working directory using:
import pandas as pd
import numpy as np
import os
os.chdir("/Users/madan/Desktop/ML/Datasets/AutomobileDataset")
os.getcwd()
Output:
'/Users/madan/Desktop/ML/Datasets/AutomobileDataset'
Then I tried reading the file using pandas:
import pandas as pd
import numpy as np
import os
os.chdir("/Users/madan/Desktop/ML/Datasets/AutomobileDataset")
os.getcwd()
automobile_data = pd.read_csv("AutomobileDataset.txt", sep = ',',
header = None, na_values = '?')
automobile_data.head()
Output:
---------------------------------------------------------------------------
ParserError: Error tokenizing data. C error: Expected 1 fields in line 2, saw 26
Someone please help me with this, I don't know where I am making a mistake.
Can try this!
import os
# Read in a plain text file
with open(os.path.join("c:user/xxx/xx", "xxx.txt"), "r") as f:
text = f.read()
print(text)
I'm trying to load a dataset with breaks in it. I am trying to find an intelligent way to make this work. I got started on it with the code i included.
As you can see, the data within the file posted on the public FTP site starts at line 11, ends at line 23818, then starts at again at 23823, and ends at 45,630.
import pandas as pd
import numpy as np
from io import BytesIO
from zipfile import ZipFile
from urllib.request import urlopen
url = urlopen("http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/10_Portfolios_Prior_12_2_Daily_CSV.zip")
#Download Zipfile and create pandas DataFrame
zipfile = ZipFile(BytesIO(url.read()))
df = pd.read_csv(zipfile.open('10_Portfolios_Prior_12_2_Daily.CSV'), header = 0,
names = ['asof_dt','1','2','3','4','5','6','7','8','9','10'], skiprows=10).dropna()
df['asof_dt'] = pd.to_datetime(df['asof_dt'], format = "%Y%m%d")
I would ideally like the first set to have a version number "1", the second to have "2", etc.
Any help would be greatly appreciated. Thank you.