Can't read my csv file in jupyter - kubernetes - python

I recently deployed Jupyter into Kubernetes, and now I want to read my data and do the data cleaning. When I run:
data = pd.read_csv("home/ghofrane21/data/Les indices des prix.csv", header=None)
I get this error:
FileNotFoundError: [Errno 2] File home/ghofrane21/data/Les indices des prix.csv does not exist
The file already exists.
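A minimal sketch for checking this, assuming the file really is at /home/ghofrane21/data/ inside the Jupyter pod: the path in the snippet has no leading slash, so pandas resolves it relative to the notebook's current working directory.
import os
import pandas as pd

print(os.getcwd())  # where the notebook process is actually running in the pod

# Assumed absolute path; adjust if the file lives somewhere else
path = "/home/ghofrane21/data/Les indices des prix.csv"
print(os.path.exists(path))  # should print True before reading

data = pd.read_csv(path, header=None)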

Related

why the "bson" library doesn't work in jupyter notebook

I'm trying to read a BSON file. This is my code:
import bson

with open("D:/rl env/chat_log.bson", 'rb') as f:
    datas = bson.decode_all(f.read())
Note that "D:/rl env/chat_log.bson" is my file path.
I get the error below:
AttributeError: module 'bson' has no attribute 'decode_all'
I must mention that I didn't get any error when I ran this code in Google Colab.
Have you tried using the loads method?
with open("D:/rl env/chat_log.bson", 'rb') as f:
    datas = bson.loads(f.read())
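For context, and this is an inference rather than something stated in the thread: decode_all comes from the bson module bundled with pymongo, while the standalone bson package on PyPI does not provide it, which would explain why the same code runs on Google Colab but not in this notebook. A minimal sketch using the pymongo-provided module:
# Assumes pymongo is installed (pip install pymongo); it bundles the bson module
import bson

with open("D:/rl env/chat_log.bson", 'rb') as f:
    documents = bson.decode_all(f.read())  # one dict per BSON document in the file

print(len(documents))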

253006: File doesn't exist: ['/Users/oscar/Desktop/data.txt']

I am receiving an intermittent Snowflake Python connector error while trying to load a file onto a table.
The error occurs in the following code:
exe.execute("""PUT 'file:///Users/oscar/Desktop/data.txt'
'#"db"."schema".%"table"/ui4654116544'""")
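A hedged sketch of the same PUT through the connector, with placeholder credentials and table name; since the 253006 message in the title lists the local path, the first check is whether that file exists on the machine running the connector:
import os
import snowflake.connector

# Placeholder connection parameters, purely for illustration
conn = snowflake.connector.connect(
    account='my_account',
    user='my_user',
    password='my_password',
    database='db',
    schema='schema',
)
cur = conn.cursor()

local_path = '/Users/oscar/Desktop/data.txt'
# The 253006 error names this local path, so confirm it exists before
# blaming the stage reference.
print(os.path.exists(local_path))

# Assumed table-stage form: @%<table_name> is the stage owned by that table
cur.execute("PUT 'file://{}' @%my_table AUTO_COMPRESS=TRUE".format(local_path))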

Loading own data for remote training in microsoft azure machine learning

So I have been trying to use Azure Machine Learning for faster model training.
I am submitting a training .py file, and within that training file I access my training data; however, I am getting error messages about that.
I have tried the following code:
import os
import numpy as np
from azureml.core import Workspace, Dataset

subscription_id = 'my_id'
resource_group = 'my_resource_group'
workspace_name = 'my_workspace'

workspace = Workspace(subscription_id, resource_group, workspace_name)
dataset = Dataset.get_by_name(workspace, name='my-dataset')

with dataset.mount() as mount_context:
    print(os.listdir(mount_context.mount_point))
    data = np.load('my-data.npy')
But the training fails, with the following error in the output logs.
File "train.py", line 29, in <module>
data = np.load('my-data.npy')
File "/azureml-envs/azureml_167f4dd4c85f61389bb53e00383dafbe/lib/python3.6/site-packages/numpy/lib/npyio.py", line 416, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: 'my-data.npy'
I assume I am mounting my dataset incorrectly on the remote machine, but I am unsure what the correct way to mount it, or to submit a training job, is.
Did the print statement return the directory listing correctly?
Here is a sample notebook that shows how to load data in training: https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/work-with-data/datasets-tutorial/scriptrun-with-data-input-output
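If the os.listdir call does show the expected file, here is a minimal sketch of reading it via the mount point (placeholder workspace values; the key change from the question's code is joining the file name onto mount_context.mount_point instead of using a bare relative path):
import os
import numpy as np
from azureml.core import Workspace, Dataset

workspace = Workspace('my_id', 'my_resource_group', 'my_workspace')  # placeholders
dataset = Dataset.get_by_name(workspace, name='my-dataset')

with dataset.mount() as mount_context:
    # Give np.load an absolute path under the mount, not a path relative to the
    # job's working directory
    data = np.load(os.path.join(mount_context.mount_point, 'my-data.npy'))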

Parsing Google Fonts METADATA.pb - google.protobuf.message.DecodeError: Error parsing message

I'm trying to parse a METADATA.pb file from the official Google Fonts repo which can be found here: https://github.com/google/fonts (Example METADATA.pb file for the Roboto font: https://github.com/google/fonts/blob/master/apache/roboto/METADATA.pb)
To parse protobuf files, the matching schema is required. It can be downloaded as "fonts_public.proto" here: https://github.com/googlefonts/gftools/blob/master/Lib/gftools/fonts_public.proto
I used it to generate a Python code file called "fonts_public_pb2.py" with this command:
protoc -I=. --python_out=. fonts_public.proto
And here is my code, which imports this generated file, reads the content of a METADATA.pb file (it shouldn't matter which one, they all follow the same structure) and then tries to parse the protobuf data.
#! /usr/bin/env python
import fonts_public_pb2
protobuf_file_path = 'METADATA.pb'
protobuf_file = open(protobuf_file_path, 'rb')
protobuf = protobuf_file.read()
font_family = fonts_public_pb2.FamilyProto()
font_family.ParseFromString(protobuf)
Just a few lines, nothing too complicated, but the output is always the same:
Traceback (most recent call last):
File "parse.py", line 22, in <module>
font_family.ParseFromString(protobuf)
google.protobuf.message.DecodeError: Error parsing message
I don't usually code in Python, so the issue here might very well be me, but after trying a few different things I don't know what to do anymore:
Used the already generated "fonts_public_pb2.py" file from the gftools repo: https://github.com/googlefonts/gftools/blob/master/Lib/gftools/fonts_public_pb2.py - my generated output from the "fonts_public.proto" file and this file are nearly identical (I checked with Meld). The error was still the same
Set all "required" fields in the .proto file to "optional" and generated the "fonts_public_pb2.py" file again - same error
Tried Python 2 and 3 - Same error
Those METADATA.pb files are not binary protobuf files, they use the text format.
import fonts_public_pb2
from google.protobuf import text_format
protobuf_file_path = 'METADATA.pb'
protobuf_file = open(protobuf_file_path, 'r')
protobuf = protobuf_file.read()
font_family = fonts_public_pb2.FamilyProto()
text_format.Merge(protobuf, font_family)
print(font_family)
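Once the text-format parse succeeds, the fields can be read like normal message attributes. A small usage sketch; the field names (name, fonts, filename, weight) are what fonts_public.proto defines as far as I can tell, so treat them as assumptions:
# Assumed field names from fonts_public.proto
print(font_family.name)            # family name, e.g. "Roboto"
for font in font_family.fonts:     # repeated font entries
    print(font.filename, font.weight)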

Scikit-learn joblib/zlib error

I ran into the following error when trying to load a .pkl file using joblib.
self.name = joblib.load(
    os.path.join(BASE_DIR, '../relative/directory/%s.pkl' % name)
)
The above code yields the following error.
File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/numpy_pickle.py", line 68, in read_zfile
data = zlib.decompress(file_handle.read(), 15, length)
zlib.error: Error -3 while decompressing data: invalid distance too far back
Has anyone run into this issue before?
It means that the compressed data was corrupted somewhere along the way.
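One way to confirm that, as a minimal sketch with a placeholder path: hash the .pkl on the machine where it was created and again where it is loaded; if the digests differ, the file was altered or truncated in transit, which is what this zlib error typically indicates.
import hashlib

def md5sum(path, chunk_size=1 << 20):
    # Hash the file in chunks so large pickles don't need to fit in memory
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

print(md5sum('/path/to/model.pkl'))  # compare against the checksum where the file was created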
