I'm trying to push my tokenizer to my Hugging Face repo.
It consists of the model's vocab.json (I'm making a speech recognition model).
My code:
vocab_dict["|"] = vocab_dict[" "]
del vocab_dict[" "]
vocab_dict["[UNK]"] = len(vocab_dict)
vocab_dict["[PAD]"] = len(vocab_dict)
len(vocab_dict)
import json
with open('vocab.json', 'w') as vocab_file:
json.dump(vocab_dict, vocab_file)
from transformers import Wav2Vec2CTCTokenizer
tokenizer = Wav2Vec2CTCTokenizer.from_pretrained("./", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")
from huggingface_hub import login
login('hf_...', add_to_git_credential=True)  # token redacted; use your own access token
repo_name = "Foxasdf/ArabicTextToSpeech"
tokenizer.push_to_hub(repo_name)
the tokenizer.push_to_hub(repo_name) is giving me this error:
TypeError: create_repo() got an unexpected keyword argument 'organization'
I have logged in to my Hugging Face account using
from huggingface_hub import notebook_login
notebook_login()
but the error is still the same.
Here's a link to my Colab notebook; you can see the full code and the error there: https://colab.research.google.com/drive/11tkQ85SfaT6U_1PXDNwk0Q6qogw2r2sw?hl=ar&hl=en&authuser=0#scrollTo=WkbZ_Wcidq8Z
I have the same problem. It is somehow associated with the version of transformers - I have 4.6. When I change the environment to one with transformers 4.11.3, the problem is that the code tries to clone a repository I have yet to create, and there is an error: "Remote repository not found ..."
I checked further and it looks like an issue with the version of the huggingface_hub library - when it is downgraded to 0.10.1 it should work.
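For example, in a Colab cell (a minimal sketch; 0.10.1 is the version the workaround above refers to):
!pip install huggingface_hub==0.10.1
# restart the runtime afterwards so the downgraded version is picked up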
I have the following Python code to label images using Clarifai. It was working code and had been in use for the past 6-8 months. However, for the last few days, I have been getting the error mentioned below. Note that I have not made any changes to the working version of the code for the error to creep in.
# Python program to analyse an image and label it
'''
Dependencies:
pip install flask
pip install clarifai-grpc
(json, logging, and os are part of the standard library - no pip install needed)
'''
import json
from flask import Flask, render_template, request
import logging
import os
from clarifai_grpc.channel.clarifai_channel import ClarifaiChannel
from clarifai_grpc.grpc.api import resources_pb2, service_pb2, service_pb2_grpc
from clarifai_grpc.grpc.api.status import status_pb2, status_code_pb2
channel = ClarifaiChannel.get_json_channel()
stub = service_pb2_grpc.V2Stub(channel)
# This will be used by every Clarifai endpoint call.
# The word 'Key' is required to precede the authentication key.
metadata = (('authorization', 'Key API_KEY_HERE'),)
webapp = Flask(__name__)  # create a web application for the current python file

# decorators
@webapp.route('/')
def index():
    return render_template('index.html', len=0)

@webapp.route('/', methods=['POST'])
def search():
    if request.form['searchByURL']:
        url = request.form['searchByURL']
        my_request = service_pb2.PostModelOutputsRequest(
            # This is the model ID of a publicly available General model.
            # You may use any other public or custom model ID.
            model_id='aaa03c23b3724a16a56b629203edc62c',
            inputs=[
                resources_pb2.Input(data=resources_pb2.Data(image=resources_pb2.Image(url=url)))
            ])
        response = stub.PostModelOutputs(my_request, metadata=metadata)
        if response.status.code != status_code_pb2.SUCCESS:
            message = ["You have reached the limit for today!"]
            return render_template('index.html', len=1, searchResults=message)
        concepts = []
        for concept in response.outputs[0].data.concepts:
            concepts.append(concept.name)
        concepts = concepts[0:10]
        return render_template('index.html', len=len(concepts), searchResults=concepts)
    elif request.files['searchByImage']:
        file = request.files['searchByImage']
        file.save(file.filename)
        # IMAGE INPUT:
        with open(file.filename, "rb") as f:
            file_bytes = f.read()
        post_model_outputs_response = stub.PostModelOutputs(
            service_pb2.PostModelOutputsRequest(
                model_id="aaa03c23b3724a16a56b629203edc62c",
                version_id="aa7f35c01e0642fda5cf400f543e7c40",  # optional; defaults to the latest model version
                inputs=[
                    resources_pb2.Input(
                        data=resources_pb2.Data(
                            image=resources_pb2.Image(
                                base64=file_bytes
                            )
                        )
                    )
                ]
            ),
            metadata=metadata
        )
        if post_model_outputs_response.status.code != status_code_pb2.SUCCESS:
            message = ["You have reached the limit for today!"]
            return render_template('index.html', len=1, searchResults=message)
        # Since we have one input, one output will exist here.
        output = post_model_outputs_response.outputs[0]
        os.remove(file.filename)
        concepts = []
        # Predicted concepts:
        for concept in output.data.concepts:
            concepts.append(concept.name)
        concepts = concepts[0:10]
        return render_template('index.html', len=len(concepts), searchResults=concepts)
    else:
        return render_template('index.html', len=1, searchResults=["No Image entered!"])

# run the server
if __name__ == "__main__":
    logging.basicConfig(filename='error.log', level=logging.DEBUG)
    webapp.run(debug=True)
Error:
Exception has occurred: ImportError
dlopen(/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/grpc/_cython/cygrpc.cpython-310-darwin.so, 0x0002): tried: '/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/grpc/_cython/cygrpc.cpython-310-darwin.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))
File "/Users/eshaangupta/Desktop/Python-Level-4/Image Analyser.py", line 15, in <module>
from clarifai_grpc.channel.clarifai_channel import ClarifaiChannel
It looks like you are trying to run this on a different architecture than you have in the past. You've been running on x86 (likely macOS) and have now moved to an ARM architecture. I'm guessing you've upgraded to an M1 MacBook, although maybe you've moved to a different ARM-based chip.
(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))
This is a problem with a file in grpc - specifically the library file cygrpc.cpython-310-darwin.so. I'd recommend removing gRPC and re-installing it to see if that resolves the problem.
Something like this might work:
python -m pip install --force-reinstall grpcio
(This assumes python points to python3.10 on your system - I'm not sure how you've installed it, so that is just a guess.)
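Before reinstalling, you can confirm which architecture your interpreter is actually built for (a quick diagnostic sketch, nothing more):
python -c "import platform; print(platform.machine(), platform.python_version())"
# 'arm64' indicates a native Apple Silicon build; 'x86_64' indicates an Intel build (or Rosetta)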
I'm trying to build the body of a ResNet18 in this code:
from fastai.vision.data import create_body
from fastai.vision import models
from torchvision.models.resnet import resnet18
from fastai.vision.models.unet import DynamicUnet
import torch
def build_res_unet(n_input=1, n_output=2, size=256):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    body = create_body(resnet18, n_in=n_input, pretrained=True, cut=-2)
    net_G = DynamicUnet(body, n_output, (size, size)).to(device)
    return net_G

net_G = build_res_unet(n_input=1, n_output=2, size=256)
but I keep getting an error:
TypeError: create_body() got an unexpected keyword argument 'n_in'
But in the fastai docs the n_in parameter is present.
How can I create the body? Am I missing something?
I tested the code on my local machine and it runs perfectly; maybe there is some problem on Google Colab! I will update this answer if I find a way to make it run on Colab.
EDIT: I solved the problem by adding !pip install fastai==2.4 on Google Colab; the version used by Colab was very old.
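For reference, the fix amounts to pinning the version at the top of the notebook and re-running (fastai==2.4 is the version that worked here; any release whose create_body accepts n_in should behave the same):
!pip install fastai==2.4
# restart the runtime, then:
net_G = build_res_unet(n_input=1, n_output=2, size=256)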
I am following the tutorial found here https://www.geeksforgeeks.org/youtube-data-api-set-1/. After I run the below code, I am getting a "No module named 'apiclient'" error. I also tried using "from googleapiclient import discovery" but that gave an error as well. Does anyone have alternatives I can try out?
I have already run pip install --upgrade google-api-python-client.
Would appreciate any help/suggestions!
Here is the code:
from apiclient.discovery import build

# Arguments that need to be passed to the build function
DEVELOPER_KEY = "your_API_Key"
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"

# creating YouTube resource object
youtube_object = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION,
                       developerKey=DEVELOPER_KEY)

def youtube_search_keyword(query, max_results):
    # calling the search.list method to retrieve youtube search results
    search_keyword = youtube_object.search().list(q=query, part="id,snippet",
                                                  maxResults=max_results).execute()
    # extracting the results from search response
    results = search_keyword.get("items", [])
    # empty lists to store video, channel, playlist metadata
    videos = []
    playlists = []
    channels = []
    # extracting required info from each result object
    for result in results:
        # video result object
        if result['id']['kind'] == "youtube#video":
            videos.append("%s (%s) (%s) (%s)" % (result["snippet"]["title"],
                          result["id"]["videoId"], result['snippet']['description'],
                          result['snippet']['thumbnails']['default']['url']))
        # playlist result object
        elif result['id']['kind'] == "youtube#playlist":
            playlists.append("%s (%s) (%s) (%s)" % (result["snippet"]["title"],
                             result["id"]["playlistId"],
                             result['snippet']['description'],
                             result['snippet']['thumbnails']['default']['url']))
        # channel result object
        elif result['id']['kind'] == "youtube#channel":
            channels.append("%s (%s) (%s) (%s)" % (result["snippet"]["title"],
                            result["id"]["channelId"],
                            result['snippet']['description'],
                            result['snippet']['thumbnails']['default']['url']))
    print("Videos:\n", "\n".join(videos), "\n")
    print("Channels:\n", "\n".join(channels), "\n")
    print("Playlists:\n", "\n".join(playlists), "\n")

if __name__ == "__main__":
    youtube_search_keyword('Geeksforgeeks', max_results=10)
With this information it's hard to say what the problem is. But sometimes I've been banging my head against the wall after installing something with pip (Python 2) and then trying to import the module in Python 3, or vice versa.
So if you are running your script with Python 3, try installing the package using pip3 install --upgrade google-api-python-client
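A quick way to check that the package was installed for the interpreter you are actually running (a diagnostic sketch; apiclient is only a legacy alias, and googleapiclient is the canonical package name):
python3 -m pip install --upgrade google-api-python-client
python3 -c "from googleapiclient.discovery import build; print('ok')"
If that prints ok, replacing from apiclient.discovery import build with from googleapiclient.discovery import build in the script should work.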
Try the YouTube docs here:
https://developers.google.com/youtube/v3/code_samples
They worked for me on a recently updated Slackware_64 14.2
I use them with Python 3.8. Since there may also be a version 2 of Python installed, I make sure to use this in the interpreter line:
#!/usr/bin/python3.8
Likewise with pip, I use pip3.8 to install dependencies.
I installed Python from source; python3.8 --version reports Python 3.8.2.
You can also look at this video here:
https://www.youtube.com/watch?v=qDWtB2q_09g
It sort of explains how to use YouTube's API Explorer, and you can copy code samples directly from there. The video above covers Android, but the same concept applies when using the API Explorer with Python.
I concur with the previous answer regarding version control.
I've already been browsing the web for hours to find a solution for what I believe might be a pretty petty issue.
I'm using fastai's SentencePiece processor (SPProcessor) at the very first steps of initiating a language model.
My code for these steps looks like this:
bs = 48
processor = SPProcessor(lang='pl')
data_lm = (TextList.from_csv('', target_corpus, processor=processor)
           .split_by_rand_pct(0.1)
           .label_for_lm()
           .databunch(bs=bs))
data_lm.save(data_lm_file)
After execution I get the following error:
~/x/miniconda3/envs/fastai/lib/python3.6/site-packages/fastai/text/data.py in process(self, ds)
466 self.sp_model,self.sp_vocab = cache_dir/'spm.model',cache_dir/'spm.vocab'
467 if not getattr(self, 'vocab', False):
--> 468 with open(self.sp_vocab, 'r', encoding=self.enc) as f: self.vocab = Vocab([line.split('\t')[0] for line in f.readlines()])
469 if self.n_cpus <= 1: ds.items = self._encode_batch(ds.items)
470 else:
FileNotFoundError: [Errno 2] No such file or directory: 'tmp/spm/spm.vocab'
The proper outcome of the code above should be as follows:
a folder named 'tmp' is created, containing a folder 'spm', inside which two files are placed, named spm.vocab and spm.model respectively.
What happens instead is that the 'tmp' folder is created along with files named "cache_dir".vocab and "cache_dir".model inside my current directory.
The 'spm' folder is nowhere to be found.
I've found a sort of workaround.
It consists of manually creating an 'spm' folder inside 'tmp', moving the two files mentioned above into it, and renaming them to spm.vocab and spm.model.
That lets me carry on with my processing, but I'd like to find a way to avoid having to manually move and rename the created files.
Maybe I need to pass some parameters (probably cache_dir) with specific values before processing?
If you have any idea how to solve this issue, please point me to it.
I'd be grateful.
I can see a similar error if I switch the code in fastai/text/data.py to an earlier version of this commit. Then, if I apply the changes from the same commit, it all works nicely. Now, the most recent version of the same file (the one which is supposed to help with paths containing spaces) seems to have yet another bug introduced.
So pretty much the problem seems to be that fastai passes the --model_prefix argument with quotes to sentencepiece's SentencePieceTrainer.Train, which makes it "misbehave".
One possibility for you would be to either (1) update to a later version of fastai (which might not help due to the other bug in the newer version), or (2) manually apply the changes from here to your installation's fastai/text/data.py. It's a very small change - just delete the line:
cache_dir = cache_dir/'spm'
and replace
f'--model_prefix="cache_dir" --vocab_size={vocab_sz} --model_type={model_type}']))
with:
f"--model_prefix={cache_dir/'spm'} --vocab_size={vocab_sz} --model_type={model_type}"]))
In case you are not comfortable updating the installation's code, you can monkey-patch the module by writing a fixed version of the existing train_sentencepiece function in your own code and then doing something like fastai.text.data.train_sentencepiece = my_fixed_train_sentencepiece before other calls.
So if you are using a newer version of the library, the code might look like this:
import os
from pathlib import Path
from typing import Collection
import fastai
from fastai.core import PathOrStr, defaults, ifnone
from fastai.text.data import ListRules, get_default_size, quotemark, full_char_coverage_langs

def train_sentencepiece(texts:Collection[str], path:PathOrStr, pre_rules:ListRules=None, post_rules:ListRules=None,
                        vocab_sz:int=None, max_vocab_sz:int=30000, model_type:str='unigram', max_sentence_len:int=20480, lang='en',
                        char_coverage=None, tmp_dir='tmp', enc='utf8'):
    "Train a sentencepiece tokenizer on `texts` and save it in `path/tmp_dir`"
    from sentencepiece import SentencePieceTrainer
    cache_dir = Path(path)/tmp_dir
    os.makedirs(cache_dir, exist_ok=True)
    if vocab_sz is None: vocab_sz = get_default_size(texts, max_vocab_sz)
    raw_text_path = cache_dir/'all_text.out'
    with open(raw_text_path, 'w', encoding=enc) as f: f.write("\n".join(texts))
    spec_tokens = ['\u2581'+s for s in defaults.text_spec_tok]
    SentencePieceTrainer.Train(" ".join([
        f"--input={quotemark}{raw_text_path}{quotemark} --max_sentence_length={max_sentence_len}",
        f"--character_coverage={ifnone(char_coverage, 0.99999 if lang in full_char_coverage_langs else 0.9998)}",
        f"--unk_id={len(defaults.text_spec_tok)} --pad_id=-1 --bos_id=-1 --eos_id=-1",
        f"--user_defined_symbols={','.join(spec_tokens)}",
        f"--model_prefix={cache_dir/'spm'} --vocab_size={vocab_sz} --model_type={model_type}"]))
    raw_text_path.unlink()
    return cache_dir

fastai.text.data.train_sentencepiece = train_sentencepiece
And if you are using an older version, then like the following:
import os
from pathlib import Path
from typing import Collection
import fastai
from fastai.core import PathOrStr, defaults, ifnone
from fastai.text.data import ListRules, get_default_size, full_char_coverage_langs

def train_sentencepiece(texts:Collection[str], path:PathOrStr, pre_rules:ListRules=None, post_rules:ListRules=None,
                        vocab_sz:int=None, max_vocab_sz:int=30000, model_type:str='unigram', max_sentence_len:int=20480, lang='en',
                        char_coverage=None, tmp_dir='tmp', enc='utf8'):
    "Train a sentencepiece tokenizer on `texts` and save it in `path/tmp_dir`"
    from sentencepiece import SentencePieceTrainer
    cache_dir = Path(path)/tmp_dir
    os.makedirs(cache_dir, exist_ok=True)
    if vocab_sz is None: vocab_sz = get_default_size(texts, max_vocab_sz)
    raw_text_path = cache_dir/'all_text.out'
    with open(raw_text_path, 'w', encoding=enc) as f: f.write("\n".join(texts))
    spec_tokens = ['\u2581'+s for s in defaults.text_spec_tok]
    SentencePieceTrainer.Train(" ".join([
        f"--input={raw_text_path} --max_sentence_length={max_sentence_len}",
        f"--character_coverage={ifnone(char_coverage, 0.99999 if lang in full_char_coverage_langs else 0.9998)}",
        f"--unk_id={len(defaults.text_spec_tok)} --pad_id=-1 --bos_id=-1 --eos_id=-1",
        f"--user_defined_symbols={','.join(spec_tokens)}",
        f"--model_prefix={cache_dir/'spm'} --vocab_size={vocab_sz} --model_type={model_type}"]))
    raw_text_path.unlink()
    return cache_dir

fastai.text.data.train_sentencepiece = train_sentencepiece
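With either variant of the patch applied before the pipeline runs, the original code from the question should produce tmp/spm/spm.vocab and tmp/spm/spm.model as expected - a usage sketch reusing the question's own names:
processor = SPProcessor(lang='pl')
data_lm = (TextList.from_csv('', target_corpus, processor=processor)
           .split_by_rand_pct(0.1)
           .label_for_lm()
           .databunch(bs=48))
data_lm.save(data_lm_file)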
I want to deploy my trained TensorFlow model to Amazon SageMaker. I am following the official guide here: https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/ to deploy my model using a Jupyter notebook.
But when I try to run the code:
predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type='ml.t2.medium')
It gives me the following error message:
ValueError: Error hosting endpoint sagemaker-tensorflow-2019-08-07-22-57-59-547: Failed Reason: The image '520713654638.dkr.ecr.us-west-1.amazonaws.com/sagemaker-tensorflow:1.12-cpu-py3 ' does not exist.
I think the tutorial does not tell me to create an image, and I do not know what to do.
import boto3, re
from sagemaker import get_execution_role

role = get_execution_role()

# make a tar ball of the model data files
import tarfile
with tarfile.open('model.tar.gz', mode='w:gz') as archive:
    archive.add('export', recursive=True)

# create a new s3 bucket and upload the tarball to it
import sagemaker
sagemaker_session = sagemaker.Session()
inputs = sagemaker_session.upload_data(path='model.tar.gz', key_prefix='model')

from sagemaker.tensorflow.model import TensorFlowModel
sagemaker_model = TensorFlowModel(model_data='s3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
                                  role=role,
                                  framework_version='1.12',
                                  entry_point='train.py',
                                  py_version='py3')

%%time
# here I fail to deploy the model and get the error message
predictor = sagemaker_model.deploy(initial_instance_count=1,
                                   instance_type='ml.m4.xlarge')
https://github.com/aws/sagemaker-python-sdk/issues/912#issuecomment-510226311
As mentioned in the issue:
Python 3 isn't supported using the TensorFlowModel object, as the container uses the TensorFlow Serving API library in conjunction with the gRPC client to handle making inferences; however, the TensorFlow Serving API isn't officially supported in Python 3, so there are only Python 2 versions of the containers when using the TensorFlowModel object.
If you need Python 3, then you will need to use the Model object defined in #2 above. The inference script format will change if you need to handle pre- and post-processing: https://github.com/aws/sagemaker-tensorflow-serving-container#prepost-processing.
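In practice that means constructing the model with the serving-container Model class instead of TensorFlowModel - a minimal sketch, assuming sagemaker Python SDK 1.x where sagemaker.tensorflow.serving.Model is available (model_data and role are reused from the question's code):
from sagemaker.tensorflow.serving import Model

sagemaker_model = Model(model_data='s3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
                        role=role,
                        framework_version='1.12')
predictor = sagemaker_model.deploy(initial_instance_count=1,
                                   instance_type='ml.m4.xlarge')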