automodel.from

编程入门 行业动态 更新时间:2024-10-12 12:32:40

<a href=https://www.elefans.com/category/hyzx/1661024904e60145c13a5f67fea8ca98.html style=automodel.from"/>

automodel.from

huggingface.co 链接报错

Automodel.from_pretrained()和AAutoTokenizer.from_pretrained()同理

第一次缓存模型到指定文件夹

from transformers import AutoTokenizer, AutoModelsen_trans_pretrained_path = os.path.join(PWD, "pretrained_w", "sentence-transformers")
model_name = 'sentence-transformers/all-MiniLM-L6-v2'tokenizer = AutoTokenizer.from_pretrained(model_name,cache_dir=os.path.join(sen_trans_pretrained_path, "tokenizer"))model = AutoModel.from_pretrained(model_name,cache_dir=os.path.join(sen_trans_pretrained_path, "model"))

cache_dir 指定要保存的本地目录,方便后面使用

报错后使用本地缓存model和tokenizer

from transformers import AutoTokenizer, AutoModel
sen_trans_pretrained_path = os.path.join(PWD, "pretrained_w", "sentence-transformers")
tail = 'models--sentence-transformers--all-MiniLM-L6-v2/snapshots/7dbbc90392e2f80f3d3c277d6e90027e55de9125'
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path = os.path.join(sen_trans_pretrained_path, "tokenizer",tail))model = AutoModel.from_pretrained(pretrained_model_name_or_path = os.path.join(sen_trans_pretrained_path, "model", tail))

pretrained_model_name_or_path 为之前缓存的模型和tokenizer的最里层目录
最里层目录指的是:

  • 对于tokenizer 要到词表和config这一层:
  • 对于model 要到bin这一层:

另一个例子

缓存
from sentence_transformers import SentenceTransformer
sen2vec = SentenceTransformer('paraphrase-MiniLM-L6-v2',cache_folder=os.getcwd()+"/flaskr/pretrained_ST")

本地调用
from sentence_transformers import SentenceTransformer
sen2vec = SentenceTransformer(os.getcwd()+"/flaskr/pretrained_ST/sentence-transformers_paraphrase-MiniLM-L6-v2")

参考

AutoTokenizer
AutoModel
SSLError: HTTPSConnectionPool(host=‘huggingface.co’, port=443)
class AutoTokenizer: 源码

 @classmethoddef from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs):r""" Instantiate one of the tokenizer classes of the libraryfrom a pre-trained model vocabulary.The tokenizer class to instantiate is selectedbased on the `model_type` property of the config object, or when it's missing,falling back to using pattern matching on the `pretrained_model_name_or_path` string:- `t5`: T5Tokenizer (T5 model)- `distilbert`: DistilBertTokenizer (DistilBert model)- `albert`: AlbertTokenizer (ALBERT model)- `camembert`: CamembertTokenizer (CamemBERT model)- `xlm-roberta`: XLMRobertaTokenizer (XLM-RoBERTa model)- `longformer`: LongformerTokenizer (AllenAI Longformer model)- `roberta`: RobertaTokenizer (RoBERTa model)- `bert-base-japanese`: BertJapaneseTokenizer (Bert model)- `bert`: BertTokenizer (Bert model)- `openai-gpt`: OpenAIGPTTokenizer (OpenAI GPT model)- `gpt2`: GPT2Tokenizer (OpenAI GPT-2 model)- `transfo-xl`: TransfoXLTokenizer (Transformer-XL model)- `xlnet`: XLNetTokenizer (XLNet model)- `xlm`: XLMTokenizer (XLM model)- `ctrl`: CTRLTokenizer (Salesforce CTRL model)- `electra`: ElectraTokenizer (Google ELECTRA model)Params:pretrained_model_name_or_path: either:- a string with the `shortcut name` of a predefined tokenizer to load from cache or download, e.g.: ``bert-base-uncased``.- a string with the `identifier name` of a predefined tokenizer that was user-uploaded to our S3, e.g.: ``dbmdz/bert-base-german-cased``.- a path to a `directory` containing vocabulary files required by the tokenizer, for instance saved using the :func:`~transformers.PreTrainedTokenizer.save_pretrained` method, e.g.: ``./my_model_directory/``.- (not applicable to all derived classes) a path or url to a single saved vocabulary file if and only if the tokenizer only requires a single vocabulary file (e.g. Bert, XLNet), e.g.: ``./my_model_directory/vocab.txt``.cache_dir: (`optional`) string:Path to a directory in which a downloaded predefined tokenizer vocabulary files should be cached if the standard cache should not be used.force_download: (`optional`) boolean, default False:Force to (re-)download the vocabulary files and override the cached versions if they exists.resume_download: (`optional`) boolean, default False:Do not delete incompletely recieved file. Attempt to resume the download if such a file exists.proxies: (`optional`) dict, default None:A dictionary of proxy servers to use by protocol or endpoint, e.g.: {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}.The proxies are used on each request.use_fast: (`optional`) boolean, default False:Indicate if transformers should try to load the fast version of the tokenizer (True) or use the Python one (False).inputs: (`optional`) positional arguments: will be passed to the Tokenizer ``__init__`` method.kwargs: (`optional`) keyword arguments: will be passed to the Tokenizer ``__init__`` method. Can be used to set special tokens like ``bos_token``, ``eos_token``, ``unk_token``, ``sep_token``, ``pad_token``, ``cls_token``, ``mask_token``, ``additional_special_tokens``. See parameters in the doc string of :class:`~transformers.PreTrainedTokenizer` for details.Examples::# Download vocabulary from S3 and cache.tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')# Download vocabulary from S3 (user-uploaded) and cache.tokenizer = AutoTokenizer.from_pretrained('dbmdz/bert-base-german-cased')# If vocabulary files are in a directory (e.g. tokenizer was saved using `save_pretrained('./test/saved_model/')`)tokenizer = AutoTokenizer.from_pretrained('./test/bert_saved_model/')

更多推荐

automodel.from

本文发布于:2024-03-14 02:11:57,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1735397.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:automodel

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!