LangChain: Run Language Models Locally – Hugging Face Models
Let's explore the wonderful world of Hugging Face's large language models.
This video looks in detail at how to run Hugging Face's large language models locally with LangChain, and at how to use the same models through the Hugging Face Hub API. It also digs into encoder-decoder and decoder-only models (text2text-generation and text-generation models).
Interesting videos and related links are included as well. Come explore the astonishing world of large language models with us.
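As a quick preview, here is a minimal sketch of the two approaches covered in the video: the hosted Hugging Face Hub API and a fully local transformers pipeline. The model IDs and parameters are illustrative, and the imports assume the older LangChain API in use at the time of recording.

# Minimal sketch of both approaches (illustrative; model IDs and parameters are examples).
from langchain import LLMChain, PromptTemplate
from langchain.llms import HuggingFaceHub, HuggingFacePipeline
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

# 1) Hosted: call the model through the Hugging Face Hub Inference API.
#    Requires the HUGGINGFACEHUB_API_TOKEN environment variable.
hub_llm = HuggingFaceHub(repo_id="google/flan-t5-large",
                         model_kwargs={"temperature": 0.1, "max_length": 128})

# 2) Local: download the weights once and run them in-process via a transformers pipeline.
model_id = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer, max_length=128)
local_llm = HuggingFacePipeline(pipeline=pipe)

# Either LLM plugs into the same LangChain chain.
prompt = PromptTemplate(template="Question: {question}\nAnswer:", input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=local_llm)
print(llm_chain.run("What is the capital of France?"))

Swapping local_llm for hub_llm in the chain is the only change needed to move between the two modes.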
Want to connect?
💼Consulting: https://calendly.com/engineerprompt/consulting-call
🦾 Discord: https://discord.com/invite/t4eYQRUcXB
☕ Buy me a Coffee: https://ko-fi.com/promptengineering
🔴 Join Patreon: Patreon.com/PromptEngineering
▶ Subscribe: https://www.youtube.com/@engineerprompt?sub_confirmation=1
Brother, can we have a call? I need your help to make a tool.
And how can I use big models from Hugging Face? I can't load them into memory because many of them are bigger than 15 GB, and some of them are 130 GB+. Any thoughts?
Gave you the thousandth like
This is the exact info I was searching for
Thank you man
Hi, please help me: how can I create a custom model from many PDFs in the Persian language? Thank you.
Thanks buddy ❤
Why do all these tutorials use Jupyter notebooks? I get so lost in that stuff… just show me the damn code.
I want to run models locally…
On my PC; that is my definition of locally: no API, with the model binary on my PC.
If an LLM (e.g. THUDM/chatglm2-6b) does not have either of the tags 'text2text-generation' or 'text-generation', how can we use it in LangChain?
I've found it near impossible to find info on memory requirements for using any model. If I want to load a model out of the box locally (for example, the flan-t5 model in your video), how can I determine this from the parameter count, assuming no quantization, no fine-tuning, and inference only? Also, what actually gets loaded into memory as soon as you load the model?
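A rough sizing sketch, offered as a rule of thumb rather than an exact answer: loading the weights takes roughly parameter count times bytes per parameter, with activations and the KV cache adding overhead on top during inference.

# Back-of-the-envelope estimate of the memory needed just for the weights.
# Assumed bytes per parameter: fp32 ~4, fp16/bf16 ~2, int8 ~1, 4-bit ~0.5.
def estimate_weights_gb(n_params: float, bytes_per_param: float) -> float:
    return n_params * bytes_per_param / 1024**3

print(estimate_weights_gb(80e6, 4))   # flan-t5-small (~80M params) in fp32: ~0.3 GB
print(estimate_weights_gb(3e9, 2))    # a 3B-parameter model in fp16: ~5.6 GB

What gets loaded is essentially the full set of weight tensors plus small buffers; with device_map='auto', some shards may be placed on CPU or disk instead of the GPU.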
Zoom in; I can't see it clearly.
how can I use a summarization model here ?
What's the difference between Text Generation vs Text-2-Text Generation?
Excellent video!
Can you create a video on downloading an LLM from Hugging Face and running the models without an API key, offline?
Thank you so much for the well-structured video and accompanying Google Colab! Other YouTubers often assume the viewer is experienced, but you are patient enough to explain the basic terms and ideas.
How can I get answers from local PDF files using these Hugging Face models?
Well explained bro. Thanks
I have a question: can we use the pipeline and tokenizer with an already-downloaded model instead of downloading it from Hugging Face?
Can we call the model through the API and fine-tune it for our purpose?
Pardon me for being ignorant, but this doesn't look very local to me; it looks like you're running it in some kind of Google application, which is the opposite of local.
ERROR: Could not find a version that satisfies the requirement InstrcutorEmbedding (from versions: none)
This code gives me an unfinished answer every time, as if there is a token limit on the output:
Area 51 is a United States military base in Nevada. Area 51 is known for its secretive
Should I change something in the code?
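One hedged adjustment, assuming the pipeline setup from the video is being used: the output is capped by the pipeline's max_length (128 in the example), so raising it, or switching to max_new_tokens, usually lets the answer finish.

from langchain.llms import HuggingFacePipeline
from transformers import pipeline

# Sketch: allow longer completions so answers are not cut off mid-sentence.
# max_new_tokens counts only the generated tokens; raise it within your memory budget.
pipe = pipeline(
    "text2text-generation",
    model=model,          # the model and tokenizer loaded earlier in the notebook
    tokenizer=tokenizer,
    max_new_tokens=256,
)
local_llm = HuggingFacePipeline(pipeline=pipe)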
Running the model on the Google Colab GPU takes too much time, which leads to a connection timeout. Is it because of the free APIs?
I have spent the last 3 days trying to learn all this through the LangChain documentation. You made everything so much simpler and clearer to understand. Thank you so much for your work! Unfortunately, I have failed multiple times to run StableLM 3B locally in Google Colab because it crashes the session (RAM shortage). I've watched your other video about 8-bit quantization and have tried it, yet it still crashes the session. I've found useful articles about instantiating large models in Hugging Face, but I can't quite understand what I'm reading. Any ideas on what I should try?
I was looking for a video that shows "how to use Hugging Face models locally" for a long time and finally found it. Thanks so much, bro!
This is great. I liked how you explained each term and each line of code. However, it would be nice if you could point me to some details on how I would be able to run this in VS Code.
Should I simply copy-paste each line into VS Code? I don't think that will work; we will need to pass the path to the models, and maybe other stuff that I don't know.
Please reply. This is important for me
I've tried the first approach, and after over 4 minutes of waiting for a response, the API reported "out of time". I tried virtual environments and a Docker Python image, installing the proper ROCm for the AMD card, but no results 🙁 I suppose it is the AMD card and its incompatibilities with PyTorch.
Has anyone tried to run this as a chatbot? It doesn't work for me; it only works with the Hub.
I got the following error while giving it a try on Kaggle:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[2], line 8
      6 model_id = 'google/flan-t5-small'
      7 tokenizer = AutoTokenizer.from_pretrained(model_id)
----> 8 model = AutoModelForSeq2SeqLM.from_pretrained(model_id, load_in_8bit=True, device_map='auto')
     10 pipeline = pipeline(
     11     "text2text-generation",
     12     model=model,
     13     tokenizer=tokenizer,
     14     max_length=128
     15 )
     17 local_llm = HuggingFacePipeline(pipeline=pipeline)

File /opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:471, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    469 elif type(config) in cls._model_mapping.keys():
    470     model_class = _get_model_class(config, cls._model_mapping)
--> 471     return model_class.from_pretrained(
    472         pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    473     )
    474 raise ValueError(
    475     f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
    476     f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
    477 )

File /opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py:2846, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
   2844 # Dispatch model with hooks on all devices if necessary
   2845 if device_map is not None:
-> 2846     dispatch_model(model, device_map=device_map, offload_dir=offload_folder, offload_index=offload_index)
   2848 if output_loading_info:
   2849     if loading_info is None:

TypeError: dispatch_model() got an unexpected keyword argument 'offload_index'
Great video. Sometimes it takes forever to get a response.
If you liked this video, you will love using LangChain to talk to your own data. Watch this: https://youtu.be/TLf90ipMzfE
Which open-source stacks would you use to build an AI that runs text-to-speech and speech-to-text in a call-center setting at an institution? We are considering using 52 x 8 GB RX 570 graphics cards, currently sitting idle as an Ethereum rig, for this. Which open-source builds do you think would be appropriate? We are mainly targeting inbound support calls, or survey calls.
When I run this step: print(llm_chain.run(question)), Google Colab can't execute the line. How long is it taking for you?
I want to query my own library of PDFs, without sending anything to OpenAI et al. Will you have a video for that soon? (please!)
There are lots of examples of loading your own content that focus on 'prompt stuffing', which presumably does not scale well, whereas I have thousands of PDFs to 'load', so I really need a different solution. Your insights would be greatly appreciated, thank you!
Hey, thanks for the detailed information. Can you also make a video on how to use this approach with custom data?
This isn't local but thanks for the info
What would be the best "base" LLM for the Portuguese language for using LangChain and creating a question-answering bot plugged into local documents? Thanks!
FYI your video thumbnail has a typo, spelling locally as locallay.
What is your Linkedin?
My OpenAI API key has expired. Does that mean I can't use LangChain to build apps?
What tokenizer and AutoModel should I use for Vicuna? Also, how do I point it to a model already downloaded in a directory?
Great and informative video again! One bit to add: if you are developing a chatbot and doing vector search, encoder-decoder LLMs perform better, while for generating human-like responses, decoder-only LLMs are more suitable.
Thank you for providing and sharing a simple workflow, using self and cloud-hosted options. This is pure gold.
This isn't working for me. I even created the API key. It just runs the first question; after that it stops working.
I found the issue: just google "Cannot run large models using API token".
The problem here is that you use the word "locally", which can be connected to the word "offline". If I can run something locally, I would want to be able to run it offline as well. Your solution here requires an online connection to that other service; effectively you've moved one online moment to a different one. I'm only looking for offline local chat, like Oobabooga.
How do you create and train your own model based on the rules of a business, and then use it as explained in the video? Excellent content, thank you!
For the local version of these models, it seems you're still using the Hugging Face ID. Could you please explain how to download them, and what exactly we need to download, in order to run these locally without invoking external APIs?
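One possible pattern, sketched as an assumption rather than the video's exact method: download the repository once with huggingface_hub, then point transformers at the local folder, after which no external API is contacted for inference.

from huggingface_hub import snapshot_download
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Download the model files once (a connection is needed only for this step).
local_dir = snapshot_download(repo_id="google/flan-t5-small")

# From here on, everything loads from disk; no API token or network call is required.
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(local_dir)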