Issues: vllm-project/vllm
[Bug]: vllm deployment of GLM-4V reports KeyError: 'transformer.vision.transformer.layers.45.mlp.fc2.weight'
bug
Something isn't working
#5417
opened Jun 11, 2024 by
zhaobu
[Usage]: How do you specify a specific branch on huggingface to use when downloading a model?
good first issue
Good for newcomers
usage
How to use vllm
#5415
opened Jun 11, 2024 by
fake-name
[Performance]: Qwen2-72B-Instruction-GPTQ-Int4 Openai Server Request Problem
performance
Performance-related issues
#5407
opened Jun 11, 2024 by
syngokhan
hidden-states from final (or middle) layers
feature request
#5406
opened Jun 11, 2024 by
janphilippfranken
[Bug]: The vllm service takes two hours to start because of NCCL
bug
Something isn't working
#5405
opened Jun 11, 2024 by
zhaotyer
[Bug]: topk=1 and temperature=0 cause different output in vllm
bug
Something isn't working
#5404
opened Jun 11, 2024 by
rangehow
[Bug]: EngineArgs missing value type for lora_dtype
bug
Something isn't working
#5397
opened Jun 10, 2024 by
c3-ali
0.4.3 error CUDA error: an illegal memory access was encountered
bug
Something isn't working
#5376
opened Jun 10, 2024 by
maxin9966
[Bug]: RuntimeError: CUDA error: an illegal memory access was encountered
bug
Something isn't working
#5371
opened Jun 10, 2024 by
gaye746560359
[Bug]: 8 GPU setup - vLLM can only start with --tensor-parallel-size=2 but not 4 or 8
bug
Something isn't working
#5370
opened Jun 10, 2024 by
elabz
[Bug]: load nvidia/Llama3-ChatQA-1.5-8B model 15 min
bug
Something isn't working
#5365
opened Jun 9, 2024 by
JJplane
[Bug]: Falcon fails if trust_remote_code=True
bug
Something isn't working
#5363
opened Jun 9, 2024 by
robertgshaw2-neuralmagic
[Bug]: Multi GPU setup for VLLM in Openshift still does not work
bug
Something isn't working
#5360
opened Jun 9, 2024 by
jayteaftw
[Bug]: TorchSDPAMetadata is out of date
bug
Something isn't working
#5351
opened Jun 7, 2024 by
Reichenbachian
[Bug]: with --enable-prefix-caching, /completions crashes server with echo=True above certain prompt length
bug
Something isn't working
#5344
opened Jun 7, 2024 by
hibukipanim
[Performance]: [Automatic Prefix Caching] When hitting the KV cached blocks, the first execution is slow, and subsequent ones are fast.
performance
Performance-related issues
#5339
opened Jun 7, 2024 by
soacker
[Usage]: How to quiet the terminal 'Info' outputs in vllm
usage
How to use vllm
#5338
opened Jun 7, 2024 by
rohitnanda1443
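One common way to quiet such output (a minimal sketch, assuming vLLM emits its terminal messages through Python's standard `logging` module under the `"vllm"` logger name) is to raise that logger's level before starting the engine:

```python
import logging

# Assumption: vLLM routes its terminal "INFO" messages through the
# standard-library logger named "vllm". Raising the level to WARNING
# suppresses INFO-level chatter while keeping warnings and errors.
logging.getLogger("vllm").setLevel(logging.WARNING)
```

This only silences Python-side log records; any output printed directly to stdout/stderr by native code would be unaffected.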
[Bug]: Getting an empty string ('') for every call on fine-tuned Code-Llama-7b-hf model
bug
Something isn't working
#5336
opened Jun 7, 2024 by
arthbohra
[Bug]: Unexpected prompt token logprob behaviors of llama 2 when setting echo=True for openai-api server
bug
Something isn't working
#5334
opened Jun 7, 2024 by
fywalter
[Bug]: vLLM does not support virtual GPU
bug
Something isn't working
#5328
opened Jun 7, 2024 by
youkaichao
[Usage]: Function calling for mistral v0.3
usage
How to use vllm
#5325
opened Jun 6, 2024 by
mansirthd
[Installation]: Compiling vLLM for CPU only.
installation
Installation problems
#5317
opened Jun 6, 2024 by
Zibri