Issues: vllm-project/vllm
[Bug]: vllm deployment of GLM-4V reports KeyError: 'transformer.vision.transformer.layers.45.mlp.fc2.weight'
bug
Something isn't working
#5417
opened Jun 11, 2024 by
zhaobu
[Usage]: How do you specify a specific branch on huggingface to use when downloading a model?
good first issue
Good for newcomers
usage
How to use vllm
#5415
opened Jun 11, 2024 by
fake-name
[Performance]: Qwen2-72B-Instruction-GPTQ-Int4 Openai Server Request Problem
performance
Performance-related issues
#5407
opened Jun 11, 2024 by
syngokhan
hidden-states from final (or middle) layers
feature request
#5406
opened Jun 11, 2024 by
janphilippfranken
[Bug]: The vllm service takes two hours to start because of NCCL
bug
Something isn't working
#5405
opened Jun 11, 2024 by
zhaotyer
[Bug]: topk=1 and temperature=0 cause different output in vllm
bug
Something isn't working
#5404
opened Jun 11, 2024 by
rangehow
[Bug]: EngineArgs missing value type for lora_dtype
bug
Something isn't working
#5397
opened Jun 10, 2024 by
c3-ali
0.4.3 error CUDA error: an illegal memory access was encountered
bug
Something isn't working
#5376
opened Jun 10, 2024 by
maxin9966
[Bug]: RuntimeError: CUDA error: an illegal memory access was encountered
bug
Something isn't working
#5371
opened Jun 10, 2024 by
gaye746560359
[Bug]: 8 GPU setup - vLLM can only start with --tensor-parallel-size=2 but not 4 or 8
bug
Something isn't working
#5370
opened Jun 10, 2024 by
elabz
[Bug]: load nvidia/Llama3-ChatQA-1.5-8B model 15 min
bug
Something isn't working
#5365
opened Jun 9, 2024 by
JJplane
[Bug]: Falcon fails if trust_remote_code=True
bug
Something isn't working
#5363
opened Jun 9, 2024 by
robertgshaw2-neuralmagic
[Bug]: Multi GPU setup for VLLM in Openshift still does not work
bug
Something isn't working
#5360
opened Jun 9, 2024 by
jayteaftw
[Bug]: TorchSDPAMetadata is out of date
bug
Something isn't working
#5351
opened Jun 7, 2024 by
Reichenbachian
[Bug]: with --enable-prefix-caching, /completions crashes server with echo=True above certain prompt length
bug
Something isn't working
#5344
opened Jun 7, 2024 by
hibukipanim
[Performance]: [Automatic Prefix Caching] When hitting the KV cached blocks, the first execution is slow, and subsequent ones are fast.
performance
Performance-related issues
#5339
opened Jun 7, 2024 by
soacker
[Usage]: How to quiet the terminal 'Info' outputs in vllm
usage
How to use vllm
#5338
opened Jun 7, 2024 by
rohitnanda1443
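One common way to quiet such output (a minimal sketch, assuming vLLM emits its terminal messages through Python's standard `logging` module under the `"vllm"` logger name) is to raise that logger's level before starting the engine:

```python
import logging

# Assumption: vLLM routes its terminal "INFO" messages through the
# standard-library logger named "vllm". Raising the level to WARNING
# suppresses INFO-level chatter while keeping warnings and errors.
logging.getLogger("vllm").setLevel(logging.WARNING)
```

This only silences Python-side log records; any output printed directly to stdout/stderr by native code would be unaffected.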
[Bug]: Getting an empty string ('') for every call on fine-tuned Code-Llama-7b-hf model
bug
Something isn't working
#5336
opened Jun 7, 2024 by
arthbohra
[Bug]: Unexpected prompt token logprob behaviors of llama 2 when setting echo=True for openai-api server
bug
Something isn't working
#5334
opened Jun 7, 2024 by
fywalter
[Bug]: vLLM does not support virtual GPU
bug
Something isn't working
#5328
opened Jun 7, 2024 by
youkaichao
[Usage]: Function calling for mistral v0.3
usage
How to use vllm
#5325
opened Jun 6, 2024 by
mansirthd
[Installation]: Compiling vLLM for CPU only.
installation
Installation problems
#5317
opened Jun 6, 2024 by
Zibri