Hi, when I was training with Vicuna v1.3, the loss was always NaN. My training script is this:
```shell
torchrun --nproc_per_node=1 medusa/train/train_legacy.py \
    --model_name_or_path lmsys/vicuna-7b-v1.3 \
    --data_path mistral.json \
    --bf16 True \
    --output_dir test \
    --num_train_epochs 1 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "no" \
    --save_strategy "no" \
    --learning_rate 1e-3 \
    --weight_decay 0.0 \
    --warmup_ratio 0.1 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --lazy_preprocess True \
    --medusa_num_heads 3 \
    --medusa_num_layers 1
```
The screenshot is like this:

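One way to narrow down a NaN loss like this (a hypothetical diagnostic, not something from the Medusa repo) is to check *when* the loss first becomes non-finite in the logged values: NaN from step 0 usually points to a data or label problem (note the script feeds `mistral.json` to a Vicuna model, so a conversation-format mismatch is plausible), while NaN appearing after a few steps more often points to the learning rate (`1e-3` here) being too high for bf16. A minimal sketch:

```python
import math

def first_nonfinite_step(losses):
    """Return the index of the first NaN/inf loss in a logged
    sequence, or None if every value is finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None

# NaN at step 0 -> suspect the dataset/template (inputs or labels);
# NaN after warm steps -> suspect lr/precision (try 1e-4 or fp32).
print(first_nonfinite_step([float("nan"), float("nan")]))  # -> 0
print(first_nonfinite_step([2.1, 1.8, float("inf")]))      # -> 2
print(first_nonfinite_step([2.1, 1.8, 1.5]))               # -> None
```

The function names and thresholds above are illustrative assumptions; the underlying check (inspect the step at which the loss diverges) is the generic debugging step, not an API of the training script.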