大部分 LLM 都基于 Transformer 架构,该架构在 2017 年的论文 “Attention Is All You Need” 中首次提出。它的核心思想是对句子中的各个单词赋予不同的注意力权重,而不是一视同仁,因为每个单词对整个句子语义的贡献程度并不相同。借助注意力机制,模型能够更准确地抓住句子的关键信息。
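下面用几行 PyTorch 代码示意(单头)缩放点积注意力的基本思想:先计算每对 token 之间的相关性分数,再用 softmax 归一化为注意力权重,最后对 value 做加权求和。这只是一个极简示意,并非论文或后文实现的完整版本:

import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v 形状均为 (序列长度, 嵌入维度)
    d_k = k.shape[-1]
    scores = q @ k.T / d_k ** 0.5            # 每对 token 之间的相关性分数
    weights = torch.softmax(scores, dim=-1)  # 归一化为注意力权重
    return weights @ v                       # 按权重加权求和,得到上下文向量

q = k = v = torch.randn(4, 8)  # 假设句子有 4 个 token,每个 token 是 8 维向量
context = scaled_dot_product_attention(q, k, v)
print(context.shape)  # torch.Size([4, 8])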
    preprocessed = [item.strip() for item in preprocessed if item.strip() != '']
    # 将未知单词或符号替换为 '<UNK>'
    preprocessed = [item if item in self.str_to_int else '<UNK>' for item in preprocessed]
    ids = [self.str_to_int[s] for s in preprocessed]
    return ids
def decode(self, ids):
    text = " ".join([self.int_to_str[i] for i in ids])
    text = re.sub(r'\s+([,.:;?_!"()\'])', r'\1', text)
    return text
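把上面的 encode / decode 片段组装成一个可运行的最小分词器类(书中类名为 SimpleTokenizerV2)并演示用法;其中的词表内容和切分正则仅作示意,与原书实现可能略有差异:

import re

class SimpleTokenizerV2:
    def __init__(self, vocab):
        self.str_to_int = vocab
        self.int_to_str = {i: s for s, i in vocab.items()}

    def encode(self, text):
        # 按标点和空白切分文本(正则仅作示意)
        preprocessed = re.split(r'([,.:;?_!"()\']|--|\s)', text)
        preprocessed = [item.strip() for item in preprocessed if item.strip() != '']
        preprocessed = [item if item in self.str_to_int else '<UNK>' for item in preprocessed]
        return [self.str_to_int[s] for s in preprocessed]

    def decode(self, ids):
        text = " ".join([self.int_to_str[i] for i in ids])
        return re.sub(r'\s+([,.:;?_!"()\'])', r'\1', text)

vocab = {"Hello": 0, "world": 1, ",": 2, "!": 3, "<UNK>": 4}  # 演示用的迷你词表
tokenizer = SimpleTokenizerV2(vocab)
ids = tokenizer.encode("Hello, unseen world!")
print(ids)                    # [0, 2, 4, 1, 3],未登录词 unseen 被替换为 '<UNK>'
print(tokenizer.decode(ids))  # Hello, <UNK> world!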
def print_gradients(model, x):
    output = model(x)
    target = torch.tensor([[0.0]])  # 为简单起见,用 0 作为目标值

    loss = nn.MSELoss()  # 初始化损失函数
    loss = loss(output, target)  # 比较输出和目标值间的差距(即损失)

    loss.backward()  # 反向传播,计算损失的梯度

    for name, param in model.named_parameters():
        if "weight" in name:
            print(f"{name} has gradient mean of {param.grad.abs().mean().item()}")
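下面调用 print_gradients 时用到的 ExampleDeepNeuralNetwork、layer_sizes 和 sample_input 在本节没有给出定义;这里给出一个与上述用法一致的示意实现(5 层全连接 + GELU,use_shortcut 控制是否加入跳跃连接,细节以原书为准):

import torch
import torch.nn as nn

class ExampleDeepNeuralNetwork(nn.Module):
    def __init__(self, layer_sizes, use_shortcut):
        super().__init__()
        self.use_shortcut = use_shortcut
        self.layers = nn.ModuleList([
            nn.Sequential(nn.Linear(layer_sizes[i], layer_sizes[i + 1]), nn.GELU())
            for i in range(len(layer_sizes) - 1)
        ])

    def forward(self, x):
        for layer in self.layers:
            layer_output = layer(x)
            # 只有当输入和输出形状一致时才能相加,构成跳跃连接
            if self.use_shortcut and x.shape == layer_output.shape:
                x = x + layer_output
            else:
                x = layer_output
        return x

layer_sizes = [3, 3, 3, 3, 3, 1]             # 5 层网络,最后一层输出 1 维
sample_input = torch.tensor([[1., 0., -1.]])
torch.manual_seed(123)
model_without_shortcut = ExampleDeepNeuralNetwork(layer_sizes, use_shortcut=False)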
print_gradients(model_without_shortcut, sample_input)
# 从下面打印结果可见,反向传播时梯度从靠近输出的层(layers.4)向靠近输入的层(layers.0)传播,
# 数值越来越小,逐渐消失
# layers.0.0.weight has gradient mean of 0.0002017411752603948
# layers.1.0.weight has gradient mean of 0.00012011770741082728
# layers.2.0.weight has gradient mean of 0.0007152437465265393
# layers.3.0.weight has gradient mean of 0.0013988513965159655
# layers.4.0.weight has gradient mean of 0.005049604922533035
torch.manual_seed(123)
model_with_shortcut = ExampleDeepNeuralNetwork(layer_sizes, use_shortcut=True)
print_gradients(model_with_shortcut, sample_input)
# 从下面打印结果可见,使用跳跃连接后,梯度不会消失,保持稳定
# layers.0.0.weight has gradient mean of 0.22186797857284546
# layers.1.0.weight has gradient mean of 0.207092747092247
# layers.2.0.weight has gradient mean of 0.32923877239227295
# layers.3.0.weight has gradient mean of 0.2667771875858307
# layers.4.0.weight has gradient mean of 1.3268063068389893
def forward(self, x):
    # forward 由两部分构成:先使用多头注意力计算 token 间的相互关系,
    # 然后用 FeedForward 进行数据转换(激活),为下一轮计算做准备
    shortcut = x
    x = self.norm1(x)
    x = self.att(x)  # att 即 MultiHeadAttention
    x = self.drop_shortcut(x)
    x = x + shortcut  # 跳跃连接
    shortcut = x
    x = self.norm2(x)
    x = self.ff(x)  # ff 即 FeedForward
    x = self.drop_shortcut(x)
    x = x + shortcut
    return x
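可以用一个简单的形状检查来验证:Transformer 块不会改变输入张量的形状。这里假设 TransformerBlock 以前文的配置字典 GPT_CONFIG_124M(嵌入维度为 768)初始化,构造函数以原实现为准:

torch.manual_seed(123)
x = torch.rand(2, 4, 768)  # (batch, token 数, 嵌入维度)
block = TransformerBlock(GPT_CONFIG_124M)
output = block(x)

print("Input shape:", x.shape)        # torch.Size([2, 4, 768])
print("Output shape:", output.shape)  # torch.Size([2, 4, 768])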
torch.manual_seed(123)
model = GPTModel(GPT_CONFIG_124M)
model.to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=0.0004, weight_decay=0.1)

num_epochs = 10
train_losses, val_losses, tokens_seen = train_model_simple(
    model, train_loader, val_loader, optimizer, device,
    num_epochs=num_epochs, eval_freq=5, eval_iter=5,
    start_context="Every effort moves you", tokenizer=tokenizer,
)
# Ep 1 (Step 000000): Train loss 9.781, Val loss 9.923,
# Ep 1 (Step 000005): Train loss 8.057, Val loss 8.332,
# Every effort moves you,.
# Ep 2 (Step 000010): Train loss 6.763, Val loss 7.044,
# Ep 2 (Step 000015): Train loss 6.146, Val loss 6.628,
# Every effort moves you, and, and, and, and, and, and, and, and,, and,, and, and, and, and, and, and, and,, and, and, and, and, and, and,, and
# Ep 3 (Step 000020): Train loss 13.849, Val loss 14.409,
# Ep 3 (Step 000025): Train loss 5.536, Val loss 6.441,
# Every effort moves you, and to to to the to to the to the to the to the to the to the to the to the
# Ep 4 (Step 000030): Train loss 5.181, Val loss 6.360,
# Ep 4 (Step 000035): Train loss 5.026, Val loss 6.373,
# Every effort moves you of the picture to the picture to the picture to the picture to the picture to the picture to the picture to the picture to the picture to the the picture to the picture to the my to the picture to the picture to the of the picture to the
# Ep 5 (Step 000040): Train loss 4.689, Val loss 6.335,
# Every effort moves you know it was not to have to have to have to have to have to have to have--and, and I was, and I had been the picture--as Jack himself, and I had been to have to have to have to have to have
# Ep 6 (Step 000045): Train loss 4.133, Val loss 6.177,
# Ep 6 (Step 000050): Train loss 3.686, Val loss 6.150,
# Every effort moves you know it was not to have to have to see the fact of the last word.
# Ep 7 (Step 000055): Train loss 3.395, Val loss 6.097,
# Ep 7 (Step 000060): Train loss 2.701, Val loss 6.093,
# Every effort moves you know it was not that the picture--I had the fact the fact of the donkey, I had been--I
# Ep 8 (Step 000065): Train loss 2.494, Val loss 6.123,
# Ep 8 (Step 000070): Train loss 2.166, Val loss 6.153,
# Every effort moves you know it was not that the picture for nothing--I told Mrs.
# Ep 9 (Step 000075): Train loss 1.793, Val loss 6.197,
# Ep 9 (Step 000080): Train loss 1.471, Val loss 6.180,
# Every effort moves you know," was not that my hostess was "interesting": on the last word.
# Ep 10 (Step 000085): Train loss 1.071, Val loss 6.233,
# Every effort moves you know," was not that my hostess was "interesting": on that point I could have given Miss Croft the
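train_model_simple 的完整实现本节未列出,它的核心是逐批计算“下一个 token 预测”的交叉熵损失,并定期在训练/验证集上评估。下面是书中 calc_loss_batch / calc_loss_loader 两个辅助函数的示意版本(函数名沿用原书,具体实现以原书为准):

import torch.nn.functional as F

def calc_loss_batch(input_batch, target_batch, model, device):
    # 计算一个批次上下一个 token 预测的交叉熵损失
    input_batch, target_batch = input_batch.to(device), target_batch.to(device)
    logits = model(input_batch)  # (batch, token 数, 词表大小)
    return F.cross_entropy(logits.flatten(0, 1), target_batch.flatten())

def calc_loss_loader(data_loader, model, device, num_batches=None):
    # 在 data_loader 的若干个批次上求平均损失,用于训练/验证集评估
    total_loss = 0.0
    if num_batches is None:
        num_batches = len(data_loader)
    else:
        num_batches = min(num_batches, len(data_loader))
    for i, (input_batch, target_batch) in enumerate(data_loader):
        if i >= num_batches:
            break
        total_loss += calc_loss_batch(input_batch, target_batch, model, device).item()
    return total_loss / num_batches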
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
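这两个 import 用于绘制训练/验证损失曲线。下面是一个与原书 plot_losses 类似的示意实现(坐标轴布局等细节以原书为准):

def plot_losses(epochs_seen, tokens_seen, train_losses, val_losses):
    fig, ax1 = plt.subplots(figsize=(5, 3))
    ax1.plot(epochs_seen, train_losses, label="Training loss")
    ax1.plot(epochs_seen, val_losses, linestyle="-.", label="Validation loss")
    ax1.set_xlabel("Epochs")
    ax1.set_ylabel("Loss")
    ax1.legend(loc="upper right")
    ax1.xaxis.set_major_locator(MaxNLocator(integer=True))  # x 轴只显示整数刻度

    # 第二个 x 轴显示已处理的 token 数
    ax2 = ax1.twiny()
    ax2.plot(tokens_seen, train_losses, alpha=0)  # 隐形曲线,仅用于对齐刻度
    ax2.set_xlabel("Tokens seen")

    fig.tight_layout()
    plt.show()

epochs_tensor = torch.linspace(0, num_epochs, len(train_losses))
plot_losses(epochs_tensor, tokens_seen, train_losses, val_losses)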
torch.manual_seed(123)
token_ids = generate(
    model=model,
    idx=text_to_token_ids("Every effort moves you", tokenizer),
    max_new_tokens=15,
    context_size=GPT_CONFIG_124M["context_length"],
    top_k=25,
    temperature=1.4,
)
print("Output text:\n", token_ids_to_text(token_ids, tokenizer))
# Output text:
# Every effort moves you stand to work on surprise, a one of us had gone
# with randomness, and the other side of the world. I was
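上面的 generate 相比 generate_text_simple 增加了 top-k 过滤和温度缩放。其单步采样的核心逻辑可以用下面的示意代码表示(仅演示思路,完整的 generate 还包含自回归循环与上下文截断):

import torch

def sample_next_token(logits, top_k=25, temperature=1.4):
    # logits: (vocab_size,) 当前步每个候选词的得分
    if top_k is not None:
        top_logits, _ = torch.topk(logits, top_k)
        # 将 top-k 之外的词的得分置为 -inf,使其概率变为 0
        logits = torch.where(
            logits < top_logits[-1],
            torch.tensor(float("-inf"), device=logits.device),
            logits,
        )
    if temperature > 0.0:
        probs = torch.softmax(logits / temperature, dim=-1)
        return torch.multinomial(probs, num_samples=1)  # 按概率随机采样
    return torch.argmax(logits, dim=-1, keepdim=True)   # 温度为 0 时退化为贪心解码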
def download_file(url, destination, backup_url=None):
    def _attempt_download(download_url):
        with urllib.request.urlopen(download_url) as response:
            # Get the total file size from headers, defaulting to 0 if not present
            file_size = int(response.headers.get("Content-Length", 0))
            # Check if file exists and has the same size
            if os.path.exists(destination):
                file_size_local = os.path.getsize(destination)
                if file_size == file_size_local:
                    print(f"File already exists and is up-to-date: {destination}")
                    return True  # Indicate success without re-downloading
            block_size = 1024  # 1 Kilobyte
            # Initialize the progress bar with total file size
            progress_bar_description = os.path.basename(download_url)
            with tqdm(total=file_size, unit="iB", unit_scale=True, desc=progress_bar_description) as progress_bar:
                with open(destination, "wb") as file:
                    while True:
                        chunk = response.read(block_size)
                        if not chunk:
                            break
                        file.write(chunk)
                        progress_bar.update(len(chunk))
            return True
    try:
        if _attempt_download(url):
            return
    except (urllib.error.HTTPError, urllib.error.URLError):
        if backup_url is not None:
            print(f"Primary URL ({url}) failed. Attempting backup URL: {backup_url}")
            try:
                if _attempt_download(backup_url):
                    return
            except urllib.error.HTTPError:
                pass
        # If we reach here, both attempts have failed
        error_message = (
            f"Failed to download from both primary URL ({url})"
            f"{' and backup URL (' + backup_url + ')' if backup_url else ''}."
            "\nCheck your internet connection or the file availability.\n"
            "For help, visit: https://github.com/rasbt/LLMs-from-scratch/discussions/273"
        )
        print(error_message)
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
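一个假设性的调用示例(URL 与文件名仅作演示,并非真实可用的地址):

download_file(
    url="https://example.com/models/model.ckpt",
    destination="model.ckpt",
    backup_url="https://mirror.example.com/models/model.ckpt",
)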
# Alternative way using `requests`
"""
def download_file(url, destination):
    # Send a GET request to download the file in streaming mode
    response = requests.get(url, stream=True)
    # Get the total file size from headers, defaulting to 0 if not present
    file_size = int(response.headers.get("content-length", 0))
    # Check if file exists and has the same size
    if os.path.exists(destination):
        file_size_local = os.path.getsize(destination)
        if file_size == file_size_local:
            print(f"File already exists and is up-to-date: {destination}")
            return
    # Define the block size for reading the file
    block_size = 1024  # 1 Kilobyte
    # Initialize the progress bar with total file size
    progress_bar_description = url.split("/")[-1]  # Extract filename from URL
    with tqdm(total=file_size, unit="iB", unit_scale=True, desc=progress_bar_description) as progress_bar:
        # Open the destination file in binary write mode
        with open(destination, "wb") as file:
            # Iterate over the file data in chunks
            for chunk in response.iter_content(block_size):
                progress_bar.update(len(chunk))  # Update progress bar
                file.write(chunk)  # Write the chunk to the file
"""
def load_gpt2_params_from_tf_ckpt(ckpt_path, settings):
    # Initialize parameters dictionary with empty blocks for each layer
    params = {"blocks": [{} for _ in range(settings["n_layer"])]}
    # Iterate over each variable in the checkpoint
    for name, _ in tf.train.list_variables(ckpt_path):
        # Load the variable and remove singleton dimensions
        variable_array = np.squeeze(tf.train.load_variable(ckpt_path, name))
        # Process the variable name to extract relevant parts
        variable_name_parts = name.split("/")[1:]  # Skip the 'model/' prefix
        # Identify the target dictionary for the variable
        target_dict = params
        if variable_name_parts[0].startswith("h"):
            layer_number = int(variable_name_parts[0][1:])
            target_dict = params["blocks"][layer_number]
        # Recursively access or create nested dictionaries
        for key in variable_name_parts[1:-1]:
            target_dict = target_dict.setdefault(key, {})
        # Assign the variable array to the last key
        last_key = variable_name_parts[-1]
        target_dict[last_key] = variable_array
    return params
# 下载具有 124M 参数的 GPT-2 模型
from gpt_download import download_and_load_gpt2
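导入后通常这样调用(参数与返回值结构以 gpt_download 模块的实际实现为准,输出为 124M 模型的典型示例):

settings, params = download_and_load_gpt2(model_size="124M", models_dir="gpt2")

print("Settings:", settings)
# Settings: {'n_vocab': 50257, 'n_ctx': 1024, 'n_embd': 768, 'n_head': 12, 'n_layer': 12}
print("Parameter dictionary keys:", params.keys())
# dict_keys(['blocks', 'b', 'g', 'wpe', 'wte'])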
text_2 = (
    "Is the following text 'spam'? Answer with 'yes' or 'no':"
    " 'You are a winner you have been specially"
    " selected to receive $1000 cash or a $2000 award.'"
)
token_ids = generate_text_simple(
    model=model,
    idx=text_to_token_ids(text_2, tokenizer),
    max_new_tokens=23,
    context_size=BASE_CONFIG["context_length"],
)
print(token_ids_to_text(token_ids, tokenizer))
# Is the following text 'spam'? Answer with 'yes' or 'no': 'You are a winner you have been specially selected to receive $1000 cash or a $2000 award.' Answer a cash award a cash award cash' cash cash cash a cash a cash' cash a cash' cash'
虽然提示中已显式地指示模型用 yes 或 no 回答,但模型的实际输出显然与预期不符。这是因为此前的预训练只做了通用文本的下一个词预测,并没有专门针对分类任务进行设计。
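要让模型完成分类任务,常见做法是把 GPT 的词表输出层换成一个二分类头,只微调少量参数。下面是一个最小示意,其中 trf_blocks、final_norm、out_head 等属性名沿用原书 GPTModel 的命名,BASE_CONFIG["emb_dim"] 假定为嵌入维度,若实现不同需相应调整:

torch.manual_seed(123)
num_classes = 2  # spam / not spam

# 先冻结全部参数
for param in model.parameters():
    param.requires_grad = False

# 用新的二分类线性层替换原来的词表输出层(新层默认可训练)
model.out_head = torch.nn.Linear(
    in_features=BASE_CONFIG["emb_dim"], out_features=num_classes
)

# 通常还会放开最后一个 Transformer 块和最终的 LayerNorm 参与微调
for param in model.trf_blocks[-1].parameters():
    param.requires_grad = True
for param in model.final_norm.parameters():
    param.requires_grad = True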
train_losses, val_losses, train_accs, val_accs, examples_seen = train_classifier_simple(
    model, train_loader, val_loader, optimizer, device,
    num_epochs=num_epochs, eval_freq=50, eval_iter=5,
)
end_time = time.time()
execution_time_minutes = (end_time - start_time) / 60
print(f"Training completed in {execution_time_minutes:.2f} minutes")
# Last output token: tensor([[-3.3210, 4.7156]])
# Ep 1 (Step 00000): Train loss 1.369, Val loss 1.781
# Ep 1 (Step 00050): Train loss 0.488, Val loss 0.349
# Ep 1 (Step 00100): Train loss 0.216, Val loss 0.335
# Training accuracy: 95.00% | Validation accuracy: 95.00%
# Ep 2 (Step 00150): Train loss 0.257, Val loss 0.209
# Ep 2 (Step 00200): Train loss 0.086, Val loss 0.191
# Ep 2 (Step 00250): Train loss 0.138, Val loss 0.128
# Training accuracy: 100.00% | Validation accuracy: 97.50%
# Ep 3 (Step 00300): Train loss 0.127, Val loss 0.211
# Ep 3 (Step 00350): Train loss 0.187, Val loss 0.105
# Training accuracy: 92.50% | Validation accuracy: 97.50%
# Ep 4 (Step 00400): Train loss 0.108, Val loss 0.089
# Ep 4 (Step 00450): Train loss 0.027, Val loss 0.094
# Ep 4 (Step 00500): Train loss 0.203, Val loss 0.048
# Training accuracy: 100.00% | Validation accuracy: 97.50%
# Ep 5 (Step 00550): Train loss 0.073, Val loss 0.044
# Ep 5 (Step 00600): Train loss 0.047, Val loss 0.074
# Training accuracy: 100.00% | Validation accuracy: 97.50%
# Training completed in 14.17 minutes
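下文使用的 classify_review 在本节未给出定义;这里给出一个与其调用方式一致的示意实现(pos_emb 属性名、pad_token_id=50256、标签 1 表示 spam 等均沿用原书约定,实际以原书代码为准):

def classify_review(text, model, tokenizer, device, max_length=None, pad_token_id=50256):
    model.eval()

    # 编码输入文本,并截断 / 填充到 max_length
    input_ids = tokenizer.encode(text)
    supported_context_length = model.pos_emb.weight.shape[0]
    input_ids = input_ids[:min(max_length, supported_context_length)]
    input_ids += [pad_token_id] * (max_length - len(input_ids))
    input_tensor = torch.tensor(input_ids, device=device).unsqueeze(0)

    # 只取最后一个 token 对应的 logits 作为分类依据
    with torch.no_grad():
        logits = model(input_tensor)[:, -1, :]
    predicted_label = torch.argmax(logits, dim=-1).item()

    return "spam" if predicted_label == 1 else "not spam"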
text_2 = "Hey, just wanted to check if we're still on for dinner tonight? Let me know!" print( classify_review( text_2, model, tokenizer, device, max_length=train_dataset.max_length ) ) # 保存模型参数 torch.save(model.state_dict(), "data/review_classifier.pth")
text_2 = "Hey, just wanted to check if we're still on for dinner tonight? Let me know!" print(classify_review(text_2, model, tokenizer, device, max_length=120))
def download_and_load_file(file_path, url):
    if not os.path.exists(file_path):
        with urllib.request.urlopen(url) as response:
            text_data = response.read().decode("utf-8")
        with open(file_path, "w", encoding="utf-8") as f:
            f.write(text_data)
    else:
        with open(file_path, "r", encoding="utf-8") as f:
            text_data = f.read()

    with open(file_path, "r") as f:
        data = json.load(f)

    return data
data = download_and_load_file(file_path, url)
print("Number of entries:", len(data))
# Number of entries: 1100
print("Example entry:\n", data[50]) # Example entry: # {'instruction': 'Identify the correct spelling of the following word.', 'input': 'Ocassion', 'output': "The correct spelling is 'Occasion.'"} print("Another example entry:\n", data[999]) # {'instruction': "What is an antonym of 'complicated'?", 'input': '', 'output': "An antonym of 'complicated' is 'simple'."}
# 格式化输入的数据,以便用于训练
def format_input(entry):
    instruction_text = (
        f"Below is an instruction that describes a task."
        f"Write a response that appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )

    # 有些条目没有 input 字段,此时不追加 "### Input" 部分
    input_text = f"\n\n### Input:\n{entry['input']}" if entry["input"] else ""

    return instruction_text + input_text
model_input = format_input(data[50])
desired_response = f"\n\n### Response:\n{data[50]['output']}"
print(model_input + desired_response)
# Below is an instruction that describes a task.Write a response that appropriately completes the request.
# ### Instruction:
# Identify the correct spelling of the following word.

# ### Input:
# Ocassion

# ### Response:
# The correct spelling is 'Occasion.'
model_input = format_input(data[999])
desired_response = f"\n\n### Response:\n{data[999]['output']}"
print(model_input + desired_response)
# Below is an instruction that describes a task.Write a response that appropriately completes the request.
# ### Instruction:
# What is an antonym of 'complicated'?

# ### Response:
# An antonym of 'complicated' is 'simple'.
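在构造训练数据之前,通常先把这 1100 条数据划分为训练、验证和测试三部分;下面按 85% / 5% / 10% 的比例划分(比例仅作示例,其中 test_data 即后文打分时用到的 110 条测试数据):

train_portion = int(len(data) * 0.85)  # 85% 用于训练
test_portion = int(len(data) * 0.1)    # 10% 用于测试,其余约 5% 用于验证

train_data = data[:train_portion]
test_data = data[train_portion:train_portion + test_portion]
val_data = data[train_portion + test_portion:]

print("Training set length:", len(train_data))  # 935
print("Validation set length:", len(val_data))  # 55
print("Test set length:", len(test_data))       # 110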
response_text = generated_text[len(input_text):].strip()
print(response_text)
# ###
# ### the active
# The active
# The active
# The active
# The active
# The active
# The active
# The active
# The active
# The active
# The
# Ep 1 (Step 000000): Train loss 3.258, Val loss 3.389,
# Ep 1 (Step 000005): Train loss 1.878, Val loss 1.920,
# Ep 1 (Step 000010): Train loss 1.472, Val loss 1.480,
# Ep 1 (Step 000015): Train loss 1.302, Val loss 1.353,
# Ep 1 (Step 000020): Train loss 1.195, Val loss 1.181,
# Ep 1 (Step 000025): Train loss 1.023, Val loss 1.142,
# Ep 1 (Step 000030): Train loss 1.020, Val loss 1.092,
# Ep 1 (Step 000035): Train loss 0.991, Val loss 1.062,
# Ep 1 (Step 000040): Train loss 0.892, Val loss 0.959,
# Ep 1 (Step 000045): Train loss 0.919, Val loss 0.905,
# Ep 1 (Step 000050): Train loss 0.905, Val loss 0.873,
# Ep 1 (Step 000055): Train loss 0.816, Val loss 0.849,
# Ep 1 (Step 000060): Train loss 0.783, Val loss 0.895,
# Ep 1 (Step 000065): Train loss 0.794, Val loss 0.818,
# Ep 1 (Step 000070): Train loss 0.705, Val loss 0.757,
# Ep 1 (Step 000075): Train loss 0.678, Val loss 0.777,
# Ep 1 (Step 000080): Train loss 0.700, Val loss 0.699,
# Ep 1 (Step 000085): Train loss 0.603, Val loss 0.673,
# Ep 1 (Step 000090): Train loss 0.533, Val loss 0.606,
# Ep 1 (Step 000095): Train loss 0.572, Val loss 0.633,
# Ep 1 (Step 000100): Train loss 0.432, Val loss 0.589,
# Ep 1 (Step 000105): Train loss 0.497, Val loss 0.620,
# Ep 1 (Step 000110): Train loss 0.515, Val loss 0.591,
# Ep 1 (Step 000115): Train loss 0.464, Val loss 0.576,
# Below is an instruction that describes a task.Write a response that appropriately completes the request. ### Instruction: Convert the active sentence to passive: 'The chef cooks the meal every day.' ### Response: 'The chef.' ### Response: 'The chef.' ### Response: '### Response: '### Response: '### Response: '### Response: '### Response: '### Response: '### Response: '###
# Ep 2 (Step 000120): Train loss 0.486, Val loss 0.598,
# Ep 2 (Step 000125): Train loss 0.436, Val loss 0.605,
# Ep 2 (Step 000130): Train loss 0.360, Val loss 0.516,
# Ep 2 (Step 000135): Train loss 0.358, Val loss 0.465,
# Ep 2 (Step 000140): Train loss 0.531, Val loss 0.552,
# Ep 2 (Step 000145): Train loss 0.379, Val loss 0.546,
# Ep 2 (Step 000150): Train loss 0.324, Val loss 0.478,
# Ep 2 (Step 000155): Train loss 0.418, Val loss 0.467,
# Ep 2 (Step 000160): Train loss 0.394, Val loss 0.438,
# Ep 2 (Step 000165): Train loss 0.307, Val loss 0.495,
# Ep 2 (Step 000170): Train loss 0.358, Val loss 0.425,
# Ep 2 (Step 000175): Train loss 0.294, Val loss 0.425,
# Ep 2 (Step 000180): Train loss 0.270, Val loss 0.464,
# Ep 2 (Step 000185): Train loss 0.353, Val loss 0.413,
# Ep 2 (Step 000190): Train loss 0.372, Val loss 0.383,
# Ep 2 (Step 000195): Train loss 0.246, Val loss 0.368,
# Ep 2 (Step 000200): Train loss 0.334, Val loss 0.396,
# Ep 2 (Step 000205): Train loss 0.339, Val loss 0.424,
# Ep 2 (Step 000210): Train loss 0.263, Val loss 0.388,
# Ep 2 (Step 000215): Train loss 0.197, Val loss 0.375,
# Ep 2 (Step 000220): Train loss 0.241, Val loss 0.321,
# Ep 2 (Step 000225): Train loss 0.198, Val loss 0.349,
# Ep 2 (Step 000230): Train loss 0.235, Val loss 0.400,
# Below is an instruction that describes a task.Write a response that appropriately completes the request. ### Instruction: Convert the active sentence to passive: 'The chef cooks the meal every day.' The active sentence to the active sentence to the active sentence to the active sentence to the active sentence to the active sentence to the active sentence active sentence active sentence active sentence active sentence active sentence active sentence active sentence active sentence active sentence active sentence active sentence
# Training completed in 44.29 minutes.
# Below is an instruction that describes a task.Write a response that appropriately completes the request.
# ### Instruction:
# Rewrite the sentence using a simile.

# ### Input:
# The car is very fast.

# Correct response:
# >> The car is as fast as lightning.
# Model response:
# >> The car is very fast.
# ------------------------------------------
# Below is an instruction that describes a task.Write a response that appropriately completes the request.

# ### Instruction:
# What type of cloud is typically associated with thunderstorms?

# Correct response:
# >> The type of cloud typically associated with thunderstorms is cumulonimbus.
# Model response: # >> What type of cloud associated with type of cloud associated with associated with associated with associated with associated with associated with associated with associated cloud associated cloud associated cloud associated cloud associated cloud associated cloud associated cloud associated cloud associated with cloud associated with cloud associated with cloud associated with cloud associated with cloud associated with cloud associated with cloud with cloud with cloud with cloud with cloud with cloud with cloud with cloud with cloud with cloud with cloud with associated with associated with cloud with associated with associated with associated with associated with associated with associated with associated with associated with associated with associated with associated with associated with associated with associated with associated with associated with associated with associated with associated with associated with associated with associated with associated cloud with associated cloud with associated cloud with associated cloud with associated cloud with associated cloud with associated cloud with associated cloud with associated cloud with associated cloud with associated cloud with associated cloud with associated cloud associated cloud associated cloud associated cloud associated cloud associated cloud associated cloud associated cloud associated cloud associated cloud associated with associated with associated with associated with associated with associated with cloud associated with cloud associated with cloud associated with cloud associated with cloud associated with cloud associated with cloud associated with cloud associated with cloud associated with cloud associated with cloud associated with cloud associated with cloud associated with cloud associated with cloud associated with cloud with cloud with cloud with cloud with cloud with cloud # ------------------------------------------ # Below is an instruction that describes a task.Write a response that appropriately completes the request.
# ### Instruction:
# Name the author of 'Pride and Prejudice'.

# Correct response:
# >> Jane Austen.
# Model response: # >> Name the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of the author of author of author of author of author of author of author of author of author of author of author of author of author of author of author of author of author of author of author of author of author describes the author describes the author describes the author describes the author describes the author describes the author describes the author describes the author describes the author describes the author describes the author describes the author describes the author describes the author describes the author describes the author describes the author describes the author describes the author describes the author describes the author describes the author describes the author describes the author describes the author describes the author describes the author describes the author describes the
# 用 ollama 下载并运行 llama3 模型
PS C:\Users\ccw> ollama run llama3
>>> what do llamas eat?
Llamas are herbivores, which means they primarily eat plants and plant-based foods. Their diet typically consists of:
1. Grasses: Llamas love to graze on grasses, including various species like timothy grass, orchard grass, and brome.
2. Hay: They enjoy eating hay, such as alfalfa or oat hay, which is high in fiber and protein.
3. Grains: Llamas might eat grains like oats, barley, or corn, but these should not make up more than 10% of their diet.
4. Fruits and vegetables: Many llamas enjoy fruits and veggies like apples, carrots, sweet potatoes, and peas as treats or supplements.
5. Minerals: Llamas need access to mineral blocks or loose minerals that provide essential nutrients like calcium, phosphorus, and salt.
In the wild, llamas might also eat:
1. Leaves: They'll munch on leaves from trees and shrubs, such as willow, alder, and juniper.
2. Bark: In some cases, they might eat bark, especially during times of food scarcity.
3. Mushrooms: Llamas have been known to eat certain types of mushrooms, like those in the genus Boletus.
In captivity or on farms, llama owners usually provide a diet that consists mainly of hay and grains, with occasional treats like fruits and veggies. It's essential to ensure they receive a balanced diet and access to clean water at all times.
PS C:\Users\ccw> ollama run llama3
>>> Given the input `Below is an instruction that describe a task. Write a response that appropriately completes the request.
...
... ### Instruction:
... Rewrite the following sentence so that it is in active voice.
...
... ### Input:
... The cake was baked by Sarah.` and correct output `Sarah baked the cake.`, score the model response `Input: Input: Input: Input: Input: Input: Input` on
... a scale from 0 to 100, where 100 is the best score.

A nice and simple instruction!
To rewrite the sentence in active voice, I'll make sure that the subject of the sentence performs the action described by the verb.
Original sentence: The cake was baked by Sarah.
Rewritten sentence: Sarah baked the cake.
Score: 100
Why? Because the rewritten sentence follows the structure of an active voice sentence, where the subject (Sarah) performs the action (baked) on the object (the cake). The original sentence is in passive voice, which can sometimes make it harder to identify who performed the action.
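上面的交互是在命令行里手动让 llama3 打分;在代码中可以通过 ollama 的本地 REST API(默认端口 11434)来完成同样的事情。下面的 query_model 是一个示意实现(假设 ollama 服务已在本地运行,细节以原书为准):

import json
import urllib.request

def query_model(prompt, model="llama3", url="http://localhost:11434/api/chat"):
    # 构造发送给本地 ollama 服务的请求体
    data = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # 固定随机种子并把温度设为 0,使打分结果尽量可复现
        "options": {"seed": 123, "temperature": 0, "num_ctx": 2048},
    }
    payload = json.dumps(data).encode("utf-8")
    request = urllib.request.Request(url, data=payload, method="POST")
    request.add_header("Content-Type", "application/json")

    # ollama 以流式的 JSON 行返回结果,逐行拼接 message 的内容
    response_data = ""
    with urllib.request.urlopen(request) as response:
        while True:
            line = response.readline().decode("utf-8")
            if not line:
                break
            response_json = json.loads(line)
            response_data += response_json["message"]["content"]

    return response_data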
def generate_model_scores(json_data, json_key, model="llama3"):
    scores = []
    for entry in tqdm(json_data, desc="Scoring entries"):
        prompt = (
            f"Given the input `{format_input(entry)}` "
            f"and correct output `{entry['output']}`, "
            f"score the model response `{entry[json_key]}`"
            f" on a scale from 0 to 100, where 100 is the best score. "
            f"Respond with the integer number only."
        )
        score = query_model(prompt, model)
        try:
            scores.append(int(score))
        except ValueError:
            print(f"Could not convert score: {score}")
            continue

    return scores
scores = generate_model_scores(test_data, "model_response")
print(f"Number of scores: {len(scores)} of {len(test_data)}")
print(f"Average score: {sum(scores) / len(scores):.2f}\n")
# Number of scores: 110 of 110
# Average score: 50.32
通过打分,可以比较不同模型的性能;也可以据此调整训练方法并重新训练模型,例如:
调整微调过程中的相关超参数,例如学习率(learning rate)、批量大小(batch size)、迭代轮数(number of epochs)等;