
Do Llamas understand the periodic table?

This repository contains the official codebase for our paper:
"Do Llamas understand the periodic table?"

We investigate how large language models (LLMs) encode structured scientific knowledge using chemical elements as a case study. Our key findings include:

  • Discovery of a 3D spiral structure in LLM activations, aligned with the periodic table (see the sketch after this list).
  • Intermediate layers encode continuous, overlapping attributes suitable for indirect recall.
  • Deeper layers sharpen categorical boundaries and integrate linguistic context.
  • LLMs organize facts as geometry-aware manifolds, not just isolated tokens.
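
The spiral claim is easy to eyeball with a quick projection. This is a minimal sketch, not code from this repo; the activation file name and its (n_elements, d_model) layout are assumptions:

    # Minimal sketch: project per-element residual-stream activations to 3D
    # with PCA and look for the spiral described above. "activations.npy"
    # and its (n_elements, d_model) layout are assumptions, not repo files.
    import numpy as np
    from sklearn.decomposition import PCA

    acts = np.load("activations.npy")                 # (n_elements, d_model)
    coords = PCA(n_components=3).fit_transform(acts)  # 3D projection
    # If the spiral is present, elements consecutive in atomic number trace
    # a helix-like path through these three components.
    print(coords[:5])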

Repository Structure

Each folder corresponds to a section or concept in the paper:

  • Pre/ — Preprocessing scripts: prompt creation, activation extraction.
  • Geometry/ — Code for geometric analyses, such as spiral detection.
  • Direct_recall/ — Linear probing for direct factual recall.
  • Indirect_recall/ — Experiments on retrieving unmentioned or related facts.
  • Appendix/ — Extra analysis, visualizations, and ablation results.
  • Results/ — Saved figures, metrics, and outputs.
  • periodic_table_dataset.csv — Structured dataset of 50 elements and their attributes (a loading snippet follows below).
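
The dataset can be inspected with a few lines of pandas; no column names are assumed beyond what the CSV itself defines:

    # Load and inspect the element dataset shipped with the repo.
    import pandas as pd

    df = pd.read_csv("periodic_table_dataset.csv")
    print(df.shape)               # expected: 50 rows, one per element
    print(df.columns.tolist())    # attribute names as defined in the CSV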

Setup & Installation

  1. Clone the repository and enter the project directory.

  2. Set your Hugging Face API token in config.json:

    {
      "HF_TOKEN": "your_huggingface_token"
    }
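
    Scripts can then read the token from this file. A minimal sketch; the repo's actual loading code may differ:

      # Read the Hugging Face token from config.json.
      import json

      with open("config.json") as f:
          hf_token = json.load(f)["HF_TOKEN"]
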
  3. Install dependencies:

    conda create --name myenv python=3.10
    conda activate myenv
    pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
    pip install -r requirements.txt
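
    To confirm the install sees your GPU before extracting activations:

      python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
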
  4. Datasets

    This project expects the activation datasets at ./activation_datasets/ in the project root.

    You can obtain the datasets in two ways:

    Option A: Extract Residual Stream Yourself

    1. Edit the configuration file: config_extract_activation.yaml
    2. Run the extraction script:
      python Pre/extract_activations.py
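
    If you prefer to flip the quantization flag programmatically rather than editing the file by hand, a short sketch (the key path extraction.quantization.load_in_4bit is the one referenced under Hardware Compatibility below):

      # Toggle 4-bit quantization in config_extract_activation.yaml.
      import yaml

      with open("config_extract_activation.yaml") as f:
          cfg = yaml.safe_load(f)
      cfg["extraction"]["quantization"]["load_in_4bit"] = False  # e.g. CPU-only runs
      with open("config_extract_activation.yaml", "w") as f:
          yaml.safe_dump(cfg, f)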

    Option B: Download from Hugging Face

    huggingface-cli download leige1114/activation_datasets \
    --repo-type dataset \
    --local-dir activation_datasets \
    --local-dir-use-symlinks False
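
    The same download is available from Python via huggingface_hub, if you prefer it over the CLI:

      # Programmatic equivalent of the CLI command above.
      from huggingface_hub import snapshot_download

      snapshot_download(
          repo_id="leige1114/activation_datasets",
          repo_type="dataset",
          local_dir="activation_datasets",
      )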

Hardware Compatibility & Quantization

  • bitsandbytes 4-bit quantization (load_in_4bit, nf4) is only supported on Linux with NVIDIA GPUs.
    It does not work on macOS (including Apple Silicon) or CPU-only setups.

If you don’t have an NVIDIA GPU:

  • Disable quantization in configs:
    • config_extract_activation.yaml: set extraction.quantization.load_in_4bit: false (or remove the whole block).
    • config_indirect.yaml: set quantization.load_in_4bit: false if used.
  • Disable quantization in scripts:
    • Geometry/intervention.py: 'use_quantization': False
    • Appendix/entity_attention.py: quantize=False
  • For scripts without a toggle:
    Remove the BitsAndBytes-related code, or pass quantization_config=None.
    On CPU, you can also use device_map="cpu" and reduce the batch size; see the loading sketch after this list.
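
A minimal loading sketch for both paths using transformers; the model id below is a placeholder, not necessarily the one used in this repo:

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    MODEL = "meta-llama/Meta-Llama-3-8B"  # placeholder model id

    # Linux + NVIDIA GPU: 4-bit NF4 quantization via bitsandbytes.
    bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
    model = AutoModelForCausalLM.from_pretrained(
        MODEL, quantization_config=bnb, device_map="auto"
    )

    # macOS or CPU-only: skip quantization entirely.
    # model = AutoModelForCausalLM.from_pretrained(
    #     MODEL, quantization_config=None, torch_dtype=torch.float32, device_map="cpu"
    # )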

Note: requirements.txt pins bitsandbytes. On macOS or CPU-only machines, installing it may fail; remove the dependency and keep quantization disabled, as shown below.
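
One way to install without the pinned bitsandbytes (a sketch; adjust the pattern if the pin is written differently in requirements.txt):

    grep -v '^bitsandbytes' requirements.txt > requirements-cpu.txt
    pip install -r requirements-cpu.txt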
