6 changes: 5 additions & 1 deletion .codestory.json
Original file line number Diff line number Diff line change
@@ -1 +1,5 @@
{"workingDirectory":"/Users/skcd/scratch/website","sessionId":"d1bbc1a275c00ca26347e6be6eaaa2b9157a7f6b58decca18f2df83dcb6eaa4d","preTestCommand":[]}
{
"workingDirectory": "/Users/skcd/scratch/website",
"sessionId": "d1bbc1a275c00ca26347e6be6eaaa2b9157a7f6b58decca18f2df83dcb6eaa4d",
"preTestCommand": []
}
11 changes: 5 additions & 6 deletions .prettierrc
@@ -1,8 +1,7 @@
{
"bracketSpacing": true,
"printWidth": 120,
"tabWidth": 4,
"trailingComma": "es5",
"useTabs": false
"bracketSpacing": true,
"printWidth": 120,
"tabWidth": 2,
"trailingComma": "es5",
"useTabs": false
}

2 changes: 2 additions & 0 deletions README.md
@@ -1,3 +1,5 @@
# CodeStory main website

### v0.2.0

#### Website for https://codestory.ai/
24 changes: 14 additions & 10 deletions _posts/better-code-search.md
@@ -4,13 +4,13 @@ excerpt: "Searching code is an important part of every developer's workflow. We'
coverImage: "/assets/blog/code-reviews/cover.png"
date: "2023-08-26T14:22:00.000Z"
author:
name: Naresh Ramesh
picture: "/assets/blog/authors/naresh.jpg"
twitter: "https://twitter.com/ghostwriternr"
linkedin: "https://www.linkedin.com/in/naresh-ramesh"
github: "https://github.com/ghostwriternr"
name: Naresh Ramesh
picture: "/assets/blog/authors/naresh.jpg"
twitter: "https://twitter.com/ghostwriternr"
linkedin: "https://www.linkedin.com/in/naresh-ramesh"
github: "https://github.com/ghostwriternr"
ogImage:
url: "/assets/blog/code-reviews/cover.png"
url: "/assets/blog/code-reviews/cover.png"
---

A key part of every developer's workflow is to search code. Code search today surfaces in the form of file-level/global lookup, or by ⌘/Ctrl clicking our way within an IDE. We search code to:
@@ -23,9 +23,9 @@

And perhaps a few more.

 | 
:-------------------------:|:-------------------------:
![VSCode search](/assets/blog/better-code-search/vscode.png) | ![IntelliJ search](/assets/blog/better-code-search/intellij.png)
|   |   |
| :----------------------------------------------------------: | :--------------------------------------------------------------: |
| ![VSCode search](/assets/blog/better-code-search/vscode.png) | ![IntelliJ search](/assets/blog/better-code-search/intellij.png) |

## Diving deeper

@@ -70,11 +70,13 @@ We started by trying to break down the problem into a few facets.
## Solution space

### I know the exact code symbols I'm looking for

Great! Lexical search (as it exists in IDEs today) is the answer. In conjunction with the **modifiers** offered by the IDE, like _match case_, _match word_, _regex pattern_ and **filters** like _glob patterns to include_ and _glob patterns to exclude_, this gets you straight to the code symbol. Paired with tools for replacing code using the matched patterns, this solves the simplest use cases that we have.

![Lexical search](/assets/blog/better-code-search/lexical_search.gif)
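
As a rough sketch of how those modifiers and filters compose under the hood (the function and option names below are hypothetical, not VSCode's actual search API):

```js
// Build a regex from a lexical query plus the usual IDE modifiers.
// Names are illustrative only, not the IDE's real implementation.
function buildSearchRegex(query, { matchCase = false, matchWord = false, isRegex = false } = {}) {
  // Escape the query unless the user asked for raw regex mode.
  let source = isRegex ? query : query.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  if (matchWord) source = `\\b(?:${source})\\b`;
  return new RegExp(source, matchCase ? "g" : "gi");
}

const line = "const cosineSimilarity = (a, b) => dot(a, b) / (norm(a) * norm(b));";
const hit = buildSearchRegex("cosinesimilarity").test(line); // true: case-insensitive match
const wordHit = buildSearchRegex("cosine", { matchWord: true }).test(line); // false: not a whole word
```

Combining `matchWord` with `isRegex` is where real implementations get subtle, for example deciding what a word boundary means inside a user-supplied pattern.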

### I'm looking for something specific, but don't know where

The way I'd imagine looking for something specific would just be to ask for "cosine similarity" to find the cosine similarity implementation, or "async payment processor" to find the worker implementation that processes payments. We're pretty used to this type of search (aka Semantic search) from Google for so many years — but why can't we have this for searching code? So... we built it!

While this sounds trivial (_“just pass every file through an LLM and query, yea?”_), we can improve it in big ways with some small observations. Code, unlike natural language text like books or articles, is modular (divided into packages, files and functions, for example) and has a lot of structure (connected through imports, function calls, and so on). Without leveraging this structure, a file in isolation might not know how its dependencies and dependents are written, or can contain unrelated functions that are all used elsewhere. This additional context can be quite critical to tokenizing correctly and generating better embeddings.
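
A minimal sketch of that observation: chunk per function, and prepend each chunk with the file's imports before embedding. Everything here (the shape of `file`, the id scheme, the header format) is invented for illustration; the real pipeline is more involved.

```js
// Hypothetical structure-aware chunker: one chunk per function, each
// prefixed with the file's imports so the embedding model also sees
// how the chunk connects to the rest of the codebase.
function buildEmbeddingInputs(file) {
  const header = `// file: ${file.path}\n// imports: ${file.imports.join(", ")}`;
  return file.functions.map((fn) => ({
    id: `${file.path}::${fn.name}`, // unique per symbol, not per file
    text: `${header}\n${fn.code}`,  // the string that actually gets embedded
  }));
}

const inputs = buildEmbeddingInputs({
  path: "src/math.ts",
  imports: ["dot", "norm"],
  functions: [{ name: "cosineSimilarity", code: "function cosineSimilarity(a, b) { /* ... */ }" }],
});
// inputs[0].id is "src/math.ts::cosineSimilarity"
```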
@@ -84,11 +86,13 @@
Pairing this experience with the IDE's search UI, we can now search for code as _concepts_, and get back a list of results that are sorted by relevance. This is a big step forward in leveraging semantic search for code and we're excited about improving this further!

### I want to explore broadly and don't have a place to start
I'm going to use our experience of building the previous feature to explain this one. In order to augment VSCode's built-in search implementation, we needed to find answers to so many questions: ***Where does the UI implementation start? How are the classes composed and what are their responsibilities? Is it all in one module? Where is the search logic and how does the hand-off happen from the UI? How are results populated and kept updated?***

I'm going to use our experience of building the previous feature to explain this one. In order to augment VSCode's built-in search implementation, we needed to find answers to so many questions: **_Where does the UI implementation start? How are the classes composed and what are their responsibilities? Is it all in one module? Where is the search logic and how does the hand-off happen from the UI? How are results populated and kept updated?_**

Clearly, this is not a simple, static search query that a list of file paths can answer. And honestly, what would you do if you were in this situation? Ask a senior engineer who knows this well to give you some pointers, right? Well — we didn't have one. So, we built one!

This is where our AI agent comes in. An ideal AI agent should be able to guide you through the codebase, answer questions about the codebase and help you find the right place to start. Well, we want our AI agent to be able to complete end-to-end tasks, but let's start here for now. In order to perform these actions, we need to provide the agent with a few things:

1. An index of the codebase that can be queried. Oh, hey, we just built that earlier! Sweet, we just need to give the agent access to the index.
2. The ability to break down a query into smaller chunks and extract this information from the index.
3. Most importantly, use these smaller parts to go deeper into the codebase. Today, we navigate code structurally by clicking around in the IDE (powered by the LSP) or by following how a particular file is structured (syntax trees). Integrating these tools with the agent allows autonomous navigation of the code to understand how different pieces are linked together. "How the code flows", really.
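
A toy loop tying those three pieces together (the `index` and `lsp` objects stand in for the semantic index and the language server; nothing here is the actual agent code):

```js
// 1) break the question into smaller lookups, 2) pull candidate symbols
// from the index, 3) hop outward via LSP-style references to see how
// the pieces link together.
function explore(question, index, lsp, hops = 1) {
  const subQueries = question.split(/[?.]/).map((s) => s.trim()).filter(Boolean);
  const seeds = subQueries.flatMap((q) => index.search(q));
  const visited = new Set(seeds);
  let frontier = seeds;
  for (let i = 0; i < hops; i++) {
    frontier = frontier.flatMap((sym) => lsp.references(sym)).filter((s) => !visited.has(s));
    frontier.forEach((s) => visited.add(s));
  }
  return [...visited];
}

// Stub index and LSP, purely for illustration.
const index = { search: (q) => (q.includes("search UI") ? ["SearchView"] : []) };
const lsp = { references: (sym) => (sym === "SearchView" ? ["SearchModel"] : []) };
const found = explore("Where does the search UI start?", index, lsp);
// found is ["SearchView", "SearchModel"]
```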
54 changes: 27 additions & 27 deletions _posts/llm_lsp.md
@@ -4,13 +4,13 @@ excerpt: "To start with, we wanted the LLM to see a code repository as we humans
coverImage: "/assets/blog/code-reviews/cover.png"
date: "2023-07-31T13:09:00.000Z"
author:
name: Sandeep Kumar Pani
picture: "/assets/blog/authors/sandeep.jpg"
twitter: "https://twitter.com/skcd42"
linkedin: "https://www.linkedin.com/in/sandeep-kumar-pani"
github: "https://github.com/theskcd"
name: Sandeep Kumar Pani
picture: "/assets/blog/authors/sandeep.jpg"
twitter: "https://twitter.com/skcd42"
linkedin: "https://www.linkedin.com/in/sandeep-kumar-pani"
github: "https://github.com/theskcd"
ogImage:
url: "/assets/blog/code-reviews/cover.png"
url: "/assets/blog/code-reviews/cover.png"
---

With CodeStory we want to build a senior engineer right in your IDE!
@@ -35,23 +35,23 @@ Parsing typescript code is a bit of hit and miss unless you are very careful and

```js
export function something() {
console.log("interesting");
console.log("interesting");
}
```

or you can [declare](https://github.com/codestoryai/typescript_parsing/blob/main/parseRepo.ts#L195) it as:

```js
export const something = () => {
console.log("interesting");
console.log("interesting");
};
```

and there are [cases](https://github.com/codestoryai/typescript_parsing/blob/main/parseRepo.ts#L488) like this too

```js
export const revisit = doSomething("interesting", {
maxAge: 24 * 60 * 60, // one week
maxAge: 24 * 60 * 60, // one week
});
```
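
A sketch of why this gets fiddly: even a naive exported-symbol detector needs a separate pattern per shape, and regexes like these are exactly the hit-and-miss approach that tree-sitter queries or the TypeScript compiler API are meant to replace.

```js
// Naive per-shape regexes for the export forms above. Illustrative only;
// real parsing should use tree-sitter or the TS compiler API.
function findExportedNames(source) {
  const patterns = [
    /export\s+function\s+([A-Za-z_$][\w$]*)/g,  // export function foo() {}
    /export\s+const\s+([A-Za-z_$][\w$]*)\s*=/g, // export const foo = ...
  ];
  const names = [];
  for (const re of patterns) {
    for (const match of source.matchAll(re)) names.push(match[1]);
  }
  return names;
}

const src = [
  "export function something() {}",
  "export const other = () => {};",
  'export const revisit = doSomething("interesting", {});',
].join("\n");
const names = findExportedNames(src); // ["something", "other", "revisit"]
```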

@@ -61,19 +61,19 @@ The core part of a code graph is getting a unique name for each symbol in the co

```js
export interface CodeSymbolInformation {
symbolName: string;
symbolKind: CodeSymbolKind;
symbolStartLine: number;
symbolEndLine: number;
codeSnippet: { languageId: string, code: string };
extraSymbolHint: string | null;
dependencies: CodeSymbolDependencies[];
fsFilePath: string;
originalFilePath: string;
workingDirectory: string;
displayName: string;
originalName: string;
originalSymbolName: string;
symbolName: string;
symbolKind: CodeSymbolKind;
symbolStartLine: number;
symbolEndLine: number;
codeSnippet: { languageId: string, code: string };
extraSymbolHint: string | null;
dependencies: CodeSymbolDependencies[];
fsFilePath: string;
originalFilePath: string;
workingDirectory: string;
displayName: string;
originalName: string;
originalSymbolName: string;
}
```
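
For illustration, here is one way a unique name could be derived from these fields, scoping the symbol by its path relative to the working directory. This is a hypothetical scheme, not necessarily the one we ship.

```js
// Hypothetical unique-name scheme: workspace-relative path + symbol name.
function uniqueSymbolName(symbol) {
  const relative = symbol.fsFilePath
    .replace(symbol.workingDirectory, "") // drop the workspace prefix
    .replace(/^\//, "")
    .replace(/\.(tsx?|jsx?)$/, "")        // drop the file extension
    .split("/")
    .join(".");
  return `${relative}.${symbol.symbolName}`;
}

const name = uniqueSymbolName({
  fsFilePath: "/Users/skcd/scratch/website/src/utils/search.ts",
  workingDirectory: "/Users/skcd/scratch/website",
  symbolName: "cosineSimilarity",
});
// name is "src.utils.search.cosineSimilarity"
```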

@@ -118,8 +118,8 @@ By giving the LLM a **Code Graph** it can walk on, we are able to get the LLM to

This allows the LLM to ask for:

- more information about the symbol if it has not seen
- provide better code completion and reasoning as it now has a LSP to interact with
- more information about the symbol if it has not seen it yet
- provide better code completion and reasoning as it now has an LSP to interact with

---

@@ -131,6 +131,6 @@ While on our quest to create a senior engineer we are also building these toolin

We are finishing up work on giving:

- **terminal** access to the LLM so it can run commands
- **linters** so the code generated is closer to what a human would write
- **debuggers** so it can debug its own code
- **terminal** access to the LLM so it can run commands
- **linters** so the code generated is closer to what a human would write
- **debuggers** so it can debug its own code
12 changes: 6 additions & 6 deletions _posts/reimaginging-the-ide.md
@@ -4,13 +4,13 @@ excerpt: "We're building Aide by CodeStory, an AI-powered mod of VSCode. Here's
coverImage: "/assets/blog/code-reviews/cover.png"
date: "2023-08-19T13:11:00.000Z"
author:
name: Naresh Ramesh
picture: "/assets/blog/authors/naresh.jpg"
twitter: "https://twitter.com/ghostwriternr"
linkedin: "https://www.linkedin.com/in/naresh-ramesh"
github: "https://github.com/ghostwriternr"
name: Naresh Ramesh
picture: "/assets/blog/authors/naresh.jpg"
twitter: "https://twitter.com/ghostwriternr"
linkedin: "https://www.linkedin.com/in/naresh-ramesh"
github: "https://github.com/ghostwriternr"
ogImage:
url: "/assets/blog/code-reviews/cover.png"
url: "/assets/blog/code-reviews/cover.png"
---

How many times have you switched IDEs as a programmer? I’ve done so only thrice in 10 years. I started with Sublime Text at uni and briefly tried Atom before running back to Sublime due to its far superior performance. And then came VSCode. I’d be remiss if I also didn’t mention IntelliJ and GoLand, which I used for specific projects when my employer gave licenses to the JetBrains suite of IDEs. And of course, vim—but it’s too much work to make it do all the things I get from an IDE—so my usage there is limited to quick edits (and instead use vim keybindings in all my IDEs).
36 changes: 18 additions & 18 deletions _posts/stream.md
@@ -4,13 +4,13 @@ excerpt: "LLM output to editor edits: A step by step guide"
coverImage: "/assets/blog/code-reviews/cover.png"
date: "2023-11-02T14:22:00.000Z"
author:
name: Sandeep Kumar Pani
picture: "/assets/blog/authors/sandeep.jpg"
twitter: "https://twitter.com/skcd42"
linkedin: "https://www.linkedin.com/in/sandeep-kumar-pani"
github: "https://github.com/theskcd"
name: Sandeep Kumar Pani
picture: "/assets/blog/authors/sandeep.jpg"
twitter: "https://twitter.com/skcd42"
linkedin: "https://www.linkedin.com/in/sandeep-kumar-pani"
github: "https://github.com/theskcd"
ogImage:
url: "/assets/blog/code-reviews/cover.png"
url: "/assets/blog/code-reviews/cover.png"
---

TLDR: We use a mix of tree-sitter queries and parsing of the LLM output as it streams to generate in-place edits in the editor.
@@ -114,31 +114,31 @@ Since the context window is also limited, we have to be careful about the data w

The problem boils down to the following:

- send LLM the code we want it to change (duh!)
- send some context about the code above (which is often times important)
- send context about the code after the selection (which is not as important)
- keep in mind the [Lost-in-the-middle](https://arxiv.org/abs/2307.03172) behaviour of LLMs
- also give the LLM some space to think about how to change the code
- send the LLM the code we want it to change (duh!)
- send some context about the code above the selection (which is oftentimes important)
- send context about the code after the selection (which is not as important)
- keep in mind the [Lost-in-the-middle](https://arxiv.org/abs/2307.03172) behaviour of LLMs
- also give the LLM some space to think about how to change the code
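
Put together, the packing might look something like this sketch. The section markers are made up, and real prompt assembly also has to budget tokens; the one deliberate choice shown is keeping the selection and the task near the end, given lost-in-the-middle behaviour.

```js
// Assemble the edit prompt: less important context first, the selected
// code and the task last, where models tend to attend most reliably.
function buildEditPrompt({ above, below, selection, instruction }) {
  return [
    "// CONTEXT BELOW THE SELECTION (least important)",
    below,
    "// CONTEXT ABOVE THE SELECTION",
    above,
    "// SELECTED CODE TO CHANGE",
    selection,
    `// TASK: ${instruction}`,
  ].join("\n");
}

const prompt = buildEditPrompt({
  above: "import { pay } from './pay';",
  below: "export default processor;",
  selection: "function process(order) { pay(order); }",
  instruction: "add a try/catch around pay()",
});
```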

Lots of things, right? But we can do something smart here using the system prompt and special markers, which bias the LLM to not hallucinate and to produce output we can parse as quickly as possible.

While working on code generation, we also picked up a few ideas by experimenting and looking at how the LLM itself generates code.

When asked to generate code, the GPT family of models produces output that looks like this:

```jsx
````
```{language}
{code}
```
````

`language` here can be any of typescript, rust, javascript, etc.

The backticks are super important markers for parsing because they give us a hint on when to start.

So our system prompt along with the messages ends up looking like this:

```jsx
````jsx
const system_message = "
You are an AI programming assistant.
When asked for your name, you must respond with "Aide".
@@ -149,7 +149,7 @@
- You always answer with {language} code.
- Modify the code or create new code.
- Unless directed otherwise, the user is expecting for you to edit their selected code.";
```
````

The LLM pays special attention to the system message and we tell it to always spit out a single code block so we can start parsing quickly!
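
Once the model complies, pulling the code back out of a finished (non-streamed) completion is a single regex, assuming exactly one fenced block, which is what the system prompt pushes the model towards:

```js
// Extract the first fenced code block from a completed response.
function extractCodeBlock(completion) {
  const match = completion.match(/```(\w*)\n([\s\S]*?)```/);
  if (!match) return null;
  return { language: match[1] || "unknown", code: match[2] };
}

const reply = "Sure! Here's the code:\n```typescript\nconst x = 1;\n```\nHope that helps!";
const block = extractCodeBlock(reply);
// block.language is "typescript"; block.code is "const x = 1;\n"
```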

@@ -175,7 +175,7 @@ We set the temperature to 0.1 (since we want the LLM to be imaginative, but not too much)

Oftentimes the completion of such a prompt looks like this:

```txt
````txt
Sure! Here's the modified code with the try-catch block added:

```typescript
@@ -221,8 +221,8 @@ Great! so we got the LLM to output code, but we are not done yet, the magic of h

If you set `stream=True` on most LLM models, you can get the delta of what has been produced. This is often handy for many reasons:

- using `stream=True` in practice leads to a more stable connection with any LLM you are working on (the network stays fresh and there are fewer instances of timeouts or errors creeping in)
- you get incremental updates which is a great UX win, cause no one wants to wait for **5-6+ seconds** before seeing an output.
- using `stream=True` in practice leads to a more stable connection with any LLM you are working on (the network stays fresh and there are fewer instances of timeouts or errors creeping in)
- you get incremental updates, which is a great UX win, because no one wants to wait for **5-6+ seconds** before seeing an output.

So how do we start processing this stream of output from the LLM? In our case, because of how the LLM's output is shaped, we can use a few tricks (read: regexes, plus some assumptions about the generated code to fix things up!)
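
A sketch of the streaming version: buffer the deltas, split on newlines, and toggle in and out of code mode whenever a fence line shows up, even when the fence itself is split across chunks. (Real parsing also tracks the language tag and handles unterminated fences.)

```js
// Incrementally consume streamed deltas, collecting only the code
// between the opening fence line and the closing one.
class StreamingCodeParser {
  constructor() {
    this.buffer = "";
    this.inCode = false;
    this.code = "";
  }
  push(delta) {
    this.buffer += delta;
    let newline;
    while ((newline = this.buffer.indexOf("\n")) !== -1) {
      const line = this.buffer.slice(0, newline);
      this.buffer = this.buffer.slice(newline + 1);
      if (line.startsWith("```")) {
        this.inCode = !this.inCode; // a fence line toggles code mode
      } else if (this.inCode) {
        this.code += line + "\n"; // an editor edit could be applied here, mid-stream
      }
    }
  }
}

const parser = new StreamingCodeParser();
// Deltas arrive in arbitrary chunk sizes; here the fence is split across two of them.
for (const delta of ["Sure!\n``", "`ts\nconst x", " = 1;\n```\nDone."]) parser.push(delta);
// parser.code is "const x = 1;\n"
```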

41 changes: 20 additions & 21 deletions changelog.md
@@ -20,23 +20,22 @@
- Long file edits are now more stable with heuristics and fuzzy matching instead of being fully controlled by AI.
- Initial golang support with dependencies powered by LSP and Tree-sitter.


#### 18th August, 2023

###### Aide — v1.0.11

- Introducing a new chat with support for slash commands!
- You can now invoke the AI agent from the chat using the `/agent` command.
![Chat](/changelog/1808-1.gif)
- Streaming output within the chat view to make the system feel more responsive and increase stability.
![Streaming output](/changelog/1808-2.gif)
- Introducing a new chat with support for slash commands!
- You can now invoke the AI agent from the chat using the `/agent` command.
![Chat](/changelog/1808-1.gif)
- Streaming output within the chat view to make the system feel more responsive and increase stability.
![Streaming output](/changelog/1808-2.gif)

#### 17th August, 2023

###### Aide — v1.0.10

- Typescript and Javascript support using LSP
- Agent view is more responsive and better prompt tuned.
- Typescript and Javascript support using LSP
- Agent view is more responsive and better prompt tuned.

#### 16th August, 2023

@@ -66,9 +65,9 @@ Initial launch of CodeStory's AI powered mod of VSCode! 🎉

With Aide you can do:

- AI agents which can do repo wide edits
- AI which takes care of keeping your commits
- Semantic search over your codebase
- AI agents which can do repo wide edits
- AI which takes care of keeping your commits
- Semantic search over your codebase

#### 28th July, 2023

@@ -88,19 +87,19 @@ We now link 🖇️ directly to the diff page, allowing easier navigation to the

###### VSCode Extension

- Automatic summaries for grouped changes under the "What was I doing?" feature.
![Automatic summaries](/changelog/1907-1.gif)
- Commit message helper that automatically captures the "what" of your changes, and let you fill in the "why".
![Commit message helper](/changelog/1907-2.gif)
- Automatic summaries for grouped changes under the "What was I doing?" feature.
![Automatic summaries](/changelog/1907-1.gif)
- Commit message helper that automatically captures the "what" of your changes, and let you fill in the "why".
![Commit message helper](/changelog/1907-2.gif)

#### 17th July, 2023

###### VSCode Extension

- Support for multiple tsconfig.json files in a Typescript project.
![Support for multiple tsconfig.json files](/changelog/1707.jpg)
- Support for multiple tsconfig.json files in a Typescript project.
![Support for multiple tsconfig.json files](/changelog/1707.jpg)

- Fix the caching logic to avoid re-indexing files while saving changes.
- Fix the caching logic to avoid re-indexing files while saving changes.

#### 14th July, 2023

@@ -163,10 +162,10 @@ Cloud backed explanations and semantic search.

Launching the CodeStory VSCode extension! 🎉

- Look up code using the context you remember in natural language 🔍
- Look up code using the context you remember in natural language 🔍

![search](/changelog/0407.gif)

- Recover from context switches and recollect work easily with an automatic timeline of changes 📅
- Recover from context switches and recollect work easily with an automatic timeline of changes 📅

- Self-serve onboarding for new hires and on-call with explanations for any class or function like your co-worker would 🚍
- Self-serve onboarding for new hires and on-call with explanations for any class or function like your co-worker would 🚍