Bigfoot71
Contributor

@Bigfoot71 Bigfoot71 commented Mar 12, 2025

This PR aims to integrate software rendering support into rlgl through an external header, rlsw.h.

I leave it to @raysan5 to choose another name if rlsw doesn't suit him.

Additionally, the header is delivered with an MIT license in my name. I also leave the choice of the license to @raysan5. Consider this project as a donation to the community; I will not redistribute it on my side.

This header rlsw.h is intended to provide, for now, all the functionalities offered by rlgl.h for OpenGL 1.1.

Currently, rendering is performed to a framebuffer that supports multiple formats, selectable at compile time:

  • Color Buffer:

    • RGB - 8-bit (3:3:2)
    • RGB - 16-bit (5:6:5)
    • RGB - 24-bit (8:8:8)
  • Depth Buffer:

    • D - 8-bit (unorm)
    • D - 16-bit (unorm)
    • D - 24-bit (unorm)
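
As a rough illustration of what the formats listed above mean in practice, here is a minimal sketch of packing 8-bit RGB channels and a normalized depth value into the smaller layouts. The helper names (`demo_pack_rgb332`, `demo_pack_rgb565`, `demo_pack_depth16`) are made up for this example and are not taken from rlsw.h, and the bit order (red in the high bits) is an assumption.

```c
#include <stdint.h>

// Hypothetical helpers, not rlsw.h code. Bit order is assumed: red in the
// most significant bits, blue in the least significant bits.

// RGB 3:3:2 packed into 8 bits: RRRGGGBB
static inline uint8_t demo_pack_rgb332(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)((r & 0xE0) | ((g & 0xE0) >> 3) | (b >> 6));
}

// RGB 5:6:5 packed into 16 bits: RRRRRGGGGGGBBBBB
static inline uint16_t demo_pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}

// Depth in [0, 1] stored as a 16-bit unsigned normalized value.
// Out-of-range input is clamped, then rounded to the nearest step.
static inline uint16_t demo_pack_depth16(float d)
{
    if (d < 0.0f) d = 0.0f;
    if (d > 1.0f) d = 1.0f;
    return (uint16_t)(d * 65535.0f + 0.5f);
}
```

The smaller formats trade color and depth precision for memory, which matters on the memory-limited targets discussed in this thread.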

For the rest, I'll leave you to check the checklist.

If you notice any features that should be implemented and are missing from the checklist, please mention them or edit the post if you can.

note: This PR currently only contains the header. I will work on the integration once the checklist is complete and a decision has been made on how to integrate it into raylib.


Feature Checklist

Clipping

  • Point Clipping
  • Line Clipping
  • Triangle Clipping
  • Quad Clipping

Rendering

  • Point Rendering
  • Line Rendering
  • Triangle Rendering
  • Quad Rendering
  • Polygon Modes
  • Point Width
  • Line Width

Texture Support

  • All Uncompressed Texture Formats Supported by Raylib
  • Texture Minification / Magnification Checks
  • Bilinear Filtering
  • Texture Wrap Modes with Separate Check for S/T

Vertex Arrays

  • Vertex Arrays Support
  • Direct Primitive Drawing Mode
  • Matrix Stack Support

Misc

  • GL-like Getter Functions
  • Framebuffer Resizing
  • Perspective Correct
  • Scissor Clipping
  • Depth Testing
  • Blend Modes
  • Face Culling

@Bigfoot71
Contributor Author

Here is a quick video example of some of the current capabilities. Note that the texture is rendered with bilinear filtering, although I agree that it may not be noticeable as it is.

simplescreenrecorder-2025-03-12_01.34.46.mp4

@ColleagueRiley
Contributor

This repeats a lot of code that RLGL does for you.

RLGL already handles the OpenGL 1-like abstraction and the matrices, so I think it would make more sense to build from the modern OpenGL setup and render everything in rlDrawRenderBatch.

@Bigfoot71
Contributor Author

Additionally, I chose to implement triangle rasterization using the scanline method rather than barycentric interpolation method.

Barycentric interpolation can be much more efficient when implemented with SIMD optimizations and is also conceptually simpler (excluding SIMD considerations).

However, I assumed that the goal here is not to achieve maximum performance on a modern desktop CPU but rather to enable raylib to run on platforms that lack any OpenGL support, including a software implementation.

When properly implemented, the scanline method requires fewer computational resources and offers better cache locality, making it a more suitable choice for embedded systems and older hardware.
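
To make the scanline idea concrete, here is a minimal flat-color sketch: the triangle is walked row by row, the left and right span ends are found on the edges, and the span is filled. This is not the rlsw.h rasterizer. All names (`demo_vec2`, `demo_fill_triangle`, and so on) are hypothetical, there is no interpolation of color, depth, or texture coordinates, and the fill rule (a pixel is covered when its center falls in the half-open span) is an assumption chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { float x, y; } demo_vec2;

static void demo_swap(demo_vec2 *a, demo_vec2 *b) { demo_vec2 t = *a; *a = *b; *b = t; }

// Index of the first pixel whose center (i + 0.5) is at or after v.
static int demo_first_center(float v)
{
    int i = (int)(v - 0.5f);            // truncates toward zero
    if ((float)i + 0.5f < v) i++;       // step up when truncation landed below v
    return i;
}

// X position of edge a->b at height y (caller guarantees a.y != b.y).
static float demo_edge_x(demo_vec2 a, demo_vec2 b, float y)
{
    return a.x + (b.x - a.x) * (y - a.y) / (b.y - a.y);
}

// Fill a triangle one horizontal span at a time into a w*h 8-bit buffer.
static void demo_fill_triangle(uint8_t *buf, int w, int h,
                               demo_vec2 v0, demo_vec2 v1, demo_vec2 v2,
                               uint8_t color)
{
    // Sort vertices so that v0.y <= v1.y <= v2.y
    if (v0.y > v1.y) demo_swap(&v0, &v1);
    if (v1.y > v2.y) demo_swap(&v1, &v2);
    if (v0.y > v1.y) demo_swap(&v0, &v1);

    int yStart = demo_first_center(v0.y);
    int yEnd   = demo_first_center(v2.y);   // exclusive
    if (yStart < 0) yStart = 0;
    if (yEnd > h) yEnd = h;

    for (int y = yStart; y < yEnd; y++)
    {
        float yc = (float)y + 0.5f;         // sample at the pixel center

        // One span end lies on the long edge v0->v2, the other on
        // whichever short edge covers this scanline
        float xa = demo_edge_x(v0, v2, yc);
        float xb = (yc < v1.y) ? demo_edge_x(v0, v1, yc) : demo_edge_x(v1, v2, yc);
        if (xa > xb) { float t = xa; xa = xb; xb = t; }

        int xStart = demo_first_center(xa);
        int xEnd   = demo_first_center(xb); // exclusive
        if (xStart < 0) xStart = 0;
        if (xEnd > w) xEnd = w;

        uint8_t *row = buf + (size_t)y * (size_t)w;
        for (int x = xStart; x < xEnd; x++) row[x] = color;
    }
}
```

Each row touches one contiguous run of memory, which is where the cache-locality argument above comes from. A barycentric rasterizer would instead test every pixel of the bounding box against three edge functions.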

@Bigfoot71
Contributor Author

> This repeats a lot of code that RLGL does for you.
>
> RLGL already handles the OpenGL 1-like abstraction and the matrices, so I think it would make more sense to build from the modern OpenGL setup and render everything in rlDrawRenderBatch.

@ColleagueRiley Yes, I know, and it is a deliberate choice for several reasons.

The first is that we can integrate its implementation directly into the parts currently dedicated to OpenGL 1.1 only, so we can also be sure that it does not introduce any duplication in the build.

The second reason is that if we ever want to implement lighting support, the current matrix stack system in rlgl for OpenGL 2+ could be problematic, especially when computing the normal matrix.

@ColleagueRiley
Contributor

@Bigfoot71 I'm also not sure if the dedicated legacy OpenGL backend makes sense, when it can easily be integrated.

@Bigfoot71
Contributor Author

> @Bigfoot71 I'm also not sure if the dedicated legacy OpenGL backend makes sense, when it can easily be integrated.

@ColleagueRiley Do you also want to add shader support? Reinvent the Mesa driver for raylib? No one is going to do that.

Unless we use function pointers or something like that, but it's simply not compatible.

Anyway, the goal of this implementation is to enable raylib to run on machines that don't even have a software driver for OpenGL. We’re not going to implement what hasn’t been done for these machines ourselves, it would be absurd to think that.

@ColleagueRiley
Contributor

I meant it can be integrated into the modern API. I think it's more important RLGL is refactored to support additional backends. Not only to make this software rendering backend, but it would help users that want to add custom support for the native graphics API.

@Bigfoot71
Contributor Author

Bigfoot71 commented Mar 12, 2025

TL;DR: Adding support for modern APIs will require abandoning OpenGL, and in such a case, adding software rendering support for platforms that don't support OpenGL would then be strange.


Ah, excuse me, I see, but there would be several problems with that.

Relying on the batch system would just be a waste of memory in this specific case, unless we attempt very specific software rendering architectures, which wouldn't necessarily be relevant for the target devices (where memory is limited).

There is also the issue of the current management of matrices by rlgl, as I mentioned before.

Moreover, this could become increasingly confusing as to which features are implemented or not for each backend.


Also, I’ve reconsidered what you said about refactoring rlgl to allow the implementation of different backends.

If this is with the intention of implementing Vulkan, Metal, and D3D12, many other problems will arise. One of the most obvious that comes to mind: VAOs.

Not to mention the specific implementations that will be needed to support OpenGL in order to align with the principles of immutable pipelines.

Implementing a modern API in the right way will require sacrifices for older APIs.

And here we are talking about a hypothesis that, if chosen, will take a very long time to materialize. I am ultimately quite skeptical.

Favoring modern APIs, neglecting OpenGL, but still adding support for software rendering seems like a somewhat obscure decision.

Anyway, this restructuring is a good idea in itself, but if it's done, then I find adding software rendering support quite strange.

In any case, I am just offering a potential solution to Raysan's long-standing request; it's up to him to decide.

@ColleagueRiley
Contributor

@Bigfoot71 It wouldn't require too many changes, it would be more of a reorganization effort. I don't think there's much benefit for direct support for DirectX or Vulkan, but it would be nice to be able to support native APIs.

This would not involve neglecting Legacy OpenGL, it could be implemented like any other custom backend.

As for memory usage, I'm unsure how much that would be a problem.

@Bigfoot71
Contributor Author

@ColleagueRiley There will inevitably be a choice to make regarding which APIs to prioritize.

For example, with Metal, if OpenGL is favored, it requires some consideration because if the Metal support is poorly suited to work with OpenGL, then wouldn’t ANGLE already do a better job?

Creating a low-level abstraction for multiple APIs, including both old and modern ones, will inevitably mean neglecting some, that's unavoidable...

So, my opinion is that:

  • Either we prioritize modern APIs going forward, in which case software rendering support is irrelevant
  • Or we decide to prioritize OpenGL, but I think projects like ANGLE will always do a better job than what we can achieve here

@ColleagueRiley
Contributor

The purpose of software rendering is to support certain platforms that do not support OpenGL, but many of those platforms would support GPU rendering via alternative APIs, for example, the Wii has its graphics API.

I don't think RLGL should be implementing these, but making it easier for users to implement their own would be nice.

@Bigfoot71
Contributor Author

Bigfoot71 commented Mar 12, 2025

Mhh I see, but creating a basic API that we can only judge as being adapted to the maximum number of possible APIs, including proprietary ones, seems like a bad idea to me.

@raysan5
Owner

raysan5 commented Mar 12, 2025

@Bigfoot71 thank you very much for this fantastic header! From my point of view, simplicity is key.

Being able to support a basic software rendering option for platforms with no OpenGL driver support, with minimal changes to the current raylib implementation, seems a very valid reason to me. I especially have in mind the upcoming range of RISC-V micro-computers we will probably be seeing on the market soon (most of them with no GPU).

About the implementation in raylib, I see most rlsw functions map directly to OpenGL 1.1 counterparts, so it would be nice if the OpenGL 1.1 mapping was done directly in the rlsw header, so in rlgl it can be simply implemented using:

#if defined(GRAPHICS_API_OPENGL_11_SOFTWARE)
    #define RLSW_IMPL
    #include <rlsw.h>          // OpenGL 1.1 software implementation

    #define GRAPHICS_API_OPENGL_11
#endif

Besides that, some platform layer (RGFW?) should support raw color framebuffer initialization and screen blitting.

@Bigfoot71
Contributor Author

Bigfoot71 commented Mar 12, 2025

Okay, no problem, I'll continue writing right away.

Since some features of GL 1.1 related to certain function parameters are not used in RLGL, I will simply write a binding using macros and omit these parameters.
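
A minimal sketch of what such a macro binding could look like is shown here. It is only an illustration of the idea: the `demo_sw*` functions and `DEMO_*` constants are placeholders invented for this example, not rlsw's real API, and which parameters rlsw actually ignores is not stated in this thread.

```c
// Placeholder constant standing in for a GL enum (value is illustrative).
#define DEMO_TEXTURE_WRAP_S 0x2802

// Placeholder "software" functions standing in for the renderer's own
// entry points. They only take the parameters the renderer uses.
static int demo_wrap_s = 0;
static int demo_clear_calls = 0;

static void demo_swTexParameteri(int pname, int param)
{
    if (pname == DEMO_TEXTURE_WRAP_S) demo_wrap_s = param;
}

static void demo_swClear(unsigned int mask)
{
    (void)mask;             // a real renderer would clear the selected buffers
    demo_clear_calls++;
}

// GL-style names mapped onto the functions above. The unused 'target'
// argument is discarded by the preprocessor, so the call costs nothing extra.
#define glTexParameteri(target, pname, param) demo_swTexParameteri((pname), (param))
#define glClear(mask)                         demo_swClear((mask))

// A call that is accepted for source compatibility but does nothing.
#define glHint(target, mode)                  ((void)0)
```

One caveat of this approach is that a dropped macro argument is never evaluated, so an expression with side effects passed as `target` would silently not run.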

I will also create macros to generate functions based on rendering parameters and add line rasterization and blend modes.

This could go very fast!

@ColleagueRiley
Contributor

This doesn't make much sense to me. If you're going to add a software rendering backend, then it makes more sense to implement it as a custom backend rather than working around the legacy OpenGL framework, as it could be more performant and useful.

If you do not care about the performance, the legacy OpenGL backend can already easily do software rendering. Without any modifications to Raylib at all.

If you wanted to ensure it's rendering to a pixel buffer, you could use OSMesa. That wouldn't require any changes to RLGL.

@Bigfoot71
Contributor Author

Bigfoot71 commented Mar 12, 2025

> as it could be more performant and useful.

@ColleagueRiley Can you explain why not implementing it through legacy OpenGL would be more efficient? And if you say "more useful", what features would be missing? Keep in mind that any added feature will inevitably make it less efficient in some way.

> you could use OSMesa

We have already talked about it: #3928 (comment)

And this may not be suitable for all scenarios considered by Ray.

@Bigfoot71
Contributor Author

No matter what, what we are doing here is not incompatible with your suggestion to refactor rlgl in the future.

And even if we implement the possibility of using different backends with rlgl, it does not solve the problem that each platform requires specific implementations for copying color buffers to the screen.

@raysan5
Owner

raysan5 commented Mar 12, 2025

@Bigfoot71 Just to clarify some points in case I missed something:

  1. rlsw is a single-file header-only self-contained portable partial-OpenGL 1.1 implementation, that draws into an RGBA RAM memory buffer, right?
  2. The required "OpenGL context" is actually sw_data_t, a global variable contained in rlsw implementation.
  3. swFunctionName() are a direct mapping/implementation of the OpenGL glFunctionName()
  4. The "only" requirements on raylib side for basic software rendering functionality would be:
    • InitWindow() - Provide a system (double)framebuffer to flip rlsw generated frames.
    • CloseWindow() - Free the provided (double)framebuffer and rlsw loaded "context resources"
    • SwapScreenBuffer() - Copy data and swap buffers, back to front
  5. This implementation (plus some extra platform logic) would allow raylib running on no-GPU devices like RaspberryPi Pico 2 or similar RISC-V alternatives, is that correct?
  6. In terms of performance, what is the performance hit in comparison to a GPU-accelerated implementation? Maybe 10% performance?
  7. Is there any other single-file header-only portable alternative to accomplish the same?

Thanks for your answers! 😄

@Bigfoot71
Contributor Author

Bigfoot71 commented Mar 12, 2025

@raysan5


> 1. rlsw is a single-file header-only self-contained portable partial-OpenGL 1.1 implementation, that draws into an RGBA RAM memory buffer, right?

Yes


> 2. The required "OpenGL context" is actually sw_data_t, a global variable contained in rlsw implementation.

Yes


> 3. swFunctionName() are a direct mapping/implementation of the OpenGL glFunctionName()

Not exactly, I made a few simplifications compared to what rlgl does with GL 1.1, but I will add a binding corresponding to the OpenGL API that handles these small differences. It will be done through macros, and there will be no overhead.


> 4. The "only" requirements on raylib side for basic software rendering functionality would be:
>     • InitWindow() - Provide a system (double)framebuffer to flip rlsw generated frames.
>     • CloseWindow() - Free the provided (double)framebuffer and rlsw loaded "context resources"
>     • SwapScreenBuffer() - Copy data and swap buffers, back to front

That's exactly it, in the end, nothing more should be necessary.
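
The three platform hooks above can be sketched as follows. This is only an in-memory illustration: `demo_screen` and its functions are hypothetical names, the pixel format is assumed to be 32 bits per pixel, and a real platform layer would hand the front buffer to the display system (SDL, DRM, and so on) instead of just keeping it in RAM.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-in for a platform's double-buffered screen.
typedef struct {
    int width, height;
    uint32_t *front;   // what the "screen" currently shows
    uint32_t *back;    // where the next frame is copied
} demo_screen;

// InitWindow() counterpart: allocate both buffers. Returns 1 on success.
static int demo_screen_init(demo_screen *s, int width, int height)
{
    size_t count = (size_t)width * (size_t)height;
    s->width = width;
    s->height = height;
    s->front = calloc(count, sizeof *s->front);
    s->back = calloc(count, sizeof *s->back);
    if (!s->front || !s->back)
    {
        free(s->front);
        free(s->back);
        s->front = s->back = NULL;
        return 0;
    }
    return 1;
}

// SwapScreenBuffer() counterpart: copy the rendered frame into the back
// buffer, then flip back and front.
static void demo_screen_present(demo_screen *s, const uint32_t *rendered)
{
    size_t bytes = (size_t)s->width * (size_t)s->height * sizeof *rendered;
    memcpy(s->back, rendered, bytes);
    uint32_t *tmp = s->front;
    s->front = s->back;
    s->back = tmp;
}

// CloseWindow() counterpart: release both buffers.
static void demo_screen_close(demo_screen *s)
{
    free(s->front);
    free(s->back);
    s->front = s->back = NULL;
}
```

Freeing the renderer's own "context resources" would be a separate call into rlsw and is not shown here.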


> 5. This implementation (plus some extra platform logic) would allow raylib running on no-GPU devices like RaspberryPi Pico 2 or similar RISC-V alternatives, is that correct?

Exactly!


> 6. In terms of performance, what is the performance hit in comparison to a GPU-accelerated implementation? Maybe 10% performance?

It’s a bit early to say, the first video I presented wasn’t optimized yet.

I just added function generation based on the state (whether textures are being sampled or not, depth test on or off, etc.)

With this new addition, I now see the counter going above 2000 FPS for the same example, so I would say an improvement of roughly 1.5x.

But it’s still early, other optimizations are still possible, and I will do thorough testing at the end.
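
For readers unfamiliar with the technique, here is a minimal sketch of generating one specialized function per render state with a macro. It is not the rlsw.h code: the names, the two state bits, and the table-based selection are assumptions made for this example. The point is that the state checks become compile-time constants inside each generated function, so the per-pixel path carries no runtime branches on state.

```c
#include <stdint.h>

// Hypothetical state bits for this sketch.
#define DEMO_STATE_TEXTURE 0x1u
#define DEMO_STATE_DEPTH   0x2u

typedef struct { uint8_t color; float depth; uint8_t texel; } demo_fragment;
typedef struct { uint8_t color; float depth; } demo_pixel;

// Generates one pixel-write function. USE_TEXTURE and USE_DEPTH are literal
// 0 or 1, so the compiler can drop the branches that do not apply.
#define DEMO_DEFINE_WRITE(NAME, USE_TEXTURE, USE_DEPTH)                      \
    static void NAME(demo_pixel *dst, const demo_fragment *src)              \
    {                                                                        \
        if (USE_DEPTH)                                                       \
        {                                                                    \
            /* depth test "less": keep the nearer fragment */                \
            if (src->depth >= dst->depth) return;                            \
            dst->depth = src->depth;                                         \
        }                                                                    \
        if (USE_TEXTURE)                                                     \
            dst->color = (uint8_t)((src->color * src->texel) / 255);         \
        else                                                                 \
            dst->color = src->color;                                         \
    }

DEMO_DEFINE_WRITE(demo_write_plain,     0, 0)
DEMO_DEFINE_WRITE(demo_write_tex,       1, 0)
DEMO_DEFINE_WRITE(demo_write_depth,     0, 1)
DEMO_DEFINE_WRITE(demo_write_tex_depth, 1, 1)

typedef void (*demo_write_fn)(demo_pixel *, const demo_fragment *);

// Indexed by the state bits: selected once per primitive, not per pixel.
static const demo_write_fn demo_write_table[4] = {
    demo_write_plain, demo_write_tex, demo_write_depth, demo_write_tex_depth
};

static demo_write_fn demo_select_write(unsigned int state)
{
    return demo_write_table[state & 3u];
}
```

The number of generated functions doubles with every state bit, so this trades code size for per-pixel speed.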


> 7. Is there any other single-file header-only portable alternative to accomplish the same?

Contained in a single header with the same functionalities, no, at least not to my knowledge.

Edit: Aside from PortableGL, but it's different, as it aims to support more modern versions of OpenGL.
But, for example, shaders are managed via function pointers, which is problematic in our case.

@ColleagueRiley
Contributor

OpenGL 1.0 will already use software rendering by default if there is no GPU. I don't think this is fixing a clearly defined problem.

@Bigfoot71
Contributor Author

Bigfoot71 commented Mar 12, 2025

> OpenGL 1.0 will already use software rendering by default if there is no GPU. I don't think this is fixing a clearly defined problem.

This depends on the platform and the driver

See for example:

  • Raspberry Pi Pico (RP2040)
  • Arduino Zero
  • Arduino Nano 33 BLE

@raysan5
Owner

raysan5 commented Mar 12, 2025

@ColleagueRiley Actually it will allow this:

> 5. This implementation (plus some extra platform logic) would allow raylib running on no-GPU devices like RaspberryPi Pico 2 or similar RISC-V alternatives, is that correct?

For me this is the most notable achievement and step towards the future. Afaik, no other alternative allows that, at least in a simple way.

Note that this will expand raylib to low-level embedded devices and microcontrollers.

@ColleagueRiley
Contributor

Ok, I think I've said most of what I want to say on this.

I can help with the platform part.

@Bigfoot71
Contributor Author

The issue with nearby triangles has been resolved, it was actually caused by face culling, which occurs before clipping and division by W

The function contained an error related to negative homogeneous W values, everything has been documented

It's much better this way!
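
For context, one common way to decide facing before clipping and before the division by W is to take the sign of a 3x3 determinant built from the clip-space x, y, and w of the three vertices. The sketch below shows that technique under stated assumptions: it is not the fixed rlsw function, the names are hypothetical, and counter-clockwise front faces with a y-up coordinate system are assumed.

```c
typedef struct { float x, y, z, w; } demo_vec4;

// Determinant of the rows (x, y, w) of the three clip-space vertices.
// It equals w0*w1*w2 times twice the signed area of the projected triangle,
// so when all w are positive its sign matches the screen-space winding.
static float demo_clip_space_det(demo_vec4 a, demo_vec4 b, demo_vec4 c)
{
    return a.x * (b.y * c.w - b.w * c.y)
         - a.y * (b.x * c.w - b.w * c.x)
         + a.w * (b.x * c.y - b.y * c.x);
}

// Assumed convention for this sketch: counter-clockwise is front-facing.
static int demo_is_front_facing(demo_vec4 a, demo_vec4 b, demo_vec4 c)
{
    return demo_clip_space_det(a, b, c) > 0.0f;
}
```

Dividing by W first and then computing a 2D area gives the wrong sign when an odd number of vertices have negative W, because those vertices are mirrored through the origin by the division. The determinant avoids that, which is one reason to cull in homogeneous space.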

simplescreenrecorder-2025-05-18_18.13.54.mp4

Bonus, another example, I also see that there's a precision issue in the sampling that needs to be fixed, it's particularly noticeable with the text

simplescreenrecorder-2025-05-18_18.19.03.mp4

I'll resume work on the other platforms as soon as I have time
If anyone wants to help, there's already an example for SDL!

@raysan5 raysan5 added the new feature This is an addition to the library label May 19, 2025
@vaguinerg

> OpenGL 1.0 will already use software rendering by default if there is no GPU. I don't think this is fixing a clearly defined problem.

what opengl 1.0?

@Bigfoot71
Contributor Author

> > OpenGL 1.0 will already use software rendering by default if there is no GPU. I don't think this is fixing a clearly defined problem.
>
> what opengl 1.0?

The statement doesn't really make sense anyway
OpenGL 1.0 itself doesn't "use" software rendering, it's just a specification
Whether software fallback exists depends entirely on the implementation (like Mesa) and the platform
So saying "OpenGL already uses software rendering" is misleading

@Bigfoot71
Contributor Author

Bigfoot71 commented Jul 6, 2025

@raysan5 I just added support for rcore_drm.c!

I used a dumb buffer, mainly adding code under GRAPHICS_API_OPENGL_11_SOFTWARE. I had to handle a few extra init cases to get it running on my PC, but no existing code was changed or removed.

I'd really appreciate if someone familiar with DRM could review it, just to make sure there's nothing to clean up or improve before I go further, since it's my first time working on this kind of integration.

All changes are in this commit: 954cde5


So now we have SDL and DRM!

What’s left, theoretically:

  • Android
  • RGFW
  • GLFW

I’m not mentioning Web, seems out of scope to me 😅

As for Android, not sure how relevant it is, open to your thoughts

GLFW might be tricky since it would need pixel buffer access, I found some examples, but making it work properly across platforms could take serious effort


Just an extra note, once the target platforms are implemented, it’ll still take a bit more time before it's ready to merge

Framebuffer copying and blitting still need improvement, and there are a few issues I noticed with some examples, but we're almost there!

@raysan5
Owner

raysan5 commented Sep 28, 2025

@Bigfoot71 do you think we can merge this PR and keep working from there? It is a big change with many parts involved so if we already got some parts functional I think we can merge this big PR and move to smaller PRs.

@Bigfoot71
Contributor Author

Yes, this looks good to me!

I did some cleanup and applied a few minor fixes and improvements. I tested most of the examples, everything in the shapes and text examples works fine (what seems most important), but I noticed a few issues with others:

  • T-junction artifacts (3D)
  • Blending issues
  • Some models not rendering

For the first, I'll need to check for inconsistencies in the rasterizer conventions.

Regarding blending, there was a simple mistake in swBlendFunc. After fixing it, it worked in isolated tests, but some issues remain on the raylib side. Alpha blending itself works (strange).

For the models, it's unclear if the issue comes from raylib or rlsw. It mostly affects animated models, I'll investigate further.

So feel free to merge now, I'll open PRs as needed, but for now we have a functional, maintainable single-header solution, which is a solid step I think.

PS: Out of curiosity, I tried GLFW and it compiles but gives a black screen. I think that's fine for now x)

@Bigfoot71 Bigfoot71 marked this pull request as ready for review September 29, 2025 02:19
@raysan5 raysan5 merged commit 584bc14 into raysan5:master Sep 29, 2025
17 checks passed
@raysan5
Owner

raysan5 commented Sep 29, 2025

@Bigfoot71 Thanks for the further review! This is definitely the biggest addition to raylib in a long time!

> I'll open PRs as needed, but for now we have a functional, maintainable single-header solution, which is a solid step I think.

Absolutely! This new backend will allow raylib to evolve in new directions! Thanks for the hard work put into it! 🚀

psxdev pushed a commit to raylib4Consoles/raylib that referenced this pull request Oct 5, 2025
* add base of rlsw.h

* implement state support
Also replace the triangle rasterization functions with macros that generate specific functions for each state of the rendering system.
Also, add the OpenGL definitions in order to add a binding for rlgl.

* branchless float saturation

* apply perspective correction to colors

* impl line clipping and rasterization
+ tweak function names

* impl face culling

* impl color blending

* fixes and tweaks

* add clear buffer bitmasks

* small optimizations / tweaks

* review ndc to screen projection

* avoid to recalculate MVP when its not needed + tweaks

* review the loading and management of textures
to be closer to the OpenGL API

* texture sampling optimization

* review get pixel functions
+ review unorm/float conversion

* add several buffer format support
Several depth and color formats have been added for the framebuffer.

8-bit, 16-bit, and 24-bit formats are now available for depth.

RGB 8-bit (332), RGB 16-bit (565), and RGB 24-bit (888) formats are now available for color.

Alpha support is no longer present for the framebuffer at the moment, but it can easily be restored by adding the formats and reinterpolating the alpha in the areas that do not perform color blending.

Additionally, this commit brings performance improvements.

* tweaks

* impl line width

* impl points + point size

* fix and improve polygon clipping functions

* impl polygon modes

* add some not planned functions
- `glDepthMask`
- `glColorMask`

* framebuffer resizing + handle init failure

* add quick notes about line clipping algorithms used

* start to impl scissor test + review line clipping
The support for the scissor test has been implemented for clearing as well as for triangle clipping.
The implementation for lines and points is still missing.

I also removed the 2D clipping of lines that used the Cohen-Sutherland algorithm, opting instead to always use the Liang-Barsky algorithm in all cases.
This simplifies the implementation, and the 2D version would have caused issues when interpolating vertices in the future if we want to implement additional features.

* review scissor clear

* review `swScissor`

* impl line scissor clipping

* round screen coordinate (line rasterization)

* impl point scissor clipping

* remove unused defs

* add getter functions

* gl binding

* add `glHint` and `glShadeModel` macros (not implemented)

* binding tweaks

* impl copy framebuffer function + glReadPixels

* review `swCopyFramebuffer`

* update rlgl.h

* update rlgl.h

* texture copy support

* fix typo..

* add get error function

* def sw alloc macros

* reimpl get color buffer func
just in case

* remove normal interpolation

* review texture wrap

* fix ndc projection (viewport/scissor)

* impl framebuffer blit function

* reduce matrix computations and memory usage

* swBegin tweaks

* preventing a possible division by zero

* remove useless scissor related data

* review color blending system

* greatly improve float saturation

* tweak lerp vertex function

* use optimized fract function in sw_texture_map

* tweak framebuffer functions for better readability

* optimized copy/blit functions for each dst format

* review framebuffer filling functions

* impl specific quad rendering func

* use of a single global vertex buffer

* fix 'sw_poly_point_render'

* added `SW_RESTRICT` and redesigned `sw_lerp_vertex_PNCTH`

* tweak the pipeline flow regarding the face culling
avoids misprediction, improves vectorization if possible

* new rendering path for axis aligned quads

* oops, translating some comments

* use of `restrict` for blending function parameters

* update rlgl.h

* adding `GRAPHICS_API_OPENGL_11_SOFTWARE` in `DrawMesh`

* add `RL_OPENGL_11_SOFTWARE` enum

* temp tweak

* build fixes

* fix DrawMesh for GL 1.1

* update swClose

* review texture format + fix copy

* set minimum req vertices to 3 (quads)

* check swInit

* review pixelformat

* tweaks

* fix animNormals (DrawMesh)

* fallback color/texcoord (swDrawArrays)

* review swMultMatrixf

* fix texture pool alloc..

* review triangle scanlines
increment all data

* fix `sw_quad_sort_cw`

* impl sdl platform

* rm def

* increase max clipped polygon vertices

* improve triangle rasterization along Y axis
improved robustness against numerical errors
incremental interpolation along Y
simplified function, fewer jumps

* review current vertex data
+ increase max clipped polygon vertices (for extreme cases)

* fix and improve polygon clipping
Sets the vertex count to zero when the polygon is invalid
Stops clipping when the vertex count drops below 3

* fix gradient calculation

* cache texture size minus one + comments

* tweaks

* BGRA copy support

* adding software backend option (cmake)

* update Makefile

* fix face culling

* exclude some examples with the software backend

* review SW_CLAMP case in sw_texture_map

* review sw_saturate

* review line raster

* fix sw_quad_is_aligned

* review sw_raster_quad_axis_aligned

* tweaks

* codepoint fix (?)

* fix var name...

* rcore_drm software rendering

* cleanup and tweaks

* adding support for `GL_POINT_SIZE` and `GL_LINE_WIDTH` get

* fix sampling issue

* fix swBlendFunc

---------

Co-authored-by: Ray <raysan5@gmail.com>
Labels

new feature This is an addition to the library

7 participants