A Summer of Code: My GSoC 2023 Conclusion

16 min read

Introduction

My name is Vikram Kangotra, and as of September 24, 2023, I am a Computer Science and Engineering undergraduate student at the National Institute of Technology Srinagar. I am an avid programmer who loves to contribute to open source projects. Open source has always fascinated me, but like many others, I felt a barrier before diving into open source contribution. Google Summer of Code (GSoC) provided me with the opportunity to overcome that barrier and gain confidence in my skills and contributions. This summer, I had the privilege of working on the VLC media player project.

What is Google Summer of Code (GSoC)?

Google Summer of Code (GSoC) is a prestigious global program that brings together talented student developers from around the world to work on open-source software projects. Organized and sponsored by Google, GSoC provides an opportunity for students to collaborate with established open-source organizations and contribute to their projects during the summer break.

Participating in GSoC is a remarkable experience for aspiring developers like me. It offers a unique platform to dive into the world of open source, gain hands-on experience, and make meaningful contributions to real-world projects. The program is not just about coding; it's about learning, problem-solving, and becoming a part of the global open-source community.

GSoC provides students with stipends, mentorship, and a structured environment for their contributions. It's a journey of personal growth, skill development, and community engagement. Over the course of several weeks, students work closely with their mentors to complete project tasks, meet deadlines, and produce high-quality code.

The program encourages students to embrace challenges, push their boundaries, and turn their passion for programming into tangible results. It's a stepping stone for budding developers to establish themselves in the world of open source and gain recognition for their contributions.

Project Evolution: Shifting Goals and Expanding Horizons

Initially, I embarked on this journey with the intention of "rewriting some C modules into Rust." Given my proficiency in Rust programming, this project choice seemed like a natural fit. However, the project's trajectory took an intriguing turn during my very first discussion with my mentor. We collectively decided to expand the project's scope significantly – our new mission: bringing the VLC media player into the realm of web browsers.

This marked a profound shift in our objectives. Our challenge was now to port VLC to web browsers, requiring the creation of entirely new Rust modules capable of functioning seamlessly within the web environment. This pivot introduced a fresh set of complexities and opportunities, pushing the boundaries of what I could achieve during Google Summer of Code (GSoC) 2023.

Project Work

1. Enabling Rust Support in VLC

  • Understanding Contribs and Their Role

    Before delving into the technical details, it's crucial to grasp VLC's intricate build system and the pivotal role played by "contribs." VLC relies heavily on a vast array of third-party libraries, codecs, and tools collectively known as "contribs." These contribs are precompiled and seamlessly integrated into VLC's build process, streamlining compilation and ensuring consistent performance across diverse platforms.

  • The Objective: Bringing Rust to VLC

    The primary objective of this task was to introduce Rust, a powerful systems programming language renowned for its performance and safety features, into the VLC ecosystem. Specifically, the goal was to enable Rust's integration within VLC's WebAssembly (Wasm) builds. This ambitious undertaking promised to unlock new horizons for performance optimization and significantly broaden VLC's development possibilities.

  • The Merge Request: Ushering in Rust

    The work undertaken for this task is meticulously documented in this merge request. The focus of this merge request was to upgrade the Rust and Cargoc versions utilized in VLC's build system and configure the appropriate Rust target for WebAssembly.

    The Technical Details

    • Upgrading Rust and Cargoc Versions

      The initial phase of this endeavor revolved around upgrading the Rust and Cargoc versions utilized within VLC's build environment. This upgrade was indispensable for the wasm32-unknown-emscripten build, as it hinged on cargo-c version 0.9.20. Additionally, cargo-c mandated a minimum Rust version of 1.69.0.

    • Setting the RUST_TARGET

      A pivotal aspect of enabling Rust support for WebAssembly was configuring the precise target architecture. This was accomplished by introducing the selection of wasm32-unknown-emscripten as the target triple for authoring Rust modules within the VLC project.

    • Host Checking and Rust Enablement

      While this specific aspect wasn't encompassed within the commit, it was a critical step in the process of successfully compiling Rust for the wasm target. To enable Rust compilation, a host check was imperative to ascertain the correct configuration. The following code, excerpted from the configure.ac file, illustrates this process:

      AS_IF([test "$\{host_os\}" = "emscripten" -a "$\{host_cpu\}" = "wasm32"],[
          AC_SUBST([RUST_TARGET], ["wasm32-unknown-emscripten"])
          ],[
          AC_MSG_WARN([no suitable rust target found (only x86_64 linux-gnu and wasm32 emscripten are supported). Your host is $\{host_os\} $\{host_cpu\}])
          enable_rust = no
      ])
      

      The Simplicity of the Task

      From a technical standpoint, this task didn't demand intricate problem-solving or extensive code alterations. Instead, it revolved around ensuring that the build environment was up-to-date and meticulously configured to accommodate WebAssembly. The process was straightforward, culminating in the acceptance of the merge request, marking a significant enhancement to VLC's capabilities.

2. Overcoming Path Errors and Undefined Constants

  • Following the initial phase of the project, a new set of challenges emerged. Path errors surfaced in some build.rs files, creating hurdles that needed to be addressed promptly. Fortunately, the initial problem was relatively straightforward to solve – adding the necessary include path from the emscripten SDK resolved these path errors effectively.

    However, the journey did not end there. As I proceeded with the project, a more intricate issue surfaced – undefined constants. These constants were a vital component of the project, yet they proved challenging to port from their original C/C++ codebase to Rust. The primary challenge stemmed from the intricate nature of the macros defined within the C/C++ code, which proved to be too complex for bindgen to seamlessly translate into Rust.

    Bindgen is a tool that automatically generates Rust code to bridge the gap between Rust and C/C++ code, making it easier to use C/C++ libraries in Rust projects. It parses C/C++ header files and creates Rust bindings, simplifying interoperability between the two languages.

    To surmount this obstacle, a strategic decision was made. Rather than grappling with the limitations of automated translation tools, I opted for a proactive approach. This involved manually adding the missing constants directly to the C/C++ codebase. This adjustment allowed bindgen to successfully port these essential constants to the Rust ecosystem.

    This meticulous approach ensured that vital elements of the project remained intact and that the Rust implementation could seamlessly integrate with the existing C/C++ components.

    For a detailed record of the progress made up to this point, you can refer to this commit, which encapsulate the journey of overcoming these technical challenges

3. Adding Atomics Feature for WASM Compilation

  • As I progressed with the project, a new challenge surfaced during the compilation phase, specifically related to atomics support. The error message I encountered highlighted the issue:

    wasm-ld: error: --shared-memory is disallowed by rav1e-80a20d22ae9816b4.23l01rh5fhv9a8sh.rcgu.o
    because it was not compiled with 'atomics' or 'bulk-memory' features.
    

    To address this hurdle, I devised a straightforward solution. I provided specific flags to Cargo as outlined in this commit. This adjustment enabled the necessary atomics support, resolving the compilation error and ensuring the project's continued progress.

    Cargo is the official package manager and build tool for the Rust programming language. It simplifies the process of managing dependencies, compiling code, and building Rust projects, making development more efficient and consistent.

4. Navigating the Challenges of Interfacing Rust with JavaScript in a WebAssembly Project

  • Writing JavaScript code in C/C++ becomes relatively straightforward with emscripten. However, I encountered an unexpected challenge when attempting to invoke JavaScript code from Rust.

    Emscripten is a powerful toolchain for compiling C/C++ code into WebAssembly (Wasm) and JavaScript, making it possible to run high-performance native code in web browsers. It facilitates the conversion of code written in languages like C and C++ into formats that can be executed within web environments, enabling web applications to harness the performance of native code for tasks like gaming, multimedia, and more.

    Initially, I had the idea to leverage wasm-bindgen to facilitate the seamless calling of Rust functions and data from JavaScript and vice-versa.

    Wasm-bindgen is a Rust library and tool that simplifies the interaction between WebAssembly (Wasm) modules and JavaScript. It generates JavaScript glue code to enable seamless communication between Rust and JavaScript, making it easier for developers to integrate Rust code into web applications and leverage the capabilities of both languages when working with Wasm.

    The issue at hand:

    A significant portion of the VLC codebase is implemented in C/C++, and it gets compiled into WebAssembly using the Emscripten toolchain. I was in the process of adding a new module written in Rust that needed to interface with existing modules written in C/C++, necessitating the use of Emscripten. Unfortunately, there was a roadblock – wasm-bindgen lacked compatibility with Emscripten. Consequently, I couldn't employ wasm-bindgen for this particular scenario.

    Initial solution

    Initially, we tackled this problem by opting to bypass wasm-bindgen and directly utilize emscripten. In the initial approach, I authored the emscripten-side code in C++ and invoked these functions from Rust.

    You can find the code I wrote for this stage in this commit

    However, the code was predominantly in C++, necessitating further adjustments.

    Subsequently, I came across stdweb, but encountered some errors due to ABI changes, which prompted me to explore alternative solutions.

    Another approach involved using wasm-bindgen in Rust to generate the wasm file. Then, I generated a js file from this wasm file using the wasm-bindgen CLI command and linked the resulting js file as a --post-js file in the emscripten output.

    Despite this alternative approach, it also presented several challenges.

    1. Dealing with the Undefined Export Error

      To resolve the undefined export error, I initially resorted to adding type="module" to the HTML file generated by emscripten. However, I found this manual process somewhat cumbersome and not ideal. An alternative approach involved configuring emscripten to use the EXPORT_ES6 flag. However, while exploring the VLC codebase, I came across the following comment:

      # Note that we use '-s MODULARIZE', but no '-s EXPORT_ES6', which would
      # conflict with pthreads on Firefox.
      

      This prompted me to reconsider this solution.

      Subsequently, I discovered another method: instructing wasm-bindgen not to produce ES modules by using the --no-modules command.

    2. Dealing with more errors

      main.js:838 wasm streaming compile failed: TypeError: WebAssembly.instantiate(): Import #0 module="__wbindgen_placeholder__" error: module is not an object or function
      main.js:801 failed to asynchronously prepare wasm: TypeError: WebAssembly.instantiate(): Import #0 module="__wbindgen_placeholder__" error: module is not an object or function
      main.js:661 Aborted(TypeError: WebAssembly.instantiate(): Import #0 module="__wbindgen_placeholder__" error: module is not an object or function)
      main.js:680 Uncaught (in promise) RuntimeError: Aborted(TypeError: WebAssembly.instantiate(): Import #0 module="__wbindgen_placeholder__" error: module is not an object or function)
          at abort (main.js:680:11)
          at main.js:807:5
      

      A temporary solution to test and resolve these issues is available in this GitHub commit

      However, after applying these changes, I encountered the following console output:

      main.js:31 Uncaught RuntimeError: unreachable
      at main.wasm:0x1dae
      at main.wasm:0x1b42
      at main.wasm:0x1b38
      at main.wasm:0x1b4c
      at main.wasm:0x1410
      at main.wasm:0x253
      at main.wasm:0x214
      at main.wasm:0x231
      at main.js:719:14
      at callMain (main.js:1539:15)
      

      A call to action was necessary, so I settled down and delved into the enlightening pages of "Programming Webassembly with Rust" by Kevin Hoffman. This book opened up a world of insights regarding WebAssembly, revealing that its significance extends far beyond web browsers.

      Here's what I gleaned from my exploration:

      • The wasm32-unknown-emscripten target lacks wasm_bindgen code generation. This issue needs attention.

      • Attempting to use --post-js to incorporate JS code from wasm-bindgen becomes problematic, as both emscripten and wasm-bindgen JS code invoke their own WebAssembly.instantiate() functions.

      • Creating a library file from Rust code with the wasm32-unknown-unknown target and linking it with C++ code using emcc works seamlessly, provided that wasm_bindgen's exported functions remain untouched.

      • My solution: Generate a Rust library file without the JS glue code, link it to a C++ file, and then use emcc for wasm compilation once wasm-bindgen supports emscripten targets.

      • It's crucial to enable emscripten target support in wasm-bindgen.

      After extensive experimentation, I arrived at this solution: link to solution.

      However, a lingering issue remained at the forefront: the inability to call JavaScript functions from the main thread. This predicament called for a solution of its own.

      Streamlining the Process

      In search of a comprehensive solution to my challenges, I contemplated harnessing Emscripten directly from within Rust. The idea was to invoke Emscripten functions from Rust, effectively bridging the gap between the two.

      Initially, it appeared promising, but a significant hurdle remained. I needed to execute functions on the main thread, which proved tricky since modules ran on threads and consequently on WebWorkers within the web environment, making direct execution impossible.

      In the realm of C/C++, Emscripten offers macros like MAIN_THREAD_EM_ASM, within which JavaScript code runs on the main thread—a perfect fit for my needs. However, Rust lacked such conveniences, prompting me to take matters into my own hands.

      My quest led me to discover the emscripten_asm_const_int_sync_on_main_thread() function for calling JavaScript code from the main thread. Yet, when I attempted to use it, errors surfaced.

      error: undefined symbol: emscripten_asm_const_async_on_main_thread (referenced by root reference (e.g. compiled C/C++ code))
      

      This anomaly baffled me. If other Emscripten functions could be invoked without issue, why was this one undefined?

      Diving into the Emscripten source code, I found that MAIN_THREAD_EM_ASM was defined as follows:

      #define MAIN_THREAD_EM_ASM(code, ...) ((void)emscripten_asm_const_int_sync_on_main_thread(CODE_EXPR(#code) _EM_ASM_PREP_ARGS(__VA_ARGS__)))
      

      Puzzlingly, I noticed the presence of CODE_EXPR. Investigating its definition shed light on the matter:

      #define CODE_EXPR(code) (__extension__({           \
          __attribute__((section("em_asm"), aligned(1))) \
          static const char x[] = code;                  \
          x;                                             \
        }))
      

      There it was, concealed within the extensive codebase. The key issue lay in the requirement for the string I passed to reside within the "em_asm" section—an essential aspect of Emscripten's functionality.

      The solution was to instruct Rust to write to the "em_asm" section, which could be achieved using the #[link_section = "em_asm"] attribute. Sounds straightforward, right?

      However, Rust introduced its own twist to the tale. When defining a section, Rust generates a custom section instead. Resolving this dilemma presented two options: either submit a Merge Request to Rust's compiler, rustc, to adopt a different approach, or adapt to the existing ecosystem.

      Enter wasm-bindgen, a library we had discussed earlier, which relied on the custom section for its operations. Disrupting this balance was out of the question, given the risk of breaking libraries upstream. Thus, the most sensible and straightforward course of action was to propose patches to Emscripten, incorporating data from the custom section.

      The details of this journey are documented in this commit: GitHub - Emscripten PR #20219.

      Addressing WebAssembly Custom Section Challenges: A Deep Dive into Emscripten and Data Handling

      This solution still presents some issues. Specifically, the custom section lacks an address and is subsequently omitted from the final WebAssembly (wasm) output. Here are some observations:

      • In Rust, when writing to the custom section using em_asm, it appears to write to both the custom section and the data section simultaneously, based on my observations.

      • When passing a string to Emscripten's main thread function, it passes the address of the string in the data section rather than the em_asm section.

      • The code I've written to read from the custom section retrieves the string's address from the custom section, resulting in different addresses.

      To resolve this issue, I need a way to instruct wasm to read the string from the custom section address stored in ASM_CONSTS rather than from the data section. Previously, I was passing the address of the string from the data section, but it required manual updates to the offset in Emscripten.

      For further reference, here are links that support my theory:

      1. Issue 13838 on GitHub comment-819061085
      2. Issue 13838 on GitHub comment-819088055

      The address passed to the Emscripten function pertains to the data string in the data section, while the dictionary I create in Emscripten's Python code contains the address from the custom section before being excluded from the final output. My goal is to obtain the address from the data section instead.

      I am still grappling with this problem, and it remains unsolved until I find a viable solution. You can find the final working code for capturing video streams from a web camera here

Concluding Thoughts: Navigating Open Source and WebAssembly Challenges

My journey during Google Summer of Code 2023 has been transformative. From rewriting C modules to venturing into the world of WebAssembly, I've encountered challenges and seized opportunities that have enriched my understanding of open source development.

I've learned that open source is a dynamic world where adaptability and creativity are key. Each problem I faced provided a unique opportunity to learn and grow. From enabling Rust support in VLC to overcoming path errors, undefined constants, and atomics issues, I've expanded my knowledge and problem-solving skills.

One crucial lesson I've gained is the power of collaboration and mentorship. My mentor and the open-source community have been invaluable in helping me navigate the complexities of open source development. I'm immensely grateful to my mentor and VideoLan for this incredible opportunity.

As I move forward, I carry with me the knowledge and experiences gained during GSoC 2023, and I'm more confident in my abilities as a programmer and open source contributor. I look forward to applying what I've learned to future projects and contributing to the global open source community.

Signing Off: Embracing the Open Source Odyssey

Closing this blog post, I reflect on the incredible journey that Google Summer of Code 2023 has been. It's been a summer of challenges, discoveries, and growth, from rewriting code to diving into the world of WebAssembly.

Open source is a dynamic realm where adaptability and creativity thrive. Each challenge is an opportunity to learn and innovate. As we continue coding, keep collaborating, and pushing boundaries, we shape the future of technology and the open source community.

To my fellow open source enthusiasts, keep coding, keep collaborating, and keep innovating. Our contributions today pave the way for the technologies of tomorrow.

Thank you for joining me on this journey. Until next time, happy coding!