Freedreno on Android – Overview

I’m working on adding Freedreno support to Android this summer with the X.org Foundation! This post documents the technical specifics of what I’ll be doing.

Android abstracts its hardware interfaces behind device-specific Hardware Abstraction Layers (HALs), which vendors can customize. The compositor, SurfaceFlinger, uses the Hardware Composer (HWC) HAL to:
1) decide if a layer must be processed through OpenGL/GPU (HWC_FRAMEBUFFER) or the SoC display controller (HWC_OVERLAY) during prepare().
2) handle VSYNCs through vsync().
3) select displays and provide modesetting.
The OpenGL composition pathway requires libEGL and libGLES to be present. In the absence of a HWC, all compositions use this pathway.
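
The per-layer decision made during prepare() can be sketched roughly as below. This is a simplified illustration and not the real hwcomposer.h API: the `layer` struct, the `needs_gpu` flag and the `MAX_OVERLAY_PLANES` limit are hypothetical stand-ins, and only the two composition type names are taken from the HAL.

```c
#include <stddef.h>

/* Composition types named after the HAL constants. The numeric values
 * are placeholders for this sketch, not copied from hwcomposer.h. */
enum composition_type { HWC_FRAMEBUFFER, HWC_OVERLAY };

/* Hypothetical, simplified layer description. */
struct layer {
    int needs_gpu;               /* e.g. needs an effect the display controller lacks */
    enum composition_type type;  /* filled in by prepare_layers() */
};

/* Hypothetical number of hardware planes offered by the display controller. */
#define MAX_OVERLAY_PLANES 2

/* Assign layers to overlay planes while any are free and fall back to
 * GPU composition otherwise. Returns the number of overlay layers. */
size_t prepare_layers(struct layer *layers, size_t count)
{
    size_t used = 0;
    for (size_t i = 0; i < count; i++) {
        if (!layers[i].needs_gpu && used < MAX_OVERLAY_PLANES) {
            layers[i].type = HWC_OVERLAY;
            used++;
        } else {
            layers[i].type = HWC_FRAMEBUFFER;
        }
    }
    return used;
}
```

A real HWC would base this choice on what the display controller actually supports (formats, scaling, plane count) rather than a single flag.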

Buffer allocations happen through the gralloc HAL, which uses an in-kernel memory manager to provide alloc() and free() calls. It allocates suitable buffers depending on the requested usage type. HWC and gralloc API documentation is available in hwcomposer.h and gralloc.h.
A KMS-based HWC would reduce GPU dependency and improve performance by handling composition through the display controller.

This project aims to provide a functional Android graphics path using a DRM/KMS-based gralloc and HWC with atomic pageflips, with Freedreno/Mesa providing EGL/GLES, running on upstream Android or Android-x86 on an IFC6410. A stretch goal is to have Freedreno support in an Android distro (CyanogenMod).

Implementation Plans:
I am starting with testing and fixing the existing bits and proceeding to interface HWC with KMS APIs to use the SoC display controller.

The libEGL and libGLES requirements would be provided using Mesa, similar to Android-x86. drm_gralloc support for Freedreno has been developed, but remains largely untested. The existing reference HWC implementation just uses eglSwapBuffers (i.e. GPU composition).

The initial task would be assembling these untested parts to run on the IFC6410. After fixing the issues discovered, we would have Freedreno running on Android, but *without* any modesetting support in place.

The project would then proceed to implementing the HWC with KMS APIs. A reference implementation exists that uses userspace fences (sw_sync), but atomic modesetting support within the MSM kernel can provide a way to do away with these.

Resources:
Android Graphics Stack Requirements
Freedreno drm_gralloc in Android-x86

Porting UEFI to BeagleBoneBlack: Technical Details I

I’m adding a BeagleBoneBlack port to the Tianocore/EDK2 UEFI implementation. This post details the implementation specifics of the port so far.

About the hardware:
BeagleBoneBlack is a low-cost embedded board built around an ARM-based AM335x SoC. It supports Linux, with Android, Ubuntu and Ångström ports already available. It comes pre-loaded with MLO and U-Boot images on its eMMC, which can be reflashed with custom binaries. Booting is also possible from a partitioned SD card or by transferring binaries over UART (ymodem) or USB (TFTP). The boot flow is presented here:

The Tianocore Project / Build System

The EDK2 framework provides an implementation of the UEFI specifications. It’s got its own customizable pythonic build system that works based on the config details provided through build meta-files. The build setup is described in this document.

(TL;DR: the build tool parses the INF, DSC and DEC files of each package, which describe its dependencies, its exports and the back-end library implementations it should use. This makes EDK2 modular enough to support all kinds of hardware platforms. The build generates Firmware Volume images for each section in the Flash Description File (FDF), which are placed into a Flash Device (FD) binary at the addresses specified in the FDF. The DSC specifies which implementation each library used in the code resolves to, and the INF records a module’s exports and imports. If these don’t match, the build simply fails.)

Implementation

I started out by writing a bare-metal binary that prints a few letters over UART, to get the hang of how low-level development works. Here’s a great guide to the basics of bare-metal on ARM. All the required hardware has to be initialized in the binary before use, and running C requires an execution environment that provides stacks and handles the placement of segments in memory. Since U-Boot already handles that in its SPL phase, I wrote a standalone application that could be called by U-Boot instead.

The BeagleBoneBlackPkg is derived from the ArmPlatformPkg. I began by following the “second stage” steps mentioned here (implement the available libraries and perform platform-specific tasks), as I intended to take over boot from U-Boot/MLO. This also spared me from having to do the IRQ and memory initializations.

I’m using the PrePeiCore (and not Sec) module’s entry point to load the image. It builds the PPIs necessary to pass control over to PEI and calls PeiMain.

Running the FD: The build generates a .Fd image that will be used to boot the device. The MLO binary I’m using is built to look for and launch a file named ‘u-boot.img’ on the MMC (there’s a CONFIG_ macro somewhere in U-Boot to change this), so I just rename the FD to u-boot.img before flashing it.


UEFI over BeagleBone Black: Notes

The Tianocore/EDK2 project provides an open-source implementation of the UEFI specification. It has its own Python-scripted build system that supports configuring the build parameters on the go using build metadata files [http://tianocore.sourceforge.net/wiki/Build_Description_Files]. These files decide which library instances are required for a package, which instance implementation is to be used, what interfaces it exports, the compiler specifications for the package and the generated image’s flash layout.

I am working to bring up UEFI support for a BeagleBone Black. Currently, I am using u-boot’s SPL to call the UEFI image (by placing generated .Fd on an MMC as “u-boot.img”), which, in turn, would provide the UEFI console and kernel loading functionality.

Since SPL does the memory, stack and IRQ initialization, the SEC/PEI phases have little work to do. Because the BBB’s SoC (an AM335x) is a single-core, 32-bit ARM part, all multicore and AArch64 code can be safely removed from the package. The UART is similar to a 16550, so it can be written to by implementing SerialPortLib accordingly. Console services can be made available only after the EFI_BOOT_SERVICES table has been populated, which requires DXE phase completion.


A U-Boot Independent Standalone Application

U-Boot allows you to load your own applications at the console. The application already has the hardware interfaces available for use (U-Boot sets them up), so nothing needs to be brought up from scratch.

It comes with a sample hello_world program at u-boot/examples/standalone/hello_world.c, which is supposed to print stuff to console. It depends on U-Boot interfaces, but by tracing back the source code, it can be easily re-written to have nothing to do with the U-Boot API.

In the end, hello_world.c:printf()’s job is to write the characters to UART’s address. Implementing this on a BeagleBone Black is pretty easy:

The ARM AM335x TRM lists the address offsets of all the registers available on the processor. UART0_BASE is defined at 0x44E09000. Memory-mapped registers must be accessed through volatile pointers to prevent the compiler from optimizing the accesses away.

Here’s the code:
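
A minimal sketch of such a routine follows. It assumes a 16550-style register layout with 4-byte spacing (transmit holding register at offset 0x00, line status register at offset 0x14, bit 5 set when the transmitter can take another byte). Those offsets and the function names are assumptions for illustration and should be checked against the TRM. The base address is passed in as a parameter so the same code can be pointed at UART0_BASE on the board.

```c
#include <stdint.h>

/* UART0 base address from the AM335x TRM, as quoted above. */
#define UART0_BASE 0x44E09000u

/* Assumed 16550-style layout with 4-byte register spacing. */
#define UART_THR 0x00u       /* transmit holding register */
#define UART_LSR 0x14u       /* line status register */
#define LSR_THRE (1u << 5)   /* set when the transmitter can accept a byte */

/* Return a volatile pointer to a register so that every access really
 * reaches the hardware instead of being optimized away. */
static inline volatile uint32_t *uart_reg(uintptr_t base, uint32_t off)
{
    return (volatile uint32_t *)(base + off);
}

/* Busy-wait until the transmitter is ready, then write one character. */
void uart_putc(uintptr_t base, char c)
{
    while ((*uart_reg(base, UART_LSR) & LSR_THRE) == 0)
        ;
    *uart_reg(base, UART_THR) = (uint32_t)(unsigned char)c;
}

/* Write a NUL-terminated string one character at a time. */
void uart_puts(uintptr_t base, const char *s)
{
    while (*s)
        uart_putc(base, *s++);
}
```

On the board, the entry point would call something like `uart_puts(UART0_BASE, "Hello\n");`. This relies on U-Boot having already configured the UART's clocks, pins and baud rate.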

Compile it similarly to U-Boot’s examples to get a .bin.

To execute it, you can either: 1) copy the SREC over serial, 2) set up TFTP or 3) put it on an sdcard

The load_address below is an env var that specifies where the application will be loaded. It can be changed when building U-Boot.
For the current build, find it from the console using

U-Boot# printenv

U-Boot# fatload mmc 0 <load_address>
[...]

The entry point’s address of the application can be found by looking at the objdump. See this if you have larger applications.
Launching it,

U-Boot# go <entry_point_address>


If you plan to write a purely standalone binary, you are required to initialize the hardware manually and provide a functioning C execution environment. It also requires information regarding placement and relocation of the text and data segments, allocation and zeroing of BSS segment and placement of the stack space. See this and this.
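
The data and BSS part of that setup can be sketched in C as below. In a real startup file the five pointers would come from symbols defined in the linker script, and the stack pointer would have been set up in assembly before this runs. They are plain parameters here, with hypothetical names, so the routine stays self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy initialized data from its load address to its run address, then
 * zero the BSS. The five pointers stand in for linker-script symbols. */
void runtime_init(const uint8_t *data_load, uint8_t *data_start,
                  uint8_t *data_end, uint8_t *bss_start, uint8_t *bss_end)
{
    /* .data: copy the initial values into RAM. */
    for (uint8_t *dst = data_start; dst < data_end; dst++)
        *dst = *data_load++;

    /* .bss: C expects uninitialized globals to start out as zero. */
    for (uint8_t *p = bss_start; p < bss_end; p++)
        *p = 0;
}
```

Only after this step is it safe to call C code that relies on global variables.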


Transferring a .bin from openSUSE to U-Boot, or, Rightly Configuring TFTP on openSUSE

After spending a *lot* of time figuring out why I could not transfer a standalone binary to u-boot running on my BeagleBone Black, I finally discovered it was a firewall issue. This post is to save anyone in the future from suffering the same nightmare as I just went through on openSUSE 13.1.

The Problem:

I needed to put a .bin on my BBB which has U-Boot. The available options are:

Transferring the S-Record

SREC is an ASCII-hex encoding of the binary produced by compilation. To load it, U-Boot provides the `loads` command at its console. You just need to pass the ASCII-hex data from the .srec to the serial console (see this). The catch is that the data must be sent at a rate the U-Boot console can keep up with. In my case, it gave a `data abort` message and the board reset.
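
To make the format concrete, here is a small sketch that validates one S-record line. It assumes the commonly documented layout: `S`, a type digit, a two-digit byte count, then address, data and a final checksum byte equal to the ones' complement of the low byte of the sum of all preceding bytes (count included). The function names are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Convert one hex digit to its value, or return -1 if it is not one. */
int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Return 1 if `rec` is a well-formed S-record with a valid checksum. */
int srec_checksum_ok(const char *rec)
{
    size_t len = strlen(rec);
    if (len < 4 || rec[0] != 'S')
        return 0;

    int hi = hex_digit(rec[2]), lo = hex_digit(rec[3]);
    if (hi < 0 || lo < 0)
        return 0;

    /* The count covers the address, data and checksum bytes. */
    unsigned count = (unsigned)(hi * 16 + lo);
    if (count < 1 || len < 4 + 2 * (size_t)count)
        return 0;

    unsigned sum = count;
    unsigned last = 0;
    for (unsigned i = 0; i < count; i++) {
        hi = hex_digit(rec[4 + 2 * i]);
        lo = hex_digit(rec[5 + 2 * i]);
        if (hi < 0 || lo < 0)
            return 0;
        last = (unsigned)(hi * 16 + lo);
        if (i + 1 < count)
            sum += last;   /* sum everything except the checksum byte */
    }
    return ((~sum) & 0xFFu) == last;
}
```

Running each line of the .srec through a check like this before sending it helps rule out a corrupted file when a transfer fails.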

Using TFTP

Better option: TFTP. Set up static IPs for the host and the board (set the env vars ipaddr and serverip in U-Boot) and call tftp. It gave me this:

U-Boot# tftp
link up on port 0, speed 100, full duplex
Using cpsw device
TFTP from server 192.168.10.1; our IP address is 192.168.10.2
Filename 'hello_world.bin'.
Load address: 0x80200000
Loading: T T T T T T T T T T T T T T T 
Abort

(**T = Timeout.**)

Fix:

A TFTP server listens for requests on UDP port 69. I needed to explicitly check “Open port in firewall” in the TFTP server configuration in YaST and add port 69 to Firewall->Allowed Services->UDP Ports.

 

X-Loader / MLO With a Custom Payload

X-Loader (MLO) (u-boot/common/spl/spl.c, responsible for eMMC initialization and executing the u-boot binary) first parses the image header info (u-boot/common/spl/spl.c:spl_parse_image_header) which effectively does this:

if (magic number of header == IH_MAGIC) {
  set spl_image to the detected image
}
else {
  fill in spl_image assuming u-boot.bin
}
...
call spl_image->entry_point()

IH_MAGIC is set to 0x27051956, the default magic number when creating an image with mkimage. This image can be called from within u-boot at the u-boot command line. By default, the SPL assumes a `uImage` payload and if not found, tries to launch u-boot.
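
The magic-number test itself can be illustrated with a few lines of C. This is a standalone sketch, not the U-Boot source: it assumes the magic is the first 32-bit field of the header and is stored big-endian, as mkimage writes it, and the helper names are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define IH_MAGIC 0x27051956u   /* mkimage magic number, as quoted above */

/* Read a 32-bit big-endian value from a byte buffer. */
uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Return 1 if the buffer begins with the mkimage magic number. */
int has_uimage_magic(const uint8_t *image, size_t len)
{
    return len >= 4 && read_be32(image) == IH_MAGIC;
}
```

A payload without this header, such as a raw .Fd renamed to u-boot.img, takes the `else` branch of the pseudocode above.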

Some Notes on Randomness

Judging the ‘quality’ of random data is a complex statistical problem. The dieharder suite was designed around this issue: it consists of a number of tests, each of which establishes a criterion that a good RNG should meet and fails if it isn’t met. The tests differ significantly from each other; some may be easier to pass than others.

A PRNG needs to be initialized using a ‘seed’, which may be taken from the machine-available sources of entropy (/dev/random, /dev/urandom on Linux). Since it works on deterministic algorithms, for the same seed, the PRNG produces the same results on each run. So, if the seed is compromised, the adversary may be able to predict the sequence produced by the PRNG. Since /dev/random and /dev/urandom are char devices, their concurrent reads from multiple applications will preserve the uniqueness of the data received by each application. This implies that if multiple PRNGs are being initialized at the same time, each of these receives a unique seed.
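
The determinism described above is easy to demonstrate. The sketch below uses xorshift64, a tiny generator chosen only for brevity. It is not cryptographically secure and is unrelated to the OpenSSL PRNG. The struct and function names are made up for the example.

```c
#include <stdint.h>
#include <stdio.h>

/* xorshift64 state. For illustration only, NOT cryptographically secure. */
struct prng { uint64_t state; };

/* Initialize the generator; xorshift must not start from zero. */
void prng_seed(struct prng *g, uint64_t seed)
{
    g->state = seed ? seed : 1;
}

/* Advance the generator and return the next 64-bit value. */
uint64_t prng_next(struct prng *g)
{
    uint64_t x = g->state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    g->state = x;
    return x;
}

/* Read a 64-bit seed from /dev/urandom. Returns 0 on success, -1 on failure. */
int seed_from_urandom(uint64_t *seed)
{
    FILE *f = fopen("/dev/urandom", "rb");
    if (!f)
        return -1;
    size_t n = fread(seed, sizeof *seed, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}
```

Two generators seeded with the same value produce identical sequences, which is why a leaked seed gives away every later output.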

On Unix-like OSes, the OpenSSL PRNG seeds itself using data obtained by reading /dev/urandom, /dev/random and /dev/srandom (on OpenBSD), spending 10 ms on each (openssl/crypto/rand/rand_unix.c). For the PRNG to be cryptographically secure, its initial seed must not become known. In the case of OpenSSL, this implies that the external devices it reads from must be reliable sources of randomness.

For machines that lack /dev/random as an option, the Entropy Gathering Daemon (EGD) can be used. It is a Perl script that runs in the background, calling the programs available on the machine and using their results to slowly fill its entropy pool (egd.pl:175-321). OpenSSL can be configured to use EGD as a source of randomness. A virtual machine running on QEMU can also be fed from EGD, but this is slow because of the way EGD works and is being investigated. Also, if EGD is not running in the `--bottomless` mode, it often blocks when used with QEMU. So, it is not advisable to use it as of now.

Another source of randomness could be the hardware random number generators. These claim to be cryptographically secure and unpredictable, and are often very fast. But these are only as reliable as their manufacturer. Certain Intel processors provide an instruction that they claim returns reliable random numbers. But a recent revelation indicates that this instruction may have been rigged because of influence from the NSA. Therefore, depending on an HWRNG as the only source of randomness would be a bad idea. A better option is to just add this data to the main entropy pool along with data from other sources.