rgdb Remote debugging

Posted: July 8, 2014 in gdb, linux
Tags: , , ,

Check out the project I’ve been working on in github.
I frequently have to debug C/C++ code from the command line, but I have separate build machines because I build on multiple architectures and platforms. So, I wrote this python script for remote debugging.
All it does is open up an ssh connection (using python paramiko) and connects to gdb on the build machine. It then parses through the output using regular expressions and opens the source code on your local machine for debugging. As you step through the code, the line of code is underlined and centered in the code window.
I also added a useful (for my work) feature which allows you to get the tcpdump while executing a particular line of code. It then opens the tcpdump on your local machine in wireshark.

rgdb_screenshot

Authors: David Mulder; Curtis Welborn

ABSTRACT

Compilers are a core technology within Computer Science, whether they create a native executable file that runs directly on the hardware or a byte-code file that runs in a Virtual Machine. This paper begins by outlining the primary author’s capstone where he implemented an Assembler, Virtual Machine and Compiler all from scratch. No existing code-base or any special purpose libraries could be used in the implementation. The major components for these programs will be discussed and the base approach taken to creating them discussed. Following this overview, a more detailed description of lessons learned by the author while converting part of his capstone project from Python to C++ will be given. The project used in this paper provided for a rather unique perspective on the issue of converting a program from one language to another as the example program was the non-trivial Virtual Machine. The paper will conclude with a brief section describing future work where a Garbage Collector will be implemented.

INTRODUCTION

As part of a capstone project, students in the Computer Science Department are required to implement an Assembler, Virtual Machine (VM) and a Compiler. The Two-pass Assembler reads an assembly source file (.asm ) containing assembly code patterned on the MIPS architecture. During the first pass of the assembler, the syntax of the assembly code is verified and the addresses of labels are computed and stored in a dictionary for later reference. If any syntax errors are detected, an error message is produced and the assembler stops; otherwise, the assembler continues with pass two. In pass two, some semantic checks are performed, such as verifying that all referenced labels are unique and are defined. If the assembly code passes the semantic checks, numeric byte-code is produced.

Because the Two-pass Assembler and the Virtual Machine are constructed as a single project, byte-code is not written to a file; rather, it is loaded directly into the memory of the VM at the time it is produced during pass two of the assembler, as seen in Figure 1. This eliminates the need for students to write a separate Loader program which would have to read the byte-code from the file and load it into the memory of the VM.
Once the source code for an assembly program has been converted into byte-code and loaded into the VM’s memory, the VM begins by executing the first line of the assembly program. A VM is composed of two major components: memory and a byte-code execution engine. The byte-code execution engine is implemented as a loop with a big switch statement. During each pass through the loop, the VM picks the byte-code instruction pointed to by the Program Counter (PC) to execute. A separate case clause is implemented for each byte-code instruction within the switch statement. Details for each instruction, such as number of arguments and addressing modes used, are implemented in the case clause for each instruction. Memory of the VM is implemented as a large byte array which is used to store both byte-code and data. Memory of the VM can be logically segregated into three sections.
The first section which is stored in low memory (addresses 0 … N) contains byte-code, global variables and constants. The second section, which starts at the max address of memory and works backward (address max … L), contains the run-time stack where activation records for function calls are stored. Activation records and the run-time stack where they are stored are what make it possible to execute programs with recursive function calls. The third section of memory located between the first and second sections (address N+1 … L-1) is the heap. Programs that dynamically allocate space at run-time via commands such as ‘new’ or ‘malloc’ are allocating space on the heap.
Once students have completed writing the Two-pass Assembler, the Virtual Machine and various assembly programs, they implement a Two-pass Compiler for an Object-based language created just for the capstone class known as kxi. Kxi source code is a subset of both Java and C++. As kxi is Object-based, classes can be defined and instances of a class can be created at run-time, but there is no inheritance supported. The kxi Compiler is implemented using the standard phases of lexical analysis, syntax analysis, semantic analysis, intermediate code generation and target code generation [1]. No optimizations of intermediate or target code are implemented as part of the capstone.
Lexical analysis reads tokens from the source file and passes them to syntax analysis. Syntax analysis is implemented using a recursive descent parser which loads a symbol table that contains definitions for all the classes, functions and variables defined in a kxi source program, as well as any temporary variables that may be created during the compilation process. Semantic analysis, which runs during pass two of the compiler, relies upon a combination of the recursive descent parser, the symbol table, a semantic action stack and the operator stack as defined in the shunting-yard algorithm. This combination allows the shunting-yard algorithm, which converts infix expressions to post-fix expressions, to be applied at the same time that semantic checks are being performed. Using this combination of structures and algorithms makes it possible to implement a compiler without creating and traversing a syntax tree. This has proven to be significantly easier for students to implement in a single semester. The details of this process are beyond the scope of this paper; however, a tutorial has been presented [2] outlining this process.
Python was used by the primary author to implement all of these capstone projects over two semesters. After completing the capstone courses, the author wanted to complete a research class where he would add a Mark-and-Sweep garbage collector to the Virtual Machine to reclaim objects allocated on the heap. Currently there is no way to reclaim space once allocated in kxi. During the initial phase of this research project, the VM was converted from Python to C++. Additionally, the kxi Compiler was converted to produce byte-code directly to a file and a Loader was implemented to read the numeric byte-code from the file so it could be executed by the new VM written in C++. The following sections outline lessons learned and observations made by the author during this process.

VIRTUAL MACHINE MEMORY

For students completing the capstone, one of the first issues they must address is how to implement memory in the Virtual Machine and create the code which can read/write to any address as either a byte or integer (multiple bytes). Once a student truly understands how to deal with the memory of the VM, most of the remaining code for the VM is rather straightforward.
The Python implementation of the virtual machine memory used a bytearray object. The bytearray could either be declared with an initial size in bytes, or bytes could be appended individually. Appending to the bytearray is advantageous during compilation, since the size of memory needed to store all the byte-code and data cannot be determined until after the assembler has run.
Inserting byte-code into main memory (the bytearray object) involves casting an 8-bit binary sequence to an integer before making the assignment. Python did not require that the binary sequence be cast as a char, since Python integers are an arbitrary length. To make a 32-bit integer assignment, the bits are split into 8-bit segments, cast to integers, and assigned to the bytearray in sequence. This provides control over the endianness of the integers. Endianness refers to the order in which the bytes are stored in memory. If integers were stored in an integer array, then the endianness of our integers would be dependent on the architecture of the host machine.

data = “{:032b}”.format(x & 0xffffffff)
memory[i] = int(data[:8], 2)
memory[i+1] = int(data[8:16], 2)
memory[i+2] = int(data[16:24], 2)
memory[i+3] = int(data[24:], 2)
Example 1: Python memory assignment

Retrieving a 32-bit integer from the bytearray required some careful observance of the sign bit in the left most byte (using big endian notation). A 32-bit array could be constructed from a 4-byte subsection of main memory using Python list comprehension and formatting. After checking the sign bit, the bit array can be cast to an integer, and then converted from 2’s complement if the sign bit was negative. Note that inserting the 32-bit integer into main memory did not require any bit manipulation because Python already uses 2’s compliment notation, but when retrieving the bit array, special care had to be taken because Python integers are arbitrary in length. Technically, a 2’s complement integer in Python requires an infinite sequence of negation bits.

binary = ”.join([“{:08b}”.format(byte) for byte in memory[y:y+4]])
if binary[0] == ‘1’: # Convert from 2’s Complement
binary = ”.join([str(int(not int(val))) for val in binary])
return -(int(binary, 2))-1
else:
return int(binary, 2)
Example 2: Python memory retrieval

In C++, the Virtual Machine memory was implemented using an unsigned char array. The size of the array is declared statically. Inserting into memory requires bit shifting a 32-bit integer 24, 16, 8 or 0 bits, respectively, and making the assignment in the array (essentially casting each set of shifted bits to an unsigned char).

int shift = 24;
for (int i = 0; i > shift;
shift -= 8; }
Example 3: C++ memory assignment

Similarly, a 32-bit integer is retrieved by bit shifting each unsigned char and adding the resulting integers together. There is no need to check the sign bit as long as the result is cast to a 32-bit integer.

return (memory[index] << 24) + (memory[index+1] << 16) +
(memory[index+2] << 8) + memory[index+3];
Example 4: C++ memory retrieval

Accessing individual bytes in memory was similar in both Python and C++, as it only required an assignment statement.

VIRTUAL MACHINE BYTE-CODE EXECUTION ENGINE

Once single byte and multiple-byte information can be read/written to any address in memory, the byte-code execution engine must be implemented. Python does not support a switch statement, so a sequence of cascading if statements were written. During each iteration through the Virtual Machine loop, the execution engine loads the opcode, a single byte instruction, pointed to by the Program Counter, then looks for the correct opcode instruction to execute. Each instruction includes two operands. The first operand is always a register and is stored in a single byte, which sets the maximum amount of registers to 255. The second operand is 4 bytes in length and can be either a register or an address. The lone exception to this rule is the jump command, which places its destination address in the second operand. Retrieving the opcode and operands is done in the same manner as retrieving an integer or byte retrieval, as described in the previous section. Once the opcode and its associated operands are retrieved, the byte-code instruction can be executed.
Rather than encapsulate the machine state, the components of the Python Virtual Machine are all global. The functions for setting and retrieving values from memory are also global. This convention was chosen out of convenience, rather than for design purposes.

registers[command_operand1()] = registers[command_operand1()] *
registers[command_operand2()]
incrementpc()
Example 5: Python MUL instruction execution

In the C++ implementation of the VM loop, the machine state is encapsulated in a singleton class. Machine components and methods for setting and retrieving memory values are referenced in the class. Converting the instruction loop from Python simply involved inserting object references and making syntax adjustments, such as enclosing blocks in braces. For consistency, method and variable names were not changed.

machine.registers[machine.command_operand1()] = machine.registers[machine.command_operand1()] * machine.registers[machine.command_operand2()];
machine.incrementpc();
Example 6: C++ MUL inst ruction execution

The assembly trap instruction required some special adjustments for reading and writing data to and from standard I/O. In Python, the sys module is used to read from stdin and write to stdout. The method sys.stdin.read(1) is used to read a single character from the input buffer. Similarly, the C module stdio.h contains a buffered input method getchar(), which (when cast to an unsigned char) reads a single character from stdin.

BYTE-CODE LOADER

The original capstone project did not require a loader. So, the Python implementation which was an Assembler/VM combination program began execution by reading assembly instructions from a file then writing the byte-code directly to the VM’s memory for execution. In the C++ implementation, the compilation from assembly to byte-code is extracted from the VM, and added into the kxi Compiler. The Compiler outputs byte-code directly to a file in the final stage of compilation. Because the data segment of the byte-code (constants and global variables) comes before the first instruction in the file, a single 4-byte integer is prepended to the file indicating the starting address of the first byte-code instruction.
Loading requires opening the byte-code file for reading, setting the program counter based on the first 4 bytes of the file, then reading the remaining data byte for byte into main memory (the unsigned char array). A counter is kept during the file read to count the number of bytes in the file, which determines the total size of the data segment plus the byte-code segment. This counter becomes the stack limit, which is the start of the heap. The stack base is set to the length of the main memory array. The runtime stack extends from the stack base up to the stack limit. The stack limit will vary at runtime based on allocation on the heap.

DEBUGGING BYTE-CODE INSTRUCTIONS

The VM can be executed in a debug mode, which outputs each instruction as it is run. It's useful to have the contents of registers output to a file, and some sort of break point instruction is also necessary. This process makes debugging less tedious.
Because the Python VM begins it's execution by compiling from assembly to byte-code, it has access to the symbol table at run time. This proved advantageous during debugging because branches and methods could be labeled within a back trace. This also provided access to line numbers within the source assembly file, so that error messages could specifically indicate the location of a failure.
The C++ implementation of the VM executes directly from byte-code, which means extra symbol loading would need to be added to the byte-code loader in order to have labels and line numbers in error messages. Line numbers can be roughly estimated, based on the distance of the current instruction from the START label, but this is useless when executing instructions outside the main method (methods are on negative line numbers if they are above the START label). This could be remedied by including debug symbols, during the load phase, which indicate where the code segment begins. Unfortunately, line numbers could only be calculated based on the distance between instructions, which ignores white space and comments. Removing the white space and comment lines from the assembly would also complicate debugging.

CONCLUSIONS

The amount of code to implement memory management for the VM was similar between Python and C++; however, the Python code was considerably more complex. Whereas in C++ you would simply shift and input bits in their proper order, Python required complicated array manipulations and careful observance of the rules of 2's complement negation. Python's representation of bit arrays as characters may have been convenient, but certainly effected execution time.
Syntactically, converting the VM’s execution engine from Python code to C++ was relatively simple. The greatest difference between the two languages was Python's indentation blocks. Adding brackets around each block and semi-colons after each statement was simple enough. Creating the singleton object which held the machine state took very little time, and would not have required many syntax changes if it had been implemented originally in the Python code.
The most significant difference between the Python and C++ implementations of the Virtual Machine was performance. After accounting for the extra compile/load time in the Python Virtual Machine (time was measured directly within the start of the execution engine), the Python implementation was much slower. The performance was tested using recursive Fibonacci calls. Both virtual machines were executed using the same byte-code, with the same number of available registers and stack space. To calculate the 27th number in the Fibonacci sequence (196,418), it took the C++ implementation 1.8 seconds to return the answer. In Python it took nearly 13 minutes to return the answer. The execution time increases exponentially in both implementations, but begins much sooner in Python. It should be pointed out again that the byte-code execution engines are nearly identical between the Python and C++ implementations. The main bottleneck appears to be in the setting and retrieving of integers in Python. Another concern is that Python is already executing against it's own virtual machine, so kxi is adding an additional layer of abstraction between the source and machine code.
Interestingly, each preceding Fibonacci number takes approximately 1.62 times longer than the previous calculation time in both Python and C++. Based on that average, it would take the C++ Virtual Machine implementation 1.4 days to calculate the 50th Fibonacci number. It would take the Python implementation 1.6 years.
Although these results are interesting, the disadvantage to using Fibonacci is that the algorithm is inherently exponential. Therefore, we can't draw any concrete determination of performance from the results. A simple time wasting method would better measure the performance differences between the implementations.

public int traverse(int n) {
if (n == 0) return;
else return traverse(n – 1);
}
Example 7: Algorithm for testing performance

This method pushes n activation records to the run-time stack, then returns. Over a large data set, the results are very similar. The C++ implementation returns so quickly, that the algorithm causes a stack overflow at 56/100 of a second (187 thousand activation records). The Python implementation exceeds 1 second after a little over 1 thousand activation records are pushed onto the stack.
This project demonstrates an obvious need to pick the right tool for the job. In this case, Python was not the right tool for implementing a Virtual Machine. Any language that executes against it's own virtual machine is probably insufficient for this project. Python, in particular, did not allow enough low-level manipulation to properly manage memory access. A compiled language is much more suitable for this task. The low level access that languages such as C and C++ provide for memory management makes them better suited for this work.
In contrast, although it is not discussed in this paper, Python proved to be a very good choice for implementing a Compiler. Python’s flexibility with string manipulation and easy data passing proved very appropriate for interpreting and compiling the kxi language to byte-code.

FUTURE WORK

While the initial goal of this project was to dramatically increase the performance of the Virtual Machine, the ultimate goal is to implement a Mark-and-Sweep Garbage Collection algorithm. To complete the Garbage Collector, activation records and object instances need to be modified to store all references (pointers) in a known location. The Garbage Collection algorithm will have to traverse the run-time stack searching for each reference to an object instance on the heap so that the memory can be marked as currently in use. In the sweep phase of the algorithm, all the memory allocated to the heap will be traversed to search for unused blocks of memory that can be added back to a list of available memory.

REFERENCE

[1] Aho, Alfred, Lam, Monica, Sethi, Ravi, Ullman, Jeffrey, Compilers Principles, Techniques, & Tools, 2nd, Pearson, Addison Wesley, 2007.
[2] Welborn, R., Curtis, Making your Undergraduate Compiler Course: Interesting, Relevant and Passable Tutorial, CCSC Rocky Mountain Region, Ft. Collins, Co., 2010.

© CCSC, (2013). This is the author's version of the work. It is posted here by permission of CCSC for your personal use. Not for redistribution. The definitive version was published in The Journal of Computing Sciences in Colleges, {volume 29, number 2, December 2013}, http://dl.acm.org/.

The capstone project for my Computer Science degree is a virtual machine and compiler written from scratch.
The original requirements for the virtual machine was for it to read and execute assembly code. Framing function calls on the stack was hand written in assembly. The vm also had to support pseudo multi-processing.
The compiler (the second semester of the capstone project) takes c-like source code and compiles it down to assembly code which can execute on the vm.

Extending the Project

I wanted to do more with the original project, so I spoke with the Professor, and he’s working with me to do independent study to improve the original design of the vm. I’ve modified the compiler so that it writes byte code to a file, instead of stopping at assembly. I also completely rewrote the vm in C++. I originally wrote the vm in Python (for convenience), but the Python vm is super slow. I had to implement a loader in the rewritten vm, since my compiler now outputs byte code.

Garbage Collection

The big piece of the project I’m working on right now is implementing a mark and sweep garbage collector. I had to reorganize the frames and put all the pointers at the top of the frame (otherwise the gc algorithm can’t find the pointers inside the method call). I’ve also reorganized the heap allocation to put pointers at the front of every object.
My plan for implementing the gc algorithm in the vm is to use the list of pointers in the main method as the root of the tree during the mark phase. Any allocated memory that can’t be reached from the the pointers in the main method will be considered unused and will be freed.

I’ll post more as I get close to completing this project (sometime this August). Is anybody interested in seeing source code or more details? Post if you’re interested.

Yes people, this really does work (and very well).

If you’re installing on openSUSE 12.2:

1. Install the latest wine from the community repo.
sudo zypper ar http://download.opensuse.org/repositories/Emulators:/Wine/openSUSE_12.2/ wine
sudo zypper ref
sudo zypper dup -r wine
sudo zypper in wine

If you’re installing on openSUSE 12.3:

1. Just install wine:
sudo zypper in wine

2. Install office
WINEARCH=win32 WINEPREFIX=~/.office wine /setup/directory/setup.exe

3. The office apps should now be in your menu, if not… (change OUTLOOK.EXE for whatever you’re opening)
WINEARCH=win32 WINEPREFIX=~/.office wine ~/.office/drive_c/Program\ Files/Microsoft\ Office/Office14/OUTLOOK.EXE &

If you need to configure Outlook:
WINEARCH=win32 WINEPREFIX=~/.office wine control
Click on Mail. The rest is the same as if you were configuring on Windows.

Here are some more specifics for actually installing Office: http://appdb.winehq.org/objectManager.php?sClass=version&iId=17336
Only the 32-bit installer works. I happen to be using MS Office 2010 Professional Plus.

Screenshot from 2013-02-07 09:54:49

EDIT: Just install Pipelight http://fds-team.de/cms/pipelight-installation.html
Pipelight lets you run netflix in your native linux browser (it runs a wine application in the background and pipes info through it)!

With the big news that someone got netflix working using wine + firefox + silverlight, I decided to give it a shot on openSUSE 12.2.
There’s no rpm package available for this yet (though I’m thinking about making one, but don’t really have the time).

This is what I did to get it working:

1. Install the latest wine from the community repo:
sudo zypper ar http://download.opensuse.org/repositories/Emulators:/Wine/openSUSE_12.2/ wine
sudo zypper ref
sudo zypper dup -r wine
sudo zypper in wine

2. Install mstcore fonts (thanks Tyler):
sudo zypper in fetchmsttfonts

3. Install firefox version 14.0.1:
wget -O Firefox-14.0.1.exe http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/14.0.1/win32/en-US/Firefox%20Setup%2014.0.1.exe
WINEARCH=win32 WINEPREFIX=~/.netflix wine Firefox-14.0.1.exe

4. Install Silverlight version 4:
wget -O Silverlight-4.exe http://silverlight.dlservice.microsoft.com/download/6/A/1/6A13C54D-3F35-4082-977A-27F30ECE0F34/10329.00/runtime/Silverlight.exe
WINEARCH=win32 WINEPREFIX=~/.netflix wine Silverlight-4.exe /q

5. Watch Netflix!
WINEARCH=win32 WINEPREFIX=~/.netflix wine "C:\\Program Files\\Mozilla Firefox\\firefox.exe" http://netflix.com/

DO NOT use the latest versions of Firefox and Silverlight! They are broken in wine!

Looks like we don’t have to pay for an account anymore! Too bad they don’t have an rpm built yet :( but it looks like a Fedora rpm is at least on the way.

1. Install dependencies
zypper in libcryptopp-5_6_1-0 rpm-build alien

2. Get the debian package from Spotify
wget http://repository.spotify.com/pool/non-free/s/spotify/spotify-client_0.8.4.103.g9cb177b.260-1_amd64.deb

3. Convert the debian to an rpm
sudo alien -r spotify-client_0.8.4.103.g9cb177b.260-1_amd64.deb

4. Install the rpm
sudo zypper in spotify-client-0.8.4.103.g9cb177b.260-2.x86_64.rpm

5. Link dependencies
sudo ln -s /usr/lib64/libnss3.so /usr/lib64/libnss3.so.1d
sudo ln -s /usr/lib64/libnssutil3.so /usr/lib64/libnssutil3.so.1d
sudo ln -s /usr/lib64/libsmime3.so /usr/lib64/libsmime3.so.1d
sudo ln -s /usr/lib64/libplc4.so /usr/lib64/libplc4.so.0d
sudo ln -s /usr/lib64/libnspr4.so /usr/lib64/libnspr4.so.0d

6. Run Spotify!
spotify&

Enjoy!

In case anyone is looking and stumbles upon this, these are the dependencies you need to install in order to run Skype on 64-bit openSUSE 12.1:
xorg-x11-libXv-32bit
libqt4-32bit
libqt4-x11-32bit
libpng12-0-32bit