Analysis of FLAC – Stage Three of Three – SPO600 Project

Hello and welcome to my blog!

For my final post in this series about my SPO 600 project, I will discuss the last part of the project: the pull request to the upstream project. I will also review what I have learned while working on this project.

PULL REQUEST:

Here is the link to the pull request I created: https://github.com/xiph/flac/pull/183

The FLAC project has an extensive test suite that gets run by the Travis CI continuous integration system. It took 2 hours, 28 minutes and 15 seconds for Travis to finish the tests on my pull request. Travis builds and runs the FLAC program in 16 different ways for the test: there are four different build settings that get tested, and each build setting gets compiled twice, once with the GCC compiler and once with the Clang compiler. Travis runs these eight builds twice, once on a Linux machine and once on a Mac OS X machine. To speed things up, Travis runs multiple test scenarios at a time. The full run time of all the test scenarios combined is 6 hours and 57 seconds.

My changes passed all the tests Travis ran. That means the changes I made didn’t affect the regular operation of the program.

It has been a little over a week since I made the PR. It is currently not merged, and I have not heard from the project owner. The day I made the pull request, I was asked whether the CPU feature detection I did happens at run time or at compile time. Initially, I thought the question was referring to the detection of the CPU architecture, which is performed at compile time. After some discussion, I understood that the question was about choosing the correct version of a function based on variables at run time, which I did implement: the selection of which autocorrelation function to run is decided by the variable “encoder->protected_->max_lpc_order.” As far as I can tell, this question came from another member of the community and not a project maintainer, but I could be wrong.
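
To make that answer a little more concrete, here is a minimal sketch of the run-time side of that selection. It is a simplified illustration only: the function names below are hypothetical stand-ins, not the actual FLAC routines, and only a subset of the lag variants is shown. The idea is that, at encoder initialisation, a function pointer is set to the lag-specific NEON routine that covers max_lpc_order, with the plain C routine as the fallback.

/* Minimal sketch of run-time selection (hypothetical names, not the
 * actual FLAC source): pick a lag-specific kernel based on the
 * encoder's max_lpc_order, falling back to the plain C routine. */
typedef void (*autocorr_fn)(const double data[], unsigned data_len,
                            unsigned lag, double autoc[]);

/* Hypothetical stand-ins for the real FLAC functions. */
void autocorr_c(const double data[], unsigned data_len,
                unsigned lag, double autoc[]);
void autocorr_neon_lag_8(const double data[], unsigned data_len,
                         unsigned lag, double autoc[]);
void autocorr_neon_lag_12(const double data[], unsigned data_len,
                          unsigned lag, double autoc[]);
void autocorr_neon_lag_16(const double data[], unsigned data_len,
                          unsigned lag, double autoc[]);

autocorr_fn select_autocorrelation(unsigned max_lpc_order)
{
    /* Choose the smallest kernel whose lag covers the requested order. */
    if (max_lpc_order <= 8)
        return autocorr_neon_lag_8;
    if (max_lpc_order <= 12)
        return autocorr_neon_lag_12;
    if (max_lpc_order <= 16)
        return autocorr_neon_lag_16;
    return autocorr_c; /* larger orders fall back to the plain C version */
}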

Since that question, there has not been any more activity on the pull request.

PROJECT REVIEW:

This course and project have been a great learning experience for me. I have learned a lot about low-level programming through the lectures and the project, especially about compilation. For this project, I learned a lot about the GNU Autotools build system. I now know how to read and edit a configure script, and with the practice I got from working on this project, I am now comfortable with the whole build process using a configure script and make.

Through this course and project, I also learned about using assembler and intrinsics in the C language. I gained practical experience working with x86 and AArch64 intrinsics during this project. Specifically, I learned how to use the intrinsics documentation to find the correct intrinsic I need and to understand how to use each one.

This project was the first major C/C++ project I worked on. I learned a lot about the build process of a C/C++ project, for example, how to detect the hardware details and the available software on a computer.

I also gained a lot of practice with profilers. I had used profilers for previous course projects, but this project gave me a practical way to practice using them, for example, by finding hot spots in a program and by measuring the performance changes from the optimizations I made to the program.

Conclusion

I have enjoyed doing this project, and I am glad I could mix my interests in audio and computer programming. This project has been a great learning experience, and I would be interested in doing this type of work in the future.

Analysis of FLAC – Stage Two of Three – SPO600 Project

Hello and welcome to my blog!

In this blog post, I will be writing about my final project for my Software Portability and Optimization class, SPO 600. In this post, I will review the work I have done with the FLAC project. If you didn’t read the first post in this series, you can find it here. A quick review of the last post: for my SPO 600 class, I was tasked with finding and optimizing an open-source project. Knowing I wanted to work with audio, I found the open-source project called FLAC (Free Lossless Audio Codec). On further investigation, I made a plan for how to optimize the FLAC library. In this blog post, I will go over the implementation and results of my optimization.

Execution of my plan:

In the previous blog post, I laid out a strategy for completing my optimization. It turned out to be quite helpful: I was able to follow it, and I completed the optimization just as planned. Below is the strategy I made and some notes on how I completed each task.

  1. Research the required pre-processor directives that I will need to run the Aarch64 code inside the FLAC library conditionally.
    Through reading the FLAC code, I was able to determine that the pre-processor directives used in this project rely on symbols defined by the configure script. After reading the “configure.ac” file, the online Autotools documentation, and the “configure.ac” file from the OPUS project (which is also created by Xiph, the same people who created FLAC), I was able to determine how to check whether the user has an aarch64 CPU and whether the Arm NEON intrinsics are available. After confirming that I am on an aarch64 machine, I define “FLAC__CPU_AARCH64,” and after confirming the NEON intrinsics are available, I define “FLAC__HAS_NEONINTRIN.” (The first sketch after this list shows the guard pattern these defines enable.)
  2. Test the pre-processor directives with some code that will cause a fault, so I know it is working.
    Since the FLAC project has some optimizations for other platforms, I needed to follow the same pattern as the previous contributors. To test the preprocessor directives, I had to add the new architecture to the function selection logic. To do that, I added code to “src/libFLAC/cpu.c”, “src/libFLAC/include/private/cpu.h”, “src/libFLAC/include/private/lpc.h” and “src/libFLAC/stream_encoder.c.” Once I had the architecture function selection logic done, I was ready to test. I did not end up needing to cause any faults to confirm that the preprocessor directives were working. Instead, I used a printf statement and a copy of the vanilla C version of the autocorrelation function. When running the program, I was able to see the printf messages, and I also used perf to confirm I was using the new function.
  3. Examine the codebase to know where precisely I need to put the pre-processor directives. And check if I need to mess with the build instructions.
    I ended up doing this step in steps one and two, since the FLAC project doesn’t use pre-defined compiler variables for the pre-processor directives. Instead, the FLAC project uses symbols defined by the configure script, and I had to examine the codebase and implement the changes in order to run a test.
  4. Configure the makefile to build the new file that I am adding.
    I added one file to the project, called lpc_intrin_neon.c. For the compiler to build it, I added it to the list of source files inside “src/libFLAC/Makefile.am.”
  5. I am going to focus on the “FLAC__lpc_compute_autocorrelation” function, and I am going to translate it into aarch64 intrinsics. I will use the existing C and x86 intrinsic code to help me with the translation.
    Success! It took some time, but I was able to translate the x86 code into aarch64. I did this by using the Intel and Arm NEON online documentation. I also got help by googling a specific x86 intrinsic and asking which NEON instruction does the same thing or something similar. For a few intrinsics, there was no direct replacement. Specifically, there is no general shuffle in NEON, so I had to read up on how shuffle works on x86 and reproduce it using multiple NEON instructions. I ended up creating inline functions for the shuffles to make writing the code more manageable and cleaner (see the second sketch after this list for an example).
  6. Testing my optimization, I will re-run the test that I performed on the original code with my optimized version and see if I have improved the performance on the aarch64 platform.
    I tested with two aarch64 machines. The first machine has faster single-thread performance and 8 threads. The second machine has 24 threads but slower single-thread performance. On the first machine, the autocorrelation function initially took 26.11 percent of the runtime; after the optimizations, it took 12.64 percent. On the second machine, the autocorrelation function initially took 52.41 percent of the runtime; after the optimizations, it took 14.78 percent. I also tested the optimizations on an x86 machine to confirm that the changes did not affect that architecture.
  7. As a stretch goal, depending on how hard it is to write the Aarch64 intrinsics, I would like to translate the full “lpc.c” file with aarch64 intrinsics.
    I didn’t end up translating the full “lpc.c” file, but I did translate all versions of the autocorrelation function. There are four versions of the autocorrelation function; depending on the maximum lag needed, the correct one gets called: lag 4, lag 8, lag 12 or lag 16.
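
As a first sketch, here is roughly the kind of guard the new NEON file sits behind. This is a simplified illustration, not the exact FLAC code: the function body is a hypothetical stand-in, and only the two configure-defined symbols and the <arm_neon.h> header come from the work described above.

#if defined(FLAC__CPU_AARCH64) && defined(FLAC__HAS_NEONINTRIN)
#include <arm_neon.h>

/* Hypothetical stand-in for one of the lag-specific routines; it only
   sums the squares of the first multiple-of-four samples, to show the
   intrinsics in use. */
static void autocorr_lag0_example(const float *data, unsigned len, float *autoc)
{
    float32x4_t sum = vdupq_n_f32(0.0f);
    unsigned i;
    for (i = 0; i + 4 <= len; i += 4) {
        float32x4_t d = vld1q_f32(data + i);
        sum = vmlaq_f32(sum, d, d);   /* sum += d * d, lane by lane */
    }
    autoc[0] = vgetq_lane_f32(sum, 0) + vgetq_lane_f32(sum, 1)
             + vgetq_lane_f32(sum, 2) + vgetq_lane_f32(sum, 3);
}

#endif /* FLAC__CPU_AARCH64 && FLAC__HAS_NEONINTRIN */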
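
The second sketch shows the shuffle problem mentioned in step five. On SSE, _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 3, 2, 1)) rotates the four lanes left by one: {v0, v1, v2, v3} becomes {v1, v2, v3, v0}. NEON has no general-purpose shuffle, but this particular pattern maps onto vextq_f32, so wrapping it in a small inline helper keeps the translated code readable. The helper name here is my own illustration, not necessarily what ended up in the pull request, and other shuffle patterns needed different combinations of NEON instructions.

#include <arm_neon.h>

/* Rotate the four float lanes left by one: {v0,v1,v2,v3} -> {v1,v2,v3,v0}.
   vextq_f32(v, v, 1) takes lanes 1..3 of the first operand followed by
   lane 0 of the second, which is exactly this rotation. */
static inline float32x4_t rotate_lanes_left_1(float32x4_t v)
{
    return vextq_f32(v, v, 1);
}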

Full Results:

The following results are not averaged, but I did run these tests multiple times with similar results. The numbers below are from a few of the many tests I performed.

Aarch64 Machine 1:

TOTAL RUNTIME OF THE TEST BEFORE OPTIMIZATION:

real    0m51.784s
user    0m49.356s
sys     0m2.349s

TOTAL RUNTIME OF THE TEST AFTER OPTIMIZATION:

real    0m43.503s
user    0m40.950s
sys     0m2.470s

PERF REPORT BEFORE OPTIMIZATION (First 20 Lines):

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 208K of event 'cycles:u'
# Event count (approx.): 98509947650
#
# Overhead  Command   Shared Object           Symbol
# ........  ........  ......................  ....................................................
#
     26.11%  lt-flac   libFLAC.so.8.3.0        [.] FLAC__lpc_compute_autocorrelation
     25.54%  lt-flac   libFLAC.so.8.3.0        [.] FLAC__fixed_compute_best_predictor_wide
     11.35%  lt-flac   libFLAC.so.8.3.0        [.] FLAC__bitwriter_write_rice_signed_block
      9.45%  lt-flac   libFLAC.so.8.3.0        [.] FLAC__MD5Transform
      5.95%  lt-flac   lt-flac                 [.] format_input
      5.60%  lt-flac   libFLAC.so.8.3.0        [.] FLAC__lpc_compute_residual_from_qlp_coefficients_wide
      3.42%  lt-flac   libFLAC.so.8.3.0        [.] precompute_partition_info_sums_
      2.34%  lt-flac   libFLAC.so.8.3.0        [.] FLAC__MD5Accumulate
      2.21%  lt-flac   libFLAC.so.8.3.0        [.] FLAC__crc16

PERF REPORT AFTER OPTIMIZATION (First 20 Lines):

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 175K of event 'cycles:u'
# Event count (approx.): 81871492155
#
# Overhead  Command  Shared Object       Symbol
# ........  .......  ..................  ....................................................
#
     30.58%  lt-flac  libFLAC.so.8.3.0    [.] FLAC__fixed_compute_best_predictor_wide
     13.36%  lt-flac  libFLAC.so.8.3.0    [.] FLAC__bitwriter_write_rice_signed_block
     12.64%  lt-flac  libFLAC.so.8.3.0    [.] FLAC__lpc_compute_autocorrelation_intrin_neon_lag_12
     11.71%  lt-flac  libFLAC.so.8.3.0    [.] FLAC__MD5Transform
      7.16%  lt-flac  lt-flac             [.] format_input
      5.18%  lt-flac  libFLAC.so.8.3.0    [.] FLAC__lpc_compute_residual_from_qlp_coefficients_wide
      4.16%  lt-flac  libFLAC.so.8.3.0    [.] precompute_partition_info_sums_
      3.00%  lt-flac  libFLAC.so.8.3.0    [.] FLAC__MD5Accumulate
      2.62%  lt-flac  libFLAC.so.8.3.0    [.] FLAC__crc16

Aarch64 Machine 2:

TOTAL RUNTIME OF THE TEST BEFORE OPTIMIZATION:

real    3m43.841s
user    3m33.558s
sys     0m8.791s

TOTAL RUNTIME OF THE TEST AFTER OPTIMIZATION:

real    2m3.675s
user    1m54.260s
sys     0m8.588s

PERF REPORT BEFORE OPTIMIZATION (First 20 Lines):

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 901K of event 'cycles:uppp'
# Event count (approx.): 213328075836
#
# Overhead  Command  Shared Object     Symbol
# ........  .......  ................  ....................................................
#
     52.41%  lt-flac  libFLAC.so.8.3.0  [.] FLAC__lpc_compute_autocorrelation
     11.36%  lt-flac  libFLAC.so.8.3.0  [.] FLAC__fixed_compute_best_predictor_wide
      6.62%  lt-flac  libFLAC.so.8.3.0  [.] FLAC__bitwriter_write_rice_signed_block
      5.80%  lt-flac  libFLAC.so.8.3.0  [.] FLAC__MD5Transform
      4.35%  lt-flac  lt-flac           [.] format_input
      4.05%  lt-flac  libFLAC.so.8.3.0  [.] FLAC__lpc_compute_residual_from_qlp_coefficients_wide
      2.69%  lt-flac  libFLAC.so.8.3.0  [.] precompute_partition_info_sums_
      2.52%  lt-flac  libFLAC.so.8.3.0  [.] FLAC__MD5Accumulate
      2.10%  lt-flac  libFLAC.so.8.3.0  [.] FLAC__fixed_compute_residual

PERF REPORT AFTER OPTIMIZATION (First 20 Lines):

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 620K of event 'cycles:uppp'
# Event count (approx.): 144725968757
#
# Overhead  Command  Shared Object     Symbol
# ........  .......  ................  ....................................................
#
     15.03%  lt-flac  libFLAC.so.8.3.0  [.] FLAC__fixed_compute_best_predictor_wide
     14.78%  lt-flac  libFLAC.so.8.3.0  [.] FLAC__lpc_compute_autocorrelation_intrin_neon_lag_12
     10.06%  lt-flac  libFLAC.so.8.3.0  [.] FLAC__bitwriter_write_rice_signed_block
     10.03%  lt-flac  libFLAC.so.8.3.0  [.] FLAC__lpc_compute_residual_from_qlp_coefficients_wide
      9.14%  lt-flac  libFLAC.so.8.3.0  [.] FLAC__lpc_window_data
      9.09%  lt-flac  libFLAC.so.8.3.0  [.] precompute_partition_info_sums_
      8.07%  lt-flac  libFLAC.so.8.3.0  [.] FLAC__MD5Transform
      6.14%  lt-flac  libFLAC.so.8.3.0  [.] FLAC__fixed_compute_residual
      5.68%  lt-flac  lt-flac           [.] format_input

Code Changes:

On GitHub, I created a pull request inside my fork of FLAC here. With this pull request, you can see exactly all the code changes I made to the FLAC project.

Analysis of FLAC – Stage One of Three – SPO600 Project

Hello and welcome to my blog!

In this blog post, I will be writing about my final project for SPO600. The goal of this project is to optimize an open-source library. To complete the project, I had to choose one of the following tasks: altering build options, changing code to permit better optimization by the compiler, improving algorithms, or using in-line assembler.

What I did to find an Open-source project:

For a long time, I have been working with audio as a hobby. I actively produce music, mix live bands, calibrate PA speakers and play drums. So, after completing the lab about changing the volume of sound, I knew I wanted to work with audio. I started with an open-source project that I knew works with audio, called Audacity. Audacity is an open-source DAW (Digital Audio Workstation). I looked around the Audacity project source for a bit, then decided to dig into its dependencies. I got the idea of looking at the dependencies from my professor, who suggested to everyone that a library would be a great place to find an opportunity to optimize some code. In looking at the Audacity dependencies, I found the library called FLAC by Xiph. FLAC is an acronym that means Free Lossless Audio Codec.

Similar to what I did with Audacity, I started looking at the source code. I navigated the source code by using the search on GitHub and the find command in bash. I was looking for architecture-specific code, so my searches were for terms like “x86” or “aarch64”. Between searching for keywords and browsing the folders, I found a file called “cpu.c” in “src/libFLAC,” and from it I was able to determine that the FLAC project has not yet been optimized for Aarch64. The way I discovered this was by looking at the compiler preprocessor directives: inside that file, I could see that the project has optimizations for the x86, IA32 and PPC architectures, but not for Aarch64. After learning that, I submitted an issue on GitHub. Here is a link to the issue I created: Issue #156. In the issue, I asked the maintainers of the FLAC project if they were open to me adding some optimizations for Aarch64. One of the maintainers responded and is open to me adding Aarch64 support to the FLAC project. Now that I had approval to work on this repo, I could start on my benchmarking.

Benchmarking the FLAC project:

Step one of benchmarking the FLAC project was building the project on both x86 and Aarch64. I downloaded the source code from the FLAC GitHub, so the first step to building the library was to run the “autogen.sh” script, which created my “./configure” script. I then ran the configure script with the “-pg” flag to enable gprof, so my configuration command was “./configure CFLAGS="-g -pg -O2" CXXFLAGS="-g -pg -O2"”. I was then able to use “make -j” to build the code. Unfortunately, I could not get gprof working with the main FLAC binary; it would only send garbage data to the “gprof.out” file. I did get gprof working with some of the tests that were included with the source code, though, so I know “-pg” worked. I ended up switching to perf for my profiling. I ran make clean and then re-ran configure without “-pg” so I could use perf: “./configure CFLAGS="-g -O2" CXXFLAGS="-g -O2"”. After doing that, I grabbed my test data and ran perf record. The command I used to test with is “src/flac/flac input.wav”; this command runs FLAC and passes a wave file to it. For my sizable test data, I took one of the live multitrack recordings that I had on my computer and exported the mix to a stereo wave file. The test wave file is 1 hour and 31 minutes long and is 1.57 gigabytes.

Using Perf:

Perf did a fantastic job of helping me find what I wanted to optimize for this project. The part of FLAC that I tested was the encoding, specifically “.wav” encoding to “.flac.” I ran the same test file on the two different architectures, and the performance difference was noticeable instantly. The x86 machine ran the encoding in 16.439s, and the Aarch64 machine took 4 minutes and 39.585s. (These times are from one of the tests I did; I ran the test multiple times with similar results.) I then took a look at the perf report.

Perf Report:

In analyzing the perf reports, I was able to narrow down where I should target my optimizations. You can see from the following snippets of the perf report that on the Aarch64 architecture, the function called “FLAC__lpc_compute_autocorrelation” takes about 49.64% of the run time. In contrast, the x86 build uses an intrinsic version of that function called “FLAC__lpc_compute_autocorrelation_intrin_sse_lag_12_new,” which significantly improves the performance; on the x86 machine, this function only took 7.25% of the run time. The vanilla C code that the Aarch64 machine runs is located inside the file “lpc.c.” The intrinsic code the x86 machine runs is located inside the file “lpc_intrin_sse.c.” These files are located in the “src/libFLAC” folder.

aarch64
# Samples: 1M of event 'cycles:uppp'
# Event count (approx.): 255896905943
#
# Overhead  Command  Shared Object       Symbol
# ........  .......  ..................  ............................................................................................
    49.64%  lt-flac  libFLAC.so.8.3.0    [.] FLAC__lpc_compute_autocorrelation
     8.54%  lt-flac  libFLAC.so.8.3.0    [.] FLAC__fixed_compute_best_predictor_wide
     7.08%  lt-flac  libFLAC.so.8.3.0    [.] FLAC__lpc_compute_residual_from_qlp_coefficients_wide
     5.65%  lt-flac  libFLAC.so.8.3.0    [.] FLAC__bitwriter_write_rice_signed_block
     5.19%  lt-flac  libFLAC.so.8.3.0    [.] FLAC__lpc_window_data
     5.14%  lt-flac  libFLAC.so.8.3.0    [.] precompute_partition_info_sums_
     4.54%  lt-flac  libFLAC.so.8.3.0    [.] FLAC__MD5Transform


x86
# Samples: 61K of event 'cycles:u'
# Event count (approx.): 52656457012
#
# Overhead  Command  Shared Object       Symbol                                                                                      
# ........  .......  ..................  ............................................................................................
#
    20.02%  lt-flac  libFLAC.so.8.3.0    [.] FLAC__fixed_compute_best_predictor_wide_intrin_ssse3
    16.78%  lt-flac  libFLAC.so.8.3.0    [.] FLAC__bitwriter_write_rice_signed_block
    16.44%  lt-flac  libFLAC.so.8.3.0    [.] FLAC__MD5Transform
     7.43%  lt-flac  lt-flac             [.] format_input
     7.25%  lt-flac  libFLAC.so.8.3.0    [.] FLAC__lpc_compute_autocorrelation_intrin_sse_lag_12_new
     7.17%  lt-flac  libFLAC.so.8.3.0    [.] FLAC__lpc_compute_residual_from_qlp_coefficients_wide_intrin_avx2
     5.44%  lt-flac  libFLAC.so.8.3.0    [.] FLAC__fixed_compute_residual
     4.17%  lt-flac  libFLAC.so.8.3.0    [.] FLAC__MD5Accumulate

Strategy:

In reviewing the benchmarks and the code, I have narrowed down the strategy I will follow to complete the project. Here is the list of steps I came up with to optimize the encoding of FLAC for aarch64.

  1. Research the required pre-processor directives that I will need to run the Aarch64 code inside the FLAC library conditionally.
  2. Test the pre-processor directives with some code that will cause a fault, so I know it is working.
  3. Examine the codebase to know where precisely I need to put the pre-processor directives. And check if I need to mess with the build instructions.
  4. Configure the makefile to build the new file that I am adding.
  5. I am going to focus on the “FLAC__lpc_compute_autocorrelation” function, and I am going to translate it into aarch64 intrinsics. I will use the existing C and x86 intrinsic code to help me with the translation.
  6. Testing my optimization, I will re-run the test that I performed on the original code with my optimized version and see if I have improved the performance on the aarch64 platform.
  7. As a stretch goal, depending on how hard it is to write the Aarch64 intrinsics, I would like to translate the full “lpc.c” file with aarch64 intrinsics.

Results To This Point:

At this point, I have found a project to work on, started learning the project structure, done research on how to use the package, successfully built the package on both x86 and Aarch64, and performed benchmarks on the project using gprof, perf and the bash time command on both platforms. For my optimization, I have narrowed down exactly what I am going to work on for the remainder of this project and created a plan for how I will accomplish those changes.

Code Review – HelloWorld.c Lab02

Hello,

In the following post, I will be looking at seven different ways that you can compile a simple hello world program in C.

I will be using the following four flags for the GCC compiler.


-g               # enable debugging information
-O0              # do not optimize
-fno-builtin     # do not use builtin function optimizations
-static          # link libraries statically instead of using shared libraries

After I compile using gcc, I will look at things like the file size and the disassembled code to gather my results.

Okay, so test one is to compile the following program with -g -O0 -fno-builtin. This test will be the control for this experiment. We will be basing our conclusions for the other tests on this one.


$ gcc -g -O0 -fno-builtin -o test test.c

#include <stdio.h>
int main(){
    printf("Hello World!\n");
}


$ objdump -fsd --source test

The command “objdump” will allow me to view the compiled code. You will probably want to pipe this command into less or send the output to a file.

The results of this command will be extensive, so you will want to use a search to find “<main>.”


// Total File Size 24656 bytes
#include <stdio.h>
int main(){
  401126:	55                   	push   %rbp
  401127:	48 89 e5             	mov    %rsp,%rbp
    printf("Hello World!\n");
  40112a:	bf 10 20 40 00       	mov    $0x402010,%edi
  40112f:	b8 00 00 00 00       	mov    $0x0,%eax
  401134:	e8 f7 fe ff ff       	callq  401030 <printf@plt>
  401139:	b8 00 00 00 00       	mov    $0x0,%eax
  40113e:	5d                   	pop    %rbp
  40113f:	c3                   	retq   

Alright, now that we have our control results, we can start some tests.
Here is a link to an assembler quick start guide if you need it.

TEST 1

Add the compiler option -static.


$ gcc -static -g -O0 -fno-builtin -o test1 test.c

// Code
#include <stdio.h>
int main(){
    printf("Hello World!\n");
}

$ objdump -fsd --source test1 >> test1.text

// Total File Size 1720896 bytes
#include <stdio.h>
int main(){
  401bb5:	55                   	push   %rbp
  401bb6:	48 89 e5             	mov    %rsp,%rbp
    printf("Hello World!\n");
  401bb9:	bf 10 00 48 00       	mov    $0x480010,%edi
  401bbe:	b8 00 00 00 00       	mov    $0x0,%eax
  401bc3:	e8 f8 72 00 00       	callq  408ec0 <_IO_printf>
  401bc8:	b8 00 00 00 00       	mov    $0x0,%eax
  401bcd:	5d                   	pop    %rbp
  401bce:	c3                   	retq   
  401bcf:	90                   	nop 

If we review the resulting assembler, “printf” (here “_IO_printf”) gets called from code inside the executable instead of being linked from a shared library. And if we look at the file size, it is now a lot larger: because of -static, the C library code is included in the executable along with the assembled program.

TEST 2

Remove the compiler option -fno-builtin.


$ gcc -g -O0 -o test2 test.c

// Code
#include <stdio.h>
int main(){
    printf("Hello World!\n");
}

$ objdump -fsd --source test2 >> test2.text

// Total File Size 24648 bytes
#include <stdio.h>
int main(){
  401126:	55                   	push   %rbp
  401127:	48 89 e5             	mov    %rsp,%rbp
    printf("Hello World!\n");
  40112a:	bf 10 20 40 00       	mov    $0x402010,%edi
  40112f:	e8 fc fe ff ff       	callq  401030 <puts@plt>
  401134:	b8 00 00 00 00       	mov    $0x0,%eax
  401139:	5d                   	pop    %rbp
  40113a:	c3                   	retq   
  40113b:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)

If we review the results of the assembler, we can see that we are no longer using “printf”; the compiler has replaced it with the “puts” function.

TEST 3

Remove the compiler option -g.


$ gcc -O0 -fno-builtin -o ../excabutables/test3 test.c

// Code
#include <stdio.h>
int main(){
    printf("Hello World!\n");
}

$ objdump -fsd --source test3 >> test3.text

// Total File Size 22272 bytes
  401126:	55                   	push   %rbp
  401127:	48 89 e5             	mov    %rsp,%rbp
  40112a:	bf 10 20 40 00       	mov    $0x402010,%edi
  40112f:	b8 00 00 00 00       	mov    $0x0,%eax
  401134:	e8 f7 fe ff ff       	callq  401030 <printf@plt>
  401139:	b8 00 00 00 00       	mov    $0x0,%eax
  40113e:	5d                   	pop    %rbp
  40113f:	c3                   	retq

If we review the results of the assembler, we can see that we no longer get the debugging information, such as the header include or the source code interleaved with the assembly.

TEST 4

Add additional arguments to the printf() function in your program.


$ gcc -g -O0 -fno-builtin -o test4 testMultiArgs.c

// Code
#include <stdio.h>
int main(){
    printf("Hello World!!!, %d, %d, %d, %d, %d, %d, %d, %d, %d, %d \n", 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
}

$ objdump -fsd --source test4 >> test4.text


// Total File Size 24688 bytes
#include <stdio.h>
int main(){
  401126:	55                   	push   %rbp
  401127:	48 89 e5             	mov    %rsp,%rbp
    printf("Hello World!!!, %d, %d, %d, %d, %d, %d, %d, %d, %d, %d \n", 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
  40112a:	48 83 ec 08          	sub    $0x8,%rsp
  40112e:	6a 0a                	pushq  $0xa
  401130:	6a 09                	pushq  $0x9
  401132:	6a 08                	pushq  $0x8
  401134:	6a 07                	pushq  $0x7
  401136:	6a 06                	pushq  $0x6
  401138:	41 b9 05 00 00 00    	mov    $0x5,%r9d
  40113e:	41 b8 04 00 00 00    	mov    $0x4,%r8d
  401144:	b9 03 00 00 00       	mov    $0x3,%ecx
  401149:	ba 02 00 00 00       	mov    $0x2,%edx
  40114e:	be 01 00 00 00       	mov    $0x1,%esi
  401153:	bf 10 20 40 00       	mov    $0x402010,%edi
  401158:	b8 00 00 00 00       	mov    $0x0,%eax
  40115d:	e8 ce fe ff ff       	callq  401030 <printf@plt>
  401162:	48 83 c4 30          	add    $0x30,%rsp
  401166:	b8 00 00 00 00       	mov    $0x0,%eax
  40116b:	c9                   	leaveq 
  40116c:	c3                   	retq   
  40116d:	0f 1f 00             	nopl   (%rax)

If we review the results, we can see all the steps needed to perform a “printf” with ten arguments.

TEST 5

Move the printf() call to a separate function named output()


$ gcc -g -O0 -fno-builtin -o test5 testFunctionCall.c

// Code
#include <stdio.h>
void output(){
    printf("hello World");
}
int main(){
    output();
}

$ objdump -fsd --source test5 >> test5.text


// Total File Size 24792 bytes
int main(){
  40113c:	55                   	push   %rbp
  40113d:	48 89 e5             	mov    %rsp,%rbp
    output();
  401140:	b8 00 00 00 00       	mov    $0x0,%eax
  401145:	e8 dc ff ff ff       	callq  401126 <output>
  40114a:	b8 00 00 00 00       	mov    $0x0,%eax
  40114f:	5d                   	pop    %rbp
  401150:	c3                   	retq   
  401151:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  401158:	00 00 00 
  40115b:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)

#include <stdio.h>
void output(){
  401126:	55                   	push   %rbp
  401127:	48 89 e5             	mov    %rsp,%rbp
    printf("hello World");
  40112a:	bf 10 20 40 00       	mov    $0x402010,%edi
  40112f:	b8 00 00 00 00       	mov    $0x0,%eax
  401134:	e8 f7 fe ff ff       	callq  401030 <printf@plt>
}
  401139:	90                   	nop
  40113a:	5d                   	pop    %rbp
  40113b:	c3                   	retq

If we review the results, we can see that main now calls the output() function, which in turn performs the “printf.”

TEST 6

Remove -O0 and add -O3 to the gcc options.


$ gcc -g -fno-builtin -O3 -o test6 test.c

// Code
#include <stdio.h>
int main(){
    printf("Hello World!\n");
}

$ objdump -fsd --source test6 >> test6.text

// Total File Size 24896 bytes
#include <stdio.h>
int main(){
  401040:	48 83 ec 08          	sub    $0x8,%rsp
    printf("Hello World!\n");
  401044:	bf 10 20 40 00       	mov    $0x402010,%edi
  401049:	31 c0                	xor    %eax,%eax
  40104b:	e8 e0 ff ff ff       	callq  401030 <printf@plt>
  401050:	31 c0                	xor    %eax,%eax
  401052:	48 83 c4 08          	add    $0x8,%rsp
  401056:	c3                   	retq 

If we review the results, we can see that the -O3 gcc option has replaced several operations with more efficient versions, for example using “xor %eax,%eax” instead of “mov $0x0,%eax” to zero the return register, and dropping the frame-pointer setup.

Download my files

SPO 600 – Lab 1

Hello,

This post is the start of a new blogging series I will be doing, and it will revolve around the work I am doing for my SPO 600 class at school. SPO 600 is the course code, and the full name is Software Portability and Optimization. The course has a public wiki page, available here, if you want to read more.

Lab 1 – Code Review

For this lab, I needed to find two open-source communities and review how they operate. I found VSCODE and OPENCV.

VSCODE
Visual Studio Code is an open-source project run by Microsoft under the MIT license.

Contributing to VScode
VScode has the following wiki page that explains how you can help, here. The wiki is where you can learn how to file bugs or request features. The process seems simple: they list a few items that they would like you to include, but the formatting of the issue is left up to you. From what I found, pull requests just contain a link to the related issue and not much more writing.

Following an Issue request
This seems to be a very active project; pull requests (or PRs) seem to get merged or closed quickly. One thing I noticed is that there are not many community PRs. What appears to happen is that someone submits an issue, and then it gets worked on by an employee at Microsoft.

I followed Issue#80352 to see an example of the process. From the moment KamasamaK posted the issue, it took about a day for sbatten, who I think is a reviewer, to reply. Sbatten then added someone who would be able to fix the issue. In the issue, you can see jrieken start working on a solution; then another person, mjbvz, joined, helped solve the problem and posted the PR fixing it.

The total time to fix the issue was about five days.

OPENCV
OpenCV is an open-source computer vision library for C++. I took the computer vision course at my school and learned how to use this library. It uses the BSD license.

Contributing to OPENCV
Similar to VScode, OpenCV uses a wiki page to explain how you can contribute to the project; here is a link to the wiki. On that page, you will find how the developers would like you to send in your help. This project seems to run more on pull requests than on issues. From what I read on the wiki, the developers seem okay with people directly sending pull requests rather than filing an issue first and then a pull request when the code is finished, but it is likely a good idea to submit an issue before you start working, to let the developers know.

Following an Issue request
I followed Issue#15439 to see what the process is to contribute to OpenCV. The first step was the creation of the issue about the problem, and then you can see a reviewer for OpenCV came and suggested some ideas. In this case, the person who created the issue pursued fixing the problem and created a pull request with the required changes, PR#15440. In this pull request, you can see the conversation between the person creating the pull request and the developer approving it. It took a few days of back and forth fixing the code before the pull request got merged.

The total time to fix the issue was about three days.

NodeChat Development

I have gotten a bunch of PRs this past week, since this is the last week of class. Everyone is finishing up their release 4, which means the project has made some great progress.

We now have the continuous integration system Travis, which helps with managing the app. And we are working on ESLint and Prettier; there is still some configuring to do, but it’s almost up and running.

So with all the changes, there was a bit of backtracking I had to fix. One of the PRs changed how the messages were sent to the screen but forgot to change some corresponding code, so it ended up breaking the messages altogether and no messages would display correctly on the screen. It was a pretty easy fix, but I am not sure about keeping it. The change puts the JSX for a message into a variable, and then we just render that. The problem is that the variable doesn’t get updated anymore when the state changes. I have to look into it more later, but I might have to go back to building the JSX just before it is rendered.

Here is the PR that fixed the above bug and a few others: https://github.com/OTRChat/NodeChat/pull/72.

The other bugs include:

  • package.json – formatting bugs that formed when I did a manual merge.
  • Fixed avatars so you can now upload custom avatars. This change required changes to the server as well; this is the PR that changed the server: https://github.com/OTRChat/server/pull/4.
  • package-lock.json – There was a problem merging, so I just deleted the old version and made a new one.

Finding a JS bug using chrome developer Tools

Another issue was filed in the cube-roll project, Issue #4, which was a problem with the game’s score not updating. It turned out to be a small fix, but I wanted to go through how I found where in the code the bug was.

I first started the game and duplicated the problem.

I then looked through the code to find what I thought gets called when I score a point. I found that line 90 of the world.js file is where the logic for updating the score was.

After finding the area of code where I thought the problem could be, I opened the developer tools in Chrome and navigated to the Sources tab. In the Sources tab, I navigated to the file I wanted, world.js. I could now set up breakpoints. After that, I started the game and duplicated the problem again; this time, since I had set up the breakpoints, the game paused on the breakpoint and I could look at the values of the data in the developer tools.

As you can see in Figure 1, this is what happens when you trigger a breakpoint.

Figure 1: A snippet of the Chrome developer tools.

After doing the above a few times in multiple areas of the code, I located the line of code with the problem. Inside hud.js, there is the function setText(), which is called from line 98 of the world.js file.

Inside setText() I found the below typo:

setText(id, text){
    this.elements[id].test=text; // .test should be .text
    this.redraw=true;
}

Now that I had found the problem, I was able to fix the typo and submit a PR:

https://github.com/mklan/cube-roll/pull/5

Fixed restart bug in Three.js game cube-roll

While I was looking for a game to help develop, I found the game cube-roll on GitHub.

The bug I fixed was issue 1: cannot restart game. The problem occurred after you played a round; the game would freeze, and you were not able to restart it.

This turned out to be a fun project to work on, though it was challenging at the start, since I had never worked on a game before and didn’t know what the code was doing. After looking through the code for a bit, I found out that the game was using Three.js. Three.js is a JavaScript 3D library, and I was able to learn a bit more from its docs about what the code was doing.

After learning the code, I found out where the problem was. It turned out most of the code was already there; it was just not working correctly. The problem resided in the main function: when the game is over and the user presses enter, a .once() handler registered in main fires. Originally, it just called main again to restart the game. The problem was that the game never got cleared, so the old game was still running.

In order to fix the issue, I used the following code. It first calls the constructor on the world object, hence clearing and restarting the game. I then pass that world object back to main to restart the game. The important factor is that I am never creating a new world, just resetting it.

async function main(connectToServer, world = undefined) {
  renderPause = true;
  if(!world){
    world = await new World();
  }
  world.playerControls.once('enterWhenGameOver', async () => {
    await world.constructor();
    main(false,world);
  });
  if (connectToServer) {
      server = await initServer;
      server.on('clientKeyUp', key => {
        world.playerControls.processRemoteControl(key);
      });
  }
  renderPause = false;
  if (!tickingStarted) {
    tick(world, server);
  }
}

The reason I do not want to lose the world object is that it is being used in the tick() function. The tick function is what refreshes the game; it contains a reference to the world object, so that is why I needed to keep the same instance of the world object when the game restarts.

function tick(world, server) {
  requestAnimationFrame(() => tick(world, server));
  if (!renderPause) {
    tickingStarted = true;
    world.update();
    world.render();
    server && server.stream();
  }
}

Here is a link to my PR: https://github.com/mklan/cube-roll/pull/3

Continue development of chat app.

Development on the chat app has been going well. A few things I would like to get finalized are a continuous integration system, linting, tests and a database.

I think Travis CI would be a great addition to the repo. I took a class today that explained how to add Travis CI to a GitHub repo. It looks like it will be pretty easy; I might add it this week.

To make Travis CI really useful, it needs to be doing more than just running the app as the test. It is capable of running linting software like ESLint and actual tests that could be created for the app.

I would also like to start researching the best options for incorporating a database into the app. I created a discussion thread here; I would like other people’s opinions to help make this choice.

Also, for release 3 I found an external project to contribute to on GitHub. It required me to get into a bit of Ruby, which was interesting. Here is the PR I made for the code-gov-style project.

Maintaining a GitHub Project

Wow, I have a much better understanding of the life of a project maintainer after this week.

So, a few classmates have now joined the NodeChat project. The week started simply, with me posting a bunch of issues for everyone to work on. But once they started to work on the project, they started to post their own issues. They ran into things that I, being the one who wrote most of the code, didn’t really think about.

For example, the readme file: I didn’t need it myself, so it was very outdated and simple, and I needed to update it so people could more easily start working. I also added a contributing file which explains a few more things, so the docs are now updated.

Another example was the folder structure, which was outdated. We were working inside a subfolder while the root contained the original project from before I started working on it, so there were a bunch of files not really doing anything. We also had the server inside the same git repo. Apparently this is very confusing; it was another thing that I had gotten used to and learned to ignore, which wasn’t good for new contributors joining the project. We have now gotten rid of all the old code and moved the server into its own repo. So now the React app is alone in the main repo, which makes a lot more sense for new contributors.

Lots of changes happened over this past week on the OTRChat project. I have now had a taste of what managing a project looks like, and I am only working with 4 people; I can’t imagine a large project with 100+ contributors. As with everything, you start small and work toward something big.