<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>challenge | FLRNKS</title><link>https://flrnks.netlify.app/tag/challenge/</link><atom:link href="https://flrnks.netlify.app/tag/challenge/index.xml" rel="self" type="application/rss+xml"/><description>challenge</description><generator>Source Themes Academic (https://sourcethemes.com/academic/)</generator><language>en-us</language><copyright>© 2024</copyright><lastBuildDate>Sat, 11 Jan 2020 11:11:00 +0000</lastBuildDate><image><url>https://flrnks.netlify.app/images/icon_hu0b7a4cb9992c9ac0e91bd28ffd38dd00_9727_512x512_fill_lanczos_center_2.png</url><title>challenge</title><link>https://flrnks.netlify.app/tag/challenge/</link></image><item><title>Performance tuning GO</title><link>https://flrnks.netlify.app/post/go-performance/</link><pubDate>Mon, 11 Nov 2019 11:11:00 +0000</pubDate><guid>https://flrnks.netlify.app/post/go-performance/</guid><description>&lt;h3 id="introduction">Introduction&lt;/h3>
&lt;p>This post is a short story of how I optimized the execution of a simple program written for a coding challenge on the site &lt;code>runcode.ninja&lt;/code>.&lt;/p>
&lt;p>Short description of the task:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-bash" data-lang="bash">There is a text file which is given as argument to your program. This text
file contains lines, each of which is an encoded English word. Recover them
and print them out to the standard output line by line. Hint: the UNIX
built-in dictionary may come in handy at &lt;span class="s2">&amp;#34;/usr/share/dict/american-english&amp;#34;&lt;/span>.
&lt;/code>&lt;/pre>&lt;/div>&lt;p>To attack the problem, I wrote a program in Go that uses the standard-library packages &lt;code>encoding/base64&lt;/code> and &lt;code>os/exec&lt;/code> to decode the lines and to call grep to search the file-based dictionary. It was not very difficult to figure out that the encoding in use was base64.&lt;/p>
&lt;p>However, to make each line valid, either a single &lt;code>=&lt;/code> or a double &lt;code>==&lt;/code> padding character had to be appended to it. The code below appends them one at a time until decoding succeeds.&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="kd">func&lt;/span> &lt;span class="nf">decode&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">encodedStr&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">string&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nx">decoded&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">base64&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">StdEncoding&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">DecodeString&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">encodedStr&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="k">for&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nx">encodedStr&lt;/span> &lt;span class="o">+=&lt;/span> &lt;span class="s">&amp;#34;=&amp;#34;&lt;/span>
&lt;span class="nx">decoded&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">base64&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">StdEncoding&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">DecodeString&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">encodedStr&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;span class="k">return&lt;/span> &lt;span class="nb">string&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">decoded&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="p">}&lt;/span>
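&lt;/code>&lt;/pre>&lt;/div>&lt;p>To test whether the result of a decode operation is a valid word, I wrote a helper function that takes a string and calls grep via &lt;code>os/exec&lt;/code>. grep exits with a non-zero status when nothing matches, which &lt;code>Output&lt;/code> reports as an error, so an error is treated as &amp;ldquo;not a word&amp;rdquo;.&lt;/p>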
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="kd">func&lt;/span> &lt;span class="nf">dictLookup&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">bool&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nx">dictLocation&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="s">&amp;#34;/usr/share/dict/american-english&amp;#34;&lt;/span>
&lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">exec&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Command&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;grep&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;-w&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">word&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">dictLocation&lt;/span>&lt;span class="p">).&lt;/span>&lt;span class="nf">Output&lt;/span>&lt;span class="p">()&lt;/span>
&lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="k">return&lt;/span> &lt;span class="kc">false&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;span class="k">return&lt;/span> &lt;span class="kc">true&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Finally, putting these pieces together, the main loop reads the text file line by line and keeps decoding each line until the dictionary lookup succeeds, then prints the word to standard output. Below is the sample code.&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="nx">scanner&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">bufio&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewScanner&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">file&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="kd">var&lt;/span> &lt;span class="nx">line&lt;/span> &lt;span class="kt">string&lt;/span>
&lt;span class="k">for&lt;/span> &lt;span class="nx">scanner&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Scan&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nx">line&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nf">decode&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">scanner&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Text&lt;/span>&lt;span class="p">())&lt;/span>
&lt;span class="k">for&lt;/span> &lt;span class="p">!(&lt;/span>&lt;span class="nf">dictLookup&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">line&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nx">line&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nf">decode&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">line&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;span class="nx">fmt&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Println&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">line&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="initial-results">Initial results&lt;/h3>
&lt;p>The sample code worked well enough and running it on the test / sample data provided yielded correct output, so all seemed to be fine!&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-bash" data-lang="bash">flrnks@t460:~/drop_the_bass &lt;span class="o">(&lt;/span>master&lt;span class="o">)&lt;/span> ▶ go run main.go input.txt
interpretation
sanctioned
lawn
electives
unifying
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Then I decided to test the code on both of my laptops, because it did not seem to run very quickly even though it only had to decode 5 lines. One machine is a ThinkPad T460 with an i5 and 16GB of RAM; the other is a 15-inch MacBook Pro with an i9 CPU and 32GB of RAM. I developed the code on the ThinkPad and was quite surprised by how much slower it ran on the MacBook. I would have expected the opposite, since the ThinkPad is already around 3-4 years old and has a less powerful CPU. Initial test results from both machines:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-bash" data-lang="bash"> &lt;span class="o">[&lt;/span>MacBook&lt;span class="o">]&lt;/span> &lt;span class="o">[&lt;/span>ThinkPad&lt;span class="o">]&lt;/span>
interpretation 285.76ms 32.61ms
lawn 425.63ms 59.31ms
unifying 1.10s 93.60ms
electives 1.20s 91.10ms
sanctioned 6.18s 141.28ms
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Overall the MacBook took on average 9 seconds to finish, while the ThinkPad took around 0.5 to 1 second to finish. This was not normal, so I had to investigate! 👀 😄&lt;/p>
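&lt;p>The post does not show how the per-word timings were taken. One plausible way, sketched here under that assumption, is to wrap the work for each line in a small timing helper built on &lt;code>time.Now&lt;/code> and &lt;code>time.Since&lt;/code>. The names &lt;code>timeIt&lt;/code> and &lt;code>recoverWord&lt;/code> are hypothetical and only serve the illustration.&lt;/p>

```go
package main

import (
	"fmt"
	"time"
)

// timeIt runs fn once and returns its result together with the
// elapsed wall-clock time.
func timeIt(fn func() string) (string, time.Duration) {
	start := time.Now()
	out := fn()
	return out, time.Since(start)
}

func main() {
	// recoverWord is a hypothetical stand-in for the decode and
	// dictionary lookup loop of a single input line.
	recoverWord := func() string {
		time.Sleep(2 * time.Millisecond)
		return "lawn"
	}
	word, elapsed := timeIt(recoverWord)
	fmt.Printf("%-16s %v\n", word, elapsed)
}
```

&lt;p>A single run like this is noisy, so repeating the measurement several times and averaging gives more trustworthy numbers.&lt;/p>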
&lt;h3 id="performance-tuning-10">Performance Tuning 1.0&lt;/h3>
&lt;p>Seeing the difference in performance, I was curious what could cause such a drop on the MacBook. My first idea was to add concurrency, so that instead of being processed sequentially, the lines are handed out over a channel to a pool of workers that process them in parallel and send the results back to the main routine waiting for them.&lt;/p>
&lt;p>&lt;img src="concurrent-go.png" alt="Go concurrency implemented">&lt;/p>
&lt;p>The figure above shows the basic idea of this concurrent processing model, and the code snippet below shows the most important parts of the code:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="c1">// define the channels for distributing work and collecting the results
&lt;/span>&lt;span class="c1">&lt;/span>&lt;span class="nx">jobs&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nb">make&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kd">chan&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="nx">results&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nb">make&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kd">chan&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="c1">// use the waitgroup for syncing up between the workers
&lt;/span>&lt;span class="c1">&lt;/span>&lt;span class="nx">wg&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nb">new&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sync&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">WaitGroup&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="c1">// start up some workers that will block and wait
&lt;/span>&lt;span class="c1">&lt;/span>&lt;span class="k">for&lt;/span> &lt;span class="nx">w&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">;&lt;/span> &lt;span class="nx">w&lt;/span> &lt;span class="o">&amp;lt;=&lt;/span> &lt;span class="mi">5&lt;/span>&lt;span class="p">;&lt;/span> &lt;span class="nx">w&lt;/span>&lt;span class="o">++&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nx">wg&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="k">go&lt;/span> &lt;span class="nf">workerFunc&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">jobs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">results&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">wg&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;span class="c1">// iterate over the file line by line and queue them up in the jobs channel
&lt;/span>&lt;span class="c1">&lt;/span>&lt;span class="k">go&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nx">scanner&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">bufio&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewScanner&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">file&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="k">for&lt;/span> &lt;span class="nx">scanner&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Scan&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nx">jobs&lt;/span> &lt;span class="o">&amp;lt;-&lt;/span> &lt;span class="nx">scanner&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Text&lt;/span>&lt;span class="p">()&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;span class="nb">close&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">jobs&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="p">}()&lt;/span>
&lt;span class="c1">// In parallel routine wait for WG to finish and close channel for results
&lt;/span>&lt;span class="c1">&lt;/span>&lt;span class="k">go&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nx">wg&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Wait&lt;/span>&lt;span class="p">()&lt;/span>
&lt;span class="nb">close&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">results&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="p">}()&lt;/span>
&lt;span class="c1">// Print out the results from the results channel.
&lt;/span>&lt;span class="c1">&lt;/span>&lt;span class="k">for&lt;/span> &lt;span class="nx">v&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="k">range&lt;/span> &lt;span class="nx">results&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nx">fmt&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Println&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">v&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>This parallel processing noticeably improved the performance, but still did not eliminate the substantial difference between the two platforms.&lt;/p>
&lt;p>&lt;em>Note&lt;/em>: implementing the concurrent model means the words on the standard output will appear in a random order, and so the submission to the grading system might fail.&lt;/p>
&lt;h3 id="performance-tuning-20">Performance Tuning 2.0&lt;/h3>
&lt;p>Next, I looked around on the internet (Stack Overflow in particular), where I got the idea to stop calling grep via the &lt;code>os/exec&lt;/code> package and instead read the contents of the dictionary into memory and perform the lookups there. Essentially, this trades memory footprint for speed. I created a global dictionary (&lt;code>map[string]bool&lt;/code>) which is loaded once at the start of the program and used as often as needed by the various goroutines. This is safe because the workers only read from the map, so there is no issue with concurrent access to the global map variable.&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="kd">var&lt;/span> &lt;span class="nx">wordDict&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nb">make&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kd">map&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="kt">bool&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="kd">func&lt;/span> &lt;span class="nf">loadDictionary&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
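&lt;span class="nx">dict&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">os&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Open&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;/usr/share/dict/american-english&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nb">panic&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">err&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1">// without the dictionary no lookup can ever succeed
&lt;/span>&lt;span class="p">}&lt;/span>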
&lt;span class="k">defer&lt;/span> &lt;span class="nx">dict&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Close&lt;/span>&lt;span class="p">()&lt;/span>
&lt;span class="nx">ds&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">bufio&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewScanner&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">dict&lt;/span>&lt;span class="p">)&lt;/span>
&lt;span class="k">for&lt;/span> &lt;span class="nx">ds&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Scan&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;span class="nx">wordDict&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">ds&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Text&lt;/span>&lt;span class="p">()]&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="kc">true&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;span class="p">}&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>This way the dictionary lookups no longer spawn an external process or touch the file system, so they cannot be slowed down by the OS the program is running on. Running the same timing test again yielded much improved results. It became clear that the issue on the MacBook was the slow execution of the external &lt;code>grep&lt;/code> call from the Go program. I am not sure why that call is so slow there, but the results speak for themselves:&lt;/p>
&lt;div class="highlight">&lt;pre class="chroma">&lt;code class="language-bash" data-lang="bash"> &lt;span class="o">[&lt;/span>MacBook&lt;span class="o">]&lt;/span> &lt;span class="o">[&lt;/span>ThinkPad&lt;span class="o">]&lt;/span>
interpretation 54.691µs 24.17µs
lawn 65.922µs 9.176µs
unifying 155.726µs 71.785µs
electives 113.074µs 47.478µs
sanctioned 286.94µs 464.20µs
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Somehow the older and less powerful ThinkPad still seems faster for most of the words, but at least the difference is not so substantial anymore&amp;hellip; 😌&lt;/p>
&lt;h3 id="results">Results&lt;/h3>
&lt;p>The below picture briefly summarizes the observed results when it comes to performance, which was measured by execution time. In order to mitigate transient effects on execution time, there were 10 measurements taken for each variant.&lt;/p>
&lt;p>&lt;img src="perf.png" alt="Performance measurements">&lt;/p>
&lt;p>Explanation for the different variants (Seq vs. Con and Grep vs Map):&lt;/p>
&lt;ul>
&lt;li>&lt;code>Seq&lt;/code>: each line is decoded one after the other in sequence.&lt;/li>
&lt;li>&lt;code>Con&lt;/code>: each line is processed concurrently on a pool of workers.&lt;/li>
&lt;li>&lt;code>Grep&lt;/code>: dictionary lookup done via an exec call to grep.&lt;/li>
&lt;li>&lt;code>Map&lt;/code>: dictionary is loaded into a string map in memory.&lt;/li>
&lt;/ul>
&lt;p>Quite frankly, the results speak for themselves. The most notable thing is that, compared to the most basic version (Seq-Grep), the biggest improvement is achieved not by using concurrency, but by eliminating the repeated calls to Grep.&lt;/p>
&lt;p>This is not to say that enabling concurrency had no impact on the execution time: on average it decreased from 9 to 6 seconds, which is quite good already!&lt;/p>
&lt;p>However, I/O latency seems to cost more performance than the lack of parallel processing, at least at the input scale of this example. The difference was less pronounced when the tests were run on a file with 500 lines of encoded words (instead of just 5).&lt;/p>
&lt;h3 id="conclusion">Conclusion&lt;/h3>
&lt;p>Never underestimate the power of I/O delay and the effect it can have on your program. Even on a very powerful machine, it can bog your performance down considerably! Implementing proper concurrent processing wherever possible may improve your program&amp;rsquo;s performance further.&lt;/p></description></item></channel></rss>