<p>FreBlogg (https://freblogg.com/)</p>
<p>Archive vs Archived | A rant on naming things correctly<br>
Durga Swaroop Perla, published 2021-08-23</p>
<p>We are always taught that grammar is important in English (or any other language, for that matter). But somehow that is often not the case with the English used in programming. The grammar rules seem to be relaxed when people are coding.</p>
<p>My students and my colleagues know how particular I am about naming things in code. It is something I take pride in. Apart from picking descriptive names, I also try to make sure that the tense and number of variable and function names always agree with the rules of grammar.</p>
<p>I review a lot of code daily, written by my colleagues, my students and others. Even though I am less critical of other people's code than I am of mine, one of the things I do try to give feedback on is adhering to correct grammar. It happens to be one of the most common mistakes I see. People pick the correct word to name something, but often use it in the wrong tense or number.</p>
<p>The latest installment of this is the misuse of "archive" and "archived" in code by a student. And hence this article. </p>
<p><img alt="archives" src="https://freblogg.com/archive-vs-archived/archive-750x300.jpg"></p>
<p>Let's get the definitions first. </p>
<blockquote>
<p>archive /ˈɑːkʌɪv/</p>
<p>(noun)</p>
<p>A collection of something. In software terms, it refers to a folder or location where files and data are kept aside for later use. </p>
<p>(verb)</p>
<p>The act of storing or placing something in an archive.</p>
</blockquote>
<p>So, you can use them in sentences like:</p>
<div class="highlight"><pre><span></span><code><span class="mf">1.</span> <span class="n">I</span> <span class="n">am</span> <span class="n">placing</span> <span class="n">the</span> <span class="n">files</span> <span class="n">in</span> <span class="n">the</span> <span class="n">archive</span><span class="mf">.</span> <span class="p">(</span><span class="n">Noun</span><span class="p">)</span>
<span class="mf">2.</span> <span class="n">I</span> <span class="n">am</span> <span class="n">archiving</span> <span class="n">the</span> <span class="n">files</span><span class="mf">.</span> <span class="p">(</span><span class="n">Verb</span><span class="p">)</span>
</code></pre></div>
<p><strong>Archived</strong> is the state of an object or a file after it has been placed in an archive. </p>
<p>Once you know these, you can correctly name the variables and functions that use this word. </p>
<p>For example, the folder where you are storing the files would be called an <code>archive</code>. A file you have archived would be named <code>archived_file</code>. The function or method that archives a file can be called <code>archive_file()</code> or maybe simply <code>archive()</code>. </p>
<p>Let's add a few more things to it. Based on this, if you have a function called <code>is_archive</code> (or <code>archive?</code> in Ruby), you expect it to return <code>True</code> or <code>False</code> based on whether something (a folder) is an archive or not. </p>
<p>To check if a file has been archived, you will have a function called <code>is_archived</code> that will take a file name. </p>
<p>Here's some sample code (in Python) that shows all of this in action:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>

<span class="c1"># is_archive, archive_file and is_archived are the functions described above</span>
<span class="n">archive</span> <span class="o">=</span> <span class="n">Path</span><span class="p">(</span><span class="s1">'/opt/archive'</span><span class="p">)</span>
<span class="n">is_archive</span><span class="p">(</span><span class="n">archive</span><span class="p">)</span> <span class="c1"># True</span>

<span class="n">current_dir</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getcwd</span><span class="p">()</span>
<span class="n">files</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">listdir</span><span class="p">(</span><span class="n">current_dir</span><span class="p">)</span> <span class="c1"># Get all files in the current directory</span>
<span class="c1"># Archive the files (archive_file, because the name archive already refers to the folder)</span>
<span class="k">for</span> <span class="n">file</span> <span class="ow">in</span> <span class="n">files</span><span class="p">:</span>
    <span class="n">archive_file</span><span class="p">(</span><span class="n">file</span><span class="p">)</span>

<span class="n">archived_files</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">listdir</span><span class="p">(</span><span class="n">archive</span><span class="p">)</span>
<span class="k">for</span> <span class="n">file</span> <span class="ow">in</span> <span class="n">archived_files</span><span class="p">:</span>
    <span class="n">is_archived</span><span class="p">(</span><span class="n">file</span><span class="p">)</span> <span class="c1"># True</span>
</code></pre></div>
<p>You can use similar conventions in other languages also.</p>
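<p>To make the naming concrete, here is a minimal runnable sketch of those three helpers. Only the names come from the discussion above; the implementations are assumptions made for illustration (an archive is simply a directory named "archive", and archiving is a file move inside a temporary workspace).</p>

```python
import shutil
import tempfile
from pathlib import Path


def is_archive(folder: Path) -> bool:
    """Noun: is this folder an archive? (Assumed rule: a directory named 'archive'.)"""
    return folder.is_dir() and folder.name == "archive"


def archive_file(file: Path, archive: Path) -> Path:
    """Verb: move the file into the archive and return its new path."""
    return Path(shutil.move(str(file), str(archive / file.name)))


def is_archived(file_name: str, archive: Path) -> bool:
    """State: has a file with this name been placed in the archive?"""
    return (archive / file_name).is_file()


with tempfile.TemporaryDirectory() as workspace:
    archive = Path(workspace) / "archive"
    archive.mkdir()
    report = Path(workspace) / "report.txt"
    report.write_text("quarterly numbers")

    archive_ok = is_archive(archive)
    before = is_archived("report.txt", archive)
    archived_file = archive_file(report, archive)
    after = is_archived("report.txt", archive)
```

<p>Read aloud, each call forms a sensible sentence: "is archive?", "archive file", "is archived?". That is the whole point of getting the tense right.</p>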
<p>Now, for some general advice on naming things from my experience reviewing code for years:</p>
<ol>
<li>
<p>Try to keep the names of variables/objects as nouns and names of functions as verbs or instructions. </p>
</li>
<li>
<p>A function that validates something would be <code>validate_&lt;something&gt;</code>. A file validator would be called <code>validate_file(...)</code> and so on. </p>
</li>
<li>
<p>A file that has been validated should be called <code>validated_file</code> or, even better, <code>valid_file</code>.</p>
</li>
<li>
<p>A function that shares credentials (with something else), should be <code>share_credentials(...)</code> while the actual credential after sharing can be named <code>shared_credential</code>. </p>
</li>
<li>
<p>Use <code>configuration</code> or <code>config</code> for an object and <code>configure(...)</code> for a function. </p>
</li>
<li>
<p>Do not use multiple words to mean the same thing. I have seen functions like <code>get_file(...)</code>, <code>fetch_file(...)</code> and <code>retrieve_file(...)</code> in the same code base. Stick to one and use that everywhere. <code>get_file(...)</code> is my preferred option. </p>
</li>
<li>
<p>Do not use a variable in the plural when it only has one value. Similarly, when a variable will have multiple values, use a plural.
So, if you have a variable <code>containers</code>, it is expected to contain multiple containers. </p>
</li>
<li>
<p>Also, you don't need to suffix a variable with <code>_list</code> or <code>_set</code> to indicate the type of collection it is. In most cases, it is not relevant. So, prefer <code>objects</code> instead of <code>object_list</code> and <code>files</code> instead of <code>file_set</code>. </p>
</li>
<li>
<p>Similarly, don't suffix or prefix data types when it is obvious. No need to use a variable <code>user_name_str</code> to indicate that it is a string. Names are generally strings. So, we can omit the <code>_str</code> suffix. </p>
</li>
<li>
<p>Use similar forms of words when referring to similar things. For example, if you are referring to the states a Docker container can be in, don't use "Running", "Success" and "Fail". Keep them all in one grammatical form, something like "Running", "Succeeded/Completed", "Failed". That reads better. </p>
</li>
</ol>
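<p>A few of these rules can be seen together in a short sketch. Every name here (<code>ContainerState</code>, the <code>.csv</code> validation rule, the sample file names) is hypothetical and chosen only to illustrate the conventions.</p>

```python
from enum import Enum


class ContainerState(Enum):
    """All states share one grammatical form (hypothetical example)."""
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"


def validate_file(file_name: str) -> bool:
    """Verb for a function. The rule itself (must end in '.csv') is made up."""
    return file_name.endswith(".csv")


# Plural name for a collection, with no _list suffix.
files = ["a.csv", "notes.txt", "b.csv"]
valid_files = [file for file in files if validate_file(file)]

# Singular name for a single value.
state = ContainerState.SUCCEEDED
```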
<p>I can go on, but I will stop this list here before it gets too pedantic.</p>
<p>Hopefully, all of this information has been useful to you. If not, you can let me know all about it on Twitter <a href="https://twitter.com/durgaswaroop">@durgaswaroop</a>.</p>
<p>Mocking functions in Python with Pytest Part I<br>
Durga Swaroop Perla, published 2020-04-11</p>
<p>Mocking resources in unit tests is just as important and common as writing unit tests. However, a lot of people are not familiar with how to properly mock classes, objects or functions for tests, because the available documentation online is either too short or unnecessarily complicated. One of the main reasons for this confusion is that there are several ways to do the same thing. Every other article out there seems to mock things in a different way. With this series of articles on mocking, I hope to bring some clarity to the topic.</p>
<p><img alt="Mocking with pytest" src="https://cdn-images-1.medium.com/max/720/0*3bM970bQ7UYqsDyy.png"></p>
<h4 id="pre-requisite">Pre-requisite</h4>
<p>This is a tutorial on Mocking with pytest. I am operating with the assumption that you can write unit tests in Python using <code>pytest</code>.</p>
<h4 id="why-mock">Why Mock?</h4>
<p>As you are here, reading this article, I will assume that you are familiar with mocking. In case you are not, let us do a quick overview of what it is and why we need it.</p>
<p>Say, you have a service that collects stock market data and gives information about the top gainers in a particular sector. You get the stock market information from a third party API, and process it to give out the results. Now, to test your code, you would not want to hit the API every time, as it will make the tests slower, and also the API provider would charge you for the extra hits. What you want here is a mock! A mock replaces a function with a dummy you can program to do whatever you choose. This is also called ‘Patching’. For the rest of this series, I am going to use ‘mock’ and ‘patch’ interchangeably.</p>
<h4 id="packages-needed-for-mocking">Packages needed for Mocking</h4>
<p>Unlike the majority of programming languages, Python comes with built-in libraries for unit testing and mocking. They are powerful, self-sufficient and provide the functionality you need. The pytest-mock plugin we will use is a convenient wrapper around the built-in mocking library, which makes it easier to use in combination with <code>pytest</code>.</p>
<p>If you look up articles on mocking, or if you read through the endless questions on Stackoverflow, you will frequently come across the words <code>Mock</code>, <code>MagicMock</code>, <code>patch</code>, etc. I'm going to demystify them here.</p>
<p>In Python, to mock functions, objects or classes, you will mostly use the <code>Mock</code> class, which comes from the built-in <code>unittest.mock</code> module. From now on, anytime you come across <code>Mock</code>, know that it is from the <code>unittest</code> library. <code>MagicMock</code> is a subclass of <code>Mock</code> with default implementations of most of the magic methods. Magic methods are your usual dunder methods like <code>__str__</code>, <code>__len__</code>, etc.</p>
<p>For the most part, it does not matter which one you use, <code>Mock</code> or <code>MagicMock</code>. Unless you need magic methods like the above implemented, you can stick to <code>Mock</code>. Pytest-mock gives you access to both of these classes with an easy to use interface.</p>
<p><code>patch</code> is a function that also comes from the <code>unittest.mock</code> module and helps replace functions with mocks. Pytest-mock has a wrapper for this too.</p>
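<p>The difference between the two classes is easiest to see by trying them. The sketch below uses only the standard library; the attribute name <code>fetch</code> is made up for the example.</p>

```python
from unittest.mock import MagicMock, Mock

plain = Mock()
magic = MagicMock()

# Both classes accept any call and let you program a return value.
plain.fetch.return_value = 42
fetched = plain.fetch("anything")

# MagicMock preconfigures the magic methods, so len() works out of the box.
magic_length = len(magic)

# A plain Mock has no __len__, so the same call raises TypeError.
try:
    len(plain)
    plain_supports_len = True
except TypeError:
    plain_supports_len = False
```

<p>If your code under test never calls <code>len()</code>, iterates, or uses similar protocol methods on the mocked object, the two behave the same for practical purposes.</p>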
<h4 id="installing-pytest-mock">Installing Pytest Mock</h4>
<p>Before you get started with using pytest-mock, you have to install it. You can install it with pip as follows:</p>
<div class="highlight"><pre><span></span><code>pip install pytest-mock
</code></pre></div>
<p>This is a pytest plugin. So, it will also install <code>pytest</code>, if you have not installed it already.</p>
<h4 id="mocking-a-simple-function">Mocking a simple function</h4>
<p>As this is the first article, we will keep it simple. We will start by mocking a simple function.</p>
<p>Say, we have a function <code>get_operating_system</code> that tells us whether we are using Windows or Linux.</p>
<div class="highlight"><pre><span></span><code><span class="c1"># application.py </span>
<span class="kn">from</span> <span class="nn">time</span> <span class="kn">import</span> <span class="n">sleep</span>
<span class="k">def</span> <span class="nf">is_windows</span><span class="p">():</span>
    <span class="c1"># This sleep could be some complex operation instead</span>
    <span class="n">sleep</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
    <span class="k">return</span> <span class="kc">True</span>
<span class="k">def</span> <span class="nf">get_operating_system</span><span class="p">():</span>
    <span class="k">return</span> <span class="s1">'Windows'</span> <span class="k">if</span> <span class="n">is_windows</span><span class="p">()</span> <span class="k">else</span> <span class="s1">'Linux'</span>
</code></pre></div>
<p>This function uses another function <code>is_windows</code> to check if the current system is Windows or not. Assume that this <code>is_windows</code> function is quite complex taking several seconds to run. We can simulate this slow function by making the program sleep for 5 seconds every time it is called.</p>
<p>A pytest for <code>get_operating_system()</code> would be as follows:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># test_application.py</span>
<span class="kn">from</span> <span class="nn">application</span> <span class="kn">import</span> <span class="n">get_operating_system</span>
<span class="k">def</span> <span class="nf">test_get_operating_system</span><span class="p">():</span>
    <span class="k">assert</span> <span class="n">get_operating_system</span><span class="p">()</span> <span class="o">==</span> <span class="s1">'Windows'</span>
</code></pre></div>
<p>Since <code>get_operating_system()</code> calls the slow function <code>is_windows</code>, the test is going to be slow. This can be seen below in the output of running pytest, which took 5.05 seconds.</p>
<div class="highlight"><pre><span></span><code>$ <span class="nv">pytest</span>
<span class="o">================</span> <span class="nb">test</span> session <span class="nv">starts</span> <span class="o">========================</span>
Python <span class="m">3</span>.7.3, pytest-5.4.1, py-1.8.1, pluggy-0.13.1
rootdir: /usr/Personal/Projects/pytest-and-mocking
plugins: mock-2.0.0
collected <span class="m">1</span> item
test_application.py . <span class="o">[</span><span class="m">100</span>%<span class="o">]</span>
<span class="o">================</span> <span class="m">1</span> passed <span class="k">in</span> <span class="m">5</span>.05s <span class="o">==========================</span>
</code></pre></div>
<p>Unit tests should be fast. We should be able to run hundreds of tests in seconds. A single test that takes five seconds slows down the whole test suite. Enter mocking, to make our lives easier. If we patch the slow function, we can verify <code>get_operating_system</code>'s behavior without waiting for five seconds.</p>
<p>Let’s mock this function with pytest-mock.</p>
<p>Pytest-mock provides a fixture called <code>mocker</code>. It provides a nice interface on top of Python's built-in mocking constructs. You use <code>mocker</code> by passing it as an argument to your test function and calling the mock and patch functions from it.</p>
<p>Say, you want the <code>is_windows</code> function to return <code>True</code> without taking those five precious seconds. We can patch it as follows:</p>
<div class="highlight"><pre><span></span><code>mocker.patch('application.is_windows', return_value=True)
</code></pre></div>
<p>You have to refer to <code>is_windows</code> here as <code>application.is_windows</code>, because it is a function in the <em>application</em> module. If we pass only <code>is_windows</code>, the patch fails: a bare name does not say which module the function lives in, so it is not a valid target. The format is always <code>&lt;module_name&gt;.&lt;function_name&gt;</code>. Knowing how to mock correctly is important and we will continue working on it in this series.</p>
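<p>The target format can be tried outside of pytest with <code>unittest.mock.patch</code>, which is what <code>mocker.patch</code> wraps. This is only a sketch: the module is built in memory under the hypothetical name <code>application_demo</code> so that the example is self-contained, and the five-second sleep is left out.</p>

```python
import sys
import types
from unittest.mock import patch

# Stand-in for application.py, built in memory (hypothetical module name).
source = '''
def is_windows():
    return True

def get_operating_system():
    return 'Windows' if is_windows() else 'Linux'
'''
application_demo = types.ModuleType("application_demo")
exec(source, application_demo.__dict__)
sys.modules["application_demo"] = application_demo

# A dotted target works: patch knows which module owns is_windows.
with patch("application_demo.is_windows", return_value=False):
    patched_result = application_demo.get_operating_system()

# A bare name has no module part, so patch rejects it with a TypeError.
try:
    patch("is_windows", return_value=False)
    bare_name_error = None
except TypeError as error:
    bare_name_error = error
```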
<p>The updated test function with the patch is as follows:</p>
<div class="highlight"><pre><span></span><code># <span class="s1">'</span><span class="s">mocker</span><span class="s1">'</span> <span class="nv">fixture</span> <span class="nv">provided</span> <span class="nv">by</span> <span class="nv">pytest</span><span class="o">-</span><span class="nv">mock</span>
<span class="nv">def</span> <span class="nv">test_get_operating_system</span><span class="ss">(</span><span class="nv">mocker</span><span class="ss">)</span>:
    # <span class="nv">Mock</span> <span class="nv">the</span> <span class="nv">slow</span> <span class="nv">function</span> <span class="nv">and</span> <span class="k">return</span> <span class="nv">True</span> <span class="nv">always</span>
    <span class="nv">mocker</span>.<span class="nv">patch</span><span class="ss">(</span><span class="s1">'</span><span class="s">application.is_windows</span><span class="s1">'</span>, <span class="nv">return_value</span><span class="o">=</span><span class="nv">True</span><span class="ss">)</span>
    <span class="nv">assert</span> <span class="nv">get_operating_system</span><span class="ss">()</span> <span class="o">==</span> <span class="s1">'</span><span class="s">Windows</span><span class="s1">'</span>
</code></pre></div>
<p>Now when you run the test, it will finish much faster.</p>
<div class="highlight"><pre><span></span><code>$ <span class="nv">pytest</span>
<span class="o">============</span> <span class="nb">test</span> session <span class="nv">starts</span> <span class="o">==================</span>
Python <span class="m">3</span>.7.3, pytest-5.4.1, py-1.8.1, pluggy-0.13.1
rootdir: /mnt/c/Personal/Projects/pytest-and-mocking
plugins: mock-2.0.0
collected <span class="m">1</span> item
test_application.py . <span class="o">[</span><span class="m">100</span>%<span class="o">]</span>
<span class="o">===========</span> <span class="m">1</span> passed <span class="k">in</span> <span class="m">0</span>.11s <span class="o">======================</span>
</code></pre></div>
<p>As you can see, the test took only 0.11 seconds. We have successfully patched the slow function and made the test suite faster.</p>
<p>Another advantage of mocking is that you can make the mock function return anything. You can even make it raise errors to test how your code behaves in those scenarios. We will see how all of this works, and more, in future articles.</p>
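<p>As a small preview of raising errors from a mock, here is a sketch using <code>side_effect</code> from <code>unittest.mock</code>. Note the assumptions: this variant of <code>get_operating_system</code> takes the check as an argument and falls back to <code>'Unknown'</code> on <code>OSError</code>, neither of which is part of the function shown earlier.</p>

```python
from unittest.mock import Mock


def get_operating_system(is_windows):
    """Hypothetical variant: takes the check as an argument and handles failures."""
    try:
        return 'Windows' if is_windows() else 'Linux'
    except OSError:
        return 'Unknown'


# Setting side_effect to an exception makes the mock raise it when called.
failing_check = Mock(side_effect=OSError("detection failed"))
result_on_error = get_operating_system(failing_check)

# return_value can be whatever the test needs.
result_on_false = get_operating_system(Mock(return_value=False))
```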
<p>For now, if you want to test the case where <code>is_windows</code> returns <code>False</code>, write the following test:</p>
<div class="highlight"><pre><span></span><code><span class="nv">def</span> <span class="nv">test_operating_system_is_linux</span><span class="ss">(</span><span class="nv">mocker</span><span class="ss">)</span>:
    <span class="nv">mocker</span>.<span class="nv">patch</span><span class="ss">(</span><span class="s1">'</span><span class="s">application.is_windows</span><span class="s1">'</span>, <span class="nv">return_value</span><span class="o">=</span><span class="nv">False</span><span class="ss">)</span> # <span class="nv">set</span> <span class="nv">the</span> <span class="k">return</span> <span class="nv">value</span> <span class="nv">to</span> <span class="nv">be</span> <span class="nv">False</span>
    <span class="nv">assert</span> <span class="nv">get_operating_system</span><span class="ss">()</span> <span class="o">==</span> <span class="s1">'</span><span class="s">Linux</span><span class="s1">'</span>
</code></pre></div>
<p>Note that all of the mocks and patches set with <code>mocker</code> are function scoped, i.e., they are only active for that specific test function. Therefore, you can patch the same function in multiple tests and they will not conflict with each other.</p>
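<p>The same undo-on-exit behavior can be observed with the standard library's <code>patch.object</code> used as a context manager. This is an analogy rather than pytest-mock itself, and the <code>application</code> object below is a hypothetical stand-in for the real module.</p>

```python
from types import SimpleNamespace
from unittest.mock import patch

# Hypothetical stand-in for the application module.
application = SimpleNamespace(is_windows=lambda: True)


def get_operating_system():
    return 'Windows' if application.is_windows() else 'Linux'


with patch.object(application, "is_windows", return_value=False):
    inside_patch = get_operating_system()

# The patch is undone when the block exits, much like mocker undoes its
# patches when the test function finishes.
after_patch = get_operating_system()
```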
<p>That is your first introduction to the world of mocking with pytest. We will cover more scenarios in the upcoming articles. Stay tuned, stay safe and stay awesome till then.</p>
<p>List of articles in this series:</p>
<p><a href="https://medium.com/analytics-vidhya/mocking-in-python-with-pytest-mock-part-i-6203c8ad3606">Mocking Functions Part I</a> 🢠 Current Article</p>
<p><a href="https://medium.com/@durgaswaroop/writing-better-tests-in-python-with-pytest-mock-part-2-92b828e1453c">Mocking Functions Part II</a></p>
<p>If you liked this article, give it a like to encourage me to put out the next one soon. If you think someone you know can benefit from this article, do share it with them.</p>
<p>If you want to thank me, you can say hi on Twitter <a href="http://twitter.com/durgaswaroop">@durgaswaroop</a>. And, if you want to support me, here’s my PayPal link: paypal.me/durgaswaroop</p>
<p>Attribution: Python Logo, https://www.python.org/community/logos/</p>
<p>A Practical Introduction to Kafka Storage Internals<br>
Durga Swaroop Perla, published 2018-08-06</p>
<p>Kafka is everywhere these days. With the advent of microservices and distributed computing, Kafka has become a regular fixture in the architecture of many products. In this article, I’ll try to explain how Kafka’s internal storage mechanism works.</p>
<p>Since this is going to be a deep dive into Kafka’s internals, I would expect you to have some understanding about Kafka. Although I’ve tried to keep the entry-level for this article pretty low, you might not be able to understand everything if you’re not familiar with the general workings of Kafka. Proceed further with that in mind.</p>
<p><img src="https://kafka.apache.org/images/kafka_diagram.png" width="500" title="Kafka" alt="kafka"></p>
<p>Kafka is typically referred to as a <em>Distributed, Replicated Messaging Queue</em>, which although technically true, usually leads to some confusion depending on your definition of a <em>messaging queue</em>. Instead, I prefer to call it a <strong>Distributed, Replicated Commit Log</strong>. This, I think, clearly represents what Kafka does, as all of us understand how logs are written to disk. And in this case, it is the messages pushed into Kafka that are stored to disk.</p>
<p>Regarding storage in Kafka, you’ll always hear two terms - Partition and Topic. <strong>Partitions</strong> are the units of storage in Kafka for messages. And <strong>Topic</strong> can be thought of as being a container in which these partitions lie.</p>
<p>With the basic stuff out of our way, let’s understand these concepts better by working with Kafka.</p>
<p>I am going to start by creating a topic in Kafka with three partitions. If you want to follow along, the command looks like this for a local Kafka setup on Windows.</p>
<div class="highlight"><pre><span></span><code>kafka-topics.bat --create --topic freblogg --partitions 3 --replication-factor 1 --zookeeper localhost:2181
</code></pre></div>
<p>If I go into Kafka’s log directory, I see three directories created as follows.</p>
<div class="highlight"><pre><span></span><code>$ tree freblogg*
freblogg-0
<span class="p">|</span>-- <span class="m">00000000000000000000</span>.index
<span class="p">|</span>-- <span class="m">00000000000000000000</span>.log
<span class="p">|</span>-- <span class="m">00000000000000000000</span>.timeindex
<span class="sb">`</span>-- leader-epoch-checkpoint
freblogg-1
<span class="p">|</span>-- <span class="m">00000000000000000000</span>.index
<span class="p">|</span>-- <span class="m">00000000000000000000</span>.log
<span class="p">|</span>-- <span class="m">00000000000000000000</span>.timeindex
<span class="sb">`</span>-- leader-epoch-checkpoint
freblogg-2
<span class="p">|</span>-- <span class="m">00000000000000000000</span>.index
<span class="p">|</span>-- <span class="m">00000000000000000000</span>.log
<span class="p">|</span>-- <span class="m">00000000000000000000</span>.timeindex
<span class="sb">`</span>-- leader-epoch-checkpoint
</code></pre></div>
<p>We have three directories created because we’ve given three partitions for our topic, which means that each partition gets a directory on the file system. You also see some files like <em>index, log etc</em>. We’ll get to them shortly.</p>
<p>One more thing that you should be able to see from here is that in Kafka, the <strong>topic</strong> is more of a logical grouping than anything else and that the <strong>Partition is the actual unit of storage in Kafka</strong>. That is what is physically stored on the disk. Let’s understand partitions in some more detail.</p>
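<p>The mapping from a topic to its directories is simple enough to write down. The helper below is a hypothetical illustration that only reproduces the naming pattern visible in the listing above (topic name, a hyphen, then the partition index); it does not talk to Kafka.</p>

```python
from typing import List


def partition_directories(topic: str, partitions: int) -> List[str]:
    """Directory names as seen in the listing above: topic, hyphen, partition index."""
    return [f"{topic}-{index}" for index in range(partitions)]


directories = partition_directories("freblogg", 3)
```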
<h4 id="partitions">Partitions</h4>
<p>A partition, in theory, can be described as an immutable collection (or sequence) of messages. We can only append messages to a partition but cannot delete from it. And by “We”, I am talking about the Kafka producer. A producer can’t delete the messages in the topic.</p>
<p>Now we’ll send some messages into the topic. But before that, I want you to see the sizes of files in our partition folders.</p>
<div class="highlight"><pre><span></span><code>$ ls -lh freblogg-0
total 20M
- freblogg <span class="m">197121</span> 10M Aug <span class="m">5</span> <span class="m">08</span>:26 <span class="m">00000000000000000000</span>.index
- freblogg <span class="m">197121</span> <span class="m">0</span> Aug <span class="m">5</span> <span class="m">08</span>:26 <span class="m">00000000000000000000</span>.log
- freblogg <span class="m">197121</span> 10M Aug <span class="m">5</span> <span class="m">08</span>:26 <span class="m">00000000000000000000</span>.timeindex
- freblogg <span class="m">197121</span> <span class="m">0</span> Aug <span class="m">5</span> <span class="m">08</span>:26 leader-epoch-checkpoint
</code></pre></div>
<p>You see the index files combined are about 20M in size while the log file is empty. This is the same case with <code>freblogg-1</code> and <code>freblogg-2</code> folders.</p>
<p>Now let us send a couple of messages and see what happens. To send the messages I’m using the console producer as follows:</p>
<div class="highlight"><pre><span></span><code>kafka-console-producer.bat --topic freblogg --broker-list localhost:9092
</code></pre></div>
<p>I have sent two messages, first a customary “hello world” and then I pressed the Enter key, which becomes the second message. Now if I print the sizes again:</p>
<div class="highlight"><pre><span></span><code>$ ls -lh freblogg*
freblogg-0:
total 20M
- freblogg <span class="m">197121</span> 10M Aug <span class="m">5</span> <span class="m">08</span>:26 <span class="m">00000000000000000000</span>.index
- freblogg <span class="m">197121</span> <span class="m">0</span> Aug <span class="m">5</span> <span class="m">08</span>:26 <span class="m">00000000000000000000</span>.log
- freblogg <span class="m">197121</span> 10M Aug <span class="m">5</span> <span class="m">08</span>:26 <span class="m">00000000000000000000</span>.timeindex
- freblogg <span class="m">197121</span> <span class="m">0</span> Aug <span class="m">5</span> <span class="m">08</span>:26 leader-epoch-checkpoint
freblogg-1:
total 21M
- freblogg <span class="m">197121</span> 10M Aug <span class="m">5</span> <span class="m">08</span>:26 <span class="m">00000000000000000000</span>.index
- freblogg <span class="m">197121</span> <span class="m">68</span> Aug <span class="m">5</span> <span class="m">10</span>:15 <span class="m">00000000000000000000</span>.log
- freblogg <span class="m">197121</span> 10M Aug <span class="m">5</span> <span class="m">08</span>:26 <span class="m">00000000000000000000</span>.timeindex
- freblogg <span class="m">197121</span> <span class="m">11</span> Aug <span class="m">5</span> <span class="m">10</span>:15 leader-epoch-checkpoint
freblogg-2:
total 21M
- freblogg <span class="m">197121</span> 10M Aug <span class="m">5</span> <span class="m">08</span>:26 <span class="m">00000000000000000000</span>.index
- freblogg <span class="m">197121</span> <span class="m">79</span> Aug <span class="m">5</span> <span class="m">09</span>:59 <span class="m">00000000000000000000</span>.log
- freblogg <span class="m">197121</span> 10M Aug <span class="m">5</span> <span class="m">08</span>:26 <span class="m">00000000000000000000</span>.timeindex
- freblogg <span class="m">197121</span> <span class="m">11</span> Aug <span class="m">5</span> <span class="m">09</span>:59 leader-epoch-checkpoint
</code></pre></div>
<p>Our two messages went into two of the partitions, where you can see that the log files have a non-zero size. This is because <strong>the messages in the partition are stored in the ‘xxxx.log’ file</strong>. To confirm that the messages are indeed stored in the log file, we can just see what’s inside that log file.</p>
<div class="highlight"><pre><span></span><code>$ cat freblogg-2/*.log
@^@^B°£æÃ^@^K^Xÿÿÿÿÿÿ^@^@^@^A<span class="s2">"^@^@^A^VHello World^@</span>
</code></pre></div>
<p>The format of the ‘log’ file is not meant to be read as text, but you should see ‘Hello World’ at the end, indicating that this file was updated when we sent the message into the topic. The second message we sent went into the other partition.</p>
<p>Notice that the first message we sent went into the third partition (freblogg-2) and the second message went into the second partition (freblogg-1). This is because, for messages without a key, Kafka arbitrarily picks the partition for the first message and then distributes the following messages across the partitions in a round-robin fashion. If a third message comes now, it would go into freblogg-0, and this order of partitions continues for any new message that comes in. We can also make Kafka choose the same partition for our messages by adding a key to the message. Kafka stores all the messages with the same key in a single partition.</p>
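<p>The two behaviors described above, round-robin for messages without a key and a fixed partition per key, can be sketched as a toy model. This is not Kafka's implementation: the class name is hypothetical, the partition order is passed in to match what was observed here, and CRC32 is only a stand-in for whatever hash Kafka applies to keys.</p>

```python
import zlib
from itertools import cycle
from typing import Optional, Sequence


class SketchPartitioner:
    """Toy partition chooser for illustration; not Kafka's actual partitioner."""

    def __init__(self, order: Sequence[int]) -> None:
        self.partition_count = len(order)
        # Kafka picks its starting point arbitrarily; here the order is given.
        self._round_robin = cycle(order)

    def partition_for(self, key: Optional[bytes] = None) -> int:
        if key is not None:
            # Same key, same hash, same partition (CRC32 is a stand-in hash).
            return zlib.crc32(key) % self.partition_count
        # No key: take the next partition in the cycle.
        return next(self._round_robin)


partitioner = SketchPartitioner(order=[2, 1, 0])
unkeyed = [partitioner.partition_for() for _ in range(4)]
keyed = {partitioner.partition_for(b"user-42") for _ in range(4)}
```

<p>Here <code>unkeyed</code> walks through the partitions in the order observed above and then wraps around, while <code>keyed</code> collapses to a single partition because every call used the same key.</p>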
<p>Each new message in a partition gets an ID that is one more than the previous ID. This ID is also called the <em>Offset</em>. So, the first message is at offset 0, the second message is at offset 1 and so on. These offset IDs are always incremented from the previous value.</p>
<p><img alt="Kafka Partitions" src="https://freblogg.com/images/kafka-partitions.png"></p>
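<p>A partition's append-only behavior and its offsets can be modeled in a few lines. The <code>PartitionLog</code> class is a hypothetical in-memory toy, not how Kafka stores data on disk; it only shows that every appended message receives the next offset.</p>

```python
from typing import List


class PartitionLog:
    """Toy in-memory model of a partition: append-only, offsets start at 0."""

    def __init__(self) -> None:
        self._messages: List[bytes] = []

    def append(self, message: bytes) -> int:
        """Store the message and return the offset assigned to it."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset: int) -> bytes:
        """Return the message stored at the given offset."""
        return self._messages[offset]


log = PartitionLog()
first_offset = log.append(b"Hello World")
second_offset = log.append(b"amazon")
```

<p>There is deliberately no delete method: the only way to change this log is to add to its end.</p>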
<h4 id="_1">&lt;Quick detour&gt;</h4>
<p>We can understand those random characters in the log file, using a Kafka tool. Those extra characters might not seem useful to us, but they are useful for Kafka as they are the metadata for each message in the queue. If I run,</p>
<div class="highlight"><pre><span></span><code><span class="n">kafka</span><span class="o">-</span><span class="n">run</span><span class="o">-</span><span class="k">class</span><span class="o">.</span><span class="n">bat</span> <span class="n">kafka</span><span class="o">.</span><span class="n">tools</span><span class="o">.</span><span class="n">DumpLogSegments</span> <span class="o">--</span><span class="n">deep</span><span class="o">-</span><span class="n">iteration</span> <span class="o">--</span><span class="nb">print</span><span class="o">-</span><span class="n">data</span><span class="o">-</span><span class="nb">log</span> <span class="o">--</span><span class="n">files</span> <span class="n">logs</span>\<span class="n">freblogg</span><span class="o">-</span><span class="mi">2</span>\<span class="mf">00000000000000000000.</span><span class="n">log</span>
</code></pre></div>
<p>This gives the output</p>
<div class="highlight"><pre><span></span><code><span class="n">Dumping</span> <span class="n">logs</span>\<span class="n">freblogg</span><span class="o">-</span><span class="mi">2</span>\<span class="mf">00000000000000000000.</span><span class="n">log</span>
<span class="n">Starting</span> <span class="n">offset</span><span class="p">:</span> <span class="mi">0</span>
<span class="n">offset</span><span class="p">:</span> <span class="mi">0</span> <span class="n">position</span><span class="p">:</span> <span class="mi">0</span> <span class="n">CreateTime</span><span class="p">:</span> <span class="mi">1533443377944</span> <span class="n">isvalid</span><span class="p">:</span> <span class="bp">true</span> <span class="n">keysize</span><span class="p">:</span> <span class="o">-</span><span class="mi">1</span> <span class="n">valuesize</span><span class="p">:</span> <span class="mi">11</span> <span class="n">producerId</span><span class="p">:</span> <span class="o">-</span><span class="mi">1</span> <span class="n">headerKeys</span><span class="p">:</span> <span class="p">[]</span> <span class="n">payload</span><span class="p">:</span> <span class="n">Hello</span> <span class="n">World</span>
<span class="n">offset</span><span class="p">:</span> <span class="mi">1</span> <span class="n">position</span><span class="p">:</span> <span class="mi">79</span> <span class="n">CreateTime</span><span class="p">:</span> <span class="mi">1533462689974</span> <span class="n">isvalid</span><span class="p">:</span> <span class="bp">true</span> <span class="n">keysize</span><span class="p">:</span> <span class="o">-</span><span class="mi">1</span> <span class="n">valuesize</span><span class="p">:</span> <span class="mi">6</span> <span class="n">producerId</span><span class="p">:</span> <span class="o">-</span><span class="mi">1</span> <span class="n">headerKeys</span><span class="p">:</span> <span class="p">[]</span> <span class="n">payload</span><span class="p">:</span> <span class="n">amazon</span>
</code></pre></div>
<p>(I’ve removed a couple of things from this output that are not necessary for this discussion.)</p>
<p>You can see that it stores the <strong>offset</strong>, <strong>time of creation</strong>, <strong>key and value sizes</strong>, etc., along with the actual message payload in the log file.</p>
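<p>If you want to work with this metadata programmatically, a small parser is enough. The sketch below is an assumption-laden illustration: <code>parse_dump_line</code> is a made-up helper, and it only handles lines shaped like the trimmed output shown above.</p>

```python
import re

# One line of the trimmed DumpLogSegments output shown above.
LINE = ("offset: 1 position: 79 CreateTime: 1533462689974 isvalid: true "
        "keysize: -1 valuesize: 6 producerId: -1 headerKeys: [] payload: amazon")


def parse_dump_line(line):
    """Split 'name: value' pairs into a dict; the payload keeps its spaces."""
    meta, _, payload = line.partition(" payload: ")
    fields = dict(re.findall(r"(\w+): (\S+)", meta))
    fields["payload"] = payload
    return fields


record = parse_dump_line(LINE)
print(record["offset"], record["position"], record["payload"])
```

<p>Note that <code>valuesize</code> matches the length of the payload: 6 for <code>amazon</code> and 11 for <code>Hello World</code>.</p>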
<h4 id="_2">&lt;/Quick detour&gt;</h4>
<p>It is also important to note that <strong>a partition is tied to a broker</strong>. In other words, if we have three brokers and the folder <code>freblogg-0</code> exists on broker-1, you can be sure that it will not appear on any of the other brokers. Partitions of a topic can be spread out over multiple brokers, but a partition is always present on one single Kafka broker (when the replication factor has its default value, which is 1; replication is covered further below).</p>
<p><img alt="Partitions across brokers" src="https://freblogg.com/images/kafka-partitions-in-brokers.png"></p>
<h4 id="segments">Segments</h4>
<p>We’ll finally talk about those index and log files we’ve seen in the partition directory. Partition might be the standard unit of storage in Kafka, but it is not the lowest level of abstraction provided. Each partition is divided into <strong>segments</strong>.</p>
<p>A segment is simply a collection of messages of a partition. Instead of storing all the messages of a partition in a single file (think of the log file analogy again), Kafka splits them into chunks called segments. Doing this provides several advantages. Divide and Conquer FTW!</p>
<p>Most importantly, it makes purging data easy. As introduced previously, a partition is immutable from a consumer's perspective. But Kafka can still remove messages based on the “Retention policy” of the topic. Deleting segments is much simpler than deleting things from a single file, especially when a producer might be pushing data into it.</p>
<div class="highlight"><pre><span></span><code>$ ls -lh freblogg-0
total 20M
- freblogg <span class="m">197121</span> 10M Aug <span class="m">5</span> <span class="m">08</span>:26 <span class="m">00000000000000000000</span>.index
- freblogg <span class="m">197121</span> <span class="m">0</span> Aug <span class="m">5</span> <span class="m">08</span>:26 <span class="m">00000000000000000000</span>.log
- freblogg <span class="m">197121</span> 10M Aug <span class="m">5</span> <span class="m">08</span>:26 <span class="m">00000000000000000000</span>.timeindex
- freblogg <span class="m">197121</span> <span class="m">0</span> Aug <span class="m">5</span> <span class="m">08</span>:26 leader-epoch-checkpoint
</code></pre></div>
<p>The <code>00000000000000000000</code> in front of the log and the index files in each partition folder is the name of our segment. Each segment has <code>segment.log</code>, <code>segment.index</code> and <code>segment.timeindex</code> files.</p>
<p>Kafka always writes the messages into these segment files under a partition. There is always an <em>active</em> segment to which Kafka writes. Once the segment’s size limit is reached, a new segment file is created and that becomes the active segment.</p>
<p><img alt="Kafka segments" src="https://freblogg.com/images/kafka-segments.png"></p>
<p>Each segment file is created with the offset of its first message as its file name. So, in the above picture, segment 0 has messages from offset 0 to offset 2, segment 3 has messages from offset 3 to 5 and so on. Segment 6, which is the last segment, is the active segment.</p>
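<p>This layout is also what makes the retention-based purging mentioned earlier cheap: whole closed segments can be dropped without touching the active one. The Python sketch below illustrates that idea only. The function name, the data structure and the timestamps are made up, and measuring a segment's age by its newest message is an assumption of this sketch rather than a statement about Kafka's exact rules.</p>

```python
def expired_segments(segments, retention_ms, now_ms):
    """Return base offsets of closed segments older than the retention window.

    `segments` maps base offset -> timestamp (ms) of the newest message in
    that segment. The segment with the highest base offset is treated as
    the active one and is never selected.
    """
    if not segments:
        return []
    active = max(segments)
    return [base for base in sorted(segments)
            if base != active and now_ms - segments[base] > retention_ms]


# Segments 0, 3 and 6 from the picture, with made-up timestamps.
segments = {0: 1_000, 3: 5_000, 6: 9_000}
print(expired_segments(segments, retention_ms=4_500, now_ms=10_000))
```

<p>Here segments 0 and 3 are old enough to be removed, while segment 6 stays because it is still being written to.</p>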
<div class="highlight"><pre><span></span><code>$ ls -lh freblogg*
freblogg-0:
total 20M
- freblogg <span class="m">197121</span> 10M Aug <span class="m">5</span> <span class="m">08</span>:26 <span class="m">00000000000000000000</span>.index
- freblogg <span class="m">197121</span> <span class="m">0</span> Aug <span class="m">5</span> <span class="m">08</span>:26 <span class="m">00000000000000000000</span>.log
- freblogg <span class="m">197121</span> 10M Aug <span class="m">5</span> <span class="m">08</span>:26 <span class="m">00000000000000000000</span>.timeindex
- freblogg <span class="m">197121</span> <span class="m">0</span> Aug <span class="m">5</span> <span class="m">08</span>:26 leader-epoch-checkpoint
freblogg-1:
total 21M
- freblogg <span class="m">197121</span> 10M Aug <span class="m">5</span> <span class="m">08</span>:26 <span class="m">00000000000000000000</span>.index
- freblogg <span class="m">197121</span> <span class="m">68</span> Aug <span class="m">5</span> <span class="m">10</span>:15 <span class="m">00000000000000000000</span>.log
- freblogg <span class="m">197121</span> 10M Aug <span class="m">5</span> <span class="m">08</span>:26 <span class="m">00000000000000000000</span>.timeindex
- freblogg <span class="m">197121</span> <span class="m">11</span> Aug <span class="m">5</span> <span class="m">10</span>:15 leader-epoch-checkpoint
freblogg-2:
total 21M
- freblogg <span class="m">197121</span> 10M Aug <span class="m">5</span> <span class="m">08</span>:26 <span class="m">00000000000000000000</span>.index
- freblogg <span class="m">197121</span> <span class="m">79</span> Aug <span class="m">5</span> <span class="m">09</span>:59 <span class="m">00000000000000000000</span>.log
- freblogg <span class="m">197121</span> 10M Aug <span class="m">5</span> <span class="m">08</span>:26 <span class="m">00000000000000000000</span>.timeindex
- freblogg <span class="m">197121</span> <span class="m">11</span> Aug <span class="m">5</span> <span class="m">09</span>:59 leader-epoch-checkpoint
</code></pre></div>
<p>In our case, we only had one segment in each of our partitions which is <code>00000000000000000000</code>. Since we don't see another segment file present, it means that <code>00000000000000000000</code> is the active segment in each of those partitions.</p>
<p>The default value for segment size is a high value (1 GB) but let’s say we’ve tweaked the Kafka configuration so that each segment can hold only three messages. Let’s see how that would play out.</p>
<p>Say this is the current state of the <code>freblogg-2</code> partition. We've three messages pushed into it.</p>
<p><img alt="Kafka segment with messages" src="https://freblogg.com/images/kafka-segment-with-messages.png"></p>
<p>Since ‘three messages’ is the limit we’ve set, if a new message comes into this partition, Kafka will automatically close the current segment, create a new segment, make that the active segment and store the new message in the new segment’s log file.</p>
<div class="highlight"><pre><span></span><code>(I'm not showing the preceding zeroes to make it easy on the eyes)
freblogg-2
|-- 00.index
|-- 00.log
|-- 00.timeindex
|-- 03.index
|-- 03.log
|-- 03.timeindex
`--
</code></pre></div>
<p>You should’ve noted that the name of the newer segment is not <code>01</code>. Instead, you see <code>03.index, 03.log</code>. So, what is going on?</p>
<p><img alt="Kafka segment with new message" src="https://freblogg.com/images/kafka-segment-with-new-message.png"></p>
<p>This is because Kafka uses the lowest offset in the segment as its name. Since the new message that came into the partition has offset <code>3</code>, that is the name Kafka gives to the new segment. It also means that since we have <code>00</code> and <code>03</code> as our segments, we can be sure that the messages with offsets 0, 1 and 2 are indeed present in the <code>00</code> segment. New messages coming into the <code>freblogg-2</code> partition with offsets 3, 4 and 5 will be stored in the segment <code>03</code>.</p>
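<p>The rolling behaviour described above can be mimicked with a toy model in Python. This is a sketch under the same pretend setup as the text: the <code>Partition</code> class is hypothetical, and it rolls after a fixed number of messages only to mirror the three-message example, not because Kafka counts messages this way.</p>

```python
class Partition:
    """Toy partition that starts a new segment after `max_messages` messages."""

    def __init__(self, max_messages=3):
        self.max_messages = max_messages
        self.next_offset = 0
        self.segments = {}   # base offset -> list of (offset, message)
        self.active = None   # base offset of the active segment

    def append(self, message):
        """Store a message and return the offset it was given."""
        if self.active is None or len(self.segments[self.active]) >= self.max_messages:
            # Roll: the new segment is named after the first offset it will hold.
            self.active = self.next_offset
            self.segments[self.active] = []
        offset = self.next_offset
        self.segments[self.active].append((offset, message))
        self.next_offset += 1
        return offset

    def segment_names(self):
        """Segment names as zero-padded 20-digit base offsets."""
        return ["%020d" % base for base in sorted(self.segments)]


partition = Partition(max_messages=3)
for message in ["a", "b", "c", "d"]:
    partition.append(message)
print(partition.segment_names())
```

<p>After the fourth message, the model holds two segments named after offsets 0 and 3, just like the <code>00</code> and <code>03</code> files in the listing.</p>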
<p>One of the common operations in Kafka is to read the message at a particular offset. If Kafka had to scan the log file to find that offset, it would be an expensive task, especially because the log file can grow to a huge size (1 GB by default). This is where the <code>.index</code> file becomes useful. <strong>The index file stores the offsets and the physical positions of the messages in the log file</strong>.</p>
<p>An index file for the log file I’ve shown in the ‘Quick detour’ above would look something like this:</p>
<p><img alt="Index and log file" src="https://freblogg.com/images/kafka-index-log.png"></p>
<p>If you need to read the message at offset 1, you first search for it in the index file and find that the message is at position <code>79</code>. Then you go directly to position 79 in the log file and start reading. This is quite efficient, as you can use binary search to quickly get to the correct offset in the already sorted index file.</p>
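<p>Here is a minimal Python sketch of that lookup, using the standard <code>bisect</code> module for the binary search. The helper names and the in-memory lists are assumptions made for illustration; the real index is a binary file. The same trick also finds which segment holds an offset, since segments are named after their first offset.</p>

```python
from bisect import bisect_right


def find_segment(base_offsets, target_offset):
    """Return the base offset of the segment that contains target_offset."""
    i = bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise KeyError(target_offset)
    return base_offsets[i]


def find_position(index, target_offset):
    """Return the log position of the index entry at or just before target_offset."""
    offsets = [offset for offset, _ in index]
    i = bisect_right(offsets, target_offset) - 1
    if i < 0:
        raise KeyError(target_offset)
    return index[i][1]


# (offset, position) pairs for the two messages from the quick detour.
index = [(0, 0), (1, 79)]

print(find_segment([0, 3, 6], 4))
print(find_position(index, 1))
```

<p>Because the function returns the closest entry at or below the requested offset, it would still give a useful starting position if the index did not list every single offset.</p>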
<h4 id="parallelism-with-partitions">Parallelism with Partitions</h4>
<p>To guarantee the order in which messages are read from a partition, Kafka allows only one consumer (from a consumer group) per partition. So, if a partition gets messages a, f and k, the consumer will also read them in the order a, f and k. This is important to note, as the <strong>order of message consumption is not guaranteed at a topic level</strong> when you have multiple partitions.</p>
<p>Just increasing the number of consumers won’t increase the parallelism. You need to scale your partitions accordingly. To read data from a topic in parallel with two consumers, you create two partitions so that each consumer can read from its own partition. Also since partitions of a topic can be on different brokers, two consumers of a topic can read the data from two different brokers.</p>
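<p>A quick way to see why extra consumers don't help is to model the assignment. The sketch below is a simplified round-robin in Python, with made-up consumer names; it is not Kafka's actual assignment logic, but it respects the same rule of one consumer per partition within a group.</p>

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer of the group (round-robin)."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


# Two partitions but three consumers: one consumer has nothing to read.
print(assign_partitions(["freblogg-0", "freblogg-1"], ["c1", "c2", "c3"]))
```

<p>With two partitions, the third consumer ends up with an empty list, so adding it gains no parallelism until the topic has a third partition.</p>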
<h4 id="topics">Topics</h4>
<p>We’ve finally come to what a topic is. We’ve covered a lot of things about topics already. The most important thing to know is that <strong>a Topic is merely a logical grouping of several partitions</strong>.</p>
<p>A topic can be distributed across multiple brokers. This is done using partitions. But a partition still needs to be on a single broker. Each topic has a unique name, and its partitions are named after it.</p>
<h4 id="replication">Replication</h4>
<p>Let’s talk about replication. Whenever we’re creating a topic in Kafka, we need to specify the replication factor we need for that topic. Let's say we've two brokers and so we've given the <code>replication-factor</code> as 2. What this means is that Kafka will try to always ensure that each partition of this topic has a backup/replica. The way Kafka distributes the partitions is quite similar to how HDFS distributes its data blocks across nodes.</p>
<p>Say for the <code>freblogg</code> topic that we've been using so far, we've given the replication factor as 2. The resulting distribution of its three partitions will look something like this.</p>
<p><img alt="Kafka Partition distribution" src="https://freblogg.com/images/kafka-partition-distribution.png"></p>
<p>Even when you have a replicated partition on a different broker, Kafka wouldn’t let you read from it, because in each replicated set of partitions there is a <code>LEADER</code> and the rest are <code>FOLLOWERS</code> serving as backups. The followers keep syncing the data from the leader partition periodically, waiting for their chance to shine. When the leader goes down, one of the <code>in-sync</code> follower partitions is chosen as the new leader, and now you can consume data from this partition.</p>
<p>A Leader and a Follower of a single partition are never in a single broker. It should be quite obvious why that is so.</p>
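<p>The leader/follower arrangement can be sketched in a few lines of Python. Everything here is hypothetical: the function names, the broker names and the simple rotating placement are illustration only and not Kafka's real placement or election algorithm, and every follower is assumed to be in sync.</p>

```python
def place_replicas(num_partitions, brokers, replication_factor):
    """Spread each partition's replicas over distinct brokers."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    layout = {}
    for p in range(num_partitions):
        replicas = [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        layout[p] = {"leader": replicas[0], "followers": replicas[1:]}
    return layout


def fail_broker(layout, broker):
    """Remove a broker; promote a follower wherever it was the leader."""
    for info in layout.values():
        if broker in info["followers"]:
            info["followers"].remove(broker)
        if info["leader"] == broker:
            # Assumes the first remaining follower is in sync.
            info["leader"] = info["followers"].pop(0) if info["followers"] else None


layout = place_replicas(3, ["broker-1", "broker-2"], replication_factor=2)
print(layout)
fail_broker(layout, "broker-1")
print(layout)
```

<p>Because the replicas of a partition are always placed on different brokers, losing one broker still leaves a copy that can take over as leader.</p>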
<p>Finally, this long article ends. Congratulations on making it this far. You now know most of what there is to know about Kafka’s data storage. To ensure that you retain this information let’s do a quick recap.</p>
<h4 id="recap">Recap</h4>
<ul>
<li>Data in Kafka is stored in topics</li>
<li>Topics are partitioned</li>
<li>Each partition is further divided into segments</li>
<li>Each segment has a log file to store the actual message and an index file to store the position of the messages in the log file</li>
<li>Various partitions of a topic can be on different brokers but a partition is always tied to a single broker</li>
<li>Replicated partitions are passive. You can consume messages from them only when the leader is down</li>
</ul>
<p>That ought to cover everything we’ve talked about. Thanks for reading. See you again in the next one.</p>
<hr>
<p>Attribution:</p>
<p>Kafka image - https://kafka.apache.org/images/kafka_diagram.png</p>
Reshaping Pandas Data frames with Melt & Pivot
2018-06-17T09:00:00+05:30 Swaroop tag:freblogg.com,2018-06-17:/pandas-melt-pivot
<p>Pandas is a wonderful data manipulation library in Python. Working in the field of data science and machine learning, I find myself using Pandas pretty much every day. It's an invaluable tool for data analysis and manipulation.</p>
<p>In this short article, I will show you what Melt and Pivot (Reverse melt or Unmelt) are in Pandas, and how you can use them for reshaping and manipulating data frames.</p>
<p><img alt="Happy Panda" src="https://freblogg.com/pandas-melt-pivot/happy-panda.png"></p>
<p>Say I have the stock market closing prices of two major companies for the last week, as follows:</p>
<table>
<thead>
<tr>
<th align="center">Day</th>
<th align="center">Google</th>
<th align="center">Apple</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">MON</td>
<td align="center">1129</td>
<td align="center">191</td>
</tr>
<tr>
<td align="center">TUE</td>
<td align="center">1132</td>
<td align="center">192</td>
</tr>
<tr>
<td align="center">WED</td>
<td align="center">1134</td>
<td align="center">190</td>
</tr>
<tr>
<td align="center">THU</td>
<td align="center">1152</td>
<td align="center">190</td>
</tr>
<tr>
<td align="center">FRI</td>
<td align="center">1152</td>
<td align="center">188</td>
</tr>
</tbody>
</table>
<p>For an analysis I want to do, I need the names of the companies, Google & Apple, to appear in a single column with the stock price as another column, something like this:</p>
<table>
<thead>
<tr>
<th align="center">Day</th>
<th align="center">Company</th>
<th align="center">Closing Price</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">MON</td>
<td align="center">Google</td>
<td align="center">1129</td>
</tr>
<tr>
<td align="center">TUE</td>
<td align="center">Google</td>
<td align="center">1132</td>
</tr>
<tr>
<td align="center">WED</td>
<td align="center">Google</td>
<td align="center">1134</td>
</tr>
<tr>
<td align="center">THU</td>
<td align="center">Google</td>
<td align="center">1152</td>
</tr>
<tr>
<td align="center">FRI</td>
<td align="center">Google</td>
<td align="center">1152</td>
</tr>
<tr>
<td align="center">MON</td>
<td align="center">Apple</td>
<td align="center">191</td>
</tr>
<tr>
<td align="center">TUE</td>
<td align="center">Apple</td>
<td align="center">192</td>
</tr>
<tr>
<td align="center">WED</td>
<td align="center">Apple</td>
<td align="center">190</td>
</tr>
<tr>
<td align="center">THU</td>
<td align="center">Apple</td>
<td align="center">190</td>
</tr>
<tr>
<td align="center">FRI</td>
<td align="center">Apple</td>
<td align="center">188</td>
</tr>
</tbody>
</table>
<p>This is exactly where <code>Melt</code> comes into the picture. Melt is used for converting multiple columns into a single column, which is exactly what I need here.</p>
<p>Let's see how we can do this.</p>
<h3 id="melt">Melt</h3>
<p>First we need to import pandas.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
</code></pre></div>
<p>Then, we'll create the Dataframe with the data.</p>
<div class="highlight"><pre><span></span><code>df = pd.DataFrame(data={
    'Day': ['MON', 'TUE', 'WED', 'THU', 'FRI'],
    'Google': [1129, 1132, 1134, 1152, 1152],
    'Apple': [191, 192, 190, 190, 188],
})
</code></pre></div>
<p>And this will get us the dataframe we need as follows:</p>
<table>
<thead>
<tr>
<th align="center"></th>
<th align="center">Day</th>
<th align="center">Google</th>
<th align="center">Apple</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">0</td>
<td align="center">MON</td>
<td align="center">1129</td>
<td align="center">191</td>
</tr>
<tr>
<td align="center">1</td>
<td align="center">TUE</td>
<td align="center">1132</td>
<td align="center">192</td>
</tr>
<tr>
<td align="center">2</td>
<td align="center">WED</td>
<td align="center">1134</td>
<td align="center">190</td>
</tr>
<tr>
<td align="center">3</td>
<td align="center">THU</td>
<td align="center">1152</td>
<td align="center">190</td>
</tr>
<tr>
<td align="center">4</td>
<td align="center">FRI</td>
<td align="center">1152</td>
<td align="center">188</td>
</tr>
</tbody>
</table>
<p>Let's melt this now. To melt this dataframe, you call the <code>melt()</code> method on the dataframe with the <code>id_vars</code> parameter set.</p>
<div class="highlight"><pre><span></span><code><span class="n">reshaped_df</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">melt</span><span class="p">(</span><span class="n">id_vars</span><span class="o">=</span><span class="p">[</span><span class="s1">'Day'</span><span class="p">])</span> <span class="c1"># id_vars is the column you do not want to change</span>
</code></pre></div>
<p>And you're done. Your <code>reshaped_df</code> would look like this now.</p>
<table>
<thead>
<tr>
<th align="center"></th>
<th align="center">Day</th>
<th align="center">variable</th>
<th align="center">value</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">0</td>
<td align="center">MON</td>
<td align="center">Apple</td>
<td align="center">191</td>
</tr>
<tr>
<td align="center">1</td>
<td align="center">TUE</td>
<td align="center">Apple</td>
<td align="center">192</td>
</tr>
<tr>
<td align="center">2</td>
<td align="center">WED</td>
<td align="center">Apple</td>
<td align="center">190</td>
</tr>
<tr>
<td align="center">3</td>
<td align="center">THU</td>
<td align="center">Apple</td>
<td align="center">190</td>
</tr>
<tr>
<td align="center">4</td>
<td align="center">FRI</td>
<td align="center">Apple</td>
<td align="center">188</td>
</tr>
<tr>
<td align="center">5</td>
<td align="center">MON</td>
<td align="center">Google</td>
<td align="center">1129</td>
</tr>
<tr>
<td align="center">6</td>
<td align="center">TUE</td>
<td align="center">Google</td>
<td align="center">1132</td>
</tr>
<tr>
<td align="center">7</td>
<td align="center">WED</td>
<td align="center">Google</td>
<td align="center">1134</td>
</tr>
<tr>
<td align="center">8</td>
<td align="center">THU</td>
<td align="center">Google</td>
<td align="center">1152</td>
</tr>
<tr>
<td align="center">9</td>
<td align="center">FRI</td>
<td align="center">Google</td>
<td align="center">1152</td>
</tr>
</tbody>
</table>
<p>The <code>id_vars</code> you've passed into the melt() method is to specify which column you want to leave untouched. Since we want the Day column to stay the same even after the melt, we set <code>id_vars=['Day']</code>.</p>
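<p>By default, every column not listed in <code>id_vars</code> gets melted. If you only want some of them, <code>melt()</code> also accepts a <code>value_vars</code> argument. A small sketch, rebuilding the same dataframe so that it runs on its own (the name <code>google_only</code> is just for illustration):</p>

```python
import pandas as pd

df = pd.DataFrame(data={
    'Day': ['MON', 'TUE', 'WED', 'THU', 'FRI'],
    'Google': [1129, 1132, 1134, 1152, 1152],
    'Apple': [191, 192, 190, 190, 188],
})

# Keep 'Day' as the identifier and melt only the 'Google' column.
google_only = df.melt(id_vars=['Day'], value_vars=['Google'])
print(google_only)
```

<p>The result has one row per day, with <code>Google</code> as the only entry in the <code>variable</code> column; the Apple prices are simply left out.</p>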
<p>Also, you would have noticed that the output dataframe of melt has the columns <code>variable</code> and <code>value</code>. These are the default names given by pandas for the columns. We can change this either manually with something like</p>
<div class="highlight"><pre><span></span><code>reshaped_df.columns = ['Day', 'Company', 'Closing Price']
</code></pre></div>
<p>Or, we can specify the values for these columns in the <code>melt()</code> itself. Melt takes arguments <code>var_name</code> and <code>value_name</code> apart from <code>id_vars</code>. These options specify the names for the <code>variable</code> column and the <code>value</code> column respectively.</p>
<div class="highlight"><pre><span></span><code><span class="n">reshaped_df</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">melt</span><span class="p">(</span><span class="n">id_vars</span><span class="o">=</span><span class="p">[</span><span class="s1">'Day'</span><span class="p">],</span> <span class="n">var_name</span><span class="o">=</span><span class="s1">'Company'</span><span class="p">,</span> <span class="n">value_name</span><span class="o">=</span><span class="s1">'Closing Price'</span><span class="p">)</span>
</code></pre></div>
<p>That will give us:</p>
<table>
<thead>
<tr>
<th align="center"></th>
<th align="center">Day</th>
<th align="center">Company</th>
<th align="center">Closing Price</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">0</td>
<td align="center">MON</td>
<td align="center">Apple</td>
<td align="center">191</td>
</tr>
<tr>
<td align="center">1</td>
<td align="center">TUE</td>
<td align="center">Apple</td>
<td align="center">192</td>
</tr>
<tr>
<td align="center">2</td>
<td align="center">WED</td>
<td align="center">Apple</td>
<td align="center">190</td>
</tr>
<tr>
<td align="center">3</td>
<td align="center">THU</td>
<td align="center">Apple</td>
<td align="center">190</td>
</tr>
<tr>
<td align="center">4</td>
<td align="center">FRI</td>
<td align="center">Apple</td>
<td align="center">188</td>
</tr>
<tr>
<td align="center">5</td>
<td align="center">MON</td>
<td align="center">Google</td>
<td align="center">1129</td>
</tr>
<tr>
<td align="center">6</td>
<td align="center">TUE</td>
<td align="center">Google</td>
<td align="center">1132</td>
</tr>
<tr>
<td align="center">7</td>
<td align="center">WED</td>
<td align="center">Google</td>
<td align="center">1134</td>
</tr>
<tr>
<td align="center">8</td>
<td align="center">THU</td>
<td align="center">Google</td>
<td align="center">1152</td>
</tr>
<tr>
<td align="center">9</td>
<td align="center">FRI</td>
<td align="center">Google</td>
<td align="center">1152</td>
</tr>
</tbody>
</table>
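<p>If you already have a melted dataframe with the default column names, <code>rename()</code> is another way to get to the same place without overwriting every column name at once. The sketch below compares the two routes; the variable names <code>named</code> and <code>renamed</code> are only for illustration.</p>

```python
import pandas as pd

df = pd.DataFrame(data={
    'Day': ['MON', 'TUE', 'WED', 'THU', 'FRI'],
    'Google': [1129, 1132, 1134, 1152, 1152],
    'Apple': [191, 192, 190, 190, 188],
})

# Route 1: name the new columns while melting.
named = df.melt(id_vars=['Day'], var_name='Company', value_name='Closing Price')

# Route 2: melt with the default names, then rename only those two columns.
renamed = df.melt(id_vars=['Day']).rename(
    columns={'variable': 'Company', 'value': 'Closing Price'})

print(list(named.columns))
print(named.equals(renamed))
```

<p>Renaming by old name is a little safer than assigning a full list to <code>columns</code>, since it does not depend on the order of the columns.</p>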
<h3 id="unmeltreverse-meltpivot">Unmelt/Reverse Melt/Pivot</h3>
<p>We can also do the reverse of the melt operation, which is also called <strong>Pivoting</strong>. In Pivoting or Reverse Melting, we convert a column with multiple values into several columns of their own.</p>
<p>The <code>pivot()</code> method on the dataframe takes two main arguments, <code>index</code> and <code>columns</code>. The <code>index</code> parameter is similar to the <code>id_vars</code> we have seen before, i.e., it is used to specify the column you don't want to touch. The <code>columns</code> parameter specifies which column should be used to create the new columns.</p>
<div class="highlight"><pre><span></span><code>reshaped_df.pivot(index='Day', columns='Company')
</code></pre></div>
<p>Running the above command gives you the following:</p>
<div class="highlight"><pre><span></span><code><span class="nb">+---------+-----------------------+</span><span class="c"></span>
<span class="c">| | Closing Price |</span>
<span class="nb">+</span><span class="c">=========</span><span class="nb">+</span><span class="c">:=============:</span><span class="nb">+</span><span class="c">:=====:</span><span class="nb">+</span><span class="c"></span>
<span class="c">| Company | Google | Apple |</span>
<span class="nb">+---------+---------------+-------+</span><span class="c"></span>
<span class="c">| Day | | |</span>
<span class="nb">+---------+---------------+-------+</span><span class="c"></span>
<span class="c">| MON | 1129 | 191 |</span>
<span class="nb">+---------+---------------+-------+</span><span class="c"></span>
<span class="c">| TUE | 1132 | 192 |</span>
<span class="nb">+---------+---------------+-------+</span><span class="c"></span>
<span class="c">| WED | 1134 | 190 |</span>
<span class="nb">+---------+---------------+-------+</span><span class="c"></span>
<span class="c">| THU | 1152 | 190 |</span>
<span class="nb">+---------+---------------+-------+</span><span class="c"></span>
<span class="c">| FRI | 1152 | 188 |</span>
<span class="nb">+---------+---------------+-------+</span><span class="c"></span>
<span class="c"># (Showing in textual format as multi</span><span class="nb">-</span><span class="c">level columns are not possible in Markdown)</span>
</code></pre></div>
<p>This is close, but probably not exactly what you wanted. The <code>Closing Price</code> is an extra stacked column (index) on top of Google & Apple. So to get exactly the reverse of melt and get the original <code>df</code> dataframe we started with, we do the following:</p>
<div class="highlight"><pre><span></span><code>original_df = reshaped_df.pivot(index='Day', columns='Company')['Closing Price'].reset_index()
original_df.columns.name = None
</code></pre></div>
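<p>And that gets us back to what we started with. (Note that <code>pivot()</code> sorts the resulting row and column labels, so on your machine the rows and columns may come out in alphabetical order rather than in the order shown here.)</p>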
<table>
<thead>
<tr>
<th align="center">Day</th>
<th align="center">Google</th>
<th align="center">Apple</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">MON</td>
<td align="center">1129</td>
<td align="center">191</td>
</tr>
<tr>
<td align="center">TUE</td>
<td align="center">1132</td>
<td align="center">192</td>
</tr>
<tr>
<td align="center">WED</td>
<td align="center">1134</td>
<td align="center">190</td>
</tr>
<tr>
<td align="center">THU</td>
<td align="center">1152</td>
<td align="center">190</td>
</tr>
<tr>
<td align="center">FRI</td>
<td align="center">1152</td>
<td align="center">188</td>
</tr>
</tbody>
</table>
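<p>For a round trip you can check yourself, here is a self-contained sketch. It uses the <code>values</code> argument of <code>pivot()</code> as an alternative way to avoid the extra <code>Closing Price</code> level, and then uses <code>reindex()</code> to put the rows and columns back in their original order, since pivoting sorts them. The names <code>wide</code> and <code>restored</code> are just for illustration.</p>

```python
import pandas as pd

df = pd.DataFrame(data={
    'Day': ['MON', 'TUE', 'WED', 'THU', 'FRI'],
    'Google': [1129, 1132, 1134, 1152, 1152],
    'Apple': [191, 192, 190, 190, 188],
})
reshaped_df = df.melt(id_vars=['Day'], var_name='Company', value_name='Closing Price')

# values= picks the single column to spread out, so no extra level is created.
wide = reshaped_df.pivot(index='Day', columns='Company', values='Closing Price')

# Restore the original row and column order, then turn 'Day' back into a column.
restored = (wide.reindex(index=df['Day'].tolist(), columns=['Google', 'Apple'])
                .reset_index())
restored.columns.name = None

print(restored)
```

<p>After the reindex, <code>restored</code> has the same rows, columns and order as the dataframe we started with.</p>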
<p>That is all for this article. I hope this was useful for you and that you'll try to use this in your data processing workflow.</p>
<hr>
<p>For more programming articles, check out <a href="http://freblogg.com">Freblogg</a>, <a href="https://freblogg.com/tags/java">Freblogg/Java</a>, <a href="https://freblogg.com/tags/apache20spark">Freblogg/Spark</a></p>
<p>Thanks for reading. See you again in the next article.</p>
Stash uncommitted changes with Git Stash
2018-01-13T09:00:00+05:30 Swaroop tag:freblogg.com,2018-01-13:/git-stash
<p>You are in the middle of developing a feature and suddenly your manager tells you to work on an urgent fix for a production bug! You want to create a new branch for the fix but Git won't let you, as you have uncommitted changes. How can you switch to a new branch without losing your local uncommitted changes? Git Stash to your rescue.</p>
<p><img alt="Git Logo" src="https://freblogg.com/git-stash/git-logo.png"></p>
<p>Say, I’ve two commits in my git repository:</p>
<div class="highlight"><pre><span></span><code>$ git log --oneline --decorate --graph
* 10c532b <span class="o">(</span>HEAD -> master<span class="o">)</span> Add File2 <- Commit <span class="c1">#1</span>
* d19fe8d Add File1 <- Commit <span class="c1">#2</span>
</code></pre></div>
<p>And I’ve those two files <code>file1</code> and <code>file2</code> in my directory.</p>
<div class="highlight"><pre><span></span><code>$ ls
file1 file2
</code></pre></div>
<p>After I have added <code>file2</code>, I've made some changes to it that have not been committed yet, as you see from the output of <code>git status</code> below:</p>
<div class="highlight"><pre><span></span><code>$ git status
On branch master
Changes not staged <span class="k">for</span> commit:
<span class="o">(</span>use <span class="s2">"git add <file>..."</span> to update what will be committed<span class="o">)</span>
<span class="o">(</span>use <span class="s2">"git checkout -- <file>..."</span> to discard changes <span class="k">in</span> working directo
modified: file2
no changes added to commit <span class="o">(</span>use <span class="s2">"git add"</span> and/or <span class="s2">"git commit -a"</span><span class="o">)</span>
</code></pre></div>
<p>At this stage, say I have to switch to a different branch or a different commit. I usually have two options: either commit these changes or lose them by switching to the other commit. As I don’t want to pick either of those options, I will go with a third option, which is <em>Stashing</em>.</p>
<p>Git stash, as the name indicates, lets you stash-away some changes temporarily. You can think of stashes as being "temporary commits". </p>
<p>You can stash your changes with the following command:</p>
<div class="highlight"><pre><span></span><code>git stash save "Changes in file2"
</code></pre></div>
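<p>To <code>git stash</code>, I pass the subcommand <code>save</code> along with a message. This message is similar to a commit message: it lets you identify a particular stash. (Newer versions of Git deprecate <code>save</code> in favor of <code>git stash push -m "message"</code>.)</p>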
<p>You can do the same with just <code>git stash</code> as well. But with that, you will not be able to provide a stash message.</p>
<p>You can see the existing list of stashes with the <code>list</code> command:</p>
<div class="highlight"><pre><span></span><code>$ git stash list
stash@<span class="o">{</span><span class="m">0</span><span class="o">}</span>: On master: Changes <span class="k">in</span> file2
</code></pre></div>
<p>That is the stash I’ve just created, and you can see the branch name <code>master</code> and the stash message I have given as well. The <code>stash@{0}</code> is the identifier for your stash.</p>
<p>Now that we have created the stash, to use it, we have two options.</p>
<ol>
<li><code>apply</code> — Adds the changes in stash to working directory but keeps the stash</li>
<li><code>pop</code> — Adds the changes in stash to working directory and deletes the stash</li>
</ol>
<p>To <code>apply</code> a stash:</p>
<div class="highlight"><pre><span></span><code>git stash apply stash@{0}
</code></pre></div>
<p>This will still keep the stash, and you will see it in the output of the <code>list</code> command.</p>
<p>To <code>pop</code> a stash:</p>
<div class="highlight"><pre><span></span><code>git stash pop stash@{0}
</code></pre></div>
<p>If you want to delete the stash, you can do:</p>
<div class="highlight"><pre><span></span><code>git stash drop stash@{0}
</code></pre></div>
<p>And that will delete the stash.</p>
<p>Git stashes are a great way to quickly stow away your unsaved changes for later use. Try them out; they can be a really useful tool in your development workflow. One way in which I use stashes is to make a change on multiple branches. I <code>stash</code> the necessary changes and then <code>apply</code> the stash on all of the branches. Pretty neat!</p>
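<p>If you do the multi-branch trick often, you could script it. The Python sketch below only builds the list of Git commands and leaves running them to a separate helper. The function names and branch names are made up, the commit after each <code>apply</code> is my assumption about the workflow, and it uses <code>git stash push -m</code>, the newer spelling of <code>git stash save</code>. It does nothing about merge conflicts.</p>

```python
import subprocess


def stash_plan(message, branches):
    """Return the git commands (as argument lists) for one stash on many branches."""
    plan = [["git", "stash", "push", "-m", message]]
    for branch in branches:
        plan.append(["git", "checkout", branch])
        plan.append(["git", "stash", "apply", "stash@{0}"])
        # Assumption: commit on each branch so the next checkout starts clean.
        plan.append(["git", "commit", "-am", message])
    plan.append(["git", "stash", "drop", "stash@{0}"])
    return plan


def run_plan(plan, cwd="."):
    """Run each command in order and stop at the first failure."""
    for argv in plan:
        subprocess.run(argv, cwd=cwd, check=True)


# Print the plan for two hypothetical branches instead of running it.
for argv in stash_plan("Changes in file2", ["release-1", "release-2"]):
    print(" ".join(argv))
```

<p>Printing the plan first is a cheap way to double-check what would happen before calling <code>run_plan()</code> on a real repository.</p>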
<p>That is all for this article.</p>
<p>For more programming articles, check out <a href="http://freblogg.com">Freblogg</a>, <a href="https://freblogg.com/tags/git">Freblogg/Git</a></p>
<p>Thanks for reading. See you again in the next article.</p>
<hr>
<p>Image attribution:</p>
<p>Git Logo - Git Logo by Jason Long is licensed under the Creative Commons Attribution 3.0 Unported License. </p>Build A Web Application With Flask In Python Part I2018-01-12T23:54:00+05:302018-01-12T23:54:00+05:30Durga Swaroop Perlatag:freblogg.com,2018-01-12:/webapp-with-flask-1<p>Flask is a popular micro web application framework for Python using which you can create web apps. Unlike another popular framework like Django, Flask keeps its foot print to a minimum providing only the basic functionality required instead of picking out the entire stack for you the way Django does …</p><p>Flask is a popular micro web application framework for Python using which you can create web apps. Unlike another popular framework like Django, Flask keeps its foot print to a minimum providing only the basic functionality required instead of picking out the entire stack for you the way Django does. And we call it a micro framework for this very reason. Using flask's extensibility at the core, you can build any type of applications by picking the components you want to use. Several big name companies like LinkedIn, Pinterest use Flask for their products.</p>
<p><img alt="Flask Logo" class="aligncenter" height="300" src="https://avatars1.githubusercontent.com/u/18305767"> </p>
<p>In this tutorial we will get started with using <code>Flask</code> and create a simple web application with it.</p>
<h3 id="prerequisites">Prerequisites</h3>
<p>To follow along with this series you should have some knowledge of the Python language. I'm using <code>3.6</code> for these tutorials, and if you would like to follow along without any issues, I suggest you use the same version. For earlier versions there might be a couple of changes in the syntax, but the ideas and concepts remain the same.</p>
<p>You will also need to install <code>Flask</code>. You can do that with <code>pip</code>.</p>
<div class="highlight"><pre><span></span><code>pip install -U flask
</code></pre></div>
<p>This will install Flask if you don't already have it, and upgrade it to the latest version if you have an older one installed.</p>
<p>With those two things, you are good to go.</p>
<h3 id="getting-started">Getting Started</h3>
<p>Just like with anything else you start by importing the stuff you want.</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">flask</span> <span class="kn">import</span> <span class="n">Flask</span>
</code></pre></div>
<p>And this will make <code>Flask</code> ready for you to use. After this you have to create an <code>app</code> object by calling the <code>Flask</code> constructor like this:</p>
<div class="highlight"><pre><span></span><code>app = Flask("hello")
</code></pre></div>
<p>This will create our <code>app</code> object. The name <code>hello</code> I've specified in the constructor can be anything. But the usual convention is to pass <code>__name__</code>, which evaluates to <code>__main__</code> when the file is run directly. Also, <code>app</code> is just a variable. So, you can name it anything you want.</p>
<p>Next you have to define the routes. Using routes you configure your server to do different actions. Let's say when you type some website URL into your browser, you will be taken to its home page. Now if you open <code><website>/info</code> it will take you to the info page. So, this mapping of the <code>/info</code> URL to the <code>info</code> page is what we call a route. For the home page the route is simply <code>/</code>.</p>
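<p>To make the idea of a route concrete before bringing Flask in, here is a minimal sketch that uses a plain dictionary as a routing table. This is only an illustration of the mapping from URL path to page, not how Flask is implemented; the names <code>home_page</code>, <code>info_page</code>, <code>routes</code> and <code>handle_request</code> are made up for this example.</p>

```python
# Illustration only: a plain dictionary standing in for a routing table.
# This is NOT how Flask works internally; all names here are hypothetical.

def home_page():
    return "Home page"


def info_page():
    return "Info page"


# Map each URL path to the function that produces its page.
routes = {
    "/": home_page,
    "/info": info_page,
}


def handle_request(path):
    """Look up the handler for a path and call it, or report a missing page."""
    handler = routes.get(path)
    if handler is None:
        return "404 Not Found"
    return handler()


print(handle_request("/"))       # Home page
print(handle_request("/info"))   # Info page
print(handle_request("/about"))  # 404 Not Found
```

<p>Flask's <code>@app.route(...)</code> plays the role of filling in such a table for you, so you only write the functions.</p>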
<p>Let's say we want our server's homepage to display <code>Hello World</code>. You can configure that with a method like this:</p>
<div class="highlight"><pre><span></span><code><span class="nv">@app</span><span class="p">.</span><span class="n">route</span><span class="p">(</span><span class="s1">'/'</span><span class="p">)</span><span class="w"></span>
<span class="n">def</span><span class="w"> </span><span class="k">index</span><span class="p">()</span><span class="err">:</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="ss">"Hello World"</span><span class="w"></span>
</code></pre></div>
<p>With <code>@app.route('/')</code>, we are defining a route on our server. So, whenever somebody opens that route, which for us is the homepage, the <code>index()</code> function associated with that route decorator will be called. And when <code>index()</code> is called it will return <code>Hello World</code>, just as we expect it to.</p>
<p>And there is one final command to start and run our server which is:</p>
<div class="highlight"><pre><span></span><code>app.run(debug=True)
</code></pre></div>
<p>And that's it. This will run the app that we have created when you run the python file. The <code>debug=True</code> option is useful while developing and testing applications. So, we'll keep that for now.</p>
<p>Just run your Python script and you should see output like this on the console:</p>
<div class="highlight"><pre><span></span><code>* Debugger is active!
* Debugger PIN: 127-398-124
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
</code></pre></div>
<p>Now if you go to <a href="http://localhost:5000">http://localhost:5000</a>, you can see <code>Hello World</code> displayed.</p>
<p>That's it. You have successfully created your first web application with Flask in just a handful of lines of code. Now, that is awesome. Stay tuned for the next part.</p>
<p>That is all for this article.</p>
<hr>
<p>For more programming articles, check out <a href="http://freblogg.com">Freblogg</a>, <a href="https://freblogg.com/tags/python">Freblogg/Python</a></p>
<p>Some articles on automation:</p>
<p><a href="https://medium.com/@durgaswaroop/web-scraping-with-python-introduction-7b3c0bbb6053">Web Scraping For Beginners with Python</a></p>
<p><a href="https://medium.com/@durgaswaroop/my-semi-automated-blogging-workflow-62cba2827986">My semi automated workflow for blogging</a></p>
<p><a href="https://medium.com/@durgaswaroop/publish-articles-on-blogger-in-just-one-second-2ef45586901">Publish articles to Blogger automatically</a></p>
<p><a href="https://freblogg.com/publish-articles-to-your-medium-blog">Publish articles to Medium automatically</a></p>
<hr>
<p>This is the 21st article as part of my twitter challenge <a href="https://twitter.com/durgaswaroop/status/944503750340702208">#30DaysOfBlogging</a>. Nine more articles on various topics, including but not limited to, <a href="https://freblogg.com/tags/java">Java</a>, <a href="https://freblogg.com/tags/git">Git</a>, <a href="https://freblogg.com/tags/vim">Vim</a>, <a href="https://freblogg.com/tags/software">Software Development</a>, <a href="https://freblogg.com/tags/python">Python</a>, to come.</p>
<p>If you are interested in this, make sure to follow me on Twitter <a href="https://twitter.com/durgaswaroop">@durgaswaroop</a>. </p>
<hr>
<p>If you are interested in contributing to any open source projects and haven't found the right project or if you were unsure on how to begin, I would like to suggest my own project, <a href="https://github.com/durgaswaroop/delorean">Delorean</a> which is a Distributed Version control system, built from scratch in scala. You can contribute not only in the form of code, but also with usage documentation and also by identifying any bugs in its functionality.</p>
<hr>
<p>Thanks for reading. See you again in the next article.</p>Json Parsing With Python2018-01-10T23:56:00+05:302018-01-10T23:56:00+05:30Durga Swaroop Perlatag:freblogg.com,2018-01-10:/json-parsing-with-python<p>JSON has become an ubiquitous data exchange format everywhere. Pretty much every service has a JSON API. And since it is so popular, most of the programming languages have built-in JSON parsers. And Of course, Python is no exception. In this article, I'll show you how you can parse JSON …</p><p>JSON has become an ubiquitous data exchange format everywhere. Pretty much every service has a JSON API. And since it is so popular, most of the programming languages have built-in JSON parsers. And Of course, Python is no exception. In this article, I'll show you how you can parse JSON with Python's <code>json</code> library.</p>
<p><img alt="Python Logo" class="aligncenter" src="https://www.python.org/static/community_logos/python-logo-master-v3-TM.png"> </p>
<p>JSON parsing in Python is quite straightforward and easy, unlike in some languages where it is unnecessarily cumbersome. Like everything else in Python, you start by importing the library you want.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">json</span>
</code></pre></div>
<p>In this article, I am going to use the following JSON I got from <a href="http://json.org/example.html">json.org</a></p>
<div class="highlight"><pre><span></span><code>{
"menu": {
"id": "file",
"value": "File",
"popup": {
"menuitem": [
{"value": "New", "onclick": "CreateNewDoc()"},
{"value": "Open", "onclick": "OpenDoc()"},
{"value": "Close", "onclick": "CloseDoc()"}
]
}
}
}
</code></pre></div>
<p>We have got a good set of dictionaries and arrays to work with in this data. If you want to follow along, you can use the same JSON or you can use anything else as well.</p>
<p>The first thing to do is to get this JSON string into a variable.</p>
<div class="highlight"><pre><span></span><code>json_string = """{"menu": {
"id": "file",
"value": "File",
"popup": {
"menuitem": [
{"value": "New", "onclick": "CreateNewDoc()"},
{"value": "Open", "onclick": "OpenDoc()"},
{"value": "Close", "onclick": "CloseDoc()"}
]
}
}}"""
</code></pre></div>
<p>And now we parse this string into a <code>dictionary</code> object with the help of the <code>json</code> library's <code>loads()</code> method.</p>
<div class="highlight"><pre><span></span><code><span class="n">json_dict</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">json_string</span><span class="p">)</span>
</code></pre></div>
<p>And you're done. The JSON is parsed and stored in the <code>json_dict</code> object. The <code>json_dict</code> here is a Python dictionary object. If you want to verify that, you can call <code>type()</code> on it:</p>
<div class="highlight"><pre><span></span><code>print(type(json_dict))
</code></pre></div>
<p>And it will show that it is <code><class 'dict'></code>.</p>
<p>Getting back, we have the entire JSON object as a dictionary in <code>json_dict</code>, and you can just drill down into the dictionary with the keys. At the top level, we have just one key in the dictionary, which is <code>menu</code>. We can get that by indexing the dictionary with that key.</p>
<div class="highlight"><pre><span></span><code>menu = json_dict['menu']
</code></pre></div>
<p>And of course <code>menu</code> is a dictionary too with the keys <code>id</code>, <code>value</code>, and <code>popup</code>. We can access them and print them as well.</p>
<div class="highlight"><pre><span></span><code>print(menu['id']) ## => 'file'
print(menu['value']) ## => 'File'
</code></pre></div>
<p>And then finally we've got <code>popup</code> which is another dictionary as well with the key <code>menuitem</code> which is a list. We can verify this by checking the types of these objects.</p>
<div class="highlight"><pre><span></span><code>popup = menu['popup']
print(type(popup)) ## => <class 'dict'>
menuitem = popup['menuitem']
print(type(menuitem)) ## => <class 'list'>
</code></pre></div>
<p>And since <code>menuitem</code> is a list, we can iterate over it and print the values.</p>
<div class="highlight"><pre><span></span><code><span class="k">for</span> <span class="nv">item</span> <span class="nv">in</span> <span class="nv">menuitem</span>:
<span class="nv">print</span><span class="ss">(</span><span class="nv">item</span><span class="ss">)</span>
</code></pre></div>
<p>And the output is</p>
<div class="highlight"><pre><span></span><code>{'value': 'New', 'onclick': 'CreateNewDoc()'}
{'value': 'Open', 'onclick': 'OpenDoc()'}
{'value': 'Close', 'onclick': 'CloseDoc()'}
</code></pre></div>
<p>And of course each of these elements is a dictionary, so you can go further inside and access its keys and values.</p>
<p>For example, if you want to access <code>New</code> from the above output, you can do this:</p>
<div class="highlight"><pre><span></span><code>print(menuitem[0]['value']) ## => New
</code></pre></div>
<p>And so on and so forth to get any value in the JSON.</p>
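<p>Putting the drill-down steps together, here is a small self-contained sketch that chains the keys in one expression and then collects every menu entry into a dictionary. The <code>actions</code> name and the <code>tooltip</code> key are assumptions added for illustration; <code>tooltip</code> is deliberately a key that does not exist in the data, to show <code>dict.get()</code> with a default.</p>

```python
import json

json_string = """{"menu": {"id": "file", "value": "File", "popup": {"menuitem": [
  {"value": "New", "onclick": "CreateNewDoc()"},
  {"value": "Open", "onclick": "OpenDoc()"},
  {"value": "Close", "onclick": "CloseDoc()"}]}}}"""

json_dict = json.loads(json_string)

# Drill down through the nested dictionaries one key at a time.
menuitem = json_dict["menu"]["popup"]["menuitem"]

# Build a {label: handler} mapping from the list of dictionaries.
actions = {item["value"]: item["onclick"] for item in menuitem}
print(actions["Open"])  # OpenDoc()

# .get() returns a default instead of raising KeyError for a missing key.
print(json_dict["menu"].get("tooltip", "no tooltip"))  # no tooltip
```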
<p>And not only that, the <code>json</code> library can also parse JSON responses from web services. One cool thing here is that web server responses are <code>byte</code> strings, which means that if you want to use them in your program you'd normally have to convert them to regular strings by using the <code>decode()</code> method. But for <code>json</code> you don't have to do that. Since Python 3.6, you can directly feed in the <code>byte</code> string and it will give you a parsed object. That's pretty cool!</p>
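<p>Here is a quick sketch of that behaviour. The byte string below is a made-up payload standing in for a real web response, and <code>response_body</code> is just an illustrative name.</p>

```python
import json

# A byte string, like the raw body you might get back from a web service.
# The payload is invented for this example.
response_body = b'{"id": "file", "value": "File"}'

# json.loads() accepts bytes directly (Python 3.6 and later).
parsed = json.loads(response_body)

# Decoding first gives exactly the same result; it is just an extra step.
same = json.loads(response_body.decode("utf-8"))

print(type(parsed))    # <class 'dict'>
print(parsed["id"])    # file
print(parsed == same)  # True
```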
<p>That is all for this article.</p>
<hr>
<p>For more programming articles, check out <a href="http://freblogg.com">Freblogg</a>, <a href="https://freblogg.com/tags/python">Freblogg/Python</a></p>
<p>Some of my other articles on automation:</p>
<p><a href="https://medium.com/@durgaswaroop/web-scraping-with-python-introduction-7b3c0bbb6053">Web Scraping For Beginners with Python</a></p>
<p><a href="https://medium.com/@durgaswaroop/my-semi-automated-blogging-workflow-62cba2827986">My semi automated workflow for blogging</a></p>
<p><a href="https://medium.com/@durgaswaroop/publish-articles-on-blogger-in-just-one-second-2ef45586901">Publish articles to Blogger automatically</a></p>
<p><a href="https://freblogg.com/publish-articles-to-your-medium-blog">Publish articles to Medium automatically</a></p>
<hr>
<p>This is the 19th article as part of my twitter challenge <a href="https://twitter.com/durgaswaroop/status/944503750340702208">#30DaysOfBlogging</a>. Eleven more articles on various topics, including but not limited to, <a href="https://freblogg.com/tags/java">Java</a>, <a href="https://freblogg.com/tags/git">Git</a>, <a href="https://freblogg.com/tags/vim">Vim</a>, <a href="https://freblogg.com/tags/python">Python</a>, to come.</p>
<p>If you are interested in this, make sure to follow me on Twitter <a href="https://twitter.com/durgaswaroop">@durgaswaroop</a>. While you're at it, Go ahead and subscribe <a href="https://medium.com/@durgaswaroop/">on medium</a> and my <a href="http://freblogg.com">blog</a> as well.</p>
<hr>
<p>If you are interested in contributing to any open source projects and haven't found the right project or if you were unsure on how to begin, I would like to suggest my own project, <a href="https://github.com/durgaswaroop/delorean">Delorean</a> which is a Distributed Version control system, built from scratch in scala. You can contribute not only in the form of code, but also with usage documentation and also by identifying any bugs in its functionality.</p>
<hr>
<p>Thanks for reading. See you again in the next article.</p>Writing Datasets to Disk | Datasets In Apache Spark III2018-01-08T21:46:00+05:302018-01-08T21:46:00+05:30Durga Swaroop Perlatag:freblogg.com,2018-01-08:/apache-spark-datasets-3<p>In the <a href="https://freblogg.com/apache-spark-datasets-2">last tutorial</a> we've seen how to create parametrized datasets. Once you create datasets and perform <a href="https://freblogg.com/apache-spark-datasets-1#operations-on-datasets">some operations</a> on them, you would like to save those results back into storage. This is what we'll try to do in this article - Saving Datasets to storage.</p>
<p><img alt="Spark Logo" class="aligncenter" src="https://redislabs.com/wp-content/uploads/2016/12/spark.png" width="500"> </p>
<p>The first thing we'll do …</p><p>In the <a href="https://freblogg.com/apache-spark-datasets-2">last tutorial</a> we've seen how to create parametrized datasets. Once you create datasets and perform <a href="https://freblogg.com/apache-spark-datasets-1#operations-on-datasets">some operations</a> on them, you would like to save those results back into storage. This is what we'll try to do in this article - Saving Datasets to storage.</p>
<p><img alt="Spark Logo" class="aligncenter" src="https://redislabs.com/wp-content/uploads/2016/12/spark.png" width="500"> </p>
<p>The first thing we'll do as always is to create the spark-session variable.</p>
<div class="highlight"><pre><span></span><code>// Initialize Sparksession
SparkSession spark = SparkSession.builder().appName("Freblogg-Spark").master("local").getOrCreate();
</code></pre></div>
<p>Using that session variable, we read the <code>fake-people.csv</code> file which has data like this:</p>
<div class="highlight"><pre><span></span><code><span class="n">id</span><span class="p">,</span><span class="n">first_name</span><span class="p">,</span><span class="n">last_name</span><span class="p">,</span><span class="n">email</span><span class="p">,</span><span class="n">gender</span><span class="p">,</span><span class="n">ip_address</span><span class="w"></span>
<span class="mi">1</span><span class="p">,</span><span class="n">Netti</span><span class="p">,</span><span class="n">McKirdy</span><span class="p">,</span><span class="n">nmckirdy0</span><span class="nv">@slideshare</span><span class="p">.</span><span class="n">net</span><span class="p">,</span><span class="n">Female</span><span class="p">,</span><span class="mf">148.3.248.193</span><span class="w"></span>
<span class="mi">2</span><span class="p">,</span><span class="n">Nickey</span><span class="p">,</span><span class="n">Curreen</span><span class="p">,</span><span class="n">ncurreen1</span><span class="nv">@tripadvisor</span><span class="p">.</span><span class="n">com</span><span class="p">,</span><span class="n">Male</span><span class="p">,</span><span class="mf">206.9.48.216</span><span class="w"></span>
<span class="mi">3</span><span class="p">,</span><span class="n">Allayne</span><span class="p">,</span><span class="n">Chatainier</span><span class="p">,</span><span class="n">achatainier2</span><span class="nv">@trellian</span><span class="p">.</span><span class="n">com</span><span class="p">,</span><span class="n">Male</span><span class="p">,</span><span class="mf">191.118.4.217</span><span class="w"></span>
<span class="p">...</span><span class="w"></span>
</code></pre></div>
<p>We read this file into a dataset as follows:</p>
<div class="highlight"><pre><span></span><code>// Read csv file
Dataset<Row> peopleDs = spark.read().option("header", "true").csv("fake-people.csv");
</code></pre></div>
<p>After we have the dataset, let's assume you've performed some operations on it: some <a href="https://freblogg.com/apache-spark-datasets-1#column-selection">column selections</a>, some <a href="https://freblogg.com/apache-spark-datasets-1#filtering-on-columns">filtering</a>, some <a href="https://freblogg.com/apache-spark-datasets-1#sorting-on-columns">sorting</a>, etc. And we have a new dataset after all those operations.</p>
<div class="highlight"><pre><span></span><code>// After performing several awesome operations
Dataset<Row> newDs = ....
</code></pre></div>
<p>We want to store this dataset back on the disk. We can do that with the <code>write()</code> method on the dataset, which mirrors <code>read()</code> on the Spark session variable.</p>
<div class="highlight"><pre><span></span><code>newDs.write().csv("processed-data");
</code></pre></div>
<p>The <code>processed-data</code> in the above command is not the name of the output CSV file but of the output directory. When you write a Dataset to disk, Spark creates a directory with that name and stores the data in it in the format you asked for, <code>CSV</code> in this case, along with some check files and status flags.</p>
<p>These are the files that get created in the <code>processed-data</code> folder.</p>
<div class="highlight"><pre><span></span><code>$ ls ../../apache-spark/processed-data
_SUCCESS part-00000-311049cf-3e48-4286-b93c-7d2096a18678-c000.csv
</code></pre></div>
<p>There are two more hidden CRC files that I'm not showing here. The <code>part-00000-31hxxxxxxxxx.csv</code> is the actual data file which has the data from the new dataset.</p>
<p>You can also create a <code>json</code> file by running</p>
<div class="highlight"><pre><span></span><code>newDs.write().json("processed-data");
</code></pre></div>
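<p>And that will create a folder with the JSON file and the <code>_SUCCESS</code> file inside it. Note that, by default, Spark throws an error if the output directory already exists, so either write to a different directory than the CSV output or set a save mode such as <code>mode("overwrite")</code> on the writer.</p>
<p>You can also save this data to an external database if you want to. You'll use the <code>jdbc()</code> method along with the connection string and the table name, and Spark will write it to the DB.</p>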
<p><img alt="Parquet Logo" class="aligncenter" src="https://sdtimes.com/wp-content/uploads/2015/04/OGYr_m6J.jpeg" width="300"> </p>
<p>Apart from the CSV and JSON formats, there is one more popular data format in the Data Science and Big Data world. That is <a href="https://parquet.apache.org/">Parquet</a>. Parquet is a columnar data format that is highly optimized and well suited for column-wise operations. It is widely used in a lot of projects in the Big Data ecosystem as a data serialization format. And in Spark, Parquet is the default file storage format. Of course, one main difference between Parquet and formats like CSV and JSON is that Parquet is not meant to be read by humans. It can only be read by a Parquet reader. A sample file looks something like this:</p>
<div class="highlight"><pre><span></span><code>PAR1 �k �>, � 999 1 �5, � 1 2 3 4 5 6 7 8 9 - 0 1 2 3 4 5 6 7 8 < 2 < 2 < 2 < 2 < 2 < 2 < 2 <
.....
</code></pre></div>
<p>Utter gibberish. But Spark can read and understand it. In fact, as Parquet is designed for speed and throughput, reading and writing it can be considerably faster (by some accounts 10-100 times) than with an ordinary data format like CSV or JSON, depending on the type of data.</p>
<p>You save a dataset to Parquet as follows:</p>
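<p>To get an intuition for why a column-oriented layout helps with column-wise operations, here is a toy sketch in plain Python. It is only an analogy under simplifying assumptions: it does not use Spark and it says nothing about Parquet's actual on-disk encoding. The records are the first three rows of the sample data above, with a subset of the columns.</p>

```python
# Three records from the sample data (subset of columns).
rows = [
    {"id": 1, "first_name": "Netti", "gender": "Female"},
    {"id": 2, "first_name": "Nickey", "gender": "Male"},
    {"id": 3, "first_name": "Allayne", "gender": "Male"},
]

# Row-oriented (CSV/JSON style): to read one column we still visit every record.
genders_from_rows = [row["gender"] for row in rows]

# Column-oriented (the idea behind Parquet): each column's values sit together.
columns = {
    "id": [1, 2, 3],
    "first_name": ["Netti", "Nickey", "Allayne"],
    "gender": ["Female", "Male", "Male"],
}
genders_from_columns = columns["gender"]  # one lookup, other columns untouched

print(genders_from_rows == genders_from_columns)  # True
print(genders_from_columns.count("Male"))         # 2
```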
<div class="highlight"><pre><span></span><code>newDs.write().parquet("processed");
</code></pre></div>
<p>And this will save the dataset as a parquet file along with the <code>_SUCCESS</code> status file.</p>
<p>That is all for this article.</p>
<hr>
<p>For more programming articles, check out <a href="http://freblogg.com">Freblogg</a>, <a href="https://freblogg.com/tags/java">Freblogg/Java</a>, <a href="https://freblogg.com/tags/apache20spark">Freblogg/Spark</a></p>
<p>Articles on Apache Spark:</p>
<p><a href="https://freblogg.com/apache-spark-map-vs-flatmap">Map Vs Flat map</a></p>
<p><a href="https://freblogg.com/spark-word-count-with-java">Spark Word count with Java</a></p>
<p><a href="https://freblogg.com/apache-spark-datasets-1">Datasets in Spark | Part I</a></p>
<p><a href="https://freblogg.com/apache-spark-datasets-2">Datasets in Spark | Part II</a></p>
<hr>
<p>This is the 17th article as part of my twitter challenge <a href="https://twitter.com/durgaswaroop/status/944503750340702208">#30DaysOfBlogging</a>. Thirteen more articles on various topics, including but not limited to, <a href="https://freblogg.com/tags/java">Java</a>, <a href="https://freblogg.com/tags/git">Git</a>, <a href="https://freblogg.com/tags/vim">Vim</a>, <a href="https://freblogg.com/tags/software">Software Development</a>, <a href="https://freblogg.com/tags/python">Python</a>, to come.</p>
<p>If you are interested in this, make sure to follow me on Twitter <a href="https://twitter.com/durgaswaroop">@durgaswaroop</a>. </p>
<hr>
<p>If you are interested in contributing to any open source projects and haven't found the right project or if you were unsure on how to begin, I would like to suggest my own project, <a href="https://github.com/durgaswaroop/delorean">Delorean</a> which is a Distributed Version control system, built from scratch in scala. You can contribute not only in the form of code, but also with usage documentation and also by identifying any bugs in its functionality.</p>
<hr>
<p>Thanks for reading. See you again in the next article.</p>Remove Duplicate Elements From An Array2018-01-06T23:35:00+05:302018-01-06T23:35:00+05:30Durga Swaroop Perlatag:freblogg.com,2018-01-06:/remove-duplicate-elements-from-array<p>Interviews are a great place to learn about your strengths and weaknesses, which makes them a great way to improve oneself. In one of my interviews, I was asked to <code>Remove duplicate elements from an array</code>. So, given the array <code>a</code> as below, I've to produce <code>b</code>.</p>
<div class="highlight"><pre><span></span><code>a = {1, -2 …</code></pre></div><p>Interviews are a great place to learn about your strengths and weaknesses, which makes them a great way to improve oneself. In one of my interviews, I was asked to <code>Remove duplicate elements from an array</code>. So, given the array <code>a</code> as below, I've to produce <code>b</code>.</p>
<div class="highlight"><pre><span></span><code>a = {1, -2, 3, 1, 0, 9, 5, 6, 4, 5, 3, 1, 0}
b = {1, -2, 3, 0, 9, 5, 6, 4}
</code></pre></div>
<p>Here <code>b</code> has the same order of elements as <code>a</code>, but per the problem statement, it is not necessary to preserve that order.</p>
<p>I was flustered for a bit after getting the question. It took me a while to get to a proper solution, and not before getting my first solution rejected for using <code>HashMap</code>, which apparently I was not supposed to. I am attributing this mainly to the fact that I was told to write Java code on a piece of paper and not in an IDE. Anyway, I came home after that and decided to try it out and find what others have done online. That is what this article is about.</p>
<p><img alt="Array of Donuts" src="https://source.unsplash.com/qZ6uvJHLHFc/640x427"> </p>
<p>Since that particular interview was in Java, it is only fair that I use Java for the solution here, although I really wanted to do it in Python. Maybe some other time.</p>
<p>Approaches for solving the problem:</p>
<h3 id="approach-1">Approach #1</h3>
<p>The most naive approach would be to just look through the entire array and compare each element to every other element to see if there's a duplicate. Of course, this is inefficient, as its time complexity is <code>O(n^2)</code>. So, let's skip this one and go to the next one.</p>
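<p>For reference, a minimal sketch of that naive comparison could look like the following. It is written in Python rather than Java purely for brevity, and the function name <code>remove_duplicates_naive</code> is made up for this example.</p>

```python
def remove_duplicates_naive(numbers):
    """Keep the first occurrence of each value using nested comparisons: O(n^2)."""
    result = []
    for number in numbers:
        seen_before = False
        # Compare against every element kept so far.
        for kept in result:
            if kept == number:
                seen_before = True
                break
        if not seen_before:
            result.append(number)
    return result


a = [1, -2, 3, 1, 0, 9, 5, 6, 4, 5, 3, 1, 0]
print(remove_duplicates_naive(a))  # [1, -2, 3, 0, 9, 5, 6, 4]
```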
<h3 id="approach-2">Approach #2</h3>
<p>Another approach is to use a HashMap to keep track of the elements seen so far. This is what I tried initially, but it was rejected because I used a HashMap when I wasn't supposed to. The pseudo code would be:</p>
<div class="highlight"><pre><span></span><code>map = new Map // Create map
new_array = []
for number in numbers_array
    if not map.contains(number)
        map += number
        new_array += number
print(new_array)
</code></pre></div>
<p>Of course, since I wrote my implementation of this in Java, I had to make a few modifications, as you need to define the size of an array before you can add elements to it. So, I added a count variable to count the unique elements and then created a new array of that size after the iteration. This requires two loop iterations, but it is still <code>O(n)</code>, which is fine. But alas, I couldn't use this.</p>
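<p>As a rough runnable counterpart to the pseudo code, here is a Python sketch that uses a <code>set</code> in place of the map. It is not the Java version from the interview; Python lists grow dynamically, so the separate counting pass is not needed here. The function name is an assumption for this example.</p>

```python
def remove_duplicates_with_set(numbers):
    """Keep the first occurrence of each value; membership checks are O(1) on average."""
    seen = set()
    result = []
    for number in numbers:
        if number not in seen:
            seen.add(number)
            result.append(number)
    return result


a = [1, -2, 3, 1, 0, 9, 5, 6, 4, 5, 3, 1, 0]
print(remove_duplicates_with_set(a))  # [1, -2, 3, 0, 9, 5, 6, 4]
```

<p>Because elements are appended in the order they are first seen, this version also preserves the input order, matching <code>b</code> from the problem statement.</p>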
<p>And so, then comes my final approach.</p>
<h3 id="approach-3">Approach #3</h3>
<p>The third solution is to first sort the array and then remove duplicates from the sorted array. We can do this because the problem didn't require us to maintain the given input order. Otherwise, we wouldn't have been able to sort the array.</p>
<p>Sorting is easy enough. We just use the built-in sort method, which will sort the array in place.</p>
<div class="highlight"><pre><span></span><code>Arrays.sort(numbers);
</code></pre></div>
<p>Then comes the major part which is removing the duplicates in that sorted array. We can accomplish that by using two pointers <code>i</code>, <code>j</code> on our array. <code>i</code> goes through the entire loop while <code>j</code> is the slow-moving pointer that only changes based on a condition.</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="nc">int</span><span class="w"> </span><span class="n">j</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Slow</span><span class="w"> </span><span class="n">moving</span><span class="w"> </span><span class="k">index</span><span class="w"></span>
<span class="o">//</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">fast</span><span class="w"> </span><span class="n">moving</span><span class="w"> </span><span class="k">index</span><span class="w"> </span><span class="n">that</span><span class="w"> </span><span class="n">loops</span><span class="w"> </span><span class="n">through</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">entire</span><span class="w"> </span><span class="k">array</span><span class="w"></span>
<span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="nc">int</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">numbers</span><span class="p">.</span><span class="n">length</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">numbers</span><span class="o">[</span><span class="n">i</span><span class="o">]</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">numbers</span><span class="o">[</span><span class="n">j</span><span class="o">]</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">j</span><span class="o">++</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">numbers</span><span class="o">[</span><span class="n">j</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">numbers</span><span class="o">[</span><span class="n">i</span><span class="o">]</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="err">}</span><span class="w"></span>
<span class="err">}</span><span class="w"></span>
</code></pre></div>
<p>The <code>j</code> index is basically playing catch up with <code>i</code>. When there is a duplicate element, <code>i</code> moves ahead while <code>j</code> stays back at the first duplicate element and then with <code>numbers[j] = numbers[i]</code>, we assign the next unique value to the <code>j</code> location. After this, our original array has unique elements till index <code>j</code> but after that, we'll have leftover elements. To take care of that, we can create a new array from the numbers array.</p>
<div class="highlight"><pre><span></span><code>int[] result = Arrays.copyOf(numbers, j + 1);
System.out.println(Arrays.toString(result));
</code></pre></div>
<p>And that's it. This will remove all the duplicated elements from the array. To test it, let me run the code:</p>
<div class="highlight"><pre><span></span><code>Input array: [1, -2, 3, 1, 0, 9, 5, 6, 4, 5]
Final result after removing duplicates: [-2, 0, 1, 3, 4, 5, 6, 9]
</code></pre></div>
<p>The sorting of the array can be assumed to be done in <code>O(nlogn)</code>, and the iteration after that is <code>O(n)</code>. Put together, the sort dominates, so you get <code>O(nlogn)</code> overall. Of course, depending on the specific kind of array, the sort might take less time as well. <code>O(nlogn)</code> for the average case is what you finally get.</p>
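<p>For a quick end-to-end check, the same sort-then-compact technique can be sketched in Python. This is only an illustration under my own naming (<code>remove_duplicates</code> is a hypothetical helper, not part of the gist); it mirrors the roles of <code>i</code> and <code>j</code> from the Java loop.</p>

```python
def remove_duplicates(numbers):
    """Return the sorted unique values of numbers using the two-index technique."""
    if not numbers:
        return []
    values = sorted(numbers)  # O(n log n); duplicates become adjacent
    j = 0                     # index of the last unique value written so far
    for i in range(1, len(values)):
        if values[i] != values[j]:
            j += 1
            values[j] = values[i]  # copy the next unique value forward
    return values[: j + 1]    # drop the leftover tail


print(remove_duplicates([1, -2, 3, 1, 0, 9, 5, 6, 4, 5]))
# prints [-2, 0, 1, 3, 4, 5, 6, 9]
```

<p>The slice at the end plays the same role as <code>Arrays.copyOf(numbers, j + 1)</code>.</p>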
<p>The full code is present as a gist:</p>
<script src="https://gist.github.com/durgaswaroop/a3b4ce78d4a1626b14ade072ea941cd2.js"></script>
<p>Let me know if you have any more questions that need answers. That is all for this article.</p>
<hr>
<p>For more programming articles, check out <a href="http://freblogg.com">Freblogg</a>, <a href="https://freblogg.com/tags/java">Freblogg/Java</a></p>
<hr>
<p>This is the 15th article as part of my twitter challenge <a href="https://twitter.com/durgaswaroop/status/944503750340702208">#30DaysOfBlogging</a>. Fifteen more articles on various topics, including but not limited to, <a href="https://freblogg.com/tags/java">Java</a>, <a href="https://freblogg.com/tags/git">Git</a>, <a href="https://freblogg.com/tags/vim">Vim</a>, <a href="https://freblogg.com/tags/software">Software Development</a>, <a href="https://freblogg.com/tags/python">Python</a>, to come.</p>
<p>If you are interested in this, make sure to follow me on Twitter <a href="https://twitter.com/durgaswaroop">@durgaswaroop</a>. While you're at it, go ahead and subscribe <a href="https://medium.com/@durgaswaroop/">here on Medium</a> and to my <a href="http://freblogg.com">other blog</a> as well.</p>
<hr>
<p>If you are interested in contributing to an open source project and haven't found the right one, or if you are unsure how to begin, I would like to suggest my own project, <a href="https://github.com/durgaswaroop/delorean">Delorean</a>, which is a distributed version control system built from scratch in Scala. You can contribute not only code but also usage documentation, and by identifying bugs in its functionality.</p>
<hr>
<p>Thanks for reading. See you again in the next article.</p>
<h2>Reduce Image Size With Python And Tinypng</h2>
<p>2018-01-04T23:11:00+05:30, Durga Swaroop Perla</p>
<p>Whenever I want to upload images with my articles, I first make sure they have the right dimensions and then check the file sizes. If they are too big, I have to compress them. For this compression, I use <a href="https://tinypng.com/">TinyPNG</a>. They compress your images to a small size while keeping the image looking the same. I've tried some other services as well, but TinyPNG is definitely the best, as their compression ratio is quite impressive.</p>
<p>In this article I'll show you how I'm planning to automate the image compression process using TinyPNG's <a href="https://tinypng.com/developers">developer API</a>. And of course we are going to use <a href="https://freblogg.com/tags/python">Python</a>.</p>
<h3 id="setup">Setting up</h3>
<p>First of all, you need to have a developer key to connect to TinyPNG and use their services. So, go to <a href="https://tinypng.com/developers">Developer's API</a> and enter your name and email.</p>
<p><img alt="TinyPNG API registration" class="aligncenter" src="https://raw.githubusercontent.com/durgaswaroop/blogimages/master/Reduce-Image-Size-With-Python-And-TinyPNG/13.tinypng-email-name-form.png"> </p>
<p>Once you've registered, you'll get a mail from TinyPNG with a link. Once you click on it, you'll go to your developer page, which has your API key and your usage information. Do keep in mind that with the free account you can only compress 500 images per month. For someone like me, that's a number I won't be reaching anytime soon. But if you do, you should probably check out their paid plans.</p>
<p><img alt="Developers API key page" class="aligncenter" src="https://raw.githubusercontent.com/durgaswaroop/blogimages/master/Reduce-Image-Size-With-Python-And-TinyPNG/13.tinypng-api-key.png" width="600"> </p>
<p>PS: That's not my real key :D</p>
<h3 id="start">Get started</h3>
<p>Once you have the developer key, you can start compressing images using their service. The full documentation for Python is <a href="https://tinypng.com/developers/reference/python">here</a>.</p>
<p>You start by installing <a href="https://pypi.python.org/pypi/tinify">Tinify</a>, which is TinyPNG's library for compression.</p>
<div class="highlight"><pre><span></span><code>pip install --upgrade tinify
</code></pre></div>
<p>Then we can start using tinify in code by importing it and setting the API key from your developer's page.</p>
<div class="python highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">tinify</span>
<span class="n">tinify</span><span class="o">.</span><span class="n">key</span> <span class="o">=</span> <span class="s1">'API_Key'</span>
</code></pre></div>
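<p>Since the key is a secret, one option is to keep it out of the script and read it from the environment instead. The following is a minimal sketch of that idea; the variable name <code>TINIFY_API_KEY</code> and the helper <code>load_api_key</code> are my own assumptions, not something TinyPNG prescribes.</p>

```python
import os


def load_api_key(env_var="TINIFY_API_KEY"):
    """Read the API key from the environment (the variable name is hypothetical)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Usage (requires the tinify package and a real key):
#   import tinify
#   tinify.key = load_api_key()
```

<p>This way the key never ends up in version control along with the script.</p>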
<p>If you have to send your requests over a proxy, you can set that as well.</p>
<div class="highlight"><pre><span></span><code><span class="n">tinify</span><span class="p">.</span><span class="n">proxy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"http://user:pass@192.168.0.1:8080"</span><span class="w"></span>
</code></pre></div>
<p>Then, you can start compressing your image files. You can upload either PNG or JPEG files and tinify will compress them for you.</p>
<p>For the purpose of this article, I'm going to use the following <code>delorean.jpeg</code> image.</p>
<p><img alt="Delorean uncompressed" class="aligncenter" src="https://raw.githubusercontent.com/durgaswaroop/blogimages/master/Reduce-Image-Size-With-Python-And-TinyPNG/delorean.jpeg"> </p>
<p>And I'll compress this to <code>delorean-compressed.jpeg</code>. For that we'll use the following code:</p>
<div class="highlight"><pre><span></span><code>source = "delorean.jpeg"
destination = "delorean-compressed.jpeg"
original = tinify.from_file(source)
original.to_file(destination)
</code></pre></div>
<p>And that gives me this file:</p>
<p><img alt="Delorean compressed" class="aligncenter" src="https://raw.githubusercontent.com/durgaswaroop/blogimages/master/Reduce-Image-Size-With-Python-And-TinyPNG/delorean-compressed.jpeg"> </p>
<p>If they both look the same, then that is the magic of TinyPNG's compression algorithm. It looks pretty much identical but it did compress it. To verify that, let's print the file sizes.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">os.path</span> <span class="k">as</span> <span class="nn">path</span>
<span class="n">original_size</span> <span class="o">=</span> <span class="n">path</span><span class="o">.</span><span class="n">getsize</span><span class="p">(</span><span class="n">source</span><span class="p">)</span>
<span class="n">compressed_size</span> <span class="o">=</span> <span class="n">path</span><span class="o">.</span><span class="n">getsize</span><span class="p">(</span><span class="n">destination</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">original_size</span><span class="o">/</span><span class="mi">1024</span><span class="p">,</span> <span class="n">compressed_size</span><span class="o">/</span><span class="mi">1024</span><span class="p">,</span> <span class="n">original_size</span><span class="o">/</span><span class="n">compressed_size</span><span class="p">)</span>
</code></pre></div>
<p>And this prints,</p>
<div class="highlight"><pre><span></span><code>29.0029296875 25.3466796875 1.144249662878058
</code></pre></div>
<p>The file was originally 29 KB and after compression it is 25.3 KB, which is fairly good compression for such a small file. The third number is the ratio of the original size to the compressed size. If the original file were bigger, you would likely see even tighter compression.</p>
<p>And since this is the free version, there's a limit on the number of requests we can make. We can keep track of that with a built-in variable <code>compression_count</code>. You can print that after every request to make sure you don't go over the limit.</p>
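<p>To make those numbers easier to read, the arithmetic can be wrapped in a small helper. This is a sketch with a hypothetical function name (<code>compression_stats</code>); the byte counts below are simply the printed KB values multiplied by 1024.</p>

```python
def compression_stats(original_bytes, compressed_bytes):
    """Return sizes in KB, the size ratio, and the percentage saved."""
    original_kb = original_bytes / 1024
    compressed_kb = compressed_bytes / 1024
    ratio = original_bytes / compressed_bytes
    percent_saved = (1 - compressed_bytes / original_bytes) * 100
    return original_kb, compressed_kb, ratio, percent_saved


# 29699 and 25955 bytes correspond to the 29.00 KB and 25.35 KB printed above
original_kb, compressed_kb, ratio, percent_saved = compression_stats(29699, 25955)
print(f"{original_kb:.1f} KB -> {compressed_kb:.1f} KB "
      f"(ratio {ratio:.2f}, {percent_saved:.1f}% smaller)")
```

<p>In a real script you would pass in the values from <code>path.getsize(source)</code> and <code>path.getsize(destination)</code>.</p>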
<div class="highlight"><pre><span></span><code>compressions_this_month = tinify.compression_count
print(compressions_this_month)
</code></pre></div>
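<p>If you want a guard rather than just a printout, you could compare the count against the free limit. A minimal sketch, assuming the 500-per-month free limit mentioned earlier and assuming the count may be unset (<code>None</code>) before the first request; <code>remaining_compressions</code> is a hypothetical helper.</p>

```python
def remaining_compressions(compression_count, monthly_limit=500):
    """Return how many compressions are left this month (never negative)."""
    used = compression_count or 0  # treat an unset count (None) as zero
    return max(0, monthly_limit - used)


# Usage (requires the tinify package):
#   if remaining_compressions(tinify.compression_count) == 0:
#       raise SystemExit("Monthly TinyPNG quota used up")
print(remaining_compressions(12))
# prints 488
```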
<p>You can also compress images from their URLs and store them locally:</p>
<div class="highlight"><pre><span></span><code>original = tinify.from_url("https://raw.githubusercontent.com/durgaswaroop/delorean/master/delorean.jpeg")
</code></pre></div>
<p>And then you can store the compressed file locally just like before.</p>
<p>Apart from just compressing the images, you can also resize them with TinyPNG's API. We'll cover that in tomorrow's article <a href="">here</a>.</p>
<p>So, that is all for this article.</p>
<hr>
<p>For more programming articles, check out <a href="http://freblogg.com">Freblogg</a>, <a href="https://freblogg.com/tags/python">Freblogg/Python</a></p>
<p>Some articles on automation:</p>
<p><a href="https://medium.com/@durgaswaroop/web-scraping-with-python-introduction-7b3c0bbb6053">Web Scraping For Beginners with Python</a></p>
<p><a href="https://medium.com/@durgaswaroop/my-semi-automated-blogging-workflow-62cba2827986">My semi automated workflow for blogging</a></p>
<p><a href="https://medium.com/@durgaswaroop/publish-articles-on-blogger-in-just-one-second-2ef45586901">Publish articles to Blogger automatically</a></p>
<p><a href="https://freblogg.com/publish-articles-to-your-medium-blog">Publish articles to Medium automatically</a></p>
<hr>
<p>This is the 13th article as part of my twitter challenge <a href="https://twitter.com/durgaswaroop/status/944503750340702208">#30DaysOfBlogging</a>. Seventeen more articles on various topics, including but not limited to, <a href="https://freblogg.com/tags/java">Java</a>, <a href="https://freblogg.com/tags/git">Git</a>, <a href="https://freblogg.com/tags/vim">Vim</a>, <a href="https://freblogg.com/tags/software">Software Development</a>, <a href="https://freblogg.com/tags/python">Python</a>, to come.</p>
<p>If you are interested in this, make sure to follow me on Twitter <a href="https://twitter.com/durgaswaroop">@durgaswaroop</a>. </p>
<hr>
<p>Thanks for reading. See you again in the next article.</p>
<h2>Datasets In Apache Spark | Part 2</h2>
<p>2018-01-02T18:43:00+05:30, Durga Swaroop Perla</p>
<p>In the last two tutorials we covered <a href="https://freblogg.com/spark-word-count-with-java">what Apache Spark</a> is and got ourselves familiar with <a href="https://freblogg.com/apache-spark-datasets-1">Datasets in Apache Spark</a>, the primary data abstraction in Spark. In this tutorial we will see how to read a data file as a parametrized Bean object Dataset using Encoders.</p>
<p><img alt="Spark Image Logo" class="aligncenter" src="https://redislabs.com/wp-content/uploads/2016/12/spark.png" width="450"> </p>
<p>This tutorial is going to be short, but it is very important, as you will find yourself doing this frequently. In the last article you saw how to read a CSV or JSON file as a <code>Dataset</code>. You might have noticed that we were using <code>Dataset<Row></code> for everything. If you're not familiar with <a href="https://docs.oracle.com/javase/tutorial/java/generics/why.html">Generics</a> in Java, <code>Dataset<Row></code> can be thought of as a Dataset consisting of Row objects. The Row object is a Spark SQL class and is the default when creating a Dataset.</p>
<p>Although the <code>Row</code> class has some useful methods, as a generic object meant to fit all types, it is not ideal for everything. Since Datasets usually store data that corresponds to a Bean class, it is better to create a Dataset of that bean class instead of Row. With this, you'll have access to all the usual getters and setters of the bean class. That's what we'll do in this article: we'll create a Dataset of POJOs instead of Row objects.</p>
<p>I'm using the same <code>fake-people.csv</code> file that I used in the last article that looks like this:</p>
<div class="highlight"><pre><span></span><code>id,first_name,last_name,email,gender,ip_address
1,Netti,McKirdy,nmckirdy0@slideshare.net,Female,148.3.248.193
2,Nickey,Curreen,ncurreen1@tripadvisor.com,Male,206.9.48.216
3,Allayne,Chatainier,achatainier2@trellian.com,Male,191.118.4.217
...
</code></pre></div>
<p>To represent this data, I've created a POJO called <code>FakePeople.java</code>, which looks like this:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">lombok.Data</span><span class="p">;</span>
<span class="n">public</span> <span class="nd">@Data</span> <span class="k">class</span> <span class="nc">FakePeople</span> <span class="p">{</span>
<span class="c1">// Fields are non-final so that Lombok generates setters and a no-args constructor for the bean</span>
<span class="n">private</span> <span class="nb">int</span> <span class="nb">id</span><span class="p">;</span>
<span class="n">private</span> <span class="n">String</span> <span class="n">firstName</span><span class="p">;</span>
<span class="n">private</span> <span class="n">String</span> <span class="n">lastName</span><span class="p">;</span>
<span class="n">private</span> <span class="n">String</span> <span class="n">email</span><span class="p">;</span>
<span class="n">private</span> <span class="n">String</span> <span class="n">gender</span><span class="p">;</span>
<span class="n">private</span> <span class="n">String</span> <span class="n">ipAddress</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
<p>I'm using <a href="https://projectlombok.org/">Project Lombok</a> here, to generate the required Getters, Setters and other POJO methods. (If you don't know about Lombok, you should definitely check that out. It is quite handy).</p>
<p>We have our POJO now, so let's get a parametrized Dataset. To achieve this we first need to create an <code>Encoder</code>. We do that for the <code>FakePeople</code> class as follows:</p>
<div class="highlight"><pre><span></span><code>Encoder<FakePeople> fakePeopleEncoder = Encoders.bean(FakePeople.class);
</code></pre></div>
<p>This creates an encoder that tells Spark how to map each row of our CSV data to a <code>FakePeople</code> object.</p>
<p>Of course, we need our Spark session variable as well.</p>
<div class="highlight"><pre><span></span><code>// Initialize SparkSession
SparkSession spark = SparkSession.builder().appName("Freblogg-Spark").master("local").getOrCreate();
</code></pre></div>
<p>Now we can go ahead and read the CSV file, very much like the way we did before with just one addition.</p>
<div class="highlight"><pre><span></span><code>// Without Encoder
Dataset<Row> peopleRows = spark.read().option("header", "true").csv("fake-people.csv");
// With Encoder
Dataset<FakePeople> people = spark.read().option("header", "true").csv("fake-people.csv").as(fakePeopleEncoder);
</code></pre></div>
<p>And the output of <code>people.show(5)</code> is the same as what you'd expect.</p>
<div class="highlight"><pre><span></span><code>+---+----------+----------+--------------------+------+--------------+
| id|first_name| last_name| email|gender| ip_address|
+---+----------+----------+--------------------+------+--------------+
| 1| Netti| McKirdy|nmckirdy0@slidesh...|Female| 148.3.248.193|
| 2| Nickey| Curreen|ncurreen1@tripadv...| Male| 206.9.48.216|
| 3| Allayne|Chatainier|achatainier2@trel...| Male| 191.118.4.217|
| 4| Tades| Emmett|temmett3@barnesan...| Male|153.113.87.195|
| 5| Shawn| McGenn|smcgenn4@shop-pro.jp| Male| 247.45.80.68|
+---+----------+----------+--------------------+------+--------------+
</code></pre></div>
<p>As you can see, the only difference in creating the Dataset is <code>.as(fakePeopleEncoder)</code>, and that gets us <code>Dataset<FakePeople></code> instead of <code>Dataset<Row></code>. With that, we now have access to all the getters and setters of the <code>FakePeople</code> class, which we wouldn't otherwise have with a <code>Row</code> object. We'll explore how this is useful in a future tutorial.</p>
<p>For more information on Datasets: <a href="https://spark.apache.org/docs/latest/sql-programming-guide.html">Spark SQL, DataFrames and Datasets Guide</a></p>
<p>That is all for this article.</p>
<hr>
<p>For more programming articles, check out <a href="http://freblogg.com">Freblogg</a>, <a href="https://freblogg.com/tags/java">Freblogg/Java</a>, <a href="https://freblogg.com/tags/apache20spark">Freblogg/Spark</a></p>
<p>Apache Spark articles:</p>
<p><a href="https://freblogg.com/spark-word-count-with-java">Word count with Apache Spark and Java</a></p>
<p><a href="https://freblogg.com/apache-spark-datasets-1">Datasets in Apache Spark | Part 1</a></p>
<p><a href="https://freblogg.com/apache-spark-datasets-2">Datasets in Apache Spark | Part 2</a></p>
<hr>
<p>This is the 11th article as part of my twitter challenge <a href="https://twitter.com/durgaswaroop/status/944503750340702208">#30DaysOfBlogging</a>. Nineteen more articles on various topics, including but not limited to, <a href="https://freblogg.com/tags/java">Java</a>, <a href="https://freblogg.com/tags/git">Git</a>, <a href="https://freblogg.com/tags/vim">Vim</a>, <a href="https://freblogg.com/tags/software">Software Development</a>, <a href="https://freblogg.com/tags/python">Python</a>, to come.</p>
<p>If you are interested in this, make sure to follow me on Twitter <a href="https://twitter.com/durgaswaroop">@durgaswaroop</a>. </p>
<hr>
<p>Thanks for reading. See you again in the next article.</p>
<h2>My (Almost) Fully Automated Blogging Workflow</h2>
<p>2017-12-31T18:07:00+05:30, Durga Swaroop Perla</p>
<p>In the article <a href="https://medium.com/@durgaswaroop/my-semi-automated-blogging-workflow-62cba2827986">My semi automated workflow for blogging</a>, I outlined what my blogging process is like and how I've started to automate it. Of course, at the time of that article, the process was still in its early stages and I hadn't automated everything I do. That's where this article comes in. This is the second attempt at automating my entire blogging workflow.</p>
<p><img alt="Medium blogger python logo" class="aligncenter" src="https://cdn-images-1.medium.com/max/800/1*dUMvQW8ynuO4qw2ceGF0BA.png"></p>
<p>Just to give you some context, here are the things that I do when I'm blogging.</p>
<ol>
<li>Open a markdown file in Vim with the title of the article as the name along with some template text</li>
<li>Open a browser with the html of the newly created markdown file</li>
<li>Convert markdown to html with pandoc several times during the writing process</li>
<li>Once the article is done and html is produced, edit the html to make changes specific to where I'm publishing: <a href="https://medium.com/@durgaswaroop/">Medium</a> or <a href="http://freblogg.com">Blogger</a></li>
<li>Read the tags/labels and other attributes from the file and publish the article as a draft on Medium or Blogger</li>
<li>Once it looks good, Schedule or Publish it (This is a manual process. There's no denying it.)</li>
<li>Finally tweet about the post with the link to the article</li>
</ol>
<p>I have the individual pieces of this process ready. I have already written about them in the following articles.</p>
<p><a href="https://medium.com/@durgaswaroop/my-semi-automated-blogging-workflow-62cba2827986">Semi Automated Blogging Workflow</a></p>
<p><a href="https://medium.com/@durgaswaroop/publish-articles-on-blogger-in-just-one-second-2ef45586901">Publish Articles To Blogger In One Second</a></p>
<p><a href="https://freblogg.com/publish-articles-to-your-medium-blog">Publish Articles To Medium In One Second</a></p>
<p><a href="https://freblogg.com/tweeting-with-python-and-tweepy">Tweeting With Python & Tweepy</a></p>
<p>Now, since the individual pieces are ready, it might seem that everything is done. But, as it turns out (unsurprisingly), the integration is of course a big deal and took a lot more effort than I was expecting. I am documenting that in this article along with the complete flow.</p>
<p>It starts with the script <code>blog-it</code>, which opens Vim for me, opens Chrome and also sets up a process for continuously converting markdown to html.</p>
<script src="https://gist.github.com/durgaswaroop/8ed9a5a55b8629f2180880665866f30e.js"></script>
<p>That script calls <code>blog.py</code>, which is what opens Vim along with the default text template. I would like to put the <a href="https://gist.github.com/durgaswaroop/78c51da2d74944d9e5a936cd18733f85">complete gist</a> here, but it is just too long, so instead I'm showing the meat of the script.</p>
<div class="highlight"><pre><span></span><code># Replace underscores with spaces and title case it
article_title = title.replace("_", " ").title()
# Create the markdown file and add the title
with open(md_file, "w+") as f:
    f.write(generate_comments_header(article_title))
    f.write(article_title)
    f.write("\n")
    f.write("-" * len(title))  # underline as long as the title
    f.write("\n")
    f.write(generate_footer_text())
# Now, create the html file
html_file = title + ".html"
open(html_file, "w").close()
# Start vim with the markdown file open on line #10
subprocess.run(['C:/Program Files (x86)/Vim/vim80/gvim.exe', '+10', md_file])
</code></pre></div>
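<p>To see what ends up in the file, here is a stripped-down sketch of just the title handling. <code>build_markdown_skeleton</code> is a hypothetical helper, and the header and footer are plain string parameters standing in for <code>generate_comments_header</code> and <code>generate_footer_text</code>, whose real output is only in the gist.</p>

```python
def build_markdown_skeleton(title, header="", footer=""):
    """Build the initial markdown text from an underscore-separated title."""
    article_title = title.replace("_", " ").title()
    underline = "-" * len(article_title)  # Setext-style heading underline
    return f"{header}{article_title}\n{underline}\n{footer}"


print(build_markdown_skeleton("my_automated_blogging_workflow"))
```

<p>The underline of dashes makes the first line a level-two heading in markdown, which pandoc then converts to html.</p>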
<p>Then comes <code>m2h</code> which continuously converts markdown to html.</p>
<script src="https://gist.github.com/durgaswaroop/356c3aac4f8ce8f89501693b4d9bcb27.js"></script>
<p>This ends one flow. Next comes publishing. I have broken this down because publishing is a manual step for me unless I can complete the entire article in one sitting, which is never going to be possible. So, once I'm done writing, I'll start the publishing.</p>
<p>I'll run <code>publish.py</code> which depending on the comments in the html publishes it to either <code>Blogger</code> or <code>Medium</code>. Again, I'm only showing a part of it. The full gist is available <a href="https://gist.github.com/durgaswaroop/4a81aabeca3bd91cccb0ceb9bda31663">here</a>.</p>
<div class="highlight"><pre><span></span><code><span class="n">with</span> <span class="n">open</span><span class="p">(</span><span class="n">html_file</span><span class="p">)</span> <span class="kr">as</span> <span class="n">file</span><span class="o">:</span>
    <span class="n">html_file_contents</span> <span class="o">=</span> <span class="n">file</span><span class="p">.</span><span class="n">read</span><span class="p">()</span>
<span class="n">re_comments</span> <span class="o">=</span> <span class="n">re</span><span class="p">.</span><span class="n">compile</span><span class="p">(</span><span class="s">'\s*<!--(.*)-->'</span><span class="p">,</span> <span class="n">re</span><span class="p">.</span><span class="n">DOTALL</span><span class="p">)</span>
<span class="n">comments_text</span> <span class="o">=</span> <span class="n">re_comments</span><span class="p">.</span><span class="n">search</span><span class="p">(</span><span class="n">html_file_contents</span><span class="p">).</span><span class="kr">group</span><span class="p">(</span><span class="mi">1</span><span class="p">).</span><span class="n">strip</span><span class="p">()</span>
<span class="n">comments_parser</span> <span class="o">=</span> <span class="n">CommentParser</span><span class="p">.</span><span class="n">parse_comments</span><span class="p">(</span><span class="n">comments_text</span><span class="p">)</span>
<span class="nf">if</span> <span class="n">comments_parser</span><span class="p">.</span><span class="n">destination</span><span class="p">.</span><span class="n">lower</span><span class="p">()</span> <span class="o">==</span> <span class="s">'blogger'</span><span class="o">:</span>
    <span class="n">blogger_publish</span><span class="p">.</span><span class="n">publish</span><span class="p">(</span><span class="n">html_file</span><span class="p">,</span> <span class="n">comments_parser</span><span class="p">.</span><span class="n">title</span><span class="p">,</span> <span class="n">comments_parser</span><span class="p">.</span><span class="n">labels</span><span class="p">,</span> <span class="n">comments_parser</span><span class="p">.</span><span class="n">post_id</span><span class="p">)</span>
<span class="n">elif</span> <span class="n">comments_parser</span><span class="p">.</span><span class="n">destination</span><span class="p">.</span><span class="n">lower</span><span class="p">()</span> <span class="o">==</span> <span class="s">'medium'</span><span class="o">:</span>
    <span class="n">medium_publish</span><span class="p">.</span><span class="n">publish</span><span class="p">(</span><span class="n">html_file</span><span class="p">,</span> <span class="n">comments_parser</span><span class="p">.</span><span class="n">title</span><span class="p">,</span> <span class="n">comments_parser</span><span class="p">.</span><span class="n">labels</span><span class="p">)</span>
<span class="n">else</span><span class="o">:</span>
    <span class="n">print</span><span class="p">(</span>
        <span class="s">'Unknown destination: '</span> <span class="o">+</span> <span class="n">comments_parser</span><span class="p">.</span><span class="n">destination</span> <span class="o">+</span> <span class="s">'. Supported destinations are Blogger and Medium.'</span><span class="p">)</span>
</code></pre></div>
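<p>The <code>CommentParser</code> class itself lives in the gist and isn't shown here. As a rough idea of what such a parser could do, here is a sketch that assumes the header comment holds one <code>key: value</code> pair per line; that layout, the field names in the usage note, and the function name <code>parse_header_comment</code> are all assumptions on my part.</p>

```python
import re

# Non-greedy match so only the first comment in the file is captured
COMMENT_RE = re.compile(r"\s*<!--(.*?)-->", re.DOTALL)


def parse_header_comment(html):
    """Return the key/value pairs found in the first HTML comment, if any."""
    match = COMMENT_RE.search(html)
    if match is None:
        return {}
    fields = {}
    for line in match.group(1).strip().splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no "key: value" shape
            fields[key.strip().lower()] = value.strip()
    return fields
```

<p>With a header containing a line such as <code>destination: Medium</code>, <code>parse_header_comment(html)["destination"].lower()</code> would give the value used in the dispatch above.</p>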
<p>Then comes the individual publishing scripts that publish to blogger and medium.</p>
<p>For <code>blogger-publish.py</code> (Gist <a href="https://gist.github.com/durgaswaroop/20bef02450137907d01f794ba99b965c">here</a>), I do any required modifications with <code>blogger_modifications.py</code> (Gist <a href="https://gist.github.com/durgaswaroop/dd9dcb1e592751c1aced5f0f42aeedc1">here</a>), which converts some tags to what my Blogger page expects.</p>
<p>Then for <code>medium-publish.py</code> (Gist <a href="https://gist.github.com/durgaswaroop/f6fbcc910ddcc5b3fa7a0c1cdbd57401">here</a>), I take the parameters and publish to Medium as html. No modifications are needed here.</p>
<div class="highlight"><pre><span></span><code><span class="n">access_token_file</span> <span class="o">=</span> <span class="s1">'~/.medium-access-token'</span>
<span class="n">expanded_path</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">expanduser</span><span class="p">(</span><span class="n">access_token_file</span><span class="p">)</span>
<span class="n">with</span> <span class="n">open</span><span class="p">(</span><span class="n">expanded_path</span><span class="p">)</span> <span class="k">as</span> <span class="n">file</span><span class="p">:</span>
    <span class="n">access_token</span> <span class="o">=</span> <span class="n">file</span><span class="o">.</span><span class="n">read</span><span class="p">()</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span>
<span class="n">headers</span> <span class="o">=</span> <span class="n">get_headers</span><span class="p">(</span><span class="n">access_token</span><span class="p">)</span>
<span class="n">user_url</span> <span class="o">=</span> <span class="n">get_user_url</span><span class="p">(</span><span class="n">headers</span><span class="p">)</span>
<span class="c1"># Publish new post</span>
<span class="n">posts_url</span> <span class="o">=</span> <span class="n">user_url</span> <span class="o">+</span> <span class="s1">'posts/'</span>
<span class="n">payload</span> <span class="o">=</span> <span class="n">generate_payload</span><span class="p">(</span><span class="n">title</span><span class="p">,</span> <span class="n">labels</span><span class="p">,</span> <span class="n">html_file</span><span class="p">)</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">request</span><span class="p">(</span><span class="s1">'POST'</span><span class="p">,</span> <span class="n">posts_url</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">payload</span><span class="p">,</span> <span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">)</span>
</code></pre></div>
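<p>The helpers <code>get_headers</code> and <code>generate_payload</code> are defined in the gist. A minimal sketch of what they might look like, assuming Medium's documented post fields (<code>title</code>, <code>contentFormat</code>, <code>content</code>, <code>tags</code>, <code>publishStatus</code>) and a bearer token; unlike the original, this hypothetical <code>generate_payload</code> takes the html text rather than a file name.</p>

```python
import json


def get_headers(access_token):
    """Build request headers for a token-authenticated JSON API call."""
    return {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    }


def generate_payload(title, labels, html_content, publish_status="draft"):
    """Serialize the post fields as a JSON string (hypothetical signature)."""
    return json.dumps({
        "title": title,
        "contentFormat": "html",
        "content": html_content,
        "tags": list(labels),
        "publishStatus": publish_status,  # "draft" keeps the post unpublished
    })
```

<p>Sending the post as a draft by default leaves room for a manual look at the preview before anything goes public.</p>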
<p>Actually, this step sends the article to the site as a draft instead of publishing it. Publishing is a step that I don't know how to automate, as I have to manually take a look at how the article looks in preview. Maybe I should try doing this with Selenium or something like that.</p>
<p>Once I've verified that the post looks good, I publish it, take the URL of the published article and call <code>tweeter.py</code> (Gist <a href="https://gist.github.com/durgaswaroop/2169cd8c89cc6a9d89134a414bb49b9c">here</a>), which opens a Vim file with some default text for the title and URL already filled in, along with some hashtags. I complete the tweet and, once I close the file, it gets published on Twitter.</p>
<p>And that completes the process. Obviously there are still a couple of manual steps. Although I can't eliminate all of them, I might be able to minimize them. So far it looks pretty good, especially given the little effort I've put into this in just one week. Of course, I'll keep tuning it as needed to make it even better, and maybe I'll publish one final article about that.</p>
<p>That is all for this article.</p>
<hr>
<p>For more programming articles, check out <a href="http://freblogg.com">Freblogg</a>, <a href="https://freblogg.com/tags/python">Freblogg/Python</a></p>
<p>Some articles on automation:</p>
<p><a href="https://medium.com/@durgaswaroop/web-scraping-with-python-introduction-7b3c0bbb6053">Web Scraping For Beginners with Python</a></p>
<p><a href="https://medium.com/@durgaswaroop/my-semi-automated-blogging-workflow-62cba2827986">My semi automated workflow for blogging</a></p>
<p><a href="https://medium.com/@durgaswaroop/publish-articles-on-blogger-in-just-one-second-2ef45586901">Publish articles to Blogger automatically</a></p>
<p><a href="https://freblogg.com/publish-articles-to-your-medium-blog">Publish articles to Medium automatically</a></p>
<hr>
<p>This is the 9th article as part of my Twitter challenge <a href="https://twitter.com/durgaswaroop/status/944503750340702208">#30DaysOfBlogging</a>. Twenty-one more articles on various topics, including but not limited to <a href="https://freblogg.com/tags/java">Java</a>, <a href="https://freblogg.com/tags/git">Git</a>, <a href="https://freblogg.com/tags/vim">Vim</a>, <a href="https://freblogg.com/tags/software">Software Development</a>, and <a href="https://freblogg.com/tags/python">Python</a>, are still to come.</p>
<p>If you are interested in this, make sure to follow me on Twitter <a href="https://twitter.com/durgaswaroop">@durgaswaroop</a>. </p>
<hr>
<p>If you are interested in contributing to open source projects and haven't found the right project, or if you are unsure how to begin, I would like to suggest my own project, <a href="https://github.com/durgaswaroop/delorean">Delorean</a>, which is a distributed version control system built from scratch in Scala. You can contribute not only in the form of code, but also with usage documentation and by identifying any bugs in its functionality.</p>
<hr>
<p>Thanks for reading. See you again in the next article.</p>Publish Articles To Medium In One Second2017-12-29T19:57:00+05:302017-12-29T19:57:00+05:30Durga Swaroop Perlatag:freblogg.com,2017-12-29:/publish-articles-to-your-medium-blog<p>In my article <a href="https://medium.com/@durgaswaroop/my-semi-automated-blogging-workflow-62cba2827986">My semi automated workflow for blogging</a>, I talked about my blogging workflow. Two main things (actually one thing) in that flow were not automated, i.e., automatically uploading to Blogger and automatically uploading to Medium. I have talked about the first one <a href="https://medium.com/@durgaswaroop/publish-articles-on-blogger-in-just-one-second-2ef45586901">here …</a></p><p>In my article <a href="https://medium.com/@durgaswaroop/my-semi-automated-blogging-workflow-62cba2827986">My semi automated workflow for blogging</a>, I talked about my blogging workflow. Two main things (actually one thing) in that flow were not automated, i.e., automatically uploading to Blogger and automatically uploading to Medium. I have talked about the first one <a href="https://medium.com/@durgaswaroop/publish-articles-on-blogger-in-just-one-second-2ef45586901">here</a>. This article is about uploading posts to Medium automatically.</p>
<p><img alt="Medium Logo" class="aligncenter" src="https://cdn.hashnode.com/res/hashnode/image/upload/w_400,h_300,c_thumb/z6odfvngwx1gp60murhe/1473332149.png"> </p>
<p>Developer documentation for Medium is a breath of fresh air after the mess that is Google’s APIs. Of course, Google’s APIs are complex because they cover so many different services, but they could’ve done a better job of organizing all that stuff. Anyway, let’s see how you can use Medium’s APIs.</p>
<h3 id="setting-up">Setting Up</h3>
<p>We don’t really need any specific dependencies for what we’re doing in this article. You can do everything with <code>urllib</code>, which is already part of the Python standard library. I’ll be using <code>requests</code> as well to make it a bit simpler, but you can achieve the same without it.</p>
<h3 id="getting-the-access-token">Getting the access token</h3>
<p>To authenticate yourself with Medium, you need to get an access token that you’ll pass along to every request. There are two ways to get that token.</p>
<ol>
<li>Browser-based authentication</li>
<li>Self-issued access tokens</li>
</ol>
<p>Which one you should go with depends on what kind of application you’re trying to build. As you can probably guess based on the title, we’ll be covering the second method in this article. The first method needs an authentication server that can accept a callback from Medium. Since I don’t have that set up at the moment, I’m going with the second option.</p>
<p>The Self-issued access tokens method is quite easy to work with as you directly take the <code>access token</code> without having to have the user authenticate via the browser.</p>
<p>To get the access token, go to <a href="https://medium.com/me/settings">Profile Settings</a> and scroll down till you see the <code>Integration tokens</code> section.</p>
<p><img alt="Medium Integration tokens" class="aligncenter" src="https://raw.githubusercontent.com/durgaswaroop/blogimages/master/7.Medium-integration-tokens-section.png"> </p>
<p>There, enter a description of what you’re going to use this token for and click on <code>Get integration token</code>. Copy the generated token, which looks something like <code>181d415f34379af07b2c11d144dfbe35d</code>, and save it somewhere to be used in your program.</p>
<h3 id="using-access-token-to-access-medium">Using Access token to access Medium</h3>
<p>Once you have the access token, you’ll use that token as your password and send it along with every request to get the required data.</p>
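<p>Let’s get started then. As I’ve said, we’ll be using the <code>requests</code> library for URL connections, along with <code>urllib.request</code> (imported as <code>ureq</code>, the name used in the snippets that follow). We’ll also be using the <code>json</code> library for parsing the responses. So, let’s import them.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">requests</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="kn">import</span> <span class="nn">urllib.request</span> <span class="k">as</span> <span class="nn">ureq</span>
</code></pre></div>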
<p>Then take the access token you’ve got and put it in a <code>headers</code> dictionary.</p>
<div class="highlight"><pre><span></span><code>access_token = '181d415f34379af07b2c11d144dfbe35d'
headers = {
    'Authorization': "Bearer " + access_token,
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36'
}
</code></pre></div>
<p>The <code>User-Agent</code> in the above dictionary is required, as Medium won’t accept your request otherwise. You don’t have to use the same value as I did.</p>
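<p>If you would rather not hard-code the token in your script, here is a minimal sketch that reads it from an environment variable instead. The variable name <code>MEDIUM_ACCESS_TOKEN</code>, the helper names, and the placeholder <code>User-Agent</code> value are all my own assumptions for illustration; I have not checked which <code>User-Agent</code> values Medium accepts, so swap in one that works for you.</p>

```python
import os

# Hypothetical variable name; use whatever name you export in your shell.
TOKEN_ENV_VAR = "MEDIUM_ACCESS_TOKEN"


def build_headers(token, user_agent="my-blog-publisher/0.1"):
    """Return the headers dictionary sent with every Medium API request."""
    if not token:
        raise ValueError("No access token supplied")
    return {
        "Authorization": "Bearer " + token,
        # Placeholder value (an assumption); replace it if Medium rejects it.
        "User-Agent": user_agent,
    }


def headers_from_environment(environ=os.environ):
    """Build the headers from the environment instead of a hard-coded token."""
    return build_headers(environ.get(TOKEN_ENV_VAR, ""))
```

<p>With the variable exported in your shell, <code>headers = headers_from_environment()</code> gives you a dictionary with the same two keys as the one above.</p>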
<h4 id="validating-the-access-token">Validating the access token</h4>
<p>The first thing to check is whether the <code>access_token</code> is valid. You can do that by making a <code>GET</code> request to <code>https://api.medium.com/v1/me</code> and checking the response.</p>
<div class="highlight"><pre><span></span><code><span class="n">base_url</span> <span class="o">=</span> <span class="s1">'https://api.medium.com/v1/'</span>
<span class="n">me_url</span> <span class="o">=</span> <span class="n">base_url</span> <span class="o">+</span> <span class="s1">'me'</span>
<span class="n">me_req</span> <span class="o">=</span> <span class="n">ureq</span><span class="o">.</span><span class="n">Request</span><span class="p">(</span><span class="n">me_url</span><span class="p">,</span> <span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">)</span>
<span class="n">me_response</span> <span class="o">=</span> <span class="n">ureq</span><span class="o">.</span><span class="n">urlopen</span><span class="p">(</span><span class="n">me_req</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
<span class="n">json_me_response</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">me_response</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">json_me_response</span><span class="p">)</span>
</code></pre></div>
<p>And when I print <code>json_me_response</code>, which holds the parsed JSON, I get the following:</p>
<div class="highlight"><pre><span></span><code><span class="err">{</span><span class="w"></span>
<span class="ss">"data"</span><span class="err">:</span><span class="w"> </span><span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="ss">"id"</span><span class="err">:</span><span class="ss">"5303d74c64f66366f00cb9b2a94f3251bf5adskak7623as"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="ss">"username"</span><span class="err">:</span><span class="ss">"durgaswaroop"</span><span class="p">,</span><span class="w"> </span>
<span class="w"> </span><span class="ss">"name"</span><span class="err">:</span><span class="ss">"Durga swaroop Perla"</span><span class="p">,</span><span class="w"> </span>
<span class="w"> </span><span class="ss">"url"</span><span class="err">:</span><span class="ss">"https://medium.com/@durgaswaroop"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="ss">"imageUrl"</span><span class="err">:</span><span class="ss">"https://cdn-images-1.medium.com/fit/c/400/400/0*qVDXEHT9DDYUOcrj."</span><span class="w"></span>
<span class="w"> </span><span class="err">}</span><span class="w"></span>
<span class="err">}</span><span class="w"></span>
</code></pre></div>
<p>If we get a response like the one above, then we know that the access token we have is valid.</p>
<p>From there, I extract the <code>user_id</code> from the parsed JSON with:</p>
<div class="highlight"><pre><span></span><code>user_id = json_me_response['data']['id']
</code></pre></div>
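<p>The lookup above assumes the token was accepted. As a rough sketch, a small helper can make the failure case explicit for a response body you have already parsed (for example one fetched with <code>requests</code>, which does not raise on a rejected token unless you ask it to). The helper name is hypothetical, and the shape of the error body (an <code>errors</code> list with <code>message</code> fields) is an assumption on my part, so check it against what Medium actually returns.</p>

```python
def extract_user_id(me_response):
    """Return the user id from a parsed /me response, or raise if it is missing."""
    data = me_response.get("data")
    if isinstance(data, dict) and "id" in data:
        return data["id"]
    # Assumed error shape: {"errors": [{"message": "...", "code": 1234}]}
    messages = [err.get("message", "unknown error")
                for err in me_response.get("errors", [])]
    detail = "; ".join(messages) or "no 'data' in response"
    raise ValueError("Access token check failed: " + detail)
```

<p>You would then write <code>user_id = extract_user_id(json_me_response)</code> and get a readable error instead of a <code>KeyError</code> when something is off.</p>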
<h4 id="get-users-publications">Get User’s Publications</h4>
<p>From the above request, we’ve validated that the access token is correct and we’ve also got the <code>user_id</code>. Using that, we can get access to the publications of a user. For that, we have to make a <code>GET</code> request to <code>https://api.medium.com/v1/users/{{userId}}/publications</code>, and you’ll see the list of publications by that user.</p>
<div class="highlight"><pre><span></span><code>user_url = base_url + 'users/' + user_id
# Add the separating slash so the path becomes users/{userId}/publications
publications_url = user_url + '/publications'
publications_req = ureq.Request(publications_url, headers=headers)
publications_response = ureq.urlopen(publications_req).read()
print(publications_response)
</code></pre></div>
<p>I don’t have any publications on my Medium account, so I got an empty array as the response. But if you have some publications, the response will be something like this:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="s">"data"</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="w"></span>
<span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="s">"id"</span><span class="o">:</span><span class="w"> </span><span class="s">"b969ac62a46b"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s">"name"</span><span class="o">:</span><span class="w"> </span><span class="s">"About Medium"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s">"description"</span><span class="o">:</span><span class="w"> </span><span class="s">"What is this thing and how does it work?"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s">"url"</span><span class="o">:</span><span class="w"> </span><span class="s">"https://medium.com/about"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s">"imageUrl"</span><span class="o">:</span><span class="w"> </span><span class="s">"https://cdn-images-1.medium.com/fit/c/200/200/0*ae1jbP_od0W6EulE.jpeg"</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="w"></span>
<span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="s">"id"</span><span class="o">:</span><span class="w"> </span><span class="s">"b45573563f5a"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s">"name"</span><span class="o">:</span><span class="w"> </span><span class="s">"Developers"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s">"description"</span><span class="o">:</span><span class="w"> </span><span class="s">"Medium’s Developer resources"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s">"url"</span><span class="o">:</span><span class="w"> </span><span class="s">"https://medium.com/developers"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="s">"imageUrl"</span><span class="o">:</span><span class="w"> </span><span class="s">"https://cdn-images-1.medium.com/fit/c/200/200/1*ccokMT4VXmDDO1EoQQHkzg@2x.png"</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">]</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
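<p>To do something useful with a response like that, you can parse it and pick out the fields you need. Here is a small sketch that maps each publication name to its id, using a trimmed copy of the sample response above. The helper name <code>publication_ids_by_name</code> is my own and not part of any Medium library.</p>

```python
import json

# Trimmed copy of the sample response shown above.
sample_response = """
{
  "data": [
    {"id": "b969ac62a46b", "name": "About Medium", "url": "https://medium.com/about"},
    {"id": "b45573563f5a", "name": "Developers", "url": "https://medium.com/developers"}
  ]
}
"""


def publication_ids_by_name(response_text):
    """Map each publication name in the response to its id."""
    publications = json.loads(response_text).get("data", [])
    return {pub["name"]: pub["id"] for pub in publications}


print(publication_ids_by_name(sample_response))
```

<p>For an account with no publications, like mine, the same function simply returns an empty dictionary.</p>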
<p>Now, one weird thing about Medium’s API is that it doesn’t have a <code>GET</code> for posts. From the API we can get a list of all the publications, but we can’t get a user’s posts. You can only publish a new post. Although it is odd for that to be missing, it is not something I’m looking for anyway, as I am only interested in publishing an article. But if you need that, you should probably check whether there are any hacky ways of achieving the same (at your own risk).</p>
<h4 id="create-a-new-post">Create a New Post</h4>
<p>To create a new post, we have to make a <code>POST</code> request to <code>https://api.medium.com/v1/users/{{authorId}}/posts</code>. The <code>authorId</code> here would be the same as the <code>userId</code> of the user whose access-token you have.</p>
<p>I’m using the <code>requests</code> library for this, as making a <code>POST</code> request becomes easy with it. Of course, first you need to create a payload to be uploaded. The payload should look something like the following, as described <a href="https://github.com/Medium/medium-api-docs#33-posts">here</a>:</p>
<div class="highlight"><pre><span></span><code> {
"title": "Liverpool FC",
"contentFormat": "html",
"content": "<span class="nt"><h1></span>Liverpool FC<span class="nt"></h1><p></span>You’ll never walk alone.<span class="nt"></p></span>",
"tags": ["football", "sport", "Liverpool"],
"publishStatus": "public"
}
</code></pre></div>
<p>So, for this, I did the following:</p>
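<div class="highlight"><pre><span></span><code><span class="n">posts_url</span> <span class="o">=</span> <span class="n">base_url</span> <span class="o">+</span> <span class="s1">'users/'</span> <span class="o">+</span> <span class="n">user_id</span> <span class="o">+</span> <span class="s1">'/posts'</span>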
<span class="n">payload</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'title'</span><span class="p">:</span> <span class="s1">'Medium Test Post'</span><span class="p">,</span>
<span class="s1">'contentFormat'</span><span class="p">:</span> <span class="s1">'markdown'</span><span class="p">,</span>
<span class="s1">'tags'</span><span class="p">:</span> <span class="p">[</span><span class="s1">'medium'</span><span class="p">,</span> <span class="s1">'test'</span><span class="p">,</span> <span class="s1">'python'</span><span class="p">],</span>
<span class="s1">'publishStatus'</span><span class="p">:</span> <span class="s1">'draft'</span><span class="p">,</span>
<span class="s1">'content'</span><span class="p">:</span> <span class="n">open</span><span class="p">(</span><span class="s1">'7.Test_post.md'</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
<span class="p">}</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">request</span><span class="p">(</span><span class="s1">'POST'</span><span class="p">,</span> <span class="n">posts_url</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">payload</span><span class="p">,</span> <span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">response</span><span class="o">.</span><span class="n">text</span><span class="p">)</span>
</code></pre></div>
<p>As you can see, for <code>contentFormat</code> I’ve set <code>markdown</code>, and for <code>content</code> I read it straight from the file. I didn’t want to publish this, as it is just a dummy post, so I’ve set the <code>publishStatus</code> to <code>draft</code>. And sure enough, it works as expected and I can see this draft added to my account.</p>
<p><img alt="Draft post" class="aligncenter" src="https://raw.githubusercontent.com/durgaswaroop/blogimages/master/7.Medium-draft-posts.png"> </p>
<p>Do note that the <code>title</code> in the payload object won’t actually be the title of the article. If you want the article to have a title, add it in the <code>content</code> itself as a <code><h*></code> tag.</p>
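<p>Since the heading has to live inside the content, one way to avoid forgetting it is to build the payload with a small helper that prepends the title for you. This is only a sketch for the Markdown case, where a top-level heading is written as <code># Title</code>. The function name <code>build_payload</code> is my own invention, and the payload keys are the ones from the example above.</p>

```python
def build_payload(title, markdown_body, tags, publish_status="draft"):
    """Build a post payload whose content starts with the title as a heading."""
    # Repeat the title as a top-level Markdown heading at the start of the body.
    content = "# " + title + "\n\n" + markdown_body.lstrip()
    return {
        "title": title,
        "contentFormat": "markdown",
        "content": content,
        "tags": list(tags),
        "publishStatus": publish_status,
    }


payload = build_payload("Medium Test Post", "Hello *world*.", ["medium", "test", "python"])
print(payload["content"])
```

<p>The resulting dictionary can be passed as <code>data=payload</code> in the same <code>POST</code> request as before.</p>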
<p>The full code is available as a gist.</p>
<script src="https://gist.github.com/durgaswaroop/a0c5e1f772ec231d2254db43e2b26b93.js"></script>
<p>That is all for this article.</p>
<hr>
<p>For more programming and Python articles, check out <a href="http://freblogg.com">Freblogg</a> and <a href="https://freblogg.com/tags/python">Freblogg/Python</a>.</p>
<p>Some articles on automation:</p>
<p><a href="https://medium.com/@durgaswaroop/web-scraping-with-python-introduction-7b3c0bbb6053">Web Scraping For Beginners with Python</a></p>
<p><a href="https://medium.com/@durgaswaroop/my-semi-automated-blogging-workflow-62cba2827986">My semi automated workflow for blogging</a></p>
<hr>
<p>This is the seventh article as part of my Twitter challenge <a href="https://twitter.com/durgaswaroop/status/944503750340702208">#30DaysOfBlogging</a>. Twenty-three more articles on various topics, including but not limited to <a href="https://freblogg.com/tags/java">Java</a>, <a href="https://freblogg.com/tags/git">Git</a>, <a href="https://freblogg.com/tags/vim">Vim</a>, <a href="https://freblogg.com/tags/software">Software Development</a>, and <a href="https://freblogg.com/tags/python">Python</a>, are still to come.</p>
<p>If you are interested in this, make sure to follow me on Twitter <a href="https://twitter.com/durgaswaroop">@durgaswaroop</a>. While you’re at it, go ahead and subscribe to this blog and my <a href="https://medium.com/@durgaswaroop/">blog on Medium</a> as well.</p>
<hr>
<p>If you are interested in contributing to open source projects and haven’t found the right project, or if you are unsure how to begin, I would like to suggest my own project, <a href="https://github.com/durgaswaroop/delorean">Delorean</a>, which is a distributed version control system built from scratch in Scala. You can contribute not only in the form of code, but also with usage documentation and by identifying any bugs in the functionality.</p>
<hr>
<p>Thanks for reading. See you again in the next article.</p>Datasets in Apache Spark | Part 12017-12-27T20:00:00+05:302017-12-27T20:00:00+05:30Durga Swaroop Perlatag:freblogg.com,2017-12-27:/apache-spark-datasets-1<p>In my <a href="https://freblogg.com/spark-word-count-with-java">previous post</a> I have talked about Apache Spark. We have also built an application for counting the number of words in a file, which is the hello world equivalent of the big data world.</p>
<p><img alt="Apache spark Java Logo’s" src="https://redislabs.com/wp-content/uploads/2016/12/spark.png"></p>
<p>It has been over 18 months since that article and spark has changed quite …</p><p>In my <a href="https://freblogg.com/spark-word-count-with-java">previous post</a> I have talked about Apache Spark. We have also built an application for counting the number of words in a file, which is the hello world equivalent of the big data world.</p>
<p><img alt="Apache spark Java Logo’s" src="https://redislabs.com/wp-content/uploads/2016/12/spark.png"></p>
<p>It has been over 18 months since that article, and Spark has changed quite a lot in this time. A new major release, Spark 2.0, came out, and now the latest version is <a href="https://spark.apache.org/news/spark-2-2-1-released.html">2.2.1</a>. And with a new version come new APIs and improvements. In fact, the first thing you’ll probably notice is that you don’t need to create <code>SparkContext</code> or <code>JavaSparkContext</code> objects anymore. The various contexts and configurations have been put together into a new class, <code>SparkSession</code>. You can still access the <code>SparkContext</code> or the <code>SQLContext</code> from the <code>SparkSession</code> object itself. So, you’ll be starting your programs with this now:</p>
<div class="highlight"><pre><span></span><code><span class="nc">SparkSession</span> <span class="n">spark</span> <span class="o">=</span> <span class="nc">SparkSession</span><span class="p">.</span><span class="n">builder</span><span class="p">().</span><span class="n">appName</span><span class="p">(</span><span class="s">"Freblogg-Spark"</span><span class="p">).</span><span class="n">master</span><span class="p">(</span><span class="s">"local"</span><span class="p">).</span><span class="n">getOrCreate</span><span class="p">();</span>
</code></pre></div>
<p>And you can use this <code>spark</code> variable the way you’d use other context variables.</p>
<p>Another change in Spark 2.0 is the heavy emphasis on the <code>Dataset</code> APIs, and for good reason. <em>Datasets are more performant and memory efficient than RDDs</em>. RDDs (Resilient Distributed Datasets) have been pushed to second place now. You can still use RDDs if you want, but Datasets are the preferred API. In fact, Datasets have some nice convenience methods that let us use them even for unstructured data like text. Let’s generate some cool lipsum from <a href="http://www.malevole.com/mv/misc/text/">Malevole</a>. It looks something like this:</p>
<div class="highlight"><pre><span></span><code><span class="nv">Ulysses</span>, <span class="nv">Ulysses</span> <span class="o">-</span> <span class="nv">Soaring</span> <span class="nv">through</span> <span class="nv">all</span> <span class="nv">the</span> <span class="nv">galaxies</span>. <span class="nv">In</span> <span class="nv">search</span> <span class="nv">of</span> <span class="nv">Earth</span>, <span class="nv">flying</span> <span class="nv">in</span> <span class="nv">to</span> <span class="nv">the</span> <span class="nv">night</span>. <span class="nv">Ulysses</span>, <span class="nv">Ulysses</span> <span class="o">-</span> <span class="nv">Fighting</span> <span class="nv">evil</span>
<span class="nv">and</span> <span class="nv">tyranny</span>, <span class="nv">with</span> <span class="nv">all</span> <span class="nv">his</span> <span class="nv">power</span>, <span class="nv">and</span> <span class="nv">with</span> <span class="nv">all</span> <span class="nv">of</span> <span class="nv">his</span> <span class="nv">might</span>. <span class="nv">Ulysses</span> <span class="o">-</span> <span class="nv">no</span><span class="o">-</span><span class="nv">one</span> <span class="k">else</span> <span class="nv">can</span> <span class="k">do</span> <span class="nv">the</span> <span class="nv">things</span> <span class="nv">you</span> <span class="k">do</span>. <span class="nv">Ulysses</span> <span class="o">-</span> <span class="nv">like</span> <span class="nv">a</span> <span class="nv">bolt</span> <span class="nv">of</span>
<span class="nv">thunder</span> <span class="nv">from</span> <span class="nv">the</span> <span class="nv">blue</span>. <span class="nv">Ulysses</span> <span class="o">-</span> <span class="nv">always</span> <span class="nv">fighting</span> <span class="nv">all</span> <span class="nv">the</span> <span class="nv">evil</span> <span class="nv">forces</span> <span class="nv">bringing</span> <span class="nv">peace</span> <span class="nv">and</span> <span class="nv">justice</span> <span class="nv">to</span> <span class="nv">all</span>....
</code></pre></div>
<p>Now, you might try to use an RDD to read this, but let’s see what we can do with Datasets.</p>
<div class="highlight"><pre><span></span><code><span class="nc">Dataset</span><span class="o"><</span><span class="nc">String</span><span class="o">></span> <span class="n">lipsumDs</span> <span class="o">=</span> <span class="n">spark</span><span class="p">.</span><span class="n">read</span><span class="p">().</span><span class="n">textFile</span><span class="p">(</span><span class="s">"fake-text.txt"</span><span class="p">);</span>
<span class="n">lipsumDs</span><span class="p">.</span><span class="n">show</span><span class="p">(</span><span class="mi">5</span><span class="p">);</span>
</code></pre></div>
<p>Here we are reading the text file using the <code>spark</code> object we created earlier, and that gives us a <code>Dataset<String> lipsumDs</code>. The <code>show(5)</code> call on the dataset object prints its first five rows. And we get the following output:</p>
<div class="highlight"><pre><span></span><code><span class="nb">+--------------------+</span><span class="c"></span>
<span class="c">| value|</span>
<span class="nb">+--------------------+</span><span class="c"></span>
<span class="c">|Ulysses</span><span class="nt">,</span><span class="c"> Ulysses </span><span class="nt">...</span><span class="c">|</span>
<span class="c">|Ulysses</span><span class="nt">,</span><span class="c"> Ulysses </span><span class="nt">...</span><span class="c">|</span>
<span class="c">| no</span><span class="nb">-</span><span class="c">one else can </span><span class="nt">...</span><span class="c">|</span>
<span class="c">| always fighting</span><span class="nt">...</span><span class="c">|</span>
<span class="c">| |</span>
<span class="nb">+--------------------+</span><span class="c"></span>
</code></pre></div>
<p>What we see here are the lines of the text file. Each line in the file is now a row in the Dataset. There is now a rich set of functions available to you in Datasets which weren’t available in RDDs. You can filter the rows for certain words, do a count on the table, perform <code>groupBy</code> operations, etc., all like you would on a database table. For a full list of all the available operations on Dataset, read this: <a href="https://spark.apache.org/docs/latest/api/java/index.html">Dataset: Spark Documentation</a>.</p>
<p>I hope that’s enough talk about unstructured data analysis. Let’s get to the main focus of this article, which is using Datasets for structured data. More specifically, <code>csv</code> and <code>JSON</code>. For this tutorial, I am using the data created from <a href="http://www.mockaroo.com/">Mockaroo</a>, an online data generator. I’ve created 1000 csv records that look like this:</p>
<div class="highlight"><pre><span></span><code><span class="n">id</span><span class="p">,</span><span class="n">first_name</span><span class="p">,</span><span class="n">last_name</span><span class="p">,</span><span class="n">email</span><span class="p">,</span><span class="n">gender</span><span class="p">,</span><span class="n">ip_address</span><span class="w"></span>
<span class="mi">1</span><span class="p">,</span><span class="n">Netti</span><span class="p">,</span><span class="n">McKirdy</span><span class="p">,</span><span class="n">nmckirdy0</span><span class="nv">@slideshare</span><span class="p">.</span><span class="n">net</span><span class="p">,</span><span class="n">Female</span><span class="p">,</span><span class="mf">148.3.248.193</span><span class="w"></span>
<span class="mi">2</span><span class="p">,</span><span class="n">Nickey</span><span class="p">,</span><span class="n">Curreen</span><span class="p">,</span><span class="n">ncurreen1</span><span class="nv">@tripadvisor</span><span class="p">.</span><span class="n">com</span><span class="p">,</span><span class="n">Male</span><span class="p">,</span><span class="mf">206.9.48.216</span><span class="w"></span>
<span class="mi">3</span><span class="p">,</span><span class="n">Allayne</span><span class="p">,</span><span class="n">Chatainier</span><span class="p">,</span><span class="n">achatainier2</span><span class="nv">@trellian</span><span class="p">.</span><span class="n">com</span><span class="p">,</span><span class="n">Male</span><span class="p">,</span><span class="mf">191.118.4.217</span><span class="w"></span>
<span class="mi">4</span><span class="p">,</span><span class="n">Tades</span><span class="p">,</span><span class="n">Emmett</span><span class="p">,</span><span class="n">temmett3</span><span class="nv">@barnesandnoble</span><span class="p">.</span><span class="n">com</span><span class="p">,</span><span class="n">Male</span><span class="p">,</span><span class="mf">153.113.87.195</span><span class="w"></span>
<span class="mi">5</span><span class="p">,</span><span class="n">Shawn</span><span class="p">,</span><span class="n">McGenn</span><span class="p">,</span><span class="n">smcgenn4</span><span class="nv">@shop</span><span class="o">-</span><span class="n">pro</span><span class="p">.</span><span class="n">jp</span><span class="p">,</span><span class="n">Male</span><span class="p">,</span><span class="mf">247.45.80.68</span><span class="w"></span>
<span class="mi">6</span><span class="p">,</span><span class="n">Giuseppe</span><span class="p">,</span><span class="n">Scobbie</span><span class="p">,</span><span class="n">gscobbie5</span><span class="nv">@twitter</span><span class="p">.</span><span class="n">com</span><span class="p">,</span><span class="n">Male</span><span class="p">,</span><span class="mf">123.114.131.200</span><span class="w"></span>
<span class="p">...</span><span class="w"></span>
</code></pre></div>
<p>We’ll use this data, which I’ve put in a file named <code>fake-people.csv</code>, to work with Datasets. Let’s create a Dataset out of this csv data.</p>
<div class="highlight"><pre><span></span><code><span class="nc">Dataset</span><span class="o"><</span><span class="nc">Row</span><span class="o">></span> <span class="n">peopleDs</span> <span class="o">=</span> <span class="n">spark</span><span class="p">.</span><span class="n">read</span><span class="p">().</span><span class="n">option</span><span class="p">(</span><span class="s">"header"</span><span class="p">,</span> <span class="s">"true"</span><span class="p">).</span><span class="n">csv</span><span class="p">(</span><span class="s">"fake-people.csv"</span><span class="p">);</span>
<span class="n">peopleDs</span><span class="p">.</span><span class="n">show</span><span class="p">(</span><span class="mi">5</span><span class="p">);</span>
</code></pre></div>
<p>Since our data has column headers, we add <code>.option("header", "true")</code>, and the output is a nicely formatted table of the data with all the columns, like this:</p>
<div class="highlight"><pre><span></span><code><span class="nb">+---+----------+----------+--------------------+------+--------------+</span><span class="c"></span>
<span class="c">| id|first_name| last_name| email|gender| ip_address|</span>
<span class="nb">+---+----------+----------+--------------------+------+--------------+</span><span class="c"></span>
<span class="c">| 1| Netti| McKirdy|nmckirdy0@slidesh</span><span class="nt">...</span><span class="c">|Female| 148</span><span class="nt">.</span><span class="c">3</span><span class="nt">.</span><span class="c">248</span><span class="nt">.</span><span class="c">193|</span>
<span class="c">| 2| Nickey| Curreen|ncurreen1@tripadv</span><span class="nt">...</span><span class="c">| Male| 206</span><span class="nt">.</span><span class="c">9</span><span class="nt">.</span><span class="c">48</span><span class="nt">.</span><span class="c">216|</span>
<span class="c">| 3| Allayne|Chatainier|achatainier2@trel</span><span class="nt">...</span><span class="c">| Male| 191</span><span class="nt">.</span><span class="c">118</span><span class="nt">.</span><span class="c">4</span><span class="nt">.</span><span class="c">217|</span>
<span class="c">| 4| Tades| Emmett|temmett3@barnesan</span><span class="nt">...</span><span class="c">| Male|153</span><span class="nt">.</span><span class="c">113</span><span class="nt">.</span><span class="c">87</span><span class="nt">.</span><span class="c">195|</span>
<span class="c">| 5| Shawn| McGenn|smcgenn4@shop</span><span class="nb">-</span><span class="c">pro</span><span class="nt">.</span><span class="c">jp| Male| 247</span><span class="nt">.</span><span class="c">45</span><span class="nt">.</span><span class="c">80</span><span class="nt">.</span><span class="c">68|</span>
<span class="nb">+---+----------+----------+--------------------+------+--------------+</span><span class="c"></span>
</code></pre></div>
<p>You can read <code>JSON</code> data in much the same way. This time I generated some JSON from <a href="https://www.mockaroo.com/">Mockaroo</a>.</p>
<div class="highlight"><pre><span></span><code><span class="err">{</span><span class="ss">"id"</span><span class="err">:</span><span class="mi">1</span><span class="p">,</span><span class="ss">"first_name"</span><span class="err">:</span><span class="ss">"Zenia"</span><span class="p">,</span><span class="ss">"last_name"</span><span class="err">:</span><span class="ss">"Joberne"</span><span class="p">,</span><span class="ss">"email"</span><span class="err">:</span><span class="ss">"zjoberne0@foxnews.com"</span><span class="p">,</span><span class="ss">"gender"</span><span class="err">:</span><span class="ss">"Female"</span><span class="p">,</span><span class="ss">"ip_address"</span><span class="err">:</span><span class="ss">"214.207.159.43"</span><span class="err">}</span><span class="w"></span>
<span class="err">{</span><span class="ss">"id"</span><span class="err">:</span><span class="mi">2</span><span class="p">,</span><span class="ss">"first_name"</span><span class="err">:</span><span class="ss">"Renard"</span><span class="p">,</span><span class="ss">"last_name"</span><span class="err">:</span><span class="ss">"Kezor"</span><span class="p">,</span><span class="ss">"email"</span><span class="err">:</span><span class="ss">"rkezor1@elpais.com"</span><span class="p">,</span><span class="ss">"gender"</span><span class="err">:</span><span class="ss">"Male"</span><span class="p">,</span><span class="ss">"ip_address"</span><span class="err">:</span><span class="ss">"199.3.18.104"</span><span class="err">}</span><span class="w"></span>
<span class="err">{</span><span class="ss">"id"</span><span class="err">:</span><span class="mi">3</span><span class="p">,</span><span class="ss">"first_name"</span><span class="err">:</span><span class="ss">"Briant"</span><span class="p">,</span><span class="ss">"last_name"</span><span class="err">:</span><span class="ss">"Patel"</span><span class="p">,</span><span class="ss">"email"</span><span class="err">:</span><span class="ss">"bpatel2@odnoklassniki.ru"</span><span class="p">,</span><span class="ss">"gender"</span><span class="err">:</span><span class="ss">"Male"</span><span class="p">,</span><span class="ss">"ip_address"</span><span class="err">:</span><span class="ss">"111.184.217.23"</span><span class="err">}</span><span class="w"></span>
<span class="err">{</span><span class="ss">"id"</span><span class="err">:</span><span class="mi">4</span><span class="p">,</span><span class="ss">"first_name"</span><span class="err">:</span><span class="ss">"Robinett"</span><span class="p">,</span><span class="ss">"last_name"</span><span class="err">:</span><span class="ss">"Heasley"</span><span class="p">,</span><span class="ss">"email"</span><span class="err">:</span><span class="ss">"rheasley3@tiny.cc"</span><span class="p">,</span><span class="ss">"gender"</span><span class="err">:</span><span class="ss">"Female"</span><span class="p">,</span><span class="ss">"ip_address"</span><span class="err">:</span><span class="ss">"21.40.190.226"</span><span class="err">}</span><span class="w"></span>
<span class="err">{</span><span class="ss">"id"</span><span class="err">:</span><span class="mi">5</span><span class="p">,</span><span class="ss">"first_name"</span><span class="err">:</span><span class="ss">"Rosalinda"</span><span class="p">,</span><span class="ss">"last_name"</span><span class="err">:</span><span class="ss">"Glandfield"</span><span class="p">,</span><span class="ss">"email"</span><span class="err">:</span><span class="ss">"rglandfield4@indiegogo.com"</span><span class="p">,</span><span class="ss">"gender"</span><span class="err">:</span><span class="ss">"Female"</span><span class="p">,</span><span class="ss">"ip_address"</span><span class="err">:</span><span class="ss">"26.16.4.132"</span><span class="err">}</span><span class="w"></span>
<span class="err">{</span><span class="ss">"id"</span><span class="err">:</span><span class="mi">6</span><span class="p">,</span><span class="ss">"first_name"</span><span class="err">:</span><span class="ss">"Haslett"</span><span class="p">,</span><span class="ss">"last_name"</span><span class="err">:</span><span class="ss">"Culligan"</span><span class="p">,</span><span class="ss">"email"</span><span class="err">:</span><span class="ss">"hculligan5@meetup.com"</span><span class="p">,</span><span class="ss">"gender"</span><span class="err">:</span><span class="ss">"Male"</span><span class="p">,</span><span class="ss">"ip_address"</span><span class="err">:</span><span class="ss">"201.191.72.10"</span><span class="err">}</span><span class="w"></span>
<span class="p">....</span><span class="w"></span>
</code></pre></div>
<blockquote>
<p>Note: By default, Spark reads JSON only in this format, with one object per line. Otherwise you will see <code>_corrupt_record</code> when you print your dataset. That’s your cue to make sure the JSON is formatted the way Spark needs.</p>
</blockquote>
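<p>If your JSON is a pretty-printed array instead, it can be flattened into the one-object-per-line shape before Spark reads it. Here is a minimal sketch in plain Python, outside of Spark. The helper name <code>to_json_lines</code> and the sample records are my own, made up for illustration.</p>

```python
import json


def to_json_lines(json_text):
    """Convert a JSON array of objects into one-object-per-line text."""
    records = json.loads(json_text)
    if isinstance(records, dict):
        # A single top-level object becomes a single line.
        records = [records]
    # Compact separators keep each record on exactly one line.
    return "\n".join(json.dumps(record, separators=(",", ":")) for record in records)


# Hypothetical pretty-printed input, only for illustration.
pretty = """[
  {"id": 1, "first_name": "Zenia"},
  {"id": 2, "first_name": "Renard"}
]"""
print(to_json_lines(pretty))
```

<p>Write the result to a file and you have input in the format shown above.</p>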
<p>You read JSON very much like you read CSV. Since JSON has no header row, we don’t need the header option.</p>
<div class="highlight"><pre><span></span><code><span class="nc">Dataset</span><span class="o"><</span><span class="nc">Row</span><span class="o">></span> <span class="n">peopleJsonDs</span> <span class="o">=</span> <span class="n">spark</span><span class="p">.</span><span class="n">read</span><span class="p">().</span><span class="n">json</span><span class="p">(</span><span class="s">"fake-people.json"</span><span class="p">);</span>
<span class="n">peopleJsonDs</span><span class="p">.</span><span class="n">show</span><span class="p">(</span><span class="mi">5</span><span class="p">);</span>
</code></pre></div>
<p>And the output is,</p>
<div class="highlight"><pre><span></span><code><span class="nb">+--------------------+----------+------+---+--------------+---------+</span><span class="c"></span>
<span class="c">| email|first_name|gender| id| ip_address|last_name|</span>
<span class="nb">+--------------------+----------+------+---+--------------+---------+</span><span class="c"></span>
<span class="c">|psurgison0@istock</span><span class="nt">...</span><span class="c">| Prissie|Female| 1| 48</span><span class="nt">.</span><span class="c">151</span><span class="nt">.</span><span class="c">89</span><span class="nt">.</span><span class="c">171| Surgison|</span>
<span class="c">| rsewell1@jalbum</span><span class="nt">.</span><span class="c">net| Robena|Female| 2| 184</span><span class="nt">.</span><span class="c">16</span><span class="nt">.</span><span class="c">37</span><span class="nt">.</span><span class="c">210| Sewell|</span>
<span class="c">|aluxon2@list</span><span class="nb">-</span><span class="c">mana</span><span class="nt">...</span><span class="c">| Annamarie|Female| 3| 254</span><span class="nt">.</span><span class="c">69</span><span class="nt">.</span><span class="c">187</span><span class="nt">.</span><span class="c">23| Luxon|</span>
<span class="c">|sodoherty3@twitpi</span><span class="nt">...</span><span class="c">| Shannah|Female| 4| 0</span><span class="nt">.</span><span class="c">245</span><span class="nt">.</span><span class="c">101</span><span class="nt">.</span><span class="c">197|O'Doherty|</span>
<span class="c">| alodford4@jigsy</span><span class="nt">.</span><span class="c">com| Alice|Female| 5|70</span><span class="nt">.</span><span class="c">217</span><span class="nt">.</span><span class="c">170</span><span class="nt">.</span><span class="c">182| Lodford|</span>
<span class="nb">+--------------------+----------+------+---+--------------+---------+</span><span class="c"></span>
</code></pre></div>
<p>You can see that the order of the columns has changed: they now come out alphabetically rather than in the order they appear in the file. JSON objects don’t guarantee any field order, so when you read JSON data into a dataset the column order might not be the same as what you’ve given. Of course, if you want to display the columns in a particular order, you can always do a <code>select</code> operation.</p>
<div class="highlight"><pre><span></span><code><span class="n">peopleJsonDs</span><span class="p">.</span><span class="n">select</span><span class="p">(</span><span class="s">"id"</span><span class="p">,</span> <span class="s">"first_name"</span><span class="p">,</span> <span class="s">"last_name"</span><span class="p">,</span> <span class="s">"email"</span><span class="p">,</span> <span class="s">"gender"</span><span class="p">,</span> <span class="s">"ip_address"</span><span class="p">).</span><span class="n">show</span><span class="p">(</span><span class="mi">5</span><span class="p">);</span>
</code></pre></div>
<p>And that would print it in the right order. This is exactly like the <code>SELECT</code> query in SQL, if you’re familiar with it.</p>
<p>Now that we have seen how to create Datasets, let’s see some of the operations we can perform on them.</p>
<h3 id="operations-on-datasets">Operations on Datasets</h3>
<p>Datasets are built on top of DataFrames. So, if you’re already familiar with DataFrames from the Spark 1.x releases, you already know a ton about Datasets. Some of the operations you can perform on a Dataset are as follows:</p>
<h4 id="column-selection">Column selection</h4>
<p>Select one or more columns from the dataset.</p>
<div class="highlight"><pre><span></span><code><span class="n">peopleDs</span><span class="p">.</span><span class="n">select</span><span class="p">(</span><span class="s">"email"</span><span class="p">).</span><span class="n">show</span><span class="p">(</span><span class="mi">5</span><span class="p">);</span> <span class="c1">// Selecting one column</span>
<span class="n">peopleDs</span><span class="p">.</span><span class="n">select</span><span class="p">(</span><span class="n">col</span><span class="p">(</span><span class="s">"email"</span><span class="p">),</span> <span class="n">col</span><span class="p">(</span><span class="s">"gender"</span><span class="p">)).</span><span class="n">show</span><span class="p">(</span><span class="mi">5</span><span class="p">);</span> <span class="c1">// Selecting multiple columns</span>
</code></pre></div>
<blockquote>
<p>Note: <code>col</code> is a static import of org.apache.spark.sql.functions.col;</p>
</blockquote>
<h4 id="filtering-on-columns">Filtering on columns</h4>
<p>Filter a subset of rows in the dataset based on conditions.</p>
<div class="highlight"><pre><span></span><code><span class="c1">// Keep rows whose id is greater than 5 and at most 10</span>
<span class="n">peopleDs</span><span class="p">.</span><span class="n">filter</span><span class="p">(</span><span class="n">col</span><span class="p">(</span><span class="s">"id"</span><span class="p">).</span><span class="n">gt</span><span class="p">(</span><span class="mi">5</span><span class="p">).</span><span class="n">and</span><span class="p">(</span><span class="n">col</span><span class="p">(</span><span class="s">"id"</span><span class="p">).</span><span class="n">leq</span><span class="p">(</span><span class="mi">10</span><span class="p">))).</span><span class="n">show</span><span class="p">();</span>
</code></pre></div>
<h4 id="dropping-columns">Dropping columns</h4>
<p>Remove one or more columns from the dataset.</p>
<div class="highlight"><pre><span></span><code><span class="n">peopleDs</span><span class="p">.</span><span class="n">drop</span><span class="p">(</span><span class="s">"last_name"</span><span class="p">,</span> <span class="s">"ip_address"</span><span class="p">).</span><span class="n">show</span><span class="p">(</span><span class="mi">5</span><span class="p">);</span>
</code></pre></div>
<h4 id="sorting-on-columns">Sorting on columns</h4>
<div class="highlight"><pre><span></span><code><span class="n">peopleDs</span><span class="p">.</span><span class="n">sort</span><span class="p">(</span><span class="n">desc</span><span class="p">(</span><span class="s">"first_name"</span><span class="p">)).</span><span class="n">show</span><span class="p">(</span><span class="mi">5</span><span class="p">);</span>
</code></pre></div>
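<p>That sorts the dataset in descending order of the column <code>first_name</code>. Like <code>col</code>, <code>desc</code> is a static import from <code>org.apache.spark.sql.functions</code>.</p>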
<p>Output:</p>
<div class="highlight"><pre><span></span><code><span class="nb">+---+----------+---------+--------------------+------+-------------+</span><span class="c"></span>
<span class="c">| id|first_name|last_name| email|gender| ip_address|</span>
<span class="nb">+---+----------+---------+--------------------+------+-------------+</span><span class="c"></span>
<span class="c">|685| Zedekiah| Brockie|zbrockiej0@mozill</span><span class="nt">...</span><span class="c">| Male|105</span><span class="nt">.</span><span class="c">119</span><span class="nt">.</span><span class="c">18</span><span class="nt">.</span><span class="c">98|</span>
<span class="c">|308| Zarla| Bryceson|zbryceson8j@redif</span><span class="nt">...</span><span class="c">|Female|55</span><span class="nt">.</span><span class="c">118</span><span class="nt">.</span><span class="c">168</span><span class="nt">.</span><span class="c">15|</span>
<span class="c">|636| Zacherie| Kermon|zkermonhn@prnewsw</span><span class="nt">...</span><span class="c">| Male| 120</span><span class="nt">.</span><span class="c">36</span><span class="nt">.</span><span class="c">10</span><span class="nt">.</span><span class="c">87|</span>
</code></pre></div>
<p>Those are some of the functions that you can use with Datasets. There are several more database-table-style operations on Datasets, like group by, aggregations, joins, etc. We’ll look at them in the next article on Spark, as this article already has a lot of information and I don’t want to overload you.</p>
<p>So, that is all for this article. If you’ve never tried Datasets or DataFrames, I hope this article gave you a good enough introduction to keep you interested in learning more.</p>
<p>The full code is available as a gist.</p>
<script src="https://gist.github.com/durgaswaroop/646ffb6283aa0238277aa16ae0771016.js"></script>
<hr>
<p>This is the fifth article as part of my twitter challenge <a href="https://twitter.com/durgaswaroop/status/944503750340702208">#30DaysOfBlogging</a>. Twenty-five more articles on various topics including but not limited to <a href="https://freblogg.com/tags/java">Java</a>, <a href="https://freblogg.com/tags/git">Git</a>, <a href="https://freblogg.com/tags/vim">Vim</a>, <a href="https://freblogg.com/tags/python">Python</a>, to come.</p>
<p>If you are interested in this, make sure to follow me on Twitter <a href="https://twitter.com/durgaswaroop">@durgaswaroop</a>. </p>
<hr>
<p>If you are interested in contributing to any open source projects and haven’t found the right project or if you were unsure on how to begin, I would like to suggest my own project, <a href="https://github.com/durgaswaroop/delorean">Delorean</a> which is a Distributed Version control system, built from scratch in Scala. You can contribute not only in the form of code, but also with usage documentation and also by identifying any bugs in the functionality.</p>
<hr>
<p>Thanks for reading. See you again in the next article.</p>Tweeting with Python and Tweepy2017-12-25T19:00:00+05:302017-12-25T19:00:00+05:30Durga Swaroop Perlatag:freblogg.com,2017-12-25:/tweeting-with-python-and-tweepy<p>Programmers love to automate things and I'm no exception. I always like to automate my common tasks. Whether it is checking stock prices or checking when the next episode of my favorite show is coming, I've automated scripts for that. Today I am going to add one more …</p><p>Programmers love to automate things and I'm no exception. I always like to automate my common tasks. Whether it is checking stock prices or checking when the next episode of my favorite show is coming, I've automated scripts for that. Today I am going to add one more thing to that list: automated tweeting. I tweet quite frequently and I would love to have a way of automating this as well. And that's exactly what we're going to do today. We are tweeting using Python.</p>
<p><img alt="twitter and python" class="centeralign" src="https://s-media-cache-ak0.pinimg.com/600x315/a8/b5/ae/a8b5aea9cabee52dc57abdc8338fc80c.jpg" width="450"></p>
<p>We'll use a Python library called <code>tweepy</code> for this. <code>Tweepy</code> is a simple, easy-to-use library for accessing the Twitter API.</p>
<p>Accessing Twitter's APIs programmatically is not just a convenience; it can be of enormous value too. Mining Twitter data is one of the key steps in sentiment analysis. Twitter chat bots have also become quite popular nowadays, with hundreds of thousands of bot accounts. This article only barely scratches the surface, but hopefully it will help you build towards that.</p>
<h3 id="settingup">Setting Up</h3>
<p>First things first, install tweepy by running <code>pip install tweepy</code>. The latest version at the time of writing this article is <code>3.5.0</code>.</p>
<p>Then we need to have our Twitter API credentials. Go to <a href="https://apps.twitter.com/">Twitter Apps</a>. If you don't have any apps registered already, go ahead and click the <code>Create New App</code> button.</p>
<p>To register your app you have to provide the following three things:</p>
<ol>
<li>Name of your application</li>
<li>Description</li>
<li>Your website url</li>
</ol>
<p>There is one more option, which is <code>callback URL</code>. You can ignore that for now. Then, after reading the Twitter developer agreement (wink wink), click on the <code>Create your Twitter application</code> button to create a new app.</p>
<p>Once the app is created, you should see it on your Twitter apps page. Click on it and go to the <code>Keys and Access Tokens</code> tab.</p>
<p><img alt="twitter apps tabs" class="centeralign" src="https://raw.githubusercontent.com/durgaswaroop/blogimages/master/Twitter-apps-page-tabs.png" width="600"></p>
<p>There you will see four pieces of information. First you have your app API keys which are <code>consumer key</code> and <code>consumer secret</code>. Then you have your <code>access token</code> and <code>access token secret</code>.</p>
<p>We'll need all of them to access Twitter's APIs. So, have them ready. I have copied all of them and exported them as environment variables. You could do the same or, if you'd like, you can read them from a file as well.</p>
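<p>If you'd rather go the file route, here is a minimal sketch of how that could look. The file name <code>twitter_keys.json</code>, the JSON layout and the helper <code>load_credentials</code> are assumptions of mine and not something Tweepy requires. The demo writes placeholder values to a temporary file, so no real keys are involved.</p>

```python
import json
import os
import tempfile

# The four values shown on the Keys and Access Tokens tab.
REQUIRED_KEYS = ("consumer_key", "consumer_secret", "access_token", "access_token_secret")


def load_credentials(path):
    """Read the four Twitter credentials from a JSON file (hypothetical layout)."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    # Fail early with a readable message if anything is absent or empty.
    missing = [key for key in REQUIRED_KEYS if not data.get(key)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return {key: data[key] for key in REQUIRED_KEYS}


# Demo with a throwaway file and placeholder values (not real keys).
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "twitter_keys.json")
    with open(demo_path, "w", encoding="utf-8") as handle:
        json.dump({key: "placeholder-" + key for key in REQUIRED_KEYS}, handle)
    credentials = load_credentials(demo_path)

print(sorted(credentials))
```

<p>Whichever way you load them, keep the file out of version control, since anyone holding these four values can tweet as you.</p>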
<h3 id="letsgetstarted">Let's get started</h3>
<p>First you have to import <code>tweepy</code> and <code>os</code> (the latter only if you are reading environment variables).</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">tweepy</span>
<span class="kn">import</span> <span class="nn">os</span>
</code></pre></div>
<p>Then I'll populate the access variables by reading them from environment variables.</p>
<div class="highlight"><pre><span></span><code>consumer_key = os.environ["t_consumer_key"]
consumer_secret = os.environ["t_consumer_secret"]
access_token = os.environ["t_access_token"]
access_token_secret = os.environ["t_access_token_secret"]
</code></pre></div>
<p>With the keys ready, we set up the authorization.</p>
<div class="highlight"><pre><span></span><code>authorization = tweepy.OAuthHandler(consumer_key, consumer_secret)
authorization.set_access_token(access_token, access_token_secret)
</code></pre></div>
<p>After authorization we create an API object, <code>twitter</code>.</p>
<div class="highlight"><pre><span></span><code>twitter = tweepy.API(authorization)
</code></pre></div>
<p>And now you can tweet from Python using this <code>twitter</code> object, like this:</p>
<div class="highlight"><pre><span></span><code>twitter.update_status("Tweet using #tweepy")
</code></pre></div>
<p>That is all you have to do. Just five lines of code and you can already tweet. You should try it out and check your twitter account. I just ran this command and this is the tweet.</p>
<blockquote>
<p>Tweet using <a href="https://twitter.com/hashtag/tweepy?src=hash&ref_src=twsrc%5Etfw">#tweepy</a></p>
<p>— Durga Swaroop Perla (@durgaswaroop) <a href="https://twitter.com/durgaswaroop/status/945047842485280768?ref_src=twsrc%5Etfw">December 24, 2017</a></p>
</blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>That's not all; you can also tweet media. Let's tweet again, this time with a picture attached.</p>
<div class="highlight"><pre><span></span><code>image = os.environ['USERPROFILE'] + "\\Pictures\\cubes.jpg"
twitter.update_with_media(image, "Tweet with media using #tweepy")
</code></pre></div>
<p>And this is the media tweet.</p>
<blockquote>
<p>Tweet with media using <a href="https://twitter.com/hashtag/tweepy?src=hash&ref_src=twsrc%5Etfw">#tweepy</a> <a href="https://t.co/9bDuw9DDJI">pic.twitter.com/9bDuw9DDJI</a></p>
<p>— Durga Swaroop Perla (@durgaswaroop) <a href="https://twitter.com/durgaswaroop/status/945049796238118912?ref_src=twsrc%5Etfw">December 24, 2017</a></p>
</blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>When you run the previous commands, you'll see a lot of output printed on the terminal. This is a <code>status</code> object with a lot of useful data like the number of followers you have, your profile picture URL, your location, etc.; pretty much everything you get from your Twitter page. We can make use of this information if we are building something more comprehensive.</p>
<p>Apart from sending regular tweets, you can also reply to existing tweets. To reply to a tweet you'd first need its <code>tweet_id</code> which you can get from the tweet's URL.</p>
<p>For example, the URL for the previous tweet is <a href="https://twitter.com/durgaswaroop/status/945049796238118912">https://twitter.com/durgaswaroop/status/945049796238118912</a> and the <code>tweet_id</code> is <code>945049796238118912</code>.</p>
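<p>If you don't want to copy the id by hand, pulling it out of the URL is easy to script. This is a small sketch using only the standard library. The function name <code>tweet_id_from_url</code> is mine, and it assumes the URL has the <code>/username/status/id</code> shape shown above.</p>

```python
from urllib.parse import urlparse


def tweet_id_from_url(url):
    """Return the numeric id that follows '/status/' in a tweet URL."""
    parts = urlparse(url).path.strip("/").split("/")
    # Expected path shape: <username>/status/<tweet_id>
    if len(parts) >= 3 and parts[1] == "status" and parts[2].isdigit():
        return parts[2]
    raise ValueError("not a tweet URL: " + url)


print(tweet_id_from_url("https://twitter.com/durgaswaroop/status/945049796238118912"))
```

<p>Because <code>urlparse</code> separates out the query string, URLs with tracking parameters such as <code>?ref_src=...</code> at the end work as well.</p>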
<p>Using that id, we can send another tweet as a reply.</p>
<div class="highlight"><pre><span></span><code>id_of_tweet_to_reply = "945049796238118912"
twitter.update_status("Reply to a tweet using #tweepy", in_reply_to_status_id=id_of_tweet_to_reply)
</code></pre></div>
<p>The only change in the syntax is <code>in_reply_to_status_id=id_of_tweet_to_reply</code>, which is passed as the second argument. With that, our new tweet will be added as a reply to the original tweet.</p>
<p>The new reply tweet is this:</p>
<blockquote>
<p>Reply to a tweet using <a href="https://twitter.com/hashtag/tweepy?src=hash&ref_src=twsrc%5Etfw">#tweepy</a></p>
<p>— Durga Swaroop Perla (@durgaswaroop) <a href="https://twitter.com/durgaswaroop/status/945053630129881088?ref_src=twsrc%5Etfw">December 24, 2017</a></p>
</blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>That's how easy it is to access the Twitter API with tweepy. We now know how to tweet and how to reply to a tweet. Building on this knowledge, in a later tutorial I can show you how to create your own Twitter chat bot and also how to do Twitter streaming analysis.</p>
<p>The full code for the things covered in this article is available as a gist:</p>
<script src="https://gist.github.com/durgaswaroop/d16cad3f4e3f8d1976a124aac602f5d2.js"></script>
<hr>
<p>For more programming and Python articles, check out <a href="http://freblogg.com">Freblogg</a> and <a href="https://freblogg.com/tags/python">Freblogg/Python</a>.</p>
<p><a href="https://medium.com/@durgaswaroop/web-scraping-with-python-introduction-7b3c0bbb6053">Web Scraping For Beginner with Python</a></p>
<hr>
<p>This is the third article as part of my twitter challenge <a href="https://twitter.com/durgaswaroop/status/944503750340702208">#30DaysOfBlogging</a>. Twenty-seven more articles on various topics including but not limited to <a href="https://freblogg.com/tags/java">Java</a>, <a href="https://freblogg.com/tags/git">Git</a>, <a href="https://freblogg.com/tags/vim">Vim</a>, <a href="https://freblogg.com/tags/software">Software Development</a>, <a href="https://freblogg.com/tags/python">Python</a>, to come.</p>
<p>If you are interested in this, make sure to follow me on Twitter <a href="https://twitter.com/durgaswaroop">@durgaswaroop</a>. While you're at it, go ahead and subscribe to this blog and my <a href="https://medium.com/@durgaswaroop/">blog on Medium</a> as well.</p>
<hr>
<p>If you are interested in contributing to any open source projects and haven't found the right project or if you were unsure on how to begin, I would like to suggest my own project, <a href="https://github.com/durgaswaroop/delorean">Delorean</a> which is a Distributed Version control system, built from scratch in Scala. You can contribute not only in the form of code, but also with usage documentation and also by identifying any bugs in the functionality.</p>
<hr>
<p>Thanks for reading. See you again in the next article.</p>Sessions in Vim2017-12-23T20:00:00+05:302017-12-23T20:00:00+05:30Durga Swaroop Perlatag:freblogg.com,2017-12-23:/sessions-in-vim<p>I love to have a lot of tabs open at the same time in Vim. Being the deputy Scrum master (Yeah, it is a thing) of our dev-team, I have to keep track of a lot of things going on in the team. I maintain a repository of all the …</p><p>I love to have a lot of tabs open at the same time in Vim. Being the deputy Scrum master (yeah, it is a thing) of our dev-team, I have to keep track of a lot of things going on in the team. I maintain a repository of all the links to product documentation, stories, tasks, etc. I also need to keep track of the discussions that happened in various team meetings. On top of this, as a backend engineer I have my own stories and tasks to manage as well. All of this means that I have a ton of tabs and splits open at any given time in Vim. Something like the following:</p>
<p><img alt="Vim multiple tabs open" class="aligncenter" src="https://raw.githubusercontent.com/durgaswaroop/blogimages/master/vim_multiple_tabs.png"></p>
<p>Now, the problem comes when I have to shut down and restart my computer. All the tabs I have kept open for several days will be gone, and I have to open them all up again and put them in the order I want. <em>Oh, the pain!</em> There has to be a better way!</p>
<p><img alt="There has to be a better way" class="aligncenter" src="https://i.imgur.com/m01MZOg.gif"></p>
<p>Luckily for us, Vim always has a better way to do something. There is an inbuilt feature just for this.</p>
<p>It is called <code>Vim-sessions</code> and with that you can get back all your tabs with just one command. <em>How nice!</em></p>
<h2 id="how-to-create-a-new-session">How to create a new session?</h2>
<p>To create a vim session, run the command <code>:mksession episodes.session</code>. Here <code>episodes</code> is the name of the session I want to create.</p>
<p>In short, <code>:mks <session-name>.session</code>. And that’s it. Your session is now saved. It saves information about what tabs are currently open, what splits are open and even what buffers are open, all into that session file.</p>
<blockquote>
<p>Note: The <code>.session</code> suffix is not needed. But it is the preferred way as you can easily identify the session file.</p>
</blockquote>
<p>Once this is done, you can go ahead and close your tabs as all of that is stored in the session file.</p>
<h2 id="how-to-open-an-existing-session">How to open an existing session?</h2>
<p>The next time you want to open all of those tabs, all you have to do is tell Vim to run that session. You do that by running the command <code>:so <session-file-path></code> (<code>:so</code> is short for <code>:source</code>).</p>
<p>And boom! All of your windows and tabs are back with just one command. You don’t have to have multiple tmux or screen buffers running anymore. Vim can do all of that with just one command.</p>
<p>That is all you need to know about sessions in vim to make yourself productive. You can always try the vim help with <code>:help session-file</code> and find out more.</p>
<hr>
<p>For more Vim articles, check out <a href="https://freblogg.com/tags/vim">Freblogg/Vim</a></p>
<p>Beginner Vim Tutorials - <a href="https://freblogg.com/tags/vimfirstlesson">Your First Lesson In Vim</a></p>
<p>Vim Color scheme used: <a href="http://www.vim.org/scripts/script.php?script_id=1802">Eclipse</a></p>
<hr>
<p>This is the first article as part of my twitter challenge <a href="https://twitter.com/durgaswaroop/status/944503750340702208">#30DaysOfBlogging</a>. Twenty-nine more articles on various topics including but not limited to <a href="https://freblogg.com/tags/java">Java</a>, <a href="https://freblogg.com/tags/git">Git</a>, <a href="https://freblogg.com/tags/vim">Vim</a>, <a href="https://freblogg.com/tags/software">Software Development</a>, <a href="https://freblogg.com/tags/python">Python</a>, to come.</p>
<p>If you are interested in this, make sure to follow me on Twitter <a href="https://twitter.com/durgaswaroop">@durgaswaroop</a>. </p>
<hr>
<p>If you are interested in contributing to any open source projects and haven’t found the right project or if you were unsure on how to begin, I would like to suggest my own project, <a href="https://github.com/durgaswaroop/delorean">Delorean</a> which is a Distributed Version control system, built from scratch in Scala. You can contribute not only in the form of code, but also with usage documentation and also by identifying any bugs in the functionality.</p>
<hr>
<p>Thanks for reading. See you again in the next article.</p>How to recover from 'git reset --hard" | Git2017-09-11T16:00:00+05:302017-09-11T16:00:00+05:30Durga Swaroop Perlatag:freblogg.com,2017-09-11:/how-to-recover-from-git-reset-hard-git<p>Git is an amazingly powerful tool. But, as Uncle Ben said, </p>
<blockquote>
<p>With great power, comes great responsibility</p>
</blockquote>
<p>And that is true for Git as well. If you are not careful when using it, you could easily burn your a**.<br>
So, If something like that happened to you, Or If you …</p><p>Git is an amazingly powerful tool. But, as Uncle Ben said, </p>
<blockquote>
<p>With great power, comes great responsibility</p>
</blockquote>
<p>And that is true for Git as well. If you are not careful when using it, you could easily burn your a**.<br>
So, if something like that has happened to you, or if you want to make sure that it never happens to you, then watch this video. </p>
<iframe allowfullscreen class="YOUTUBE-iframe-video" frameborder="0" height="385" src="https://www.youtube.com/embed/MijDnC4mz9w?feature=player_embedded" width="640">
</iframe>
<p>Subscribe to the channel for more videos like this.</p>Git Cherrypick2017-02-20T14:26:00+05:302017-02-20T14:26:00+05:30Durga Swaroop Perlatag:freblogg.com,2017-02-20:/git-cherrypick<p><strong>Cherrypick</strong> is one of the most useful commands in Git. It is used to apply a commit that is present on another branch, on the current branch. Let's see an example to understand this better. </p>
<p>Let’s say you have two branches <em>feature1</em> and <em>feature2</em> as in the following picture. </p>
<p><img alt="git branches feature1 and feature2" src="https://qphs.fs.quoracdn.net/main-qimg-d7022ab07a79d1c93bb1261cd2bd3bdf-c"> </p>
<p>Now, the green commit 5 on feature2 has some interesting code that you want on feature1. How would you get that? You are probably thinking about merge or rebase. But with those you would also get the other green commits 1–4, which you don’t want.<br>
Cherry-pick to the rescue! </p>
<p>Assuming you are on feature1, all you have to run is </p>
<p><code>git cherry-pick green5</code> (assuming <code>green5</code> is the commit id)</p>
<p>And that’s it. You will have the green5 commit on top of your orange4 commit, as in this picture. </p>
<p><img alt="commit from feature2 cherrypicked on feature1 branch" src="https://qphs.fs.quoracdn.net/main-qimg-1ef6d0807fd45ec07e23ae6cfcbbbca0-c"> </p>
<p>Notice that the green commit is no longer “5” but has become “5′”. This is to show that, though the changes in the commit (the change set, in Git terms) remain the same, Git generates a new commit hash for it, because the hash also takes the parent commit into account. I have used the same colour to show that the content is the same. </p>
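<p>To make the "new hash" point concrete, here is a minimal Python sketch. It is a toy model and not how Git actually computes ids: the function <code>commit_id</code> and the ids <code>green4</code> and <code>orange4</code> are made up for illustration, and a real Git commit hash also covers the tree, author, dates and message. The only thing it shows is that hashing the same change together with a different parent gives a different result.</p>

```python
import hashlib


def commit_id(parent_id: str, change: str) -> str:
    """Toy commit id (an assumption for illustration): hash of parent id plus the change."""
    payload = f"parent {parent_id}\n\n{change}".encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


# The same change set is used both times.
change = "add the interesting code"

green5 = commit_id("green4", change)         # original commit on feature2
green5_prime = commit_id("orange4", change)  # cherry-picked on top of feature1

# Same change, different parent, so the two ids differ.
print(green5 != green5_prime)
```

<p>That is why the picture labels the cherry-picked commit 5′ and not 5.</p>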
<p>And that is all you need to know about cherry-picking. So, go ahead and pick some cherries! </p>
<hr>
<p>Follow <a href="https://twitter.com/durgaswaroop">@durgaswaroop</a> on Twitter.</p>Git Merge Vs. Git Rebase2017-01-31T09:00:00+05:302017-01-31T09:00:00+05:30Durga Swaroop Perlatag:freblogg.com,2017-01-31:/git-merge-vs-git-rebase<p>Merge and Rebase are two strategies available in Git to combine two (or more) branches into one branch.<br>
Let’s say we have two branches <em>feature1</em> and <em>feature2</em> that have diverged from a common commit “<em>a</em>” to have four commits each. </p>
<p><img alt="git two branches" src="https://qphs.fs.quoracdn.net/main-qimg-27b4c373471ccb22663c3189b051dcc3.webp"> </p>
<p>Now we want to combine both the features into a single branch. Merge and Rebase are our options. Let’s see what each of them can do. </p>
<h3 id="git-merge">Git Merge</h3>
<p>Merge will seem like a fairly obvious thing, if you look at the end result. It is pretty much like taking two threads and tying them up in a knot. </p>
<p><img alt="git branches with merge commit" src="https://qphs.fs.quoracdn.net/main-qimg-5c246cb7872aaed9c1243d3fea96b467?convert_to_webp=true"> </p>
<p>Here the merge commit ‘b’ has the information regarding all the commits in feature1 and feature2. So, Merge preserves the history of the repository. </p>
<h3 id="git-rebase">Git Rebase</h3>
<p><em>Rebase</em>, on the other hand, doesn’t preserve the history. It quite literally <em>re-bases</em> one branch on top of the other, i.e., it changes the <em>base</em> of the branch. Let’s see rebasing with the same example.<br>
Let’s say I want to <em>rebase feature1 onto feature2</em>. That means I want all the commits of feature1 to sit on top of the commits of feature2. So, after the rebase, the commit history would look like the following. </p>
<p><img alt="git branches rebased" src="https://qphs.fs.quoracdn.net/main-qimg-531720271c7a9e5ada9047c751c6ab27?convert_to_webp=true"> </p>
<p>As you see in the picture, the base of feature1 which was previously the commit “a”, has been shifted to the green commit “4”. Hence the name <strong>Re-Base.</strong> Here feature1 is sitting on top of feature2 as opposed to being on “a”. </p>
<p>Do note that I have added a <strong><em>′</em></strong> next to the numbers of the feature1 commits, making them 1′, 2′ and so on, to indicate that the <em>orange 1′</em> commit is different from the <em>orange 1</em> commit. This is because each commit, apart from storing the changes to the files, stores information about its parent. So, if the parent of a commit changes, Git treats it as a different commit even if it has exactly the same modifications to the files, which means we have changed the Git commit history. </p>
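<p>Here is a small Python sketch of that idea. It is only a toy model under my own assumptions: <code>commit_id</code> and <code>build_branch</code> are hypothetical helpers, and real Git hashes cover much more than a parent id and a change description. It replays the four orange changes on top of the last green commit and checks that none of the old ids survive.</p>

```python
import hashlib


def commit_id(parent_id, change):
    """Toy commit id (illustrative only): depends on the parent id and the change."""
    payload = f"{parent_id}|{change}".encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def build_branch(base_id, changes):
    """Stack the changes on top of base_id and return the resulting commit ids."""
    ids, parent = [], base_id
    for change in changes:
        parent = commit_id(parent, change)
        ids.append(parent)
    return ids


orange_changes = ["orange 1", "orange 2", "orange 3", "orange 4"]
green_changes = ["green 1", "green 2", "green 3", "green 4"]

a = commit_id("root", "a")                   # the common commit "a"
feature1 = build_branch(a, orange_changes)   # orange 1..4
feature2 = build_branch(a, green_changes)    # green 1..4

# Rebase feature1 onto feature2: replay the same changes on the tip of feature2.
rebased = build_branch(feature2[-1], orange_changes)  # orange 1'..4'

# The changes are identical, but every id is new because each parent changed.
print(set(feature1).isdisjoint(rebased))
```

<p>Since each id depends on its parent, changing the base of the first commit ripples through every commit stacked on top of it.</p>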
<p>Also, anyone who looks at the commit history now would think that feature1 was added after feature2, which is not what actually happened. If this is the end result you’re going for, then it’s absolutely fine, but if you want to show that feature1 and feature2 both started off simultaneously, then you need to use Merge. </p>
<p>Both Merge and Rebase have their pros and cons. Merge keeps the history of the repository but can make it hard for someone to understand and follow what’s going on at a particular stage when there are multiple merges. Rebase on the other hand ‘rewrites’ history (read - creates new history) but makes the repo look cleaner and is much easier to look at. </p>
<p>What you want to use depends on your need. A lot of companies make merges mandatory when adding stuff to master branch because they want to see the history of all changes. And a few companies/Open source projects mandate rebasing as it keeps the flow simple and easy to follow. Use the one that suits your workflow. </p>
<p>Fun Fact:<br>
There is a merge strategy called Octopus merge, where you merge multiple branches into one. For more info on this: <a href="https://freblogg.com/git-octopus-merge">Understanding Git Octopus Merge</a> </p>
<hr>
<p>For more interesting articles, follow me <a href="https://twitter.com/durgaswaroop">@durgaswaroop</a> on Twitter.</p>Understanding Git Octopus Merge2016-12-21T05:00:00+05:302016-12-21T05:00:00+05:30Durga Swaroop Perlatag:freblogg.com,2016-12-21:/git-octopus-merge<p>The code for Git merge is one of the most sophisticated pieces of software ever written. So much goes on inside a merge that it's just mind-boggling. For that alone, Linus could be considered a programming genius. Too bad for other geniuses, he also has "Linux kernel" on his resume :-D.</p>
<p><img alt="Git Logo" class="aligncenter" src="https://git-scm.com/images/logos/downloads/Git-Icon-1788C.png" width="200"></p>
<p>As the title suggests, this article is about the <em>Octopus Merge</em> in Git. For this, I hope you know what a basic Git merge is and what it means to merge. If you're completely unfamiliar with Git, then I've no idea what you're doing here. You had better read up on some Git 101 before jumping into this article.</p>
<p>Anyway, just to brush up, this is how a simple, familiar Git merge goes.</p>
<p>We have a branch called <em>feature</em> that diverged from <em>master</em> at the second commit and went on to have two commits of its own.</p>
<p><img alt="Master and Feature branches in Git" class="aligncenter" src="https://raw.githubusercontent.com/durgaswaroop/blogimages/master/git_branches_master_feature.png" width="500"></p>
<blockquote>
<p><em>Note</em>: For the branch pictures in this article I am using a Git GUI tool called <a href="https://www.gitkraken.com/">Git Kraken</a>. I have been trying it for a few days now and it looks quite promising. I am a fan of its clean and minimalist UI and have been using it extensively for its beautiful visualization of branches. And above all, it is free for personal non-commercial use, so you can try it out for free.</p>
</blockquote>
<p>Now you want to add those cool new changes on the feature branch to master. The way you do it is by merging (let's not talk about rebasing for now; we will look at it another time). So, when you merge, this is what it looks like.</p>
<p><img alt="Git merge master feature" class="aligncenter" src="https://raw.githubusercontent.com/durgaswaroop/blogimages/master/git_merge_master_feature.png" width="500"></p>
<p>This is all the usual stuff that we are all familiar with.</p>
<p>Now there is another type of merge called <strong>the Octopus Merge</strong>. At least some of you must have heard about it, either from an online video or from that colleague in your office who seems to know everything. Either way, the Octopus merge is a really fun way of merging. You probably won't get to do this at work, as a lot of companies think it complicates things, and we all know how much companies hate complexity. Anyway, let's see what it looks like. I have a local Git repository with three branches, <em>branch1</em>, <em>branch2</em> and <em>branch3</em>, along with <em>master</em>. All four of these branches have two extra commits from the point they diverged.</p>
<p><img alt="Git octopus pre image" class="aligncenter" src="https://raw.githubusercontent.com/durgaswaroop/blogimages/master/git_pre_octopus_merge.png" width="500"></p>
<p>Now if you want to merge them, the usual way would be to merge two branches at a time, arriving at the final combination after three merges, like so.</p>
<p><img alt="The usual way to merge branches in Git" class="aligncenter" src="https://raw.githubusercontent.com/durgaswaroop/blogimages/master/git_usual_merge.png" width="500"></p>
<p>This may seem fine, and it might actually be the only way you would think about this if not for the Octopus merge. You have three merge commits here and, as we know, merge commits are noise. They pollute the history of your repository and interrupt the story told by your Git history. So, how about keeping the noise low by having just one merge commit instead of three? How, you ask? Octopus, my friend. All hail the great and mighty Octopus. The way you perform an Octopus merge is by merging all the branches into master at once. To do that, you pass all the branch names to a single <code>git merge</code> command, like this.</p>
<p><img alt="Git merge octopus" class="aligncenter" src="https://raw.githubusercontent.com/durgaswaroop/blogimages/master/git_octopus_merge_output.png" width="500"></p>
<p>This will merge all three branches into master. The branches will look something like this. Do you see the resemblance to an octopus now?</p>
<p><img alt="Octopus Git merge" class="aligncenter" src="https://raw.githubusercontent.com/durgaswaroop/blogimages/master/git_post_octopus_merge.png" width="500"></p>
<p>Now, if you know anything about octopuses, you might be wondering why we only have four legs here while an octopus has 8. Well, you are right. Octopuses do have 8 legs (technically 6, as two of them are used as hands), but 4 is good enough. Actually, any merge can be called an Octopus if you're merging three or more branches.</p>
<p>If you have been using Git for some time, you might be wondering: if Octopus is so freaking cool, why haven't more people heard about it, and why aren't more people using it? Well, you are right, my friend. Octopus is awesome for sure, but as I said, it certainly does complicate things a lot, especially when dealing with merge conflicts. Merging is hard enough as it is when dealing with just two branches. But if you are merging 5 or 10 branches together, it feels like you're doing complex surgery. You have to be really careful in that case, and I am not even sure if any modern GUI tools support 10-way diffing. Also, a lot of people tend to go overboard with Octopus.</p>
<p>Look at this <a href="http://marc.info/?l=linux-kernel&m=139033182525831">message</a> where Linus Torvalds yells (pleasantly) at a guy for creating an Octopus with 66 branches. Imagine that for a second. 66 branches! I wouldn't want to be the guy that handles merge conflicts on that one! Linus aptly says</p>
<blockquote>
<p>that's not an octopus, that's a <strong>Cthulhu</strong> merge</p>
</blockquote>
<p><img alt="Cthulhu Image" class="aligncenter" src="https://upload.wikimedia.org/wikipedia/commons/6/62/Cthulhu_and_R%27lyeh.jpg" width="300"></p>
<p>So, a lot of companies don't really use this. A lot of people won't even consider this for their merge strategies.</p>
<p>A rule of thumb to follow with <strong>Octopus</strong> is to never overdo it. An 8-way octopus merge, though bordering on crazy hard, is fine, but more than that is overkill. Situations where you have to merge more than 5 or 6 branches tend to be very rare, and in those cases maybe you can do an Octopus on a subset of the branches at a time. Or maybe rethink your merging strategy.</p>
<p>Either way, I hope this article helped you understand something new and gives you some ideas for dealing with complex merges. I hope you will educate your peers and colleagues about this new merge and share this article with them.</p>
<p>Well, that is all for this article, folks. See you again in the next one. Until then, goodbye.</p>
<hr>
<p>Special Thanks to <a href="https://www.gitkraken.com/">Git Kraken</a> team at Axosoft for developing a great tool like Kraken.</p>
<p>You can find me as <a href="https://twitter.com/durgaswaroop">@durgaswaroop</a> on Twitter.</p>
<h4 id="attributions">Attributions:</h4>
<p>Cthulhu Image - CC BY-SA 3.0 - https://commons.wikimedia.org/wiki/File:Cthulhu_and_R'lyeh.jpg</p>Navigating In Vim II | Your First Lesson in Vim2016-10-13T12:00:00+05:302016-10-13T12:00:00+05:30Durga Swaroop Perlatag:freblogg.com,2016-10-13:/your-first-lesson-in-vim-4<p>This is the fourth article in the series titled "<a href="https://freblogg.com/tags/vimfirstlesson">Your First Lesson In Vim</a>". These articles are written with the goal of helping new Vim users by teaching the awesomeness of the Vim editor, thereby extending the Vim community. Vim, though quite powerful, has a bad rep for being hard to learn and hard to get started with. So, even when someone is interested in learning Vim, that infamous learning curve seems to scare them off. This series is going to put an end to all of that.</p>
<p><img alt="Vim Logo" class="aligncenter" src="http://wolfrosch.com/_img/works/goodies/icon/vim@2x" width="150"></p>
<p>In the last article, <a href="https://freblogg.com/your-first-lesson-in-vim-3">Navigating in Vim I</a>, we saw a lot of Vim motions. Most of these fall under the category of word motions (<code>:help word-motions</code>). We will learn some more motions in this article. And in case you still haven't tried <a href="http://vim-adventures.com/">Vim Adventures</a>, you should. It will help you a lot with getting the hang of Vim motions and getting around in Vim.</p>
<p id="linemotions">Here is the list of Vim motions for this article.</p>
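<table>
<thead>
<tr><th>Motion</th><th>What it does?</th></tr>
</thead>
<tbody>
<tr><td><code>0</code></td><td>Go to the <em>START</em> of the <em>CURRENT LINE</em></td></tr>
<tr><td><code>^</code></td><td>Go to the <em>FIRST NBC*</em> of the <em>CURRENT LINE</em></td></tr>
<tr><td><code>-</code></td><td>Go to the <em>FIRST NBC*</em> of the <em>PREVIOUS LINE</em></td></tr>
<tr><td><code>+</code></td><td>Go to the <em>FIRST NBC*</em> of the <em>NEXT LINE</em></td></tr>
<tr><td><code>$</code></td><td>Go to the <em>END</em> of the <em>CURRENT LINE</em></td></tr>
<tr><td><code>g_</code></td><td>Go to the <em>LAST NBC*</em> of the <em>CURRENT LINE</em></td></tr>
<tr><td><code>f{char}</code></td><td>Find a character <em>FORWARD</em> in the current line (Usage: to go to the next occurrence of c to the right of the cursor, you type <code>fc</code>)</td></tr>
<tr><td><code>F{char}</code></td><td>Find a character <em>BACKWARD</em> in the current line (Usage: to go to the first occurrence of c to the left of the cursor, you type <code>Fc</code>)</td></tr>
<tr><td><code>t{char}</code></td><td>Like <code>f</code> but places the cursor before the character (Mnemonic: t - till)</td></tr>
<tr><td><code>T{char}</code></td><td>Like <code>F</code> but places the cursor after the character</td></tr>
<tr><td><code>gg</code></td><td>Move the cursor to the first line of the file (compare this with <code>H</code>)</td></tr>
<tr><td><code>G</code></td><td>Move the cursor to the last line of the file (compare this with <code>L</code>)</td></tr>
</tbody>
</table>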
<p>* NBC - Non Blank Character</p>
<p><img alt="Line motions Vim picture" class="aligncenter" src="https://raw.githubusercontent.com/durgaswaroop/Your_First_Lesson_In_Vim/master/pictures/line_motions.png" width="600"></p>
<p>These motions let you move very fast within and between lines. You can go to any character you want on the current line with just 2 or at most 3 keys, which is insanely fast compared to most other text editors. The last two motions (<code>gg</code>, <code>G</code>) are super useful and are certainly two of my most used commands.</p>
<p>Now we have one final set of motions to learn, called Text Object motions (<code>:help object-motions</code>). Text objects are an important concept in Vim and we will cover them in depth in a future article. For now, let's look at these motions.</p>
<h3 id="textmotionstext-object-motions">Text Object Motions</h3>
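<table>
<thead>
<tr><th>Motion</th><th>What it does?</th></tr>
</thead>
<tbody>
<tr><td><code>(</code></td><td>Go to the beginning of the <em>PREVIOUS</em> sentence</td></tr>
<tr><td><code>)</code></td><td>Go to the beginning of the <em>NEXT</em> sentence</td></tr>
<tr><td><code>{</code></td><td>Go one paragraph <em>BACKWARD</em></td></tr>
<tr><td><code>}</code></td><td>Go one paragraph <em>FORWARD</em></td></tr>
</tbody>
</table>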
<p>These four motions are very useful too. Especially if you're a programmer, the <code>{</code> and <code>}</code> will make navigating the code base a breeze.</p>
<p>And with that, we have covered all the basic Vim motions for you to get started. There is just one more important thing you need to know in conjunction with Motions. I haven't told you about this till now because I wanted you to get a full grasp of Vim motions before I explain this. Anyway, here it goes ..</p>
<blockquote>
<p><strong>Every Vim Motion takes a count before it</strong></p>
</blockquote>
<p>That's it. It might seem simple and it is simple, but its usefulness is just immeasurable.</p>
<p>Let's say you have to move eight lines down. You don't have to frantically type <code>jjjjjjjj</code>. Simply type <code>8j</code>. Similarly, <code>4k</code> goes four lines up, <code>6w</code> goes to the sixth word from the cursor, and so on. This is such a useful feature that quite literally the <em>sky is the limit</em> for what you can do with it. Want to go to the second <code>e</code> after the cursor? Try <code>2fe</code> and your cursor lands directly on that <code>e</code>. Similarly, to go to the end of the fifth line counting from the current one (that is, four lines below), just do <code>5$</code> and B.A.M!</p>
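<p>If it helps to see the counting logic spelled out, here is a rough Python model of <code>[count]f{char}</code> and <code>[count]t{char}</code> on a single line. This is only my sketch of the behaviour described above, not Vim's implementation; the function names <code>find_forward</code> and <code>till_forward</code> and the sample line are made up for illustration, and cursor positions are 0-based string indexes.</p>

```python
def find_forward(line: str, cursor: int, char: str, count: int = 1) -> int:
    """Toy model of [count]f{char}: index of the count-th occurrence of char
    to the right of cursor, or cursor itself if there are not enough matches."""
    pos = cursor
    for _ in range(count):
        pos = line.find(char, pos + 1)  # search strictly to the right
        if pos == -1:
            return cursor               # not found: the cursor stays put
    return pos


def till_forward(line: str, cursor: int, char: str, count: int = 1) -> int:
    """Toy model of [count]t{char}: like f, but stop one column before the match."""
    target = find_forward(line, cursor, char, count)
    return cursor if target == cursor else target - 1


line = "the quick brown fox jumped over the fence"

print(find_forward(line, 0, "e"))      # fe  -> 2  (the e in "the")
print(find_forward(line, 0, "e", 2))   # 2fe -> 24 (the e in "jumped")
print(till_forward(line, 0, "e", 2))   # 2te -> 23 (just before that e)
print(find_forward(line, 0, "z"))      # fz  -> 0  (no z, cursor does not move)
```

<p>The count simply repeats the same search, each time starting from where the previous one landed.</p>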
<p>This opens up a whole new world of combinations for you to use and I hope you will make use of all of them. With these motions you can move to any place you want in the file with minimal number of keystrokes and your ultimate aim should be to accomplish everything with the minimum possible number of keystrokes. Be a Vim Ninja and conquer the world!</p>
<p><img alt="Vim ninja image" class="aligncenter" src="https://raw.githubusercontent.com/durgaswaroop/Your_First_Lesson_In_Vim/master/pictures/vim_text_motions.png" width="600px"></p>
<p>Well, That is all for this article folks. Will see you again in the next one. Until then, Keep practicing and Happy Vimming!</p>
<p><a class="navlinkleft" href="https://freblogg.com/your-first-lesson-in-vim-3">← Prev</a></p>
<hr>
<p>For more Vim stuff : <a href="https://freblogg.com/tags/vim">Vim</a></p>
<h4 id="attributions">Attributions:</h4>
<p>Vim Logo - Vim Replacement Icon http://wolfrosch.com/works/goodies/vim (CC BY-NC-ND 3.0)</p>
<p>Vim Ninja image - https://goo.gl/QgTrsY (Originally from Practical Vim by Drew Neil)</p>Navigating in Vim I | Your First Lesson In Vim2016-10-09T00:00:00+05:302016-10-09T00:00:00+05:30Durga Swaroop Perlatag:freblogg.com,2016-10-09:/your-first-lesson-in-vim-3<p>This is the third article in a series titled "<a href="https://freblogg.com/tags/vimfirstlesson">Your First Lesson In Vim</a>". These articles are written with the goal of helping new Vim users by teaching the awesomeness of the Vim editor, thereby extending the Vim community. Vim, though quite powerful, has a bad rep for being hard to learn and hard to get started with. So, even when someone is interested in learning Vim, that infamous learning curve seems to scare them off. Hopefully this series will put those fears to bed.</p>
<p><img alt="Vim Logo" class="aligncenter" src="http://wolfrosch.com/_img/works/goodies/icon/vim@2x" width="150"></p>
<p>In the last article <a href="https://freblogg.com/your-first-lesson-in-vim-2">How To Exit Vim</a>, we have seen what Vim modes are and what they do. So, If you know about Visual Mode, Insert Mode, Command Mode and Normal Mode, then continue with this article. Otherwise take a look again at the previous article. Vim modes are really important to understand this article and the upcoming ones.</p>
<p>I wanted this to be a part of the previous article but since this is really important and has a lot of potential information to discuss, I decided to give this its own full article. We will be spending most of our time in Normal mode here as that is where we will navigate in the file. You have seen how you navigate in Vim using <code>h j k l</code>. If you haven't figured out already, Vim's main philosophy is increasing your productivity and because of this some of the things Vim does might seem different compared to the usual way you are used to in other editors. Using <code>h j k l</code> is one of those things.</p>
<p><img alt="vim navigation" class="aligncenter" src="https://raw.githubusercontent.com/durgaswaroop/Your_First_Lesson_In_Vim/master/pictures/vim_navigation_hjkl.png" width="450"></p>
<p>We covered this in the last article, but as promised I will expand on it here. If you look at your keyboard you will see that <code>h j k l</code> are on your home row (unless you are using a Dvorak keyboard, in which case this article probably won't help you much). Having the navigation keys on the home row is a big advantage, as you barely have to move your fingers to reach them. Going to the arrow keys for navigating is tiresome and time consuming. Don't think so? Well, try it out yourself. Rest your fingers on their normal positions on the home row (<code>a s d f - j k l ;</code>) and try to hit the UP arrow and come back. Did you see the travel involved in that? Do it again and see how your hand moves away from the keys and comes back. Do it 5 more times and tell me if I am wrong when I say it's just unnecessary travel, especially since you have the navigation keys right on the home row in Vim. This is one of the reasons why Vim users are usually pretty fast. They don't keep moving their hands on and off the home row every time they have to go up or down. It could also save you from a potential repetitive strain injury (RSI).</p>
<p><img alt="qwerty keyboard" class="aligncenter" src="https://raw.githubusercontent.com/durgaswaroop/Your_First_Lesson_In_Vim/master/pictures/qwerty_time_waste_small.png" width="600"></p>
<p>So, I strongly advise you to stop using the UP and DOWN arrow keys. To get used to the <code>h j k l</code> keys, try playing <a href="http://vim-adventures.com/">Vim Adventures</a>. It's a fun game where you go around the textland collecting characters using Vim's navigation controls. It will help you practice the <code>h j k l</code> keys, and after just a couple of tries they become muscle memory.</p>
<p>One more thing that you won't see Vim users (#VimRocks) using is the mouse. The argument against the mouse is the same as the one against the arrow keys: you are taking your hands off the home row, which not only breaks your typing flow but is also just plain annoying. With the mouse you're moving your hand even further, which makes it that much worse.</p>
<p>Getting rid of the arrow keys and the mouse is not easy. It certainly takes time to get used to. But once you do, you will be that much faster in your workflow.</p>
<h3 id="vim-motions">Vim Motions</h3>
<p>Vim motions are the amazing things that make Vim users so fast. A motion is just about anything that moves your cursor from one place to another. You already know about <code>h j k l</code>. Apart from those you also have <code>w W b B e E H M L</code>. Let's see what they do.</p>
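<table>
<thead>
<tr><th>Command</th><th>What it does?</th></tr>
</thead>
<tbody>
<tr><td><code>w</code></td><td>Move the cursor to the start of the next word</td></tr>
<tr><td><code>W</code></td><td>Move the cursor to the start of the next WORD</td></tr>
<tr><td><code>b</code></td><td>Move the cursor to the start of the previous word (Mnemonic - <strong>b</strong>ack)</td></tr>
<tr><td><code>B</code></td><td>Move the cursor to the start of the previous WORD</td></tr>
<tr><td><code>e</code></td><td>Move the cursor to the end of the current word (Mnemonic - <strong>e</strong>nd)</td></tr>
<tr><td><code>E</code></td><td>Move the cursor to the end of the current WORD</td></tr>
<tr><td><code>H</code></td><td>Move the cursor to the first line of the current visible screen (Mnemonic - <strong>H</strong>igh)</td></tr>
<tr><td><code>M</code></td><td>Move the cursor to the middle line of the current visible screen (Mnemonic - <strong>M</strong>iddle)</td></tr>
<tr><td><code>L</code></td><td>Move the cursor to the last line of the current visible screen (Mnemonic - <strong>L</strong>ow)</td></tr>
</tbody>
</table>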
<p>The usage of <code>H M L</code> commands should be clear with this image below. They move your cursor to first, middle and last line of the screen respectively.</p>
<p><img alt="vim hml motions" class="aligncenter" src="https://raw.githubusercontent.com/durgaswaroop/Your_First_Lesson_In_Vim/master/pictures/hml_vim.png" width="450"></p>
<p>To understand about <code>w b e</code> and their upper case variants we have to understand how <em>word</em> and <em>WORD</em> are defined in Vim. From the official documentation (<code>:help word</code>),</p>
<blockquote>
<p>A <strong>word</strong> consists of a sequence of letters, digits and underscores, or a sequence of other non-blank characters, separated with white space (spaces, tabs, &lt;EOL&gt;)</p>
<p>A <strong>WORD</strong> consists of a sequence of non-blank characters, separated with white space</p>
</blockquote>
<p>In short, a group of characters without any white space between them is a <em>WORD</em>, and there can be multiple <em>words</em> in it. With that definition, let's take a look at some examples and identify the number of words and the number of WORDS in them.</p>
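<table>
<thead>
<tr><th>Text</th><th># of words</th><th># of WORDS</th></tr>
</thead>
<tbody>
<tr><td>hello world</td><td>2 (hello, world)</td><td>2 (hello, world)</td></tr>
<tr><td>hello-world</td><td>3 (hello, -, world)</td><td>1 (hello-world)</td></tr>
<tr><td>hello_world</td><td>1 (hello_world)</td><td>1 (hello_world)</td></tr>
</tbody>
</table>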
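<p>If you want to check the counts above yourself, the two definitions translate almost directly into regular expressions. Here is a minimal Python sketch, assuming plain ASCII text and Vim's default idea of word characters (letters, digits and underscores; in real Vim this depends on the <code>iskeyword</code> option). The names <code>count_words</code> and <code>count_WORDS</code> are my own, not anything from Vim.</p>

```python
import re

# word: a run of letters, digits and underscores, OR a run of other non-blank characters
SMALL_WORD = re.compile(r"\w+|[^\w\s]+")
# WORD: any run of non-blank characters
BIG_WORD = re.compile(r"\S+")


def count_words(text: str) -> int:
    """Number of Vim-style words in text (approximation, see assumptions above)."""
    return len(SMALL_WORD.findall(text))


def count_WORDS(text: str) -> int:
    """Number of Vim-style WORDS in text."""
    return len(BIG_WORD.findall(text))


for sample in ["hello world", "hello-world", "hello_world"]:
    print(sample, count_words(sample), count_WORDS(sample))
# hello world 2 2
# hello-world 3 1
# hello_world 1 1
```

<p>The hyphen counts as a word of its own because it is a run of "other non-blank characters", while the underscore is part of the word.</p>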
<p>Once you understand the difference between <em>word</em> and <em>WORD</em>, all the motions explained in the first table will be clear. Just to make the foundation firm, try them out yourself. Type something in a file and see what each of the <code>w b e W B E H M L</code> commands does and how it moves the cursor. Most of the motions we have covered in this article are called <em>word motions</em> (<code>:help word-motions</code>). There are some more Vim motion commands that you need to know to navigate quickly and with ease. To keep this article simple, we will end the discussion here and pick it up again in the next article, where we will discuss the other motion commands. So, keep practicing these motions combined with the other commands discussed in <a href="https://freblogg.com/your-first-lesson-in-vim-2">How To Exit Vim</a> and you will be really fast already. Fast like a Puma!</p>
<p>Well, that is all for this article, folks. See you again in the next one. Until then, keep practicing and happy Vimming!</p>
<p><a class="navlinkleft" href="https://freblogg.com/your-first-lesson-in-vim-2">← Prev</a> <a class="navlinkright" href="https://freblogg.com/your-first-lesson-in-vim-4">Next →</a></p>
<hr>
<p>For more Vim stuff : <a href="https://freblogg.com/tags/vim">Vim</a></p>
<h4 id="attributions">Attributions:</h4>
<p>Vim Logo - Vim Replacement Icon http://wolfrosch.com/works/goodies/vim (CC BY-NC-ND 3.0)</p>How to Exit Vim? | Your First Lesson In Vim2016-10-02T17:00:00+05:302016-10-02T17:00:00+05:30Durga Swaroop Perlatag:freblogg.com,2016-10-02:/your-first-lesson-in-vim-2<p>This is the second article in the series titled "<a href="https://freblogg.com/tags/vimfirstlesson">Your First Lesson In Vim</a>". These articles are written with the goal of helping new Vim users by teaching the awesomeness of the Vim editor, thereby extending the Vim community. Vim, though quite powerful, has a bad rep for being hard to learn and hard to get started with. So, even when someone is interested in learning Vim, that infamous learning curve seems to scare them off. This series is going to put an end to all of that.</p>
<p><img alt="Vim Logo" class="aligncenter" src="http://wolfrosch.com/_img/works/goodies/icon/vim@2x" width="150"></p>
<p>In the last article <a href="https://freblogg.com/your-first-lesson-in-vim-1">Introduction & Installation</a> we have seen why Vim is the best and coolest editor ever. Hopefully after watching Damian Conway's YouTube video given at the end of that article, you would agree. In this article we will experience Vim for the first time. We will learn about the various modes of operations in Vim. And, most importantly as the title of the article suggests, we will learn <strong>How to Exit Vim</strong>.</p>
<p>First things first: you have to open Vim. Duh! You can do that either by searching for <code>Vim</code> in your search box or by typing <code>vim</code> into your terminal. If you want to start off with <code>gvim</code>, then open that instead. Your gVim or MacVim would look something like this.</p>
<p><img alt="Gvim startup picture" class="aligncenter" src="https://raw.githubusercontent.com/durgaswaroop/Your_First_Lesson_In_Vim/master/pictures/gvim_start.png" width="450"></p>
<p>If you have already tried to type something, you will have noticed that there is something shady going on here. For example, if you type <code>hello world</code> you might observe that only <code>world</code> is displayed and <code>hello</code> is nowhere to be seen. Try it out for yourself. (Vim starts in Normal mode, where the first few letters are treated as commands; <code>o</code> opens a new line and switches to Insert mode, so only what you type after it shows up.) This happens because of the infamous Vim modes. One of the first things you have to realize while using Vim is that it's not like your typical run-of-the-mill text editor. Vim works a bit differently, and <em>modes</em> are one of the key things that make Vim different. So, let's take a look at them.</p>
<p>Broadly speaking, Vim has four major modes of operation. That number changes depending on who you're talking to, because there are a few more modes that can technically be called sub-modes but that some people insist on treating as separate modes. But to keep things simple here, <strong>4</strong> is the magic number for you and <strong>4</strong> is the answer to Life, the Universe and Everything. Not <em>42</em>, <em>4</em>!</p>
<p>The modes are :</p>
<h3 id="normal-mode">Normal Mode</h3>
<p>Normal mode is the default mode you will be in when you open Vim. Normal mode is used for altering, deleting and formatting text. You won't be <em>inserting</em> any new text into the document in this mode. Normal Mode is the mode you will be spending most of your time in. You can get to Normal mode by pressing <code>ESC</code> from any other mode. One of the main things you will be doing in this mode is moving around your document.</p>
<p>To move around a file you would usually use the arrow keys. In Vim they work the way you expect them to, but Vim recommends using <code>h j k l</code> to move the cursor instead. The reason why this came to be and the advantages of it will be apparent in the <a href="https://freblogg.com/your-first-lesson-in-vim-3">next article</a>, but for now let's see how this works.</p>
<blockquote>
<p><code>h</code> moves the cursor left, <code>l</code> moves it right</p>
<p><code>j</code> moves the cursor down, <code>k</code> moves it up</p>
</blockquote>
<p>The following picture should make the idea clear.</p>
<p><img alt="vim navigation" class="aligncenter" src="https://raw.githubusercontent.com/durgaswaroop/Your_First_Lesson_In_Vim/master/pictures/vim_navigation_hjkl.png" width="450"></p>
<p>If you would rather stick with the arrow keys instead of <code>h j k l</code>, that is fine. A lot of people use Vim this way. But trust me when I say that using <code>h j k l</code> speeds up your workflow a lot. Once you get used to it, you won't want to use the arrow keys anymore. Anyway, we will discuss more about this in the next article.</p>
<p>Working in Normal mode, you will see how easy everything gets in Vim. Here is a quick sneak peek at some commands.</p>
<table>
<thead>
<tr>
<th>Command</th>
<th>What it does</th>
</tr>
</thead>
<tbody>
<tr>
<td>dd</td>
<td>Delete (cut) the current line</td>
</tr>
<tr>
<td>yy</td>
<td>Copy (<em>yank</em>) the current line</td>
</tr>
<tr>
<td>p</td>
<td>Paste the deleted or copied text below the current line</td>
</tr>
<tr>
<td>u</td>
<td>Undo your previous change</td>
</tr>
<tr>
<td>gg</td>
<td>Go to the beginning of the file</td>
</tr>
<tr>
<td>G</td>
<td>Go to the end of the file</td>
</tr>
</tbody>
</table>
<p>From now on you don't have to awkwardly select the full line with your mouse to delete it. All you have to do is press <code>dd</code> and that sucker goes away.</p>
<p>Didn't mean to delete it? No problem. Just hit <code>u</code> and it undoes the delete. No more holding down <code>Ctrl</code> and <code>z</code>. Also, <code>u</code> for <code>undo</code> is so simple to remember. What does <code>Z</code> even mean in <code>Ctrl + z</code>? And how did that become synonymous with Undo?</p>
<p>Similarly, no more <code>Ctrl + c</code> and <code>Ctrl + v</code> to copy and paste. <code>yy</code> and <code>p</code> have got you covered.</p>
<p>So you see, Vim sticks to its philosophy of making you productive. Imagine all the keystrokes you save per day, per year. So, switching to Vim doesn't just improve your productivity, it takes care of your health too. Every key you save keeps you a key away from getting carpal tunnel syndrome and RSI. So, Use Vim - Stay healthy. :]</p>
<p>You might be happy with using <code>Ctrl+c</code> to copy and <code>Ctrl+v</code> to paste in your plain old editor. That's absolutely fine, but Vim offers a simple and easy alternative, and honestly the choice is pretty clear.</p>
<p>Anyways, that is about Normal mode for now. We will discuss more later when required. Let's look at Insert mode.</p>
<h3 id="insert-mode">Insert Mode</h3>
<p>As the name suggests, Insert mode is where you will be inserting text, and that is all you will be doing in here. You enter Insert mode by pressing <code>i</code> in Normal mode. In almost all Vim distributions you should see a noticeable change in the cursor right away: it changes from a block cursor (<code>█</code>) to an I-beam (|). That's your indication that you're in Insert mode. In another tutorial we will see how you can make that even more apparent. Whatever you type in Insert mode is inserted into the file literally. If you type <code>a b c</code>, those characters are typed into the file as you would expect. Contrast this with pressing <code>dd</code> in Normal mode, which doesn't put those characters in the file but instead does something to the file (in this case, a delete operation).</p>
<p>Unlike in other editors, you won't be spending much time here, and in fact I'd advise you to get out of Insert mode once you are done typing. To leave Insert mode you just have to press <code>Esc</code> and you will be back in Normal mode.</p>
<p>Now, let's get to the fun part of Insert mode. Remember when I said you go from Normal mode to Insert mode by pressing <code>i</code>? Well, it turns out that is just one of the ways to get into Insert mode. There are five more ways to enter Insert mode, and you can choose the best one based on what you need. Sounds complicated? Let's list them first.</p>
<table>
<thead>
<tr>
<th>Command</th>
<th>What it does</th>
</tr>
</thead>
<tbody>
<tr>
<td>i</td>
<td>Enters Insert mode with the cursor placed <em>before</em> the current character</td>
</tr>
<tr>
<td>a</td>
<td>Enters Insert mode with the cursor placed <em>after</em> the current character (Remember a - after)</td>
</tr>
<tr>
<td>o</td>
<td>Enters Insert mode by opening a new line below the current line (Remember o - open)</td>
</tr>
<tr>
<td>I</td>
<td>Enters Insert mode by placing the cursor at the beginning of the line (Remember big I - bigger version of i)</td>
</tr>
<tr>
<td>A</td>
<td>Enters Insert mode by placing the cursor at the end of the line (Remember big A - bigger version of a)</td>
</tr>
<tr>
<td>O</td>
<td>Enters Insert mode by opening a new line above the current line (Remember O - bigger version of o)</td>
</tr>
</tbody>
</table>
<p>As explained each one has a specific purpose.</p>
<p>If you want to quickly create a new line above the current line and start typing - You press <code>O</code></p>
<p>If you want to insert a new line below the current line - You press <code>o</code></p>
<p>To add something quickly at the end of a line - You press <code>A</code></p>
<p>To add something at the beginning of a line - You press <code>I</code></p>
<p>Could that be any simpler? Surprisingly, none of the other popular text editors do this. I can promise you that you won't be able to move so quickly in any other editor. This is Vim's power.</p>
<p>Let's look at another easy mode that will help you visualize things better: enter Visual mode.</p>
<h3 id="visual-mode">Visual Mode</h3>
<p>If you have looked carefully at things till now, you might have started to feel that Vim favours keyboard commands over using a mouse. If you thought so, you would be absolutely correct. So, in the spirit of <em>No Mouse</em>, Visual mode emulates visual selection of your text, similar to the way a mouse selects on screen, but done completely with the keyboard.</p>
<p>To enter Visual mode, just press <code>v</code> and move your cursor with either <code>h j k l</code> or the arrow keys, and you will see the text getting highlighted, indicating that it has been selected. Now, what can you do with this selected text? You can press <code>d</code> to delete it completely or you can press <code>y</code> to copy it. Notice that these are <code>d</code> and <code>y</code> and not <code>dd</code> and <code>yy</code> like in Normal mode. With <code>v</code>, visual selection happens character by character. If you want to select full lines, press <code>V</code> instead and you have the whole line highlighted, and you can delete, copy or run any other command on the highlighted text.</p>
<p>This is what it looks like when you have something selected in Visual mode.</p>
<p><img alt="Vim visual mode selection" class="aligncenter" src="https://raw.githubusercontent.com/durgaswaroop/Your_First_Lesson_In_Vim/master/pictures/visual_mode_selection.png" width="450"></p>
<p>And to exit out of Visual mode or to cancel the selection, just press <code>ESC</code>.</p>
<table>
<thead>
<tr>
<th>Command</th>
<th>What it does</th>
</tr>
</thead>
<tbody>
<tr>
<td>v</td>
<td>Visual selection by character</td>
</tr>
<tr>
<td>V</td>
<td>Visual selection by line</td>
</tr>
</tbody>
</table>
<p>We are finally down to the last mode, which is the Command mode <em>(You don't take Command, Son)</em></p>
<h3 id="command-mode">Command Mode</h3>
<p>Vim's Command mode is very powerful and is one of the reasons why Vim is so versatile. Command mode is where you run Vim's commands, change Vim's configuration and plugin settings, open new files, close existing files and access Vim's built-in help documentation. You enter Command mode by typing <code>:</code> and then the command you want. After you press <code>:</code> you will see the cursor at the bottom of the screen (appropriately called the <em>last line</em>), and that is where you type the command.</p>
<p><img alt="vim command mode example" class="aligncenter" src="https://raw.githubusercontent.com/durgaswaroop/Your_First_Lesson_In_Vim/master/pictures/command_mode_vim.png" width="450"></p>
<p>To open a file, you type in <code>:e file_name</code> (<code>:e</code> is short for <code>:edit</code>) and hit <code>Enter</code>. If the file exists, Vim will open it for you; if it doesn't exist, Vim will open a blank file, and the file will be created when you save it.</p>
<p>To save, or rather to <em>write</em>, the file to disk, you type <code>:w</code> and hit <code>Enter</code>. If the file doesn't yet have a name, you type <code>:w file_name</code> and Vim will save the contents of the window under that file name.</p>
<p>And now for the most important question in all of Vim's history, and the question in this article's title: <strong>How to exit Vim</strong>! If you are using a graphical version of Vim, then closing Vim is the same as closing the window and <em>poof</em>, it's gone. But if you're using a terminal (this works in Gvim and MacVim too), you quit Vim by typing <code>:q</code> (short for <code>:quit</code>), which closes the current window. If you have unsaved changes in your buffer, Vim will give an error saying <code>No write since last change</code>. If you don't mind discarding unsaved changes, you append a <code>!</code> so the command becomes <code>:q!</code>. The bang (<code>!</code>) at the end is similar to the <code>-f</code> or <code>--force</code> option in a lot of Linux commands. It forces Vim to quit even when there are unsaved changes. That is all there is to exiting Vim.</p>
<p>From now on, if you ever see a meme like this, you will know what they are talking about.</p>
<p><img alt="vim how to exit meme" class="aligncenter" src="https://comic.browserling.com/vim.png" width="450"></p>
<p>Another important Vim command is <code>:help</code>. It opens the full help manual for Vim, so it should be one of your most used commands in the initial days of learning.</p>
<p>And similar to other modes, you exit command mode by pressing the <code>Esc</code> key and you will be back in the Normal mode.</p>
<p>This image illustrates how you switch from one mode to another</p>
<p><img alt="Vim_modes" class="aligncenter" src="https://raw.githubusercontent.com/durgaswaroop/Your_First_Lesson_In_Vim/master/pictures/vim_modes_small.png" width="450"></p>
<p>Okay. That's a lot of information for one article. Let's do a quick review.</p>
<h3 id="story-recap">Story Recap</h3>
<p>There are four modes of operation in Vim.</p>
<ul>
<li><a href="https://freblogg.com/your-first-lesson-in-vim-2#normal-mode">Normal Mode</a> : moving around the document, deleting, copying, formatting are some of the common things you do in this mode</li>
<li><a href="#insert-mode">Insert Mode</a>: Inserts text in to the document. Go into Insert mode by typing any one of <code>a A i I o O</code> in Normal mode. Come out with <code>Esc</code></li>
<li><a href="#visual-mode">Visual Mode</a> : For visually selecting the text. Enter with <code>v</code> or <code>V</code> and exit with <code>Esc</code></li>
<li><a href="#command-mode">Command Mode</a> : To execute commands. <code>:w</code> to save, <code>:help</code> for documentation and <code>:q</code> to quit.</li>
</ul>
<p>Well, That is all for this article folks. Will see you again in the next one. Until then, Keep practicing and Happy Vimming!</p>
<p><a class="navlinkleft" href="https://freblogg.com/your-first-lesson-in-vim-1">← Prev</a> <a class="navlinkright" href="https://freblogg.com/your-first-lesson-in-vim-3">Next →</a></p>
<h4 id="attributions">Attributions:</h4>
<p>Vim Logo - Vim Replacement Icon http://wolfrosch.com/works/goodies/vim (CC BY-NC-ND 3.0)</p>
<p>Vim Comic image - https://comic.browserling.com/vim.png</p>Introduction & Installation | Your First Lesson In Vim2016-09-24T21:20:00+05:302016-09-24T21:20:00+05:30Durga Swaroop Perlatag:freblogg.com,2016-09-24:/your-first-lesson-in-vim-1<p>This is the first article in a series of articles titled "<a href="https://freblogg.com/tags/vimfirstlesson">Your First Lesson In Vim</a>". These articles are written with the goal of helping out new Vim users by teaching the awesomeness of the Vim editor, thereby extending the Vim community. Vim, though quite powerful, has a bad rep for being hard to learn and hard to get started with. So, even when someone is interested in learning about Vim, that infamous learning curve seems to scare them off. This series is going to put an end to all of that.</p>
<p><img alt="Vim Logo" class="aligncenter" src="http://wolfrosch.com/_img/works/goodies/icon/vim@2x" width="150"></p>
<p><strong>Warning</strong> : After going through all the articles in this series you will love Vim so much that you would like to have Vim style keyboard bindings everywhere, in your browser, in your mail client, in your shell and every other place which has a text input, which might not always be possible. Proceed further at your own risk. YOU HAVE BEEN WARNED!</p>
<p>Vim is one of the best text editors available out there. In fact it is one of the two best editors, the other being Emacs (This would be the last you'll see its name. From here on, it will be referred to as <em>The Editor which shall not be named</em>). Now you might be wondering, what about Sublime Text? Or Atom? Or some other flashy editor that's getting attention? My answer to that is very simple - East or West, Vim is the best. Don't get me wrong, editors like Sublime and Atom are good, and I was a fan of Sublime myself. But to be called the best, a text editor needs to be customizable and extensible, and most importantly should have a huge community of users helping each other out. None of these editors can beat Vim in those areas. Apart from that, Vim is really fast and robust. It can open huge files that make other editors crash. It has built-in syntax support for hundreds of file types. It has a huge plugin base that both extends Vim's existing functionality and adds new functionality to do pretty much anything you want. And those are just a few reasons why it's the best.</p>
<p>Since you are reading this article, I assume that you're interested in learning about what Vim is and about what Vim does. So, Let's start with some brief history of how the Vim editor came to be.</p>
<ul>
<li>1970s - Bill Joy developed the <strong>ex</strong> editor for Unix, which later came to be known as the <em>Vi</em> editor for having a <em>Vi</em>sual interface for editing.</li>
<li>1987 - <strong>Stevie</strong> was developed as a clone of Vi for Atari ST systems. Stevie stands for 'ST Editor for Vi Enthusiasts'. The name might be a mouthful but the editor itself is quite popular.</li>
<li>1988 - Vim (Vi IMitation) was created by Bram Moolenaar (Remember the name ..) as a port of Stevie for AmigaOS. Though started as an imitation, Vim quickly started to add several new features with support for multiple operating systems.</li>
<li>1993 - Vim 2.0 released with name changed to 'Vi IMproved' because, by then Vim had a lot more features than original Vi.
.
. Fast forwarding history
.</li>
<li>2006 - Vim 7.0 released with support for tabs, code completion, undo branching and a lot more</li>
<li>2016 - Vim 8.0 released with a lot of exciting features like Asynchronous I/O, channels, Jobs, Timers, Packages and a lot more</li>
</ul>
<p>(Shout out to <a href="https://buildingvts.com/a-brief-history-of-vim-1476ec4a6eb8#.dyvv00erx">buildingvts.com</a> for putting this history together)</p>
<p>So, as you can see from our brief time travel, Vim has been around for almost 30 years. Now you might be asking yourself, why the heck is this editor still used today after almost 30 years? That's a good question, and one that needs to be answered right now.</p>
<p><img alt="Floppy disks" class="aligncenter" src="https://upload.wikimedia.org/wikipedia/commons/thumb/a/aa/Floppy_disk_2009_G1.jpg/800px-Floppy_disk_2009_G1.jpg" width="450"></p>
<p>Technology sure changes a lot, and old things usually tend to get lost among all the new things that keep coming. But in the case of Vim or <em>The Editor which shall not be named</em>, that is simply not the case. They fall into the category of "Old is Gold". These editors were written during the days when floppy disks and magnetic tapes were all the rage, and hence were written to be memory efficient. Though Vim has changed a lot over the years to add countless new features, the fundamental idea of being lightweight and memory efficient is still one of its big selling points. That is the reason why Vim has managed to stay relevant through three decades, and that is also the reason why it will continue to be relevant for decades to come.</p>
<p>So, If that answer convinced you to stay the course and explore the exciting and enticing world of Vim, then Welcome aboard! Make sure to remember that this is the day you have decided to take your text editing to the next level by learning Vim.</p>
<p>Now that we know the history of Vim, it's time to install Vim on your computer. If you are rocking a Linux operating system, chances are you already have a version of Vim pre-installed. So, check if it exists by typing <code>vi</code> or <code>vim</code> in the command line. If it is available, you should see a screen that looks something like this.</p>
<p><img alt="Vim start screen image here" class="aligncenter" src="https://raw.githubusercontent.com/durgaswaroop/Your_First_Lesson_In_Vim/master/pictures/vim_start_screen.png" width="450"></p>
<p>If you see this then Vim is already installed.</p>
<p>If you don't have it installed, don't worry. Vim is freeware (correction: charityware), so you can download it for free from Vim's official site <a href="http://www.vim.org/">Vim.org</a>. Vim is available for pretty much every major operating system out there. I heard that there is a version of Vim available even for toasters. I have no idea who might use that, but it's there if you need it. And this is another reason why people like Vim so much. No matter the OS, they can be sure that their favorite editor is available. So, just download Vim for your operating system and install it.</p>
<p>And by the way, did I mention that Vim is primarily a terminal-based program? It was initially designed to be run in terminals to access files on remote systems. A lot of people, to this day, prefer the terminal version of Vim. But for those of you who like to have a Graphical User Interface (GUI), that is available as well.</p>
<p>For Windows users, it can be downloaded from the vim.org site. Look for Gvim (which stands for Graphical Vim). For Mac users, you can download MacVim, which provides a good GUI experience. For Linux users, there are Gvim versions available for most distros, so download the one suitable for your distribution.</p>
<p>If you have successfully installed Vim on your system, open Vim either in the terminal or the GUI and you should see a welcome screen similar to the picture above. If you got that, then congratulations, you have the power of Vim with you now.</p>
<blockquote>
<p>Don't forget what Uncle Ben said: "With great power comes great responsibility". So, your responsibility as a Vim user is to spread the Vim awesomeness among your co-workers and friends. It would be even better if you can share this article with them, but that is entirely up to you. (Jedi mind tricks working implicitly)</p>
</blockquote>
<p>And before we finish this article, I will give you a sneak peek at the power of Vim and what you can do with it. Watch Damian Conway's video on Vim: <a href="https://www.youtube.com/watch?v=aHm36-na4-4">More Instantly better Vim</a>. Conway is one of the Vim geniuses whom I admire. This video gives you a small window into the world of Vim and what Vim can do in the hands of a seasoned user. You might not be able to understand how Conway is doing his magic, but that is entirely fine. You obviously won't be able to understand Linux kernel module code when you're just starting to write <em>Hello World</em> programs. This video is just to show you how the masters use Vim, and you will be able to do that too once you've mastered it.</p>
<p>Well, That is all for this article folks. Will see you again in the next one. Until then, Keep practicing and Happy Vimming!</p>
<p><a href="https://freblogg.com/your-first-lesson-in-vim-2">Next →</a></p>
<h4 id="attributions">Attributions:</h4>
<p>Vim Logo - Vim Replacement Icon http://wolfrosch.com/works/goodies/vim (CC BY-NC-ND 3.0)</p>
<p>Floppy Disks - https://goo.gl/0Ns2Dj</p>Using Tab windows in Vim2016-07-18T05:30:00+05:302016-07-18T05:30:00+05:30Durga Swaroop Perlatag:freblogg.com,2016-07-18:/tab-pages-in-vim<p><a href="https://pbs.twimg.com/profile_images/64545277/vim_logo_400x400.png"><img alt="vim-logo" height="200" src="https://pbs.twimg.com/profile_images/64545277/vim_logo_400x400.png" title="vim-logo-freblogg" width="200"></a>Using tabs (Vim calls them tab pages) is one of the surest ways to increase your productivity. Vim tabs are just like the tabs in your browser. Each tab can have multiple splits (referred to as windows in Vim's documentation). So, you can have multiple splits open in one tab, and you can have multiple tabs. </p>
<p>Tabs are a really handy way of grouping things together, so I usually have multiple tabs open in any session. I have a main editor tab where I will have multiple splits open for the code I am looking at, and since I work with a lot of data files, I will have one tab dedicated to the data sets that I will be using for my program. Then, if required, I will have another tab open for any notes and info that I have previously noted down. </p>
<p>Here are some commands and tips for working with tabs<br>
To create a new tab - <strong><em>:tabnew</em></strong><br>
To go to the next tab - <strong><em>:tabnext</em></strong><br>
To go to the previous tab - <strong><em>:tabprevious</em></strong><br>
I don't like to type all of these commands every time, so I have added these mappings to my vimrc to make switching between tabs much easier. </p>
<div class="highlight"><pre><span></span><code>nnoremap <C-Tab> :tabnext<CR>
nnoremap <C-S-Tab> :tabprevious<CR>
</code></pre></div>
<p>With these I can move around the tabs just like I do with the tabs in my browser.<br>
Another important thing that you might want to do with tabs is to be able to move them. I am really particular about how my tabs should be ordered and so I have added these mappings to move them around. </p>
<div class="highlight"><pre><span></span><code>nnoremap <silent> <A-Left> :execute 'silent! tabmove ' . (tabpagenr()-2)<CR>
nnoremap <silent> <A-Right> :execute 'silent! tabmove ' . tabpagenr()<CR>
</code></pre></div>
<p>With these you can hit <em>Alt + Left</em> to move the current tab to the left and <em>Alt + Right</em> to move it to the right.<br>
Try <code>:help tabpage</code> in your Vim help for more info.<br>
A lot of Vim users see tabs as an alternative to buffers, and there are a lot of articles and discussions about buffers vs. tabs. But for me, tabs and buffers are not mutually exclusive. I often have multiple tabs open and will still use buffers when I need them.<br>
So, that is all for this article. Come back again for the next article.<br>
Until then, Happy Vimming. </p>
<hr>
<p>Image Credits : https://pbs.twimg.com/profile_images/64545277/vim_logo_400x400.png </p>
<p>PS: To see all the Vim tutorials of FreBlogg, see: <a href="https://freblogg.com/tags/vim">Freblogg/Vim</a> </p>Word Count application with Apache Spark and Java2016-06-23T05:30:00+05:302016-06-23T05:30:00+05:30Durga Swaroop Perlatag:freblogg.com,2016-06-23:/spark-word-count-with-java<p>Apache Spark is becoming more ubiquitous by the day and has been dubbed the next big thing in the Big Data world. Spark has been replacing MapReduce with its speed and scalability. In this series of articles on Spark we will try to solve various problems using <a href="https://freblogg.com/tags/spark">Spark</a> and <a href="https://freblogg.com/tags/java">Java</a>. </p>
<p><img alt="spark-java-freblogg" src="http://www.datanami.com/wp-content/uploads/2014/12/spark-and-java-8.png"></p>
<p>The word count program is the big data equivalent of the classic <em>Hello world</em> program. The aim of this program is to scan a text file and display the number of times each word occurs in that file. For this word count application we will be using Apache Spark 1.6 with Java 8. </p>
<p>For this program, we will be running Spark locally on a single machine (the <code>local</code> master), so you don't need to set up a cluster. Even Hadoop is not required for this exercise. Assuming you have Spark, Java and Maven installed properly, let's proceed.</p>
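<p>If it helps to see the goal before Spark enters the picture, here is a minimal sketch of the same counting logic in plain Java, with no Spark involved. The class name <code>PlainWordCount</code>, the method <code>countWords</code> and the sample sentence are made up for illustration and are not part of the Spark program in this article. It splits on runs of whitespace and adds up one count per word, which is roughly the split, pair and reduce idea that the Spark version expresses on a distributed dataset.</p>

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, only meant to illustrate what "word count" computes.
public class PlainWordCount {

    // Returns how many times each whitespace-separated word occurs in the text.
    static Map<String, Integer> countWords(String text) {
        // TreeMap keeps the words sorted, so the printed output is stable.
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.split("\\s+")) {
            if (word.isEmpty()) {
                continue; // leading whitespace produces an empty first token
            }
            // Start at 1 for a new word, otherwise add 1 to the existing count.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample sentence, not the article's input file.
        String line = "the dark and the moonlight and the eyes";
        countWords(line).forEach((word, count) -> System.out.println(word + ": " + count));
    }
}
```

<p>Running <code>main</code> prints one <code>word: count</code> line per distinct word, for example <code>the: 3</code> for the sample sentence. The Spark program does the same job, but lets Spark split the work across partitions.</p>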
<h3 id="creating-pomxml">Creating pom.xml</h3>
<p>To compile Java programs with Maven, you will need a pom.xml file with the required dependencies. Use this pom.xml file if you don't have one available with you. </p>
<div class="highlight"><pre><span></span><code><span class="cp"><?xml version="1.0" encoding="UTF-8"?></span>
<span class="nt"><project></span>
<span class="nt"><groupId></span>com.freblogg.sparklearning<span class="nt"></groupId></span>
<span class="nt"><artifactId></span>freblogg-spark-tutorial<span class="nt"></artifactId></span>
<span class="nt"><modelVersion></span>4.0.0<span class="nt"></modelVersion></span>
<span class="nt"><name></span>example<span class="nt"></name></span>
<span class="nt"><packaging></span>jar<span class="nt"></packaging></span>
<span class="nt"><version></span>0.0.1<span class="nt"></version></span>
<span class="nt"><dependencies></span>
<span class="nt"><dependency></span>
<span class="c"><!-- Spark dependency --></span>
<span class="nt"><groupId></span>org.apache.spark<span class="nt"></groupId></span>
<span class="nt"><artifactId></span>spark-core_2.10<span class="nt"></artifactId></span>
<span class="nt"><version></span>1.6.1<span class="nt"></version></span>
<span class="nt"><scope></span>provided<span class="nt"></scope></span>
<span class="nt"></dependency></span>
<span class="nt"></dependencies></span>
<span class="nt"><properties></span>
<span class="nt"><java.version></span>1.8<span class="nt"></java.version></span>
<span class="nt"><encoding></span>UTF-8<span class="nt"></encoding></span>
<span class="nt"><spark.version></span>1.6.1<span class="nt"></spark.version></span>
<span class="nt"></properties></span>
<span class="nt"><build></span>
<span class="nt"><pluginManagement></span>
<span class="nt"><plugins></span>
<span class="nt"><plugin></span>
<span class="nt"><groupId></span>org.apache.maven.plugins<span class="nt"></groupId></span>
<span class="nt"><artifactId></span>maven-compiler-plugin<span class="nt"></artifactId></span>
<span class="nt"><version></span>3.3<span class="nt"></version></span>
<span class="nt"><configuration></span>
<span class="nt"><source></span>${java.version}<span class="nt"></source></span>
<span class="nt"><target></span>${java.version}<span class="nt"></target></span>
<span class="nt"></configuration></span>
<span class="nt"></plugin></span>
<span class="nt"><plugin></span>
<span class="nt"><groupId></span>org.apache.maven.plugins<span class="nt"></groupId></span>
<span class="nt"><artifactId></span>maven-plugin-plugin<span class="nt"></artifactId></span>
<span class="nt"><version></span>3.3<span class="nt"></version></span>
<span class="nt"></plugin></span>
<span class="nt"></plugins></span>
<span class="nt"></pluginManagement></span>
<span class="nt"></build></span>
<span class="nt"></project></span>
</code></pre></div>
<p>Now, save this file as pom.xml and put it in the same folder as your <strong>src</strong> directory. </p>
<h3 id="input-file">Input File</h3>
<p>After creating the POM file, you will need an input file on which we will run our Wordcount program, to count the number of occurrences of each word. This is the file I will be using. </p>
<blockquote>
<div class="highlight"><pre><span></span><code>It is close to midnight and something evil is lurking in the dark
Under the moonlight you see a sight that almost stops your heart
You try to scream but terror takes the sound before you make it
You start to freeze as horror looks you right between the eyes
You are paralyzed
</code></pre></div>
</blockquote>
<h3 id="java-program">Java Program</h3>
<p>Once we have the pom file ready, we can start with the code.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.*</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">org.apache.spark.SparkConf</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="p">;</span>
<span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="p">;</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">WordCount</span> <span class="p">{</span>
    <span class="kd">public</span> <span class="kd">static</span> <span class="kt">void</span> <span class="nf">main</span><span class="p">(</span><span class="n">String</span><span class="o">[]</span> <span class="n">args</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">SparkConf</span> <span class="n">conf</span> <span class="o">=</span> <span class="k">new</span> <span class="n">SparkConf</span><span class="p">().</span><span class="na">setMaster</span><span class="p">(</span><span class="s">"local"</span><span class="p">).</span><span class="na">setAppName</span><span class="p">(</span><span class="s">"wordCount"</span><span class="p">);</span>
        <span class="n">JavaSparkContext</span> <span class="n">sc</span> <span class="o">=</span> <span class="k">new</span> <span class="n">JavaSparkContext</span><span class="p">(</span><span class="n">conf</span><span class="p">);</span>
        <span class="c1">// Load our input data.</span>
        <span class="n">String</span> <span class="n">inputFile</span> <span class="o">=</span> <span class="s">"file:///home/dsp/Desktop/sparkExamples/sample_testing/resources/inputFile"</span><span class="p">;</span>
        <span class="n">JavaRDD</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">input</span> <span class="o">=</span> <span class="n">sc</span><span class="p">.</span><span class="na">textFile</span><span class="p">(</span><span class="n">inputFile</span><span class="p">);</span>
        <span class="c1">// Split into a list of words.</span>
        <span class="n">JavaRDD</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">words</span> <span class="o">=</span> <span class="n">input</span><span class="p">.</span><span class="na">flatMap</span><span class="p">(</span><span class="n">l</span> <span class="o">-></span> <span class="n">Arrays</span><span class="p">.</span><span class="na">asList</span><span class="p">(</span><span class="n">l</span><span class="p">.</span><span class="na">split</span><span class="p">(</span><span class="s">" "</span><span class="p">)));</span>
        <span class="c1">// Transform into pairs and count.</span>
        <span class="n">JavaPairRDD</span><span class="o"><</span><span class="n">String</span><span class="p">,</span> <span class="n">Integer</span><span class="o">></span> <span class="n">pairs</span> <span class="o">=</span> <span class="n">words</span><span class="p">.</span><span class="na">mapToPair</span><span class="p">(</span><span class="n">w</span> <span class="o">-></span> <span class="k">new</span> <span class="n">Tuple2</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="mi">1</span><span class="p">));</span>
        <span class="n">JavaPairRDD</span><span class="o"><</span><span class="n">String</span><span class="p">,</span> <span class="n">Integer</span><span class="o">></span> <span class="n">counts</span> <span class="o">=</span> <span class="n">pairs</span><span class="p">.</span><span class="na">reduceByKey</span><span class="p">((</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="o">-></span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="p">);</span>
        <span class="n">System</span><span class="p">.</span><span class="na">out</span><span class="p">.</span><span class="na">println</span><span class="p">(</span><span class="n">counts</span><span class="p">.</span><span class="na">collect</span><span class="p">());</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<h3 id="execution">Execution</h3>
<p>Once we have everything ready, it's time to execute our program and see the output.<br>
To compile it, first execute this in the directory with the pom file. </p>
<div class="highlight"><pre><span></span><code> mvn clean && mvn compile && mvn package
</code></pre></div>
<p>This will take some time to run the first time because Maven has to download and install the dependencies. After successful compilation, it creates the <em>target</em> folder and a jar file named freblogg-spark-tutorial-0.0.1.jar. </p>
<p>Then, to execute the program, you need to run the spark-submit script in the bin directory of your SPARK_HOME folder. </p>
<div class="highlight"><pre><span></span><code> $SPARK_HOME/bin/spark-submit --class "WordCount" target/freblogg-spark-tutorial-0.0.1.jar
</code></pre></div>
<p>Once this command is executed your screen will be completely filled with spark logs. If you scroll a bit to the top, you will see the following output, which is the output we are interested in. </p>
<blockquote>
<p><code>[(freeze,1), (are,1), (Under,1), (it,1), (is,2), (you,3), (takes,1), (lurking,1), (right,1), (that,1), (a,1), (You,3), (terror,1), (start,1), (dark,1), (between,1), (scream,1), (before,1), (to,3), (as,1), (in,1), (moonlight,1), (sound,1), (midnight,1), (see,1), (stops,1), (sight,1), (try,1), (something,1), (paralyzed,1), (evil,1), (It,1), (eyes,1), (make,1), (almost,1), (but,1), (and,1), (close,1), (heart,1), (looks,1), (your,1), (horror,1), (the,4)]</code></p>
</blockquote>
<p>Those are the counts of each word in the file. So, there you go. You have successfully written your first Spark application. Congratulations. You're officially a Spark programmer now! </p>
<h3 id="understanding-the-code">Understanding the code</h3>
<p>Now that we have our application set up, let's see what the program is doing, step by step.</p>
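<p>First we have the Spark variables <code>conf</code> and <code>sc</code>. Don't worry too much about them right now. All you need to know is that every Spark program needs those two lines: the first one configures the application (here it runs locally under the name "wordCount") and the second one creates the context through which we talk to Spark.</p>
<div class="highlight"><pre><span></span><code>SparkConf conf = new SparkConf().setMaster("local").setAppName("wordCount");
JavaSparkContext sc = new JavaSparkContext(conf);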
</code></pre></div>
<p>So, just copy and paste these lines into every application you are going to work on.</p>
<p>Next we read the input file into an RDD. An RDD (Resilient Distributed Dataset) is essentially a distributed collection of elements, here the lines of text in the file. You can create RDDs from various sources and transform them into whatever you want using various operations. Here we are reading the input file from our local file system. If you want to read from HDFS, replace the <strong>file:///</strong> with <strong>hdfs:///</strong></p>
<div class="highlight"><pre><span></span><code> <span class="nt">String</span> <span class="nt">inputFile</span> <span class="o">=</span> <span class="s2">"file:///home/dsp/Desktop/sparkExamples/sample_testing/resources/inputFile"</span><span class="o">;</span>
<span class="nt">JavaRDD</span><span class="o"><</span><span class="nt">String</span><span class="o">></span> <span class="nt">input</span> <span class="o">=</span> <span class="nt">sc</span><span class="p">.</span><span class="nc">textFile</span><span class="o">(</span><span class="nt">inputFile</span><span class="o">);</span>
</code></pre></div>
<p>Then we have our first transformation operation on the input RDD we created in the above step.</p>
<p><code>flatMap</code> is a built-in transformation that takes one input element and can produce any number of output elements, depending on the function passed to it.</p>
<div class="highlight"><pre><span></span><code> <span class="n">JavaRDD</span> <span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">words</span> <span class="o">=</span> <span class="n">input</span><span class="p">.</span><span class="n">flatMap</span><span class="p">(</span><span class="n">l</span> <span class="o">-></span> <span class="n">Arrays</span><span class="p">.</span><span class="n">asList</span><span class="p">(</span><span class="n">l</span><span class="p">.</span><span class="n">split</span><span class="p">(</span><span class="s">" "</span><span class="p">)));</span>
</code></pre></div>
<p>Here we are splitting each line on the space character. So, the flatMap function here returns all the words in the input document, and they are stored in the RDD named words. For more about flatMap, read this: <a href="https://freblogg.com/apache-spark-map-vs-flatmap">Spark FlatMap and Map</a></p>
<p>Next, we have another transformation, <em>mapToPair</em>, that returns a Tuple of the word and the number 1.</p>
<p>A Tuple is very similar to an ordered pair in the Cartesian coordinate system. Tuple2 looks like (x,y), where x is the key. Similarly, Tuple3 will be (x,y,z) and so on.</p>
<div class="highlight"><pre><span></span><code> <span class="n">JavaPairRDD</span><span class="o"><</span><span class="n">String</span><span class="p">,</span> <span class="n">Integer</span><span class="o">></span> <span class="n">pairs</span> <span class="o">=</span> <span class="n">words</span><span class="p">.</span><span class="n">mapToPair</span><span class="p">(</span><span class="n">w</span> <span class="o">-></span> <span class="n">new</span> <span class="n">Tuple2</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="mi">1</span><span class="p">));</span>
</code></pre></div>
<p>As an example, the word <strong><em>you</em></strong> in the input will be mapped to <strong><em>(you,1)</em></strong> by the <code>mapToPair</code> function. And, since the result is a pair, we have to store it in a <code>JavaPairRDD</code>, which supports pairs.</p>
<p>Then we do the final transformation on the pairs, which adds up the individual counts of each word. </p>
<div class="highlight"><pre><span></span><code><span class="n">JavaPairRDD</span> <span class="o"><</span><span class="n">String</span><span class="p">,</span> <span class="n">Integer</span><span class="o">></span> <span class="n">counts</span> <span class="o">=</span> <span class="n">pairs</span><span class="p">.</span><span class="n">reduceByKey</span><span class="p">((</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="o">-></span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="p">);</span>
</code></pre></div>
<p>The <code>reduceByKey</code> method groups all the Tuple pairs with the same key and combines their values using the function we pass in. We have the word 'you' repeated thrice and so we have (you,1) three times. Now, <strong><em>(you,1)</em></strong>, <strong><em>(you,1)</em></strong>, <strong><em>(you,1)</em></strong> will become <strong><em>(you,3)</em></strong> because of the sum we are doing inside the function. And similarly for the other words. </p>
<p>Then finally we perform an action on the RDD, which is where the actual computation of all the above steps takes place, since transformations are only evaluated when an action needs their result. <em>collect()</em> returns all the elements in the RDD and we print them using <code>println</code>, giving us the output we want. </p>
<p>So there you go: your first Spark application is complete. To learn more, go through the documentation and examples on the Spark website and subscribe to Freblogg for more tutorials. </p>
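<p>To see the same three steps without a cluster, here is a minimal sketch in plain Java streams. It is only an analogy, not Spark code: the class <code>LocalWordCountSketch</code> and the helper <code>countWords</code> are hypothetical names made up for this illustration, and everything runs in a single JVM on an in-memory list.</p>

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java analogy of the Spark pipeline; no Spark classes are used.
// The class and method names are hypothetical, chosen for this sketch only.
class LocalWordCountSketch {

    static Map<String, Integer> countWords(List<String> lines) {
        return lines.stream()
                // Like flatMap: one line becomes many words.
                .flatMap(line -> Arrays.stream(line.split(" ")))
                // Like mapToPair followed by reduceByKey: every word contributes 1,
                // and values that share a key are merged with (x, y) -> x + y.
                .collect(Collectors.toMap(w -> w, w -> 1, (x, y) -> x + y, TreeMap::new));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "You start to freeze as horror looks you right between the eyes",
                "You are paralyzed");
        System.out.println(countWords(lines));
    }
}
```

<p>The merge function passed to <code>Collectors.toMap</code> plays the role of the lambda given to <code>reduceByKey</code>. Note that, just like in the Spark output, "You" and "you" are counted as different words.</p>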
<p>Happy Sparking! </p>
<p>Image : http://www.datanami.com/wp-content/uploads/2014/12/spark-and-java-8.png </p>
<hr>
<p>Self Promotion: </p>
<p>If you have liked this article and would like to see more, subscribe to our Facebook and G+ pages. </p>
<p>Facebook page @ <a href="https://www.facebook.com/freblogg">Facebook.com/freblogg</a> </p>
<p>Google Plus Page @ <a href="https://plus.google.com/102904658212987164302">Google.com/freblogg</a></p>Apache Spark | Map and FlatMap2016-06-19T03:12:00+05:302016-06-19T03:12:00+05:30Durga Swaroop Perlatag:freblogg.com,2016-06-19:/apache-spark-map-vs-flatmap<p>Map and FlatMap functions transform one collection in to another just like the map and flatmap functions in several other functional languages. In the context of Apache Spark, they transform one RDD in to another RDD.</p>
<p><img alt="Apache Spark Logo" height="212" src="https://spark.apache.org/images/spark-logo-trademark.png" width="400"></p>
<p>Here is how they differ from each other. </p>
<h2 id="map">Map</h2>
<p>Map converts an RDD of …</p><p>Map and FlatMap functions transform one collection in to another just like the map and flatmap functions in several other functional languages. In the context of Apache Spark, they transform one RDD in to another RDD.</p>
<p><img alt="Apache Spark Logo" height="212" src="https://spark.apache.org/images/spark-logo-trademark.png" width="400"></p>
<p>Here is how they differ from each other. </p>
<h2 id="map">Map</h2>
<p>Map converts an RDD of size 'n' into another RDD of size 'n'. The input and output sizes of the RDDs will be the same. Or to put it another way, one element in the input gets mapped to exactly one element in the output.</p>
<p><img alt="Map venn diagram" height="125" src="https://qphs.fs.quoracdn.net/main-qimg-0cb8323fcae6acb7b6206bd000e0fd14.webp" width="320"></p>
<p>So, for example, let's say I have an array [1,2,3,4] and I want to increment each element by 10. The input size and output size are the same, so we can use map for this transformation.</p>
<p>Required :</p>
<div class="highlight"><pre><span></span><code><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">]</span> <span class="o">-></span> <span class="p">[</span><span class="mi">11</span><span class="p">,</span><span class="mi">12</span><span class="p">,</span><span class="mi">13</span><span class="p">,</span><span class="mi">14</span><span class="p">]</span>
</code></pre></div>
<p>Spark code :</p>
<div class="highlight"><pre><span></span><code><span class="n">myRdd</span><span class="p">.</span><span class="nf">map</span><span class="p">(</span><span class="n">x</span> <span class="o">-></span> <span class="n">x</span><span class="o">+</span><span class="mi">10</span><span class="p">)</span>
</code></pre></div>
<p>So, that is what the <em>map</em> function does. While using map, you can be sure that the size of the input and output will remain the same, so even if you put a hundred map functions in series, the output and the input will have the same number of elements.</p>
<h2 id="flatmap">FlatMap</h2>
<p>Coming to FlatMap, it does a similar job: transforming one collection into another. Or in Spark terms, one RDD into another RDD. But there is no condition that the output size has to be equal to the input size. Or to put it another way, one element in the input can map to zero or more elements in the output.</p>
<p><img alt="Flatmap venn diagram" height="174" src="https://qphs.fs.quoracdn.net/main-qimg-90a70c401636ad8d53ad0821a1deec95.webp" width="320"></p>
<p>Also, the output of flatMap is flattened. Though the function passed to flatMap returns a list of elements for each individual element of the input, the output of flatMap will be an RDD which has all those elements flattened into a single list.</p>
<p>Let’s see this with an example.</p>
<p>Say you have a text file as follows</p>
<div class="highlight"><pre><span></span><code>Hello World
Who are you
</code></pre></div>
<p>Now, if you run a flatMap on the textFile rdd,</p>
<div class="highlight"><pre><span></span><code><span class="n">words</span> <span class="o">=</span> <span class="n">linesRDD</span><span class="p">.</span><span class="n">flatMap</span><span class="p">(</span><span class="n">x</span> <span class="o">-></span> <span class="n">Arrays</span><span class="p">.</span><span class="n">asList</span><span class="p">(</span><span class="n">x</span><span class="p">.</span><span class="n">split</span><span class="p">(</span><span class="s">" "</span><span class="p">)))</span>
</code></pre></div>
<p>And, the value in the words RDD would be,</p>
<div class="highlight"><pre><span></span><code>["Hello", "World", "Who", "are", "you"]
</code></pre></div>
<p>So, the transformation process looks like this,</p>
<div class="highlight"><pre><span></span><code><span class="n">linesRDD</span> <span class="o">-></span> <span class="p">[</span> <span class="p">[</span><span class="s">"Hello"</span><span class="p">,</span> <span class="s">"World"</span><span class="p">],</span> <span class="p">[</span><span class="s">"Who"</span><span class="p">,</span> <span class="s">"are"</span><span class="p">,</span> <span class="s">"you"</span><span class="p">]</span> <span class="p">]</span>
         <span class="o">-></span> <span class="p">[</span><span class="s">"Hello"</span><span class="p">,</span> <span class="s">"World"</span><span class="p">,</span> <span class="s">"Who"</span><span class="p">,</span> <span class="s">"are"</span><span class="p">,</span> <span class="s">"you"</span><span class="p">]</span>
</code></pre></div>
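<p>If you don't have a Spark setup handy, the same difference can be tried out with plain Java streams, whose <code>map</code> and <code>flatMap</code> behave analogously on in-memory collections. This is only a sketch under that assumption: <code>MapVsFlatMapSketch</code>, <code>addTen</code> and <code>splitIntoWords</code> are hypothetical names for this illustration and no RDDs are involved.</p>

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Java streams analogy for map vs flatMap; names are hypothetical.
class MapVsFlatMapSketch {

    // map: exactly one output element per input element.
    static List<Integer> addTen(List<Integer> numbers) {
        return numbers.stream()
                .map(x -> x + 10)
                .collect(Collectors.toList());
    }

    // flatMap: each line yields zero or more words, flattened into one list.
    static List<String> splitIntoWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(addTen(Arrays.asList(1, 2, 3, 4)));
        System.out.println(splitIntoWords(Arrays.asList("Hello World", "Who are you")));
    }
}
```

<p>Running it prints <code>[11, 12, 13, 14]</code> for the map case (four elements in, four out) and <code>[Hello, World, Who, are, you]</code> for the flatMap case (two lines in, five words out).</p>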
<p>So, those are the differences between Map and FlatMap of Apache Spark.<br>
Keep Practicing and Keep Learning!</p>
<hr>
<p>If you have liked this article and would like to see more, subscribe to our Facebook and G+ pages.<br>
Facebook page @ <a href="https://www.facebook.com/freblogg">Facebook.com/freblogg</a></p>
<p>Google Plus Page @ <a href="https://plus.google.com/102904658212987164302">Google.com/freblogg</a></p>
<p>Image Credits : http://spark.apache.org/images/spark-logo-trademark.png</p>Quick Vim Tips2016-06-19T02:59:00+05:302016-06-19T02:59:00+05:30Durga Swaroop Perlatag:freblogg.com,2016-06-19:/vim-tips-1<p><img alt="freblogg-vim-image" src="https://s3.amazonaws.com/hackdesign/tools/app_images/000/000/051/icon_small/vim-logo-en.png?1391303578" title="vim-freblogg"></p>
<p>Vim is one of the most powerful text editors available. And, hence it is not really possible for everyone to know everything or get the same ideas on improving their work experience. And, so this article includes a few tips and handy shortcuts that will help your productivity just as …</p><p><img alt="freblogg-vim-image" src="https://s3.amazonaws.com/hackdesign/tools/app_images/000/000/051/icon_small/vim-logo-en.png?1391303578" title="vim-freblogg"></p>
<p>Vim is one of the most powerful text editors available, and nobody can know every feature of it or discover every trick on their own. So this article collects a few tips and handy shortcuts that will help your productivity, just like our other <a href="https://freblogg.com/tags/vim">Vim</a> articles, but which are individually not extensive enough to get their own dedicated article. </p>
<p>So, here are some useful tips for Vim: </p>
<h2 id="change-your-directory-locally">Change your directory locally</h2>
<p>Usually to change the current directory in Vim, you would do, </p>
<div class="highlight"><pre><span></span><code>:cd <path/directory>
</code></pre></div>
<p>But that changes the working directory for every window. To change it only for the current window, for example a single <a href="https://freblogg.com/vim-windows">split window</a>, you can use <strong>lcd</strong>.</p>
<div class="highlight"><pre><span></span><code>:lcd $HOME "change the directory locally to $HOME
</code></pre></div>
<h2 id="open-a-link-in-a-browser-from-vim">Open a link in a browser from vim</h2>
<p>Now, this is something that I found out recently and it has been very useful ever since.<br>
Especially if you are someone who does a lot of documentation in Vim or who works extensively with HTML, this will be invaluable to you. Just put the cursor on the URL and press <strong>gx</strong></p>
<div class="highlight"><pre><span></span><code>gx "opens a link in your default browser
</code></pre></div>
<h2 id="edit-and-source-your-vimrc-fast">Edit and source your Vimrc fast</h2>
<p>If you are using Vim as your editor, updating your vimrc becomes one of the things that you do very frequently. It can be adding new mappings, deleting old ones, adding new plugins, etc. Whatever the change is, having these mappings will help you do it much faster.</p>
<p>Opens vimrc in a vertical split</p>
<div class="highlight"><pre><span></span><code>:nnoremap <leader>r :vsp $MYVIMRC<CR>
</code></pre></div>
<p>Source your vimrc to get the new changes.</p>
<div class="highlight"><pre><span></span><code>:nnoremap <leader>sv :source $MYVIMRC<CR>
</code></pre></div>
<h2 id="automatically-delete-trailing-white-spaces">Automatically delete trailing white-spaces</h2>
<p>If you are one of those people (including me) who wouldn't like to have trailing white spaces at the end of lines, then you would absolutely want to have this in your vimrc file. This will take care of all those nasty white-spaces on all lines.</p>
<div class="highlight"><pre><span></span><code><span class="n">autocmd</span><span class="o">!</span><span class="w"> </span><span class="n">BufWritePre</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">call</span><span class="w"> </span><span class="n">DeleteTrailingSpaces</span><span class="p">()</span><span class="w"></span>
<span class="n">function</span><span class="o">!</span><span class="w"> </span><span class="n">DeleteTrailingSpaces</span><span class="p">()</span><span class="w"></span>
<span class="w"> </span><span class="n">execute</span><span class="w"> </span><span class="s">"normal! mzA "</span><span class="w"></span>
<span class="w"> </span><span class="s">"Deletes all Trailing spaces"</span><span class="w"></span>
<span class="w"> </span><span class="nf">%s</span><span class="o">/</span><span class="err">\</span><span class="n">s</span><span class="err">\</span><span class="o">+</span><span class="n">$</span><span class="c1">//g</span>
<span class="w"> </span><span class="n">execute</span><span class="w"> </span><span class="s">"normal! `z"</span><span class="w"></span>
<span class="n">endfunction</span><span class="w"></span>
</code></pre></div>
<p>So, that is all for this Quick tips article. More will be coming in the future. So, stay tuned.</p>
<p>Happy Vimming!</p>Multi task in Vim with panes2016-06-19T02:42:00+05:302016-06-19T02:42:00+05:30Durga Swaroop Perlatag:freblogg.com,2016-06-19:/vim-windows<p>A lot of Programmers use Vim in some way or another but a vast majority of them use only a handful of features. Knowing to use Windows/Splits, tabs, macros and marks can really increase your productivity. Through this and the upcoming articles on <a href="https://freblogg.com/tags/vim">Vim</a> I will try to cover …</p><p>A lot of Programmers use Vim in some way or another but a vast majority of them use only a handful of features. Knowing to use Windows/Splits, tabs, macros and marks can really increase your productivity. Through this and the upcoming articles on <a href="https://freblogg.com/tags/vim">Vim</a> I will try to cover the important things that make Vim so awesome.</p>
<h2 id="splitting-your-screen">Splitting your Screen</h2>
<p>Vim Splits are a very powerful way of keeping your workflow organized. You can use splits (windows or view-ports in Vim vernacular) to get a different view into the same file or to open a different file to see a quick diff. </p>
<p>Many other popular editors either don't support splitting the screen or have several limitations on how you can split. Vim lets you split the window as much as you want, into any number of pieces, and also lets you switch between them instantaneously. And, most importantly, you have both Horizontal and Vertical splits.<br>
So, if it makes sense, you can create a really complex split layout like this (though I would probably advise against it).</p>
<p><img alt="vim windows" src="https://freblogg.com/images/vim-splits-extreme.png" width="600"></p>
<p>Or you can keep it simple with just a couple of splits open. It is all up to your requirements.</p>
<p>So, let's see a few things that will get you started with using Split windows in Vim.<br>
To open a file in a Horizontal Split window just type this in, </p>
<div class="highlight"><pre><span></span><code>:sp filename.here <Enter>
</code></pre></div>
<p>To open a file in a Vertical Split window, </p>
<div class="highlight"><pre><span></span><code>:vsp filename.here <Enter>
</code></pre></div>
<p>Now, if you want to open the current file in a new split (horizontal or vertical), just type <code>:sp</code> or <code>:vsp</code> and it will open a new split-view in to the current file.<br>
Here are a few things that will make using splits much easier.<br>
You probably want to put these in your vimrc file.</p>
<div class="highlight"><pre><span></span><code>" Easy 'split' navigation
""""""""""""""""""""""""
nnoremap <C-h> <C-w>h
nnoremap <C-j> <C-w>j
nnoremap <C-k> <C-w>k
nnoremap <C-l> <C-w>l
</code></pre></div>
<p>This will make switching from one split to another really easy and intuitive. So, <Ctrl + h> moves the cursor to the split on the left just as 'h' would move it one character to the left. Similarly for the others.<br>
Another thing I really like to do is resizing the Splits. I have the following in my vimrc for easy resizing.</p>
<div class="highlight"><pre><span></span><code>" Easily resize the splits
""""""""""""""""""""""""""""
nnoremap <C-Right> :vertical resize +5<CR>
nnoremap <C-Left> :vertical resize -5<CR>
nnoremap <C-Up> :res +5<CR>
nnoremap <C-Down> :res -5<CR>
</code></pre></div>
<p>So, to resize your vertical split to the right, press <code>Ctrl + <Right-Arrow></code> and similarly for others.</p>
<p>Another mapping that I use almost every day is mapping my <code><Right-Arrow></code> to open up a new vertical split. I absolutely love this and use it whenever I want a quick diff of two files or just to open two views of the file in the buffer.<br>
<code>nnoremap <right> :vsplit <CR></code><br>
So, these are the basic things that you need to know to start using Splits in Vim. For more info on Splits and buffers, use Vim's built-in help: <code>:help windows</code></p>
<p>That is all for this post. Stay tuned for the next one .<br>
Happy Vimming!</p>Optional Character | RegEx : The Right Way2016-06-19T02:24:00+05:302016-06-19T02:24:00+05:30Durga Swaroop Perlatag:freblogg.com,2016-06-19:/regular-expressions-tutorial-3<p>We have already seen how to use the dot operator in Regular Expressions in the last tutorial (<a href="http://www.freblogg.com/2016/06/regular-expressions-tutorial-2.html">Link</a>). To see all the articles of this Regular Expressions series, click <a href="https://freblogg.com/tags/regex">here</a> .</p>
<p><a href="https://freblogg.com/tags/regex"><img alt="Regular Expressions title card" height="177" src="https://2.bp.blogspot.com/-L0KVjYArli4/V2W0uHY2O7I/AAAAAAAAHHc/SKkDX4KunmkDQN3kMu-CS_Ee7_vx9dScwCK4B/s640/regex.png" width="640"></a></p>
<p>In this lesson, we will see what to do when some characters you want to match are optional, i.e …</p><p>We have already seen how to use the dot operator in Regular Expressions in the last tutorial (<a href="http://www.freblogg.com/2016/06/regular-expressions-tutorial-2.html">Link</a>). To see all the articles of this Regular Expressions series, click <a href="https://freblogg.com/tags/regex">here</a> .</p>
<p><a href="https://freblogg.com/tags/regex"><img alt="Regular Expressions title card" height="177" src="https://2.bp.blogspot.com/-L0KVjYArli4/V2W0uHY2O7I/AAAAAAAAHHc/SKkDX4KunmkDQN3kMu-CS_Ee7_vx9dScwCK4B/s640/regex.png" width="640"></a></p>
<p>In this lesson, we will see what to do when some characters you want to match are optional, i.e., if they are present, match them and if they are not, don't bother.<br>
So, imagine a situation where you want to match <strong>foo</strong> and <strong>foobar</strong>. Here, <code>bar</code> is optional. Then, what do we do to match and identify such words? Let's see ...</p>
<p><img alt="regex foo" src="https://3.bp.blogspot.com/-Mhhdevg3_5k/V3EkImNBmfI/AAAAAAAAHKc/JDZHI80veuIrhJ_UMXmLIKD2jE8qRI3dACK4B/s200/optional1.PNG"></p>
<p>We can start with <code>/foo/</code>. As you see, it matches the <em>foo</em> part of both <em>foo</em> and <em>foobar</em>, but it also matches any other word that contains foo. So, not exactly what we want.</p>
<p>For situations like these we use <strong>'?'</strong>, which tells the RegEx engine that whatever character precedes the <strong>'?'</strong> is optional and is not required.<br>
Now let's simplify the problem a bit. We want to match only <strong>foo</strong> and <strong>foob</strong> for now. </p>
<p><img alt="regex foob?" src="https://1.bp.blogspot.com/-_XtW7BZIDvo/V3EkSLrBwEI/AAAAAAAAHKk/jqLwduBJFHEnRT9y2VME8JYlHsNsOzqegCK4B/s200/optional2.PNG"></p>
<p>So, we'll do <code>/foob?/</code>.
As you see, it matches both <em>foo</em> and <em>foob</em> completely. So, this is good. Although it still matches part of <em>food</em>, ignore that for now.<br>
Extending the same idea, <code>/foob?a?r?/</code> means that 'b', 'a' and 'r' are each optional, so it matches all of these words: <em>foo, foob, fooba</em> and <em>foobar</em>. Since I am looking for only <strong>foo</strong> and <strong>foobar</strong>, I need to do something more to rule out the intermediaries.<br>
Also, too many <strong>?</strong>'s in the RegEx don't look that good.<br>
<img alt="regex foob?a?r?" src="https://2.bp.blogspot.com/-EmjjyRSTQA0/V3EkYXDxbkI/AAAAAAAAHKs/qGXR-ODG0XwqPM0ofmFl8kWjE2uz5B7vwCK4B/s200/optional3.PNG"></p>
<p>So, let's see how it can be done to meet our requirements. We will use <strong>grouping</strong> to do that. Groups will be covered in more detail in a separate tutorial. For now just sit tight and continue. So, we will do this: <code>/foo(bar)?/</code>. And, as you can see, it matches both <em>foo</em> and <em>foobar</em> as we wanted. </p>
<p><img alt="regex foo(bar)?" src="https://2.bp.blogspot.com/-x32fjVP8shQ/V3Ekht5c_EI/AAAAAAAAHK0/fW-Q_2B4HJAoa-RRTXfAYepwjxN4SiMhgCK4B/s400/optional4.PNG"></p>
<p>Now, let's try to understand what we did here.<br>
<strong>(bar)</strong> is called a Group. <strong>A Group is just a fancy way of saying that everything inside is treated as a single unit</strong>.<br>
And, in our case the <strong>?</strong> applies to the whole group, so all the letters inside are optional together. When the RegEx engine looks at that, it knows <em>bar</em> is optional and so matches <em>foo</em> whether or not <em>bar</em> follows it. The end result is that only <em>foo</em> and <em>foobar</em> are matched completely and not the intermediate words.</p>
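<p>If you would rather check this from code than in a browser, here is a small sketch using Java's <code>java.util.regex</code> package with the same <code>foo(bar)?</code> expression. The class name <code>OptionalGroupSketch</code> and the helper <code>matchesWholeWord</code> are hypothetical, made up for this illustration. One assumption to keep in mind: <code>matches()</code> requires the whole input to match, which is stricter than the partial highlighting you see in the screenshots.</p>

```java
import java.util.regex.Pattern;

// Hypothetical class and method names, used for illustration only.
class OptionalGroupSketch {

    // "foo" followed by an optional "bar" group.
    static final Pattern FOO_OPTIONAL_BAR = Pattern.compile("foo(bar)?");

    // True only when the entire word matches the pattern.
    static boolean matchesWholeWord(String word) {
        return FOO_OPTIONAL_BAR.matcher(word).matches();
    }

    public static void main(String[] args) {
        for (String word : new String[] {"foo", "foobar", "foob", "fooba", "food"}) {
            System.out.println(word + " -> " + matchesWholeWord(word));
        }
    }
}
```

<p>Only <em>foo</em> and <em>foobar</em> report <code>true</code>; the intermediaries <em>foob</em> and <em>fooba</em>, as well as <em>food</em>, report <code>false</code> because the group is either present as a whole or absent.</p>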
<p>Now, it's time for some RegExercise...<br>
What will you do to match both <strong>'cats'</strong> and <strong>'carts'</strong> but nothing else?<br>
Think about it and try it in <a href="http://regexr.com/">Regexr.com</a> to see if it is working.</p>
<p>Well, that is everything for this tutorial. Stay Tuned for more.</p>
<p>Happy RegExing!</p>Encryption in Vim2016-06-19T02:21:00+05:302016-06-19T02:21:00+05:30Durga Swaroop Perlatag:freblogg.com,2016-06-19:/encryption-in-vim<p><img alt="vim logo" height="200" src="https://s3.amazonaws.com/hackdesign/tools/app_images/000/000/051/icon_small/vim-logo-en.png?1391303578" width="200">]
Vim never fails to surprise you with the amazing features it has in its arsenal. Very recently I have found that Vim comes bundled with an encryption mechanism referred to as <strong>VimCrypt</strong>.<br>
It is always a good practice to encrypt your files especially when it contains personal or sensitive information …</p><p><img alt="vim logo" height="200" src="https://s3.amazonaws.com/hackdesign/tools/app_images/000/000/051/icon_small/vim-logo-en.png?1391303578" width="200">
Vim never fails to surprise you with the amazing features it has in its arsenal. Very recently I found that Vim comes bundled with an encryption mechanism referred to as <strong>VimCrypt</strong>.<br>
It is always a good practice to encrypt your files, especially when they contain personal or sensitive information. I often write my daily journal notes in Vim and I always encrypted them with some external program. But Vim itself is capable of doing that.<br>
Let's see how it works.</p>
<p>Here, I have a file named <em>encrypt.txt</em> open in vim.</p>
<p><img alt="text to encrypt" src="https://raw.githubusercontent.com/durgaswaroop/blogimages/master/vim_encrypt1.PNG"></p>
<p>So, now I want to encrypt this. All I have to do is this and press Enter.</p>
<p><code>:X</code></p>
<p>Vim will then prompt you to enter a key for the encryption. Enter it twice. This key will be used to encrypt and later decrypt the file.</p>
<p><img alt="enter encryption key" height="69" src="https://raw.githubusercontent.com/durgaswaroop/blogimages/master/vim_encrypt3.PNG" width="640"></p>
<p>And, that is all. Your file will now be encrypted with the key you have given when you save it.<br>
The next time you try to open the file, Vim will prompt you to enter the encryption key. </p>
<p><img alt="enter encryption key" height="68" src="https://raw.githubusercontent.com/durgaswaroop/blogimages/master/vim_encrypt4.PNG" width="640"></p>
<p>Make sure to remember the key you have entered, because if you enter a wrong key to decrypt the data, you will see completely garbled gibberish on screen. </p>
<p><img alt="garbled data if wrong key is used" height="190" src="https://raw.githubusercontent.com/durgaswaroop/blogimages/master/vim_encrypt5.PNG" width="640"></p>
<p>So, that is how you encrypt your files in Vim. </p>
<p>But, here are a few things to be mindful of</p>
<ol>
<li>VimCrypt uses a really <strong>weak encryption algorithm</strong>. It can be broken rather effortlessly. Hence, don't use this for encrypting really important files.</li>
<li>If you open the file with a wrong password, you'll see garbled text on screen. But <strong>do not save</strong> that gibberish to disk, because if you do, Vim will overwrite your original file contents and your data will be lost.</li>
</ol>
<p>Vim also offers newer encryption methods, such as Blowfish and Blowfish 2. These are much better than the default VimCrypt method and will be much harder to crack. More about them later in another tutorial.</p>
<p>So, that is all for this article. It's good practice to encrypt files to keep your data secure. So, try to use this whenever you can.<br>
And, until next time, happy Vimming!</p>Dot Operator | RegEx : The Right Way2016-06-19T02:18:00+05:302016-06-19T02:18:00+05:30Durga Swaroop Perlatag:freblogg.com,2016-06-19:/regular-expressions-tutorial-2<p>Let's continue with Regular Expressions. All articles in this series can be found <a href="https://freblogg.com/tags/regex">here</a>. I will be using <a href="http://regexr.com/">Regexr.com</a> for most of these tutorials. It is a great site, where you can write and validate your regular expressions against your desired input text.</p>
<p>Now, Let's look at how regular …</p><p>Let's continue with Regular Expressions. All articles in this series can be found <a href="https://freblogg.com/tags/regex">here</a>. I will be using <a href="http://regexr.com/">Regexr.com</a> for most of these tutorials. It is a great site, where you can write and validate your regular expressions against your desired input text.</p>
<p>Now, let's look at what regular expressions usually look like.</p>
<p><a href="https://freblogg.com/tags/regex"><img alt="" height="348" src="https://1.bp.blogspot.com/-ZmsrZ_DaBFU/V2WzYe2cwhI/AAAAAAAAHHQ/cXfkolFOZHAUrupoBLdO5J9mppP_8CpigCK4B/s640/regex_small.png" width="640"></a></p>
<p>The '/' before and after the pattern is the "delimiter". We put our search pattern between the delimiters and pass any flags at the end. Flags give you a bit more control over how the match is performed. So, with that knowledge, let's get started. </p>
<h2 id="matching-a-string">Matching a String</h2>
<p>To match any given string, we just write <code>/{string}/</code> and the regex engine will find that string in the text.
So, when I search for <strong>Blogg</strong>, it matches inside 'FreBlogg' as we would expect, but not 'blogg', because matching is case-sensitive by default. </p>
<p><img alt="" src="https://4.bp.blogspot.com/-Ehsuq4JsxUY/V3EtmsEO6WI/AAAAAAAAHLI/1MiEcs2MEiAF7hounSTqmcvZxJF5H_h9wCK4B/s200/freblogg%2B%25281%2529.png"></p>
<p><img alt="" src="https://4.bp.blogspot.com/-iE7hbXCbNks/V3Et2tbe2KI/AAAAAAAAHLQ/SPBZI28TEZ4X6zqUMKxuvm9_6EVXPP5lQCK4B/s200/freblogg1%2B%25282%2529.png"></p>
<p>To match even the '<em>blogg</em>' on the second line, you can use the <code>i</code> flag (which ignores case) along with <code>g</code>. The problem with the <code>i</code> flag is that it will match a lot more than we want it to, as you can see in the second image. So, use the <strong>i</strong> flag with caution, as it can match other strings you might not want.</p>
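<p>If you would rather try this outside Regexr, here is a minimal sketch using Python's <code>re</code> module. The sample text is made up for illustration, and <code>re.IGNORECASE</code> stands in for the <code>i</code> flag; <code>re.findall</code> returns every match, much like the <code>g</code> flag.</p>

```python
import re

# Hypothetical sample text, not the text from the screenshots.
text = "FreBlogg is a blog.\nThis blogg has a catalog of BLOGGERS."

# Case-sensitive by default: only the capitalised "Blogg" is found.
exact = re.findall(r"Blogg", text)

# re.IGNORECASE plays the role of the i flag: every case variant matches,
# including the "BLOGG" inside "BLOGGERS" that we may not have wanted.
loose = re.findall(r"Blogg", text, flags=re.IGNORECASE)

print(exact)
print(loose)
```

<p>The second list is longer than the first, which is exactly the "matches more than we want" effect described above.</p>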
<p>We will later see what to do if you only want to match 'Blogg' and 'blogg' and nothing else.</p>
<p>So, that is how you can match for a single string. Let's look at the Dot operator now.</p>
<h2 id="the-dot-operator">The Dot Operator</h2>
<p><img alt="" src="https://2.bp.blogspot.com/-T1V3ah_uGkU/V3EurMvusDI/AAAAAAAAHLc/RVajI9qtuKkMypQBnRfQfVQflWj4sQM9QCK4B/s200/dot.PNG"></p>
<p>The <code>/./</code> in regex matches any single character, except newline characters.<br>
We use this when we want to match a character but don't care what it is. As you see in the image, it matches all letters, digits and spaces. Even special characters like @, ! and $ will be matched.</p>
<p><img alt="" src="https://1.bp.blogspot.com/-_PvjnNgJQlk/V3Eu2BM1a7I/AAAAAAAAHLo/4iBQS8Hocn0RUr1QFn_APxWWW2CJhD1ZwCK4B/s200/dot1.PNG"></p>
<p>When I try <code>/D./</code>, it matches <em>Do</em>, because the dot will match anything. But it didn't match the 'D' on line 2 together with the 'o' on line 3, because the dot operator won't match newline characters and there is a newline character right after that 'D'.</p>
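<p>The same behaviour can be reproduced with a small Python sketch. The three-line sample string is an assumption chosen to mimic the screenshot, and <code>re.DOTALL</code> is Python's own switch for letting the dot match newlines as well.</p>

```python
import re

# Hypothetical sample: "Dot" on line 1, a lone "D" on line 2, "ot" on line 3.
text = "Dot\nD\not"

# By default the dot refuses to match the newline after the second "D".
default_matches = re.findall(r"D.", text)

# With re.DOTALL the dot also matches newlines, so the second "D" pairs with "\n".
dotall_matches = re.findall(r"D.", text, flags=re.DOTALL)

print(default_matches)
print(dotall_matches)
```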
<p>So, let's use what we've learned so far and do a small exercise.<br>
<img alt="" src="https://2.bp.blogspot.com/-oXWp8egCOOQ/V3Eu_MBum-I/AAAAAAAAHLw/DI_VFlM9hs4Nln09_4V29Y2jhBkMpyx6wCK4B/s200/dot2.PNG"></p>
<p>Say I want to match three-letter words. I can use <code>/.../</code>, but look at what it matched.<br>
As you see, it did not do exactly what we wanted. Since the <em>dot</em> matches spaces as well, it happily matches a space plus two letters, as with 'oh'. So, we can't use it for this particular case. We will see a much better way to do the same thing in later tutorials.</p>
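<p>Here is a quick Python sketch of that pitfall. The sample sentence is invented for illustration; the point is only that every chunk is three characters long, whether or not it is a three-letter word.</p>

```python
import re

# Hypothetical sample sentence with two-letter and three-letter words.
text = "oh my cat"

# Three dots simply grab the next three characters, spaces included.
chunks = re.findall(r"...", text)

print(chunks)
```

<p>Two of the three chunks are a two-letter word plus a space, which is why plain dots are the wrong tool for this job.</p>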
<p>Well, that is everything for this tutorial. Stay Tuned for more.</p>
<p>Happy RegExing!</p>RegEx : The Right Way |Tutorial 12016-06-19T02:07:00+05:302016-06-19T02:07:00+05:30Durga Swaroop Perlatag:freblogg.com,2016-06-19:/regular-expressions-tutorial-1<p>Regular Expressions or <code>RegEx</code> is a sequence of characters that define a <em>search pattern</em>. Regex is every where these days and you can use it to extract information from Text files, Log files, Dictionaries, Spread sheets and webpages. Every major programming language has support for Regular Expressions. Most importantly <strong><em>grep …</em></strong></p><p>Regular Expressions or <code>RegEx</code> is a sequence of characters that define a <em>search pattern</em>. Regex is every where these days and you can use it to extract information from Text files, Log files, Dictionaries, Spread sheets and webpages. Every major programming language has support for Regular Expressions. Most importantly <strong><em>grep, awk and sed</em></strong> use regex to find/replace matches. </p>
<p><img alt="" height="179" src="https://3.bp.blogspot.com/-MlWrA76NZNU/V2Ww3pRtjyI/AAAAAAAAHHE/vNKsLN1Mo08yESw5aOro1-48kfrbuFpCwCK4B/s640/regex.png" width="640"></p>
<p>Regular Expressions can help you save a lot of time. Instead of writing complex string pattern searches that span multiple lines, regex gets the job done really easily and really fast.<br>
Let's look at a simple scenario where you might want to use regex. Let's say you have a string and you want to check if it is a website URL. Here are a few conditions that a URL should satisfy:</p>
<ul>
<li>Should start with <strong>http://</strong> or <strong>https://</strong></li>
<li>May or may not have <strong>www.</strong></li>
<li>Should have a <strong>.com</strong> , <strong>.org</strong> or something similar</li>
<li>Can contain letters, digits, underscores, etc.</li>
<li>Might even have a port number after the host, as in <strong>http://google.com:80/</strong></li>
</ul>
<p>So, checking all of these individually in any programming language with various string parsing conditions can be a really challenging task. But with RegEx the same thing can be achieved much more easily and much faster.</p>
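<p>To give a feel for where this series is heading, here is a rough Python sketch that encodes just the conditions listed above. The pattern and the helper name <code>looks_like_url</code> are my own illustration and deliberately simplistic; real-world URL validation has many more cases than this.</p>

```python
import re

# Each commented line mirrors one condition from the list above.
URL_PATTERN = re.compile(
    r"""
    ^https?://             # must start with http:// or https://
    (?:www\.)?             # may or may not have www.
    [A-Za-z0-9_-]+         # letters, digits, underscores, hyphens
    (?:\.[A-Za-z0-9_-]+)+  # .com, .org or something similar
    (?::\d+)?              # optional port number such as :80
    /?$                    # optional trailing slash
    """,
    re.VERBOSE,
)


def looks_like_url(candidate):
    """Return True if the candidate satisfies the simple conditions above."""
    return URL_PATTERN.match(candidate) is not None


print(looks_like_url("http://google.com:80/"))
print(looks_like_url("google.com"))
```

<p>Don't worry about the individual symbols yet. The takeaway is that one short pattern replaces a pile of hand-written string checks.</p>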
<p>That sounds like fun, doesn't it? Well, let's get started.</p>
<p>If you have used a Linux shell/terminal before, you have probably used pattern matching already. Bash shells have some basic pattern matching capabilities built into them. So, let's look at an example.</p>
<p>I am currently inside a folder with some files in it.</p>
<div class="highlight"><pre><span></span><code>$ ls
file.csv picture.jpg README_en.txt touch2.txt video.mp4
HelloWorld.rb program1.java touch1.txt touch2.vim
</code></pre></div>
<p>If I want to see only the files with the extension <em>.txt</em>, I can do this.</p>
<div class="highlight"><pre><span></span><code>$ ls *.txt
README_en.txt touch1.txt touch2.txt
</code></pre></div>
<p>This can be thought of as regex in its most basic form. We give a search pattern and see the output that matches it. The <strong><em>*</em></strong> here is called a wildcard character, which matches anything and everything. (Strictly speaking this is shell pattern matching rather than a true regular expression, but the idea is the same.)</p>
<p>This time, let's say I want to search for a txt file whose name is 'touch' followed by something (in this folder we have touch1.txt and touch2.txt), and I don't remember the exact number following 'touch'. To search for that, I can use the <strong><em>?</em></strong> operator, which gives me this:</p>
<div class="highlight"><pre><span></span><code>$ ls touch?.txt
touch1.txt touch2.txt
</code></pre></div>
<p>Now, the <strong><em>?</em></strong> operator is also a wildcard character, but it matches exactly one character. So, if there is a file called <em>touchA.txt</em> or <em>touch%.txt</em>, it will be matched too, but <em>touchAB.txt</em> will not be.</p>
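<p>If you don't have a shell handy, the same two wildcards can be tried with Python's <code>fnmatch</code> module, which implements shell-style patterns. The file names below are a mix of the ones from the listing and the hypothetical ones mentioned above.</p>

```python
from fnmatch import fnmatchcase

# Names from the folder listing, plus the hypothetical touchA/touch%/touchAB files.
names = [
    "touch1.txt",
    "touch2.txt",
    "touchA.txt",
    "touch%.txt",
    "touchAB.txt",
    "touch2.vim",
    "README_en.txt",
]

# ? stands for exactly one character, so touchAB.txt is left out.
single = [name for name in names if fnmatchcase(name, "touch?.txt")]

# * stands for any run of characters, so every .txt file is kept.
all_txt = [name for name in names if fnmatchcase(name, "*.txt")]

print(single)
print(all_txt)
```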
<p>So, this is how you can narrow down your search results using search patterns. We can use programs like <strong><em>grep</em></strong>, <strong><em>egrep</em></strong> and <strong><em>sed</em></strong> to do a lot more than just this. So, we will use them in the upcoming tutorials.</p>
<p>That's everything for this tutorial. So, stay tuned for the upcoming ones.<br>
Happy RegExing!</p>NERDTree | Your very own Vim file tree2016-06-19T02:00:00+05:302016-06-19T02:00:00+05:30Durga Swaroop Perlatag:freblogg.com,2016-06-19:/your-very-own-vim-file-tree-nerdtree<p>NERDTree is a real time saver and a pretty cool extension to your Vim setup, to make it more user friendly. Almost every other Text editor out there comes with an ability to show the file directory listing in which the current file is present. And, if you are wondering …</p><p>NERDTree is a real time saver and a pretty cool extension to your Vim setup, to make it more user friendly. Almost every other Text editor out there comes with an ability to show the file directory listing in which the current file is present. And, if you are wondering how you can do the same in Vim, then look no further because <a href="https://github.com/scrooloose/nerdtree">NERDTree</a> is what you want.<br>
<img alt="vim logo" src="https://s3.amazonaws.com/hackdesign/tools/app_images/000/000/051/icon_small/vim-logo-en.png?1391303578"></p>
<p>So, what does it do?
It shows all the files and folders in the current working directory. You can also add and delete files right from the list. That's pretty cool.</p>
<p>So, take a look, </p>
<p><img alt="Nerd Tree in action" src="https://freblogg.com/images/nerdtree.gif"></p>
<p>Some useful stuff regarding this plugin:</p>
<ul>
<li><code>:NERDTreeToggle</code> - Toggles the file pane on/off. So, you might want to map that in your vimrc. I have it mapped to <strong><em>&lt;LEADER&gt; n</em></strong></li>
<li><code>?</code> - Hit '?' and you'll get all the help you need for using it</li>
<li><code>m</code> - Hit 'm' and you'll be presented with a menu to create, delete and list</li>
<li>For all extra info - <code>:help NERDTree</code></li>
</ul>
<p>So, that is all I have about this plugin. It is a really great addition to your workflow and you will love it. </p>
<p>Download Link : <a href="https://github.com/scrooloose/nerdtree">https://github.com/scrooloose/nerdtree</a></p>
<p>Happy Vimming!</p>
<p>PS: To see all the Vim tutorials on Freblogg, visit: <a href="https://freblogg.com/tags/vim">Freblogg/vim</a></p>
<hr>
<p>Image Credits:</p>
<p>vim logo - hackdesign.org - https://goo.gl/ADCh6R</p>Matrix Rain / Falling Matrix code : Notepad trick2013-05-11T21:16:00+05:302013-05-11T21:16:00+05:30Durga Swaroop Perlatag:freblogg.com,2013-05-11:/matrix-rain-falling-matrix-code-notepad<p>Have you watched the movie The Matrix? If you have, you will surely have noticed the green coded numbers running up and down the screen (also called <em>Matrix Rain</em>). That falling code trick is very easy to create on your own. Now, I'll show you how …</p><p>Have you watched the movie The Matrix? If you have, you will surely have noticed the green coded numbers running up and down the screen (also called <em>Matrix Rain</em>). That falling code trick is very easy to create on your own. Now, I'll show you how to do that. </p>
<p><img alt="matrix movie style command prompt" height="450" src="http://3.bp.blogspot.com/-bJ48CNFmyh8/UY5oSpXzn5I/AAAAAAAAAsE/PaR4cYZMTe8/s1600/matric+blog.png"></p>
<h2 id="steps-to-generate-failling-matrix-code">Steps To Generate Falling Matrix Code</h2>
<p>1) Open Notepad on your computer<br>
2) Copy and paste the following code into it </p>
<div class="highlight"><pre><span></span><code><span class="p">@</span><span class="k">echo</span> off
<span class="k">color</span> 02
<span class="p">:</span><span class="nl">start</span>
<span class="k">echo</span> <span class="nv">%random%</span> <span class="nv">%random%</span> <span class="nv">%random%</span> <span class="nv">%random%</span> <span class="nv">%random%</span> <span class="nv">%random%</span> <span class="nv">%random%</span> <span class="nv">%random%</span> <span class="nv">%random%</span> <span class="nv">%random%</span>
<span class="k">goto</span> <span class="nl">start</span><span class="c1"> </span>
</code></pre></div>
<p>3) Save it with any name you wish, but with the extension '.bat' (.bat stands for batch file). You can save it anywhere in your file system.</p>
<p><img alt="save the batch file" height="161" src="http://4.bp.blogspot.com/-_99MKgY-jWs/UY5gpc4H_KI/AAAAAAAAArg/Vf7KRr27QqE/s1600/pic1.png" width="640"></p>
<p>4) Now, double click on the file and feel like a badass programmer. Try showing this to your friends and feel proud of yourself when they look up to you. </p>
<p><img alt="matrix falling live demo" src="http://4.bp.blogspot.com/-0G46z9Qs4b0/UY5hdyrRiVI/AAAAAAAAArw/Q5tjd2cKtc0/s1600/matrix.gif"></p>
<p>Watch the YouTube video explaining the same:</p>
<iframe width="879" height="549" src="https://www.youtube.com/embed/kz5EzwswOFA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<p>5) You can also change the color of the numbers by changing the 'color 02' in the code to whatever number you want.<br>
The colors for the possible numbers are:<br>
<br>
00 - Black <br>
01 - Blue<br>
02 - Green <br>
03 - Aqua Blue (greenish blue)<br>
04 - Red <br>
05 - Purple<br>
06 - Yellow <br>
07 - White<br>
08 - Grey <br>
09 - Light Blue<br>
0A - Light Green <br>
0B - Light Aqua Blue<br>
0C - Light Red <br>
0D - Light Purple<br>
0E - Light Yellow <br>
0F - Bright White </p>
<p>6) To change the background, change the first digit of the number.<br>
Some examples are:<br>
1X - Blue background + letters in the color corresponding to X<br>
2X - Green background + letters in the color corresponding to X<br>
AX - Light Green background + letters in the color corresponding to X<br>
And so on. </p>
<p>If no argument is given, the <code>color</code> command restores the color to what it was when CMD.EXE started. This value comes either from the current console window or from the DefaultColor registry value. </p>
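<p>If the two-digit scheme feels confusing, the following Python sketch spells it out: the first hex digit picks the background and the second picks the letters. The helper name <code>describe_color</code> is made up for this illustration, and the names come straight from the table above.</p>

```python
# Hex digit to color name, as listed in the table above.
COLOR_NAMES = {
    "0": "Black", "1": "Blue", "2": "Green", "3": "Aqua Blue",
    "4": "Red", "5": "Purple", "6": "Yellow", "7": "White",
    "8": "Grey", "9": "Light Blue", "A": "Light Green", "B": "Light Aqua Blue",
    "C": "Light Red", "D": "Light Purple", "E": "Light Yellow", "F": "Bright White",
}


def describe_color(code):
    """Split a two-digit color argument into (background, letters) color names."""
    code = code.upper()
    if len(code) != 2 or any(digit not in COLOR_NAMES for digit in code):
        raise ValueError(f"expected two hex digits, got {code!r}")
    background, letters = code
    return COLOR_NAMES[background], COLOR_NAMES[letters]


print(describe_color("02"))
print(describe_color("1A"))
```

<p>So 'color 02' means green letters on a black background, which is exactly the Matrix look.</p>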
<p>Thanks for stopping by. For more interesting and awesome tricks and tweaks subscribe to our blog feed. </p>SAMSUNG GALAXY S4 Vs HTC ONE : Complete Review2013-05-10T11:54:00+05:302013-05-10T11:54:00+05:30Durga Swaroop Perlatag:freblogg.com,2013-05-10:/samsung-galaxy-s4-vs-htc-one-complete<p>Are you planning to buy Samsung Galaxy S4 or the HTC One ??</p>
<p><img alt="samsung s4 and htc one phones" height="400" src="http://2.bp.blogspot.com/-GPevMUnqwWo/UYvrqavwFJI/AAAAAAAAAm0/oVv9C_s743s/s1600/sam+and+htc.png" width="396"></p>
<p><strong>Samsung Galaxy S4</strong> and <strong>HTC One</strong> , the two new, popular, hyped and high performance HD Android Smart Phones are the talk of town these days. They are arguably the most popular and most sought out phones of 2013 …</p><p>Are you planning to buy Samsung Galaxy S4 or the HTC One ??</p>
<p><img alt="samsung s4 and htc one phones" height="400" src="http://2.bp.blogspot.com/-GPevMUnqwWo/UYvrqavwFJI/AAAAAAAAAm0/oVv9C_s743s/s1600/sam+and+htc.png" width="396"></p>
<p><strong>Samsung Galaxy S4</strong> and <strong>HTC One</strong>, the two new, popular, hyped, high-performance HD Android smartphones, are the talk of the town these days. They are arguably the most popular and most sought-after phones of 2013, and the biggest (nothing to do with their size) smartphone launches of the year. They will go head to head and toe to toe against each other, and their match-up will go on till the end of 2013 and well beyond. When they launched, both were thought of as tough and worthy opponents to the iPhone 5. But they have since gone well past that mark: they are no longer seen as mere competitors to the iPhone or other smartphones, they simply stand out from the pack. These two have clearly established themselves as the kings of 1080p smartphones. So, here is a side-by-side comparison of the two greatest flagship phones of 2013 (of all time), <em><span style="color: blue;">SAMSUNG GALAXY S4</span> Vs <span style="color: blue;">HTC ONE</span></em> </p>
<h2 id="individual-stories">Individual Stories</h2>
<h3 id="galaxy-s4">Galaxy S4</h3>
<p>The Galaxy S4 is one of the most popular phones of 2013, both because of all the hype and publicity from Samsung and because of its hugely popular predecessor, the S3. These made everybody expect a good phone. Everybody thought of it as a revolutionary update to the S3, but when it came out it was more of an evolution than a revolution. Although Samsung says it has made over 100 changes, people don't really see them. All they can see is a thinner, lighter, squarer S3 that also borrows some of its looks from the Galaxy Note 2. So, the S4 remains a minor update to the S3 with some new features and a higher-capacity battery. New features such as Air Gesture, Smart Scroll and Smart Stay made it worth waiting for. </p>
<h3 id="htc-one">HTC One</h3>
<p>The HTC One is an interesting phone this year for a few reasons. First, it is HTC's flagship phone and surely their desperate effort to stay high in the market. It is a must-win situation for them, mostly because its predecessor, the One X, got some good reviews but was commercially unsuccessful. HTC is clearly hoping this will be the <em>ONE</em> to change their fate, hence the name ONE. Secondly, it is competing directly against two other great smartphones, the iPhone and Samsung's S4. Just by looking at the specs you can tell that HTC has taken a few risks in trying to win market share. Most of the emphasis of the ONE is on unique hardware and software features. A new camera with an <em>image sensor</em> designed to perform well in low-light environments, along with many other features, makes it a good pick for anyone. </p>
<h2 id="face-to-face-features-pros-cons">Face to Face: Features, Pros & Cons</h2>
<h3 id="1-build-and-looks">1) Build and Looks</h3>
<p>The HTC One is probably the best-built phone ever made (at least among HTC's). Build quality is one of the biggest differences between these two phones, and it is the first thing a person notices when holding one. The immediate first impression of a phone is how it feels. I must say, HTC stands out in this aspect thanks to its all-metal unibody back, which also wraps around a bit onto the front. It has that sturdy and hefty feel when you hold it in your hand, which is one of the greatest things about this phone. The S4 feels completely different. It is made of plastic, so it feels flexible and light. If you don't mind a few extra grams in your hand you could go for the ONE, but if you want your phone to be as light as possible then you can go for the S4. Both plastic and metal have their own pros and cons, and they balance each other out.<br>
Another major difference is the speakers. The S4 has a small rear speaker which is merely OK, but the ONE has massive front-facing stereo speakers which are amazing. HTC calls it <em>'Boom Sound'</em>. No one else has done it as well as HTC. Beats Audio is surely another plus. </p>
<h3 id="2-screen-size-display">2) Screen Size & Display</h3>
<hr>
<p><a href="http://2.bp.blogspot.com/-H1pJQWVVSyM/UYwngRjg-2I/AAAAAAAAAnU/eHPCbN6UIfk/s1600/hd_resolution_logos.png"><img alt="HD 1080 image" height="116" src="http://2.bp.blogspot.com/-H1pJQWVVSyM/UYwngRjg-2I/AAAAAAAAAnU/eHPCbN6UIfk/s1600/hd_resolution_logos.png" width="200"></a></p>
<p>By: <a href="http://jokaone.deviantart.com/">jokaone</a></p>
<hr>
<p>The HTC One has a 4.7 inch LCD screen and the Galaxy S4 has a 5 inch Super AMOLED screen. Both are 1080p full HD (1,920 x 1,080) smartphones with crystal clear, razor sharp displays. The HTC ONE has a slightly brighter display because of its LCD screen, compared to the S4's AMOLED screen, and it also displays more accurate colors. HTC can surely be proud of its higher <em>pixel density</em> (the same number of pixels in a smaller screen), measured in pixels per inch (ppi):<br>
HTC One - 468 ppi<br>
S4 - 441 ppi </p>
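<p>For the curious, pixel density is just the number of pixels along the screen diagonal divided by the diagonal length in inches. Here is a small Python sketch of that arithmetic; the function name <code>pixels_per_inch</code> is made up for this illustration, and the inputs are the resolution and screen sizes quoted above.</p>

```python
import math


def pixels_per_inch(width_px, height_px, diagonal_in):
    """Pixels along the diagonal divided by the diagonal length in inches."""
    return math.hypot(width_px, height_px) / diagonal_in


htc_one_ppi = pixels_per_inch(1920, 1080, 4.7)
galaxy_s4_ppi = pixels_per_inch(1920, 1080, 5.0)

print(f"HTC One: {htc_one_ppi:.1f} ppi")
print(f"Galaxy S4: {galaxy_s4_ppi:.1f} ppi")
```

<p>The results come out at roughly 468.7 and 440.6, in line with the 468 ppi and 441 ppi figures quoted for the two phones.</p>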
<h3 id="3-battery">3) Battery</h3>
<p><a href="http://2.bp.blogspot.com/-yZAOW0NPv8c/UYxwtmveo1I/AAAAAAAAAqA/zXoGiKQVV2g/s1600/battery.png"><img alt="phone battery icon" src="http://2.bp.blogspot.com/-yZAOW0NPv8c/UYxwtmveo1I/AAAAAAAAAqA/zXoGiKQVV2g/s1600/battery.png"></a> The Galaxy S4 has a 2600 mAh Li-ion battery, while the HTC One has a built-in 2300 mAh Li-Po (Lithium Polymer) battery. On paper, the S4 clearly has the advantage here. Furthermore, the S4's battery is replaceable, unlike that of the ONE, and it can also be charged wirelessly. Nevertheless, both the S4 and the ONE are massive powerhouses built for high performance. In practice, battery life is almost the same on both phones: the higher clock speed of the S4 consumes more power, and that probably eats away some of the extra advantage of the 2600 mAh battery. </p>
<h3 id="4-operating-system">4) Operating System</h3>
<p><a href="http://4.bp.blogspot.com/-nkICc9OKdfQ/UYxwDmlgjVI/AAAAAAAAAp4/MtiXixHphRw/s1600/504px-Android_robot.png"><img alt="android robot icon" height="200" src="http://4.bp.blogspot.com/-nkICc9OKdfQ/UYxwDmlgjVI/AAAAAAAAAp4/MtiXixHphRw/s1600/504px-Android_robot.png" width="168"></a>The S4 ships with the latest version of Android (4.2.2) and Samsung's very own TouchWiz user experience, while the ONE ships with the previous version of Android (4.1.2) and HTC's own Sense 5. The choice here clearly depends on your tastes. If you like TouchWiz (I doubt that) you can go for the S4, and if SENSE makes sense to you, you can prefer the ONE. The ONE will miss some of the features of the newer Android version for now, but you can be sure to see the updates right around the corner. </p>
<h3 id="5-camera">5) Camera</h3>
<p>One of the most debated topics when comparing the S4 and the ONE is the camera. The HTC ONE has a 4 Ultra Pixel (same as 4 MP) rear camera, while the Galaxy S4 has a 13 MP rear camera. Though both can take pretty good photos, the ONE's 4 MP camera is something of a downside (though it has its own pros). The S4 takes<br>
<a href="http://1.bp.blogspot.com/-q-yWZHH4rlQ/UYwsod98lmI/AAAAAAAAAoI/JpzalKxqX6o/s1600/camera.png"><img alt="camera icon" src="http://1.bp.blogspot.com/-q-yWZHH4rlQ/UYwsod98lmI/AAAAAAAAAoI/JpzalKxqX6o/s1600/camera.png"></a>amazingly crisp, high quality images almost all the time thanks to its 13 MP sensor, while the ONE can take more detailed images in low-light environments thanks to the larger pixels of its 4 MP Ultra Pixel camera. For regular use we can't see much of a difference between them, but when we zoom or crop, the ONE's image falls apart soon enough while the S4's picture stays crisp all the way through, which gives the S4 a photographer's advantage. So, if you are a pixel lover (one who always wishes for more and more pixels), the S4 will be a great choice for you. One consolation for ONE lovers is its '<em>wide angle lens</em>' on both the front-facing and rear cameras, which lets you fit more of the room into your shot. That is a noticeable difference between these two great phones. </p>
<table>
<thead>
<tr>
<th>Phone</th>
<th>Rear Camera</th>
<th>Front Camera</th>
</tr>
</thead>
<tbody>
<tr>
<td>HTC ONE</td>
<td>4 MP with Auto Focus<br />1080P HD Recording<br />HDR Recording</td>
<td>2.1 MP<br />1080p HD recording</td>
</tr>
<tr>
<td>Galaxy S4</td>
<td>13MP with Auto Focus<br />1080P HD Recording<br />HDR Recording</td>
<td>2 MP<br />1080p HD recording</td>
</tr>
</tbody>
</table>
<h3 id="6-performance">6) Performance</h3>
<p><a href="http://3.bp.blogspot.com/-VYUzJFA96gY/UYwuLClwFpI/AAAAAAAAAoc/bLkpYOlF0Q4/s1600/snap.jpg"><img alt="snapdragon chip" src="http://3.bp.blogspot.com/-VYUzJFA96gY/UYwuLClwFpI/AAAAAAAAAoc/bLkpYOlF0Q4/s1600/snap.jpg"></a></p>
<p>Both the S4 and the ONE are performance drivers. The HTC One comes with a 1.7 GHz quad-core Snapdragon 600 processor. The S4's quad-core variant has a 1.9 GHz Snapdragon 600 SoC, while the octa-core version comes with a 1.6 GHz A15 quad-core cluster and a 1.2 GHz A7 quad-core cluster. Since the octa-core versions are less popular, we'll ignore them here. On clock speed alone, that gives the S4 a plus in this aspect. But a higher clock speed means using more battery power, and that is why the S4 uses a 2600 mAh battery. Coming to responsiveness, HTC takes the lead. It is very quick and has a highly responsive touch screen compared to the S4, which can stutter sometimes. That gives a slight edge to the ONE. </p>
<h3 id="7-internal-memory-storage">7) Internal Memory Storage</h3>
<p>When it comes to storage, the Galaxy S4 tends to be more versatile and flexible. The S4 comes in 16 GB, 32 GB and 64 GB versions and also has a microSD card slot, which is expandable up to 64 GB. </p>
<p><a href="http://2.bp.blogspot.com/-rWRYTtBBxyE/UYwvSfJs76I/AAAAAAAAAow/OcmymzYKl5E/s1600/micro-sd-cards-250x250.png"><img alt="micro SD card" src="http://2.bp.blogspot.com/-rWRYTtBBxyE/UYwvSfJs76I/AAAAAAAAAow/OcmymzYKl5E/s1600/micro-sd-cards-250x250.png"></a> The HTC ONE comes in two versions, 32 GB or 64 GB, and does not have an SD card option (except in China). </p>
<h3 id="8-other-features">8) Other features</h3>
<p>Some other interesting features of these phones that are worth mentioning:</p>
<h4 id="htc-one_1">HTC ONE</h4>
<ul>
<li>Blink Feed: ONE features a new news aggregator, known as 'Blink Feed', which displays a scrolling list of news and other content from social networking sites</li>
<li>ZOE: The Camera app includes a new shooting mode called ZOE, with which you can film 4 seconds of video and create your own GIFs</li>
<li>Remote Control: HTC One includes an electronic program guide powered by Peel, with which it can act as a remote control for your TV</li>
</ul>
<h4 id="galaxy-s4_1">GALAXY S4</h4>
<ul>
<li>SMART SCROLL: Screen can be scrolled up or down by tilting the phone </li>
<li>SMART PAUSE: Video gets paused on its own if you are not looking at the screen. It will resume playing when you look at the screen again</li>
<li>GROUP PLAY: Allows you to share files with other Galaxy S4 phones. You can play the same game or you can listen to the same song with the other shared S4's acting as supporting speakers</li>
<li>AIR-View: Allows users to preview an image or a video by hovering their finger over it</li>
<li>ERASER: Allows user to remove unnecessary things from the image while capturing</li>
<li>DUAL-SHOT: Allows the person taking the picture to be in the picture (So, no more 'who took this image?' sort of questions)</li>
<li>SOUND &amp; SHOT: Allows user to record a small voice clip alongside a picture</li>
<li>KNOX: A new feature, which allows user to divide the phone between business and personal use </li>
</ul>
<h2 id="final-say">Final say</h2>
<p>Both of these are great phones at the end of the day. I will happily recommend either one of them because, no matter which you buy, it's worth your money and time. Each counters the other feature for feature. So, whether you buy the S4 or the HTC One, you can be sure that you'll get the high-end performance you expect. The only thing I would add is to put your money wherever your interest lies. So, this concludes the S4 Vs ONE review. What will you pick? What do you prefer? Let us know what you think of each of these phones. We would be grateful if you could describe your reasons for picking one over the other. Thanks for stopping by. Have a happy day ahead. </p>
<h2 id="attributions">Attributions</h2>
<p>S4 Black mist image - Author: <a href="http://www.flickr.com/people/60952012@N06">Samsung Belgium</a> (creative commons license)<br>
HTC ONE image - Author: Hi-tech@Mail.Ru (creative commons license)<br>
Android Image - Author: Google (creative commons license) </p>