January 12, 2018

Build A Web Application With Flask In Python Part I

23:54 Posted by Durga swaroop Perla

Flask is a popular micro web application framework for Python that you can use to create web apps. Unlike a full-stack framework such as Django, Flask keeps its footprint to a minimum, providing only the basic functionality you need instead of picking out the entire stack for you the way Django does. That is why we call it a micro framework. With Flask's extensibility at its core, you can build any type of application by picking the components you want to use. Several big-name companies like LinkedIn and Pinterest use Flask for their products.

Flask Logo

In this tutorial we will get started with using Flask and create a simple web application with it.

Prerequisites

To follow along with this series you should have some knowledge of Python. I'm using Python 3.6 for these tutorials, and if you would like to follow along without any issues, I suggest you use the same version. With earlier versions there might be a couple of changes in the syntax, but the ideas and concepts remain the same.

You will also need to install Flask. You can do that with pip.

pip install -U flask

This will install Flask if you don't already have it and update it to the latest version if you have an older one installed.
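If you want to confirm the installation, one quick way (just one way to check; you can also rely on pip itself) is to print the version from Python:

import flask
print(flask.__version__)  # Prints the installed Flask version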

With those two things, you are good to go.

Getting Started

Just like with anything else you start by importing the stuff you want.

from flask import Flask

And this will make Flask ready for you to use. After this you have to create an app object by calling the Flask constructor like this:

app = Flask("hello")

This will create our app object. The name hello I've specified in the constructor can be anything, but the usual convention is to pass __name__ so that Flask can locate resources relative to your module. Also, app is just a variable, so you can name it anything you want.

Next you have to define the routes. Routes configure your server to perform different actions. When you type a website's URL into your browser, you are taken to its home page, and if you go to <website>/info instead, you are taken to the info page. This mapping of the /info URL to the info page is what we call a route. For the home page the route is simply /.

Let's say we want our server's homepage to display Hello World. You can configure that with a method like this:

@app.route('/')
def index():
    return "Hello World"

With @app.route('/'), we are defining a route on our server. Whenever somebody opens that route, which for us is the homepage, the index() method associated with that route decorator is called. And when the index() method is called, it returns Hello World, just as we expect it to.
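In the same way, you could map another URL to another function. For instance, a hypothetical /info route (not part of our app here, just an illustration of the mapping) would look like this:

@app.route('/info')
def info():
    # Visiting <website>/info calls this function
    return "This is the info page"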

And there is one final command to start and run our server which is:

app.run(debug=True)

And that's it. This will run the app we have created when you run the Python file. The debug=True option is useful while developing and testing applications, so we'll keep it for now.
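Putting the snippets above together, the whole application fits in one small file. Here's a minimal sketch (the file name app.py is just an example, and I'm passing __name__ to the constructor per the convention mentioned above):

# app.py - a minimal Flask application
from flask import Flask

app = Flask(__name__)

@app.route('/')
def index():
    return "Hello World"

app.run(debug=True)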

Just run your Python script and you should see output like this on the console:

* Debugger is active!
* Debugger PIN: 127-398-124
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

Now if you go to http://localhost:5000, you should see Hello World displayed.

That's it. You have successfully created your first web application with Flask in just a handful of lines of code. Now, that is awesome. Stay tuned for the next part.

That is all for this article.


For more programming articles, checkout Freblogg Freblogg/Python

Some articles on automation:

Web Scraping For Beginners with Python

My semi automated workflow for blogging

Publish articles to Blogger automatically

Publish articles to Medium automatically


This is the 21st article as part of my twitter challenge #30DaysOfBlogging. Nine more articles on various topics, including but not limited to, Java, Git, Vim, Software Development, Python, to come.

If you are interested in this, make sure to follow me on Twitter @durgaswaroop. While you're at it, Go ahead and subscribe here on medium and my other blog as well.


If you are interested in contributing to any open source projects and haven't found the right project or if you were unsure on how to begin, I would like to suggest my own project, Delorean which is a Distributed Version control system, built from scratch in scala. You can contribute not only in the form of code, but also with usage documentation and also by identifying any bugs in its functionality.


Thanks for reading. See you again in the next article.

January 10, 2018

Json Parsing With Python

23:56 Posted by Durga swaroop Perla

JSON has become a ubiquitous data exchange format. Pretty much every service has a JSON API, and since it is so popular, most programming languages have built-in JSON parsers. Of course, Python is no exception. In this article, I'll show you how you can parse JSON with Python's json library.

Python Logo

JSON parsing in Python is quite straightforward and easy, unlike in some languages where it is unnecessarily cumbersome. Like everything else in Python, you start by importing the library you want.

import json

In this article, I am going to use the following JSON I got from json.org

{
  "menu": {
    "id": "file",
    "value": "File",
    "popup": {
      "menuitem": [
        {"value": "New", "onclick": "CreateNewDoc()"},
        {"value": "Open", "onclick": "OpenDoc()"},
        {"value": "Close", "onclick": "CloseDoc()"}
      ]
    }
  }
}

We have got a good set of dictionaries and arrays to work with in this data. If you want to follow along, you can use the same JSON or you can use anything else as well.

The first thing to do is to get this JSON string into a variable.

json_string = """{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}"""

And now we parse this string into a dictionary object with the help of the json library's loads() method.

json_dict = json.loads(json_string)

And you're done. The JSON is parsed and stored in the json_dict object, which is a Python dictionary. If you want to verify that, you can call type() on it:

print(type(json_dict))

And it will show that it is <class 'dict'>.

Getting back to it, we have the entire JSON object as a dictionary in json_dict, and you can drill down into it with its keys. At the top level, we have just one key in the dictionary, which is menu. We can get that by indexing the dictionary with that key.

menu = json_dict['menu']

And of course menu is a dictionary too with the keys id, value, and popup. We can access them and print them as well.

print(menu['id'])            ## => 'file'
print(menu['value'])         ## => 'File'

And finally we've got popup, which is another dictionary, with the key menuitem, which is a list. We can verify this by checking the types of these objects.

popup = menu['popup']
print(type(popup))           ## => <class 'dict'>

menuitem = popup['menuitem']
print(type(menuitem))        ## => <class 'list'>

And since menuitem is a list, we can iterate over it and print the values.

for item in menuitem:
    print(item)

And the output is

{'value': 'New', 'onclick': 'CreateNewDoc()'}
{'value': 'Open', 'onclick': 'OpenDoc()'}
{'value': 'Close', 'onclick': 'CloseDoc()'}

And of course each of these elements is a dictionary, so you can go further in and access its keys and values.

For example, if you want to access New from the above output, you can do this:

print(menuitem[0]['value'])  ## => New

And so on and so forth to get any value in the JSON.
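In fact, you can chain these lookups all the way down in a single expression. For example, to get the onclick value of the second menu item from the same data:

print(json_dict['menu']['popup']['menuitem'][1]['onclick'])  ## => OpenDoc()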

And not only that, the json library can also handle JSON responses from web services. One cool thing here is that web server responses are byte strings, which normally means that if you want to use them in your program you'd have to convert them to regular strings with the decode() method. But for JSON you don't have to do that: you can feed the byte string directly to json.loads() and it will give you a parsed object. That's pretty cool!
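As a quick sketch of that, here is what parsing a JSON response fetched over HTTP could look like. The URL below is hypothetical, and this assumes the endpoint returns JSON (json.loads() accepts byte strings directly from Python 3.6 onwards):

import json
from urllib.request import urlopen

raw_bytes = urlopen("https://example.com/api/menu.json").read()  # read() returns bytes
data = json.loads(raw_bytes)  # No decode() needed
print(type(data))  ## => <class 'dict'>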

That is all for this article.


For more programming articles, checkout Freblogg Freblogg/Python

Some articles on automation:

Web Scraping For Beginners with Python

My semi automated workflow for blogging

Publish articles to Blogger automatically

Publish articles to Medium automatically


This is the 19th article as part of my twitter challenge #30DaysOfBlogging. Eleven more articles on various topics, including but not limited to, Java, Git, Vim, Software Development, Python, to come.

If you are interested in this, make sure to follow me on Twitter @durgaswaroop. While you're at it, Go ahead and subscribe on medium and my blog as well.


If you are interested in contributing to any open source projects and haven't found the right project or if you were unsure on how to begin, I would like to suggest my own project, Delorean which is a Distributed Version control system, built from scratch in scala. You can contribute not only in the form of code, but also with usage documentation and also by identifying any bugs in its functionality.


Thanks for reading. See you again in the next article.

January 08, 2018

Datasets In Apache Spark - Part 3 | Writing Datasets to Disk

In the last tutorial we've seen how to create parametrized Datasets. Once you create Datasets and perform some operations on them, you'll want to save those results back to storage. That is what we'll do in this article: saving Datasets to storage.

Spark Logo

The first thing we'll do as always is to create the spark-session variable.

// Initialize Sparksession
SparkSession spark = SparkSession.builder().appName("Freblogg-Spark").master("local").getOrCreate();

Using that session variable, we read the fake-people.csv file which has data like this:

id,first_name,last_name,email,gender,ip_address
1,Netti,McKirdy,nmckirdy0@slideshare.net,Female,148.3.248.193
2,Nickey,Curreen,ncurreen1@tripadvisor.com,Male,206.9.48.216
3,Allayne,Chatainier,achatainier2@trellian.com,Male,191.118.4.217
...

We read this file into a dataset as following:

// Read csv file
Dataset<Row> peopleDs = spark.read().option("header", "true").csv("fake-people.csv");

Once we have the dataset, let's assume you've performed some operations on it: some column selections, some filtering, some sorting and so on. At the end of all those operations we have a new dataset.

// After performing several awesome operations
Dataset<Row> newDs = ....

We want to store this dataset back on the disk. We can do that with write() on the Dataset, which works much like read() on the Spark session.

newDs.write().csv("processed-data");

The processed-data in the above command is not the name of the output CSV file but the name of the output directory. When you write a Dataset out, Spark creates a directory with that name and stores the data inside it in the format you asked for, CSV in this case, along with some checksum and status files.

These are the files that get created in the processed-data folder.

$ ls ../../apache-spark/processed-data
_SUCCESS  part-00000-311049cf-3e48-4286-b93c-7d2096a18678-c000.csv

There are two more hidden CRC files that I'm not showing here. The part-00000-311049cf...csv file is the actual data file, which has the data from the new dataset.

You can also create a json file by running

newDs.write().json("processed-data")

And that will create another folder with a JSON file and the _SUCCESS file inside it.

You can also save this data to an external database if you want to. You'd use the jdbc() method along with the connection string and the table name, and Spark will write it to the DB.

Parquet Logo

Apart from the CSV and JSON formats, there is one more popular data format in the Data Science and Big Data world: Parquet. Parquet is a columnar format that is highly optimized and well suited for column-wise operations. It is widely used as a data serialization format across the Big Data ecosystem, and in Spark, Parquet is the default file storage format. One main difference between Parquet and formats like CSV and JSON is that Parquet is not meant to be read by humans; it can only be read by a Parquet reader. A sample file looks something like this:

PAR1   �k �>, �          999  1     �5,   �      1   2   3   4   5   6   7   8   9  - 0   1   2   3   4   5   6   7   8   < 2 < 2 < 2 < 2 < 2 < 2 < 2 <
.....

Utter gibberish. But Spark can read and understand it. In fact, since Parquet is designed for speed and throughput, reading and writing it can be 10-100 times faster than with an ordinary format like CSV or JSON, depending on the type of data.

You save a dataset to Parquet as follows:

newDs.write().parquet("processed");

And this will save the dataset as a parquet file along with the _SUCCESS status file.

That is all for this article.


For more programming articles, checkout Freblogg, Freblogg/Java, Freblogg/Spark

Articles on Apache Spark:

Map Vs Flat map

Spark Word count with Java

Datasets in Spark | Part I

Datasets in Spark | Part II


This is the 17th article as part of my twitter challenge #30DaysOfBlogging. Thirteen more articles on various topics, including but not limited to, Java, Git, Vim, Software Development, Python, to come.

If you are interested in this, make sure to follow me on Twitter @durgaswaroop. While you're at it, Go ahead and subscribe here on medium and my other blog as well.


If you are interested in contributing to any open source projects and haven't found the right project or if you were unsure on how to begin, I would like to suggest my own project, Delorean which is a Distributed Version control system, built from scratch in scala. You can contribute not only in the form of code, but also with usage documentation and also by identifying any bugs in its functionality.


Thanks for reading. See you again in the next article.

January 06, 2018

Remove Duplicate Elements From An Array

23:35 Posted by Durga swaroop Perla

Interviews are a great place to learn about your strengths and weaknesses, which makes them a great way to improve yourself. In one of my interviews, I was asked to remove duplicate elements from an array. So, given the array a below, I have to produce b.

a = {1, -2, 3, 1, 0, 9, 5, 6, 4, 5, 3, 1, 0}
b = {1, -2, 3, 0, 9, 5, 6, 4}

Here b keeps the elements in the same relative order as a, but the problem statement does not require preserving the order.

I was flustered for a bit after getting the question. It took me a while to get to a proper solution, but not before getting my first solution rejected for using a HashMap, which apparently I was not supposed to use. I attribute this mainly to the fact that I was asked to write Java code on a piece of paper and not in an IDE. Anyway, I came home after that and decided to try it out and see what others have done online. That is what this article is about.

Array of Donuts

Since that particular interview was in Java, it is only fair that I use Java for the solution here, although I really wanted to do it in Python. Maybe some other time.

Approaches for solving the problem:

Approach #1

The most naive approach is to look through the entire array and compare each element with every other element to see if there's a duplicate. Of course, this is impractical since its time complexity is O(n^2). So, let's skip this one and go to the next.

Approach #2

Another approach is to use a HashMap to keep track of the elements seen so far. This is what I tried initially, but it was rejected because I wasn't supposed to use a HashMap. The pseudocode would be:

map = new Map  // Keeps track of the elements seen so far
new_array = []

for number in numbers_array
    if not map.contains(number)
        map += number
        new_array += number

print(new_array)

Of course, since I wrote my implementation of this in Java, I had to make a few modifications, because in Java you first need to define the size of an array and only then can you add elements to it. So, I added a count variable to count the unique elements and created the new array with that size after the iteration. This requires two passes over the input, but it is still O(n), which is fine. But alas, I couldn't use this.

And so, then comes my final approach.

Approach #3

The third solution is to first sort the array and then remove the duplicates from the sorted array. We can do this because the problem doesn't require us to maintain the given input order; otherwise, we wouldn't have been able to sort.

Sorting is easy enough. We just use the built-in sort method, which will sort the array in place.

Arrays.sort(numbers);

Then comes the major part, which is removing the duplicates from the sorted array. We accomplish that by using two pointers, i and j, on our array. i goes through the entire array while j is a slow-moving pointer that only advances based on a condition.

 int j = 0; // Slow moving index

// i is the fast moving index that loops through the entire array
for (int i = 1; i < numbers.length; i++) {
    if (numbers[i] != numbers[j]) {
        j++;
        numbers[j] = numbers[i];
    }
}

The j index is basically playing catch-up with i. When there is a run of duplicate elements, i moves ahead while j stays back at the first duplicate, and then with numbers[j] = numbers[i] we copy the next unique value into position j. After this, the original array holds the unique elements up to index j, with leftover values beyond that. To take care of those, we create a new array from the numbers array.

int[] result = Arrays.copyOf(numbers, j + 1);
System.out.println(Arrays.toString(result));

And that's it. This will remove all the duplicated elements from the array. To test it, let me run the code:

Input array: [1, -2, 3, 1, 0, 9, 5, 6, 4, 5]
Final result after removing duplicates: [-2, 0, 1, 3, 4, 5, 6, 9]

The sorting of the array takes O(n log n), and the iteration after that is O(n). Put together, that is O(n log n), dominated by the sort, which is slightly worse than the O(n) of the HashMap approach. Of course, depending on the specific kind of array, the sort might take less time, but O(n log n) is what you get in the general case.

The full code is available as a gist.

Let me know if you have any more questions that need answers. That is all for this article.


For more programming articles, checkout Freblogg Freblogg/Java


This is the 15th article as part of my twitter challenge #30DaysOfBlogging. Fifteen more articles on various topics, including but not limited to, Java, Git, Vim, Software Development, Python, to come.

If you are interested in this, make sure to follow me on Twitter @durgaswaroop. While you're at it, Go ahead and subscribe here on medium and my other blog as well.


If you are interested in contributing to any open source projects and haven't found the right project or if you were unsure on how to begin, I would like to suggest my own project, Delorean which is a Distributed Version control system, built from scratch in scala. You can contribute not only in the form of code but also with usage documentation and also by identifying any bugs in its functionality.


Thanks for reading. See you again in the next article.


Image of Donuts: http://echmondprojects.com/wp-content/uploads/2016/04/two-arrays-700x525.png

January 04, 2018

Reduce Image Size With Python And Tinypng

23:11 Posted by Durga swaroop Perla

Whenever I want to upload images with my articles, I first make sure they are the right dimensions, and then I check the file sizes; if they are too big, I compress them. For this compression, I use TinyPNG. They compress your images to a smaller size while keeping the image looking the same. I've tried some other services as well, but TinyPNG is definitely the best, as their compression ratio is quite impressive.

In this article I'll show you how I'm planning to automate the image compression process using TinyPNG's developer API. And of course, we are going to be using Python.

Setting up

First of all, you need a developer key to connect to TinyPNG and use their services. So, go to their Developer API page and enter your name and email.

TinyPNG API registration

Once you've registered, you'll get an email from TinyPNG with a link, and once you click on that, you'll land on your developer page, which has your API key and your usage information. Do keep in mind that with the free account you can only compress 500 images per month. For someone like me, that's a number I won't really be reaching in a month anytime soon. But if you do, you should probably check out their paid plans.

Developers API key page

PS: That's not my real key :D

Get started

Once you've the developer key, you can start compressing images using their service. The full documentation for Python is here.

You start by installing Tinify, which is TinyPNG's library for compression.

pip install --upgrade tinify

Then we can start using tinify in code by importing it and setting the API key from your developer page.
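That looks like this (the key below is just a placeholder; use the one from your developer page):

import tinify

tinify.key = "YOUR_API_KEY"  # Placeholder - replace with your actual key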

If you have to send your requests through a proxy, you can set that as well.

tinify.proxy = "http://user:pass@192.168.0.1:8080"

Then, you can start compressing your image files. You can upload either PNG or JPEG files and tinify will compress them for you.

For the purpose of this article, I'm going to use the following delorean.jpeg image.

Delorean uncompressed

And I'll compress this to delorean-compressed.jpeg. For that we'll use the following code:

source = "delorean.jpeg"
destination = "delorean-compressed.jpeg"

original = tinify.from_file(source)
original.to_file(destination)

And that gives me this file:

Delorean compressed

If they both look the same, that is the magic of TinyPNG's compression algorithm. The output looks pretty much identical, but it did get compressed. To verify that, let's print the file sizes.

import os.path as path

original_size = path.getsize(source)
compressed_size = path.getsize(destination)

# Sizes in KB, followed by the compression ratio
print(original_size / 1024, compressed_size / 1024, original_size / compressed_size)

And this prints,

29.0029296875 25.3466796875 1.144249662878058

The file was originally 29 KB and after compression it is 25.3 KB, which is a fairly good compression for such a small file. If the original file were bigger, you would see an even tighter compression.

And since this is the free version, there's a limit on the number of requests we can make. We can keep track of that with the built-in compression_count variable. You can print it after every request to make sure you don't go over the limit.

compressions_this_month = tinify.compression_count
print(compressions_this_month)

You can also compress images from their URLs and store them locally. You just do:

original = tinify.from_url("https://raw.githubusercontent.com/durgaswaroop/delorean/master/delorean.jpeg")

And then you can store the compressed file locally just like before.
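For example, saving the compressed result fetched from the URL works the same way as before (the output file name here is just an example):

original.to_file("delorean-from-url-compressed.jpeg")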

Apart from just compressing images, you can also resize them with TinyPNG's API. We'll cover that in tomorrow's article.

So, That is all for this article.


For more programming articles, checkout Freblogg, Freblogg/Python

Some articles on automation:

Web Scraping For Beginners with Python

My semi automated workflow for blogging

Publish articles to Blogger automatically

Publish articles to Medium automatically


This is the 13th article as part of my twitter challenge #30DaysOfBlogging. Seventeen more articles on various topics, including but not limited to, Java, Git, Vim, Software Development, Python, to come.

If you are interested in this, make sure to follow me on Twitter @durgaswaroop. While you're at it, Go ahead and subscribe here on medium and my other blog as well.


If you are interested in contributing to any open source projects and haven't found the right project or if you were unsure on how to begin, I would like to suggest my own project, Delorean which is a Distributed Version control system, built from scratch in scala. You can contribute not only in the form of code, but also with usage documentation and also by identifying any bugs in its functionality.


Thanks for reading. See you again in the next article.

January 02, 2018

Datasets In Apache Spark | Part 2

In the last two tutorials we covered what Apache Spark is and got ourselves familiar with Datasets, the primary data abstraction in Spark. In this tutorial we will see how to read a data file as a Dataset parametrized with a Bean class, using Encoders.

Spark Image Logo 

This tutorial is going to be short, but it is important, as you will find yourself doing this frequently. In the last article you've seen how to read a CSV or JSON file as a Dataset. You might have noticed that we were using Dataset<Row> for everything. If you're not familiar with generics in Java, Dataset<Row> can be thought of as a Dataset consisting of Row objects. Row is a Spark SQL class and is the default when creating a Dataset.

Although the Row class has some useful methods, as a generic container meant to hold any kind of row, it doesn't give you typed access to your own fields. Since a Dataset usually stores data that corresponds to a Bean class, it is better to create a Dataset of that Bean class instead of Row. With this, you'll have access to all the usual getters and setters of the Bean class. That's what we'll do in this article: create a Dataset of POJOs instead of Row objects.

I'm using the same fake-people.csv file that I used in the last article that looks like this:

id,first_name,last_name,email,gender,ip_address
1,Netti,McKirdy,nmckirdy0@slideshare.net,Female,148.3.248.193
2,Nickey,Curreen,ncurreen1@tripadvisor.com,Male,206.9.48.216
3,Allayne,Chatainier,achatainier2@trellian.com,Male,191.118.4.217
...

To represent this data, I've created a POJO called FakePeople.java, which looks like this:

import lombok.Data;
public @Data class FakePeople {
    final int id;
    final private String firstName;
    final private String lastName;
    final private String email;
    final private String gender;
    final private String ipAddress;
}

I'm using Project Lombok here to generate the required getters, setters, and other POJO methods. (If you don't know about Lombok, you should definitely check it out. It is quite handy.)

Now that we have our POJO, let's get a parametrized Dataset. To achieve this we first need to create an Encoder. We do that for the FakePeople class as follows:

Encoder<FakePeople> fakePeopleEncoder = Encoders.bean(FakePeople.class);

This gives us an encoder that will help us map the CSV data onto FakePeople objects.

Of course we need our spark session variable as well.

// Initialize Sparksession
SparkSession spark = SparkSession.builder().appName("Freblogg-Spark").master("local").getOrCreate();

Now we can go ahead and read the CSV file, very much like the way we did before with just one addition.

// Without Encoder
Dataset<Row> people = spark.read().option("header", "true").csv("fake-people.csv");

// With Encoder
Dataset<FakePeople> people = spark.read().option("header", "true").csv("fake-people.csv").as(fakePeopleEncoder);

And the output of people.show(5) is the same as what you'd expect.

+---+----------+----------+--------------------+------+--------------+
| id|first_name| last_name|               email|gender|    ip_address|
+---+----------+----------+--------------------+------+--------------+
|  1|     Netti|   McKirdy|nmckirdy0@slidesh...|Female| 148.3.248.193|
|  2|    Nickey|   Curreen|ncurreen1@tripadv...|  Male|  206.9.48.216|
|  3|   Allayne|Chatainier|achatainier2@trel...|  Male| 191.118.4.217|
|  4|     Tades|    Emmett|temmett3@barnesan...|  Male|153.113.87.195|
|  5|     Shawn|    McGenn|smcgenn4@shop-pro.jp|  Male|  247.45.80.68|
+---+----------+----------+--------------------+------+--------------+

As you can see, the only difference in creating the Dataset is .as(fakePeopleEncoder), and that gets us Dataset<FakePeople> instead of Dataset<Row>. With that, we now have access to all the getters and setters of the FakePeople class, which we wouldn't have with a Row object. We'll explore how this is useful in a future tutorial.

For more information on Datasets: Spark SQL, DataFrames and Datasets Guide

That is all for this article.


For more programming articles, checkout Freblogg, Freblogg/Java, Freblogg/Spark

Apache Spark articles:

Word count with Apache Spark and Java

Datasets in Apache Spark | Part 1

Datasets in Apache Spark | Part 2


This is the 11th article as part of my twitter challenge #30DaysOfBlogging. Nineteen more articles on various topics, including but not limited to, Java, Git, Vim, Software Development, Python, to come.

If you are interested in this, make sure to follow me on Twitter @durgaswaroop. While you're at it, Go ahead and subscribe here on medium and my other blog as well.


If you are interested in contributing to any open source projects and haven't found the right project or if you were unsure on how to begin, I would like to suggest my own project, Delorean which is a Distributed Version control system, built from scratch in scala. You can contribute not only in the form of code, but also with usage documentation and also by identifying any bugs in its functionality.


Thanks for reading. See you again in the next article.

December 31, 2017

My Almost Fully Automated Blogging Workflow

18:07 Posted by Durga swaroop Perla

In the article My semi automated workflow for blogging, I outlined what my blogging process is like and how I've started to automate it. Of course, at the time of that article the process was still in its early stages and I hadn't automated everything I do. That's where this article comes in. This is the second attempt at automating my entire blogging workflow.

Medium blogger python logo

Just to give you some context, here are the things that I do when I'm blogging.

  1. Open a markdown file in Vim with the title of the article as the name along with some template text
  2. Open a browser with the html of the newly created markdown file
  3. Convert markdown to html with pandoc several times during the writing process
  4. Once the article is done and the html is produced, edit the html to make some changes depending on whether I'm publishing on Medium or on Blogger
  5. Read the tags/labels and other attributes from the file and publish the article as a draft on Medium or Blogger
  6. Once it looks good, schedule or publish it (this is a manual step; there's no denying it)
  7. Finally tweet about the post with the link to the article

I have the individual pieces of this process ready. I have already written about them in the following articles.

Semi Automated Blogging Workflow

Publish Articles To Blogger In One Second

Publish Articles To Medium In One Second

Tweeting With Python & Tweepy

Now that the individual pieces are ready, it might seem that everything is done. But, as it turns out (unsurprisingly), the integration is of course a big deal and took a lot more effort than I was expecting. And I am documenting that in this article, along with the complete flow.

It starts with the script blog-it which opens vim for me, opens chrome and also sets up a process for converting markdown to html, continuously.

That script calls blog.py, which is what opens Vim with the default text template. I would like to put the complete gist here, but it is just too long, so instead I'm showing the meat of the script.

import subprocess

# title, md_file and the generate_* helpers are defined earlier in the full script
article_title = title.replace("_", " ").title()  # Replace underscores and title-case it

# Create the markdown file and add the title
f = open(md_file, "w+")
f.write(generate_comments_header(article_title))
f.write(article_title)
f.write("\n")
f.write("-" * len(title))
f.write("\n")
f.write(generate_footer_text())
f.close()

# Now, create the html file
html_file = title + ".html"
open(html_file, "w").close()

# Start gvim with the markdown file open on line #10
subprocess.run(['C:/Program Files (x86)/Vim/vim80/gvim.exe', '+10', md_file])

Then comes m2h, which continuously converts the markdown to HTML.
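The m2h script itself isn't shown here, but conceptually it is just a small watch loop around pandoc. A minimal sketch of that idea (the polling logic and file handling below are my assumptions, not the actual script) could look like this:

import subprocess
import sys
import time
from pathlib import Path

def watch_and_convert(md_path, html_path):
    # Re-run pandoc whenever the markdown file changes
    md_file = Path(md_path)
    last_mtime = 0.0
    while True:
        mtime = md_file.stat().st_mtime
        if mtime != last_mtime:
            # pandoc converts the markdown to html
            subprocess.run(["pandoc", str(md_file), "-o", html_path])
            last_mtime = mtime
        time.sleep(1)  # Poll once a second

watch_and_convert(sys.argv[1], sys.argv[2])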

This ends one flow. Next comes publishing. I have broken this down because publishing is a manual step for me unless I can complete the entire article in one sitting, which is never going to happen. So, once I'm done with the writing, I'll start the publishing.

I'll run publish.py, which, depending on the comments in the HTML, publishes the article to either Blogger or Medium. Again, I'm only showing a part of it. The full gist is available here.

import re

# CommentParser, blogger_publish and medium_publish are my own modules from the full gist
with open(html_file) as file:
    html_file_contents = file.read()

re_comments = re.compile(r'\s*<!--(.*)-->', re.DOTALL)
comments_text = re_comments.search(html_file_contents).group(1).strip()
comments_parser = CommentParser.parse_comments(comments_text)

if comments_parser.destination.lower() == 'blogger':
    blogger_publish.publish(html_file, comments_parser.title, comments_parser.labels, comments_parser.post_id)
elif comments_parser.destination.lower() == 'medium':
    medium_publish.publish(html_file, comments_parser.title, comments_parser.labels)
else:
    print(
        'Unknown destination: ' + comments_parser.destination + '. Supported destinations are Blogger and Medium.')

Then comes the individual publishing scripts that publish to blogger and medium.

For blogger-publish.py (Gist here), I make any required modifications with blogger_modifications.py (Gist here), which converts some tags to the form my Blogger page expects.

Then for medium-publish.py (Gist here), I take the parameters and publish the HTML to Medium. No modifications are needed here.

import os
import requests

# get_headers, get_user_url and generate_payload are defined in the full gist
access_token_file = '~/.medium-access-token'
expanded_path = os.path.expanduser(access_token_file)
with open(expanded_path) as file:
    access_token = file.read().strip()

headers = get_headers(access_token)
user_url = get_user_url(headers)

# Publish new post
posts_url = user_url + 'posts/'
payload = generate_payload(title, labels, html_file)
response = requests.request('POST', posts_url, data=payload, headers=headers)

Actually, these scripts send the post to the site as a draft instead of publishing it outright. That is a step I don't know how to automate, since I have to manually check how the article looks in the preview. Maybe I should try doing this with Selenium or something like that.

Once I've verified that the post looks good, I will publish it, take the URL of the published article and call tweeter.py (Gist here), which then opens a Vim file with some default text for the title and the URL already filled in, along with some hashtags. I'll complete the tweet, and once I close the file, it gets published on Twitter.

And that completes the process. Obviously there are still a couple of manual steps. Although I can't eliminate all of them, I might be able to minimize them. But so far it looks pretty good, especially given the little effort I've put into this in just one week. Of course, I'll keep tuning it as needed to make it even better, and maybe I'll publish one final article on that.

That is all for this article.


For more programming articles, checkout Freblogg, Freblogg/Python

Some articles on automation:

Web Scraping For Beginners with Python

My semi automated workflow for blogging

Publish articles to Blogger automatically

Publish articles to Medium automatically


This is the 9th article as part of my twitter challenge #30DaysOfBlogging. Twenty one more articles on various topics, including but not limited to, Java, Git, Vim, Software Development, Python, to come.

If you are interested in this, make sure to follow me on Twitter @durgaswaroop. While you're at it, Go ahead and subscribe here on medium and my other blog as well.


If you are interested in contributing to any open source projects and haven't found the right project or if you were unsure on how to begin, I would like to suggest my own project, Delorean which is a Distributed Version control system, built from scratch in scala. You can contribute not only in the form of code, but also with usage documentation and also by identifying any bugs in its functionality.


Thanks for reading. See you again in the next article.