Google Summer of Code 2017

August 27, 2017

GSoC has come to an end and although I’m sad to be finishing up, I’m very proud of what we’ve managed to accomplish and grateful to have been a part of this project.

For the last three months, fellow GSoCer Polina and I have worked on the UnifiedPush Server to come up with a proof of concept for using Apache Kafka as its internal messaging system. Here is a quick overview of what we did, what can still be improved and what still has to be done. Skip to the bottom for a summary of what I gained from the experience and my advice to any future GSoCers!

Overall Stats

  • 88 total Jira Tasks created
  • 38 pull requests to the UPS
  • 7 pull requests to Kafka CDI
  • The GSoC branch is 60 commits ahead of master

What was done

With the help of our mentors, we managed to replace the Java Message Service (JMS) with a completely Kafka-based workflow for push message routing:

  1. When the Push Notification Sender endpoint is invoked, a Kafka producer in that endpoint sends the request as a key/value pair (PushApplication, InternalUnifiedPushMessage) to the agpush_pushMessageProcessing topic.

  2. The Notification Router streams class is initialized on application startup and begins its work when a message is sent to the above-mentioned topic. After some processing, it streams the push messages out (in the form of a MessageHolderWithVariants object) to 6 different output topics based on the message’s variant type (Android, iOS, etc.).

  3. The MessageHolderWithVariantsKafkaConsumer listens for messages on the 6 streams output topics and passes them to the Token Loader. After some further, more involved processing, the Token Loader loads batches of tokens from the database for devices that match the requested database parameters.

  4. The messages with tokens are dispatched to the MessageHolderWithTokens producer, which produces each message to its respective topic (again, based on variant type).

  5. Finally, we have a MessageHolderWithTokens consumer which consumes the MessageHolderWithTokens objects and fires a CDI event. This event is handled on the other side by the NotificationDispatcher, which sends each message to its appropriate push network (ADM, APNs, etc.). We decided to stick with CDI events over regular Producers and Consumers to offer an extra layer of abstraction.
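To make the routing in step 2 a little more concrete, here is a minimal plain-Java sketch of the idea. It is not the Kafka Streams code from our branch: it only models "one input stream in, one output topic per variant type out" with an in-memory map. The Message class, the six enum values and the agpush_<variant>_topic naming scheme are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class VariantRoutingSketch {

    // Assumed set of variant types; the post itself only names Android and iOS.
    enum VariantType { ANDROID, IOS, ADM, SIMPLE_PUSH, WINDOWS_WNS, WINDOWS_MPNS }

    // Hypothetical, much simplified stand-in for MessageHolderWithVariants.
    static final class Message {
        final VariantType variantType;
        final String payload;

        Message(VariantType variantType, String payload) {
            this.variantType = variantType;
            this.payload = payload;
        }
    }

    // Hypothetical naming scheme for the per-variant output topics.
    static String topicFor(VariantType type) {
        return "agpush_" + type.name().toLowerCase(Locale.ROOT) + "_topic";
    }

    // Groups payloads by output topic, preserving arrival order within each topic.
    static Map<String, List<String>> route(List<Message> input) {
        Map<String, List<String>> topics = new LinkedHashMap<>();
        for (Message message : input) {
            topics.computeIfAbsent(topicFor(message.variantType), k -> new ArrayList<>())
                  .add(message.payload);
        }
        return topics;
    }

    public static void main(String[] args) {
        List<Message> input = new ArrayList<>();
        input.add(new Message(VariantType.ANDROID, "hello android"));
        input.add(new Message(VariantType.IOS, "hello ios"));
        input.add(new Message(VariantType.ANDROID, "second android"));
        route(input).forEach((topic, payloads) -> System.out.println(topic + " -> " + payloads));
    }
}
```

In the real workflow each of these "topics" is a Kafka topic with its own consumer, so a slow push network for one variant does not hold up the others.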

We’ve also looked at Kafka for collecting and processing push-related metrics. So far, we have topics for invalid tokens (which will be used for removing them from the database), push message failures and successes, and iOS-specific token failures and successes. We’re currently working on a demo application that will read from these topics and perform processing on the data using Kafka Streams.
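As a rough illustration of the kind of processing such a demo could do, the plain-Java sketch below tallies successes and failures per push message. It is only a model of the aggregation, not Kafka Streams code and not the demo application itself; the DeliveryEvent and Tally classes and the idea of keying events by a push message ID are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PushMetricsSketch {

    // Hypothetical record of one delivery attempt, as it might appear on a metrics topic.
    static final class DeliveryEvent {
        final String pushMessageId;
        final boolean success;

        DeliveryEvent(String pushMessageId, boolean success) {
            this.pushMessageId = pushMessageId;
            this.success = success;
        }
    }

    // Running totals for a single push message.
    static final class Tally {
        int successes;
        int failures;
    }

    // Folds a sequence of events into per-message totals, keyed by push message ID.
    static Map<String, Tally> aggregate(List<DeliveryEvent> events) {
        Map<String, Tally> totals = new TreeMap<>();
        for (DeliveryEvent event : events) {
            Tally tally = totals.computeIfAbsent(event.pushMessageId, id -> new Tally());
            if (event.success) {
                tally.successes++;
            } else {
                tally.failures++;
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<DeliveryEvent> events = new ArrayList<>();
        events.add(new DeliveryEvent("msg-1", true));
        events.add(new DeliveryEvent("msg-1", false));
        events.add(new DeliveryEvent("msg-2", true));
        aggregate(events).forEach((id, tally) ->
                System.out.println(id + ": " + tally.successes + " ok, " + tally.failures + " failed"));
    }
}
```

With Kafka Streams the same fold would typically be expressed as a grouped aggregation over the metrics topics, with the running totals kept in a state store rather than a local map.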

This work can all be broken down into the PRs, blog posts, and mailing list threads below (along with PRs by Polina and Matthias, not listed):

Kafka-CDI library

UnifiedPush Server

  • Describe installation of Apache Kafka for dev environment on Docker and OpenShift (#838, #862, #902, final readme)
  • Setup Kafka test environment (#848)
  • Implementation of the first Kafka Producer in the InstallationRegistrationEndpoint (#841, #852)
  • Integrating the Kafka CDI library, replacing producers with injection (#857, #861)
  • Analysis of codebase with Structure 101 and SonarQube with JaCoCo code coverage (#865, #863, blog post, mailing list thread #1)
  • Create a Kafka module for Kafka related configuration and tests (#870)
  • Jackson polymorphic serialisation for abstract Variant class (#889, mailing list thread #1)
  • Update Installation metrics producer (#890)
  • Create NotificationDispatcher consumer (closed, in favour of CDI events abstraction) (#897)
  • Use Kafka Streams for push message processing (#900)
  • Add a producer for APNS specific token metrics (#908)
  • Cleanup and bug fix for MessageHolderWithVariants events (#917)

Research & Other

And finally, all mailing list updates: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 (UPS roadmap)

All blog posts: 1, 2, 3, 4, 5

What is left to do

Migration to HBase: Our project diverged naturally from the initial proposal. We encountered several unexpected roadblocks and we ran out of time for database migration (which is a huge task in itself).

Unit and integration testing: The UPS average code coverage percentage is quite low. Ideally I would have liked to be more thorough in testing our new branch, and to have improved the overall test coverage of the master branch. This is something we want to work on in the future.

Kafka Security: Our final goal (as agreed upon with our mentors) was always a working proof of concept, as opposed to a production-ready product. The remaining Jiras in our backlog are mostly related to security, which will be worked on after GSoC is over.

All other remaining tickets in our backlog can be found under the gsoc_2017 label, here

What I gained from the experience

Being able to participate in GSoC has helped me immensely and I’ve grown a lot both technically and non-technically in the last 3-4 months. This includes:

  • Improving my programming skills. Working on a real-world problem, and doing so alongside such a strong programmer, were the major factors in this. I really enjoyed the process of peer reviews and I learnt a lot from doing them.
  • Learning LOADS of new technologies, from Kafka to OpenShift and Java EE to SonarQube, to name a few.
  • Learning how to work in a team, improving my communication skills and my confidence in general.

I would strongly encourage anyone to consider applying for Google Summer of Code. There’s a huge variety of projects and organisations to work with and you won’t lose an awful lot by trying. Based on my own experience, the best advice I could give to future GSoCers would be:

  • Don’t worry about not being good enough.
  • Don’t be scared to ask for help and don’t let yourself get stuck unnecessarily. Even if you have minor questions or concerns, it’s important to ask. That’s what your mentors and the community are there for, and I can guarantee they won’t judge you for it. In my experience, asking questions does not display weakness but a willingness to learn.
  • Get involved in the community! Don’t underestimate how important that can be for a successful project.
  • Work hard but most importantly try to enjoy the opportunity!

A huge thanks to my mentors, @mziccard, @matzew and @lgriffin, for the amount of time, energy and patience they dedicated to us and the project, and thanks to @polinankoleva for being an all-round great team-mate!
