Google Summer of Code 2017
August 27, 2017
GSoC has come to an end and although I’m sad to be finishing up, I’m very proud of what we’ve managed to accomplish and grateful to have been a part of this project.
For the last three months, fellow GSoCer Polina and I have worked on the Unified Push Server to come up with a proof of concept for using Apache Kafka as its internal messaging system. Here is a quick overview of what we did, what can still be improved, and what still has to be done. Skip to the bottom for a summary of what I gained from the experience and my advice to any future GSoCers!
Useful links
- GSoC 2017 branch and commits
- All UPS PRs and Kafka CDI PRs by me
- GSoC 2017 Jira board
Overall Stats
- 88 total Jira Tasks created
- 38 pull requests to the UPS
- 7 pull requests to Kafka CDI
- The GSoC branch is 60 commits ahead of the master
What was done
With the help of our mentors, we managed to replace the Java Message Service (JMS) with a completely Kafka-based workflow for push message routing:
1. When the Push Notification Sender endpoint is invoked, a Kafka producer in that endpoint produces the request as a key/value pair (`PushApplication`, `InternalUnifiedPushMessage`) to the `agpush_pushMessageProcessing` topic.
2. The Notification Router streams class is initialized on application startup and begins its work when a message is sent to the above-mentioned topic. After some processing, it streams the push messages out (in the form of `MessageHolderWithVariants` objects) to 6 different output topics based on the message’s variant type (Android, iOS, etc.).
3. The `MessageHolderWithVariantsKafkaConsumer` listens for messages on the 6 streams output topics and passes them to the Token Loader. After some further, fairly involved processing, the Token Loader loads batches of tokens from the database for devices that match the requested parameters.
4. The messages with tokens are dispatched and received by the `MessageHolderWithTokens` producer, which produces each message to its respective topic (again, based on variant type).
5. Finally, we have a `MessageHolderWithTokens` consumer which consumes these objects and fires a CDI event. This event is handled on the other side by the `NotificationDispatcher`, which sends each message to its appropriate push network (ADM, APNs, etc.). We decided to stick with CDI events over regular producers and consumers to provide an extra layer of abstraction.
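The variant-based fan-out described above can be sketched as a simple mapping from variant type to output topic. This is an illustrative, self-contained sketch: the variant set and the topic-naming scheme below are assumptions for the example (only `agpush_pushMessageProcessing` is a real topic name from the project), not the actual UPS topics.

```java
import java.util.Locale;

// Illustrative sketch of the variant-based topic fan-out performed by the
// Notification Router. The variant names and topic-naming scheme here are
// assumptions for the example, not the actual UPS topic names.
public class VariantTopicRouter {

    // Six variant types, one per output topic (illustrative set).
    public enum VariantType { ANDROID, IOS, SIMPLE_PUSH, WINDOWS_WNS, WINDOWS_MPNS, ADM }

    // Map a message's variant type to a hypothetical output topic name,
    // mirroring the streams step that writes MessageHolderWithVariants
    // objects out to per-variant topics.
    public static String topicForVariant(VariantType type) {
        return "agpush_" + type.name().toLowerCase(Locale.ROOT) + "_messages";
    }

    public static void main(String[] args) {
        for (VariantType t : VariantType.values()) {
            System.out.println(t + " -> " + topicForVariant(t));
        }
    }
}
```

In the real code this routing happens inside a Kafka Streams topology rather than a standalone method, but the key idea is the same: the variant type alone decides which of the 6 topics a message lands on.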
We’ve also looked at Kafka for collecting and processing push-related metrics. So far, we have topics for invalid tokens (which will be used for removing them from the database), push message failures and successes, and iOS-specific token failures and successes. We’re currently working on a demo application that will read from these topics and process the data using Kafka Streams.
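To give a feel for what that demo might do, here is a minimal, self-contained sketch of the kind of per-outcome aggregation it could perform. In the real demo this would be a Kafka Streams topology reading from the metrics topics; plain Java collections stand in here so the example runs without a broker, and the `variant:outcome` event format is invented for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of aggregating push delivery outcomes, in the spirit
// of the stream-processing demo. Each event string ("variant:outcome") is
// a stand-in for a record consumed from a metrics topic.
public class PushMetricsAggregator {

    // Count occurrences of each delivery outcome, akin to grouping the
    // success/failure topics by key and counting.
    public static Map<String, Integer> countOutcomes(List<String> events) {
        Map<String, Integer> counts = new HashMap<>();
        for (String event : events) {
            counts.merge(event, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> events = List.of(
            "android:success", "android:success", "ios:failure", "ios:success");
        System.out.println(countOutcomes(events));
    }
}
```

A Kafka Streams version would express the same logic as `groupByKey().count()` over the metrics topics, with the counts materialized in a state store instead of an in-memory map.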
This work can all be broken down into the PRs, blog posts, and mailing list threads below (along with PRs by Polina and Matthias, not listed):
Kafka-CDI library
- Added default generic serializer/deserializer to handle objects of type `T` (#17, mailing list thread #1, mailing list thread #2)
- Unit testing for the `serialization` package (#17, #23)
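As a rough illustration of what a generic serializer/deserializer for objects of type `T` involves, here is a hedged sketch using plain Java serialization. The actual Kafka-CDI serdes target Kafka's `Serializer`/`Deserializer` interfaces and likely use JSON; this version only shows the round-trip idea and keeps the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hedged sketch of a generic serde: any Serializable object round-trips
// through bytes. This is NOT the Kafka-CDI implementation, just the idea.
public class GenericSerde {

    // Serialize any Serializable object to a byte array.
    public static byte[] serialize(Serializable obj) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deserialize a byte array back to an object of the expected type T.
    @SuppressWarnings("unchecked")
    public static <T> T deserialize(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (T) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = serialize("hello kafka");
        String roundTripped = deserialize(bytes);
        System.out.println(roundTripped); // prints "hello kafka"
    }
}
```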
UnifiedPush Server
- Describe installation of Apache Kafka for the dev environment on Docker and OpenShift (#838, #862, #902, final readme)
- Set up Kafka test environment (#848)
- Implementation of the first Kafka producer in the `InstallationRegistrationEndpoint` (#841, #852)
- Integrating the Kafka CDI library, replacing producers with injection (#857, #861)
- Analysis of the codebase with Structure101 and SonarQube with JaCoCo code coverage (#865, #863, blog post, mailing list thread #1)
- Create a Kafka module for Kafka-related configuration and tests (#870)
- Jackson polymorphic serialisation for the abstract `Variant` class (#889, mailing list thread #1)
- Update Installation metrics producer (#890)
- Create `NotificationDispatcher` consumer (closed, in favour of the CDI events abstraction) (#897)
- Use Kafka Streams for push message processing (#900)
- Add a producer for APNs-specific token metrics (#908)
- Cleanup and bug fix for `MessageHolderWithVariants` events (#917)
Research & Other
- AGPUSH-2098 Spike for initial Kafka integration
- AGPUSH-2107 Spike for Kafka Streams API usage
- AGPUSH-2104 Research Java EE programming model for Kafka
- AGPUSH-2108, AGPUSH-2148 Research Kafka on OpenShift
- AGPUSH-2109 Research Kafka Security
- AGPUSH-2181 Research custom ser/des
- AGPUSH-2110 Spike for Push notification delivery
- AGPUSH-2111 Spike for Push notification Metrics
- AGPUSH-2157 Kafka performance metrics
- Small tweaks to the UPS mock data loader (#5)
- Small tweaks to the Java-ADM library (#5)
- Stream processing demo application
And finally, all mailing list updates: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 (UPS roadmap)
What is left to do
Migration to HBase: Our project diverged naturally from the initial proposal. We encountered several unexpected road bumps and ran out of time for the database migration (which is a huge task in itself).
Unit and integration testing: The UPS average code coverage percentage is quite low. Ideally, I would like to have been more thorough in testing our new branch, and to have improved the overall test coverage of the master branch. This is something we want to work on in the future.
Kafka Security: Our final goal (as agreed upon with our mentors) was always a working proof of concept, as opposed to a production-ready product. The remaining Jiras in our backlog are mostly related to security, which will be worked on after GSoC is over.
All other remaining tickets in our backlog can be found under the gsoc_2017 label, here.
What I gained from the experience
Being able to participate in GSoC has helped me immensely, and I’ve grown a lot both technically and non-technically in the last 3-4 months. This includes:
- Improving my programming skills. Working on a real-world problem alongside such strong programmers was the major factor here. I really enjoyed the process of peer reviews and I learnt a lot from doing them.
- Learning LOADS of new technologies, from Kafka to OpenShift and Java EE to SonarQube, to name a few.
- Learning how to work in a team, improving my communication skills and my confidence in general.
I would strongly encourage anyone to consider applying for Google Summer of Code. There’s a huge variety of projects and organisations to work with, and you won’t lose an awful lot by trying. Based on my own experience, the best advice I could give to future GSoCers would be:
- Don’t worry about not being good enough
- Don’t be scared to ask for help, and don’t let yourself get stuck unnecessarily. Even if you have minor questions or concerns, it’s important to ask. That’s what your mentors and the community are there for, and I can guarantee they won’t judge you for it. In my experience, asking questions displays not weakness but a willingness to learn.
- Get involved in the community! Don’t underestimate how important that can be for a successful project.
- Work hard but most importantly try to enjoy the opportunity!
A huge thanks to my mentors, @mziccard @matzew and @lgriffin, for the time, energy and patience they dedicated to us and the project, and thanks to @polinankoleva for being an all-round great teammate!