Thursday, April 14, 2016

My Interview @ Global Big Data Conference, Dallas

One more month to go for the conference and I am busy preparing slide content and hands-on data wrangling exercises. It is an awesome experience to learn how to teach :)

Anyway, here is the link to my interview, published a couple of days ago, to give the audience a feel for what they can expect.

http://globalbigdataconference.com/news/129000/interview-with-ashwini-kuntamukkala-software-architect-vizient.html

Monday, March 21, 2016

Upcoming talks at Global Big Data Conference in Dallas, May 13-15, 2016

I am excited to share that I will be giving two talks at the Dallas Global Big Data Conference on the weekend of May 13th, 2016 at the Las Colinas Convention Center. Here are the topics and corresponding abstracts.


Data Wrangling - What, Why and How? [Industry state, business applicability]

Abstract:

"Garbage in -> Garbage out (GIGO)" is a popular  quote in the field of computer science. But is that really true? 

We are now nearing the post-"Big Data" era, where scaling data storage and compute capacity is almost as easy as pushing a button. The ecosystem of data processing tools is getting richer by the day. In such a thriving environment, data is not the new "oil" but the new "soil" in which companies can grow several data-driven business models. Many forward-looking companies are already unlocking the hidden insights in the treasure troves of data they already have, along with publicly available data.

Just as gold is extracted from ore through a very rigorous refinement process, insights have to be discovered in data, in its crudest form, through an equally rigorous process. This raw data is typically locked up in spreadsheets, web pages, web/machine logs, PDFs, CSVs, TSVs, XML, JSON, Word documents, images, videos, handwritten notes, audio, RDFs, sensor signals, databases, etc.

One can't perform accurate statistical analysis on inaccurate data. So, to facilitate effective use of raw data, there is an upsurge in the marketplace of tools that ease converting raw data into usable form. This process of converting raw data into usable form is called "Data Wrangling".

Once this iterative step produces curated datasets, we can unleash rich analytics, visualizations, etc., to drive the end business objectives.

In this talk, I will cover the essentials of data wrangling, the necessary workflow, the open source tools at our disposal, and a comparison of popular commercial vendors in this field, based on my own experience and use cases.
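To give a tiny, purely illustrative taste of what a single wrangling step can look like (this example and its field conventions are made up, not taken from the talk), here is a sketch that normalizes one messy raw record into a usable form:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class WrangleSketch {
    // Normalize one raw comma-separated record: trim stray padding
    // and map empty fields to an explicit "N/A" marker so that
    // downstream analysis can treat missing values consistently.
    public static List<String> normalize(String rawLine) {
        return Arrays.stream(rawLine.split(",", -1))
                .map(String::trim)
                .map(f -> f.isEmpty() ? "N/A" : f)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String raw = "  Acme Corp ,, 2016-05-13 ";
        System.out.println(normalize(raw)); // [Acme Corp, N/A, 2016-05-13]
    }
}
```

Real wrangling pipelines do far more (type coercion, deduplication, joining across sources), but the shape is the same: many small, composable cleaning steps applied iteratively.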


Transform your Enterprise into a Data-Driven Digital Business

Abstract: 

Is your company in on “Digital Transformation”? 

This phenomenon is causing enterprises to rethink their strategy as they modernize to stay relevant in an ultra-competitive world. The Darwinian maxim "survival of the fittest" is more relevant in business today than ever before. The enterprises that adapt and adopt this fundamentally new "culture" will disrupt their own business and stand the test of time.

The reason I say "culture" instead of "strategy" is that Peter Drucker, the father of modern management, is credited with saying "culture eats strategy for breakfast". If the spirit of innovation and agility is missing in a company, no disruptive strategy has a chance to work.

So if you are someone who wants to be a change agent, a catalyst, or a visionary for your company to create more value, enter a new growth spurt, or perhaps even a new market, but you are running into walls because the culture does not facilitate it, you may feel as frustrated as a driver trying to steer a parked car.

In this talk, I will share insights from my own experience and from that of the many others I have had the good fortune to collaborate with. These will help you carve out a successful digital transformation roadmap for your company.


You will be empowered with practical insights, tidbits, and results-oriented practices that you can take back to start your company's "digital transformation".

--------

I look forward to hearing your thoughts as I put together the content for my talks. Please feel free to reach out on LinkedIn or Twitter [@akuntamukkala] if you have any questions or comments.

See you there!



Tuesday, November 18, 2014

DZone Refcard on Apache Spark

Glad to share that DZone Refcard for Apache Spark is now available for download at http://t.co/s3tNmWPqcr

It is a short digest of what Apache Spark is about and the capabilities it enables for engineers and data scientists.

Monday, November 10, 2014

DZone - Developer of the Week

Over the years I have benefited from the curated content published by DZone. I am amazed by the way DZone's content writers publish intriguing, engaging content, especially the useful, developer-friendly Refcards.

Recently I got an opportunity to work with DZone content creators to write a Refcard on Apache Spark. It was a fantastic experience and one that I highly recommend for anyone who intends to work with the best in the industry.

As a preview to the Refcard, I was honored to be featured as the developer of the week on DZone. My interview is published here. 

I am very excited about the future of Apache Spark in the Big Data ecosystem.
Per industry trends, many IT professionals are transitioning their careers to Big Data, as companies realize that data is their new currency: it can potentially unlock the door to new revenue streams.

Herein lies a challenge: many struggle with "separating noise from the voice". Since the Hadoop ecosystem has grown over the last 10 years, getting into this space can be a daunting task, because there are so many tools and solutions for the plethora of Big Data use cases.

This is exactly why I am excited about Apache Spark. It is a compelling platform that provides a unified approach to the most common Big Data use cases: batch, interactive, and real-time data processing.

In the DZone Refcard on Apache Spark, I have catered to new or moderately experienced IT professionals who want to discover the capabilities of Apache Spark. I have included simple hands-on examples and techniques that demonstrate how easily one can become productive with Apache Spark and start solving Big Data use cases.

At SciSpike, we are excited about helping our clients adopt Apache Spark in their Big Data infrastructure, prove its merits, and make it their platform of choice for Big Data applications.


I look forward to hearing your thoughts on the Refcard. It should be out in Nov 2014.

I am reachable via Twitter @akuntamukkala

Thursday, May 29, 2014

ActiveMQ - Network of Brokers Explained - Part 5


In the previous part 4, we saw how to load balance remote consumers on a queue using network connectors.

In this part 5, we will see how the same configuration behaves if we have concurrent remote durable subscribers on a topic. Consider the following configuration.


Fig 1: Network of Brokers - Load balance subscribers on a topic

As shown above, we have Broker-1 which initiates two network connectors to Broker-2 and Broker-3. A producer sends messages to a topic "moo.bar" on Broker-1 while Broker-2 has subscriber C1 and Broker-3 has two subscribers C2 and C3 on the same topic "moo.bar". 

You may observe that this setup is very similar to part 4. The only difference is that here we are dealing with topics, whereas in part 4 we were dealing with queues.

Let's see this in action


  1. Add the following network connector configuration to Broker-1's activemq.xml configuration file:

     <networkConnectors>
       <networkConnector
         name="T:broker1->broker2"
         uri="static:(tcp://localhost:61626)"
         duplex="false"
         decreaseNetworkConsumerPriority="false"
         networkTTL="2"
         conduitSubscriptions="false"
         dynamicOnly="true">
         <excludedDestinations>
           <queue physicalName="&gt;" />
         </excludedDestinations>
       </networkConnector>
       <networkConnector
         name="T:broker1->broker3"
         uri="static:(tcp://localhost:61636)"
         duplex="false"
         decreaseNetworkConsumerPriority="false"
         networkTTL="2"
         conduitSubscriptions="false"
         dynamicOnly="true">
         <excludedDestinations>
           <queue physicalName="&gt;" />
         </excludedDestinations>
       </networkConnector>
     </networkConnectors>


  2. Let's start broker-2, broker-3 and broker-1 in that order.
  3. akuntamukkala@localhost~/apache-activemq-5.8.0/cluster/broker-2/bin$ ./broker-2 console
  4. akuntamukkala@localhost~/apache-activemq-5.8.0/cluster/broker-3/bin$ ./broker-3 console
  5. akuntamukkala@localhost~/apache-activemq-5.8.0/cluster/broker-1/bin$ ./broker-1 console

  6. Broker-1's admin console connections show that the two network connectors to Broker-2 and Broker-3 have been established as configured
  7. Broker-1's Connections @ http://localhost:8161/admin/connections.jsp
  8. Let's start subscriber C1 on Broker-2, subscribing to topic "moo.bar", and subscribers C2 and C3 on Broker-3, subscribing to the same topic "moo.bar"
  9. Durable subscribers require a unique combination of client ID and subscriber name. In order to create durable subscribers C2 and C3, we need to enhance the functionality provided in /Users/akuntamukkala/apache-activemq-5.8.0/example/src/ConsumerTool.java, where /Users/akuntamukkala/apache-activemq-5.8.0 is the directory where ActiveMQ is installed.
  10. The modification consists of editing build.xml and ConsumerTool.java to add a new parameter, "subscriberName". The edited files build.xml and ConsumerTool.java can be obtained from here and here, respectively.
  11. Let's start the subscribers now.
  12. akuntamukkala@localhost~/apache-activemq-5.8.0/example$ ant consumer -Durl=tcp://localhost:61626 -Dtopic=true -Dsubject=moo.bar -DclientId=C1 -Ddurable=true -DsubscriberName=mb.C1
  13. akuntamukkala@localhost~/apache-activemq-5.8.0/example$ ant consumer -Durl=tcp://localhost:61636 -Dtopic=true -Dsubject=moo.bar -DclientId=C2 -Ddurable=true -DsubscriberName=mb.C2
  14. akuntamukkala@localhost~/apache-activemq-5.8.0/example$ ant consumer -Durl=tcp://localhost:61636 -Dtopic=true -Dsubject=moo.bar -DclientId=C3 -Ddurable=true -DsubscriberName=mb.C3

  15. Durable subscriber on Broker-2

    http://localhost:9161/admin/subscribers.jsp
  16. Durable subscribers on Broker-3
    http://localhost:10161/admin/subscribers.jsp

  17. Durable subscribers on Broker-1 (because of network connectors)
    http://localhost:8161/admin/subscribers.jsp
  18.  Now let's send 10 durable messages to topic moo.bar on Broker-1
  19. akuntamukkala@localhost~/apache-activemq-5.8.0/example$ ant producer -Durl=tcp://localhost:61616 -Dtopic=true -Dsubject=moo.bar -Dmax=10 -Ddurable=true
  20. See the console on Broker-3
    Log file output on Broker-3
  21. As you may observe, Broker-3 receives the same message twice, once for each subscription, C2 and C3. By default, ActiveMQ does not permit processing of duplicate messages.
  22. This happens because both subscriptions, mb.C2 and mb.C3 on Broker-3, are propagated to Broker-1. So when 10 messages are published to moo.bar on Broker-1, each message is sent to Broker-3 twice, once for mb.C2 and once for mb.C3. Since both copies carry the same message ID, the duplicates are discarded, hence the warning in the log messages (shown in step 20).
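Conceptually, the duplicate-discard behavior amounts to the broker tracking which message IDs it has already accepted. Here is a toy sketch of that idea (an illustration only, not ActiveMQ's actual implementation; the message ID format is made up):

```java
import java.util.HashSet;
import java.util.Set;

public class DedupSketch {
    // IDs of messages this broker has already accepted.
    private final Set<String> seenMessageIds = new HashSet<>();

    // Returns true if the message is accepted, false if a message
    // with the same ID was seen before and this copy is dropped.
    public boolean accept(String messageId) {
        return seenMessageIds.add(messageId);
    }

    public static void main(String[] args) {
        DedupSketch broker3 = new DedupSketch();
        // The same message arrives twice, once per propagated subscription.
        System.out.println(broker3.accept("ID:broker1-msg-1")); // first copy accepted
        System.out.println(broker3.accept("ID:broker1-msg-1")); // duplicate discarded
    }
}
```

This is why the log shows a warning rather than a second delivery: the second copy arrives, is recognized by its ID, and is dropped.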
  23. Here is the console showing statistics on Broker-1
    http://localhost:8161/admin/subscribers.jsp

  24. Here is the console showing statistics on Broker-3
    http://localhost:10161/admin/subscribers.jsp

  25. As you can see, even though the enqueue counter shows 20, the dequeue counter shows only 10, since the other 10 messages were discarded by Broker-3 as duplicates. This is a useful feature that helps ensure a message gets processed at most once by a broker. It occurs because both subscriptions, C2 and C3, are propagated to the upstream broker, Broker-1.


Duplicate Messages on Broker-3


Let's retry the same scenario using a minor tweak in the network connector settings by making conduitSubscriptions="true" on both network connectors from Broker-1 to Broker-2 and Broker-3 respectively. After restarting the brokers, delete the inactive durable subscribers and then repeat the above steps. 

<networkConnectors>
  <networkConnector
    name="T:broker1->broker2"
    uri="static:(tcp://localhost:61626)"
    duplex="false"
    decreaseNetworkConsumerPriority="false"
    networkTTL="2"
    conduitSubscriptions="true"
    dynamicOnly="true">
    <excludedDestinations>
      <queue physicalName="&gt;" />
    </excludedDestinations>
  </networkConnector>
  <networkConnector
    name="T:broker1->broker3"
    uri="static:(tcp://localhost:61636)"
    duplex="false"
    decreaseNetworkConsumerPriority="false"
    networkTTL="2"
    conduitSubscriptions="true"
    dynamicOnly="true">
    <excludedDestinations>
      <queue physicalName="&gt;" />
    </excludedDestinations>
  </networkConnector>
</networkConnectors>



The following screenshot shows that Broker-1 now sees only two durable subscribers, one from each of Broker-2 and Broker-3.

Durable Subscribers in Broker-1 when conduitSubscriptions="true"

Upon publishing 10 durable messages on Broker-1, we find that we no longer have the duplicate message issue.

As expected, all 10 messages are processed by each of C1, C2, and C3, as shown in the screenshots below.

Broker-1's Durable Topic Subscribers

Broker-3's Durable Topic Subscribers C2 and C3 receive and process 10 messages each


Hence we have seen how the conduitSubscriptions attribute can help reduce message traffic by avoiding duplicate messages in a network of brokers.
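The counter arithmetic from the two runs can be captured in a tiny model (purely illustrative; the attribute name is real but this arithmetic is a simplification of the broker's behavior):

```java
public class TopicConduitSketch {
    // Messages enqueued at a networked broker, given the number of
    // messages published upstream and the number of remote topic
    // subscriptions that broker hosts.
    public static int enqueuedAtRemoteBroker(int published, int remoteSubscriptions, boolean conduit) {
        // conduit=true consolidates all of that broker's subscriptions
        // into one, so only one copy of each message crosses the wire.
        return published * (conduit ? 1 : remoteSubscriptions);
    }

    public static void main(String[] args) {
        int broker3Subs = 2; // C2 and C3 on topic "moo.bar"
        // conduitSubscriptions=false: enqueue counter 20, half discarded as duplicates.
        System.out.println(enqueuedAtRemoteBroker(10, broker3Subs, false)); // 20
        // conduitSubscriptions=true: enqueue counter 10, no duplicates on the wire.
        System.out.println(enqueuedAtRemoteBroker(10, broker3Subs, true));  // 10
    }
}
```

This matches the consoles above: with conduit off, Broker-3's enqueue counter read 20 against a dequeue of 10; with conduit on, both read 10.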


In the next part 6, we will see how ActiveMQ provides "message replay" capabilities to prevent stuck-message scenarios.


Friday, May 9, 2014

Speaking at Global Big Data Conference in Dallas May 11, 2014

I am going to be speaking about Apache Spark in Global Big Data Conference on May 11th 2014 from 11.00am to 12.00pm @ Irving Convention Center, 500 W Las Colinas Blvd, Irving, TX 75039 

Here is the abstract of the presentation: 

I am impressed with how Apache Spark unifies batch, streaming, and interactive big data use cases. The brilliant folks at AMPLab @ UC Berkeley have created a tremendous solution that takes big data processing to the next level!

Let's see some lightning fast big data analytics powered by Apache Spark!

Look forward to seeing you there!

Wednesday, March 26, 2014

ActiveMQ - Network of Brokers Explained - Part 4

In the previous part 3, we saw how ActiveMQ distinguishes remote consumers from local consumers, which helps determine shorter routes from message producers to consumers.

In this part 4, we will look into how to load balance concurrent consumers on remote brokers.

Let’s consider a bit more advanced configuration to load balance concurrent message consumers on a queue in remote brokers as shown below. 

Part 4 - Network of brokers 

In the above configuration, we have a message producer sending messages into a queue moo.bar on broker-1. Broker-1 establishes network connectors to broker-2 and broker-3. Consumer C1 consumes messages from queue moo.bar on broker-2 while consumers C2 and C3 are concurrent consumers on queue moo.bar on broker-3. 

Let's see this in action


Let's create three brokers instances...
  1. Ashwinis-MacBook-Pro:bin akuntamukkala$ pwd
    /Users/akuntamukkala/apache-activemq-5.8.0/bin

  2. Ashwinis-MacBook-Pro:bin akuntamukkala$./activemq-admin create ../cluster/broker-1

  3. Ashwinis-MacBook-Pro:bin akuntamukkala$./activemq-admin create ../cluster/broker-2

  4. Ashwinis-MacBook-Pro:bin akuntamukkala$./activemq-admin create ../cluster/broker-3

  5. Fix broker-2's and broker-3's transport connectors, AMQP connectors, and Jetty HTTP ports by modifying the corresponding conf/activemq.xml and conf/jetty.xml as follows:

    Broker     OpenWire Port   Jetty HTTP Port   AMQP Port
    broker-1   61616           8161              5672
    broker-2   61626           9161              5682
    broker-3   61636           10161             5692


  6. Fix the network connector on broker-1 so that messages on queues can be forwarded dynamically to consumers on broker-2 and broker-3. This can be done by adding the following XML snippet to broker-1's conf/activemq.xml:

    <networkConnectors>
      <networkConnector
        name="Q:broker1->broker2"
        uri="static:(tcp://localhost:61626)"
        duplex="false"
        decreaseNetworkConsumerPriority="true"
        networkTTL="2"
        dynamicOnly="true">
        <excludedDestinations>
          <topic physicalName="&gt;" />
        </excludedDestinations>
      </networkConnector>
      <networkConnector
        name="Q:broker1->broker3"
        uri="static:(tcp://localhost:61636)"
        duplex="false"
        decreaseNetworkConsumerPriority="true"
        networkTTL="2"
        dynamicOnly="true">
        <excludedDestinations>
          <topic physicalName="&gt;" />
        </excludedDestinations>
      </networkConnector>
    </networkConnectors>

  7.  Start broker-2, broker-3 and broker-1. We can start these in any order.
    1. /apache-activemq-5.8.0/cluster/broker-3/bin$ ./broker-3 console
    2. /apache-activemq-5.8.0/cluster/broker-2/bin$ ./broker-2 console
    3. /apache-activemq-5.8.0/cluster/broker-1/bin$ ./broker-1 console
  8. Let's start the consumers C1 on broker-2 and C2, C3 on broker-3 but on the same queue called "moo.bar"
    1. /apache-activemq-5.8.0/example$ ant consumer -Durl=tcp://localhost:61626 -Dsubject=moo.bar
    2. /apache-activemq-5.8.0/example$ ant consumer -Durl=tcp://localhost:61636 -Dsubject=moo.bar -DparallelThreads=2

      The consumer subscriptions are forwarded by broker-2 and broker-3, via advisory messages, to their neighboring broker-1, which has a network connector established to both.
  9. Let's review the broker web consoles to see the queues and corresponding consumers. 
    1. We find that broker-2's web console shows one queue "moo.bar" having 1 consumer, broker-3's web console shows one queue "moo.bar" having 2 concurrent consumers
    2. Though there are three consumers (C1 on broker-2 and C2,C3 on broker-3), broker-1 sees only two consumers (representing broker-2 and broker-3).
    3. http://localhost:8161/admin/queues.jsp
    4. This is because the network connectors from broker-1 to broker-2 and broker-3 have, by default, a property "conduitSubscriptions" set to true, due to which broker-3's C2 and C3, which consume messages from the same queue "moo.bar", are treated as one consumer by broker-1.
  10. Let's produce 30 messages into broker-1's queue moo.bar and see how the messages are divvied among the consumers C1, C2 and C3
Shows how the messages were propagated from producer to consumers C1, C2, C3
As seen above, even though there were three consumers and 30 messages, they didn't each get to process 10 messages, because the C2 and C3 subscriptions were consolidated into one consumer at broker-1.

conduitSubscriptions="true" is a useful setting when creating subscribers on topics, as it prevents duplicate messages. More on this in part 5.
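The dispatch arithmetic above can be sketched with a toy round-robin model (an illustration under simplified assumptions, not broker internals; the consumer labels are made up):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ConduitSketch {
    // Round-robin dispatch of messages among the consumers broker-1 can see.
    public static Map<String, Integer> dispatch(List<String> consumers, int messageCount) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        consumers.forEach(c -> counts.put(c, 0));
        for (int i = 0; i < messageCount; i++) {
            counts.merge(consumers.get(i % consumers.size()), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // conduitSubscriptions=true: broker-1 sees one consumer per networked
        // broker, so broker-3 (hosting both C2 and C3) gets only half the messages.
        System.out.println(dispatch(Arrays.asList("broker-2(C1)", "broker-3(C2+C3)"), 30));
        // conduitSubscriptions=false: all three subscriptions are visible,
        // so each consumer receives an equal share.
        System.out.println(dispatch(Arrays.asList("C1", "C2", "C3"), 30));
    }
}
```

With two visible consumers the split is 15/15 (and broker-3 further divides its 15 between C2 and C3); with three visible consumers it is 10/10/10, which is the behavior we get after switching the flag off below.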

So, to make the C2 and C3 subscriptions on queue moo.bar propagate to broker-1, let's redo steps 6 through 10 after setting conduitSubscriptions="false" in broker-1's network connector configuration in conf/activemq.xml.

Here is the new network connector configuration snippet for broker-1:

<networkConnectors>
  <networkConnector
    name="Q:broker1->broker2"
    uri="static:(tcp://localhost:61626)"
    duplex="false"
    decreaseNetworkConsumerPriority="true"
    networkTTL="2"
    conduitSubscriptions="false"
    dynamicOnly="true">
    <excludedDestinations>
       <topic physicalName="&gt;" />
    </excludedDestinations>
  </networkConnector>
  <networkConnector
    name="Q:broker1->broker3"
    uri="static:(tcp://localhost:61636)"
    duplex="false"
    decreaseNetworkConsumerPriority="true"
    networkTTL="2"
    conduitSubscriptions="false"
    dynamicOnly="true">
    <excludedDestinations>
       <topic physicalName="&gt;" />
    </excludedDestinations>
  </networkConnector>
</networkConnectors>

Upon restarting the brokers and the consumers C1, C2, and C3, and producing 30 messages into broker-1's moo.bar queue, we find that all three consumer subscriptions are visible at broker-1. As a result, broker-1 dispatches 10 messages to each consumer in round-robin fashion to load balance. This is depicted pictorially below.

Shows how the messages were propagated from producer to consumers C1, C2, C3
Broker-1's web console @ http://localhost:8161/admin/queueConsumers.jsp?JMSDestination=moo.bar shows that broker-1 now sees 3 consumers and dispatches 10 messages to each consumer



Thus, in this part 4 of the blog series, we have seen how to load balance remote concurrent consumers that are consuming messages from a queue.

As always, your comments and feedback are appreciated!

In the next part 5, we will explore how the same scenario will play out if we were to use a topic instead of a queue. Stay tuned...

Resources

  • The configuration files (activemq.xml and jetty.xml) of all the brokers used in this blog are available here.