
Developers have to keep pace with the rise of data streaming

The rise of data streaming has forced developers to either adapt and learn new skills or be left behind. The data industry evolves at supersonic speed, and it can be challenging for developers to constantly keep up.

SD Times recently had a chance to speak with Michael Drogalis, the principal technologist at Confluent, a company that provides a complete set of tools needed to connect and process data streams. (This interview has been edited for clarity and length.)

SD Times: Can you set the context for how much data streaming is growing today and how important it is that developers pay more attention to it?

Drogalis: I remember back in 2013 or 2014, I attended the Strange Loop Conference, which was really great. As I was walking around, I saw a talk on the main stage by Jay Kreps, who's now Confluent's CEO, about Apache Kafka. I walked away with two things on my mind. Number one, this guy was super tall, like 6 foot 8, which was very impressive. The other was that there were at least two people in the world who cared about streaming. That was basically the vibe back then: it was a very new technology.

There were a lot of academic papers about it, and there were clearly patches of interest in the technology landscape that could be put together, but none of them had really broken out. 

The other project at that time was Apache Storm, a real-time stream processor, but it lacked the ecosystem of components around it. So there was just a small community of people around it.

Fast forward to today, and it's a completely different world. I have the privilege of working here and seeing companies of every size, in every vertical and every industry, with every use case and every latency requirement. The transition is kind of shocking to me, because you don't see many technologies break out that quickly over the course of a decade.

SD Times: Are there any projects in this space that you find interesting?

Drogalis: I saw a few interesting stats this year. Apache Kafka is one of the Apache Software Foundation's most active projects, which is pretty cool, because the foundation now incubates a huge number of projects. I also saw in the Stack Overflow annual developer survey that Kafka was ranked as one of the most loved or one of the most recognizable technologies. To see it break out from being an undercurrent to something that's really important and on people's minds is pretty great.

SD Times: What are some of the challenges of handling data streaming today?

Drogalis: It's kind of like driving on the opposite side of the road from the one you're used to. You go to school, and you're taught to program in maybe Java or Python. The basic paradigm everyone is taught is that you have a blob of data sitting in a data structure or a file: you suck it up, you process it, and then you spit it out somewhere. You do this over and over again until you've completed your data processing task, or whatever else needs to be done.
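For readers who want to see this in code, here is a minimal Python sketch of the batch paradigm Drogalis describes: load a bounded file, process it, and write the result. The click-count records, the `batch_job` function, and the file names are hypothetical illustrations, not taken from the interview or from any Confluent product.

```python
import json
import tempfile
from pathlib import Path


def batch_job(in_path: Path, out_path: Path) -> dict[str, int]:
    """Bounded, batch-style processing: read everything, process, write."""
    # 1. "Suck it up": load the whole, finite dataset into memory at once.
    records = json.loads(in_path.read_text())

    # 2. Process it. Because the data is bounded, every record is available,
    #    so we can scan it freely (here: total clicks per user).
    totals: dict[str, int] = {}
    for rec in records:
        totals[rec["user"]] = totals.get(rec["user"], 0) + rec["clicks"]

    # 3. "Spit it out": write the finished result somewhere.
    out_path.write_text(json.dumps(totals, sort_keys=True))
    return totals


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "clicks.json"  # hypothetical input file
        dst = Path(tmp) / "totals.json"  # hypothetical output file
        src.write_text(json.dumps([
            {"user": "a", "clicks": 2},
            {"user": "b", "clicks": 1},
            {"user": "a", "clicks": 3},
        ]))
        print(batch_job(src, dst))  # {'a': 5, 'b': 1}
```

The whole job assumes the input has an end. That assumption is exactly what streaming takes away.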

Streaming really turns this all on its head. You have an inversion of flow: instead of bounded data structures, you have unbounded ones. The data continuously comes in, and you have to constantly process the very next thing that shows up. You can't arbitrarily scan into the future, because you don't know what's coming. Events may arrive out of order, and you don't know whether you have the complete picture yet. Everything is effectively asynchronous by default. It takes some getting used to, but it's becoming an increasingly robust paradigm.
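Here is a minimal sketch of the streaming shape in plain Python, under stated assumptions. `event_source()` is a hypothetical generator standing in for a real stream such as a Kafka topic (it is not a Kafka client), and `running_totals()` is an illustrative helper. The point is the structure of the code: the input has no end, events are handled one at a time, and every answer is provisional.

```python
import itertools
import random
from typing import Iterable, Iterator


def event_source(seed: int = 0) -> Iterator[dict]:
    """Hypothetical unbounded source standing in for a real stream.

    It yields click events forever; unlike a file, there is no "end".
    """
    rng = random.Random(seed)
    while True:
        yield {"user": rng.choice("ab"), "clicks": rng.randint(1, 3)}


def running_totals(events: Iterable[dict]) -> Iterator[dict[str, int]]:
    """Process each event as it shows up, emitting the answer so far.

    We can't load "all" the data or scan ahead, so we keep a small piece
    of state and yield an updated snapshot after every event.
    """
    totals: dict[str, int] = {}
    for event in events:  # on an unbounded source, this loop never ends
        totals[event["user"]] = totals.get(event["user"], 0) + event["clicks"]
        yield dict(totals)  # a copy, so later updates don't mutate it


if __name__ == "__main__":
    # The stream never ends, so only look at the first five results.
    for snapshot in itertools.islice(running_totals(event_source()), 5):
        print(snapshot)
```

Because the source never ends, the caller decides how much to consume (here with `itertools.islice`). A real deployment would keep the loop running indefinitely.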

But it certainly is a big change to get your head around. I liken it to when people started adopting JavaScript on the server, where everything is async. It takes a little getting used to, but the power makes it worth it.

SD Times: So what are some of the best practices and most common skills that are needed to deal with the growth of data streaming?

Drogalis: A lot of it comes down to experience. This is a technology that has evolved fairly recently, so a lot of it is just getting your hands dirty: going out and figuring out how it works and what works best.

As far as best practices, a couple of things jump out at me. Number one is getting your head around the idea of data retention. When you work with batch-oriented systems, the general idea is to keep all your data forever, which can work. You may have an expiration policy running in the background that mops up data you no longer need. Streaming systems, however, have the idea of retention built into them: you age out old data, and you make a trade-off between what you keep and what you throw away. What you keep is effectively the boundary of what you're able to process.
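To illustrate the retention trade-off, here is a sketch of a toy append-only log that ages out records older than a fixed window. Kafka exposes a similar idea through its topic-level `retention.ms` setting. The `RetainedLog` and `Record` classes, however, are hypothetical and greatly simplified, and this is not how a Kafka broker is implemented: real brokers, for example, delete whole log segments in the background rather than individual records.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Record:
    timestamp_ms: int  # when the record was appended to the log
    value: str


class RetainedLog:
    """Toy append-only log with time-based retention (hypothetical class).

    Records older than `retention_ms` (relative to the newest record) are
    aged out; whatever remains is all a consumer can ever reprocess.
    """

    def __init__(self, retention_ms: int) -> None:
        self.retention_ms = retention_ms
        self._records: deque = deque()

    def append(self, record: Record) -> None:
        # Assumes records arrive in timestamp order, so the oldest sit at the left.
        self._records.append(record)
        cutoff = record.timestamp_ms - self.retention_ms
        while self._records and self._records[0].timestamp_ms < cutoff:
            self._records.popleft()  # aged out: this data is gone for good

    def replay(self) -> list[str]:
        """Everything a new consumer could still read from the beginning."""
        return [r.value for r in self._records]


if __name__ == "__main__":
    log = RetainedLog(retention_ms=1_000)
    for t, v in [(0, "a"), (500, "b"), (1_200, "c"), (2_100, "d")]:
        log.append(Record(timestamp_ms=t, value=v))
    print(log.replay())  # ['c', 'd']
```

Whatever has aged out is gone for good. A consumer that starts reading after the record at 2,100 ms arrives can only replay `c` and `d`, which is the "boundary of what you're able to process" in miniature.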

The second thing worth studying up on is being intentional about your designs and about the idea of time. With streaming, your data can come out of order. A classic example: you're collecting events coming off of cell phones, and somebody takes a phone and drives into the Amazon rainforest, where they have no connectivity. Then they come out, reconnect, and upload data from last week. The systems you design have to be intelligent enough to look at that data and say, this didn't actually just happen; it's from a week ago. There's power and there's complexity. The power is that you can retroactively update your view of the world, and you can take all kinds of special actions depending on what you want to do in your domain. The complexity is that you have to figure out how to deal with that late data and factor it into your programming model.
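The rainforest example can be sketched as event-time windowing. In this illustrative Python code, each hypothetical `Event` carries two timestamps: when it happened on the phone and when it arrived. `DailyCounts` buckets events by the first of these. The "late" rule treats as late anything from a day earlier than the newest event seen so far. That rule is a deliberately simple stand-in for the watermark and lateness settings that stream processors typically offer in some form, and all names and thresholds here are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000  # one day in milliseconds


@dataclass
class Event:
    device: str
    event_time_ms: int  # when it actually happened on the phone
    arrival_ms: int     # when our system received it


class DailyCounts:
    """Counts events per day of *event time* (hypothetical class)."""

    def __init__(self) -> None:
        self.counts: defaultdict = defaultdict(int)
        # Simplistic watermark: the newest event time seen so far.
        self.watermark_ms = 0

    def add(self, event: Event) -> bool:
        """Count an event; return True if it revises an earlier day."""
        day = event.event_time_ms // DAY_MS  # bucket by when it happened
        is_late = day < self.watermark_ms // DAY_MS
        self.counts[day] += 1  # late data still lands in its true window
        self.watermark_ms = max(self.watermark_ms, event.event_time_ms)
        return is_late


if __name__ == "__main__":
    agg = DailyCounts()
    stream = [  # listed in arrival order
        Event("phone-1", event_time_ms=7 * DAY_MS + 10, arrival_ms=7 * DAY_MS + 20),
        Event("phone-2", event_time_ms=7 * DAY_MS + 30, arrival_ms=7 * DAY_MS + 40),
        # phone-3 was offline for a week; its old event shows up only now.
        Event("phone-3", event_time_ms=5, arrival_ms=7 * DAY_MS + 50),
    ]
    for ev in stream:
        late = agg.add(ev)
        lag_days = (ev.arrival_ms - ev.event_time_ms) // DAY_MS
        print(f"{ev.device}: late={late}, arrived {lag_days} day(s) after it happened")
    print(dict(agg.counts))  # {7: 2, 0: 1}
```

Keying windows by event time lets the week-old upload land on day 0 instead of being miscounted as today's traffic. The `late` flag is where an application could trigger a correction or another domain-specific action.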

The post Developers have to keep pace with the rise of data streaming appeared first on SD Times.



