SD Times Open-Source Project of the Week: spark-inequality-impact

LinkedIn is sharing its “Project Every Member” initiative by open sourcing spark-inequality-impact, an Apache Spark library that other organizations can use in any domain where measuring and reducing inequality, or avoiding unintended inequality consequences, may be desirable.

“This work is furthering our commitment to closing the network gap and making sure everyone has a fair shot at finding and accessing opportunities, regardless of their background or connections,” LinkedIn wrote in a blog post.

LinkedIn announced last month that it would use A/B testing to build inclusive products under an initiative called Project Every Member.

LinkedIn stated that any change on its platform goes through a series of A/B tests and analyses to ensure that it achieves its intended product goals and business objectives. The approach is to preview the change or feature to a small group of members for a limited time, and then measure the results.

The Atkinson index, a measure of inequality, is then applied to those results. It is used to determine which end of the distribution contributed most to the observed inequality, and it lets developers encode other information about the population being measured into the analysis to address shortcomings of A/B testing on its own.

LinkedIn decided to implement the Atkinson index computations in Apache Spark for scalability reasons: both the size of the data over which inequality is computed (for example, the number of individuals who are part of specific A/B tests) and the number of times inequality needs to be computed.

While inequality metrics can already be computed in R and Python, existing implementations typically require all the data to fit in memory on a single machine.

“We are releasing a package that leverages the fact that the Atkinson index can be decomposed as a sum, which means the data does not need to be held in memory all at once. We then use it as part of a larger pipeline that applies it to many A/B tests at once,” LinkedIn wrote.
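To illustrate what "decomposed as a sum" can mean in practice, here is a minimal Python sketch. It is not the spark-inequality-impact API and not LinkedIn's implementation; the names `PartialSums`, `summarize`, `atkinson_from_sums`, and `atkinson_chunked` are hypothetical. It uses the standard Atkinson index definition with an inequality-aversion parameter `epsilon` and assumes all values are strictly positive. Each chunk of data is reduced to three running totals, and those totals are added together before the index is computed, so no step needs the full dataset in memory.

```python
import math
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class PartialSums:
    """Additive summary of one chunk of strictly positive values (hypothetical helper)."""
    count: int
    total: float        # sum of x
    power_total: float  # sum of x**(1 - epsilon), or sum of log(x) when epsilon == 1

    def merge(self, other):
        # Summaries of disjoint chunks combine by plain addition.
        return PartialSums(
            self.count + other.count,
            self.total + other.total,
            self.power_total + other.power_total,
        )


def summarize(chunk, epsilon):
    """Reduce one chunk to its partial sums; only this chunk is held in memory."""
    if epsilon == 1:
        transformed = [math.log(x) for x in chunk]
    else:
        transformed = [x ** (1 - epsilon) for x in chunk]
    return PartialSums(len(chunk), math.fsum(chunk), math.fsum(transformed))


def atkinson_from_sums(sums, epsilon):
    """Compute the Atkinson index from merged partial sums."""
    mean = sums.total / sums.count
    if epsilon == 1:
        # Limiting case: the equally distributed equivalent is the geometric mean.
        equivalent = math.exp(sums.power_total / sums.count)
    else:
        equivalent = (sums.power_total / sums.count) ** (1 / (1 - epsilon))
    return 1 - equivalent / mean


def atkinson_chunked(chunks, epsilon):
    """Summarize each chunk independently, merge the summaries, then finish."""
    partials = (summarize(chunk, epsilon) for chunk in chunks)
    return atkinson_from_sums(reduce(PartialSums.merge, partials), epsilon)
```

For example, `atkinson_chunked([[1.0], [4.0]], 0.5)` gives the same value as computing the index over `[1.0, 4.0]` in one pass. In a distributed setting, the per-chunk summaries would play the role of per-partition aggregates that are combined across machines.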

The code is available on GitHub.

The post SD Times Open-Source Project of the Week: spark-inequality-impact appeared first on SD Times.



from SD Times https://ift.tt/2XHD50f
