Introduced in the first quarter, the library moves large volumes of data for AI and analytics
Airbyte, the leading open data movement platform, today announced that its PyAirbyte open-source Python library, which was introduced in late February, has helped more than 10,000 AI and data engineers sync over 6 billion records of data. Users have completed more than 221,000 PyAirbyte sync jobs, or over 10,000 syncs per week. PyAirbyte now averages 25,000 monthly downloads, according to metrics from the popular PyPI Python package repository.
Since its launch in February, PyAirbyte has seen 38 releases, averaging one update every three business days. This release cadence reflects ongoing enhancements and active community engagement from 17 contributors: 11 from the user community and six Airbyte developers.
Community contributions have significantly enhanced PyAirbyte’s capabilities, adding the following features: Docker executor and networking support; integration with Apache Arrow; support for FastAPI and other web frameworks; an expanded set of BigQuery authentication options; and numerous other bug fixes and improvements.
Users extract data to populate AI frameworks like LangChain and LlamaIndex and to build LLM-powered applications. Also popular among users is moving data from Amazon, Facebook, Google, HubSpot, and Salesforce to make data-driven marketing decisions. The next most popular category is help desk and IT use cases, with data from Jira, GitHub, and Zendesk. In each case, PyAirbyte consolidates data from various sources for analysis to improve decision making.
PyAirbyte makes it easy to move data across API sources and destinations by enabling Airbyte resources to be created and managed in code rather than through the user interface (UI), a natural fit for coding workflows. Python users get access to Airbyte’s more than 250 data connectors instead of having to build and maintain connectors themselves. Airbyte is the first to give Python users this capability across more than 250 connectors.
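As a rough illustration of what managing a sync in code can look like, the sketch below follows the pattern in PyAirbyte's public quickstart, using the sample `source-faker` connector. It is a minimal sketch rather than a reference implementation: the helper names `faker_config` and `sync_sample_data` are hypothetical, the record count is arbitrary, and the exact PyAirbyte calls may differ between library versions.

```python
from typing import Any


def faker_config(record_count: int) -> dict[str, Any]:
    """Build a config for the sample `source-faker` connector (hypothetical helper)."""
    if record_count <= 0:
        raise ValueError("record_count must be positive")
    return {"count": record_count}


def sync_sample_data(record_count: int = 100):
    """Read generated sample records into PyAirbyte's default local cache."""
    # Imported lazily so this module still loads where PyAirbyte is not installed.
    import airbyte as ab

    source = ab.get_source(
        "source-faker",
        config=faker_config(record_count),
        install_if_missing=True,  # install the connector on first use
    )
    source.check()               # validate the configuration and connection
    source.select_all_streams()  # sync every stream the connector exposes
    return source.read()         # read into the default local cache


if __name__ == "__main__":
    # Requires `pip install airbyte`; assumed to match the installed version's API.
    result = sync_sample_data(100)
    for name, records in result.streams.items():
        print(f"Stream {name}: {len(list(records))} records")
```

Swapping `source-faker` for another connector name and supplying that connector's configuration would follow the same shape, which is the sense in which the sync lives in the Python workflow instead of the UI.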
“With a majority of existing data pipelines written in Python today, we expected PyAirbyte to be popular but it has exceeded our expectations,” said Michel Tricot, co-founder and CEO, Airbyte. “We’ve seen more than 150 unique data sources used to move data to destinations, like AI frameworks and data warehouses.”
Python is a popular programming language for data engineers because of its simplicity, versatility, and scalability. It also offers a wide range of libraries and frameworks for data manipulation, exploration, and visualization. Python easily integrates with tools commonly used in data analysis and data science, such as SQL databases, Hadoop, and Spark.
PyAirbyte is an addition to the Airbyte API and Terraform Provider, which enable programmatic management of Airbyte resources, streamlining workflows and integrating Airbyte configurations with existing data infrastructure. Other deployment models include Airbyte Open Source, Airbyte Self-Managed Enterprise, Airbyte Cloud, and Powered by Airbyte.
As part of the June 2023 PyAirbyte & AI Hackathon, 11 new feature contributions were accepted, further enhancing PyAirbyte and its AI-related connectors. Additionally, 25 new tutorials were created to help new community members use PyAirbyte for generative AI use cases.
Airbyte makes moving data easy and affordable across almost any source and destination, helping enterprises provide their users with access to the right data for analysis and decision-making. With more than 900 contributors, Airbyte has the largest data engineering contributor community and the best tooling to build and maintain connectors.