From automatic code completion to improving software quality to data management, smart digital assistants are revolutionizing the work of developers, data scientists and business users. With the explosive development of artificial intelligence, it is undergoing one of the most significant transformations in the history of IT, but the transformation is accompanied by risks that can be measured against the huge opportunities that are revealed.
Does the sum of artificial intelligence and programming equal profit? – asked Bence Arató, managing director of BI Consulting, in his opening presentation on the fourth day of the Budapest Data+ML Forum. In June of this year, the consulting company organized a two-in-one event for the first time in the history of the two conferences, and the choice of topic for the keynote presentation also points to the intertwining between data technology areas, which is being drawn closer and closer by the rapid development of artificial intelligence.
In addition to the mutual announcements of leading technology suppliers, the breakneck pace is felt by the American Y Combinator W23, which is considered one of the world’s largest startup accelerator companies, i.e. this winter’s overview of start-up companies developing generative AI technologies, in which only in the fields of engineering – including software development – about 30 companies are listed, and in business areas there are even more.
In the analysis published by Omniscien Technologies in January of this year, from basic models to natural language processing to document extraction, three dozen or so AI technologies are also included in the various stages of the maturity curve. Some of them can reach wide-scale, practical application within 2-5, most of them 5-10 years – the plateau of productivity – and code generation, which is still in its initial stages of innovation, is among them, Bence Arató highlighted in his presentation.
Marching code assistants
GitHub Copilot, the artificial intelligence that helps developers work with automatic code completion, debuted in June 2021. The cloud-based Copilot, operating in a subscription model, is based on OpenAI’s GPT-3 language model – a special version of it, Codex – which is licensed by Microsoft, GitHub’s parent company.
According to GitHub’s survey last fall, developers are extremely satisfied with Copilot’s contribution. The vast majority of them (88 percent) said that they can work more efficiently with the support of the code generating digital assistant, and almost all of them (96 percent) emphasized that they can finish repetitive tasks particularly quickly, and about three quarters (77-73 percent) confirmed that fewer searches thanks to which you can immerse yourself more deeply in your work.
It is understandable that the first swallow was soon followed by new code assistants – such as the Amazon CodeWhisperer announced last summer and generally available from April this year, as well as its brother, CodeGuru. Developers can also use the latter in their existing workflows to automatically check the quality of the software code, to improve it based on received suggestions, and to optimize the performance and cost of the application being made. In May of this year, Huggingface announced its StarCoder and StarCoderBase coding large language models (LLM), while CodeComplete’s programming AI assistant – which can currently be tested in a private beta version – can be implemented by companies in their own cloud or local data center for increased security, and its capabilities are also available in their own they can fine-tune it on their code base.
Digital assistants for automating individual development tasks also appear one after the other. For example, CodiumAI generates tests, and Buildt helps to quickly search and understand large codebases – thus providing especially useful support for new developers to start work. A similar purpose is served by Bloop, which summarizes and explains the operation of the software code in human language in response to questions formulated in natural language, thereby helping evaluation and further planning in addition to understanding. Grit’s co-pilot automates code migration and updating dependencies, while Google’s DIDACT research project uses a novel method to train its massive AI model on the process of software development instead of ready-made code for intelligent support of software engineering work.
Despite all the AI support and automation, however, developers cannot sit back, not least because, in addition to using assistants, they have to familiarize themselves with new programming languages, interfaces, environments and methods.
For example, Pandas AI is a Python library that adds generative artificial intelligence capabilities to Pandas, the popular data analysis and manipulation tool, that it can be used alongside. With the help of Jupyter AI, which is available as a supplement to JupyterLab, developers can exploit the possibilities of generative artificial intelligence models in notebooks in a user-friendly and efficient way, thus increasing the productivity of JupyterLab and Jupyter Notebook.
Integrated, collaborative data science notebooks such as Noteable, which is enhanced with the ChatGPT extension, and Hex Magic, which is available in the public beta version, support the work of data researchers with natural language processing. And Mito AI – which is already used by large financial institutions worldwide – with its chatbot and spreadsheet, generates Python code to automate Excel reports and data analysis.
Among the new programming languages, LMQL combines the benefits of natural language instructions with the expressive power of Python to make working with LLMs more efficient, while the Mojo programming language combines the usability of Python with the power of C for all AI developers.
As it turns out, ChatGPT itself can be instructed to imagine itself as a database server and respond to commands with query results instead of information and descriptions as a Microsoft SQL Server. More exciting, however, is that developers using LLMs available in the public cloud can take sensitive software code outside the corporate network and thereby undermine security. The open source DB-GPT experimental project available on GitHub offers a solution to this problem with localized large language models that companies can use one hundred percent within their own, secure data and IT environment.
Security is also increased by Atlan’s AI co-pilot, which helps in data management, the cooperation of the relevant company teams and the complete preparation of documentation, while Lume’s artificial intelligence simplifies data integration by automatically transforming data schemas. At the other end of the spectrum, Stemma AI Discover Assistant makes it easier for business users to understand trends with its data discovery capabilities, be it revenue fluctuations, customer retention or service optimization opportunities in any industry.
Focus on safety
In an additional presentation on the closing day of the Budapest Data+ML Forum 2023, Bence Arató also gave a brief overview of the trends currently defining the field of artificial intelligence. According to one of the supplier reports he cited (Scale 2023 AI Readiness Report), the significant development of generative models experienced last year had an extraordinary impact on the AI strategy of companies. 65 percent of Scale’s clients said that as a result of the trend, they accelerated the implementation of their existing strategy or developed such a development plan for the first time. But while 60 percent of the respondents are experimenting with generative models – or are preparing to do so in the next 12 months – only a fifth of them (21 percent) are currently using such models in production environments.
The focus is increasingly on security in the leading AI laboratories, where the number of researchers working in the field has more than tripled in one year, according to the 2022 edition of the State of AI Report prepared on the investor side. The report also points out that since 2010, Chinese research centers have published four and a half times as many scientific works as similar institutions in the US, India, England and Germany combined. Moreover, China leads in research areas such as video surveillance, object detection, situational interpretation and self-control, which can affect not only security, but also geopolitical conditions.
Stanford University’s annual report (2023 AI Index) also draws attention to security risks, along with many other trends – for example, the academic sector lagging behind the competition. Among other things, he cites the results of last year’s IPSOS survey, according to which the highest proportion (78 percent) of Chinese citizens have a positive opinion of AI products and services, while in the United States only 35 percent of respondents said that the advantages of the technology are greater , as its disadvantages. Examining the legislative practice of 127 countries, the university also found that the number of laws mentioning artificial intelligence has increased from one to 37 since 2016, and the technology has already been put on the agenda in the parliaments of 81 countries. Given the upcoming AI law of the European Union – which many expect to have a similar effect to the GDPR – these numbers may increase significantly in the near future.
In an interview with Forbes magazine this February, Bill Gates called the last 12 months of AI as an important milestone in the history of digital technology as the appearance of the PC, graphical user interface and the Internet.
But remember what OpenAI CEO Sam Altman wrote in a tweet two months earlier: ChatGPT is incredibly limited, but still good enough at some things to give a misleading impression of greatness. Now it would be a mistake to rely on him in any important matter. It shows the direction of development; we still need to work a lot on his reliability and truthfulness.