Will Large Language Models Really Change How Work Is Done?

Even as organizations adopt increasingly powerful LLMs, they will find it difficult to shed their reliance on humans.


Dan Page/theispot.com

Large language models (LLMs) are a paradigm-changing innovation in data science. They extend the capabilities of machine learning models to generating relevant text and images in response to a wide array of qualitative prompts. While these tools are expensive and difficult to build, multitudes of users can use them quickly and cheaply to perform some of the language-based tasks that only humans could do before.

This raises the possibility that many human jobs — particularly knowledge-intensive jobs that primarily involve working with text or code — could be replaced or significantly undercut by widespread adoption of this technology. But in reality, LLMs are much more complicated to use effectively in an organizational context than is typically acknowledged, and they have yet to demonstrate that they can satisfactorily perform all of the tasks that knowledge workers execute in any given job.

LLMs in Organizations

Most of the potential areas of use for LLMs center on manipulating existing information, much of it specific to an individual organization. These include summarizing content and producing reports (35% of use cases, according to one survey) and extracting information from documents, such as PDFs containing financial information, and creating tables from it (33% of use cases).1 Other popular and effective uses of LLMs include creating images with tools like DALL-E 2 and generating synthetic data for applications where real data is difficult to obtain, such as data to train voice recognition tools like Amazon’s Alexa.2

Most organizations using LLMs are still in the exploration phase. Customer interactions, knowledge management, and software engineering are three areas of extensive organizational experiments with generative AI. For example, Audi recruited a vendor to build and deploy a customized LLM-based chatbot that would answer employees’ questions about available documentation, customer details, and risk evaluations. The chatbot retrieves relevant information from a variety of proprietary databases in real time and is supposed to avoid answering questions if the available data is insufficient. The company used prompt engineering tools developed by Amazon Web Services for retrieval-augmented generation (RAG), a common customization procedure that uses organization-specific data without requiring changes to the underlying foundation model.3
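To make the RAG pattern concrete, here is a minimal sketch in Python of the two behaviors described above: retrieving organization-specific text at question time, and declining to answer when nothing relevant is found. It is an illustration under stated assumptions, not a description of Audi's or Amazon Web Services' actual system. The document store, function names, and the 0.3 similarity threshold are all hypothetical, and simple word-overlap similarity stands in for the embedding-based search a production system would more likely use. No model is called; the sketch stops at the prompt that would be sent to an unmodified foundation model.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical stand-in for an organization's proprietary documents.
DOCUMENTS = {
    "warranty-policy": "The standard vehicle warranty covers defects "
                       "for two years from the delivery date.",
    "risk-guideline": "Supplier risk evaluations are reviewed every "
                      "quarter by the compliance team.",
}


def tokenize(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, documents, min_score=0.3):
    """Return (score, doc_id) pairs at or above min_score, best first."""
    q = tokenize(question)
    scored = [(cosine(q, tokenize(text)), doc_id) for doc_id, text in documents.items()]
    return sorted((hit for hit in scored if hit[0] >= min_score), reverse=True)


def build_prompt(question, documents, min_score=0.3):
    """Build a grounded prompt, or return None to signal 'do not answer'."""
    hits = retrieve(question, documents, min_score)
    if not hits:
        # Insufficient data: abstain rather than let the model guess.
        return None
    context = "\n".join(f"[{doc_id}] {documents[doc_id]}" for _, doc_id in hits)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("How long does the vehicle warranty cover defects?", DOCUMENTS))
    print(build_prompt("Who won yesterday's football match?", DOCUMENTS))
```

The organization's data enters only through the prompt, which is why this kind of customization requires no changes to the foundation model itself. The abstention threshold is a tuning decision: set too low, the chatbot answers from irrelevant material; set too high, it refuses questions it could have handled.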



1. “Beyond the Buzz: A Look at Large Language Models in Production,” PDF (San Francisco: Predibase, 2023), https://go.predibase.com.

2. A. Rosenbaum, S. Soltan, and W. Hamza, “Using Large Language Models (LLMs) to Synthesize Training Data,” Amazon Science, Jan. 20, 2023, www.amazon.science.

3. “Storm Reply Launches RAG-Based AI Chatbot for Audi, Revolutionising Internal Documentation,” Business Wire, Dec. 21, 2023, www.businesswire.com.

4. “Beyond the Buzz.”

5. P. Vaithilingam, T. Zhang, and E.L. Glassman, “Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models,” in “CHI EA ’22: Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems,” ed. S. Barbosa, C. Lampe, C. Appert, et al. (New York: Association for Computing Machinery, April 2022), 1-7.

6. S. Noy and W. Zhang, “Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence,” Science 381, no. 6654 (July 13, 2023): 187-192.

7. L. Chen, M. Zaharia, and J. Zou, “How Is ChatGPT’s Behavior Changing Over Time?” arXiv, revised Oct. 21, 2023, https://arxiv.org.

8. S. Ouyang, J.M. Zhang, M. Harman, et al., “LLM Is Like a Box of Chocolates: The Non-Determinism of ChatGPT in Code Generation,” arXiv, submitted Aug. 5, 2023, https://arxiv.org.

9. P. Cappelli, “Stop Overengineering People Management,” Harvard Business Review 98, no. 5 (September-October 2020): 56-63.

10. E. Brynjolfsson, D. Li, and L.R. Raymond, “Generative AI at Work,” working paper 31161, National Bureau of Economic Research, Cambridge, Massachusetts, April 2023. We cannot tell the extent to which the improvement was due to the LLM per se because it was bundled together with an algorithm, which is a different tool.

11. F. Dell’Acqua, E. McFowland III, E. Mollick, et al., “Navigating the Jagged Technological Frontier: Field Experimental Evidence on the Effects of AI on Knowledge Worker Productivity and Quality,” working paper 24-013, Harvard Business School, Boston, September 2023.

12. C.B. Leon, “Occupational Winners and Losers: Who They Were During 1972-80,” Monthly Labor Review 105, no. 6 (June 1982): 18-28.

13. M. Cerullo, “Here’s How Many U.S. Workers ChatGPT Says It Could Replace,” CBS News, April 5, 2023, www.cbsnews.com; and L. Nedelkoska and G. Quintini, “Automation, Skills Use, and Training,” working paper 202, Organization for Economic Cooperation and Development, Paris, March 2018.

14. X. Hui, O. Reshef, and L. Zhou, “The Short-Term Effects of Generative Artificial Intelligence on Employment: Evidence From an Online Labor Market,” SSRN, Aug. 1, 2023, https://papers.ssrn.com.

15. J. Liu, X. Xu, Y. Li, et al., “‘Generate’ the Future of Work Through AI: Empirical Evidence From Online Labor Markets,” SSRN, Aug. 3, 2023, https://papers.ssrn.com; and O. Demirci, J. Hannane, and X. Zhu, “Who Is AI Replacing? The Impact of ChatGPT on Online Freelancing Platforms,” SSRN, Oct. 15, 2023, https://papers.ssrn.com.

16. For an example of an acceptable use policy for LLMs, see ACA Global’s template: “Sample Policy: Acceptable Use Policy for Employee Use of Large Language Models on Company Devices,” ACA Aponix, May 2023, https://web.acaglobal.com.
