Global Banking and Finance Review




    AI experts ready ‘Humanity’s Last Exam’ to stump powerful tech


    Published by Uma Rajagopal

    Posted on September 17, 2024


    By Jeffrey Dastin and Katie Paul

    (Reuters) – A team of technology experts issued a global call on Monday seeking the toughest questions to pose to artificial intelligence systems, which have increasingly handled popular benchmark tests like child’s play.

    Dubbed “Humanity’s Last Exam,” the project seeks to determine when expert-level AI has arrived. It aims to stay relevant even as capabilities advance in future years, according to the organizers, a non-profit called the Center for AI Safety (CAIS) and the startup Scale AI.

    The call comes days after the maker of ChatGPT previewed a new model, known as OpenAI o1, which “destroyed the most popular reasoning benchmarks,” said Dan Hendrycks, executive director of CAIS and an advisor to Elon Musk’s xAI startup.

    Hendrycks co-authored two 2021 papers that proposed tests of AI systems that are now widely used, one quizzing them on undergraduate-level knowledge of topics like U.S. history, the other probing models’ ability to reason through competition-level math. The undergraduate-style test has more downloads from the online AI hub Hugging Face than any other such dataset.

    At the time of those papers, AI was giving almost random answers to questions on the exams. “They’re now crushed,” Hendrycks told Reuters.

    As one example, the Claude models from the AI lab Anthropic went from scoring about 77% on the undergraduate-level test in 2023 to nearly 89% a year later, according to a prominent capabilities leaderboard.

    These common benchmarks have less meaning as a result.

    AI has appeared to score poorly on lesser-used tests involving plan formulation and visual pattern-recognition puzzles, according to Stanford University’s AI Index Report from April. OpenAI o1 scored around 21% on one version of the pattern-recognition ARC-AGI test, for instance, the ARC organizers said on Friday.

    Some AI researchers argue that results like this show planning and abstract reasoning to be better measures of intelligence, though Hendrycks said the visual aspect of ARC makes it less suited to assessing language models. “Humanity’s Last Exam” will require abstract reasoning, he said.

    Answers from common benchmarks may also have ended up in data used to train AI systems, industry observers have said. Hendrycks said some questions on “Humanity’s Last Exam” will remain private to make sure AI systems’ answers are not from memorization.

    The exam will include at least 1,000 crowd-sourced questions due November 1 that are hard for non-experts to answer. These will undergo peer review, with winning submissions offered co-authorship and up to $5,000 prizes sponsored by Scale AI.

    “We desperately need harder tests for expert-level models to measure the rapid progress of AI,” said Alexandr Wang, Scale’s CEO.

    One restriction: the organizers want no questions about weapons, which some say would be too dangerous for AI to study.

    (Reporting by Jeffrey Dastin in San Francisco and Katie Paul in New York; Editing by Christina Fincher)
