Weekly analysis of AI & Emerging Tech news from an analyst's point of view.
1️⃣OpenAI releases the mighty GPT-4.5
Details:
OpenAI officially released its latest model, GPT-4.5. GPT-4.5's pricing is 15 to 30 times higher than its predecessor, GPT-4o (see the pricing breakdown at the end of this section). One of the most notable changes in GPT-4.5 is its enhanced focus on emotional intelligence. Through expanded unsupervised learning, GPT-4.5 claims to significantly improve its intuitive understanding of the world and its breadth of knowledge, reducing the model's common "hallucination" issues. Unlike the o1 series, which relies on reasoning chains, GPT-4.5 primarily relies on massive data and compute power to optimize performance. According to initial benchmarks, GPT-4.5 still lags behind Anthropic's Claude 3.5, released in October last year, in coding and software development tasks. Based on early testing, developers may find GPT-4.5 particularly useful for applications that benefit from its higher emotional intelligence and creativity, such as writing help, communication, learning, coaching, and brainstorming. It also shows strong capabilities in agentic planning and execution, including multi-step coding workflows and complex task automation.
Other GPT-4.5 highlights:
- GPT-4.5 scales unsupervised learning by scaling up compute and data, which reduces hallucinations.
- GPT-4.5 scored better than GPT-4o on collaboration skills.
- The model doesn't think before it responds.
- ChatGPT Pro users can select GPT-4.5 today, with Plus and Team users getting it next week, followed by Enterprise and education users.
- GPT-4.5 is compute-intensive and more expensive than GPT-4o. It is not viewed as a replacement for GPT-4o.
- Perhaps the biggest takeaway: GPT-4.5 can fake emotional intelligence pretty well--potentially better than a few humans we know.
Analysis:
This is not a reasoning model but an unsupervised learning model. This means it doesn't have a chain of thought like the reasoning models and can therefore respond faster. Instead of combining the reasoning and responding models, OpenAI chose to keep them separate, as execution and training costs can be very high for reasoning models such as OpenAI o1 and o3-mini. With models running out of the world's data, and the offerings of the large LLM providers becoming harder to tell apart, all of them are looking for ways to differentiate themselves. OpenAI has taken the direction of making their chatbot more human. They claim its EQ is better than other models', meaning the model recognizes tone and intent, responding with empathy rather than just solutions--just like a real human would. With that, interactions can feel more natural and human-like.
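For developers who want to try it, below is a minimal sketch of calling the model through OpenAI's Python SDK. The model identifier `gpt-4.5-preview` was the preview name at launch; treat it as an assumption and check OpenAI's current model list.

```python
# Minimal sketch: calling GPT-4.5 through the OpenAI Python SDK.
# "gpt-4.5-preview" was the launch-time preview identifier; verify it
# against OpenAI's model list before relying on it.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4.5-preview",
    messages=[
        {"role": "user",
         "content": "I bombed my presentation today. Any advice?"},
    ],
)
print(response.choices[0].message.content)
```

A prompt like this plays to the claimed EQ strengths: the interesting part is less the factual answer than the tone of the response.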
OpenAI is trying to differentiate itself from the competition in many ways. As a first step, they voluntarily released a detailed system card. A major issue OpenAI has faced with enterprises trying to adopt its models is governance, security, safety, and privacy, which competing LLM providers have also seized on to beat up on them. By addressing this, OpenAI hopes enterprises will feel comfortable adopting its LLMs over others in production environments. OpenAI claims to have conducted rigorous safety testing, preparedness evaluations, and governance reviews. They also claim additional safety testing to better understand the incremental risks associated with deep research's ability to browse the web, and they have added new mitigations. Key areas of new work included strengthening privacy protections around personal information published online and training the model to resist malicious instructions it may come across while searching the Internet.
These are clear indications that they are trying to take this issue head-on rather than avoiding it, as they have done for a long time. By being transparent, they hope to make more enterprises comfortable implementing their models in production.
The OpenAI system card can be seen here: https://cdn.openai.com/deep-research-system-card.pdf
However, at the current time, these models are very expensive. Access is also restricted to Pro users for now because of a GPU shortage, per OpenAI's claim.
GPT-4.5 pricing:
- Input: $75.00 / 1M tokens
- Cached input: $37.50 / 1M tokens
- Output: $150.00 / 1M tokens
GPT-4o pricing:
- Input: $2.50 / 1M tokens
- Cached input: $1.25 / 1M tokens
- Output: $10.00 / 1M tokens
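To put the gap in concrete terms, here is a quick back-of-the-envelope comparison using the published prices; the workload token counts are made up for illustration.

```python
# Back-of-the-envelope cost comparison using the published prices above.
# The workload token counts are illustrative assumptions.
PRICES = {  # $ per 1M tokens: (input, output)
    "gpt-4.5": (75.00, 150.00),
    "gpt-4o": (2.50, 10.00),
}

input_tokens, output_tokens = 2_000_000, 500_000  # hypothetical workload

for model, (p_in, p_out) in PRICES.items():
    cost = input_tokens / 1e6 * p_in + output_tokens / 1e6 * p_out
    print(f"{model}: ${cost:,.2f}")
# gpt-4.5: $225.00 vs gpt-4o: $10.00 -- roughly 22x for this mix
```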
2️⃣Tencent releases Hunyuan Turbo S, the next-generation fast-thinking model
Details:
Tencent officially released Hunyuan Turbo S, a new-generation fast-thinking model, marking a significant breakthrough in response speed and performance optimization of large language models. Unlike traditional slow-thinking models like OpenAI o1 and DeepSeek R1, Hunyuan Turbo S achieves "instant replies," significantly improving answer-output speed, doubling the word output rate, and reducing first-word latency by 44%. Combined with a slow-thinking mode for rational analysis, it gives the large model more intelligent and efficient problem-solving capabilities. In several widely used public benchmark tests, Hunyuan Turbo S demonstrates performance comparable to leading models such as DeepSeek V3, GPT-4o, and Claude. Hunyuan Turbo S adopts a Hybrid-Mamba-Transformer fusion mode, effectively reducing the computational complexity and KV-Cache memory footprint of the traditional Transformer architecture and significantly lowering training and inference costs. The model has been fully launched on Tencent Yuanbao and will soon be available via API access.
Through the fusion of long and short reasoning chains, the model significantly improves its scientific reasoning capabilities, resulting in a substantial overall performance improvement. This hybrid architecture overcomes the high training and inference costs that traditional large models face on long texts, leveraging the Mamba architecture's advantages in handling long sequences while retaining the Transformer's ability to capture complex context. Tencent says this marks the industry's first successful application of the Mamba architecture to ultra-large MoE models without loss of performance.
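To make the hybrid idea concrete, here is a toy sketch (my own illustration, not Tencent's code): linear-time state-space blocks handle the long sequence cheaply, with an occasional attention block to capture complex context. Layer sizes and the interleaving ratio are arbitrary assumptions.

```python
# Toy hybrid Mamba/Transformer stack (illustration only, not Tencent's
# architecture): O(n) state-space blocks interleaved with attention.
import torch
import torch.nn as nn

class ToySSMBlock(nn.Module):
    """Simplified diagonal state-space layer (a stand-in for Mamba)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.in_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        self.log_decay = nn.Parameter(torch.zeros(d_model))  # per-channel state decay

    def forward(self, x):  # x: (batch, seq, d_model)
        u = self.in_proj(x)
        a = torch.sigmoid(self.log_decay)
        h = torch.zeros_like(u[:, 0])
        outs = []
        for t in range(u.size(1)):        # linear-time recurrence over the sequence
            h = a * h + (1 - a) * u[:, t]
            outs.append(h)
        return x + self.out_proj(torch.stack(outs, dim=1))

class AttnBlock(nn.Module):
    """Standard self-attention block, used sparingly for global context."""
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        y, _ = self.attn(x, x, x, need_weights=False)
        return self.norm(x + y)

class HybridStack(nn.Module):
    """Every third block is attention; the rest are cheap SSM blocks."""
    def __init__(self, d_model=64, n_layers=6, attn_every=3):
        super().__init__()
        self.layers = nn.ModuleList([
            AttnBlock(d_model) if (i + 1) % attn_every == 0 else ToySSMBlock(d_model)
            for i in range(n_layers)
        ])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

x = torch.randn(2, 128, 64)        # (batch, seq_len, d_model)
print(HybridStack()(x).shape)      # torch.Size([2, 128, 64])
```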
Analysis:
Tencent, along with almost all other major AI model providers, was caught off guard by DeepSeek. Since then, many of them have released models that directly compete with DeepSeek and claim to perform better. The Hunyuan Turbo S model claims it can think faster than the slower DeepSeek. Its shorter chain of thought, which can be good or bad, can affect the accuracy of the model's output; that applies only to their quick-answer chatbot mode. They also have a slow-thinking chain that provides reasoning capabilities for scientific, mathematical, and rational answers. They use an MoE (Mixture of Experts) model, which is an entirely different way of architecting, training, and running inference on LLMs. MoE splits the AI model by distinct areas of expertise that work together to solve problems, which means the models can be trained faster, loaded into memory faster, are more memory- and energy-efficient, and can respond to questions faster than monolithic big models such as OpenAI's. Their unique differentiation, they claim, is having applied the Mamba architecture to an ultra-large Mixture-of-Experts model without damaging performance.
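The routing idea behind MoE is simple enough to show in a few lines. Below is a toy top-1 MoE layer (my own illustration; the expert count and gating are not Tencent's configuration): a router picks one small expert per token, so only a fraction of the parameters is active for any given token.

```python
# Toy top-1 Mixture-of-Experts layer (illustration only).
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.router(x).softmax(dim=-1)  # routing probabilities
        top = scores.argmax(dim=-1)              # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top == i
            if mask.any():                       # only the chosen expert runs
                out[mask] = expert(x[mask]) * scores[mask, i:i + 1]
        return out

tokens = torch.randn(10, 64)
print(ToyMoE()(tokens).shape)                    # torch.Size([10, 64])
```

Because only one expert's weights are exercised per token, compute per token stays roughly constant even as the total parameter count grows with the number of experts.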
Their cost is much cheaper than comparable models, at 0.8 yuan per million input tokens and 2 yuan per million output tokens. It will be interesting to see whether cost-conscious companies start using some of these cheaper hosted models from China or stick with US- and Europe-based models.
3️⃣Amazon revamps Alexa with Alexa+
Details:
Alexa+ can perform multiple tasks, including conversing, delivering news summaries, managing calendars, creating and sending emails and texts, summarizing email, ordering groceries, and booking appointments, moving the chatbot from informational purposes only to performing action items. Amazon is integrating many capabilities into Alexa+, from reasoning to knowledge agents. Alexa+ is built on the Bedrock platform and harnesses multiple large language models (LLMs), including Amazon's own Nova and Anthropic's Claude. It also uses model routing, so it is not fixed on one backend LLM like most single-LLM-powered solutions; it can select the better model for the task based on need (currently choosing between the two). Alexa+ is set to roll out in the United States over the next few weeks, starting with the Echo Show 8, 10, 15, and the new Echo Show 21. Amazon plans to expand compatibility to other Alexa-enabled devices and introduce a dedicated Alexa+ mobile app in the coming months. The service is priced at $19.99 per month for non-Prime members; Amazon Prime subscribers get complimentary access. Alexa+ is also capable of remembering, so it can recall personal details such as dietary preferences, favorite activities, and frequently contacted individuals.
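The model-routing idea is easy to picture in code. Below is a hedged sketch using the Bedrock runtime's Converse API; the routing heuristic and the model IDs are my own illustrative assumptions, not Amazon's actual Alexa+ logic.

```python
# Hedged sketch of model routing on Bedrock (not Amazon's Alexa+ code).
# The heuristic and model IDs below are illustrative assumptions.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

def route_model(prompt: str) -> str:
    # Toy heuristic: long or reasoning-heavy prompts go to Claude,
    # quick conversational turns go to Nova.
    if len(prompt) > 200 or "why" in prompt.lower():
        return "anthropic.claude-3-5-sonnet-20241022-v2:0"
    return "amazon.nova-lite-v1:0"

def ask(prompt: str) -> str:
    resp = client.converse(
        modelId=route_model(prompt),
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return resp["output"]["message"]["content"][0]["text"]

print(ask("Add milk to my shopping list."))  # routed to the fast model
```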
Analysis:
Even before the current chatbot/generative AI wave became popular, Alexa had already found its way into people's homes. With more than 600 million enabled devices around the world, Alexa was the most adopted conversational AI technology in the B2C world, outdoing Google and Siri. The major differences that Alexa+ brings are:
- Alexa+ brings the power of hyper-personalization, the biggest missing piece in Alexa, which until now has been just a basic conversational interface. Alexa+ will remember everything about you and personalize things based on your needs.
- Given that Amazon Prime has 300 million+ members, making it available to all of them for free will immediately drive adoption to 300 million-plus active users, competing with OpenAI, currently the most widely adopted LLM provider.
- One can also add to Alexa+ knowledge by sharing documents, emails, photos, and messages.
- Most importantly, unlike other LLMs, Alexa comes fully integrated with the Amazon ecosystem, which means that from day one you can integrate with doorbells, video, home security systems, Prime Music, Prime Video, etc., rather than figuring out how to wire all of this together in a meaningful way.
So far, Alexa has not been widely adopted in the enterprise world. But the new LLM integration, and the fact that it runs on Amazon Bedrock, could make it easier for enterprises to adopt. Only time will tell if they will be successful.
4️⃣IBM releases Granite 3.2. Models that can do reasoning, vision, and forecasting
Details:
- The new Granite 3.2 8B Instruct and Granite 3.2 2B Instruct offer experimental chain-of-thought reasoning capabilities that significantly improve their ability to follow complex instructions with no sacrifice to general performance. The reasoning process can be toggled on and off, allowing for efficient use of computing resources (see the sketch after this list).
- The latest additions to the Granite Time series model family, Granite-Timeseries-TTM-R2.1, expand TTM’s forecasting capabilities to include daily and weekly predictions in addition to the minutely and hourly forecasting tasks already supported by prior TTM models.
- New model sizes for Granite Guardian 3.2, including a variant derived from IBM's 3B-A800M mixture of experts (MoE) language model. The new models offer increased efficiency with minimal loss in performance.
- The Granite Embedding model series now includes the ability to learn sparse embeddings. Granite-Embedding-30M-Sparse balances efficiency and scalability across diverse resource and latency budgets.
- Like their predecessors, all new IBM Granite models are released as open source under the permissive Apache 2.0 license.
- Granite 3.2 models are now available on IBM watsonx.ai, Hugging Face, Ollama, LMStudio, and Replicate.
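Here is a hedged sketch of the reasoning toggle mentioned above, using the Hugging Face chat template. The `thinking` flag follows IBM's Granite 3.2 model-card examples; treat the exact kwarg and model ID as assumptions if your versions differ.

```python
# Hedged sketch: toggling Granite 3.2's experimental reasoning mode.
# The `thinking` kwarg follows IBM's Granite 3.2 model-card examples;
# verify it against the model card for your exact version.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3.2-2b-instruct"  # assumed Hugging Face ID
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Is 1007 prime? Explain briefly."}]

# thinking=True enables chain-of-thought; thinking=False answers directly,
# which is cheaper when the extra reasoning tokens are not needed.
prompt = tok.apply_chat_template(
    messages, thinking=True, add_generation_prompt=True, tokenize=False)

inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:],
                 skip_special_tokens=True))
```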
Analysis:
IBM ups the ante on open, efficient, and trusted enterprise AI. While DeepSeek stole the latest thunder in open-source models, IBM has been quietly releasing enterprise-grade open-source models and making them available on its platform. IBM has smartly decided to make the user pay for platform capabilities rather than the model itself. This release expands the company's emphasis on smaller, efficient large language models that don't sacrifice performance or enterprise-grade capabilities. By building models in-house, IBM has complete end-to-end visibility into training data, architecture choices, and safety parameters. This, in turn, allows them to provide enterprise-grade models that can address compliance, data provenance, and scaling challenges more effectively.
IBM has been trying to provide trustworthy, cost-efficient AI solutions. One of their focus areas has been transparent models that may be smaller than the massive models offered by others but are more efficient and practical to use in an enterprise setting.
In other news,
- Study finds 92% of UK university students rely on AI for assignments. This poses new challenges for assessments.
A recent study, jointly published by the Higher Education Policy Institute and digital textbook provider Kortext, surveyed 1,000 domestic and international students about generative AI (genAI) usage in their studies and found that a staggering 92% of UK university students are using genAI, up from 53% last year. The findings show a dramatic surge over the past 12 months, with virtually all undergraduates actively employing these tools. Given the new norm, universities are urged to "stress-test" all assessments to ensure they can't be easily completed by AI. The survey also revealed that students from affluent backgrounds and those in STEM subjects were more likely to use AI tools.
The unfortunate thing is that even the students who don’t use AI can get penalized if the AI plagiarism tool flags the paper as being AI-generated. I wrote a detailed article on this in Forbes last year: https://www.forbes.com/sites/joemckendrick/2024/05/20/being-mindful-abou...
- 19-year-old female Go player banned for 8 years for cheating with AI.
The Chinese Weiqi Association recently announced severe penalties against professional Go player Qin Siyue for cheating with AI during the National Go Championship (Individual) Women's Group competition. Qin Siyue apparently used an AI program on a concealed mobile phone to cheat during the match on December 15, 2024.
We are getting into unknown territory over where AI can and cannot be used to compete. Looks like it is OK to use AI for competitive exams, schoolwork, and college applications, but apparently it is not allowed in professional sports - at least not yet.
I wrote a piece on Forbes on this very topic last year: https://www.forbes.com/sites/joemckendrick/2024/05/15/ai-for-career-adva...
- Anthropic releases Claude AI GitHub integration to boost developer code efficiency.
Claude's recently released GitHub integration is now available to all users, including free users. It helps developers with coding, testing, and debugging, enabling more efficient project development. The feature was previously available only to enterprise users. Developers can synchronize their code repositories to Claude for enhanced code analysis and debugging support.
- OpenAI ex-CTO Mira Murati launches a new AI company with a $9 billion valuation.
Mira Murati's new company, Thinking Machines Lab, is attracting significant investor attention. According to Business Insider, the less-than-a-year-old startup is raising $1 billion in funding, at a projected valuation of a staggering $9 billion. Murati spent six and a half years at OpenAI, overseeing the development of several AI projects, including ChatGPT. Following the boardroom turmoil in November 2023, she briefly served as interim CEO before returning to her role as CTO. In her blog post, Murati suggested that the new company will be a laboratory for AI research and products dedicated to advancing the accessibility and development of artificial intelligence. They have also recruited top engineers and AI researchers from OpenAI, Meta, and Anthropic, including OpenAI co-founder John Schulman. Other ex-OpenAI executives have also started their own companies after the fiasco: chief scientist Ilya Sutskever founded Safe Superintelligence after leaving in May 2024, and Dario Amodei and Daniela Amodei founded Anthropic back in 2021. One thing is for sure in the AI world - new model releases and high valuations are happening at a dizzying speed.
- ByteDance releases Trae - an AI coding product.
Trae AI IDE, the first AI-powered Chinese IDE, aims to deeply understand Chinese development scenarios and provide intelligent collaborative support for developers in their local language. Most other AI coding tools are English-only, which makes them harder for non-native English speakers to adopt. The tool, aimed at developer efficiency, includes AI Q&A, code auto-completion, and agent-based AI programming.
- Meta plans a standalone AI chat app to compete with ChatGPT.
Meta is planning to launch a standalone AI assistant app called Meta AI to compete with OpenAI's ChatGPT and Google's Gemini. Until now, Meta has been about offering free models and an integrated platform experience (Facebook/WhatsApp); now they are exploring options to experiment with and monetize their AI and models. While this might gain some traction, their restrictive licensing model might be a showstopper for large-scale enterprise adoption. Meta is also holding its first AI-focused developer conference, LlamaCon, in April, where it plans to showcase its AI technologies, roadmap, and vision. While Meta claims 700 million+ monthly active users, it will be interesting to see how many convert to Meta's AI, as most of them have already engaged with a competing technology such as OpenAI.
- OpenAI plans to integrate Sora into ChatGPT to create a single unified platform.
OpenAI says its AI video generation tool, Sora, will be integrated directly into the ChatGPT platform. The current standalone Sora lets users not only create high-quality video clips of up to 20 seconds but also edit and stitch videos, features that might be simplified or eliminated in the integrated version.
- DeepSeek unveils the cost efficiency behind its inference.
DeepSeek published "DeepSeek-V3/R1 Inference System Overview," a detailed disclosure of its model inference system's optimization details and cost-profit margin information. The article revealed a surprising cost/profit margin, the kind of figure most other LLM providers guard as a secret: "Assuming a GPU rental cost of $2/hour, the total cost is $87,072/day. If all tokens are priced according to DeepSeek R1 pricing, the theoretical total daily revenue is $562,027, resulting in a cost-profit margin of 545%." DeepSeek claims to achieve higher throughput and lower latency by using large-scale cross-node expert parallelism (EP) technology, and discusses how EP is leveraged to increase batch size, hide transfer latency, and achieve load balancing.
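The disclosed margin is easy to reproduce from the two daily figures; a quick sanity check:

```python
# Sanity check of DeepSeek's published back-of-the-envelope figures.
daily_cost = 87_072       # $/day, assuming $2/GPU-hour (as disclosed)
daily_revenue = 562_027   # theoretical $/day at R1 pricing (as disclosed)

margin = (daily_revenue - daily_cost) / daily_cost
print(f"cost-profit margin: {margin:.0%}")  # -> 545%
```

Note this is a theoretical ceiling: it assumes every token is billed at R1 rates, which DeepSeek itself says actual revenue does not reach.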
- A US lawyer was fined $15,000 for using an AI-generated fictitious case.
Magistrate Judge Mark D. Dinsmore of the Southern District of Indiana recommended a $15,000 fine for a lawyer who cited nonexistent court cases in legal filings. The lawyer in question is Rafael Ramirez of Rio Hondo, Texas. On October 29, 2024, he cited three fabricated cases in his legal filings. Judge Dinsmore's recent report stated that Ramirez failed to verify the validity and accuracy of the cited cases in three legal documents, thus recommending a $5,000 fine for each document. During the proceedings, Ramirez admitted to using AI tools when drafting the documents and stated that he was unaware these tools could generate false cases and citations. 😂
Judge Dinsmore pointed out that Ramirez's unfamiliarity with the AI tool highlighted the severity of the issue. This is going to be a major issue. People are using AI without understanding its power and issues.
- Microsoft releases Phi-4 multimodal and mini: models for enhanced speech, vision, and text processing.
Microsoft expanded its Phi-4 family with two new models: Phi-4-multimodal and Phi-4-mini. Phi-4-multimodal, Microsoft's first unified-architecture model integrating voice, vision, and text processing, is a 5.6 billion parameter model. Initial benchmark tests seem to indicate that it outperforms the competition, notably Google's Gemini 2.0 series, and it currently holds the top spot on the Hugging Face OpenASR leaderboard. The Phi-4-mini model, a 3.8 billion parameter model, focuses on text processing tasks.
Both models are aimed at low-cost and low-latency applications. They are available on Azure AI Foundry, Hugging Face, and the NVIDIA API catalog for developer use.
- SenseTime launches LazyLLM, which can build complex AI apps with just 10 lines of code!
At the 2025 Global Developer Pioneer Conference, SenseTime announced the launch of its open-source low-code platform, LazyLLM. Using this platform, developers can now build complex, multi-agent, large-model applications with just 10 lines of code. With LazyLLM, developers can quickly build complete RAG multi-path retrieval applications, supporting the construction and management of enterprise-level knowledge bases and enabling efficient data processing and functional integration.
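As a flavor of what "multi-path retrieval RAG in a few lines" means, here is a framework-agnostic toy sketch; this is my own illustration of the concept, not LazyLLM's actual API (see its documentation for the real interface).

```python
# Toy multi-path retrieval RAG (concept illustration, NOT LazyLLM's API):
# the same query runs through two retrieval paths, and the merged hits
# are packed into the prompt for whichever LLM you use.
from collections import Counter

DOCS = [
    "LazyLLM is a low-code platform for multi-agent LLM applications.",
    "RAG retrieves relevant documents and feeds them to the model.",
    "Mixture-of-experts models route tokens to specialized experts.",
]

def keyword_path(query, docs, k=2):
    # Path 1: rank by keyword overlap with the query.
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))[:k]

def vector_path(query, docs, k=2):
    # Path 2: rank by a toy bag-of-words similarity (stand-in for embeddings).
    def sim(a, b):
        ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
        return sum(ca[w] * cb[w] for w in ca)
    return sorted(docs, key=lambda d: -sim(query, d))[:k]

query = "How does RAG use retrieved documents?"
hits = keyword_path(query, DOCS) + vector_path(query, DOCS)
context = list(dict.fromkeys(hits))              # merge paths, dedupe
prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"
print(prompt)                                    # would be sent to an LLM
```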
- Google's ultra-affordable AI model Gemini 2.0 Flash-Lite launched.
Google recently released its most economical model, Gemini 2.0 Flash-Lite, now generally available. Gemini 2.0 Flash-Lite costs $0.075 per million input tokens and $0.30 per million output tokens. This competitive pricing undercuts options like OpenAI's GPT-4o mini (input $0.15/million, output $0.60/million). It boasts a context window of 1 million tokens, can handle massive datasets, and outperforms Gemini 1.5 Flash in most benchmarks at the same speed and lower cost, making it especially suitable for high-frequency tasks. The focus on text generation makes it a good fit for scenarios requiring fast, low-cost solutions. Google says it can generate single-line captions for approximately 40,000 photos for under $1.
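The 40,000-caption claim works out as simple arithmetic; here is a rough check with assumed per-caption token counts.

```python
# Rough check of the "40,000 captions for under $1" claim.
# Per-caption token counts are my assumptions for illustration.
input_price = 0.075 / 1_000_000    # $ per input token
output_price = 0.30 / 1_000_000    # $ per output token
tokens_in = 60                     # assumed prompt/input overhead per photo
tokens_out = 20                    # assumed single-line caption length

cost = 40_000 * (tokens_in * input_price + tokens_out * output_price)
print(f"${cost:.2f} for 40,000 captions")  # ~$0.42 under these assumptions
```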
- Anthropic's next funding round - $3.5 billion at a valuation of $61.5 billion.
According to the Wall Street Journal, AI startup Anthropic is about to close a new funding round that has been increased from its initial target of $2 billion to $3.5 billion. This round values the company at $61.5 billion. Investors in this round include Lightspeed Venture Partners, General Catalyst, Bessemer Venture Partners, and Abu Dhabi-based investment firm MGX, among others. With this, their total funding to date would be $18 billion. With a claimed revenue of roughly $1 billion, the company is still operating at a loss.
- Intel unveils Xeon 6 processors: Double the AI processing power.
Intel recently launched its next-generation Xeon 6 processors, achieving up to double the performance in artificial intelligence (AI) processing. The built-in Intel vRAN Boost technology can increase Radio Access Network (RAN) workload capacity by up to 2.4 times. According to Intel, the Xeon 6700/6500 series processors excel in modern data centers, offering a 1.4x performance improvement over the previous generation and broad applicability across enterprise workloads. Intel claims these processors can work in conjunction with GPUs to deliver efficient AI inference performance. Xeon 6 is the industry's first server SoC with built-in media acceleration, offering up to a 14x performance improvement over previous models.
Intel has been a laggard in the AI chip space. With this release, Intel aims to capture a big share of the rapidly growing AI market by trying to shift AI inference loads to CPU-based systems.
Intel also introduced two new Ethernet controllers and network adapter products to meet the growing demands of edge AI applications. Overall, Intel is trying to position itself as a leader in the AI inference space as it lost the war to NVIDIA in the AI training market.
- Meta plans a $200 billion investment in massive AI Data Centers.
According to The Information, Meta is considering several potential locations for massive data center campuses in Louisiana, Wyoming, and Texas, with senior executives planning site visits this month. CEO Mark Zuckerberg previously revealed plans to invest up to $65 billion in expanding AI infrastructure in 2025.
More worthy news, no analysis 🙂
♦ Alibaba announces $52 billion in cloud and AI hardware infrastructure investment over the next 3 years.
♦ Man discovers his online girlfriend of 2 months is AI and loses $200,000. 😂
#GenerativeAI #GenAI #AI #LLM #Grok #OpenAI #Meta #IBM #Amazon #AWS