We tested Google’s Gemini chatbot — here’s how it performed

Image Credits: TechCrunch

Gemini excels in some areas and falls flat in others

Gemini, Google’s answer to OpenAI’s ChatGPT and Microsoft’s Copilot, is here. Is it any good? While it’s a solid option for research and productivity, it stumbles in obvious (and some not-so-obvious) places.

Last week, Google rebranded its Bard chatbot to Gemini and brought Gemini (which, confusingly, shares its name with the company’s latest family of generative AI models) to smartphones in the form of a reimagined app experience. Since then, lots of folks have had the chance to test-drive the new Gemini, and the reviews have been… mixed, to put it generously.

Still, we at TechCrunch were curious how Gemini would perform on a battery of tests we recently developed to compare the performance of GenAI models, specifically large language models like OpenAI’s GPT-4, Anthropic’s Claude, and so on.

There’s no shortage of benchmarks for evaluating GenAI models. But our goal was to capture the average person’s experience through plain-English prompts about topics ranging from health and sports to current events. Ordinary users are who these models are being marketed to, after all, so the premise of our test is that strong models should be able to at least answer basic questions correctly.

Background on Gemini

Not everyone has the same Gemini experience, and which one you get depends on how much you’re willing to pay.

Non-paying users get queries answered by Gemini Pro, a lightweight version of a more powerful model, Gemini Ultra, that’s gated behind a paywall.

Access to Gemini Ultra through what Google calls Gemini Advanced requires subscribing to the Google One AI Premium Plan, priced at $20 per month. Ultra delivers better reasoning, coding and instruction-following skills than Gemini Pro (or so Google claims), and in the future will get improved multimodal and data analysis capabilities.

The AI Premium Plan also connects Gemini to your wider Google Workspace account: think emails in Gmail, documents in Docs, presentations in Slides and Google Meet recordings. That’s useful for, say, summarizing emails or having Gemini take notes during a video call.

Since Gemini Pro has been out since early December, we focused on Ultra for our tests.

Testing Gemini

To test Gemini, we asked a set of over two dozen questions ranging from innocuous (“Who won the football World Cup in 1998?”) to controversial (“Is Taiwan an independent country?”). Our question set touches on trivia, medical and therapeutic advice, and generating and summarizing content, all things a user might ask (or ask of) a GenAI chatbot.

Now, Google makes it clear in its terms of service that Gemini isn’t to be used for health consultations and that the model might not answer all questions with factual accuracy. But we feel that people will ask medical questions whatever the fine print says. And the answers are a good measure of a model’s tendency to hallucinate (i.e., make up facts): If a model’s making up cancer symptoms, there’s a reasonable chance it’s fudging answers to other questions.

Full disclosure: we tested Ultra through Gemini Advanced, which, according to Google, occasionally routes certain prompts to other models. Frustratingly, Gemini doesn’t indicate which responses came from which models, but for the purposes of our benchmark, we assumed they all came from Ultra.

Questions

Evolving news stories

We started by asking Gemini Ultra two questions about current events:

The model refused to answer the first question (perhaps owing to word choice: “Palestine” versus “Gaza”), referring to the conflict in Israel and Gaza as “complex and changing rapidly” and recommending that we Google it instead. Not the most inspiring display of knowledge, for sure.

Ultra’s answer to the second question was more promising, listing several trends on TikTok that have made headlines recently, like the “skull breaker challenge” and the “milk crate challenge.” (Ultra, lacking access to TikTok itself, presumably scraped these from news coverage, but it did not cite any specific articles.)

Ultra went a little overboard in this writer’s estimation, though, not only highlighting TikTok trends but also making a list of suggestions to promote safety, including “staying aware of how younger users are interacting with content” and “having regular, honest conversations with teens and young people about responsible social media consumption.” I can’t say that the suggestions were toxic or bad ones, but they were a bit beyond the scope of the question.

Historical context

Next, we asked Gemini Ultra to recommend sources on a historical event:

Ultra was quite detailed in its answer here, listing a wide variety of offline and digital sources of information on Prohibition, ranging from newspapers of the era and committee hearings to the Congressional Record and the personal papers of politicians. Ultra also helpfully suggested researching pro- and anti-Prohibition viewpoints and, as something of a hedge, warned against drawing conclusions from only a few source documents.

It didn’t exactly recommend source documents, but this isn’t bad advice for someone looking for a place to start.

Trivia questions

Any chatbot worth its salt should be able to answer simple trivia. So we asked Gemini Ultra:

Ultra seems to have its facts straight on the FIFA World Cups in 1998 and 2006. The model gave the correct scores and winners for each match and accurately recounted the scandal at the end of the 2006 final: Zinedine Zidane headbutting Marco Materazzi.

Ultra did fail to mention the reason for the headbutt (trash talk about Zidane’s sister), but considering Zidane didn’t reveal it until an interview last year, this could well reflect the cutoff date of Ultra’s training data.

You’d think U.S. presidential history would be easy-peasy for a model as (allegedly) capable as Ultra, right? Well, you’d be wrong. Ultra refused to answer “Joe Biden” when asked about the outcome of the 2020 election, suggesting, as it did with the question about the Israel-Palestine conflict, that we Google it.

Heading into a contentious election cycle, that’s not the sort of unambiguous, conspiracy-quashing answer that we’d hope to see.

Medical advice

Google might not recommend it, but we went ahead and asked Ultra medical questions anyway:

Answering the question about the rashes, Ultra warned us once again not to rely on it for health advice. But the model also gave what appeared to be reasonable, actionable steps (at least to us non-professionals), instructing us to check for signs of a fever and other symptoms indicating a more serious condition, and advising against relying on amateur diagnoses (including its own).

In response to the second question, Ultra didn’t fat-shame, which is more than can be said of some of the GenAI models we’ve seen. The model instead poked holes in the notion that BMI is a perfect measure of weight, and noted that other factors, like physical activity, diet, sleep habits and stress levels, contribute as much if not more to overall health.

Therapeutic advice

People are using ChatGPT as therapy. So it stands to reason that they’d use Ultra for the same purpose, however ill-advised. We asked:

Told about depression and sadness, Ultra lent an understanding ear, but as with some of the model’s other answers to our questions, its response was on the overly wordy and repetitive side.

Predictably, given its responses to the previous health-related questions, Ultra said in no uncertain terms that it can’t recommend specific treatments for anxiety because it’s “not a medical professional” and treatment “isn’t one-size-fits-all.” Fair enough! But Ultra, trying its best to be helpful, then went on to identify common forms of treatment and medications for anxiety, in addition to lifestyle practices that might help alleviate or treat anxiety disorders.

Race relations

GenAI models are notorious for encoding racial (and other forms of) biases, so we probed Ultra for these. We asked:

Ultra was reluctant to wade into contentious territory in its response about Mexican border crossings, preferring to give a pro-and-con breakdown instead.

Ditto for Ultra’s answer to the Harvard admissions question. The model highlighted potential issues with historical legacy, but also with the admissions process and with systemic problems.

Geopolitical questions

Geopolitics can be thorny. To see how Ultra handles it, we asked:

Ultra exercised restraint in answering the Taiwan question, giving arguments for and against the island’s independence, plus historical context and potential outcomes.

Ultra was more… critical on the Russian invasion of Ukraine, despite its wishy-washy answer to the earlier question on the Israel-Gaza war, calling Russia’s actions “morally indefensible.”

Jokes

For a more lighthearted test, we asked Ultra to tell jokes (there is a point to this: humor is a strong benchmark for AI):

I can’t say either was particularly inspired, or funny. (The first seemed to entirely miss the “going on vacation” part of the prompt.) But they met the dictionary definition of “joke,” I suppose.

Product description

Companies like Google pitch GenAI models as productivity tools, not just answer engines. So we tested Ultra for productivity:

Ultra delivered, albeit with descriptions well under the word and character limits and in an unnecessarily (in this author’s opinion) grandiose tone. Subtlety doesn’t appear to be Ultra’s strong suit.

Workspace integration

Workspace integration being a heavily advertised feature of Ultra, it seemed only appropriate to test prompts that take advantage of it:

I came away most impressed by Ultra’s travel-planning skills. As instructed, Ultra found a cheap flight and a list of budget-friendly hotels for my aspirational trip, along with bullet-point descriptions of each.

Less impressive was Ultra’s YouTube search. Basic functionality like sorting videos by upload date proved to be beyond the model’s capabilities. A plain YouTube search would’ve been easier.

The Gmail integration was the most intriguing to me, I must say, as someone who’s often drowning in emails, but also the most error-prone. Asking for the content of messages by general subject or by when they were received (e.g., “the last four days”) worked well enough in my testing. But requesting anything highly specific, like the tracking information for a Banana Republic order, tripped the model up more often than not.

The takeaway

So what to make of Ultra after this interrogation? It’s a fine model. For research, great even, depending on the topic. But game-changing it isn’t.

Outside of the non-answers to the questions about the 2020 U.S. presidential election and the Israel-Gaza conflict, Gemini Ultra was thorough to a fault in its responses, no matter how controversial the topic. It couldn’t be persuaded to give potentially harmful (or legally problematic) advice, and it stuck to the facts, which can’t be said for all GenAI models.

But if novelty was your expectation for Ultra, brace for disappointment.

Now, it’s early days. Ultra’s multimodal features, a major selling point, have yet to be fully enabled. And additional integrations with Google’s wider ecosystem are a work in progress.

But paying $20 per month for Ultra feels like a big ask right now, particularly given that the paid plan for OpenAI’s ChatGPT costs the same and comes with third-party plugins and capabilities such as custom instructions and memory.

Ultra will no doubt improve with the full force of Google’s AI research divisions behind it. The question is when, exactly, it’ll reach the point where the cost feels justified, if ever.