Commit 2d8d97a

chore: Update crawled data
1 parent 8616a17 commit 2d8d97a

44 files changed, +7187 -356 lines changed

crawled_output/.txt

Lines changed: 35 additions & 26 deletions
@@ -1,4 +1,4 @@
-Google is a 'bad actor' says People CEO, accusing the company of stealing content | TechCrunch
+Why the Oracle-OpenAI deal caught Wall Street by surprise | TechCrunch
 TechCrunch Desktop Logo
 TechCrunch Mobile Logo
 LatestStartupsVentureAppleSecurityAIAppsDisrupt 2025
@@ -51,18 +51,20 @@ Partner Content
 TechCrunch Brand Studio
 Crunchboard
 Contact Us
-Image Credits:Fortune Brainstorm Tech (opens in a new window)
+Image Credits:Algi Febri Sugita / SOPA Images / LightRocket / Getty Images
 AI
-Google is a ‘bad actor’ says People CEO, accusing the company of stealing content
-Sarah Perez
-11:53 AM PDT · September 12, 2025
-The CEO of the largest digital and print publisher in the U.S. has accused Google of being a bad actor for crawling its websites to support the search giant’s AI products.
-Neil Vogel, CEO of People, Inc. (formerly Dotdash Meredith), a publisher that operates over 40 brands, including People, Food & Wine, Travel & Leisure, Better Homes & Gardens, Real Simple, Southern Living, AllRecipes, and others, said that Google is not playing fair because it uses the same bot to crawl websites to index them for the Google search engine as it does to support its AI features.
-“Google has one crawler, which means they use the same crawler for their search, where they still send us traffic, as they do for their AI products, where they steal our content,” said Vogel, speaking at the Fortune Brainstorm Tech conference this week.
-He noted that three years ago, Google Search accounted for about 65% of the company’s traffic and that has since dropped to the “high 20s.” (Vogel shared an even more startling statistic with AdExchanger last month, saying that as of several years ago, Google’s traffic accounted for as much as 90% of People Inc.’s traffic from the open web.)
-“I’m not complaining. We’ve grown our audience. We’ve grown our revenue,” Vogel told conference attendees. “We’re doing great. What is not right about this is: you cannot take our content to compete with us.”
-Vogel believes publishers need more leverage in the AI era, which is why he feels it’s necessary to block AI crawlers — automated programs that scan websites to train AI systems — as that can force them into content deals. His company, for example, has a deal with OpenAI, which Vogel described as a “good actor.”
-People, Inc. has been leveraging web infrastructure company Cloudflare’s latest solution to block AI crawlers that don’t pay, prompting AI players to approach the publisher with potential content deals. While Vogel wouldn’t directly name the companies involved, he said they were “large LLM providers.” No deals have been signed yet, but Vogel said the company is “much further along” than before it adopted the crawler-blocking solution.
+Why the Oracle-OpenAI deal caught Wall Street by surprise
+Tim De Chant
+Rebecca Szkutak
+1:01 PM PDT · September 12, 2025
+This week, OpenAI and Oracle shocked the markets with a surprise $300 billion, five-year agreement, part of a surge of new business that sent the cloud provider’s stock skyrocketing. But maybe the markets shouldn’t have been taken by surprise. The deal is a reminder that, despite Oracle’s legacy status, the company still plays a major role in AI infrastructure.
+On the OpenAI side, the agreement was more revealing than the lack of details suggest. For one, the startup’s willingness to pay so much for compute provides a measurement of the startup’s appetite — even if it’s unclear where the electricity to power said compute is coming from or how it will pay for it.
+Chirag Dekate, a vice president at research firm Gartner, told TechCrunch it’s clear why both sides were interested in this deal. It makes sense for OpenAI to work with several infrastructure providers, he noted. It also diversifies the company’s infrastructure — spreading out risk among several cloud providers — and gives OpenAI a scaling advantage compared to competitors.
+“OpenAI seems to be putting together one of the most comprehensive global AI supercomputing foundations for extreme scale, inference scaling where appropriate,” Dekate said. “This is quite unique. This is probably exemplary of what a model ecosystem should look like.”
+Some industry watchers expressed surprise that Oracle was involved, citing the company’s diminished role in the AI boom compared to cloud rivals like Google, Microsoft Azure, and AWS. But Dekate argues that observers shouldn’t be so surprised: Oracle has worked with hyperscalers before, and provides the infrastructure for TikTok’s sizable U.S. business.
+“Over the decades, they actually built core infrastructure capabilities that enabled them to deliver extreme scale and performance as a core part of their cloud infrastructure,” Dekate said.
+Payment and power
+But even as the stock market celebrates the deal, key details are missing and questions around power and payment remain.
 Techcrunch event
 Join 10k+ tech and VC leaders for growth and connections at Disrupt 2025
 Netflix, Box, a16z, ElevenLabs, Wayve, Sequoia Capital, Elad Gil — just some of the 250+ heavy hitters leading 200+ sessions designed to deliver the insights that fuel startup growth and sharpen your edge. Don’t miss the 20th anniversary of TechCrunch, and a chance to learn from the top voices in tech. Grab your ticket before Sept 26 to save up to $668.
@@ -72,21 +74,28 @@ San Francisco
 |
 October 27-29, 2025
 REGISTER NOW
-However, Vogel pointed out, Google’s crawler can’t be blocked because doing so would also prevent the publisher’s websites from being indexed in Google Search, cutting off that “20%-ish” of traffic that Google still delivers.
-“They know this, and they’re not splitting their crawler. So they are an intentional bad actor here,” Vogel declared.
-Janice Min, the editor-in-chief and CEO at newsletter provider Ankler Media, agreed, calling big tech companies like Google and Meta longtime “content kleptomaniacs.”
-“I don’t see the benefit to us in partnering with any AI company right now,” she said, adding that her company blocks AI crawlers.
-Meanwhile, Cloudflare CEO Matthew Prince, whose company makes the AI-blocking solution (and who was also on the panel), said he believed that things would still change in the future when it comes to how the AI companies behave. He suspected those changes could be prompted by new regulations.
-The Cloudflare exec also questioned whether fighting the AI companies using legal solutions around things like copyright law, created for the pre-AI era, was the right answer.
-“I think that it’s a fool’s errand to go down that path, because, in copyright law, typically, the more derivative something is, the more it’s protected under fair use…What these AI companies are doing is they’re actually creating derivatives,” Prince said. “And so if you look at the best case law that’s come out so far, it’s actually said that the use by Anthropic and others — the reason Anthropic settled the other day with all the book publishers for $1.5 billion — was for them to be able to preserve the positive copyright ruling that they got.”
-Prince also proclaimed that “everything that’s wrong with the world today is, at some level, Google’s fault,” because the search giant had taught publishers to value traffic over original content creation, triggering publishers like BuzzFeed to write for clicks. Still, he admitted that Google was in a tough spot right now from a competitive standpoint.
-“Internally, they’re having massive fights about what they do, and my prediction is that, by this time next year, Google will be paying content creators for crawling their content and taking it and putting it in AI models,” he said.
+OpenAI has made a string of infrastructure investment announcements over the past year, each one with an eye-popping price tag. OpenAI has committed to spend around $60 billion a year for compute from Oracle and $10 billion to develop custom AI chips with Broadcom.
+Meanwhile, OpenAI said in June it hit $10 billion in annual recurring revenue, up from around $5.5 billion last year. That figure includes revenue from the company’s consumer products, ChatGPT business products, and its API. And while its CEO Sam Altman has painted a rosy picture of its future prospects in terms of subscribers, products, and revenue, the company is burning through billions of dollars in cash each year.
+Power is another question, or more specifically where the companies plan to source the energy needed to run this level of compute.
+Industry observers have been predicting a near-term boost for natural gas, though solar and batteries are arguably better positioned to deliver power sooner and at lower cost in many markets. Tech companies are also betting big on nuclear.
+Despite market moving headlines, the energy impact of OpenAI’s anticipated growth isn’t entirely unexpected. Data centers are anticipated to consume 14% of all electricity in the U.S. by 2040, according to a report the Rhodium Group published yesterday.
+Compute has always been a constraint for AI companies, so much so that investors have bought thousands of Nvidia chips to ensure their startups have access to the power they need. Andreessen Horowitz has reportedly purchased over 20,000 GPUs, while Nat Friedman and Daniel Gross rented access to a 4,000 GPU cluster (though maybe Meta owns that now).
+But compute is worthless without power. To ensure their data centers remain juiced, large tech companies have been snapping up solar farms, buying nuclear power plants, and inking deals with geothermal startups.
+So far, OpenAI has been relatively quiet on that front. CEO Sam Altman has placed several prominent bets in the energy sector, including Oklo, Helion, and Exowatt, but the company itself hasn’t thrown money into the space like Google, Meta, or Amazon.
+With a 4.5 gigawatt compute deal, that may soon change.
+The company may play an indirect role, paying Oracle to handle the physical infrastructure — something it has extensive experience with — just as Altman invested in startups aligned with OpenAI’s future power needs. That will leave the company “asset light,” something that will undoubtedly please its investors and help keep its valuation in line with other software-centric AI startups and not with legacy tech firms, which are burdened with pricy infrastructure.
 Topics
-AI, AI, Google, media, Media & Entertainment, people inc, pubishers, TC
-Sarah Perez
-Consumer News Editor
-Sarah has worked as a reporter for TechCrunch since August 2011. She joined the company after having previously spent over three years at ReadWriteWeb. Prior to her work as a reporter, Sarah worked in I.T. across a number of industries, including banking, retail and software.
-You can contact or verify outreach from Sarah by emailing sarahp@techcrunch.com or via encrypted message at sarahperez.01 on Signal.
+AI, data centers, Enterprise, OpenAI, oracle
+Tim De Chant
+Senior Reporter, Climate
+Tim De Chant is a senior climate reporter at TechCrunch. He has written for a wide range of publications, including Wired magazine, the Chicago Tribune, Ars Technica, The Wire China, and NOVA Next, where he was founding editor.
+De Chant is also a lecturer in MIT’s Graduate Program in Science Writing, and he was awarded a Knight Science Journalism Fellowship at MIT in 2018, during which time he studied climate technologies and explored new business models for journalism. He received his PhD in environmental science, policy, and management from the University of California, Berkeley, and his BA degree in environmental studies, English, and biology from St. Olaf College.
+You can contact or verify outreach from Tim by emailing tim.dechant@techcrunch.com.
+View Bio
+Rebecca Szkutak
+Senior Reporter, Venture
+Becca is a senior writer at TechCrunch that covers venture capital trends and startups. She previously covered the same beat for Forbes and the Venture Capital Journal.
+You can contact or verify outreach from Becca by emailing rebecca.szkutak@techcrunch.com.
 View Bio
 October 27-29, 2025
 San Francisco

crawled_output/?utm_source=tc&utm_medium=ad&utm_campaign=disrupt2025&utm_content=ticketsales&promo=rightrail_disrupt2025_rb&display=.txt

Lines changed: 1 addition & 1 deletion
@@ -90,7 +90,7 @@ TechCrunch Events
 TechCrunch Disrupt 2025
 Last day to amplify your brand: Host your Side Event at TechCrunch Disrupt 2025
 TechCrunch Events
-4 hours ago
+3 hours ago
 TechCrunch Disrupt 2025
 VC leaders from 01 Advisors take the Builders Stage at TechCrunch Disrupt 2025 to share the scaling playbook
 TechCrunch Events

crawled_output/Wikipedia.txt

Lines changed: 2 additions & 2 deletions
@@ -269,7 +269,7 @@ Video of Wikimania 2005 – an annual conference for users of Wikipedia and oth
 Each article and each user of Wikipedia has an associated and dedicated "talk" page. These form the primary communication channel for editors to discuss, coordinate and debate.[116] Wikipedia's community has been described as cultlike,[117] although not always with entirely negative connotations.[118] Its preference for cohesiveness, even if it requires compromise that includes disregard of credentials, has been referred to as "anti-elitism".[W 34]
 Wikipedians and British Museum curators collaborate on the article Hoxne Hoard in June 2010.
 Wikipedia does not require that its editors and contributors provide identification.[119] As Wikipedia grew, "Who writes Wikipedia?" became one of the questions frequently asked there.[120] Jimmy Wales once argued that only "a community ... a dedicated group of a few hundred volunteers" makes the bulk of contributions to Wikipedia and that the project is therefore "much like any traditional organization".[121] Since Wikipedia relies on volunteer labour, editors frequently focus on topics that interest them.[122]
-The English Wikipedia has 7,055,398 articles, 49,668,018 registered editors, and 111,223 active editors. An editor is considered active if they have made one or more edits in the past 30 days.[W 35] Editors who fail to comply with Wikipedia cultural rituals, such as signing talk page comments, may implicitly signal that they are Wikipedia outsiders, increasing the odds that Wikipedia insiders may target or discount their contributions. Becoming a Wikipedia insider involves non-trivial costs: the contributor is expected to learn Wikipedia-specific technological codes, submit to a sometimes convoluted dispute resolution process, and learn a "baffling culture rich with in-jokes and insider references".[123] Editors who do not log in are in some sense "second-class citizens" on Wikipedia,[123] as "participants are accredited by members of the wiki community, who have a vested interest in preserving the quality of the work product, on the basis of their ongoing participation",[124] but the contribution histories of anonymous unregistered editors recognized only by their IP addresses cannot be attributed to a particular editor with certainty.[124] New editors often struggle to understand Wikipedia's complexity. Experienced editors are encouraged to not "bite" the newcomers in order to create a more welcoming atmosphere.[125]
+The English Wikipedia has 7,055,399 articles, 49,668,018 registered editors, and 111,223 active editors. An editor is considered active if they have made one or more edits in the past 30 days.[W 35] Editors who fail to comply with Wikipedia cultural rituals, such as signing talk page comments, may implicitly signal that they are Wikipedia outsiders, increasing the odds that Wikipedia insiders may target or discount their contributions. Becoming a Wikipedia insider involves non-trivial costs: the contributor is expected to learn Wikipedia-specific technological codes, submit to a sometimes convoluted dispute resolution process, and learn a "baffling culture rich with in-jokes and insider references".[123] Editors who do not log in are in some sense "second-class citizens" on Wikipedia,[123] as "participants are accredited by members of the wiki community, who have a vested interest in preserving the quality of the work product, on the basis of their ongoing participation",[124] but the contribution histories of anonymous unregistered editors recognized only by their IP addresses cannot be attributed to a particular editor with certainty.[124] New editors often struggle to understand Wikipedia's complexity. Experienced editors are encouraged to not "bite" the newcomers in order to create a more welcoming atmosphere.[125]
 Research
 A 2007 study by researchers from Dartmouth College found that "anonymous and infrequent contributors to Wikipedia ... are as reliable a source of knowledge as those contributors who register with the site".[126] Jimmy Wales stated in 2009 that "[I]t turns out over 50% of all the edits are done by just 0.7% of the users ... 524 people ... And in fact, the most active 2%, which is 1400 people, have done 73.4% of all the edits."[121] However, Business Insider editor and journalist Henry Blodget showed in 2009 that in a random sample of articles, most Wikipedia content (measured by the amount of contributed text that survives to the latest sampled edit) is created by "outsiders", while most editing and formatting is done by "insiders".[121]
 In 2008, a Slate magazine article reported that "one percent of Wikipedia users are responsible for about half of the site's edits."[127] This method of evaluating contributions was later disputed by Aaron Swartz, who noted that several articles he sampled had large portions of their content (measured by number of characters) contributed by users with low edit counts.[128] A 2008 study found that Wikipedians were less agreeable, open, and conscientious than others,[129] although a later commentary pointed out serious flaws, including that the data showed higher openness and that the differences with the control group and the samples were small.[130] According to a 2009 study, there is "evidence of growing resistance from the Wikipedia community to new content".[131]
@@ -312,7 +312,7 @@ Articles in the 20 largest language editions of Wikipedia
 5
 6
 7
-English 7,055,398
+English 7,055,399
 Cebuano 6,116,080
 German 3,049,991
 French 2,708,791
