
The Great Inversion of the Web: Personal Data Clusters & Sovereign AI


This article explores the long-awaited realization of Tim Berners-Lee's vision for the Semantic Web. While the true potential of Web3 has long seemed out of reach, recent technological leaps have finally made it feasible. This deep dive examines the core pillars of our next digital iteration, unpacking concepts like peer-to-peer networks, sovereign data, and the rise of personal AI assistants. Discover how the inevitable "inversion" of the web will move your data out of massive corporate silos and into your own Personal Data Clusters, fundamentally transforming digital ownership, security, and the future of human-AI interaction.


Author: Dante Cullari - CEO, Konvergence Inc

Since the early days of the World Wide Web, Tim Berners-Lee understood the importance of semantics on the web: the vision of “all the documentation systems out there … being possibly part of a larger imaginary documentation system.”

TimBL is even said to have coined the term Semantic Web. However, providing the world with what is essentially a digital Lego system and expecting everyone to build the same way is indeed a very tall ask, and so the Semantic Web, sometimes referred to as Web3, has long struggled with questions of feasibility. Indeed I myself, as a seasoned Web3 architect, do not believe the world has yet experienced Web3, which means a kind of potential version of the World Wide Web has remained seemingly out of reach since the turn of the century.

It is only recently, within the decade of this writing, that the technology and heuristics have emerged to make the Semantic Web not just feasible, but the most likely next iteration of the Web. It is this new iteration that we will dive into in this article.

Briefly first, we need to distinguish two common terms. 

1. Semantic Web - an extension of the Web whose goal is to make all web data interoperable. This is often referred to as normalizing data: making its structure more uniform and predictable, and thus machine-readable.

2. Web3 - at its core, a peer-to-peer and distributed communication network, and a common synonym for Semantic Web, though the term has more recently been used largely by blockchain and cryptocurrency communities and companies. Blockchain and crypto technologies have greatly contributed to Web3, and have likely served as the most successful early examples of Web3 being implemented to date. The difference is one of dependence: blockchains and cryptocurrencies could not function properly without Web3 as their communication network (centralized semantics utilized by decentralized nodes), but Web3 networks do not require blockchains or cryptocurrencies to function whatsoever.

Web3, while its relation to the Semantic Web is still very strong, has actually evolved over recent decades into an umbrella term for a broader set of aspirational web features beyond interoperable data, including the following ideals, among likely others, in no particular order:

1. Peer-2-Peer (without intermediaries)

2. Distributed Data - Sovereign Data

3. Eventual Consistency - Shared State

4. Semantic Web

5. Sensors - Internet of Things (IoT)

6. Personal AI Assistants - Sovereign AI

7. Direct to Consumer (D2C) Content Distribution

8. User-Centric Web - Sovereign Custody

9. Open Source

10. Immutability

11. Decentralization

12. Portals & Worlds

13. Crypto-Native

14. Increased Security

15. Global & Inter-Planetary Accessibility

16. Distributed Storage & Compute

It has become clear that a system encompassing nearly all of the original and evolving ideas of what the Web could be, is now appearing for the first time. 

If the Internet and Web are considered influential, this new evolution of Web3 is poised to be just as influential, if not wildly more so, in any aspect of human life where data and digital systems currently exist or could.

Let’s explore these various aspects and their potential impacts further.

Peer-2-Peer (P2P)

Many of us remember Napster and LimeWire, or are familiar with VPNs, torrents or onion routing. These are all examples of peer-to-peer networks, which usually means you download a program to your device or server that can then dial directly to any other peer running the same software on the network; once connected, you can send data and files back and forth. Early versions of Skype are a great example of a P2P network, with video being what is sent back and forth. Each user downloads Skype to their computer, and once they know someone else's username, they can talk to that person directly. The Skype software used each user's own hardware resources to run, connecting via the internet to another user's computer far away. The two computers (servers, in the sense of machines connected to the internet) create a direct peer-to-peer connection without the need for an intermediary.
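As a minimal sketch of the idea, two peers can exchange data with no intermediary using plain TCP sockets on one machine (the port number is invented for illustration; real P2P stacks add peer discovery, NAT traversal and encryption on top of this):

```python
import socket
import threading

def run_peer(listen_port, message_log):
    """A minimal peer: listens for one inbound connection and records the message."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", listen_port))
    server.listen()

    def accept_loop():
        conn, _ = server.accept()  # accept a direct connection from another peer
        with conn:
            message_log.append(conn.recv(1024).decode())

    t = threading.Thread(target=accept_loop, daemon=True)
    t.start()
    return t

def dial_peer(port, message):
    """Connect directly to another peer's address and send data."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall(message.encode())

# Two peers on one machine: peer B listens, peer A dials it directly.
inbox_b = []
listener = run_peer(9901, inbox_b)
dial_peer(9901, "hello from peer A")
listener.join(timeout=2)
print(inbox_b)
```

Each side runs the same software and speaks to the other directly, which is the essential property every system in this section shares.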

Onion routing, as in Tor browser, and Mixnets, are implementations which attempt to also scramble the traffic in the network so that one piece of data from one server/device ideally looks like every other piece of data from every other server/device, which provides a level of anonymity as well as security.

Onion routing and mixnets have become central concepts for the communication protocols of Web3. While there is currently no single standard Web3 P2P protocol, a popular one called IPFS (InterPlanetary File System) has emerged as a leader. It is built on a modular P2P networking stack called libp2p, which began as the communication library inside IPFS and has since been adopted elsewhere; notably, Ethereum's consensus-layer nodes use libp2p for their global, decentralized peer-to-peer gossip. On top of this network, IPFS adds a very useful file system and content routing protocol. I will cover IPFS a bit more below.

Distributed Data - Sovereign Data

Distributed data is common among P2P networks. As in the original Skype example, each user's computer is outfitted with a copy of the software, and the software relies on each device's hardware to implement the network, meaning the infrastructure and data of the network are distributed across users. This is one form of distributed data. Another is a clustered cloud model such as Kubernetes, where multiple servers/nodes are outfitted with the software, often based in different geographic locations. They can connect to different users via node IP addresses, but are also all connected to the other nodes and can be subscribed to real-time updates, so that all nodes have access to the same network data in near real time. These clusters can be configured to orchestrate which hardware is used to implement certain tasks in the network, or they may be configured to combine their resources in order to break up, or "shard", the computation for a single task into more manageable pieces.

IPFS, the InterPlanetary File System, implements a very interesting model using distributed infrastructure. It boasts a useful concept called content addressing, which means that instead of dialing the server/IP address where your content lives (e.g. youtube.com/id), you ask the network whether it has seen a certain content id. The id of the content is a hash that allows you to locate the file in the network even without knowing where it exists at any given time. When a request for a specific content id reaches a node that doesn't have the file, that node pings other nodes in the network to ask whether they have the content, and the nodes "gossip" back and forth, sharing the ids of the content they each hold and have been asked for. Each node in the network is talking to its neighbors all the time, who are constantly talking to their neighbors, and so on.
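Content addressing can be illustrated with a toy model (nodes here are plain dictionaries and the ids are bare SHA-256 hashes; real IPFS CIDs add multihash and multibase encoding on top):

```python
import hashlib

def content_id(data: bytes) -> str:
    """Derive the address from the content itself, not from its location."""
    return hashlib.sha256(data).hexdigest()

# A toy "network": each node maps content ids to the bytes it holds.
node_a = {}
node_b = {}

video = b"cat video bytes"
cid = content_id(video)
node_b[cid] = video  # only node B happens to hold this file

def fetch(cid, nodes):
    """Ask every reachable node for the id; any holder can serve it."""
    for node in nodes:
        if cid in node:
            data = node[cid]
            # Verify integrity: the id must equal the hash of what we received.
            assert content_id(data) == cid
            return data
    return None

print(fetch(cid, [node_a, node_b]) == video)
```

Note the built-in integrity check: because the id is the hash, any node can serve the file and the requester can verify it was not tampered with, without trusting the server.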

There are several gossip-like algorithms that have proven effective for sharing and syncing this kind of information throughout the network, which libp2p and IPFS implement today depending on the size of the network and the priorities of the users and other actors such as app developers. The speed of propagation of these algorithms is a constant target. No doubt there are many permutations of these algorithms, some yet to be discovered, that will benefit certain network topologies over others. By syncing information about where any content can be reached, and by creating copies of content on different nodes in different geographic locations, we can create networks which at scale can be much more efficient than the traditional location-based addressing the vast majority of the web uses today.
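To build intuition for propagation speed, here is a toy gossip simulation on a 20-node ring with an invented fanout of 2 (real protocols such as libp2p's gossipsub use richer topologies, message ids and per-peer scoring):

```python
import random

def gossip_round(informed, neighbors, fanout=2):
    """One round: every node that knows the update tells `fanout` random neighbors."""
    newly_informed = set()
    for node in [n for n in informed if informed[n]]:
        peers = neighbors[node]
        for peer in random.sample(peers, min(fanout, len(peers))):
            newly_informed.add(peer)
    for peer in newly_informed:
        informed[peer] = True

# Ring topology of 20 nodes; node 0 starts with the update.
N = 20
neighbors = {i: [(i - 1) % N, (i + 1) % N] for i in range(N)}
informed = {i: i == 0 for i in range(N)}

rounds = 0
while not all(informed.values()):
    gossip_round(informed, neighbors)
    rounds += 1
print(f"update reached all {N} nodes in {rounds} rounds")
```

On a ring the update spreads one hop in each direction per round, so it needs 10 rounds here; denser topologies spread the same update in logarithmically fewer rounds, which is exactly the tuning space these algorithms explore.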

In yet newer implementations, such as in the Internet Computer protocol and Filecoin’s Virtual Machine (FVM), the goal is that computing power can be rented from multiple third parties using cryptocurrencies like ICP and FIL, and real-time 24/7 resource marketplaces can emerge. These systems provide a sort of cryptographic and independently verifiable receipt of services rendered. In this case the services are cloud based processing to power applications or cloud hard drive storage. In this way, network infrastructure does not have to exist solely on the user’s computer or device, but the hardware used by network users can be distributed across several locations and rented when needed. These are still fairly early stage systems with a lot of room for optimization, though they are beginning to be used in corporate settings. 

With this technology’s rise, most famously implemented as the communication backbone of Ethereum, and in the past, Bitcoin, it’s also not a stretch to visualize the rise of consumer grade Personal Data Clusters (PDCs), a set of personally owned cloud servers likely on different hosting providers, clustered together to host a web user’s data. This is the basis for Sovereign Data - distributed, decentralized, fast, independent and available hardware infrastructure where end-users own and control the private keys to the encryption of their data.

Eventual Consistency - Shared State

Eventual consistency is also a very important concept in Web3 and all P2P networks. In a network where thousands of updates are happening constantly, it can take some time for a mechanism like a gossip algorithm to propagate those updates throughout the entire network. This can be a big issue: at any given moment, some users in the network will see changes that others will not, and in that window further updates can take place, so things get complex very quickly. One of the solutions is CRDTs - Conflict-free Replicated Data Types. Basically, when an update happens and is sent to other nodes, each receiving node records not only the update but also the time at which it originally occurred. As other updates come in, if a node detects that an update it already holds should come after one it just received, it can rearrange those events into the correct order and reconfigure the application's state, so that eventually the application state across users reflects a consistent order of events. Maintaining an at-least-eventually-consistent application state is extremely important, because it can be the difference between your bank being aware of a recent payment and your electric utility being aware of it. If states are not consistent, and events don't happen in the right order, sticky situations can occur.
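A minimal sketch of this idea is the last-writer-wins register, one of the simplest CRDTs (the values and integer timestamps here are invented for illustration; production CRDTs typically use logical or hybrid clocks rather than bare integers):

```python
class LWWRegister:
    """Last-Writer-Wins register, a simple CRDT. Every update carries the
    timestamp at which it originally occurred; a replica keeps whichever
    update is newest, so all replicas converge to the same value no
    matter what order the updates arrive in."""

    def __init__(self):
        self.value = None
        self.timestamp = 0

    def update(self, value, timestamp):
        # Apply only if this update happened after the one we currently hold.
        if timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp

    def merge(self, other):
        """Merging two replicas is just applying the other's latest update."""
        self.update(other.value, other.timestamp)

# Two replicas receive the same two updates in opposite order...
a, b = LWWRegister(), LWWRegister()
a.update("paid", 2); a.update("pending", 1)   # older update arrives late
b.update("pending", 1); b.update("paid", 2)

a.merge(b); b.merge(a)
print(a.value, b.value)  # both converge to the newest update
```

The late-arriving older update is simply ignored on replica `a`, which is the reordering behavior described above: the final state reflects the true order of events, not the order of arrival.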

The ideal scenario is for all nodes to be fully consistent all the time and at lightning speed, but given our current technology that is likely a fairly far-off goal, and so eventual consistency is the best we can hope for at the moment. An accessible goal now is to further optimize eventually consistent algorithms for speed and accuracy. Without application state consistency at consumer-grade speeds, Web3 is essentially impossible. CRDTs are our first promising solution, along with CQRS, and both are being utilized widely in Web2 today, which is a good sign, with many other potential solutions beginning to come to light.

Semantic Web 

This idea was first crystallized by Tim Berners-Lee in 1999:

“I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize.”

The main goal of implementing the Semantic Web, or Metaverse as it's sometimes called, is to have centralized semantics but decentralized infrastructure. In this world, any toaster can communicate with any TV (Internet of Things), and every user profile for every service you use can be the same one if you wish: you only ever fill it out once, and you edit that single profile. Apps may have their own schemas, but when available, they would use the existing schemas compatible with what everybody else uses. This greatly increases the utility and cross-referenceability of all the data you create, which enables so many new possible features that this version of the web becomes considerably more useful.

Imagine if a Spotify video and a Youtube video are both subsets of the same parent Video schema and Youtube Channels and Spotify Artist Accounts are both subsets of a parent Creator schema. All of a sudden you can create a streaming service that can play video playlists containing both Youtube and Spotify videos, regardless of the source, and you can get recommendations from your friends regardless of which platform they use the most. 
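That scenario can be sketched with schema.org-style records (the field values are invented, and schema.org's actual VideoObject type carries many more properties; the point is only that one shared type makes mixed-source features trivial):

```python
# Both platforms publish their items as instances of one shared Video schema,
# loosely following schema.org's VideoObject. The records are hypothetical.
youtube_video = {
    "@type": "VideoObject",
    "name": "Lo-fi beats",
    "creator": "ChillChannel",
    "source": "youtube",
}
spotify_video = {
    "@type": "VideoObject",
    "name": "Live session",
    "creator": "SomeArtist",
    "source": "spotify",
}

def build_playlist(*videos):
    """Because both records share one schema, a player can mix sources
    without any per-platform translation code."""
    return [v["name"] for v in videos if v["@type"] == "VideoObject"]

print(build_playlist(youtube_video, spotify_video))
```

The cross-platform playlist needs no adapter layer at all; the shared schema is the adapter.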

When we make our data inter-operable, we make the web a much more useful place than what we know today. As TimBL touched on, AI and machines talking to machines on our behalf is an additional extension of this feature set which I will expand on more below.

Sensors - Internet of Things (IoT)

Consider a system implementing a peer-to-peer network on distributed software and hardware, using IPFS for file storage and content addressing for routing traffic, and implementing centrally semantic schemas with eventual consistency across nodes and lightning-fast propagation. All of these are very real systems present in our world today, at least to some degree. It is not a far cry, then, to consider that there could soon exist potentially millions of independently verifiable sensors monitoring almost anything that could or should be openly monitored, with sensors operating independently but in constant communication with each other and with any other person or peer in the network who is interested. Not just unifying our schemas but also making our peer communications more efficient means that researchers, healthcare workers, scientists, hobbyists and the general public can have real-time access to independently verified data straight from the sources, verified by decentralized oracle networks where data is cross-referenced with other independent sensors to create larger and more accurate pictures of realities on the ground within a certain domain.

This crowdsourcing of sensor data, with ubiquitous and equitable access, can help us better inform our decisions and react appropriately to issues, potentially in near real time, but certainly much faster than today. And when every device (TV, computer, thermostat, speaker, AC unit, etc.) can contribute sensor data to this unified network where data is accessible by end-users, not just the manufacturers, the Internet of Things becomes a powerful tool in everyone's tool belt.

Imagine if in the early days of the Covid-19 pandemic, as one example, the Italian people had had independently verifiable, near-real-time unified data on the load capacities at various hospitals. Based on this, the Italian government could have issued alerts to people on their way to certain hospitals, routing them to less crowded facilities or asking them to opt to stay home and access help via telemedicine. Imagine if the overcrowding of hospitals, and the staff's inability to intake patient data and respond effectively to the crisis, could have been dramatically mitigated, if we had only had the right kinds of sensors, unified schemas and data sharing/syncing technology.

A big goal of Web3 is to gather, verify and distribute data much more efficiently from the source, directly to segments of the network or to the network as a whole. With equitable access to interoperable and independently verified data, the goal is a kind of verifiable transparency. When non-credible data and cheating are easy to spot, and incentives are aligned so that participating in and supporting the network is more rewarding than attacking it, then as long as there remains equitable access and low barriers to entry, we could see giant leaps in many fields of study thanks to far improved synthesis of data and ideas between peers in the network, initially globally, and eventually inter-planetarily.

Personal AI Assistants - Sovereign AI

With the rise of generative models like ChatGPT and Claude, it's clear that these systems are only as useful as the data they have access to. With most Web2 users' data currently spread across dozens of provider-owned servers and databases, each with its own ontologies and definitions, Web2 has created a less-than-optimal environment for LLMs. If personal agents, or digital twins, are actually inevitable, they will require the inversion of our current Web2 model: the re-centralization of personal data onto individualized servers where local AI agents can cross-reference and access all of a user's data as it is being created.

Indeed, AI's hunger for its patron's data will act as a catalyst for Web3 by establishing personalized data lakes at the user level. The big questions are: where is this data stored, who has access, and who owns the private keys to its encryption? With Web3 enabling peer-to-peer communication, Personal Data Cluster nodes become the ideal answer to these questions.

When web users own their own data and have control over the software and algorithms that are accessing it, it seems there will also be less anxiety over allowing an artificial intelligence chosen by the user to run machine learning on their data in order to help them optimize many facets of their life without sharing that information directly with 3rd parties. 

Imagine your AI placing your weekly grocery orders, doing your taxes, scheduling meetings or trading stocks for you 24/7. Once people off-load their data and apps to servers or clusters that don’t lose battery and have 99.999% uptime, this becomes not just possible, but probable. Each of us will have our own personal AI assistants in our personal data clusters, and the potential of all of that extra efficiency in our daily lives seems staggering.

Direct to Consumer (D2C) Content Distribution

The world's first true distributed web browser was called Facebook Newsfeed. Before Facebook's Newsfeed, internet users would have to search for the site or content they wanted to view, or scroll through an RSS feed, and then click a link to go to the content, via something called a website. On Facebook's Newsfeed, for the first time, internet users could view content from another website directly in-line, on the website they were already on. An early example was playing a Youtube video from within the newsfeed itself, without having to leave Facebook. Though we take this for granted now, prior to this, accessing content online was a much more manual and time-consuming task. With Facebook's Newsfeed, content (a Youtube video, say) is distributed to the consumer, embedded right into the newsfeed. This made Facebook a sort of one-stop shop for content viewing, and with the social graph including entities like musicians, actors, TV shows, etc., it was a way to keep following content creators you liked over time and aggregate their content.

This method of browsing content is the standard way Web3 browsers will allow you to browse content, except in Web3, when you follow your favorite musician or TV show, you can connect directly to their servers, and they can distribute their content directly to you and other fans. If all users in the network are running the software on their own machines, and so is the content creator, there is no need for Facebook and algorithmic filters to be an intermediary to this connection. Imagine a streaming service where 100% of the ad revenue goes to the content creator instead of the fraction platforms pass along today. This is a major shift from the way content is distributed now.

User-Centric Web - Sovereign Custody

There is another benefit of a P2P content distribution model: consumers and content creators gain agency and control over how content is distributed. Without third parties like Facebook and Youtube dictating the newsfeed algorithms that populate the content we see, the newsfeed can be optimized for user-centric relevance, instead of for optimal screen time and revenue for third parties. Imagine that as a consumer, you or your personal AI has the controls to turn the knobs on your newsfeed algorithms, to decide for yourself what content to prioritize and what to filter out. This is the subject of Konvergence's first patent in the space. Believe it or not, before this patent, there were no pre-existing social discovery algorithms that gave users access to customize and control the algorithms for themselves.

This model not only provides controls to consumers, it also provides agency to both them and service providers/content creators, putting content creators in control of the infrastructure their businesses operate on. Just like the internet today as a whole, there is no web-wide content policy. If you're doing something illegal, you are subject to your local laws, but there are no community strikes and no getting kicked off the web as a whole. Web3 will be no different. Consumers are responsible for filtering out content they don't want to see, which they can do using customized lists of keywords and content types, or by applying popular defaults. With social trust graph discovery algorithms, you can also rely on people you actually trust in various domains to filter content for you.

Consumers will be in control of their own data and who can access it. Instead of signing dozens of terms of service from third parties related to data usage, third parties will have to abide by the consumer's terms for how their data can be used, and consumers can choose not to share it with a third party at all. This control extends to the private keys to the encryption of one's personal data. Instead of private keys being stored on third-party servers, or on the user's own nodes, private keys will likely be decoupled from the nodes and data themselves, utilizing devices like crypto hardware wallets and YubiKeys, so that multi-factor authentication is always on and solely in the hands of the users. In the event of a data breach or ransomware event, users are notified, affected nodes and hosting providers are quarantined and spun down, and new nodes are spun up elsewhere with data replicating from other nodes in the user's personal cluster, enabling self-healing.

When you have inter-operable schemas throughout the Semantic Web and you are in control of all of your data which is stored on server nodes which you own and control with your own private key storage, your data becomes much more secure and useful to you. All of a sudden there are new applications that can cross-reference your entire data graph. If you’re operating a retail company, your company’s graph contains for example your inventory, product listings and your accounting. In Web2 you might use services like Shopify and Quickbooks to store this data separately for you, but in Web3 it will all be hosted together as one graph database that you control. This means developers can write programs assuming you have access to all of this data and don’t have to write custom scripts to connect Shopify with Quickbooks for example. You as the owner of the data, don’t have to get Quickbooks’ and Shopify’s permission to access this data via APIs. You are literally the master of your domain.
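A toy sketch of that unified graph (the product records and field names are invented for illustration; a real Personal Data Cluster would likely store this in a proper graph database rather than nested dictionaries):

```python
# One retailer's unified graph: inventory, listings and sales live in the
# same store, so an app can join them directly instead of writing custom
# scripts to bridge two separate vendors' APIs.
graph = {
    "products": [
        {"sku": "MUG-01", "name": "Mug", "price": 12.0},
        {"sku": "TEE-02", "name": "T-shirt", "price": 20.0},
    ],
    "inventory": {"MUG-01": 3, "TEE-02": 0},
    "sales":     {"MUG-01": 47, "TEE-02": 112},
}

def restock_report(graph):
    """Cross-reference listings, stock and sales history in one query:
    find products that are selling but out of stock."""
    return [
        p["name"]
        for p in graph["products"]
        if graph["inventory"][p["sku"]] == 0 and graph["sales"][p["sku"]] > 0
    ]

print(restock_report(graph))
```

In a Web2 split across Shopify and Quickbooks, a query like this needs API keys, permission grants and glue code for each service; with one owner-controlled graph it is a single traversal.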

Open Source

The Open Source movement is central to Web3 as well, on many different levels. When open source software first became available with GNU and Linux, it was not clear just how big an impact it would have on IT or our lives, but in the decades since, its impact has arguably turned out to be much greater than that of closed-source software.

Android, the smartphone operating system with the largest market share worldwide, is based on Linux. Without Linux being open source, Google would have had to develop its own operating system, much as Microsoft and Apple had done years prior. It's likely that this extra effort would have slowed Android's rollout considerably, with implications for hardware and chip manufacturers like Qualcomm and Samsung, who have gained considerably from being able to compete directly with the iPhone by creating hardware for Google's open source flavor of Linux. That competition has also served the general public, helping to meet the large demand for smartphones, driving innovation and keeping costs down. It's likely that every time you interact with a computer today, you are also interacting in some way with some open-source software. This software can cut new development times by years in many cases, a metric which is almost unheard of among efficiency-creating innovations in other domains. Best of all, open source is most often free for private and commercial use.

Linus Torvalds, the father of Linux, also made another major contribution to Web3. When he created a version control system (VCS) called Git in 2005 (the most popular VCS for developers around the world; maybe you've heard of GitHub), he chose to integrate a data structure called the Merkle tree in order to track the changes to all the code that developers were creating. Merkle trees were invented by Ralph Merkle, also a co-inventor of public-key cryptography, the basis for the modern encryption systems you use every day when you enter your password, send an encrypted message or authenticate an API. Merkle trees are also widely used in blockchain technologies: most blockchains commit each block's transactions to a Merkle root, and each block links by hash back to the genesis block. Git, an open source software, and Merkle-tree structures have been adopted by many Web3 projects such as IPFS as the main structure for how data is stored on the hardware. They allow for tracking changes, playing back history and externally analyzing a chain to mathematically prove whether or not a codebase has been changed, all the way down to a single byte.
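A minimal Merkle root computation shows the "down to a single byte" property (this sketch uses bare SHA-256 and duplicates the last node on odd levels, one common convention; Git's object model and Bitcoin's exact encoding differ in detail):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Hash the leaves pairwise, level by level, up to a single root.
    Changing any byte of any leaf changes the root, which is how Git-like
    systems and blockchains detect tampering efficiently."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:          # duplicate the last node if the level is odd
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

blocks = [b"tx1", b"tx2", b"tx3", b"tx4"]
root = merkle_root(blocks)
tampered = merkle_root([b"tx1", b"tx2", b"tx3", b"tx5"])
print(root != tampered)  # a one-byte change is detectable at the root
```

Comparing two 32-byte roots is enough to prove two large datasets identical, which is why the structure scales to entire repositories and blockchains.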

The role of open source software in Web3 cannot be overstated, and it's clear that this trend will continue to be one of the central pillars of Web3 innovation, including the use of open-source schemas like those available on schema.org.

Immutability

Immutability means being unable to be changed. This is a broad goal of Web3, and it can take several forms. If a piece of data can't be changed, how do you update it? You instead append a new version to the original, keeping a trailing history of the data. You never delete the original, you only append new versions to it. This is called an append-only model, and the chain of updates it creates resembles a unidirectional tree or branching structure, referred to as a directed acyclic graph; when the links between versions are hashes, as in Git, it is a Merkle tree. Acyclic means there are no closed loops: the structure only grows in one direction, in this case forward in time. Acyclic graphs also allow one to track who made updates, revert to earlier versions, and create copies, or forks, of the data, as in Git. These properties of Git are extremely prevalent throughout Web3, cryptocurrency and the technology sector as we know it today.
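The append-only model can be sketched as a hash-linked log (the record contents and JSON serialization here are invented for illustration; systems like IPFS use content-addressed DAG nodes instead):

```python
import hashlib
import json

def append(log, entry):
    """Append-only update: each new version links to the hash of the one
    before it, so history forms a one-directional (acyclic) chain and old
    versions are never deleted, only superseded."""
    prev = log[-1]["hash"] if log else "genesis"
    record = {"prev": prev, "entry": entry}
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    log.append(record)

def verify(log):
    """Recompute every link; any edit to any old version breaks the chain."""
    prev = "genesis"
    for record in log:
        body = {"prev": record["prev"], "entry": record["entry"]}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if record["prev"] != prev or record["hash"] != expected:
            return False
        prev = record["hash"]
    return True

log = []
append(log, "profile v1")
append(log, "profile v2")       # v1 stays in the history
ok_before = verify(log)         # True: untampered chain verifies
log[0]["entry"] = "tampered"    # silently rewrite old history...
print(ok_before, verify(log))   # ...and verification now fails
```

This is the same mechanism, in miniature, that lets Git or a blockchain prove whether any byte of history has been altered.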

Another implementation of immutability is making sure that data, or chains of data (graphs), don't get lost or deleted if a server or hard drive goes down somewhere. The goal is to have multiple copies of the data in multiple places so that there is always a backup in case one or more copies go down. Sometimes this is referred to as data pinning, as in marking data as a priority for replication. Filecoin, a blockchain and utility token for independently verifiable data storage, has already seen the launch of several pinning services built for replicating data on Filecoin.

Web3 will ideally be immutable and acyclic in addition to being controlled by the creator/owner of the data giving us time-series histories of all of our data through time. Imagine having access to one folder on your personal cluster that holds all of your medical records going back to when you were born. Every day is a chance to add new data and new entries. This is something that for most of us today, in the United States at least, is absolutely inconceivable considering all the various private doctors and labs and health facilities we’ve visited over our lifetimes, each with their own record keeping systems. Web3 aims to unify our data securely for maximum utility.

Decentralization

Decentralization is a term that has become very publicized in recent years as cryptocurrency market caps have grown larger and larger. Decentralization is also a goal of Web3 that is intended to be implemented on many levels. When an architect designs a building, they do not rely on a central pillar to hold the weight of the entire structure. Instead, their goal is to distribute the forces reasonably throughout the building, thus decentralizing the forces applied to the structure. This makes the structure more resilient: if one pillar fails, there are others which can hopefully pick up the load. Decentralization in Web3 can also mean geographically distributing resources.

In Web3, we’d like for there to be decentralization ideally on most levels, including energy, infrastructure hosting, storage backup, and even app development. There is even the idea of Decentralized Autonomous Organizations (DAOs), both nonprofit and for profit, which are essentially companies or organizations which have no single leader in charge, but instead control is distributed evenly or fairly throughout the organization. These types of organizations already exist today, however there have been many less than ideal outcomes and it’s safe to say they are still in early stages of development. 

Decentralization has been a favorite technique of traditional architects for centuries, and it is no different for web architects. Globally decentralized digital systems have turned out to be fairly hard to implement, however. It wasn't until Bitcoin reached reasonable scale, with a novel incentive structure, independently verifiable and enforceable rule sets built on modern cryptography, and P2P eventually-consistent networks like IPFS, that true decentralization came into the realm of reality. Based on this, Web3, with all of its ideal wish-list items, is finally now attainable.

Portals & Worlds

This goes right in line with interoperable metadata, i.e. the Semantic Web, and advances in user interfaces. When data is truly interoperable, you could seamlessly go from playing a game with an avatar that matches that game to playing another game with the same avatar. This makes all games essentially part of the same big world, a world you can teleport and move throughout without limits. Even on 2D flat websites, you can jump between them with full data interoperability. This is a major dream of Web3.

With full interoperability, it’s not hard to imagine a web app where, if you zoom in far enough on a particular pixel, you are taken to an entirely different web app experience, turning each piece of a future Web3 interface into a potential portal of endless portals. This allows full fluidity of web experiences throughout the metaverse. If you’ve never experienced this kind of commute, in Web3 you soon may, regularly.

Crypto-Native

The major roadblocks to cryptocurrencies becoming as prevalent as Visa and Mastercard for processing transactions come down to scaling the nodal infrastructure. Currently, blockchains require both full and light nodes. Full nodes hold every transaction the network has ever processed, while light nodes hold transaction history going back to certain checkpoints and are generally used to help independently confirm cryptographic proofs and transactions within the network. Different cryptocurrencies recommend different thresholds for the number of independent confirmations needed to consider a transaction satisfactorily completed. For some currencies this threshold may be 10 confirmations; for others it may be 100. This means between 10 and 100 independent nodes need to run the computations independently, and the results are then compared for a match. As the number of transactions in the network rises, so does the work the nodes must handle to confirm them to satisfaction.
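The confirmation flow described above can be sketched in a few lines. This is a minimal, illustrative model, not any real client's API: the `Node` class, the hash-based stand-in for proof verification, and the threshold values are all hypothetical.

```python
import hashlib
import random

class Node:
    """A hypothetical light node that independently verifies a transaction."""
    def verify(self, tx: bytes) -> str:
        # Each node recomputes the result on its own; a simple hash stands
        # in here for real cryptographic proof verification.
        return hashlib.sha256(tx).hexdigest()

def confirm_transaction(tx: bytes, nodes: list, threshold: int) -> bool:
    """Accept the transaction only once `threshold` independent nodes
    produce matching verification results."""
    sampled = random.sample(nodes, threshold)
    results = [node.verify(tx) for node in sampled]
    # All sampled nodes must agree for the transaction to be accepted.
    return len(set(results)) == 1

network = [Node() for _ in range(100)]
tx = b"alice pays bob 5 units"
accepted = confirm_transaction(tx, network, threshold=10)  # 10-confirmation currency
```

Raising the threshold from 10 to 100 in this sketch directly multiplies the redundant work the network performs per transaction, which is exactly the scaling pressure the paragraph describes.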

Currently, most blockchain nodes are operated by non-profit organizations and blockchain companies. The dream of individual users of the network running their own nodes has not materialized as quickly as foreshadowed. This has left the ratio of nodes to end users horribly imbalanced, with perhaps 10,000 to 100,000 end users for every light node. While this ratio remains so unbalanced, cryptocurrencies will never reach Visa-scale transaction throughput. Processing that volume of transactions would likely grind the crypto network to a dead stop.

However, with AI driving users to re-centralize their data in cloud nodes they control, it is not a far leap for those users to also install a cryptocurrency light node on their existing data nodes. If every user of a cryptocurrency can easily run their own light node, and the nodes are already connected on a P2P communication network, this ratio may well reach something more like 3:1 to 10:1 users to nodes, in which case the number of nodes naturally scales with the number of users on the network. If each consumer brings their own node to process their own and their community’s transactions, it becomes difficult to see a reason to pay Visa and Mastercard the roughly 3% fees they take for doing the same job. Ending these transaction fees remains a huge incentive for cryptocurrencies, and for the businesses that adopt them, to reach transaction scale. When Web3 reaches its heights, it is, I believe, extremely likely that transaction fees, and potentially brick-and-mortar banks, will be made obsolete.
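The ratios above translate into simple back-of-envelope arithmetic. Every figure here is illustrative, taken from the estimates in the text; the network size and per-user spend are hypothetical assumptions.

```python
# Back-of-envelope arithmetic for the node ratios and fees discussed above.
end_users = 100_000_000            # hypothetical network size

# Today: roughly one light node per 10,000-100,000 end users
nodes_today_low = end_users // 100_000   # worst case
nodes_today_high = end_users // 10_000   # best case

# If most users run their own node: 3:1 to 10:1 users per node
nodes_future_low = end_users // 10
nodes_future_high = end_users // 3

# Card-network fees avoided at a ~3% processing fee
annual_spend_per_user = 5_000      # hypothetical, in dollars
fees_avoided = end_users * annual_spend_per_user * 0.03
```

Even at the conservative end, the bring-your-own-node model multiplies the node count a thousandfold, which is why it changes the throughput picture so dramatically.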

Modern cryptography has long been a staple of Web2, but in Web3 its utility will be cranked all the way up. Without cryptography, there would be no P2P communication protocols, no distributed infrastructure, no decentralization or independent verification. Beyond the ingenious use of cryptographic proofs and mathematics that make Web3 remotely possible today, it’s clear that cryptocurrencies will also be central to the Web3 story.

There are many different types of cryptocurrencies, created for many different reasons and use cases. It can get confusing rather quickly; however, as Web3 progresses, clear winners and losers are likely to emerge as certain cryptocurrency attributes prove more desirable than others. We will use some for renting network infrastructure, others for validating and cross-referencing data, and others for cross-border transactions and for buying basic goods and services, preferably without transaction fees. We will use these cryptos only if they are better, cheaper, and more convenient than traditional fiat currencies. I believe that if Web3 does its job correctly and embeds iPhone-like intuitiveness into the user interfaces surrounding crypto use cases, cryptocurrencies will absolutely be more secure, cheaper, and more convenient than fiat currencies. Oftentimes you may find yourself using them without even realizing it, much as most people don’t realize they’re using the SWIFT network when they send an international bank transfer.

It could easily be said that Web3 is the frontend, or user interface, for crypto. Web3 does indeed need crypto, but I believe cryptocurrencies need Web3 to a much larger degree to reach their initially conceived potential.

Increased Security

Along with distributed infrastructure, decentralization, and independently verifiable communications comes inherently increased security. When sensitive user data is not stored in giant third-party silos, often coupled with millions of other users’ data in the same database, but is instead decentralized into thousands of independently owned and operated data centers, the targets get smaller and attacks less profitable. When company data is continually synced with eventual consistency and backed up by multiple hosting providers and decentralized storage providers, ransomware attacks become ineffective. Web3 architecture changes the economics of cyber attacks considerably. The infrastructure will still be vulnerable to many of the same kinds of attacks as today’s web, but the incentives and disincentives are very different, leaving users and their data in more secure and stable positions. When an attacker can’t spin up a million bots without considerable cost, can’t hack a million accounts without targeting a million independently operating nodes, and has greater incentives to add security to the network than to use its infrastructure for attacks, giant shifts in the cybersecurity landscape will take place.

For the majority of the lifetime of the Web, we have all been sending unencrypted data around the world, back and forth between servers owned by complete strangers. Even now, with SSL and encrypted data, almost no one owns and controls the private keys to their data. In that light, it’s safe to say Web3 will likely be an order of magnitude more secure than Web2.

Detractors may argue that inverting user data from large company silos back into personal data clusters, re-centralizing it into a unified graph, means that if hackers do manage to get through the additional barriers Web3 puts in place, and somehow, perhaps by brute force, decrypt a particular user’s data, the attacker will have not a segment of millions of users’ data but a holistic view of one or several users’ data, which could be considered more dangerous.

However, there are a few very important points to consider about this scenario. Firstly, Personal Data Clusters (PDCs) will contain multiple nodes by default, each hosted with a different datacenter provider. This allows for new layers of security, such as more robust multi-factor authentication safeguards that Web2 could not rely on without this individualized node architecture, and it is clear that as Web3 progresses, more new types of safeguards will become possible where they simply weren’t viable before. Perhaps more importantly, as mentioned previously, AI and digital-twin agent technology is very likely to lead to the re-centralization of user data anyway. In that scenario, the traditional Web2 model becomes perhaps an order of magnitude less secure, with millions of users suddenly storing holistic data graphs in the same data centers with the same handful of providers.

The only safe path in this case is to move to a Web3 architecture where each user’s data is decoupled from every other user’s, shrinking the attack surface for compromising any large number of users’ data and making attacks far more work- and resource-intensive. The ability of AI consumers to counter, or completely cut off, this scary impending issue will likely rest solely in Web3 solutions.

Global & Inter-Planetary Accessibility

As with Web2 and cryptocurrencies, Web3 will be a globally accessible system. Of course some infrastructure is needed to support it, but just as people in countries with high inflation are using Bitcoin as a kind of digital anti-inflationary mattress, utilizing infrastructure that exists throughout the rest of the world, so will people in developing countries be able to rent infrastructure hosted elsewhere to run their Web3 systems. Web3 is inherently built to be protocol-agnostic: whatever means you want to use to send signals back and forth, TCP/IP, IPFS, cell towers, satellites, even bouncing signals off the moon, Web3 will support it. This ensures that no matter where you are geographically, you can connect to this new web.

It is absolutely clear, however, that there are places Web2 simply can’t scale to, chiefly other planets. It is for this reason that the InterPlanetary File System (IPFS) was first devised. Web2 as we know it, built on location-based addressing, i.e. URLs, DNS, and IP addresses, is extremely inefficient. If everyone in a classroom in New York opens YouTube on their phone and requests the same video at the same time, dozens of requests are likely sent to the same datacenter, which may sit in another state, possibly California on the other side of the country. Thirty-six requests originating from the same room, hopping all the way across the country and back to serve the same content to each device in the class, is extremely wasteful, and it simply is not an option at planetary scale.

Web3, however, will utilize content addressing (described above), so in this scenario only one or two devices in the class must make the hop across the country and back for the content. Once those devices have downloaded it, even if each is only storing corresponding halves, the content can be stitched back together and distributed to the other students’ devices locally over the P2P network, drastically reducing the overhead bandwidth that Web2 makes mandatory today. This, along with additional efficiencies such as de-duplication and distributed sharding, makes Web3 the clear choice for the resource-limited realities of future space travel.
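The classroom scenario can be sketched concretely. This is a simplified stand-in for content addressing, not real IPFS: a bare SHA-256 digest plays the role of a CID (real CIDs also encode codec and hash metadata), and the chunking scheme is hypothetical.

```python
import hashlib

def content_address(data: bytes) -> str:
    """Derive an address from the content itself (a simplified stand-in
    for a real IPFS CID)."""
    return hashlib.sha256(data).hexdigest()

video = b"lecture video bytes" * 1000  # hypothetical content

# Split the content into chunks, as a P2P network would for sharding.
chunk_size = 4096
chunks = [video[i:i + chunk_size] for i in range(0, len(video), chunk_size)]

# Two classmates each hold corresponding halves of the chunks...
half = len(chunks) // 2
device_a, device_b = chunks[:half], chunks[half:]

# ...and any peer can stitch them back together and verify the result
# against the original content address, with no trust in either device.
reassembled = b"".join(device_a + device_b)
matches = content_address(reassembled) == content_address(video)
```

Because the address is derived from the bytes themselves, identical content always resolves to the same address regardless of who serves it, which is also what makes de-duplication fall out of the design for free.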

Distributed Storage & Compute

It’s easy to be captivated by the dream that all of your data will be broken into little pieces, distributed evenly across the network in an encrypted format that can be validated without being read, and easily reassembled when you need it. The truth is that today’s implementations are really just the first steps toward that dream. Often these systems operate much like traditional centralized systems, storing and distributing full copies across multiple nodes, with the added functionality of independently verifiable proofs that some action was taken, such as proof that a file was stored. One might find this needlessly wasteful, but in Web2 today most data is already backed up in more than one place; it’s just usually done in the same datacenter or by the same provider, with regularly spaced snapshots taken hourly, daily, or weekly. In the Web3 model, these backups still exist, but they are decentralized to completely different geographic locations and providers, and they are near real-time backups with eventual consistency. They can be utilized as a live backup right away in case of an outage, and they can be used to validate the other copies of your data, to make sure no errors have crept in and the data hasn’t been altered between servers.
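Cross-validating decentralized replicas can be sketched as a simple digest comparison. This is a minimal illustration under assumed names (the provider labels and majority-vote rule are hypothetical); real systems use Merkle proofs and consensus rather than a flat vote.

```python
import hashlib

def digest(replica: bytes) -> str:
    return hashlib.sha256(replica).hexdigest()

def validate_replicas(replicas: dict) -> dict:
    """Compare each provider's copy against the majority digest,
    flagging any replica that has drifted or been altered."""
    digests = {name: digest(data) for name, data in replicas.items()}
    # The digest held by the most replicas is taken as canonical.
    all_digests = list(digests.values())
    canonical = max(set(all_digests), key=all_digests.count)
    return {name: d == canonical for name, d in digests.items()}

# Hypothetical replicas held with three independent providers.
replicas = {
    "provider_us_east": b"user data graph v42",
    "provider_eu_west": b"user data graph v42",
    "provider_ap_south": b"user data graph v41",  # lagging or altered copy
}
status = validate_replicas(replicas)
```

With eventual consistency, a mismatch like the third replica above is expected to heal on its own; a mismatch that persists is the signal that a copy was corrupted or tampered with.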

Currently, to store something on Filecoin, you select specific Filecoin storage providers to store your files for you. If they’re sound, they will have two datacenter locations where they store the primary and backup copies of your files. To further decentralize your hosting, you should probably select more than one provider. Note that your files are never actually stored on the blockchain; that would be far too expensive and time-consuming. The blockchain simply acts as an independently verifiable cryptographic receipt that your files are being stored, and only upon the submission and validation of this proof are funds released to the provider. This is a much more consumer-friendly system than what we’re used to in Web2. It provides a level of trustlessness to the market: trust the independent network of nodes, the code, and the independently validated cryptographic proofs, but never trust the provider without this proof.
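The receipt idea can be illustrated with a toy challenge-response check. This is not Filecoin's actual mechanism; Filecoin uses proof-of-replication and proof-of-spacetime, which let verifiers check storage without holding the file. In this simplified sketch the client verifies against its own retained copy, and all names are hypothetical.

```python
import hashlib
import os

def storage_proof(file_bytes: bytes, challenge: bytes) -> str:
    """The provider proves it still holds the file by hashing it together
    with a fresh random challenge it could not have precomputed."""
    return hashlib.sha256(challenge + file_bytes).hexdigest()

# Client retains the file (or its digest) and issues a random challenge.
file_bytes = b"a user's backed-up photos"
challenge = os.urandom(32)

# Provider responds with the proof; only a matching proof releases payment.
proof = storage_proof(file_bytes, challenge)
expected = hashlib.sha256(challenge + file_bytes).hexdigest()
payment_released = proof == expected
```

The key property is the ordering: funds move only after a verifiable proof, rather than on trust in the provider's word, which is the "trustless market" the paragraph describes.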

Today, many aspects of this model that are required for it to function are not yet very decentralized in their current form. Add in certain countries, like China, dominating the mining space, and it’s easy to see that we are still in the very early days of the dream of decentralization, and therefore of Web3. But it is clear that if we ever wish to embed trustless markets, based on independently verified facts, into our societies, especially for digital services such as renting file storage, compute, and AI processing, where they allow far more transparent, safe, and fair marketplaces to exist, then we must continue on this path of decentralized and distributed infrastructure with Web3, or expect not much to improve.

Conclusion - The Inversion of Web2

I hope this overview has given you a broader sense of the vision of Web3, why I believe the inversion of the Web is already in motion, and what to expect in the coming years as the internet, and thus the foundations of our societies, evolves. It’s clear we’re in for a very challenging but exciting decade. As a Web3 architect, I can’t think of a better time to be building on the Web than right now. At the very least, I hope more people will start to ask questions like: Where is my data stored? And who owns my private keys? But most of all, I hope this inspires you to do your own independent research and validate these claims for yourself.

As Bitcoin maximalists like to say, “Not your private keys, not your Bitcoin.” I think the time has come for this to be expanded: not your keys, not your … anything.