GitHub selects Lithium to run its community forums; Discourse dev is none too pleased

Digitalpoint

Brain
Joined
Aug 29, 2012
Messages
309
For people who are stuck on traditional databases (MySQL, PostgreSQL, SQL Server, etc.) and think you have to have one beefy "master" database server, you should do some reading on MySQL Cluster (it's not MySQL... at all).



It's designed to do tens of millions of queries per second (including writes), with no single point of failure. You could literally unplug servers with no downtime, and upgrading server software requires no downtime either. There is no master/slave replication (and no replication lag) because every node is fully write-capable (in my setup, the DB server IP is just "localhost" for both reads and writes, regardless of which physical server handles it).

Your queries-per-second bottleneck will simply be the bandwidth you have available between your servers. In my case, my servers are connected with 56Gbit InfiniBand, which allows my setup to do somewhere around 40M queries per second.

Again, MySQL Cluster is *not* MySQL, in the sense that it doesn't use the traditional storage engines (MyISAM, InnoDB, etc.), which were never designed to scale like this. It's used for things like call logging at phone companies, where you have a ton of data coming in and a single second of downtime or lag isn't an option. Or, in my case, I use it for everything. Lol
 

we_are_borg

Tazmanian
Joined
Jan 25, 2011
Messages
5,964
Thank you for the information, Digitalpoint.

I have another question for you: can you tell us how you do your load balancing? What changed after the interview? I ask this because of another topic where you were tagged.
 

Digitalpoint

Brain
Joined
Aug 29, 2012
Messages
309
Thank you for the information, Digitalpoint.

I have another question for you: can you tell us how you do your load balancing? What changed after the interview? I ask this because of another topic where you were tagged.
Load balancing of what?

Database queries and Elasticsearch IPs are simply "localhost". All nodes can do everything. Most HTTP requests are PHP, which is handled by PHP-FPM. Nginx handles load balancing of php-fpm via its upstream directive (see this). From nginx.conf:

Code:
        upstream php_cluster {
                server 192.168.10.20:9000       max_fails=3;
                server 192.168.10.21:9000       max_fails=3;
                server 192.168.10.22:9000       max_fails=3;
                server 192.168.10.23:9000       max_fails=3;
                server 192.168.10.24:9000       max_fails=3;
                server 192.168.10.25:9000       max_fails=3;
                server 192.168.10.26:9000       max_fails=3;
                server 192.168.10.27:9000       max_fails=3;
        }

fastcgi_pass then uses php_cluster as its "address" when you define websites within Nginx.
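For example, a site definition wired to that upstream looks roughly like this (the domain and paths here are placeholders for illustration, not my actual config):

Code:
```nginx
server {
    listen 80;
    server_name example.com;       # placeholder domain
    root /var/www/example;         # placeholder docroot

    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        # nginx picks a live php-fpm node from the upstream block
        fastcgi_pass php_cluster;
    }
}
```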

All servers are running Nginx, although only one is actively serving requests at any point. There's a process that checks the health of the active web server and its ability to properly serve requests. If that process deems the web server broken or too slow, it changes the active web server via a Cloudflare API call that alters where web requests are routed (it's a Cloudflare routing entry, not a DNS entry, so TTL isn't an issue... the IP still resolves to Cloudflare). It's never had to kick in and roll over to a different web server unexpectedly, but it keeps checking constantly all the same. :)

I also do some trickery with systemd... specifically, I have an ExecStop directive for the Nginx process that calls my own script when the web server service is being shut down. It basically hits the Cloudflare API to see if the server that is shutting down Nginx is currently the one taking requests. If it is, it changes the server handling requests before the service is shut down. Long story short: if you accidentally shut down Nginx on the wrong server, it handles itself appropriately.

Code:
ExecStop=/home/bin/cloudflare.php omitself
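The decision that script makes boils down to something like this sketch (the two functions are stand-ins for the real Cloudflare API calls, stubbed out here so the logic is visible; the IP is made up):

Code:
```shell
#!/bin/sh
# Sketch of the "omitself" logic. The functions below are stubs; the
# real script makes Cloudflare API calls instead.
MY_IP="192.168.10.20"

active_web_server() {
    # Real version: ask the Cloudflare API which origin IP traffic routes to
    echo "192.168.10.20"
}

reroute_traffic() {
    # Real version: Cloudflare API call to point traffic at a healthy peer
    echo "rerouted"
}

if [ "$(active_web_server)" = "$MY_IP" ]; then
    # This node is taking requests: move traffic before Nginx stops
    reroute_traffic
else
    echo "not active; safe to stop"
fi
```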

Once upon a time, I had hardware load balancers, but all they really did was introduce a small amount of latency into the network chain. Now all servers can do everything, and front-end web server "load balancing" is done via Cloudflare routing (and that web server spreads the underlying PHP requests around to whatever servers are up and available via the upstream directive).

Side note - I set up my own server health checks and handling of a failed server via the Cloudflare API before Cloudflare offered their Load Balancing service. You could let Cloudflare handle the server health checks and failover, but then you have to pay for your DNS queries. I didn't see an upside other than added cost for me. https://support.cloudflare.com/hc/en-us/articles/115005254367-Billing-for-Load-Balancing (we have about 2.2M DNS queries per day, which would result in a ~$60/month charge for Cloudflare's load balancing and failover setup... and they only check server health every 60 seconds, while I'm doing it myself every 30 seconds for free).
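For what it's worth, the back-of-the-envelope math on that (the $0.50 per 500k queries rate is my recollection of Cloudflare's query pricing at the time, so treat it as an assumption):

Code:
```shell
#!/bin/sh
# Rough check of the DNS-query billing estimate. The per-500k rate
# ($0.50, i.e. half a dollar per block) is an assumption from memory.
queries_per_month=$((2200000 * 30))      # 66000000 queries
blocks=$((queries_per_month / 500000))   # 132 blocks of 500k
cost=$((blocks / 2))                     # $0.50 per block -> dollars
echo "$cost"                             # 66, i.e. the ~$60/month ballpark
```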
 
Last edited:

Digitalpoint

Brain
Joined
Aug 29, 2012
Messages
309
For those who want to get their feet wet with MySQL Cluster, you basically have 3 types of nodes: data nodes, SQL nodes and management nodes. Data nodes are where all your data is sharded/stored, management nodes manage all the other nodes and tell them how to interact with each other, and SQL nodes are what you would traditionally use as MySQL. Queries go through the SQL nodes... but on the backend, they go out to the data nodes to do their thing, rather than looking at local data.
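To make that concrete, a minimal management-node config.ini for a small cluster might look something like this (IPs, node IDs and memory sizes are made up for illustration; check the MySQL Cluster manual for the full option list):

Code:
```ini
[ndbd default]
NoOfReplicas=2           # each table fragment lives on 2 data nodes
DataMemory=2G            # in-memory row storage
IndexMemory=256M         # in-memory hash indexes

[ndb_mgmd]
NodeId=1
HostName=192.168.0.10    # management node

[ndbd]
NodeId=11
HostName=192.168.0.11    # data node

[ndbd]
NodeId=12
HostName=192.168.0.12    # data node

[mysqld]
NodeId=21
HostName=192.168.0.11    # SQL node (mysqld with ndbcluster enabled)
```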

Logging into the management node and running the "show" command will show you all the nodes that are online:

Code:
twin1:~ # ndb_mgm
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]    8 node(s)
id=11    @192.168.10.20  (mysql-5.7.20 ndb-7.5.8, Nodegroup: 0)
id=12    @192.168.10.21  (mysql-5.7.20 ndb-7.5.8, Nodegroup: 1)
id=13    @192.168.10.22  (mysql-5.7.20 ndb-7.5.8, Nodegroup: 2)
id=14    @192.168.10.23  (mysql-5.7.20 ndb-7.5.8, Nodegroup: 3)
id=15    @192.168.10.24  (mysql-5.7.20 ndb-7.5.8, Nodegroup: 0, *)
id=16    @192.168.10.25  (mysql-5.7.20 ndb-7.5.8, Nodegroup: 1)
id=17    @192.168.10.26  (mysql-5.7.20 ndb-7.5.8, Nodegroup: 2)
id=18    @192.168.10.27  (mysql-5.7.20 ndb-7.5.8, Nodegroup: 3)

[ndb_mgmd(MGM)]    2 node(s)
id=1    @192.168.10.20  (mysql-5.7.20 ndb-7.5.8)
id=2    @192.168.10.21  (mysql-5.7.20 ndb-7.5.8)

[mysqld(API)]    33 node(s)
id=21    @192.168.10.20  (mysql-5.7.20 ndb-7.5.8)
id=22    @192.168.10.21  (mysql-5.7.20 ndb-7.5.8)
id=23    @192.168.10.22  (mysql-5.7.20 ndb-7.5.8)
id=24    @192.168.10.23  (mysql-5.7.20 ndb-7.5.8)
id=25    @192.168.10.24  (mysql-5.7.20 ndb-7.5.8)
id=26    @192.168.10.25  (mysql-5.7.20 ndb-7.5.8)
id=27    @192.168.10.26  (mysql-5.7.20 ndb-7.5.8)
id=28    @192.168.10.27  (mysql-5.7.20 ndb-7.5.8)
id=31    @192.168.10.20  (mysql-5.7.20 ndb-7.5.8)
id=32    @192.168.10.21  (mysql-5.7.20 ndb-7.5.8)
id=33    @192.168.10.22  (mysql-5.7.20 ndb-7.5.8)
id=34    @192.168.10.23  (mysql-5.7.20 ndb-7.5.8)
id=35    @192.168.10.24  (mysql-5.7.20 ndb-7.5.8)
id=36    @192.168.10.25  (mysql-5.7.20 ndb-7.5.8)
id=37    @192.168.10.26  (mysql-5.7.20 ndb-7.5.8)
id=38    @192.168.10.27  (mysql-5.7.20 ndb-7.5.8)
id=41    @192.168.10.20  (mysql-5.7.20 ndb-7.5.8)
id=42    @192.168.10.21  (mysql-5.7.20 ndb-7.5.8)
id=43    @192.168.10.22  (mysql-5.7.20 ndb-7.5.8)
id=44    @192.168.10.23  (mysql-5.7.20 ndb-7.5.8)
id=45    @192.168.10.24  (mysql-5.7.20 ndb-7.5.8)
id=46    @192.168.10.25  (mysql-5.7.20 ndb-7.5.8)
id=47    @192.168.10.26  (mysql-5.7.20 ndb-7.5.8)
id=48    @192.168.10.27  (mysql-5.7.20 ndb-7.5.8)
id=51    @192.168.10.20  (mysql-5.7.20 ndb-7.5.8)
id=52    @192.168.10.21  (mysql-5.7.20 ndb-7.5.8)
id=53    @192.168.10.22  (mysql-5.7.20 ndb-7.5.8)
id=54    @192.168.10.23  (mysql-5.7.20 ndb-7.5.8)
id=55    @192.168.10.24  (mysql-5.7.20 ndb-7.5.8)
id=56    @192.168.10.25  (mysql-5.7.20 ndb-7.5.8)
id=57    @192.168.10.26  (mysql-5.7.20 ndb-7.5.8)
id=58    @192.168.10.27  (mysql-5.7.20 ndb-7.5.8)
id=59 (not connected, accepting connect from any host)

You can get reports on things like memory usage for your data nodes (important, since by default everything is stored in-memory... you can't do 40M queries per second if you're thrashing around reading stuff from disk, now can you? :))

Code:
ndb_mgm> all report mem
Node 11: Data usage is 44%(1464363 32K pages of total 3276800)
Node 11: Index usage is 39%(515696 8K pages of total 1310848)
Node 12: Data usage is 44%(1471889 32K pages of total 3276800)
Node 12: Index usage is 39%(516025 8K pages of total 1310848)
Node 13: Data usage is 45%(1478866 32K pages of total 3276800)
Node 13: Index usage is 39%(516457 8K pages of total 1310848)
Node 14: Data usage is 45%(1477532 32K pages of total 3276800)
Node 14: Index usage is 39%(516417 8K pages of total 1310848)
Node 15: Data usage is 45%(1477744 32K pages of total 3276800)
Node 15: Index usage is 39%(516148 8K pages of total 1310848)
Node 16: Data usage is 44%(1471882 32K pages of total 3276800)
Node 16: Index usage is 39%(516029 8K pages of total 1310848)
Node 17: Data usage is 44%(1465363 32K pages of total 3276800)
Node 17: Index usage is 39%(515997 8K pages of total 1310848)
Node 18: Data usage is 45%(1477527 32K pages of total 3276800)
Node 18: Index usage is 39%(516424 8K pages of total 1310848)

The normal mysql command looks mostly the same... you just have an extra always-running process that handles ndbcluster events...

Code:
twin1:~ # mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 66529224
Server version: 5.7.20-ndb-7.5.8-cluster-gpl-log MySQL Cluster Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show processlist;
+----------+-------------+-----------+------+---------+------+-----------------------------------+------------------+
| Id       | User        | Host      | db   | Command | Time | State                             | Info             |
+----------+-------------+-----------+------+---------+------+-----------------------------------+------------------+
|        1 | system user |           |      | Daemon  |    0 | Waiting for event from ndbcluster | NULL             |
| 66529224 | root        | localhost | NULL | Query   |    0 | starting                          | show processlist |
+----------+-------------+-----------+------+---------+------+-----------------------------------+------------------+
2 rows in set (0.00 sec)
 

eva2000

Habitué
Joined
Jan 11, 2004
Messages
1,830
Your queries-per-second bottleneck will simply be the bandwidth you have available between your servers. In my case, my servers are connected with 56Gbit InfiniBand, which allows my setup to do somewhere around 40M queries per second.
For high availability, MySQL Cluster is definitely worth it if you can do it and afford it, especially at current high memory prices! These days scaling MySQL has improved, though; Intel Optane SSDs are crazy accelerators for that sort of thing. Of course, that's provided you have the network capacity/throughput for it to scale.
 

Karll

Adherent
Joined
Dec 9, 2011
Messages
452
How many writes per second does forum software actually need?

MySQL Cluster, or to be more specific, MySQL Ndb Cluster (since there is now also a MySQL InnoDB Cluster), is great if you really need write scaling managed automatically by the database side.

However, there are a number of drawbacks/limitations with this solution (see e.g. the documentation), one of them being the larger number of nodes needed.

If your budget is more modest, and your requirements are high availability and read scaling, then you're probably better off with a Galera-based cluster (MariaDB Cluster, Percona XtraDB Cluster or Galera cluster), or the new MySQL InnoDB Cluster. If you need access to support, but can't afford the extortionate prices charged by Oracle (owner of MySQL), then again a Galera-based solution may be better.

Edit: A typical MariaDB Cluster requires 3 nodes for proper high availability.
 
Last edited:

Digitalpoint

Brain
Joined
Aug 29, 2012
Messages
309
Ya, it's probably overkill for your average forum. I run a lot of non-forum stuff on the same database cluster.

As far as cost, one could certainly buy a support contract if they wanted (just like you can with the normal version of MySQL), but I've never done that myself. So the software cost is $0. You would need 2 data nodes (ideally on different physical servers, but that's not an absolute requirement)... at least if you want data redundancy. You could run it with 1 data node, but that would be rather pointless... probably better to use normal MySQL in that case.

By default all data and indexes are memory resident... that's where the cost will be if you don't own your own hardware; I suspect "renting" memory on a monthly basis can get expensive.

That being said, I suspect Github could afford 2 servers with lots of memory.
 

Karll

Adherent
Joined
Dec 9, 2011
Messages
452
Ya, it's probably overkill for your average forum. I run a lot of non-forum stuff on the same database cluster.

As far as cost, one could certainly buy a support contract if they wanted (just like you can with the normal version of MySQL), but I've never done that myself. So the software cost is $0. You would need 2 data nodes (ideally on different physical servers, but that's not an absolute requirement)... at least if you want data redundancy. You could run it with 1 data node, but that would be rather pointless... probably better to use normal MySQL in that case.

By default all data and indexes are memory resident... that's where the cost will be if you don't own your own hardware; I suspect "renting" memory on a monthly basis can get expensive.

That being said, I suspect Github could afford 2 servers with lots of memory.
With only 2 nodes you don't get performance benefits (TPS, throughput) over Galera-based clusters; see e.g. the Pythian blog: benchmarking NDB vs Galera. That seems to indicate you need about 6 NDB nodes to see a benefit...! (And if you want a support contract, that seems to be $39K per year for 6 nodes, which I'm sure wouldn't be a problem for Github, but might be harder to justify more generally.)

Let's face it, MySQL NDB Cluster was a great technology when it was first released, but at this point it seems like a dead-end. Why else would MySQL develop InnoDB Cluster when they already have NDB Cluster? It doesn't really make sense. And why are no recent big names using NDB Cluster? (See customer list.)
 

eva2000

Habitué
Joined
Jan 11, 2004
Messages
1,830

Karll

Adherent
Joined
Dec 9, 2011
Messages
452

eva2000

Habitué
Joined
Jan 11, 2004
Messages
1,830
Good point, these DB proxies are excellent products (I use MaxScale myself, though considering ProxySQL as well), and can help achieve high(er)-availability for both Galera-based clusters as well as master-slave replication configurations. I suppose it can also help with write scaling if combined with sharding.
Yeah ProxySQL has native clustering too now

ProxySQL Query Cache
MaxScale vs ProxySQL
 

Digitalpoint

Brain
Joined
Aug 29, 2012
Messages
309
With only 2 nodes you don't get performance benefits (TPS, throughput) over Galera-based clusters; see e.g. the Pythian blog: benchmarking NDB vs Galera. That seems to indicate you need about 6 NDB nodes to see a benefit...! (And if you want a support contract, that seems to be $39K per year for 6 nodes, which I'm sure wouldn't be a problem for Github, but might be harder to justify more generally.)

Let's face it, MySQL NDB Cluster was a great technology when it was first released, but at this point it seems like a dead-end. Why else would MySQL develop InnoDB Cluster when they already have NDB Cluster? It doesn't really make sense. And why are no recent big names using NDB Cluster? (See customer list.)
Well, a couple of things... the benchmark you reference is from 2012, which means they could have been using MySQL Cluster 7.1 at best. 7.2 was the first version that was really built for speed, and every version after that got quite a bit faster.

7.2 was first to reach over a billion queries per minute: http://mikaelronstrom.blogspot.co.uk/2012/05/challenges-in-reaching-1bn-reads-and.html

7.3 gave it 8.5x more transactions per minute vs. 7.2 (which was already a ton faster than 7.1)

(attached chart: MySQL Cluster 7.3 vs. 7.2 benchmark results)

7.4 gave it a 50% performance boost over 7.3, etc...

So referencing some benchmark from more than half a decade ago isn't really that interesting. I'd be interested to see a similar comparison with current versions of both Galera and MySQL Cluster, though.

Not sure why you keep referencing a support contract for MySQL Cluster. It's open source, free software. If you want, you can pay anyone $100k for a support contract for anything if you have money to burn. I've never had a MySQL support contract, nor is it required for licensing or anything else. If you read up on it, it's really only needed if you have a completely non-technical person managing your database servers, at which point you'd be better off taking the support contract cost for any DBMS and hiring someone qualified to run a database server. :)

As far as ndbcluster being a "dead end" because MySQL also develops InnoDB Cluster... well, that's just silly. You know that no single storage engine is best for everyone, right? By that logic, InnoDB would also be a dead storage engine because MySQL also has the MEMORY/HEAP storage engine.

MySQL is the first to say that no storage engine is best suited for all workloads. There's some good reading about InnoDB Cluster vs. NDB Cluster in the MySQL manual:

https://dev.mysql.com/doc/refman/5.7/en/mysql-cluster-compared.html
https://dev.mysql.com/doc/refman/5.7/en/mysql-cluster-ndb-innodb-engines.html
https://dev.mysql.com/doc/refman/5.7/en/mysql-cluster-ndb-innodb-workloads.html
https://dev.mysql.com/doc/refman/5.7/en/mysql-cluster-ndb-innodb-usage.html

They are two totally different storage engines and MySQL isn't dropping one for the other.

There are definitely some things NDB does not do well, from my own experience: full table scans of very large tables, and NDB doesn't support FULLTEXT indexes. Neither is an issue for my applications, though, because I don't want to be doing full table scans of huge tables with *any* storage engine, and FULLTEXT indexes on any DBMS really end up being patchwork... they work, but they're not that great, because DBMSs were never designed at the core to do those types of queries... they excel at pulling records based on a key. For full-text document searching, you're always going to be better off using something designed for document searching (Sphinx, Elasticsearch, etc.)

And there are definitely things NDB is always going to be stronger at vs. anything based on InnoDB (write-heavy applications, for example). There's a limit on how fast you can write data to disk for disk-based tables. So for something with a LOT of writing... say, more than a million SQL UPDATEs/INSERTs per second... any disk-based database is going to fall over and die, even if you have the fastest SSDs money can buy.

If you skim that MySQL developer's blog I linked to earlier (about the billion transactions per minute), you get the general feel that a lot of what he talks about is a pipe dream for disk-based DBMSs like InnoDB.

Mikael Ronstrom said:
With a row size of 250 bytes there will be 50 billion rows in the database. I often assume that the checkpoint takes about 5 minutes in my modelling. This means that even with 1 million writes per second less than 1% of the data in the database is updated during a checkpoint.

A more normal update speed would be e.g. 100.000 writes per second. In this case each checkpoint will write 10 Mbytes of rows per second and thus about 6 GBytes is written in 5 minutes. This represents the Delta, the changed records.

I'd like to see InnoDB devs talking about any version of it handling 1M writes per second or even a "more normal" load of only 100,000 writes per second. :)

Either way... long story short, ndbcluster isn't going away... in fact, the upcoming 7.6 version has a lot of stuff I'm really looking forward to. But I'll also be the first to admit it's not for everyone... not by a long shot.
 

SAFAD

Developer
Joined
Aug 3, 2011
Messages
61
Digitalpoint That's some very informative stuff, thank you!

However, I would like to point out something many here are missing: TIME.
I am being paid a quarter of what Github pays Lithium (monthly, of course) to build a "high performance, storage- and code-scalable" platform (NDA here, sorry) in Golang + PostgreSQL (which I chose over all the other options out there) that most likely is, and will be, handling at least 1/20 of the load Github faces, concurrently speaking!

So basically, for all that stuff people call "easy" and say there are efficient "free" solutions for... let's be honest here: people don't have the time, and if they do, it costs plenty to maintain that kind of beast (speaking from my own experience only, let alone bigger companies with WAY MORE EXPERIENCED staff).

No one crazy enough to be on a fixed salary would agree to such a task; it's a monstrosity that literally made me rethink my career decisions and think maybe it's better to just open a restaurant and be done with the headaches. That is basically the conclusion Github's management reached, and I support them for it (you got money? Spend it on something useful rather than keeping your developers busy with something futile, wasting their time instead of building something innovative).
 

Karll

Adherent
Joined
Dec 9, 2011
Messages
452
Well, a couple of things... the benchmark you reference is from 2012, which means they could have been using MySQL Cluster 7.1 at best. 7.2 was the first version that was really built for speed, and every version after that got quite a bit faster.
7.2 went GA on 15 Feb 2012; the Pythian blog entry is from 10 Dec 2012. If they were benchmarking against 7.1, that would seem quite unfair. Anyway, I only cited that benchmark because it was the only Galera vs. NDB benchmark I could find.

Not sure why you keep referencing a support contract for MySQL Cluster.
Because that is the way some (most?) companies operate: they want someone to blame (someone with money) if things go wrong. (That may not be relevant in the Github forum case, though.)

I'd like to see InnoDB devs talking about any version of it handling 1M writes per second or even a "more normal" load of only 100,000 writes per second. :)
You could do 1M QPS with Galera back in 2015. Writes are another story, of course, but here's an article about using 15 m1.xlarge nodes (AWS cloud) with MySQL 5.5, InnoDB and the ParElastic Database Virtualization Engine to achieve just over 1M writes per second:
In a benchmark, a ParElastic system running in the Amazon Cloud was demonstrated ingesting in excess of a million rows of data a second. The details of this benchmark are provided below. The application simulated the loading of a continuous stream of data into a single virtual table exposed by the ParElastic Database Virtualization Engine. The MySQL servers were configured to store the data using the InnoDB storage engine.
Ingesting over 1,000,000 rows a second into MySQL in AWS Cloud (9 Jan 2017)
 

Digitalpoint

Brain
Joined
Aug 29, 2012
Messages
309
Ya... writing is always going to be the bottleneck with disk-based database systems. High throughput can always be achieved with expensive/complicated/exotic setups on most any database engine if you must do it. It's just that MySQL Cluster was designed to do it simply, on cheap commodity hardware. My setup can do 40M SQL reads per second *while* also doing 6M SQL writes per second, on an 8-node setup (though it could really be done with 4, since 4 of the nodes are just for data duplication/redundancy). The servers are 5 years old and don't even have SSDs. It also has the advantage of not needing to do anything at the application level for stuff like query routing or failover recovery. Things like XenForo, WordPress and other applications inherently get full advantage of all the good stuff without any changes (sub-second failover so end users don't notice a node failure if there is one, no need to figure out which server to send writes to, no replication lag, full ACID compliance, hot backups that don't lock any tables, the ability to upgrade server software without taking anything down, etc.)

Like I said previously, not every setup is going to be ideal for every type of application. MySQL Cluster sucks at full table scans (unindexed searches) for example.

However, it can also be really, really, really good and easy.
 

R0binHood

Habitué
Joined
Nov 23, 2011
Messages
1,606
Just got a newsletter from the Github community forum.

Looks like they're on Lithium now, and it actually looks pretty good. It's at least styled better than some of the other Lithium sites I've seen.

https://github.community/
 

Alex.

The Ancient Dragon
Joined
Jul 1, 2007
Messages
11,568
Thanks for bumping. No, truly, thank you. I suspect that at the time of their switching over, Microsoft had already sealed the deal on acquiring them. Whatever the monthly fee, it's nothing for Microsoft.
 

R0binHood

Habitué
Joined
Nov 23, 2011
Messages
1,606
I don't remember actually seeing it on Lithium when we talked about it last year. Maybe it was; I can't remember.

All I know is I got a newsletter email today from them and I thought it looked rather good compared to any other Lithium site I've ever visited.
 