⭐ Hightlights
-
Non transaccional database. It is an analitycal database.
-
Distributed database (support multiple machines with a lot of amount of data).
-
Column oriented database.
Use cases
- Real-time analytics.
- Logs, events, and traces.
ANSI SQL vs. ClickHouse SQL
- Syntax is nearly identical (ClickHouse provides addicional data types and many special functions).
- Optimice queries for ClickHouse (only SELECT the columns you need and take advantage of the columnar storage).
Row vs Column Oriented Databases
🔗 Row vs Column Oriented Databases
Understanding columnar storage


Read from parquet file
ClickHouse can read from parquet files.
SELECT count()
FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/pypi/2023/pypi_0_7_34.snappy.parquet');
MergeTree
Insert operations create table parts which are merged by a background process with other table parts. This system allows ClickHouse to efficiently handle large amounts insertions of data.

Query for inspecting active parts of a table:
select * from system.parts
where active=1
and table = 'uk_price_paid';

SharedMergeTree engine is only available on the cloud. To find it on the docs you should go to 'Cloud' section. SharedMergeTree docs.
Data Types
System table system.data_type_families provides an overview of all available data types.
LowCardinality(T)
Changes the data type of a column to a dictionary of values. It works with String, FixedString, Date, DateTime, and numbers excepting Decimal.
The efficiency of using LowCardinality data type depends on data diversity. If a dictionary contains less than 10,000 distinct values
Primary Keys


Choose the columns that are most likely to be used in queries for a primary key. The first column in the primary key is the most important, so the order of the columns is important. Put the column with the lowest cardinality first (don't worry about low cardinality and date. Date is a great thing to put in a primary key).
Choose message as a primary key it's a bad idea. Arbitrary strings are terrible inside a primary key and you probably never filter by a long string:


There's no concept of a key, like a foreign key. It's not a relational database.
You're not going to do a bunch of joins. You have to flatten the data when you add it to ClickHouse.
You can do joins in ClickHouse, but that's not the typical thing to do every day.
🔗 Course. ClickHouse Developer On-demand: Module 9. Joining data
Interting data

Move from self-managed to ClickHouse Cloud
Use remoteSecure function.
