Blog

  • avro-sql

    Avro-Sql

    This library lets you transform the shape of an Avro record using SQL. It relies on Apache Calcite for SQL parsing.

    import AvroSql._
    val record: GenericRecord = {...}
    record.scql("SELECT name, address.street.name as streetName")

    As simple as that!

    Let’s say we have the following Avro Schema:

    {
      "type": "record",
      "name": "Pizza",
      "namespace": "com.landoop.sql.avro",
      "fields": [
        {
          "name": "ingredients",
          "type": {
            "type": "array",
            "items": {
              "type": "record",
              "name": "Ingredient",
              "fields": [
                {
                  "name": "name",
                  "type": "string"
                },
                {
                  "name": "sugar",
                  "type": "double"
                },
                {
                  "name": "fat",
                  "type": "double"
                }
              ]
            }
          }
        },
        {
          "name": "vegetarian",
          "type": "boolean"
        },
        {
          "name": "vegan",
          "type": "boolean"
        },
        {
          "name": "calories",
          "type": "int"
        },
        {
          "name": "fieldName",
          "type": "string"
        }
      ]
    }

    Using the library, one can apply two types of queries:

    • to flatten it
    • to retain the structure while cherry-picking and/or renaming fields

    The difference between the two is marked by the withstructure keyword: if it is missing, the structure is flattened.

    Let’s take a look at flattening first. There are cases when you receive a nested Avro structure and want to flatten it while still being able to cherry-pick and rename fields. Imagine we have the following Avro schema:

    {
      "type": "record",
      "name": "Person",
      "namespace": "com.landoop.sql.avro",
      "fields": [
        {
          "name": "name",
          "type": "string"
        },
        {
          "name": "address",
          "type": {
            "type": "record",
            "name": "Address",
            "fields": [
              {
                "name": "street",
                "type": {
                  "type": "record",
                  "name": "Street",
                  "fields": [
                    {
                      "name": "name",
                      "type": "string"
                    }
                  ]
                }
              },
              {
                "name": "street2",
                "type": [
                  "null",
                  "Street"
                ]
              },
              {
                "name": "city",
                "type": "string"
              },
              {
                "name": "state",
                "type": "string"
              },
              {
                "name": "zip",
                "type": "string"
              },
              {
                "name": "country",
                "type": "string"
              }
            ]
          }
        }
      ]
    }
    

    Applying this SQL-like syntax

    SELECT 
        name, 
        address.street.*, 
        address.street2.name as streetName2 
    FROM topic
    

    the projected new schema is:

    {
      "type": "record",
      "name": "Person",
      "namespace": "com.landoop.sql.avro",
      "fields": [
        {
          "name": "name",
          "type": "string"
        },
        {
          "name": "name_1",
          "type": "string"
        },
        {
          "name": "streetName2",
          "type": "string"
        }
      ]
    }
    

    There are scenarios where you might want to rename fields and perhaps reorder them. Applying this SQL-like syntax to the Pizza schema

    SELECT 
           name, 
           ingredients.name as fieldName, 
           ingredients.sugar as fieldSugar, 
           ingredients.*, 
           calories as cals 
    withstructure
    

    we end up projecting the first structure into this one:

    {
      "type": "record",
      "name": "Pizza",
      "namespace": "com.landoop.sql.avro",
      "fields": [
        {
          "name": "name",
          "type": "string"
        },
        {
          "name": "ingredients",
          "type": {
            "type": "array",
            "items": {
              "type": "record",
              "name": "Ingredient",
              "fields": [
                {
                  "name": "fieldName",
                  "type": "string"
                },
                {
                  "name": "fieldSugar",
                  "type": "double"
                },
                {
                  "name": "fat",
                  "type": "double"
                }
              ]
            }
          }
        },
        {
          "name": "cals",
          "type": "int"
        }
      ]
    }

    Flatten rules

    • you can’t flatten a schema containing array fields
    • when flattening, if a column name has already been used, an index is appended. For example, if the field name appears twice and you don’t explicitly rename the second instance (name as renamedName), the new schema will contain name and name_1

    How to use it

    import AvroSql._
    val record: GenericRecord = {...}
    record.scql("SELECT name, address.street.name as streetName")

    As simple as that!

    Query Examples

    You can find more examples in the unit tests; here are a few:

    • flattening
    //rename and only pick fields on first level
    SELECT calories as C ,vegan as V ,name as fieldName FROM topic
    
    //Cherry pick fields on different levels in the structure
    SELECT name, address.street.name as streetName FROM topic
    
    //Select and rename fields on nested level
    SELECT name, address.street.*, address.street2.name as streetName2 FROM topic
    
    • retaining the structure
    //you can select the record itself - obviously no real gain in this
    SELECT * FROM topic withstructure 
    
    //rename a field 
    SELECT *, name as fieldName FROM topic withstructure
    
    //rename a complex field
    SELECT *, ingredients as stuff FROM topic withstructure
    
    //select a single field
    SELECT vegan FROM topic withstructure
    
    //rename and only select nested fields
    SELECT ingredients.name as fieldName, ingredients.sugar as fieldSugar, ingredients.* FROM topic withstructure
    
    
    

    Release Notes

    0.1 (2017-05-03)

    • first release

    Building

    Requires Gradle 3.4.1 to build.

    To build

    gradle compile

    To test

    gradle test

    You can also use the gradle wrapper

    ./gradlew build
    

    To view dependency trees

    gradle dependencies
    
    Original repository: https://github.com/lensesio/avro-sql
  • mbit-m08-dc02-nlp

    NLP (NATURAL LANGUAGE PROCESSING) EXERCISE

    Carlos Alfonsel (carlos.alfonsel@mbitschool.com)

    1. Exploratory Data Analysis (EDA)

    • Importing the libraries and the dataset.
    • Study and graphical representation of the 8 classes: class-balance analysis.

    2. Text Cleaning

    The clean_text() function removes numbers and punctuation marks and converts every word to lowercase:


    import re
    import string
    import spacy

    # assumed Spanish spaCy model; substitute the model actually used in the notebook
    nlp = spacy.load("es_core_news_sm")

    pattern = re.compile('[{}]'.format(re.escape(string.punctuation)))

    def clean_text(doc):
        doc = re.sub(r'\d+', '', doc)  # drop digits before tokenizing
        tokens = nlp(doc)
        tokens = [tok.lower_ for tok in tokens if not tok.is_punct and not tok.is_space]
        filtered_tokens = [pattern.sub('', token) for token in tokens]
        filtered_text = ' '.join(filtered_tokens)
        return filtered_text
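
    For instance (illustrative input; the exact output depends on the tokenizer):

    print(clean_text("¡Hola! Tenemos 3 clases nuevas."))
    # -> 'hola tenemos clases nuevas'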


    3. Auxiliary Function Definitions

    The bow_extractor() and tfidf_extractor() functions vectorize the text corpus passed as a parameter:


    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

    def bow_extractor(corpus, ngram_range=(1, 1), min_df=1, max_df=1.0):
        # Bag-of-Words features; the parameters are forwarded instead of hard-coded
        vectorizer = CountVectorizer(ngram_range=ngram_range, min_df=min_df, max_df=max_df)
        features = vectorizer.fit_transform(corpus)
        return vectorizer, features

    def tfidf_extractor(corpus, ngram_range=(1, 1), min_df=1, max_df=1.0):
        # TF-IDF features with the same interface as bow_extractor
        vectorizer = TfidfVectorizer(ngram_range=ngram_range, min_df=min_df, max_df=max_df)
        features = vectorizer.fit_transform(corpus)
        return vectorizer, features
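
    Both share the same calling convention; a quick illustration with a toy corpus:

    corpus = ["primer documento de ejemplo", "segundo documento"]
    bow_vectorizer, bow_features = bow_extractor(corpus)
    tfidf_vectorizer, tfidf_features = tfidf_extractor(corpus)
    print(bow_features.shape)  # (number of documents, vocabulary size)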


    4. Splitting the Dataset for Training and Validation

    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(datos['Observaciones'], datos['Tipología'], test_size=0.3, random_state=0)

    5. Classification Algorithms

    In this section we apply the following models to our data: Logistic Regression, Multinomial Naive Bayes, and Linear SVM, with the following results in terms of accuracy:

    Using BoW (Bag-of-Words) features:
    LGR: 0.61
    MNB: 0.58
    SVM: 0.56

    Using TF-IDF features:
    LGR: 0.55
    MNB: 0.47
    SVM: 0.64

    Optimizing the Linear SVM model with TF-IDF features reaches an accuracy of 0.70.
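
    A minimal sketch of that optimization, assuming a scikit-learn pipeline and an illustrative parameter grid (the actual search space is not shown in the original):

    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import GridSearchCV

    pipeline = Pipeline([
        ('tfidf', TfidfVectorizer()),
        ('svm', LinearSVC()),
    ])
    param_grid = {
        'tfidf__ngram_range': [(1, 1), (1, 2)],  # hypothetical values
        'svm__C': [0.1, 1, 10],
    }
    grid = GridSearchCV(pipeline, param_grid, cv=5, scoring='accuracy')
    grid.fit(X_train, y_train)
    print(grid.best_params_, grid.score(X_test, y_test))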

    6. CLASSIFIER IMPROVEMENTS

    This section explores several alternatives to see whether the classifier results can be improved:

    6.1. LEMMATIZATION

    The lemmatize_text() function extracts the roots (lemmas) of the words:


    def lemmatize_text(text):
        tokens = nlp(text)
        lemmatized_tokens = [tok.lemma_ for tok in tokens]
        lemmatized_text = ' '.join(lemmatized_tokens)
        return lemmatized_text
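
    A quick illustration (the exact output depends on the loaded spaCy model):

    print(lemmatize_text("Los gatos corrían rápido"))
    # -> 'el gato correr rápido'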
    

    6.2. NEW CLASSIFIERS

    We define three new classifiers: Decision Trees, Random Forest, and K-Nearest Neighbors, with these results once the text has been lemmatized:

    Using BoW (Bag-of-Words) features with lemmatization:
    CART: 0.58
    RF:   0.67
    KNN:  0.39

    Using TF-IDF features with lemmatization:
    CART: 0.56
    RF:   0.64
    KNN:  0.61

    Optimizing the Decision Tree Classifier (CART) with TF-IDF features reaches an accuracy of 0.65.

    6.3. LSA (Latent Semantic Analysis) DIMENSIONALITY REDUCTION

    Finally, we try one of the dimensionality-reduction techniques and analyze the results. The lsa_extractor() function builds a Latent Semantic Analysis model over a text corpus using 100 dimensions:


    from sklearn.decomposition import TruncatedSVD
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import Normalizer

    def lsa_extractor(corpus, n_dim=100):
        tfidf = TfidfVectorizer(use_idf=True)
        svd = TruncatedSVD(n_dim)
        vectorizer = make_pipeline(tfidf, svd, Normalizer(copy=False))
        features = vectorizer.fit_transform(corpus)
        return vectorizer, features
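
    A usage sketch, reusing the split from section 4:

    lsa_vectorizer, X_train_lsa = lsa_extractor(X_train, n_dim=100)
    X_test_lsa = lsa_vectorizer.transform(X_test)  # reuse the fitted pipeline on unseen text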


    Next, we apply the following models to our lemmatized data after reducing the text to 100 LSA dimensions: Logistic Regression, Random Forest, K-Nearest Neighbors, and Linear SVM.

    Using TF-IDF features, lemmatization, and an LSA-100 dimensionality reduction:
    LGR: 0.68
    RF:  0.55
    KNN: 0.61
    SVM: 0.64

    6.4. WORD EMBEDDINGS MODEL

    To close this improvements section, a model with averaged Word Embeddings is applied with the following classifiers:

    LGR : 0.45
    CART: 0.24
    RF : 0.30
    KNN : 0.30
    SVM : 0.39
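
    A minimal sketch of the averaged-embedding features, assuming spaCy word vectors (a model shipping vectors would be needed; the embedding source used in the notebook is not specified):

    import numpy as np

    def embedding_features(texts):
        # spaCy's doc.vector is already the average of the document's token vectors
        return np.array([nlp(text).vector for text in texts])

    X_train_emb = embedding_features(X_train)
    X_test_emb = embedding_features(X_test)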

    CONCLUSIONS:

    • LEMMATIZING THE INPUT TEXT IMPROVES THE RESULTS.
    • APPLYING LSA (Latent Semantic Analysis) DIMENSIONALITY REDUCTION IMPROVES THE RESULTS SIGNIFICANTLY.
    • MODELS WITH AVERAGED WORD EMBEDDINGS PERFORM WORSE THAN THE SIMPLER MODELS (BoW, TF-IDF) BECAUSE OUR DATASET IS VERY SMALL.
    • BEST ALGORITHM FOUND: LOGISTIC REGRESSION WITH TF-IDF FEATURES, A LEMMATIZED DATASET, AND A 100-DIMENSION LSA REDUCTION.

    Original repository: https://github.com/lesnofla/mbit-m08-dc02-nlp

  • Lumen

    Lumen

    A modern, intuitive color accessibility checker built with Next.js

    🌟 Overview

    Lumen is a comprehensive web application designed to help designers and developers ensure their color choices meet accessibility standards. Built with modern web technologies, it provides real-time contrast ratio calculations based on WCAG (Web Content Accessibility Guidelines) standards, making web accessibility testing simple and intuitive.

    🎯 Why Lumen?

    • Instant Feedback: Real-time contrast ratio calculations as you adjust colors
    • WCAG Compliant: Comprehensive testing for AA and AAA accessibility standards
    • Modern UI: Beautiful, responsive interface with dark/light mode support
    • Developer Friendly: Clean, accessible design with smooth animations
    • Multiple Input Methods: Hex codes, visual color picker, and direct input support

    ✨ Features

    🎨 Color Input & Selection

    • Hex Code Input: Direct hex value entry with validation
    • Interactive Color Picker: Visual color selection using react-colorful
    • Real-time Updates: Instant preview as you modify colors
    • Color Validation: Automatic fallback for invalid color values

    📊 WCAG Compliance Testing

    • Normal Text Standards: AA (4.5:1) and AAA (7:1) compliance checking
    • Large Text Standards: AA (3:1) and AAA (4.5:1) compliance checking
    • UI Components: AA (3:1) compliance for interface elements
    • Visual Indicators: Clear pass/fail status with intuitive icons

    👁️ Live Preview

    • Real-time Rendering: See exactly how your colors will look together
    • Sample Content: Test with headings, paragraphs, and buttons
    • Interactive Elements: Preview buttons and UI components
    • Color Information: Display current hex values for reference

    🎯 Accessibility Features

    • Keyboard Navigation: Full keyboard accessibility support
    • Screen Reader Friendly: Proper ARIA labels and semantic HTML
    • Color Contrast: The app itself meets WCAG AA standards
    • Responsive Design: Mobile-first approach with optimized layouts

    🌙 Theme Support

    • Dark/Light Mode: Automatic theme detection with manual toggle
    • Smooth Transitions: Elegant theme switching animations

    🚀 Getting Started

    Prerequisites

    • Node.js 18.0 or later
    • pnpm (recommended) or npm/yarn

    Installation

    1. Clone the repository

      git clone https://github.com/ahmadrafidev/lumen.git
      cd lumen
    2. Install dependencies

      pnpm install
    3. Start the development server

      pnpm dev
    4. Open your browser Navigate to http://localhost:3000

    Build for Production

    # Build the application
    pnpm build
    
    # Start the production server
    pnpm start

    🛠️ Tech Stack

    Core Framework

    UI & Styling

    Color & Accessibility

    • react-colorful – Lightweight color picker
    • Custom WCAG utilities – Precise contrast ratio calculations
    • Relative luminance calculations – Following WCAG 2.1 standards

    Development Tools

    • ESLint – Code linting and formatting
    • PostCSS – CSS processing
    • pnpm – Fast, disk space efficient package manager

    📁 Project Structure

    lumen/
    ├── app/                    # Next.js App Router
    │   ├── page.tsx           # Main application page
    │   ├── layout.tsx         # Root layout with providers
    │   ├── globals.css        # Global styles and CSS variables
    │   └── not-found.tsx      # 404 error page
    ├── components/            # React components
    │   ├── ui/               # Reusable UI primitives
    │   ├── ForegroundCard/   # Foreground color input
    │   ├── BackgroundCard/   # Background color input
    │   ├── ContrastRatioCard/ # Contrast ratio display
    │   ├── LivePreviewCard/  # Real-time preview
    │   ├── PassCheckCard/    # WCAG compliance checker
    │   ├── Header/           # Application header
    │   ├── Footer/           # Application footer
    │   └── Home/             # Main homepage component
    ├── utils/                # Utility functions
    │   └── colorUtils.ts     # Color calculation utilities
    ├── lib/                  # Shared libraries
    ├── public/               # Static assets
    └── ...config files
    

    🔬 How It Works

    Contrast Ratio Calculation

    Lumen uses the official WCAG 2.1 formula for calculating contrast ratios:

    1. Relative Luminance: Calculate the relative luminance of each color
    2. Contrast Ratio: Apply the formula (L1 + 0.05) / (L2 + 0.05)
    3. WCAG Compliance: Compare against WCAG AA/AAA thresholds
    export const calculateContrastRatio = (foreground: string, background: string): number => {
      const lum1 = calculateLuminance(foreground);
      const lum2 = calculateLuminance(background);
      
      return lum1 > lum2
        ? (lum1 + 0.05) / (lum2 + 0.05)
        : (lum2 + 0.05) / (lum1 + 0.05);
    };
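
    The calculateLuminance helper referenced above isn’t shown in this excerpt; a sketch following the WCAG 2.1 relative-luminance definition (assuming #RRGGBB hex input) might look like this:

    const srgbChannel = (value: number): number => {
      // Linearize an 8-bit sRGB channel per WCAG 2.1
      const c = value / 255;
      return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
    };

    export const calculateLuminance = (hex: string): number => {
      const r = srgbChannel(parseInt(hex.slice(1, 3), 16));
      const g = srgbChannel(parseInt(hex.slice(3, 5), 16));
      const b = srgbChannel(parseInt(hex.slice(5, 7), 16));
      return 0.2126 * r + 0.7152 * g + 0.0722 * b;
    };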

    WCAG Standards Implementation

    • AA Normal Text: 4.5:1 minimum contrast ratio
    • AA Large Text: 3:1 minimum contrast ratio
    • AAA Normal Text: 7:1 minimum contrast ratio
    • AAA Large Text: 4.5:1 minimum contrast ratio
    • UI Components: 3:1 minimum contrast ratio
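
    Those thresholds map directly onto a pass/fail check. A hypothetical helper (not taken from the repo) could encode them like this:

    type WcagLevel = 'AA' | 'AAA';

    // Minimum contrast ratios per WCAG 2.1; "large" text is >=18pt, or >=14pt bold
    const thresholds: Record<WcagLevel, { normal: number; large: number }> = {
      AA: { normal: 4.5, large: 3 },
      AAA: { normal: 7, large: 4.5 },
    };

    export const meetsWcag = (ratio: number, level: WcagLevel, largeText = false): boolean =>
      ratio >= thresholds[level][largeText ? 'large' : 'normal'];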

    🤝 Contributing

    We welcome contributions! Here’s how you can help:

    1. Fork the repository
    2. Create a feature branch: git checkout -b feature/amazing-feature
    3. Commit your changes: git commit -m 'Add amazing feature'
    4. Push to the branch: git push origin feature/amazing-feature
    5. Open a Pull Request

    Development Guidelines

    • Follow TypeScript best practices
    • Use conventional commit messages
    • Ensure accessibility standards are met
    • Add tests for new features
    • Update documentation as needed

    📚 Resources & References

    WCAG Guidelines

    Color Theory & Accessibility

    Technical Documentation

    📄 License

    This project is licensed under the MIT License – see the LICENSE file for details.

    👨‍💻 Author

    Ahmad Rafi Wirana (@rafiwiranaa)


    Made with ❤️ for a more accessible web
    Original repository: https://github.com/ahmadrafidev/lumen