SciForce Blog

Read our blog and carry on

Learn who we are and why we stand out among the others.

What is GPT-3, How Does It Work, and What Does It Actually Do?

GitHub and OpenAI have presented Copilot, a new code-generating tool that autocompletes code snippets and is now available in Visual Studio Code. Copilot is based on Codex, a descendant of GPT-3, which was presented a year ago. The hype around GPT-3 does not seem to be evaporating, so we decided to go through the details step by step. Check it out.

GPT-3 stands for Generative Pre-trained Transformer 3. It is the third version of the language model, and OpenAI released it in May 2020. It is generative because it can produce long passages of original text as output, whereas many neural networks only output a yes-or-no answer, a label, or a short sentence. Pre-trained means that the language model was trained on a large general corpus without being built for any particular domain, yet it can still complete domain-specific tasks like translation. That combination is why GPT-3 is often described as one of the most innovative language models to date.

Ok, but what is Transformer, then?

Simply put, it is a neural network architecture developed by Google’s scientists in 2017, and it uses a self-attention mechanism that is a good fit for language understanding. Given that the attention mechanism enabled a breakthrough in the NLP domain in 2015, the Transformer became the foundation for GPT-1 and for Google’s BERT, another great language model. In essence, attention is a function that scores how relevant each of the surrounding words is to the current one, and the model uses these scores when it calculates the probability of the next word. By the way, we have developed an explainer for BERT.

Wait, but what makes GPT-3 so unique?

The GPT-3 language model has 175 billion parameters, i.e., values that the neural network optimizes during training (compare with the 1.5 billion parameters of GPT-2). Thus, this language model has excellent potential for automation across various industries, from customer service to documentation generation. You can play around with the beta of the GPT-3 Playground yourself.

How can I use GPT-3 for my applications?
As of July 2021, you can join the waitlist, since the company offers a private beta version of its API on an LMaaS (language-model-as-a-service) basis.

Here are the examples that you might have already heard of. GPT-3 is writing stunning fiction. Gwern, who is experimenting with both GPT-2 and GPT-3, states that GPT-3 “is not merely a quantitative tweak yielding ‘GPT-2 but better’” and that “it is qualitatively different.” The beauty of GPT-3 for text generation is that you do not need to train anything in the usual way. Instead, you write prompts for GPT-3 to teach it what you want.

Sharif Shameem used GPT-3 for debuild, a platform that generates code on request. You could type a request like “create a watermelon-style button” and grab the code to use in an app. You could even use GPT-3 to generate substantial business guidelines, as @zebulgar did.

Let us look under the hood and define the nuts and bolts of GPT-3.

[Figure: Larger models are learning efficiently from in-context information]

To put it bluntly, GPT-3 calculates how likely a word is to appear in a text given the other words in that text. This is known as the conditional probability of words. For example, in the sentences “Margaret is arranging a garage sale... Maybe we could buy that old ___”, the word chair is much more likely to appear than, let us say, elephant. That means the probability of the word chair occurring in the prompted text is higher than the probability of the word elephant.

GPT-3 uses a form of data compression while consuming millions of sample texts, converting words into vectors, i.e., numeric representations. Later, the language model unpacks the compressed text into human-friendly sentences. Compressing and decompressing text in this way improves the model’s accuracy in calculating the conditional probability of words.
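To make the idea of conditional probability a bit more tangible, here is a minimal sketch that estimates P(next word | previous word) from raw counts. It is a toy bigram counter, not how GPT-3 works internally: the corpus is invented for illustration, and the function names `train_bigram_model` and `conditional_probability` are our own hypothetical helpers.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows each preceding word."""
    follow_counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follow_counts[prev][nxt] += 1
    return follow_counts


def conditional_probability(follow_counts, prev, nxt):
    """Estimate P(nxt | prev) from counts; 0.0 if prev was never seen."""
    total = sum(follow_counts[prev].values())
    if total == 0:
        return 0.0
    return follow_counts[prev][nxt] / total


# Toy corpus, invented purely for illustration.
corpus = [
    "we could buy that old chair",
    "she sold that old chair",
    "he saw that old elephant",
    "they kept that old chair",
]

model = train_bigram_model(corpus)
p_chair = conditional_probability(model, "old", "chair")
p_elephant = conditional_probability(model, "old", "elephant")
print(f"P(chair | old) = {p_chair:.2f}, P(elephant | old) = {p_elephant:.2f}")
```

In this tiny corpus, chair follows old more often than elephant does, so its estimated conditional probability is higher. A large language model conditions on far more context than one preceding word, but the quantity it estimates is of the same kind.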
[Figure: Dataset used to train GPT-3]

Since GPT-3 is high-performing in “few-shot” settings, it can respond in a way consistent with a given example piece of text that it has never been exposed to before. It needs only a few examples to produce a relevant response, as it has already been trained on lots of text samples. Check out the research paper for more technical details: Language Models are Few-Shot Learners.

[Figure: The scheme illustrates the mechanics of English to French translation in the few-shot setting]

After training, when the language model’s conditional probability estimates are as accurate as possible, it can predict the next word given an input word, sentence, or fragment as a prompt. Formally speaking, predicting the next word is the task known as language modeling.

In essence, GPT-3 is a text predictor: its output is a statistically plausible response to the given input, grounded in the data it was trained on. However, some critics argue that GPT-3 is not the best AI system for question answering and text summarization. GPT-3 is mediocre compared to the SOTA (state-of-the-art) methods on each NLP task taken separately, but it is much more general than any previous system, and upcoming ones will resemble GPT-3.

In general, GPT-3 can perform NLP tasks after a few prompts are given. It demonstrated high performance in the few-shot settings in the following tasks.

GPT-3 demonstrated a perplexity of 20.5 under zero-shot conditions on the Penn Tree Bank (PTB). Perplexity measures how well a probabilistic language model predicts a sample, and lower is better. The closest rival, BERT-Large-CAS, scores 31.3.

[Figure: GPT-3 is a leader in Language Modelling on Penn Tree Bank with a perplexity of 20.5]

GPT-3 also demonstrates 86.4% accuracy (an 18% increase over previous SOTA models) in the few-shot settings on the LAMBADA dataset test.
For this test, the model predicts the last word in a sentence, which requires “reading” the whole paragraph. Important notice: GPT-3 demonstrated these results thanks to fill-in-the-blank examples like: “George bought some baseball equipment, a ball, a glove, and a _____. →”

Moreover, researchers report 79.3% accuracy in picking the best ending of a story on the HellaSwag dataset in the few-shot settings. GPT-3 also demonstrated 87.7% accuracy on the StoryCloze 2016 dataset (which is still “4.1% lower than the fine-tuned SOTA using a BERT based model”).

… or testing broad factual knowledge with GPT-3.

As per the GPT-3 research paper, it was tested on the Natural Questions, WebQuestions, and TriviaQA datasets, and the results are the following:

[Figure: GPT-3 in the few-shot settings outperforms fine-tuned SOTA models only on the TriviaQA dataset]

As for translation, supervised SOTA neural machine translation (NMT) models are the clear leaders in this domain. However, GPT-3 reflects its strength as an English LM, mainly when translating into English. Researchers also state that “GPT-3 significantly outperforms prior unsupervised NMT work when translating into English but underperforms when translating in the other direction.”

In general, across all three language pairs tested (English in combination with French, German, and Romanian), there is a smooth upward trend with model capacity.

Winograd-style tasks are classical NLP tasks that involve determining which word a pronoun refers to when the sentence is grammatically ambiguous but semantically unambiguous for a human. Fine-tuned methods have recently reached human-like performance on the Winograd dataset but still lag behind on the more complex Winogrande dataset.
GPT-3 results are the following: “On Winograd GPT-3 achieves 88.3%, 89.7%, and 88.6% in the zero-shot, one-shot, and few-shot settings, showing no clear in-context learning but in all cases achieving strong results just a few points below state-of-the-art and estimated human performance.”

As for physical or scientific reasoning, GPT-3 does not outperform fine-tuned SOTA methods. GPT-3 is still not that good at arithmetic either.

However, when it comes to news article generation, human detection of GPT-3-written news (few-shot settings) is close to chance: 52% mean accuracy.

Well, even the OpenAI CEO Sam Altman tweeted that GPT-3 is overhyped, and here is what the researchers themselves state:

GPT-3 is not good at text synthesis: while the overall quality of the generated text is high, it starts repeating itself at the document level or in long passages.

It is also lagging in the domain of discrete language tasks, having difficulty with “common sense physics”. Thus, it is hard for GPT-3 to answer the question: “If I put cheese into the fridge, will it melt?”

GPT-3 has some notable gaps in reading comprehension and comparison tasks.

Tasks that empirically benefit from bidirectionality are also areas of improvement for GPT-3. These may include the following: “fill-in-the-blank tasks, tasks that involve looking back and comparing two pieces of content, or tasks that require re-reading or carefully considering a long passage and then generating a very short answer,” as researchers state.

Models like GPT-3 have a lot of skills and become “overqualified” for some specific tasks. Moreover, it is a computing-power-hungry model: “training the GPT-3 175B consumed several thousand petaflop/s-days of compute during pre-training, compared to tens of petaflop/s-days for a 1.5B parameter GPT-2 model”, as researchers state.
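For a feel of what a petaflop/s-day means, here is a small sketch of the unit conversion: one petaflop/s sustained for one day. The run sizes of 3000 and 30 are hypothetical round numbers standing in for “several thousand” and “tens”; they are not figures from the paper.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds in a day
OPS_PER_PETAFLOP = 10 ** 15      # one petaflop/s = 1e15 operations per second


def petaflop_s_days_to_operations(pf_days):
    """Convert petaflop/s-days into a total number of floating-point operations."""
    return pf_days * OPS_PER_PETAFLOP * SECONDS_PER_DAY


one_unit = petaflop_s_days_to_operations(1)

# Hypothetical round numbers, used only to illustrate the scale gap.
large_run = petaflop_s_days_to_operations(3000)
small_run = petaflop_s_days_to_operations(30)
ratio = large_run / small_run

print(f"1 petaflop/s-day = {one_unit:.3e} operations")
print(f"The larger run needs {ratio:.0f}x the compute of the smaller one")
```

Whatever the exact figures, the gap between “several thousand” and “tens” of petaflop/s-days is about two orders of magnitude.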
Since the model was trained on content that humans generated on the internet, there are still problems with bias, fairness, and representation. Thus, GPT-3 can generate prejudiced or stereotyped content. You may have already read a lot about it online, or you can check it out in the research paper, where the authors cover it pretty well.

GPT-3 is a glimpse of the bright future of NLP, helping to generate code, meaningful pieces of text, and translations, and doing well on different tasks. It also has its limitations and ethical issues, like generating biased fragments of text. All in all, we are witnessing something interesting, as has always been the case in NLP.

Check out more of our posts on NLP:

Text Preprocessing for NLP and Machine Learning Tasks
Biggest Open Problems in Natural Language Processing
A Comprehensive Guide to Natural Language Generation
NLP vs. NLU: from Understanding a Language to Its Processing

How to tell a fantastic data story

What are the central parts of any data-driven story, and how do you apply the Gestalt laws to your data visualization reports? What role does context play in your data story? We dwell on these questions and provide a list of the best books and tools for stunning data visualization. Check it out!

First, let us start with the central part of any good data story: the data-ink ratio. You’ve probably heard of or read something by Edward Tufte, the father of data visualization, who coined the term. The data-ink ratio is the amount of data-ink divided by the total ink needed for your visualization. Every element of your infographic needs a reason to be there. Simply put, Tufte says that you should use only the required details and remove the visual noise.

How far could you go? As far as your visualization still communicates the overall idea. Daniel Haight calls it the visualization spectrum, the constant trade-off between clarity and engagement. Daniel proposes measuring clarity by the time needed to understand the visualization, which depends on information density. You can measure the engagement of your data story by the emotional connection it creates (besides the shares and mentions on social media).

Take the data story by FiveThirtyEight about women of color in the US Congress as an example. The authors use simple elements and do not overload the viewer with needless details, yet they still communicate the overall story crystal clear. At the same time, looking at the timelines of different colors, you can see the drastic changes behind them. That is a pretty good compromise between clarity and engagement.

Edward Tufte also coined the term chartjunk, which stands for all the ugly visualizations you may see endlessly online. “11 Reasons Infographics Are Poison And Should Never Be Used On The Internet Again” by Walt Hickey, dating back to 2013, is still topical.
Thus, it is better to follow the principles that keep your infographics off that list.

Gestalt principles are heuristics that serve as mental shortcuts for the brain and explain how we group small objects to form larger ones. For example, we tend to perceive things located close to each other as a group; this is the principle of proximity. Also check out the post by Becca Selah, rich in quick and easy tips on clear data visualization.

In essence, color conveys information in the most effective way for the human brain. But choosing it according to the rules might be frustrating. As a rule of thumb, remember that choosing colors of the same saturation helps a viewer perceive them as a group. Also, check out the excellent guide from Lisa Charlotte Rost (Datawrapper), since it is the best thing we have seen for beginners looking for color picking tools. Pro tip: gray does not fight for human attention and should become your best friend. Andy Kirk tells more about it here.

Firstly, let us make it clear: a data visualization specialist is a data analyst. Thus, your primary task is to separate the signal from the noise, i.e., to find the hidden gems in tons of data. But presenting your findings with good design in mind is not enough if no context is introduced to your audience. This is where storytelling comes in handy.

Who is your audience, and how will they see it? This question is relevant for both cases. At the outset of your data-driven story, define your focus: is it broad or narrow? In the first case, you will spend lots of time digging into data, so it is crucial to ask your central question. Working with a narrow focus is different. It applies when you have specific prerequisites at the very beginning and harness several datasets to find the answer. Thus, it might be easier to cope with one specific inquiry than to look for insights across datasets.
Consider placing simple insights at the beginning of your story. This way, you draw the reader in immediately and can add relevant points to illustrate the primary idea better. But when it comes to “well-it-depends-what-are-we-talking-about” answers, try to be careful with your audience, like a mother. Guide them step by step into your story. You can use comparisons between different elements or periods, or apply analogies. It is also helpful to use individual data points on a small scale before delving into the large-scale story.

Besides the links to the datasets you worked with while crafting your story, it is worth sharing your methodology. That way, your savvy audience can feel at ease when looking at the results.

You may read tons of blogs on data visualization, but we believe in the old-fashioned style: build a stable foundation with the classics first. Here is the list of books that we recommend, depending on the tasks you are solving.

Edward Tufte is the father of data visualization, so we recommend starting with his books to master the main ideas: The Visual Display of Quantitative Information, Envisioning Information, Beautiful Evidence, and Visual Explanations. Read them, and you are a rock star of data visualization!

Naked Statistics: Stripping the Dread from the Data by Charles Wheelan. Applying statistics to your analysis is crucial, and you will delve into principal concepts like inference, correlation, and regression analysis.

How Not to Be Wrong: The Power of Mathematical Thinking by Jordan Ellenberg. We recommend it for those who are coming to DataViz from a non-tech background.

Statistics Unplugged by Sally Caldwell. Again, statistics explained in a way you will love.

Interactive Data Visualization for the Web: An Introduction to Designing with D3 by Scott Murray comes in handy for creating online visualizations even if you have no experience with web development.
D3.js in Action: Data visualization with JavaScript by Elijah Meeks is a guide to creating interactive graphics with D3.

R for Data Science: Import, Tidy, Transform, Visualize, and Model Data by Hadley Wickham helps you brush up your coding skills with R.

Data Visualisation: A Handbook for Data Driven Design by Andy Kirk. This one can help you choose the best visualization for your data, as your insights should ideally be as clear as they are engaging.

Visualization Analysis and Design by Tamara Munzner. This one presents a comprehensive, systematic approach to design for DataViz.

Information Visualization: Perception for Design by Colin Ware is the cherry on the cake of your design skills.

However, if you’ve got data visualization tasks right now and have no time for upskilling, we recommend online tools like Flourish or Datawrapper. Mala Deep covers five free data visualization tools pretty clearly, check it out!

Also, we would appreciate your suggestions in the comments section!

Serving ML model as an API: Sharing our experience

Serving machine learning models as an API is a common approach for integrating ML capabilities into modern software applications. It simplifies application development and has multiple benefits, such as scalability, efficiency, flexibility, and accessibility. The aim of such an API is to integrate machine learning models with other components of the application, which makes the predictive power of machine learning available in real time. Systems can then use the model's predictions and insights without having to replicate the entire architecture and infrastructure of the model. Today, we would like to share our experience in serving an ML model as an API. In this article, we'll walk you through this process and cover its main aspects and steps.

Choose a deployment platform: You can choose to deploy your API to a server or a cloud-based platform such as Amazon Web Services, Google Cloud Platform, or Microsoft Azure. Consider factors such as scalability, cost, and ease of use when choosing a deployment platform.

Set up the environment: Once you've chosen a deployment platform, set up the environment by installing any required dependencies and configuring the server or cloud-based platform.

Upload your API code: Upload your API code to the server or cloud-based platform. This can be done by copying the code files to the server or by using a version control system such as Git to push the code to a repository.

Configure the API endpoint: Configure the API endpoint on the server or cloud-based platform. This involves specifying the URL for the API endpoint, any required parameters, and any security settings such as authentication and authorization.

Test the deployed API: After deploying the API, test it to ensure that it works as expected.
You can use the same testing tools, such as Postman, that you used during the development phase.

Monitor and maintain the API: Once the API is deployed, monitor it to ensure that it performs well and meets the required service level agreements (SLAs). You may need to update and maintain the API over time as new features are added or as the underlying technology changes.

By deploying your API, you make it accessible to users and allow it to be used in production environments. This can provide significant benefits such as increased efficiency, scalability, and accessibility.

Monitoring the API is an important step that helps you ensure that the whole system performs well and meets your requirements. Here are some of the key things you should monitor:
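As one hedged illustration of what monitoring can look like in code, the sketch below wraps a prediction handler and records request count, error count, and latency in memory. Everything here is an assumption made for the example: `EndpointStats`, `monitored`, and the stand-in `predict` function are hypothetical names, and a production setup would more likely export such numbers to a dedicated monitoring system.

```python
import time
from dataclasses import dataclass, field


@dataclass
class EndpointStats:
    """In-memory counters for one API endpoint (hypothetical helper)."""
    requests: int = 0
    errors: int = 0
    latencies: list = field(default_factory=list)

    def error_rate(self):
        return self.errors / self.requests if self.requests else 0.0

    def average_latency(self):
        return sum(self.latencies) / len(self.latencies) if self.latencies else 0.0


def monitored(stats):
    """Decorator that records latency and failures of the wrapped handler."""
    def decorator(handler):
        def wrapper(*args, **kwargs):
            stats.requests += 1
            started = time.perf_counter()
            try:
                return handler(*args, **kwargs)
            except Exception:
                stats.errors += 1
                raise
            finally:
                # Runs for both successful and failed calls.
                stats.latencies.append(time.perf_counter() - started)
        return wrapper
    return decorator


stats = EndpointStats()


@monitored(stats)
def predict(features):
    """Stand-in for a real model call: returns the mean of the inputs."""
    if not features:
        raise ValueError("features must not be empty")
    return sum(features) / len(features)


result = predict([1.0, 2.0, 3.0])
try:
    predict([])
except ValueError:
    pass  # the failure has been counted in stats.errors

print(f"requests={stats.requests} errors={stats.errors} "
      f"error_rate={stats.error_rate():.2f}")
```

The same wrapper could sit around a handler in any web framework. The point is only that latency and error rate are measured per request and compared against the SLA.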

Top Microservices Design Patterns for Your Business

Using microservices for building apps is rapidly gaining popularity, as they bring many benefits to the business: they are safe and reliable, scalable, optimize development time and cost, and are simple to deploy. In our previous articles, we discussed the best tools to manage microservices, the advantages and disadvantages of using microservices, their differences from monolithic architecture, and hexagonal architecture. Despite multiple benefits, app development with microservices has many challenges. The article covers six design patterns: Strangler, Saga, Aggregator, Event Sourcing, CQRS, and Backends for Frontends. For each pattern, it lists the advantages and disadvantages and explains when to use it and when not to.


2022 Developer Survey: tools, environments and methods for practicing software today.

It is a well-known fact that professionals in the tech industry have different educational backgrounds, use different technologies and tools, and so on. Stack Overflow recently published its annual survey, where those aspects are explained in detail. The respondents were divided into two groups: professional developers and those who are learning to code. We analyzed this survey and decided to post our summary of it. So, today, we would like to briefly show you the following information:

Top-5 NLP news of December by the CTO of Sciforce, Max Ved

We are excited to announce that we are launching a new section today at Sciforce — “Top AI news of the month.” We decided to create this column to keep our valued clients aware of the latest news and technologies in the AI world. So, here, we will share some interesting information with you about SciTech! Well, today we will discuss Top-5 NLP news in December:

OHDSI Global Symposium 2022

For seven years, the SciForce team has been an active member of the OHDSI scientific community, whose mission is to improve people’s health by empowering the community to obtain evidence-based knowledge that contributes to better decisions in healthcare. In this regard, representatives of our medical team, namely Polina Talapova, Eduard Korchmar, and Denis Kaduk, attended the three-day Global OHDSI Symposium held in North Bethesda, Maryland, USA, on October 14–16, 2022. They heard many fascinating reports from the community’s leads and experts, successfully presented a poster on the topic “Jackalope: A software tool for meaningful post-coordination for ETL purposes” in the Open-Source Analytics Development section (GitHub repos), watched a software demonstration, and participated in working groups such as Oncology, Vocabulary, FHIR-OMOP, Natural Language Processing, CDM, and Data Quality. Our colleagues also met in person with leading OHDSI researchers (George Hripcsak, Patrick Ryan, Christian Reich, Clair Blacketer, Andrew Williams, Rimma Belenkaya, Asieh Golozar, Karthik Natarajan), developers (Christopher Knoll, Paul Nagy, Michael Gurley), and old friends (Dmitry Dymshits, Anna Ostropolets, Alexander Davydov), and made valuable new acquaintances. It was a great event and a rewarding experience, expanding minds and horizons. The SciForce team is profoundly grateful to the OHDSI community for the opportunity to be a part of this fantastic journey.

Generative models under a microscope: Comparing VAEs, GANs, and Flow-Based Models

Two years after Generative Adversarial Networks were introduced in a 2014 paper by Ian Goodfellow and other researchers, Facebook’s AI research director and one of the most influential AI scientists, Yann LeCun, called adversarial training “the most interesting idea in the last ten years in ML.” Interesting and promising as they are, GANs are only one part of the family of generative models, which offer a completely different angle on solving traditional AI problems.

When we think of Machine Learning, the first algorithms that come to mind are probably discriminative. Discriminative models, which predict a label or category for some input data depending on its features, are the core of all classification and prediction solutions. In contrast, generative algorithms help us tell a story about the data, providing a possible explanation of how the data has been generated. Instead of mapping features to labels, as discriminative algorithms do, generative models attempt to predict features given a label.

While discriminative models define the relation between a label y and a feature x, generative models answer “how you get x.” Generative models model P(Observation | Cause) and then use Bayes’ theorem to compute P(Cause | Observation). In this way, they can capture p(x|y), the probability of x given y, or the probability of features given a label or category. So generative algorithms can be used as classifiers, but they can do much more, as they model the distribution of individual classes.

There are many generative algorithms, yet the most popular models in the Deep Generative Models category are Variational Autoencoders (VAEs), GANs, and Flow-based Models.

A Variational Autoencoder (VAE) is a generative model that “provides probabilistic descriptions of observations in latent spaces.” Simply put, this means VAEs store latent attributes as probability distributions.
The idea of the Variational Autoencoder (Kingma & Welling, 2014), or VAE, is deeply rooted in variational Bayesian and graphical model methods.

A standard autoencoder comprises a pair of connected networks, an encoder and a decoder. The encoder takes an input and converts it into a smaller representation, which the decoder can use to convert it back to the original input. However, the latent space that inputs are converted to, and where the encoded vectors lie, may not be continuous or allow easy interpolation. For a generative model this is a problem, since you want to sample randomly from the latent space or generate variations of an input image from a continuous latent space.

Variational Autoencoders have continuous latent spaces by design, allowing easy random sampling and interpolation. To achieve this, the hidden nodes of the encoder do not output a single encoding vector but, rather, two vectors of the same size: a vector of means and a vector of standard deviations. Each of these hidden nodes acts as its own Gaussian distribution. The new vectors form the parameters of a so-called latent vector of random variables. The i-th elements of the mean and standard deviation vectors correspond to the i-th random variable’s mean and standard deviation. We sample from this vector to obtain the sampled encoding that is passed to the decoder. Decoders can then sample randomly from the probability distributions for input vectors. This process is stochastic generation. It implies that even for the same input, while the mean and standard deviation remain the same, the actual encoding will vary somewhat on every pass simply due to sampling.

The autoencoder is trained to minimize both the reconstruction loss (how similar the autoencoder’s output is to its input) and the latent loss (how close its hidden nodes are to a normal distribution). The smaller the latent loss, the less information can be encoded, which increases the reconstruction loss.
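The sampling step and the latent loss described above can be sketched in a few lines of plain Python. This is a minimal illustration under two assumptions that the text does not spell out: sampling is written in the common form z = mean + std * eps with eps drawn from N(0, 1), and the latent loss is taken to be the KL divergence between each Gaussian and a standard normal. The function names and the example numbers are hypothetical.

```python
import math
import random


def sample_latent(means, std_devs, rng):
    """Draw one encoding: z_i = mean_i + std_i * eps_i, with eps_i ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(means, std_devs)]


def latent_loss(means, std_devs):
    """KL divergence between N(mean, std^2) and N(0, 1), summed over dimensions."""
    return sum(
        0.5 * (m * m + s * s - 1.0 - math.log(s * s))
        for m, s in zip(means, std_devs)
    )


rng = random.Random(0)

# Hypothetical encoder outputs for a single input.
means = [0.5, -1.0]
std_devs = [1.0, 0.2]

# Same means and standard deviations, different encodings on each pass.
z_first = sample_latent(means, std_devs, rng)
z_second = sample_latent(means, std_devs, rng)

kl_at_prior = latent_loss([0.0, 0.0], [1.0, 1.0])
kl_example = latent_loss(means, std_devs)

print("first sample: ", z_first)
print("second sample:", z_second)
print("latent loss at the standard normal:", kl_at_prior)
print("latent loss for the example encoder output:", kl_example)
```

The two samples differ even though the encoder output is unchanged, which is the stochastic generation mentioned above. The latent loss is zero only when every hidden node already matches the standard normal, and it grows as the means and standard deviations move away from 0 and 1.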
As a result, the VAE is locked in a trade-off between the latent loss and the reconstruction loss. When the latent loss is small, the generated images will resemble the images at train time too much, but they will look bad. If the reconstruction loss is small, the reconstructed images at train time will look good, but novel generated images will be far from the reconstructed images. Obviously, we want both, so it’s important to find a nice equilibrium.

VAEs work with remarkably diverse types of data, sequential or nonsequential, continuous or discrete, even labeled or completely unlabeled, making them highly powerful generative tools.

A major drawback of VAEs is the blurry outputs that they generate. As suggested by Dosovitskiy & Brox, VAE models tend to produce unrealistic, blurry samples. This is caused by the way data distributions are recovered and loss functions are calculated. A 2017 paper by Zhao et al. suggested modifying VAEs so that they do not use the variational Bayes method, in order to improve output quality.

Generative Adversarial Networks, or GANs, are deep-learning-based generative models that are able to generate new content. The GAN architecture was first described in the 2014 paper by Ian Goodfellow et al. titled “Generative Adversarial Networks.”

GANs frame generation as a supervised learning problem with two sub-models: the generator model, which generates new examples, and the discriminator model, which tries to classify examples as real or fake (generated).

[Figure: GAN sub-models]

Federated Learning: Your Favorite Guide

Apps from Netflix, Amazon, and Google, as well as fraud detection and healthcare algorithms, are using federated learning. This way, edge devices like mobile phones can help update ML models while keeping all the data local, with no need to send it to a central server. Federated learning brings improved app performance and more robust privacy for the user. Read on to find out how.

Federated learning (FL) is a form of collaborative machine learning without centralized training data, and it comes in handy for specific industries.

Federated model averaging enters the picture during a training round. Federated averaging works by computing a data-weighted average of the model updates from many gradient descent steps on the device.

Moreover, differentially private model averaging protects the user’s privacy, since the server learns the common patterns in the dataset without memorizing individual examples. Devices “clip” their updates when they are too large, and the server adds noise when combining updates.

Meanwhile, federated learning does not apply to all the tasks that machine learning can deal with.
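To make the data-weighted average in federated averaging concrete, here is a minimal sketch. It assumes each device reports the number of local training examples together with its locally updated weight vector. The function name `federated_average` and the two example clients are hypothetical, and clipping and noise addition are left out for brevity.

```python
def federated_average(client_updates):
    """Data-weighted average of client weight vectors.

    client_updates: list of (num_examples, weights) pairs, one per device.
    """
    total_examples = sum(n for n, _ in client_updates)
    if total_examples == 0:
        raise ValueError("no training examples reported by any client")

    dim = len(client_updates[0][1])
    averaged = [0.0] * dim
    for num_examples, weights in client_updates:
        # A client's influence is proportional to how much data it trained on.
        share = num_examples / total_examples
        for i, w in enumerate(weights):
            averaged[i] += share * w
    return averaged


# Two hypothetical devices: the second one trained on three times more data.
updates = [
    (100, [1.0, 0.0]),
    (300, [0.0, 2.0]),
]
global_weights = federated_average(updates)
print(global_weights)
```

The device with more data pulls the global model further towards its own update, while the raw examples never leave either device.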
