CN118567626B - A task autonomous processing method, device, storage medium and electronic device - Google Patents

Detailed Description

The embodiments of the present specification are described below with reference to the drawings of the present specification; the embodiments described are only some, not all, of the embodiments of the present specification. All other embodiments obtained by one of ordinary skill in the art from the present disclosure without undue burden are intended to fall within the scope of the present disclosure.

In the description of the present specification, it should be understood that the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. It should also be noted that, unless expressly specified and limited otherwise, "comprise" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus. The specific meaning of the terms in this specification will be understood by those of ordinary skill in the art in light of the specific circumstances. In addition, unless otherwise indicated, "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate three cases: A alone, A and B together, and B alone. The character "/" generally indicates that the associated objects are in an "or" relationship.

In the related art, users hope to input tasks for application programs to an intelligent interaction service and have those tasks processed autonomously. However, intelligent interaction services in the related art are often based on a fixed interaction pattern: only tasks covered by the interaction pattern can be processed, while tasks the pattern does not cover cannot be helped by the service. How to provide a convenient and easy-to-use intelligent interaction service is therefore a problem that urgently needs to be solved.

The present specification is described in detail below with reference to specific examples.

Referring to fig. 1, a schematic view of a task autonomous processing system is provided in the present specification. As shown in fig. 1, the task autonomous processing system may include at least a client cluster and a service platform 100.

The client cluster may include at least one client, as shown in fig. 1, specifically including a client 1 corresponding to a user 1, a client 2 corresponding to a user 2, and a client n corresponding to a user n, where n is an integer greater than 0.

Each client in the client cluster may be a communication-enabled electronic device, including, but not limited to, a wearable device, a handheld device, a personal computer, a tablet, a vehicle-mounted device, a smart phone, a computing device, or another processing device connected to a wireless modem. Electronic devices in different networks may be called different names, such as user equipment, access terminal, subscriber unit, subscriber station, mobile station, remote terminal, mobile device, user terminal, wireless communication device, user agent or user device, cellular telephone, cordless telephone, Personal Digital Assistant (PDA), or electronic device in a 5G network or a future evolved network.

The service platform 100 may be a single server device, such as a rack-mounted server, a blade server, a tower server or a cabinet server, or a hardware device with relatively high computing capability, such as a workstation or a mainframe computer; it may also be a server cluster formed by a plurality of servers, where the servers in the cluster may be arranged symmetrically: each server is functionally equivalent and acts as a peer in the transaction link, and each server may individually provide services to the outside, individual service meaning that no assistance from another server is required.

In one or more embodiments of the present description, the service platform 100 may establish a communication connection with at least one client in the client cluster, and complete interaction of data during task autonomous processing based on the communication connection;

For example, the service platform 100 may provide the agent service to the client based on the task autonomous processing method of the present specification;

In another example, the service platform 100 may obtain, from the client, a target task for a target transaction input by the user to the agent service object, and perform task execution code generation processing with a task processing large language model based on the target task to obtain a target RPA code, where the task processing large language model is obtained by model training with a reference task processing example for the target transaction based on a basic large language model;

For another example, the client may obtain a target task for a target transaction input by the user to the agent service object, request the service platform 100 to perform task execution code generation processing with the task processing large language model based on the target task to obtain a target RPA code, and control the agent service object to execute the target RPA code on the target transaction so as to perform task autonomous processing on the target task.

It should be noted that the service platform 100 establishes a communication connection with at least one client in the client cluster through a network for interactive communication. The network may be a wireless network or a wired network: the wireless network includes, but is not limited to, a cellular network, a wireless local area network, an infrared network, or a Bluetooth network, and the wired network includes, but is not limited to, an Ethernet network, a universal serial bus (Universal Serial Bus, USB), or a controller area network. In one or more embodiments of the specification, techniques and/or formats including HyperText Markup Language (HTML), Extensible Markup Language (XML), and the like are used to represent data exchanged over the network (e.g., a target compression package). All or some of the links may also be encrypted using conventional encryption techniques such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), Internet Protocol Security (IPsec), and the like. In other embodiments, custom and/or dedicated data communication techniques may also be used in place of, or in addition to, the data communication techniques described above.

The task autonomous processing system embodiment provided in the present disclosure belongs to the same concept as the task autonomous processing method in one or more embodiments. The execution subject corresponding to the task autonomous processing method may be the service platform 100, or may be the electronic device corresponding to a client, as determined by the actual application environment. The implementation process of the task autonomous processing system embodiment is described in detail in the following method embodiments and is not repeated here.

Based on the schematic view of the scenario shown in fig. 1, a task autonomous processing method provided in one or more embodiments of the present disclosure is described in detail below.

Referring to fig. 2, a flow diagram of a task autonomous processing method is provided for one or more embodiments of the present description. The method may be implemented by a computer program and may be executed on a task autonomous processing device based on the von Neumann architecture. The computer program may be integrated in the application or may run as a stand-alone tool-class application. The task autonomous processing device may be an electronic device.

Specifically, the task autonomous processing method comprises the following steps:

S102, acquiring a target task for a target transaction input by a user to an agent service object;

In the present specification, an agent service object (Agent service object) is a computer program, running on an electronic device, with sensing, reasoning and decision-making capabilities. The agent service object is the service-bearing object of the agent and, relying on the task processing large language model of the present specification, provides the user with a task autonomous processing function for the target transaction.

For example, the agent service object may be a service control carrying the agent service: the user may trigger the service control corresponding to the agent service object and input a target task for the target transaction in the agent service interface corresponding to the service control.

The agent service object may be applied in scenarios such as artificial intelligence, robotics, virtual reality, games, automatic control, information retrieval, recommendation systems, and natural language processing.

The target transaction may be an application transaction, a Web automation transaction, a robotic process automation transaction, an intelligent virtual assistant transaction, and the like;

Taking an application transaction as an example of the target transaction, the application transaction may be based on a system application or a third-party application; the type of the target transaction is not limited, as long as the target transaction can provide services for users. With the popularization of electronic devices and the increasing complexity of transaction services, in one or more embodiments of the present specification, to facilitate the user, the intelligent agent service can autonomously interact with the target transaction service and perform task navigation based on the target task input by the user, thereby completing task autonomous processing of the target task, shortening the operation path the user needs to complete the target task, and improving task processing efficiency.

The target task may be understood as a task request based on the target transaction, i.e., the target task request information for the target transaction input by the user; in other words, it is the task the user requests the agent service object to process autonomously and execute directly.

Illustratively, a target task for a beverage purchasing application may be, for example but not limited to, "help me buy a large Americano";

The target task may also be, for example, "help me arrange a courier pickup from location A to location B"; the target task is not limited to these specific examples;

In the present specification, the task autonomous processing method oriented to large-model scenarios can be applied to the agent service object; that is, the user can interact with the agent service object, and the agent service object can, through the task processing large language model, accurately and efficiently "autonomously operate the target transaction" for the user to complete the target task.

The agent service object may comprise a plurality of services for the target transaction, each service having a corresponding service function depending on the target transaction. The target transaction may comprise a plurality of applications of different types, and the services of these applications can run jointly to autonomously complete the interaction of the target task with the target transaction, thereby converting human-computer interaction based on the target task into interaction between the agent and the applications.

S104, performing task execution code generation processing with a task processing large language model based on the target task to obtain a target RPA code, where the task processing large language model is obtained by model training with a reference task processing example for the target transaction based on a basic large language model;

RPA code refers to Robotic Process Automation code, i.e., code dedicated to automatically performing the repetitive, rule-based tasks traditionally performed by humans interacting with transactions such as software applications. RPA code is intended to replicate human actions such as clicking buttons, entering data, and navigating interfaces, so as to simplify the transaction interaction flow and improve efficiency;

The target RPA code can be understood as the RPA code obtained after the task processing large language model performs task execution code generation processing on the target task; executing the target RPA code in the target transaction realizes task autonomous processing of the target task. The target RPA code may be an executable script generated from a target task described in natural language.

In the present specification, an initial task processing large language model is created in advance based on a basic large language model; a reference task processing example for the target transaction is then collected, and the initial task processing large language model is trained with the reference task processing example to obtain a task processing large language model capable of processing the target task. The task processing large language model is thus adapted to the reference task processing example of the target transaction, so that it can analyze task intent and generate a target RPA code that operates the target transaction;

In practical application, the task processing large language model is associated with the agent service object: the agent service object obtains the target task for the target transaction input by the user, and requests the task processing large language model to perform task execution code generation processing based on the target task, so as to obtain the target RPA code;

In one or more embodiments of the present description, the basic large language model may be, for example, a Wenxin Yiyan (ERNIE Bot) model, a Tongyi Qianwen model, a GPT model, and so forth;

S106, controlling the agent service object to execute the target RPA code on the target transaction so as to perform task autonomous processing on the target task.

Illustratively, the verified target RPA code may be transmitted to the agent service object, where the agent service object executes the target RPA code on the target transaction to process the specified target task.

By way of example, as shown in fig. 3, fig. 3 is a schematic diagram of a task autonomous processing scenario. A user intends to accomplish the target task "buy a large Americano" in a beverage application. In actual operation, "buy a large Americano" may require multiple steps and cumbersome operations, so the operation path for accomplishing the task is relatively long and inconvenient. Based on this, with the task autonomous processing method of one or more embodiments of the present specification, the user inputs the target task "buy a large Americano" for the target transaction "beverage application" to the agent service object; the task processing large language model then performs task execution code generation for "buy a large Americano" to obtain a target RPA code; and the agent service object is controlled to execute the target RPA code on the "beverage application", automatically realizing "buy a large Americano" for the user and completing the target task.

In one or more embodiments of the present disclosure, an electronic device obtains a target task for a target transaction input by a user to an agent service object, performs task execution code generation processing with a task processing large language model based on the target task to obtain a target RPA code, and controls the agent service object to execute the target RPA code on the target transaction so as to perform task autonomous processing on the target task. Through the agent service object and the task processing large language model, interaction between the user and the target transaction can be effectively bridged and the execution of complex user tasks automated, thereby providing a convenient and easy-to-use intelligent interaction service for the user and significantly improving work efficiency.

Referring to fig. 4, fig. 4 is a schematic flow diagram illustrating generation of task execution code according to one or more embodiments of the present disclosure. Performing task execution code generation processing with the task processing large language model based on the target task to obtain the target RPA code may proceed as follows:

S2002, generating a task code generation prompt word for the target task;

In one possible implementation, the target task may be used directly as the task code generation prompt word;

In one possible implementation, a task prompt word template may be preset, and a target task is filled in the preset task prompt word template to obtain a task code generation prompt word;

Optionally, the preset task prompt word template may further include a function-tool usage prompt word for calling the basic operation calling function tool;

In one possible implementation, the task request behind the user's target task may first be understood. For example, the target task input by the user may be parsed with Natural Language Processing (NLP) techniques to extract task key information and task demand information, and based on this parsed information the agent service object generates a set of task code generation prompt words. These prompt words form part of the input of the task processing large language model and instruct the model to generate the subsequent task operation sequence more accurately; they may include, for example, the task type, key operations, objects, specific parameters, and the function-tool usage prompt word for calling the basic operation calling function tool.
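For illustration only, a minimal sketch of this template-filling step is given below; the template wording and the identifiers TASK_PROMPT_TEMPLATE and build_task_prompt are hypothetical and do not come from the specification:

```python
# Hedged sketch of S2002: filling a preset task prompt word template.
# Template wording and identifiers are invented for illustration.

TASK_PROMPT_TEMPLATE = (
    "You are an RPA code generator for the transaction '{transaction}'.\n"
    "You may only act through the basic operation calling function tool.\n"
    "User task: {task}\n"
    "First list the execution steps, then output the RPA code."
)

def build_task_prompt(transaction: str, task: str) -> str:
    """Fill the preset template with the target transaction and target task."""
    return TASK_PROMPT_TEMPLATE.format(transaction=transaction, task=task)

print(build_task_prompt("beverage application", "buy a large Americano"))
```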

S2004, inputting the task code generation prompt word into a task processing large language model, performing task operation sequence generation processing on the target task through the task processing large language model to obtain a target task execution step sequence, and performing task code generation processing by using a basic operation calling function tool based on the target task execution step sequence to obtain a target RPA code;

The task code generation prompt word is input into the task processing large language model, which generates a series of task operation steps from the prompt. The basic operation calling function tool is then used to determine the target task operation calling function each step needs to invoke, and task code generation processing is performed based on the series of task operation steps and the target task operation calling functions, so as to obtain the target RPA code. For example, the target RPA code may be a code script or code program that invokes the basic operation calling function tool;
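As an illustrative sketch of this flow (not the patent's implementation), the two-phase pattern of planning steps and then emitting code around the basic operation calling function tool could be wired as follows; `llm` is a placeholder for any text-in/text-out model call:

```python
# Hedged sketch of S2004: prompt -> step sequence -> RPA code.
# `llm` is an assumed callable, not an API from the specification.

def plan_and_generate(llm, task_prompt: str) -> str:
    """Plan execution steps, then emit one basic-operation call per step."""
    steps_text = llm(task_prompt + "\nList the execution steps, one per line.")
    steps = [s.strip() for s in steps_text.splitlines() if s.strip()]
    code_lines = []
    for i, step in enumerate(steps, start=1):
        code_lines.append(f"# Step {i}: {step}")  # execution step interpretation
        code_lines.append(llm(f"Emit one call to the basic operation "
                              f"calling function tool for: {step}"))
    return "\n".join(code_lines)  # the target RPA code as a script
```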

In a possible implementation manner, the task processing large language model performs task operation sequence generation processing on the target task to obtain a target task execution step sequence, so as to use the basic operation calling function tool to perform task code generation processing on the basis of the target task execution step sequence to obtain a target RPA code, and the method includes:

A2, performing task operation generation processing on the target task to obtain a target task execution step sequence, and determining execution step interpretation information corresponding to the task execution step sequence;

Illustratively, the task processing large language model parses the user's target task request according to the prompt: it understands the specific requirements of the target task, decomposes them, in combination with the target transaction, into a series of specific task execution steps, arranges all task execution steps into the target task execution step sequence in operation order, and generates execution step interpretation information for the task execution steps in the sequence;

The step interpretation information is introduced into the target RPA code to increase the transparency of the model processing and the maintainability of the code, and the task processing large language model generates the interpretation information of each step, including the step action information and the step execution information of each step.

A4, determining a target task operation sequence calling function corresponding to the target task execution step sequence through the basic operation calling function tool;

The basic operation calling function tool can be understood as abstracting, in advance, basic general operations on the target transaction (service), such as clicking, sliding, long-pressing and moving. The tool is adapted to the target transaction and comprises various predefined basic operation calling functions or API calls; it greatly reduces the difficulty of task code generation for the large model, which only needs to generate the step code fragment of each task execution step around the basic operation calling function tool. The basic operation calling function tool may cover, for example, transaction basic operations, database queries, API interactions, and so on. It can be understood that the task processing large language model maps each step in the task execution step sequence to one or more specific basic operation calling functions in the tool, and the basic operation calling functions corresponding to the task execution steps form the target task operation sequence calling function in operation order;

A6, performing task code generation processing based on the target task execution step sequence, the execution step interpretation information and the target task operation sequence calling function to obtain the target RPA code.

The task processing large language model integrates the execution step sequence, the execution step interpretation information and the function calls, generating the actual target RPA code from the step sequence, step interpretations and operation sequence calling functions obtained in the previous steps.

S2006, controlling the task processing large language model to output the target RPA code.

The finally generated RPA code is an automated script that is based on the user's original task request and is converted, through intelligent parsing and code generation processes, into specific operations that can be performed on the target transaction. The automated script can perform complex tasks, reduce manual intervention, and improve processing speed and accuracy.

Schematically, as shown in fig. 5, fig. 5 is a simplified schematic diagram of a target RPA code. Assume the task processing large language model determines that the target task is implemented in a certain beverage application through a target task execution step sequence consisting of 9 task execution steps, as shown in fig. 5. Each step corresponds to a basic operation calling function that needs to be invoked; for example, the third step "input meal name matcha latte" requires the basic operation calling function "clickAndGetExpose". Each task execution step in the target RPA code is accompanied by execution step interpretation information, such as "second step: click search" and "third step: input meal name matcha latte".
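For orientation only, a self-contained sketch of what such a target RPA code fragment could look like is given below; the basic operation calling functions are stubbed out, and the element identifiers are invented (only the function name clickAndGetExpose comes from the specification):

```python
# Minimal, runnable sketch of a fig. 5-style target RPA code fragment.
# The stubs stand in for the real basic operation calling function tool;
# element identifiers and step order are hypothetical.

def clickAndGetExpose(element_id: str) -> None:
    print(f"click {element_id}")               # stub: click and report exposure

def type_text(element_id: str, text: str) -> None:
    print(f"type '{text}' into {element_id}")  # stub: text input

# Second step: click search
clickAndGetExpose("search_entry")
# Third step: input meal name "matcha latte"
type_text("search_box", "matcha latte")
# Fourth step: select the first matching item
clickAndGetExpose("result_0")
```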

In one or more embodiments of the present description, best practices from large language models, natural language processing, and software engineering are leveraged to ensure that every stage from user demand to final code generation is efficient and accurate. As in fig. 5, the generated code fragments are suitable for selecting the meals of an application and their quantity, and each step carries an explanatory note, enhancing code readability. Efficient and accurate task automation can thus be realized, significantly improving operation efficiency and reducing the error rate. By carefully parsing the user task request and generating detailed execution steps and corresponding code, time is saved, operation path costs are reduced, and the user's transaction flow is simplified.

Referring to fig. 6, a flow diagram of a task processing large language model training method is provided for one or more embodiments of the present disclosure. The method may be implemented by a computer program and may be executed on a task processing large language model training device based on the von Neumann architecture; the training device may be the same as or different from the task autonomous processing device. The computer program may be integrated in the application or may run as a stand-alone tool-class application. The task processing large language model training device may be an electronic device.

S302, creating an initial task processing large language model aiming at a target transaction based on the basic large language model;

In one possible implementation, the initial task processing large language model may be a model structure composed of at least one model module built from the basic large language model, which is then trained on the task processing scenario of the target transaction. The basic large language model (Large Language Model, LLM) is an artificial intelligence content generation model aimed at understanding and generating human language. It is trained on large amounts of data, and the trained task processing large language model can perform a wide range of tasks, including text summarization, task processing, code generation, and the like.

Alternatively, the basic large language model may be a common AIGC model, such as a Wenxin (ERNIE) large model, a Tongyi large model, a GPT-series large model, and so on;

S304, acquiring a reference task processing example and a sample task for the target transaction, where the task types of the sample task include extended task types other than the reference task type;

Schematically, a trained basic large language model can be obtained and adapted to the task processing scenario; a reference task processing example and a sample task for the target transaction are obtained, and together they form the sample training data. An expert-side service may be used in advance to annotate the sample tasks in the sample training data with sample RPA code labels;

Optionally, the task types of the sample task include extended task types other than the reference task type; that is, the task processing large language model can learn from the user's reference task demonstrations, generalize to unseen extended tasks, and support providing transparent explanations of the operations of extended tasks.

In a feasible implementation, the initial task processing large language model first undergoes model understanding training with the reference task processing example so as to understand the user's operation flow in the example. During model training, the initial task processing large language model is then trained on the sample task to obtain a predicted sample RPA code, and its model parameters are adjusted based on the predicted sample RPA code and the sample RPA code label, yielding the trained task processing large language model.

Then, model training can be carried out on the initial task processing large language model based on the reference task processing example and the sample task to obtain a task processing large language model after model training, and specific reference can be made to S306-S308;

S306, generating, based on a preset code generation prompt word structure and the basic operation calling function tool, a sample task code generation prompt word for the reference task processing example and the sample task;

The preset code generation prompt word structure can be understood as a prompt word generation template used to generate the sample task code generation prompt word for the reference task processing example and the sample task;

Illustratively, the basic operation calling function tool can be understood as abstracting, in advance, basic general operations on the target transaction (service), such as clicking, sliding, long-pressing, moving and type selection. The tool is adapted to the target transaction and comprises various predefined basic operation calling functions or API calls; it greatly reduces the difficulty of task code generation for the large model, indicating that the large model can directly call already-implemented functions when generating code and only needs to generate the step code fragment of each task execution step around the tool. The basic operation calling function tool may cover, for example, transaction basic operations, database queries, API interactions, and so on. It can be understood that the task processing large language model maps each step in the task execution step sequence to one or more specific basic operation calling functions in the tool, and the basic operation calling functions corresponding to the task execution steps form the target task operation sequence calling function in operation order;

Illustratively, a basic operation calling function tool is provided for code generation; the tool includes function interfaces for calling the basic operation calling functions. Referring to fig. 7, fig. 7 is a schematic diagram of an interface definition description, illustrating the function interfaces of some basic operation calling functions in the tool, such as the clickAndGetExpose calling function, the type calling function, the scrollAndGetExpose calling function, and the enter calling function, together with the specific definitions and descriptions of these function interfaces.
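A typed sketch of how such an interface definition might be expressed is given below; the four method names follow fig. 7, while the signatures, parameters and return types are assumptions:

```python
# Hedged sketch of the basic operation calling function tool interface.
# Method names mirror fig. 7 (clickAndGetExpose, type, scrollAndGetExpose,
# enter); everything else (arguments, return values) is assumed.

from typing import List, Protocol

class BasicOperationTool(Protocol):
    def clickAndGetExpose(self, element_id: str) -> List[str]:
        """Click an element and return the texts exposed afterwards."""
        ...

    def type(self, element_id: str, text: str) -> None:
        """Input text into the given element."""
        ...

    def scrollAndGetExpose(self, direction: str) -> List[str]:
        """Scroll the screen and return the newly exposed texts."""
        ...

    def enter(self) -> None:
        """Confirm the current input (press enter)."""
        ...
```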

In one possible implementation, a method of generating prompt words is illustrated that uses a series of refined prompt words to guide and optimize the behavior of the large language model when generating task processing code. The prompt words cover several aspects of the task, including roles, skills, constraints, tool descriptions, and reference operation information. Generating the sample task code generation prompt word for the reference task processing example and the sample task, based on the preset code generation prompt word structure and the basic operation calling function tool, may proceed as follows:

1. Determining, based on the preset code generation prompt word structure and the basic operation calling function tool, a role prompt word, a task processing skill prompt word, a constraint condition prompt word, a tool description prompt word and a reference task operation information prompt word;

The preset code generation prompt word structure indicates that role prompt words, task processing skill prompt words, constraint condition prompt words, tool description prompt words and reference task operation information prompt words need to be determined;

Role prompt words: define the roles or entities involved in task execution. For example, in a transaction autonomous processing scenario, the roles may be code specialists, technical support specialists, and so on. This helps the model understand the task's requirements regarding the operations and permissions associated with a particular role.

Task processing skill prompt words: describe the particular skills or techniques required to complete the task. For example, if a task involves data analysis, the prompt words may include "data mining", "statistical analysis", and the like.

Constraint condition prompt words: describe constraints or rules that must be considered when performing the task, such as time constraints, budget constraints, or specific data processing rules (e.g., GDPR compliance).

Tool description prompt words: describe the basic operation calling function tool used for task processing.

Reference task operation information prompt words: provide the operation information corresponding to the reference task processing example, helping the model learn from past experience and solutions of the reference task in similar situations. The reference task operation information prompt word may represent the query and the complete operation sequence demonstrated by the reference task processing example;

2. Generating the sample task code generation prompt word for the reference task processing example and the sample task based on the role prompt word, the task processing skill prompt word, the constraint condition prompt word, the tool description prompt word and the reference task operation information prompt word.

For example, referring to fig. 8, fig. 8 is a schematic diagram of a sample task code generation prompt word generated based on the preset code generation prompt word structure and the basic operation calling function tool. Fig. 8 gives the role prompt word, task processing skill prompt word, constraint condition prompt word, tool description prompt word and reference task operation information prompt word; the reference task operation information prompt word is the "example operation sequence" in fig. 8, where the {current_remove_with_code} placeholder represents the query and complete operation sequence of the reference task demonstration, {api_spec} represents the given basic operation calling function tool, and the sample task is "order me a cup of matcha latte at Starbucks".
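A small sketch of assembling such a prompt from the five parts is shown below; the section labels and helper name are invented, and the placeholder strings only loosely mirror fig. 8:

```python
# Hedged sketch: assembling a sample task code generation prompt word from
# the five parts named above. Labels and the helper are hypothetical.

def build_sample_prompt(role: str, skills: str, constraints: str,
                        api_spec: str, demo: str, task: str) -> str:
    return "\n".join([
        f"Role: {role}",                          # role prompt word
        f"Skills: {skills}",                      # task processing skill
        f"Constraints: {constraints}",            # constraint condition
        f"Tools: {api_spec}",                     # tool description
        f"Example operation sequence:\n{demo}",   # reference task operation info
        f"Task: {task}",
    ])

prompt = build_sample_prompt(
    role="RPA code expert for mobile application transactions",
    skills="task decomposition, RPA code generation",
    constraints="only call functions defined in the tool specification",
    api_spec="clickAndGetExpose, type, scrollAndGetExpose, enter",
    demo="<query and complete operation sequence of the reference demonstration>",
    task="order me a cup of matcha latte at Starbucks",
)
```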

S308, performing at least one round of model training on the initial task processing large language model based on the sample task code generation prompt word, the reference task processing example and the basic operation calling function tool; determining the predicted sample RPA code obtained after the initial task processing large language model performs task execution code generation processing on the sample task; and performing model parameter adjustment on the initial task processing large language model based on the predicted sample RPA code until model training ends, so as to obtain the task processing large language model.

For example, the sample task code generation prompt word, the reference task processing example and the basic operation calling function tool are taken as model inputs and fed to the initial task processing large language model for model training;

In the forward propagation training process of the model, an initial task processing large language model carries out task execution code generation processing on the sample task to obtain a prediction sample RPA code;

In the model back-propagation training process, the target model loss of the initial task processing large language model is calculated based on the predicted sample RPA code, and the model loss value is used to adjust the model parameters until the initial task processing large language model meets the training end condition, completing model training and obtaining the task processing large language model.

In a possible implementation, the model parameter adjustment of the initial task processing large language model based on the predicted sample RPA code includes the following:

Acquiring a sample RPA code label for the sample task, determining the target model loss based on the predicted sample RPA code and the sample RPA code label, and using the target model loss to adjust the model parameters of the initial task processing large language model.

Illustratively, the target model loss is calculated with a loss calculation function based on the predicted sample RPA code and the sample RPA code label;

alternatively, the loss calculation function may employ a cross entropy loss calculation function, a hinge loss calculation function, a contrast loss calculation function, a euclidean distance loss calculation function, or the like in the related art;

Alternatively, the training end condition of the model may include, for example, the value of the loss function being less than or equal to a preset loss function threshold, the number of iterations reaching a preset threshold, and so on. The specific training end condition may be determined based on actual conditions and is not specifically limited here.
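As a schematic sketch only, one round of such training could look like the following, assuming a causal-LM style setup where the loss is cross entropy over the sample RPA code label tokens; the model and tokenization are placeholders, not the patent's implementation:

```python
# Hedged sketch of one S308 training round with a cross entropy loss.
# `model` is any module returning token logits; -100 marks masked
# (non-label) positions. This entire setup is assumed for illustration.

import torch
import torch.nn.functional as F

def training_step(model, optimizer, input_ids, label_ids) -> float:
    logits = model(input_ids)                       # forward propagation
    loss = F.cross_entropy(                         # target model loss
        logits.view(-1, logits.size(-1)),
        label_ids.view(-1),
        ignore_index=-100,                          # ignore non-label tokens
    )
    optimizer.zero_grad()
    loss.backward()                                 # back propagation
    optimizer.step()                                # model parameter adjustment
    return loss.item()
```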

It should be noted that the machine learning model according to one or more embodiments of the present disclosure includes, but is not limited to, one or a fit of several of a convolutional neural network (Convolutional Neural Network, CNN) model, a deep neural network (Deep Neural Network, DNN) model, a recurrent neural network (Recurrent Neural Networks, RNN) model, an embedding model, a gradient boosting decision tree (Gradient Boosting Decision Tree, GBDT) model, a logistic regression (Logistic Regression, LR) model, and the like;

In one or more embodiments of the present disclosure, a task processing large language model may be trained in the foregoing manner. The model can learn from the user's reference task demonstrations, generalize to unseen extended tasks, and support providing transparent interpretations of extended task operations, thereby enhancing the generalization capability of an agent service object that incorporates it.

Referring to fig. 9, fig. 9 is a schematic flow diagram of model training of a task processing large language model according to one or more embodiments of the present disclosure. The (initial) task processing large language model includes a presentation encoding module, a code generation module, a UI (User Interface) mapping module, and a behavior clone fusion module. Performing at least one round of model training on the initial task processing large language model based on the sample task code generation prompt word, the reference task processing example, and the basic operation calling function tool may refer to the following embodiments, specifically:

S4002, performing model training on the initial task processing large language model with the sample task code generation prompt word, the reference task processing example and the basic operation calling function tool;

Illustratively, fig. 10 is a model training framework diagram of the task processing large language model. As shown in fig. 10, Large Language Models (LLMs) are combined with behavior clone chains through learning from demonstrations, creating an interpretable task processing large language model for transaction interactions such as mobile application transactions. An exemplary model structure for the (initial) task processing large language model mainly includes a presentation encoding module ("presentation encoding" shown in red font in fig. 10), a code generation module ("code generation" in red font), a UI mapping module ("UI mapping" in red font), and a behavior clone fusion module for behavior clone chain fusion (the "behavior clone (fusion) module" shown in white font in fig. 10). These modules work cooperatively, enabling LLM agents to learn from user demonstrations, generalize to unseen tasks, and provide transparent explanations of their operation.

1. The presentation encoding module may capture the user demonstration in the reference task processing example (entered via the initialization link in fig. 10) and construct it in a format processable by the large language model, extracting rich semantic information using an advanced visual question-answering mode.

Alternatively, the presentation encoding module may be constructed based on a machine learning model, for example, may be constructed using an advanced visual question-and-answer model (Visual Question Answering Model);

2. The code generation module uses the generation capability of the LLM (large language model) to convert the encoded demonstration into modular, parameterized code segments with explanatory comments (see code segment c of the code generation link shown in fig. 10). Alternatively, the code generation module may be constructed based on a basic large language model;

3. The UI mapping module establishes a corresponding relation between the generated code segment and related interface UI elements in the application program, and ensures accurate and seamless autonomous interaction.

4. The (initial) task processing large language model introduces a behavior clone chain fusion mode, based on which the behavior clone fusion module (see the behavior clone (fusion) module in fig. 10) is constructed. This allows the model to learn from a plurality of demonstrations and merge the learned behaviors into a cohesive yet flexible interaction model (called the task interaction behavior knowledge model). The behavior clone chain fusion mode enhances the generalization capability of the agent, allowing it to adapt effectively to new scenarios and to combine and execute appropriate learned functions according to the identified task requirements.

The model training framework of the task processing large language model shown in fig. 10 is described below in connection with specific steps, as follows:

S4004, performing task demonstration coding training processing by the demonstration coding module based on the reference task processing example to obtain a task demonstration operation structured sequence;

The presentation encoding module captures the user demonstration in the reference task processing example, constructs it into a format that the large language model can process, and extracts rich semantic information. The user demonstration is represented as a series of actions performed within a target transaction such as an application transaction (these actions constitute the action sequence prototype shown in fig. 10);

In one possible embodiment, reference may be made to the following manner:

B2, determining, through the presentation encoding module, a reference task action sequence and the presentation action behavior characterization corresponding to each reference task action in the reference task processing example, where the presentation action behavior characterization comprises an action type characterization, an action interaction characterization and a transaction interface UI metadata characterization;

The presentation encoding module performs demonstration parsing on the reference task processing example, in which the reference task action sequence is composed of a series of demonstration actions (reference task actions), denoted D = {a_1, a_2, ..., a_n}, where each reference task action a_i is represented by a presentation action behavior characterization (τ_i, e_i, m_i): the action type characterization τ_i represents the basic operation action type (e.g., click, input, scroll, return, as shown in fig. 10), the action interaction characterization e_i represents the interaction element on the transaction interface that the action acts on, and the transaction interface UI metadata characterization m_i can be interpreted as related interface metadata on a transaction interface such as an application transaction interface, e.g., text, identifier, and bounds.

B4, carrying out feature structuring processing based on the reference task action sequence and the presentation action behavior characterizations to obtain a task demonstration operation structured sequence comprising presentation task action encodings, where the encoding tuple corresponding to a presentation task action encoding comprises an action type encoding, an associated text encoding, an interaction element identifier encoding, a visual feature encoding and an application display text list encoding.

After the demonstration parsing of the reference task processing example, the presentation encoding module performs feature structuring based on the reference task action sequence and the presentation action behavior characterizations, obtaining a task demonstration operation structured sequence (which may be called the demonstration encoding) comprising the presentation task action encodings; that is, the reference task action sequence D is converted into a structured representation, yielding the task demonstration operation structured sequence E_D = {s_1, s_2, ..., s_n};

Each encoded presentation task action s_i corresponds to an encoding tuple (encoded meta-information combination) (τ_i, t_i, id_i, v_i, exp_i);

where τ_i is the action type encoding, t_i the associated text encoding, id_i the interaction element identifier encoding, v_i the visual feature encoding, and exp_i the application display text list encoding (the list of text displayed on the transaction screen interface);

For example, through the VQA model it utilizes, the presentation encoding module may capture rich semantic information about the interaction elements, enabling the agent to generalize to never-seen scenarios and provide accurate interpretations of its behavior. In some embodiments, the presentation encoding module may automatically determine the step interpretation of each step upon completing the demonstration encoding.
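A data-structure sketch of this encoding is given below; the field names follow the tuple (τ_i, t_i, id_i, v_i, exp_i), while the concrete types are assumptions:

```python
# Hedged sketch of the presentation task action encoding s_i and the
# structured sequence E_D. Types are assumed for illustration.

from dataclasses import dataclass
from typing import List

@dataclass
class EncodedAction:
    action_type: str              # τ_i: click / input / scroll / return
    text: str                     # t_i: associated text
    element_id: str               # id_i: interaction element identifier
    visual_feature: List[float]   # v_i: visual feature encoding
    exposed_texts: List[str]      # exp_i: text list shown on the screen

# The task demonstration operation structured sequence E_D = {s_1, ..., s_n}:
EncodedSequence = List[EncodedAction]
```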

S4006, performing task execution code generation training by the code generation module based on the task demonstration operation structured sequence by using the basic operation calling function tool to obtain a reference task behavior RPA code comprising all reference operation RPA code fragments.

In one possible embodiment, reference may be made to the following manner:

Task step execution code conversion is performed with the basic operation calling function tool based on the task demonstration operation structured sequence, obtaining the reference operation RPA code fragments corresponding to all reference operations; reference execution step explanations are generated for these code fragments; and the reference task behavior RPA code is generated from all the reference operation RPA code fragments and the reference execution step explanations.

Illustratively, the code generation module is a base-LLM-based module that can utilize the generation functionality of the LLM to convert the encoded task demonstration operation structured sequence E_D into executable RPA code (the reference task behavior RPA code).

Task step execution code conversion is performed on each encoding step s_i with the basic operation calling function tool based on the task demonstration operation structured sequence, obtaining the reference operation RPA code fragment c_i corresponding to each reference operation s_i, i.e., the code fragment set C = {c_1, c_2, ..., c_n}. The generation process can be expressed as c_i = L(s_i, M; θ), where θ represents the learnable parameters of the large language model, L represents the encoding generation processing of the large language model LLM, s_i represents the presentation task action encoding, and M represents the transaction interface UI metadata related to step s_i;

Optionally, the code generation module is configured to mine potential loop structures in the code fragments; that is, the number of generated code fragments may be smaller than the number of steps in the original reference task, since potential loop structures exist in the code, which the code generation module intelligently recognizes and exploits to generate compact and efficient code that accurately replicates the demonstrated behavior.

In order for the code generation module to generalize the demonstration to unseen actions, it identifies the reference task using image-recognition-based presentation techniques and extracts the relevant hyper-parameters H = {h_1, h_2, ..., h_k} of the model during task processing. These hyper-parameters form a set of choices that allow the large language model to dynamically adjust the generated code based on the identified parameters. The extraction of a hyper-parameter can be expressed as h_j = R(v_i; φ), where R is the task processing of a large language model with image recognition capability and φ its learnable parameters. The generated code segments c_i are configured to be modular, parameterized, and accompanied by explanatory comments to ensure transparency and interoperability. The generated code can thus be easily understood and modified, enhancing the interpretability and adaptability of the task processing large language model.
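A sketch of the per-step generation c_i = L(s_i, M; θ) is shown below; it reuses the EncodedAction sketch above, and `llm_generate` is a placeholder for whatever LLM backend performs L:

```python
# Hedged sketch of the code generation module: one LLM call per encoded
# step yields one commented code fragment c_i; joining them gives C.
# `llm_generate` and the prompt wording are assumptions.

def generate_fragment(llm_generate, step, ui_metadata: str) -> str:
    """c_i = L(s_i, M; θ): one modular, commented code fragment."""
    prompt = (
        f"Step: {step.action_type} on {step.element_id} ('{step.text}')\n"
        f"UI metadata: {ui_metadata}\n"
        "Emit one call to the basic operation calling function tool, "
        "preceded by an explanatory comment."
    )
    return llm_generate(prompt)

def generate_code(llm_generate, encoded_sequence, ui_metadata: str) -> str:
    fragments = [generate_fragment(llm_generate, s, ui_metadata)
                 for s in encoded_sequence]          # C = {c_1, ..., c_n}
    return "\n".join(fragments)
```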

S4008, performing application UI element mapping training processing on the target transaction based on the reference operation RPA code segment by using the UI mapping module to obtain target transaction UI operation mapping information;

The UI mapping module, which may also be called the UI Mapping module, establishes a correspondence between each operation's generated code segment c_i and the operation-related UI elements in a transaction interface such as an application transaction interface; the collection of these correspondences is called the target transaction UI operation mapping information;

Alternatively, the UI mapping module mainly adopts two methods, namely text and identifier matching and visual information matching.

The text and identifier matching method locates the target element of an interaction by matching the text and resource identifiers of UI elements. It utilizes a structured representation of the application UI hierarchy to find the most relevant element based on the encoded step information.

The visual information matching method uses visual features and basic algorithms to locate UI elements by their appearance. By comparing the extracted visual features v_i with the visual information of the UI elements, the UI mapping module can accurately identify the target UI element a step is directed at in the transaction interface even when text or identifier information is obscured or missing.

The UI mapping module may be expressed as u_i = M(s_i, U; ψ), where u_i is the UI element corresponding to encoding step s_i, U is the UI hierarchy of the transaction application, and ψ represents the learnable parameters of the mapping function M. The UI mapping module ensures that the generated code can be executed seamlessly within the application, mimicking the user's operations with high fidelity. It also provides an explanation of how the large language model recognizes and interacts with particular UI elements, thereby enhancing the transparency and interpretability of task processing.
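A sketch of this two-stage matching, under the assumption that the UI hierarchy is a flat list of element dictionaries and that a visual similarity function is available, might read:

```python
# Hedged sketch of u_i = M(s_i, U; ψ): text/identifier matching first,
# visual matching as fallback. The element schema and similarity
# function are assumptions for illustration.

def map_step_to_element(step, ui_hierarchy, visual_similarity):
    # 1) Text and identifier matching over the structured UI hierarchy.
    for element in ui_hierarchy:
        if element.get("id") == step.element_id or element.get("text") == step.text:
            return element
    # 2) Visual information matching when text/identifier is obscured or missing.
    scored = [(visual_similarity(step.visual_feature, e["visual"]), e)
              for e in ui_hierarchy if "visual" in e]
    return max(scored, key=lambda pair: pair[0])[1] if scored else None
```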

S4010, performing cloning behavior characterization on the reference task behavior RPA code through the behavior cloning fusion module to obtain reference task behavior characterization, performing behavior knowledge fusion on the basis of all task behavior characterization to obtain a task interaction behavior knowledge model, performing task operation generation processing on the sample task through the task interaction behavior knowledge model to obtain a sample task execution step sequence, and performing task code generation processing on the basis of the sample task execution step sequence by using the basic operation calling function tool to obtain a predicted sample RPA code;

The task behavior characterizations comprise the reference task behavior characterization; or they comprise the reference task behavior characterization and the sample task behavior characterization corresponding to the predicted sample RPA code.

In order to further enhance the generalization capability of the task processing model, a clone fusion module based on the behavior clone chain fusion technique is introduced. The clone fusion module learns from a plurality of reference task demonstrations and merges the learned behaviors into a cohesive yet flexible task interaction behavior knowledge model. During back-propagation training, this knowledge model is obtained by fusing behavior knowledge over all task behavior characterizations, where the sample task behavior characterization corresponding to the current predicted sample RPA code is reverse-fine-tuned according to the target model loss calculated against the preset sample RPA code label.

Let D = {D_1, D_2, ..., D_m} be a set of m demonstrations teaching different sample tasks. Each demonstration D_i is encoded and processed by the code generation module to produce a set of learned behaviors F = {f_1, f_2, ..., f_m} expressed as code functions. The behavior clone fusion module (denoted B) dynamically invokes and combines these learned functions according to the identified task requirements. Given a new task T, the fusion process can be expressed as f = B(T, F; ξ), where f is the fused behavior function and ξ represents the learnable parameters of the fusion module. By intelligently selecting and executing appropriate learned functions through the fused behavior function f, the task processing large language model can adapt effectively to tasks in new scenarios. By exploiting knowledge obtained from multiple demonstrations, it can handle a wide range of new tasks through the task interaction behavior knowledge model and exhibit more robust, flexible behavior.
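A toy sketch of the fusion step f = B(T, F; ξ) is given below; selecting learned functions by a relevance score is an assumption, since the specification leaves the internals of B to the learned fusion module:

```python
# Hedged sketch of behavior clone fusion: given a new task T, select and
# compose learned code functions from F. The relevance scorer and top-k
# selection are assumptions for illustration.

from typing import Callable, Dict

def fuse_behaviors(task: str,
                   learned: Dict[str, Callable[[], None]],
                   relevance: Callable[[str, str], float],
                   top_k: int = 2) -> Callable[[], None]:
    """f = B(T, F; ξ): compose the most relevant learned functions."""
    names = sorted(learned, key=lambda n: relevance(task, n), reverse=True)[:top_k]
    def fused() -> None:
        for name in names:          # execute selected behaviors in order
            learned[name]()
    return fused
```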

Illustratively, the behavior clone fusion module plays an important role in enhancing the generalization ability of the task processing large language model; it combines the learned demonstration knowledge of reference behaviors in a novel manner, so that unknown tasks can be solved by combining and adjusting the knowledge obtained from different demonstrations.

In one or more embodiments of the present disclosure, combining the LLM with clone learning in the model training manner described above allows the user's operation flow to be understood efficiently. For a complex task, the demonstration encoding module disassembles the task step by step so that the meaning of each step is understood; the complex logic is then connected in series through code generation, and the UI mapping module accurately maps natural language to real operations. Meanwhile, the behavior clone fusion module performs deep fusion by learning from a plurality of demonstrations, enhancing the generalization capability of the model. The trained task processing large language model can thus greatly improve the task success rate; in cooperation with the agent service, it captures the user's demonstrations, generates executable code fragments, and establishes accurate correspondences between the code and UI elements.

The task processing large language model training device and the task autonomous processing device provided in the present specification will be described in detail with reference to fig. 11. The devices shown in fig. 11 are used for executing all or part of the methods of the embodiments shown in fig. 1 to 10 of the present specification; for convenience of explanation, only the parts relevant to the present specification are shown, and for technical details not disclosed here, refer to the embodiments shown in fig. 1 to 10 of the present specification.

Referring to fig. 11, a schematic diagram of a task processing large language model training device according to the present specification is shown. The task processing large language model training device 1 may be implemented as all or part of a device by software, hardware, or a combination of both. According to some embodiments, the task processing large language model training device 1 includes a model creation module 11, a data acquisition module 12, and a model training module 13, specifically configured to:

A model creation module 11 for creating an initial task processing large language model for the target transaction based on the basic large language model;

A data acquisition module 12, configured to acquire a reference task processing example and a sample task for the target transaction, where a task type of the sample task includes an extended task type other than the reference task type;

The model training module 13 is configured to perform model training on the initial task processing large language model based on the reference task processing example and the sample task, so as to obtain a task processing large language model after model training.
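Purely as an illustration of this module division (the class and the callables below are hypothetical, not the patent's implementation), device 1 might be composed as follows:

```python
class TaskTrainerDevice:
    """Hypothetical composition mirroring device 1's module split; the
    constructor arguments are assumed callables, not the patent's API."""
    def __init__(self, create_model, acquire_data, train_model):
        self.create_model = create_model    # model creation module 11
        self.acquire_data = acquire_data    # data acquisition module 12
        self.train_model = train_model      # model training module 13

    def run(self, target_transaction):
        model = self.create_model(target_transaction)
        reference_examples, sample_tasks = self.acquire_data(target_transaction)
        return self.train_model(model, reference_examples, sample_tasks)

# Toy usage with placeholder callables.
device = TaskTrainerDevice(lambda t: object(), lambda t: ([], []), lambda m, e, s: m)
trained = device.run("transfer_transaction")
```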

Optionally, the model training module 13 is configured to:

Based on a preset code generation prompt word structure and a basic operation calling function tool, generating a sample task code generation prompt word for the reference task processing example and the sample task;

And based on the sample task code generation prompt word, the reference task processing example, and the basic operation calling function tool, performing at least one round of model training on the initial task processing large language model; determining a predicted sample RPA code obtained after the initial task processing large language model performs task execution code generation processing on the sample task; and performing model parameter adjustment on the initial task processing large language model based on the predicted sample RPA code until the initial task processing large language model finishes model training, so as to obtain the task processing large language model.
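The following sketch shows the shape of such a training loop under assumed generic callables: generate, loss_fn, and update_params are placeholders for the model's code generation, the target model loss, and the parameter adjustment respectively, and the convergence test is a simplification rather than the patent's stopping criterion.

```python
def train_task_model(generate, loss_fn, update_params, prompt,
                     sample_tasks, label_codes, max_rounds=3, tol=1e-3):
    """Minimal sketch of the training loop described above; all callables
    are hypothetical stand-ins for the patent's components."""
    for _ in range(max_rounds):                      # at least one round of training
        round_loss = 0.0
        for task, label in zip(sample_tasks, label_codes):
            predicted = generate(prompt, task)       # predicted sample RPA code
            step_loss = loss_fn(predicted, label)    # target model loss
            update_params(step_loss)                 # model parameter adjustment
            round_loss += step_loss
        if round_loss < tol:                         # training finished
            break
    return round_loss

# Toy usage with trivial stand-ins: the prediction already matches the label.
train_task_model(lambda p, t: "click()", lambda pred, lab: float(pred != lab),
                 lambda loss: None, "prompt", ["transfer task"], ["click()"])
```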

Optionally, the initial task processing large language model includes a presentation encoding module, a code generating module, a UI mapping module and a behavior clone fusion module, and the model training module 13 is configured to:

Based on the sample task code generation prompt word, the reference task processing example, and the basic operation calling function tool, performing model training on the initial task processing large language model;

during model training:

Performing task demonstration coding training processing based on the reference task processing example through the demonstration coding module to obtain a task demonstration operation structured sequence;

Performing task execution code generation training through the code generation module, based on the task demonstration operation structured sequence and by using the basic operation calling function tool, to obtain a reference task behavior RPA code comprising all reference operation RPA code fragments;

Performing application UI element mapping training processing on the target transaction based on the reference operation RPA code fragments through the UI mapping module to obtain target transaction UI operation mapping information;

Performing cloning behavior characterization on the reference task behavior RPA code through the behavior clone fusion module to obtain a reference task behavior characterization; performing behavior knowledge fusion based on all task behavior characterizations to obtain a task interaction behavior knowledge model; performing task operation generation processing on the sample task through the task interaction behavior knowledge model to obtain a sample task execution step sequence; and performing task code generation processing based on the sample task execution step sequence by using the basic operation calling function tool to obtain a predicted sample RPA code;

The task behavior characterization comprises the reference task behavior characterization, or the task behavior characterization comprises the reference task behavior characterization and a sample task behavior characterization corresponding to the predicted sample RPA code.

Optionally, the model training module 13 is configured to:

Determining, through the demonstration coding module, a reference task action sequence and a demonstration action behavior characterization corresponding to each reference task action in the reference task processing example, wherein the demonstration action behavior characterization comprises an action type characterization, an action interaction characterization, and a transaction interface UI metadata characterization;

And carrying out feature structuring processing based on the reference task action sequence and the demonstration action behavior characterizations to obtain a task demonstration operation structured sequence comprising demonstration task action codes, wherein the encoding tuple corresponding to each demonstration task action code comprises an action type code, an associated text code, an interactive element identification code, a visual feature code, and an application display text list code.
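For illustration, the encoding tuple might be represented as below; the field names are assumptions chosen to mirror the five codes listed above, not the patent's actual schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DemoActionEncoding:
    """One encoding tuple in the task demonstration operation structured
    sequence (field names are illustrative)."""
    action_type: str              # action type code, e.g. "click", "input"
    associated_text: str          # associated text code, e.g. typed content
    element_id: str               # interactive element identification code
    visual_features: List[float]  # visual feature code (e.g. an embedding)
    screen_texts: List[str]       # application display text list code

demo_sequence: List[DemoActionEncoding] = [
    DemoActionEncoding("click", "", "btn_transfer", [0.12, 0.80], ["Home", "Transfer"]),
    DemoActionEncoding("input", "100", "amount_field", [0.33, 0.41], ["Amount"]),
]
```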

Optionally, the model training module 13 is configured to:

Performing task step execution code conversion by using the basic operation calling function tool based on the task demonstration operation structured sequence, to obtain reference operation RPA code fragments corresponding to all reference operations, and generating a reference execution step interpretation for each reference operation RPA code fragment;

Generating the reference task behavior RPA code based on all of the reference operation RPA code fragments and the reference execution step interpretations.
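A toy sketch of this assembly step could look like the following; the helper and the field names are assumptions, and the emitted fragment syntax is invented for illustration only.

```python
def build_reference_rpa_code(steps):
    """Sketch: convert each structured demonstration step into an RPA code
    fragment via a basic-operation call, attach a reference execution step
    interpretation as a comment, and join the fragments into the reference
    task behavior RPA code."""
    fragments = []
    for step in steps:
        call = f'{step["action_type"]}(element_id="{step["element_id"]}", text="{step["associated_text"]}")'
        interpretation = f'# {step["action_type"]} on element {step["element_id"]}'
        fragments.append(interpretation + "\n" + call)
    return "\n".join(fragments)

print(build_reference_rpa_code([
    {"action_type": "click", "element_id": "btn_transfer", "associated_text": ""},
    {"action_type": "input", "element_id": "amount_field", "associated_text": "100"},
]))
```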

Optionally, the model training module 13 is configured to:

Acquiring a sample RPA code label for the sample task;

And determining a target model loss based on the predicted sample RPA code and the sample RPA code label, and performing model parameter adjustment on the initial task processing large language model by using the target model loss.
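The patent does not fix a loss formula; one plausible instantiation, sketched below, is the mean negative log-likelihood of the sample RPA code label tokens under the model's predicted per-position distributions.

```python
def target_model_loss(predicted_logprobs, label_token_ids):
    """One plausible form of the target model loss (an assumption, not the
    patent's definition): average negative log-likelihood of the sample RPA
    code label tokens under the model's predicted distributions."""
    nll = 0.0
    for position_logprobs, token_id in zip(predicted_logprobs, label_token_ids):
        nll -= position_logprobs[token_id]
    return nll / max(len(label_token_ids), 1)

# Toy usage: two positions over a vocabulary of size 3 (log-probabilities).
loss = target_model_loss([[-0.1, -2.5, -3.0], [-1.2, -0.4, -2.0]], [0, 1])
```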

Optionally, the model training module 13 is configured to:

Based on a preset code generation prompt word structure and a basic operation calling function tool, determining a role prompt word, a task processing skill prompt word, a constraint condition prompt word, a tool description prompt word, and a reference task operation information prompt word;

And generating a sample task code generation prompt word for the reference task processing example and the sample task based on the role prompt word, the task processing skill prompt word, the constraint condition prompt word, the tool description prompt word, and the reference task operation information prompt word.
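As a sketch of how the five prompt components might be assembled into the sample task code generation prompt word (the template text is illustrative, not the patent's):

```python
def build_sample_task_prompt(role, skills, constraints, tool_descriptions,
                             reference_task_info, sample_task):
    """Assemble the sample task code generation prompt word from the five
    prompt components named above; section labels are assumptions."""
    sections = [
        f"Role: {role}",
        f"Skills: {skills}",
        f"Constraints: {constraints}",
        f"Tools: {tool_descriptions}",
        f"Reference task operations: {reference_task_info}",
        f"Task: {sample_task}",
    ]
    return "\n\n".join(sections)

prompt = build_sample_task_prompt(
    "RPA code generation assistant", "UI automation", "only use listed tools",
    "click(element_id), input(element_id, text)",
    "demo: click btn_transfer; input amount_field 100",
    "transfer 200 to Bob")
```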

It should be noted that the task processing large language model training device provided in the foregoing embodiments may or may not be the same device as the task autonomous processing device. When the task processing large language model training device executes the task processing large language model training method, the division into the foregoing functional modules is merely illustrative; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the task processing large language model training device and the task processing large language model training method provided in the foregoing embodiments belong to the same concept; the detailed implementation is embodied in the method embodiments and is not described herein again.

Referring to fig. 12, a schematic diagram of the task autonomous processing device of the present specification is shown. The task autonomous processing device 2 may be implemented as all or part of a device by software, hardware, or a combination of both. According to some embodiments, the task autonomous processing device 2 includes a task acquisition module 21, a code generation module 22, and a task processing module 23, specifically configured to:

A task acquisition module 21, configured to obtain a target task for a target transaction input by a user to an agent service object;

A code generating module 22, configured to perform task execution code generation processing by using a task processing large language model based on the target task to obtain a target RPA code, where the task processing large language model is obtained by performing model training using a reference task processing example for the target transaction based on a basic large language model;

and the task processing module 23 is configured to control the agent service object to execute the target RPA code on the target transaction, so as to perform task autonomous processing on the target task.

Optionally, the code generating module 22 is configured to:

Generating a task code generation prompt word for the target task;

Inputting the task code generation prompt word into the task processing large language model; performing task operation sequence generation processing on the target task through the task processing large language model to obtain a target task execution step sequence; and performing task code generation processing by using a basic operation calling function tool based on the target task execution step sequence to obtain the target RPA code;

and controlling the task processing large language model to output the target RPA code.

Optionally, the code generating module 22 is configured to:

Performing task operation generation processing on the target task to obtain a target task execution step sequence, and determining execution step interpretation information corresponding to the target task execution step sequence;

Determining a target task operation sequence calling function corresponding to the target task execution step sequence through the basic operation calling function tool;

And generating a task code based on the target task execution step sequence, the execution step interpretation information, and the target task operation sequence calling function, to obtain the target RPA code.
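Bringing the three steps together, the following sketch traces the inference path; StubPlanner and StubBasicOps are hypothetical stand-ins for the task processing large language model and the basic operation calling function tool, and every name here is an assumption for illustration.

```python
class StubPlanner:
    """Stand-in for the task processing large language model (hypothetical API)."""
    def plan(self, prompt):
        steps = ["open_transfer_page", "fill_amount"]
        notes = ["Open the transfer page", "Fill in the amount"]
        return steps, notes

class StubBasicOps:
    """Stand-in for the basic operation calling function tool."""
    def resolve(self, step):
        return f"{step}()"

def autonomous_task_processing(target_task, model, basic_ops):
    """End-to-end sketch of the inference path described above: plan a target
    task execution step sequence with interpretations, resolve each step to a
    basic-operation call, and emit the target RPA code."""
    prompt = f"Generate execution steps for: {target_task}"
    steps, interpretations = model.plan(prompt)
    lines = []
    for step, note in zip(steps, interpretations):
        lines.append(f"# {note}\n{basic_ops.resolve(step)}")
    return "\n".join(lines)

rpa_code = autonomous_task_processing("transfer 100 to Alice", StubPlanner(), StubBasicOps())
```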

The foregoing description is provided for the purpose of illustration only and does not represent the advantages or disadvantages of the embodiments.

The present disclosure further provides a computer storage medium. The computer storage medium may store a plurality of instructions suitable for being loaded and executed by a processor; for the specific execution process, refer to the detailed description of the embodiments shown in fig. 1 to 10, which is not repeated herein.

The present disclosure further provides a computer program product in which at least one instruction is stored. The at least one instruction is loaded and executed by a processor; for the specific execution process, refer to the detailed description of the embodiments shown in fig. 1 to 10, which is not repeated herein.

Referring to fig. 13, a block diagram of an electronic device according to an embodiment of the present disclosure is provided. An electronic device in this specification may include one or more of processor 1010, memory 1020, input device 1030, output device 1040, and bus 1050. The processor 1010, the memory 1020, the input device 1030, and the output device 1040 may be connected by a bus 1050.

Processor 1010 may include one or more processing cores. The processor 1010 uses various interfaces and lines to connect various parts of the overall electronic device, and performs various functions of the electronic device and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 1020 and by invoking data stored in the memory 1020. Alternatively, the processor 1010 may be implemented in at least one hardware form of digital signal processing (DSP), a field-programmable gate array (FPGA), or a programmable logic array (PLA). The processor 1010 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It will be appreciated that the modem may not be integrated into the processor 1010 and may instead be implemented by a separate communication chip.

The memory 1020 may include random access memory (RAM) or read-only memory (ROM). Optionally, the memory 1020 includes a non-transitory computer-readable storage medium. The memory 1020 may be used to store instructions, programs, code, code sets, or instruction sets.

The input device 1030 is configured to receive input instructions or data, and includes, but is not limited to, a keyboard, a mouse, a camera, a microphone, or a touch device. The output device 1040 is configured to output instructions or data, and includes, but is not limited to, a display device, a speaker, and the like. In the embodiments of the present disclosure, the input device 1030 may be a temperature sensor for acquiring an operating temperature of the electronic device, and the output device 1040 may be a speaker for outputting audio signals.

In addition, those skilled in the art will appreciate that the configuration of the electronic device shown in the above figures does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than illustrated, may combine certain components, or may have a different arrangement of components. For example, the electronic device may further include components such as a radio frequency circuit, an input unit, a sensor, an audio circuit, a wireless fidelity (Wi-Fi) module, a power supply, and a Bluetooth module, which are not described herein.

In the embodiments of the present specification, the execution subject of each step may be the electronic device described above. Optionally, the execution subject of each step is the operating system of the electronic device. The operating system may be an Android system, an iOS system, or another operating system, which the embodiments of the present specification do not limit.

In the electronic device of fig. 13, the processor 1010 may be configured to invoke the program stored in the memory 1020 and execute it to implement the task autonomous processing method and/or the task processing large language model training method described in the various method embodiments of the present specification.

Those skilled in the art will appreciate that all or part of the methods in the above embodiments may be implemented by a computer program stored on a computer-readable storage medium, which, when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.

It should be noted that the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals involved in the embodiments of the present disclosure are all authorized by the user or fully authorized by all parties, and the collection, use, and processing of the relevant data comply with the relevant laws, regulations, and standards of the relevant countries and regions. For example, the reference task processing examples, sample tasks, and the like referred to in this specification are all acquired with sufficient authorization.

The foregoing disclosure is merely illustrative of preferred embodiments of the present invention and is not intended to limit the scope of the claims; equivalent variations made in accordance with the claims of the present invention remain within the scope of the present invention.